creek 1.1.2 → 2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,15 +1,7 @@
1
1
  ---
2
- !binary "U0hBMQ==":
3
- metadata.gz: !binary |-
4
- M2M3NGQxMDJmMTc3NDk5MDUzMjFiNTU4NWI1ODZmMjBiNThkYzgyYg==
5
- data.tar.gz: !binary |-
6
- NWE3YmRhNGI5NTkwMDgzNDFiMDJkMmYzYzI1NjNiYjY2MDE0NmQ0Yw==
2
+ SHA1:
3
+ metadata.gz: 6b7d68233b036517f99988f30405a4508c3b4892
4
+ data.tar.gz: a34206d988e501a0324598bcb2de856957de1ec5
7
5
  SHA512:
8
- metadata.gz: !binary |-
9
- ZjdmNjdmOTM1Zjc1OGIyNjI1YWZiNjlmNmJhZDliZDViMDJjODE5NDEzNzJi
10
- NjI5MTkxMTYyNTRhMDhkODkxOGExY2E1MTVlMmZkODEwMTM4M2NlYjgyODI2
11
- MDcxMjFmMzNmZDA2ZWU3MWM3OWJmNWRhMzAyYzgxM2E0NzdkODk=
12
- data.tar.gz: !binary |-
13
- ZmYxNDFiYzU0MDM1Y2FlMjgxZmI4OGI2OTRiYTQzMTI2OGRhMGYxNjlhNTU0
14
- MzFjNTY5MjA5M2ZjY2MyODA3MDljYzYwZTNhNmZjZTAzMGQ4ZDFmMTNjNzU1
15
- MTU5Y2I1M2Q3ZGU3YWE1Y2VjZDVhMDY3NTVlYzc1NmEyNmZlODU=
6
+ metadata.gz: 3b75245668526340173ae7be11e0f5f81a92c5f0da6383e2961fd877fef49a7ec8809e7e97202b11cc978db8e2260fc8c3f43c7ad41da285e3d922dc705e4f79
7
+ data.tar.gz: 639e94e90af3d78d1af3ef6a07e08f79c79f21accf9d1f197976022112ee03c9d12284f0a78d674a946f33bf2d2091ed6c614a4d2c49c53329faed9ba5a4b514
@@ -1,4 +1,4 @@
1
- Copyright (c) 2013 TODO: Write your name
1
+ Copyright (c) 2017 Ramtin Vaziri
2
2
 
3
3
  MIT License
4
4
 
@@ -0,0 +1,117 @@
1
+ # Creek - Stream parser for large Excel (xlsx and xlsm) files.
2
+
3
+ Creek is a Ruby gem that provides a fast, simple and efficient method of parsing large Excel (xlsx and xlsm) files.
4
+
5
+
6
+ ## Installation
7
+
8
+ Creek can be used in plain Ruby scripts or as part of a Ruby web framework such as Rails. To install the gem from a terminal, run the following command:
9
+
10
+ ```
11
+ gem install creek
12
+ ```
13
+
14
+ To use it in Rails, add this line to your Gemfile:
15
+
16
+ ```ruby
17
+ gem 'creek'
18
+ ```
19
+
20
+ ## Basic Usage
21
+ To parse an Excel file, loop over the `rows` enumerator of one of its sheets:
22
+
23
+ ```ruby
24
+ require 'creek'
25
+ creek = Creek::Book.new 'specs/fixtures/sample.xlsx'
26
+ sheet = creek.sheets[0]
27
+
28
+ sheet.rows.each do |row|
29
+ puts row # => {"A1"=>"Content 1", "B1"=>nil, "C1"=>nil, "D1"=>"Content 3"}
30
+ end
31
+
32
+ sheet.rows_with_meta_data.each do |row|
33
+ puts row # => {"collapsed"=>"false", "customFormat"=>"false", "customHeight"=>"true", "hidden"=>"false", "ht"=>"12.1", "outlineLevel"=>"0", "r"=>"1", "cells"=>{"A1"=>"Content 1", "B1"=>nil, "C1"=>nil, "D1"=>"Content 3"}}
34
+ end
35
+
36
+ sheet.state # => 'visible'
37
+ sheet.name # => 'Sheet1'
38
+ sheet.rid # => 'rId2'
39
+ ```
40
+
41
+ ## Filename considerations
42
+ By default, Creek checks that the file extension is either `.xlsx` or `.xlsm`, but this check can be disabled when needed:
43
+
44
+ ```ruby
45
+ path = 'sample-as-zip.zip'
46
+ Creek::Book.new path, :check_file_extension => false
47
+ ```
48
+
49
+ By default, the Rails [file_field_tag](http://api.rubyonrails.org/classes/ActionView/Helpers/FormTagHelper.html#method-i-file_field_tag) uploads to a temporary location and stores the original filename with the StringIO object. (See [this section](http://guides.rubyonrails.org/form_helpers.html#uploading-files) of the Rails Guides for more information.)
50
+
51
+ Creek can parse such an upload directly, without file upload gems such as CarrierWave or Paperclip, by passing the original filename as an option:
52
+
53
+ ```ruby
54
+ # Import endpoint in Rails controller
55
+ def import
56
+ file = params[:file]
57
+ Creek::Book.new file.path, original_filename: file.original_filename
58
+ end
59
+ ```
60
+
61
+ ## Parsing images
62
+ Creek does not parse images by default. If you want to parse the images,
63
+ call the `with_images` method before iterating over rows to preload image information. If you don't call this method, Creek will not return any images.
64
+
65
+ The value of a cell with images is an array of Pathname objects.
66
+ If an image is spread across multiple cells, the same Pathname object is returned for each cell.
67
+
68
+ ```ruby
69
+ sheet.with_images.rows.each do |row|
70
+ puts row # => {"A1"=>[#<Pathname:/var/folders/ck/l64nmm3d4k75pvxr03ndk1tm0000gn/T/creek__drawing20161101-53599-274q0vimage1.jpeg>], "B2"=>"Fluffy"}
71
+ end
72
+ ```
73
+
74
+ Images for a specific cell can be obtained with the `images_at` method:
75
+
76
+ ```ruby
77
+ puts sheet.images_at('A1') # => [#<Pathname:/var/folders/ck/l64nmm3d4k75pvxr03ndk1tm0000gn/T/creek__drawing20161101-53599-274q0vimage1.jpeg>]
78
+
79
+ # no images in a cell
80
+ puts sheet.images_at('C1') # => nil
81
+ ```
82
+
83
+ Creek will most likely return nil for a cell with images if no other cell in that row contains a value (such rows may be missing from the sheet XML entirely); use the `images_at` method to retrieve the images for that cell.
84
+
85
+ ## Remote files
86
+
87
+ ```ruby
88
+ remote_url = 'http://dev-builds.libreoffice.org/tmp/test.xlsx'
89
+ Creek::Book.new remote_url, remote: true
90
+ ```
91
+
92
+ ## Contributing
93
+
94
+ Contributions are welcome. Fork the repository, add your changes to a branch in your fork, make sure all existing unit tests pass, add new unit tests that cover your changes, and finally open a pull request.
95
+
96
+ After forking and cloning the repository locally, install Bundler and use it
97
+ to install the development gem dependencies:
98
+
99
+ ```
100
+ gem install bundler
101
+ bundle install
102
+ ```
103
+
104
+ Once this is complete, you should be able to run the test suite:
105
+
106
+ ```
107
+ rake
108
+ ```
109
+
110
+ ## Bug Reporting
111
+
112
+ Please use the [Issues](https://github.com/pythonicrubyist/creek/issues) page to report bugs or suggest new enhancements.
113
+
114
+
115
+ ## License
116
+
117
+ Creek is released under the [MIT License](https://github.com/pythonicrubyist/creek/blob/master/LICENSE.txt).
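The `## Remote files` example relies on the new `if options[:remote]` branch in `Creek::Book#initialize`, which downloads the URL with `HTTParty.get(path).body`, writes the bytes into a binary-mode `Tempfile`, and hands the tempfile's path to `Zip::File.open`. A minimal sketch of only the tempfile step follows; the helper name `write_to_tempfile` is hypothetical, and a literal string stands in for the HTTP response body so the snippet runs offline.

```ruby
require 'tempfile'

# Hypothetical helper mirroring the remote branch of Creek::Book#initialize:
# write downloaded bytes into a binary-mode Tempfile and return the Tempfile
# object itself (not just its path).
def write_to_tempfile(bytes)
  tempfile = Tempfile.new(['creek', '.xlsx'])
  tempfile.binmode       # no newline or encoding conversion of the zip bytes
  tempfile.write(bytes)
  tempfile.close         # flushes; the file stays on disk until unlinked or GC'd
  tempfile
end

# In creek the bytes come from HTTParty.get(url).body; a literal string stands in here.
tempfile = write_to_tempfile("PK\x03\x04 placeholder bytes".b)
puts tempfile.path
puts File.binread(tempfile.path).bytesize
```

Returning the `Tempfile` object is a deliberate choice in this sketch: Ruby's `Tempfile` documentation says the file is deleted when the object is garbage-collected, and since rubyzip may reopen the archive path when entries are read later, holding a reference is a cautious way to make sure the file outlives the parse.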
@@ -18,13 +18,14 @@ Gem::Specification.new do |spec|
18
18
  spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
19
19
  spec.require_paths = ["lib"]
20
20
 
21
- spec.required_ruby_version = '>= 1.9.2'
21
+ spec.required_ruby_version = '>= 2.0.0'
22
22
 
23
23
  spec.add_development_dependency "bundler", "~> 1.3"
24
24
  spec.add_development_dependency "rake"
25
- spec.add_development_dependency 'rspec', '~> 2.13.0'
25
+ spec.add_development_dependency 'rspec', '~> 3.6.0'
26
26
  spec.add_development_dependency 'pry'
27
27
 
28
- spec.add_dependency 'nokogiri', '~> 1.6.0'
28
+ spec.add_dependency 'nokogiri', '~> 1.7.0'
29
29
  spec.add_dependency 'rubyzip', '>= 1.0.0'
30
+ spec.add_dependency 'httparty', '~> 0.15.5'
30
31
  end
@@ -1,9 +1,11 @@
1
- require "creek/version"
1
+ require 'creek/version'
2
2
  require 'creek/book'
3
3
  require 'creek/styles/constants'
4
4
  require 'creek/styles/style_types'
5
5
  require 'creek/styles/converter'
6
+ require 'creek/utils'
6
7
  require 'creek/styles'
8
+ require 'creek/drawing'
7
9
  require 'creek/sheet'
8
10
  require 'creek/shared_strings'
9
11
 
@@ -1,6 +1,7 @@
1
1
  require 'zip/filesystem'
2
2
  require 'nokogiri'
3
3
  require 'date'
4
+ require 'httparty'
4
5
 
5
6
  module Creek
6
7
 
@@ -19,16 +20,32 @@ module Creek
19
20
  extension = File.extname(options[:original_filename] || path).downcase
20
21
  raise 'Not a valid file format.' unless (['.xlsx', '.xlsm'].include? extension)
21
22
  end
22
- @files = Zip::File.open path
23
+ if options[:remote]
24
+ zipfile = Tempfile.new("file")
25
+ zipfile.binmode
26
+ zipfile.write(HTTParty.get(path).body)
27
+ zipfile.close
28
+ path = zipfile.path
29
+ end
30
+ @files = Zip::File.open(path)
23
31
  @shared_strings = SharedStrings.new(self)
24
32
  end
25
33
 
26
34
  def sheets
27
35
  doc = @files.file.open "xl/workbook.xml"
28
36
  xml = Nokogiri::XML::Document.parse doc
37
+ namespaces = xml.namespaces
38
+
39
+ cssPrefix = ''
40
+ namespaces.each do |namespace|
41
+ if namespace[1] == 'http://schemas.openxmlformats.org/spreadsheetml/2006/main' && namespace[0] != 'xmlns' then
42
+ cssPrefix = namespace[0].split(':')[1]+'|'
43
+ end
44
+ end
45
+
29
46
  rels_doc = @files.file.open "xl/_rels/workbook.xml.rels"
30
47
  rels = Nokogiri::XML::Document.parse(rels_doc).css("Relationship")
31
- @sheets = xml.css('sheet').map do |sheet|
48
+ @sheets = xml.css(cssPrefix+'sheet').map do |sheet|
32
49
  sheetfile = rels.find { |el| sheet.attr("r:id") == el.attr("Id") }.attr("Target")
33
50
  Sheet.new(self, sheet.attr("name"), sheet.attr("sheetid"), sheet.attr("state"), sheet.attr("visible"), sheet.attr("r:id"), sheetfile)
34
51
  end
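`Creek::Book#sheets` now scans `xml.namespaces` for the SpreadsheetML main namespace and, when it is bound to a prefix (for example `xmlns:x`), prepends `x|` to the CSS selector so the `sheet` elements are still found. A pure-Ruby sketch of that prefix computation, assuming a namespaces hash shaped like the one Nokogiri's `Document#namespaces` returns; `css_prefix_for` is a hypothetical name. Unlike the loop in `sheets`, which keeps the last matching prefix, this version stops at the first match; for a workbook that binds the namespace once, the result is the same.

```ruby
SPREADSHEETML_NS = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'.freeze

# Given a Nokogiri-style namespaces hash ({"xmlns" => uri, "xmlns:x" => uri, ...}),
# return the CSS namespace prefix ("x|") when SpreadsheetML is bound to a prefix,
# or "" when it is the default namespace (or absent).
def css_prefix_for(namespaces)
  key, _uri = namespaces.find do |name, uri|
    uri == SPREADSHEETML_NS && name != 'xmlns'
  end
  key ? "#{key.split(':', 2).last}|" : ''
end

default_ns  = { 'xmlns' => SPREADSHEETML_NS }
prefixed_ns = { 'xmlns:x' => SPREADSHEETML_NS,
                'xmlns:r' => 'http://schemas.openxmlformats.org/officeDocument/2006/relationships' }

p css_prefix_for(default_ns)               # => ""
p css_prefix_for(prefixed_ns)              # => "x|"
p "#{css_prefix_for(prefixed_ns)}sheet"    # => "x|sheet", the selector passed to xml.css
```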
@@ -0,0 +1,109 @@
1
+ require 'pathname'
2
+
3
+ module Creek
4
+ class Creek::Drawing
5
+ include Creek::Utils
6
+
7
+ COLUMNS = ('A'..'AZ').to_a
8
+
9
+ def initialize(book, drawing_filepath)
10
+ @book = book
11
+ @drawing_filepath = drawing_filepath
12
+ @drawings = []
13
+ @drawings_rels = []
14
+ @images_pathnames = Hash.new { |hash, key| hash[key] = [] }
15
+
16
+ if file_exist?(@drawing_filepath)
17
+ load_drawings_and_rels
18
+ load_images_pathnames_by_cells if has_images?
19
+ end
20
+ end
21
+
22
+ ##
23
+ # Returns false if there are no images in the drawing file or the drawing file does not exist, true otherwise.
24
+ def has_images?
25
+ @has_images ||= !@drawings.empty?
26
+ end
27
+
28
+ ##
29
+ # Extracts images from excel to tmpdir for a cell, if the images are not already extracted (multiple calls or same image file in multiple cells).
30
+ # Returns array of images as Pathname objects or nil.
31
+ def images_at(cell_name)
32
+ coordinate = calc_coordinate(cell_name)
33
+ pathnames_at_coordinate = @images_pathnames[coordinate]
34
+ return if pathnames_at_coordinate.empty?
35
+
36
+ pathnames_at_coordinate.map do |image_pathname|
37
+ if image_pathname.exist?
38
+ image_pathname
39
+ else
40
+ excel_image_path = "xl/media#{image_pathname.to_path.split(tmpdir).last}"
41
+ IO.copy_stream(@book.files.file.open(excel_image_path), image_pathname.to_path)
42
+ image_pathname
43
+ end
44
+ end
45
+ end
46
+
47
+ private
48
+
49
+ ##
50
+ # Transforms cell name to [row, col], e.g. A1 => [0, 0], B3 => [2, 1]
51
+ # Rows and cols start with 0.
52
+ def calc_coordinate(cell_name)
53
+ col = COLUMNS.index(cell_name.slice /[A-Z]+/)
54
+ row = (cell_name.slice /\d+/).to_i - 1 # rows in drawings start with 0
55
+ [row, col]
56
+ end
57
+
58
+ ##
59
+ # Creates/loads temporary directory for extracting images from excel
60
+ def tmpdir
61
+ @tmpdir ||= ::Dir.mktmpdir('creek__drawing')
62
+ end
63
+
64
+ ##
65
+ # Parses drawing and drawing's relationships xmls.
66
+ # Drawing xml contains relationships ID's and coordinates (row, col).
67
+ # Drawing relationships xml contains images' locations.
68
+ def load_drawings_and_rels
69
+ @drawings = parse_xml(@drawing_filepath).css('xdr|twoCellAnchor')
70
+ drawing_rels_filepath = expand_to_rels_path(@drawing_filepath)
71
+ @drawings_rels = parse_xml(drawing_rels_filepath).css('Relationships')
72
+ end
73
+
74
+ ##
75
+ # Iterates through the drawings and saves images' paths as Pathname objects to a hash with [row, col] keys.
76
+ # As multiple images can be located in a single cell, hash values are array of Pathname objects.
77
+ # One image can be spread across multiple cells (defined with from-row/to-row/from-col/to-col attributes); the same Pathname object is associated with each row-col combination in the range.
78
+ def load_images_pathnames_by_cells
79
+ image_selector = 'xdr:pic/xdr:blipFill/a:blip'.freeze
80
+ row_from_selector = 'xdr:from/xdr:row'.freeze
81
+ row_to_selector = 'xdr:to/xdr:row'.freeze
82
+ col_from_selector = 'xdr:from/xdr:col'.freeze
83
+ col_to_selector = 'xdr:to/xdr:col'.freeze
84
+
85
+ @drawings.xpath('//xdr:twoCellAnchor').each do |drawing|
86
+ embed = drawing.xpath(image_selector).first.attributes['embed']
87
+ next if embed.nil?
88
+
89
+ rid = embed.value
90
+ path = Pathname.new("#{tmpdir}/#{extract_drawing_path(rid).slice(/[^\/]*$/)}")
91
+
92
+ row_from = drawing.xpath(row_from_selector).text.to_i
93
+ col_from = drawing.xpath(col_from_selector).text.to_i
94
+ row_to = drawing.xpath(row_to_selector).text.to_i
95
+ col_to = drawing.xpath(col_to_selector).text.to_i
96
+
97
+ (col_from..col_to).each do |col|
98
+ (row_from..row_to).each do |row|
99
+ @images_pathnames[[row, col]].push(path)
100
+ end
101
+ end
102
+ end
103
+ end
104
+
105
+ def extract_drawing_path(rid)
106
+ @drawings_rels.css("Relationship[@Id=#{rid}]").first.attributes['Target'].value
107
+ end
108
+ end
109
+ end
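`calc_coordinate` looks the column letters up in `COLUMNS = ('A'..'AZ').to_a`, so for a cell in column BA or beyond `COLUMNS.index` returns `nil` and `images_at` cannot find images anchored there. A sketch of an arithmetic alternative (hypothetical name `cell_to_coordinate`, not part of creek) that reads the letters as a bijective base-26 number and keeps the same zero-based `[row, col]` order:

```ruby
# Hypothetical alternative to Creek::Drawing#calc_coordinate: converts an A1-style
# cell reference into zero-based [row, col] without a fixed column table, so
# columns past AZ are handled too.
def cell_to_coordinate(cell_name)
  match = /\A([A-Z]+)(\d+)\z/.match(cell_name)
  raise ArgumentError, "not an A1-style cell reference: #{cell_name.inspect}" unless match

  letters, digits = match.captures
  # "A" => 1 ... "Z" => 26, "AA" => 27, then shift to zero-based.
  col = letters.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - 'A'.ord + 1) } - 1
  row = digits.to_i - 1
  [row, col]
end

p cell_to_coordinate('A1')  # => [0, 0]
p cell_to_coordinate('B3')  # => [2, 1]
p cell_to_coordinate('AZ1') # => [0, 51]
p cell_to_coordinate('BA7') # => [6, 52]
```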
@@ -3,6 +3,7 @@ require 'nokogiri'
3
3
 
4
4
  module Creek
5
5
  class Creek::Sheet
6
+ include Creek::Utils
6
7
 
7
8
  attr_reader :book,
8
9
  :name,
@@ -21,6 +22,28 @@ module Creek
21
22
  @rid = rid
22
23
  @state = state
23
24
  @sheetfile = sheetfile
25
+ @images_present = false
26
+ end
27
+
28
+ ##
29
+ # Preloads image info (coordinates and paths) from the related drawing.xml and drawing rels.
30
+ # Must be called before #rows method if you want to have images included.
31
+ # Returns self so you can chain the calls (sheet.with_images.rows).
32
+ def with_images
33
+ @drawingfile = extract_drawing_filepath
34
+ if @drawingfile
35
+ @drawing = Creek::Drawing.new(@book, @drawingfile.sub('..', 'xl'))
36
+ @images_present = @drawing.has_images?
37
+ end
38
+ self
39
+ end
40
+
41
+ ##
42
+ # Extracts images for a cell to a temporary folder.
43
+ # Returns array of Pathnames for the cell.
44
+ # Returns nil if images are not found for the cell or images were not preloaded with #with_images.
45
+ def images_at(cell)
46
+ @drawing.images_at(cell) if @images_present
24
47
  end
25
48
 
26
49
  ##
@@ -43,7 +66,7 @@ module Creek
43
66
  # Returns a hash per row that includes the cell ids and values.
44
67
  # Empty cells will be also included in the hash with a nil value.
45
68
  def rows_generator include_meta_data=false
46
- path = "xl/#{@sheetfile}"
69
+ path = if @sheetfile.start_with? "/xl/" or @sheetfile.start_with? "xl/" then @sheetfile else "xl/#{@sheetfile}" end
47
70
  if @book.files.file.exist?(path)
48
71
  # SAX parsing, Each element in the stream comes through as two events:
49
72
  # one to open the element and one to close it.
@@ -62,6 +85,14 @@ module Creek
62
85
  y << (include_meta_data ? row : cells) if node.self_closing?
63
86
  elsif (node.name.eql? 'row') and (node.node_type.eql? closer)
64
87
  processed_cells = fill_in_empty_cells(cells, row['r'], cell)
88
+
89
+ if @images_present
90
+ processed_cells.each do |cell_name, cell_value|
91
+ next unless cell_value.nil?
92
+ processed_cells[cell_name] = images_at(cell_name)
93
+ end
94
+ end
95
+
65
96
  row['cells'] = processed_cells
66
97
  y << (include_meta_data ? row : processed_cells)
67
98
  elsif (node.name.eql? 'c') and (node.node_type.eql? opener)
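The block added to `rows_generator` only fills cells whose parsed value is `nil`, and only after `with_images` has found a drawing for the sheet. A sketch of the same merge as a pure function; the name `merge_images`, the hash standing in for `Creek::Drawing#images_at`, and the plain strings in place of Pathname objects are all assumptions for illustration. It builds a new hash instead of assigning into the one being iterated; Ruby allows reassigning existing keys during `each`, so this is a style choice rather than a fix.

```ruby
# Hypothetical stand-alone version of the image merge in Creek::Sheet#rows_generator:
# a cell whose parsed value is nil takes whatever the image lookup returns for its
# name (an array of paths, or nil when no image is anchored there).
def merge_images(cells, images_at)
  cells.each_with_object({}) do |(name, value), merged|
    merged[name] = value.nil? ? images_at.call(name) : value
  end
end

# A plain hash stands in for Creek::Drawing#images_at; strings stand in for Pathnames.
images = { 'A2' => ['/tmp/creek__drawing/image1.jpeg'] }
row = { 'A2' => nil, 'B2' => 'Fluffy', 'C2' => nil }

merged = merge_images(row, ->(name) { images[name] })
p merged
```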
@@ -72,6 +103,10 @@ module Creek
72
103
  unless cell.nil?
73
104
  cells[cell] = convert(node.inner_xml, cell_type, cell_style_idx)
74
105
  end
106
+ elsif (node.name.eql? 't') and (node.node_type.eql? opener)
107
+ unless cell.nil?
108
+ cells[cell] = convert(node.inner_xml, cell_type, cell_style_idx)
109
+ end
75
110
  end
76
111
  end
77
112
  end
@@ -108,5 +143,22 @@ module Creek
108
143
 
109
144
  new_cells
110
145
  end
146
+
147
+ ##
148
+ # Find drawing filepath for the current sheet.
149
+ # Sheet xml contains drawing relationship ID.
150
+ # Sheet relationships xml contains drawing file's location.
151
+ def extract_drawing_filepath
152
+ # Read drawing relationship ID from the sheet.
153
+ sheet_filepath = "xl/#{@sheetfile}"
154
+ drawing = parse_xml(sheet_filepath).css('drawing').first
155
+ return if drawing.nil?
156
+
157
+ drawing_rid = drawing.attributes['id'].value
158
+
159
+ # Read sheet rels to find drawing file's location.
160
+ sheet_rels_filepath = expand_to_rels_path(sheet_filepath)
161
+ parse_xml(sheet_rels_filepath).css("Relationship[@Id='#{drawing_rid}']").first.attributes['Target'].value
162
+ end
111
163
  end
112
- end
164
+ end
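`rows_generator` now accepts sheet targets that already start with `/xl/` or `xl/`, but the new `extract_drawing_filepath` still builds `"xl/#{@sheetfile}"`, so for such workbooks `with_images` appears likely to look for an entry such as `xl/xl/worksheets/sheet1.xml` that does not exist. A minimal sketch of one shared helper (hypothetical name `sheet_entry_path`) that both methods could call; stripping the leading `/` is an added assumption, made so the result is always a plain zip entry name.

```ruby
# Hypothetical helper: map a worksheet Target from xl/_rels/workbook.xml.rels to a
# zip entry name. Targets are usually relative to xl/ ("worksheets/sheet1.xml"),
# but some writers emit "/xl/worksheets/sheet1.xml" or "xl/worksheets/sheet1.xml".
def sheet_entry_path(sheetfile)
  # Assumption: drop a leading "/" so the result is always a plain zip entry name.
  return sheetfile.sub(%r{\A/}, '') if sheetfile.start_with?('/xl/', 'xl/')

  "xl/#{sheetfile}"
end

# Both rows_generator and extract_drawing_filepath could build their path this way.
p sheet_entry_path('worksheets/sheet1.xml')     # => "xl/worksheets/sheet1.xml"
p sheet_entry_path('/xl/worksheets/sheet1.xml') # => "xl/worksheets/sheet1.xml"
p sheet_entry_path('xl/worksheets/sheet2.xml')  # => "xl/worksheets/sheet2.xml"
```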
@@ -58,10 +58,8 @@ module Creek
58
58
  value
59
59
  when :fixnum
60
60
  value.to_i
61
- when :float
61
+ when :float, :percentage
62
62
  value.to_f
63
- when :percentage
64
- value.to_f / 100
65
63
  when :date, :time, :date_time
66
64
  convert_date(value, options)
67
65
  when :bignum
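The converter now treats `:percentage` like `:float`. A cell displayed as 15% stores the fraction 0.15 in its `<v>` element, so the raw value is already the number a caller wants; 1.1.2 divided it by 100 a second time. This matches the new `{'A10' => 0.15, 'B10' => 0.15}` expectation in the sample spec. The sketch isolates just those numeric branches; `convert_numeric` is a hypothetical name, and the real converter handles more types (dates, bignums, strings).

```ruby
# Sketch of the numeric branches of Creek::Styles::Converter after this change:
# percentages are returned as the stored fraction, not divided by 100 again.
def convert_numeric(value, type)
  case type
  when :fixnum then value.to_i
  when :float, :percentage then value.to_f
  else value
  end
end

p convert_numeric('0.15', :percentage) # => 0.15
p convert_numeric('42', :fixnum)       # => 42
```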
@@ -0,0 +1,16 @@
1
+ module Creek
2
+ module Utils
3
+ def expand_to_rels_path(filepath)
4
+ filepath.sub(/(\/[^\/]+$)/, '/_rels\1.rels')
5
+ end
6
+
7
+ def file_exist?(path)
8
+ @book.files.file.exist?(path)
9
+ end
10
+
11
+ def parse_xml(xml_path)
12
+ doc = @book.files.file.open(xml_path)
13
+ Nokogiri::XML::Document.parse(doc)
14
+ end
15
+ end
16
+ end
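`Creek::Utils` centralizes the OPC path convention used by both `Sheet` and `Drawing`: a part's relationships live in a sibling `_rels` folder. The sketch reproduces `expand_to_rels_path` and adds a hypothetical `resolve_target` helper. `Sheet#with_images` turns a target such as `../drawings/drawing1.xml` into an entry name with `sub('..', 'xl')`, which works for the usual layout where worksheets live in `xl/worksheets/`; resolving the target against the owning part's directory is a more general alternative, shown here as an assumption rather than what creek does.

```ruby
require 'pathname'

# Same substitution as Creek::Utils#expand_to_rels_path: the relationships part
# for "dir/name.xml" lives at "dir/_rels/name.xml.rels".
def expand_to_rels_path(filepath)
  filepath.sub(%r{(/[^/]+$)}, '/_rels\1.rels')
end

# Hypothetical helper: resolve a relationship Target against the directory of the
# part that owns the .rels file (lexically, via Pathname#+ and #cleanpath).
def resolve_target(source_part, target)
  (Pathname.new(source_part).dirname + target).cleanpath.to_s
end

sheet = 'xl/worksheets/sheet1.xml'
drawing = 'xl/drawings/drawing1.xml'

p expand_to_rels_path(sheet)                        # => "xl/worksheets/_rels/sheet1.xml.rels"
p expand_to_rels_path(drawing)                      # => "xl/drawings/_rels/drawing1.xml.rels"
p resolve_target(sheet, '../drawings/drawing1.xml') # => "xl/drawings/drawing1.xml"
p resolve_target(drawing, '../media/image1.jpeg')   # => "xl/media/image1.jpeg"
```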
@@ -1,3 +1,3 @@
1
1
  module Creek
2
- VERSION = "1.1.2"
2
+ VERSION = "2.0"
3
3
  end
@@ -0,0 +1,52 @@
1
+ require './spec/spec_helper'
2
+
3
+ describe 'drawing' do
4
+ let(:book) { Creek::Book.new('spec/fixtures/sample-with-images.xlsx') }
5
+ let(:book_no_images) { Creek::Book.new('spec/fixtures/sample.xlsx') }
6
+ let(:drawingfile) { 'xl/drawings/drawing1.xml' }
7
+ let(:drawing) { Creek::Drawing.new(book, drawingfile) }
8
+ let(:drawing_without_images) { Creek::Drawing.new(book_no_images, drawingfile) }
9
+
10
+ describe '#has_images?' do
11
+ it 'has' do
12
+ expect(drawing.has_images?).to eq(true)
13
+ end
14
+
15
+ it 'does not have' do
16
+ expect(drawing_without_images.has_images?).to eq(false)
17
+ end
18
+ end
19
+
20
+ describe '#images_at' do
21
+ it 'returns images pathnames at cell' do
22
+ image = drawing.images_at('A2')[0]
23
+ expect(image.class).to eq(Pathname)
24
+ expect(image.exist?).to eq(true)
25
+ expect(image.to_path).to match(/.+creek__drawing.+\.jpeg$/)
26
+ end
27
+
28
+ context 'when no images in cell' do
29
+ it 'returns nil' do
30
+ images = drawing.images_at('B2')
31
+ expect(images).to eq(nil)
32
+ end
33
+ end
34
+
35
+ context 'when more images in one cell' do
36
+ it 'returns all images at cell' do
37
+ images = drawing.images_at('A10')
38
+ expect(images.size).to eq(2)
39
+ expect(images.all?(&:exist?)).to eq(true)
40
+ end
41
+ end
42
+
43
+ context 'when same image across multiple cells' do
44
+ it 'returns same image for each cell' do
45
+ image1 = drawing.images_at('A4')[0]
46
+ image2 = drawing.images_at('A5')[0]
47
+ expect(image1.class).to eq(Pathname)
48
+ expect(image1).to eq(image2)
49
+ end
50
+ end
51
+ end
52
+ end
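The "when more images in one cell" and "when same image across multiple cells" specs exercise the range expansion in `Creek::Drawing#load_images_pathnames_by_cells`: every `[row, col]` between an anchor's `xdr:from` and `xdr:to` gets the same Pathname object. A pure-Ruby sketch of that indexing, with a hypothetical `Anchor` struct in place of parsed `xdr:twoCellAnchor` nodes and made-up coordinates chosen to mirror the spec's cells (A4:A5 and A10).

```ruby
require 'pathname'

# Hypothetical anchor record: zero-based rows/cols as read from xdr:from and xdr:to.
Anchor = Struct.new(:path, :row_from, :col_from, :row_to, :col_to)

# Sketch of the indexing done by Creek::Drawing#load_images_pathnames_by_cells:
# every [row, col] inside an anchor's rectangle gets the same Pathname object,
# and a cell covered by several anchors collects all of their paths.
def index_images(anchors)
  index = Hash.new { |hash, key| hash[key] = [] }
  anchors.each do |anchor|
    (anchor.row_from..anchor.row_to).each do |row|
      (anchor.col_from..anchor.col_to).each do |col|
        index[[row, col]] << anchor.path
      end
    end
  end
  index
end

# Made-up anchors: one image spanning A4:A5, two images in A10.
anchors = [
  Anchor.new(Pathname.new('/tmp/creek__drawing/image1.jpeg'), 3, 0, 4, 0),
  Anchor.new(Pathname.new('/tmp/creek__drawing/image2.jpeg'), 9, 0, 9, 0),
  Anchor.new(Pathname.new('/tmp/creek__drawing/image3.jpeg'), 9, 0, 9, 0)
]

index = index_images(anchors)
p index[[3, 0]].first.equal?(index[[4, 0]].first) # => true (same object)
p index[[9, 0]].size                               # => 2
p index.fetch([5, 0], nil)                         # => nil
```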
Binary file
@@ -7,12 +7,12 @@ describe 'shared strings' do
7
7
  doc = Nokogiri::XML(shared_strings_xml_file)
8
8
  dictionary = Creek::SharedStrings.parse_shared_string_from_document(doc)
9
9
 
10
- dictionary.keys.size.should == 5
11
- dictionary[0].should == 'Cell A1'
12
- dictionary[1].should == 'Cell B1'
13
- dictionary[2].should == 'My Cell'
14
- dictionary[3].should == 'Cell A2'
15
- dictionary[4].should == 'Cell B2'
10
+ expect(dictionary.keys.size).to eq(5)
11
+ expect(dictionary[0]).to eq('Cell A1')
12
+ expect(dictionary[1]).to eq('Cell B1')
13
+ expect(dictionary[2]).to eq('My Cell')
14
+ expect(dictionary[3]).to eq('Cell A2')
15
+ expect(dictionary[4]).to eq('Cell B2')
16
16
  end
17
17
 
18
18
  end
@@ -0,0 +1,85 @@
1
+ require './spec/spec_helper'
2
+
3
+ describe 'sheet' do
4
+ let(:book_with_images) { Creek::Book.new('spec/fixtures/sample-with-images.xlsx') }
5
+ let(:book_no_images) { Creek::Book.new('spec/fixtures/sample.xlsx') }
6
+ let(:sheetfile) { 'worksheets/sheet1.xml' }
7
+ let(:sheet_with_images) { Creek::Sheet.new(book_with_images, 'Sheet 1', 1, '', '', '1', sheetfile) }
8
+ let(:sheet_no_images) { Creek::Sheet.new(book_no_images, 'Sheet 1', 1, '', '', '1', sheetfile) }
9
+
10
+ def load_cell(rows, cell_name)
11
+ cell = rows.find { |row| !row[cell_name].nil? }
12
+ cell[cell_name] if cell
13
+ end
14
+
15
+ describe '#rows' do
16
+ context 'with excel with images' do
17
+ context 'with images preloading' do
18
+ let(:rows) { sheet_with_images.with_images.rows.map { |r| r } }
19
+
20
+ it 'parses single image in a cell' do
21
+ expect(load_cell(rows, 'A2').size).to eq(1)
22
+ end
23
+
24
+ it 'returns nil for cells without images' do
25
+ expect(load_cell(rows, 'A3')).to eq(nil)
26
+ expect(load_cell(rows, 'A7')).to eq(nil)
27
+ expect(load_cell(rows, 'A9')).to eq(nil)
28
+ end
29
+
30
+ it 'returns nil for merged cell within empty row' do
31
+ expect(load_cell(rows, 'A5')).to eq(nil)
32
+ end
33
+
34
+ it 'returns nil for image in a cell with empty row' do
35
+ expect(load_cell(rows, 'A8')).to eq(nil)
36
+ end
37
+
38
+ it 'returns images for merged cells' do
39
+ expect(load_cell(rows, 'A4').size).to eq(1)
40
+ expect(load_cell(rows, 'A6').size).to eq(1)
41
+ end
42
+
43
+ it 'returns multiple images' do
44
+ expect(load_cell(rows, 'A10').size).to eq(2)
45
+ end
46
+ end
47
+
48
+ it 'ignores images' do
49
+ rows = sheet_with_images.rows.map { |r| r }
50
+ expect(load_cell(rows, 'A2')).to eq(nil)
51
+ expect(load_cell(rows, 'A3')).to eq(nil)
52
+ expect(load_cell(rows, 'A4')).to eq(nil)
53
+ end
54
+ end
55
+
56
+ context 'with excel without images' do
57
+ it 'does not break on with_images' do
58
+ rows = sheet_no_images.with_images.rows.map { |r| r }
59
+ expect(load_cell(rows, 'A10')).to eq(0.15)
60
+ end
61
+ end
62
+ end
63
+
64
+ describe '#images_at' do
65
+ it 'returns images for merged cell' do
66
+ image = sheet_with_images.with_images.images_at('A5')[0]
67
+ expect(image.class).to eq(Pathname)
68
+ end
69
+
70
+ it 'returns images for empty row' do
71
+ image = sheet_with_images.with_images.images_at('A8')[0]
72
+ expect(image.class).to eq(Pathname)
73
+ end
74
+
75
+ it 'returns nil for empty cell' do
76
+ image = sheet_with_images.with_images.images_at('B3')
77
+ expect(image).to eq(nil)
78
+ end
79
+
80
+ it 'returns nil for empty cell without preloading images' do
81
+ image = sheet_with_images.images_at('B3')
82
+ expect(image).to eq(nil)
83
+ end
84
+ end
85
+ end
@@ -9,7 +9,7 @@ describe Creek::Styles::Converter do
9
9
 
10
10
  describe :date_time do
11
11
  it "works" do
12
- convert('41275', 'n', :date_time).should == Date.new(2013,01,01)
12
+ expect(convert('41275', 'n', :date_time)).to eq(Date.new(2013,01,01))
13
13
  end
14
14
  end
15
15
  end
@@ -7,9 +7,9 @@ describe Creek::Styles::StyleTypes do
7
7
  xml_file = File.open('spec/fixtures/styles/first.xml')
8
8
  doc = Nokogiri::XML(xml_file)
9
9
  res = Creek::Styles::StyleTypes.new(doc).call
10
- res.size.should == 8
11
- res[3].should == :date_time
12
- res.should == [:unsupported, :unsupported, :unsupported, :date_time, :unsupported, :unsupported, :unsupported, :unsupported]
10
+ expect(res.size).to eq(8)
11
+ expect(res[3]).to eq(:date_time)
12
+ expect(res).to eq([:unsupported, :unsupported, :unsupported, :date_time, :unsupported, :unsupported, :unsupported, :unsupported])
13
13
  end
14
14
  end
15
15
  end
@@ -2,25 +2,38 @@ require './spec/spec_helper'
2
2
 
3
3
  describe 'Creek trying to parse an invalid file.' do
4
4
  it 'Fail to open a legacy xls file.' do
5
- lambda { Creek::Book.new 'spec/fixtures/invalid.xls' }.should raise_error 'Not a valid file format.'
5
+ expect { Creek::Book.new 'spec/fixtures/invalid.xls' }
6
+ .to raise_error 'Not a valid file format.'
6
7
  end
7
8
 
8
9
  it 'Ignore file extensions on request.' do
9
10
  path = 'spec/fixtures/sample-as-zip.zip'
10
- lambda { Creek::Book.new path, :check_file_extension => false }.should_not raise_error
11
+ expect { Creek::Book.new path, check_file_extension: false }
12
+ .not_to raise_error
11
13
  end
12
14
 
13
15
  it 'Check file extension when requested.' do
14
- open_book = lambda { Creek::Book.new 'spec/fixtures/invalid.xls', :check_file_extension => true }
15
- open_book.should raise_error 'Not a valid file format.'
16
+ expect { Creek::Book.new 'spec/fixtures/invalid.xls', check_file_extension: true }
17
+ .to raise_error 'Not a valid file format.'
18
+ end
19
+
20
+ it 'Fail to open remote file' do
21
+ expect { Creek::Book.new 'http://dev-builds.libreoffice.org/tmp/test.xlsx' }
22
+ .to raise_error(Zip::Error, /not found/)
23
+ end
24
+
25
+ it 'Opens remote file with remote flag' do
26
+ expect { Creek::Book.new 'http://dev-builds.libreoffice.org/tmp/test.xlsx', remote: true }
27
+ .not_to raise_error
16
28
  end
17
29
 
18
30
  it 'Check file extension of original_filename if passed.' do
19
31
  path = 'spec/fixtures/temp_string_io_file_path_with_no_extension'
20
- lambda { Creek::Book.new path, :original_filename => 'invalid.xls' }.should raise_error 'Not a valid file format.'
21
- lambda { Creek::Book.new path, :original_filename => 'valid.xlsx' }.should_not raise_error
32
+ expect { Creek::Book.new path, :original_filename => 'invalid.xls' }
33
+ .to raise_error 'Not a valid file format.'
34
+ expect { Creek::Book.new path, :original_filename => 'valid.xlsx' }
35
+ .not_to raise_error
22
36
  end
23
-
24
37
  end
25
38
 
26
39
  describe 'Creek parsing a sample XLSX file' do
@@ -32,7 +45,8 @@ describe 'Creek parsing a sample XLSX file' do
32
45
  {'A4'=>'Content 7', 'B4'=>'Content 8', 'C4'=>'Content 9', 'D4'=>'Content 10', 'E4'=>'Content 11', 'F4'=>'Content 12'},
33
46
  {'A5'=>nil, 'B5'=>nil, 'C5'=>nil, 'D5'=>nil, 'E5'=>nil, 'F5'=>nil, 'G5'=>nil, 'H5'=>nil, 'I5'=>nil, 'J5'=>nil, 'K5'=>nil, 'L5'=>nil, 'M5'=>nil, 'N5'=>nil, 'O5'=>nil, 'P5'=>nil, 'Q5'=>nil, 'R5'=>nil, 'S5'=>nil, 'T5'=>nil, 'U5'=>nil, 'V5'=>nil, 'W5'=>nil, 'X5'=>nil, 'Y5'=>nil, 'Z5'=>'Z Content', 'AA5'=>nil, 'AB5'=>nil, 'AC5'=>nil, 'AD5'=>nil, 'AE5'=>nil, 'AF5'=>nil, 'AG5'=>nil, 'AH5'=>nil, 'AI5'=>nil, 'AJ5'=>nil, 'AK5'=>nil, 'AL5'=>nil, 'AM5'=>nil, 'AN5'=>nil, 'AO5'=>nil, 'AP5'=>nil, 'AQ5'=>nil, 'AR5'=>nil, 'AS5'=>nil, 'AT5'=>nil, 'AU5'=>nil, 'AV5'=>nil, 'AW5'=>nil, 'AX5'=>nil, 'AY5'=>nil, 'AZ5'=>'Content 13'},
34
47
  {'A6'=>'1', 'B6'=>'2', 'C6'=>'3'}, {'A7'=>'Content 15', 'B7'=>'Content 16', 'C7'=>'Content 18', 'D7'=>'Content 19'},
35
- {'A8'=>nil, 'B8'=>'Content 20', 'C8'=>nil, 'D8'=>nil, 'E8'=>nil, 'F8'=>'Content 21'}]
48
+ {'A8'=>nil, 'B8'=>'Content 20', 'C8'=>nil, 'D8'=>nil, 'E8'=>nil, 'F8'=>'Content 21'},
49
+ {'A10' => 0.15, 'B10' => 0.15}]
36
50
  end
37
51
 
38
52
  after(:all) do
@@ -40,15 +54,15 @@ describe 'Creek parsing a sample XLSX file' do
40
54
  end
41
55
 
42
56
  it 'open an XLSX file successfully.' do
43
- @creek.should_not be_nil
57
+ expect(@creek).not_to be_nil
44
58
  end
45
59
 
46
60
  it 'find sheets successfully.' do
47
- @creek.sheets.count.should == 1
61
+ expect(@creek.sheets.count).to eq(1)
48
62
  sheet = @creek.sheets.first
49
- sheet.state.should eql nil
50
- sheet.name.should eql 'Sheet1'
51
- sheet.rid.should eql 'rId1'
63
+ expect(sheet.state).to eql nil
64
+ expect(sheet.name).to eql 'Sheet1'
65
+ expect(sheet.rid).to eql 'rId1'
52
66
  end
53
67
 
54
68
  it 'Parse rows with empty cells successfully.' do
@@ -59,15 +73,16 @@ describe 'Creek parsing a sample XLSX file' do
59
73
  row_count += 1
60
74
  end
61
75
 
62
- rows[0].should == @expected_rows[0]
63
- rows[1].should == @expected_rows[1]
64
- rows[2].should == @expected_rows[2]
65
- rows[3].should == @expected_rows[3]
66
- rows[4].should == @expected_rows[4]
67
- rows[5].should == @expected_rows[5]
68
- rows[6].should == @expected_rows[6]
69
- rows[7].should == @expected_rows[7]
70
- row_count.should == 8
76
+ expect(rows[0]).to eq(@expected_rows[0])
77
+ expect(rows[1]).to eq(@expected_rows[1])
78
+ expect(rows[2]).to eq(@expected_rows[2])
79
+ expect(rows[3]).to eq(@expected_rows[3])
80
+ expect(rows[4]).to eq(@expected_rows[4])
81
+ expect(rows[5]).to eq(@expected_rows[5])
82
+ expect(rows[6]).to eq(@expected_rows[6])
83
+ expect(rows[7]).to eq(@expected_rows[7])
84
+ expect(rows[8]).to eq(@expected_rows[8])
85
+ expect(row_count).to eq(9)
71
86
  end
72
87
 
73
88
  it 'Parse rows with empty cells and meta data successfully.' do
@@ -77,6 +92,6 @@ describe 'Creek parsing a sample XLSX file' do
77
92
  rows << row
78
93
  row_count += 1
79
94
  end
80
- rows.map{|r| r['cells']}.should == @expected_rows
95
+ expect(rows.map{|r| r['cells']}).to eq(@expected_rows)
81
96
  end
82
97
  end
metadata CHANGED
@@ -1,99 +1,113 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: creek
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.1.2
4
+ version: '2.0'
5
5
  platform: ruby
6
6
  authors:
7
7
  - pythonicrubyist
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2016-04-21 00:00:00.000000000 Z
11
+ date: 2017-06-14 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
- - - ~>
17
+ - - "~>"
18
18
  - !ruby/object:Gem::Version
19
19
  version: '1.3'
20
20
  type: :development
21
21
  prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
- - - ~>
24
+ - - "~>"
25
25
  - !ruby/object:Gem::Version
26
26
  version: '1.3'
27
27
  - !ruby/object:Gem::Dependency
28
28
  name: rake
29
29
  requirement: !ruby/object:Gem::Requirement
30
30
  requirements:
31
- - - ! '>='
31
+ - - ">="
32
32
  - !ruby/object:Gem::Version
33
33
  version: '0'
34
34
  type: :development
35
35
  prerelease: false
36
36
  version_requirements: !ruby/object:Gem::Requirement
37
37
  requirements:
38
- - - ! '>='
38
+ - - ">="
39
39
  - !ruby/object:Gem::Version
40
40
  version: '0'
41
41
  - !ruby/object:Gem::Dependency
42
42
  name: rspec
43
43
  requirement: !ruby/object:Gem::Requirement
44
44
  requirements:
45
- - - ~>
45
+ - - "~>"
46
46
  - !ruby/object:Gem::Version
47
- version: 2.13.0
47
+ version: 3.6.0
48
48
  type: :development
49
49
  prerelease: false
50
50
  version_requirements: !ruby/object:Gem::Requirement
51
51
  requirements:
52
- - - ~>
52
+ - - "~>"
53
53
  - !ruby/object:Gem::Version
54
- version: 2.13.0
54
+ version: 3.6.0
55
55
  - !ruby/object:Gem::Dependency
56
56
  name: pry
57
57
  requirement: !ruby/object:Gem::Requirement
58
58
  requirements:
59
- - - ! '>='
59
+ - - ">="
60
60
  - !ruby/object:Gem::Version
61
61
  version: '0'
62
62
  type: :development
63
63
  prerelease: false
64
64
  version_requirements: !ruby/object:Gem::Requirement
65
65
  requirements:
66
- - - ! '>='
66
+ - - ">="
67
67
  - !ruby/object:Gem::Version
68
68
  version: '0'
69
69
  - !ruby/object:Gem::Dependency
70
70
  name: nokogiri
71
71
  requirement: !ruby/object:Gem::Requirement
72
72
  requirements:
73
- - - ~>
73
+ - - "~>"
74
74
  - !ruby/object:Gem::Version
75
- version: 1.6.0
75
+ version: 1.7.0
76
76
  type: :runtime
77
77
  prerelease: false
78
78
  version_requirements: !ruby/object:Gem::Requirement
79
79
  requirements:
80
- - - ~>
80
+ - - "~>"
81
81
  - !ruby/object:Gem::Version
82
- version: 1.6.0
82
+ version: 1.7.0
83
83
  - !ruby/object:Gem::Dependency
84
84
  name: rubyzip
85
85
  requirement: !ruby/object:Gem::Requirement
86
86
  requirements:
87
- - - ! '>='
87
+ - - ">="
88
88
  - !ruby/object:Gem::Version
89
89
  version: 1.0.0
90
90
  type: :runtime
91
91
  prerelease: false
92
92
  version_requirements: !ruby/object:Gem::Requirement
93
93
  requirements:
94
- - - ! '>='
94
+ - - ">="
95
95
  - !ruby/object:Gem::Version
96
96
  version: 1.0.0
97
+ - !ruby/object:Gem::Dependency
98
+ name: httparty
99
+ requirement: !ruby/object:Gem::Requirement
100
+ requirements:
101
+ - - "~>"
102
+ - !ruby/object:Gem::Version
103
+ version: 0.15.5
104
+ type: :runtime
105
+ prerelease: false
106
+ version_requirements: !ruby/object:Gem::Requirement
107
+ requirements:
108
+ - - "~>"
109
+ - !ruby/object:Gem::Version
110
+ version: 0.15.5
97
111
  description: A Ruby gem that streams and parses large Excel(xlsx and xlsm) files fast
98
112
  and efficiently.
99
113
  email:
@@ -102,29 +116,34 @@ executables: []
102
116
  extensions: []
103
117
  extra_rdoc_files: []
104
118
  files:
105
- - .gitignore
119
+ - ".gitignore"
106
120
  - Gemfile
107
121
  - LICENSE.txt
108
- - README.rdoc
122
+ - README.md
109
123
  - Rakefile
110
124
  - creek.gemspec
111
125
  - lib/creek.rb
112
126
  - lib/creek/book.rb
127
+ - lib/creek/drawing.rb
113
128
  - lib/creek/shared_strings.rb
114
129
  - lib/creek/sheet.rb
115
130
  - lib/creek/styles.rb
116
131
  - lib/creek/styles/constants.rb
117
132
  - lib/creek/styles/converter.rb
118
133
  - lib/creek/styles/style_types.rb
134
+ - lib/creek/utils.rb
119
135
  - lib/creek/version.rb
136
+ - spec/drawing_spec.rb
120
137
  - spec/fixtures/invalid.xls
121
138
  - spec/fixtures/sample-as-zip.zip
139
+ - spec/fixtures/sample-with-images.xlsx
122
140
  - spec/fixtures/sample.xlsx
123
141
  - spec/fixtures/sheets/sheet1.xml
124
142
  - spec/fixtures/sst.xml
125
143
  - spec/fixtures/styles/first.xml
126
144
  - spec/fixtures/temp_string_io_file_path_with_no_extension
127
145
  - spec/shared_string_spec.rb
146
+ - spec/sheet_spec.rb
128
147
  - spec/spec_helper.rb
129
148
  - spec/styles/converter_spec.rb
130
149
  - spec/styles/style_types_spec.rb
@@ -139,29 +158,32 @@ require_paths:
139
158
  - lib
140
159
  required_ruby_version: !ruby/object:Gem::Requirement
141
160
  requirements:
142
- - - ! '>='
161
+ - - ">="
143
162
  - !ruby/object:Gem::Version
144
- version: 1.9.2
163
+ version: 2.0.0
145
164
  required_rubygems_version: !ruby/object:Gem::Requirement
146
165
  requirements:
147
- - - ! '>='
166
+ - - ">="
148
167
  - !ruby/object:Gem::Version
149
168
  version: '0'
150
169
  requirements: []
151
170
  rubyforge_project:
152
- rubygems_version: 2.4.3
171
+ rubygems_version: 2.6.12
153
172
  signing_key:
154
173
  specification_version: 4
155
174
  summary: A Ruby gem for parsing large Excel(xlsx and xlsm) files.
156
175
  test_files:
176
+ - spec/drawing_spec.rb
157
177
  - spec/fixtures/invalid.xls
158
178
  - spec/fixtures/sample-as-zip.zip
179
+ - spec/fixtures/sample-with-images.xlsx
159
180
  - spec/fixtures/sample.xlsx
160
181
  - spec/fixtures/sheets/sheet1.xml
161
182
  - spec/fixtures/sst.xml
162
183
  - spec/fixtures/styles/first.xml
163
184
  - spec/fixtures/temp_string_io_file_path_with_no_extension
164
185
  - spec/shared_string_spec.rb
186
+ - spec/sheet_spec.rb
165
187
  - spec/spec_helper.rb
166
188
  - spec/styles/converter_spec.rb
167
189
  - spec/styles/style_types_spec.rb
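This hunk raises `required_ruby_version` from `>= 1.9.2` to `>= 2.0.0`. It also records that the package was built with RubyGems 2.6.12 instead of 2.4.3. The sketch below shows one way a gemspec could declare the 2.0 runtime requirements visible in this diff. It is a hypothetical reconstruction for illustration only, not the actual `creek.gemspec`. It also omits the development dependencies, because their names are not shown here.

```ruby
require "rubygems"

# Hypothetical reconstruction of the 2.0 runtime metadata shown in this diff;
# not the actual creek.gemspec.
spec = Gem::Specification.new do |s|
  s.name    = "creek"
  s.version = "2.0"
  s.summary = "A Ruby gem for parsing large Excel(xlsx and xlsm) files."
  s.required_ruby_version = ">= 2.0.0"
  # add_dependency registers runtime dependencies.
  s.add_dependency "nokogiri", "~> 1.7.0"
  s.add_dependency "rubyzip",  ">= 1.0.0"
  s.add_dependency "httparty", "~> 0.15.5"
end

spec.runtime_dependencies.each { |dep| puts "#{dep.name} #{dep.requirement}" }

# Ruby 1.9.x met the old ">= 1.9.2" floor but does not meet the new one.
puts spec.required_ruby_version.satisfied_by?(Gem::Version.new("1.9.3")) # false
```

RubyGems should therefore refuse to install creek 2.0 on a 1.9.x interpreter. Projects still on 1.9 would need to stay on the 1.1.x line.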
@@ -1,76 +0,0 @@
1
- = Creek -- Stream parser for large Excel(xlsx and xlsm) files.
2
-
3
- Creek is a Ruby gem that provide a fast, simple and efficient method of parsing large Excel(xlsx and xlsm) files.
4
-
5
-
6
- == Installation
7
-
8
- Creek can be used from the command line or as part of a Ruby web framework. To install the gem using terminal, run the following command:
9
-
10
- gem install creek
11
-
12
- To use it in Rails, add this line to your Gemfile:
13
-
14
- gem "creek"
15
-
16
-
17
- == Basic Usage
18
- Creek can simply parse an Excel file by looping through the rows enumerator:
19
-
20
- require 'creek'
21
- creek = Creek::Book.new "specs/fixtures/sample.xlsx"
22
- sheet= creek.sheets[0]
23
-
24
- sheet.rows.each do |row|
25
- puts row # => {"A1"=>"Content 1", "B1"=>nil, C1"=>nil, "D1"=>"Content 3"}
26
- end
27
-
28
-
29
- sheet.rows_with_meta_data.each do |row|
30
- puts row # => {"collapsed"=>"false", "customFormat"=>"false", "customHeight"=>"true", "hidden"=>"false", "ht"=>"12.1", "outlineLevel"=>"0", "r"=>"1", "cells"=>{"A1"=>"Content 1", "B1"=>nil, C1"=>nil, "D1"=>"Content 3"}}
31
- end
32
-
33
-
34
- sheet.state # => 'visible'
35
- sheet.name # => 'Sheet1'
36
- sheet.rid # => 'rId2'
37
-
38
- == Filename considerations
39
- By default, Creek will ensure that the file extension is either *.xlsx or *.xlsm, but this check can be circumvented as needed:
40
-
41
- path = 'sample-as-zip.zip'
42
- Creek::Book.new path, :check_file_extension => false
43
-
44
- By default, the Rails {file_field_tag}[http://api.rubyonrails.org/classes/ActionView/Helpers/FormTagHelper.html#method-i-file_field_tag] uploads to a temporary location and stores the original filename with the StringIO object. (See {this section}[http://guides.rubyonrails.org/form_helpers.html#uploading-files] of the Rails Guides for more information.)
45
-
46
- Creek can parse this directly without the need for file upload gems such as Carrierwave or Paperclip by passing the original filename as an option:
47
-
48
- # Import endpoint in Rails controller
49
- def import
50
- file = params[:file]
51
- Creek::Book.new file.path, check_file_extension: false
52
- end
53
-
54
- == Contributing
55
-
56
- Contributions are welcomed. You can fork a repository, add your code changes to the forked branch, ensure all existing unit tests pass, create new unit tests cover your new changes and finally create a pull request.
57
-
58
- After forking and then cloning the repository locally, install Bundler and then use it
59
- to install the development gem dependecies:
60
-
61
- gem install bundler
62
- bundle install
63
-
64
- Once this is complete, you should be able to run the test suite:
65
-
66
- rake
67
-
68
-
69
- == Bug Reporting
70
-
71
- Please use the {Issues}[https://github.com/pythonicrubyist/creek/issues] page to report bugs or suggest new enhancements.
72
-
73
-
74
- == License
75
-
76
- Creek has been published under {MIT License}[https://github.com/pythonicrubyist/creek/blob/master/LICENSE.txt]
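The "Filename considerations" section of the removed README describes an extension check. Creek rejects paths that do not end in `.xlsx` or `.xlsm` unless `check_file_extension: false` is passed. This is what lets it open a renamed archive such as `sample-as-zip.zip`, or an upload temp file whose path may have no extension. The following is a minimal plain-Ruby sketch of that behaviour. `check_extension!`, `VALID_EXTENSIONS`, the case-insensitive comparison and the `ArgumentError` are all assumptions made for illustration, not creek's actual code.

```ruby
# Hypothetical sketch of the extension check described in the old README;
# not creek's implementation.
VALID_EXTENSIONS = %w[.xlsx .xlsm].freeze

def check_extension!(path, check_file_extension: true)
  # Skipping the check mirrors passing `check_file_extension: false`.
  return path unless check_file_extension

  # Assumption: the comparison ignores case.
  ext = File.extname(path).downcase
  unless VALID_EXTENSIONS.include?(ext)
    shown = ext.empty? ? "no extension" : ext
    raise ArgumentError, "#{path}: expected .xlsx or .xlsm (got #{shown})"
  end
  path
end

check_extension!("sample.xlsx")                                    # returns "sample.xlsx"
check_extension!("sample-as-zip.zip", check_file_extension: false) # returns the path unchanged
```
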