creek 1.1.2 → 2.0

checksums.yaml CHANGED
@@ -1,15 +1,7 @@
1
1
  ---
2
- !binary "U0hBMQ==":
3
- metadata.gz: !binary |-
4
- M2M3NGQxMDJmMTc3NDk5MDUzMjFiNTU4NWI1ODZmMjBiNThkYzgyYg==
5
- data.tar.gz: !binary |-
6
- NWE3YmRhNGI5NTkwMDgzNDFiMDJkMmYzYzI1NjNiYjY2MDE0NmQ0Yw==
2
+ SHA1:
3
+ metadata.gz: 6b7d68233b036517f99988f30405a4508c3b4892
4
+ data.tar.gz: a34206d988e501a0324598bcb2de856957de1ec5
7
5
  SHA512:
8
- metadata.gz: !binary |-
9
- ZjdmNjdmOTM1Zjc1OGIyNjI1YWZiNjlmNmJhZDliZDViMDJjODE5NDEzNzJi
10
- NjI5MTkxMTYyNTRhMDhkODkxOGExY2E1MTVlMmZkODEwMTM4M2NlYjgyODI2
11
- MDcxMjFmMzNmZDA2ZWU3MWM3OWJmNWRhMzAyYzgxM2E0NzdkODk=
12
- data.tar.gz: !binary |-
13
- ZmYxNDFiYzU0MDM1Y2FlMjgxZmI4OGI2OTRiYTQzMTI2OGRhMGYxNjlhNTU0
14
- MzFjNTY5MjA5M2ZjY2MyODA3MDljYzYwZTNhNmZjZTAzMGQ4ZDFmMTNjNzU1
15
- MTU5Y2I1M2Q3ZGU3YWE1Y2VjZDVhMDY3NTVlYzc1NmEyNmZlODU=
6
+ metadata.gz: 3b75245668526340173ae7be11e0f5f81a92c5f0da6383e2961fd877fef49a7ec8809e7e97202b11cc978db8e2260fc8c3f43c7ad41da285e3d922dc705e4f79
7
+ data.tar.gz: 639e94e90af3d78d1af3ef6a07e08f79c79f21accf9d1f197976022112ee03c9d12284f0a78d674a946f33bf2d2091ed6c614a4d2c49c53329faed9ba5a4b514
@@ -1,4 +1,4 @@
1
- Copyright (c) 2013 TODO: Write your name
1
+ Copyright (c) 2017 Ramtin Vaziri
2
2
 
3
3
  MIT License
4
4
 
@@ -0,0 +1,117 @@
1
+ # Creek - Stream parser for large Excel (xlsx and xlsm) files.
2
+
3
+ Creek is a Ruby gem that provides a fast, simple and efficient method of parsing large Excel (xlsx and xlsm) files.
4
+
5
+
6
+ ## Installation
7
+
8
+ Creek can be used from the command line or as part of a Ruby web framework. To install the gem using terminal, run the following command:
9
+
10
+ ```
11
+ gem install creek
12
+ ```
13
+
14
+ To use it in Rails, add this line to your Gemfile:
15
+
16
+ ```ruby
17
+ gem 'creek'
18
+ ```
19
+
20
+ ## Basic Usage
21
+ Creek can simply parse an Excel file by looping through the rows enumerator:
22
+
23
+ ```ruby
24
+ require 'creek'
25
+ creek = Creek::Book.new 'specs/fixtures/sample.xlsx'
26
+ sheet= creek.sheets[0]
27
+
28
+ sheet.rows.each do |row|
29
+ puts row # => {"A1"=>"Content 1", "B1"=>nil, "C1"=>nil, "D1"=>"Content 3"}
30
+ end
31
+
32
+ sheet.rows_with_meta_data.each do |row|
33
+ puts row # => {"collapsed"=>"false", "customFormat"=>"false", "customHeight"=>"true", "hidden"=>"false", "ht"=>"12.1", "outlineLevel"=>"0", "r"=>"1", "cells"=>{"A1"=>"Content 1", "B1"=>nil, "C1"=>nil, "D1"=>"Content 3"}}
34
+ end
35
+
36
+ sheet.state # => 'visible'
37
+ sheet.name # => 'Sheet1'
38
+ sheet.rid # => 'rId2'
39
+ ```
40
+
41
+ ## Filename considerations
42
+ By default, Creek will ensure that the file extension is either *.xlsx or *.xlsm, but this check can be circumvented as needed:
43
+
44
+ ```ruby
45
+ path = 'sample-as-zip.zip'
46
+ Creek::Book.new path, :check_file_extension => false
47
+ ```
48
+
49
+ By default, the Rails [file_field_tag](http://api.rubyonrails.org/classes/ActionView/Helpers/FormTagHelper.html#method-i-file_field_tag) uploads to a temporary location and stores the original filename with the StringIO object. (See [this section](http://guides.rubyonrails.org/form_helpers.html#uploading-files) of the Rails Guides for more information.)
50
+
51
+ Creek can parse this directly without the need for file upload gems such as Carrierwave or Paperclip by passing the original filename as an option:
52
+
53
+ ```ruby
54
+ # Import endpoint in Rails controller
55
+ def import
56
+ file = params[:file]
57
+ Creek::Book.new file.path, check_file_extension: false
58
+ end
59
+ ```
60
+
61
+ ## Parsing images
62
+ Creek does not parse images by default. If you want to parse the images,
63
+ use the `with_images` method before iterating over rows to preload image information. If you don't call this method, Creek will not return images anywhere.
64
+
65
+ Cells with images will be an array of Pathname objects.
66
+ If an image is spread across multiple cells, the same Pathname object will be returned for each cell.
67
+
68
+ ```ruby
69
+ sheet.with_images.rows.each do |row|
70
+ puts row # => {"A1"=>[#<Pathname:/var/folders/ck/l64nmm3d4k75pvxr03ndk1tm0000gn/T/creek__drawing20161101-53599-274q0vimage1.jpeg>], "B2"=>"Fluffy"}
71
+ end
72
+ ```
73
+
74
+ Images for a specific cell can be obtained with the `images_at` method:
75
+
76
+ ```ruby
77
+ puts sheet.images_at('A1') # => [#<Pathname:/var/folders/ck/l64nmm3d4k75pvxr03ndk1tm0000gn/T/creek__drawing20161101-53599-274q0vimage1.jpeg>]
78
+
79
+ # no images in a cell
80
+ puts sheet.images_at('C1') # => nil
81
+ ```
82
+
83
+ Creek will most likely return nil for a cell with images if no other cell in that row contains text; in that case, use the *images_at* method to retrieve the images in that cell.
84
+
85
+ ## Remote files
86
+
87
+ ```ruby
88
+ remote_url = 'http://dev-builds.libreoffice.org/tmp/test.xlsx'
89
+ Creek::Book.new remote_url, remote: true
90
+ ```
91
+
92
+ ## Contributing
93
+
94
+ Contributions are welcome. Fork the repository, add your changes to a branch of the fork, make sure all existing unit tests pass, add unit tests that cover your changes, and then open a pull request.
95
+
96
+ After forking and cloning the repository locally, install Bundler and then use it
97
+ to install the development gem dependencies:
98
+
99
+ ```
100
+ gem install bundler
101
+ bundle install
102
+ ```
103
+
104
+ Once this is complete, you should be able to run the test suite:
105
+
106
+ ```
107
+ rake
108
+ ```
109
+
110
+ ## Bug Reporting
111
+
112
+ Please use the [Issues](https://github.com/pythonicrubyist/creek/issues) page to report bugs or suggest new enhancements.
113
+
114
+
115
+ ## License
116
+
117
+ Creek is published under the [MIT License](https://github.com/pythonicrubyist/creek/blob/master/LICENSE.txt).
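The extension check described under "Filename considerations" can be sketched roughly as follows. This mirrors the `File.extname(options[:original_filename] || path).downcase` line visible in the `book.rb` hunk of this diff; the helper name `valid_extension?`, the constant, and the sample paths are hypothetical and exist only for illustration.

```ruby
# Extensions accepted by the check, as listed in the book.rb hunk of this diff.
VALID_EXTENSIONS = %w[.xlsx .xlsm].freeze

# Hypothetical helper: prefers the original filename (if given) over the
# on-disk path when deciding which extension to validate.
def valid_extension?(path, original_filename = nil)
  extension = File.extname(original_filename || path).downcase
  VALID_EXTENSIONS.include?(extension)
end

p valid_extension?('report.XLSX')                        # prints true
p valid_extension?('legacy.xls')                         # prints false
p valid_extension?('/tmp/RackMultipart-1234', 'q3.xlsm') # prints true
```

In a Rails controller this suggests that passing `original_filename: file.original_filename` keeps the extension check active for uploads, whereas `check_file_extension: false` skips it entirely.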
@@ -18,13 +18,14 @@ Gem::Specification.new do |spec|
18
18
  spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
19
19
  spec.require_paths = ["lib"]
20
20
 
21
- spec.required_ruby_version = '>= 1.9.2'
21
+ spec.required_ruby_version = '>= 2.0.0'
22
22
 
23
23
  spec.add_development_dependency "bundler", "~> 1.3"
24
24
  spec.add_development_dependency "rake"
25
- spec.add_development_dependency 'rspec', '~> 2.13.0'
25
+ spec.add_development_dependency 'rspec', '~> 3.6.0'
26
26
  spec.add_development_dependency 'pry'
27
27
 
28
- spec.add_dependency 'nokogiri', '~> 1.6.0'
28
+ spec.add_dependency 'nokogiri', '~> 1.7.0'
29
29
  spec.add_dependency 'rubyzip', '>= 1.0.0'
30
+ spec.add_dependency 'httparty', '~> 0.15.5'
30
31
  end
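The new `httparty` runtime dependency backs the `remote: true` option: per the `book.rb` change in this diff, the response body is written to a `Tempfile` in binary mode and the zip archive is then opened from that file's path. The sketch below isolates only the temp-file step, with no network request; `write_to_tempfile` and the sample payload are hypothetical names, not part of Creek.

```ruby
require 'tempfile'

# Hypothetical helper: stores raw bytes in a temp file and returns the
# Tempfile object so the caller can keep a reference to it.
def write_to_tempfile(bytes, basename = 'file')
  tmp = Tempfile.new(basename)
  tmp.binmode      # avoid newline/encoding translation of binary data
  tmp.write(bytes)
  tmp.close        # flush to disk; the file itself stays in place
  tmp
end

payload = "PK\x03\x04 not a real archive".b
tmp = write_to_tempfile(payload)
puts File.binread(tmp.path) == payload # prints true
tmp.unlink                             # remove the file when done
```

Returning the `Tempfile` rather than only its path is a deliberate choice in this sketch: Ruby deletes the underlying file when the `Tempfile` object is garbage collected, so holding the reference keeps the file available for as long as it is needed.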
@@ -1,9 +1,11 @@
1
- require "creek/version"
1
+ require 'creek/version'
2
2
  require 'creek/book'
3
3
  require 'creek/styles/constants'
4
4
  require 'creek/styles/style_types'
5
5
  require 'creek/styles/converter'
6
+ require 'creek/utils'
6
7
  require 'creek/styles'
8
+ require 'creek/drawing'
7
9
  require 'creek/sheet'
8
10
  require 'creek/shared_strings'
9
11
 
@@ -1,6 +1,7 @@
1
1
  require 'zip/filesystem'
2
2
  require 'nokogiri'
3
3
  require 'date'
4
+ require 'httparty'
4
5
 
5
6
  module Creek
6
7
 
@@ -19,16 +20,32 @@ module Creek
19
20
  extension = File.extname(options[:original_filename] || path).downcase
20
21
  raise 'Not a valid file format.' unless (['.xlsx', '.xlsm'].include? extension)
21
22
  end
22
- @files = Zip::File.open path
23
+ if options[:remote]
24
+ zipfile = Tempfile.new("file")
25
+ zipfile.binmode
26
+ zipfile.write(HTTParty.get(path).body)
27
+ zipfile.close
28
+ path = zipfile.path
29
+ end
30
+ @files = Zip::File.open(path)
23
31
  @shared_strings = SharedStrings.new(self)
24
32
  end
25
33
 
26
34
  def sheets
27
35
  doc = @files.file.open "xl/workbook.xml"
28
36
  xml = Nokogiri::XML::Document.parse doc
37
+ namespaces = xml.namespaces
38
+
39
+ cssPrefix = ''
40
+ namespaces.each do |namespace|
41
+ if namespace[1] == 'http://schemas.openxmlformats.org/spreadsheetml/2006/main' && namespace[0] != 'xmlns' then
42
+ cssPrefix = namespace[0].split(':')[1]+'|'
43
+ end
44
+ end
45
+
29
46
  rels_doc = @files.file.open "xl/_rels/workbook.xml.rels"
30
47
  rels = Nokogiri::XML::Document.parse(rels_doc).css("Relationship")
31
- @sheets = xml.css('sheet').map do |sheet|
48
+ @sheets = xml.css(cssPrefix+'sheet').map do |sheet|
32
49
  sheetfile = rels.find { |el| sheet.attr("r:id") == el.attr("Id") }.attr("Target")
33
50
  Sheet.new(self, sheet.attr("name"), sheet.attr("sheetid"), sheet.attr("state"), sheet.attr("visible"), sheet.attr("r:id"), sheetfile)
34
51
  end
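The namespace handling added to `sheets` above can be read as: if the SpreadsheetML main namespace is bound to an explicit prefix rather than being the default namespace, use that prefix in the CSS selector. The sketch below reproduces the loop on a plain Hash. It assumes the Hash has the shape Nokogiri returns from `namespaces` (keys such as `"xmlns"` and `"xmlns:x"`); `css_prefix_for` and the sample prefix `x` are hypothetical.

```ruby
MAIN_NS = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'

# Hypothetical helper: returns "prefix|" when the main namespace is bound to
# an explicit prefix, or an empty string when it is the default namespace.
def css_prefix_for(namespaces)
  prefix = ''
  namespaces.each do |name, href|
    next unless href == MAIN_NS && name != 'xmlns'
    prefix = "#{name.split(':')[1]}|" # "xmlns:x" becomes "x|"
  end
  prefix
end

default_ns  = { 'xmlns' => MAIN_NS }
prefixed_ns = {
  'xmlns:x' => MAIN_NS,
  'xmlns:r' => 'http://schemas.openxmlformats.org/officeDocument/2006/relationships'
}

p css_prefix_for(default_ns)  # prints ""
p css_prefix_for(prefixed_ns) # prints "x|"
```

With the second input, the selector passed to `xml.css` would become `x|sheet` instead of `sheet`.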
@@ -0,0 +1,109 @@
1
+ require 'pathname'
2
+
3
+ module Creek
4
+ class Creek::Drawing
5
+ include Creek::Utils
6
+
7
+ COLUMNS = ('A'..'AZ').to_a
8
+
9
+ def initialize(book, drawing_filepath)
10
+ @book = book
11
+ @drawing_filepath = drawing_filepath
12
+ @drawings = []
13
+ @drawings_rels = []
14
+ @images_pathnames = Hash.new { |hash, key| hash[key] = [] }
15
+
16
+ if file_exist?(@drawing_filepath)
17
+ load_drawings_and_rels
18
+ load_images_pathnames_by_cells if has_images?
19
+ end
20
+ end
21
+
22
+ ##
23
+ # Returns false if there are no images in the drawing file or the drawing file does not exist, true otherwise.
24
+ def has_images?
25
+ @has_images ||= !@drawings.empty?
26
+ end
27
+
28
+ ##
29
+ # Extracts images from excel to tmpdir for a cell, if the images are not already extracted (multiple calls or same image file in multiple cells).
30
+ # Returns array of images as Pathname objects or nil.
31
+ def images_at(cell_name)
32
+ coordinate = calc_coordinate(cell_name)
33
+ pathnames_at_coordinate = @images_pathnames[coordinate]
34
+ return if pathnames_at_coordinate.empty?
35
+
36
+ pathnames_at_coordinate.map do |image_pathname|
37
+ if image_pathname.exist?
38
+ image_pathname
39
+ else
40
+ excel_image_path = "xl/media#{image_pathname.to_path.split(tmpdir).last}"
41
+ IO.copy_stream(@book.files.file.open(excel_image_path), image_pathname.to_path)
42
+ image_pathname
43
+ end
44
+ end
45
+ end
46
+
47
+ private
48
+
49
+ ##
50
+ # Transforms cell name to [row, col], e.g. A1 => [0, 0], B3 => [2, 1]
51
+ # Rows and cols start with 0.
52
+ def calc_coordinate(cell_name)
53
+ col = COLUMNS.index(cell_name.slice /[A-Z]+/)
54
+ row = (cell_name.slice /\d+/).to_i - 1 # rows in drawings start with 0
55
+ [row, col]
56
+ end
57
+
58
+ ##
59
+ # Creates/loads temporary directory for extracting images from excel
60
+ def tmpdir
61
+ @tmpdir ||= ::Dir.mktmpdir('creek__drawing')
62
+ end
63
+
64
+ ##
65
+ # Parses drawing and drawing's relationships xmls.
66
+ # Drawing xml contains relationships ID's and coordinates (row, col).
67
+ # Drawing relationships xml contains images' locations.
68
+ def load_drawings_and_rels
69
+ @drawings = parse_xml(@drawing_filepath).css('xdr|twoCellAnchor')
70
+ drawing_rels_filepath = expand_to_rels_path(@drawing_filepath)
71
+ @drawings_rels = parse_xml(drawing_rels_filepath).css('Relationships')
72
+ end
73
+
74
+ ##
75
+ # Iterates through the drawings and saves images' paths as Pathname objects to a hash with [row, col] keys.
76
+ # As multiple images can be located in a single cell, hash values are array of Pathname objects.
77
+ # One image can be spread across multiple cells (defined with from-row/to-row/from-col/to-col attributes) - same Pathname object is associated to each row-col combination for the range.
78
+ def load_images_pathnames_by_cells
79
+ image_selector = 'xdr:pic/xdr:blipFill/a:blip'.freeze
80
+ row_from_selector = 'xdr:from/xdr:row'.freeze
81
+ row_to_selector = 'xdr:to/xdr:row'.freeze
82
+ col_from_selector = 'xdr:from/xdr:col'.freeze
83
+ col_to_selector = 'xdr:to/xdr:col'.freeze
84
+
85
+ @drawings.xpath('//xdr:twoCellAnchor').each do |drawing|
86
+ embed = drawing.xpath(image_selector).first.attributes['embed']
87
+ next if embed.nil?
88
+
89
+ rid = embed.value
90
+ path = Pathname.new("#{tmpdir}/#{extract_drawing_path(rid).slice(/[^\/]*$/)}")
91
+
92
+ row_from = drawing.xpath(row_from_selector).text.to_i
93
+ col_from = drawing.xpath(col_from_selector).text.to_i
94
+ row_to = drawing.xpath(row_to_selector).text.to_i
95
+ col_to = drawing.xpath(col_to_selector).text.to_i
96
+
97
+ (col_from..col_to).each do |col|
98
+ (row_from..row_to).each do |row|
99
+ @images_pathnames[[row, col]].push(path)
100
+ end
101
+ end
102
+ end
103
+ end
104
+
105
+ def extract_drawing_path(rid)
106
+ @drawings_rels.css("Relationship[@Id=#{rid}]").first.attributes['Target'].value
107
+ end
108
+ end
109
+ end
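To make the coordinate mapping in `calc_coordinate` concrete, here is a standalone sketch of the same logic. It copies the `('A'..'AZ')` column table from the class above; the constant name `DRAWING_COLUMNS` and the sample cell names are illustrative assumptions.

```ruby
# Illustrative copy of the column table: "A".."Z", then "AA".."AZ" (52 entries).
DRAWING_COLUMNS = ('A'..'AZ').to_a

# Converts a cell name such as "B3" into a zero-based [row, col] pair.
def calc_coordinate(cell_name)
  col = DRAWING_COLUMNS.index(cell_name[/[A-Z]+/]) # nil when the column is past "AZ"
  row = cell_name[/\d+/].to_i - 1                  # drawing rows are zero-based
  [row, col]
end

p calc_coordinate('A1')   # prints [0, 0]
p calc_coordinate('B3')   # prints [2, 1]
p calc_coordinate('AZ10') # prints [9, 51]
p calc_coordinate('BA1')  # prints [0, nil]
```

The last call shows a consequence of the fixed column table: a cell to the right of column `AZ` maps to a `nil` column, so a lookup keyed on that coordinate would presumably find no images.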
@@ -3,6 +3,7 @@ require 'nokogiri'
3
3
 
4
4
  module Creek
5
5
  class Creek::Sheet
6
+ include Creek::Utils
6
7
 
7
8
  attr_reader :book,
8
9
  :name,
@@ -21,6 +22,28 @@ module Creek
21
22
  @rid = rid
22
23
  @state = state
23
24
  @sheetfile = sheetfile
25
+ @images_present = false
26
+ end
27
+
28
+ ##
29
+ # Preloads images info (coordinates and paths) from related drawing.xml and drawing rels.
30
+ # Must be called before #rows method if you want to have images included.
31
+ # Returns self so you can chain the calls (sheet.with_images.rows).
32
+ def with_images
33
+ @drawingfile = extract_drawing_filepath
34
+ if @drawingfile
35
+ @drawing = Creek::Drawing.new(@book, @drawingfile.sub('..', 'xl'))
36
+ @images_present = @drawing.has_images?
37
+ end
38
+ self
39
+ end
40
+
41
+ ##
42
+ # Extracts images for a cell to a temporary folder.
43
+ # Returns array of Pathnames for the cell.
44
+ # Returns nil if images are not found for the cell or images were not preloaded with #with_images.
45
+ def images_at(cell)
46
+ @drawing.images_at(cell) if @images_present
24
47
  end
25
48
 
26
49
  ##
@@ -43,7 +66,7 @@ module Creek
43
66
  # Returns a hash per row that includes the cell ids and values.
44
67
  # Empty cells will be also included in the hash with a nil value.
45
68
  def rows_generator include_meta_data=false
46
- path = "xl/#{@sheetfile}"
69
+ path = if @sheetfile.start_with? "/xl/" or @sheetfile.start_with? "xl/" then @sheetfile else "xl/#{@sheetfile}" end
47
70
  if @book.files.file.exist?(path)
48
71
  # SAX parsing, Each element in the stream comes through as two events:
49
72
  # one to open the element and one to close it.
@@ -62,6 +85,14 @@ module Creek
62
85
  y << (include_meta_data ? row : cells) if node.self_closing?
63
86
  elsif (node.name.eql? 'row') and (node.node_type.eql? closer)
64
87
  processed_cells = fill_in_empty_cells(cells, row['r'], cell)
88
+
89
+ if @images_present
90
+ processed_cells.each do |cell_name, cell_value|
91
+ next unless cell_value.nil?
92
+ processed_cells[cell_name] = images_at(cell_name)
93
+ end
94
+ end
95
+
65
96
  row['cells'] = processed_cells
66
97
  y << (include_meta_data ? row : processed_cells)
67
98
  elsif (node.name.eql? 'c') and (node.node_type.eql? opener)
@@ -72,6 +103,10 @@ module Creek
72
103
  unless cell.nil?
73
104
  cells[cell] = convert(node.inner_xml, cell_type, cell_style_idx)
74
105
  end
106
+ elsif (node.name.eql? 't') and (node.node_type.eql? opener)
107
+ unless cell.nil?
108
+ cells[cell] = convert(node.inner_xml, cell_type, cell_style_idx)
109
+ end
75
110
  end
76
111
  end
77
112
  end
@@ -108,5 +143,22 @@ module Creek
108
143
 
109
144
  new_cells
110
145
  end
146
+
147
+ ##
148
+ # Find drawing filepath for the current sheet.
149
+ # Sheet xml contains drawing relationship ID.
150
+ # Sheet relationships xml contains drawing file's location.
151
+ def extract_drawing_filepath
152
+ # Read drawing relationship ID from the sheet.
153
+ sheet_filepath = "xl/#{@sheetfile}"
154
+ drawing = parse_xml(sheet_filepath).css('drawing').first
155
+ return if drawing.nil?
156
+
157
+ drawing_rid = drawing.attributes['id'].value
158
+
159
+ # Read sheet rels to find drawing file's location.
160
+ sheet_rels_filepath = expand_to_rels_path(sheet_filepath)
161
+ parse_xml(sheet_rels_filepath).css("Relationship[@Id='#{drawing_rid}']").first.attributes['Target'].value
162
+ end
111
163
  end
112
- end
164
+ end
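The `rows_generator` change in this file accepts sheet targets that already carry an `xl/` or `/xl/` prefix instead of always prepending `xl/`. A minimal sketch of that rule, using a hypothetical helper name `sheet_path` and made-up targets:

```ruby
# Hypothetical helper: leaves targets that already start with "/xl/" or "xl/"
# untouched and prefixes everything else with "xl/".
def sheet_path(sheetfile)
  sheetfile.start_with?('/xl/', 'xl/') ? sheetfile : "xl/#{sheetfile}"
end

puts sheet_path('worksheets/sheet1.xml')     # prints xl/worksheets/sheet1.xml
puts sheet_path('xl/worksheets/sheet1.xml')  # prints xl/worksheets/sheet1.xml
puts sheet_path('/xl/worksheets/sheet1.xml') # prints /xl/worksheets/sheet1.xml
```

Note that `extract_drawing_filepath` in the same diff still builds `"xl/#{@sheetfile}"` unconditionally, so this normalisation applies to the row reader only.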
@@ -58,10 +58,8 @@ module Creek
58
58
  value
59
59
  when :fixnum
60
60
  value.to_i
61
- when :float
61
+ when :float, :percentage
62
62
  value.to_f
63
- when :percentage
64
- value.to_f / 100
65
63
  when :date, :time, :date_time
66
64
  convert_date(value, options)
67
65
  when :bignum
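The hunk above folds `:percentage` into the `:float` branch, so percentage cells are no longer divided by 100. The sketch below contrasts the two behaviours. `convert_numeric` and `legacy_percentage` are hypothetical stand-ins for the converter branches, and the raw cell value `'0.15'` is an assumption (the updated specs in this diff expect `0.15` for cell `A10`).

```ruby
# Hypothetical stand-in for the 2.0 behaviour: percentages are plain floats.
def convert_numeric(value, type)
  case type
  when :fixnum then value.to_i
  when :float, :percentage then value.to_f
  else value
  end
end

# Hypothetical stand-in for the 1.1.2 behaviour of the :percentage branch.
def legacy_percentage(value)
  value.to_f / 100
end

p convert_numeric('0.15', :percentage) # prints 0.15
```

Under the assumption that a cell formatted as 15% is stored as the raw value `0.15`, the old branch would have produced roughly `0.0015`, which reads as a hundredfold error; the new branch returns the stored fraction unchanged.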
@@ -0,0 +1,16 @@
1
+ module Creek
2
+ module Utils
3
+ def expand_to_rels_path(filepath)
4
+ filepath.sub(/(\/[^\/]+$)/, '/_rels\1.rels')
5
+ end
6
+
7
+ def file_exist?(path)
8
+ @book.files.file.exist?(path)
9
+ end
10
+
11
+ def parse_xml(xml_path)
12
+ doc = @book.files.file.open(xml_path)
13
+ Nokogiri::XML::Document.parse(doc)
14
+ end
15
+ end
16
+ end
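As a quick illustration of `expand_to_rels_path` above: the substitution inserts a `_rels/` directory before the last path segment and appends `.rels`. The sketch copies the method body into a standalone function; the sample paths are illustrative and not read from a real workbook.

```ruby
# Standalone copy of the helper from lib/creek/utils.rb, for illustration only.
def expand_to_rels_path(filepath)
  # Capture the final "/name" segment, then re-emit it under "_rels" with ".rels" appended.
  filepath.sub(/(\/[^\/]+$)/, '/_rels\1.rels')
end

puts expand_to_rels_path('xl/worksheets/sheet1.xml')
# prints xl/worksheets/_rels/sheet1.xml.rels
puts expand_to_rels_path('xl/drawings/drawing1.xml')
# prints xl/drawings/_rels/drawing1.xml.rels
```

Because the pattern requires a slash before the final segment, a bare filename with no directory part is returned unchanged.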
@@ -1,3 +1,3 @@
1
1
  module Creek
2
- VERSION = "1.1.2"
2
+ VERSION = "2.0"
3
3
  end
@@ -0,0 +1,52 @@
1
+ require './spec/spec_helper'
2
+
3
+ describe 'drawing' do
4
+ let(:book) { Creek::Book.new('spec/fixtures/sample-with-images.xlsx') }
5
+ let(:book_no_images) { Creek::Book.new('spec/fixtures/sample.xlsx') }
6
+ let(:drawingfile) { 'xl/drawings/drawing1.xml' }
7
+ let(:drawing) { Creek::Drawing.new(book, drawingfile) }
8
+ let(:drawing_without_images) { Creek::Drawing.new(book_no_images, drawingfile) }
9
+
10
+ describe '#has_images?' do
11
+ it 'has' do
12
+ expect(drawing.has_images?).to eq(true)
13
+ end
14
+
15
+ it 'does not have' do
16
+ expect(drawing_without_images.has_images?).to eq(false)
17
+ end
18
+ end
19
+
20
+ describe '#images_at' do
21
+ it 'returns images pathnames at cell' do
22
+ image = drawing.images_at('A2')[0]
23
+ expect(image.class).to eq(Pathname)
24
+ expect(image.exist?).to eq(true)
25
+ expect(image.to_path).to match(/.+creek__drawing.+\.jpeg$/)
26
+ end
27
+
28
+ context 'when no images in cell' do
29
+ it 'returns nil' do
30
+ images = drawing.images_at('B2')
31
+ expect(images).to eq(nil)
32
+ end
33
+ end
34
+
35
+ context 'when more images in one cell' do
36
+ it 'returns all images at cell' do
37
+ images = drawing.images_at('A10')
38
+ expect(images.size).to eq(2)
39
+ expect(images.all?(&:exist?)).to eq(true)
40
+ end
41
+ end
42
+
43
+ context 'when same image across multiple cells' do
44
+ it 'returns same image for each cell' do
45
+ image1 = drawing.images_at('A4')[0]
46
+ image2 = drawing.images_at('A5')[0]
47
+ expect(image1.class).to eq(Pathname)
48
+ expect(image1).to eq(image2)
49
+ end
50
+ end
51
+ end
52
+ end
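The "same image across multiple cells" case in this spec follows from how `load_images_pathnames_by_cells` expands an anchor's from/to range into one hash entry per covered cell. A minimal sketch of that expansion, with a hypothetical helper name `register_image` and a made-up image path:

```ruby
require 'pathname'

# Hypothetical helper: associates one Pathname with every [row, col] pair
# covered by the anchor range (all bounds inclusive and zero-based).
def register_image(images, path, row_from, row_to, col_from, col_to)
  (col_from..col_to).each do |col|
    (row_from..row_to).each do |row|
      images[[row, col]].push(path)
    end
  end
  images
end

# Same default-array hash as used in Creek::Drawing#initialize.
images = Hash.new { |hash, key| hash[key] = [] }
photo  = Pathname.new('/tmp/creek__drawing/image1.jpeg')

# An image anchored over rows 3..4 of column 0, i.e. cells A4 and A5.
register_image(images, photo, 3, 4, 0, 0)

p images[[3, 0]].first.equal?(images[[4, 0]].first) # prints true
p images.key?([5, 0])                               # prints false
```

Both cells hold the very same `Pathname` object, which is why the spec can compare the results for `A4` and `A5` with plain equality.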
Binary file
@@ -7,12 +7,12 @@ describe 'shared strings' do
7
7
  doc = Nokogiri::XML(shared_strings_xml_file)
8
8
  dictionary = Creek::SharedStrings.parse_shared_string_from_document(doc)
9
9
 
10
- dictionary.keys.size.should == 5
11
- dictionary[0].should == 'Cell A1'
12
- dictionary[1].should == 'Cell B1'
13
- dictionary[2].should == 'My Cell'
14
- dictionary[3].should == 'Cell A2'
15
- dictionary[4].should == 'Cell B2'
10
+ expect(dictionary.keys.size).to eq(5)
11
+ expect(dictionary[0]).to eq('Cell A1')
12
+ expect(dictionary[1]).to eq('Cell B1')
13
+ expect(dictionary[2]).to eq('My Cell')
14
+ expect(dictionary[3]).to eq('Cell A2')
15
+ expect(dictionary[4]).to eq('Cell B2')
16
16
  end
17
17
 
18
18
  end
@@ -0,0 +1,85 @@
1
+ require './spec/spec_helper'
2
+
3
+ describe 'sheet' do
4
+ let(:book_with_images) { Creek::Book.new('spec/fixtures/sample-with-images.xlsx') }
5
+ let(:book_no_images) { Creek::Book.new('spec/fixtures/sample.xlsx') }
6
+ let(:sheetfile) { 'worksheets/sheet1.xml' }
7
+ let(:sheet_with_images) { Creek::Sheet.new(book_with_images, 'Sheet 1', 1, '', '', '1', sheetfile) }
8
+ let(:sheet_no_images) { Creek::Sheet.new(book_no_images, 'Sheet 1', 1, '', '', '1', sheetfile) }
9
+
10
+ def load_cell(rows, cell_name)
11
+ cell = rows.find { |row| !row[cell_name].nil? }
12
+ cell[cell_name] if cell
13
+ end
14
+
15
+ describe '#rows' do
16
+ context 'with excel with images' do
17
+ context 'with images preloading' do
18
+ let(:rows) { sheet_with_images.with_images.rows.map { |r| r } }
19
+
20
+ it 'parses single image in a cell' do
21
+ expect(load_cell(rows, 'A2').size).to eq(1)
22
+ end
23
+
24
+ it 'returns nil for cells without images' do
25
+ expect(load_cell(rows, 'A3')).to eq(nil)
26
+ expect(load_cell(rows, 'A7')).to eq(nil)
27
+ expect(load_cell(rows, 'A9')).to eq(nil)
28
+ end
29
+
30
+ it 'returns nil for merged cell within empty row' do
31
+ expect(load_cell(rows, 'A5')).to eq(nil)
32
+ end
33
+
34
+ it 'returns nil for image in a cell with empty row' do
35
+ expect(load_cell(rows, 'A8')).to eq(nil)
36
+ end
37
+
38
+ it 'returns images for merged cells' do
39
+ expect(load_cell(rows, 'A4').size).to eq(1)
40
+ expect(load_cell(rows, 'A6').size).to eq(1)
41
+ end
42
+
43
+ it 'returns multiple images' do
44
+ expect(load_cell(rows, 'A10').size).to eq(2)
45
+ end
46
+ end
47
+
48
+ it 'ignores images' do
49
+ rows = sheet_with_images.rows.map { |r| r }
50
+ expect(load_cell(rows, 'A2')).to eq(nil)
51
+ expect(load_cell(rows, 'A3')).to eq(nil)
52
+ expect(load_cell(rows, 'A4')).to eq(nil)
53
+ end
54
+ end
55
+
56
+ context 'with excel without images' do
57
+ it 'does not break on with_images' do
58
+ rows = sheet_no_images.with_images.rows.map { |r| r }
59
+ expect(load_cell(rows, 'A10')).to eq(0.15)
60
+ end
61
+ end
62
+ end
63
+
64
+ describe '#images_at' do
65
+ it 'returns images for merged cell' do
66
+ image = sheet_with_images.with_images.images_at('A5')[0]
67
+ expect(image.class).to eq(Pathname)
68
+ end
69
+
70
+ it 'returns images for empty row' do
71
+ image = sheet_with_images.with_images.images_at('A8')[0]
72
+ expect(image.class).to eq(Pathname)
73
+ end
74
+
75
+ it 'returns nil for empty cell' do
76
+ image = sheet_with_images.with_images.images_at('B3')
77
+ expect(image).to eq(nil)
78
+ end
79
+
80
+ it 'returns nil for empty cell without preloading images' do
81
+ image = sheet_with_images.images_at('B3')
82
+ expect(image).to eq(nil)
83
+ end
84
+ end
85
+ end
@@ -9,7 +9,7 @@ describe Creek::Styles::Converter do
9
9
 
10
10
  describe :date_time do
11
11
  it "works" do
12
- convert('41275', 'n', :date_time).should == Date.new(2013,01,01)
12
+ expect(convert('41275', 'n', :date_time)).to eq(Date.new(2013,01,01))
13
13
  end
14
14
  end
15
15
  end
@@ -7,9 +7,9 @@ describe Creek::Styles::StyleTypes do
7
7
  xml_file = File.open('spec/fixtures/styles/first.xml')
8
8
  doc = Nokogiri::XML(xml_file)
9
9
  res = Creek::Styles::StyleTypes.new(doc).call
10
- res.size.should == 8
11
- res[3].should == :date_time
12
- res.should == [:unsupported, :unsupported, :unsupported, :date_time, :unsupported, :unsupported, :unsupported, :unsupported]
10
+ expect(res.size).to eq(8)
11
+ expect(res[3]).to eq(:date_time)
12
+ expect(res).to eq([:unsupported, :unsupported, :unsupported, :date_time, :unsupported, :unsupported, :unsupported, :unsupported])
13
13
  end
14
14
  end
15
15
  end
@@ -2,25 +2,38 @@ require './spec/spec_helper'
2
2
 
3
3
  describe 'Creek trying to parsing an invalid file.' do
4
4
  it 'Fail to open a legacy xls file.' do
5
- lambda { Creek::Book.new 'spec/fixtures/invalid.xls' }.should raise_error 'Not a valid file format.'
5
+ expect { Creek::Book.new 'spec/fixtures/invalid.xls' }
6
+ .to raise_error 'Not a valid file format.'
6
7
  end
7
8
 
8
9
  it 'Ignore file extensions on request.' do
9
10
  path = 'spec/fixtures/sample-as-zip.zip'
10
- lambda { Creek::Book.new path, :check_file_extension => false }.should_not raise_error
11
+ expect { Creek::Book.new path, check_file_extension: false }
12
+ .not_to raise_error
11
13
  end
12
14
 
13
15
  it 'Check file extension when requested.' do
14
- open_book = lambda { Creek::Book.new 'spec/fixtures/invalid.xls', :check_file_extension => true }
15
- open_book.should raise_error 'Not a valid file format.'
16
+ expect { Creek::Book.new 'spec/fixtures/invalid.xls', check_file_extension: true }
17
+ .to raise_error 'Not a valid file format.'
18
+ end
19
+
20
+ it 'Fail to open remote file' do
21
+ expect { Creek::Book.new 'http://dev-builds.libreoffice.org/tmp/test.xlsx' }
22
+ .to raise_error(Zip::Error, /not found/)
23
+ end
24
+
25
+ it 'Opens remote file with remote flag' do
26
+ expect { Creek::Book.new 'http://dev-builds.libreoffice.org/tmp/test.xlsx', remote: true }
27
+ .not_to raise_error
16
28
  end
17
29
 
18
30
  it 'Check file extension of original_filename if passed.' do
19
31
  path = 'spec/fixtures/temp_string_io_file_path_with_no_extension'
20
- lambda { Creek::Book.new path, :original_filename => 'invalid.xls' }.should raise_error 'Not a valid file format.'
21
- lambda { Creek::Book.new path, :original_filename => 'valid.xlsx' }.should_not raise_error
32
+ expect { Creek::Book.new path, :original_filename => 'invalid.xls' }
33
+ .to raise_error 'Not a valid file format.'
34
+ expect { Creek::Book.new path, :original_filename => 'valid.xlsx' }
35
+ .not_to raise_error
22
36
  end
23
-
24
37
  end
25
38
 
26
39
  describe 'Creek parsing a sample XLSX file' do
@@ -32,7 +45,8 @@ describe 'Creek parsing a sample XLSX file' do
32
45
  {'A4'=>'Content 7', 'B4'=>'Content 8', 'C4'=>'Content 9', 'D4'=>'Content 10', 'E4'=>'Content 11', 'F4'=>'Content 12'},
33
46
  {'A5'=>nil, 'B5'=>nil, 'C5'=>nil, 'D5'=>nil, 'E5'=>nil, 'F5'=>nil, 'G5'=>nil, 'H5'=>nil, 'I5'=>nil, 'J5'=>nil, 'K5'=>nil, 'L5'=>nil, 'M5'=>nil, 'N5'=>nil, 'O5'=>nil, 'P5'=>nil, 'Q5'=>nil, 'R5'=>nil, 'S5'=>nil, 'T5'=>nil, 'U5'=>nil, 'V5'=>nil, 'W5'=>nil, 'X5'=>nil, 'Y5'=>nil, 'Z5'=>'Z Content', 'AA5'=>nil, 'AB5'=>nil, 'AC5'=>nil, 'AD5'=>nil, 'AE5'=>nil, 'AF5'=>nil, 'AG5'=>nil, 'AH5'=>nil, 'AI5'=>nil, 'AJ5'=>nil, 'AK5'=>nil, 'AL5'=>nil, 'AM5'=>nil, 'AN5'=>nil, 'AO5'=>nil, 'AP5'=>nil, 'AQ5'=>nil, 'AR5'=>nil, 'AS5'=>nil, 'AT5'=>nil, 'AU5'=>nil, 'AV5'=>nil, 'AW5'=>nil, 'AX5'=>nil, 'AY5'=>nil, 'AZ5'=>'Content 13'},
34
47
  {'A6'=>'1', 'B6'=>'2', 'C6'=>'3'}, {'A7'=>'Content 15', 'B7'=>'Content 16', 'C7'=>'Content 18', 'D7'=>'Content 19'},
35
- {'A8'=>nil, 'B8'=>'Content 20', 'C8'=>nil, 'D8'=>nil, 'E8'=>nil, 'F8'=>'Content 21'}]
48
+ {'A8'=>nil, 'B8'=>'Content 20', 'C8'=>nil, 'D8'=>nil, 'E8'=>nil, 'F8'=>'Content 21'},
49
+ {'A10' => 0.15, 'B10' => 0.15}]
36
50
  end
37
51
 
38
52
  after(:all) do
@@ -40,15 +54,15 @@ describe 'Creek parsing a sample XLSX file' do
40
54
  end
41
55
 
42
56
  it 'open an XLSX file successfully.' do
43
- @creek.should_not be_nil
57
+ expect(@creek).not_to be_nil
44
58
  end
45
59
 
46
60
  it 'find sheets successfully.' do
47
- @creek.sheets.count.should == 1
61
+ expect(@creek.sheets.count).to eq(1)
48
62
  sheet = @creek.sheets.first
49
- sheet.state.should eql nil
50
- sheet.name.should eql 'Sheet1'
51
- sheet.rid.should eql 'rId1'
63
+ expect(sheet.state).to eql nil
64
+ expect(sheet.name).to eql 'Sheet1'
65
+ expect(sheet.rid).to eql 'rId1'
52
66
  end
53
67
 
54
68
  it 'Parse rows with empty cells successfully.' do
@@ -59,15 +73,16 @@ describe 'Creek parsing a sample XLSX file' do
59
73
  row_count += 1
60
74
  end
61
75
 
62
- rows[0].should == @expected_rows[0]
63
- rows[1].should == @expected_rows[1]
64
- rows[2].should == @expected_rows[2]
65
- rows[3].should == @expected_rows[3]
66
- rows[4].should == @expected_rows[4]
67
- rows[5].should == @expected_rows[5]
68
- rows[6].should == @expected_rows[6]
69
- rows[7].should == @expected_rows[7]
70
- row_count.should == 8
76
+ expect(rows[0]).to eq(@expected_rows[0])
77
+ expect(rows[1]).to eq(@expected_rows[1])
78
+ expect(rows[2]).to eq(@expected_rows[2])
79
+ expect(rows[3]).to eq(@expected_rows[3])
80
+ expect(rows[4]).to eq(@expected_rows[4])
81
+ expect(rows[5]).to eq(@expected_rows[5])
82
+ expect(rows[6]).to eq(@expected_rows[6])
83
+ expect(rows[7]).to eq(@expected_rows[7])
84
+ expect(rows[8]).to eq(@expected_rows[8])
85
+ expect(row_count).to eq(9)
71
86
  end
72
87
 
73
88
  it 'Parse rows with empty cells and meta data successfully.' do
@@ -77,6 +92,6 @@ describe 'Creek parsing a sample XLSX file' do
77
92
  rows << row
78
93
  row_count += 1
79
94
  end
80
- rows.map{|r| r['cells']}.should == @expected_rows
95
+ expect(rows.map{|r| r['cells']}).to eq(@expected_rows)
81
96
  end
82
97
  end
metadata CHANGED
@@ -1,99 +1,113 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: creek
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.1.2
4
+ version: '2.0'
5
5
  platform: ruby
6
6
  authors:
7
7
  - pythonicrubyist
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2016-04-21 00:00:00.000000000 Z
11
+ date: 2017-06-14 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
- - - ~>
17
+ - - "~>"
18
18
  - !ruby/object:Gem::Version
19
19
  version: '1.3'
20
20
  type: :development
21
21
  prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
- - - ~>
24
+ - - "~>"
25
25
  - !ruby/object:Gem::Version
26
26
  version: '1.3'
27
27
  - !ruby/object:Gem::Dependency
28
28
  name: rake
29
29
  requirement: !ruby/object:Gem::Requirement
30
30
  requirements:
31
- - - ! '>='
31
+ - - ">="
32
32
  - !ruby/object:Gem::Version
33
33
  version: '0'
34
34
  type: :development
35
35
  prerelease: false
36
36
  version_requirements: !ruby/object:Gem::Requirement
37
37
  requirements:
38
- - - ! '>='
38
+ - - ">="
39
39
  - !ruby/object:Gem::Version
40
40
  version: '0'
41
41
  - !ruby/object:Gem::Dependency
42
42
  name: rspec
43
43
  requirement: !ruby/object:Gem::Requirement
44
44
  requirements:
45
- - - ~>
45
+ - - "~>"
46
46
  - !ruby/object:Gem::Version
47
- version: 2.13.0
47
+ version: 3.6.0
48
48
  type: :development
49
49
  prerelease: false
50
50
  version_requirements: !ruby/object:Gem::Requirement
51
51
  requirements:
52
- - - ~>
52
+ - - "~>"
53
53
  - !ruby/object:Gem::Version
54
- version: 2.13.0
54
+ version: 3.6.0
55
55
  - !ruby/object:Gem::Dependency
56
56
  name: pry
57
57
  requirement: !ruby/object:Gem::Requirement
58
58
  requirements:
59
- - - ! '>='
59
+ - - ">="
60
60
  - !ruby/object:Gem::Version
61
61
  version: '0'
62
62
  type: :development
63
63
  prerelease: false
64
64
  version_requirements: !ruby/object:Gem::Requirement
65
65
  requirements:
66
- - - ! '>='
66
+ - - ">="
67
67
  - !ruby/object:Gem::Version
68
68
  version: '0'
69
69
  - !ruby/object:Gem::Dependency
70
70
  name: nokogiri
71
71
  requirement: !ruby/object:Gem::Requirement
72
72
  requirements:
73
- - - ~>
73
+ - - "~>"
74
74
  - !ruby/object:Gem::Version
75
- version: 1.6.0
75
+ version: 1.7.0
76
76
  type: :runtime
77
77
  prerelease: false
78
78
  version_requirements: !ruby/object:Gem::Requirement
79
79
  requirements:
80
- - - ~>
80
+ - - "~>"
81
81
  - !ruby/object:Gem::Version
82
- version: 1.6.0
82
+ version: 1.7.0
83
83
  - !ruby/object:Gem::Dependency
84
84
  name: rubyzip
85
85
  requirement: !ruby/object:Gem::Requirement
86
86
  requirements:
87
- - - ! '>='
87
+ - - ">="
88
88
  - !ruby/object:Gem::Version
89
89
  version: 1.0.0
90
90
  type: :runtime
91
91
  prerelease: false
92
92
  version_requirements: !ruby/object:Gem::Requirement
93
93
  requirements:
94
- - - ! '>='
94
+ - - ">="
95
95
  - !ruby/object:Gem::Version
96
96
  version: 1.0.0
97
+ - !ruby/object:Gem::Dependency
98
+ name: httparty
99
+ requirement: !ruby/object:Gem::Requirement
100
+ requirements:
101
+ - - "~>"
102
+ - !ruby/object:Gem::Version
103
+ version: 0.15.5
104
+ type: :runtime
105
+ prerelease: false
106
+ version_requirements: !ruby/object:Gem::Requirement
107
+ requirements:
108
+ - - "~>"
109
+ - !ruby/object:Gem::Version
110
+ version: 0.15.5
97
111
  description: A Ruby gem that streams and parses large Excel(xlsx and xlsm) files fast
98
112
  and efficiently.
99
113
  email:
@@ -102,29 +116,34 @@ executables: []
  extensions: []
  extra_rdoc_files: []
  files:
- - .gitignore
+ - ".gitignore"
  - Gemfile
  - LICENSE.txt
- - README.rdoc
+ - README.md
  - Rakefile
  - creek.gemspec
  - lib/creek.rb
  - lib/creek/book.rb
+ - lib/creek/drawing.rb
  - lib/creek/shared_strings.rb
  - lib/creek/sheet.rb
  - lib/creek/styles.rb
  - lib/creek/styles/constants.rb
  - lib/creek/styles/converter.rb
  - lib/creek/styles/style_types.rb
+ - lib/creek/utils.rb
  - lib/creek/version.rb
+ - spec/drawing_spec.rb
  - spec/fixtures/invalid.xls
  - spec/fixtures/sample-as-zip.zip
+ - spec/fixtures/sample-with-images.xlsx
  - spec/fixtures/sample.xlsx
  - spec/fixtures/sheets/sheet1.xml
  - spec/fixtures/sst.xml
  - spec/fixtures/styles/first.xml
  - spec/fixtures/temp_string_io_file_path_with_no_extension
  - spec/shared_string_spec.rb
+ - spec/sheet_spec.rb
  - spec/spec_helper.rb
  - spec/styles/converter_spec.rb
  - spec/styles/style_types_spec.rb
@@ -139,29 +158,32 @@ require_paths:
  - lib
  required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
- - - ! '>='
+ - - ">="
  - !ruby/object:Gem::Version
- version: 1.9.2
+ version: 2.0.0
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
- - - ! '>='
+ - - ">="
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.4.3
+ rubygems_version: 2.6.12
  signing_key:
  specification_version: 4
  summary: A Ruby gem for parsing large Excel(xlsx and xlsm) files.
  test_files:
+ - spec/drawing_spec.rb
  - spec/fixtures/invalid.xls
  - spec/fixtures/sample-as-zip.zip
+ - spec/fixtures/sample-with-images.xlsx
  - spec/fixtures/sample.xlsx
  - spec/fixtures/sheets/sheet1.xml
  - spec/fixtures/sst.xml
  - spec/fixtures/styles/first.xml
  - spec/fixtures/temp_string_io_file_path_with_no_extension
  - spec/shared_string_spec.rb
+ - spec/sheet_spec.rb
  - spec/spec_helper.rb
  - spec/styles/converter_spec.rb
  - spec/styles/style_types_spec.rb
@@ -1,76 +0,0 @@
- = Creek -- Stream parser for large Excel(xlsx and xlsm) files.
-
- Creek is a Ruby gem that provide a fast, simple and efficient method of parsing large Excel(xlsx and xlsm) files.
-
-
- == Installation
-
- Creek can be used from the command line or as part of a Ruby web framework. To install the gem using terminal, run the following command:
-
- gem install creek
-
- To use it in Rails, add this line to your Gemfile:
-
- gem "creek"
-
-
- == Basic Usage
- Creek can simply parse an Excel file by looping through the rows enumerator:
-
- require 'creek'
- creek = Creek::Book.new "specs/fixtures/sample.xlsx"
- sheet= creek.sheets[0]
-
- sheet.rows.each do |row|
- puts row # => {"A1"=>"Content 1", "B1"=>nil, C1"=>nil, "D1"=>"Content 3"}
- end
-
-
- sheet.rows_with_meta_data.each do |row|
- puts row # => {"collapsed"=>"false", "customFormat"=>"false", "customHeight"=>"true", "hidden"=>"false", "ht"=>"12.1", "outlineLevel"=>"0", "r"=>"1", "cells"=>{"A1"=>"Content 1", "B1"=>nil, C1"=>nil, "D1"=>"Content 3"}}
- end
-
-
- sheet.state # => 'visible'
- sheet.name # => 'Sheet1'
- sheet.rid # => 'rId2'
-
- == Filename considerations
- By default, Creek will ensure that the file extension is either *.xlsx or *.xlsm, but this check can be circumvented as needed:
-
- path = 'sample-as-zip.zip'
- Creek::Book.new path, :check_file_extension => false
-
- By default, the Rails {file_field_tag}[http://api.rubyonrails.org/classes/ActionView/Helpers/FormTagHelper.html#method-i-file_field_tag] uploads to a temporary location and stores the original filename with the StringIO object. (See {this section}[http://guides.rubyonrails.org/form_helpers.html#uploading-files] of the Rails Guides for more information.)
-
- Creek can parse this directly without the need for file upload gems such as Carrierwave or Paperclip by passing the original filename as an option:
-
- # Import endpoint in Rails controller
- def import
- file = params[:file]
- Creek::Book.new file.path, check_file_extension: false
- end
-
- == Contributing
-
- Contributions are welcomed. You can fork a repository, add your code changes to the forked branch, ensure all existing unit tests pass, create new unit tests cover your new changes and finally create a pull request.
-
- After forking and then cloning the repository locally, install Bundler and then use it
- to install the development gem dependecies:
-
- gem install bundler
- bundle install
-
- Once this is complete, you should be able to run the test suite:
-
- rake
-
-
- == Bug Reporting
-
- Please use the {Issues}[https://github.com/pythonicrubyist/creek/issues] page to report bugs or suggest new enhancements.
-
-
- == License
-
- Creek has been published under {MIT License}[https://github.com/pythonicrubyist/creek/blob/master/LICENSE.txt]