xsv 1.0.3 → 1.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 284b7b6ce94f03f8dfd74d77275a4ad2838f44bf9370a4ed2f5f6dba23ad513b
4
- data.tar.gz: cc18f35656e7596c49c1f14e134071f12575204bb8237284cfbb49d2c924728f
3
+ metadata.gz: f1ebfa4e4778af72a8b295d258d899a5b5d01fd029d1294d54af5e4f1e0de05a
4
+ data.tar.gz: aa74ffe0d57eebc12e312bdb42107bc39203cd3a0237f2e2481205c8b3b933c9
5
5
  SHA512:
6
- metadata.gz: 81ef3af200fb019556e1d5093bfc6f01457f0557c5314f088e707ba25274e74e0c90f914d3133de66a0fb27a398c5b1f8ad3e30d9c55f85772ad029160bfcd05
7
- data.tar.gz: a4631f63c346ddfa449181d1e4de685eac9720f87ec296f0cd335e7487ad5ad8e9e973d1caf8893c37038b78f1665bdaeb853ada501f7b813dc8bda36e85c7b6
6
+ metadata.gz: 6ebbb32e48860043bdb0a5d17f6fef525252e8bb01ac63180cd0e749dfbd1b3bb08b8a82e43e9e13da2fa187c02f77e37f525e9c7453334bc3cb8fdf05400187
7
+ data.tar.gz: 4e1450daebcc3ddfbc0585de52f4d1f362ef2d79281e59cc641119a47924f8450b76c3643a6403a440e0a581ebfcc108431a311f902e2b48be61a1d2afd7b19e
data/.github/workflows/ruby.yml ADDED
@@ -0,0 +1,32 @@
1
+ # This workflow uses actions that are not certified by GitHub.
2
+ # They are provided by a third-party and are governed by
3
+ # separate terms of service, privacy policy, and support
4
+ # documentation.
5
+ # This workflow will download a prebuilt Ruby version, install dependencies and run tests with Rake
6
+ # For more information see: https://github.com/marketplace/actions/setup-ruby-jruby-and-truffleruby
7
+
8
+ name: Ruby
9
+
10
+ on:
11
+ push:
12
+ branches: [ main ]
13
+ pull_request:
14
+ branches: [ main ]
15
+
16
+ jobs:
17
+ test:
18
+
19
+ runs-on: ubuntu-latest
20
+ strategy:
21
+ matrix:
22
+ ruby-version: ['2.6', '2.7', '3.0', '3.1', 'jruby', 'truffleruby']
23
+
24
+ steps:
25
+ - uses: actions/checkout@v2
26
+ - name: Set up Ruby
27
+ uses: ruby/setup-ruby@v1
28
+ with:
29
+ ruby-version: ${{ matrix.ruby-version }}
30
+ bundler-cache: true # runs 'bundle install' and caches installed gems automatically
31
+ - name: Run tests
32
+ run: bundle exec rake
data/.standard.yml ADDED
@@ -0,0 +1 @@
1
+ ruby_version: 2.6.9
data/CHANGELOG.md CHANGED
@@ -1,5 +1,24 @@
1
1
  # Xsv Changelog
2
2
 
3
+ ## 1.1.0 2022-02-13
4
+
5
+ - New, shorter `Xsv.open` syntax as a drop-in replacement for `Xsv::Workbook.open`, which is still supported
6
+ - Enable parsing of headers for all sheets by passing `parse_headers: true` to `Xsv.open`
7
+ - Improvements in performance and test coverage
8
+ - Dropped support for Ruby 2.5, which is EOL. Xsv 1.1.0 supports Ruby 2.6+, latest JRuby, latest TruffleRuby
9
+
10
+ ## 1.0.6 2022-01-07
11
+
12
+ - Code cleanup, small performance improvements
13
+
14
+ ## 1.0.5 2022-01-05
15
+
16
+ - Raise exception if given an empty buffer when opening workbook (thanks @kevin-j-m)
17
+
18
+ ## 1.0.4 2021-07-05
19
+
20
+ - Support for custom date/time columns
21
+
3
22
  ## 1.0.3 2021-05-06
4
23
 
5
24
  - Handle nil number formats correctly (regression in Xsv 1.0.2, #29)
data/README.md CHANGED
@@ -1,10 +1,11 @@
1
1
  # Xsv .xlsx reader
2
2
 
3
3
  [![Travis CI](https://img.shields.io/travis/martijn/xsv/master)](https://travis-ci.org/martijn/xsv)
4
+ [![Codecov](https://img.shields.io/codecov/c/github/martijn/xsv/main)](https://app.codecov.io/gh/martijn/xsv)
4
5
  [![Yard Docs](http://img.shields.io/badge/yard-docs-blue.svg)](https://rubydoc.info/github/martijn/xsv)
5
6
  [![Gem Version](https://badge.fury.io/rb/xsv.svg)](https://badge.fury.io/rb/xsv)
6
7
 
7
- Xsv is a fast, lightweight, pure Ruby parser for Office Open XML spreadsheet files
8
+ Xsv is a fast, lightweight, pure Ruby parser for ISO/IEC 29500 Office Open XML spreadsheet files
8
9
  (commonly known as Excel or .xlsx files). It strives to be minimal in the
9
10
  sense that it provides nothing a CSV reader wouldn't, meaning it only
10
11
  deals with minimal formatting and cannot create or modify documents.
@@ -41,17 +42,18 @@ when that becomes stable.
41
42
 
42
43
  ## Usage
43
44
 
45
+ ### Array and hash mode
44
46
  Xsv has two modes of operation. By default, it returns an array for
45
47
  each row in the sheet:
46
48
 
47
49
  ```ruby
48
- x = Xsv::Workbook.open("sheet.xlsx")
50
+ x = Xsv.open("sheet.xlsx") # => #<Xsv::Workbook sheets=1>
49
51
 
50
52
  sheet = x.sheets[0]
51
53
 
52
54
  # Iterate over rows
53
- sheet.each_row do |row|
54
- row # => ["header1", "header2"], etc.
55
+ sheet.each do |row|
56
+ row # => ["header1", "header2"]
55
57
  end
56
58
 
57
59
  # Access row by index (zero-based)
@@ -59,40 +61,63 @@ sheet[1] # => ["value1", "value2"]
59
61
  ```
60
62
 
61
63
  Alternatively, it can load the headers from the first row and return a hash
62
- for every row:
64
+ for every row by calling `parse_headers!` on the sheet or setting the `parse_headers`
65
+ option on open:
63
66
 
64
67
  ```ruby
65
- x = Xsv::Workbook.open("sheet.xlsx")
68
+ # Parse headers for all sheets on open
69
+
70
+ x = Xsv.open("sheet.xlsx", parse_headers: true)
71
+
72
+ x.sheets[0][1] # => {"header1" => "value1", "header2" => "value2"}
73
+
74
+ # Manually parse headers for a single sheet
75
+
76
+ x = Xsv.open("sheet.xlsx")
66
77
 
67
78
  sheet = x.sheets[0]
68
79
 
69
- sheet.mode # => :array
80
+ sheet[0] # => ["header1", "header2"]
70
81
 
71
- # Parse headers and switch to hash mode
72
82
  sheet.parse_headers!
73
83
 
74
- sheet.mode # => :hash
84
+ sheet[0] # => {"header1" => "value1", "header2" => "value2"}
85
+ ```
86
+
87
+ Be aware that hash mode will lead to unpredictable results if the worksheet
88
+ has multiple columns with the same header. `Xsv::Sheet` implements `Enumerable` so along with `#each`
89
+ you can call methods like `#first`, `#filter`/`#select`, and `#map` on it.
90
+
91
+ ### Opening a string or buffer instead of filename
75
92
 
76
- sheet.each_row do |row|
77
- row # => {"header1" => "value1", "header2" => "value2"}, etc.
93
+ `Xsv.open` accepts a filename, or an IO or String containing a workbook. Optionally, you can pass a block
94
+ which will be called with the workbook as parameter, like `File#open`. Example of this together:
95
+
96
+ ```ruby
97
+ # Use an existing IO-like object as source
98
+
99
+ file = File.open("sheet.xlsx")
100
+
101
+ Xsv.open(file) do |workbook|
102
+ puts workbook.inspect
78
103
  end
79
104
 
80
- sheet[1] # => {"header1" => "value1", "header2" => "value2"}
81
- ```
105
+ # or even:
82
106
 
83
- Be aware that hash mode will lead to unpredictable results if the worksheet
84
- has multiple columns with the same header.
107
+ Xsv.open(file.read) do |workbook|
108
+ puts workbook.inspect
109
+ end
110
+ ```
85
111
 
86
- `Xsv::Workbook.open` accepts a filename, or an IO or String containing a workbook. Optionally, you can pass a block
87
- which will be called with the workbook as parameter, like `File#open`.
112
+ Prior to Xsv 1.1.0, `Xsv::Workbook.open` was used instead of `Xsv.open`. The parameters are identical and
113
+ the former is maintained for backwards compatibility.
88
114
 
89
- `Xsv::Sheet` implements `Enumerable` so you can call methods like `#first`,
90
- `#filter`/`#select`, and `#map` on it.
115
+ ### Accessing sheets by name
91
116
 
92
117
  The sheets can be accessed by index or by name:
93
118
 
94
119
  ```ruby
95
- x = Xsv::Workbook.open("sheet.xlsx")
120
+ x = Xsv.open("sheet.xlsx")
96
121
 
97
122
  sheet = x.sheets[0] # gets sheet by index
98
123
 
data/Rakefile CHANGED
@@ -13,4 +13,4 @@ Rake::TestTask.new(:bench) do |t|
13
13
  t.test_files = FileList["test/**/*_benchmark.rb"]
14
14
  end
15
15
 
16
- task :default => [:test, :bench]
16
+ task default: [:test, :bench]
data/benchmark.rb ADDED
@@ -0,0 +1,51 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/inline"
4
+
5
+ gemfile do
6
+ source "https://rubygems.org"
7
+
8
+ gemspec
9
+ gem "benchmark-memory"
10
+ gem "benchmark-perf"
11
+ end
12
+
13
+ def bench_perf(sheet)
14
+ result = Benchmark::Perf.cpu(repeat: 5) do
15
+ sheet.each do |row|
16
+ row.each do |cell|
17
+ cell
18
+ end
19
+ end
20
+ end
21
+
22
+ puts "Performance benchmark: #{result.avg}s avg #{result.stdev}s stdev"
23
+ end
24
+
25
+ def bench_mem(sheet)
26
+ Benchmark.memory do |bm|
27
+ bm.report do
28
+ sheet.each do |row|
29
+ row.each do |cell|
30
+ cell
31
+ end
32
+ end
33
+ end
34
+ end
35
+ end
36
+
37
+ file = File.read("test/files/10k-sheet.xlsx")
38
+
39
+ workbook = Xsv.open(file)
40
+
41
+ puts "--- ARRAY MODE ---"
42
+
43
+ bench_perf(workbook.sheets[0])
44
+ bench_mem(workbook.sheets[0])
45
+
46
+ puts "\n--- HASH MODE ---"
47
+
48
+ workbook.sheets[0].parse_headers!
49
+
50
+ bench_perf(workbook.sheets[0])
51
+ bench_mem(workbook.sheets[0])
data/lib/xsv/helpers.rb CHANGED
@@ -5,42 +5,42 @@ module Xsv
5
5
  # The default OOXML Spreadheet number formats according to the ECMA standard
6
6
  # User formats are appended from index 174 onward
7
7
  BUILT_IN_NUMBER_FORMATS = {
8
- 1 => '0',
9
- 2 => '0.00',
10
- 3 => '#, ##0',
11
- 4 => '#, ##0.00',
12
- 5 => '$#, ##0_);($#, ##0)',
13
- 6 => '$#, ##0_);[Red]($#, ##0)',
14
- 7 => '$#, ##0.00_);($#, ##0.00)',
15
- 8 => '$#, ##0.00_);[Red]($#, ##0.00)',
16
- 9 => '0%',
17
- 10 => '0.00%',
18
- 11 => '0.00E+00',
19
- 12 => '# ?/?',
20
- 13 => '# ??/??',
21
- 14 => 'm/d/yyyy',
22
- 15 => 'd-mmm-yy',
23
- 16 => 'd-mmm',
24
- 17 => 'mmm-yy',
25
- 18 => 'h:mm AM/PM',
26
- 19 => 'h:mm:ss AM/PM',
27
- 20 => 'h:mm',
28
- 21 => 'h:mm:ss',
29
- 22 => 'm/d/yyyy h:mm',
30
- 37 => '#, ##0_);(#, ##0)',
31
- 38 => '#, ##0_);[Red](#, ##0)',
32
- 39 => '#, ##0.00_);(#, ##0.00)',
33
- 40 => '#, ##0.00_);[Red](#, ##0.00)',
34
- 45 => 'mm:ss',
35
- 46 => '[h]:mm:ss',
36
- 47 => 'mm:ss.0',
37
- 48 => '##0.0E+0',
38
- 49 => '@'
8
+ 1 => "0",
9
+ 2 => "0.00",
10
+ 3 => "#, ##0",
11
+ 4 => "#, ##0.00",
12
+ 5 => "$#, ##0_);($#, ##0)",
13
+ 6 => "$#, ##0_);[Red]($#, ##0)",
14
+ 7 => "$#, ##0.00_);($#, ##0.00)",
15
+ 8 => "$#, ##0.00_);[Red]($#, ##0.00)",
16
+ 9 => "0%",
17
+ 10 => "0.00%",
18
+ 11 => "0.00E+00",
19
+ 12 => "# ?/?",
20
+ 13 => "# ??/??",
21
+ 14 => "m/d/yyyy",
22
+ 15 => "d-mmm-yy",
23
+ 16 => "d-mmm",
24
+ 17 => "mmm-yy",
25
+ 18 => "h:mm AM/PM",
26
+ 19 => "h:mm:ss AM/PM",
27
+ 20 => "h:mm",
28
+ 21 => "h:mm:ss",
29
+ 22 => "m/d/yyyy h:mm",
30
+ 37 => "#, ##0_);(#, ##0)",
31
+ 38 => "#, ##0_);[Red](#, ##0)",
32
+ 39 => "#, ##0.00_);(#, ##0.00)",
33
+ 40 => "#, ##0.00_);[Red](#, ##0.00)",
34
+ 45 => "mm:ss",
35
+ 46 => "[h]:mm:ss",
36
+ 47 => "mm:ss.0",
37
+ 48 => "##0.0E+0",
38
+ 49 => "@"
39
39
  }.freeze
40
40
 
41
41
  MINUTE = 60
42
42
  HOUR = 3600
43
- A_CODEPOINT = 'A'.ord.freeze
43
+ A_CODEPOINT = "A".ord.freeze
44
44
  # The epoch for all dates in OOXML Spreadsheet documents
45
45
  EPOCH = Date.new(1899, 12, 30).freeze
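Note on the `EPOCH` constant above: spreadsheet dates are stored as day counts from this epoch, so a whole-day serial can be turned into a `Date` by simple addition. The sketch below illustrates that arithmetic only; `serial_to_date` is a hypothetical helper name and is not taken from the Xsv source.

```ruby
require "date"

# Same epoch value as in the diff above.
EPOCH = Date.new(1899, 12, 30).freeze

# Hypothetical helper (not part of Xsv): convert a whole-day
# serial number into a Date by adding it to the epoch.
def serial_to_date(serial)
  EPOCH + serial.to_i
end

puts serial_to_date(25569) # serial 25569 is the Unix epoch, 1970-01-01
```

Fractional serials (times of day) are ignored here; this only covers the date part.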
46
46
 
@@ -74,7 +74,7 @@ module Xsv
74
74
  minutes = minutes % 60
75
75
  end
76
76
 
77
- format('%02d:%02d', hours, minutes)
77
+ format("%02d:%02d", hours, minutes)
78
78
  end
79
79
 
80
80
  # Returns a time including a date as a {Time} object
@@ -92,9 +92,9 @@ module Xsv
92
92
 
93
93
  # Returns a number as either Integer or Float
94
94
  def parse_number(string)
95
- if string.include? '.'
95
+ if string.include? "."
96
96
  string.to_f
97
- elsif string.include? 'E'
97
+ elsif string.include? "E"
98
98
  Complex(string).to_f
99
99
  else
100
100
  string.to_i
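Note on the `parse_number` hunk above: the three branches can be exercised in isolation. The following standalone copy mirrors the logic shown in the diff (the surrounding `Xsv::Helpers` module is left out), mainly to show which input takes which branch.

```ruby
# Standalone copy of the branching shown in the hunk above.
def parse_number(string)
  if string.include?(".")
    string.to_f              # plain decimal such as "3.14"
  elsif string.include?("E")
    Complex(string).to_f     # exponent notation without a dot, such as "1E3"
  else
    string.to_i              # everything else is treated as an integer
  end
end

p parse_number("42")
p parse_number("3.14")
p parse_number("1E3")
```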
data/lib/xsv/relationships_handler.rb CHANGED
@@ -17,7 +17,7 @@ module Xsv
17
17
  end
18
18
 
19
19
  def start_element(name, attrs)
20
- @block.call(attrs.slice(:Id, :Type, :Target)) if name == 'Relationship'
20
+ @block.call(attrs.slice(:Id, :Type, :Target)) if name == "Relationship"
21
21
  end
22
22
  end
23
23
  end
data/lib/xsv/sax_parser.rb CHANGED
@@ -5,6 +5,9 @@ module Xsv
5
5
  ATTR_REGEX = /((\S+)="(.*?)")/m
6
6
 
7
7
  def parse(io)
8
+ responds_to_end_element = respond_to?(:end_element)
9
+ responds_to_characters = respond_to?(:characters)
10
+
8
11
  state = :look_start
9
12
  if io.is_a?(String)
10
13
  pbuf = io.dup
@@ -29,16 +32,16 @@ module Xsv
29
32
  end
30
33
 
31
34
  if state == :look_start
32
- if (o = pbuf.index('<'))
33
- chars = pbuf.slice!(0, o + 1).chop!.force_encoding('utf-8')
35
+ if (o = pbuf.index("<"))
36
+ chars = pbuf.slice!(0, o + 1).chop!.force_encoding("utf-8")
34
37
 
35
- if respond_to?(:characters) && !chars.empty?
36
- if chars.index('&')
37
- chars.gsub!('&amp;', '&')
38
- chars.gsub!('&apos;', "'")
39
- chars.gsub!('&gt;', '>')
40
- chars.gsub!('&lt;', '<')
41
- chars.gsub!('&quot;', '"')
38
+ if responds_to_characters && !chars.empty?
39
+ if chars.index("&")
40
+ chars.gsub!("&amp;", "&")
41
+ chars.gsub!("&apos;", "'")
42
+ chars.gsub!("&gt;", ">")
43
+ chars.gsub!("&lt;", "<")
44
+ chars.gsub!("&quot;", '"')
42
45
  end
43
46
  characters(chars)
44
47
  end
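Note on the entity handling in the hunk above: because `&amp;` is replaced first, an input such as `&amp;lt;` passes through two of the sequential `gsub!` calls and ends up as `<`. A single-pass variant avoids that. The sketch below is an alternative technique, not the parser's code; `ENTITIES`, `ENTITY_REGEX`, and `unescape_entities` are names made up for this illustration.

```ruby
# Hypothetical single-pass alternative to the sequential gsub! calls.
ENTITIES = {
  "&amp;" => "&",
  "&apos;" => "'",
  "&gt;" => ">",
  "&lt;" => "<",
  "&quot;" => '"'
}.freeze

ENTITY_REGEX = /&(?:amp|apos|gt|lt|quot);/.freeze

def unescape_entities(text)
  # Cheap pre-check, like the chars.index("&") guard in the diff.
  return text unless text.include?("&")

  # Each match is looked up in the hash exactly once, so the
  # output of one replacement is never scanned again.
  text.gsub(ENTITY_REGEX, ENTITIES)
end

puts unescape_entities("a &lt; b &amp;&amp; c")
puts unescape_entities("&amp;lt;")
```

Whether the extra regex is worth it compared to the in-place `gsub!` chain depends on the parser's performance goals; this only shows the behavioral difference.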
@@ -55,8 +58,8 @@ module Xsv
55
58
  end
56
59
 
57
60
  if state == :look_end
58
- if (o = pbuf.index('>'))
59
- if (s = pbuf.index(' ')) && s < o
61
+ if (o = pbuf.index(">"))
62
+ if (s = pbuf.index(" ")) && s < o
60
63
  tag_name = pbuf.slice!(0, s + 1).chop!
61
64
  args = pbuf.slice!(0, o - s)
62
65
  else
@@ -64,18 +67,18 @@ module Xsv
64
67
  args = nil
65
68
  end
66
69
 
67
- if tag_name.start_with?('/')
68
- end_element(tag_name[1..-1]) if respond_to?(:end_element)
70
+ if tag_name.start_with?("/")
71
+ end_element(tag_name[1..]) if responds_to_end_element
69
72
  elsif args.nil?
70
73
  start_element(tag_name, nil)
71
74
  else
72
75
  start_element(tag_name, args.scan(ATTR_REGEX).each_with_object({}) { |m, h| h[m[1].to_sym] = m[2] })
73
- end_element(tag_name) if args.end_with?('/') && respond_to?(:end_element)
76
+ end_element(tag_name) if responds_to_end_element && args.end_with?("/")
74
77
  end
75
78
 
76
79
  state = :look_start
77
80
  elsif eof_reached
78
- raise 'Malformed XML document, looking for end of tag beyond EOF'
81
+ raise Xsv::Error, "Malformed XML document, looking for end of tag beyond EOF"
79
82
  else
80
83
  must_read = true
81
84
  end
data/lib/xsv/shared_strings_parser.rb CHANGED
@@ -18,29 +18,29 @@ module Xsv
18
18
 
19
19
  def start_element(name, _attrs)
20
20
  case name
21
- when 'si'
22
- @current_string = ''
21
+ when "si"
22
+ @current_string = ""
23
23
  @skip = false
24
- when 'rPh'
24
+ when "rPh"
25
25
  @skip = true
26
- when 't'
26
+ when "t"
27
27
  @state = name
28
28
  end
29
29
  end
30
30
 
31
31
  def characters(value)
32
- if @state == 't' && !@skip
32
+ if @state == "t" && !@skip
33
33
  @current_string += value
34
34
  end
35
35
  end
36
36
 
37
37
  def end_element(name)
38
38
  case name
39
- when 'si'
39
+ when "si"
40
40
  @block.call(@current_string)
41
- when 'rPh'
41
+ when "rPh"
42
42
  @skip = false
43
- when 't'
43
+ when "t"
44
44
  @state = nil
45
45
  end
46
46
  end
data/lib/xsv/sheet.rb CHANGED
@@ -40,14 +40,14 @@ module Xsv
40
40
  @headers = []
41
41
  @mode = :array
42
42
  @row_skip = 0
43
- @hidden = ids[:state] == 'hidden'
43
+ @hidden = ids[:state] == "hidden"
44
44
 
45
45
  @last_row, @column_count = SheetBoundsHandler.get_bounds(@io, @workbook)
46
46
  end
47
47
 
48
48
  # @return [String]
49
49
  def inspect
50
- "#<#{self.class.name}:#{object_id}>"
50
+ "#<#{self.class.name}:#{object_id} mode=#{@mode}>"
51
51
  end
52
52
 
53
53
  # Returns true if the worksheet is hidden
@@ -66,7 +66,7 @@ module Xsv
66
66
  true
67
67
  end
68
68
 
69
- alias each each_row
69
+ alias_method :each, :each_row
70
70
 
71
71
  # Get row by number, starting at 0. Returns either a hash or an array based on the current row.
72
72
  # If the specified index is out of bounds an empty row is returned.
data/lib/xsv/sheet_bounds_handler.rb CHANGED
@@ -30,40 +30,40 @@ module Xsv
30
30
  @state = nil
31
31
  @cell = nil
32
32
  @row = nil
33
- @maxRow = 0
34
- @maxColumn = 0
33
+ @max_row = 0
34
+ @max_column = 0
35
35
  @trim_empty_rows = trim_empty_rows
36
36
  end
37
37
 
38
38
  def start_element(name, attrs)
39
39
  case name
40
- when 'c'
40
+ when "c"
41
41
  @state = name
42
42
  @cell = attrs[:r]
43
- when 'v'
43
+ when "v"
44
44
  col = column_index(@cell)
45
- @maxColumn = col if col > @maxColumn
46
- @maxRow = @row if @row > @maxRow
47
- when 'row'
45
+ @max_column = col if col > @max_column
46
+ @max_row = @row if @row > @max_row
47
+ when "row"
48
48
  @state = name
49
49
  @row = attrs[:r].to_i
50
- when 'dimension'
50
+ when "dimension"
51
51
  @state = name
52
52
 
53
- _firstCell, lastCell = attrs[:ref].split(':')
53
+ _first_cell, last_cell = attrs[:ref].split(":")
54
54
 
55
- if lastCell
56
- @maxColumn = column_index(lastCell)
55
+ if last_cell
56
+ @max_column = column_index(last_cell)
57
57
  unless @trim_empty_rows
58
- @maxRow = lastCell[/\d+$/].to_i
59
- @block.call(@maxRow, @maxColumn)
58
+ @max_row = last_cell[/\d+$/].to_i
59
+ @block.call(@max_row, @max_column)
60
60
  end
61
61
  end
62
62
  end
63
63
  end
64
64
 
65
65
  def end_element(name)
66
- @block.call(@maxRow, @maxColumn) if name == 'sheetData'
66
+ @block.call(@max_row, @max_column) if name == "sheetData"
67
67
  end
68
68
  end
69
69
  end
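Note on the `dimension` branch above: the handler splits a reference such as `A1:C10` and derives the last row and column from the second cell. The sketch below reproduces that step standalone. The `column_index` body is an assumption written for this example (the real helper is not shown in this diff), and `bounds_from_dimension` is a hypothetical name.

```ruby
A_CODEPOINT = "A".ord

# Assumed implementation: zero-based column index from a cell
# reference, e.g. "A1" => 0, "Z9" => 25, "AA1" => 26.
def column_index(cell)
  letters = cell[/\A[A-Z]+/]
  letters.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - A_CODEPOINT + 1) } - 1
end

# Hypothetical helper: returns [max_row, max_column] for a
# dimension ref, or nil when the ref names a single cell.
def bounds_from_dimension(ref)
  _first_cell, last_cell = ref.split(":")
  return nil unless last_cell

  [last_cell[/\d+$/].to_i, column_index(last_cell)]
end

p bounds_from_dimension("A1:C10")
p bounds_from_dimension("A1")
```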
data/lib/xsv/sheet_rows_handler.rb CHANGED
@@ -14,58 +14,50 @@ module Xsv
14
14
  @last_row = last_row - @row_skip
15
15
  @block = block
16
16
 
17
- @state = nil
17
+ @store_characters = false
18
18
 
19
19
  @row_index = 0
20
20
  @current_row = {}
21
- @current_row_attrs = {}
21
+ @current_row_number = 0
22
22
  @current_cell = {}
23
- @current_value = String.new
23
+ @current_value = +""
24
24
 
25
25
  @headers = @empty_row.keys if @mode == :hash
26
26
  end
27
27
 
28
28
  def start_element(name, attrs)
29
29
  case name
30
- when 'c'
31
- @state = name
30
+ when "c"
32
31
  @current_cell = attrs
33
32
  @current_value.clear
34
- when 'v', 'is'
35
- @state = name
36
- when 'row'
37
- @state = name
33
+ when "v", "is", "t"
34
+ @store_characters = true
35
+ when "row"
38
36
  @current_row = @empty_row.dup
39
- @current_row_attrs = attrs
40
- when 't'
41
- @state = nil unless @state == 'is'
42
- else
43
- @state = nil
37
+ @current_row_number = attrs[:r].to_i
44
38
  end
45
39
  end
46
40
 
47
41
  def characters(value)
48
- @current_value << value if @state == 'v' || @state == 'is'
42
+ @current_value << value if @store_characters
49
43
  end
50
44
 
51
45
  def end_element(name)
52
46
  case name
53
- when 'v'
54
- @state = nil
55
- when 'c'
47
+ when "v", "is", "t"
48
+ @store_characters = false
49
+ when "c"
56
50
  col_index = column_index(@current_cell[:r])
57
51
 
58
- case @mode
59
- when :array
52
+ if @mode == :array
60
53
  @current_row[col_index] = format_cell
61
- when :hash
54
+ else
62
55
  @current_row[@headers[col_index]] = format_cell
63
56
  end
64
- when 'row'
65
- real_row_number = @current_row_attrs[:r].to_i
66
- adjusted_row_number = real_row_number - @row_skip
57
+ when "row"
58
+ return if @current_row_number <= @row_skip
67
59
 
68
- return if real_row_number <= @row_skip
60
+ adjusted_row_number = @current_row_number - @row_skip
69
61
 
70
62
  @row_index += 1
71
63
 
@@ -90,23 +82,22 @@ module Xsv
90
82
  return nil if @current_value.empty?
91
83
 
92
84
  case @current_cell[:t]
93
- when 's'
85
+ when "s"
94
86
  @workbook.shared_strings[@current_value.to_i]
95
- when 'str', 'inlineStr'
87
+ when "str", "inlineStr"
96
88
  @current_value.strip
97
- when 'e' # N/A
89
+ when "e" # N/A
98
90
  nil
99
- when nil, 'n'
91
+ when nil, "n"
100
92
  if @current_cell[:s]
101
- style = @workbook.xfs[@current_cell[:s].to_i]
102
- numFmt = @workbook.numFmts[style[:numFmtId].to_i]
103
-
104
- parse_number_format(@current_value, numFmt)
93
+ parse_number_format(@current_value, @workbook.get_num_fmt(@current_cell[:s].to_i))
105
94
  else
106
95
  parse_number(@current_value)
107
96
  end
108
- when 'b'
109
- @current_value == '1'
97
+ when "b"
98
+ @current_value == "1"
99
+ when "d"
100
+ DateTime.parse(@current_value)
110
101
  else
111
102
  raise Xsv::Error, "Encountered unknown column type #{@current_cell[:t]}"
112
103
  end
data/lib/xsv/sheets_ids_handler.rb CHANGED
@@ -17,7 +17,7 @@ module Xsv
17
17
  end
18
18
 
19
19
  def start_element(name, attrs)
20
- @block.call(attrs.slice(:name, :sheetId, :state, :'r:id')) if name == 'sheet'
20
+ @block.call(attrs.slice(:name, :sheetId, :state, :'r:id')) if name == "sheet"
21
21
  end
22
22
  end
23
23
  end
data/lib/xsv/styles_handler.rb CHANGED
@@ -5,39 +5,39 @@ module Xsv
5
5
  # This is used internally when opening a sheet.
6
6
  class StylesHandler < SaxParser
7
7
  def self.get_styles(io)
8
- handler = new(Xsv::Helpers::BUILT_IN_NUMBER_FORMATS.dup) do |xfs, numFmts|
8
+ handler = new(Xsv::Helpers::BUILT_IN_NUMBER_FORMATS.dup) do |xfs, num_fmts|
9
9
  @xfs = xfs
10
- @numFmts = numFmts
10
+ @num_fmts = num_fmts
11
11
  end
12
12
 
13
13
  handler.parse(io)
14
14
 
15
- [@xfs, @numFmts]
15
+ [@xfs, @num_fmts]
16
16
  end
17
17
 
18
- def initialize(numFmts, &block)
18
+ def initialize(num_fmts, &block)
19
19
  @block = block
20
20
  @state = nil
21
21
  @xfs = []
22
- @numFmts = numFmts
22
+ @num_fmts = num_fmts
23
23
  end
24
24
 
25
25
  def start_element(name, attrs)
26
26
  case name
27
- when 'cellXfs'
28
- @state = 'cellXfs'
29
- when 'xf'
30
- @xfs << attrs if @state == 'cellXfs'
31
- when 'numFmt'
32
- @numFmts[attrs[:numFmtId].to_i] = attrs[:formatCode]
27
+ when "cellXfs"
28
+ @state = "cellXfs"
29
+ when "xf"
30
+ @xfs << attrs.transform_values(&:to_i) if @state == "cellXfs"
31
+ when "numFmt"
32
+ @num_fmts[attrs[:numFmtId].to_i] = attrs[:formatCode]
33
33
  end
34
34
  end
35
35
 
36
36
  def end_element(name)
37
37
  case name
38
- when 'styleSheet'
39
- @block.call(@xfs, @numFmts)
40
- when 'cellXfs'
38
+ when "styleSheet"
39
+ @block.call(@xfs, @num_fmts)
40
+ when "cellXfs"
41
41
  @state = nil
42
42
  end
43
43
  end
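Note on `transform_values(&:to_i)` above: converting the `xf` attributes to integers once at parse time means the per-cell lookup no longer needs a `to_i` call. The sketch below shows the resulting two-step lookup with made-up sample data; `XFS`, `NUM_FMTS`, and `num_fmt_for` are illustrative names, not the library's API.

```ruby
# Attribute hashes as a SAX-style parser might deliver them:
# symbol keys with string values (sample data for illustration).
RAW_XFS = [{numFmtId: "0"}, {numFmtId: "14"}].freeze

# Convert every attribute value to Integer once, up front.
XFS = RAW_XFS.map { |attrs| attrs.transform_values(&:to_i) }.freeze

# Number formats keyed by integer id (two sample entries).
NUM_FMTS = {14 => "m/d/yyyy", 164 => "yyyy-mm-dd"}.freeze

# Hypothetical lookup: style index -> numFmtId -> format code.
def num_fmt_for(xfs, num_fmts, style)
  num_fmts[xfs[style][:numFmtId]]
end

p num_fmt_for(XFS, NUM_FMTS, 1)
p num_fmt_for(XFS, NUM_FMTS, 0)
```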
data/lib/xsv/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Xsv
4
- VERSION = '1.0.3'
4
+ VERSION = "1.1.0"
5
5
  end
data/lib/xsv/workbook.rb CHANGED
@@ -1,6 +1,6 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'zip'
3
+ require "zip"
4
4
 
5
5
  module Xsv
6
6
  # An OOXML Spreadsheet document is called a Workbook. A Workbook consists of
@@ -10,54 +10,36 @@ module Xsv
10
10
  # @return [Array<Sheet>]
11
11
  attr_reader :sheets
12
12
 
13
- attr_reader :shared_strings, :xfs, :numFmts, :trim_empty_rows
14
-
15
- # Open the workbook of the given filename, string or buffer. For additional
16
- # options see {.initialize}
17
- def self.open(data, **kws)
18
- @workbook = if data.is_a?(IO) || data.respond_to?(:read) # is it a buffer?
19
- new(Zip::File.open_buffer(data), **kws)
20
- elsif data.start_with?("PK\x03\x04") # is it a string containing a file?
21
- new(Zip::File.open_buffer(data), **kws)
22
- else # must be a filename
23
- new(Zip::File.open(data), **kws)
24
- end
25
-
26
- if block_given?
27
- begin
28
- yield(@workbook)
29
- ensure
30
- @workbook.close
31
- end
32
- else
33
- @workbook
34
- end
13
+ attr_reader :shared_strings, :xfs, :num_fmts, :trim_empty_rows
14
+
15
+ # @deprecated Use {Xsv.open} instead
16
+ def self.open(data, **kws, &block)
17
+ Xsv.open(data, **kws, &block)
35
18
  end
36
19
 
37
20
  # Open a workbook from an instance of {Zip::File}. Generally it's recommended
38
21
  # to use the {.open} method instead of the constructor.
39
22
  #
40
- # Options:
41
- #
42
- # trim_empty_rows (false) Scan sheet for end of content and don't return trailing rows
43
- #
44
- def initialize(zip, trim_empty_rows: false)
23
+ # @param trim_empty_rows [Boolean] Scan sheet for end of content and don't return trailing rows
24
+ # @param parse_headers [Boolean] Call `parse_headers!` on all sheets on load
25
+ def initialize(zip, trim_empty_rows: false, parse_headers: false)
45
26
  raise ArgumentError, "Passed argument is not an instance of Zip::File. Did you mean to use Workbook.open?" unless zip.is_a?(Zip::File)
27
+ raise Xsv::Error, "Zip::File is empty" if zip.size.zero?
46
28
 
47
29
  @zip = zip
48
30
  @trim_empty_rows = trim_empty_rows
49
31
 
50
32
  @sheets = []
51
- @xfs, @numFmts = fetch_styles
33
+ @xfs, @num_fmts = fetch_styles
52
34
  @sheet_ids = fetch_sheet_ids
53
35
  @relationships = fetch_relationships
54
36
  @shared_strings = fetch_shared_strings
55
- @sheets = fetch_sheets
37
+ @sheets = fetch_sheets(parse_headers ? :hash : :array)
56
38
  end
57
39
 
58
40
  # @return [String]
59
41
  def inspect
60
- "#<#{self.class.name}:#{object_id}>"
42
+ "#<#{self.class.name}:#{object_id} sheets=#{sheets.count} trim_empty_rows=#{@trim_empty_rows}>"
61
43
  end
62
44
 
63
45
  # Close the handle to the workbook file and leave all resources for the GC to collect
@@ -67,7 +49,7 @@ module Xsv
67
49
  @zip = nil
68
50
  @sheets = nil
69
51
  @xfs = nil
70
- @numFmts = nil
52
+ @num_fmts = nil
71
53
  @relationships = nil
72
54
  @shared_strings = nil
73
55
  @sheet_ids = nil
@@ -82,10 +64,15 @@ module Xsv
82
64
  @sheets.select { |s| s.name == name }
83
65
  end
84
66
 
67
+ # Get number format for given style index
68
+ def get_num_fmt(style)
69
+ @num_fmts[@xfs[style][:numFmtId]]
70
+ end
71
+
85
72
  private
86
73
 
87
74
  def fetch_shared_strings
88
- handle = @zip.glob('xl/sharedStrings.xml').first
75
+ handle = @zip.glob("xl/sharedStrings.xml").first
89
76
  return if handle.nil?
90
77
 
91
78
  stream = handle.get_input_stream
@@ -95,32 +82,34 @@ module Xsv
95
82
  end
96
83
 
97
84
  def fetch_styles
98
- stream = @zip.glob('xl/styles.xml').first.get_input_stream
85
+ stream = @zip.glob("xl/styles.xml").first.get_input_stream
99
86
 
100
87
  StylesHandler.get_styles(stream)
101
88
  ensure
102
89
  stream.close
103
90
  end
104
91
 
105
- def fetch_sheets
106
- @zip.glob('xl/worksheets/sheet*.xml').sort do |a, b|
92
+ def fetch_sheets(mode)
93
+ @zip.glob("xl/worksheets/sheet*.xml").sort do |a, b|
107
94
  a.name[/\d+/].to_i <=> b.name[/\d+/].to_i
108
95
  end.map do |entry|
109
- rel = @relationships.detect { |r| entry.name.end_with?(r[:Target]) && r[:Type].end_with?('worksheet') }
96
+ rel = @relationships.detect { |r| entry.name.end_with?(r[:Target]) && r[:Type].end_with?("worksheet") }
110
97
  sheet_ids = @sheet_ids.detect { |i| i[:"r:id"] == rel[:Id] }
111
- Xsv::Sheet.new(self, entry.get_input_stream, entry.size, sheet_ids)
98
+ Xsv::Sheet.new(self, entry.get_input_stream, entry.size, sheet_ids).tap do |sheet|
99
+ sheet.parse_headers! if mode == :hash
100
+ end
112
101
  end
113
102
  end
114
103
 
115
104
  def fetch_sheet_ids
116
- stream = @zip.glob('xl/workbook.xml').first.get_input_stream
105
+ stream = @zip.glob("xl/workbook.xml").first.get_input_stream
117
106
  SheetsIdsHandler.get_sheets_ids(stream)
118
107
  ensure
119
108
  stream.close
120
109
  end
121
110
 
122
111
  def fetch_relationships
123
- stream = @zip.glob('xl/_rels/workbook.xml.rels').first.get_input_stream
112
+ stream = @zip.glob("xl/_rels/workbook.xml.rels").first.get_input_stream
124
113
  RelationshipsHandler.get_relations(stream)
125
114
  ensure
126
115
  stream.close
data/lib/xsv.rb CHANGED
@@ -1,18 +1,18 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'date'
3
+ require "date"
4
4
 
5
- require 'xsv/helpers'
6
- require 'xsv/sax_parser'
7
- require 'xsv/relationships_handler'
8
- require 'xsv/shared_strings_parser'
9
- require 'xsv/sheet'
10
- require 'xsv/sheet_bounds_handler'
11
- require 'xsv/sheet_rows_handler'
12
- require 'xsv/sheets_ids_handler'
13
- require 'xsv/styles_handler'
14
- require 'xsv/version'
15
- require 'xsv/workbook'
5
+ require "xsv/helpers"
6
+ require "xsv/sax_parser"
7
+ require "xsv/relationships_handler"
8
+ require "xsv/shared_strings_parser"
9
+ require "xsv/sheet"
10
+ require "xsv/sheet_bounds_handler"
11
+ require "xsv/sheet_rows_handler"
12
+ require "xsv/sheets_ids_handler"
13
+ require "xsv/styles_handler"
14
+ require "xsv/version"
15
+ require "xsv/workbook"
16
16
 
17
17
  # XSV is a fast, lightweight parser for Office Open XML spreadsheet files
18
18
  # (commonly known as Excel or .xlsx files). It strives to be minimal in the
@@ -24,4 +24,31 @@ module Xsv
24
24
  # An AssertionFailed error indicates an unexpected condition, meaning a bug
25
25
  # or misinterpreted .xlsx document
26
26
  class AssertionFailed < StandardError; end
27
+
28
+ # Open the workbook of the given filename, string or buffer.
29
+ # @param filename_or_string [String, IO] the contents or filename of a workbook
30
+ # @param trim_empty_rows [Boolean] Scan sheet for end of content and don't return trailing rows
31
+ # @param parse_headers [Boolean] Call `parse_headers!` on all sheets on load
32
+ # @return [Xsv::Workbook] The workbook instance
33
+ def self.open(filename_or_string, trim_empty_rows: false, parse_headers: false)
34
+ zip = if filename_or_string.is_a?(IO) || filename_or_string.respond_to?(:read) # is it a buffer?
35
+ Zip::File.open_buffer(filename_or_string)
36
+ elsif filename_or_string.start_with?("PK\x03\x04") # is it a string containing a file?
37
+ Zip::File.open_buffer(filename_or_string)
38
+ else # must be a filename
39
+ Zip::File.open(filename_or_string)
40
+ end
41
+
42
+ workbook = Xsv::Workbook.new(zip, trim_empty_rows: trim_empty_rows, parse_headers: parse_headers)
43
+
44
+ if block_given?
45
+ begin
46
+ yield(workbook)
47
+ ensure
48
+ workbook.close
49
+ end
50
+ else
51
+ workbook
52
+ end
53
+ end
27
54
  end
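Note on `Xsv.open` above: the method combines two patterns, input detection (IO-like object, string with a ZIP signature, or filename) and the `File.open`-style optional block with a guaranteed `close`. The sketch below isolates both without depending on rubyzip. `classify_source`, `open_resource`, and `FakeWorkbook` are hypothetical names used only for this illustration.

```ruby
require "stringio"

# Local file header signature that starts a ZIP archive.
ZIP_MAGIC = "PK\x03\x04"

# Mirrors the three-way check in the diff, returning a label
# instead of opening anything.
def classify_source(source)
  if source.is_a?(IO) || source.respond_to?(:read)
    :buffer
  elsif source.start_with?(ZIP_MAGIC)
    :zip_string
  else
    :filename
  end
end

# Stand-in for a workbook that records whether it was closed.
class FakeWorkbook
  attr_reader :closed

  def initialize
    @closed = false
  end

  def close
    @closed = true
  end
end

# With a block: yield, then always close. Without: return the resource.
def open_resource(resource)
  return resource unless block_given?

  begin
    yield(resource)
  ensure
    resource.close
  end
end

workbook = FakeWorkbook.new
open_resource(workbook) { |wb| puts wb.inspect }
puts workbook.closed
```

As in the diff, the block form returns the block's value and closes the resource even if the block raises.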
data/xsv.gemspec CHANGED
@@ -14,7 +14,7 @@ Gem::Specification.new do |spec|
14
14
  (commonly known as Excel or .xlsx files). It strives to be minimal in the
15
15
  sense that it provides nothing a CSV reader wouldn't, meaning it only
16
16
  deals with minimal formatting and cannot create or modify documents.
17
- EOF
17
+ EOF
18
18
  spec.homepage = "https://github.com/martijn/xsv"
19
19
  spec.license = "MIT"
20
20
 
@@ -36,11 +36,13 @@ Gem::Specification.new do |spec|
36
36
  spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
37
37
  spec.require_paths = ["lib"]
38
38
 
39
- spec.required_ruby_version = ">= 2.5"
39
+ spec.required_ruby_version = ">= 2.6"
40
40
 
41
41
  spec.add_dependency "rubyzip", ">= 1.3", "< 3"
42
42
 
43
43
  spec.add_development_dependency "bundler", "< 3"
44
44
  spec.add_development_dependency "rake", "~> 13.0"
45
45
  spec.add_development_dependency "minitest", "~> 5.14.2"
46
+ spec.add_development_dependency "standard", "~> 1.6.0"
47
+ spec.add_development_dependency "codecov", ">= 0.6.0"
46
48
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: xsv
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.0.3
4
+ version: 1.1.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Martijn Storck
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2021-05-06 00:00:00.000000000 Z
11
+ date: 2022-02-13 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rubyzip
@@ -72,6 +72,34 @@ dependencies:
72
72
  - - "~>"
73
73
  - !ruby/object:Gem::Version
74
74
  version: 5.14.2
75
+ - !ruby/object:Gem::Dependency
76
+ name: standard
77
+ requirement: !ruby/object:Gem::Requirement
78
+ requirements:
79
+ - - "~>"
80
+ - !ruby/object:Gem::Version
81
+ version: 1.6.0
82
+ type: :development
83
+ prerelease: false
84
+ version_requirements: !ruby/object:Gem::Requirement
85
+ requirements:
86
+ - - "~>"
87
+ - !ruby/object:Gem::Version
88
+ version: 1.6.0
89
+ - !ruby/object:Gem::Dependency
90
+ name: codecov
91
+ requirement: !ruby/object:Gem::Requirement
92
+ requirements:
93
+ - - ">="
94
+ - !ruby/object:Gem::Version
95
+ version: 0.6.0
96
+ type: :development
97
+ prerelease: false
98
+ version_requirements: !ruby/object:Gem::Requirement
99
+ requirements:
100
+ - - ">="
101
+ - !ruby/object:Gem::Version
102
+ version: 0.6.0
75
103
  description: |2
76
104
  Xsv is a fast, lightweight parser for Office Open XML spreadsheet files
77
105
  (commonly known as Excel or .xlsx files). It strives to be minimal in the
@@ -83,13 +111,15 @@ executables: []
83
111
  extensions: []
84
112
  extra_rdoc_files: []
85
113
  files:
114
+ - ".github/workflows/ruby.yml"
86
115
  - ".gitignore"
87
- - ".travis.yml"
116
+ - ".standard.yml"
88
117
  - CHANGELOG.md
89
118
  - Gemfile
90
119
  - LICENSE.txt
91
120
  - README.md
92
121
  - Rakefile
122
+ - benchmark.rb
93
123
  - bin/console
94
124
  - bin/setup
95
125
  - lib/xsv.rb
@@ -120,14 +150,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
120
150
  requirements:
121
151
  - - ">="
122
152
  - !ruby/object:Gem::Version
123
- version: '2.5'
153
+ version: '2.6'
124
154
  required_rubygems_version: !ruby/object:Gem::Requirement
125
155
  requirements:
126
156
  - - ">="
127
157
  - !ruby/object:Gem::Version
128
158
  version: '0'
129
159
  requirements: []
130
- rubygems_version: 3.2.15
160
+ rubygems_version: 3.2.3
131
161
  signing_key:
132
162
  specification_version: 4
133
163
  summary: A fast and lightweight xlsx parser that provides nothing a CSV parser wouldn't
data/.travis.yml DELETED
@@ -1,10 +0,0 @@
1
- ---
2
- sudo: false
3
- language: ruby
4
- cache: bundler
5
- rvm:
6
- - 2.5.8
7
- - 3.0
8
- - truffleruby
9
- - jruby
10
- before_install: gem install bundler