xsv 1.2.1 → 1.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 5ba17dfa73b0bb77884f1f95bac715291316aa415c512b5a60b230250395f87d
- data.tar.gz: d5c9ce6ac4b64dc854f6dd133fb139fae13b343b9e08465712d6b0e0189a6628
+ metadata.gz: 1ff63ab6263b0486a400d5d36eb53d0af7265532b67536c7024b414b2bcf3ad2
+ data.tar.gz: a885af30f1a874818bb80f984115d7bc47e178007bf9848407e279af3effcd3b
  SHA512:
- metadata.gz: ec2f2bfd10455002aa2ebe8fe661a2b347e5831fde513d57a08fa43a58ea24a8105a2b308d504d5e65f30a5ac5a5c029daf0a6c2ac9c81ca722d6ff456d512ac
- data.tar.gz: 232238fb39c14a4a9fb7d080b0817bd83620f9caeecd812f3058cc473d97506d8b8fbfd5ef74fc25afaf2c46674d230f2292e9b2e1b8e89b2734bb8b1540c679
+ metadata.gz: 37b9b49b66fcec0c393a56382b162d0fb3c77e2342798292d3a4c73c49939ce3d0827c783a7b9b5c139b52553e3a622f6e47f510e4839963609c571e7ab53067
+ data.tar.gz: ecd5e49c82d802e2a654c7d6321cdf81b5ea81482a83821ca6ddd19ae5036a6e2a516569356ded600b2181b0c4161c5c7e4e36a43570b4bef82f7c9d58038392
@@ -19,7 +19,7 @@ jobs:
  runs-on: ubuntu-latest
  strategy:
  matrix:
- ruby-version: ['2.6', '2.7', '3.0', '3.1', '3.2', 'jruby', 'truffleruby']
+ ruby-version: ['2.7', '3.0', '3.1', '3.2', '3.3', 'jruby', 'truffleruby']

  steps:
  - uses: actions/checkout@v3
data/.standard.yml CHANGED
@@ -1 +1 @@
- ruby_version: 2.6.9
+ ruby_version: 2.7.8
data/CHANGELOG.md CHANGED
@@ -1,5 +1,12 @@
  # Xsv Changelog

+ ## 1.3.0 2023-12-16
+
+ - Ruby 2.6 is no longer supported. Xsv is compatible with Ruby 2.7 through 3.3, latest JRuby, and latest TruffleRuby
+ - Easier access to worksheets using `Xsv::Workbook#[]` and `Enumerable` methods on `Xsv::Workbook`. The old `sheets` and `sheets_by_name` methods have been retained for backward compatibility.
+ - Update of development dependencies including minitest and standardrb
+ - Various performance improvements, especially with YJIT on Ruby 3.3
+
  ## 1.2.1 2023-05-09

  - Handle columns without `r` attribute (issue #48)
data/README.md CHANGED
@@ -1,23 +1,20 @@
- # Xsv .xlsx reader
-
-
+ # Xsv .xlsx reader for Ruby

  [![Test badge](https://img.shields.io/github/actions/workflow/status/martijn/xsv/ruby.yml?branch=main)](https://github.com/martijn/xsv/actions/workflows/ruby.yml)
- [![Codecov badge](https://img.shields.io/codecov/c/github/martijn/xsv/main)](https://app.codecov.io/gh/martijn/xsv)
  [![Yard Docs badge](http://img.shields.io/badge/yard-docs-blue.svg)](https://rubydoc.info/github/martijn/xsv)
  [![Gem Version badge](https://badge.fury.io/rb/xsv.svg)](https://badge.fury.io/rb/xsv)

- Xsv is a fast, lightweight, pure Ruby parser for ISO/IEC 29500 Office Open XML spreadsheet files
- (commonly known as Excel or .xlsx files). It strives to be minimal in the
- sense that it provides nothing a CSV reader wouldn't, meaning it only
- deals with minimal formatting and cannot create or modify documents.
+ Xsv is a high performance, lightweight, pure Ruby parser for ISO/IEC 29500 Office Open XML spreadsheets
+ (commonly known as Excel or .xlsx files). It strives to be minimal in the sense that it provides nothing a
+ CSV reader wouldn't. This means it only deals with the minimal required formatting and cannot create or modify
+ documents.
+ Xsv can handle very large Excel files with minimal resources thanks to a custom streaming XML parser that
+ is optimized for the Excel file format.

  Xsv is designed for worksheets with a single table of data, optionally
  with a header row. It only casts values to basic Ruby types (integer, float,
  date and time) and does not deal with most formatting or more advanced
- functionality. It strives for fast processing of large worksheets with
- minimal RAM and CPU consumption and has been in production use since the earliest
- versions.
+ functionality. Xsv has been production-ready since the initial release.

  Xsv stands for 'Excel Separated Values', because Excel just gets in the way.

@@ -37,20 +34,24 @@ Or install it yourself as:

  $ gem install xsv

- Xsv targets ruby >= 2.6 and has just a single dependency, `rubyzip`. It has been
+ Xsv targets ruby >= 2.7 and has just a single dependency, `rubyzip`. It has been
  tested successfully with MRI, JRuby, and TruffleRuby. It has no native extensions
  and is designed to be thread-safe.

  ## Usage

  ### Array and hash mode
+
  Xsv has two modes of operation. By default, it returns an array for
  each row in the sheet:

  ```ruby
- x = Xsv.open("sheet.xlsx") # => #<Xsv::Workbook sheets=1>
+ workbook = Xsv.open("sheet.xlsx") # => #<Xsv::Workbook sheets=1>

- sheet = x.sheets[0]
+ # Access worksheet by index, 0 is the first sheet
+ sheet = workbook[0]
+ # or, access worksheet by name
+ sheet = workbook["Sheet1"]

  # Iterate over rows
  sheet.each do |row|
@@ -68,29 +69,35 @@ option on open:
  ```ruby
  # Parse headers for all sheets on open

- x = Xsv.open("sheet.xlsx", parse_headers: true)
+ workbook = Xsv.open("sheet.xlsx", parse_headers: true)

- x.sheets[0][1] # => {"header1" => "value1", "header2" => "value2"}
+ # Get the first row from the first sheet
+ workbook.first.first # => {"header1" => "value1", "header2" => "value2"}

  # Manually parse headers for a single sheet

- x = Xsv.open("sheet.xlsx")
+ workbook = Xsv.open("sheet.xlsx")

- sheet = x.sheets[0]
+ sheet = workbook.first

- sheet[0] # => ["header1", "header2"]
+ sheet.first # => ["header1", "header2"]

  sheet.parse_headers!

- sheet[0] # => {"header1" => "value1", "header2" => "value2"}
+ sheet.first # => {"header1" => "value1", "header2" => "value2"}
  ```

- Because of the way Ruby hashes work will raise `Xsv::DuplicateHeaders` if it detects
- duplicate values in the header row when calling `#parse_headers!` or when opening
- a workbook with `parse_headers: true`.
+ Xsv will raise `Xsv::DuplicateHeaders` if it detects duplicate values in the header row when calling
+ `#parse_headers!` or when opening a workbook with `parse_headers: true` to ensure hash keys are unique.

  `Xsv::Sheet` implements `Enumerable` so along with `#each`
- you can call methods like `#first`, `#filter`/`#select`, and `#map` on it.
+ you can call methods like `#first`, `#filter`/`#select`, and `#map` on it. Likewise these methods can
+ be used on `Xsv::Workbook` to iterate over sheets, for example:
+
+ ```ruby
+ # Get the name of all the sheets in a workbook
+ sheet_names = @workbook.map(&:name)
+ ```
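
The README text in this hunk says duplicate header values are rejected so that hash keys stay unique. The plain-Ruby sketch below illustrates why that matters: building a row hash from duplicate headers silently drops a column. The `duplicate_headers` helper is hypothetical and is not how Xsv implements its check; it only shows one way such duplicates could be detected (it relies on `Enumerable#tally`, available from Ruby 2.7).

```ruby
# Hypothetical helper (not part of Xsv): list header values that occur more than once.
def duplicate_headers(headers)
  headers.tally.select { |_header, count| count > 1 }.keys
end

headers = ["id", "name", "id"]
row = [1, "Alice", 2]

# Zipping headers with values and converting to a hash keeps only the
# last value for a repeated key, so the first "id" column is lost.
row_hash = headers.zip(row).to_h

puts row_hash.inspect
puts duplicate_headers(headers).inspect
```

Raising early, as the README describes, turns this silent data loss into an explicit error at header-parsing time.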

  ### Opening a string or buffer instead of filename

@@ -116,24 +123,6 @@ end
  Prior to Xsv 1.1.0, `Xsv::Workbook.open` was used instead of `Xsv.open`. The parameters are identical and
  the former is maintained for backwards compatibility.

- ### Accessing sheets by name
-
- The sheets can be accessed by index or by name:
-
- ```ruby
- x = Xsv.open("sheet.xlsx")
-
- sheet = x.sheets[0] # gets sheet by index
-
- sheet = x.sheets_by_name('Name').first # gets sheet by name
- ```
-
- To get all the sheets names:
-
- ```ruby
- sheet_names = x.sheets.map(&:name)
- ```
-
  ### Assumptions

  Since Xsv treats worksheets like csv files it makes certain assumptions about your
@@ -147,26 +136,35 @@ If your data or headers do not start on the first row of the sheet you can
  tell Xsv to skip a number of rows:

  ```ruby
- workbook.sheets[0].row_skip = 1
+ sheet = workbook[0]
+ sheet.row_skip = 1
  ```

  All operations will honour this offset, making the skipped rows unreachable.

  ## Development

- After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can
+ also run `bin/console` for an interactive prompt that will allow you to experiment.

- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the
+ version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version,
+ push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).

  ## Performance and Benchmarks

- Xsv is faster and more memory efficient than other gems because of two things: it only _reads values_ from Excel files and it's based on a SAX-based parser instead of a DOM-based parser. If you want to read some background on this, check out my blog post on
+ Xsv is faster and more memory efficient than other gems because of two things: it only _reads values_ from Excel files
+ and it's based on a SAX-based parser instead of a DOM-based parser. If you want to read some background on this, check
+ out my blog post on
  [Efficient XML parsing in Ruby](https://storck.io/posts/efficient-xml-parsing-in-ruby/).

- Jamie Schembri did a shootout of Xsv against various other Excel reading gems comparing parsing speed, memory usage, and allocations.
+ Jamie Schembri did a shootout of Xsv against various other Excel reading gems comparing parsing speed, memory usage, and
+ allocations.
  Check out his blog post: [Faster Excel parsing in Ruby](https://blog.schembri.me/post/faster-excel-parsing-in-ruby/).

- Pre-1.0, Xsv used a native extension for XML parsing, which was faster than the native Ruby one (on MRI). But even with the native Ruby version generally Xsv still outperforms other Ruby parsing gems.
+ Pre-1.0, Xsv used a native extension for XML parsing, which was faster than the native Ruby one (on MRI). But even
+ the current native Ruby parser generally outperforms the competition. For maximum performance, it is recommended to
+ enable YJIT.

  ## Contributing

@@ -176,4 +174,6 @@ for inclusion in the source code repository.

  ## License

+ Copyright © Martijn Storck and Xsv contributors
+
  The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
data/benchmark.rb CHANGED
@@ -14,7 +14,6 @@ def bench_perf(sheet)
  result = Benchmark::Perf.cpu(repeat: 5) do
  sheet.each do |row|
  row.each do |cell|
- cell
  end
  end
  end
@@ -27,13 +26,14 @@ def bench_mem(sheet)
  bm.report do
  sheet.each do |row|
  row.each do |cell|
- cell
  end
  end
  end
  end
  end

+ puts RUBY_DESCRIPTION
+
  file = File.read("test/files/10k-sheet.xlsx")

  workbook = Xsv.open(file)
data/lib/xsv/helpers.rb CHANGED
@@ -45,11 +45,14 @@ module Xsv
  EPOCH = Date.new(1899, 12, 30).freeze

  # Return the index number for the given Excel column name (i.e. "A1" => 0)
+ # @param col [String] Column name in A1 notation
  def column_index(col)
- col.each_codepoint.reduce(0) do |sum, n|
- break sum - 1 if n < A_CODEPOINT # reached a number
+ chars = col.bytes
+ sum = 0
+ while (char = chars.delete_at(0))
+ break sum - 1 if char < A_CODEPOINT # reached the number

- sum * 26 + (n - A_CODEPOINT + 1)
+ sum = sum * 26 + (char - A_CODEPOINT + 1)
  end
  end
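
The base-26 arithmetic in `column_index` can be tried in isolation with the sketch below. It is an illustration, not the gem's code: it assumes `A_CODEPOINT` equals `"A".ord` (the constant's definition is not shown in this diff), uses the hypothetical name `column_index_sketch`, and, unlike the method above, also returns an index when the reference has no row digits.

```ruby
# Illustrative re-implementation of the column letter to index conversion.
# Assumption: the gem's A_CODEPOINT constant is the byte value of "A" (65).
def column_index_sketch(cell_ref)
  a_codepoint = "A".ord
  sum = 0

  cell_ref.each_byte do |byte|
    # Digits sort below "A" in ASCII, so the first digit ends the column part.
    break if byte < a_codepoint

    # Treat letters as base-26 digits where A = 1 ... Z = 26.
    sum = sum * 26 + (byte - a_codepoint + 1)
  end

  # Convert from the 1-based column number to a 0-based index.
  sum - 1
end

%w[A1 Z1 AA1 AB12].each do |ref|
  puts "#{ref} => #{column_index_sketch(ref)}"
end
```

Working through `"AB12"` by hand: `A` gives 1, `B` gives 1 * 26 + 2 = 28, the digit `1` stops the loop, and 28 - 1 = 27 is the zero-based index.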
@@ -62,15 +62,27 @@ module Xsv
  args = nil
  end

- stripped_tag_name = strip_namespace(tag_name)
+ is_close_tag = tag_name.delete_prefix!("/")

- if tag_name.start_with?("/")
- end_element(strip_namespace(tag_name[1..])) if responds_to_end_element
+ # Strip XML namespace from tag
+ if (offset = tag_name.index(":"))
+ tag_name.slice!(0, offset + 1)
+ end
+
+ if is_close_tag
+ end_element(tag_name) if responds_to_end_element
  elsif args.nil?
- start_element(stripped_tag_name, nil)
+ start_element(tag_name, nil)
  else
- start_element(stripped_tag_name, args.scan(ATTR_REGEX).each_with_object({}) { |(_, k, v), h| h[k.to_sym] = v })
- end_element(stripped_tag_name) if responds_to_end_element && args.end_with?("/")
+ attribute_buffer = {}
+ attributes = args.scan(ATTR_REGEX)
+ while (attr = attributes.delete_at(0))
+ attribute_buffer[attr[1].to_sym] = attr[2]
+ end
+
+ start_element(tag_name, attribute_buffer)
+
+ end_element(tag_name) if responds_to_end_element && args.end_with?("/")
  end

  state = :look_start
@@ -82,16 +94,5 @@ module Xsv
  end
  end
  end
-
- private
-
- # I am not proud of this, but there's simply no need to deal with xmlns for this application ¯\_(ツ)_/¯
- def strip_namespace(tag)
- if (offset = tag.index(":"))
- tag[offset + 1..]
- else
- tag
- end
- end
  end
  end
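
The hunks above replace the `strip_namespace` helper with in-place string mutation. A minimal sketch of that approach, outside the parser, might look like the following. The method name `normalize_tag` and its return shape are hypothetical; the only calls relied on are `String#delete_prefix!` (which returns `nil` when nothing was removed), `String#index`, and `String#slice!`.

```ruby
# Hypothetical standalone version of the tag handling shown in the diff:
# detect a closing tag and strip any XML namespace prefix, mutating a copy
# of the input instead of allocating substrings for each step.
def normalize_tag(raw_tag)
  tag_name = raw_tag.dup # work on a copy so frozen input strings are safe

  # delete_prefix! returns nil if the prefix was absent, the string otherwise.
  is_close_tag = !tag_name.delete_prefix!("/").nil?

  # Drop everything up to and including the first colon ("x:row" becomes "row").
  if (offset = tag_name.index(":"))
    tag_name.slice!(0, offset + 1)
  end

  [tag_name, is_close_tag]
end

["sheetData", "x:c", "/x:row"].each do |tag|
  name, closing = normalize_tag(tag)
  puts "#{tag.inspect} -> name=#{name.inspect} closing=#{closing}"
end
```

Note that the sketch copies its argument, whereas the code in the diff mutates `tag_name` directly, which presumably is part of how it avoids extra allocations.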
data/lib/xsv/sheet.rb CHANGED
@@ -30,13 +30,11 @@ module Xsv
  #
  # @param workbook [Workbook] The Workbook with shared data such as shared strings and styles
  # @param io [IO] A handle to an open worksheet XML file
- # @param size [Number] size of the XML file
- def initialize(workbook, io, size, ids)
+ def initialize(workbook, io, ids)
  @workbook = workbook
  @id = ids[:sheetId].to_i
  @io = io
  @name = ids[:name]
- @size = size
  @headers = []
  @mode = :array
  @row_skip = 0
@@ -58,11 +56,7 @@ module Xsv
  # Iterate over rows, returning either hashes or arrays based on the current mode.
  def each_row(&block)
  @io.rewind
-
- handler = SheetRowsHandler.new(@mode, empty_row, @workbook, @row_skip, @last_row, &block)
-
- handler.parse(@io)
-
+ SheetRowsHandler.new(@mode, empty_row, @workbook, @row_skip, @last_row, &block).parse(@io)
  true
  end

@@ -34,7 +34,7 @@ module Xsv
  when "v", "is", "t"
  @store_characters = true
  when "row"
- @current_row = @mode == :array ? [] : @empty_row.dup
+ @current_row = @mode == :array ? [] : @empty_row.dup
  @current_row_number = attrs[:r].to_i
  end
  end
data/lib/xsv/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module Xsv
- VERSION = "1.2.1"
+ VERSION = "1.3.0"
  end
data/lib/xsv/workbook.rb CHANGED
@@ -6,6 +6,8 @@ module Xsv
  # An OOXML Spreadsheet document is called a Workbook. A Workbook consists of
  # multiple Sheets that are available in the array that's accessible through {#sheets}
  class Workbook
+ include Enumerable
+
  # Access the Sheet objects contained in the workbook
  # @return [Array<Sheet>]
  attr_reader :sheets
@@ -69,6 +71,24 @@ module Xsv
  @num_fmts[@xfs[style][:numFmtId]]
  end

+ def each(&block)
+ sheets.each(&block)
+ end
+
+ # Get a sheet by index or name
+ # @param [String, Integer] index_or_name The name of the sheet or index in the workbook
+ # @return [<Xsv::Sheet>, nil] returns the sheet instance or nil if it was not found
+ def [](index_or_name)
+ case index_or_name
+ when Integer
+ sheets[index_or_name]
+ when String
+ sheets_by_name(index_or_name).first
+ else
+ raise ArgumentError, "Sheets can be accessed by Integer or String only"
+ end
+ end
+
  private

  def fetch_shared_strings
@@ -98,7 +118,7 @@ module Xsv
  r[:Type].end_with?("worksheet")
  end
  sheet_ids = @sheet_ids.detect { |i| i[:id] == rel[:Id] }
- Xsv::Sheet.new(self, entry.get_input_stream, entry.size, sheet_ids).tap do |sheet|
+ Xsv::Sheet.new(self, entry.get_input_stream, sheet_ids).tap do |sheet|
  sheet.parse_headers! if mode == :hash
  end
  end
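
To see how the `each` and `[]` additions to `Xsv::Workbook` behave without opening a real .xlsx file, the pattern can be reproduced with stand-in classes. `SketchSheet` and `SketchWorkbook` below are hypothetical names used only for illustration; the name lookup uses `find` as a stand-in for the gem's `sheets_by_name(...).first`.

```ruby
# Hypothetical stand-ins that mirror the pattern added in this release.
SketchSheet = Struct.new(:name)

class SketchWorkbook
  include Enumerable # provides map, first, select, etc. on top of #each

  attr_reader :sheets

  def initialize(sheets)
    @sheets = sheets
  end

  # Enumerable only requires #each; delegate to the sheets array.
  def each(&block)
    sheets.each(&block)
  end

  # Look up a sheet by position (Integer) or by name (String).
  def [](index_or_name)
    case index_or_name
    when Integer
      sheets[index_or_name]
    when String
      sheets.find { |sheet| sheet.name == index_or_name }
    else
      raise ArgumentError, "Sheets can be accessed by Integer or String only"
    end
  end
end

workbook = SketchWorkbook.new([SketchSheet.new("Sheet1"), SketchSheet.new("Totals")])

puts workbook[0].name
puts workbook["Totals"].name
puts workbook.map(&:name).inspect
puts workbook["Missing"].inspect
```

Because `Enumerable` is built on `each`, delegating that one method to the underlying array is enough to make `workbook.first`, `workbook.map`, and similar calls work, which matches the README examples in this release.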
data/xsv.gemspec CHANGED
@@ -36,13 +36,12 @@ Gem::Specification.new do |spec|
  spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
  spec.require_paths = ["lib"]

- spec.required_ruby_version = ">= 2.6"
+ spec.required_ruby_version = ">= 2.7"

  spec.add_dependency "rubyzip", ">= 1.3", "< 3"

  spec.add_development_dependency "bundler", "< 3"
- spec.add_development_dependency "rake", "~> 13.0"
- spec.add_development_dependency "minitest", "~> 5.14.2"
- spec.add_development_dependency "standard", "~> 1.6.0"
- spec.add_development_dependency "codecov", ">= 0.6.0"
+ spec.add_development_dependency "rake", "~> 13.1.0"
+ spec.add_development_dependency "minitest", "~> 5.20.0"
+ spec.add_development_dependency "standard", "~> 1.32.1"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: xsv
  version: !ruby/object:Gem::Version
- version: 1.2.1
+ version: 1.3.0
  platform: ruby
  authors:
  - Martijn Storck
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2023-05-09 00:00:00.000000000 Z
+ date: 2023-12-16 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rubyzip
@@ -50,56 +50,42 @@ dependencies:
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '13.0'
+ version: 13.1.0
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '13.0'
+ version: 13.1.0
  - !ruby/object:Gem::Dependency
  name: minitest
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 5.14.2
+ version: 5.20.0
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 5.14.2
+ version: 5.20.0
  - !ruby/object:Gem::Dependency
  name: standard
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 1.6.0
+ version: 1.32.1
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 1.6.0
- - !ruby/object:Gem::Dependency
- name: codecov
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: 0.6.0
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: 0.6.0
+ version: 1.32.1
  description: |2
  Xsv is a fast, lightweight parser for Office Open XML spreadsheet files
  (commonly known as Excel or .xlsx files). It strives to be minimal in the
@@ -150,14 +136,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: '2.6'
+ version: '2.7'
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.2.3
+ rubygems_version: 3.5.1
  signing_key:
  specification_version: 4
  summary: A fast and lightweight xlsx parser that provides nothing a CSV parser wouldn't