xsv 1.0.6 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.github/workflows/ruby.yml +1 -1
- data/.standard.yml +1 -1
- data/CHANGELOG.md +7 -0
- data/README.md +44 -19
- data/benchmark.rb +2 -2
- data/lib/xsv/sax_parser.rb +2 -2
- data/lib/xsv/sheet.rb +1 -1
- data/lib/xsv/version.rb +1 -1
- data/lib/xsv/workbook.rb +12 -29
- data/lib/xsv.rb +27 -0
- data/xsv.gemspec +2 -1
- metadata +21 -7
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f1ebfa4e4778af72a8b295d258d899a5b5d01fd029d1294d54af5e4f1e0de05a
+  data.tar.gz: aa74ffe0d57eebc12e312bdb42107bc39203cd3a0237f2e2481205c8b3b933c9
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6ebbb32e48860043bdb0a5d17f6fef525252e8bb01ac63180cd0e749dfbd1b3bb08b8a82e43e9e13da2fa187c02f77e37f525e9c7453334bc3cb8fdf05400187
+  data.tar.gz: 4e1450daebcc3ddfbc0585de52f4d1f362ef2d79281e59cc641119a47924f8450b76c3643a6403a440e0a581ebfcc108431a311f902e2b48be61a1d2afd7b19e
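`checksums.yaml` records SHA256 and SHA512 digests for the two archives packed inside the `.gem` file, `metadata.gz` and `data.tar.gz`. The sketch below recomputes those digests with Ruby's standard `digest` library, so a downloaded copy can be compared with the published 1.1.0 values. It assumes the two members have already been extracted from the `.gem`, which is a plain tar archive (for example, `tar -xf xsv-1.1.0.gem`). The helper names and local paths are illustrative and are not part of xsv or RubyGems.

```ruby
require "digest/sha2"

# SHA256 values published for xsv 1.1.0 in checksums.yaml.
PUBLISHED_SHA256 = {
  "metadata.gz" => "f1ebfa4e4778af72a8b295d258d899a5b5d01fd029d1294d54af5e4f1e0de05a",
  "data.tar.gz" => "aa74ffe0d57eebc12e312bdb42107bc39203cd3a0237f2e2481205c8b3b933c9"
}.freeze

# Hex digests in the two algorithms that checksums.yaml uses.
def digests_for(bytes)
  {
    "SHA256" => Digest::SHA256.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes)
  }
end

# Compare one extracted archive member (hypothetical local path) with the published SHA256.
def matches_published?(member_name, path)
  digests_for(File.binread(path))["SHA256"] == PUBLISHED_SHA256.fetch(member_name)
end

if File.exist?("data.tar.gz")
  puts "data.tar.gz matches: #{matches_published?("data.tar.gz", "data.tar.gz")}"
end
```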
data/.github/workflows/ruby.yml
CHANGED
data/.standard.yml
CHANGED
@@ -1 +1 @@
-ruby_version: 2.
+ruby_version: 2.6.9
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,12 @@
 # Xsv Changelog
 
+## 1.1.0 2022-02-13
+
+- New, shorter `Xsv.open` syntax as a drop-in replacement for `Xsv::Workbook.open`, which is still supported
+- Enable parsing of headers for all sheets by passing `parse_headers: true` to `Xsv.open`
+- Improvements in performance and test coverage
+- Dropped support for Ruby 2.5, which is EOL. Xsv 1.1.0 supports Ruby 2.6+, latest JRuby, latest TruffleRuby
+
 ## 1.0.6 2022-01-07
 
 - Code cleanup, small performance improvements
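The 1.1.0 entry adds `parse_headers: true` to `Xsv.open`, which puts every sheet into hash mode. The README warns that hash mode is unpredictable when two columns share a header. The sketch below is not Xsv's implementation. It is a plain-Ruby illustration of the general idea, assuming the first row holds the headers, and the helper `rows_to_hashes` is hypothetical. Zipping each data row against the header row into a `Hash` shows why duplicate headers cause trouble: a repeated key keeps only one of its values.

```ruby
# Illustrative only: turn array-mode rows into hash-mode rows by treating the
# first row as the header row.
def rows_to_hashes(rows)
  headers, *data = rows
  # Array#to_h keeps the last value for a repeated key, so duplicate
  # headers silently drop all but one of their columns.
  data.map { |row| headers.zip(row).to_h }
end

rows = [
  ["header1", "header2"],
  ["value1", "value2"]
]
p rows_to_hashes(rows)
```

In this sketch the later duplicate column wins. The README only promises that the outcome is unpredictable, so code should not rely on either column surviving.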
data/README.md
CHANGED
@@ -1,6 +1,7 @@
 # Xsv .xlsx reader
 
 [](https://travis-ci.org/martijn/xsv)
+[](https://app.codecov.io/gh/martijn/xsv)
 [](https://rubydoc.info/github/martijn/xsv)
 [](https://badge.fury.io/rb/xsv)
 
@@ -41,17 +42,18 @@ when that becomes stable.
 
 ## Usage
 
+### Array and hash mode
 Xsv has two modes of operation. By default, it returns an array for
 each row in the sheet:
 
 ```ruby
-x = Xsv
+x = Xsv.open("sheet.xlsx") # => #<Xsv::Workbook sheets=1>
 
 sheet = x.sheets[0]
 
 # Iterate over rows
-sheet.
-  row # => ["header1", "header2"]
+sheet.each do |row|
+  row # => ["header1", "header2"]
 end
 
 # Access row by index (zero-based)
@@ -59,40 +61,63 @@ sheet[1] # => ["value1", "value2"]
 ```
 
 Alternatively, it can load the headers from the first row and return a hash
-for every row
+for every row by calling `parse_headers!` on the sheet or setting the `parse_headers`
+option on open:
 
 ```ruby
-
+# Parse headers for all sheets on open
+
+x = Xsv.open("sheet.xlsx", parse_headers: true)
+
+x.sheets[0][1] # => {"header1" => "value1", "header2" => "value2"}
+
+# Manually parse headers for a single sheet
+
+x = Xsv.open("sheet.xlsx")
 
 sheet = x.sheets[0]
 
-sheet
+sheet[0] # => ["header1", "header2"]
 
-# Parse headers and switch to hash mode
 sheet.parse_headers!
 
-sheet
+sheet[0] # => {"header1" => "value1", "header2" => "value2"}
+```
+
+Be aware that hash mode will lead to unpredictable results if the worksheet
+has multiple columns with the same header. `Xsv::Sheet` implements `Enumerable` so along with `#each`
+you can call methods like `#first`, `#filter`/`#select`, and `#map` on it.
+
+### Opening a string or buffer instead of filename
 
-
-
+`Xsv.open` accepts a filename, or an IO or String containing a workbook. Optionally, you can pass a block
+which will be called with the workbook as parameter, like `File#open`. Example of this together:
+
+```ruby
+# Use an existing IO-like object as source
+
+file = File.open("sheet.xlsx")
+
+Xsv.open(file) do |workbook|
+  puts workbook.inspect
 end
 
-
-```
+# or even:
 
-
-
+Xsv.open(file.read) do |workbook|
+  puts workbook.inspect
+end
+```
 
-`Xsv::Workbook.open`
-
+Prior to Xsv 1.1.0, `Xsv::Workbook.open` was used instead of `Xsv.open`. The parameters are identical and
+the former is maintained for backwards compatibility.
 
-
-`#filter`/`#select`, and `#map` on it.
+### Accessing sheets by name
 
 The sheets can be accessed by index or by name:
 
 ```ruby
-x = Xsv
+x = Xsv.open("sheet.xlsx")
 
 sheet = x.sheets[0] # gets sheet by index
 
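The README states that `Xsv::Sheet` implements `Enumerable`, so `#first`, `#filter`/`#select` and `#map` are available alongside `#each`. The mechanism is plain Ruby: any class that defines `#each` and includes `Enumerable` gets those methods. The sketch below shows this with a hypothetical `RowList` class standing in for a sheet. It is not Xsv's code.

```ruby
# Hypothetical row container standing in for Xsv::Sheet.
class RowList
  include Enumerable

  def initialize(rows)
    @rows = rows
  end

  # Yield each row; without a block, return an Enumerator like Array#each does.
  def each
    return enum_for(:each) unless block_given?

    @rows.each { |row| yield row }
    self
  end
end

sheet = RowList.new([["header1", "header2"], ["value1", "value2"]])
sheet.first                                   # => ["header1", "header2"]
sheet.map(&:first)                            # => ["header1", "value1"]
sheet.filter { |row| row.include?("value2") } # => [["value1", "value2"]]
```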
data/benchmark.rb
CHANGED
@@ -1,6 +1,6 @@
 #!/usr/bin/env ruby
 
-require
+require "bundler/inline"
 
 gemfile do
   source "https://rubygems.org"
@@ -36,7 +36,7 @@ end
 
 file = File.read("test/files/10k-sheet.xlsx")
 
-workbook = Xsv
+workbook = Xsv.open(file)
 
 puts "--- ARRAY MODE ---"
 
data/lib/xsv/sax_parser.rb
CHANGED
@@ -68,7 +68,7 @@ module Xsv
 end
 
 if tag_name.start_with?("/")
-  end_element(tag_name[1
+  end_element(tag_name[1..]) if responds_to_end_element
 elsif args.nil?
   start_element(tag_name, nil)
 else
@@ -78,7 +78,7 @@ module Xsv
 
   state = :look_start
 elsif eof_reached
-  raise "Malformed XML document, looking for end of tag beyond EOF"
+  raise Xsv::Error, "Malformed XML document, looking for end of tag beyond EOF"
 else
   must_read = true
 end
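Two details in the `sax_parser.rb` change are worth spelling out.

- `tag_name[1..]` uses an endless range to drop the leading `/` of a closing tag. Endless ranges first appeared in Ruby 2.6, which is consistent with the gemspec's new `>= 2.6` requirement.
- Raising `Xsv::Error` instead of a bare string (which raises `RuntimeError`) lets callers rescue a library-specific class.

The sketch below imitates both with a hypothetical `DemoXsv::Error`. It does not load xsv and is not its parser.

```ruby
module DemoXsv
  # Hypothetical stand-in for a library-specific error class.
  class Error < StandardError; end
end

# Return the element name of a closing tag such as "/row", or nil for an opening tag.
def closing_tag_name(tag_name)
  tag_name[1..] if tag_name.start_with?("/")
end

# Raise the library error when the scanner runs out of input in the middle of a tag.
def ensure_not_eof!(eof_reached)
  raise DemoXsv::Error, "Malformed XML document, looking for end of tag beyond EOF" if eof_reached
end

begin
  ensure_not_eof!(true)
rescue DemoXsv::Error => e
  puts "rescued: #{e.message}"
end
```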
data/lib/xsv/sheet.rb
CHANGED
data/lib/xsv/version.rb
CHANGED
data/lib/xsv/workbook.rb
CHANGED
@@ -12,36 +12,17 @@ module Xsv
 
     attr_reader :shared_strings, :xfs, :num_fmts, :trim_empty_rows
 
-    #
-
-
-      @workbook = if data.is_a?(IO) || data.respond_to?(:read) # is it a buffer?
-        new(Zip::File.open_buffer(data), **kws)
-      elsif data.start_with?("PK\x03\x04") # is it a string containing a file?
-        new(Zip::File.open_buffer(data), **kws)
-      else # must be a filename
-        new(Zip::File.open(data), **kws)
-      end
-
-      if block_given?
-        begin
-          yield(@workbook)
-        ensure
-          @workbook.close
-        end
-      else
-        @workbook
-      end
+    # @deprecated Use {Xsv.open} instead
+    def self.open(data, **kws, &block)
+      Xsv.open(data, **kws, &block)
     end
 
     # Open a workbook from an instance of {Zip::File}. Generally it's recommended
     # to use the {.open} method instead of the constructor.
     #
-    #
-    #
-
-    #
-    def initialize(zip, trim_empty_rows: false)
+    # @param trim_empty_rows [Boolean] Scan sheet for end of content and don't return trailing rows
+    # @param parse_headers [Boolean] Call `parse_headers!` on all sheets on load
+    def initialize(zip, trim_empty_rows: false, parse_headers: false)
       raise ArgumentError, "Passed argument is not an instance of Zip::File. Did you mean to use Workbook.open?" unless zip.is_a?(Zip::File)
       raise Xsv::Error, "Zip::File is empty" if zip.size.zero?
 
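In 1.1.0, `Xsv::Workbook.open` becomes a one-line wrapper that passes its keyword arguments and block through to `Xsv.open`. It is marked only with a YARD `@deprecated` tag, and no runtime warning appears in this hunk. The sketch below shows the same forwarding pattern with hypothetical `NewEntry` and `LegacyEntry` modules, so it runs without xsv or rubyzip.

```ruby
# Hypothetical new entry point: takes the same keywords as Xsv.open and either
# yields its result to a block or returns it.
module NewEntry
  def self.open(data, trim_empty_rows: false, parse_headers: false)
    result = {data: data, trim_empty_rows: trim_empty_rows, parse_headers: parse_headers}
    block_given? ? yield(result) : result
  end
end

# Hypothetical legacy entry point kept for backwards compatibility.
module LegacyEntry
  # @deprecated Use NewEntry.open instead
  def self.open(data, **kws, &block)
    NewEntry.open(data, **kws, &block)
  end
end

p LegacyEntry.open("sheet.xlsx", parse_headers: true)
```

The wrapper forwards `**kws` untouched, so the new method still rejects an unknown keyword with an `ArgumentError`. Both entry points therefore accept exactly the same options.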
@@ -53,12 +34,12 @@ module Xsv
       @sheet_ids = fetch_sheet_ids
       @relationships = fetch_relationships
       @shared_strings = fetch_shared_strings
-      @sheets = fetch_sheets
+      @sheets = fetch_sheets(parse_headers ? :hash : :array)
     end
 
     # @return [String]
     def inspect
-      "#<#{self.class.name}:#{object_id}>"
+      "#<#{self.class.name}:#{object_id} sheets=#{sheets.count} trim_empty_rows=#{@trim_empty_rows}>"
     end
 
     # Close the handle to the workbook file and leave all resources for the GC to collect
@@ -108,13 +89,15 @@ module Xsv
       stream.close
     end
 
-    def fetch_sheets
+    def fetch_sheets(mode)
       @zip.glob("xl/worksheets/sheet*.xml").sort do |a, b|
         a.name[/\d+/].to_i <=> b.name[/\d+/].to_i
       end.map do |entry|
         rel = @relationships.detect { |r| entry.name.end_with?(r[:Target]) && r[:Type].end_with?("worksheet") }
         sheet_ids = @sheet_ids.detect { |i| i[:"r:id"] == rel[:Id] }
-        Xsv::Sheet.new(self, entry.get_input_stream, entry.size, sheet_ids)
+        Xsv::Sheet.new(self, entry.get_input_stream, entry.size, sheet_ids).tap do |sheet|
+          sheet.parse_headers! if mode == :hash
+        end
       end
     end
 
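`fetch_sheets` orders the `xl/worksheets/sheet*.xml` entries by the number in their names rather than as plain strings. It now also takes a `mode` argument, so each sheet can be switched to hash mode with `tap` as it is built. The sketch below reproduces that ordering and the `tap` step over plain entry names. A hypothetical `StubSheet` struct stands in for `Xsv::Sheet`, so this is an illustration rather than the gem's code.

```ruby
# Hypothetical stand-in for Xsv::Sheet that only records whether headers were parsed.
StubSheet = Struct.new(:name, :headers_parsed) do
  def parse_headers!
    self.headers_parsed = true
    self
  end
end

# Order worksheet entry names by the first run of digits in each name,
# the same comparison fetch_sheets applies to the Zip entries.
def order_sheet_entries(names)
  names.sort { |a, b| a[/\d+/].to_i <=> b[/\d+/].to_i }
end

# Build one stub per entry, switching it to hash mode when mode == :hash.
def build_sheets(names, mode)
  order_sheet_entries(names).map do |name|
    StubSheet.new(name, false).tap do |sheet|
      sheet.parse_headers! if mode == :hash
    end
  end
end

names = ["xl/worksheets/sheet10.xml", "xl/worksheets/sheet2.xml", "xl/worksheets/sheet1.xml"]
p names.sort                 # string order: sheet1, sheet10, sheet2
p order_sheet_entries(names) # numeric order: sheet1, sheet2, sheet10
```

A plain `sort` compares character by character, so `sheet10.xml` lands between `sheet1.xml` and `sheet2.xml`. Comparing `name[/\d+/].to_i` avoids that.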
data/lib/xsv.rb
CHANGED
@@ -24,4 +24,31 @@ module Xsv
   # An AssertionFailed error indicates an unexpected condition, meaning a bug
   # or misinterpreted .xlsx document
   class AssertionFailed < StandardError; end
+
+  # Open the workbook of the given filename, string or buffer.
+  # @param filename_or_string [String, IO] the contents or filename of a workbook
+  # @param trim_empty_rows [Boolean] Scan sheet for end of content and don't return trailing rows
+  # @param parse_headers [Boolean] Call `parse_headers!` on all sheets on load
+  # @return [Xsv::Workbook] The workbook instance
+  def self.open(filename_or_string, trim_empty_rows: false, parse_headers: false)
+    zip = if filename_or_string.is_a?(IO) || filename_or_string.respond_to?(:read) # is it a buffer?
+      Zip::File.open_buffer(filename_or_string)
+    elsif filename_or_string.start_with?("PK\x03\x04") # is it a string containing a file?
+      Zip::File.open_buffer(filename_or_string)
+    else # must be a filename
+      Zip::File.open(filename_or_string)
+    end
+
+    workbook = Xsv::Workbook.new(zip, trim_empty_rows: trim_empty_rows, parse_headers: parse_headers)
+
+    if block_given?
+      begin
+        yield(workbook)
+      ensure
+        workbook.close
+      end
+    else
+      workbook
+    end
+  end
 end
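`Xsv.open` decides how to hand its argument to rubyzip. Anything that responds to `read` is opened as a buffer. A string beginning with the ZIP local-file-header signature `PK\x03\x04` is treated as the workbook's bytes. Any other string is taken as a filename. When given a block, it yields the workbook and closes it in an `ensure`, like `File.open`.

The sketch below mirrors that control flow with a hypothetical `FakeWorkbook`, so it runs without rubyzip. `source_kind` and `open_workbook` are illustrative names, not xsv API.

```ruby
require "stringio"

# Same classification order as Xsv.open: buffer first, then ZIP bytes, then filename.
def source_kind(filename_or_string)
  if filename_or_string.is_a?(IO) || filename_or_string.respond_to?(:read)
    :buffer
  elsif filename_or_string.start_with?("PK\x03\x04")
    :string
  else
    :filename
  end
end

# Hypothetical stand-in for Xsv::Workbook that only records how it was opened.
class FakeWorkbook
  attr_reader :kind

  def initialize(kind)
    @kind = kind
    @closed = false
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end
end

# Returns the workbook when no block is given; otherwise yields it, closes it
# even if the block raises, and returns the block's value.
def open_workbook(filename_or_string)
  workbook = FakeWorkbook.new(source_kind(filename_or_string))
  return workbook unless block_given?

  begin
    yield(workbook)
  ensure
    workbook.close
  end
end

open_workbook(StringIO.new("PK\x03\x04")) { |wb| puts wb.kind }
```

`String` does not respond to `read`, so in-memory workbook bytes fall through to the second branch. That is how `Xsv.open(file.read)` in the README is handled.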
data/xsv.gemspec
CHANGED
@@ -36,7 +36,7 @@ Gem::Specification.new do |spec|
   spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
   spec.require_paths = ["lib"]
 
-  spec.required_ruby_version = ">= 2.
+  spec.required_ruby_version = ">= 2.6"
 
   spec.add_dependency "rubyzip", ">= 1.3", "< 3"
 
@@ -44,4 +44,5 @@ Gem::Specification.new do |spec|
   spec.add_development_dependency "rake", "~> 13.0"
   spec.add_development_dependency "minitest", "~> 5.14.2"
   spec.add_development_dependency "standard", "~> 1.6.0"
+  spec.add_development_dependency "codecov", ">= 0.6.0"
 end
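The gemspec raises the floor to `required_ruby_version = ">= 2.6"` and adds `codecov` with `>= 0.6.0` as a development dependency. RubyGems ships with Ruby, and its `Gem::Requirement` and `Gem::Version` classes evaluate constraint strings like these. The sketch below uses them through `satisfies?`, a hypothetical helper.

```ruby
require "rubygems"

# True when `version` meets every constraint string, e.g. satisfies?("2.6.9", ">= 2.6").
def satisfies?(version, *constraints)
  Gem::Requirement.new(*constraints).satisfied_by?(Gem::Version.new(version))
end

# Is the running Ruby new enough for xsv 1.1.0?
puts satisfies?(RUBY_VERSION, ">= 2.6")
```

The same helper covers the compound rubyzip constraint (`">= 1.3", "< 3"`) and pessimistic pins such as minitest's `"~> 5.14.2"`.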
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: xsv
 version: !ruby/object:Gem::Version
-  version: 1.0
+  version: 1.1.0
 platform: ruby
 authors:
 - Martijn Storck
-autorequire:
+autorequire:
 bindir: exe
 cert_chain: []
-date: 2022-
+date: 2022-02-13 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rubyzip
@@ -86,6 +86,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: 1.6.0
+- !ruby/object:Gem::Dependency
+  name: codecov
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 0.6.0
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 0.6.0
 description: |2
   Xsv is a fast, lightweight parser for Office Open XML spreadsheet files
   (commonly known as Excel or .xlsx files). It strives to be minimal in the
@@ -128,7 +142,7 @@ metadata:
   homepage_uri: https://github.com/martijn/xsv
   source_code_uri: https://github.com/martijn/xsv
   changelog_uri: https://github.com/martijn/xsv/CHANGELOG.md
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -136,15 +150,15 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: '2.
+      version: '2.6'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.
-signing_key:
+rubygems_version: 3.2.3
+signing_key:
 specification_version: 4
 summary: A fast and lightweight xlsx parser that provides nothing a CSV parser wouldn't
 test_files: []