xsv 1.0.6 → 1.1.0
- checksums.yaml +4 -4
- data/.github/workflows/ruby.yml +1 -1
- data/.standard.yml +1 -1
- data/CHANGELOG.md +7 -0
- data/README.md +44 -19
- data/benchmark.rb +2 -2
- data/lib/xsv/sax_parser.rb +2 -2
- data/lib/xsv/sheet.rb +1 -1
- data/lib/xsv/version.rb +1 -1
- data/lib/xsv/workbook.rb +12 -29
- data/lib/xsv.rb +27 -0
- data/xsv.gemspec +2 -1
- metadata +21 -7
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: f1ebfa4e4778af72a8b295d258d899a5b5d01fd029d1294d54af5e4f1e0de05a
+data.tar.gz: aa74ffe0d57eebc12e312bdb42107bc39203cd3a0237f2e2481205c8b3b933c9
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 6ebbb32e48860043bdb0a5d17f6fef525252e8bb01ac63180cd0e749dfbd1b3bb08b8a82e43e9e13da2fa187c02f77e37f525e9c7453334bc3cb8fdf05400187
+data.tar.gz: 4e1450daebcc3ddfbc0585de52f4d1f362ef2d79281e59cc641119a47924f8450b76c3643a6403a440e0a581ebfcc108431a311f902e2b48be61a1d2afd7b19e
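`checksums.yaml` records SHA256 and SHA512 digests of the two members of the packaged `.gem` archive, `metadata.gz` and `data.tar.gz`. Below is a minimal sketch of recomputing digests of that kind with Ruby's standard `digest` library. `gem_member_digests` and `digest_matches?` are hypothetical helpers. The demo hashes a throwaway temp file, not a real gem member, so point it at an extracted member to check real values.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: compute SHA256 and SHA512 hex digests of a file, the
# same kind of values checksums.yaml lists for metadata.gz and data.tar.gz.
def gem_member_digests(path)
  {
    "SHA256" => Digest::SHA256.file(path).hexdigest,
    "SHA512" => Digest::SHA512.file(path).hexdigest
  }
end

# Hypothetical helper: compare one computed digest with an expected hex string.
def digest_matches?(path, algorithm, expected_hex)
  gem_member_digests(path).fetch(algorithm) == expected_hex.downcase
end

# Demo on a throwaway file standing in for an extracted gem member.
Tempfile.create("member") do |f|
  f.write("abc")
  f.flush
  puts gem_member_digests(f.path)["SHA256"]
end
```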
data/.github/workflows/ruby.yml
CHANGED
data/.standard.yml
CHANGED
@@ -1 +1 @@
-ruby_version: 2.
+ruby_version: 2.6.9
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,12 @@
 # Xsv Changelog
 
+## 1.1.0 2022-02-13
+
+- New, shorter `Xsv.open` syntax as a drop-in replacement for `Xsv::Workbook.open`, which is still supported
+- Enable parsing of headers for all sheets by passing `parse_headers: true` to `Xsv.open`
+- Improvements in performance and test coverage
+- Dropped support for Ruby 2.5, which is EOL. Xsv 1.1.0 supports Ruby 2.6+, latest JRuby, latest TruffleRuby
+
 ## 1.0.6 2022-01-07
 
 - Code cleanup, small performance improvements
data/README.md
CHANGED
@@ -1,6 +1,7 @@
 # Xsv .xlsx reader
 
 [![Travis CI](https://img.shields.io/travis/martijn/xsv/master)](https://travis-ci.org/martijn/xsv)
+[![Codecov](https://img.shields.io/codecov/c/github/martijn/xsv/main)](https://app.codecov.io/gh/martijn/xsv)
 [![Yard Docs](http://img.shields.io/badge/yard-docs-blue.svg)](https://rubydoc.info/github/martijn/xsv)
 [![Gem Version](https://badge.fury.io/rb/xsv.svg)](https://badge.fury.io/rb/xsv)
 
@@ -41,17 +42,18 @@ when that becomes stable.
 
 ## Usage
 
+### Array and hash mode
 Xsv has two modes of operation. By default, it returns an array for
 each row in the sheet:
 
 ```ruby
-x = Xsv
+x = Xsv.open("sheet.xlsx") # => #<Xsv::Workbook sheets=1>
 
 sheet = x.sheets[0]
 
 # Iterate over rows
-sheet.
-row # => ["header1", "header2"]
+sheet.each do |row|
+row # => ["header1", "header2"]
 end
 
 # Access row by index (zero-based)
@@ -59,40 +61,63 @@ sheet[1] # => ["value1", "value2"]
 ```
 
 Alternatively, it can load the headers from the first row and return a hash
-for every row
+for every row by calling `parse_headers!` on the sheet or setting the `parse_headers`
+option on open:
 
 ```ruby
-
+# Parse headers for all sheets on open
+
+x = Xsv.open("sheet.xlsx", parse_headers: true)
+
+x.sheets[0][1] # => {"header1" => "value1", "header2" => "value2"}
+
+# Manually parse headers for a single sheet
+
+x = Xsv.open("sheet.xlsx")
 
 sheet = x.sheets[0]
 
-sheet
+sheet[0] # => ["header1", "header2"]
 
-# Parse headers and switch to hash mode
 sheet.parse_headers!
 
-sheet
+sheet[0] # => {"header1" => "value1", "header2" => "value2"}
+```
+
+Be aware that hash mode will lead to unpredictable results if the worksheet
+has multiple columns with the same header. `Xsv::Sheet` implements `Enumerable` so along with `#each`
+you can call methods like `#first`, `#filter`/`#select`, and `#map` on it.
+
+### Opening a string or buffer instead of filename
 
-
-
+`Xsv.open` accepts a filename, or an IO or String containing a workbook. Optionally, you can pass a block
+which will be called with the workbook as parameter, like `File#open`. Example of this together:
+
+```ruby
+# Use an existing IO-like object as source
+
+file = File.open("sheet.xlsx")
+
+Xsv.open(file) do |workbook|
+puts workbook.inspect
 end
 
-
-```
+# or even:
 
-
-
+Xsv.open(file.read) do |workbook|
+puts workbook.inspect
+end
+```
 
-`Xsv::Workbook.open`
-
+Prior to Xsv 1.1.0, `Xsv::Workbook.open` was used instead of `Xsv.open`. The parameters are identical and
+the former is maintained for backwards compatibility.
 
-
-`#filter`/`#select`, and `#map` on it.
+### Accessing sheets by name
 
 The sheets can be accessed by index or by name:
 
 ```ruby
-x = Xsv
+x = Xsv.open("sheet.xlsx")
 
 sheet = x.sheets[0] # gets sheet by index
 
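A small stand-in class can illustrate the README's array/hash distinction and its warning about duplicate headers. `ToySheet` is hypothetical and is not `Xsv::Sheet`. It assumes hash mode pairs each header with the cell in the same column. That assumption is enough to show why a repeated header makes the earlier column disappear, and how including `Enumerable` and defining `#each` provides `#first`, `#select` and `#map`.

```ruby
# A toy stand-in for a sheet, NOT Xsv::Sheet: it only illustrates array mode
# vs. hash mode and where #first, #select and #map come from.
class ToySheet
  include Enumerable

  def initialize(rows)
    @rows = rows
    @headers = nil
  end

  # Switch to hash mode: the first row becomes the header row.
  def parse_headers!
    @headers = @rows.first
    self
  end

  # Yields arrays in array mode, or header => value hashes in hash mode.
  def each
    return enum_for(:each) unless block_given?

    if @headers
      @rows.drop(1).each { |row| yield @headers.zip(row).to_h }
    else
      @rows.each { |row| yield row }
    end
  end
end

sheet = ToySheet.new([%w[header1 header2], %w[value1 value2]])
sheet.first                # => ["header1", "header2"]
sheet.parse_headers!.first # => {"header1" => "value1", "header2" => "value2"}

# Duplicate headers: the later column silently overwrites the earlier one.
dupes = ToySheet.new([%w[name name], %w[first second]]).parse_headers!
dupes.first                # => {"name" => "second"}
```

Building the hash with `zip(...).to_h` means the last occurrence of a key wins. This is one concrete way duplicate headers produce surprising results.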
data/benchmark.rb
CHANGED
@@ -1,6 +1,6 @@
 #!/usr/bin/env ruby
 
-require
+require "bundler/inline"
 
 gemfile do
 source "https://rubygems.org"
@@ -36,7 +36,7 @@ end
 
 file = File.read("test/files/10k-sheet.xlsx")
 
-workbook = Xsv
+workbook = Xsv.open(file)
 
 puts "--- ARRAY MODE ---"
 
data/lib/xsv/sax_parser.rb
CHANGED
@@ -68,7 +68,7 @@ module Xsv
 end
 
 if tag_name.start_with?("/")
-end_element(tag_name[1
+end_element(tag_name[1..]) if responds_to_end_element
 elsif args.nil?
 start_element(tag_name, nil)
 else
@@ -78,7 +78,7 @@ module Xsv
 
 state = :look_start
 elsif eof_reached
-raise "Malformed XML document, looking for end of tag beyond EOF"
+raise Xsv::Error, "Malformed XML document, looking for end of tag beyond EOF"
 else
 must_read = true
 end
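Two details in these hunks are easy to miss. First, `tag_name[1..]` uses an endless range, which is syntax introduced in Ruby 2.6. This fits with the release dropping Ruby 2.5. Second, the malformed-document error now raises `Xsv::Error` instead of a bare `RuntimeError`, so callers can rescue the library's error specifically. The sketch below is illustrative only. `classify_tag`, `read_tag!` and `Demo::Error` are hypothetical names, and the buffer handling is far simpler than the real SAX parser.

```ruby
module Demo
  # Hypothetical library error class, standing in for Xsv::Error.
  class Error < StandardError; end
end

# Hypothetical helper mirroring the branch in the hunk: a name starting with
# "/" is a closing tag, and the endless range 1.. drops the slash.
# On Ruby 2.5 `1..` is a syntax error; `tag_name[1..-1]` is the older spelling.
def classify_tag(tag_name)
  if tag_name.start_with?("/")
    [:end, tag_name[1..]]
  else
    [:start, tag_name]
  end
end

# Hypothetical, heavily simplified reader: returns the text before ">",
# nil when more input is needed, or raises when the input has run out.
def read_tag!(buffer, eof_reached)
  close = buffer.index(">")
  return buffer[0...close] if close
  raise Demo::Error, "Malformed XML document, looking for end of tag beyond EOF" if eof_reached

  nil
end

classify_tag("/row") # => [:end, "row"]

begin
  read_tag!("row", true)
rescue Demo::Error => e
  # Only the library's own error is caught; unrelated failures still propagate.
  puts "rescued: #{e.message}"
end
```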
data/lib/xsv/sheet.rb
CHANGED
data/lib/xsv/version.rb
CHANGED
data/lib/xsv/workbook.rb
CHANGED
@@ -12,36 +12,17 @@ module Xsv
 
 attr_reader :shared_strings, :xfs, :num_fmts, :trim_empty_rows
 
-#
-
-
-@workbook = if data.is_a?(IO) || data.respond_to?(:read) # is it a buffer?
-new(Zip::File.open_buffer(data), **kws)
-elsif data.start_with?("PK\x03\x04") # is it a string containing a file?
-new(Zip::File.open_buffer(data), **kws)
-else # must be a filename
-new(Zip::File.open(data), **kws)
-end
-
-if block_given?
-begin
-yield(@workbook)
-ensure
-@workbook.close
-end
-else
-@workbook
-end
+# @deprecated Use {Xsv.open} instead
+def self.open(data, **kws, &block)
+Xsv.open(data, **kws, &block)
 end
 
 # Open a workbook from an instance of {Zip::File}. Generally it's recommended
 # to use the {.open} method instead of the constructor.
 #
-#
-#
-
-#
-def initialize(zip, trim_empty_rows: false)
+# @param trim_empty_rows [Boolean] Scan sheet for end of content and don't return trailing rows
+# @param parse_headers [Boolean] Call `parse_headers!` on all sheets on load
+def initialize(zip, trim_empty_rows: false, parse_headers: false)
 raise ArgumentError, "Passed argument is not an instance of Zip::File. Did you mean to use Workbook.open?" unless zip.is_a?(Zip::File)
 raise Xsv::Error, "Zip::File is empty" if zip.size.zero?
 
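The deprecated `Xsv::Workbook.open` is now a thin wrapper that forwards its positional argument, keyword arguments and block to `Xsv.open`. The sketch below shows the same delegation shape using hypothetical names (`Lib.open`, `Lib::Book.open`) and made-up keywords. It is not the xsv code.

```ruby
# Hypothetical names: Lib.open is the new entry point, Lib::Book.open the old one.
module Lib
  def self.open(source, trim: false, headers: false)
    result = {source: source, trim: trim, headers: headers}
    return result unless block_given?

    yield result
  end

  class Book
    # @deprecated Use {Lib.open} instead
    # Forwards positional, keyword and block arguments unchanged.
    def self.open(data, **kws, &block)
      Lib.open(data, **kws, &block)
    end
  end
end

Lib::Book.open("sheet.xlsx", headers: true) # => {source: "sheet.xlsx", trim: false, headers: true}
Lib::Book.open("sheet.xlsx") { |book| book[:source] } # => "sheet.xlsx"
```

On Ruby 2.7+ the same forwarding could be written as `def self.open(...)` with `Lib.open(...)`. That syntax is not available on Ruby 2.6, which this release still supports.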
@@ -53,12 +34,12 @@ module Xsv
 @sheet_ids = fetch_sheet_ids
 @relationships = fetch_relationships
 @shared_strings = fetch_shared_strings
-@sheets = fetch_sheets
+@sheets = fetch_sheets(parse_headers ? :hash : :array)
 end
 
 # @return [String]
 def inspect
-"#<#{self.class.name}:#{object_id}>"
+"#<#{self.class.name}:#{object_id} sheets=#{sheets.count} trim_empty_rows=#{@trim_empty_rows}>"
 end
 
 # Close the handle to the workbook file and leave all resources for the GC to collect
@@ -108,13 +89,15 @@ module Xsv
 stream.close
 end
 
-def fetch_sheets
+def fetch_sheets(mode)
 @zip.glob("xl/worksheets/sheet*.xml").sort do |a, b|
 a.name[/\d+/].to_i <=> b.name[/\d+/].to_i
 end.map do |entry|
 rel = @relationships.detect { |r| entry.name.end_with?(r[:Target]) && r[:Type].end_with?("worksheet") }
 sheet_ids = @sheet_ids.detect { |i| i[:"r:id"] == rel[:Id] }
-Xsv::Sheet.new(self, entry.get_input_stream, entry.size, sheet_ids)
+Xsv::Sheet.new(self, entry.get_input_stream, entry.size, sheet_ids).tap do |sheet|
+sheet.parse_headers! if mode == :hash
+end
 end
 end
 
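`fetch_sheets` orders worksheet entries by the first run of digits in their names, then uses `tap` so the sheet itself is returned from `map` whether or not `parse_headers!` runs. The sketch below reproduces both ideas with a hypothetical `StubSheet` struct and a hypothetical `build_sheets` helper, using `sort_by` as an equivalent of the diff's `sort` block. The entry names are illustrative.

```ruby
# Hypothetical stand-in for a worksheet; only records whether headers were parsed.
StubSheet = Struct.new(:name, :headers_parsed) do
  def parse_headers!
    self.headers_parsed = true
    nil # return value is irrelevant because tap returns the sheet itself
  end
end

names = %w[
  xl/worksheets/sheet10.xml
  xl/worksheets/sheet2.xml
  xl/worksheets/sheet1.xml
]

# A plain string sort compares character by character.
names.sort # => ["xl/worksheets/sheet1.xml", "xl/worksheets/sheet10.xml", "xl/worksheets/sheet2.xml"]

# Hypothetical helper: order by the first run of digits, then build sheets,
# switching each to hash mode when requested, like fetch_sheets(mode).
def build_sheets(names, mode)
  names.sort_by { |name| name[/\d+/].to_i }.map do |name|
    StubSheet.new(name, false).tap do |sheet|
      sheet.parse_headers! if mode == :hash
    end
  end
end

build_sheets(names, :hash).map(&:name)
# => ["xl/worksheets/sheet1.xml", "xl/worksheets/sheet2.xml", "xl/worksheets/sheet10.xml"]
```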
data/lib/xsv.rb
CHANGED
@@ -24,4 +24,31 @@ module Xsv
 # An AssertionFailed error indicates an unexpected condition, meaning a bug
 # or misinterpreted .xlsx document
 class AssertionFailed < StandardError; end
+
+# Open the workbook of the given filename, string or buffer.
+# @param filename_or_string [String, IO] the contents or filename of a workbook
+# @param trim_empty_rows [Boolean] Scan sheet for end of content and don't return trailing rows
+# @param parse_headers [Boolean] Call `parse_headers!` on all sheets on load
+# @return [Xsv::Workbook] The workbook instance
+def self.open(filename_or_string, trim_empty_rows: false, parse_headers: false)
+zip = if filename_or_string.is_a?(IO) || filename_or_string.respond_to?(:read) # is it a buffer?
+Zip::File.open_buffer(filename_or_string)
+elsif filename_or_string.start_with?("PK\x03\x04") # is it a string containing a file?
+Zip::File.open_buffer(filename_or_string)
+else # must be a filename
+Zip::File.open(filename_or_string)
+end
+
+workbook = Xsv::Workbook.new(zip, trim_empty_rows: trim_empty_rows, parse_headers: parse_headers)
+
+if block_given?
+begin
+yield(workbook)
+ensure
+workbook.close
+end
+else
+workbook
+end
+end
 end
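`Xsv.open` decides how to treat its argument in a fixed order. Anything IO-like (an `IO`, or any object responding to `read`) is opened as a buffer. A String that starts with the ZIP local file header signature `"PK\x03\x04"` is treated as workbook contents. Anything else is treated as a filename. With a block, the workbook is closed in `ensure`. The sketch below reproduces that control flow without rubyzip. `classify_source`, `Resource` and `open_resource` are hypothetical names.

```ruby
require "stringio"
require "pathname"

# ZIP local file header signature; .xlsx workbooks are ZIP archives.
ZIP_MAGIC = "PK\x03\x04"

# Hypothetical classifier using the same order of checks as Xsv.open.
def classify_source(filename_or_string)
  if filename_or_string.is_a?(IO) || filename_or_string.respond_to?(:read)
    :buffer
  elsif filename_or_string.start_with?(ZIP_MAGIC)
    :zip_string
  else
    :filename
  end
end

# Hypothetical resource used to show the block form's cleanup guarantee.
class Resource
  attr_reader :closed

  def initialize
    @closed = false
  end

  def close
    @closed = true
  end
end

# Without a block the caller owns the resource; with a block it is closed in
# `ensure`, even if the block raises, and the block's value is returned.
def open_resource(resource = Resource.new)
  return resource unless block_given?

  begin
    yield resource
  ensure
    resource.close
  end
end

classify_source(StringIO.new("PK\x03\x04")) # => :buffer
classify_source("PK\x03\x04...")            # => :zip_string
classify_source("sheet.xlsx")               # => :filename
classify_source(Pathname.new("sheet.xlsx")) # => :buffer, because Pathname#read exists
```

Because the first check is `respond_to?(:read)`, a `Pathname` takes the buffer branch under this ordering. Converting it with `to_s` first routes it through the filename branch.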
data/xsv.gemspec
CHANGED
@@ -36,7 +36,7 @@ Gem::Specification.new do |spec|
 spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
 spec.require_paths = ["lib"]
 
-spec.required_ruby_version = ">= 2.
+spec.required_ruby_version = ">= 2.6"
 
 spec.add_dependency "rubyzip", ">= 1.3", "< 3"
 
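RubyGems checks `required_ruby_version` at install time, so with `">= 2.6"` the 1.1.0 release will not be installed on Ruby 2.5. The same constraint can be evaluated directly with RubyGems' `Gem::Requirement` and `Gem::Version` classes. `supported_ruby?` is a hypothetical helper used only for illustration.

```ruby
require "rubygems"

# The constraint from the gemspec change above.
requirement = Gem::Requirement.new(">= 2.6")

# Hypothetical helper: does a given Ruby version satisfy the constraint?
def supported_ruby?(requirement, version_string)
  requirement.satisfied_by?(Gem::Version.new(version_string))
end

supported_ruby?(requirement, "2.5.9") # => false
supported_ruby?(requirement, "2.6.0") # => true
puts supported_ruby?(requirement, RUBY_VERSION)
```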
@@ -44,4 +44,5 @@ Gem::Specification.new do |spec|
 spec.add_development_dependency "rake", "~> 13.0"
 spec.add_development_dependency "minitest", "~> 5.14.2"
 spec.add_development_dependency "standard", "~> 1.6.0"
+spec.add_development_dependency "codecov", ">= 0.6.0"
 end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: xsv
 version: !ruby/object:Gem::Version
-version: 1.0
+version: 1.1.0
 platform: ruby
 authors:
 - Martijn Storck
-autorequire:
+autorequire:
 bindir: exe
 cert_chain: []
-date: 2022-
+date: 2022-02-13 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: rubyzip
@@ -86,6 +86,20 @@ dependencies:
 - - "~>"
 - !ruby/object:Gem::Version
 version: 1.6.0
+- !ruby/object:Gem::Dependency
+name: codecov
+requirement: !ruby/object:Gem::Requirement
+requirements:
+- - ">="
+- !ruby/object:Gem::Version
+version: 0.6.0
+type: :development
+prerelease: false
+version_requirements: !ruby/object:Gem::Requirement
+requirements:
+- - ">="
+- !ruby/object:Gem::Version
+version: 0.6.0
 description: |2
 Xsv is a fast, lightweight parser for Office Open XML spreadsheet files
 (commonly known as Excel or .xlsx files). It strives to be minimal in the
@@ -128,7 +142,7 @@ metadata:
 homepage_uri: https://github.com/martijn/xsv
 source_code_uri: https://github.com/martijn/xsv
 changelog_uri: https://github.com/martijn/xsv/CHANGELOG.md
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -136,15 +150,15 @@ required_ruby_version: !ruby/object:Gem::Requirement
 requirements:
 - - ">="
 - !ruby/object:Gem::Version
-version: '2.
+version: '2.6'
 required_rubygems_version: !ruby/object:Gem::Requirement
 requirements:
 - - ">="
 - !ruby/object:Gem::Version
 version: '0'
 requirements: []
-rubygems_version: 3.
-signing_key:
+rubygems_version: 3.2.3
+signing_key:
 specification_version: 4
 summary: A fast and lightweight xlsx parser that provides nothing a CSV parser wouldn't
 test_files: []