xsv 1.0.3 → 1.1.0
- checksums.yaml +4 -4
- data/.github/workflows/ruby.yml +32 -0
- data/.standard.yml +1 -0
- data/CHANGELOG.md +19 -0
- data/README.md +45 -20
- data/Rakefile +1 -1
- data/benchmark.rb +51 -0
- data/lib/xsv/helpers.rb +35 -35
- data/lib/xsv/relationships_handler.rb +1 -1
- data/lib/xsv/sax_parser.rb +18 -15
- data/lib/xsv/shared_strings_parser.rb +8 -8
- data/lib/xsv/sheet.rb +3 -3
- data/lib/xsv/sheet_bounds_handler.rb +14 -14
- data/lib/xsv/sheet_rows_handler.rb +26 -35
- data/lib/xsv/sheets_ids_handler.rb +1 -1
- data/lib/xsv/styles_handler.rb +14 -14
- data/lib/xsv/version.rb +1 -1
- data/lib/xsv/workbook.rb +29 -40
- data/lib/xsv.rb +39 -12
- data/xsv.gemspec +4 -2
- metadata +35 -5
- data/.travis.yml +0 -10
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f1ebfa4e4778af72a8b295d258d899a5b5d01fd029d1294d54af5e4f1e0de05a
+  data.tar.gz: aa74ffe0d57eebc12e312bdb42107bc39203cd3a0237f2e2481205c8b3b933c9
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6ebbb32e48860043bdb0a5d17f6fef525252e8bb01ac63180cd0e749dfbd1b3bb08b8a82e43e9e13da2fa187c02f77e37f525e9c7453334bc3cb8fdf05400187
+  data.tar.gz: 4e1450daebcc3ddfbc0585de52f4d1f362ef2d79281e59cc641119a47924f8450b76c3643a6403a440e0a581ebfcc108431a311f902e2b48be61a1d2afd7b19e
@@ -0,0 +1,32 @@
+# This workflow uses actions that are not certified by GitHub.
+# They are provided by a third-party and are governed by
+# separate terms of service, privacy policy, and support
+# documentation.
+# This workflow will download a prebuilt Ruby version, install dependencies and run tests with Rake
+# For more information see: https://github.com/marketplace/actions/setup-ruby-jruby-and-truffleruby
+
+name: Ruby
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+
+jobs:
+  test:
+
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        ruby-version: ['2.6', '2.7', '3.0', '3.1', 'jruby', 'truffleruby']
+
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up Ruby
+      uses: ruby/setup-ruby@v1
+      with:
+        ruby-version: ${{ matrix.ruby-version }}
+        bundler-cache: true # runs 'bundle install' and caches installed gems automatically
+    - name: Run tests
+      run: bundle exec rake
data/.standard.yml
ADDED
@@ -0,0 +1 @@
+ruby_version: 2.6.9
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,24 @@
 # Xsv Changelog
 
+## 1.1.0 2022-02-13
+
+- New, shorter `Xsv.open` syntax as a drop-in replacement for `Xsv::Workbook.open`, which is still supported
+- Enable parsing of headers for all sheets by passing `parse_headers: true` to `Xsv.open`
+- Improvements in performance and test coverage
+- Dropped support for Ruby 2.5, which is EOL. Xsv 1.1.0 supports Ruby 2.6+, latest JRuby, latest TruffleRuby
+
+## 1.0.6 2022-01-07
+
+- Code cleanup, small performance improvements
+
+## 1.0.5 2022-01-05
+
+- Raise exception if given an empty buffer when opening workbook (thanks @kevin-j-m)
+
+## 1.0.4 2021-07-05
+
+- Support for custom date/time columns
+
 ## 1.0.3 2021-05-06
 
 - Handle nil number formats correctly (regression in Xsv 1.0.2, #29)
data/README.md
CHANGED
@@ -1,10 +1,11 @@
 # Xsv .xlsx reader
 
 [![Travis CI](https://img.shields.io/travis/martijn/xsv/master)](https://travis-ci.org/martijn/xsv)
+[![Codecov](https://img.shields.io/codecov/c/github/martijn/xsv/main)](https://app.codecov.io/gh/martijn/xsv)
 [![Yard Docs](http://img.shields.io/badge/yard-docs-blue.svg)](https://rubydoc.info/github/martijn/xsv)
 [![Gem Version](https://badge.fury.io/rb/xsv.svg)](https://badge.fury.io/rb/xsv)
 
-Xsv is a fast, lightweight, pure Ruby parser for Office Open XML spreadsheet files
+Xsv is a fast, lightweight, pure Ruby parser for ISO/IEC 29500 Office Open XML spreadsheet files
 (commonly known as Excel or .xlsx files). It strives to be minimal in the
 sense that it provides nothing a CSV reader wouldn't, meaning it only
 deals with minimal formatting and cannot create or modify documents.
@@ -41,17 +42,18 @@ when that becomes stable.
 
 ## Usage
 
+### Array and hash mode
 Xsv has two modes of operation. By default, it returns an array for
 each row in the sheet:
 
 ```ruby
-x = Xsv
+x = Xsv.open("sheet.xlsx") # => #<Xsv::Workbook sheets=1>
 
 sheet = x.sheets[0]
 
 # Iterate over rows
-sheet.
-  row # => ["header1", "header2"]
+sheet.each do |row|
+  row # => ["header1", "header2"]
 end
 
 # Access row by index (zero-based)
@@ -59,40 +61,63 @@ sheet[1] # => ["value1", "value2"]
 ```
 
 Alternatively, it can load the headers from the first row and return a hash
-for every row
+for every row by calling `parse_headers!` on the sheet or setting the `parse_headers`
+option on open:
 
 ```ruby
-
+# Parse headers for all sheets on open
+
+x = Xsv.open("sheet.xlsx", parse_headers: true)
+
+x.sheets[0][1] # => {"header1" => "value1", "header2" => "value2"}
+
+# Manually parse headers for a single sheet
+
+x = Xsv.open("sheet.xlsx")
 
 sheet = x.sheets[0]
 
-sheet
+sheet[0] # => ["header1", "header2"]
 
-# Parse headers and switch to hash mode
 sheet.parse_headers!
 
-sheet
+sheet[0] # => {"header1" => "value1", "header2" => "value2"}
+```
+
+Be aware that hash mode will lead to unpredictable results if the worksheet
+has multiple columns with the same header. `Xsv::Sheet` implements `Enumerable` so along with `#each`
+you can call methods like `#first`, `#filter`/`#select`, and `#map` on it.
+
+### Opening a string or buffer instead of filename
 
-
-
+`Xsv.open` accepts a filename, or an IO or String containing a workbook. Optionally, you can pass a block
+which will be called with the workbook as parameter, like `File#open`. Example of this together:
+
+```ruby
+# Use an existing IO-like object as source
+
+file = File.open("sheet.xlsx")
+
+Xsv.open(file) do |workbook|
+  puts workbook.inspect
 end
 
-
-```
+# or even:
 
-
-
+Xsv.open(file.read) do |workbook|
+  puts workbook.inspect
+end
+```
 
-`Xsv::Workbook.open`
-
+Prior to Xsv 1.1.0, `Xsv::Workbook.open` was used instead of `Xsv.open`. The parameters are identical and
+the former is maintained for backwards compatibility.
 
-
-`#filter`/`#select`, and `#map` on it.
+### Accessing sheets by name
 
 The sheets can be accessed by index or by name:
 
 ```ruby
-x = Xsv
+x = Xsv.open("sheet.xlsx")
 
 sheet = x.sheets[0] # gets sheet by index
 
data/Rakefile
CHANGED
data/benchmark.rb
ADDED
@@ -0,0 +1,51 @@
+#!/usr/bin/env ruby
+
+require "bundler/inline"
+
+gemfile do
+  source "https://rubygems.org"
+
+  gemspec
+  gem "benchmark-memory"
+  gem "benchmark-perf"
+end
+
+def bench_perf(sheet)
+  result = Benchmark::Perf.cpu(repeat: 5) do
+    sheet.each do |row|
+      row.each do |cell|
+        cell
+      end
+    end
+  end
+
+  puts "Performance benchmark: #{result.avg}s avg #{result.stdev}s stdev"
+end
+
+def bench_mem(sheet)
+  Benchmark.memory do |bm|
+    bm.report do
+      sheet.each do |row|
+        row.each do |cell|
+          cell
+        end
+      end
+    end
+  end
+end
+
+file = File.read("test/files/10k-sheet.xlsx")
+
+workbook = Xsv.open(file)
+
+puts "--- ARRAY MODE ---"
+
+bench_perf(workbook.sheets[0])
+bench_mem(workbook.sheets[0])
+
+puts "\n--- HASH MODE ---"
+
+workbook.sheets[0].parse_headers!
+
+bench_perf(workbook.sheets[0])
+bench_mem(workbook.sheets[0])
data/lib/xsv/helpers.rb
CHANGED
@@ -5,42 +5,42 @@ module Xsv
     # The default OOXML Spreadheet number formats according to the ECMA standard
     # User formats are appended from index 174 onward
     BUILT_IN_NUMBER_FORMATS = {
-      1 =>
-      2 =>
-      3 =>
-      4 =>
-      5 =>
-      6 =>
-      7 =>
-      8 =>
-      9 =>
-      10 =>
-      11 =>
-      12 =>
-      13 =>
-      14 =>
-      15 =>
-      16 =>
-      17 =>
-      18 =>
-      19 =>
-      20 =>
-      21 =>
-      22 =>
-      37 =>
-      38 =>
-      39 =>
-      40 =>
-      45 =>
-      46 =>
-      47 =>
-      48 =>
-      49 =>
+      1 => "0",
+      2 => "0.00",
+      3 => "#, ##0",
+      4 => "#, ##0.00",
+      5 => "$#, ##0_);($#, ##0)",
+      6 => "$#, ##0_);[Red]($#, ##0)",
+      7 => "$#, ##0.00_);($#, ##0.00)",
+      8 => "$#, ##0.00_);[Red]($#, ##0.00)",
+      9 => "0%",
+      10 => "0.00%",
+      11 => "0.00E+00",
+      12 => "# ?/?",
+      13 => "# ??/??",
+      14 => "m/d/yyyy",
+      15 => "d-mmm-yy",
+      16 => "d-mmm",
+      17 => "mmm-yy",
+      18 => "h:mm AM/PM",
+      19 => "h:mm:ss AM/PM",
+      20 => "h:mm",
+      21 => "h:mm:ss",
+      22 => "m/d/yyyy h:mm",
+      37 => "#, ##0_);(#, ##0)",
+      38 => "#, ##0_);[Red](#, ##0)",
+      39 => "#, ##0.00_);(#, ##0.00)",
+      40 => "#, ##0.00_);[Red](#, ##0.00)",
+      45 => "mm:ss",
+      46 => "[h]:mm:ss",
+      47 => "mm:ss.0",
+      48 => "##0.0E+0",
+      49 => "@"
     }.freeze
 
     MINUTE = 60
     HOUR = 3600
-    A_CODEPOINT =
+    A_CODEPOINT = "A".ord.freeze
     # The epoch for all dates in OOXML Spreadsheet documents
     EPOCH = Date.new(1899, 12, 30).freeze
 
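Date cells store a number of days counted from the `EPOCH` defined above. As a minimal sketch (the `serial_to_date` helper is hypothetical and not part of Xsv), a serial value maps to a `Date` by adding it to that epoch:

```ruby
require "date"

# Same epoch value as Xsv::Helpers::EPOCH in the hunk above
EPOCH = Date.new(1899, 12, 30).freeze

# Hypothetical helper: converts a serial day number to a Date.
# Any fractional (time-of-day) part is dropped by to_i.
def serial_to_date(serial)
  EPOCH + serial.to_i
end

serial_to_date(43831)   # => Date.new(2020, 1, 1)
serial_to_date(44605.5) # => Date.new(2022, 2, 13), the .5 is discarded
```

Counting from 1899-12-30 rather than 1900-01-01 compensates for Excel treating 1900 as a leap year, so the result is correct for serials from 61 (1900-03-01) onward. The 1904 date system is not covered by this sketch.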
@@ -74,7 +74,7 @@ module Xsv
         minutes = minutes % 60
       end
 
-      format(
+      format("%02d:%02d", hours, minutes)
     end
 
     # Returns a time including a date as a {Time} object
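The `format("%02d:%02d", hours, minutes)` call renders a time of day, but the rest of that method is not visible in this hunk. The following is a minimal standalone sketch, assuming the input is the fractional part of a day (0.5 is noon); `fraction_to_hhmm` is a hypothetical name, not Xsv's helper:

```ruby
# Hypothetical helper: fraction of a day (0.0...1.0) to an "HH:MM" string
def fraction_to_hhmm(fraction)
  total_minutes = (fraction * 24 * 60).round # round to the nearest minute first
  hours, minutes = total_minutes.divmod(60)  # carry whole hours out of the minutes
  format("%02d:%02d", hours % 24, minutes)   # a value that rounds up to 24:00 wraps to 00:00
end

fraction_to_hhmm(0.5)    # => "12:00"
fraction_to_hhmm(0.2708) # => "06:30"
```

Rounding to whole minutes before splitting with `divmod` means the minutes field can never read "60", which appears to be the same concern the `minutes % 60` line above deals with.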
@@ -92,9 +92,9 @@ module Xsv
 
     # Returns a number as either Integer or Float
     def parse_number(string)
-      if string.include?
+      if string.include? "."
         string.to_f
-      elsif string.include?
+      elsif string.include? "E"
         Complex(string).to_f
       else
         string.to_i
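The branching in `parse_number` can be tried without loading Xsv. This standalone copy of the logic shown above makes the return value of each branch explicit:

```ruby
# Standalone copy of the parse_number logic from the hunk above
def parse_number(string)
  if string.include? "."
    string.to_f          # "3.14" and "1.5E-3" both land here
  elsif string.include? "E"
    Complex(string).to_f # scientific notation without a dot, e.g. "1E3"
  else
    string.to_i          # plain integers such as "42"
  end
end

parse_number("42")     # => 42
parse_number("3.14")   # => 3.14
parse_number("1E3")    # => 1000.0
parse_number("1.5E-3") # => 0.0015
```

The `"E"` check is case-sensitive, so a lowercase `"1e3"` would take the Integer branch and return 1.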
data/lib/xsv/sax_parser.rb
CHANGED
@@ -5,6 +5,9 @@ module Xsv
     ATTR_REGEX = /((\S+)="(.*?)")/m
 
     def parse(io)
+      responds_to_end_element = respond_to?(:end_element)
+      responds_to_characters = respond_to?(:characters)
+
       state = :look_start
       if io.is_a?(String)
         pbuf = io.dup
@@ -29,16 +32,16 @@ module Xsv
         end
 
         if state == :look_start
-          if (o = pbuf.index(
-            chars = pbuf.slice!(0, o + 1).chop!.force_encoding(
+          if (o = pbuf.index("<"))
+            chars = pbuf.slice!(0, o + 1).chop!.force_encoding("utf-8")
 
-            if
-              if chars.index(
-                chars.gsub!(
-                chars.gsub!(
-                chars.gsub!(
-                chars.gsub!(
-                chars.gsub!(
+            if responds_to_characters && !chars.empty?
+              if chars.index("&")
+                chars.gsub!("&amp;", "&")
+                chars.gsub!("&apos;", "'")
+                chars.gsub!("&gt;", ">")
+                chars.gsub!("&lt;", "<")
+                chars.gsub!("&quot;", '"')
               end
               characters(chars)
             end
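The chain of `gsub!` calls decodes the five predefined XML entities one at a time. A chain that handles `&amp;` first turns an escaped entity such as `&amp;lt;` (the literal text `&lt;`) into `<`. As an alternative sketch, not what Xsv does, a single `gsub` with a Hash replacement decodes each entity exactly once; `unescape_xml` is a hypothetical name:

```ruby
# The five entities predefined by XML 1.0
XML_ENTITIES = {
  "&amp;" => "&",
  "&apos;" => "'",
  "&gt;" => ">",
  "&lt;" => "<",
  "&quot;" => '"'
}.freeze

# Hypothetical helper: decodes each entity in one pass, so "&amp;lt;" becomes
# the literal text "&lt;" instead of being decoded twice into "<".
def unescape_xml(text)
  return text unless text.include?("&") # cheap guard, like chars.index("&")

  text.gsub(/&(?:amp|apos|gt|lt|quot);/, XML_ENTITIES)
end

unescape_xml("Tom &amp; Jerry") # => "Tom & Jerry"
unescape_xml("&amp;lt;")        # => "&lt;"
```

Like the hunk, this sketch leaves numeric character references such as `&#10;` untouched.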
@@ -55,8 +58,8 @@ module Xsv
         end
 
         if state == :look_end
-          if (o = pbuf.index(
-            if (s = pbuf.index(
+          if (o = pbuf.index(">"))
+            if (s = pbuf.index(" ")) && s < o
               tag_name = pbuf.slice!(0, s + 1).chop!
               args = pbuf.slice!(0, o - s)
             else
@@ -64,18 +67,18 @@ module Xsv
               args = nil
             end
 
-            if tag_name.start_with?(
-              end_element(tag_name[1
+            if tag_name.start_with?("/")
+              end_element(tag_name[1..]) if responds_to_end_element
             elsif args.nil?
               start_element(tag_name, nil)
             else
               start_element(tag_name, args.scan(ATTR_REGEX).each_with_object({}) { |m, h| h[m[1].to_sym] = m[2] })
-              end_element(tag_name) if args.end_with?(
+              end_element(tag_name) if responds_to_end_element && args.end_with?("/")
             end
 
             state = :look_start
           elsif eof_reached
-            raise
+            raise Xsv::Error, "Malformed XML document, looking for end of tag beyond EOF"
           else
             must_read = true
           end
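Attributes are extracted with `ATTR_REGEX` and `scan`, and the keys become Symbols, which is why the handlers read values such as `attrs[:r]`. A minimal standalone sketch of that step (`parse_attrs` is a hypothetical name):

```ruby
# Same pattern as Xsv::SaxParser::ATTR_REGEX in the hunks above
ATTR_REGEX = /((\S+)="(.*?)")/m

# Hypothetical helper: raw attribute text of a tag -> Hash with Symbol keys,
# built the same way as the start_element argument in the parser.
def parse_attrs(args)
  args.scan(ATTR_REGEX).each_with_object({}) { |m, h| h[m[1].to_sym] = m[2] }
end

parse_attrs('r="B2" s="3" t="s">')                   # => { r: "B2", s: "3", t: "s" }
parse_attrs('numFmtId="164" formatCode="0.00 %"/>') # => { numFmtId: "164", formatCode: "0.00 %" }
```

The non-greedy `(.*?)` stops at the first closing quote, so values may contain spaces. Values come back as Strings and are not entity-decoded.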
@@ -18,29 +18,29 @@ module Xsv
 
     def start_element(name, _attrs)
       case name
-      when
-        @current_string =
+      when "si"
+        @current_string = ""
         @skip = false
-      when
+      when "rPh"
         @skip = true
-      when
+      when "t"
         @state = name
       end
     end
 
     def characters(value)
-      if @state ==
+      if @state == "t" && !@skip
         @current_string += value
       end
     end
 
     def end_element(name)
       case name
-      when
+      when "si"
         @block.call(@current_string)
-      when
+      when "rPh"
         @skip = false
-      when
+      when "t"
         @state = nil
       end
     end
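The shared strings parser keeps text only inside `<t>` elements, ignores phonetic runs (`<rPh>`), and emits one string per `<si>`. A minimal sketch of those state rules, driven by a hand-written list of SAX-style events instead of Xsv's parser (the class name and event format are hypothetical):

```ruby
# Hypothetical collector following the same state rules as the hunk above
class SharedStringCollector
  attr_reader :strings

  def initialize
    @strings = []
    @current = nil
    @in_t = false
    @skip = false
  end

  def start_element(name)
    case name
    when "si"
      @current = +""
      @skip = false
    when "rPh"
      @skip = true
    when "t"
      @in_t = true
    end
  end

  def characters(value)
    @current << value if @in_t && !@skip
  end

  def end_element(name)
    case name
    when "si" then @strings << @current
    when "rPh" then @skip = false
    when "t" then @in_t = false
    end
  end
end

# Events for <si><r><t>Hello </t></r><r><t>world</t></r><rPh><t>ignored</t></rPh></si><si><t>Plain</t></si>
# (the <r> elements are left out because the collector ignores them)
events = [
  [:start, "si"], [:start, "t"], [:text, "Hello "], [:end, "t"],
  [:start, "t"], [:text, "world"], [:end, "t"],
  [:start, "rPh"], [:start, "t"], [:text, "ignored"], [:end, "t"], [:end, "rPh"],
  [:end, "si"],
  [:start, "si"], [:start, "t"], [:text, "Plain"], [:end, "t"], [:end, "si"]
]

collector = SharedStringCollector.new
events.each do |kind, arg|
  case kind
  when :start then collector.start_element(arg)
  when :text then collector.characters(arg)
  when :end then collector.end_element(arg)
  end
end
collector.strings # => ["Hello world", "Plain"]
```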
data/lib/xsv/sheet.rb
CHANGED
@@ -40,14 +40,14 @@ module Xsv
       @headers = []
       @mode = :array
       @row_skip = 0
-      @hidden = ids[:state] ==
+      @hidden = ids[:state] == "hidden"
 
       @last_row, @column_count = SheetBoundsHandler.get_bounds(@io, @workbook)
     end
 
     # @return [String]
     def inspect
-      "#<#{self.class.name}:#{object_id}>"
+      "#<#{self.class.name}:#{object_id} mode=#{@mode}>"
     end
 
     # Returns true if the worksheet is hidden
@@ -66,7 +66,7 @@ module Xsv
       true
     end
 
-
+    alias_method :each, :each_row
 
     # Get row by number, starting at 0. Returns either a hash or an array based on the current row.
     # If the specified index is out of bounds an empty row is returned.
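Aliasing `each_row` as `each` is what lets a sheet work as an `Enumerable`, since `Enumerable` builds `first`, `map`, `select` and the rest on top of `#each`. A toy class (hypothetical, not Xsv code) shows the mechanism in isolation:

```ruby
# Hypothetical stand-in for a sheet: Enumerable only needs #each
class ToySheet
  include Enumerable

  def initialize(rows)
    @rows = rows
  end

  def each_row
    return enum_for(:each_row) unless block_given? # external iteration without a block

    @rows.each { |row| yield row }
    self
  end

  # Enumerable calls #each, so point it at the existing iterator
  alias_method :each, :each_row
end

sheet = ToySheet.new([["a", 1], ["b", 2], ["c", 3]])
sheet.first                        # => ["a", 1]
sheet.map(&:last)                  # => [1, 2, 3]
sheet.select { |row| row[1].odd? } # => [["a", 1], ["c", 3]]
```

`alias_method` copies the method as it exists when the alias is created, so it must come after `each_row` is defined, and a later redefinition of `each_row` would not change `each`.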
@@ -30,40 +30,40 @@ module Xsv
       @state = nil
       @cell = nil
       @row = nil
-      @
-      @
+      @max_row = 0
+      @max_column = 0
       @trim_empty_rows = trim_empty_rows
     end
 
     def start_element(name, attrs)
       case name
-      when
+      when "c"
         @state = name
         @cell = attrs[:r]
-      when
+      when "v"
         col = column_index(@cell)
-        @
-        @
-      when
+        @max_column = col if col > @max_column
+        @max_row = @row if @row > @max_row
+      when "row"
         @state = name
         @row = attrs[:r].to_i
-      when
+      when "dimension"
         @state = name
 
-
+        _first_cell, last_cell = attrs[:ref].split(":")
 
-        if
-          @
+        if last_cell
+          @max_column = column_index(last_cell)
           unless @trim_empty_rows
-            @
-            @block.call(@
+            @max_row = last_cell[/\d+$/].to_i
+            @block.call(@max_row, @max_column)
           end
         end
       end
     end
 
     def end_element(name)
-      @block.call(@
+      @block.call(@max_row, @max_column) if name == "sheetData"
     end
   end
 end
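The bounds handler derives the last row and column from the `ref` attribute of `<dimension>` (for example `A1:C10`). The implementation of `column_index` is not part of this diff, so the helpers below are a minimal sketch assuming zero-based column indexes; both names are used hypothetically here:

```ruby
A_CODEPOINT = "A".ord

# Hypothetical helper: zero-based column index of a cell reference such as "AB12".
# The letters form a bijective base-26 number (A = 1 ... Z = 26, AA = 27).
def column_index(cell_ref)
  value = 0
  cell_ref.each_char do |ch|
    break unless ch.between?("A", "Z") # the row number starts here
    value = value * 26 + (ch.ord - A_CODEPOINT + 1)
  end
  value - 1
end

# Hypothetical helper: [last_row, last_column_index] from a <dimension ref="...">,
# using the same split(":") and /\d+$/ steps as the hunk above.
def dimension_bounds(ref)
  _first_cell, last_cell = ref.split(":")
  return nil unless last_cell # a single-cell ref such as "A1" has no range

  [last_cell[/\d+$/].to_i, column_index(last_cell)]
end

column_index("AB12")       # => 27
dimension_bounds("A1:C10") # => [10, 2]
```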
@@ -14,58 +14,50 @@ module Xsv
       @last_row = last_row - @row_skip
       @block = block
 
-      @
+      @store_characters = false
 
       @row_index = 0
       @current_row = {}
-      @
+      @current_row_number = 0
       @current_cell = {}
-      @current_value =
+      @current_value = +""
 
       @headers = @empty_row.keys if @mode == :hash
     end
 
     def start_element(name, attrs)
       case name
-      when
-        @state = name
+      when "c"
         @current_cell = attrs
         @current_value.clear
-      when
-        @
-      when
-        @state = name
+      when "v", "is", "t"
+        @store_characters = true
+      when "row"
         @current_row = @empty_row.dup
-        @
-      when 't'
-        @state = nil unless @state == 'is'
-      else
-        @state = nil
+        @current_row_number = attrs[:r].to_i
       end
     end
 
     def characters(value)
-      @current_value << value if @
+      @current_value << value if @store_characters
     end
 
     def end_element(name)
       case name
-      when
-        @
-      when
+      when "v", "is", "t"
+        @store_characters = false
+      when "c"
         col_index = column_index(@current_cell[:r])
 
-
-        when :array
+        if @mode == :array
           @current_row[col_index] = format_cell
-
+        else
           @current_row[@headers[col_index]] = format_cell
         end
-      when
-
-        adjusted_row_number = real_row_number - @row_skip
+      when "row"
+        return if @current_row_number <= @row_skip
 
-
+        adjusted_row_number = @current_row_number - @row_skip
 
         @row_index += 1
 
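In array mode each cell is stored at its column index, while in hash mode it is stored under that column's header. A minimal sketch (the `build_row` helper is hypothetical, not Xsv code) also shows why the README warns about duplicate headers: a later column with the same header overwrites the earlier one.

```ruby
# Hypothetical helper: cells arrive as [column_index, value] pairs
def build_row(cells, mode:, headers: [])
  # Hash mode starts from every header mapped to nil, similar to @empty_row
  row = mode == :array ? [] : headers.to_h { |header| [header, nil] }

  cells.each do |col_index, value|
    if mode == :array
      row[col_index] = value # gaps before col_index are filled with nil
    else
      row[headers[col_index]] = value # duplicate headers share one key
    end
  end
  row
end

build_row([[0, "x"], [2, 5]], mode: :array)                          # => ["x", nil, 5]
build_row([[0, "x"], [1, 5]], mode: :hash, headers: ["name", "qty"]) # => { "name" => "x", "qty" => 5 }
build_row([[0, "a"], [1, "b"]], mode: :hash, headers: ["id", "id"])  # => { "id" => "b" }
```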
@@ -90,23 +82,22 @@ module Xsv
       return nil if @current_value.empty?
 
       case @current_cell[:t]
-      when
+      when "s"
         @workbook.shared_strings[@current_value.to_i]
-      when
+      when "str", "inlineStr"
         @current_value.strip
-      when
+      when "e" # N/A
         nil
-      when nil,
+      when nil, "n"
         if @current_cell[:s]
-
-          numFmt = @workbook.numFmts[style[:numFmtId].to_i]
-
-          parse_number_format(@current_value, numFmt)
+          parse_number_format(@current_value, @workbook.get_num_fmt(@current_cell[:s].to_i))
         else
           parse_number(@current_value)
         end
-      when
-        @current_value ==
+      when "b"
+        @current_value == "1"
+      when "d"
+        DateTime.parse(@current_value)
       else
         raise Xsv::Error, "Encountered unknown column type #{@current_cell[:t]}"
       end
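`format_cell` dispatches on the cell's `t` attribute. A minimal standalone sketch of that dispatch (the `format_value` name and its `shared_strings` argument are hypothetical stand-ins for the method and `@workbook.shared_strings`); number formats from the `s` attribute are ignored here and numbers use a simplified parse:

```ruby
require "date"

# Hypothetical standalone version of the type dispatch in format_cell above
def format_value(type, raw, shared_strings)
  return nil if raw.empty?

  case type
  when "s" then shared_strings[raw.to_i]         # index into the shared strings table
  when "str", "inlineStr" then raw.strip
  when "e" then nil                              # error cells such as #N/A
  when nil, "n" then raw.include?(".") ? raw.to_f : raw.to_i # simplified number parse
  when "b" then raw == "1"
  when "d" then DateTime.parse(raw)              # ISO 8601 date cells
  else raise ArgumentError, "unknown cell type #{type.inspect}"
  end
end

format_value("s", "1", ["alpha", "beta"]) # => "beta"
format_value("b", "0", [])                # => false
format_value(nil, "42", [])               # => 42
```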
data/lib/xsv/styles_handler.rb
CHANGED
@@ -5,39 +5,39 @@ module Xsv
   # This is used internally when opening a sheet.
   class StylesHandler < SaxParser
     def self.get_styles(io)
-      handler = new(Xsv::Helpers::BUILT_IN_NUMBER_FORMATS.dup) do |xfs,
+      handler = new(Xsv::Helpers::BUILT_IN_NUMBER_FORMATS.dup) do |xfs, num_fmts|
         @xfs = xfs
-        @
+        @num_fmts = num_fmts
       end
 
       handler.parse(io)
 
-      [@xfs, @
+      [@xfs, @num_fmts]
     end
 
-    def initialize(
+    def initialize(num_fmts, &block)
       @block = block
       @state = nil
       @xfs = []
-      @
+      @num_fmts = num_fmts
     end
 
     def start_element(name, attrs)
       case name
-      when
-        @state =
-      when
-        @xfs << attrs if @state ==
-      when
-        @
+      when "cellXfs"
+        @state = "cellXfs"
+      when "xf"
+        @xfs << attrs.transform_values(&:to_i) if @state == "cellXfs"
+      when "numFmt"
+        @num_fmts[attrs[:numFmtId].to_i] = attrs[:formatCode]
       end
     end
 
     def end_element(name)
       case name
-      when
-        @block.call(@xfs, @
-      when
+      when "styleSheet"
+        @block.call(@xfs, @num_fmts)
+      when "cellXfs"
         @state = nil
       end
     end
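Because `<xf>` attributes are now converted to Integers while parsing, resolving a cell's number format becomes two direct lookups, as in `Workbook#get_num_fmt` in the workbook diff. A minimal sketch with made-up example data (the standalone `get_num_fmt` helper takes its tables as arguments and is hypothetical):

```ruby
# Hypothetical helper with the same lookup chain as Workbook#get_num_fmt:
# style index (a cell's s attribute) -> <xf> entry -> numFmtId -> format code
def get_num_fmt(num_fmts, xfs, style)
  num_fmts[xfs[style][:numFmtId]]
end

# Example data: an excerpt of the built-in table plus one custom <numFmt>
num_fmts = { 1 => "0", 14 => "m/d/yyyy" }
num_fmts[164] = "yyyy-mm-dd" # from <numFmt numFmtId="164" formatCode="yyyy-mm-dd"/>

# <xf> attributes arrive as Strings; transform_values(&:to_i) makes the ids
# Integers so they match the Integer keys of num_fmts
xfs = [
  { numFmtId: "0", fontId: "0" },
  { numFmtId: "14", fontId: "0" },
  { numFmtId: "164", fontId: "1" }
].map { |attrs| attrs.transform_values(&:to_i) }

get_num_fmt(num_fmts, xfs, 1) # => "m/d/yyyy"
get_num_fmt(num_fmts, xfs, 2) # => "yyyy-mm-dd"
get_num_fmt(num_fmts, xfs, 0) # => nil, id 0 is not in this excerpt
```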
data/lib/xsv/version.rb
CHANGED
data/lib/xsv/workbook.rb
CHANGED
@@ -1,6 +1,6 @@
 # frozen_string_literal: true
 
-require
+require "zip"
 
 module Xsv
   # An OOXML Spreadsheet document is called a Workbook. A Workbook consists of
@@ -10,54 +10,36 @@ module Xsv
     # @return [Array<Sheet>]
     attr_reader :sheets
 
-    attr_reader :shared_strings, :xfs, :
-
-    #
-
-
-      @workbook = if data.is_a?(IO) || data.respond_to?(:read) # is it a buffer?
-        new(Zip::File.open_buffer(data), **kws)
-      elsif data.start_with?("PK\x03\x04") # is it a string containing a file?
-        new(Zip::File.open_buffer(data), **kws)
-      else # must be a filename
-        new(Zip::File.open(data), **kws)
-      end
-
-      if block_given?
-        begin
-          yield(@workbook)
-        ensure
-          @workbook.close
-        end
-      else
-        @workbook
-      end
+    attr_reader :shared_strings, :xfs, :num_fmts, :trim_empty_rows
+
+    # @deprecated Use {Xsv.open} instead
+    def self.open(data, **kws, &block)
+      Xsv.open(data, **kws, &block)
     end
 
     # Open a workbook from an instance of {Zip::File}. Generally it's recommended
     # to use the {.open} method instead of the constructor.
     #
-    #
-    #
-
-    #
-    def initialize(zip, trim_empty_rows: false)
+    # @param trim_empty_rows [Boolean] Scan sheet for end of content and don't return trailing rows
+    # @param parse_headers [Boolean] Call `parse_headers!` on all sheets on load
+    def initialize(zip, trim_empty_rows: false, parse_headers: false)
       raise ArgumentError, "Passed argument is not an instance of Zip::File. Did you mean to use Workbook.open?" unless zip.is_a?(Zip::File)
+      raise Xsv::Error, "Zip::File is empty" if zip.size.zero?
 
       @zip = zip
       @trim_empty_rows = trim_empty_rows
 
       @sheets = []
-      @xfs, @
+      @xfs, @num_fmts = fetch_styles
       @sheet_ids = fetch_sheet_ids
       @relationships = fetch_relationships
       @shared_strings = fetch_shared_strings
-      @sheets = fetch_sheets
+      @sheets = fetch_sheets(parse_headers ? :hash : :array)
     end
 
     # @return [String]
     def inspect
-      "#<#{self.class.name}:#{object_id}>"
+      "#<#{self.class.name}:#{object_id} sheets=#{sheets.count} trim_empty_rows=#{@trim_empty_rows}>"
     end
 
     # Close the handle to the workbook file and leave all resources for the GC to collect
@@ -67,7 +49,7 @@ module Xsv
       @zip = nil
       @sheets = nil
       @xfs = nil
-      @
+      @num_fmts = nil
       @relationships = nil
       @shared_strings = nil
       @sheet_ids = nil
@@ -82,10 +64,15 @@ module Xsv
       @sheets.select { |s| s.name == name }
     end
 
+    # Get number format for given style index
+    def get_num_fmt(style)
+      @num_fmts[@xfs[style][:numFmtId]]
+    end
+
     private
 
     def fetch_shared_strings
-      handle = @zip.glob(
+      handle = @zip.glob("xl/sharedStrings.xml").first
       return if handle.nil?
 
       stream = handle.get_input_stream
@@ -95,32 +82,34 @@ module Xsv
     end
 
     def fetch_styles
-      stream = @zip.glob(
+      stream = @zip.glob("xl/styles.xml").first.get_input_stream
 
       StylesHandler.get_styles(stream)
     ensure
       stream.close
     end
 
-    def fetch_sheets
-      @zip.glob(
+    def fetch_sheets(mode)
+      @zip.glob("xl/worksheets/sheet*.xml").sort do |a, b|
         a.name[/\d+/].to_i <=> b.name[/\d+/].to_i
       end.map do |entry|
-        rel = @relationships.detect { |r| entry.name.end_with?(r[:Target]) && r[:Type].end_with?(
+        rel = @relationships.detect { |r| entry.name.end_with?(r[:Target]) && r[:Type].end_with?("worksheet") }
         sheet_ids = @sheet_ids.detect { |i| i[:"r:id"] == rel[:Id] }
-        Xsv::Sheet.new(self, entry.get_input_stream, entry.size, sheet_ids)
+        Xsv::Sheet.new(self, entry.get_input_stream, entry.size, sheet_ids).tap do |sheet|
+          sheet.parse_headers! if mode == :hash
+        end
       end
     end
 
     def fetch_sheet_ids
-      stream = @zip.glob(
+      stream = @zip.glob("xl/workbook.xml").first.get_input_stream
       SheetsIdsHandler.get_sheets_ids(stream)
     ensure
       stream.close
     end
 
     def fetch_relationships
-      stream = @zip.glob(
+      stream = @zip.glob("xl/_rels/workbook.xml.rels").first.get_input_stream
       RelationshipsHandler.get_relations(stream)
     ensure
       stream.close
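Worksheet entries are ordered by the number in their file name rather than lexically, so that `sheet10.xml` comes after `sheet2.xml`. A minimal sketch with example entry names; it uses `sort_by`, which gives the same order as the comparison block in the hunk:

```ruby
# Example entry names as they could appear inside an .xlsx archive
names = %w[xl/worksheets/sheet10.xml xl/worksheets/sheet2.xml xl/worksheets/sheet1.xml]

lexical = names.sort                          # compares characters, so "sheet10" < "sheet2"
numeric = names.sort_by { |n| n[/\d+/].to_i } # compares the first run of digits as Integers

lexical # => ["xl/worksheets/sheet1.xml", "xl/worksheets/sheet10.xml", "xl/worksheets/sheet2.xml"]
numeric # => ["xl/worksheets/sheet1.xml", "xl/worksheets/sheet2.xml", "xl/worksheets/sheet10.xml"]
```

`/\d+/` picks the first digits anywhere in the entry name, which works because the directory part of these paths contains no digits.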
data/lib/xsv.rb
CHANGED
@@ -1,18 +1,18 @@
 # frozen_string_literal: true
 
-require
+require "date"
 
-require
-require
-require
-require
-require
-require
-require
-require
-require
-require
-require
+require "xsv/helpers"
+require "xsv/sax_parser"
+require "xsv/relationships_handler"
+require "xsv/shared_strings_parser"
+require "xsv/sheet"
+require "xsv/sheet_bounds_handler"
+require "xsv/sheet_rows_handler"
+require "xsv/sheets_ids_handler"
+require "xsv/styles_handler"
+require "xsv/version"
+require "xsv/workbook"
 
 # XSV is a fast, lightweight parser for Office Open XML spreadsheet files
 # (commonly known as Excel or .xlsx files). It strives to be minimal in the
@@ -24,4 +24,31 @@ module Xsv
   # An AssertionFailed error indicates an unexpected condition, meaning a bug
   # or misinterpreted .xlsx document
   class AssertionFailed < StandardError; end
+
+  # Open the workbook of the given filename, string or buffer.
+  # @param filename_or_string [String, IO] the contents or filename of a workbook
+  # @param trim_empty_rows [Boolean] Scan sheet for end of content and don't return trailing rows
+  # @param parse_headers [Boolean] Call `parse_headers!` on all sheets on load
+  # @return [Xsv::Workbook] The workbook instance
+  def self.open(filename_or_string, trim_empty_rows: false, parse_headers: false)
+    zip = if filename_or_string.is_a?(IO) || filename_or_string.respond_to?(:read) # is it a buffer?
+      Zip::File.open_buffer(filename_or_string)
+    elsif filename_or_string.start_with?("PK\x03\x04") # is it a string containing a file?
+      Zip::File.open_buffer(filename_or_string)
+    else # must be a filename
+      Zip::File.open(filename_or_string)
+    end
+
+    workbook = Xsv::Workbook.new(zip, trim_empty_rows: trim_empty_rows, parse_headers: parse_headers)
+
+    if block_given?
+      begin
+        yield(workbook)
+      ensure
+        workbook.close
+      end
+    else
+      workbook
+    end
+  end
 end
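`Xsv.open` decides how to read its argument with a three-way check. A minimal sketch of just that decision, returning a Symbol instead of opening a ZIP file (the `source_kind` helper and its return values are hypothetical):

```ruby
require "stringio"

# Hypothetical helper mirroring the three-way check in Xsv.open above
def source_kind(filename_or_string)
  if filename_or_string.is_a?(IO) || filename_or_string.respond_to?(:read)
    :buffer   # File, StringIO or any other IO-like object
  elsif filename_or_string.start_with?("PK\x03\x04")
    :zip_data # raw .xlsx bytes; every ZIP local file header starts with these 4 bytes
  else
    :filename
  end
end

source_kind(StringIO.new("..."))     # => :buffer
source_kind("PK\x03\x04\x14\x00...") # => :zip_data
source_kind("sheet.xlsx")            # => :filename
```

The order of the checks matters: the IO test runs first because a `File` has no `start_with?` method. Once the source is opened, the workbook is either returned or, when a block is given, yielded and closed in an `ensure` clause, the same contract as `File.open`.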
data/xsv.gemspec
CHANGED
@@ -14,7 +14,7 @@ Gem::Specification.new do |spec|
     (commonly known as Excel or .xlsx files). It strives to be minimal in the
     sense that it provides nothing a CSV reader wouldn't, meaning it only
     deals with minimal formatting and cannot create or modify documents.
-
+  EOF
   spec.homepage = "https://github.com/martijn/xsv"
   spec.license = "MIT"
 
@@ -36,11 +36,13 @@ Gem::Specification.new do |spec|
   spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
   spec.require_paths = ["lib"]
 
-  spec.required_ruby_version = ">= 2.
+  spec.required_ruby_version = ">= 2.6"
 
   spec.add_dependency "rubyzip", ">= 1.3", "< 3"
 
   spec.add_development_dependency "bundler", "< 3"
   spec.add_development_dependency "rake", "~> 13.0"
   spec.add_development_dependency "minitest", "~> 5.14.2"
+  spec.add_development_dependency "standard", "~> 1.6.0"
+  spec.add_development_dependency "codecov", ">= 0.6.0"
 end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: xsv
 version: !ruby/object:Gem::Version
-  version: 1.0
+  version: 1.1.0
 platform: ruby
 authors:
 - Martijn Storck
 autorequire:
 bindir: exe
 cert_chain: []
-date:
+date: 2022-02-13 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rubyzip
@@ -72,6 +72,34 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: 5.14.2
+- !ruby/object:Gem::Dependency
+  name: standard
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 1.6.0
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 1.6.0
+- !ruby/object:Gem::Dependency
+  name: codecov
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 0.6.0
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 0.6.0
 description: |2
     Xsv is a fast, lightweight parser for Office Open XML spreadsheet files
     (commonly known as Excel or .xlsx files). It strives to be minimal in the
@@ -83,13 +111,15 @@ executables: []
 extensions: []
 extra_rdoc_files: []
 files:
+- ".github/workflows/ruby.yml"
 - ".gitignore"
-- ".
+- ".standard.yml"
 - CHANGELOG.md
 - Gemfile
 - LICENSE.txt
 - README.md
 - Rakefile
+- benchmark.rb
 - bin/console
 - bin/setup
 - lib/xsv.rb
@@ -120,14 +150,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: '2.
+      version: '2.6'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.2.
+rubygems_version: 3.2.3
 signing_key:
 specification_version: 4
 summary: A fast and lightweight xlsx parser that provides nothing a CSV parser wouldn't