simple_xlsx_reader 2.0.0 → 5.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 979490ce3bd7f0482879fb5fb5465e10ad1b07c1488d0a544950131d9063050a
- data.tar.gz: 412d0040a586cc5ee4acdd4a2f74dd74f3bf9eb781a35d8a36c12f6caadc566c
+ metadata.gz: 8552d34f153cbdc6561c40725488d193e9aa48debcded0af24d32daf01b2f951
+ data.tar.gz: 2a0fecdec3698bb16717244fc7bf9b45b4fe0f6b216038e9823f9a5fea2ea8fa
  SHA512:
- metadata.gz: 00c01bc0c2a393eb35e458411dfeab55b8bf30cee2661324cbd97a175baf0ceb31a881b1b2b7bd668a2b475ff008372c1428908340e30769308884355fdd46e8
- data.tar.gz: 81b1b26806a97c56710cab64aa22212985dea82b308e2fbba6835f4ea7a69b79067268bb13537999594dc5722928f1df235938355a7d4a51b58ae7ed4af1d093
+ metadata.gz: 77f99e8ad1020f0313171dcd0b14f7200fdf116e16de312146eb66a4d9347e94a0bf1cb4483f606975cd8bc776e80995473485271e05ee0a11136ef72cdeeae5
+ data.tar.gz: 7ee3ed8c37df6632981bd6eeb301de5f852df0f66534ce91593923cf1b51aa1dc0b07aed224d5d88cbd4b1f8a6901fdb17164e6e9f22fb10d4e5d90a3c24f437
@@ -22,15 +22,12 @@ jobs:
  runs-on: ubuntu-latest
  strategy:
  matrix:
- ruby-version: ['2.6', '2.7', '3.0']
+ ruby-version: ['2.6', '2.7', '3.0', '3.1', '3.2']
 
  steps:
  - uses: actions/checkout@v3
  - name: Set up Ruby
- # To automatically get bug fixes and new Ruby versions for ruby/setup-ruby,
- # change this to (see https://github.com/ruby/setup-ruby#versioning):
- # uses: ruby/setup-ruby@v1
- uses: ruby/setup-ruby@2b019609e2b0f1ea1a2bc8ca11cb82ab46ada124
+ uses: ruby/setup-ruby@v1
  with:
  ruby-version: ${{ matrix.ruby-version }}
  bundler-cache: true # runs 'bundle install' and caches installed gems automatically
data/CHANGELOG.md CHANGED
@@ -1,3 +1,52 @@
+ ### 5.0.0
+
+ * Change SimpleXlsxReader::Hyperlink to default to the visible cell value
+ instead of the hyperlink URL, which in the case of mailto hyperlinks is
+ surprising.
+ * Fix blank content when parsing docs from string (@codemole)
+
+ ### 4.0.1
+
+ * Fix nil error when handling some inline strings
+
+ Inline strings are almost exclusively used by non-Excel XLSX
+ implementations, but are valid, and sometimes have nil chunks.
+
+ Also, inline strings weren't preserving whitespace if Nokogiri is
+ parsing the string in chunks, as it does when encountering escaped
+ characters. Fixed.
+
+ ### 4.0.0
+
+ * Fix percentage rounding errors. Previously we were dividing by 100, when we
+ actually don't need to, so percentage types were 100x too small. Fixes #21.
+ Major bump because workarounds might have been implemented for previous
+ incorrect behavior.
+ * Fix small oddity in one currency format where round numbers would be cast
+ to an integer instead of a float.
+
+ ### 3.0.1
+
+ * Fix parsing "chunky" UTF-8 workbooks. Closes issues #39 and #45. See ce67f0d4.
+
+ ### 3.0.0
+
+ * Change the way we typecast cells in the General format. This probably won't
+ break anything in your app, but it's a change in behavior that theoretically
+ could.
+
+ Previously, we were treating cells using General the format as strings, when
+ according to the Office XML standard, they should be treated as numbers. We
+ now attempt to cast such cells as numbers, and fall back to strings if number
+ casting fails.
+
+ Thanks @jrodrigosm
+
+ ### 2.0.1
+
+ * Restore ability to parse IO strings (@robbevp)
+ * Add Ruby 3.1 and 3.2 to CI (@taichi-ishitani)
+
  ### 2.0.0
 
  * SPEED
data/README.md CHANGED
@@ -9,15 +9,17 @@ then forgotten. We just want to get the data, and get out!
 
  ## Summary (now with stream parsing):
 
- doc = SimpleXlsxReader.open('/path/to/workbook.xlsx')
- doc.sheets # => [<#SXR::Sheet>, ...]
- doc.sheets.first.name # 'Sheet1'
- doc.sheets.first.rows # <SXR::Document::RowsProxy>
- doc.sheets.first.rows.each # an <Enumerator> ready to chain or stream
- doc.sheets.first.rows.each {} # Streams the rows to your block
- doc.sheets.first.rows.each(headers: true) {} # Streams row-hashes
- doc.sheets.first.rows.each(headers: {id: /ID/}) {} # finds & maps headers, streams
- doc.sheets.first.rows.slurp # Slurps rows into memory as a 2D array
+ ```ruby
+ doc = SimpleXlsxReader.open('/path/to/workbook.xlsx')
+ doc.sheets # => [<#SXR::Sheet>, ...]
+ doc.sheets.first.name # 'Sheet1'
+ rows = doc.sheet.first.rows # <SXR::Document::RowsProxy>
+ rows.each # an <Enumerator> ready to chain or stream
+ rows.each {} # Streams the rows to your block
+ rows.each(headers: true) {} # Streams row-hashes
+ rows.each(headers: {id: /ID/}) {} # finds & maps headers, streams
+ rows.slurp # Slurps rows into memory as a 2D array
+ ```
 
  That's the gist of it!
 
@@ -29,7 +31,8 @@ See also the [Document](https://github.com/woahdae/simple_xlsx_reader/blob/2.0.0
 
  This project was started years ago, primarily because other Ruby xlsx parsers
  didn't import data with the correct types. Numbers as strings, dates as numbers,
- hyperlinks with inaccessible URLs, or - subtly buggy - simple dates as DateTime
+ [hyperlinks](https://github.com/woahdae/simple_xlsx_reader/blob/master/lib/simple_xlsx_reader/hyperlink.rb)
+ with inaccessible URLs, or - subtly buggy - simple dates as DateTime
  objects. If your app uses a timezone offset, depending on what timezone and
  what time of day you load the xlsx file, your dates might end up a day off!
  SimpleXlsxReader understands all these correctly.
@@ -39,12 +42,14 @@ SimpleXlsxReader understands all these correctly.
  Many Ruby xlsx parsers seem to be inspired more by Excel than Ruby, frankly.
  SimpleXlsxReader strives to be fairly idiomatic Ruby:
 
- # quick example having fun w/ ruby
- doc = SimpleXlsxReader.open(path_or_io)
- doc.sheets.first.rows.each(headers: {id: /ID/})
- .with_index.with_object({}) do |(row, index), acc|
- acc[row[:id]] = index
- end
+ ```ruby
+ # quick example having fun w/ ruby
+ doc = SimpleXlsxReader.open(path_or_io)
+ doc.sheets.first.rows.each(headers: {id: /ID/})
+ .with_index.with_object({}) do |(row, index), acc|
+ acc[row[:id]] = index
+ end
+ ```
 
  ### Now faster
 
@@ -77,15 +82,19 @@ If you had an excel sheet representing this data:
 
  Get a handle on the rows proxy:
 
- `rows = SimpleXlsxReader.open('suited_heroes.xlsx').sheets.first.rows`
+ ```ruby
+ rows = SimpleXlsxReader.open('suited_heroes.xlsx').sheets.first.rows
+ ```
 
  Simple streaming (kinda boring):
 
- `rows.each { |row| ... }`
+ ```ruby
+ rows.each { |row| ... }
+ ````
 
  Streaming with headers, and how about a little enumerable chaining:
 
- ```
+ ```ruby
  # Map of hero names by ID: { 117 => 'John Halo', ... }
 
  rows.each(headers: true).with_object({}) do |row, acc|
@@ -108,7 +117,7 @@ Sometimes though you have some junk at the top of your spreadsheet:
  For this, `headers` can be a hash whose keys replace headers and whose values
  help find the correct header row:
 
- ```
+ ```ruby
  # Same map of hero names by ID: { 117 => 'John Halo', ... }
 
  rows.each(headers: {id: /ID/, name: /Name/}).with_object({}) do |row, acc|
@@ -119,7 +128,7 @@ end
  If your header-to-attribute mapping is more complicated than key/value, you
  can do the mapping elsewhere, but use a block to find the header row:
 
- ```
+ ```ruby
  # Example roughly analogous to some production code mapping a single spreadsheet
  # across many objects. Might be a simpler way now that we have the headers-hash
  # feature.
@@ -168,9 +177,11 @@ can set `SimpleXlsxReader.configuration.catch_cell_load_errors =
  true`, and load errors will instead be inserted into Sheet#load_errors keyed
  by [rownum, colnum]:
 
- {
- [rownum, colnum] => '[error]'
- }
+ ```ruby
+ {
+ [rownum, colnum] => '[error]'
+ }
+ ```
 
  ### Performance
 
@@ -233,11 +244,9 @@ This project follows [semantic versioning 1.0](http://semver.org/spec/v1.0.0.htm
  Remember to write tests, think about edge cases, and run the existing
  suite.
 
- Note that as of commit 665cbafdde, the most extreme end of the
- linear-time performance test, which is 10,000 rows (12 columns), runs in
- ~4 seconds on Ruby 2.1 on a 2012 MBP. If the linear time assertion fails
- or you're way off that, there is probably a performance regression in
- your code.
+ The full suite contains a performance test that on an M1 MBP runs the final
+ large file in about five seconds. Check out that test before & after your
+ change to check for performance changes.
 
  Then, the standard stuff:
 
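The README describes `headers:` as a hash whose keys replace the header names and whose values help find the header row. As a rough, self-contained model of that idea (not the gem's internal code), the sketch below scans plain arrays for the first row in which every pattern matches some cell, then yields each later row as a hash keyed by the caller's symbols. `each_with_mapped_headers` is a hypothetical helper, and it assumes the rows are already loaded as arrays, for example via `rows.slurp`.

```ruby
# Hypothetical helper modelling the README's `headers: {id: /ID/}` option.
# Assumes `rows` is an Array of Arrays (e.g. the result of rows.slurp).
def each_with_mapped_headers(rows, matchers)
  return enum_for(__method__, rows, matchers) unless block_given?

  # First row in which every pattern matches at least one cell.
  header_idx = rows.index do |row|
    matchers.values.all? { |re| row.any? { |cell| cell.to_s.match?(re) } }
  end
  return if header_idx.nil? # no header row found: yield nothing

  # For each key, remember the column whose header cell matched its pattern.
  header = rows[header_idx]
  columns = matchers.transform_values do |re|
    header.index { |cell| cell.to_s.match?(re) }
  end

  # Every row after the header becomes { key => value in that column }.
  rows[(header_idx + 1)..].each do |row|
    yield columns.transform_values { |col| row[col] }
  end
end

rows = [
  ['Junk at the top', nil],
  ['Hero ID', 'Hero Name'],
  [117, 'John Halo']
]
each_with_mapped_headers(rows, { id: /ID/, name: /Name/ }).to_a
# => [{ id: 117, name: 'John Halo' }]
```

Resolving each key to a column once, from the header row, keeps the per-row work to an index lookup. A fuller implementation would also have to decide what happens when one pattern matches several header cells; this sketch simply takes the first.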
@@ -8,14 +8,16 @@ module SimpleXlsxReader
  # Main class for the public API. See the README for usage examples,
  # or read the code, it's pretty friendly.
  class Document
- attr_reader :file_path
+ attr_reader :string_or_io
 
- def initialize(file_path)
- @file_path = file_path
+ def initialize(legacy_file_path = nil, file_path: nil, string_or_io: nil)
+ fail(ArgumentError, 'either file_path or string_or_io must be provided') if legacy_file_path.nil? && file_path.nil? && string_or_io.nil?
+
+ @string_or_io = string_or_io || File.new(legacy_file_path || file_path)
  end
 
  def sheets
- @sheets ||= Loader.new(file_path).init_sheets
+ @sheets ||= Loader.new(string_or_io).init_sheets
  end
 
  # Expensive because it slurps all the sheets into memory,
@@ -4,27 +4,26 @@ module SimpleXlsxReader
  # We support hyperlinks as a "type" even though they're technically
  # represented either as a function or an external reference in the xlsx spec.
  #
- # Since having hyperlink data in our sheet usually means we might want to do
- # something primarily with the URL (store it in the database, download it, etc),
- # we go through extra effort to parse the function or follow the reference
- # to represent the hyperlink primarily as a URL. However, maybe we do want
- # the hyperlink "friendly name" part (as MS calls it), so here we've subclassed
- # string to tack on the friendly name. This means 80% of us that just want
- # the URL value will have to do nothing extra, but the 20% that might want the
- # friendly name can access it.
+ # In practice, hyperlinks are usually a link or a mailto. In the case of a
+ # link, we probably want to follow it to download something, but in the case
+ # of an email, we probably just want the email and not the mailto. So we
+ # represent a hyperlink primarily as it is seen by the user, following the
+ # principle of least surprise, but the url is accessible via #url.
  #
- # Note, by default, the value we would get by just asking the cell would
- # be the "friendly name" and *not* the URL, which is tucked away in the
- # function definition or a separate "relationships" meta-document.
+ # Microsoft calls the visible part of a hyperlink cell the "friendly name,"
+ # so we expose that as a method too, in case you want to be explicit about
+ # how you're accessing it.
  #
  # See MS documentation on the HYPERLINK function for some background:
  # https://support.office.com/en-us/article/HYPERLINK-function-333c7ce6-c5ae-4164-9c47-7de9b76f577f
  class Hyperlink < String
  attr_reader :friendly_name
+ attr_reader :url
 
  def initialize(url, friendly_name = nil)
  @friendly_name = friendly_name
- super(url)
+ @url = url
+ super(friendly_name || url)
  end
  end
  end
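To make the 5.0.0 hyperlink behavior concrete, here is a stand-alone copy of the new `Hyperlink` logic under a hypothetical name, `HyperlinkSketch`, so it runs without the gem: the string value is the friendly name when one exists, otherwise the URL, and `#url` always returns the link target. The mailto address is made-up example data.

```ruby
# Hypothetical stand-alone copy of the 5.0.0 Hyperlink logic.
class HyperlinkSketch < String
  attr_reader :friendly_name, :url

  def initialize(url, friendly_name = nil)
    @friendly_name = friendly_name
    @url = url
    # The visible text wins; fall back to the URL when there is none.
    super(friendly_name || url)
  end
end

email = HyperlinkSketch.new('mailto:bert@example.com', 'bert@example.com')
email == 'bert@example.com' # => true
email.url                   # => "mailto:bert@example.com"

bare = HyperlinkSketch.new('https://www.example.com')
bare == 'https://www.example.com' # => true
```

Because the object is still a `String`, code that compared such cells to the URL under 2.0.0 now sees the visible text instead, which is why the test expectation for the HYPERLINK() cell in this diff changes from the URL to its label; calling `#url` recovers the old value.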
@@ -31,10 +31,9 @@ module SimpleXlsxReader
  @url = nil # silence warnings
  @function = nil # silence warnings
  @capture = nil # silence warnings
+ @captured = nil # silence warnings
  @dimension = nil # silence warnings
 
- @file_io.rewind # in case we've already parsed this once
-
  # In this project this is only used for GUI-made hyperlinks (as opposed
  # to FUNCTION-based hyperlinks). Unfortunately the're needed to parse
  # the spreadsheet, and they come AFTER the sheet data. So, solution is
@@ -44,9 +43,10 @@ module SimpleXlsxReader
  if xrels_file&.grep(/hyperlink/)&.any?
  xrels_file.rewind
  load_gui_hyperlinks # represented as hyperlinks_by_cell
- @file_io.rewind
  end
 
+ @file_io.rewind # in case we've already parsed this once
+
  Nokogiri::XML::SAX::Parser.new(self).parse(@file_io)
  end
 
@@ -77,10 +77,10 @@ module SimpleXlsxReader
 
  return unless @capture
 
- @current_row[cell_idx] =
+ captured =
  begin
  SimpleXlsxReader::Loader.cast(
- string.strip, @type, @style,
+ string, @type, @style,
  url: @url || hyperlinks_by_cell&.[](@cell_name),
  shared_strings: shared_strings,
  base_date: base_date
@@ -99,9 +99,19 @@ module SimpleXlsxReader
  else
  @load_errors[[row_idx, col_idx]] = e.message
 
- string.strip
+ string
  end
  end
+
+ # For some reason I can't figure out in a reasonable timeframe,
+ # SAX parsing some workbooks captures separate strings in the same cell
+ # when we encounter UTF-8, although I can't get workbooks made in my
+ # own version of excel to repro it. Our fix is just to keep building
+ # the string in this case, although maybe there's a setting in Nokogiri
+ # to make it not do this (looked, couldn't find it).
+ #
+ # Loading the workbook test/chunky_utf8.xlsx repros the issue.
+ @captured = @captured ? @captured + (captured || '') : captured
  end
 
  def end_element(name)
@@ -134,7 +144,10 @@ module SimpleXlsxReader
  # isn't the most robust strategy, but it likely fits 99% of use cases
  # considering it's not a problem with actual excel docs.
  @dimension = "A1:#{@cell_name}" if @dimension.nil?
- when 'v', 't' then @capture = false
+ when 'v', 't'
+ @current_row[cell_idx] = @captured
+ @capture = false
+ @captured = nil
  when 'f' then @function = false
  when 'c' then @url = nil
  end
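The parser changes in this diff stop stripping each `characters` chunk and instead accumulate chunks in `@captured`, writing the cell only when `</v>` or `</t>` closes. The model below reproduces that accumulate-then-flush pattern with plain method calls standing in for SAX callbacks, so it runs without Nokogiri. `ChunkedCellCollector` is hypothetical, and feeding it hand-split chunks is an assumption meant to imitate how a SAX parser may split one text node.

```ruby
# Hypothetical model of the accumulate-then-flush pattern; the method names
# mirror SAX callbacks, but nothing here depends on Nokogiri.
class ChunkedCellCollector
  attr_reader :cells

  def initialize
    @cells = []
    @capture = false
    @captured = nil
  end

  def start_element(name, _attrs = [])
    @capture = %w[v t].include?(name)
  end

  # A parser may deliver one text node in several pieces. Append each piece
  # untouched: stripping here would drop spaces at chunk boundaries.
  def characters(string)
    return unless @capture

    @captured = @captured ? @captured + string : string
  end

  # Only when the value element closes is the full text stored.
  def end_element(name)
    return unless %w[v t].include?(name)

    @cells << @captured
    @capture = false
    @captured = nil
  end
end

collector = ChunkedCellCollector.new
collector.start_element('t')
['Korntal-M', 'ünchingen'].each { |chunk| collector.characters(chunk) }
collector.end_element('t')
collector.cells # => ["Korntal-Münchingen"]
```

Had each chunk been stripped, pieces such as `'Link A '` and `'and Link B'` would have been glued together without their separating space; appending raw chunks keeps it, in line with the 4.0.1 changelog note about whitespace in inline strings.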
@@ -9,38 +9,39 @@ module SimpleXlsxReader
 
  # Map of non-custom numFmtId to casting symbol
  NumFmtMap = {
- 0 => :string, # General
- 1 => :fixnum, # 0
- 2 => :float, # 0.00
- 3 => :fixnum, # #,##0
- 4 => :float, # #,##0.00
- 5 => :unsupported, # $#,##0_);($#,##0)
- 6 => :unsupported, # $#,##0_);[Red]($#,##0)
- 7 => :unsupported, # $#,##0.00_);($#,##0.00)
- 8 => :unsupported, # $#,##0.00_);[Red]($#,##0.00)
- 9 => :percentage, # 0%
- 10 => :percentage, # 0.00%
- 11 => :bignum, # 0.00E+00
- 12 => :unsupported, # # ?/?
- 13 => :unsupported, # # ??/??
- 14 => :date, # mm-dd-yy
- 15 => :date, # d-mmm-yy
- 16 => :date, # d-mmm
- 17 => :date, # mmm-yy
- 18 => :time, # h:mm AM/PM
- 19 => :time, # h:mm:ss AM/PM
- 20 => :time, # h:mm
- 21 => :time, # h:mm:ss
- 22 => :date_time, # m/d/yy h:mm
- 37 => :unsupported, # #,##0 ;(#,##0)
- 38 => :unsupported, # #,##0 ;[Red](#,##0)
- 39 => :unsupported, # #,##0.00;(#,##0.00)
- 40 => :unsupported, # #,##0.00;[Red](#,##0.00)
- 45 => :time, # mm:ss
- 46 => :time, # [h]:mm:ss
- 47 => :time, # mmss.0
- 48 => :bignum, # ##0.0E+0
- 49 => :unsupported # @
+ 0 => :string, # General
+ 1 => :fixnum, # 0
+ 2 => :float, # 0.00
+ 3 => :fixnum, # #,##0
+ 4 => :float, # #,##0.00
+ 5 => :unsupported, # $#,##0_);($#,##0)
+ 6 => :unsupported, # $#,##0_);[Red]($#,##0)
+ 7 => :unsupported, # $#,##0.00_);($#,##0.00)
+ 8 => :unsupported, # $#,##0.00_);[Red]($#,##0.00)
+ 9 => :percentage, # 0%
+ 10 => :percentage, # 0.00%
+ 11 => :bignum, # 0.00E+00
+ 12 => :unsupported, # # ?/?
+ 13 => :unsupported, # # ??/??
+ 14 => :date, # mm-dd-yy
+ 15 => :date, # d-mmm-yy
+ 16 => :date, # d-mmm
+ 17 => :date, # mmm-yy
+ 18 => :time, # h:mm AM/PM
+ 19 => :time, # h:mm:ss AM/PM
+ 20 => :time, # h:mm
+ 21 => :time, # h:mm:ss
+ 22 => :date_time, # m/d/yy h:mm
+ 37 => :unsupported, # #,##0 ;(#,##0)
+ 38 => :unsupported, # #,##0 ;[Red](#,##0)
+ 39 => :unsupported, # #,##0.00;(#,##0.00)
+ 40 => :unsupported, # #,##0.00;[Red](#,##0.00)
+ 44 => :float, # some odd currency format ?from Office 2007?
+ 45 => :time, # mm:ss
+ 46 => :time, # [h]:mm:ss
+ 47 => :time, # mmss.0
+ 48 => :bignum, # ##0.0E+0
+ 49 => :unsupported # @
  }.freeze
 
  def parse
@@ -1,12 +1,12 @@
  # frozen_string_literal: true
 
  module SimpleXlsxReader
- class Loader < Struct.new(:file_path)
+ class Loader < Struct.new(:string_or_io)
  attr_accessor :shared_strings, :sheet_parsers, :sheet_toc, :style_types, :base_date
 
  def init_sheets
  ZipReader.new(
- file_path: file_path,
+ string_or_io: string_or_io,
  loader: self
  ).read
 
@@ -19,12 +19,12 @@ module SimpleXlsxReader
  end
  end
 
- ZipReader = Struct.new(:file_path, :loader, keyword_init: true) do
+ ZipReader = Struct.new(:string_or_io, :loader, keyword_init: true) do
  attr_reader :zip
 
  def initialize(*args)
  super
- @zip = SimpleXlsxReader::Zip.open(file_path)
+ @zip = SimpleXlsxReader::Zip.open_buffer(string_or_io)
  end
 
  def read
@@ -149,14 +149,20 @@ module SimpleXlsxReader
  # detected earlier and cast here by its standardized symbol
  ##
 
- when :string, :unsupported
+ # no type encoded with the the General format defaults to a number type
+ when nil, :string
+ retval = Integer(value, exception: false)
+ retval ||= Float(value, exception: false)
+ retval ||= value
+ retval
+ when :unsupported
  value
  when :fixnum
  value.to_i
  when :float
  value.to_f
  when :percentage
- value.to_f / 100
+ value.to_f
  # the trickiest. note that all these formats can vary on
  # whether they actually contain a date, time, or datetime.
  when :date, :time, :date_time
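Putting the 3.0.0 and 4.0.0 casting changes side by side, the following is a trimmed, stand-alone version of the new branches: General (or no type) tries `Integer`, then `Float`, then keeps the string; `:percentage` is returned as stored, so 87% reads as 0.87; and the `44` entry added to `NumFmtMap` in this diff casts to `:float`. `NUM_FMT_SUBSET` and `cast_sketch` are hypothetical names, and the map is only a subset of `NumFmtMap`. It relies on the `exception:` keyword of `Kernel#Integer` and `Kernel#Float`, available since Ruby 2.6, the oldest version in the CI matrix.

```ruby
# Hypothetical subset of NumFmtMap (numFmtId => casting symbol).
NUM_FMT_SUBSET = {
  0 => :string,      # General
  1 => :fixnum,      # 0
  2 => :float,       # 0.00
  9 => :percentage,  # 0%
  44 => :float,      # the currency format added in this diff
  49 => :unsupported # @
}.freeze

def cast_sketch(value, num_fmt_id)
  case NUM_FMT_SUBSET[num_fmt_id]
  when nil, :string
    # General: prefer an Integer, then a Float, else keep the raw string.
    Integer(value, exception: false) || Float(value, exception: false) || value
  when :fixnum then value.to_i
  when :float then value.to_f
  when :percentage then value.to_f # already a fraction; no division by 100
  else value # :unsupported
  end
end

cast_sketch('98070', 0)  # => 98070
cast_sketch('1.5', 0)    # => 1.5
cast_sketch('Cell A', 0) # => "Cell A"
cast_sketch('0.87', 9)   # => 0.87
cast_sketch('300', 44)   # => 300.0
```

One side effect of leaning on `Kernel#Integer` in this sketch: it honors Ruby's numeric prefixes, so a General value of `'010'` casts to 8 rather than 10.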
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module SimpleXlsxReader
- VERSION = '2.0.0'
+ VERSION = '5.0.0'
  end
@@ -42,8 +42,11 @@ module SimpleXlsxReader
  end
 
  def open(file_path)
- Document.new(file_path).tap(&:sheets)
+ Document.new(file_path: file_path).tap(&:sheets)
+ end
+
+ def parse(string_or_io)
+ Document.new(string_or_io: string_or_io).tap(&:sheets)
  end
- alias parse open
  end
  end
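With this change, `open` and `parse` stop being aliases: `open(file_path)` passes `file_path:` to `Document.new`, `parse(string_or_io)` passes `string_or_io:`, and the constructor still accepts a positional path for older callers. The stand-alone sketch here reproduces only that argument handling; `resolve_xlsx_source` is a hypothetical name, and it returns the readable object rather than loading any sheets.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical stand-alone copy of Document#initialize's argument handling.
def resolve_xlsx_source(legacy_file_path = nil, file_path: nil, string_or_io: nil)
  if legacy_file_path.nil? && file_path.nil? && string_or_io.nil?
    raise ArgumentError, 'either file_path or string_or_io must be provided'
  end

  # A String or IO is used as-is; otherwise the path is opened as a File.
  string_or_io || File.new(legacy_file_path || file_path)
end

io = StringIO.new('xlsx bytes would go here')
resolve_xlsx_source(string_or_io: io).equal?(io) # => true

Tempfile.create('sketch') do |tmp|
  file = resolve_xlsx_source(file_path: tmp.path)
  file.class # => File
  file.close
end
```

Since `parse` no longer falls back to `open`, a path string passed to `parse` would presumably be treated as workbook bytes rather than a filename; the new tests accordingly hand `parse` the output of `File.binread` or a `StringIO`.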
Binary file
Binary file
Binary file
@@ -70,7 +70,7 @@ describe 'SimpleXlsxReader Benchmark' do
  let(:styles) do
  # s='0' above refers to the value of numFmtId at cellXfs index 0,
  # which is in this case 'General' type
- styles =
+ _styles =
  <<-XML
  <styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <cellXfs count="1">
@@ -18,6 +18,7 @@ describe SimpleXlsxReader do
 
  let(:sesame_street_blog_file_path) { File.join(File.dirname(__FILE__), 'sesame_street_blog.xlsx') }
  let(:sesame_street_blog_io) { File.new(sesame_street_blog_file_path) }
+ let(:sesame_street_blog_string) { IO.read(sesame_street_blog_file_path) }
 
  let(:expected_result) do
  {
@@ -54,6 +55,14 @@ describe SimpleXlsxReader do
  end
  end
 
+ describe 'load from string' do
+ let(:subject) { SimpleXlsxReader.parse(sesame_street_blog_io) }
+
+ it 'reads an xlsx string into a hash of {[sheet name] => [data]}' do
+ _(subject.to_hash).must_equal(expected_result)
+ end
+ end
+
  it 'outputs strings in UTF-8 encoding' do
  document = SimpleXlsxReader.parse(sesame_street_blog_io)
  _(document.sheets[0].rows.to_a.flatten.map(&:encoding).uniq)
@@ -83,7 +92,7 @@ describe SimpleXlsxReader do
  body: 'The Greatest',
  created_at: Time.parse('2002-01-01 11:00:00 UTC'),
  count: 1,
- "URL" => 'http://www.example.com/hyperlink-function'
+ "URL" => 'This uses the HYPERLINK() function'
  )
 
  _(rows.slurped?).must_equal false
@@ -113,6 +122,52 @@ describe SimpleXlsxReader do
 
  let(:reader) { SimpleXlsxReader.open(xlsx.archive.path) }
 
+ describe 'when parsing escaped characters' do
+ let(:escaped_content) do
+ '&lt;a href="https://www.example.com"&gt;Link A&lt;/a&gt; &amp;bull; &lt;a href="https://www.example.com"&gt;Link B&lt;/a&gt;'
+ end
+
+ let(:unescaped_content) do
+ '<a href="https://www.example.com">Link A</a> &bull; <a href="https://www.example.com">Link B</a>'
+ end
+
+ let(:sheet) do
+ <<~XML
+ <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
+ <dimension ref="A1:B1" />
+ <sheetData>
+ <row r="1">
+ <c r="A1" s="1" t="s">
+ <v>0</v>
+ </c>
+ <c r='B1' s='0'>
+ <v>#{escaped_content}</v>
+ </c>
+ </row>
+ </sheetData>
+ </worksheet>
+ XML
+ end
+
+ let(:shared_strings) do
+ <<~XML
+ <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
+ <si>
+ <t>#{escaped_content}</t>
+ </si>
+ </sst>
+ XML
+ end
+
+ it 'loads correctly using inline strings' do
+ _(reader.sheets[0].rows.slurp[0][0]).must_equal(unescaped_content)
+ end
+
+ it 'loads correctly using shared strings' do
+ _(reader.sheets[0].rows.slurp[0][1]).must_equal(unescaped_content)
+ end
+ end
+
  describe 'Sheet#rows#each(headers: true)' do
  let(:sheet) do
  <<~XML
@@ -818,6 +873,10 @@ describe SimpleXlsxReader do
  <c r='I1' s='0'>
  <v>GUI-made hyperlink</v>
  </c>
+
+ <c r='J1' s='0'>
+ <v>1</v>
+ </c>
  </row>
  </sheetData>
 
@@ -916,6 +975,10 @@ describe SimpleXlsxReader do
  )
  )
  end
+
+ it "reads 'Generic' cells with numbers as numbers" do
+ _(@row[9]).must_equal 1
+ end
  end
 
  describe 'parsing documents with blank rows' do
@@ -927,7 +990,7 @@ describe SimpleXlsxReader do
  <sheetData>
  <row r="2" spans="1:1">
  <c r="A2" s="0">
- <v>0</v>
+ <v>a</v>
  </c>
  </row>
  <row r="4" spans="1:1">
@@ -958,13 +1021,107 @@ describe SimpleXlsxReader do
  it 'reads row data despite gaps in row numbering' do
  _(@rows).must_equal [
  [nil, nil, nil, nil],
- ['0', nil, nil, nil],
+ ['a', nil, nil, nil],
  [nil, nil, nil, nil],
- [nil, '1', nil, nil],
- [nil, nil, '2', nil],
+ [nil, 1, nil, nil],
+ [nil, nil, 2, nil],
  [nil, nil, nil, nil],
- [nil, nil, nil, '3']
+ [nil, nil, nil, 3]
+ ]
+ end
+ end
+
+ describe 'parsing documents with non-hyperlinked rels' do
+ let(:rels) do
+ [
+ Nokogiri::XML(
+ <<-XML
+ <?xml version="1.0" encoding="UTF-8"?>
+ <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"></Relationships>
+ XML
+ ).remove_namespaces!
  ]
  end
+
+ describe 'when document is opened as path' do
+ before do
+ @row = SimpleXlsxReader.open(xlsx.archive.path).sheets[0].rows.to_a[0]
+ end
+
+ it 'reads cell content' do
+ _(@row[0]).must_equal 'Cell A'
+ end
+ end
+
+ describe 'when document is parsed as a String' do
+ before do
+ output = File.binread(xlsx.archive.path)
+ @row = SimpleXlsxReader.parse(output).sheets[0].rows.to_a[0]
+ end
+
+ it 'reads cell content' do
+ _(@row[0]).must_equal 'Cell A'
+ end
+ end
+
+ describe 'when document is parsed as StringIO' do
+ before do
+ stream = StringIO.new(File.binread(xlsx.archive.path), 'rb')
+ @row = SimpleXlsxReader.parse(stream).sheets[0].rows.to_a[0]
+ stream.close
+ end
+
+ it 'reads cell content' do
+ _(@row[0]).must_equal 'Cell A'
+ end
+ end
+ end
+
+ # https://support.microsoft.com/en-us/office/available-number-formats-in-excel-0afe8f52-97db-41f1-b972-4b46e9f1e8d2
+ describe 'numeric fields styled as "General"' do
+ let(:misc_numbers_path) do
+ File.join(File.dirname(__FILE__), 'misc_numbers.xlsx')
+ end
+
+ let(:sheet) { SimpleXlsxReader.open(misc_numbers_path).sheets[0] }
+
+ it 'reads medium sized integers as integers' do
+ _(sheet.rows.slurp[1][0]).must_equal 98070
+ end
+
+ it 'reads large (>12 char) integers as integers' do
+ _(sheet.rows.slurp[1][1]).must_equal 1234567890123
+ end
+ end
+
+ describe 'with mysteriously chunky UTF-8 text' do
+ let(:chunky_utf8_path) do
+ File.join(File.dirname(__FILE__), 'chunky_utf8.xlsx')
+ end
+
+ let(:sheet) { SimpleXlsxReader.open(chunky_utf8_path).sheets[0] }
+
+ it 'reads the whole cell text' do
+ _(sheet.rows.slurp[1]).must_equal(
+ ["sample-company-1", "Korntal-Münchingen", "Bronholmer straße"]
+ )
+ end
+ end
+
+ describe 'when using percentages & currencies' do
+ let(:pnc_path) do
+ # This file provided by a GitHub user having parse errors in these fields
+ File.join(File.dirname(__FILE__), 'percentages_n_currencies.xlsx')
+ end
+
+ let(:sheet) { SimpleXlsxReader.open(pnc_path).sheets[0] }
+
+ it 'reads percentages as floats of the form 0.XX' do
+ _(sheet.rows.slurp[1][2]).must_equal(0.87)
+ end
+
+ it 'reads currencies as floats' do
+ _(sheet.rows.slurp[1][4]).must_equal(300.0)
+ end
  end
  end
@@ -57,7 +57,6 @@ TestXlsxBuilder = Struct.new(:shared_strings, :styles, :sheets, :workbook, :rels
  self.styles ||= DEFAULTS[:styles]
  self.sheets ||= [DEFAULTS[:sheet]]
  self.rels ||= []
- self.shared_strings ||= []
  end
 
  def archive
@@ -76,7 +75,7 @@ TestXlsxBuilder = Struct.new(:shared_strings, :styles, :sheets, :workbook, :rels
  styles_file.write(styles)
  end
 
- if shared_strings.any?
+ if shared_strings
  zip.get_output_stream('xl/sharedStrings.xml') do |ss_file|
  ss_file.write(shared_strings)
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: simple_xlsx_reader
  version: !ruby/object:Gem::Version
- version: 2.0.0
+ version: 5.0.0
  platform: ruby
  authors:
  - Woody Peterson
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2022-08-18 00:00:00.000000000 Z
+ date: 2023-06-17 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: nokogiri
@@ -105,6 +105,7 @@ files:
  - lib/simple_xlsx_reader/loader/workbook_parser.rb
  - lib/simple_xlsx_reader/version.rb
  - simple_xlsx_reader.gemspec
+ - test/chunky_utf8.xlsx
  - test/date1904.xlsx
  - test/date1904_test.rb
  - test/datetime_test.rb
@@ -113,6 +114,8 @@ files:
  - test/gdocs_sheet_test.rb
  - test/lower_case_sharedstrings.xlsx
  - test/lower_case_sharedstrings_test.rb
+ - test/misc_numbers.xlsx
+ - test/percentages_n_currencies.xlsx
  - test/performance_test.rb
  - test/sesame_street_blog.xlsx
  - test/shared_strings.xml
@@ -144,6 +147,7 @@ signing_key:
  specification_version: 4
  summary: Read xlsx data the Ruby way
  test_files:
+ - test/chunky_utf8.xlsx
  - test/date1904.xlsx
  - test/date1904_test.rb
  - test/datetime_test.rb
@@ -152,6 +156,8 @@ test_files:
  - test/gdocs_sheet_test.rb
  - test/lower_case_sharedstrings.xlsx
  - test/lower_case_sharedstrings_test.rb
+ - test/misc_numbers.xlsx
+ - test/percentages_n_currencies.xlsx
  - test/performance_test.rb
  - test/sesame_street_blog.xlsx
  - test/shared_strings.xml