simple_xlsx_reader 4.0.1 → 5.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: f5e0de15ab326f027127322eac9d88d752ffd2d55797df23ed525a7eea1d9833
-   data.tar.gz: 415972aaf4f77e4bdb5e60b0095cf01f6d7e24a0cd28d493c8ad8d01fa50d66b
+   metadata.gz: 8552d34f153cbdc6561c40725488d193e9aa48debcded0af24d32daf01b2f951
+   data.tar.gz: 2a0fecdec3698bb16717244fc7bf9b45b4fe0f6b216038e9823f9a5fea2ea8fa
  SHA512:
-   metadata.gz: cfea3adb62767bedfe6470377b54078066b1bf07e13c065160a31c260261f65a658c34c37dc850c01dd27ba7038a4261f1a6ea9c6ff42771e8113dbabc51897b
-   data.tar.gz: a4b996c3d15b2f54a61d8fc90366ba502d6bc108b1bfe5ed34b9d46f63fe7c52381f181421cc31eea46b7c73e6925d3aa3cfeb8e1b737021be93e4863ea9e703
+   metadata.gz: 77f99e8ad1020f0313171dcd0b14f7200fdf116e16de312146eb66a4d9347e94a0bf1cb4483f606975cd8bc776e80995473485271e05ee0a11136ef72cdeeae5
+   data.tar.gz: 7ee3ed8c37df6632981bd6eeb301de5f852df0f66534ce91593923cf1b51aa1dc0b07aed224d5d88cbd4b1f8a6901fdb17164e6e9f22fb10d4e5d90a3c24f437
data/CHANGELOG.md CHANGED
@@ -1,3 +1,10 @@
+ ### 5.0.0
+
+ * Change SimpleXlsxReader::Hyperlink to default to the visible cell value
+   instead of the hyperlink URL, which in the case of mailto hyperlinks is
+   surprising.
+ * Fix blank content when parsing docs from string (@codemole)
+
  ### 4.0.1

  * Fix nil error when handling some inline strings
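Because the string value of a hyperlink cell changes meaning in 5.0.0, code that must run against both 4.x and 5.x may want a small accessor. The following is a minimal sketch, not part of the gem: `link_target`, `LegacyLink`, and `CurrentLink` are hypothetical names, and the two classes only imitate the old and new constructors shown in this diff so the example runs without the gem installed.

```ruby
# Stand-in for the 4.x behaviour: the string value is the URL.
class LegacyLink < String
  attr_reader :friendly_name

  def initialize(url, friendly_name = nil)
    @friendly_name = friendly_name
    super(url)
  end
end

# Stand-in for the 5.0.0 behaviour: the string value is the visible text.
class CurrentLink < String
  attr_reader :friendly_name, :url

  def initialize(url, friendly_name = nil)
    @friendly_name = friendly_name
    @url = url
    super(friendly_name || url)
  end
end

# Hypothetical helper: return the link target whichever behaviour is in play.
# Plain string cells (no #url method) are returned unchanged.
def link_target(cell)
  cell.respond_to?(:url) ? cell.url : cell.to_s
end

old_cell = LegacyLink.new('mailto:jane@example.com', 'jane@example.com')
new_cell = CurrentLink.new('mailto:jane@example.com', 'jane@example.com')

puts link_target(old_cell)
puts link_target(new_cell)
```

Both calls print the `mailto:` target. Duck-typing on `#url` avoids pinning the caller to a gem version, at the cost of treating any object that responds to `#url` as a link.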
data/README.md CHANGED
@@ -9,15 +9,17 @@ then forgotten. We just want to get the data, and get out!

  ## Summary (now with stream parsing):

- doc = SimpleXlsxReader.open('/path/to/workbook.xlsx')
- doc.sheets # => [<#SXR::Sheet>, ...]
- doc.sheets.first.name # 'Sheet1'
- doc.sheets.first.rows # <SXR::Document::RowsProxy>
- doc.sheets.first.rows.each # an <Enumerator> ready to chain or stream
- doc.sheets.first.rows.each {} # Streams the rows to your block
- doc.sheets.first.rows.each(headers: true) {} # Streams row-hashes
- doc.sheets.first.rows.each(headers: {id: /ID/}) {} # finds & maps headers, streams
- doc.sheets.first.rows.slurp # Slurps rows into memory as a 2D array
+ ```ruby
+ doc = SimpleXlsxReader.open('/path/to/workbook.xlsx')
+ doc.sheets # => [<#SXR::Sheet>, ...]
+ doc.sheets.first.name # 'Sheet1'
+ rows = doc.sheets.first.rows # <SXR::Document::RowsProxy>
+ rows.each # an <Enumerator> ready to chain or stream
+ rows.each {} # Streams the rows to your block
+ rows.each(headers: true) {} # Streams row-hashes
+ rows.each(headers: {id: /ID/}) {} # finds & maps headers, streams
+ rows.slurp # Slurps rows into memory as a 2D array
+ ```

  That's the gist of it!
@@ -29,7 +31,8 @@ See also the [Document](https://github.com/woahdae/simple_xlsx_reader/blob/2.0.0

  This project was started years ago, primarily because other Ruby xlsx parsers
  didn't import data with the correct types. Numbers as strings, dates as numbers,
- hyperlinks with inaccessible URLs, or - subtly buggy - simple dates as DateTime
+ [hyperlinks](https://github.com/woahdae/simple_xlsx_reader/blob/master/lib/simple_xlsx_reader/hyperlink.rb)
+ with inaccessible URLs, or - subtly buggy - simple dates as DateTime
  objects. If your app uses a timezone offset, depending on what timezone and
  what time of day you load the xlsx file, your dates might end up a day off!
  SimpleXlsxReader understands all these correctly.
@@ -39,12 +42,14 @@ SimpleXlsxReader understands all these correctly.
  Many Ruby xlsx parsers seem to be inspired more by Excel than Ruby, frankly.
  SimpleXlsxReader strives to be fairly idiomatic Ruby:

- # quick example having fun w/ ruby
- doc = SimpleXlsxReader.open(path_or_io)
- doc.sheets.first.rows.each(headers: {id: /ID/})
-   .with_index.with_object({}) do |(row, index), acc|
-     acc[row[:id]] = index
-   end
+ ```ruby
+ # quick example having fun w/ ruby
+ doc = SimpleXlsxReader.open(path_or_io)
+ doc.sheets.first.rows.each(headers: {id: /ID/})
+   .with_index.with_object({}) do |(row, index), acc|
+     acc[row[:id]] = index
+   end
+ ```

  ### Now faster
@@ -77,15 +82,19 @@ If you had an excel sheet representing this data:

  Get a handle on the rows proxy:

- `rows = SimpleXlsxReader.open('suited_heroes.xlsx').sheets.first.rows`
+ ```ruby
+ rows = SimpleXlsxReader.open('suited_heroes.xlsx').sheets.first.rows
+ ```

  Simple streaming (kinda boring):

- `rows.each { |row| ... }`
+ ```ruby
+ rows.each { |row| ... }
+ ```

  Streaming with headers, and how about a little enumerable chaining:

- ```
+ ```ruby
  # Map of hero names by ID: { 117 => 'John Halo', ... }

  rows.each(headers: true).with_object({}) do |row, acc|
@@ -108,7 +117,7 @@ Sometimes though you have some junk at the top of your spreadsheet:
  For this, `headers` can be a hash whose keys replace headers and whose values
  help find the correct header row:

- ```
+ ```ruby
  # Same map of hero names by ID: { 117 => 'John Halo', ... }

  rows.each(headers: {id: /ID/, name: /Name/}).with_object({}) do |row, acc|
@@ -119,7 +128,7 @@ end
  If your header-to-attribute mapping is more complicated than key/value, you
  can do the mapping elsewhere, but use a block to find the header row:

- ```
+ ```ruby
  # Example roughly analogous to some production code mapping a single spreadsheet
  # across many objects. Might be a simpler way now that we have the headers-hash
  # feature.
@@ -168,9 +177,11 @@ can set `SimpleXlsxReader.configuration.catch_cell_load_errors =
  true`, and load errors will instead be inserted into Sheet#load_errors keyed
  by [rownum, colnum]:

- {
-   [rownum, colnum] => '[error]'
- }
+ ```ruby
+ {
+   [rownum, colnum] => '[error]'
+ }
+ ```

  ### Performance
@@ -233,11 +244,9 @@ This project follows [semantic versioning 1.0](http://semver.org/spec/v1.0.0.htm
  Remember to write tests, think about edge cases, and run the existing
  suite.

- Note that as of commit 665cbafdde, the most extreme end of the
- linear-time performance test, which is 10,000 rows (12 columns), runs in
- ~4 seconds on Ruby 2.1 on a 2012 MBP. If the linear time assertion fails
- or you're way off that, there is probably a performance regression in
- your code.
+ The full suite contains a performance test that on an M1 MBP runs the final
+ large file in about five seconds. Check out that test before & after your
+ change to check for performance changes.

  Then, the standard stuff:
@@ -4,27 +4,26 @@ module SimpleXlsxReader
  # We support hyperlinks as a "type" even though they're technically
  # represented either as a function or an external reference in the xlsx spec.
  #
- # Since having hyperlink data in our sheet usually means we might want to do
- # something primarily with the URL (store it in the database, download it, etc),
- # we go through extra effort to parse the function or follow the reference
- # to represent the hyperlink primarily as a URL. However, maybe we do want
- # the hyperlink "friendly name" part (as MS calls it), so here we've subclassed
- # string to tack on the friendly name. This means 80% of us that just want
- # the URL value will have to do nothing extra, but the 20% that might want the
- # friendly name can access it.
+ # In practice, hyperlinks are usually a link or a mailto. In the case of a
+ # link, we probably want to follow it to download something, but in the case
+ # of an email, we probably just want the email and not the mailto. So we
+ # represent a hyperlink primarily as it is seen by the user, following the
+ # principle of least surprise, but the url is accessible via #url.
  #
- # Note, by default, the value we would get by just asking the cell would
- # be the "friendly name" and *not* the URL, which is tucked away in the
- # function definition or a separate "relationships" meta-document.
+ # Microsoft calls the visible part of a hyperlink cell the "friendly name,"
+ # so we expose that as a method too, in case you want to be explicit about
+ # how you're accessing it.
  #
  # See MS documentation on the HYPERLINK function for some background:
  # https://support.office.com/en-us/article/HYPERLINK-function-333c7ce6-c5ae-4164-9c47-7de9b76f577f
  class Hyperlink < String
    attr_reader :friendly_name
+   attr_reader :url

    def initialize(url, friendly_name = nil)
      @friendly_name = friendly_name
-     super(url)
+     @url = url
+     super(friendly_name || url)
    end
  end
  end
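To make the new constructor's fallback concrete, here is a small sketch that copies the 5.0.0 class body from the hunk above into a hypothetical `Sketch` module (so it runs without the gem) and exercises the two cases: a link with a friendly name, and a link without one. The addresses and URLs are made-up example values.

```ruby
module Sketch
  # Copy of the 5.0.0 class as it appears in the hunk above.
  class Hyperlink < String
    attr_reader :friendly_name
    attr_reader :url

    def initialize(url, friendly_name = nil)
      @friendly_name = friendly_name
      @url = url
      super(friendly_name || url)
    end
  end
end

# With a friendly name, the string value is the visible text.
mail = Sketch::Hyperlink.new('mailto:jane@example.com', 'jane@example.com')

# Without one, the string value falls back to the URL itself.
bare = Sketch::Hyperlink.new('https://example.com/report.xlsx')

puts "Contact: #{mail}"        # interpolation uses the visible text
puts mail.url                  # the mailto: target stays reachable
puts bare                      # same as bare.url in this case
puts bare.friendly_name.inspect
```

Since the class still subclasses `String`, existing comparisons and interpolation keep working; only what they yield has changed, which is why the release is a major version bump.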
@@ -31,10 +31,9 @@ module SimpleXlsxReader
  @url = nil # silence warnings
  @function = nil # silence warnings
  @capture = nil # silence warnings
+ @captured = nil # silence warnings
  @dimension = nil # silence warnings

- @file_io.rewind # in case we've already parsed this once
-
  # In this project this is only used for GUI-made hyperlinks (as opposed
  # to FUNCTION-based hyperlinks). Unfortunately they're needed to parse
  # the spreadsheet, and they come AFTER the sheet data. So, solution is
@@ -44,9 +43,10 @@ module SimpleXlsxReader
    if xrels_file&.grep(/hyperlink/)&.any?
      xrels_file.rewind
      load_gui_hyperlinks # represented as hyperlinks_by_cell
-     @file_io.rewind
    end

+   @file_io.rewind # in case we've already parsed this once
+
    Nokogiri::XML::SAX::Parser.new(self).parse(@file_io)
  end
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module SimpleXlsxReader
-   VERSION = '4.0.1'
+   VERSION = '5.0.0'
  end
@@ -70,7 +70,7 @@ describe 'SimpleXlsxReader Benchmark' do
  let(:styles) do
    # s='0' above refers to the value of numFmtId at cellXfs index 0,
    # which is in this case 'General' type
-   styles =
+   _styles =
      <<-XML
        <styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
          <cellXfs count="1">
@@ -92,7 +92,7 @@ describe SimpleXlsxReader do
    body: 'The Greatest',
    created_at: Time.parse('2002-01-01 11:00:00 UTC'),
    count: 1,
-   "URL" => 'http://www.example.com/hyperlink-function'
+   "URL" => 'This uses the HYPERLINK() function'
  )

  _(rows.slurped?).must_equal false
@@ -1031,6 +1031,52 @@ describe SimpleXlsxReader do
    end
  end

+ describe 'parsing documents with non-hyperlinked rels' do
+   let(:rels) do
+     [
+       Nokogiri::XML(
+         <<-XML
+           <?xml version="1.0" encoding="UTF-8"?>
+           <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"></Relationships>
+         XML
+       ).remove_namespaces!
+     ]
+   end
+
+   describe 'when document is opened as path' do
+     before do
+       @row = SimpleXlsxReader.open(xlsx.archive.path).sheets[0].rows.to_a[0]
+     end
+
+     it 'reads cell content' do
+       _(@row[0]).must_equal 'Cell A'
+     end
+   end
+
+   describe 'when document is parsed as a String' do
+     before do
+       output = File.binread(xlsx.archive.path)
+       @row = SimpleXlsxReader.parse(output).sheets[0].rows.to_a[0]
+     end
+
+     it 'reads cell content' do
+       _(@row[0]).must_equal 'Cell A'
+     end
+   end
+
+   describe 'when document is parsed as StringIO' do
+     before do
+       stream = StringIO.new(File.binread(xlsx.archive.path), 'rb')
+       @row = SimpleXlsxReader.parse(stream).sheets[0].rows.to_a[0]
+       stream.close
+     end
+
+     it 'reads cell content' do
+       _(@row[0]).must_equal 'Cell A'
+     end
+   end
+ end
+
  # https://support.microsoft.com/en-us/office/available-number-formats-in-excel-0afe8f52-97db-41f1-b972-4b46e9f1e8d2
  describe 'numeric fields styled as "General"' do
    let(:misc_numbers_path) do
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: simple_xlsx_reader
  version: !ruby/object:Gem::Version
-   version: 4.0.1
+   version: 5.0.0
  platform: ruby
  authors:
  - Woody Peterson
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2023-03-06 00:00:00.000000000 Z
+ date: 2023-06-17 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: nokogiri