simple_xlsx_reader 4.0.1 → 5.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: f5e0de15ab326f027127322eac9d88d752ffd2d55797df23ed525a7eea1d9833
- data.tar.gz: 415972aaf4f77e4bdb5e60b0095cf01f6d7e24a0cd28d493c8ad8d01fa50d66b
+ metadata.gz: 8552d34f153cbdc6561c40725488d193e9aa48debcded0af24d32daf01b2f951
+ data.tar.gz: 2a0fecdec3698bb16717244fc7bf9b45b4fe0f6b216038e9823f9a5fea2ea8fa
  SHA512:
- metadata.gz: cfea3adb62767bedfe6470377b54078066b1bf07e13c065160a31c260261f65a658c34c37dc850c01dd27ba7038a4261f1a6ea9c6ff42771e8113dbabc51897b
- data.tar.gz: a4b996c3d15b2f54a61d8fc90366ba502d6bc108b1bfe5ed34b9d46f63fe7c52381f181421cc31eea46b7c73e6925d3aa3cfeb8e1b737021be93e4863ea9e703
+ metadata.gz: 77f99e8ad1020f0313171dcd0b14f7200fdf116e16de312146eb66a4d9347e94a0bf1cb4483f606975cd8bc776e80995473485271e05ee0a11136ef72cdeeae5
+ data.tar.gz: 7ee3ed8c37df6632981bd6eeb301de5f852df0f66534ce91593923cf1b51aa1dc0b07aed224d5d88cbd4b1f8a6901fdb17164e6e9f22fb10d4e5d90a3c24f437
data/CHANGELOG.md CHANGED
@@ -1,3 +1,10 @@
+ ### 5.0.0
+
+ * Change SimpleXlsxReader::Hyperlink to default to the visible cell value
+ instead of the hyperlink URL, which in the case of mailto hyperlinks is
+ surprising.
+ * Fix blank content when parsing docs from string (@codemole)
+
  ### 4.0.1
  
  * Fix nil error when handling some inline strings
data/README.md CHANGED
@@ -9,15 +9,17 @@ then forgotten. We just want to get the data, and get out!
  
  ## Summary (now with stream parsing):
  
- doc = SimpleXlsxReader.open('/path/to/workbook.xlsx')
- doc.sheets # => [<#SXR::Sheet>, ...]
- doc.sheets.first.name # 'Sheet1'
- doc.sheets.first.rows # <SXR::Document::RowsProxy>
- doc.sheets.first.rows.each # an <Enumerator> ready to chain or stream
- doc.sheets.first.rows.each {} # Streams the rows to your block
- doc.sheets.first.rows.each(headers: true) {} # Streams row-hashes
- doc.sheets.first.rows.each(headers: {id: /ID/}) {} # finds & maps headers, streams
- doc.sheets.first.rows.slurp # Slurps rows into memory as a 2D array
+ ```ruby
+ doc = SimpleXlsxReader.open('/path/to/workbook.xlsx')
+ doc.sheets # => [<#SXR::Sheet>, ...]
+ doc.sheets.first.name # 'Sheet1'
+ rows = doc.sheets.first.rows # <SXR::Document::RowsProxy>
+ rows.each # an <Enumerator> ready to chain or stream
+ rows.each {} # Streams the rows to your block
+ rows.each(headers: true) {} # Streams row-hashes
+ rows.each(headers: {id: /ID/}) {} # finds & maps headers, streams
+ rows.slurp # Slurps rows into memory as a 2D array
+ ```
  
  That's the gist of it!
  
@@ -29,7 +31,8 @@ See also the [Document](https://github.com/woahdae/simple_xlsx_reader/blob/2.0.0
  
  This project was started years ago, primarily because other Ruby xlsx parsers
  didn't import data with the correct types. Numbers as strings, dates as numbers,
- hyperlinks with inaccessible URLs, or - subtly buggy - simple dates as DateTime
+ [hyperlinks](https://github.com/woahdae/simple_xlsx_reader/blob/master/lib/simple_xlsx_reader/hyperlink.rb)
+ with inaccessible URLs, or - subtly buggy - simple dates as DateTime
  objects. If your app uses a timezone offset, depending on what timezone and
  what time of day you load the xlsx file, your dates might end up a day off!
  SimpleXlsxReader understands all these correctly.
@@ -39,12 +42,14 @@ SimpleXlsxReader understands all these correctly.
  Many Ruby xlsx parsers seem to be inspired more by Excel than Ruby, frankly.
  SimpleXlsxReader strives to be fairly idiomatic Ruby:
  
- # quick example having fun w/ ruby
- doc = SimpleXlsxReader.open(path_or_io)
- doc.sheets.first.rows.each(headers: {id: /ID/})
- .with_index.with_object({}) do |(row, index), acc|
- acc[row[:id]] = index
- end
+ ```ruby
+ # quick example having fun w/ ruby
+ doc = SimpleXlsxReader.open(path_or_io)
+ doc.sheets.first.rows.each(headers: {id: /ID/})
+   .with_index.with_object({}) do |(row, index), acc|
+     acc[row[:id]] = index
+   end
+ ```
  
  ### Now faster
  
@@ -77,15 +82,19 @@ If you had an excel sheet representing this data:
  
  Get a handle on the rows proxy:
  
- `rows = SimpleXlsxReader.open('suited_heroes.xlsx').sheets.first.rows`
+ ```ruby
+ rows = SimpleXlsxReader.open('suited_heroes.xlsx').sheets.first.rows
+ ```
  
  Simple streaming (kinda boring):
  
- `rows.each { |row| ... }`
+ ```ruby
+ rows.each { |row| ... }
+ ```
  
  Streaming with headers, and how about a little enumerable chaining:
  
- ```
+ ```ruby
  # Map of hero names by ID: { 117 => 'John Halo', ... }
  
  rows.each(headers: true).with_object({}) do |row, acc|
@@ -108,7 +117,7 @@ Sometimes though you have some junk at the top of your spreadsheet:
  For this, `headers` can be a hash whose keys replace headers and whose values
  help find the correct header row:
  
- ```
+ ```ruby
  # Same map of hero names by ID: { 117 => 'John Halo', ... }
  
  rows.each(headers: {id: /ID/, name: /Name/}).with_object({}) do |row, acc|
@@ -119,7 +128,7 @@ end
  If your header-to-attribute mapping is more complicated than key/value, you
  can do the mapping elsewhere, but use a block to find the header row:
  
- ```
+ ```ruby
  # Example roughly analogous to some production code mapping a single spreadsheet
  # across many objects. Might be a simpler way now that we have the headers-hash
  # feature.
@@ -168,9 +177,11 @@ can set `SimpleXlsxReader.configuration.catch_cell_load_errors =
  true`, and load errors will instead be inserted into Sheet#load_errors keyed
  by [rownum, colnum]:
  
- {
- [rownum, colnum] => '[error]'
- }
+ ```ruby
+ {
+   [rownum, colnum] => '[error]'
+ }
+ ```
  
  ### Performance
  
@@ -233,11 +244,9 @@ This project follows [semantic versioning 1.0](http://semver.org/spec/v1.0.0.htm
  Remember to write tests, think about edge cases, and run the existing
  suite.
  
- Note that as of commit 665cbafdde, the most extreme end of the
- linear-time performance test, which is 10,000 rows (12 columns), runs in
- ~4 seconds on Ruby 2.1 on a 2012 MBP. If the linear time assertion fails
- or you're way off that, there is probably a performance regression in
- your code.
+ The full suite contains a performance test that on an M1 MBP runs the final
+ large file in about five seconds. Check out that test before & after your
+ change to check for performance changes.
  
  Then, the standard stuff:
  
@@ -4,27 +4,26 @@ module SimpleXlsxReader
    # We support hyperlinks as a "type" even though they're technically
    # represented either as a function or an external reference in the xlsx spec.
    #
-   # Since having hyperlink data in our sheet usually means we might want to do
-   # something primarily with the URL (store it in the database, download it, etc),
-   # we go through extra effort to parse the function or follow the reference
-   # to represent the hyperlink primarily as a URL. However, maybe we do want
-   # the hyperlink "friendly name" part (as MS calls it), so here we've subclassed
-   # string to tack on the friendly name. This means 80% of us that just want
-   # the URL value will have to do nothing extra, but the 20% that might want the
-   # friendly name can access it.
+   # In practice, hyperlinks are usually a link or a mailto. In the case of a
+   # link, we probably want to follow it to download something, but in the case
+   # of an email, we probably just want the email and not the mailto. So we
+   # represent a hyperlink primarily as it is seen by the user, following the
+   # principle of least surprise, but the url is accessible via #url.
    #
-   # Note, by default, the value we would get by just asking the cell would
-   # be the "friendly name" and *not* the URL, which is tucked away in the
-   # function definition or a separate "relationships" meta-document.
+   # Microsoft calls the visible part of a hyperlink cell the "friendly name,"
+   # so we expose that as a method too, in case you want to be explicit about
+   # how you're accessing it.
    #
    # See MS documentation on the HYPERLINK function for some background:
    # https://support.office.com/en-us/article/HYPERLINK-function-333c7ce6-c5ae-4164-9c47-7de9b76f577f
    class Hyperlink < String
      attr_reader :friendly_name
+     attr_reader :url
  
      def initialize(url, friendly_name = nil)
        @friendly_name = friendly_name
-       super(url)
+       @url = url
+       super(friendly_name || url)
      end
    end
  end
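
To make the behavior change in this hunk concrete, here is a minimal standalone sketch. `HyperlinkSketch` is a hypothetical local copy of the class as it appears in the diff, declared only so the snippet runs without the gem; the example addresses are made up for illustration.

```ruby
# Hypothetical local copy of the 5.0.0 class body shown in the hunk above.
# It is NOT loaded from the gem; it only mirrors the diff for illustration.
class HyperlinkSketch < String
  attr_reader :friendly_name
  attr_reader :url

  def initialize(url, friendly_name = nil)
    @friendly_name = friendly_name
    @url = url
    # The string value is the visible text, falling back to the URL.
    super(friendly_name || url)
  end
end

# A mailto link whose visible text is the plain address (made-up data).
link = HyperlinkSketch.new('mailto:jane@example.com', 'jane@example.com')
link == 'jane@example.com' # => true
link.url                   # => "mailto:jane@example.com"

# With no friendly name, the string value falls back to the URL.
bare = HyperlinkSketch.new('http://www.example.com')
bare == 'http://www.example.com' # => true
```

Under this reading of the diff, code that relied on the 4.x string value being the URL would call `#url` instead, which matches the updated test expectation further down.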
@@ -31,10 +31,9 @@ module SimpleXlsxReader
  @url = nil # silence warnings
  @function = nil # silence warnings
  @capture = nil # silence warnings
+ @captured = nil # silence warnings
  @dimension = nil # silence warnings
  
- @file_io.rewind # in case we've already parsed this once
-
  # In this project this is only used for GUI-made hyperlinks (as opposed
  # to FUNCTION-based hyperlinks). Unfortunately they're needed to parse
  # the spreadsheet, and they come AFTER the sheet data. So, solution is
  # the spreadsheet, and they come AFTER the sheet data. So, solution is
@@ -44,9 +43,10 @@ module SimpleXlsxReader
  if xrels_file&.grep(/hyperlink/)&.any?
    xrels_file.rewind
    load_gui_hyperlinks # represented as hyperlinks_by_cell
-   @file_io.rewind
  end
  
+ @file_io.rewind # in case we've already parsed this once
+
  Nokogiri::XML::SAX::Parser.new(self).parse(@file_io)
  end
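
The hunk above makes `@file_io.rewind` run unconditionally right before the SAX parse, instead of only inside the hyperlink branch. The following is a generic sketch of why an IO's position matters before a second read, using only `StringIO` from the standard library. `read_twice` is a hypothetical helper and the XML string is made up; neither is part of the gem, and the gem's actual code path may differ.

```ruby
require 'stringio'

# Hypothetical helper: reads an IO twice, optionally rewinding in between,
# and returns both results so the effect of the rewind is visible.
def read_twice(io, rewind:)
  first = io.read
  io.rewind if rewind
  second = io.read
  [first, second]
end

xml = '<sheetData><row/></sheetData>' # made-up content

# Without a rewind the second read starts at EOF and yields an empty string,
# which would look like blank content to a parser.
without_rewind = read_twice(StringIO.new(xml), rewind: false)

# With a rewind the second read sees the full content again.
with_rewind = read_twice(StringIO.new(xml), rewind: true)
```

This is consistent with the changelog entry about blank content when parsing docs from a string, though the sketch does not reproduce the gem's internals.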
 
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
  
  module SimpleXlsxReader
-   VERSION = '4.0.1'
+   VERSION = '5.0.0'
  end
@@ -70,7 +70,7 @@ describe 'SimpleXlsxReader Benchmark' do
  let(:styles) do
    # s='0' above refers to the value of numFmtId at cellXfs index 0,
    # which is in this case 'General' type
-   styles =
+   _styles =
      <<-XML
        <styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
          <cellXfs count="1">
@@ -92,7 +92,7 @@ describe SimpleXlsxReader do
  body: 'The Greatest',
  created_at: Time.parse('2002-01-01 11:00:00 UTC'),
  count: 1,
- "URL" => 'http://www.example.com/hyperlink-function'
+ "URL" => 'This uses the HYPERLINK() function'
  )
  
  _(rows.slurped?).must_equal false
@@ -1031,6 +1031,52 @@ describe SimpleXlsxReader do
    end
  end
  
+ describe 'parsing documents with non-hyperlinked rels' do
+   let(:rels) do
+     [
+       Nokogiri::XML(
+         <<-XML
+           <?xml version="1.0" encoding="UTF-8"?>
+           <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"></Relationships>
+         XML
+       ).remove_namespaces!
+     ]
+   end
+
+   describe 'when document is opened as path' do
+     before do
+       @row = SimpleXlsxReader.open(xlsx.archive.path).sheets[0].rows.to_a[0]
+     end
+
+     it 'reads cell content' do
+       _(@row[0]).must_equal 'Cell A'
+     end
+   end
+
+   describe 'when document is parsed as a String' do
+     before do
+       output = File.binread(xlsx.archive.path)
+       @row = SimpleXlsxReader.parse(output).sheets[0].rows.to_a[0]
+     end
+
+     it 'reads cell content' do
+       _(@row[0]).must_equal 'Cell A'
+     end
+   end
+
+   describe 'when document is parsed as StringIO' do
+     before do
+       stream = StringIO.new(File.binread(xlsx.archive.path), 'rb')
+       @row = SimpleXlsxReader.parse(stream).sheets[0].rows.to_a[0]
+       stream.close
+     end
+
+     it 'reads cell content' do
+       _(@row[0]).must_equal 'Cell A'
+     end
+   end
+ end
+
  # https://support.microsoft.com/en-us/office/available-number-formats-in-excel-0afe8f52-97db-41f1-b972-4b46e9f1e8d2
  describe 'numeric fields styled as "General"' do
    let(:misc_numbers_path) do
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: simple_xlsx_reader
  version: !ruby/object:Gem::Version
- version: 4.0.1
+ version: 5.0.0
  platform: ruby
  authors:
  - Woody Peterson
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2023-03-06 00:00:00.000000000 Z
+ date: 2023-06-17 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: nokogiri