RubyGems - simple_xlsx_reader - Versions diffs - 4.0.1 → 5.0.0 - Mend

simple_xlsx_reader 4.0.1 → 5.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

checksums.yaml +4 -4
data/CHANGELOG.md +7 -0
data/README.md +38 -29
data/lib/simple_xlsx_reader/hyperlink.rb +11 -12
data/lib/simple_xlsx_reader/loader/sheet_parser.rb +3 -3
data/lib/simple_xlsx_reader/version.rb +1 -1
data/test/performance_test.rb +1 -1
data/test/simple_xlsx_reader_test.rb +47 -1
metadata +2 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: f5e0de15ab326f027127322eac9d88d752ffd2d55797df23ed525a7eea1d9833
-  data.tar.gz: 415972aaf4f77e4bdb5e60b0095cf01f6d7e24a0cd28d493c8ad8d01fa50d66b
+  metadata.gz: 8552d34f153cbdc6561c40725488d193e9aa48debcded0af24d32daf01b2f951
+  data.tar.gz: 2a0fecdec3698bb16717244fc7bf9b45b4fe0f6b216038e9823f9a5fea2ea8fa
 SHA512:
-  metadata.gz: cfea3adb62767bedfe6470377b54078066b1bf07e13c065160a31c260261f65a658c34c37dc850c01dd27ba7038a4261f1a6ea9c6ff42771e8113dbabc51897b
-  data.tar.gz: a4b996c3d15b2f54a61d8fc90366ba502d6bc108b1bfe5ed34b9d46f63fe7c52381f181421cc31eea46b7c73e6925d3aa3cfeb8e1b737021be93e4863ea9e703
+  metadata.gz: 77f99e8ad1020f0313171dcd0b14f7200fdf116e16de312146eb66a4d9347e94a0bf1cb4483f606975cd8bc776e80995473485271e05ee0a11136ef72cdeeae5
+  data.tar.gz: 7ee3ed8c37df6632981bd6eeb301de5f852df0f66534ce91593923cf1b51aa1dc0b07aed224d5d88cbd4b1f8a6901fdb17164e6e9f22fb10d4e5d90a3c24f437
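checksums.yaml records SHA256 and SHA512 digests of the two archives packed inside the .gem (metadata.gz and data.tar.gz), so all four values change with every release. The sketch below shows how such a digest could be checked once those archive bytes are in hand. `digest_matches?` is a hypothetical helper, not a RubyGems API, and extracting the archives is left to you.

```ruby
require 'digest'

# Hypothetical helper (not part of RubyGems): compare raw archive bytes with a
# hex digest copied from checksums.yaml. `algo` selects the SHA256 or SHA512
# section of that file.
def digest_matches?(bytes, expected_hex, algo: :sha256)
  digest = { sha256: Digest::SHA256, sha512: Digest::SHA512 }.fetch(algo)
  digest.hexdigest(bytes) == expected_hex.strip.downcase
end
```

A .gem file is a plain tar archive, so for example you could run `tar -xf simple_xlsx_reader-5.0.0.gem` and then call `digest_matches?(File.binread('data.tar.gz'), expected_hex, algo: :sha512)`.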

data/CHANGELOG.md CHANGED

@@ -1,3 +1,10 @@
+### 5.0.0
+* Change SimpleXlsxReader::Hyperlink to default to the visible cell value
+  instead of the hyperlink URL, which in the case of mailto hyperlinks is
+  surprising.
+* Fix blank content when parsing docs from string (@codemole)
 ### 4.0.1
 * Fix nil error when handling some inline strings

data/README.md CHANGED

@@ -9,15 +9,17 @@ then forgotten. We just want to get the data, and get out!
 ## Summary (now with stream parsing):
-    doc = SimpleXlsxReader.open('/path/to/workbook.xlsx')
-    doc.sheets # => [<#SXR::Sheet>, ...]
-    doc.sheets.first.name # 'Sheet1'
-    doc.sheets.first.rows # <SXR::Document::RowsProxy>
-    doc.sheets.first.rows.each # an <Enumerator> ready to chain or stream
-    doc.sheets.first.rows.each {} # Streams the rows to your block
-    doc.sheets.first.rows.each(headers: true) {} # Streams row-hashes
-    doc.sheets.first.rows.each(headers: {id: /ID/}) {} # finds & maps headers, streams
-    doc.sheets.first.rows.slurp # Slurps rows into memory as a 2D array
+```ruby
+doc = SimpleXlsxReader.open('/path/to/workbook.xlsx')
+doc.sheets # => [<#SXR::Sheet>, ...]
+doc.sheets.first.name # 'Sheet1'
+rows = doc.sheets.first.rows # <SXR::Document::RowsProxy>
+rows.each # an <Enumerator> ready to chain or stream
+rows.each {} # Streams the rows to your block
+rows.each(headers: true) {} # Streams row-hashes
+rows.each(headers: {id: /ID/}) {} # finds & maps headers, streams
+rows.slurp # Slurps rows into memory as a 2D array
+```
 That's the gist of it!
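The summary says `rows.each(headers: true)` streams "row-hashes." The plain-Ruby sketch below approximates that shape. It assumes the first row supplies the keys and each later row is zipped against it. `each_row_hash` and the sample rows are hypothetical and are not the gem's implementation.

```ruby
# Hypothetical stand-in for rows.each(headers: true): the first row becomes the
# keys, and each remaining row is yielded lazily as a header => value Hash.
def each_row_hash(rows)
  headers, *body = rows
  Enumerator.new do |yielder|
    body.each { |row| yielder << headers.zip(row).to_h }
  end
end

# Invented sample data shaped like a slurped 2D array.
rows = [
  ['ID', 'Name'],
  [117, 'John Halo']
]

each_row_hash(rows).each { |row| p row } # prints the row as a Hash
each_row_hash(rows).to_a                  # => [{ "ID" => 117, "Name" => "John Halo" }]
```

Returning an Enumerator keeps the "ready to chain or stream" property. Nothing is built until the caller iterates.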
@@ -29,7 +31,8 @@ See also the [Document](https://github.com/woahdae/simple_xlsx_reader/blob/2.0.0
 This project was started years ago, primarily because other Ruby xlsx parsers
 didn't import data with the correct types. Numbers as strings, dates as numbers,
-hyperlinks with inaccessible URLs, or - subtly buggy - simple dates as DateTime
+[hyperlinks](https://github.com/woahdae/simple_xlsx_reader/blob/master/lib/simple_xlsx_reader/hyperlink.rb)
+with inaccessible URLs, or - subtly buggy - simple dates as DateTime
 objects. If your app uses a timezone offset, depending on what timezone and
 what time of day you load the xlsx file, your dates might end up a day off!
 SimpleXlsxReader understands all these correctly.
@@ -39,12 +42,14 @@ SimpleXlsxReader understands all these correctly.
 Many Ruby xlsx parsers seem to be inspired more by Excel than Ruby, frankly.
 SimpleXlsxReader strives to be fairly idiomatic Ruby:
-    # quick example having fun w/ ruby
-    doc = SimpleXlsxReader.open(path_or_io)
-    doc.sheets.first.rows.each(headers: {id: /ID/})
-      .with_index.with_object({}) do |(row, index), acc|
-        acc[row[:id]] = index
-      end
+```ruby
+# quick example having fun w/ ruby
+doc = SimpleXlsxReader.open(path_or_io)
+doc.sheets.first.rows.each(headers: {id: /ID/})
+  .with_index.with_object({}) do |(row, index), acc|
+    acc[row[:id]] = index
+end
+```
 ### Now faster
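The "quick example having fun w/ ruby" above relies only on standard Enumerator chaining. Here is the same chain with the gem removed. The array of hashes is an invented stand-in for rows yielded by `each(headers: {id: /ID/})`.

```ruby
# Invented stand-in for header-mapped rows keyed by :id.
rows = [{ id: 117 }, { id: 204 }, { id: 311 }]

# `each` without a block returns an Enumerator; `with_index` pairs each row with
# its position; `with_object` threads one accumulator Hash through the loop and
# returns it when iteration finishes.
index_by_id = rows.each.with_index.with_object({}) do |(row, index), acc|
  acc[row[:id]] = index
end

index_by_id # => { 117 => 0, 204 => 1, 311 => 2 }
```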
@@ -77,15 +82,19 @@ If you had an excel sheet representing this data:
 Get a handle on the rows proxy:
-`rows = SimpleXlsxReader.open('suited_heroes.xlsx').sheets.first.rows`
+```ruby
+rows = SimpleXlsxReader.open('suited_heroes.xlsx').sheets.first.rows
+```
 Simple streaming (kinda boring):
-`rows.each { |row| ... }`
+```ruby
+rows.each { |row| ... }
+````
 Streaming with headers, and how about a little enumerable chaining:
-```
+```ruby
 # Map of hero names by ID: { 117 => 'John Halo', ... }
 rows.each(headers: true).with_object({}) do |row, acc|
@@ -108,7 +117,7 @@ Sometimes though you have some junk at the top of your spreadsheet:
 For this, `headers` can be a hash whose keys replace headers and whose values
 help find the correct header row:
-```
+```ruby
 # Same map of hero names by ID: { 117 => 'John Halo', ... }
 rows.each(headers: {id: /ID/, name: /Name/}).with_object({}) do |row, acc|
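The headers-hash form in the hunk above does two jobs. Its regex values locate the header row beneath any junk rows, and its keys rename the matched columns. The plain-Ruby sketch below illustrates that idea. It assumes a row counts as the header row once every regex matches some cell in it. Both helpers and the sample rows are hypothetical and do not reproduce the gem's code.

```ruby
# Hypothetical: return [header_row_index, { key => column_index }] for the
# first row in which every pattern matches some cell, or nil if none does.
def locate_headers(rows, matchers)
  rows.each_with_index do |row, row_index|
    columns = matchers.transform_values do |pattern|
      row.index { |cell| cell.to_s.match?(pattern) }
    end
    return [row_index, columns] if columns.values.none?(&:nil?)
  end
  nil
end

# Hypothetical: map every row below the header row to a Hash keyed by the
# matcher keys (e.g. :id, :name) instead of the raw header text.
def map_rows_by_headers(rows, matchers)
  found = locate_headers(rows, matchers)
  return [] if found.nil?

  header_index, columns = found
  rows.drop(header_index + 1).map do |row|
    columns.transform_values { |col| row[col] }
  end
end

rows = [
  ['Quarterly export', nil], # invented junk above the real header
  [nil, nil],
  ['Hero ID', 'Hero Name'],
  [117, 'John Halo']
]
map_rows_by_headers(rows, { id: /ID/, name: /Name/ })
# => [{ id: 117, name: "John Halo" }]
```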
@@ -119,7 +128,7 @@ end
 If your header-to-attribute mapping is more complicated than key/value, you
 can do the mapping elsewhere, but use a block to find the header row:
-```
+```ruby
 # Example roughly analogous to some production code mapping a single spreadsheet
 # across many objects. Might be a simpler way now that we have the headers-hash
 # feature.
@@ -168,9 +177,11 @@ can set `SimpleXlsxReader.configuration.catch_cell_load_errors =
 true`, and load errors will instead be inserted into Sheet#load_errors keyed
 by [rownum, colnum]:
-    {
-      [rownum, colnum] => '[error]'
-    }
+```ruby
+{
+  [rownum, colnum] => '[error]'
+}
+```
 ### Performance
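The `catch_cell_load_errors` option described in the hunk above collects per-cell failures instead of raising them. The plain-Ruby sketch below imitates that pattern. Only the `[rownum, colnum] => message` shape comes from the README. `load_cell` is a hypothetical converter standing in for the gem's type casting, and leaving failed cells as `nil` is a choice of this sketch. The `rescue` inside a `do ... end` block requires Ruby 2.6 or later.

```ruby
# Hypothetical cell converter: raises on values it cannot cast.
def load_cell(raw)
  Integer(raw)
end

# Convert every cell, recording failures keyed by [rownum, colnum] rather than
# aborting the whole sheet. Failed cells are left as nil in this sketch.
def load_rows(raw_rows)
  load_errors = {}
  rows = raw_rows.each_with_index.map do |row, rownum|
    row.each_with_index.map do |raw, colnum|
      load_cell(raw)
    rescue ArgumentError, TypeError => e
      load_errors[[rownum, colnum]] = e.message
      nil
    end
  end
  [rows, load_errors]
end

rows, load_errors = load_rows([%w[1 2], %w[3 oops]])
rows             # => [[1, 2], [3, nil]]
load_errors.keys # => [[1, 1]]
```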
@@ -233,11 +244,9 @@ This project follows [semantic versioning 1.0](http://semver.org/spec/v1.0.0.htm
 Remember to write tests, think about edge cases, and run the existing
 suite.
-Note that as of commit 665cbafdde, the most extreme end of the
-linear-time performance test, which is 10,000 rows (12 columns), runs in
-~4 seconds on Ruby 2.1 on a 2012 MBP. If the linear time assertion fails
-or you're way off that, there is probably a performance regression in
-your code.
+The full suite contains a performance test that on an M1 MBP runs the final
+large file in about five seconds. Check out that test before & after your
+change to check for performance changes.
 Then, the standard stuff:

data/lib/simple_xlsx_reader/hyperlink.rb CHANGED

@@ -4,27 +4,26 @@ module SimpleXlsxReader
   # We support hyperlinks as a "type" even though they're technically
   # represented either as a function or an external reference in the xlsx spec.
   #
-  # Since having hyperlink data in our sheet usually means we might want to do
-  # something primarily with the URL (store it in the database, download it, etc),
-  # we go through extra effort to parse the function or follow the reference
-  # to represent the hyperlink primarily as a URL. However, maybe we do want
-  # the hyperlink "friendly name" part (as MS calls it), so here we've subclassed
-  # string to tack on the friendly name. This means 80% of us that just want
-  # the URL value will have to do nothing extra, but the 20% that might want the
-  # friendly name can access it.
+  # In practice, hyperlinks are usually a link or a mailto. In the case of a
+  # link, we probably want to follow it to download something, but in the case
+  # of an email, we probably just want the email and not the mailto. So we
+  # represent a hyperlink primarily as it is seen by the user, following the
+  # principle of least surprise, but the url is accessible via #url.
   #
-  # Note, by default, the value we would get by just asking the cell would
-  # be the "friendly name" and *not* the URL, which is tucked away in the
-  # function definition or a separate "relationships" meta-document.
+  # Microsoft calls the visible part of a hyperlink cell the "friendly name,"
+  # so we expose that as a method too, in case you want to be explicit about
+  # how you're accessing it.
   #
   # See MS documentation on the HYPERLINK function for some background:
   # https://support.office.com/en-us/article/HYPERLINK-function-333c7ce6-c5ae-4164-9c47-7de9b76f577f
   class Hyperlink < String
     attr_reader :friendly_name
+    attr_reader :url
     def initialize(url, friendly_name = nil)
       @friendly_name = friendly_name
-      super(url)
+      @url = url
+      super(friendly_name || url)
     end
   end
 end
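The behavioural change is easiest to see by exercising the class on its own. The snippet below copies the 5.0.0 `Hyperlink` definition from the hunk above, dropping only the `SimpleXlsxReader` module wrapper. It then compares the string value with `#url` for a mailto link and for a bare URL. The email address is made up.

```ruby
# Copy of the 5.0.0 SimpleXlsxReader::Hyperlink from the diff, minus the module.
class Hyperlink < String
  attr_reader :friendly_name
  attr_reader :url

  def initialize(url, friendly_name = nil)
    @friendly_name = friendly_name
    @url = url
    super(friendly_name || url)
  end
end

mail = Hyperlink.new('mailto:someone@example.com', 'someone@example.com')
mail     # => "someone@example.com" (4.0.1 returned the mailto: URL here)
mail.url # => "mailto:someone@example.com"

bare = Hyperlink.new('http://www.example.com')
bare               # => "http://www.example.com" (no friendly name, so the URL is used)
bare.friendly_name # => nil
```

Because the class still subclasses `String`, code that only compares or prints cell values keeps working. However, anything that relied on the cell value *being* the URL must switch to `#url`, which is why this is a major version bump.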

data/lib/simple_xlsx_reader/loader/sheet_parser.rb CHANGED

@@ -31,10 +31,9 @@ module SimpleXlsxReader
         @url = nil # silence warnings
         @function = nil # silence warnings
         @capture = nil # silence warnings
+        @captured = nil # silence warnings
         @dimension = nil # silence warnings
-        @file_io.rewind # in case we've already parsed this once
         # In this project this is only used for GUI-made hyperlinks (as opposed
         # to FUNCTION-based hyperlinks). Unfortunately they're needed to parse
         # the spreadsheet, and they come AFTER the sheet data. So, solution is
@@ -44,9 +43,10 @@ module SimpleXlsxReader
         if xrels_file&.grep(/hyperlink/)&.any?
           xrels_file.rewind
           load_gui_hyperlinks # represented as hyperlinks_by_cell
-          @file_io.rewind
         end
+        @file_io.rewind # in case we've already parsed this once
         Nokogiri::XML::SAX::Parser.new(self).parse(@file_io)
       end
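In 4.0.1, `@file_io.rewind` ran at the top of the method and then again only inside the hyperlink branch. In 5.0.0 it runs unconditionally, immediately before `Nokogiri::XML::SAX::Parser#parse`. The CHANGELOG ties this change to blank content when parsing from a string, but this diff does not show which earlier read left the stream at its end. The standalone StringIO sketch below (no Nokogiri involved) reproduces only the underlying IO behaviour: a consumed stream reads as empty until it is rewound.

```ruby
require 'stringio'

io = StringIO.new('<sheetData><row r="1"/></sheetData>')

first_pass  = io.read # consumes the stream; position is now at EOF
second_pass = io.read # nothing left to read, so this returns ""

io.rewind             # reset position to 0, as the parser now does right before parsing
third_pass  = io.read # full content again
```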

data/lib/simple_xlsx_reader/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module SimpleXlsxReader
-  VERSION = '4.0.1'
+  VERSION = '5.0.0'
 end

data/test/performance_test.rb CHANGED

@@ -70,7 +70,7 @@ describe 'SimpleXlsxReader Benchmark' do
   let(:styles) do
     # s='0' above refers to the value of numFmtId at cellXfs index 0,
     # which is in this case 'General' type
-    styles =
+    _styles =
       <<-XML
         <styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
           <cellXfs count="1">

data/test/simple_xlsx_reader_test.rb CHANGED

@@ -92,7 +92,7 @@ describe SimpleXlsxReader do
         body: 'The Greatest',
         created_at: Time.parse('2002-01-01 11:00:00 UTC'),
         count: 1,
-        "URL" => 'http://www.example.com/hyperlink-function'
+        "URL" => 'This uses the HYPERLINK() function'
       )
       _(rows.slurped?).must_equal false
@@ -1031,6 +1031,52 @@ describe SimpleXlsxReader do
     end
   end
+  describe 'parsing documents with non-hyperlinked rels' do
+    let(:rels) do
+      [
+        Nokogiri::XML(
+          <<-XML
+          <?xml version="1.0" encoding="UTF-8"?>
+          <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"></Relationships>
+          XML
+        ).remove_namespaces!
+      ]
+    end
+    describe 'when document is opened as path' do
+      before do
+        @row = SimpleXlsxReader.open(xlsx.archive.path).sheets[0].rows.to_a[0]
+      end
+      it 'reads cell content' do
+        _(@row[0]).must_equal 'Cell A'
+      end
+    end
+    describe 'when document is parsed as a String' do
+      before do
+        output = File.binread(xlsx.archive.path)
+        @row = SimpleXlsxReader.parse(output).sheets[0].rows.to_a[0]
+      end
+      it 'reads cell content' do
+        _(@row[0]).must_equal 'Cell A'
+      end
+    end
+    describe 'when document is parsed as StringIO' do
+      before do
+        stream = StringIO.new(File.binread(xlsx.archive.path), 'rb')
+        @row = SimpleXlsxReader.parse(stream).sheets[0].rows.to_a[0]
+        stream.close
+      end
+      it 'reads cell content' do
+        _(@row[0]).must_equal 'Cell A'
+      end
+    end
+  end
   # https://support.microsoft.com/en-us/office/available-number-formats-in-excel-0afe8f52-97db-41f1-b972-4b46e9f1e8d2
   describe 'numeric fields styled as "General"' do
     let(:misc_numbers_path) do

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: simple_xlsx_reader
 version: !ruby/object:Gem::Version
-  version: 4.0.1
+  version: 5.0.0
 platform: ruby
 authors:
 - Woody Peterson
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2023-03-06 00:00:00.000000000 Z
+date: 2023-06-17 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri