xsv 1.1.0 → 1.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: f1ebfa4e4778af72a8b295d258d899a5b5d01fd029d1294d54af5e4f1e0de05a
- data.tar.gz: aa74ffe0d57eebc12e312bdb42107bc39203cd3a0237f2e2481205c8b3b933c9
+ metadata.gz: 58e5d405e39f42d0e5287d47dd65c65b39a0ab5a2fc7fde3fd85c7211469e6e1
+ data.tar.gz: 7100a73ce192536f81a34ffbb1b431a793edf9cb71c1612a547a64a686a8330f
  SHA512:
- metadata.gz: 6ebbb32e48860043bdb0a5d17f6fef525252e8bb01ac63180cd0e749dfbd1b3bb08b8a82e43e9e13da2fa187c02f77e37f525e9c7453334bc3cb8fdf05400187
- data.tar.gz: 4e1450daebcc3ddfbc0585de52f4d1f362ef2d79281e59cc641119a47924f8450b76c3643a6403a440e0a581ebfcc108431a311f902e2b48be61a1d2afd7b19e
+ metadata.gz: 3b8fcbab2e2aa1f02dc0b51051a9b60dd2518b18b72007c2f3e77fa99248e864069d54b0bb43d783f8bb6ef79b6c2504c8cd05c244a2e9c85cddb882de224556
+ data.tar.gz: 3ec5120d8b6e365996985c75f4c291e3f4805e9876fd93ae2dfe071c5bd69ad751677cf71121ac23f6e9bed75ab80296b12346d79696cb6105974345d289bb7e
data/CHANGELOG.md CHANGED
@@ -1,5 +1,17 @@
  # Xsv Changelog

+ ## 1.2.0 2023-01-01
+
+ **This release contains the following minor breaking changes**
+
+ - Raise an error when entering hash mode on a sheet with duplicate headers to prevent unintentional behaviour (fixes #44)
+ - Xsv now returns frozen strings to further improve performance. This means it's no longer possible to call mutating methods on strings read from worksheets without unfreezing them first.
+ - Unescape all HTML entities in XML characters (thanks @til)
+
+ ## 1.1.1 2022-04-01
+
+ - Improve compatibility with files generated by the Open XML SDK (#40)
+
  ## 1.1.0 2022-02-13

  - New, shorter `Xsv.open` syntax as a drop-in replacement for `Xsv::Workbook.open`, which is still supported
@@ -115,4 +127,4 @@ Fix a Gemfile small Gemfile issue that broke the 0.3.3 and 0.3.4 releases

  ## 0.3.3 - 2020-03-02

- Intial version with a changelog and reasonably complete YARD documentation.
+ Initial version with a changelog and reasonably complete YARD documentation.
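
The frozen-string change listed under 1.2.0 matters to callers that mutate cell values in place. The following is a minimal sketch in plain Ruby, not taken from the gem: `cell` is a hypothetical stand-in for a string read from a worksheet, frozen with unary minus in the same way the diff does.

```ruby
# `cell` stands in for a string returned by Xsv 1.2.0 (assumption for
# illustration). Unary minus returns a frozen copy of the string.
cell = -"  total "

# Calling a mutating method on a frozen string raises FrozenError.
mutation_failed =
  begin
    cell.strip!
    false
  rescue FrozenError
    true
  end

# Option 1: non-mutating methods return a new, unfrozen string.
stripped = cell.strip

# Option 2: unary plus returns an unfrozen duplicate that can be mutated.
mutable = +cell
mutable.strip!
mutable << "!"
```

Either option leaves the original frozen value untouched, so code that only reads cell values should not need changes.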
data/README.md CHANGED
@@ -1,9 +1,11 @@
  # Xsv .xlsx reader

- [![Travis CI](https://img.shields.io/travis/martijn/xsv/master)](https://travis-ci.org/martijn/xsv)
- [![Codecov](https://img.shields.io/codecov/c/github/martijn/xsv/main)](https://app.codecov.io/gh/martijn/xsv)
- [![Yard Docs](http://img.shields.io/badge/yard-docs-blue.svg)](https://rubydoc.info/github/martijn/xsv)
- [![Gem Version](https://badge.fury.io/rb/xsv.svg)](https://badge.fury.io/rb/xsv)
+
+
+ [![Test badge](https://img.shields.io/github/actions/workflow/status/martijn/xsv/ruby.yml?branch=main)](https://github.com/martijn/xsv/actions/workflows/ruby.yml)
+ [![Codecov badge](https://img.shields.io/codecov/c/github/martijn/xsv/main)](https://app.codecov.io/gh/martijn/xsv)
+ [![Yard Docs badge](http://img.shields.io/badge/yard-docs-blue.svg)](https://rubydoc.info/github/martijn/xsv)
+ [![Gem Version badge](https://badge.fury.io/rb/xsv.svg)](https://badge.fury.io/rb/xsv)

  Xsv is a fast, lightweight, pure Ruby parser for ISO/IEC 29500 Office Open XML spreadsheet files
  (commonly known as Excel or .xlsx files). It strives to be minimal in the
@@ -35,10 +37,9 @@ Or install it yourself as:

  $ gem install xsv

- Xsv targets ruby >= 2.5 and has a just single dependency, `rubyzip`. It has been
- tested successfully with MRI, JRuby, and TruffleRuby. Due to the lack of
- native extensions should work well in multi-threaded environments or in Ractor
- when that becomes stable.
+ Xsv targets ruby >= 2.6 and has a just single dependency, `rubyzip`. It has been
+ tested successfully with MRI, JRuby, and TruffleRuby. It has no native extensions
+ and is designed to be thread-safe.

  ## Usage

@@ -84,8 +85,11 @@ sheet.parse_headers!
  sheet[0] # => {"header1" => "value1", "header2" => "value2"}
  ```

- Be aware that hash mode will lead to unpredictable results if the worksheet
- has multiple columns with the same header. `Xsv::Sheet` implements `Enumerable` so along with `#each`
+ Because of the way Ruby hashes work will raise `Xsv::DuplicateHeaders` if it detects
+ duplicate values in the header row when calling `#parse_headers!` or when opening
+ a workbook with `parse_headers: true`.
+
+ `Xsv::Sheet` implements `Enumerable` so along with `#each`
  you can call methods like `#first`, `#filter`/`#select`, and `#map` on it.

  ### Opening a string or buffer instead of filename
@@ -1,8 +1,10 @@
  # frozen_string_literal: true

+ require "cgi"
+
  module Xsv
  class SaxParser
- ATTR_REGEX = /((\S+)="(.*?)")/m
+ ATTR_REGEX = /((\p{Alnum}+)="(.*?)")/mn

  def parse(io)
  responds_to_end_element = respond_to?(:end_element)
@@ -36,14 +38,7 @@ module Xsv
  chars = pbuf.slice!(0, o + 1).chop!.force_encoding("utf-8")

  if responds_to_characters && !chars.empty?
- if chars.index("&")
- chars.gsub!("&amp;", "&")
- chars.gsub!("&apos;", "'")
- chars.gsub!("&gt;", ">")
- chars.gsub!("&lt;", "<")
- chars.gsub!("&quot;", '"')
- end
- characters(chars)
+ characters(CGI.unescapeHTML(chars))
  end

  state = :look_end
@@ -67,13 +62,15 @@ module Xsv
  args = nil
  end

+ stripped_tag_name = strip_namespace(tag_name)
+
  if tag_name.start_with?("/")
- end_element(tag_name[1..]) if responds_to_end_element
+ end_element(strip_namespace(tag_name[1..])) if responds_to_end_element
  elsif args.nil?
- start_element(tag_name, nil)
+ start_element(stripped_tag_name, nil)
  else
- start_element(tag_name, args.scan(ATTR_REGEX).each_with_object({}) { |m, h| h[m[1].to_sym] = m[2] })
- end_element(tag_name) if responds_to_end_element && args.end_with?("/")
+ start_element(stripped_tag_name, args.scan(ATTR_REGEX).each_with_object({}) { |(_, k, v), h| h[k.to_sym] = v })
+ end_element(stripped_tag_name) if responds_to_end_element && args.end_with?("/")
  end

  state = :look_start
@@ -85,5 +82,16 @@ module Xsv
  end
  end
  end
+
+ private
+
+ # I am not proud of this, but there's simply no need to deal with xmlns for this application ¯\_(ツ)_/¯
+ def strip_namespace(tag)
+ if (offset = tag.index(":"))
+ tag[offset + 1..]
+ else
+ tag
+ end
+ end
  end
  end
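
The switch from a chain of `gsub!` calls to `CGI.unescapeHTML` can be illustrated with the standard library alone. This is a small sketch rather than code from the gem, and the `raw` string is made-up character data of the kind an XML text node might contain.

```ruby
require "cgi"

# Hypothetical character data with named, decimal, and hexadecimal entities.
raw = "Tom &amp; Jerry &lt;b&gt; &quot;hi&quot; it&#39;s &#x26;"

# CGI.unescapeHTML decodes named entities and numeric references.
decoded = CGI.unescapeHTML(raw)

# Decoding happens in a single pass, so an escaped entity is only
# unescaped one level: "&amp;lt;" becomes the literal text "&lt;".
single_pass = CGI.unescapeHTML("&amp;lt;")

puts decoded
puts single_pass
```

The single-pass behaviour is worth noting because a sequence of separate substitutions that handles `&amp;` first can decode the same text twice.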
@@ -6,7 +6,7 @@ module Xsv
  class SharedStringsParser < SaxParser
  def self.parse(io)
  strings = []
- new { |s| strings << s }.parse(io)
+ new { |s| strings << -s }.parse(io)
  strings
  end

data/lib/xsv/sheet.rb CHANGED
@@ -83,6 +83,12 @@ module Xsv
  # @return [self]
  def parse_headers!
  @headers = parse_headers
+
+ # Check for duplicate headers, but don't care about nil columns
+ if (duplicate_header = @headers.detect { |h| @headers.count(h) > 1 })
+ raise Xsv::DuplicateHeaders, "Duplicate header '#{duplicate_header}' found, consider parsing this sheet in array mode."
+ end
+
  @mode = :hash

  self
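
The duplicate-header check can be tried outside the gem with a small stand-alone sketch. The `DuplicateHeaders` class and the `check_headers` helper below are hypothetical stand-ins, not the gem's API, and the sketch differs from the diff in one respect: it calls `compact` first, so that a repeated `nil` is never the value `detect` returns.

```ruby
# Stand-in for Xsv::DuplicateHeaders so the sketch runs without the gem.
class DuplicateHeaders < StandardError; end

# Hypothetical helper: returns the headers unchanged, or raises if a
# non-nil header occurs more than once.
def check_headers(headers)
  duplicate = headers.compact.detect { |h| headers.count(h) > 1 }
  raise DuplicateHeaders, "Duplicate header '#{duplicate}' found" if duplicate

  headers
end

# Columns without a header (nil) are ignored, even when repeated.
ok = check_headers(["id", "name", nil, nil])

# A repeated header raises, and the message names the offending value.
message =
  begin
    check_headers(["id", "name", "id"])
  rescue DuplicateHeaders => e
    e.message
  end
```

A caller that hits this error on a real workbook can fall back to array mode, as the error message in the diff suggests.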
@@ -85,7 +85,7 @@ module Xsv
  when "s"
  @workbook.shared_strings[@current_value.to_i]
  when "str", "inlineStr"
- @current_value.strip
+ -@current_value.strip
  when "e" # N/A
  nil
  when nil, "n"
@@ -17,7 +17,7 @@ module Xsv
  end

  def start_element(name, attrs)
- @block.call(attrs.slice(:name, :sheetId, :state, :'r:id')) if name == "sheet"
+ @block.call(attrs.slice(:name, :sheetId, :state, :id)) if name == "sheet"
  end
  end
  end
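
The change from `:'r:id'` to `:id` follows from the new attribute regex and the namespace stripping in the parser. The sketch below copies `ATTR_REGEX` and `strip_namespace` from the diff and applies them to a made-up attribute string; the sample input is an assumption for illustration.

```ruby
# Copied from the 1.2.0 side of the diff.
ATTR_REGEX = /((\p{Alnum}+)="(.*?)")/mn

def strip_namespace(tag)
  if (offset = tag.index(":"))
    tag[offset + 1..]
  else
    tag
  end
end

# Hypothetical attribute string from a <sheet> element in workbook.xml.
args = 'name="Sheet1" sheetId="1" r:id="rId1"'

# \p{Alnum}+ cannot match the colon, so the key for r:id is captured as "id".
attrs = args.scan(ATTR_REGEX).each_with_object({}) { |(_, k, v), h| h[k.to_sym] = v }

# Prefixed tag names lose their prefix; unprefixed names pass through.
prefixed = strip_namespace("x:worksheet")
plain = strip_namespace("sheet")
```

Because the prefix is dropped in both places, handlers can compare against bare names such as `"sheet"` and `:id` regardless of how the producing application declared its namespaces.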
data/lib/xsv/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module Xsv
- VERSION = "1.1.0"
+ VERSION = "1.2.0"
  end
data/lib/xsv/workbook.rb CHANGED
@@ -93,8 +93,11 @@ module Xsv
  @zip.glob("xl/worksheets/sheet*.xml").sort do |a, b|
  a.name[/\d+/].to_i <=> b.name[/\d+/].to_i
  end.map do |entry|
- rel = @relationships.detect { |r| entry.name.end_with?(r[:Target]) && r[:Type].end_with?("worksheet") }
- sheet_ids = @sheet_ids.detect { |i| i[:"r:id"] == rel[:Id] }
+ rel = @relationships.detect do |r|
+ entry.name.end_with?(r[:Target].sub(/^\//, "")) && # ignore leading / in some files
+ r[:Type].end_with?("worksheet")
+ end
+ sheet_ids = @sheet_ids.detect { |i| i[:id] == rel[:Id] }
  Xsv::Sheet.new(self, entry.get_input_stream, entry.size, sheet_ids).tap do |sheet|
  sheet.parse_headers! if mode == :hash
  end
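
The leading-slash handling in the relationship lookup can be checked in isolation. In this sketch the `relationships` array and the `find_worksheet_rel` helper are hypothetical; only the matching expression is taken from the diff.

```ruby
# Made-up relationship records: one absolute target, one relative target,
# and one non-worksheet entry that must be skipped.
relationships = [
  {Id: "rId3", Type: "http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles", Target: "styles.xml"},
  {Id: "rId1", Type: "http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet", Target: "/xl/worksheets/sheet1.xml"},
  {Id: "rId2", Type: "http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet", Target: "worksheets/sheet2.xml"}
]

# Hypothetical helper wrapping the matching expression from the diff.
def find_worksheet_rel(relationships, entry_name)
  relationships.detect do |r|
    entry_name.end_with?(r[:Target].sub(/^\//, "")) && # ignore leading /
      r[:Type].end_with?("worksheet")
  end
end

absolute = find_worksheet_rel(relationships, "xl/worksheets/sheet1.xml")
relative = find_worksheet_rel(relationships, "xl/worksheets/sheet2.xml")

# Without the sub, the absolute target would not match the zip entry name.
old_match = "xl/worksheets/sheet1.xml".end_with?("/xl/worksheets/sheet1.xml")
```

Zip entry names carry no leading slash, which is why the absolute target has to be normalised before the `end_with?` comparison.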
data/lib/xsv.rb CHANGED
@@ -21,6 +21,8 @@ require "xsv/workbook"
  module Xsv
  class Error < StandardError; end

+ class DuplicateHeaders < StandardError; end
+
  # An AssertionFailed error indicates an unexpected condition, meaning a bug
  # or misinterpreted .xlsx document
  class AssertionFailed < StandardError; end
data/xsv.gemspec CHANGED
@@ -21,7 +21,7 @@ Gem::Specification.new do |spec|
  if spec.respond_to?(:metadata)
  spec.metadata["homepage_uri"] = spec.homepage
  spec.metadata["source_code_uri"] = "https://github.com/martijn/xsv"
- spec.metadata["changelog_uri"] = "https://github.com/martijn/xsv/CHANGELOG.md"
+ spec.metadata["changelog_uri"] = "https://raw.githubusercontent.com/martijn/xsv/main/CHANGELOG.md"
  else
  raise "RubyGems 2.0 or newer is required to protect against " \
  "public gem pushes."
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: xsv
  version: !ruby/object:Gem::Version
- version: 1.1.0
+ version: 1.2.0
  platform: ruby
  authors:
  - Martijn Storck
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2022-02-13 00:00:00.000000000 Z
+ date: 2023-01-01 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rubyzip
@@ -141,7 +141,7 @@ licenses:
  metadata:
  homepage_uri: https://github.com/martijn/xsv
  source_code_uri: https://github.com/martijn/xsv
- changelog_uri: https://github.com/martijn/xsv/CHANGELOG.md
+ changelog_uri: https://raw.githubusercontent.com/martijn/xsv/main/CHANGELOG.md
  post_install_message:
  rdoc_options: []
  require_paths: