rbxl 1.0.2 → 1.1.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +6 -0
- data/README.md +41 -10
- data/lib/rbxl/read_only_workbook.rb +78 -5
- data/lib/rbxl/read_only_worksheet.rb +74 -10
- data/lib/rbxl/version.rb +1 -1
- data/lib/rbxl.rb +32 -16
- data/sig/rbxl.rbs +5 -5
- metadata +1 -1
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 5213a5a5d1091d4f8927631c50c7c690362eb284ba1eb31ee80bf3d9d0a1ec7b
|
|
4
|
+
data.tar.gz: 2e5120093c09738342b76fb160b7e259649049dcec66b325bdc88a21f59bc9dd
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: '038c1112ff36766d74aea7b9092ace0a0eed88d9ad7c1db28356e3d7598edd52d93d21c128a40c7b4e822719a9e329bca4515118a76ca7fce3fed0a00407f342'
|
|
7
|
+
data.tar.gz: b14444ae769c953832fba7da2c6f09d1371ad14206160cc5d6fd711c583ed33300438322873e9863ba199d79fa60b30e642071546878e9b418530b4dc5d8007f
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,11 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## 1.1.0
|
|
4
|
+
|
|
5
|
+
- `Rbxl.open` and `Rbxl.new` now default `read_only: true` and `write_only: true` respectively, so the call site no longer needs the boilerplate. Explicitly passing `false` raises `NotImplementedError`.
|
|
6
|
+
- Add `date_conversion: true` to `Rbxl.open`: numeric cells whose style points at a date/time `numFmt` (built-in ids 14–22, 27–36, 45–47, 50–58, or a custom format code containing date tokens) are returned as `Date` or `Time`. Off by default — no change in output shape or throughput when the flag is absent.
|
|
7
|
+
- Fix Ruby reader path so self-closing `<row/>` and `<c/>` elements are iterated instead of silently dropped, and never yield `nil` for a row.
|
|
8
|
+
|
|
3
9
|
## 1.0.2
|
|
4
10
|
|
|
5
11
|
- Add `streaming: true` to `Rbxl.open` to feed worksheet XML to the native reader in 64 KiB chunks instead of buffering the full worksheet first.
|
data/README.md
CHANGED
|
@@ -1,5 +1,7 @@
|
|
|
1
1
|
# rbxl
|
|
2
2
|
|
|
3
|
+
[](https://badge.fury.io/rb/rbxl)
|
|
4
|
+
|
|
3
5
|
Fast, memory-friendly Ruby gem for row-by-row `.xlsx` reads and append-only writes.
|
|
4
6
|
|
|
5
7
|
`rbxl` is built for the two workbook workflows that scale cleanly:
|
|
@@ -10,15 +12,14 @@ Fast, memory-friendly Ruby gem for row-by-row `.xlsx` reads and append-only writ
|
|
|
10
12
|
The API is intentionally small and `openpyxl`-inspired, with an optional
|
|
11
13
|
native extension for faster XML parsing when you need more throughput.
|
|
12
14
|
|
|
13
|
-
|
|
15
|
+
Supported:
|
|
14
16
|
|
|
15
|
-
-
|
|
16
|
-
-
|
|
17
|
-
- `
|
|
18
|
-
- minimal `openpyxl`-like API
|
|
17
|
+
- write-only workbook generation
|
|
18
|
+
- read-only row-by-row iteration
|
|
19
|
+
- opt-in date/time conversion driven by the workbook's `numFmt` styles
|
|
19
20
|
- optional C extension (`rbxl/native`) for maximum performance
|
|
20
21
|
|
|
21
|
-
Out of scope
|
|
22
|
+
Out of scope:
|
|
22
23
|
|
|
23
24
|
- preserving arbitrary workbook structure on save
|
|
24
25
|
- rich style round-tripping
|
|
@@ -29,7 +30,7 @@ Out of scope for this MVP:
|
|
|
29
30
|
```ruby
|
|
30
31
|
require "rbxl"
|
|
31
32
|
|
|
32
|
-
book = Rbxl.new
|
|
33
|
+
book = Rbxl.new
|
|
33
34
|
sheet = book.add_sheet("Report")
|
|
34
35
|
sheet.append(["id", "name", "score"])
|
|
35
36
|
sheet.append([1, "alice", 100])
|
|
@@ -40,7 +41,7 @@ book.save("report.xlsx")
|
|
|
40
41
|
```ruby
|
|
41
42
|
require "rbxl"
|
|
42
43
|
|
|
43
|
-
book = Rbxl.open("report.xlsx"
|
|
44
|
+
book = Rbxl.open("report.xlsx")
|
|
44
45
|
sheet = book.sheet("Report")
|
|
45
46
|
|
|
46
47
|
sheet.each_row do |row|
|
|
@@ -52,8 +53,38 @@ p sheet.calculate_dimension
|
|
|
52
53
|
book.close
|
|
53
54
|
```
|
|
54
55
|
|
|
55
|
-
`
|
|
56
|
-
|
|
56
|
+
`Rbxl.open` defaults to read-only and `Rbxl.new` defaults to write-only;
|
|
57
|
+
the `read_only:` / `write_only:` keywords remain for call-site clarity and
|
|
58
|
+
to leave room for a future read/write mode. Write-only workbooks are
|
|
59
|
+
save-once by design — this matches the optimized mode tradeoff: low
|
|
60
|
+
flexibility in exchange for simpler memory behavior.
|
|
61
|
+
|
|
62
|
+
### Date / time conversion
|
|
63
|
+
|
|
64
|
+
Dates in `.xlsx` files are stored as numeric serial day counts (serial 1 is 1900-01-01); whether
|
|
65
|
+
they display as `44562`, `2022-01-01`, or `12:00` depends on the cell's
|
|
66
|
+
`numFmt` style. `rbxl` leaves cells as raw `Float` by default so the read
|
|
67
|
+
path stays allocation-light. Pass `date_conversion: true` to opt into
|
|
68
|
+
interpreting the style:
|
|
69
|
+
|
|
70
|
+
```ruby
|
|
71
|
+
require "rbxl"
|
|
72
|
+
|
|
73
|
+
book = Rbxl.open("schedule.xlsx", date_conversion: true)
|
|
74
|
+
book.sheet("Timeline").each_row(values_only: true) do |row|
|
|
75
|
+
row.each { |v| p v } # => Date / Time / Float / String / ...
|
|
76
|
+
end
|
|
77
|
+
book.close
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
With the flag on, `rbxl` parses `xl/styles.xml` once at first use and
|
|
81
|
+
converts numeric cells whose style maps to a built-in date `numFmtId`
|
|
82
|
+
(14–22, 27–36, 45–47, 50–58) or to a custom `formatCode` containing date
|
|
83
|
+
tokens. Whole-number serials return `Date`; fractional serials return
|
|
84
|
+
`Time` so the time-of-day portion is preserved. The flag is off by
|
|
85
|
+
default; leaving it off skips the styles parse entirely and keeps the
|
|
86
|
+
native fast path in use. Turning it on routes reads through the pure-Ruby
|
|
87
|
+
worksheet parser.
|
|
57
88
|
|
|
58
89
|
## Native C Extension
|
|
59
90
|
|
data/lib/rbxl/read_only_workbook.rb
CHANGED
|
@@ -36,14 +36,17 @@ module Rbxl
|
|
|
36
36
|
# @return [Array<String>] visible sheet names in workbook order
|
|
37
37
|
attr_reader :sheet_names
|
|
38
38
|
|
|
39
|
-
# Convenience constructor equivalent to
|
|
39
|
+
# Convenience constructor equivalent to
|
|
40
|
+
# <tt>new(path, streaming:, date_conversion:)</tt>.
|
|
40
41
|
#
|
|
41
42
|
# @param path [String, #to_path] path to the <tt>.xlsx</tt> file
|
|
42
43
|
# @param streaming [Boolean] feed worksheet XML to the native parser in
|
|
43
44
|
# chunks (see {Rbxl.open})
|
|
45
|
+
# @param date_conversion [Boolean] convert numeric cells backed by a
|
|
46
|
+
# date/time +numFmt+ to Ruby date/time objects (see {Rbxl.open})
|
|
44
47
|
# @return [Rbxl::ReadOnlyWorkbook]
|
|
45
|
-
def self.open(path, streaming: false)
|
|
46
|
-
new(path, streaming: streaming)
|
|
48
|
+
def self.open(path, streaming: false, date_conversion: false)
|
|
49
|
+
new(path, streaming: streaming, date_conversion: date_conversion)
|
|
47
50
|
end
|
|
48
51
|
|
|
49
52
|
# Opens the ZIP archive, pre-loads shared strings, and indexes the
|
|
@@ -51,13 +54,17 @@ module Rbxl
|
|
|
51
54
|
#
|
|
52
55
|
# @param path [String, #to_path] path to the <tt>.xlsx</tt> file
|
|
53
56
|
# @param streaming [Boolean] forwarded to produced worksheets
|
|
54
|
-
|
|
57
|
+
# @param date_conversion [Boolean] lazily load styles.xml and forward the
|
|
58
|
+
# date-style lookup table to produced worksheets
|
|
59
|
+
def initialize(path, streaming: false, date_conversion: false)
|
|
55
60
|
@path = path
|
|
56
61
|
@zip = Zip::File.open(path)
|
|
57
62
|
@streaming = streaming
|
|
63
|
+
@date_conversion = date_conversion
|
|
58
64
|
@shared_strings = load_shared_strings
|
|
59
65
|
@sheet_entries = load_sheet_entries
|
|
60
66
|
@sheet_names = @sheet_entries.keys.freeze
|
|
67
|
+
@date_styles = nil
|
|
61
68
|
@closed = false
|
|
62
69
|
end
|
|
63
70
|
|
|
@@ -77,7 +84,14 @@ module Rbxl
|
|
|
77
84
|
raise SheetNotFoundError, "sheet not found: #{name}"
|
|
78
85
|
end
|
|
79
86
|
|
|
80
|
-
ReadOnlyWorksheet.new(
|
|
87
|
+
ReadOnlyWorksheet.new(
|
|
88
|
+
zip: @zip,
|
|
89
|
+
entry_path: entry_path,
|
|
90
|
+
shared_strings: @shared_strings,
|
|
91
|
+
name: name,
|
|
92
|
+
streaming: @streaming,
|
|
93
|
+
date_styles: date_styles
|
|
94
|
+
)
|
|
81
95
|
end
|
|
82
96
|
|
|
83
97
|
# Releases the underlying ZIP file handle. Idempotent; subsequent calls
|
|
@@ -102,6 +116,65 @@ module Rbxl
|
|
|
102
116
|
raise ClosedWorkbookError, "workbook has been closed" if closed?
|
|
103
117
|
end
|
|
104
118
|
|
|
119
|
+
# Built-in numFmtId values that Excel resolves to date/time formats.
|
|
120
|
+
# Ids outside this set are dates only when the workbook provides a
|
|
121
|
+
# matching custom +<numFmt>+ entry whose format code contains date
|
|
122
|
+
# tokens. See ECMA-376 part 1 §18.8.30.
|
|
123
|
+
BUILTIN_DATE_FMT_IDS = Set.new([14, 15, 16, 17, 18, 19, 20, 21, 22,
|
|
124
|
+
27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
|
|
125
|
+
45, 46, 47, 50, 51, 52, 53, 54, 55, 56,
|
|
126
|
+
57, 58]).freeze
|
|
127
|
+
|
|
128
|
+
def date_styles
|
|
129
|
+
return nil unless @date_conversion
|
|
130
|
+
|
|
131
|
+
@date_styles ||= load_date_styles
|
|
132
|
+
end
|
|
133
|
+
|
|
134
|
+
def load_date_styles
|
|
135
|
+
entry = @zip.find_entry("xl/styles.xml")
|
|
136
|
+
return [].freeze unless entry
|
|
137
|
+
|
|
138
|
+
custom_date_ids = Set.new
|
|
139
|
+
date_styles = []
|
|
140
|
+
in_cell_xfs = false
|
|
141
|
+
|
|
142
|
+
each_xml_node("xl/styles.xml") do |node|
|
|
143
|
+
case node.node_type
|
|
144
|
+
when Nokogiri::XML::Reader::TYPE_ELEMENT
|
|
145
|
+
case node.local_name
|
|
146
|
+
when "cellXfs"
|
|
147
|
+
in_cell_xfs = true
|
|
148
|
+
when "numFmt"
|
|
149
|
+
id = node.attribute("numFmtId")
|
|
150
|
+
code = node.attribute("formatCode")
|
|
151
|
+
custom_date_ids << id.to_i if id && code && date_format_code?(code)
|
|
152
|
+
when "xf"
|
|
153
|
+
next unless in_cell_xfs
|
|
154
|
+
|
|
155
|
+
fmt_id_int = node.attribute("numFmtId")&.to_i
|
|
156
|
+
date_styles << (!fmt_id_int.nil? &&
|
|
157
|
+
(BUILTIN_DATE_FMT_IDS.include?(fmt_id_int) || custom_date_ids.include?(fmt_id_int)))
|
|
158
|
+
end
|
|
159
|
+
when Nokogiri::XML::Reader::TYPE_END_ELEMENT
|
|
160
|
+
in_cell_xfs = false if node.local_name == "cellXfs"
|
|
161
|
+
end
|
|
162
|
+
end
|
|
163
|
+
|
|
164
|
+
date_styles.freeze
|
|
165
|
+
end
|
|
166
|
+
|
|
167
|
+
# Quoted literals, bracketed directives (e.g. [Red], [$-409]), and
|
|
168
|
+
# backslash-escaped characters never introduce date tokens, so strip
|
|
169
|
+
# them before looking for +y/m/d/h/s+.
|
|
170
|
+
def date_format_code?(code)
|
|
171
|
+
stripped = code.dup
|
|
172
|
+
stripped.gsub!(/\[[^\]]*\]/, "")
|
|
173
|
+
stripped.gsub!(/"[^"]*"/, "")
|
|
174
|
+
stripped.gsub!(/\\./, "")
|
|
175
|
+
stripped.match?(/[ymdhs]/i)
|
|
176
|
+
end
|
|
177
|
+
|
|
105
178
|
def load_shared_strings
|
|
106
179
|
entry = @zip.find_entry("xl/sharedStrings.xml")
|
|
107
180
|
return [] unless entry
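The date-style lookup added in this file can be illustrated without a ZIP archive or an XML parser. The sketch below is a simplified stand-alone version, not the gem's code: `date_format_code?` follows the stripping steps shown in the diff, while `builtin_date_id?`, `build_date_style_table`, and their inputs (a hash of custom format codes and a list of `numFmtId` values in `cellXfs` order) are hypothetical names introduced for illustration.

```ruby
# Built-in numFmtId ranges treated as date/time formats, as listed in the diff.
def builtin_date_id?(id)
  [14..22, 27..36, 45..47, 50..58].any? { |range| range.cover?(id) }
end

# True when a format code still contains y/m/d/h/s after removing bracketed
# directives, quoted literals, and backslash-escaped characters.
def date_format_code?(code)
  stripped = code.gsub(/\[[^\]]*\]/, "")
  stripped = stripped.gsub(/"[^"]*"/, "")
  stripped = stripped.gsub(/\\./, "")
  stripped.match?(/[ymdhs]/i)
end

# Hypothetical helper: custom_formats maps numFmtId => formatCode, and
# xf_fmt_ids lists the numFmtId of each <xf> in cellXfs order. The result is
# indexed by a cell's style id (its "s" attribute).
def build_date_style_table(custom_formats, xf_fmt_ids)
  custom_date_ids = custom_formats.select { |_id, code| date_format_code?(code) }.keys
  xf_fmt_ids.map { |id| builtin_date_id?(id) || custom_date_ids.include?(id) }.freeze
end

table = build_date_style_table({ 164 => "yyyy-mm-dd", 165 => '[Red]0.00" d"' }, [0, 14, 164, 165])
p table # => [false, true, true, false]
```

Style 3 stays `false` because the only `d` in its format code sits inside a quoted literal, which is removed before the token check.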
|
data/lib/rbxl/read_only_worksheet.rb
CHANGED
|
@@ -55,12 +55,18 @@ module Rbxl
|
|
|
55
55
|
# @param streaming [Boolean] when the native extension is loaded, feed
|
|
56
56
|
# worksheet XML to the parser in chunks instead of reading the entry
|
|
57
57
|
# into memory first
|
|
58
|
-
|
|
58
|
+
# @param date_styles [Array<Boolean>, nil] +true+ at a style id when the
|
|
59
|
+
# id's numFmt is a date/time format. When provided, numeric cells with
|
|
60
|
+
# a matching style are returned as +Date+ or +Time+ instead of +Float+,
|
|
61
|
+
# and the native fast path is bypassed.
|
|
62
|
+
def initialize(zip:, entry_path:, shared_strings:, name:, streaming: false, date_styles: nil)
|
|
59
63
|
@zip = zip
|
|
60
64
|
@entry_path = entry_path
|
|
61
65
|
@shared_strings = shared_strings
|
|
62
66
|
@name = name
|
|
63
67
|
@streaming = streaming
|
|
68
|
+
@date_styles = date_styles
|
|
69
|
+
@disable_native = !date_styles.nil?
|
|
64
70
|
@dimensions = extract_dimensions
|
|
65
71
|
@merge_ranges_by_row = nil
|
|
66
72
|
@merge_anchor_values = {}
|
|
@@ -164,12 +170,14 @@ module Rbxl
|
|
|
164
170
|
end
|
|
165
171
|
|
|
166
172
|
cell_type = nil
|
|
173
|
+
cell_style = nil
|
|
167
174
|
collecting_value = false
|
|
168
175
|
in_v = false
|
|
169
176
|
raw_value = nil
|
|
170
177
|
value_buffer = +""
|
|
171
178
|
current_values = nil
|
|
172
179
|
row_depth = nil
|
|
180
|
+
track_style = !@date_styles.nil?
|
|
173
181
|
|
|
174
182
|
with_sheet_reader do |reader|
|
|
175
183
|
reader.each do |node|
|
|
@@ -179,9 +187,19 @@ module Rbxl
|
|
|
179
187
|
when "row"
|
|
180
188
|
current_values = []
|
|
181
189
|
row_depth = node.depth
|
|
190
|
+
if node.self_closing?
|
|
191
|
+
yield current_values.freeze
|
|
192
|
+
current_values = nil
|
|
193
|
+
end
|
|
182
194
|
when "c"
|
|
183
195
|
cell_type = node.attribute("t")
|
|
196
|
+
cell_style = track_style ? node.attribute("s")&.to_i : nil
|
|
184
197
|
raw_value = nil
|
|
198
|
+
if current_values && node.self_closing?
|
|
199
|
+
current_values << coerce_value(raw_value, cell_type, cell_style)
|
|
200
|
+
cell_type = nil
|
|
201
|
+
cell_style = nil
|
|
202
|
+
end
|
|
185
203
|
when "v"
|
|
186
204
|
collecting_value = true
|
|
187
205
|
in_v = true
|
|
@@ -202,12 +220,13 @@ module Rbxl
|
|
|
202
220
|
raw_value = raw_value ? raw_value << value_buffer : value_buffer.dup
|
|
203
221
|
collecting_value = false
|
|
204
222
|
end
|
|
205
|
-
elsif node.depth == row_depth
|
|
223
|
+
elsif current_values && node.depth == row_depth
|
|
206
224
|
yield current_values.freeze
|
|
207
225
|
current_values = nil
|
|
208
226
|
elsif current_values && node.depth == row_depth + 1
|
|
209
|
-
current_values << coerce_value(raw_value, cell_type)
|
|
227
|
+
current_values << coerce_value(raw_value, cell_type, cell_style)
|
|
210
228
|
cell_type = nil
|
|
229
|
+
cell_style = nil
|
|
211
230
|
raw_value = nil
|
|
212
231
|
end
|
|
213
232
|
end
|
|
@@ -231,12 +250,14 @@ module Rbxl
|
|
|
231
250
|
current_cells = nil
|
|
232
251
|
cell_ref = nil
|
|
233
252
|
cell_type = nil
|
|
253
|
+
cell_style = nil
|
|
234
254
|
current_col_index = 0
|
|
235
255
|
collecting_value = false
|
|
236
256
|
in_v = false
|
|
237
257
|
raw_value = nil
|
|
238
258
|
value_buffer = +""
|
|
239
259
|
row_depth = nil
|
|
260
|
+
track_style = !@date_styles.nil?
|
|
240
261
|
|
|
241
262
|
with_sheet_reader do |reader|
|
|
242
263
|
reader.each do |node|
|
|
@@ -248,6 +269,14 @@ module Rbxl
|
|
|
248
269
|
current_col_index = 0
|
|
249
270
|
current_cells = []
|
|
250
271
|
row_depth = node.depth
|
|
272
|
+
if node.self_closing?
|
|
273
|
+
emit_row(current_cells, current_row_index,
|
|
274
|
+
pad_cells: pad_cells, expand_merged: expand_merged,
|
|
275
|
+
values_only: values_only, &block)
|
|
276
|
+
last_row_index = current_row_index
|
|
277
|
+
current_row_index = nil
|
|
278
|
+
current_cells = nil
|
|
279
|
+
end
|
|
251
280
|
when "c"
|
|
252
281
|
cell_ref = node.attribute("r")
|
|
253
282
|
if cell_ref
|
|
@@ -257,7 +286,14 @@ module Rbxl
|
|
|
257
286
|
cell_ref = "#{column_name(current_col_index)}#{current_row_index}"
|
|
258
287
|
end
|
|
259
288
|
cell_type = node.attribute("t")
|
|
289
|
+
cell_style = track_style ? node.attribute("s")&.to_i : nil
|
|
260
290
|
raw_value = nil
|
|
291
|
+
if current_cells && node.self_closing?
|
|
292
|
+
current_cells << build_row_entry(cell_ref, coerce_value(raw_value, cell_type, cell_style), values_only)
|
|
293
|
+
cell_ref = nil
|
|
294
|
+
cell_type = nil
|
|
295
|
+
cell_style = nil
|
|
296
|
+
end
|
|
261
297
|
when "v"
|
|
262
298
|
collecting_value = true
|
|
263
299
|
in_v = true
|
|
@@ -278,17 +314,18 @@ module Rbxl
|
|
|
278
314
|
raw_value = raw_value ? raw_value << value_buffer : value_buffer.dup
|
|
279
315
|
collecting_value = false
|
|
280
316
|
end
|
|
281
|
-
elsif node.depth == row_depth
|
|
282
|
-
|
|
283
|
-
|
|
284
|
-
|
|
317
|
+
elsif current_cells && node.depth == row_depth
|
|
318
|
+
emit_row(current_cells, current_row_index,
|
|
319
|
+
pad_cells: pad_cells, expand_merged: expand_merged,
|
|
320
|
+
values_only: values_only, &block)
|
|
285
321
|
last_row_index = current_row_index
|
|
286
322
|
current_row_index = nil
|
|
287
323
|
current_cells = nil
|
|
288
324
|
elsif current_cells && node.depth == row_depth + 1
|
|
289
|
-
current_cells << build_row_entry(cell_ref, coerce_value(raw_value, cell_type), values_only)
|
|
325
|
+
current_cells << build_row_entry(cell_ref, coerce_value(raw_value, cell_type, cell_style), values_only)
|
|
290
326
|
cell_ref = nil
|
|
291
327
|
cell_type = nil
|
|
328
|
+
cell_style = nil
|
|
292
329
|
raw_value = nil
|
|
293
330
|
end
|
|
294
331
|
end
|
|
@@ -296,6 +333,12 @@ module Rbxl
|
|
|
296
333
|
end
|
|
297
334
|
end
|
|
298
335
|
|
|
336
|
+
def emit_row(cells, row_index, pad_cells:, expand_merged:, values_only:)
|
|
337
|
+
cells = pad_row(cells, row_index, values_only: values_only) if pad_cells
|
|
338
|
+
cells = expand_merged_cells(cells, row_index, values_only: values_only) if expand_merged
|
|
339
|
+
yield values_only ? extract_values(cells).freeze : Row.new(index: row_index, cells: cells)
|
|
340
|
+
end
|
|
341
|
+
|
|
299
342
|
def with_sheet_reader
|
|
300
343
|
io = @zip.get_entry(@entry_path).get_input_stream
|
|
301
344
|
reader = Nokogiri::XML::Reader(io)
|
|
@@ -527,7 +570,7 @@ module Rbxl
|
|
|
527
570
|
@merge_ranges_by_row ||= extract_merge_ranges_by_row
|
|
528
571
|
end
|
|
529
572
|
|
|
530
|
-
def coerce_value(raw_value, type)
|
|
573
|
+
def coerce_value(raw_value, type, style_id = nil)
|
|
531
574
|
case type
|
|
532
575
|
when "s"
|
|
533
576
|
@shared_strings[raw_value.to_i]
|
|
@@ -536,10 +579,31 @@ module Rbxl
|
|
|
536
579
|
when "b"
|
|
537
580
|
raw_value == "1"
|
|
538
581
|
else
|
|
539
|
-
infer_scalar(raw_value)
|
|
582
|
+
value = infer_scalar(raw_value)
|
|
583
|
+
return value unless @date_styles && style_id && value.is_a?(Numeric) && @date_styles[style_id]
|
|
584
|
+
|
|
585
|
+
excel_serial_to_ruby(value)
|
|
540
586
|
end
|
|
541
587
|
end
|
|
542
588
|
|
|
589
|
+
# Excel's serial date counts days from 1899-12-31 (serial 0), with a
|
|
590
|
+
# documented leap-year bug for the non-existent 1900-02-29 (serial 60)
|
|
591
|
+
# — for serials >= 60 the day-count is shifted back by one so that
|
|
592
|
+
# post-1900 dates line up with the proleptic Gregorian calendar.
|
|
593
|
+
# Whole-number serials are returned as +Date+; fractional serials as
|
|
594
|
+
# +Time+ so that both date and time-of-day survive the conversion.
|
|
595
|
+
def excel_serial_to_ruby(serial)
|
|
596
|
+
whole = serial.to_i
|
|
597
|
+
whole -= 1 if whole >= 60
|
|
598
|
+
frac = serial - serial.to_i
|
|
599
|
+
base = Date.new(1899, 12, 31) + whole
|
|
600
|
+
|
|
601
|
+
return base if frac.zero?
|
|
602
|
+
|
|
603
|
+
seconds = (frac * 86_400).round
|
|
604
|
+
Time.new(base.year, base.month, base.day) + seconds
|
|
605
|
+
end
|
|
606
|
+
|
|
543
607
|
def infer_scalar(raw_value)
|
|
544
608
|
return nil if raw_value.nil? || raw_value.empty?
|
|
545
609
|
|
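The serial conversion in the worksheet diff above can be checked in isolation. This is a minimal sketch that follows the same arithmetic as `excel_serial_to_ruby`; the method name `serial_to_ruby` is hypothetical, and only the standard `date` library is assumed.

```ruby
require "date"

# Stand-alone version of the conversion shown in the diff: serial 1 is
# 1900-01-01, and serials >= 60 are shifted back one day to skip Excel's
# phantom 1900-02-29.
def serial_to_ruby(serial)
  whole = serial.to_i
  frac = serial - whole
  whole -= 1 if whole >= 60
  base = Date.new(1899, 12, 31) + whole
  return base if frac.zero?

  # Time.new without an offset uses the local time zone.
  Time.new(base.year, base.month, base.day) + (frac * 86_400).round
end

puts serial_to_ruby(1)        # 1900-01-01
puts serial_to_ruby(61)       # 1900-03-01
puts serial_to_ruby(44_562)   # 2022-01-01
puts serial_to_ruby(44_562.5) # noon on 2022-01-01, local time
```

With this arithmetic, serial 60 (Excel's non-existent 1900-02-29) maps to 1900-02-28, the same date as serial 59.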
data/lib/rbxl/version.rb
CHANGED
data/lib/rbxl.rb
CHANGED
|
@@ -1,6 +1,7 @@
|
|
|
1
1
|
require "cgi"
|
|
2
2
|
require "date"
|
|
3
3
|
require "nokogiri"
|
|
4
|
+
require "set"
|
|
4
5
|
require "stringio"
|
|
5
6
|
require "zip"
|
|
6
7
|
|
|
@@ -32,16 +33,19 @@ require_relative "rbxl/write_only_worksheet"
|
|
|
32
33
|
#
|
|
33
34
|
# require "rbxl"
|
|
34
35
|
#
|
|
35
|
-
# book = Rbxl.open("report.xlsx"
|
|
36
|
+
# book = Rbxl.open("report.xlsx")
|
|
36
37
|
# sheet = book.sheet("Report")
|
|
37
38
|
# sheet.each_row(values_only: true) { |values| p values }
|
|
38
39
|
# book.close
|
|
39
40
|
#
|
|
41
|
+
# Pass <tt>date_conversion: true</tt> to return Date/Time objects for
|
|
42
|
+
# numeric cells that carry a date +numFmt+ style.
|
|
43
|
+
#
|
|
40
44
|
# == Writing
|
|
41
45
|
#
|
|
42
46
|
# require "rbxl"
|
|
43
47
|
#
|
|
44
|
-
# book = Rbxl.new
|
|
48
|
+
# book = Rbxl.new
|
|
45
49
|
# sheet = book.add_sheet("Report")
|
|
46
50
|
# sheet << ["id", "name", "score"]
|
|
47
51
|
# sheet << [1, "alice", 100]
|
|
@@ -84,9 +88,9 @@ module Rbxl
|
|
|
84
88
|
|
|
85
89
|
# Opens an existing workbook in read-only row-by-row mode.
|
|
86
90
|
#
|
|
87
|
-
# The +read_only+ keyword
|
|
88
|
-
#
|
|
89
|
-
#
|
|
91
|
+
# The +read_only+ keyword defaults to +true+ and exists to mark the
|
|
92
|
+
# intent explicitly at the call site. Passing +read_only: false+ raises
|
|
93
|
+
# {NotImplementedError}; a read/write mode is not available.
|
|
90
94
|
#
|
|
91
95
|
# With <tt>streaming: true</tt>, the native backend (when loaded) feeds
|
|
92
96
|
# worksheet XML to the parser in chunks pulled from the ZIP input stream
|
|
@@ -97,29 +101,41 @@ module Rbxl
|
|
|
97
101
|
# differs — and typically pays back a few percent of throughput on small
|
|
98
102
|
# sheets in exchange for the flat memory profile.
|
|
99
103
|
#
|
|
104
|
+
# With <tt>date_conversion: true</tt>, numeric cells whose style points at
|
|
105
|
+
# a date/time +numFmt+ (built-in ids 14–22, 27–36, 45–47, 50–58, or any
|
|
106
|
+
# custom format code containing a date/time token) are returned as
|
|
107
|
+
# +Date+ or +Time+ instead of a raw serial +Float+. The flag
|
|
108
|
+
# is off by default to preserve byte-for-byte behavior and skip the
|
|
109
|
+
# styles.xml parse for workbooks that don't need it; enabling it
|
|
110
|
+
# disables the native fast path and routes reads through the Ruby
|
|
111
|
+
# worksheet parser.
|
|
112
|
+
#
|
|
100
113
|
# @param path [String, #to_path] filesystem path to an <tt>.xlsx</tt> file
|
|
101
|
-
# @param read_only [Boolean] must be +true+
|
|
114
|
+
# @param read_only [Boolean] retained for call-site clarity; must be +true+
|
|
102
115
|
# @param streaming [Boolean] feed worksheet XML to the native parser in
|
|
103
116
|
# chunks instead of fully inflating the entry in advance. Ignored when
|
|
104
117
|
# the native extension is not loaded.
|
|
118
|
+
# @param date_conversion [Boolean] convert numeric cells backed by a
|
|
119
|
+
# date/time +numFmt+ to +Date+ / +Time+
|
|
105
120
|
# @return [Rbxl::ReadOnlyWorkbook]
|
|
106
|
-
# @raise [
|
|
107
|
-
def open(path, read_only:
|
|
108
|
-
raise
|
|
121
|
+
# @raise [NotImplementedError] if +read_only+ is not +true+
|
|
122
|
+
def open(path, read_only: true, streaming: false, date_conversion: false)
|
|
123
|
+
raise NotImplementedError, "read/write mode is not supported; pass read_only: true" unless read_only
|
|
109
124
|
|
|
110
|
-
ReadOnlyWorkbook.open(path, streaming: streaming)
|
|
125
|
+
ReadOnlyWorkbook.open(path, streaming: streaming, date_conversion: date_conversion)
|
|
111
126
|
end
|
|
112
127
|
|
|
113
128
|
# Creates a new workbook in write-only mode.
|
|
114
129
|
#
|
|
115
|
-
# The +write_only+ keyword
|
|
116
|
-
# save-once, append-only contract
|
|
130
|
+
# The +write_only+ keyword defaults to +true+ and exists to mark the
|
|
131
|
+
# save-once, append-only contract explicitly. Passing
|
|
132
|
+
# +write_only: false+ raises {NotImplementedError}.
|
|
117
133
|
#
|
|
118
|
-
# @param write_only [Boolean] must be +true+
|
|
134
|
+
# @param write_only [Boolean] retained for call-site clarity; must be +true+
|
|
119
135
|
# @return [Rbxl::WriteOnlyWorkbook]
|
|
120
|
-
# @raise [
|
|
121
|
-
def new(write_only:
|
|
122
|
-
raise
|
|
136
|
+
# @raise [NotImplementedError] if +write_only+ is not +true+
|
|
137
|
+
def new(write_only: true)
|
|
138
|
+
raise NotImplementedError, "read/write mode is not supported; pass write_only: true" unless write_only
|
|
123
139
|
|
|
124
140
|
WriteOnlyWorkbook.new
|
|
125
141
|
end
|
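The keyword guard on `Rbxl.open` and `Rbxl.new` can be sketched without the gem installed. The method below is a hypothetical stand-in (`open_workbook` is not part of rbxl) that models only the default and the raise, using the error message shown in the diff.

```ruby
# Hypothetical stand-in for the entry point; only the keyword guard is modeled.
def open_workbook(path, read_only: true)
  raise NotImplementedError, "read/write mode is not supported; pass read_only: true" unless read_only

  "opened #{path} read-only"
end

puts open_workbook("report.xlsx")

begin
  open_workbook("report.xlsx", read_only: false)
rescue NotImplementedError => e
  puts "rejected: #{e.message}"
end
```

`NotImplementedError` descends from `ScriptError`, not `StandardError`, so a bare `rescue` will not catch it; callers that want to handle the rejection need to name the class explicitly, as above.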
data/sig/rbxl.rbs
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
module Rbxl
|
|
2
2
|
VERSION: String
|
|
3
3
|
|
|
4
|
-
type cell_value = String | Integer | Float | bool | nil
|
|
4
|
+
type cell_value = String | Integer | Float | bool | Date | Time | nil
|
|
5
5
|
type pathish = String | Pathname
|
|
6
6
|
type row_input = Array[untyped] | Enumerator[untyped, untyped]
|
|
7
7
|
type row_values = Array[cell_value]
|
|
@@ -9,7 +9,7 @@ module Rbxl
|
|
|
9
9
|
type row_cells = Array[row_cell]
|
|
10
10
|
type dimensions = { ref: String, max_col: Integer, max_row: Integer }
|
|
11
11
|
|
|
12
|
-
def self.open: (pathish path, ?read_only: bool, ?streaming: bool) -> ReadOnlyWorkbook
|
|
12
|
+
def self.open: (pathish path, ?read_only: bool, ?streaming: bool, ?date_conversion: bool) -> ReadOnlyWorkbook
|
|
13
13
|
def self.new: (?write_only: bool) -> WriteOnlyWorkbook
|
|
14
14
|
|
|
15
15
|
attr_accessor self.max_shared_strings: Integer?
|
|
@@ -83,8 +83,8 @@ module Rbxl
|
|
|
83
83
|
attr_reader path: String
|
|
84
84
|
attr_reader sheet_names: Array[String]
|
|
85
85
|
|
|
86
|
-
def self.open: (pathish path, ?streaming: bool) -> ReadOnlyWorkbook
|
|
87
|
-
def initialize: (pathish path, ?streaming: bool) -> void
|
|
86
|
+
def self.open: (pathish path, ?streaming: bool, ?date_conversion: bool) -> ReadOnlyWorkbook
|
|
87
|
+
def initialize: (pathish path, ?streaming: bool, ?date_conversion: bool) -> void
|
|
88
88
|
def sheet: (String name) -> ReadOnlyWorksheet
|
|
89
89
|
def close: () -> void
|
|
90
90
|
def closed?: () -> bool
|
|
@@ -94,7 +94,7 @@ module Rbxl
|
|
|
94
94
|
attr_reader name: String
|
|
95
95
|
attr_reader dimensions: dimensions?
|
|
96
96
|
|
|
97
|
-
def initialize: (zip: untyped, entry_path: String, shared_strings: Array[String], name: String, ?streaming: bool) -> void
|
|
97
|
+
def initialize: (zip: untyped, entry_path: String, shared_strings: Array[String], name: String, ?streaming: bool, ?date_styles: Array[bool]?) -> void
|
|
98
98
|
|
|
99
99
|
def each_row: (?pad_cells: bool, ?values_only: bool, ?expand_merged: bool) { (Row | row_values) -> void } -> void
|
|
100
100
|
| (?pad_cells: bool, ?values_only: bool, ?expand_merged: bool) -> Enumerator[Row | row_values, void]
|