roo-smarter_csv 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: 1e9eda68125d7fc44da76120a836232beb878606d3408663910099f9758115ff
+   data.tar.gz: d16379e081c95336d6f4b917e28803b7a9f057bbc9a17ae0d864e7cdd28202aa
+ SHA512:
+   metadata.gz: bad1032a637d7edb2aca75dfc3f4528532d96ca82dc88f75e8007ec2dfda4b7b4bfd006c8628503abb2e2c6db622553ce423227056b1c64b04e6885b57e51684
+   data.tar.gz: 813adc5e8931a020194c3a12bb4a699f20faaa94f97a6a49ec65a9bd9de255699d997ec424aa589e5bcaea5a06b55c964a2aacecc0d8cf188cf57dc3fac17821
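`checksums.yaml` records SHA256 and SHA512 digests of the two archives packed inside the `.gem` file (`metadata.gz` and `data.tar.gz`). The following is a minimal sketch of checking those digests with Ruby's standard `digest` and `yaml` libraries. It assumes the archive members have already been read into memory (a `.gem` is a tar file, so any tar reader will do). The helper name `verify_checksums` is hypothetical.

```ruby
require "digest"
require "yaml"

# Hypothetical helper: compares the digests recorded in a checksums.yaml
# document against the actual bytes of each archive member.
# `members` maps a member name (e.g. "data.tar.gz") to its raw contents.
def verify_checksums(checksums_yaml, members)
  recorded = YAML.safe_load(checksums_yaml)
  algorithms = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }

  results = {}
  algorithms.each do |algo, digest_class|
    (recorded[algo] || {}).each do |member, expected|
      actual = digest_class.hexdigest(members.fetch(member))
      results["#{algo} #{member}"] = (actual == expected)
    end
  end
  results
end

# Usage with an in-memory stand-in for data.tar.gz:
payload = "example bytes"
checksums = {
  "SHA256" => { "data.tar.gz" => Digest::SHA256.hexdigest(payload) },
  "SHA512" => { "data.tar.gz" => Digest::SHA512.hexdigest(payload) }
}.to_yaml

# Every entry in the result maps to true when the bytes are untouched.
verify_checksums(checksums, { "data.tar.gz" => payload })
```

Any `false` entry means the member's bytes do not match what was recorded at release time.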
@@ -0,0 +1,32 @@
+ name: Ruby
+
+ on:
+   push:
+     branches:
+       - main
+
+   pull_request:
+
+ permissions:
+   contents: read
+
+ jobs:
+   build:
+     runs-on: ubuntu-latest
+     name: Ruby ${{ matrix.ruby }}
+     strategy:
+       matrix:
+         ruby:
+           - '3.4.7'
+
+     steps:
+     - uses: actions/checkout@v6
+       with:
+         persist-credentials: false
+     - name: Set up Ruby
+       uses: ruby/setup-ruby@v1
+       with:
+         ruby-version: ${{ matrix.ruby }}
+         bundler-cache: true
+     - name: Run the default task
+       run: bundle exec rake
data/.gitignore ADDED
@@ -0,0 +1,12 @@
+ /.bundle/
+ /.yardoc
+ /_yardoc/
+ /coverage/
+ /doc/
+ /pkg/
+ /spec/reports/
+ /tmp/
+
+ # rspec failure tracking
+ .rspec_status
+ Gemfile.lock
data/.rspec ADDED
@@ -0,0 +1,3 @@
+ --format documentation
+ --color
+ --require spec_helper
data/.rubocop.yml ADDED
@@ -0,0 +1,39 @@
+ AllCops:
+   TargetRubyVersion: 2.6.0
+
+ Naming/FileName:
+   Exclude:
+     - "lib/roo-smarter_csv.rb"
+
+ Metrics/ClassLength:
+   Enabled: false
+
+ Style/IfUnlessModifier:
+   Enabled: false
+
+ Metrics/MethodLength:
+   Enabled: false
+
+ Layout/LineLength:
+   Enabled: false
+
+ Metrics/AbcSize:
+   Enabled: false
+
+ Metrics/BlockLength:
+   Enabled: false
+
+ Metrics/CyclomaticComplexity:
+   Enabled: false
+
+ Naming/VariableNumber:
+   CheckSymbols: false
+
+ Style/WhileUntilModifier:
+   Enabled: false
+
+ Style/StringLiterals:
+   EnforcedStyle: double_quotes
+
+ Style/StringLiteralsInInterpolation:
+   EnforcedStyle: double_quotes
data/CHANGELOG.md ADDED
@@ -0,0 +1,21 @@
+ # Roo SmarterCSV Change Log
+
+ ## [1.0.0.pre2] - 2026-05-17
+
+ - Initial release
+
+ Speedup vs Roo::CSV with SmarterCSV 1.17.1
+
+ | File                           | Speedup |
+ | ------------------------------ | ------: |
+ | PEOPLE_IMPORT_B.csv            |   2.98x |
+ | uscities.csv                   |   4.22x |
+ | uszips.csv                     |   4.45x |
+ | worldcities.csv                |   4.58x |
+ | embedded_newlines_60k.csv      |   3.84x |
+ | heavy_quoting_60k.csv          |   3.42x |
+ | many_empty_fields_60k.csv      |   3.36x |
+ | sample_100k.csv                |   3.17x |
+ | sensor_data_50krows_50cols.csv |   3.23x |
+ | tab_separated_60k.tsv          |   3.14x |
+ | utf8_multibyte_60k.csv         |   3.17x |
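The speedup table can be summarized in a few lines of Ruby, for example to check the "3-4.6x" range quoted in the README. This is a sketch: the figures are copied from the table above, speedup is assumed to carry its usual meaning (Roo::CSV time divided by roo-smarter_csv time), and the geometric mean is included only as one common way to average ratios.

```ruby
# Speedup figures copied from the CHANGELOG table.
SPEEDUPS = {
  "PEOPLE_IMPORT_B.csv" => 2.98,
  "uscities.csv" => 4.22,
  "uszips.csv" => 4.45,
  "worldcities.csv" => 4.58,
  "embedded_newlines_60k.csv" => 3.84,
  "heavy_quoting_60k.csv" => 3.42,
  "many_empty_fields_60k.csv" => 3.36,
  "sample_100k.csv" => 3.17,
  "sensor_data_50krows_50cols.csv" => 3.23,
  "tab_separated_60k.tsv" => 3.14,
  "utf8_multibyte_60k.csv" => 3.17
}.freeze

min_file, min = SPEEDUPS.min_by { |_file, speedup| speedup }
max_file, max = SPEEDUPS.max_by { |_file, speedup| speedup }

# Geometric mean: exp of the average log, i.e. the n-th root of the product.
geo_mean = Math.exp(SPEEDUPS.values.sum { |v| Math.log(v) } / SPEEDUPS.size)

puts format("min %.2fx (%s), max %.2fx (%s), geometric mean %.2fx",
            min, min_file, max, max_file, geo_mean)
```

The geometric mean is generally preferred over the arithmetic mean for ratios because it does not over-weight the largest values.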
@@ -0,0 +1,10 @@
+ # Code of Conduct
+
+ "roo-smarter_csv" follows [The Ruby Community Conduct Guideline](https://www.ruby-lang.org/en/conduct) in all "collaborative space", which is defined as community communications channels (such as mailing lists, submitted patches, commit comments, etc.):
+
+ * Participants will be tolerant of opposing views.
+ * Participants must ensure that their language and actions are free of personal attacks and disparaging personal remarks.
+ * When interpreting the words and actions of others, participants should always assume good intentions.
+ * Behaviour which can be reasonably considered harassment will not be tolerated.
+
+ If you have any concerns about behaviour within this project, please contact us at [tilo.sloboda@gmail.com](mailto:tilo.sloboda@gmail.com).
data/Gemfile ADDED
@@ -0,0 +1,18 @@
+ # frozen_string_literal: true
+
+ source "https://rubygems.org"
+
+ gemspec
+
+ group :development, :test do
+   gem "bundler"
+   gem "minitest", "~> 5.4"
+   gem "rake"
+   gem "rspec"
+   gem "rubocop"
+ end
+
+ group :test do
+   gem "matrix"
+   gem "simplecov"
+ end
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2026 Tilo Sloboda
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,311 @@
+ # roo-smarter_csv
+
+ ![Gem Version](https://img.shields.io/gem/v/roo-smarter_csv) [![codecov](https://codecov.io/gh/tilo/roo-smarter_csv/branch/main/graph/badge.svg)](https://codecov.io/gh/tilo/roo-smarter_csv) [![RubyGems](https://img.shields.io/badge/RubyGems-roo__smarter__csv-brightgreen?logo=rubygems&logoColor=white)](https://rubygems.org/gems/roo-smarter_csv) [![Ruby Toolbox](https://img.shields.io/badge/Ruby%20Toolbox-roo__smarter__csv-brightgreen)](https://www.ruby-toolbox.com/projects/roo-smarter_csv)
+
+ `roo-smarter_csv` replaces Roo's CSV backend with [SmarterCSV](https://github.com/tilo/smarter_csv) while keeping the Roo spreadsheet API.
+
+ ## What it does
+
+ - Uses [SmarterCSV](https://github.com/tilo/smarter_csv) for parsing CSV input
+ - Uses SmarterCSV defaults unless overridden by Roo compatibility behavior or explicit options
+
+ ### SmarterCSV Benefits
+ - **About 3-4.6x faster than Roo::CSV** on the benchmark files listed under Performance
+ - SmarterCSV automatically detects the column separator (`col_sep`) and the row separator (`row_sep`)
+ - SmarterCSV is more robust against real-world data
+ - See [Ruby CSV Pitfalls](https://github.com/tilo/smarter_csv/blob/main/docs/ruby_csv_pitfalls.md) for examples of silent data loss and corruption in Ruby's CSV library
+ - See [Migrating from Ruby CSV](https://github.com/tilo/smarter_csv/blob/main/docs/migrating_from_csv.md) for behavior differences and migration guidance
+ - See [SmarterCSV 1.15.2: Faster Than Raw CSV Arrays](https://tilo-sloboda.medium.com/smartercsv-1-15-2-faster-than-raw-csv-arrays-benchmarks-zsv-and-the-full-pipeline-2c12a798032e) for benchmark background
+
+ ## Performance
+
+ Speedup vs Roo::CSV with SmarterCSV 1.17.1:
+
+ | File                           | Speedup |
+ | ------------------------------ | ------: |
+ | PEOPLE_IMPORT_B.csv            |   2.98x |
+ | uscities.csv                   |   4.22x |
+ | uszips.csv                     |   4.45x |
+ | worldcities.csv                |   4.58x |
+ | embedded_newlines_60k.csv      |   3.84x |
+ | heavy_quoting_60k.csv          |   3.42x |
+ | many_empty_fields_60k.csv      |   3.36x |
+ | sample_100k.csv                |   3.17x |
+ | sensor_data_50krows_50cols.csv |   3.23x |
+ | tab_separated_60k.tsv          |   3.14x |
+ | utf8_multibyte_60k.csv         |   3.17x |
+
+ ## Roo API
+ - Keeps Roo's spreadsheet-style API:
+   - `cell`
+   - `celltype`
+   - `row`
+   - `column`
+   - `each`
+   - `parse`
+   - `first_row` / `last_row`
+   - `first_column` / `last_column`
+ - Preserves Roo's single-sheet CSV behavior
+ - Supports Roo's `Roo::Spreadsheet.open(...)` entry point
+ - Supports CSV export through Roo's existing `to_csv`
+
+ ## Installation
+
+ Add to your Gemfile:
+
+ ```ruby
+ gem "roo-smarter_csv"
+ ```
+
+ Then run:
+
+ ```bash
+ bundle install
+ ```
+
+ ## Activation
+
+ ```ruby
+ require "roo-smarter_csv"
+
+ spreadsheet = Roo::Spreadsheet.open("data.csv")
+ ```
+
+ `require "roo-smarter_csv"` automatically loads both `roo` and `smarter_csv` and registers `Roo::SmarterCSV` as Roo's CSV handler.
+
+ ## Supported behavior
+
+ `roo-smarter_csv` reads the full CSV input into memory and exposes it through Roo's spreadsheet abstraction.
+
+ It supports:
+
+ - local files
+ - `StringIO` / stream input
+ - Roo's `Roo::Spreadsheet.open(...)`
+ - CSV files with a UTF-8 BOM
+ - tab-delimited input via `col_sep: "\t"`
+ - SmarterCSV type conversion
+ - warnings emitted by SmarterCSV
+ - Roo's `to_csv` export for the in-memory spreadsheet representation
+
+ ## Architecture note
+
+ SmarterCSV is used as the parser, but Roo remains the public model.
+
+ That means:
+
+ - SmarterCSV row hashes are only an internal, intermediate representation used to populate the grid
+ - Roo still stores data in its coordinate-based cell grid
+ - Roo's public API remains spreadsheet-like
+
+ ## Options
+
+ - SmarterCSV options are passed as a nested hash under the `smarter_csv` key, e.g. `smarter_csv: { col_sep: ";" }`
+ - `roo-smarter_csv` defaults the SmarterCSV option `remove_empty_hashes` to `false` for compatibility with Roo.
+ - `roo-smarter_csv` honors some of Roo's `csv_options` (the four keys listed below), but we encourage you to pass them under `smarter_csv` instead.
+
+ ### Option precedence
+
+ `roo-smarter_csv` understands two option namespaces:
+
+ #### 1. SmarterCSV options
+ Primary namespace:
+
+ ```ruby
+ smarter_csv: {
+   col_sep: ";",
+   row_sep: "\n",
+   quote_char: '"',
+   encoding: "utf-8"
+ }
+ ```
+
+ #### 2. Roo compatibility options
+ Roo's existing namespace:
+
+ ```ruby
+ csv_options: {
+   col_sep: ";",
+   row_sep: "\n",
+   quote_char: '"',
+   encoding: "utf-8"
+ }
+ ```
+
+ Only these four keys are copied from `csv_options` into the effective SmarterCSV options:
+
+ - `col_sep`
+ - `row_sep`
+ - `quote_char`
+ - `encoding`
+
+ No other Roo options are treated as CSV parser settings.
+
+ #### Precedence rules
+
+ 1. Start with SmarterCSV defaults.
+ 2. Apply `roo-smarter_csv` compatibility overrides.
+ 3. Copy the supported keys from `csv_options` into the SmarterCSV options.
+ 4. Apply `smarter_csv` on top.
+ 5. If the same key is set in both `csv_options` and `smarter_csv`, the `smarter_csv` value wins and a warning is emitted.
+
+ ### Examples
+
+ #### Only Roo options
+
+ ```ruby
+ Roo::Spreadsheet.open("data.tsv", extension: :csv, csv_options: { col_sep: "\t" })
+ ```
+
+ #### Only SmarterCSV options
+
+ ```ruby
+ Roo::Spreadsheet.open("data.csv", smarter_csv: { col_sep: ";" })
+ ```
+
+ #### Both, with conflict
+
+ ```ruby
+ Roo::Spreadsheet.open(
+   "data.csv",
+   csv_options: { col_sep: ";" },
+   smarter_csv: { col_sep: "\t" }
+ )
+ ```
+
+ In this case, `smarter_csv[:col_sep]` wins and a warning is emitted.
+
+ ## SmarterCSV defaults
+
+ When you do not pass any options, `roo-smarter_csv` starts from the SmarterCSV defaults and then applies one compatibility override for Roo:
+
+ - `remove_empty_hashes: false`
+
+ That override is intentional. Roo expects blank rows to remain addressable in the spreadsheet model, so `roo-smarter_csv` disables SmarterCSV's default behavior of dropping fully empty row hashes.
+
+ The most important effective defaults are therefore:
+
+ - `col_sep: :auto` (auto-detects the separator)
+ - `row_sep: :auto` (auto-detects line endings)
+ - `quote_char: '"'`
+ - `downcase_header: true`
+ - `strings_as_keys: false`
+ - `convert_values_to_numeric: true`
+ - `remove_empty_hashes: false` (the Roo compatibility override described above)
+ - `headers_in_file: true`
+
+ As a result, common CSV files work without extra configuration: SmarterCSV infers separators and converts numeric values automatically, while Roo-compatible blank rows are preserved.
+
+ ### Default behavior examples
+
+ #### Auto-detected separator
+
+ ```ruby
+ spreadsheet = Roo::Spreadsheet.open("data.csv")
+ ```
+
+ No `col_sep` is needed for normal comma-separated CSV files.
+
+ #### Automatic numeric conversion
+
+ Numeric-looking fields are returned as numbers instead of strings:
+
+ ```ruby
+ spreadsheet.cell(2, 2) # => 30
+ spreadsheet.cell(2, 4) # => 1.5
+ ```
+
+ #### Headers and keys
+
+ SmarterCSV downcases headers by default and returns symbol keys:
+
+ ```ruby
+ SmarterCSV.process(StringIO.new("Name,Email\nJohn,john@example.com\n")).first
+ # => { name: "John", email: "john@example.com" }
+ ```
+
+ If you want string keys instead, pass `strings_as_keys: true`:
+
+ ```ruby
+ SmarterCSV.process(
+   StringIO.new("Name,Email\nJohn,john@example.com\n"),
+   strings_as_keys: true
+ ).first
+ # => { "name" => "John", "email" => "john@example.com" }
+ ```
+
+ In `roo-smarter_csv`, those row hashes are used internally to populate Roo's spreadsheet grid. The public Roo methods still behave like spreadsheet methods.
+
+ ## Examples
+
+ ### Basic Roo usage
+
+ ```ruby
+ require "roo"
+ require "roo-smarter_csv"
+
+ csv = Roo::Spreadsheet.open("people.csv")
+
+ csv.cell(2, 1) # => "John"
+ csv.cell(2, 2) # => 30
+ csv.row(2)     # => ["John", 30, "john@example.com", 50000]
+ csv.first_row  # => 1
+ csv.last_row   # => 4
+ ```
+
+ ### TSV example
+
+ ```ruby
+ csv = Roo::Spreadsheet.open(
+   "people.tsv",
+   extension: :csv,
+   csv_options: { col_sep: "\t" }
+ )
+ ```
+
+ ### Explicit SmarterCSV options
+
+ ```ruby
+ csv = Roo::Spreadsheet.open(
+   "data.csv",
+   smarter_csv: {
+     col_sep: ";",
+     quote_char: '"'
+   }
+ )
+ ```
+
+ ## Development
+
+ ```bash
+ bundle install
+ bundle exec rspec
+ ```
+
+ ## Reporting Bugs / Feature Requests
+
+ Please [open an Issue on GitHub](https://github.com/tilo/roo-smarter_csv/issues) if you have feedback, new feature requests, or want to report a bug. Thank you!
+
+ When reporting an issue, please:
+ - include a small sample CSV file
+ - open a pull request adding a test that demonstrates the issue
+ - mention your versions of SmarterCSV, Ruby, and Rails
+
+ ## Contributing
+
+ 1. Fork it
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
+ 3. Commit your changes (`git commit -am 'Added some feature'`)
+ 4. Push to the branch (`git push origin my-new-feature`)
+ 5. Create new Pull Request
+
+ ## License
+
+ MIT
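The option precedence rules in the README amount to a layered hash merge. The sketch below is a hypothetical model, not `roo-smarter_csv`'s actual implementation. The method name `effective_options`, the `warn_io` parameter, the warning text, and the choice to warn only when the two values differ are all assumptions. The defaults hash is an abbreviated subset of the README's "SmarterCSV defaults" list.

```ruby
# Hypothetical model of the README's option precedence rules.
BRIDGED_KEYS = %i[col_sep row_sep quote_char encoding].freeze
ROO_COMPAT_OVERRIDES = { remove_empty_hashes: false }.freeze

def effective_options(defaults, csv_options: {}, smarter_csv: {}, warn_io: $stderr)
  # Start from SmarterCSV defaults, then apply the Roo compatibility override.
  options = defaults.merge(ROO_COMPAT_OVERRIDES)

  # Copy only the four bridged keys from Roo's csv_options.
  bridged = csv_options.slice(*BRIDGED_KEYS)
  options.merge!(bridged)

  # smarter_csv is applied last and wins. Assumption: a warning is written
  # only when both namespaces set the same key to different values.
  smarter_csv.each do |key, value|
    if bridged.key?(key) && bridged[key] != value
      warn_io.puts "csv_options[#{key.inspect}] is overridden by smarter_csv[#{key.inspect}]"
    end
    options[key] = value
  end

  options
end

# Abbreviated defaults; remove_empty_hashes is shown at SmarterCSV's own
# default (true) so the compatibility override is visible.
defaults = { col_sep: :auto, row_sep: :auto, quote_char: '"', remove_empty_hashes: true }

opts = effective_options(
  defaults,
  csv_options: { col_sep: ";", headers: true },
  smarter_csv: { col_sep: "\t" }
)
# opts[:col_sep] is "\t", opts[:remove_empty_hashes] is false, and
# :headers is dropped because it is not one of the bridged keys.
```

Keeping the bridged keys in a frozen allow-list makes the "no other Roo options are treated as CSV parser settings" rule explicit: anything outside `BRIDGED_KEYS` is silently ignored rather than forwarded to the parser.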
data/Rakefile ADDED
@@ -0,0 +1,24 @@
+ # frozen_string_literal: true
+
+ require "bundler/gem_tasks"
+ require "rspec/core/rake_task"
+ require "shellwords"
+
+ RSpec::Core::RakeTask.new(:spec)
+
+ RUBOCOP_ARGS = begin
+   rubocop_index = ARGV.index("rubocop")
+   rubocop_index ? ARGV[(rubocop_index + 1)..] || [] : []
+ end
+
+ RUBOCOP_ARGS.each do |arg|
+   task arg do
+   end
+ end
+
+ desc "Run RuboCop; extra args after 'rubocop' are passed through"
+ task :rubocop do
+   sh(["bundle", "exec", "rubocop", *RUBOCOP_ARGS].shelljoin)
+ end
+
+ task default: %i[spec]
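The Rakefile's `RUBOCOP_ARGS` block collects every command-line word after `rubocop` and forwards it to RuboCop. Rake interprets command-line words that are not options or `NAME=value` assignments as task names, which is presumably why each forwarded word is also defined as an empty task. The sketch below isolates the argument handling in a hypothetical `rubocop_passthrough_args` method that takes the argument list as a parameter instead of reading `ARGV`, so it can be tried outside Rake.

```ruby
require "shellwords"

# Hypothetical helper mirroring the Rakefile's RUBOCOP_ARGS expression.
def rubocop_passthrough_args(argv)
  index = argv.index("rubocop")
  return [] unless index

  # Slicing at argv.size returns [], so "rubocop" as the last word
  # yields an empty list without needing a `|| []` fallback.
  argv[(index + 1)..]
end

args = rubocop_passthrough_args(%w[rubocop lib spec])
command = ["bundle", "exec", "rubocop", *args].shelljoin
puts command # prints: bundle exec rubocop lib spec
```

`Array#shelljoin` escapes each element, so a forwarded path containing spaces reaches RuboCop as a single argument.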
data/bin/console ADDED
@@ -0,0 +1,11 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+
+ require "bundler/setup"
+ require "roo/smarter_csv"
+
+ # You can add fixtures and/or initialization code here to make experimenting
+ # with your gem easier. You can also use a different console, if you like.
+
+ require "irb"
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,8 @@
+ #!/usr/bin/env bash
+ set -euo pipefail
+ IFS=$'\n\t'
+ set -vx
+
+ bundle install
+
+ # Do any other automated setup that you need to do here
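The README's architecture note says SmarterCSV row hashes are only an intermediate step before Roo's coordinate-based cell grid, and its defaults section explains why `remove_empty_hashes` is set to `false`. The hypothetical sketch below (not the gem's internal code) illustrates both points. It builds a sparse grid keyed by 1-based `[row, column]` pairs with the headers in row 1, and it assumes a blank CSV line arrives as an empty hash. The name `build_cell_grid` is made up for illustration.

```ruby
# Hypothetical illustration of the parser-to-grid step; not roo-smarter_csv's code.
# Returns a Hash keyed by [row, column] (both 1-based). Row 1 holds the headers;
# data rows start at row 2, matching Roo's spreadsheet coordinates.
def build_cell_grid(headers, row_hashes)
  grid = {}
  headers.each_with_index { |header, col| grid[[1, col + 1]] = header.to_s }

  row_hashes.each_with_index do |row, index|
    headers.each_with_index do |key, col|
      value = row[key]
      grid[[index + 2, col + 1]] = value unless value.nil?
    end
  end
  grid
end

headers = %i[name age]
rows = [{ name: "John", age: 30 }, {}, { name: "Jane", age: 25 }]

# Keeping the empty hash: the blank line still occupies row 3.
kept = build_cell_grid(headers, rows)
kept[[4, 1]] # => "Jane"

# Dropping empty hashes shifts every later row up by one.
dropped = build_cell_grid(headers, rows.reject(&:empty?))
dropped[[3, 1]] # => "Jane"
```

The second grid shows the row-number shift that `remove_empty_hashes: false` avoids: with blank rows dropped, `cell(3, 1)` would return data that Roo::CSV reports at row 4.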