smarter_csv 1.17.2 → 1.17.3
- checksums.yaml +4 -4
- data/CHANGELOG.md +235 -61
- data/README.md +4 -1
- data/UPGRADING.md +251 -0
- data/docs/.nojekyll +0 -0
- data/docs/upgrade_path.json +175 -0
- data/docs/upgrade_wizard.html +498 -0
- data/ext/smarter_csv/smarter_csv.c +176 -309
- data/lib/smarter_csv/parser.rb +4 -2
- data/lib/smarter_csv/version.rb +1 -1
- data/smarter_csv.gemspec +7 -5
- metadata +8 -2
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: ec50e8539c6872f9c86c25eabc2982e39846ad07dc5a21021fc687c7661f8084
|
|
4
|
+
data.tar.gz: 977ce04d8dd225b6042ea03ad0c174305f3ea122340fad052e2c2ada440d6400
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 0452dc7f15ab31b0cdfad83ca718e17e6456cf6c9826d177e606c5924f3ec72a155c86ee6f9f938540fe3b2ed8f694a981c95cf775b5a38d7f7e44318bc453a3
|
|
7
|
+
data.tar.gz: c1c9732d6d4393fb2ffa995f0c7bb73cd60f566132d487a86007f8a5a623257365c2324ed894a5b38d681df7cfab67069d9fa2e61fd525ba51675954ddadad7a
|
data/CHANGELOG.md
CHANGED
|
@@ -1,11 +1,40 @@
|
|
|
1
1
|
|
|
2
2
|
# SmarterCSV 1.x Change Log
|
|
3
3
|
|
|
4
|
+
> [!TIP]
|
|
5
|
+
> **Upgrading?** The [SmarterCSV Upgrade Wizard](https://tilo.github.io/smarter_csv/upgrade_wizard.html) walks you through what (if anything) you need to change for your specific version. Most hops do not require any changes.
|
|
6
|
+
|
|
7
|
+
## 1.17.3 (2026-05-26)
|
|
8
|
+
|
|
9
|
+
RSpec tests: **2,274 → 2,277** (+3 tests)
|
|
10
|
+
|
|
11
|
+
* No functional changes
|
|
12
|
+
* added 3 test cases
|
|
13
|
+
|
|
14
|
+
### Improvements
|
|
15
|
+
* DRY up the C code
|
|
16
|
+
* no performance changes on the C-path
|
|
17
|
+
|
|
18
|
+
### Performance
|
|
19
|
+
* performance improvement on the Ruby-path
|
|
20
|
+
|
|
21
|
+
| File | RB-path |
|
|
22
|
+
|-----------------------------------|--------------|
|
|
23
|
+
| PEOPLE_IMPORT_B / PEOPLE_IMPORT_C | 13.5% faster |
|
|
24
|
+
| tab_separated_60k | 13.2% faster |
|
|
25
|
+
| sample_100k | 10.3% faster |
|
|
26
|
+
| multi_char_separator | 9.0% faster |
|
|
27
|
+
| utf8_multibyte | 7.1% faster |
|
|
28
|
+
| many_empty_fields | 6.7% faster |
|
|
29
|
+
| PEOPLE_IMPORT_NC | 5.2% faster |
|
|
30
|
+
| sensor_data | 4.5% faster |
|
|
31
|
+
|
|
32
|
+
|
|
4
33
|
## 1.17.2 (2026-05-21)
|
|
5
34
|
|
|
6
35
|
RSpec tests: **2,220 → 2,274** (+54 tests)
|
|
7
36
|
|
|
8
|
-
### Bug
|
|
37
|
+
### Bug Fixes
|
|
9
38
|
|
|
10
39
|
- fixed [Issue #334](https://github.com/tilo/smarter_csv/issues/334) with escaped double quote followed by comma. Thanks to [conorg](https://github.com/conorg)
|
|
11
40
|
- fixed bug when using `headers: { except: }`
|
|
@@ -73,6 +102,16 @@ Measured against 1.16.4 (Apple M4, Ruby 3.4.7):
|
|
|
73
102
|
|
|
74
103
|
Per-file breakdown: [`docs/releases/1.17.0/performance_notes.md`](docs/releases/1.17.0/performance_notes.md).
|
|
75
104
|
|
|
105
|
+
## 1.16.6 (2026-05-21)
|
|
106
|
+
|
|
107
|
+
RSpec tests: **1,467 → 1,591** (+124 tests)
|
|
108
|
+
|
|
109
|
+
### Bug Fixes
|
|
110
|
+
|
|
111
|
+
- fixed [Issue #334](https://github.com/tilo/smarter_csv/issues/334) with escaped double quote followed by comma. Thanks to [conorg](https://github.com/conorg)
|
|
112
|
+
- fixed bug when using `headers: { except: }`
|
|
113
|
+
- added more tests
|
|
114
|
+
|
|
76
115
|
## 1.16.5 (2026-05-17)
|
|
77
116
|
|
|
78
117
|
### Bug Fix
|
|
@@ -164,23 +203,42 @@ RSpec tests: **1,247 → 1,410** (+163 tests)
|
|
|
164
203
|
|
|
165
204
|
* Added 163 tests covering new features and corner cases
|
|
166
205
|
|
|
167
|
-
## 1.16.0 (2026-03-12) —
|
|
206
|
+
## 1.16.0 (2026-03-12) — improved RFC 4180 quote handling, new APIs, large performance gains
|
|
168
207
|
|
|
169
208
|
[Full details](docs/releases/1.16.0/changes.md) · [Benchmarks](docs/releases/1.16.0/benchmarks.md) · [Performance notes](docs/releases/1.16.0/performance_notes.md)
|
|
170
209
|
|
|
171
210
|
RSpec tests: **714 → 1,247** (+533 tests)
|
|
172
211
|
|
|
173
|
-
###
|
|
212
|
+
### (Bug Fix) `quote_boundary:` — new default for how mid-field quotes are handled
|
|
213
|
+
|
|
214
|
+
**In short — most users will see incorrect output silently improve. If your CSV files don't contain stray `"` characters in the middle of unquoted fields, you are not affected. If they do, the new default produces correct output where the old default produced corrupted output.**
|
|
174
215
|
|
|
175
|
-
|
|
176
|
-
* defaults to `:standard`**: quotes are now only recognized as field delimiters at field boundaries;
|
|
177
|
-
mid-field quotes are treated as literal characters.
|
|
216
|
+
A new option `quote_boundary:` controls when a `"` character marks the start or end of a quoted field versus when it's a literal character inside the field.
|
|
178
217
|
|
|
179
|
-
|
|
180
|
-
|
|
181
|
-
correct behavior improve, not regress.
|
|
218
|
+
* `quote_boundary: :standard` (the new default) — quotes are only recognized as field delimiters at field boundaries (start of a field, or immediately before `col_sep` / end of line). A `"` that appears in the middle of an unquoted field is treated as a literal character. This matches RFC 4180 and Ruby's standard `CSV` library.
|
|
219
|
+
* `quote_boundary: :legacy` — **not recommended.** Restores the pre-1.16.0 behavior, where any `"` could open a quoted region. This is the behavior that produced silently corrupt output on files with stray mid-field quotes; it exists only as an escape hatch for code that built workarounds on top of the buggy output. New code should never use this.
|
|
182
220
|
|
|
183
|
-
|
|
221
|
+
In practice, the old `:legacy` behavior was silently producing corrupt output whenever a CSV file contained a stray mid-field `"` — so for most users this change makes output **correct** where it was wrong before, not the other way around.
|
|
222
|
+
|
|
223
|
+
#### You are NOT affected if:
|
|
224
|
+
- Your CSV files don't contain any `"` characters mid-field (the common case).
|
|
225
|
+
- Your CSV files quote fields cleanly per RFC 4180 (well-formed `"..."` around each quoted field, no stray quotes inside other fields).
|
|
226
|
+
|
|
227
|
+
#### You are affected if:
|
|
228
|
+
- Your CSV files contain stray `"` characters in the middle of unquoted fields (e.g. `5'6"`, `Joe "the Hat" Smith` without surrounding quotes), **and** you had downstream code that compensated for the previously-corrupted parse output.
|
|
229
|
+
|
|
230
|
+
#### How to migrate
|
|
231
|
+
|
|
232
|
+
For almost everyone: do nothing. Upgrade and observe that the output is the same or more correct.
|
|
233
|
+
|
|
234
|
+
The `quote_boundary: :legacy` option exists only as a short-term escape hatch — **we do not advise using it**, because it re-enables the buggy parse behavior that motivated this change. If your code built workarounds on top of the previously-corrupted output, the right fix is to remove those workarounds and rely on the new `:standard` behavior, not to opt back into the bug:
|
|
235
|
+
|
|
236
|
+
```ruby
|
|
237
|
+
# Only as a temporary escape hatch — not recommended for new code:
|
|
238
|
+
SmarterCSV.process('file.csv', quote_boundary: :legacy)
|
|
239
|
+
```
|
|
240
|
+
|
|
241
|
+
See [Parsing Strategy](docs/parsing_strategy.md) for details on how each mode handles edge cases.
|
|
184
242
|
|
|
185
243
|
### Performance
|
|
186
244
|
|
|
@@ -399,44 +457,90 @@ _worldcities.csv is [from here](https://simplemaps.com/data/world-cities)_
|
|
|
399
457
|
## 1.13.1 (2024-12-12)
|
|
400
458
|
* fix bug with SmarterCSV.generate with `force_quotes: true` ([issue 294](https://github.com/tilo/smarter_csv/issues/294))
|
|
401
459
|
|
|
402
|
-
## 1.13.0 (2024-11-06)
|
|
403
|
-
|
|
404
|
-
CHANGED DEFAULT BEHAVIOR
|
|
405
|
-
========================
|
|
406
|
-
The changes are to improve robustness and to reduce the risk of data loss
|
|
460
|
+
## 1.13.0 (2024-11-06) — Three default-behavior changes that prevent silent data loss
|
|
407
461
|
|
|
408
|
-
|
|
462
|
+
This release flipped three defaults so that SmarterCSV no longer silently loses data in three specific edge cases. For most users this is a quiet improvement — files that used to lose rows or columns silently now parse correctly with no code changes. Each change below has a short "affected if / not affected if" so you can skip past it quickly.
|
|
409
463
|
|
|
410
|
-
|
|
411
|
-
-> SmarterCSV will now raise `SmarterCSV::MalformedCSV` for unbalanced quote_char.
|
|
464
|
+
The motivation for all three changes is the same: data loss should never be silent. Either parse it correctly, or raise loudly.
|
|
412
465
|
|
|
413
|
-
|
|
414
|
-
|
|
415
|
-
* previous behavior:
|
|
416
|
-
when a CSV row had more columns than listed in the header, the additional columns were ignored
|
|
466
|
+
### Change 1 (Bug Fix): extra columns in a row are auto-named instead of dropped
|
|
417
467
|
|
|
418
|
-
|
|
419
|
-
* new default behavior is to auto-generate additional headers, e.g. :column_7, :column_8, etc
|
|
420
|
-
* you can set option `:strict` to true in order to get a `SmarterCSV::MalformedCSV` exception instead
|
|
468
|
+
(Thanks to James Fenley, [issue #284](https://github.com/tilo/smarter_csv/issues/284).)
|
|
421
469
|
|
|
422
|
-
|
|
423
|
-
|
|
424
|
-
|
|
470
|
+
If a CSV row had more columns than the header (e.g. header has 6 columns, a row has 8), the extras used to be **silently dropped**. As of 1.13.0 they survive as `:column_7`, `:column_8`, etc.
|
|
471
|
+
|
|
472
|
+
#### You are NOT affected if:
|
|
473
|
+
- Your CSV files have exactly as many columns per row as headers (the common case).
|
|
474
|
+
|
|
475
|
+
#### You are affected if:
|
|
476
|
+
- Your CSV files have rows with extra columns past the header **and** your code expects only the header-listed keys.
|
|
477
|
+
|
|
478
|
+
#### How to migrate
|
|
479
|
+
|
|
480
|
+
If you want the old "ignore extras" behavior, drop the extra keys yourself. If you want loud failure instead, use the strict mode:
|
|
481
|
+
|
|
482
|
+
```ruby
|
|
483
|
+
# Raise SmarterCSV::MalformedCSV on extra columns:
|
|
484
|
+
SmarterCSV.process('file.csv', strict: true)
|
|
485
|
+
```
|
|
486
|
+
|
|
487
|
+
(In 1.16.0 this option was renamed to `missing_headers: :raise`, but `strict: true` still works.)
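
The "drop the extra keys yourself" route can be sketched in plain Ruby. This is only an illustration: it assumes each parsed row is a Hash with symbol keys, and both `EXPECTED_KEYS` and `keep_expected_keys` are hypothetical names, not part of SmarterCSV.

```ruby
# Hypothetical list of the keys your code expects (the header-listed columns).
EXPECTED_KEYS = %i[id name email].freeze

# Hypothetical helper: keep only the expected keys in every row,
# discarding anything else (such as auto-generated :column_N keys).
def keep_expected_keys(rows, expected_keys)
  rows.map { |row| row.slice(*expected_keys) }
end

rows = [
  { id: 1, name: 'Ann', email: 'ann@example.com', column_4: 'extra', column_5: 'more' }
]

cleaned = keep_expected_keys(rows, EXPECTED_KEYS)
p cleaned
```

Selecting by the keys you expect, rather than matching on the `column_N` name pattern, avoids accidentally dropping a real header that happens to be named that way.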
|
|
488
|
+
|
|
489
|
+
### Change 2 (Bug Fix): unbalanced quotes raise `MalformedCSV` instead of producing garbage
|
|
490
|
+
|
|
491
|
+
(Thanks to Simon Rentzke, James Fenley, Randall B, and Matthew Kennedy. Issues [#283](https://github.com/tilo/smarter_csv/issues/283), [#288](https://github.com/tilo/smarter_csv/issues/288).)
|
|
492
|
+
|
|
493
|
+
Files with an unbalanced `quote_char` (an opening `"` with no matching close) used to parse to corrupted output. As of 1.13.0 they raise `SmarterCSV::MalformedCSV`.
|
|
494
|
+
|
|
495
|
+
#### You are NOT affected if:
|
|
496
|
+
- Your CSV files have well-formed quotes (the common case).
|
|
497
|
+
|
|
498
|
+
#### You are affected if:
|
|
499
|
+
- Some of your input files have unbalanced quotes and you used to silently live with the garbled output.
|
|
500
|
+
|
|
501
|
+
#### How to migrate
|
|
425
502
|
|
|
426
|
-
|
|
503
|
+
If you need to keep processing other files even when one is malformed, rescue the new exception:
|
|
427
504
|
|
|
428
|
-
|
|
429
|
-
|
|
430
|
-
|
|
505
|
+
```ruby
|
|
506
|
+
begin
|
|
507
|
+
SmarterCSV.process('file.csv')
|
|
508
|
+
rescue SmarterCSV::MalformedCSV => e
|
|
509
|
+
warn "Skipping malformed file: #{e.message}"
|
|
510
|
+
end
|
|
511
|
+
```
|
|
431
512
|
|
|
432
|
-
|
|
433
|
-
Setting `user_provided_headers` sets`headers_in_file: false`
|
|
434
|
-
a) Improved behavior if there was no header in the input data.
|
|
435
|
-
b) If there was a header in the input data, and `user_provided_headers` is used to override the headers in the file, then please explicitly specify `headers_in_file: true`, otherwise you will get an extra hash which includes the header data.
|
|
513
|
+
### Change 3 (Bug Fix): `user_provided_headers:` now implies `headers_in_file: false`
|
|
436
514
|
|
|
437
|
-
|
|
515
|
+
([Issue #282](https://github.com/tilo/smarter_csv/issues/282).)
|
|
438
516
|
|
|
439
|
-
|
|
517
|
+
This one fixes a quiet footgun: if you passed `user_provided_headers:` and the file had **no** header row, SmarterCSV used to treat the first data row as a header and silently drop it. As of 1.13.0, setting `user_provided_headers:` automatically sets `headers_in_file: false`, so the first row is treated as data — which is what you almost always wanted.
|
|
518
|
+
|
|
519
|
+
#### You are NOT affected if:
|
|
520
|
+
- You don't use `user_provided_headers:`.
|
|
521
|
+
- You use `user_provided_headers:` with files that have no header line (the common case — that's what the option is for).
|
|
522
|
+
|
|
523
|
+
#### You are affected if:
|
|
524
|
+
- You pass `user_provided_headers:` **and** your CSV file **does** have a header line that needs to be skipped.
|
|
525
|
+
|
|
526
|
+
#### How to migrate
|
|
527
|
+
|
|
528
|
+
If your file has a header line **and** you're overriding it with `user_provided_headers:`, add `headers_in_file: true` explicitly so the existing header line is skipped:
|
|
529
|
+
|
|
530
|
+
```ruby
|
|
531
|
+
# File has a header row that you want to override:
|
|
532
|
+
SmarterCSV.process(
|
|
533
|
+
'file.csv',
|
|
534
|
+
user_provided_headers: [:id, :name, :email],
|
|
535
|
+
headers_in_file: true, # skip the header row in the file
|
|
536
|
+
)
|
|
537
|
+
```
|
|
538
|
+
|
|
539
|
+
Without `headers_in_file: true`, you will get an extra hash at the top of your results containing the file's original header strings as values — that's the symptom to look for.
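
A test for that symptom could look roughly like the plain-Ruby sketch below. It assumes the parsed rows are Hashes and that you know the file's original header strings; `header_row_leaked?` and the sample data are hypothetical.

```ruby
# Hypothetical check: does the first parsed row simply repeat the file's header line?
def header_row_leaked?(rows, raw_headers)
  return false if rows.empty?

  first_values = rows.first.values.map { |value| value.to_s.strip }
  first_values == raw_headers.map { |header| header.to_s.strip }
end

raw_headers = ['ID', 'Full Name', 'Email']
rows = [
  { id: 'ID', name: 'Full Name', email: 'Email' }, # the leaked header row
  { id: 1, name: 'Ann', email: 'ann@example.com' }
]

puts header_row_leaked?(rows, raw_headers)
```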
|
|
540
|
+
|
|
541
|
+
### Documentation
|
|
542
|
+
|
|
543
|
+
* Improved documentation for handling numeric columns with leading zeroes (e.g. ZIP codes). Use `convert_values_to_numeric: { except: [:zip] }` to keep that column as a string. (Available since 1.10.x.) Thanks to David Moles, [issue #151](https://github.com/tilo/smarter_csv/issues/151).
|
|
440
544
|
|
|
441
545
|
## 1.12.1 (2024-07-10)
|
|
442
546
|
* Improved column separator detection by ignoring quoted sections [#276](https://github.com/tilo/smarter_csv/pull/276) (thanks to Nicolas Castellanos)
|
|
@@ -490,23 +594,66 @@ _worldcities.csv is [from here](https://simplemaps.com/data/world-cities)_
|
|
|
490
594
|
## 1.10.1 (2024-01-07)
|
|
491
595
|
* fix incorrect warning about UTF-8 (issue #268, thanks hirowatari)
|
|
492
596
|
|
|
493
|
-
## 1.10.0 (2023-12-31)
|
|
597
|
+
## 1.10.0 (2023-12-31) — Behavior changes for `user_provided_headers:` and duplicate headers
|
|
494
598
|
|
|
495
|
-
|
|
496
|
-
|
|
497
|
-
|
|
498
|
-
|
|
499
|
-
|
|
500
|
-
|
|
501
|
-
|
|
502
|
-
|
|
503
|
-
|
|
504
|
-
|
|
505
|
-
|
|
506
|
-
|
|
507
|
-
|
|
508
|
-
|
|
509
|
-
|
|
599
|
+
Two small behavior changes plus performance and memory improvements. Most users are not affected. Read on for who needs to look closer.
|
|
600
|
+
|
|
601
|
+
### Change 1 (Improvement): `user_provided_headers:` is now taken literally (no transformations, no duplicates)
|
|
602
|
+
|
|
603
|
+
**In short — if you use `user_provided_headers:`, write the list in the exact form you want the result keys (all symbols *or* all strings), and make sure there are no duplicates. For most users this is already what you were doing.**
|
|
604
|
+
|
|
605
|
+
Before 1.10.0, any list you passed as `user_provided_headers:` was run through the same header pipeline as in-file headers — `strings_as_keys` could flip strings to symbols, etc. Duplicates were silently accepted. As of 1.10.0, the list is used **literally**: no transformations are applied, and duplicates raise `SmarterCSV::DuplicateHeaders`.
|
|
606
|
+
|
|
607
|
+
This is almost always what people actually wanted: if you're explicitly listing the headers, you want *those* headers, not a transformed version of them.
|
|
608
|
+
|
|
609
|
+
#### You are NOT affected if:
|
|
610
|
+
- You don't use `user_provided_headers:`.
|
|
611
|
+
- Your `user_provided_headers:` list is already in the form you want (all symbols *or* all strings, no duplicates).
|
|
612
|
+
In these cases, you can just upgrade without any code changes.
|
|
613
|
+
|
|
614
|
+
#### You are affected if either is true:
|
|
615
|
+
- You pass `user_provided_headers:` **and** relied on `strings_as_keys:` to flip between string/symbol keys.
|
|
616
|
+
- You pass `user_provided_headers:` **and** had accidental duplicates in the list that the library used to silently accept (this case would be very odd).
|
|
617
|
+
|
|
618
|
+
#### How to migrate
|
|
619
|
+
|
|
620
|
+
```ruby
|
|
621
|
+
# If you want symbol keys, write symbols directly:
|
|
622
|
+
SmarterCSV.process('file.csv', user_provided_headers: [:id, :name, :email])
|
|
623
|
+
|
|
624
|
+
# If you want string keys, write strings directly:
|
|
625
|
+
SmarterCSV.process('file.csv', user_provided_headers: ['id', 'name', 'email'])
|
|
626
|
+
```
|
|
627
|
+
|
|
628
|
+
Drop any `strings_as_keys:` option you used alongside `user_provided_headers:` — it's ignored in that case now.
|
|
629
|
+
|
|
630
|
+
If you see `SmarterCSV::DuplicateHeaders` after upgrading, your list has a repeat in it — fix the duplicate and you're done.
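
To find the repeat before the list ever reaches the parser, a small pre-flight check along these lines may help. It is a plain-Ruby sketch (`Array#tally` requires Ruby 2.7 or newer), and `duplicate_entries` is a hypothetical helper name.

```ruby
# Hypothetical helper: return every entry that appears more than once in the list.
def duplicate_entries(headers)
  headers.tally.select { |_, count| count > 1 }.keys
end

headers = %i[id name email name]
dupes = duplicate_entries(headers)
warn "duplicate headers: #{dupes.inspect}" unless dupes.empty?
```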
|
|
631
|
+
|
|
632
|
+
### Change 2 (Improvement): duplicate headers in the CSV file are now auto-disambiguated
|
|
633
|
+
|
|
634
|
+
**In short — if your input CSV has duplicate column headers, they now Just Work instead of colliding. If your files don't have duplicate headers, you are not affected.**
|
|
635
|
+
|
|
636
|
+
`duplicate_header_suffix:` used to default to `nil`. Now it defaults to `''` (empty string), which means a file with headers like `name,name,name` becomes keys `name`, `name2`, `name3` automatically — no more silently overwriting earlier columns.
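
The naming scheme can be illustrated with a few lines of plain Ruby. This is a sketch of the scheme as described above, not the library's actual implementation: it assumes the first occurrence keeps its name and each later occurrence gets the suffix followed by a running count. `disambiguate_headers` is a hypothetical name.

```ruby
# Hypothetical illustration of the disambiguation scheme:
# the first "name" stays "name", later ones become "name2", "name3", ...
def disambiguate_headers(headers, suffix = '')
  seen = Hash.new(0)
  headers.map do |header|
    seen[header] += 1
    seen[header] == 1 ? header : "#{header}#{suffix}#{seen[header]}"
  end
end

p disambiguate_headers(%w[name name name])
```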
|
|
637
|
+
|
|
638
|
+
#### You are affected if:
|
|
639
|
+
- You depended on SmarterCSV raising or failing fast when a CSV has duplicate headers (e.g. as a data-quality check at the boundary of your pipeline).
|
|
640
|
+
|
|
641
|
+
#### You are NOT affected if:
|
|
642
|
+
- Your CSVs don't have duplicate headers.
|
|
643
|
+
- You already explicitly set `duplicate_header_suffix:` in your code.
|
|
644
|
+
|
|
645
|
+
#### How to migrate
|
|
646
|
+
|
|
647
|
+
If you want the old strict behavior, set the option explicitly to `nil`:
|
|
648
|
+
|
|
649
|
+
```ruby
|
|
650
|
+
SmarterCSV.process('file.csv', duplicate_header_suffix: nil)
|
|
651
|
+
```
|
|
652
|
+
|
|
653
|
+
### Other
|
|
654
|
+
|
|
655
|
+
* Performance and memory improvements
|
|
656
|
+
* Internal code refactor
|
|
510
657
|
|
|
511
658
|
## 1.9.3 (2023-12-16)
|
|
512
659
|
* raise SmarterCSV::IncorrectOption when `user_provided_headers` are empty
|
|
@@ -644,13 +791,40 @@ _worldcities.csv is [from here](https://simplemaps.com/data/world-cities)_
|
|
|
644
791
|
* fixed buggy behavior when using `remove_empty_values: false` (issue #168)
|
|
645
792
|
* fixed Ruby 3.0 deprecation
|
|
646
793
|
|
|
647
|
-
## 1.3.0 (2022-02-06)
|
|
648
|
-
|
|
649
|
-
|
|
794
|
+
## 1.3.0 (2022-02-06)
|
|
795
|
+
|
|
796
|
+
### (Bug Fix) Small change for users of the `key_mapping:` option (issue #181)
|
|
797
|
+
|
|
798
|
+
**In short — if you use `key_mapping:`, this is a one-character fix per mapping. If you don't use `key_mapping:`, you are not affected.**
|
|
799
|
+
|
|
800
|
+
Previously, the values in a `key_mapping:` hash were silently coerced to symbols, so `'new_name'` and `:new_name` produced the same result key. As of 1.3.0, the values are used as-is — strings stay strings, symbols stay symbols. This gives you direct control over whether the result hashes use string or symbol keys.
|
|
801
|
+
|
|
802
|
+
#### You are NOT affected if any of these are true:
|
|
803
|
+
- You don't use `key_mapping:`.
|
|
804
|
+
- Your `key_mapping:` already uses symbol values (e.g. `:new_name`).
|
|
805
|
+
- Your downstream code already reads result hashes with string keys.
|
|
806
|
+
In these cases, you can just upgrade without any code changes.
|
|
807
|
+
|
|
808
|
+
#### You are affected if all three are true:
|
|
809
|
+
- You pass `key_mapping:` to `SmarterCSV.process` (or `process_csv` in older code), **and**
|
|
810
|
+
- The values in that hash are strings (e.g. `'new_name'`, not `:new_name`), **and**
|
|
811
|
+
- Your downstream code reads the result hashes with symbol keys (e.g. `row[:new_name]`).
|
|
812
|
+
If all three apply, you need a small code change.
|
|
813
|
+
|
|
814
|
+
#### How to migrate
|
|
815
|
+
|
|
816
|
+
Pick whichever is the smaller diff in your code:
|
|
817
|
+
|
|
818
|
+
```ruby
|
|
819
|
+
# Option A — keep symbol keys in the result (one extra colon per line):
|
|
820
|
+
SmarterCSV.process('file.csv', key_mapping: { 'Old Header' => :new_name })
|
|
821
|
+
# ^ add the colon
|
|
822
|
+
|
|
823
|
+
# Option B — switch your reads to string keys:
|
|
824
|
+
row['new_name'] # instead of row[:new_name]
|
|
825
|
+
```
|
|
650
826
|
|
|
651
|
-
|
|
652
|
-
* either use symbols in the `key_mapping` hash
|
|
653
|
-
* or change the expected keys from symbols to strings
|
|
827
|
+
That's the whole migration. Everything else in 1.3.0 is source-compatible with 1.2.x.
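
To find out which entries of an existing mapping are affected, an audit along these lines can be run on the hash itself. It is a plain-Ruby sketch; `string_valued_mappings` and the sample mapping are hypothetical.

```ruby
# Hypothetical audit: list the mappings whose target key is a String.
# Since 1.3.0 these produce String keys in the result hashes.
def string_valued_mappings(key_mapping)
  key_mapping.select { |_, target| target.is_a?(String) }
end

key_mapping = { 'Old Header' => 'new_name', 'Other Header' => :other } # sample data

string_valued_mappings(key_mapping).each do |source, target|
  puts "#{source.inspect} maps to the String #{target.inspect}"
end
```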
|
|
654
828
|
|
|
655
829
|
## 1.2.9 (2021-11-22) (PULLED)
|
|
656
830
|
* fix bug for key_mappings (issue #181)
|
|
@@ -677,7 +851,7 @@ _worldcities.csv is [from here](https://simplemaps.com/data/world-cities)_
|
|
|
677
851
|
* bugfix (thanks to Joshua Smith for reporting)
|
|
678
852
|
|
|
679
853
|
## 1.2.0 (2018-01-20)
|
|
680
|
-
* add default validation that a header can only appear once
|
|
854
|
+
* add default validation that a header can only appear once; raises `SmarterCSV::DuplicateHeaders` when a header is duplicated
|
|
681
855
|
* add option `required_headers`
|
|
682
856
|
|
|
683
857
|
## 1.1.5 (2017-11-05)
|
data/README.md
CHANGED
|
@@ -1,7 +1,10 @@
|
|
|
1
1
|
|
|
2
2
|
# SmarterCSV
|
|
3
3
|
|
|
4
|
-
 [](https://codecov.io/gh/tilo/smarter_csv) [](https://rubygems.org/gems/smarter_csv) [](https://rubygems.org/gems/smarter_csv) [](https://www.ruby-toolbox.com/projects/smarter_csv)
|
|
4
|
+
 [](https://codecov.io/gh/tilo/smarter_csv) [](https://rubygems.org/gems/smarter_csv) [](https://rubygems.org/gems/smarter_csv) [](https://www.ruby-toolbox.com/projects/smarter_csv) [](https://tilo.github.io/smarter_csv/upgrade_wizard.html)
|
|
5
|
+
|
|
6
|
+
> [!TIP]
|
|
7
|
+
> **Upgrading from an older version?** Use the [SmarterCSV Upgrade Wizard](https://tilo.github.io/smarter_csv/upgrade_wizard.html) to walk through what (if anything) you need to change for your specific version. Most hops do not require any changes.
|
|
5
8
|
|
|
6
9
|
SmarterCSV is a high-performance CSV ingestion and generation library for Ruby, focused on fast end-to-end CSV ingestion of real-world data: no silent failures, no surprises, not just tokenization.
|
|
7
10
|
|
data/UPGRADING.md
ADDED
|
@@ -0,0 +1,251 @@
|
|
|
1
|
+
# Upgrading SmarterCSV
|
|
2
|
+
|
|
3
|
+
> [!TIP]
|
|
4
|
+
> Prefer the interactive [Upgrade Wizard](https://tilo.github.io/smarter_csv/upgrade_wizard.html) for a guided walk-through with Yes/No questions.
|
|
5
|
+
> This document is auto-generated from `CHANGELOG.md` and `docs/upgrade_path.json` by `bin/gen-upgrading-md`.
|
|
6
|
+
|
|
7
|
+
## How to use this guide
|
|
8
|
+
|
|
9
|
+
1. Find your current version below. **Newest releases appear first; older ones further down.**
|
|
10
|
+
2. Read each series section between yours and the latest at the top. For each one, check whether any **If** conditions apply to your code.
|
|
11
|
+
3. If none apply, you can upgrade all the way through that series with no code changes.
|
|
12
|
+
|
|
13
|
+
Prefer an interactive walk-through? The [Upgrade Wizard](https://tilo.github.io/smarter_csv/upgrade_wizard.html) asks one question at a time and only shows the migration steps that apply to your code.
|
|
14
|
+
|
|
15
|
+
**Latest release:** `1.17.3` (in the `1.17.x` series).
|
|
16
|
+
|
|
17
|
+
---
|
|
18
|
+
|
|
19
|
+
## 1.17.x — latest series
|
|
20
|
+
|
|
21
|
+
**Versions in this series:**
|
|
22
|
+
[1.17.0, 1.17.1, 1.17.2, 1.17.3]
|
|
23
|
+
|
|
24
|
+
**Latest release:** `1.17.3`
|
|
25
|
+
|
|
26
|
+
Update your Gemfile to:
|
|
27
|
+
|
|
28
|
+
```ruby
|
|
29
|
+
gem 'smarter_csv', '~> 1.17.0'
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
Then run `bundle update smarter_csv`.
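
If you are unsure which versions the pessimistic constraint `~> 1.17.0` admits, RubyGems' own `Gem::Requirement` and `Gem::Version` classes can answer that directly. The version numbers checked below are only examples.

```ruby
require 'rubygems'

# '~> 1.17.0' is shorthand for '>= 1.17.0' together with '< 1.18.0'.
requirement = Gem::Requirement.new('~> 1.17.0')

%w[1.16.6 1.17.0 1.17.3 1.18.0].each do |version|
  allowed = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{allowed ? 'allowed' : 'not allowed'}"
end
```

In other words, this Gemfile line picks up patch releases within the 1.17 series but will not move to a later series on its own.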
|
|
33
|
+
|
|
34
|
+
## Series 1.16 → 1.17
|
|
35
|
+
|
|
36
|
+
**Coming from any 1.16 version:**
|
|
37
|
+
[1.16.0, 1.16.1, 1.16.2, 1.16.3, 1.16.4, 1.16.5, 1.16.6]
|
|
38
|
+
|
|
39
|
+
> ⚠️ **In-series notes** worth checking if you're upgrading through one of these:
|
|
40
|
+
> - **1.16.1:** **Fibers:** `SmarterCSV.errors` uses `Thread.current` for storage, which is **shared across all fibers running in the same thread**. If you process CSV files concurrently in fibers (e.g. with `Async`, `Falcon`, or manual `Fiber` scheduling), `SmarterCSV.errors` may return stale or wrong results. **Use `SmarterCSV::Reader` directly** — errors are scoped to the reader instance and are always correct regardless of fiber context.
|
|
41
|
+
> - **1.16.2:** If your code references auto-generated keys for blank headers, update those to use the absolute column position.
|
|
42
|
+
|
|
43
|
+
**Upgrading to 1.17.x** (latest: `1.17.3`): you can upgrade all the way — no code changes needed.
|
|
44
|
+
|
|
45
|
+
---
|
|
46
|
+
|
|
47
|
+
## Series 1.15 → 1.16
|
|
48
|
+
|
|
49
|
+
**Coming from any 1.15 version:**
|
|
50
|
+
[1.15.0, 1.15.1, 1.15.2, 1.15.3]
|
|
51
|
+
|
|
52
|
+
**Upgrading to 1.16.x** (latest: `1.16.6`):
|
|
53
|
+
|
|
54
|
+
- **If** your CSV files contain stray `"` characters in the middle of unquoted fields:
|
|
55
|
+
→ verify the output is now correct — 1.16.0 treats them as literal (RFC 4180). Output gets more correct for almost everyone; the temporary escape hatch `quote_boundary: :legacy` exists if your downstream code depended on the previously-corrupted output (not recommended for new code).
|
|
56
|
+
|
|
57
|
+
---
|
|
58
|
+
|
|
59
|
+
## Series 1.14 → 1.15
|
|
60
|
+
|
|
61
|
+
**Coming from any 1.14 version:**
|
|
62
|
+
[1.14.0, 1.14.1, 1.14.2, 1.14.3, 1.14.4]
|
|
63
|
+
|
|
64
|
+
**Upgrading to 1.15.x** (latest: `1.15.3`):
|
|
65
|
+
|
|
66
|
+
- **If** your Ruby version is 2.5 or older:
|
|
67
|
+
→ upgrade Ruby to 2.6 or newer — 1.15.0 dropped support for Ruby 2.5.
|
|
68
|
+
|
|
69
|
+
The migration is small: Ruby 2.5 reached end-of-life in March 2021 (no more security fixes anywhere), and Ruby 2.5 → 2.6 is API-compatible for nearly all code. Update your `.ruby-version` or the `ruby` line in your `Gemfile`, run `bundle install`, and you're done. Most users jump straight to a current Ruby (3.x).
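
A quick way to check whether a given Ruby meets that minimum is to compare version numbers with `Gem::Version`, which ships with RubyGems. This is a sketch; `ruby_supported?` and `MINIMUM_RUBY` are hypothetical names, and the minimum of 2.6 is taken from the note above.

```ruby
require 'rubygems'

MINIMUM_RUBY = Gem::Version.new('2.6') # per the 1.15.0 note above

# Hypothetical helper: true if the given Ruby version string is 2.6 or newer.
def ruby_supported?(version_string)
  Gem::Version.new(version_string) >= MINIMUM_RUBY
end

puts "Running Ruby #{RUBY_VERSION}: #{ruby_supported?(RUBY_VERSION) ? 'OK' : 'too old for 1.15.x'}"
```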
|
|
70
|
+
|
|
71
|
+
---
|
|
72
|
+
|
|
73
|
+
## Series 1.13 → 1.14
|
|
74
|
+
|
|
75
|
+
**Coming from any 1.13 version:**
|
|
76
|
+
[1.13.0, 1.13.1]
|
|
77
|
+
|
|
78
|
+
**Upgrading to 1.14.x** (latest: `1.14.4`): you can upgrade all the way — no code changes needed.
|
|
79
|
+
|
|
80
|
+
---
|
|
81
|
+
|
|
82
|
+
## Series 1.12 → 1.13
|
|
83
|
+
|
|
84
|
+
**Coming from any 1.12 version:**
|
|
85
|
+
[1.12.0, 1.12.1]
|
|
86
|
+
|
|
87
|
+
**Upgrading to 1.13.x** (latest: `1.13.1`):
|
|
88
|
+
|
|
89
|
+
- **If** your CSV rows can have more columns than the header AND your code expects only header-listed keys:
|
|
90
|
+
→ filter out the new auto-generated `:column_N` keys, or pass `strict: true` to raise on extras — 1.13.0 keeps extra columns instead of dropping them silently.
|
|
91
|
+
|
|
92
|
+
- **If** any of your input files might have unbalanced quotes:
|
|
93
|
+
→ wrap calls in `rescue SmarterCSV::MalformedCSV` — 1.13.0 now raises instead of producing garbled output.
|
|
94
|
+
|
|
95
|
+
- **If** you pass `user_provided_headers:` AND your file has a header line that should be skipped:
|
|
96
|
+
→ also pass `headers_in_file: true` explicitly — 1.13.0 made `user_provided_headers:` imply `headers_in_file: false` by default.
|
|
97
|
+
|
|
98
|
+
---
|
|
99
|
+
|
|
100
|
+
## Series 1.11 → 1.12
|
|
101
|
+
|
|
102
|
+
**Coming from any 1.11 version:**
|
|
103
|
+
[1.11.0, 1.11.2]
|
|
104
|
+
|
|
105
|
+
**Upgrading to 1.12.x** (latest: `1.12.1`):
|
|
106
|
+
|
|
107
|
+
- **If** you call `SmarterCSV.process` and need to inspect headers / warnings / errors after parsing:
|
|
108
|
+
→ switch to using `reader = SmarterCSV::Reader.new(file, options); reader.process`.
|
|
109
|
+
|
|
110
|
+
The 1.11 class-level accessors `SmarterCSV.headers` / `SmarterCSV.raw_header` are gone in 1.12.0. If you used those, see the next item.
|
|
111
|
+
|
|
112
|
+
- **If** you call `SmarterCSV.raw_headers` or `SmarterCSV.headers`:
|
|
113
|
+
→ switch to instantiating `SmarterCSV::Reader` and reading `reader.raw_headers` / `reader.headers` — 1.12.0 moved these off the class-level API.
|
|
114
|
+
|
|
115
|
+
---
|
|
116
|
+
|
|
117
|
+
## Series 1.10 → 1.11
|
|
118
|
+
|
|
119
|
+
**Coming from any 1.10 version:**
|
|
120
|
+
[1.10.0, 1.10.1, 1.10.2, 1.10.3]
|
|
121
|
+
|
|
122
|
+
**Upgrading to 1.11.x** (latest: `1.11.2`): you can upgrade all the way — no code changes needed.
|
|
123
|
+
|
|
124
|
+
---
|
|
125
|
+
|
|
126
|
+
## Series 1.9 → 1.10
|
|
127
|
+
|
|
128
|
+
**Coming from any 1.9 version:**
|
|
129
|
+
[1.9.0, 1.9.2, 1.9.3]
|
|
130
|
+
|
|
131
|
+
**Upgrading to 1.10.x** (latest: `1.10.3`):
|
|
132
|
+
|
|
133
|
+
- **If** you use `user_provided_headers:`:
|
|
134
|
+
→ write the list in the exact final form you want (all symbols *or* all strings) — 1.10.0 stopped applying additional transformations. `strings_as_keys:` is ignored alongside it.
|
|
135
|
+
|
|
136
|
+
- **If** your `user_provided_headers:` list contains duplicate entries:
|
|
137
|
+
→ remove the duplicates — 1.10.0 raises `SmarterCSV::DuplicateHeaders`.
|
|
138
|
+
|
|
139
|
+
- **If** you depended on duplicate-header detection failing fast:
|
|
140
|
+
→ pass `duplicate_header_suffix: nil` explicitly — 1.10.0 changed the default to `''` (it auto-disambiguates duplicates as `name`, `name2`, ...).
|
|
141
|
+
|
|
142
|
+
---
|
|
143
|
+
|
|
144
|
+
## Series 1.8 → 1.9
|
|
145
|
+
|
|
146
|
+
**Coming from any 1.8 version:**
|
|
147
|
+
[1.8.0, 1.8.1, 1.8.2, 1.8.3, 1.8.4, 1.8.5]
|
|
148
|
+
|
|
149
|
+
**Upgrading to 1.9.x** (latest: `1.9.3`):
|
|
150
|
+
|
|
151
|
+
- **If** you rescue `SmarterCSV::MissingHeaders`:
|
|
152
|
+
→ rename it to `SmarterCSV::MissingKeys` — 1.9.0 renamed the error.
|
|
153
|
+
|
|
154
|
+
- **If** you use `key_mapping:` and want to allow some mapped headers to be missing:
|
|
155
|
+
→ pass `silence_missing_keys: true` — 1.9.0 now raises `MissingKeys` for unmapped headers (this makes them optional).
|
|
156
|
+
|
|
157
|
+
---
|
|
158
|
+
|
|
159
|
+
## Series 1.7 → 1.8
|
|
160
|
+
|
|
161
|
+
**Coming from any 1.7 version:**
|
|
162
|
+
[1.7.0.pre1, 1.7.0.pre5, 1.7.1, 1.7.2, 1.7.3, 1.7.4]
|
|
163
|
+
|
|
164
|
+
**Upgrading to 1.8.x** (latest: `1.8.5`):
|
|
165
|
+
|
|
166
|
+
- **If** you accept CSV files from users or other external sources where the column separator might not be a comma (e.g. locale-specific exports using `;` or tab), or where a file might have only one column:
|
|
167
|
+
→ wrap your `SmarterCSV.process` calls in `rescue SmarterCSV::NoColSepDetected` — 1.8.0 made `col_sep: :auto` and `row_sep: :auto` the new defaults, but in rare cases it raises when separators could not be found.
|
|
168
|
+
|
|
169
|
+
---
|
|
170
|
+
|
|
171
|
+
## Series 1.6 → 1.7
|
|
172
|
+
|
|
173
|
+
**Coming from any 1.6 version:**
|
|
174
|
+
[1.6.0, 1.6.1]
|
|
175
|
+
|
|
176
|
+
**Upgrading to 1.7.x** (latest: `1.7.4`): you can upgrade all the way — no code changes needed.
|
|
177
|
+
|
|
178
|
+
---
|
|
179
|
+
|
|
180
|
+
## Series 1.5 → 1.6
|
|
181
|
+
|
|
182
|
+
**Coming from any 1.5 version:**
|
|
183
|
+
[1.5.0, 1.5.1, 1.5.2]
|
|
184
|
+
|
|
185
|
+
**Upgrading to 1.6.x** (latest: `1.6.1`):
|
|
186
|
+
|
|
187
|
+
- **If** you rescue an exception when `key_mapping:` has an unused key:
|
|
188
|
+
→ remove that rescue clause — 1.6.1 changed this from an exception to a warning.
|
|
189
|
+
|
|
190
|
+
---
|
|
191
|
+
|
|
192
|
+
## Series 1.4 → 1.5
|
|
193
|
+
|
|
194
|
+
**Coming from any 1.4 version:**
|
|
195
|
+
[1.4.0, 1.4.2]
|
|
196
|
+
|
|
197
|
+
**Upgrading to 1.5.x** (latest: `1.5.2`):
|
|
198
|
+
|
|
199
|
+
- **If** you relied on lines starting with `#` being treated as comments:
|
|
200
|
+
→ pass `comment_regexp: /\A#/` explicitly — 1.5.0 changed the default to `nil`.
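
To see which lines that pattern would treat as comments, the regexp can be tried on its own in plain Ruby. The sample lines are made up, and this only demonstrates the pattern itself, not how SmarterCSV applies it internally.

```ruby
COMMENT_REGEXP = /\A#/ # matches only when '#' is the very first character

lines = [
  '# exported 2024-01-01',
  'id,name',
  '1,Ann # not a comment'
]

# Split the lines into those the pattern matches and those it does not.
comments, data = lines.partition { |line| line.match?(COMMENT_REGEXP) }

puts "comment lines: #{comments.inspect}"
puts "data lines:    #{data.inspect}"
```

Because the pattern is anchored with `\A`, a `#` later in a line does not turn that line into a comment.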
|
|
201
|
+
|
|
202
|
+
---
|
|
203
|
+
|
|
204
|
+
## Series 1.3 → 1.4
|
|
205
|
+
|
|
206
|
+
**Coming from any 1.3 version:**
|
|
207
|
+
[1.3.0]
|
|
208
|
+
|
|
209
|
+
**Upgrading to 1.4.x** (latest: `1.4.2`): you can upgrade all the way — no code changes needed.
|
|
210
|
+
|
|
211
|
+
---
|
|
212
|
+
|
|
213
|
+
## Series 1.2 → 1.3
|
|
214
|
+
|
|
215
|
+
**Coming from any 1.2 version:**
|
|
216
|
+
[1.2.0, 1.2.3, 1.2.4, 1.2.5, 1.2.6, 1.2.7, 1.2.8]
|
|
217
|
+
|
|
218
|
+
**Upgrading to 1.3.x** (latest: `1.3.0`):
|
|
219
|
+
|
|
220
|
+
- **If** you use `key_mapping:`:
|
|
221
|
+
→ switch hash values to symbols (or update downstream reads to use string keys) — 1.3.0 stopped silently coercing values to symbols.
|
|
222
|
+
|
|
223
|
+
---
|
|
224
|
+
|
|
225
|
+
## Series 1.1 → 1.2
|
|
226
|
+
|
|
227
|
+
**Coming from any 1.1 version:**
|
|
228
|
+
[1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5]
|
|
229
|
+
|
|
230
|
+
**Upgrading to 1.2.x** (latest: `1.2.8`):
|
|
231
|
+
|
|
232
|
+
- **If** your CSV files have duplicate header names:
|
|
233
|
+
→ rename the duplicates, or be ready to rescue `SmarterCSV::DuplicateHeaders` — 1.2.0 added default validation that each header appears only once and raises this exception when it doesn't.
|
|
234
|
+
|
|
235
|
+
---
|
|
236
|
+
|
|
237
|
+
## Series 1.0 → 1.1
|
|
238
|
+
|
|
239
|
+
**Coming from any 1.0 version:**
|
|
240
|
+
[1.0.0.pre1, 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.0.6, 1.0.7, 1.0.8, 1.0.9, 1.0.10, 1.0.11, 1.0.12, 1.0.14, 1.0.15, 1.0.16, 1.0.17, 1.0.18, 1.0.19]
|
|
241
|
+
|
|
242
|
+
**Upgrading to 1.1.x** (latest: `1.1.5`):
|
|
243
|
+
|
|
244
|
+
- **If** you set `headers_in_file: false`:
|
|
245
|
+
→ also provide `user_provided_headers:` — 1.1.0 now raises an error if you set the former without the latter.
|
|
246
|
+
|
|
247
|
+
---
|
|
248
|
+
|
|
249
|
+
---
|
|
250
|
+
|
|
251
|
+
Questions? Open an issue: <https://github.com/tilo/smarter_csv/issues>.
|
data/docs/.nojekyll
ADDED
|
File without changes
|