smarter_csv 1.17.2 → 1.17.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 2e665f0dc98db44950aa9cbb2cac430068e91df8886062068413dfbcefc74fc3
4
- data.tar.gz: def43fb66886b16ec13bd429b4fd6923b09aa1a01757a696390a38c18b59fa31
3
+ metadata.gz: ec50e8539c6872f9c86c25eabc2982e39846ad07dc5a21021fc687c7661f8084
4
+ data.tar.gz: 977ce04d8dd225b6042ea03ad0c174305f3ea122340fad052e2c2ada440d6400
5
5
  SHA512:
6
- metadata.gz: 2fb7793ed4eca64cfef1f7dd82a417b44988832280b373c1748213d9f7c879cd0a2d17c4e3b72c82be6acedb01b0fec26b70e6daaefb645ee2c3bf64b7aedcd8
7
- data.tar.gz: a0b8842d5a69d8526af81d4e2a64c31fd6a54d6d610c5ce0dbb16298dfd03c3d296546c049a4a76a710908390ec1ea1739bc7530b1ab652c7ead1ceaa02b431d
6
+ metadata.gz: 0452dc7f15ab31b0cdfad83ca718e17e6456cf6c9826d177e606c5924f3ec72a155c86ee6f9f938540fe3b2ed8f694a981c95cf775b5a38d7f7e44318bc453a3
7
+ data.tar.gz: c1c9732d6d4393fb2ffa995f0c7bb73cd60f566132d487a86007f8a5a623257365c2324ed894a5b38d681df7cfab67069d9fa2e61fd525ba51675954ddadad7a
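Reviewer note: the digests above cover the two archives packed inside the `.gem` file. The following is a minimal sketch of checking such a digest by hand with Ruby's standard `digest` library. The helper name `sha256_matches?` is hypothetical, and the sample input is a stand-in, not real gem content.

```ruby
require 'digest'

# Hypothetical helper: true when the SHA256 hex digest of `bytes`
# equals `expected_hex` (compared case-insensitively).
def sha256_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# Stand-in data. For a real check, read data.tar.gz in binary mode
# (File.binread) and compare against the SHA256 value in checksums.yaml.
sample = 'abc'
known  = 'ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad'
puts sha256_matches?(sample, known)
```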
data/CHANGELOG.md CHANGED
@@ -1,11 +1,40 @@
1
1
 
2
2
  # SmarterCSV 1.x Change Log
3
3
 
4
+ > [!TIP]
5
+ > **Upgrading?** The [SmarterCSV Upgrade Wizard](https://tilo.github.io/smarter_csv/upgrade_wizard.html) walks you through what (if anything) you need to change for your specific version. Most hops do not require any changes.
6
+
7
+ ## 1.17.3 (2026-05-26)
8
+
9
+ RSpec tests: **2,274 → 2,277** (+3 tests)
10
+
11
+ * No functional changes
12
+ * added 3 test cases
13
+
14
+ ### Improvements
15
+ * DRY-up C-code
16
+ * no performance changes on the C-path
17
+
18
+ ### Performance
19
+ * performance improvement on the Ruby-path
20
+
21
+ | File | RB-path |
22
+ |-----------------------------------|--------------|
23
+ | PEOPLE_IMPORT_B / PEOPLE_IMPORT_C | 13.5% faster |
24
+ | tab_separated_60k | 13.2% faster |
25
+ | sample_100k | 10.3% faster |
26
+ | multi_char_separator | 9.0% faster |
27
+ | utf8_multibyte | 7.1% faster |
28
+ | many_empty_fields | 6.7% faster |
29
+ | PEOPLE_IMPORT_NC | 5.2% faster |
30
+ | sensor_data | 4.5% faster |
31
+
32
+
4
33
  ## 1.17.2 (2026-05-21)
5
34
 
6
35
  RSpec tests: **2,220 → 2,274** (+54 tests)
7
36
 
8
- ### Bug Fix
37
+ ### Bug Fixes
9
38
 
10
39
  - fixed [Issue #334](https://github.com/tilo/smarter_csv/issues/334) with escaped double quote followed by comma. Thanks to [conorg](https://github.com/conorg)
11
40
  - fixed bug when using `headers: { except: }`
@@ -73,6 +102,16 @@ Measured against 1.16.4 (Apple M4, Ruby 3.4.7):
73
102
 
74
103
  Per-file breakdown: [`docs/releases/1.17.0/performance_notes.md`](docs/releases/1.17.0/performance_notes.md).
75
104
 
105
+ ## 1.16.6 (2026-05-21)
106
+
107
+ RSpec tests: **1,467 → 1,591** (+124 tests)
108
+
109
+ ### Bug Fixes
110
+
111
+ - fixed [Issue #334](https://github.com/tilo/smarter_csv/issues/334) with escaped double quote followed by comma. Thanks to [conorg](https://github.com/conorg)
112
+ - fixed bug when using `headers: { except: }`
113
+ - added more tests
114
+
76
115
  ## 1.16.5 (2026-05-17)
77
116
 
78
117
  ### Bug Fix
@@ -164,23 +203,42 @@ RSpec tests: **1,247 → 1,410** (+163 tests)
164
203
 
165
204
  * Added 163 tests covering new features and corner cases
166
205
 
167
- ## 1.16.0 (2026-03-12) — Minor Breaking Change
206
+ ## 1.16.0 (2026-03-12) — improved RFC 4180 quote handling, new APIs, large performance gains
168
207
 
169
208
  [Full details](docs/releases/1.16.0/changes.md) · [Benchmarks](docs/releases/1.16.0/benchmarks.md) · [Performance notes](docs/releases/1.16.0/performance_notes.md)
170
209
 
171
210
  RSpec tests: **714 → 1,247** (+533 tests)
172
211
 
173
- ### Minor Breaking Change
212
+ ### (Bug Fix) `quote_boundary:` — new default for how mid-field quotes are handled
213
+
214
+ **In short — most users will see incorrect output silently improve. If your CSV files don't contain stray `"` characters in the middle of unquoted fields, you are not affected. If they do, the new default produces correct output where the old default produced corrupted output.**
174
215
 
175
- New option **`quote_boundary:`**
176
- * defaults to `:standard`**: quotes are now only recognized as field delimiters at field boundaries;
177
- mid-field quotes are treated as literal characters.
216
+ A new option `quote_boundary:` controls when a `"` character marks the start or end of a quoted field versus when it's a literal character inside the field.
178
217
 
179
- This aligns SmarterCSV with RFC 4180 and other CSV libraries. In practice, mid-field quotes
180
- were already producing silently corrupt output in previous versions so most users will see
181
- correct behavior improve, not regress.
218
+ * `quote_boundary: :standard` (the new default): quotes are only recognized as field delimiters at field boundaries (start of a field, or immediately before `col_sep` / end of line). A `"` that appears in the middle of an unquoted field is treated as a literal character. This matches RFC 4180 and Ruby's standard `CSV` library.
219
+ * `quote_boundary: :legacy` — **not recommended.** Restores the pre-1.16.0 behavior, where any `"` could open a quoted region. This is the behavior that produced silently corrupt output on files with stray mid-field quotes; it exists only as an escape hatch for code that built workarounds on top of the buggy output. New code should never use this.
182
220
 
183
- * Use `quote_boundary: :legacy` only in exceptional cases to restore previous behavior. See [Parsing Strategy](../../parsing_strategy.md).
221
+ In practice, the old `:legacy` behavior was silently producing corrupt output whenever a CSV file contained a stray mid-field `"` — so for most users this change makes output **correct** where it was wrong before, not the other way around.
222
+
223
+ #### You are NOT affected if:
224
+ - Your CSV files don't contain any `"` characters mid-field (the common case).
225
+ - Your CSV files quote fields cleanly per RFC 4180 (well-formed `"..."` around each quoted field, no stray quotes inside other fields).
226
+
227
+ #### You are affected if:
228
+ - Your CSV files contain stray `"` characters in the middle of unquoted fields (e.g. `5'6"`, `Joe "the Hat" Smith` without surrounding quotes), **and** you had downstream code that compensated for the previously-corrupted parse output.
229
+
230
+ #### How to migrate
231
+
232
+ For almost everyone: do nothing. Upgrade and observe that the output is the same or more correct.
233
+
234
+ The `quote_boundary: :legacy` option exists only as a short-term escape hatch — **we do not advise using it**, because it re-enables the buggy parse behavior that motivated this change. If your code built workarounds on top of the previously-corrupted output, the right fix is to remove those workarounds and rely on the new `:standard` behavior, not to opt back into the bug:
235
+
236
+ ```ruby
237
+ # Only as a temporary escape hatch — not recommended for new code:
238
+ SmarterCSV.process('file.csv', quote_boundary: :legacy)
239
+ ```
240
+
241
+ See [Parsing Strategy](docs/parsing_strategy.md) for details on how each mode handles edge cases.
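Reviewer note: the `:standard` rule described in this section can be illustrated with a toy splitter. This is only a sketch of the rule as worded in the changelog, not SmarterCSV's parser. The method name `split_standard` is hypothetical, it assumes a single-character separator, and it does not validate text that follows a closing quote.

```ruby
# Illustrative only: a toy line splitter for the `:standard` rule.
def split_standard(line, col_sep: ',', quote: '"')
  fields = []
  field = +''
  in_quotes = false
  at_field_start = true
  i = 0
  while i < line.length
    ch = line[i]
    if in_quotes
      if ch == quote && line[i + 1] == quote
        field << quote        # doubled quote inside a quoted field is one literal quote
        i += 1
      elsif ch == quote
        in_quotes = false     # closing quote
      else
        field << ch
      end
    elsif ch == quote && at_field_start
      in_quotes = true        # a quote opens a quoted field only at the field start
    elsif ch == col_sep
      fields << field
      field = +''
      at_field_start = true
      i += 1
      next
    else
      field << ch             # a mid-field quote lands here and stays literal
    end
    at_field_start = false
    i += 1
  end
  fields << field
end

p split_standard(%q(1,5'6",Joe "the Hat" Smith))
```

With this rule, the stray quotes in `5'6"` and `Joe "the Hat" Smith` stay inside their fields, while a field such as `"a,b"` is still unwrapped as a quoted field.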
184
242
 
185
243
  ### Performance
186
244
 
@@ -399,44 +457,90 @@ _worldcities.csv is [from here](https://simplemaps.com/data/world-cities)_
399
457
  ## 1.13.1 (2024-12-12)
400
458
  * fix bug with SmarterCSV.generate with `force_quotes: true` ([issue 294](https://github.com/tilo/smarter_csv/issues/294))
401
459
 
402
- ## 1.13.0 (2024-11-06) POTENTIALLY BREAKING
403
-
404
- CHANGED DEFAULT BEHAVIOR
405
- ========================
406
- The changes are to improve robustness and to reduce the risk of data loss
460
+ ## 1.13.0 (2024-11-06) Three default-behavior changes that prevent silent data loss
407
461
 
408
- * implementing auto-detection of extra columns (thanks to James Fenley)
462
+ This release flipped three defaults so that SmarterCSV no longer silently loses data in three specific edge cases. For most users this is a quiet improvement — files that used to lose rows or columns silently now parse correctly with no code changes. Each change below has a short "affected if / not affected if" so you can skip past it quickly.
409
463
 
410
- * improved handling of unbalanced quote_char in input ([issue 288](https://github.com/tilo/smarter_csv/issues/288)) thanks to Simon Rentzke), and ([issue 283](https://github.com/tilo/smarter_csv/issues/283)) thanks to James Fenley, Randall B, Matthew Kennedy)
411
- -> SmarterCSV will now raise `SmarterCSV::MalformedCSV` for unbalanced quote_char.
464
+ The motivation for all three changes is the same: data loss should never be silent. Either parse it correctly, or raise loudly.
412
465
 
413
- * bugfix / improved handling of extra columns in input data ([issue 284](https://github.com/tilo/smarter_csv/issues/284)) (thanks to James Fenley)
414
-
415
- * previous behavior:
416
- when a CSV row had more columns than listed in the header, the additional columns were ignored
466
+ ### Change 1 (Bug Fix): extra columns in a row are auto-named instead of dropped
417
467
 
418
- * new behavior:
419
- * new default behavior is to auto-generate additional headers, e.g. :column_7, :column_8, etc
420
- * you can set option `:strict` to true in order to get a `SmarterCSV::MalformedCSV` exception instead
468
+ (Thanks to James Fenley, [issue #284](https://github.com/tilo/smarter_csv/issues/284).)
421
469
 
422
- * setting `user_provided_headers` now implies `headers_in_file: false` ([issue 282](https://github.com/tilo/smarter_csv/issues/282))
423
-
424
- The option `user_provided_headers` can be used to specify headers when there are none in the input, OR to completely override headers that are in the input (file).
470
+ If a CSV row had more columns than the header (e.g. header has 6 columns, a row has 8), the extras used to be **silently dropped**. As of 1.13.0 they survive as `:column_7`, `:column_8`, etc.
471
+
472
+ #### You are NOT affected if:
473
+ - Your CSV files have exactly as many columns per row as headers (the common case).
474
+
475
+ #### You are affected if:
476
+ - Your CSV files have rows with extra columns past the header **and** your code expects only the header-listed keys.
477
+
478
+ #### How to migrate
479
+
480
+ If you want the old "ignore extras" behavior, drop the extra keys yourself. If you want loud failure instead, use the strict mode:
481
+
482
+ ```ruby
483
+ # Raise SmarterCSV::MalformedCSV on extra columns:
484
+ SmarterCSV.process('file.csv', strict: true)
485
+ ```
486
+
487
+ (In 1.16.0 this option was renamed to `missing_headers: :raise`, but `strict: true` still works.)
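Reviewer note: "drop the extra keys yourself" could look like the sketch below. It assumes each parsed row is a plain Ruby `Hash`. The helper `drop_extra_columns`, the `EXPECTED_KEYS` list, and the sample row are all hypothetical; the sample's `:column_4` and `:column_5` keys simply follow the naming pattern described above.

```ruby
# Hypothetical list of the keys your code actually expects.
EXPECTED_KEYS = %i[id name email].freeze

# Hypothetical helper: keep only the header-listed keys of one row.
def drop_extra_columns(row, expected_keys = EXPECTED_KEYS)
  row.slice(*expected_keys) # Hash#slice is available since Ruby 2.5
end

# Hand-written stand-in for a parsed row with two extra columns.
row = { id: 1, name: 'Ann', email: 'ann@example.com', column_4: 'x', column_5: 'y' }
p drop_extra_columns(row)
```

Filtering by an allow-list of expected keys avoids depending on the exact names of the auto-generated keys.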
488
+
489
+ ### Change 2 (Bug Fix): unbalanced quotes raise `MalformedCSV` instead of producing garbage
490
+
491
+ (Thanks to Simon Rentzke, James Fenley, Randall B, and Matthew Kennedy. Issues [#283](https://github.com/tilo/smarter_csv/issues/283), [#288](https://github.com/tilo/smarter_csv/issues/288).)
492
+
493
+ Files with an unbalanced `quote_char` (an opening `"` with no matching close) used to parse to corrupted output. As of 1.13.0 they raise `SmarterCSV::MalformedCSV`.
494
+
495
+ #### You are NOT affected if:
496
+ - Your CSV files have well-formed quotes (the common case).
497
+
498
+ #### You are affected if:
499
+ - Some of your input files have unbalanced quotes and you used to silently live with the garbled output.
500
+
501
+ #### How to migrate
425
502
 
426
- SmarterCSV is now using a safer default behavior.
503
+ If you need to keep processing other files even when one is malformed, rescue the new exception:
427
504
 
428
- * previous behavior:
429
- Setting `user_provided_headers` did not change the default `headers_in_file: true`
430
- If the input had no headers, this would cause the first line to be erroneously treated as a header, and the user could lose the first row of data.
505
+ ```ruby
506
+ begin
507
+ SmarterCSV.process('file.csv')
508
+ rescue SmarterCSV::MalformedCSV => e
509
+ warn "Skipping malformed file: #{e.message}"
510
+ end
511
+ ```
431
512
 
432
- * new behavior:
433
- Setting `user_provided_headers` sets`headers_in_file: false`
434
- a) Improved behavior if there was no header in the input data.
435
- b) If there was a header in the input data, and `user_provided_headers` is used to override the headers in the file, then please explicitly specify `headers_in_file: true`, otherwise you will get an extra hash which includes the header data.
513
+ ### Change 3 (Bug Fix): `user_provided_headers:` now implies `headers_in_file: false`
436
514
 
437
- IF you set `user_provided_headers` and the file has a header, then provide `headers_in_file: true` to avoid getting that extra record.
515
+ ([Issue #282](https://github.com/tilo/smarter_csv/issues/282).)
438
516
 
439
- * improved documentation for handling of numeric columns with leading zeroes, e.g. ZIP codes. ([issue #151](https://github.com/tilo/smarter_csv/issues/151) thanks to David Moles). `convert_values_to_numeric: { except: [:zip] }` will return a string for that column instead (since version 1.10.x)
517
+ This one fixes a quiet footgun: if you passed `user_provided_headers:` and the file had **no** header row, SmarterCSV used to treat the first data row as a header and silently drop it. As of 1.13.0, setting `user_provided_headers:` automatically sets `headers_in_file: false`, so the first row is treated as data, which is what you almost always wanted.
518
+
519
+ #### You are NOT affected if:
520
+ - You don't use `user_provided_headers:`.
521
+ - You use `user_provided_headers:` with files that have no header line (the common case — that's what the option is for).
522
+
523
+ #### You are affected if:
524
+ - You pass `user_provided_headers:` **and** your CSV file **does** have a header line that needs to be skipped.
525
+
526
+ #### How to migrate
527
+
528
+ If your file has a header line **and** you're overriding it with `user_provided_headers:`, add `headers_in_file: true` explicitly so the existing header line is skipped:
529
+
530
+ ```ruby
531
+ # File has a header row that you want to override:
532
+ SmarterCSV.process(
533
+ 'file.csv',
534
+ user_provided_headers: [:id, :name, :email],
535
+ headers_in_file: true, # skip the header row in the file
536
+ )
537
+ ```
538
+
539
+ Without `headers_in_file: true`, you will get an extra hash at the top of your results containing the file's original header strings as values — that's the symptom to look for.
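Reviewer note: the symptom described above can be checked for mechanically. The sketch below assumes the results are an array of hashes and that you know the header strings in the file. The helper name `leaked_header_row?` and the sample data are hypothetical.

```ruby
# Hypothetical check: does the first result hash just repeat the file's
# own header strings as its values?
def leaked_header_row?(rows, file_header_strings)
  first = rows.first
  return false unless first.is_a?(Hash)

  first.values.map { |v| v.to_s.strip } == file_header_strings.map(&:strip)
end

# Hand-written stand-in for results parsed without `headers_in_file: true`.
rows = [
  { id: 'ID', name: 'Name', email: 'Email' },
  { id: 1, name: 'Ann', email: 'ann@example.com' }
]
puts leaked_header_row?(rows, %w[ID Name Email])
```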
540
+
541
+ ### Documentation
542
+
543
+ * Improved documentation for handling numeric columns with leading zeroes (e.g. ZIP codes). Use `convert_values_to_numeric: { except: [:zip] }` to keep that column as a string. (Available since 1.10.x.) Thanks to David Moles, [issue #151](https://github.com/tilo/smarter_csv/issues/151).
440
544
 
441
545
  ## 1.12.1 (2024-07-10)
442
546
  * Improved column separator detection by ignoring quoted sections [#276](https://github.com/tilo/smarter_csv/pull/276) (thanks to Nicolas Castellanos)
@@ -490,23 +594,66 @@ _worldcities.csv is [from here](https://simplemaps.com/data/world-cities)_
490
594
  ## 1.10.1 (2024-01-07)
491
595
  * fix incorrect warning about UTF-8 (issue #268, thanks hirowatari)
492
596
 
493
- ## 1.10.0 (2023-12-31) BREAKING
597
+ ## 1.10.0 (2023-12-31) Behavior changes for `user_provided_headers:` and duplicate headers
494
598
 
495
- * BREAKING CHANGES:
496
-
497
- Changed behavior:
498
- + when `user_provided_headers` are provided:
499
- * if they are not unique, an exception will now be raised
500
- * they are taken "as is", no header transformations can be applied
501
- * when they are given as strings or as symbols, it is assumed that this is the desired format
502
- * the value of the `strings_as_keys` options will be ignored
503
-
504
- + option `duplicate_header_suffix` now defaults to `''` instead of `nil`.
505
- * this allows automatic disambiguation when processing of CSV files with duplicate headers, by appending a number
506
- * explicitly set this option to `nil` to get the behavior from previous versions.
507
-
508
- * performance and memory improvements
509
- * code refactor
599
+ Two small behavior changes plus performance and memory improvements. Most users are not affected. Read on for who needs to look closer.
600
+
601
+ ### Change 1 (Improvement): `user_provided_headers:` is now taken literally (no transformations, no duplicates)
602
+
603
+ **In short — if you use `user_provided_headers:`, write the list in the exact form you want the result keys (all symbols *or* all strings), and make sure there are no duplicates. For most users this is already what you were doing.**
604
+
605
+ Before 1.10.0, any list you passed as `user_provided_headers:` was run through the same header pipeline as in-file headers — `strings_as_keys` could flip strings to symbols, etc. Duplicates were silently accepted. As of 1.10.0, the list is used **literally**: no transformations are applied, and duplicates raise `SmarterCSV::DuplicateHeaders`.
606
+
607
+ This is almost always what people actually wanted: if you're explicitly listing the headers, you want *those* headers, not a transformed version of them.
608
+
609
+ #### You are NOT affected if:
610
+ - You don't use `user_provided_headers:`.
611
+ - Your `user_provided_headers:` list is already in the form you want (all symbols *or* all strings, no duplicates).
612
+ In these cases, you can just upgrade without any code changes.
613
+
614
+ #### You are affected if either is true:
615
+ - You pass `user_provided_headers:` **and** relied on `strings_as_keys:` to flip between string/symbol keys.
616
+ - You pass `user_provided_headers:` **and** had accidental duplicates in the list that the library used to silently accept (this case would be very odd).
617
+
618
+ #### How to migrate
619
+
620
+ ```ruby
621
+ # If you want symbol keys, write symbols directly:
622
+ SmarterCSV.process('file.csv', user_provided_headers: [:id, :name, :email])
623
+
624
+ # If you want string keys, write strings directly:
625
+ SmarterCSV.process('file.csv', user_provided_headers: ['id', 'name', 'email'])
626
+ ```
627
+
628
+ Drop any `strings_as_keys:` option you used alongside `user_provided_headers:` — it's ignored in that case now.
629
+
630
+ If you see `SmarterCSV::DuplicateHeaders` after upgrading, your list has a repeat in it — fix the duplicate and you're done.
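Reviewer note: a list can be checked before it is passed as `user_provided_headers:`. The sketch below is a hypothetical pre-flight helper (`check_user_headers` is not part of SmarterCSV) that reports duplicates and mixed symbol/string entries, the two conditions this section tells you to avoid.

```ruby
# Hypothetical pre-flight check for a `user_provided_headers:` list.
# Returns an array of problem descriptions; empty means the list looks fine.
def check_user_headers(headers)
  problems = []

  duplicates = headers.select { |h| headers.count(h) > 1 }.uniq
  problems << "duplicates: #{duplicates.inspect}" unless duplicates.empty?

  kinds = headers.map(&:class).uniq
  problems << "mixed key types: #{kinds.inspect}" if kinds.size > 1

  problems
end

p check_user_headers([:id, :name, :id])
```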
631
+
632
+ ### Change 2 (Improvement): duplicate headers in the CSV file are now auto-disambiguated
633
+
634
+ **In short — if your input CSV has duplicate column headers, they now Just Work instead of colliding. If your files don't have duplicate headers, you are not affected.**
635
+
636
+ `duplicate_header_suffix:` used to default to `nil`. Now it defaults to `''` (empty string), which means a file with headers like `name,name,name` becomes keys `name`, `name2`, `name3` automatically — no more silently overwriting earlier columns.
637
+
638
+ #### You are affected if:
639
+ - You depended on SmarterCSV raising or failing fast when a CSV has duplicate headers (e.g. as a data-quality check at the boundary of your pipeline).
640
+
641
+ #### You are NOT affected if:
642
+ - Your CSVs don't have duplicate headers.
643
+ - You already explicitly set `duplicate_header_suffix:` in your code.
644
+
645
+ #### How to migrate
646
+
647
+ If you want the old strict behavior, set the option explicitly to `nil`:
648
+
649
+ ```ruby
650
+ SmarterCSV.process('file.csv', duplicate_header_suffix: nil)
651
+ ```
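Reviewer note: the naming scheme described above (`name`, `name2`, `name3`) can be sketched in a few lines. This illustrates the scheme only and is not the library's implementation. The method name `disambiguate` is hypothetical, and placing a non-empty suffix between the name and the counter is an assumption.

```ruby
# Illustrative only: append a running counter to repeated header names.
# Assumption: a non-empty suffix goes between the name and the counter.
def disambiguate(headers, suffix = '')
  seen = Hash.new(0)
  headers.map do |header|
    seen[header] += 1
    seen[header] == 1 ? header : "#{header}#{suffix}#{seen[header]}"
  end
end

p disambiguate(%w[name name name]) # => ["name", "name2", "name3"]
```

This sketch does not handle a file that already contains a header such as `name2`.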
652
+
653
+ ### Other
654
+
655
+ * Performance and memory improvements
656
+ * Internal code refactor
510
657
 
511
658
  ## 1.9.3 (2023-12-16)
512
659
  * raise SmarterCSV::IncorrectOption when `user_provided_headers` are empty
@@ -644,13 +791,40 @@ _worldcities.csv is [from here](https://simplemaps.com/data/world-cities)_
644
791
  * fixed buggy behavior when using `remove_empty_values: false` (issue #168)
645
792
  * fixed Ruby 3.0 deprecation
646
793
 
647
- ## 1.3.0 (2022-02-06) Breaking code change if you used `--key_mappings`
648
- * fix bug for key_mappings (issue #181)
649
- The values of the `key_mappings` hash will now be used "as is", and no longer forced to be symbols
794
+ ## 1.3.0 (2022-02-06)
795
+
796
+ ### (Bug Fix) Small change for users of the `key_mapping:` option (issue #181)
797
+
798
+ **In short — if you use `key_mapping:`, this is a one-character fix per mapping. If you don't use `key_mapping:`, you are not affected.**
799
+
800
+ Previously, the values in a `key_mapping:` hash were silently coerced to symbols, so `'new_name'` and `:new_name` produced the same result key. As of 1.3.0, the values are used as-is — strings stay strings, symbols stay symbols. This gives you direct control over whether the result hashes use string or symbol keys.
801
+
802
+ #### You are NOT affected if any of these are true:
803
+ - You don't use `key_mapping:`.
804
+ - Your `key_mapping:` already uses symbol values (e.g. `:new_name`).
805
+ - Your downstream code already reads result hashes with string keys.
806
+ In these cases, you can just upgrade without any code changes.
807
+
808
+ #### You are affected if all three are true:
809
+ - You pass `key_mapping:` to `SmarterCSV.process` (or `process_csv` in older code), **and**
810
+ - The values in that hash are strings (e.g. `'new_name'`, not `:new_name`), **and**
811
+ - Your downstream code reads the result hashes with symbol keys (e.g. `row[:new_name]`).
812
+ This needs a small code change.
813
+
814
+ #### How to migrate
815
+
816
+ Pick whichever is the smaller diff in your code:
817
+
818
+ ```ruby
819
+ # Option A — keep symbol keys in the result (one extra colon per line):
820
+ SmarterCSV.process('file.csv', key_mapping: { 'Old Header' => :new_name })
821
+ # ^ add the colon
822
+
823
+ # Option B — switch your reads to string keys:
824
+ row['new_name'] # instead of row[:new_name]
825
+ ```
650
826
 
651
- **Users with existing code with `--key_mappings` need to change their code** to
652
- * either use symbols in the `key_mapping` hash
653
- * or change the expected keys from symbols to strings
827
+ That's the whole migration. Everything else in 1.3.0 is source-compatible with 1.2.x.
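Reviewer note: the reason this matters is that Ruby treats `'new_name'` and `:new_name` as different hash keys. The sketch below shows the difference on a hand-written hash, plus a possible stopgap that converts keys after parsing. The helper name `symbolize_keys` is hypothetical here, and this stopgap is not one of the two options the changelog recommends.

```ruby
# Hypothetical helper: return a copy of the row with symbol keys.
def symbolize_keys(row)
  row.transform_keys(&:to_sym) # Hash#transform_keys is available since Ruby 2.5
end

# Hand-written stand-in for one result hash produced with string mapping values.
row = { 'new_name' => 'Ann' }

p row[:new_name]                  # nil, because the key is a String
p row['new_name']                 # "Ann"
p symbolize_keys(row)[:new_name]  # "Ann"
```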
654
828
 
655
829
  ## 1.2.9 (2021-11-22) (PULLED)
656
830
  * fix bug for key_mappings (issue #181)
@@ -677,7 +851,7 @@ _worldcities.csv is [from here](https://simplemaps.com/data/world-cities)_
677
851
  * bugfix (thanks to Joshua Smith for reporting)
678
852
 
679
853
  ## 1.2.0 (2018-01-20)
680
- * add default validation that a header can only appear once
854
+ * add default validation that a header can only appear once; raises `SmarterCSV::DuplicateHeaders` when a header appears more than once
681
855
  * add option `required_headers`
682
856
 
683
857
  ## 1.1.5 (2017-11-05)
data/README.md CHANGED
@@ -1,7 +1,10 @@
1
1
 
2
2
  # SmarterCSV
3
3
 
4
- ![Gem Version](https://img.shields.io/gem/v/smarter_csv) [![codecov](https://codecov.io/gh/tilo/smarter_csv/branch/main/graph/badge.svg?token=1L7OD80182)](https://codecov.io/gh/tilo/smarter_csv) [![Downloads](https://img.shields.io/gem/dt/smarter_csv)](https://rubygems.org/gems/smarter_csv) [![RubyGems](https://img.shields.io/badge/RubyGems-smarter__csv-brightgreen?logo=rubygems&logoColor=white)](https://rubygems.org/gems/smarter_csv) [![Ruby Toolbox](https://img.shields.io/badge/Ruby%20Toolbox-smarter__csv-brightgreen)](https://www.ruby-toolbox.com/projects/smarter_csv)
4
+ ![Gem Version](https://img.shields.io/gem/v/smarter_csv) [![codecov](https://codecov.io/gh/tilo/smarter_csv/branch/main/graph/badge.svg?token=1L7OD80182)](https://codecov.io/gh/tilo/smarter_csv) [![Downloads](https://img.shields.io/gem/dt/smarter_csv)](https://rubygems.org/gems/smarter_csv) [![RubyGems](https://img.shields.io/badge/RubyGems-smarter__csv-brightgreen?logo=rubygems&logoColor=white)](https://rubygems.org/gems/smarter_csv) [![Ruby Toolbox](https://img.shields.io/badge/Ruby%20Toolbox-smarter__csv-brightgreen)](https://www.ruby-toolbox.com/projects/smarter_csv) [![Upgrade Wizard](https://img.shields.io/badge/Upgrade%20Wizard-Try%20it-2c7a2c?style=flat)](https://tilo.github.io/smarter_csv/upgrade_wizard.html)
5
+
6
+ > [!TIP]
7
+ > **Upgrading from an older version?** Use the [SmarterCSV Upgrade Wizard](https://tilo.github.io/smarter_csv/upgrade_wizard.html) to walk through what (if anything) you need to change for your specific version. Most hops do not require any changes.
5
8
 
6
9
  SmarterCSV is a high-performance CSV ingestion and generation library for Ruby, focused on fast end-to-end CSV ingestion of real-world data: no silent failures, no surprises, not just tokenization.
7
10
 
data/UPGRADING.md ADDED
@@ -0,0 +1,251 @@
1
+ # Upgrading SmarterCSV
2
+
3
+ > [!TIP]
4
+ > Prefer the interactive [Upgrade Wizard](https://tilo.github.io/smarter_csv/upgrade_wizard.html) for a guided walk-through with Yes/No questions.
5
+ > This document is auto-generated from `CHANGELOG.md` and `docs/upgrade_path.json` by `bin/gen-upgrading-md`.
6
+
7
+ ## How to use this guide
8
+
9
+ 1. Find your current version below. **Newest releases appear first; older ones further down.**
10
+ 2. Read each series section between yours and the latest at the top. For each one, check whether any **If** conditions apply to your code.
11
+ 3. If none apply, you can upgrade all the way through that series with no code changes.
12
+
13
+ Prefer an interactive walk-through? The [Upgrade Wizard](https://tilo.github.io/smarter_csv/upgrade_wizard.html) asks one question at a time and only shows the migration steps that apply to your code.
14
+
15
+ **Latest release:** `1.17.3` (in the `1.17.x` series).
16
+
17
+ ---
18
+
19
+ ## 1.17.x — latest series
20
+
21
+ **Versions in this series:**
22
+ [1.17.0, 1.17.1, 1.17.2, 1.17.3]
23
+
24
+ **Latest release:** `1.17.3`
25
+
26
+ Update your Gemfile to:
27
+
28
+ ```ruby
29
+ gem 'smarter_csv', '~> 1.17.0'
30
+ ```
31
+
32
+ Then run `bundle update smarter_csv`.
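Reviewer note: the `~> 1.17.0` constraint accepts any `1.17.x` release but not `1.18.0`. This can be confirmed with RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The helper name `allowed_by_constraint?` is hypothetical.

```ruby
require 'rubygems'

# Hypothetical helper: does `version` satisfy the Gemfile constraint?
def allowed_by_constraint?(version, constraint = '~> 1.17.0')
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

puts allowed_by_constraint?('1.17.3') # true
puts allowed_by_constraint?('1.18.0') # false
```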
33
+
34
+ ## Series 1.16 → 1.17
35
+
36
+ **Coming from any 1.16 version:**
37
+ [1.16.0, 1.16.1, 1.16.2, 1.16.3, 1.16.4, 1.16.5, 1.16.6]
38
+
39
+ > ⚠️ **In-series notes** worth checking if you're upgrading through one of these:
40
+ > - **1.16.1:** **Fibers:** `SmarterCSV.errors` uses `Thread.current` for storage, which is **shared across all fibers running in the same thread**. If you process CSV files concurrently in fibers (e.g. with `Async`, `Falcon`, or manual `Fiber` scheduling), `SmarterCSV.errors` may return stale or wrong results. **Use `SmarterCSV::Reader` directly** — errors are scoped to the reader instance and are always correct regardless of fiber context.
41
+ > - **1.16.2:** If your code references auto-generated keys for blank headers, update those to use the absolute column position.
42
+
43
+ **Upgrading to 1.17.x** (latest: `1.17.3`): you can upgrade all the way — no code changes needed.
44
+
45
+ ---
46
+
47
+ ## Series 1.15 → 1.16
48
+
49
+ **Coming from any 1.15 version:**
50
+ [1.15.0, 1.15.1, 1.15.2, 1.15.3]
51
+
52
+ **Upgrading to 1.16.x** (latest: `1.16.6`):
53
+
54
+ - **If** your CSV files contain stray `"` characters in the middle of unquoted fields:
55
+ → verify the output is now correct — 1.16.0 treats them as literal (RFC 4180). Output gets more correct for almost everyone; the temporary escape hatch `quote_boundary: :legacy` exists if your downstream code depended on the previously-corrupted output (not recommended for new code).
56
+
57
+ ---
58
+
59
+ ## Series 1.14 → 1.15
60
+
61
+ **Coming from any 1.14 version:**
62
+ [1.14.0, 1.14.1, 1.14.2, 1.14.3, 1.14.4]
63
+
64
+ **Upgrading to 1.15.x** (latest: `1.15.3`):
65
+
66
+ - **If** your Ruby version is 2.5 or older:
67
+ → upgrade Ruby to 2.6 or newer — 1.15.0 dropped support for Ruby 2.5.
68
+
69
+ The migration is small: Ruby 2.5 reached end-of-life in March 2021 (no more security fixes anywhere), and Ruby 2.5 → 2.6 is API-compatible for nearly all code. Update your `.ruby-version` or the `ruby` line in your `Gemfile`, run `bundle install`, and you're done. Most users jump straight to a current Ruby (3.x).
70
+
71
+ ---
72
+
73
+ ## Series 1.13 → 1.14
74
+
75
+ **Coming from any 1.13 version:**
76
+ [1.13.0, 1.13.1]
77
+
78
+ **Upgrading to 1.14.x** (latest: `1.14.4`): you can upgrade all the way — no code changes needed.
79
+
80
+ ---
81
+
82
+ ## Series 1.12 → 1.13
83
+
84
+ **Coming from any 1.12 version:**
85
+ [1.12.0, 1.12.1]
86
+
87
+ **Upgrading to 1.13.x** (latest: `1.13.1`):
88
+
89
+ - **If** your CSV rows can have more columns than the header AND your code expects only header-listed keys:
90
+ → filter out the new auto-generated `:column_N` keys, or pass `strict: true` to raise on extras — 1.13.0 keeps extra columns instead of dropping them silently.
91
+
92
+ - **If** any of your input files might have unbalanced quotes:
93
+ → wrap calls in `rescue SmarterCSV::MalformedCSV` — 1.13.0 now raises instead of producing garbled output.
94
+
95
+ - **If** you pass `user_provided_headers:` AND your file has a header line that should be skipped:
96
+ → also pass `headers_in_file: true` explicitly — 1.13.0 made `user_provided_headers:` imply `headers_in_file: false` by default.
97
+
98
+ ---
99
+
100
+ ## Series 1.11 → 1.12
101
+
102
+ **Coming from any 1.11 version:**
103
+ [1.11.0, 1.11.2]
104
+
105
+ **Upgrading to 1.12.x** (latest: `1.12.1`):
106
+
107
+ - **If** you call `SmarterCSV.process` and need to inspect headers / warnings / errors after parsing:
108
+ → switch to using `reader = SmarterCSV::Reader.new(file, options); reader.process`.
109
+
110
+ Version 1.11 class-level accessors `SmarterCSV.headers` / `SmarterCSV.raw_header` are gone in 1.12.0 — if you used those, see the next question.
111
+
112
+ - **If** you call `SmarterCSV.raw_headers` or `SmarterCSV.headers`:
113
+ → switch to instantiating `SmarterCSV::Reader` and reading `reader.raw_headers` / `reader.headers` — 1.12.0 moved these off the class-level API.
114
+
115
+ ---
116
+
117
+ ## Series 1.10 → 1.11
118
+
119
+ **Coming from any 1.10 version:**
120
+ [1.10.0, 1.10.1, 1.10.2, 1.10.3]
121
+
122
+ **Upgrading to 1.11.x** (latest: `1.11.2`): you can upgrade all the way — no code changes needed.
123
+
124
+ ---
125
+
126
+ ## Series 1.9 → 1.10
127
+
128
+ **Coming from any 1.9 version:**
129
+ [1.9.0, 1.9.2, 1.9.3]
130
+
131
+ **Upgrading to 1.10.x** (latest: `1.10.3`):
132
+
133
+ - **If** you use `user_provided_headers:`:
134
+ → write the list in the exact final form you want (all symbols *or* all strings) — 1.10.0 stopped applying additional transformations. `strings_as_keys:` is ignored alongside it.
135
+
136
+ - **If** your `user_provided_headers:` list contains duplicate entries:
137
+ → remove the duplicates — 1.10.0 raises `SmarterCSV::DuplicateHeaders`.
138
+
139
+ - **If** you depended on duplicate-header detection failing fast:
140
+ → pass `duplicate_header_suffix: nil` explicitly — 1.10.0 changed the default to `''` (it auto-disambiguates duplicates as `name`, `name2`, ...).
141
+
142
+ ---
143
+
144
+ ## Series 1.8 → 1.9
145
+
146
+ **Coming from any 1.8 version:**
147
+ [1.8.0, 1.8.1, 1.8.2, 1.8.3, 1.8.4, 1.8.5]
148
+
149
+ **Upgrading to 1.9.x** (latest: `1.9.3`):
150
+
151
+ - **If** you rescue `SmarterCSV::MissingHeaders`:
152
+ → rename it to `SmarterCSV::MissingKeys` — 1.9.0 renamed the error.
153
+
154
+ - **If** you use `key_mapping:` and want to allow some mapped headers to be missing:
155
+ → pass `silence_missing_keys: true` — 1.9.0 now raises `MissingKeys` for unmapped headers (this makes them optional).
156
+
157
+ ---
158
+
159
+ ## Series 1.7 → 1.8
160
+
161
+ **Coming from any 1.7 version:**
162
+ [1.7.0.pre1, 1.7.0.pre5, 1.7.1, 1.7.2, 1.7.3, 1.7.4]
163
+
164
+ **Upgrading to 1.8.x** (latest: `1.8.5`):
165
+
166
+ - **If** you accept CSV files from users or other external sources where the column separator might not be a comma (e.g. locale-specific exports using `;` or tab), or where a file might have only one column:
167
+ → wrap your `SmarterCSV.process` calls in `rescue SmarterCSV::NoColSepDetected` — 1.8.0 made `col_sep: :auto` and `row_sep: :auto` the new defaults, but in rare cases it raises when separators could not be found.
168
+
169
+ ---
170
+
171
+ ## Series 1.6 → 1.7
172
+
173
+ **Coming from any 1.6 version:**
174
+ [1.6.0, 1.6.1]
175
+
176
+ **Upgrading to 1.7.x** (latest: `1.7.4`): you can upgrade all the way — no code changes needed.
177
+
178
+ ---
179
+
180
+ ## Series 1.5 → 1.6
181
+
182
+ **Coming from any 1.5 version:**
183
+ [1.5.0, 1.5.1, 1.5.2]
184
+
185
+ **Upgrading to 1.6.x** (latest: `1.6.1`):
186
+
187
+ - **If** you rescue an exception when `key_mapping:` has an unused key:
188
+ → remove that rescue clause — 1.6.1 changed this from an exception to a warning.
189
+
190
+ ---
191
+
192
+ ## Series 1.4 → 1.5
193
+
194
+ **Coming from any 1.4 version:**
195
+ [1.4.0, 1.4.2]
196
+
197
+ **Upgrading to 1.5.x** (latest: `1.5.2`):
198
+
199
+ - **If** you relied on lines starting with `#` being treated as comments:
200
+ → pass `comment_regexp: /\A#/` explicitly — 1.5.0 changed the default to `nil`.
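Reviewer note: the regular expression `/\A#/` matches only strings whose first character is `#`. The sketch below demonstrates that on plain strings. It does not show how SmarterCSV applies the option internally, and the helper name `strip_comment_lines` is hypothetical.

```ruby
COMMENT_REGEXP = /\A#/

# Hypothetical helper: drop every line that the regexp marks as a comment.
def strip_comment_lines(lines, comment_regexp = COMMENT_REGEXP)
  lines.reject { |line| line.match?(comment_regexp) }
end

lines = ['# exported 2024-01-01', 'id,name', '1,Ann #1', ' # indented']
p strip_comment_lines(lines)
```

A `#` later in a line, or after leading whitespace, is not matched by this pattern, so those lines are kept.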
201
+
202
+ ---
203
+
204
+ ## Series 1.3 → 1.4
205
+
206
+ **Coming from any 1.3 version:**
207
+ [1.3.0]
208
+
209
+ **Upgrading to 1.4.x** (latest: `1.4.2`): you can upgrade all the way — no code changes needed.
210
+
211
+ ---
212
+
213
+ ## Series 1.2 → 1.3
214
+
215
+ **Coming from any 1.2 version:**
216
+ [1.2.0, 1.2.3, 1.2.4, 1.2.5, 1.2.6, 1.2.7, 1.2.8]
217
+
218
+ **Upgrading to 1.3.x** (latest: `1.3.0`):
219
+
220
+ - **If** you use `key_mapping:`:
221
+ → switch hash values to symbols (or update downstream reads to use string keys) — 1.3.0 stopped silently coercing values to symbols.
222
+
223
+ ---
224
+
225
+ ## Series 1.1 → 1.2
226
+
227
+ **Coming from any 1.1 version:**
228
+ [1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5]
229
+
230
+ **Upgrading to 1.2.x** (latest: `1.2.8`):
231
+
232
+ - **If** your CSV files have duplicate header names:
233
+ → rename the duplicates, or be ready to rescue `SmarterCSV::DuplicateHeaders` — 1.2.0 added default validation that each header appears only once and raises this exception when it doesn't.
234
+
235
+ ---
236
+
237
+ ## Series 1.0 → 1.1
238
+
239
+ **Coming from any 1.0 version:**
240
+ [1.0.0.pre1, 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.0.6, 1.0.7, 1.0.8, 1.0.9, 1.0.10, 1.0.11, 1.0.12, 1.0.14, 1.0.15, 1.0.16, 1.0.17, 1.0.18, 1.0.19]
241
+
242
+ **Upgrading to 1.1.x** (latest: `1.1.5`):
243
+
244
+ - **If** you set `headers_in_file: false`:
245
+ → also provide `user_provided_headers:` — 1.1.0 now raises an error if you set the former without the latter.
246
+
247
+ ---
248
+
249
+ ---
250
+
251
+ Questions? Open an issue: <https://github.com/tilo/smarter_csv/issues>.
data/docs/.nojekyll ADDED
File without changes