smarter_csv 1.16.1 → 1.16.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 043745aedb1c63fd4a044b9ae46bb8e5d98324c14e609214ee3d895acfd5f501
4
- data.tar.gz: c39a10521b767daf51887278c9020c9ff6d8d93c32c5ec3f95a17ec575ebdab5
3
+ metadata.gz: 6d4cf6c7123e6048cb2c8de80ad92625a9985954e4308084f0a5b86cae4df03c
4
+ data.tar.gz: 5bd6237f017a8d4c54e4ee9ce6f9c3863d65d744c49ed1d6409c78c07f84ec88
5
5
  SHA512:
6
- metadata.gz: 5f1d125138443f02e0276e964dac9e584b996de6acafe8b3856316852a38220094e69a2c14302f922bcc93b6d23cc594bbf926940ccd70a2bd65ab08c5a18b49
7
- data.tar.gz: '0929051996781c8643c0239556d123c840e7041d9c12f7d867e3800dfb2c2eb92e6f6fb77b5fc08660a36f34f470cb826f255943fa445d3e29d231e648da51b4'
6
+ metadata.gz: 25259d4b0b4edfe05c8e5d83e5ba691f6c82effaacc3a2cb9b78490f89b1e7a132ae70c8d1ea32731d1c3cce1f62e1dd98d05ae23576cc9872bd1ff4ac635ea3
7
+ data.tar.gz: 045fae96155913ff53c7661c7b4fe59946a792b72ff833eb723b49138406f30b3d6763e196c2cb6d5286ce014dc3d07f7bda94434a58857701b97d24a407fb4f
data/CHANGELOG.md CHANGED
@@ -1,6 +1,22 @@
1
1
 
2
2
  # SmarterCSV 1.x Change Log
3
3
 
4
+ ## 1.16.2 (2026-03-30) — Bug Fixes
5
+
6
+ RSpec tests: **1,410 → 1,425** (+15 tests)
7
+
8
+ ### Bug Fixes
9
+
10
+ * Fixed `value_converters` to accept lambdas and Procs in addition to class-based converters.
11
+ Thanks to [Jonas Staškevičius](https://github.com/pirminis) for issue [#329](https://github.com/tilo/smarter_csv/issues/329).
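A sketch of the two converter shapes this fix supports. The converter objects below are plain Ruby and runnable; the `value_converters:` wiring shown in the comment is illustrative only (header keys and option usage as documented in [Value Converters](docs/value_converters.md)):

```ruby
require 'date'

# Illustrative wiring (not executed here):
#   SmarterCSV.process(file, value_converters: { price: cents_to_dollars,
#                                                date:  DateConverter })

# Lambda/Proc style — newly supported in 1.16.2:
cents_to_dollars = ->(value) { (value.to_f / 100).round(2) }

# Class style — must implement self.convert(value):
class DateConverter
  def self.convert(value)
    Date.strptime(value, '%m/%d/%Y')
  end
end

cents_to_dollars.call("1999")        # => 19.99
DateConverter.convert("03/30/2026")  # => Date.new(2026, 3, 30)
```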
12
+
13
+ * Fixed blank header auto-naming to use **absolute column position**, consistent with extra data column naming.
14
+ `name,,` now produces `column_2`/`column_3` instead of `column_1`/`column_2`.
15
+ ⚠️ If your code references auto-generated keys for blank headers, update those to use the absolute column position.
16
+
17
+ * Fixed `Writer`: when both `map_headers:` and `header_converter:` were used together, `map_headers` was silently ignored.
18
+ `map_headers` is now applied first, then `header_converter` on top.
19
+
4
20
  ## 1.16.1 (2026-03-16) — Bug Fixes & New Features
5
21
 
6
22
  RSpec tests: **1,247 → 1,410** (+163 tests)
@@ -101,7 +117,6 @@ Measured on 19 benchmark files, Apple M1, Ruby 3.4.7. See [benchmarks](docs/rele
101
117
  * `remove_values_matching:` → use `nil_values_matching:`
102
118
  * `strict:` → use `missing_headers: :raise/:auto`
103
119
  * `verbose: true/false` → use `verbose: :debug/:normal`
104
- * `only_headers:` / `except_headers:` → use `headers: { only: }` / `headers: { except: }`
105
120
 
106
121
  ### Bug Fixes
107
122
 
data/CONTRIBUTORS.md CHANGED
@@ -1,4 +1,4 @@
1
- # A Big Thank You to all 61 Contributors!!
1
+ # A Big Thank You to all 63 Contributors!!
2
2
 
3
3
 
4
4
  A Big Thank you to everyone who filed issues, sent comments, and who contributed with pull requests:
@@ -65,3 +65,4 @@ A Big Thank you to everyone who filed issues, sent comments, and who contributed
65
65
  * [Tophe](https://github.com/tophe)
66
66
  * [Dom Lebron](https://github.com/biglebronski)
67
67
  * [Paho Lurie-Gregg](https://github.com/paholg)
68
+ * [Jonas Staškevičius](https://github.com/pirminis)
data/README.md CHANGED
@@ -249,7 +249,7 @@ For reporting issues, please:
249
249
  * open a pull-request adding a test that demonstrates the issue
250
250
  * mention your version of SmarterCSV, Ruby, Rails
251
251
 
252
- # [A Special Thanks to all 62 Contributors!](CONTRIBUTORS.md) 🎉🎉🎉
252
+ # [A Special Thanks to all 63 Contributors!](CONTRIBUTORS.md) 🎉🎉🎉
253
253
 
254
254
 
255
255
  ## Contributing
data/docs/options.md CHANGED
@@ -119,7 +119,7 @@ See [Parsing Strategy](./parsing_strategy.md) for full details on quote handling
119
119
  |--------|---------|-------------|
120
120
  | `:strip_whitespace` | `true` | Remove whitespace before/after values and headers. |
121
121
  | `:convert_values_to_numeric` | `true` | Convert strings containing integers or floats to the appropriate numeric type. Accepts `{except: [:key1, :key2]}` or `{only: :key3}` to limit which columns. |
122
- | `:value_converters` | `nil` | Hash of `:header => ClassName`; each class must implement `self.convert(value)`. See [Value Converters](./value_converters.md). |
122
+ | `:value_converters` | `nil` | Hash of `:header => converter`; converter can be a lambda/Proc or a class implementing `self.convert(value)`. See [Value Converters](./value_converters.md). |
123
123
  | `:remove_empty_values` | `true` | Remove key/value pairs where the value is `nil` or an empty string. |
124
124
  | `:remove_zero_values` | `false` | Remove key/value pairs where the numeric value equals zero. |
125
125
  | `:nil_values_matching` | `nil` | Set matching values to `nil`. Accepts a regular expression matched against the string representation of each value (e.g. `/\ANAN\z/` for NaN, `/\A#VALUE!\z/` for Excel errors). With `remove_empty_values: true` (default), nil-ified values are then removed. With `remove_empty_values: false`, the key is retained with a `nil` value. |
@@ -195,8 +195,6 @@ See [performance_notes.md](performance_notes.md) and [benchmarks.md](benchmarks.
195
195
 
196
196
  **Deprecations:**
197
197
 
198
- - `only_headers:` → use `headers: { only: }`
199
- - `except_headers:` → use `headers: { except: }`
200
198
  - `remove_values_matching:` → use `nil_values_matching:`
201
199
  - `strict: true` → use `missing_headers: :raise`
202
200
  - `strict: false` → use `missing_headers: :auto`
@@ -26,50 +26,84 @@
26
26
 
27
27
  # Ruby CSV Pitfalls: Silent Data Corruption and Loss
28
28
 
29
- Ruby's built-in `CSV` library is for many the go-to — it ships with Ruby and requires no dependencies. But it has failure modes that produce **no exception, no warning, and no indication that anything went wrong**. Your import runs, your tests pass, and your data is quietly wrong.
29
+ When they need to parse CSV files, many developers go straight to Ruby's built-in `CSV` library — it ships with Ruby and requires no dependencies.
30
30
 
31
- This page documents ten reproducible ways `CSV.read` (and `CSV.table`) can silently corrupt or lose data, with examples you can run yourself, and how SmarterCSV handles each case.
31
+ But it comes at the cost of boilerplate post-processing you have to write, test, and maintain yourself. Worse, there are some failure modes that produce **no exception, no warning, and no indication that anything went wrong**. Your import runs, your tests pass, and your data is quietly wrong.
32
32
 
33
- > **Note on `CSV.table`:** It's a convenience wrapper for `CSV.read` with `headers: true`, `header_converters: :symbol`, and `converters: :numeric`.
33
+ `CSV.read` is fine for small, trusted, well-formed files — particularly when you control the source. This page is about what can happen with **messy real-world files that partners produce or users upload** — ten reproducible ways `CSV.read` and `CSV.table` can silently corrupt or lose data, with examples you can run yourself, and how SmarterCSV handles each case.
34
+
35
+ > Not all ten may be equally surprising — some are odd behavior that bites you anyway, others are genuine traps. All ten are silent.
36
+
37
+ ---
38
+
39
+ > 💡 **Want to follow along?** Download the [example CSV files](https://raw.githubusercontent.com/tilo/articles/main/ruby/smarter_csv/10-ways-ruby_csv-can-silently-corrupt-or-lose-your-data/images/10-ways-ruby_csv-can-silently-corrupt-or-lose-your-data-examples.tgz) and run the examples locally.
34
40
 
35
41
  ---
36
42
 
37
43
  ## At a Glance
38
44
 
39
- | # | Ruby CSV Issue | Failure Mode | SmarterCSV fix | SmarterCSV Details |
40
- |---|-------|-------------|:--------------:|---------|
41
- | 1 | Extra columns silently dropped | Values beyond header count compete for the `nil` key — all but the last are discarded | by default ✅ | Default `missing_headers: :auto` auto-generates `:column_N` keys |
42
- | 2 | Duplicate headers — last wins | `.to_h` keeps only the last value for a repeated header; earlier values silently lost | by default ✅ | Default `duplicate_header_suffix:` → `:score`, `:score2`, `:score3` |
43
- | 3 | Empty headers — `""` key collision | Blank header cells become `""` keys; multiple blanks collide and overwrite each other | by default ✅ | Default `missing_header_prefix:` → `:column_1`, `:column_2` |
44
- | 4 | BOM corrupts first header | `"\xEF\xBB\xBFname"` ≠ `"name"` — first column becomes unreachable by its key | by default ✅ | Automatic BOM stripping always on, no option needed |
45
- | 5 | Whitespace in headers ¹ | `" Age"` ≠ `"Age"` — lookup silently returns `nil` | by default ✅ | Default `strip_whitespace: true` strips headers and values |
46
- | 6 | `liberal_parsing` garbles fields | Unmatched quotes produce wrong field boundaries; corrupted data returned as valid | by default ✅ | `on_bad_row: :raise` (default); opt-in `:skip` / `:collect` for quarantine |
47
- | 7 | `nil` vs `""` for empty fields | Unquoted empty → `nil`, quoted empty → `""` — inconsistent empty checks | by default ✅ | Default `remove_empty_values: true` removes both; `false` normalizes both to `nil` |
48
- | 8 | Backslash-escaped quotes (MySQL/Unix) | `\"` treated as field-closing quote — crash or garbled data | by default ✅ | Default `quote_escaping: :auto` handles both RFC 4180 and backslash escaping |
49
- | 9 | Missing closing quote eats the rest of the file | One unclosed `"` swallows all subsequent rows into one field value | via option | `field_size_limit: N` raises immediately; `quote_boundary: :standard` (default) reduces exposure |
50
- | 10 | No encoding auto-detection | Non-UTF-8 files either crash or silently produce mojibake | via option | `file_encoding:`, `force_utf8: true`, `invalid_byte_sequence:` |
51
-
52
- ¹ The one case where `CSV.table` does better than `CSV.read`: its `header_converters: :symbol` option includes `.strip`, so whitespace is removed from headers. All other nine issues are identical between `CSV.read` and `CSV.table`.
45
+ | # | Severity | Ruby CSV Issue | Failure Mode | SmarterCSV fix | SmarterCSV Details |
46
+ |---|:--------:|-------|-------------|:--------------:|---------|
47
+ | 1 | 🔴 | Extra columns silently dropped | Values beyond header count compete for the `nil` key — only the first survives, the rest are discarded | by default ✅ | Default `missing_headers: :auto` auto-generates `:column_N` keys |
48
+ | 2 | 🔴 | Duplicate headers — first wins | `.to_h` keeps only the first value for a repeated header; later values silently lost | by default ✅ | Default `duplicate_header_suffix:` → `:score`, `:score2`, `:score3` |
49
+ | 3 | 🔴 | Empty headers — `nil` key collision | Blank header cells become `nil` keys; multiple blanks collide and only the first value survives | by default ✅ | Default `missing_header_prefix:` → `:column_1`, `:column_2` |
50
+ | 4 | 🔴 | `converters: :numeric` silently corrupts leading-zero values as octal ¹ | `Integer()` interprets leading zeros as octal: `"00123"` → `83` | by default ✅ | Default `convert_values_to_numeric: true` uses decimal (no octal trap); `convert_values_to_numeric: false` preserves strings exactly |
51
+ | 5 | 🟡 | Whitespace in headers ² | `" Age"` ≠ `"Age"` — lookup silently returns `nil` | by default ✅ | Default `strip_whitespace: true` strips headers and values |
52
+ | 6 | 🟡 | Whitespace around values | `"active " == "active"` → `false`; leading/trailing spaces or tabs cause status/type checks to silently return wrong results | by default ✅ | Default `strip_whitespace: true` strips all values; set `false` to preserve spaces |
53
+ | 7 | 🟠 | `nil` vs `""` for empty fields | Unquoted empty → `nil`, quoted empty → `""` — inconsistent empty checks | by default ✅ | Default `remove_empty_values: true` removes both; `false` normalizes both to `""` |
54
+ | 8 | 🟠 | Backslash-escaped quotes (MySQL/Unix) | `\"` treated as field-closing quote — crash or garbled data | by default ✅ | Default `quote_escaping: :auto` handles both RFC 4180 and backslash escaping |
55
+ | 9 | 🔴 | TSV file read as CSV completely breaks ❌ | Default `col_sep: ","` on a tab-delimited file returns each row as a single string; all column structure lost | by default ✅ | Default `col_sep: :auto` detects the actual delimiter — no option needed |
56
+ | 10 | 🔴 | No encoding auto-detection | Non-UTF-8 files either crash or silently produce mojibake | via option | `file_encoding:`, `force_utf8: true`, `invalid_byte_sequence: ''` |
57
+
58
+ ¹ Issue #4 can be triggered two ways: `CSV.table` enables `converters: :numeric` by default (no opt-in required), and `CSV.read` triggers the same corruption when passed `converters: :numeric` explicitly. Either way, any leading-zero string field (ZIP codes, customer IDs, product codes) is silently converted to a wrong integer.
59
+
60
+ ² The one case where `CSV.table` does better than `CSV.read`: its `header_converters: :symbol` option includes `.strip`, so whitespace is removed from headers (#5). Values (#6) are not stripped — `CSV.table` has the same whitespace-around-values problem. For all other issues `CSV.table` is identical to or worse than `CSV.read`.
61
+
62
+ > `CSV.table` is a convenience wrapper for `CSV.read` with `headers: true`, `header_converters: :symbol`, and `converters: :numeric`.
63
+
64
+ ---
65
+
66
+ ## The Real Cost of Handling This Yourself
67
+
68
+ Experienced users of `CSV.read` know some of these gotchas and handle them in post-processing — but not all of them can be handled that way: some are serious bugs that will silently corrupt your data regardless. And even for the ones you can handle, manual post-processing has five hidden costs:
69
+
70
+ * **You hand-craft boilerplate for every use case.** The right fix for whitespace differs when headers have spaces vs. values have spaces vs. both. Encoding handling depends on the source system. There is no generic post-processing snippet — you write a slightly different version every time.
71
+
72
+ * **You have to remember all of it, every time.** Every new import, service, or data source needs the same gotchas handled — consistently. But boilerplate doesn't enforce itself. A fix you wrote for one importer doesn't automatically apply to the next. The gotchas don't announce themselves — you only catch them if you remember to look.
73
+
74
+ * **Your boilerplate is probably undertested.** Post-processing code that wraps `CSV.read` rarely gets the same test coverage as business logic. Developers don't think of it as the risky part. Data edge cases — files with blank headers, leading-zero IDs, quoted empty fields, mixed encoding — don't make it into the test suite until they cause a production incident. You don't know what your boilerplate misses until a file breaks it.
75
+
76
+ > ❓ Do your tests for your CSV wrapper just test the mechanics, or do they also cover data corner cases?
77
+
78
+ * **Your benchmarks probably don't include the boilerplate code.** When you chose `CSV.read`, you probably looked at raw parsing performance — but did you measure the end-to-end cost of your post-processing? Whitespace stripping, header cleanup, empty normalization: none of that is free. Your end-to-end data pipeline can be much slower than what you initially measured.
79
+
80
+ * **One library that handles all of this predictably and performantly is worth more than the sum of its parts.** The value isn't "these ten cases are covered." It's that you stop maintaining a bespoke cleaning pipeline, stop writing one-off fixes after production surprises, and no longer have to worry about test coverage or performance: you can trust that the default behavior handles edge cases sensibly, without silently damaging your data.
81
+
82
+ Predictable behavior in a well-tested library beats hand-crafted boilerplate that anticipates fewer edge cases.
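To make the "bespoke cleaning pipeline" concrete, here is a minimal sketch of the kind of hand-rolled wrapper around stdlib `CSV` this section describes. The helper name is hypothetical, and it covers only a few of the ten issues (and handles blank headers lossily):

```ruby
require 'csv'

# Hypothetical hand-rolled cleanup wrapper around CSV.parse — the kind of
# boilerplate each importer tends to reimplement slightly differently.
def clean_rows(csv_text)
  CSV.parse(csv_text, headers: true).map do |row|
    row.to_h.each_with_object({}) do |(header, value), clean|
      key = header.to_s.strip                # whitespace in headers (#5)
      value = value&.strip                   # whitespace around values (#6)
      value = nil if value == ''             # normalize "" vs nil (#7)
      clean[key] = value unless key.empty?   # blank headers dropped (#3) — lossy!
    end
  end
end

rows = clean_rows(" name , status \nAlice, active \n")
rows.first  # => {"name" => "Alice", "status" => "active"}
```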
53
83
 
54
84
  ---
55
85
 
56
86
  ## Why These Failures Are Dangerous
57
87
 
58
- Every failure in this list is **silent**. No exception, no warning, no log line — the import completes successfully and the data is quietly wrong. That makes them hard to catch in tests and easy to miss in code review.
88
+ **Every single failure in this list is silent.** No exception, no warning, no log line — your import completes successfully and your data is quietly wrong. That's what makes these issues so dangerous: they don't surface in tests, they don't cause immediate errors, and they're easy to miss during code review.
89
+
90
+ The root cause is that `CSV.read` is a **tokenizer**, not a data pipeline. It splits bytes into fields and hands them back with no normalization, no validation, and no defensive handling of real-world messiness. Every assumption about what "clean" input looks like is left to the caller.
59
91
 
60
- The root cause is that `CSV.read` is a tokenizer, not a data pipeline. It splits bytes into fields and returns them with no normalization, no validation, and no defensive handling of real-world messiness. Every assumption about what "clean" input looks like is left to the caller.
92
+ Issue #4 deserves special mention: `CSV.table`'s default `converters: :numeric` silently turns `"00123"` into `83`³ and `"01234"` into `668`³ — values that look like perfectly valid integers. ZIP codes, customer IDs, and product codes are quietly replaced with wrong numbers that pass every validation, get stored in your database, and are indistinguishable from real data until someone notices the numbers don't match.
61
93
 
62
- `CSV.table` fixes exactly one issue out of ten (whitespace in headers) because its `:symbol` converter happens to call `.strip`. Everything else is identical.
94
+ These aren't obscure edge cases. Extra columns, trailing commas, Windows-1252 encoding, duplicate headers, blank header cells, TSV-vs-CSV confusion, leading-zero identifiers, and whitespace-padded values are all common in CSV files exported from Excel, reporting tools, ERP systems, and legacy data pipelines. If your application accepts user-uploaded CSV files, you will encounter these.
63
95
 
64
- These are not obscure edge cases. Extra columns, trailing commas, BOMs, Windows-1252 encoding, duplicate headers, and blank header cells are all common in CSV files exported from Excel, reporting tools, ERP systems, and legacy data pipelines.
96
+ The defensive post-processing code required to handle all ten cases correctly (octal-safe numeric conversion, whitespace normalization, duplicate header disambiguation, extra column naming, consistent empty value handling, backslash quote escaping, delimiter auto-detection, encoding detection) is non-trivial to write, test, and maintain. Most applications never bother, because the failures are silent.
65
97
 
66
- > **Ready to switch?** ➡️ [Migrating from Ruby CSV](./migrating_from_csv.md)
98
+ ³ These aren't rounding errors or truncations — they are completely different numbers. [Octal](https://en.wikipedia.org/wiki/Octal) is a base-8 number system from the early days of computing, still used in low-level Unix file permissions and C integer literals. It has no place in CSV data. No spreadsheet, ERP system, or database exports ZIP codes or customer IDs in octal — but Ruby CSV silently assumes that's exactly what a leading zero means.
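The mechanism is plain Ruby: `Kernel#Integer` honors radix prefixes, so a leading zero switches parsing to base 8, while `String#to_i` always parses decimal. A quick reproduction in `irb`:

```ruby
# Kernel#Integer treats a leading zero as an octal prefix — this is
# what Ruby CSV's :integer/:numeric converters are built on:
Integer("00123")   # => 83   (octal 123 = 1*64 + 2*8 + 3)
Integer("01234")   # => 668  (octal 1234)

# String#to_i always parses decimal — no octal trap:
"00123".to_i       # => 123
```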
99
+
100
+ Read on for a detailed explanation and reproducible example for each issue.
67
101
 
68
102
  ---
69
103
 
70
104
  ## 1. Extra Columns Without Headers — Values Silently Discarded
71
105
 
72
- When a row has more fields than there are headers, `CSV.read` maps every extra field to the `nil` key. If there are multiple extra fields, they all compete for the same `nil` key — **only the last one survives**, the rest are silently discarded.
106
+ When a row has more fields than there are headers, `CSV.read` maps every extra field to the `nil` key. If there are multiple extra fields, they all compete for the same `nil` key — **only the first one survives**, the rest are silently discarded.
73
107
 
74
108
  ```
75
109
  $ cat example1.csv
@@ -78,36 +112,46 @@ Alice , Smith, 30, VIP, Gold ,
78
112
  Bob, Jones, 25
79
113
  ```
80
114
 
81
- **With Ruby CSV:**
82
-
83
115
  ```ruby
84
116
  rows = CSV.read('example1.csv', headers: true).map(&:to_h)
85
117
  rows.first
86
- # => {" First Name " => "Alice ", " Last Name " => " Smith", " Age" => " 30", nil => ""}
87
- # the values "VIP" and "Gold" are silently lost here ^^^^^^^^^
118
+ # => {
119
+ # " First Name " => "Alice ",
120
+ # " Last Name " => " Smith",
121
+ # " Age" => " 30",
122
+ # nil => " VIP"
123
+ # ^^^^^^^^^^^^^
124
+ # data from unnamed column with "Gold" is silently lost
125
+ # }
88
126
  ```
89
127
 
90
- Alice's row has 6 fields but only 3 headers. The extra fields `"VIP"`, `"Gold"`, and `""` (trailing comma) all land on `nil` — each overwriting the last. No error, no warning.
128
+ Alice's row has 6 fields but only 3 headers. The extra fields `" VIP"`, `" Gold"`, and `""` (trailing comma) all land on `nil` — only the first one wins. No error, no warning.
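The nil-key collision can be reproduced inline with `CSV.parse` (whitespace removed here to isolate the issue):

```ruby
require 'csv'

# 5 fields, 3 headers: both overflow fields map to the nil header,
# and CSV::Row#to_h keeps only the first occurrence per key.
row = CSV.parse("first,last,age\nAlice,Smith,30,VIP,Gold\n", headers: true).first
row.to_h
# => {"first" => "Alice", "last" => "Smith", "age" => "30", nil => "VIP"}
```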
91
129
 
92
130
  This is common in real-world exports: tools frequently append audit columns, status flags, or trailing commas that don't correspond to headers.
93
131
 
94
132
  **`CSV.table` has the same problem.**
95
133
 
96
- **With SmarterCSV:**
134
+ **SmarterCSV:** The default `missing_headers: :auto` auto-generates distinct names for extra columns using `missing_header_prefix` (default: `"column_"`). The trailing empty field is dropped by the default `remove_empty_values: true` setting. No data loss.
97
135
 
98
136
  ```ruby
99
137
  rows = SmarterCSV.process('example1.csv')
100
138
  rows.first
101
- # => {first_name: "Alice", last_name: "Smith", age: 30, column_1: "VIP", column_2: "Gold"}
139
+ # => {
140
+ # first_name: "Alice",
141
+ # last_name: "Smith",
142
+ # age: 30,
143
+ # column_4: "VIP",
144
+ # column_5: "Gold"
145
+ # ^^^^^^^^^^^^^^^^
146
+ # extra data columns are handled, no data is lost
147
+ # }
102
148
  ```
103
149
 
104
- The default `missing_headers: :auto` auto-generates distinct names for extra columns using `missing_header_prefix` (default: `"column_"`). The trailing empty field is dropped by the default `remove_empty_values: true` setting. No data loss.
105
-
106
150
  ---
107
151
 
108
- ## 2. Duplicate Header Names — First Value Silently Dropped
152
+ ## 2. Duplicate Header Names — Second Value Silently Dropped
109
153
 
110
- When two columns share the same header name, `CSV::Row#to_h` keeps only the **last** value. The first is silently dropped.
154
+ When two columns share the same header name, `CSV::Row#to_h` keeps only the **first** value. Later values are silently dropped.
111
155
 
112
156
  ```
113
157
  $ cat example2.csv
@@ -115,18 +159,18 @@ score,name,score
115
159
  95,Alice,87
116
160
  ```
117
161
 
118
- **With Ruby CSV:**
119
-
120
162
  ```ruby
121
163
  rows = CSV.read('example2.csv', headers: true).map(&:to_h)
122
164
  rows.first
123
- # => {"score" => "87", "name" => "Alice"}
124
- # ^^^ first score (95) silently lost
165
+ # => {"score" => "95", "name" => "Alice"}
166
+ # ^^^ second score (87) silently lost
125
167
  ```
126
168
 
127
169
  Common with reporting tool exports that repeat a column (e.g., two date columns both labeled `"Date"`).
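The same failure can be reproduced inline, which also shows that the second value is still present on the `CSV::Row` and is only lost on conversion to a `Hash`:

```ruby
require 'csv'

# Duplicate "score" header: CSV::Row#to_h keeps only the first value.
row = CSV.parse("score,name,score\n95,Alice,87\n", headers: true).first
row.to_h    # => {"score" => "95", "name" => "Alice"}
row.fields  # => ["95", "Alice", "87"]  (the 87 survives on the Row,
            #    but is dropped the moment you call to_h)
```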
128
170
 
129
- **With SmarterCSV:**
171
+ **`CSV.table` has the same problem.**
172
+
173
+ **SmarterCSV:** disambiguates duplicate headers by appending a number directly: `:score`, `:score2`, `:score3`.
130
174
 
131
175
  ```ruby
132
176
  rows = SmarterCSV.process('example2.csv')
@@ -136,15 +180,15 @@ rows.first
136
180
 
137
181
  * The default `duplicate_header_suffix: ""` disambiguates by appending a counter: `:score`, `:score2`, `:score3`.
138
182
  * Use `duplicate_header_suffix: '_'` to get `:score_2`, `:score_3`.
139
- * Set `duplicate_header_suffic: nil` to raise `DuplicateHeaders` instead.
183
+ * Set `duplicate_header_suffix: nil` to raise `DuplicateHeaders` instead.
140
184
 
141
185
  ---
142
186
 
143
- ## 3. Empty Header Fields — `""` Key Collision
187
+ ## 3. Empty Header Fields — `nil` Key Collision
144
188
 
145
- A CSV file with blank header cells (e.g., `name,,age`) gives those columns an empty string key. Multiple blank headers all collide on `""` — same overwrite problem as issue #1.
189
- A CSV file with blank header fields (e.g., `name,,age`) gives those columns a `nil` key. Multiple blank headers all collide on `nil` — the same overwrite problem as issue #1, and only the first value survives.
146
190
 
147
- > This is distinct from issue #1. Issue #1 is about extra *data* fields beyond the header count, which get keyed under `nil`. Issue #3 is about blank cells *in the header row itself*, which get keyed under `""`.
191
+ > Note: this is distinct from issue #1. Issue #1 is about extra *data* fields beyond the header count, which get keyed under `nil`. Issue #3 is about blank cells *in the header row itself*, which also get keyed under `nil`.
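The distinction is easy to see inline: here the blank cells are in the header row itself, so the `nil` keys come from the headers rather than from overflow fields:

```ruby
require 'csv'

# Blank header cells parse as nil headers; they all collide in to_h.
row = CSV.parse("name,,,age\nAlice,foo,bar,30\n", headers: true).first
row.headers  # => ["name", nil, nil, "age"]
row.to_h     # => {"name" => "Alice", nil => "foo", "age" => "30"}
```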
148
192
 
149
193
  ```
150
194
  $ cat example3.csv
@@ -152,25 +196,23 @@ name,,,age
152
196
  Alice,foo,bar,30
153
197
  ```
154
198
 
155
- **With Ruby CSV:**
156
-
157
199
  ```ruby
158
200
  rows = CSV.read('example3.csv', headers: true).map(&:to_h)
159
201
  rows.first
160
- # => {"name" => "Alice", "" => "bar", "age" => "30"}
161
- # ^^^ "foo" silently lost — both blank headers wrote to the "" key
202
+ # => {"name" => "Alice", nil => "foo", "age" => "30"}
203
+ # ^^^ "bar" silently lost — both blank headers map to nil, first value wins
162
204
  ```
163
205
 
164
- `CSV.table` converts headers to symbols — blank headers become `:"" ` same collision, different key:
206
+ `CSV.table` has the same `nil` key collision:
165
207
 
166
208
  ```ruby
167
209
  rows = CSV.table('example3.csv').map(&:to_h)
168
210
  rows.first
169
- # => {name: "Alice", :"" => "bar", age: 30}
170
- # ^^^ "foo" still silently lost
211
+ # => {name: "Alice", nil => "foo", age: 30}
212
+ # ^^^ "bar" still silently lost
171
213
  ```
172
214
 
173
- **With SmarterCSV:**
215
+ **SmarterCSV:** `missing_header_prefix:` (default `"column_"`) auto-generates names for blank headers: `:column_1`, `:column_2`, etc. No collision, no data loss.
174
216
 
175
217
  ```ruby
176
218
  rows = SmarterCSV.process('example3.csv')
@@ -178,47 +220,62 @@ rows.first
178
220
  # => {name: "Alice", column_1: "foo", column_2: "bar", age: 30}
179
221
  ```
180
222
 
181
- `missing_header_prefix:` (default `"column_"`) auto-generates names for blank headers: `:column_1`, `:column_2`, etc. No collision, no data loss.
182
-
183
223
  ---
184
224
 
185
- ## 4. BOM Corrupts the First Header
225
+ ## 4. `converters: :numeric` Silently Corrupts Leading-Zero Values as Octal
186
226
 
187
- Files saved by Excel on Windows often include a UTF-8 BOM (`\xEF\xBB\xBF`) at the start. `CSV.read` does not strip it, so the BOM is silently prepended to the first header name.
227
+ With `converters: :numeric`, a number with leading zeroes does not just lose the zeroes: the entire value is silently converted to a completely different number³ that looks plausible but is incorrect. ❌
228
+
229
+ `CSV.table` enables `converters: :numeric` by default without any opt-in, **triggering the bug by default**. `CSV.read` is safe by default, but triggers the same corruption when `converters: :numeric` (or `converters: :integer`) is passed explicitly.
188
230
 
189
231
  ```
190
232
  $ cat example4.csv
191
- name,age
192
- Alice,30
233
+ customer_id,zip_code,amount
234
+ 00123,01234,99.50
235
+ 00456,90210,9.99
193
236
  ```
194
237
 
195
- ```
196
- $ hexdump -C example4.csv
197
- 00000000 ef bb bf 6e 61 6d 65 2c 61 67 65 0a 41 6c 69 63 |...name,age.Alic|
198
- 00000010 65 2c 33 30 0a |e,30.|
238
+ **With Ruby CSV:**
239
+
240
+ ```ruby
241
+ # CSV.table: converters: :numeric is on by default, no opt-in needed
242
+ rows = CSV.table('example4.csv').map(&:to_h)
243
+ rows.first
244
+ # => {customer_id: 83, zip_code: 668, amount: 99.5}
245
+ # ^^^ "00123" → 83 (octal 0123 = decimal 83)
246
+ # ^^^ "01234" → 668 (octal 1234 = decimal 668)
247
+
248
+ # CSV.read with explicit converters: :numeric — same result
249
+ rows = CSV.read('example4.csv', headers: true, converters: :numeric).map(&:to_h)
250
+ rows.first
251
+ # => {"customer_id" => 83, "zip_code" => 668, "amount" => 99.5}
199
252
  ```
200
253
 
201
- The `ef bb bf` at offset 0 is the UTF-8 BOM — invisible in `cat` output but silently prepended to the first header by `CSV.read`.
254
+ `"00123"` becomes `83`. `"01234"` becomes `668`. ZIP codes, customer IDs, order numbers, product codes: any field with a leading zero becomes a completely wrong integer. No exception, no warning. The resulting values look plausible and pass all type validations.
202
255
 
203
- **With Ruby CSV:**
256
+ `CSV.read` without converters is safe — strings are returned as-is:
204
257
 
205
258
  ```ruby
206
259
  rows = CSV.read('example4.csv', headers: true).map(&:to_h)
207
- rows.first.keys.first # => "\xEF\xBB\xBFname" ← not "name"
208
-
209
- rows.first['name'] # => nil ← first column unreachable
260
+ rows.first
261
+ # => {"customer_id" => "00123", "zip_code" => "01234", "amount" => "99.50"}
210
262
  ```
211
263
 
212
- The data is present but every lookup on the first column silently returns `nil`. The BOM is invisible in most terminals and editors — the output appears correct.
213
-
214
- **With SmarterCSV:**
264
+ **SmarterCSV:**
215
265
 
216
266
  ```ruby
267
+ # Default (convert_values_to_numeric: true) — decimal conversion, no octal trap
217
268
  rows = SmarterCSV.process('example4.csv')
218
- rows.first[:name] # => "Alice" ← BOM stripped automatically
269
+ rows.first
270
+ # => {customer_id: 123, zip_code: 1234, amount: 99.5}
271
+
272
+ # convert_values_to_numeric: false — preserves strings exactly, including leading zeros
273
+ rows = SmarterCSV.process('example4.csv', convert_values_to_numeric: false)
274
+ rows.first
275
+ # => {customer_id: "00123", zip_code: "01234", amount: "99.50"}
219
276
  ```
220
277
 
221
- By default SmarterCSV automatically detects and strips BOMs. Always on, no option needed.
278
+ SmarterCSV's default `convert_values_to_numeric: true` uses `to_i` / `to_f`, which always treat strings as decimal — no octal interpretation. Use `convert_values_to_numeric: false` when leading zeros must be preserved (ZIP codes, IDs, product codes).
222
279
 
223
280
  ---
224
281
 
@@ -232,94 +289,71 @@ $ cat example5.csv
232
289
  Alice,30
233
290
  ```
234
291
 
235
- **With Ruby CSV:**
236
-
237
292
  ```ruby
238
293
  rows = CSV.read('example5.csv', headers: true).map(&:to_h)
239
294
  rows.first
240
- # => {" name " => "Alice", " age " => "30"}
295
+ # => {" name " => "Alice", " age" => "30"}
241
296
 
242
- rows.first['name'] # => nil ← key is " name ", not "name"
297
+ rows.first['name'] # => nil ← silent miss; key is " name ", not "name"
243
298
  rows.first['age'] # => nil
244
299
  ```
245
300
 
246
- > `CSV.table` mitigates this: the `:symbol` header converter includes `.strip`. This is the one issue where `CSV.table` behaves better than `CSV.read`.
301
+ **`CSV.table` mitigates this:** ² the `:symbol` header converter includes `.strip`, so whitespace is removed from headers. This is the one issue where `CSV.table` behaves better than `CSV.read`.
247
302
 
248
- **With SmarterCSV:**
303
+ **SmarterCSV:**
249
304
 
250
305
  ```ruby
251
306
  rows = SmarterCSV.process('example5.csv')
252
307
  rows.first
253
308
  # => {name: "Alice", age: 30}
254
309
  ```
255
-
256
310
  The default setting `strip_whitespace: true` strips leading/trailing whitespace from both headers and values.
257
311
 
312
+
258
313
  ---
259
314
 
260
- ## 6. `liberal_parsing: true` Garbles Field Values
315
+ ## 6. Whitespace Around Values — Silent Comparison Failure
261
316
 
262
- `CSV.read` raises `MalformedCSVError` when it encounters an unmatched quote. `liberal_parsing: true` suppresses the error and returns a row anyway but with wrong field boundaries.
317
+ `CSV.read` returns field values exactly as they appear in the file — leading spaces, trailing spaces, and tab characters all preserved. Exporters from fixed-width database systems (Oracle `CHAR` columns, COBOL-era systems) routinely pad string fields to a fixed width; other tools leave accidental leading spaces. The values look correct when printed, but equality checks silently return `false`.
263
318
 
264
- **The key danger:** without `liberal_parsing` you at least know something is wrong. With it, corrupted data is silently returned as valid.
319
+ This pairs with Example 5 (whitespace in headers): Ruby CSV strips neither headers nor values by default.
265
320
 
266
321
  ```
267
322
  $ cat example6.csv
268
- name,note,score
269
- Alice,"unclosed quote,99
270
- Bob,normal,87
323
+ name,status,city
324
+ Alice,active ,New York ← trailing spaces after 'active'
325
+ Bob,inactive,Chicago
326
+ Carol, active,Boston ← leading space before 'active'
271
327
  ```
272
328
 
273
- **With Ruby CSV:**
274
-
275
329
  ```ruby
276
- # Without liberal_parsing: you know something is wrong
277
- CSV.read('example6.csv', headers: true)
278
- # => CSV::MalformedCSVError: Unclosed quoted field on line 2
330
+ rows = CSV.read('example6.csv', headers: true).map(&:to_h)
279
331
 
280
- # With liberal_parsing: silent corruption
281
- rows = CSV.read('example6.csv', headers: true, liberal_parsing: true).map(&:to_h)
282
- rows.length # => 1 (not 2 — Bob's row is gone)
283
- rows[0]
284
- # => {"name" => "Alice", "note" => "unclosed quote,99\nBob,normal,87", "score" => nil}
285
- # ^^^ Alice's note field swallowed the rest of the file; Bob vanished
332
+ rows[0]['status'] # => "active "
333
+ rows[2]['status'] # => " active"
334
+
335
+ rows.select { |r| r['status'] == 'active' }
336
+ # => [] ← Alice and Carol are not found. No error raised.
286
337
  ```
287
338
 
288
- The garbled row passes validations, gets inserted into the database, and surfaces as a data quality issue later.
339
+ The values look fine in logs and `puts` output. The bug only surfaces when the comparison silently returns the wrong result.
289
340
 
290
- **With SmarterCSV:**
341
+ **Workaround:** pass `strip: true` to `CSV.read`. This strips spaces and tab characters from field values. Note that it also strips intentional leading/trailing whitespace from any field, including quoted fields where spaces may be meaningful.
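A minimal sketch of that workaround, using `CSV.parse` on the example 6 data inline (assumes the `csv` gem ≥ 3.1, where the `strip:` option was added):

```ruby
require 'csv'

data = "name,status\nAlice,active \nBob,inactive\nCarol, active\n"

# strip: true removes leading/trailing whitespace from every field.
rows = CSV.parse(data, headers: true, strip: true).map(&:to_h)

rows[0]['status']   # => "active"
rows[2]['status']   # => "active"

rows.select { |r| r['status'] == 'active' }.length   # => 2
```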
291
342
 
292
- ```ruby
293
- reader = SmarterCSV::Reader.new('example6.csv', on_bad_row: :collect)
294
- good_rows = reader.process
295
- reader.errors
296
- # => {
297
- # :bad_row_count => 1,
298
- # :bad_rows => [
299
- # {
300
- # :csv_line_number => 2,
301
- # :file_line_number => 2,
302
- # :file_lines_consumed => 2,
303
- # :error_class => SmarterCSV::MalformedCSV,
304
- # :error_message => "Unclosed quoted field detected in multiline data",
305
- # :raw_logical_line => "Alice,\"unclosed quote,99\nBob,normal,87\n"
306
- # }
307
- # ]
308
- # }
309
- ```
343
+ **`CSV.table` has the same problem** — its `:symbol` converter strips header names but does not touch field values.
310
344
 
311
- Or pass a lambda to `on_bad_row` — works with `SmarterCSV.process` (no `Reader` instance needed):
345
+ **SmarterCSV:**
312
346
 
313
347
  ```ruby
314
- bad_rows = []
315
- good_rows = SmarterCSV.process('example6.csv',
316
- on_bad_row: ->(rec) { bad_rows << rec })
348
+ rows = SmarterCSV.process('example6.csv')
349
+
350
+ rows[0][:status] # => "active"
351
+ rows[2][:status] # => "active"
352
+
353
+ rows.select { |r| r[:status] == 'active' }.length # => 2
317
354
  ```
318
355
 
319
- * `on_bad_row: :raise` (default) fails fast.
320
- * `on_bad_row: :collect` quarantines them — use `reader.errors` to access.
321
- * `on_bad_row: ->(rec) { ... }` calls your lambda per bad row; works with `SmarterCSV.process`.
322
- * `on_bad_row: :skip` discards bad rows silently.
356
+ `strip_whitespace: true` (default) strips all leading and trailing whitespace (spaces and tabs) from values. Set `strip_whitespace: false` to preserve spaces when needed.
323
357
 
324
358
  ---
325
359
 
@@ -337,8 +371,6 @@ Alice,
337
371
  Bob,""
338
372
  ```
339
373
 
340
- **With Ruby CSV:**
341
-
342
374
  ```ruby
343
375
  rows = CSV.read('example7.csv', headers: true).map(&:to_h)
344
376
 
@@ -349,9 +381,11 @@ rows[0]['city'].nil? # => true
349
381
  rows[1]['city'].nil? # => false ← same semantic meaning, different Ruby type
350
382
  ```
351
383
 
352
- Both rows have no city, but your code sees two different things. Any check using `.nil?`, `.blank?`, `.present?`, or `if row['city']` will behave differently depending on how the upstream exporter quoted the empty field.
384
+ Both rows have no city. But your code sees two different things. Any check using `.nil?`, `.blank?`, `.present?`, or a simple `if row['city']` will behave differently depending on how the upstream exporter happened to quote the empty field. No two exporters agree on this.
353
385
 
354
- **With SmarterCSV:**
386
+ **`CSV.table` has the same problem.**
387
+
388
+ **SmarterCSV:** `remove_empty_values: true` (default) removes both from the hash. With `remove_empty_values: false`, both are normalized to `""`. Consistent either way.
355
389
 
356
390
  ```ruby
357
391
  # remove_empty_values: true (default) — both empty cities are dropped from the hash
@@ -359,17 +393,17 @@ rows = SmarterCSV.process('example7.csv')
359
393
  rows[0] # => {name: "Alice"}
360
394
  rows[1] # => {name: "Bob"}
361
395
 
362
- # remove_empty_values: false — both normalized to nil
396
+ # remove_empty_values: false — both normalized to ""
363
397
  rows = SmarterCSV.process('example7.csv', remove_empty_values: false)
364
- rows[0] # => {name: "Alice", city: nil}
365
- rows[1] # => {name: "Bob", city: nil}
398
+ rows[0] # => {name: "Alice", city: ""}
399
+ rows[1] # => {name: "Bob", city: ""}
366
400
  ```
367
401
 
368
402
  ---
369
403
 
370
404
  ## 8. Backslash-Escaped Quotes — MySQL / Unix Dump Format
371
405
 
372
- MySQL's `SELECT INTO OUTFILE`, PostgreSQL `COPY TO`, and many Unix data-pipeline tools escape embedded double quotes as `\"` — not as `""` (the RFC 4180 standard). Ruby's `CSV` only understands RFC 4180, so a backslash before a quote is treated as two separate characters: a literal `\` followed by a `"` that immediately **closes the field**.
406
+ MySQL's `SELECT INTO OUTFILE`, PostgreSQL `COPY TO`, and many Unix data-pipeline tools escape embedded double quotes as `\"` — not as `""` (the RFC 4180 standard). Ruby's `CSV` only understands the RFC 4180 convention, so a backslash before a quote is treated as two separate characters: a literal `\` followed by a `"` that immediately **closes the field**.
373
407
 
374
408
  ```
375
409
  $ cat example8.csv
@@ -378,92 +412,69 @@ Alice,"She said \"hello\" to everyone"
378
412
  Bob,"Normal note"
379
413
  ```
380
414
 
381
- **With Ruby CSV Scenario 1: crash** (at least you know something went wrong):
415
+ **Scenario 1 — crash** (at least you know something went wrong):
382
416
 
383
417
  ```ruby
384
418
  rows = CSV.read('example8.csv', headers: true)
385
- # => CSV::MalformedCSVError: Illegal quoting in line 2.
419
+ # => CSV::MalformedCSVError: Any value after quoted field isn't allowed in line 2.
386
420
  ```
387
421
 
388
- **With Ruby CSV Scenario 2: silent garbling** with `liberal_parsing: true`:
422
+ **Scenario 2 — silent garbling** with `liberal_parsing: true`:
389
423
 
390
424
  ```ruby
391
425
  rows = CSV.read('example8.csv', headers: true, liberal_parsing: true)
392
- rows[0]['name'] # => "Alice"
393
- rows[0]['note'] # => "She said \\" ← field closed at the backslash-quote; rest lost
394
- rows[1]['name'] # => "hello" ← Alice's leftovers eaten as Bob's name
395
- rows[1]['note'] # => nil
426
+ rows[0]['note'] # => 'She said \"hello\" to everyone'
396
427
  ```
397
428
 
398
- No exception. No warning. `rows.length` is still 2. The data just quietly moved to the wrong fields.
429
+ No exception. No warning. The note field has extra wrapping quotes and mangled escaping; it won't compare, display, or serialize correctly.
430
+
431
+ **`CSV.table` has the same problem** — and adding `liberal_parsing: true` makes it silently worse.
399
432
 
400
- **With SmarterCSV:**
433
+ **SmarterCSV:** `quote_escaping: :auto` (default since 1.0) detects and handles both `""` and `\"` escaping row-by-row. No option required.
401
434
 
402
435
  ```ruby
403
436
  rows = SmarterCSV.process('example8.csv')
404
- rows[0] # => {name: "Alice", note: "She said \"hello\" to everyone"}
437
+ rows[0] # => {name: "Alice", note: "She said \"hello\" to everyone"}
405
438
  rows[1] # => {name: "Bob", note: "Normal note"}
406
439
  ```
407
440
 
408
- `quote_escaping: :auto` (default) detects and handles both `""` and `\"` escaping row-by-row. No option required. This covers MySQL `SELECT INTO OUTFILE`, PostgreSQL `COPY TO`, and Unix `csvkit`/`awk`-generated files.
409
-
410
441
  ---
411
442
 
412
- ## 9. Missing Closing Quote Consumes the Rest of the File
443
+ ## 9. TSV File Read as CSV Completely Breaks
413
444
 
414
- A single unclosed `"` causes the parser to enter quoted-field mode and treat everything that follows newlines included as part of one field. **All remaining rows are swallowed into a single field value.**
445
+ `CSV.read` defaults to `col_sep: ","`. When given a tab-delimited file (TSV), it finds no commas and treats each entire row as a single field. The header row becomes one giant key; each data row becomes one giant value. All column structure is silently lost: no error, no warning, and `rows.length` looks correct.
415
446
 
416
447
  ```
417
- $ cat example8.csv
418
- name,age
419
- "Alice,30
420
- Bob,25
421
- Carol,40
448
+ $ cat example9.csv
449
+ name city score
450
+ Alice New York 95
451
+ Bob Chicago 87
422
452
  ```
423
453
 
424
- **With Ruby CSV:**
425
-
426
454
  ```ruby
427
- rows = CSV.read('example8.csv', headers: true)
428
- rows.length # => 1 (not 3)
429
- rows.first['name'] # => "Alice,30\nBob,25\nCarol,40"
430
- # ^^^ entire remainder of file in one field
431
- ```
455
+ rows = CSV.read('example9.csv', headers: true).map(&:to_h)
432
456
 
433
- On a large file this is an OOM risk: the parser accumulates an ever-growing string until EOF or memory exhaustion. There is no field size limit, no timeout, and no error until the file ends.
457
+ rows.length # => 2 (looks right but...)
458
+ rows.first.keys # => ["name\tcity\tscore"] ← entire header is one key
459
+ rows.first['name'] # => nil ← column unreachable
460
+ rows.first.values # => ["Alice\tNew York\t95"] ← entire row is one value
461
+ ```
434
462
 
435
- **With SmarterCSV:**
463
+ This commonly happens when users upload a TSV file instead of CSV: the file name can still end in `.csv`, so nothing about the name distinguishes it from actual CSV data.
436
464
 
437
- ```ruby
438
- reader = SmarterCSV::Reader.new('example8.csv',
439
- on_bad_row: :collect,
440
- )
441
- good_rows = reader.process
442
- reader.errors
443
- # => {
444
- # :bad_row_count => 1,
445
- # :bad_rows => [
446
- # {
447
- # :csv_line_number => 2,
448
- # :file_line_number => 2,
449
- # :file_lines_consumed => 3,
450
- # :error_class => SmarterCSV::MalformedCSV,
451
- # :error_message => "Unclosed quoted field detected in multiline data",
452
- # :raw_logical_line => "\"Alice,30\nBob,25\nCarol,40\n"
453
- # }
454
- # ]
455
- # }
456
- ```
465
+ **`CSV.table` has the same problem.**
457
466
 
458
- Or pass a lambda to `on_bad_row` — works with `SmarterCSV.process` (no `Reader` instance needed):
467
+ **SmarterCSV:**
459
468
 
460
469
  ```ruby
461
- bad_rows = []
462
- good_rows = SmarterCSV.process('example8.csv',
463
- on_bad_row: ->(rec) { bad_rows << rec })
470
+ rows = SmarterCSV.process('example9.csv')
471
+ # col_sep: :auto detects the tab separator automatically
472
+
473
+ rows.first
474
+ # => {name: "Alice", city: "New York", score: 95}
464
475
  ```
465
476
 
466
- `field_size_limit: N` raises `SmarterCSV::FieldSizeLimitExceeded` as soon as any field or accumulating multiline buffer exceeds N bytes — the runaway parse stops immediately. Additionally, `quote_boundary: :standard` (default since 1.16.0) means mid-field quotes don't toggle quoted mode, reducing the attack surface further.
477
+ `col_sep: :auto` (default) samples the file and detects the actual delimiter. No option required.
467
478
 
468
479
  ---
469
480
 
@@ -472,42 +483,62 @@ good_rows = SmarterCSV.process('example8.csv',
472
483
  `CSV.read` assumes UTF-8. CSV files exported from Excel on Windows are typically Windows-1252 (CP1252), which encodes accented characters (é, ü, ñ) differently from UTF-8.
473
484
 
474
485
  ```
475
- $ cat example9.csv
486
+ $ cat example10.csv
476
487
  last_name,first_name
477
488
  Müller,Hans
478
489
  ```
479
490
 
480
491
  The file is saved in Windows-1252 encoding — `ü` is stored as `\xFC`, not as UTF-8.
481
492
 
482
- **With Ruby CSV Scenario 1: crash** (the better outcome — at least you know):
493
+ **Scenario 1 — crash** (the better outcome — at least you know):
483
494
 
484
495
  ```ruby
485
- rows = CSV.read('example9.csv', headers: true)
486
- # => Encoding::InvalidByteSequenceError: "\xFC" from ASCII-8BIT to UTF-8
496
+ rows = CSV.read('example10.csv', headers: true)
497
+ # => CSV::InvalidEncodingError: Invalid byte sequence in UTF-8 in line 2.
487
498
  ```
488
499
 
489
- **With Ruby CSV Scenario 2: silent mojibake** (the worse outcome):
500
+ **Scenario 2 — silent mojibake** (the worse outcome):
490
501
 
491
502
  ```ruby
492
503
  # Specifying the wrong encoding suppresses the error
493
- rows = CSV.read('example9.csv', headers: true, encoding: 'binary')
504
+ rows = CSV.read('example10.csv', headers: true, encoding: 'binary')
494
505
  rows.first['last_name'] # => "M\xFCller" ← garbled string
495
- rows.first['last_name'].valid_encoding? # => true ← Ruby thinks it's fine
506
+ rows.first['last_name'].valid_encoding? # => true ← Ruby thinks it's fine!
496
507
  ```
497
508
 
498
- The mojibake string passes `.valid_encoding?`, passes database validations, gets stored, and surfaces as a display bug in production.
509
+ The mojibake string passes `.valid_encoding?`, passes database validations, gets stored, and surfaces as a display bug weeks later in production.
510
+
511
+ **`CSV.table` has the same problem.**
499
512
 
500
- **With SmarterCSV:**
513
+ **SmarterCSV:** `file_encoding:` accepts Ruby's `'external:internal'` transcoding notation; `force_utf8: true` transcodes to UTF-8 automatically; `invalid_byte_sequence:` controls the replacement character for bytes that can't be transcoded, e.g. `''` to drop them.
501
514
 
502
515
  ```ruby
503
- rows = SmarterCSV.process('example9.csv',
516
+ rows = SmarterCSV.process('example10.csv',
504
517
  file_encoding: 'windows-1252:utf-8')
505
518
  rows.first[:last_name] # => "Müller"
506
519
  ```
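For context, here is roughly what the `'windows-1252:utf-8'` transcoding does in plain Ruby (illustrative; `String#scrub` with `''` is shown as the analogue of dropping untranscodable bytes):

```ruby
# 0xFC is 'ü' in Windows-1252 but an invalid byte sequence in UTF-8.
raw = "M\xFCller".b                                  # raw binary bytes

# Tag the bytes with their real encoding, then transcode to UTF-8.
raw.force_encoding('Windows-1252').encode('UTF-8')   # => "Müller"

# Invalid bytes can also be dropped instead of raising; replacing them
# with '' mirrors the idea behind invalid_byte_sequence: ''.
"M\xFCller".scrub('')                                # => "Mller"
```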
507
520
 
508
- * `file_encoding:` accepts Ruby's `'external:internal'` transcoding notation.
509
- * `force_utf8: true` transcodes to UTF-8 automatically.
510
- * `invalid_byte_sequence:` controls the replacement character for bytes that can't be transcoded.
521
+ ---
522
+
523
+ ## The Alternative
524
+
525
+ ```ruby
526
+ gem 'smarter_csv'
527
+ ```
528
+
529
+ ```ruby
530
+ # Before
531
+ rows = CSV.read('data.csv', headers: true).map(&:to_h)
532
+
533
+ # After
534
+ rows = SmarterCSV.process('data.csv')
535
+ ```
536
+
537
+ SmarterCSV handles nine of the ten cases out of the box, including octal-safe numeric conversion, whitespace normalization, duplicate header disambiguation, extra column naming, consistent empty value handling, backslash quote escaping, and delimiter auto-detection.
538
+
539
+ The remaining one (encoding control) requires explicit opt-in options, but the building blocks are there. No boilerplate, no post-processing pipeline, no silent data loss.
540
+
541
+ > **Ready to switch?** → [Migrating from Ruby CSV](./migrating_from_csv.md)
511
542
 
512
543
  ---
513
544
 
@@ -62,7 +62,7 @@ module SmarterCSV
62
62
  # Apply value converters
63
63
  if value_converters
64
64
  converter = value_converters[k]
65
- hash[k] = converter.convert(hash[k]) if converter
65
+ hash[k] = converter.respond_to?(:convert) ? converter.convert(hash[k]) : converter.call(hash[k]) if converter
66
66
  end
67
67
  end
68
68
 
@@ -27,19 +27,21 @@ module SmarterCSV
27
27
 
28
28
  def disambiguate_headers(headers, options)
29
29
  counts = Hash.new(0)
30
- empty_count = 0
31
30
  prefix = options[:missing_header_prefix] || 'column_'
32
31
  # Pre-collect non-blank header names so auto-generated names can avoid collisions.
33
32
  used = headers.reject { |h| blank?(h) }
34
- headers.map do |header|
33
+ headers.each_with_index.map do |header, idx|
35
34
  if blank?(header)
36
- # Empty headers use missing_header_prefix (e.g. "column_1", "column_2") so they
37
- # produce a usable key instead of :"" which gets silently deleted downstream.
38
- # Skip ahead if the generated name collides with an existing header.
39
- begin
40
- empty_count += 1
41
- candidate = "#{prefix}#{empty_count}"
42
- end while used.include?(candidate)
35
+ # Use absolute 1-based column position, consistent with how extra data columns
36
+ # beyond the header count are named. If the positional name collides with an
37
+ # existing header, append underscores until a free name is found; this avoids
38
+ # stealing the positional name from any subsequent blank header.
39
+ candidate = "#{prefix}#{idx + 1}"
40
+ suffix = ''
41
+ while used.include?(candidate)
42
+ suffix += '_'
43
+ candidate = "#{prefix}#{idx + 1}#{suffix}"
44
+ end
43
45
  used << candidate
44
46
  candidate
45
47
  else
@@ -357,7 +357,7 @@ module SmarterCSV
357
357
 
358
358
  if options[:value_converters]
359
359
  options[:value_converters].each do |key, converter|
360
- hash[key] = converter.convert(hash[key]) if hash.key?(key)
360
+ hash[key] = converter.respond_to?(:convert) ? converter.convert(hash[key]) : converter.call(hash[key]) if hash.key?(key)
361
361
  end
362
362
  end
363
363
  else
@@ -755,7 +755,7 @@ module SmarterCSV
755
755
 
756
756
  if options[:value_converters]
757
757
  options[:value_converters].each do |key, converter|
758
- hash[key] = converter.convert(hash[key]) if hash.key?(key)
758
+ hash[key] = converter.respond_to?(:convert) ? converter.convert(hash[key]) : converter.call(hash[key]) if hash.key?(key)
759
759
  end
760
760
  end
761
761
  else
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module SmarterCSV
4
- VERSION = "1.16.1"
4
+ VERSION = "1.16.2"
5
5
  end
@@ -149,7 +149,7 @@ module SmarterCSV
149
149
 
150
150
  def write_header_line
151
151
  mapped_headers = @headers.map { |header| @map_headers[header] || header }
152
- mapped_headers = @headers.map { |header| @header_converter.call(header) } if @header_converter
152
+ mapped_headers = mapped_headers.map { |header| @header_converter.call(header) } if @header_converter
153
153
  force_quotes = @quote_headers || @force_quotes
154
154
  mapped_headers = mapped_headers.map { |x| escape_csv_field(x, force_quotes) }
155
155
  @output_file.write(mapped_headers.join(@col_sep) + @row_sep) unless mapped_headers.empty?
metadata CHANGED
@@ -1,13 +1,13 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: smarter_csv
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.16.1
4
+ version: 1.16.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Tilo Sloboda
8
8
  bindir: bin
9
9
  cert_chain: []
10
- date: 2026-03-16 00:00:00.000000000 Z
10
+ date: 2026-03-30 00:00:00.000000000 Z
11
11
  dependencies: []
12
12
  description: |
13
13
  SmarterCSV is a high-performance CSV reader and writer for Ruby focused on