smarter_csv 1.16.0 → 1.16.2

@@ -3,6 +3,7 @@
3
3
 
4
4
  * [Introduction](./_introduction.md)
5
5
  * [**Migrating from Ruby CSV**](./migrating_from_csv.md)
6
+ * [Ruby CSV Pitfalls](./ruby_csv_pitfalls.md)
6
7
  * [Parsing Strategy](./parsing_strategy.md)
7
8
  * [The Basic Read API](./basic_read_api.md)
8
9
  * [The Basic Write API](./basic_write_api.md)
@@ -25,9 +26,18 @@
25
26
 
26
27
  # Migrating from Ruby CSV
27
28
 
28
- Already using Ruby's built-in `CSV` library? Switching to SmarterCSV is typically a one- or
29
- two-line change — and you get **1.7×–8.6× faster** end-to-end throughput vs `CSV.read`, plain Ruby
30
- hashes with symbol keys, automatic type conversion, and a much richer feature set in return.
29
+ Already using Ruby's built-in `CSV` library? There are three good reasons to switch — and switching is typically a one- or two-line change.
30
+
31
+ ### Inconvenient
32
+ `CSV.read` returns arrays of arrays, so your code must manually handle column indexing, header normalization, type conversion, and whitespace stripping. SmarterCSV returns Rails-ready hashes with symbol keys, numeric conversion, and whitespace stripping out of the box — no boilerplate needed.
33
+
34
+ ### Hidden failure modes
35
+ `CSV.read` has 10 ways to silently corrupt or lose data — no exception, no warning, no log line.
36
+
37
+ ➡️ See [**Ruby CSV Pitfalls**](./ruby_csv_pitfalls.md) for reproducible examples and the SmarterCSV fix for each.
38
+
39
+ ### Slow
40
+ On top of everything else, it is up to 129× slower than SmarterCSV for equivalent end-to-end work — see the [Performance](#performance) section below.
31
41
 
32
42
  > **Medium article:** *"Switch from Ruby CSV to SmarterCSV in 5 Minutes"* — *(coming soon)*
33
43
 
@@ -50,29 +60,78 @@ _‡ `CSV.table` is the closest Ruby equivalent to SmarterCSV — both return sy
50
60
 
51
61
  ## The one-line switch
52
62
 
63
+ Real-world CSV files are messy — whitespace-padded headers, extra columns without headers, trailing
64
+ commas. Consider this file:
65
+
66
+ ```
67
+ $ cat data.csv
68
+ First Name , Last Name , Age
69
+ Alice , Smith, 30, VIP, Gold ,
70
+ Bob, Jones, 25
71
+ ```
72
+
73
+ **With Ruby CSV:**
53
74
  ```ruby
54
- # Before: Ruby CSV
55
- rows = CSV.table('data.csv').map(&:to_h) # array of hashes with symbol keys
75
+ rows = CSV.read('data.csv', headers: true).map(&:to_h)
76
+ rows.first
77
+ # => { " First Name " => "Alice ", " Last Name " => " Smith", " Age" => " 30", nil => "" }
78
+ # "VIP" and "Gold" silently lost — both compete for the nil key, last one wins
79
+ ```
80
+
81
+ Whitespace-polluted keys, `Age` as a string, and extra columns competing for the same `nil` key —
82
+ the last one wins and the rest are silently discarded.
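The collision on the `nil` key can be reproduced with the standard library alone. The following is a minimal sketch, assuming a hypothetical two-column input with two extra header-less fields. Which of the colliding values survives may depend on the installed `csv` version, so the sketch only checks that values go missing, not which one remains.

```ruby
require 'csv'

# Hypothetical input: two headers, but the data row carries four fields.
csv_text = "name,age\nAlice,30,VIP,Gold\n"

row = CSV.parse(csv_text, headers: true).first

all_values = row.fields # every parsed value, including the header-less ones
as_hash    = row.to_h   # header-less fields all share the nil key

puts "parsed fields: #{all_values.size}, hash entries: #{as_hash.size}"
puts "nil key present: #{as_hash.key?(nil)}"
```

The row object still holds all four values; the loss happens at the `.to_h` step, where a Hash can keep only one value per key.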
56
83
 
57
- # After — SmarterCSV (drop-in, up to 129× faster)
58
- rows = SmarterCSV.process('data.csv') # array of hashes with symbol keys
84
+ **With SmarterCSV:**
85
+ ```ruby
86
+ rows = SmarterCSV.process('data.csv')
87
+ rows.first
88
+ # => { first_name: "Alice", last_name: "Smith", age: 30, column_1: "VIP", column_2: "Gold" }
59
89
  ```
60
90
 
61
- That's it for the common case. Keep reading for the few behavior differences to be aware of.
91
+ Clean symbol keys, whitespace stripped, `age` converted to `Integer`, and extra columns given names, so no data is lost.
92
+
93
+ No `.map(&:to_h)`, no `header_converters:`, no manual post-processing.
94
+
95
+ ---
96
+
97
+ ## Sample file used in remaining examples
98
+
99
+ The sections below use a simpler file to keep the focus on the specific behavior being illustrated:
100
+
101
+ ```
102
+ $ cat sample.csv
103
+ name,age,city
104
+ Alice,30,New York
105
+ Bob,25,
106
+ Charlie,35,Chicago
107
+ ```
108
+
109
+ Bob's `city` field is intentionally empty to illustrate empty-value handling.
62
110
 
63
111
  ---
64
112
 
65
113
  ## Parsing a CSV string
66
114
 
115
+ **With Ruby CSV:**
67
116
  ```ruby
68
- csv_string = "name,age\nAlice,30\nBob,25\n"
69
-
70
- # Ruby CSV
71
- rows = CSV.parse(csv_string, headers: true, header_converters: :symbol)
117
+ csv_string = "name,age,city\nAlice,30,New York\nBob,25,\nCharlie,35,Chicago\n"
118
+
119
+ rows = CSV.parse(csv_string, headers: true, header_converters: :symbol).map(&:to_h)
120
+ # => [
121
+ # { name: "Alice", age: "30", city: "New York" },
122
+ # { name: "Bob", age: "25", city: nil },
123
+ # { name: "Charlie", age: "35", city: "Chicago" }
124
+ # ]
125
+ ```
72
126
 
73
- # SmarterCSV — direct string parsing
127
+ **With SmarterCSV:**
128
+ ```ruby
74
129
  rows = SmarterCSV.parse(csv_string)
75
- # => [{name: "Alice", age: 30}, {name: "Bob", age: 25}]
130
+ # => [
131
+ # { name: "Alice", age: 30, city: "New York" },
132
+ # { name: "Bob", age: 25 },
133
+ # { name: "Charlie", age: 35, city: "Chicago" }
134
+ # ]
76
135
  ```
77
136
 
78
137
  `SmarterCSV.parse` is a convenience wrapper added in 1.16.0. Under the hood it wraps the
@@ -82,15 +141,17 @@ string in a `StringIO` — but you don't need to think about that.
82
141
 
83
142
  ## Row-by-row iteration
84
143
 
144
+ **With Ruby CSV:**
85
145
  ```ruby
86
- # Ruby CSV
87
- CSV.foreach('data.csv', headers: true, header_converters: :symbol) do |row|
88
- MyModel.create(row.to_h)
146
+ CSV.foreach('sample.csv', headers: true, header_converters: :symbol) do |row|
147
+ MyModel.create(row.to_h) # row is a CSV::Row, so it needs .to_h
89
148
  end
149
+ ```
90
150
 
91
- # SmarterCSV
92
- SmarterCSV.each('data.csv') do |row|
93
- MyModel.create(row) # row is already a plain Hash — no .to_h needed
151
+ **With SmarterCSV:**
152
+ ```ruby
153
+ SmarterCSV.each('sample.csv') do |row|
154
+ MyModel.create(row) # row is already a plain Hash — no .to_h needed
94
155
  end
95
156
  ```
96
157
 
@@ -98,53 +159,67 @@ end
98
159
  `Enumerable` API is available:
99
160
 
100
161
  ```ruby
101
- names = SmarterCSV.each('data.csv').map { |row| row[:name] }
102
- us_rows = SmarterCSV.each('data.csv').select { |row| row[:country] == 'US' }
103
- first10 = SmarterCSV.each('data.csv').lazy.first(10)
162
+ names = SmarterCSV.each('sample.csv').map { |row| row[:name] }
163
+ # => ["Alice", "Bob", "Charlie"]
164
+
165
+ us_rows = SmarterCSV.each('sample.csv').select { |row| row[:city] == 'New York' }
166
+ # => [{ name: "Alice", age: 30, city: "New York" }]
167
+
168
+ first2 = SmarterCSV.each('sample.csv').lazy.first(2)
169
+ # => [{ name: "Alice", age: 30, city: "New York" }, { name: "Bob", age: 25 }]
104
170
  ```
105
171
 
106
172
  ---
107
173
 
108
174
  ## Key behavior differences
109
175
 
110
- ### 1. Symbol keys (same as `CSV.table`, different from `CSV.read`)
176
+ ### 1. String keys vs. Symbol keys
111
177
 
112
- SmarterCSV returns symbol keys by default, the same as `CSV.table`. If you were using
113
- `CSV.read` with string keys, add `strings_as_keys: true`:
178
+ `CSV.read` returns string keys by default. SmarterCSV returns symbol keys, which are more
179
+ efficient (interned in memory) and idiomatic for Rails and ActiveRecord.
114
180
 
181
+ **With Ruby CSV:**
115
182
  ```ruby
116
- # Ruby CSV.read: string keys
117
- rows = CSV.read('data.csv', headers: true)
118
- rows.first['name'] # string key
183
+ rows = CSV.read('sample.csv', headers: true).map(&:to_h)
184
+ rows.first['name'] # => "Alice"
185
+ rows.first['age'] # => "30"
186
+ ```
119
187
 
120
- # SmarterCSV default — symbol keys (same as CSV.table)
121
- rows = SmarterCSV.process('data.csv')
122
- rows.first[:name] # symbol key
188
+ **With SmarterCSV:**
189
+ ```ruby
190
+ rows = SmarterCSV.process('sample.csv')
191
+ rows.first[:name] # => "Alice"
192
+ rows.first[:age] # => 30
123
193
 
124
- # SmarterCSV with string keys — if you need to match CSV.read behaviour
125
- rows = SmarterCSV.process('data.csv', strings_as_keys: true)
126
- rows.first['name']
194
+ # To match CSV.read string-key behaviour:
195
+ rows = SmarterCSV.process('sample.csv', strings_as_keys: true)
196
+ rows.first['name'] # => "Alice"
127
197
  ```
128
198
 
129
199
  ### 2. Numeric conversion is automatic
130
200
 
131
- SmarterCSV converts numeric strings to `Integer` or `Float` automatically (the `:numeric`
132
- converter in Ruby CSV terms). You get integers and floats back without requesting it:
201
+ `CSV.read` returns everything as strings. SmarterCSV converts numeric strings to `Integer`
202
+ or `Float` automatically; no `converters: :numeric` needed.
133
203
 
134
- ```ruby
135
- # Ruby CSV — explicit converter needed
136
- CSV.table('data.csv', converters: :numeric)
204
+ Watch out for columns where leading zeros matter — ZIP codes, phone numbers, account numbers —
205
+ and exclude them:
137
206
 
138
- # SmarterCSV — automatic (convert_values_to_numeric: true is the default)
139
- SmarterCSV.process('data.csv')
207
+ **With Ruby CSV:**
208
+ ```ruby
209
+ rows = CSV.read('sample.csv', headers: true).map(&:to_h)
210
+ rows.first['age'] # => "30" (String)
211
+ rows.first['age'].class # => String
140
212
  ```
141
213
 
142
- To disable: `convert_values_to_numeric: false`.
143
-
144
- To limit conversion to specific columns:
214
+ **With SmarterCSV:**
145
215
  ```ruby
146
- SmarterCSV.process('data.csv', convert_values_to_numeric: { only: [:age, :score] })
147
- SmarterCSV.process('data.csv', convert_values_to_numeric: { except: [:zip_code] })
216
+ rows = SmarterCSV.process('sample.csv')
217
+ rows.first[:age] # => 30 (Integer)
218
+ rows.first[:age].class # => Integer
219
+
220
+ # Exclude columns where leading zeros matter:
221
+ rows = SmarterCSV.process('sample.csv',
222
+ convert_values_to_numeric: { except: [:zip_code, :phone, :account_number] })
148
223
  ```
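To see why identifier-like columns belong in the `except:` list, here is a small plain-Ruby sketch. It uses `String#to_i` as a stand-in for any string-to-integer conversion (an assumption for illustration, not SmarterCSV's actual converter), and the ZIP code value is hypothetical.

```ruby
# Hypothetical ZIP code as it would appear in a CSV field.
zip_as_text = '02134'

# String#to_i stands in for any string-to-integer conversion.
zip_as_int = zip_as_text.to_i

puts zip_as_int                     # the leading zero is gone
puts zip_as_int.to_s == zip_as_text # false: the original text is not recoverable
```

Once the value is an Integer, the original width is unknown, so the safest fix is to keep such columns as strings from the start.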
149
224
 
150
225
  ### 3. Empty values are removed by default
@@ -152,18 +227,20 @@ SmarterCSV.process('data.csv', convert_values_to_numeric: { except: [:zip_code]
152
227
  SmarterCSV drops key/value pairs where the value is `nil` or blank
153
228
  (`remove_empty_values: true` is the default). Ruby CSV keeps them as `nil`.
154
229
 
230
+ **With Ruby CSV:**
155
231
  ```ruby
156
- # CSV "Alice,,30" with header "name,city,age"
157
-
158
- # Ruby CSV — nil values present
159
- # => {name: "Alice", city: nil, age: 30}
232
+ rows = CSV.read('sample.csv', headers: true, header_converters: :symbol).map(&:to_h)
233
+ rows[1] # => { name: "Bob", age: "25", city: nil }
234
+ ```
160
235
 
161
- # SmarterCSV default — nil removed
162
- # => {name: "Alice", age: 30}
236
+ **With SmarterCSV:**
237
+ ```ruby
238
+ rows = SmarterCSV.process('sample.csv')
239
+ rows[1] # => { name: "Bob", age: 25 } ← empty city removed
163
240
 
164
- # SmarterCSV: keep nil values (match Ruby CSV behaviour)
165
- SmarterCSV.process('data.csv', remove_empty_values: false)
166
- # => {name: "Alice", city: nil, age: 30}
241
+ # To keep nil values and match Ruby CSV behaviour:
242
+ rows = SmarterCSV.process('sample.csv', remove_empty_values: false)
243
+ rows[1] # => { name: "Bob", age: 25, city: nil }
167
244
  ```
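For code that must stay on Ruby CSV for now, the described default can be approximated by hand. This is a sketch using only the standard library, with the sample rows inlined as a string; the `reject` step is an assumption about what "nil or blank" means here, not SmarterCSV's implementation.

```ruby
require 'csv'

csv_text = "name,age,city\nAlice,30,New York\nBob,25,\n"

rows = CSV.parse(csv_text, headers: true, header_converters: :symbol).map(&:to_h)
bob  = rows[1]

# Approximate remove_empty_values by dropping nil and blank values.
compact_bob = bob.reject { |_key, value| value.nil? || value.strip.empty? }

p bob         # city is present with a nil value
p compact_bob # the city key is gone
```

Code that reads `row[:city]` behaves the same either way (both return `nil`), but code that checks `row.key?(:city)` or iterates over keys will see the difference.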
168
245
 
169
246
  ### 4. Plain Hash, not CSV::Row
@@ -173,18 +250,69 @@ Ruby CSV returns `CSV::Row` objects. SmarterCSV returns plain Ruby `Hash` object
173
250
  `CSV::Row` wraps a hash with extra methods (`.headers`, `.fields`, `.to_h`, `.to_a`).
174
251
  With SmarterCSV you work directly with the hash — no wrapper, no `.to_h` needed.
175
252
 
253
+ **With Ruby CSV:**
176
254
  ```ruby
177
- # Ruby CSV: CSV::Row object
178
- row = CSV.table('data.csv').first
255
+ row = CSV.read('sample.csv', headers: true).first
179
256
  row.class # => CSV::Row
180
- row.headers # => [:name, :age]
181
- row.to_h # => {name: "Alice", age: 30}
257
+ row['name'] # => "Alice"
258
+ row['age'] # => "30" (String)
259
+ row.to_h # => { "name" => "Alice", "age" => "30", "city" => "New York" }
260
+ ```
182
261
 
183
- # SmarterCSV — plain Hash
184
- row = SmarterCSV.process('data.csv').first
262
+ **With SmarterCSV:**
263
+ ```ruby
264
+ row = SmarterCSV.process('sample.csv').first
185
265
  row.class # => Hash
186
- row.keys # => [:name, :age]
187
- row # => {name: "Alice", age: 30}
266
+ row[:name] # => "Alice"
267
+ row[:age] # => 30 (Integer)
268
+ row # => { name: "Alice", age: 30, city: "New York" }
269
+ ```
270
+
271
+ ---
272
+
273
+ ## Renaming headers to match your schema
274
+
275
+ CSV column names rarely match your ActiveRecord attribute names. Use `key_mapping:` to rename
276
+ them in one step — the mapping uses the normalized (downcased, underscored) header name as input:
277
+
278
+ **With SmarterCSV:**
279
+ ```ruby
280
+ # CSV headers: "First Name", "Last Name", "E-Mail", "Date of Birth"
281
+ # After normalization: :first_name, :last_name, :e_mail, :date_of_birth
282
+
283
+ rows = SmarterCSV.process('contacts.csv',
284
+ key_mapping: {
285
+ first_name: :given_name,
286
+ last_name: :family_name,
287
+ e_mail: :email,
288
+ date_of_birth: :dob,
289
+ })
290
+ # => [{ given_name: "Alice", family_name: "Smith", email: "alice@example.com", dob: "1990-05-14" }, ...]
291
+ ```
292
+
293
+ Map a key to `nil` to drop that column entirely:
294
+
295
+ ```ruby
296
+ key_mapping: { internal_id: nil, created_at: nil } # these columns won't appear in results
297
+ ```
298
+
299
+ ---
300
+
301
+ ## Select only the columns you need
302
+
303
+ Wide CSV files often have dozens of columns your application doesn't need. Use `headers: { only: }`
304
+ to declare upfront which columns to keep — SmarterCSV skips everything else at the parser level,
305
+ so unneeded fields are never allocated:
306
+
307
+ **With SmarterCSV:**
308
+ ```ruby
309
+ # CSV has 50 columns — you only need 3
310
+ rows = SmarterCSV.process('contacts.csv',
311
+ headers: { only: [:email, :first_name, :last_name] })
312
+ # => [{ email: "alice@example.com", first_name: "Alice", last_name: "Smith" }, ...]
313
+
314
+ # Or exclude a known noisy column while keeping everything else:
315
+ rows = SmarterCSV.process('export.csv', headers: { except: [:internal_notes] })
188
316
  ```
189
317
 
190
318
  ---
@@ -195,16 +323,44 @@ Ruby CSV has built-in `:date` and `:date_time` converters. SmarterCSV intentiona
195
323
  them because date formats are locale-dependent (`12/03/2020` means December 3rd in the US
196
324
  but March 12th in Europe). Use a `value_converter` instead:
197
325
 
326
+ **With Ruby CSV:**
327
+ ```ruby
328
+ rows = CSV.read('data.csv', headers: true, converters: :date)
329
+ rows.first['birth_date'] # => #<Date: 1990-05-15> (assumes ISO 8601 format only)
330
+ ```
331
+
332
+ **With SmarterCSV:**
198
333
  ```ruby
199
334
  require 'date'
200
335
 
201
- # ISO 8601 (YYYY-MM-DD) — unambiguous
202
- iso_date = Class.new { def self.convert(v) = v ? Date.strptime(v, '%Y-%m-%d') : nil }
336
+ rows = SmarterCSV.process('data.csv',
337
+ value_converters: {
338
+ birth_date: ->(v) { v ? Date.strptime(v, '%Y-%m-%d') : nil }, # ISO 8601
339
+ # birth_date: ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil }, # US format
340
+ # birth_date: ->(v) { v ? Date.strptime(v, '%d.%m.%Y') : nil }, # EU format
341
+ })
342
+ rows.first[:birth_date] # => #<Date: 1990-05-15>
343
+ ```
344
+
345
+ See [Value Converters](./value_converters.md) for full details.
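The locale ambiguity mentioned above is easy to demonstrate with `Date.strptime` from the standard library. In this sketch the same input string yields two different dates depending on the format string, which is why an explicit format per column is the safer choice.

```ruby
require 'date'

raw = '12/03/2020'

us_date = Date.strptime(raw, '%m/%d/%Y') # month first
eu_date = Date.strptime(raw, '%d/%m/%Y') # day first

puts us_date.iso8601 # 2020-12-03
puts eu_date.iso8601 # 2020-03-12
```

Neither parse raises an error, so a wrong format string on data like this goes unnoticed unless a day value above 12 happens to appear.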
346
+
347
+ ---
348
+
349
+ ## Custom value converters
203
350
 
204
- SmarterCSV.process('data.csv', value_converters: { birth_date: iso_date })
351
+ SmarterCSV lets you apply any transformation per column — prices, booleans, custom types:
352
+
353
+ **With SmarterCSV:**
354
+ ```ruby
355
+ rows = SmarterCSV.process('records.csv',
356
+ value_converters: {
357
+ birth_date: ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil },
358
+ price: ->(v) { v&.delete('$,')&.to_f },
359
+ active: ->(v) { v&.match?(/\Atrue\z/i) },
360
+ })
205
361
  ```
206
362
 
207
- See [Value Converters](./value_converters.md) for full details and examples for US/EU formats.
363
+ See [Value Converters](./value_converters.md) for full details.
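Because these converters are ordinary lambdas, they can be exercised on their own before being wired into an import. The sketch below copies the `price` and `active` lambdas from the example above and calls them with hypothetical inputs; whether `nil` ever reaches a converter in practice depends on the library's options and is not assumed here.

```ruby
to_price  = ->(v) { v&.delete('$,')&.to_f }
to_active = ->(v) { v&.match?(/\Atrue\z/i) }

p to_price.call('$1,234.50') # 1234.5
p to_price.call(nil)         # nil
p to_active.call('TRUE')     # true
p to_active.call('yes')      # false
p to_active.call(nil)        # nil
```

Note that `to_active` returns `false` for anything other than the word "true", including "yes" and "1", and returns `nil` rather than `false` for a missing value.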
208
364
 
209
365
  ---
210
366
 
@@ -213,50 +369,72 @@ See [Value Converters](./value_converters.md) for full details and examples for
213
369
  Ruby CSV leaves these as strings. SmarterCSV lets you nil-ify them (and optionally remove
214
370
  the key) in a single option:
215
371
 
372
+ **With SmarterCSV:**
216
373
  ```ruby
217
- # Remove rows where any value is NULL or an Excel error
218
- SmarterCSV.process('data.csv', nil_values_matching: /\A(NULL|NaN|#VALUE!)\z/)
374
+ # Remove keys where value matches (remove_empty_values: true is the default)
375
+ rows = SmarterCSV.process('data.csv', nil_values_matching: /\A(NULL|N\/A|NaN|#VALUE!)\z/i)
376
+ # fields matching the pattern are removed entirely
219
377
 
220
- # Keep the key but set the value to nil (useful for distinguishing "missing" from "absent")
221
- SmarterCSV.process('data.csv',
378
+ # Keep the key but set the value to nil:
379
+ rows = SmarterCSV.process('data.csv',
222
380
  nil_values_matching: /\ANULL\z/,
223
381
  remove_empty_values: false,
224
382
  )
383
+ # => [{ name: "Alice", score: nil, ... }]
225
384
  ```
226
385
 
227
386
  ---
228
387
 
229
388
  ## Malformed / bad rows
230
389
 
231
- Ruby CSV has `liberal_parsing: true` to silently swallow parse errors.
232
- SmarterCSV gives you explicit control:
233
-
390
+ **With Ruby CSV:**
234
391
  ```ruby
235
- # Ruby CSV: silent ignore
236
- CSV.read('data.csv', liberal_parsing: true)
392
+ # Silent ignore: errors are swallowed
393
+ rows = CSV.read('data.csv', liberal_parsing: true)
394
+ ```
237
395
 
238
- # SmarterCSV — collect bad rows so you can inspect them
396
+ **With SmarterCSV:**
397
+ ```ruby
398
+ # Collect bad rows so you can inspect, log, or quarantine them
239
399
  reader = SmarterCSV::Reader.new('data.csv', on_bad_row: :collect)
240
400
  good_rows = reader.process
241
- bad_rows = reader.errors[:bad_rows] # inspect / log / quarantine
401
+ bad_rows = reader.errors[:bad_rows]
402
+
403
+ puts "#{good_rows.size} imported, #{bad_rows.size} bad rows"
404
+ bad_rows.each { |r| puts "Line #{r[:file_line_number]}: #{r[:error_message]}" }
242
405
  ```
243
406
 
244
407
  See [Bad Row Quarantine](./bad_row_quarantine.md) for full details.
245
408
 
246
409
  ---
247
410
 
411
+ ## Batch processing for large files
412
+
413
+ **With SmarterCSV:**
414
+ ```ruby
415
+ SmarterCSV.process('big.csv', chunk_size: 500) do |chunk|
416
+ MyModel.insert_all(chunk) # bulk insert 500 rows at a time
417
+ end
418
+ ```
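As a mental model for chunking, the grouping resembles `Enumerable#each_slice` in plain Ruby. This sketch uses hypothetical in-memory rows rather than a file, and it assumes (without confirming against the library) that the final chunk simply holds whatever rows remain.

```ruby
# Hypothetical rows standing in for parsed CSV records.
rows = (1..1_050).map { |i| { id: i } }

chunk_sizes = rows.each_slice(500).map(&:size)

p chunk_sizes # [500, 500, 50]
```

The difference from this sketch is that the whole `rows` array exists in memory here, whereas chunked processing of a file only needs one chunk at a time.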
419
+
420
+ ---
421
+
248
422
  ## Writing CSV
249
423
 
424
+ **With Ruby CSV:**
250
425
  ```ruby
251
- # Ruby CSV
252
- CSV.open('out.csv', 'w', write_headers: true, headers: ['name','age']) do |csv|
426
+ CSV.open('out.csv', 'w', write_headers: true, headers: ['name', 'age']) do |csv|
253
427
  csv << ['Alice', 30]
428
+ csv << ['Bob', 25]
254
429
  end
430
+ ```
255
431
 
256
- # SmarterCSV — takes hashes, discovers headers automatically
432
+ **With SmarterCSV:**
433
+ ```ruby
434
+ # Takes hashes, discovers headers automatically
257
435
  SmarterCSV.generate('out.csv') do |csv|
258
- csv << {name: 'Alice', age: 30}
259
- csv << {name: 'Bob', age: 25}
436
+ csv << { name: 'Alice', age: 30 }
437
+ csv << { name: 'Bob', age: 25 }
260
438
  end
261
439
  ```
262
440
 
@@ -270,21 +448,118 @@ send_data io.string, type: 'text/csv'
270
448
 
271
449
  ---
272
450
 
451
+ ## Advanced patterns
452
+
453
+ ### Rails file upload
454
+
455
+ Accepting a CSV upload in a Rails controller — pass the tempfile path directly:
456
+
457
+ ```ruby
458
+ def create
459
+ file = params[:file] # ActionDispatch::Http::UploadedFile
460
+
461
+ SmarterCSV.process(file.path, chunk_size: 500) do |chunk|
462
+ MyModel.insert_all(chunk)
463
+ end
464
+
465
+ redirect_to root_path, notice: "Import complete"
466
+ end
467
+ ```
468
+
469
+ ### Parallel processing with Sidekiq
470
+
471
+ ```ruby
472
+ SmarterCSV.process('users.csv', chunk_size: 100) do |chunk, chunk_index|
473
+ puts "Queueing chunk #{chunk_index} (#{chunk.size} records)..."
474
+ Sidekiq::Client.push_bulk(
475
+ 'class' => UserImportWorker,
476
+ 'args' => chunk,
477
+ )
478
+ end
479
+ ```
480
+
481
+ ### Streaming directly from S3
482
+
483
+ SmarterCSV accepts any IO-like object — stream a CSV directly from S3 without writing a temp file:
484
+
485
+ ```ruby
486
+ require 'aws-sdk-s3'
487
+
488
+ s3 = Aws::S3::Client.new(region: 'us-east-1')
489
+ obj = s3.get_object(bucket: 'my-bucket', key: 'imports/contacts.csv')
490
+
491
+ SmarterCSV::Reader.new(obj.body, chunk_size: 500).each_chunk do |chunk, _index|
492
+ MyModel.insert_all(chunk)
493
+ end
494
+ ```
495
+
496
+ ### Production instrumentation
497
+
498
+ ```ruby
499
+ SmarterCSV.process('large_import.csv',
500
+ chunk_size: 1_000,
501
+ on_start: ->(info) { Rails.logger.info "Import started: #{info[:input]} (#{info[:file_size]} bytes)" },
502
+ on_chunk: ->(info) { Rails.logger.debug "Chunk #{info[:chunk_number]}: #{info[:rows_in_chunk]} rows (#{info[:total_rows_so_far]} total)" },
503
+ on_complete: ->(stats) {
504
+ Rails.logger.info "Done: #{stats[:total_rows]} rows in #{stats[:duration].round(2)}s, #{stats[:bad_rows]} bad rows"
505
+ StatsD.histogram('csv.import.duration', stats[:duration])
506
+ },
507
+ ) { |chunk| MyModel.insert_all(chunk) }
508
+ ```
509
+
510
+ See [Instrumentation Hooks](./instrumentation.md) for full details.
511
+
512
+ ### Resumable imports with Rails ActiveJob
513
+
514
+ Rails 8.1 introduced `ActiveJob::Continuable` — jobs that pause on deployment and resume exactly
515
+ where they stopped. SmarterCSV's `chunk_index` maps directly onto the job cursor:
516
+
517
+ ```ruby
518
+ class ImportCsvJob < ApplicationJob
519
+ include ActiveJob::Continuable
520
+
521
+ def perform(file_path)
522
+ step :import_rows do |step|
523
+ SmarterCSV.process(file_path, chunk_size: 500) do |chunk, chunk_index|
524
+ next if chunk_index < step.cursor.to_i # skip already-processed chunks on resume
525
+
526
+ MyModel.insert_all(chunk)
527
+ step.set! chunk_index + 1
528
+ end
529
+ end
530
+ end
531
+ end
532
+ ```
533
+
534
+ ### Bulk upsert — insert or update
535
+
536
+ ```ruby
537
+ SmarterCSV.process('contacts.csv',
538
+ chunk_size: 500,
539
+ key_mapping: { e_mail: :email },
540
+ ) do |chunk|
541
+ Contact.upsert_all(chunk, unique_by: :email)
542
+ end
543
+ ```
544
+
545
+ ---
546
+
273
547
  ## Quick reference
274
548
 
275
549
  | Ruby CSV | SmarterCSV equivalent | Notes |
276
550
  |---|---|---|
277
- | `CSV.table(f)` | `SmarterCSV.process(f)` | Drop-in. Symbol keys, numeric conversion. |
278
- | `CSV.read(f, headers: true)` | `SmarterCSV.process(f, strings_as_keys: true)` | Add `strings_as_keys:` for string keys. |
551
+ | `CSV.read(f, headers: true).map(&:to_h)` | `SmarterCSV.process(f)` | Symbol keys, numeric conversion, whitespace stripped. |
552
+ | `CSV.read(f, headers: true, header_converters: :symbol).map(&:to_h)` | `SmarterCSV.process(f)` | Drop-in. |
553
+ | `CSV.table(f).map(&:to_h)` | `SmarterCSV.process(f)` | Drop-in. |
279
554
  | `CSV.parse(str, headers: true, header_converters: :symbol)` | `SmarterCSV.parse(str)` | Direct string parsing. |
280
555
  | `CSV.foreach(f, headers: true) { \|r\| }` | `SmarterCSV.each(f) { \|r\| }` | Row is already a plain Hash. |
281
556
  | `converters: :numeric` | default | Automatic in SmarterCSV. |
282
- | `converters: :date` | `value_converters: {col: DateConverter}` | See [Value Converters](./value_converters.md). |
283
- | `liberal_parsing: true` | `on_bad_row: :collect` | Explicit quarantine is better. |
557
+ | `converters: :date` | `value_converters: {col: ->(v) { ... } }` | Use explicit format strings — date formats are locale-dependent. |
558
+ | `liberal_parsing: true` | `on_bad_row: :collect` | Explicit quarantine gives you visibility. |
284
559
  | `skip_blanks: true` | `remove_empty_hashes: true` | Default in SmarterCSV. |
285
560
  | `row.to_h` | `row` | Already a plain Hash — no conversion needed. |
286
561
  | `row.headers` | `reader.headers` | Available on the `Reader` instance. |
287
562
 
288
563
  ---
289
- PREVIOUS: [Introduction](./_introduction.md) | NEXT: [Parsing Strategy](./parsing_strategy.md) | UP: [README](../README.md)
564
+ PREVIOUS: [Introduction](./_introduction.md) | NEXT: [Ruby CSV Pitfalls](./ruby_csv_pitfalls.md) | UP: [README](../README.md)
290
565
 
data/docs/options.md CHANGED
@@ -3,6 +3,7 @@
3
3
 
4
4
  * [Introduction](./_introduction.md)
5
5
  * [Migrating from Ruby CSV](./migrating_from_csv.md)
6
+ * [Ruby CSV Pitfalls](./ruby_csv_pitfalls.md)
6
7
  * [Parsing Strategy](./parsing_strategy.md)
7
8
  * [The Basic Read API](./basic_read_api.md)
8
9
  * [The Basic Write API](./basic_write_api.md)
@@ -118,7 +119,7 @@ See [Parsing Strategy](./parsing_strategy.md) for full details on quote handling
118
119
  |--------|---------|-------------|
119
120
  | `:strip_whitespace` | `true` | Remove whitespace before/after values and headers. |
120
121
  | `:convert_values_to_numeric` | `true` | Convert strings containing integers or floats to the appropriate numeric type. Accepts `{except: [:key1, :key2]}` or `{only: :key3}` to limit which columns. |
121
- | `:value_converters` | `nil` | Hash of `:header => ClassName`; each class must implement `self.convert(value)`. See [Value Converters](./value_converters.md). |
122
+ | `:value_converters` | `nil` | Hash of `:header => converter`; converter can be a lambda/Proc or a class implementing `self.convert(value)`. See [Value Converters](./value_converters.md). |
122
123
  | `:remove_empty_values` | `true` | Remove key/value pairs where the value is `nil` or an empty string. |
123
124
  | `:remove_zero_values` | `false` | Remove key/value pairs where the numeric value equals zero. |
124
125
  | `:nil_values_matching` | `nil` | Set matching values to `nil`. Accepts a regular expression matched against the string representation of each value (e.g. `/\ANAN\z/` for NaN, `/\A#VALUE!\z/` for Excel errors). With `remove_empty_values: true` (default), nil-ified values are then removed. With `remove_empty_values: false`, the key is retained with a `nil` value. |
@@ -3,6 +3,7 @@
3
3
 
4
4
  * [Introduction](./_introduction.md)
5
5
  * [Migrating from Ruby CSV](./migrating_from_csv.md)
6
+ * [Ruby CSV Pitfalls](./ruby_csv_pitfalls.md)
6
7
  * [**Parsing Strategy**](./parsing_strategy.md)
7
8
  * [The Basic Read API](./basic_read_api.md)
8
9
  * [The Basic Write API](./basic_write_api.md)
@@ -158,4 +159,4 @@ Both options apply simultaneously. `quote_boundary` governs *where* a quote is r
158
159
 
159
160
  --------------
160
161
 
161
- PREVIOUS: [Migrating from Ruby CSV](./migrating_from_csv.md) | NEXT: [The Basic Read API](./basic_read_api.md) | UP: [README](../README.md)
162
+ PREVIOUS: [Ruby CSV Pitfalls](./ruby_csv_pitfalls.md) | NEXT: [The Basic Read API](./basic_read_api.md) | UP: [README](../README.md)
@@ -3,6 +3,7 @@
3
3
 
4
4
  * [Introduction](./_introduction.md)
5
5
  * [Migrating from Ruby CSV](./migrating_from_csv.md)
6
+ * [Ruby CSV Pitfalls](./ruby_csv_pitfalls.md)
6
7
  * [Parsing Strategy](./parsing_strategy.md)
7
8
  * [The Basic Read API](./basic_read_api.md)
8
9
  * [The Basic Write API](./basic_write_api.md)
@@ -3,6 +3,7 @@
3
3
 
4
4
  * [Introduction](../../_introduction.md)
5
5
  * [Migrating from Ruby CSV](../../migrating_from_csv.md)
6
+ * [Ruby CSV Pitfalls](../../ruby_csv_pitfalls.md)
6
7
  * [Parsing Strategy](../../parsing_strategy.md)
7
8
  * [The Basic Read API](../../basic_read_api.md)
8
9
  * [The Basic Write API](../../basic_write_api.md)
@@ -194,8 +195,6 @@ See [performance_notes.md](performance_notes.md) and [benchmarks.md](benchmarks.
194
195
 
195
196
  **Deprecations:**
196
197
 
197
- - `only_headers:` → use `headers: { only: }`
198
- - `except_headers:` → use `headers: { except: }`
199
198
  - `remove_values_matching:` → use `nil_values_matching:`
200
199
  - `strict: true` → use `missing_headers: :raise`
201
200
  - `strict: false` → use `missing_headers: :auto`
data/docs/row_col_sep.md CHANGED
@@ -3,6 +3,7 @@
3
3
 
4
4
  * [Introduction](./_introduction.md)
5
5
  * [Migrating from Ruby CSV](./migrating_from_csv.md)
6
+ * [Ruby CSV Pitfalls](./ruby_csv_pitfalls.md)
6
7
  * [Parsing Strategy](./parsing_strategy.md)
7
8
  * [The Basic Read API](./basic_read_api.md)
8
9
  * [The Basic Write API](./basic_write_api.md)