smarter_csv 1.15.2 → 1.16.1

Files changed (50)
  1. checksums.yaml +4 -4
  2. data/.rspec +2 -0
  3. data/.rubocop.yml +9 -0
  4. data/CHANGELOG.md +112 -1
  5. data/CONTRIBUTORS.md +4 -1
  6. data/Gemfile +1 -0
  7. data/README.md +129 -27
  8. data/docs/_introduction.md +45 -24
  9. data/docs/bad_row_quarantine.md +342 -0
  10. data/docs/basic_read_api.md +152 -9
  11. data/docs/basic_write_api.md +475 -59
  12. data/docs/batch_processing.md +162 -4
  13. data/docs/column_selection.md +184 -0
  14. data/docs/data_transformations.md +163 -29
  15. data/docs/examples.md +340 -46
  16. data/docs/header_transformations.md +94 -12
  17. data/docs/header_validations.md +57 -18
  18. data/docs/history.md +119 -0
  19. data/docs/instrumentation.md +166 -0
  20. data/docs/migrating_from_csv.md +565 -0
  21. data/docs/options.md +151 -87
  22. data/docs/parsing_strategy.md +64 -1
  23. data/docs/real_world_csv.md +263 -0
  24. data/docs/releases/1.16.0/benchmarks.md +223 -0
  25. data/docs/releases/1.16.0/changes.md +273 -0
  26. data/docs/releases/1.16.0/performance_notes.md +114 -0
  27. data/docs/row_col_sep.md +15 -5
  28. data/docs/ruby_csv_pitfalls.md +514 -0
  29. data/docs/value_converters.md +194 -57
  30. data/ext/smarter_csv/extconf.rb +3 -0
  31. data/ext/smarter_csv/smarter_csv.c +1017 -82
  32. data/images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.png +0 -0
  33. data/images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.svg +108 -0
  34. data/images/SmarterCSV_1.16.0_vs_previous_C-speedup.png +0 -0
  35. data/images/SmarterCSV_1.16.0_vs_previous_C-speedup.svg +141 -0
  36. data/images/SmarterCSV_1.16.0_vs_previous_Rb-speedup.png +0 -0
  37. data/images/SmarterCSV_1.16.0_vs_previous_Rb-speedup.svg +139 -0
  38. data/lib/smarter_csv/errors.rb +8 -0
  39. data/lib/smarter_csv/file_io.rb +1 -1
  40. data/lib/smarter_csv/hash_transformations.rb +14 -13
  41. data/lib/smarter_csv/header_transformations.rb +21 -2
  42. data/lib/smarter_csv/headers.rb +2 -1
  43. data/lib/smarter_csv/options.rb +124 -7
  44. data/lib/smarter_csv/parser.rb +358 -74
  45. data/lib/smarter_csv/reader.rb +494 -46
  46. data/lib/smarter_csv/version.rb +1 -1
  47. data/lib/smarter_csv/writer.rb +71 -19
  48. data/lib/smarter_csv.rb +134 -13
  49. data/smarter_csv.gemspec +20 -10
  50. metadata +38 -80
data/docs/migrating_from_csv.md (new file, 565 lines)

### Contents

* [Introduction](./_introduction.md)
* [**Migrating from Ruby CSV**](./migrating_from_csv.md)
* [Ruby CSV Pitfalls](./ruby_csv_pitfalls.md)
* [Parsing Strategy](./parsing_strategy.md)
* [The Basic Read API](./basic_read_api.md)
* [The Basic Write API](./basic_write_api.md)
* [Batch Processing](./batch_processing.md)
* [Configuration Options](./options.md)
* [Row and Column Separators](./row_col_sep.md)
* [Header Transformations](./header_transformations.md)
* [Header Validations](./header_validations.md)
* [Column Selection](./column_selection.md)
* [Data Transformations](./data_transformations.md)
* [Value Converters](./value_converters.md)
* [Bad Row Quarantine](./bad_row_quarantine.md)
* [Instrumentation Hooks](./instrumentation.md)
* [Examples](./examples.md)
* [Real-World CSV Files](./real_world_csv.md)
* [SmarterCSV over the Years](./history.md)
* [Release Notes](./releases/1.16.0/changes.md)

--------------

# Migrating from Ruby CSV

Already using Ruby's built-in `CSV` library? There are three good reasons to switch, and switching is typically a one- or two-line change.

### Inconvenient
By default, `CSV.read` returns arrays of arrays, so your code must handle column indexing, header normalization, type conversion, and whitespace stripping by hand. SmarterCSV returns Rails-ready hashes with symbol keys, numeric conversion, and whitespace stripping out of the box, with no boilerplate.

### Hidden failure modes
`CSV.read` has 10 ways to silently corrupt or lose data: no exception, no warning, no log line.

➡️ See [**Ruby CSV Pitfalls**](./ruby_csv_pitfalls.md) for reproducible examples and the SmarterCSV fix for each.

### Slow
On top of everything else, Ruby's `CSV.table` is up to 129× slower than SmarterCSV for equivalent end-to-end work; see the [Performance](#performance) section below.

> **Medium article:** *"Switch from Ruby CSV to SmarterCSV in 5 Minutes"* *(coming soon)*

---

## Performance

| Comparison | Range |
|---|---|
| SmarterCSV vs `CSV.read` † | **1.7×–8.6× faster** |
| SmarterCSV vs `CSV.table` ‡ | **7×–129× faster** |

_Benchmarks: 19 CSV files (20k–80k rows), Ruby 3.4.7, Apple M1._

_† `CSV.read` returns raw arrays of arrays; hash construction, key normalization, and type conversion still have to happen afterwards, so this comparison understates the real cost difference._

_‡ `CSV.table` is the closest Ruby equivalent to SmarterCSV: like SmarterCSV, it uses symbol keys and numeric conversion, though it returns `CSV::Row` objects rather than plain hashes._

---

## The one-line switch

Real-world CSV files are messy: whitespace-padded headers, extra columns without headers, trailing
commas. Consider this file:

```
$ cat data.csv
 First Name , Last Name , Age
Alice , Smith, 30, VIP, Gold ,
Bob, Jones, 25
```

**With Ruby CSV:**
```ruby
require 'csv'

rows = CSV.read('data.csv', headers: true).map(&:to_h)
rows.first
# => { " First Name " => "Alice ", " Last Name " => " Smith", " Age" => " 30", nil => "" }
# "VIP" and "Gold" silently lost: both compete for the nil key, last one wins
```

Whitespace-polluted keys, `Age` as a string, and extra columns competing for the same `nil` key:
the last one wins and the rest are silently discarded.

**With SmarterCSV:**
```ruby
require 'smarter_csv'

rows = SmarterCSV.process('data.csv')
rows.first
# => { first_name: "Alice", last_name: "Smith", age: 30, column_1: "VIP", column_2: "Gold" }
```

Clean symbol keys, whitespace stripped, `age` converted to `Integer`, extra columns named, and no data loss.

No `.map(&:to_h)`, no `header_converters:`, no manual post-processing.
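
For comparison, here is roughly what it takes to approximate the cleanup with Ruby's standard `csv` library alone. `read_like_smarter_csv` is a hypothetical helper written for this illustration (it is not part of either library); it relies on Ruby CSV's real `strip:`, `header_converters: :symbol`, and `converters: :integer` options. Even with all of them, Ruby CSV still has no way to name header-less extra columns, so the sample input below leaves them out.

```ruby
require 'csv'

# Hypothetical helper: the Ruby CSV boilerplate needed to get symbol keys,
# stripped whitespace, and Integer conversion.
def read_like_smarter_csv(csv_text)
  CSV.parse(csv_text,
            headers: true,
            strip: true,                  # trim spaces around unquoted fields
            header_converters: :symbol,   # " First Name " -> :first_name
            converters: :integer)         # "30" -> 30
     .map(&:to_h)                         # CSV::Row -> Hash
end

rows = read_like_smarter_csv(" First Name , Last Name , Age\nAlice , Smith, 30\nBob, Jones, 25\n")
rows.first  # => { first_name: "Alice", last_name: "Smith", age: 30 }
```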

---

## Sample file used in remaining examples

The sections below use a simpler file to keep the focus on the specific behavior being illustrated:

```
$ cat sample.csv
name,age,city
Alice,30,New York
Bob,25,
Charlie,35,Chicago
```

Bob's `city` field is intentionally empty to illustrate empty-value handling.

---

## Parsing a CSV string

**With Ruby CSV:**
```ruby
csv_string = "name,age,city\nAlice,30,New York\nBob,25,\nCharlie,35,Chicago\n"

rows = CSV.parse(csv_string, headers: true, header_converters: :symbol).map(&:to_h)
# => [
#      { name: "Alice", age: "30", city: "New York" },
#      { name: "Bob", age: "25", city: nil },
#      { name: "Charlie", age: "35", city: "Chicago" }
#    ]
```

**With SmarterCSV:**
```ruby
rows = SmarterCSV.parse(csv_string)
# => [
#      { name: "Alice", age: 30, city: "New York" },
#      { name: "Bob", age: 25 },
#      { name: "Charlie", age: 35, city: "Chicago" }
#    ]
```

`SmarterCSV.parse` is a convenience wrapper added in 1.16.0. Under the hood it wraps the
string in a `StringIO`, but you don't need to think about that.

---

## Row-by-row iteration

**With Ruby CSV:**
```ruby
CSV.foreach('sample.csv', headers: true, header_converters: :symbol) do |row|
  MyModel.create(row.to_h) # row is a CSV::Row, so it needs .to_h
end
```

**With SmarterCSV:**
```ruby
SmarterCSV.each('sample.csv') do |row|
  MyModel.create(row) # row is already a plain Hash; no .to_h needed
end
```

`SmarterCSV.each` returns an `Enumerator` when called without a block, so the full
`Enumerable` API is available:

```ruby
names = SmarterCSV.each('sample.csv').map { |row| row[:name] }
# => ["Alice", "Bob", "Charlie"]

ny_rows = SmarterCSV.each('sample.csv').select { |row| row[:city] == 'New York' }
# => [{ name: "Alice", age: 30, city: "New York" }]

first2 = SmarterCSV.each('sample.csv').lazy.first(2)
# => [{ name: "Alice", age: 30, city: "New York" }, { name: "Bob", age: 25 }]
```
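
Returning an `Enumerator` when no block is given is a standard Ruby idiom: `return enum_for(:each) unless block_given?`. The sketch below shows that idiom on top of Ruby's own `CSV`. `HashRowReader` is a hypothetical class for illustration, not how SmarterCSV is implemented, and it does no numeric conversion or empty-value removal.

```ruby
require 'csv'
require 'stringio'

# Hypothetical class for illustration, not SmarterCSV's implementation:
# yields plain symbol-keyed Hashes built with Ruby's own CSV.
class HashRowReader
  def initialize(csv_text)
    @csv_text = csv_text
  end

  def each
    # The idiom: without a block, return an Enumerator over this same method.
    return enum_for(:each) unless block_given?

    CSV.new(StringIO.new(@csv_text), headers: true, header_converters: :symbol).each do |row|
      yield row.to_h
    end
  end
end

reader = HashRowReader.new("name,age,city\nAlice,30,New York\nBob,25,\nCharlie,35,Chicago\n")

names = reader.each.map { |row| row[:name] }  # => ["Alice", "Bob", "Charlie"]
first_two = reader.each.lazy.first(2)          # stops after the second row
```

Each call to `each` starts a fresh pass over a new `StringIO`, which is why `names` and `first_two` can both be computed from the same reader.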

---

## Key behavior differences

### 1. String keys → Symbol keys

`CSV.read` returns string keys by default. SmarterCSV returns symbol keys, which are more
efficient (interned in memory) and idiomatic for Rails and ActiveRecord.

**With Ruby CSV:**
```ruby
rows = CSV.read('sample.csv', headers: true).map(&:to_h)
rows.first['name']  # => "Alice"
rows.first['age']   # => "30"
```

**With SmarterCSV:**
```ruby
rows = SmarterCSV.process('sample.csv')
rows.first[:name]   # => "Alice"
rows.first[:age]    # => 30

# To match CSV.read string-key behavior:
rows = SmarterCSV.process('sample.csv', strings_as_keys: true)
rows.first['name']  # => "Alice"
```

### 2. Numeric conversion is automatic

`CSV.read` returns everything as strings. SmarterCSV converts numeric strings to `Integer`
or `Float` automatically; no `converters: :numeric` needed.

Watch out for columns where leading zeros matter (ZIP codes, phone numbers, account numbers):
converting them to numbers drops the zeros, so exclude them from conversion as shown at the end
of the SmarterCSV example.

**With Ruby CSV:**
```ruby
rows = CSV.read('sample.csv', headers: true).map(&:to_h)
rows.first['age']        # => "30" (String)
rows.first['age'].class  # => String
```

**With SmarterCSV:**
```ruby
rows = SmarterCSV.process('sample.csv')
rows.first[:age]        # => 30 (Integer)
rows.first[:age].class  # => Integer

# Exclude columns where leading zeros matter:
rows = SmarterCSV.process('sample.csv',
  convert_values_to_numeric: { except: [:zip_code, :phone, :account_number] })
```
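
Why the exclusion matters: once a ZIP code such as `02134` becomes an Integer, its leading zero is gone for good. The hypothetical `convert_numeric` helper below is a simplified approximation of automatic numeric conversion with an `except:` list; SmarterCSV's actual matching rules may differ.

```ruby
# Hypothetical helper approximating numeric conversion with an `except:` list.
# A simplified sketch, not SmarterCSV's actual rules.
def convert_numeric(row, except: [])
  row.to_h do |key, value|
    converted =
      if except.include?(key) || !value.is_a?(String)
        value                               # excluded column or already non-String
      elsif value.match?(/\A[+-]?\d+\z/)
        value.to_i                          # "02134".to_i drops the leading zero
      elsif value.match?(/\A[+-]?\d*\.\d+\z/)
        value.to_f
      else
        value
      end
    [key, converted]
  end
end

row = { name: "Alice", age: "30", zip_code: "02134", score: "9.5" }

convert_numeric(row)
# => { name: "Alice", age: 30, zip_code: 2134, score: 9.5 }

convert_numeric(row, except: [:zip_code])
# => { name: "Alice", age: 30, zip_code: "02134", score: 9.5 }
```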

### 3. Empty values are removed by default

SmarterCSV drops key/value pairs where the value is `nil` or blank
(`remove_empty_values: true` is the default). Ruby CSV keeps them as `nil`.

**With Ruby CSV:**
```ruby
rows = CSV.read('sample.csv', headers: true, header_converters: :symbol).map(&:to_h)
rows[1]  # => { name: "Bob", age: "25", city: nil }
```

**With SmarterCSV:**
```ruby
rows = SmarterCSV.process('sample.csv')
rows[1]  # => { name: "Bob", age: 25 }   ← empty city removed

# To keep nil values and match Ruby CSV behavior:
rows = SmarterCSV.process('sample.csv', remove_empty_values: false)
rows[1]  # => { name: "Bob", age: 25, city: nil }
```

### 4. Plain Hash, not CSV::Row

With `headers: true`, Ruby CSV returns `CSV::Row` objects. SmarterCSV returns plain Ruby `Hash` objects.

`CSV::Row` stores the row as ordered header/field pairs and adds extra methods (`.headers`, `.fields`, `.to_h`, `.to_a`).
With SmarterCSV you work directly with the hash: no wrapper, no `.to_h` needed.

**With Ruby CSV:**
```ruby
row = CSV.read('sample.csv', headers: true).first
row.class    # => CSV::Row
row['name']  # => "Alice"
row['age']   # => "30" (String)
row.to_h     # => { "name" => "Alice", "age" => "30", "city" => "New York" }
```

**With SmarterCSV:**
```ruby
row = SmarterCSV.process('sample.csv').first
row.class    # => Hash
row[:name]   # => "Alice"
row[:age]    # => 30 (Integer)
row          # => { name: "Alice", age: 30, city: "New York" }
```

---

## Renaming headers to match your schema

CSV column names rarely match your ActiveRecord attribute names. Use `key_mapping:` to rename
them in one step. The mapping takes the normalized (downcased, underscored) header name as input:

**With SmarterCSV:**
```ruby
# CSV headers: "First Name", "Last Name", "E-Mail", "Date of Birth"
# After normalization: :first_name, :last_name, :e_mail, :date_of_birth

rows = SmarterCSV.process('contacts.csv',
  key_mapping: {
    first_name: :given_name,
    last_name: :family_name,
    e_mail: :email,
    date_of_birth: :dob,
  })
# => [{ given_name: "Alice", family_name: "Smith", email: "alice@example.com", dob: "1990-05-14" }, ...]
```

Map a key to `nil` to drop that column entirely:

```ruby
rows = SmarterCSV.process('contacts.csv',
  key_mapping: { internal_id: nil, created_at: nil })  # these columns won't appear in results
```
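
To make the two-step behavior concrete (normalize the header first, then look it up in `key_mapping`), here is a plain-Ruby sketch. `normalize_header`, `map_keys`, and `build_row` are hypothetical helpers; the normalization (strip, downcase, runs of spaces or dashes become `_`) is an approximation of SmarterCSV's default, not the library's code.

```ruby
# Hypothetical sketch: header normalization followed by key mapping.
def normalize_header(header)
  header.strip.downcase.gsub(/[\s-]+/, '_').to_sym   # "E-Mail" -> :e_mail
end

def map_keys(headers, key_mapping)
  headers.map { |h| normalize_header(h) }
         .map { |key| key_mapping.fetch(key, key) }  # unmapped keys pass through
end

# A nil key marks a column to drop.
def build_row(mapped_keys, values)
  mapped_keys.zip(values).reject { |key, _| key.nil? }.to_h
end

headers = ["First Name", "E-Mail", "Internal ID"]
mapping = { first_name: :given_name, e_mail: :email, internal_id: nil }

keys = map_keys(headers, mapping)
# => [:given_name, :email, nil]

build_row(keys, ["Alice", "alice@example.com", "42"])
# => { given_name: "Alice", email: "alice@example.com" }
```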

---

## Select only the columns you need

Wide CSV files often have dozens of columns your application doesn't need. Use `headers: { only: }`
to declare upfront which columns to keep. SmarterCSV skips everything else at the parser level,
so unneeded fields are never allocated:

**With SmarterCSV:**
```ruby
# CSV has 50 columns; you only need 3
rows = SmarterCSV.process('contacts.csv',
  headers: { only: [:email, :first_name, :last_name] })
# => [{ email: "alice@example.com", first_name: "Alice", last_name: "Smith" }, ...]

# Or exclude a known noisy column while keeping everything else:
rows = SmarterCSV.process('export.csv', headers: { except: [:internal_notes] })
```

---

## Date / DateTime conversion

Ruby CSV has built-in `:date` and `:date_time` converters. SmarterCSV intentionally omits
them because date formats are locale-dependent (`12/03/2020` means December 3rd in the US
but March 12th in Europe). Use `value_converters:` instead:

**With Ruby CSV:**
```ruby
rows = CSV.read('data.csv', headers: true, converters: :date)
rows.first['birth_date']  # => #<Date: 1990-05-15> (only a few fixed patterns, such as ISO 8601, are recognized)
```

**With SmarterCSV:**
```ruby
require 'date'

rows = SmarterCSV.process('data.csv',
  value_converters: {
    birth_date: ->(v) { v ? Date.strptime(v, '%Y-%m-%d') : nil },    # ISO 8601
    # birth_date: ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil },  # US format
    # birth_date: ->(v) { v ? Date.strptime(v, '%d.%m.%Y') : nil },  # EU format
  })
rows.first[:birth_date]  # => #<Date: 1990-05-15>
```

See [Value Converters](./value_converters.md) for full details.
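
A quick worked example of the locale ambiguity: the same string parses to two different dates depending on the format string. This uses only Ruby's standard `date` library. The `us_date` lambda is written in the same style as the `value_converters:` entries, and because `Date.strptime` raises on input that does not match its format, a wrong-format value fails loudly instead of being guessed.

```ruby
require 'date'

raw = '12/03/2020'

us = Date.strptime(raw, '%m/%d/%Y')   # month first
eu = Date.strptime(raw, '%d/%m/%Y')   # day first

us.to_s  # => "2020-12-03"
eu.to_s  # => "2020-03-12"

# Converter-style lambda: nil passes through, mismatched input raises Date::Error.
us_date = ->(v) { v && Date.strptime(v, '%m/%d/%Y') }
```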

---

## Custom value converters

SmarterCSV lets you apply any transformation per column (prices, booleans, custom types):

**With SmarterCSV:**
```ruby
require 'date'

rows = SmarterCSV.process('records.csv',
  value_converters: {
    birth_date: ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil },
    price: ->(v) { v.is_a?(String) ? v.delete('$,').to_f : v }, # guard in case the value is already numeric
    active: ->(v) { v&.match?(/\Atrue\z/i) },
  })
```
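
Conceptually, a per-column converter is just a lambda looked up by key and applied to that column's value. The sketch below shows that idea in plain Ruby; `apply_converters` is a hypothetical helper, not SmarterCSV's internals, and it reuses the same three lambdas.

```ruby
require 'date'

# Hypothetical helper: apply a per-column lambda to a row Hash,
# leaving columns without a converter untouched.
def apply_converters(row, converters)
  row.to_h do |key, value|
    converter = converters[key]
    [key, converter ? converter.call(value) : value]
  end
end

converters = {
  birth_date: ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil },
  price:      ->(v) { v.is_a?(String) ? v.delete('$,').to_f : v },  # "$1,234.50" -> 1234.5
  active:     ->(v) { v&.match?(/\Atrue\z/i) },                    # case-insensitive "true"
}

row = { name: 'Alice', birth_date: '05/14/1990', price: '$1,234.50', active: 'TRUE' }
apply_converters(row, converters)
# => { name: "Alice", birth_date: #<Date: 1990-05-14 ...>, price: 1234.5, active: true }
```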

See [Value Converters](./value_converters.md) for full details.

---

## Sentinel values (NULL, NaN, #VALUE!)

Ruby CSV leaves these as strings. SmarterCSV lets you nil-ify them (and optionally remove
the key) with a single option:

**With SmarterCSV:**
```ruby
# Matching fields become nil and, because remove_empty_values: true is the
# default, are then removed entirely
rows = SmarterCSV.process('data.csv', nil_values_matching: /\A(NULL|N\/A|NaN|#VALUE!)\z/i)

# Keep the key but set the value to nil:
rows = SmarterCSV.process('data.csv',
  nil_values_matching: /\ANULL\z/,
  remove_empty_values: false,
)
# => [{ name: "Alice", score: nil, ... }]
```
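
The two option combinations amount to "nil-ify, then optionally drop". A plain-Ruby sketch of that sequence; `clean_sentinels` is a hypothetical helper that approximates the behavior described above, not SmarterCSV's code.

```ruby
SENTINELS = /\A(NULL|N\/A|NaN|#VALUE!)\z/i

# Hypothetical helper: turn sentinel strings into nil, then (optionally)
# drop keys whose value is nil or blank.
def clean_sentinels(row, pattern: SENTINELS, remove_empty: true)
  cleaned = row.transform_values { |v| (v.is_a?(String) && v.match?(pattern)) ? nil : v }
  return cleaned unless remove_empty

  cleaned.reject { |_, v| v.nil? || v.to_s.strip.empty? }
end

row = { name: 'Alice', score: 'NULL', ratio: 'nan', note: '#VALUE!' }

clean_sentinels(row)                       # => { name: "Alice" }
clean_sentinels(row, remove_empty: false)  # => { name: "Alice", score: nil, ratio: nil, note: nil }
```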

---

## Malformed / bad rows

**With Ruby CSV:**
```ruby
# Lenient parsing: malformed fields (e.g. stray quotes) are accepted as-is,
# with no error, no warning, and no record of which rows were affected
rows = CSV.read('data.csv', liberal_parsing: true)
```

**With SmarterCSV:**
```ruby
# Collect bad rows so you can inspect, log, or quarantine them
reader = SmarterCSV::Reader.new('data.csv', on_bad_row: :collect)
good_rows = reader.process
bad_rows = reader.errors[:bad_rows]

puts "#{good_rows.size} imported, #{bad_rows.size} bad rows"
bad_rows.each { |r| puts "Line #{r[:file_line_number]}: #{r[:error_message]}" }
```

See [Bad Row Quarantine](./bad_row_quarantine.md) for full details.
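
To see what Ruby CSV does with a malformed row in each mode, this standard-library-only snippet parses a row containing a stray quote inside an unquoted field: first in the default strict mode, then with `liberal_parsing: true`.

```ruby
require 'csv'

data = "name,age\nAlice,30\nBob,2\"5\n"  # stray quote inside an unquoted field

# Default: Ruby CSV raises on the malformed row, so nothing is returned.
begin
  CSV.parse(data, headers: true)
rescue CSV::MalformedCSVError => e
  puts "strict mode: #{e.message}"
end

# liberal_parsing: true keeps the row, stray quote and all, with no warning.
rows = CSV.parse(data, headers: true, liberal_parsing: true).map(&:to_h)
rows.last  # => { "name" => "Bob", "age" => "2\"5" }
```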
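
---

## Batch processing for large files

ActiveRecord's `insert_all` and `upsert_all` raise an `ArgumentError` unless every hash in the batch
has the same keys. Because SmarterCSV drops empty values by default, pass `remove_empty_values: false`
when bulk-inserting:

**With SmarterCSV:**
```ruby
SmarterCSV.process('big.csv', chunk_size: 500, remove_empty_values: false) do |chunk|
  MyModel.insert_all(chunk) # bulk insert 500 rows at a time
end
```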
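
If you would rather keep empty-value removal on, another way to satisfy the same-keys requirement of `insert_all` / `upsert_all` is to pad each chunk to a common key set before inserting. A plain-Ruby sketch: `pad_keys` is a hypothetical helper, and `each_slice(2)` stands in for SmarterCSV's `chunk_size:`.

```ruby
# Hypothetical helper: give every Hash in a chunk the same keys
# (first-seen order), filling missing ones with nil.
def pad_keys(chunk)
  keys = chunk.flat_map(&:keys).uniq
  chunk.map { |row| keys.to_h { |k| [k, row[k]] } }
end

rows = [
  { name: 'Alice', age: 30, city: 'New York' },
  { name: 'Bob', age: 25 },                    # empty city was removed
  { name: 'Charlie', age: 35, city: 'Chicago' },
]

chunks = rows.each_slice(2).map { |chunk| pad_keys(chunk) }
# => [[{ name: "Alice", age: 30, city: "New York" }, { name: "Bob", age: 25, city: nil }],
#     [{ name: "Charlie", age: 35, city: "Chicago" }]]
```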

---

## Writing CSV

**With Ruby CSV:**
```ruby
CSV.open('out.csv', 'w', write_headers: true, headers: ['name', 'age']) do |csv|
  csv << ['Alice', 30]
  csv << ['Bob', 25]
end
```

**With SmarterCSV:**
```ruby
# Takes hashes and discovers the headers automatically
SmarterCSV.generate('out.csv') do |csv|
  csv << { name: 'Alice', age: 30 }
  csv << { name: 'Bob', age: 25 }
end
```

SmarterCSV's writer also accepts any IO object (a `StringIO`, an open file handle) for streaming:

```ruby
require 'stringio'

# e.g. inside a Rails controller action, where `records` is your collection of hashes
io = StringIO.new
SmarterCSV.generate(io) { |csv| records.each { |r| csv << r } }
send_data io.string, type: 'text/csv'
```
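
For a feel of what header discovery involves, here is a sketch using only Ruby's standard `csv` library: collect the union of keys in first-seen order, then write each hash in that column order. `hashes_to_csv` is a hypothetical helper that buffers all records in memory; SmarterCSV's writer may handle discovery differently.

```ruby
require 'csv'

# Hypothetical helper: derive headers from the union of Hash keys, then
# write each record's values in that column order (missing keys -> empty).
def hashes_to_csv(records)
  headers = records.flat_map(&:keys).uniq
  CSV.generate do |csv|
    csv << headers.map(&:to_s)
    records.each { |record| csv << record.values_at(*headers) }
  end
end

hashes_to_csv([{ name: 'Alice', age: 30 }, { name: 'Bob', city: 'Chicago' }])
# => "name,age,city\nAlice,30,\nBob,,Chicago\n"
```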

---

## Advanced patterns

### Rails file upload

Accepting a CSV upload in a Rails controller: pass the tempfile path directly.

```ruby
def create
  file = params[:file] # ActionDispatch::Http::UploadedFile

  SmarterCSV.process(file.path, chunk_size: 500, remove_empty_values: false) do |chunk|
    MyModel.insert_all(chunk) # insert_all needs the same keys in every hash
  end

  redirect_to root_path, notice: "Import complete"
end
```

### Parallel processing with Sidekiq

```ruby
SmarterCSV.process('users.csv', chunk_size: 100) do |chunk, chunk_index|
  puts "Queueing chunk #{chunk_index} (#{chunk.size} records)..."
  Sidekiq::Client.push_bulk(
    'class' => UserImportWorker,
    # push_bulk expects one args Array per job, and job args must be
    # JSON-native, so symbol keys are converted to strings
    'args' => chunk.map { |row| [row.transform_keys(&:to_s)] },
  )
end
```

### Streaming directly from S3

SmarterCSV accepts any IO-like object, so you can read a CSV from S3 without writing a temp file.
Note that `get_object` called without a block or `response_target:` loads the whole object into an
in-memory `StringIO`:

```ruby
require 'aws-sdk-s3'

s3 = Aws::S3::Client.new(region: 'us-east-1')
obj = s3.get_object(bucket: 'my-bucket', key: 'imports/contacts.csv')

SmarterCSV::Reader.new(obj.body, chunk_size: 500, remove_empty_values: false).each_chunk do |chunk, _index|
  MyModel.insert_all(chunk)
end
```

### Production instrumentation

```ruby
SmarterCSV.process('large_import.csv',
  chunk_size: 1_000,
  remove_empty_values: false,
  on_start: ->(info) { Rails.logger.info "Import started: #{info[:input]} (#{info[:file_size]} bytes)" },
  on_chunk: ->(info) { Rails.logger.debug "Chunk #{info[:chunk_number]}: #{info[:rows_in_chunk]} rows (#{info[:total_rows_so_far]} total)" },
  on_complete: ->(stats) {
    Rails.logger.info "Done: #{stats[:total_rows]} rows in #{stats[:duration].round(2)}s, #{stats[:bad_rows]} bad rows"
    StatsD.histogram('csv.import.duration', stats[:duration])
  },
) { |chunk| MyModel.insert_all(chunk) }
```

See [Instrumentation Hooks](./instrumentation.md) for full details.

### Resumable imports with Rails ActiveJob

Rails 8.1 introduced `ActiveJob::Continuable`: jobs that pause on deployment and resume from their
last checkpoint. SmarterCSV's `chunk_index` maps directly onto the job cursor. On resume, earlier
chunks are still parsed, but they are skipped before insertion:

```ruby
class ImportCsvJob < ApplicationJob
  include ActiveJob::Continuable

  def perform(file_path)
    step :import_rows do |step|
      SmarterCSV.process(file_path, chunk_size: 500, remove_empty_values: false) do |chunk, chunk_index|
        next if chunk_index < step.cursor.to_i # skip already-processed chunks on resume

        MyModel.insert_all(chunk)
        step.set! chunk_index + 1
      end
    end
  end
end
```
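
The skip-until-cursor logic can be checked without Rails. This plain-Ruby simulation is a sketch under stated assumptions: `import_chunks` is hypothetical, `cursor` plays the role of `step.cursor`, appending to `inserted` stands in for `insert_all`, and `stop_after` fakes an interruption such as a deploy.

```ruby
# Hypothetical simulation of the resume pattern; no Rails involved.
def import_chunks(chunks, cursor:, inserted:, stop_after: nil)
  processed = 0
  chunks.each_with_index do |chunk, chunk_index|
    next if chunk_index < cursor.to_i   # already imported before the interruption

    inserted.concat(chunk)              # stands in for MyModel.insert_all(chunk)
    cursor = chunk_index + 1            # stands in for step.set!
    processed += 1
    break if stop_after && processed >= stop_after
  end
  cursor
end

chunks = [[1, 2], [3, 4], [5, 6]]
inserted = []

cursor = import_chunks(chunks, cursor: nil, inserted: inserted, stop_after: 2)  # interrupted
cursor = import_chunks(chunks, cursor: cursor, inserted: inserted)              # resumed

inserted  # => [1, 2, 3, 4, 5, 6] (no duplicates)
cursor    # => 3
```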

### Bulk upsert: insert or update

```ruby
SmarterCSV.process('contacts.csv',
  chunk_size: 500,
  key_mapping: { e_mail: :email },
  remove_empty_values: false, # upsert_all needs the same keys in every hash
) do |chunk|
  Contact.upsert_all(chunk, unique_by: :email) # requires a unique index on email
end
```

---

## Quick reference

| Ruby CSV | SmarterCSV equivalent | Notes |
|---|---|---|
| `CSV.read(f, headers: true).map(&:to_h)` | `SmarterCSV.process(f)` | Symbol keys, numeric conversion, whitespace stripped. |
| `CSV.read(f, headers: true, header_converters: :symbol).map(&:to_h)` | `SmarterCSV.process(f)` | Drop-in. |
| `CSV.table(f).map(&:to_h)` | `SmarterCSV.process(f)` | Drop-in. |
| `CSV.parse(str, headers: true, header_converters: :symbol)` | `SmarterCSV.parse(str)` | Direct string parsing. |
| `CSV.foreach(f, headers: true) { \|r\| }` | `SmarterCSV.each(f) { \|r\| }` | Row is already a plain Hash. |
| `converters: :numeric` | default | Automatic in SmarterCSV. |
| `converters: :date` | `value_converters: { col: ->(v) { ... } }` | Use explicit format strings; date formats are locale-dependent. |
| `liberal_parsing: true` | `on_bad_row: :collect` | Bad rows are collected for inspection instead of being silently accepted. |
| `skip_blanks: true` | `remove_empty_hashes: true` | Default in SmarterCSV. |
| `row.to_h` | `row` | Already a plain Hash; no conversion needed. |
| `row.headers` | `reader.headers` | Available on the `Reader` instance. |

---
PREVIOUS: [Introduction](./_introduction.md) | NEXT: [Ruby CSV Pitfalls](./ruby_csv_pitfalls.md) | UP: [README](../README.md)
