smarter_csv 1.15.2 → 1.16.0

Files changed (48)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +9 -0
  3. data/CHANGELOG.md +68 -1
  4. data/CONTRIBUTORS.md +3 -1
  5. data/Gemfile +1 -0
  6. data/README.md +123 -27
  7. data/docs/_introduction.md +40 -24
  8. data/docs/bad_row_quarantine.md +285 -0
  9. data/docs/basic_read_api.md +151 -9
  10. data/docs/basic_write_api.md +474 -59
  11. data/docs/batch_processing.md +161 -4
  12. data/docs/column_selection.md +183 -0
  13. data/docs/data_transformations.md +162 -29
  14. data/docs/examples.md +339 -46
  15. data/docs/header_transformations.md +93 -12
  16. data/docs/header_validations.md +56 -18
  17. data/docs/history.md +117 -0
  18. data/docs/instrumentation.md +165 -0
  19. data/docs/migrating_from_csv.md +290 -0
  20. data/docs/options.md +150 -87
  21. data/docs/parsing_strategy.md +63 -1
  22. data/docs/real_world_csv.md +262 -0
  23. data/docs/releases/1.16.0/benchmarks.md +223 -0
  24. data/docs/releases/1.16.0/changes.md +272 -0
  25. data/docs/releases/1.16.0/performance_notes.md +114 -0
  26. data/docs/row_col_sep.md +14 -5
  27. data/docs/value_converters.md +193 -57
  28. data/ext/smarter_csv/extconf.rb +3 -0
  29. data/ext/smarter_csv/smarter_csv.c +1007 -71
  30. data/images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.png +0 -0
  31. data/images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.svg +108 -0
  32. data/images/SmarterCSV_1.16.0_vs_previous_C-speedup.png +0 -0
  33. data/images/SmarterCSV_1.16.0_vs_previous_C-speedup.svg +141 -0
  34. data/images/SmarterCSV_1.16.0_vs_previous_Rb-speedup.png +0 -0
  35. data/images/SmarterCSV_1.16.0_vs_previous_Rb-speedup.svg +139 -0
  36. data/lib/smarter_csv/errors.rb +8 -0
  37. data/lib/smarter_csv/file_io.rb +1 -1
  38. data/lib/smarter_csv/hash_transformations.rb +14 -13
  39. data/lib/smarter_csv/header_transformations.rb +21 -2
  40. data/lib/smarter_csv/headers.rb +2 -1
  41. data/lib/smarter_csv/options.rb +124 -7
  42. data/lib/smarter_csv/parser.rb +362 -75
  43. data/lib/smarter_csv/reader.rb +494 -46
  44. data/lib/smarter_csv/version.rb +1 -1
  45. data/lib/smarter_csv/writer.rb +71 -19
  46. data/lib/smarter_csv.rb +95 -12
  47. data/smarter_csv.gemspec +20 -10
  48. metadata +37 -80
data/docs/history.md ADDED
@@ -0,0 +1,117 @@
1
+
2
+ ### Contents
3
+
4
+ * [Introduction](./_introduction.md)
5
+ * [Migrating from Ruby CSV](./migrating_from_csv.md)
6
+ * [Parsing Strategy](./parsing_strategy.md)
7
+ * [The Basic Read API](./basic_read_api.md)
8
+ * [The Basic Write API](./basic_write_api.md)
9
+ * [Batch Processing](./batch_processing.md)
10
+ * [Configuration Options](./options.md)
11
+ * [Row and Column Separators](./row_col_sep.md)
12
+ * [Header Transformations](./header_transformations.md)
13
+ * [Header Validations](./header_validations.md)
14
+ * [Column Selection](./column_selection.md)
15
+ * [Data Transformations](./data_transformations.md)
16
+ * [Value Converters](./value_converters.md)
17
+ * [Bad Row Quarantine](./bad_row_quarantine.md)
18
+ * [Instrumentation Hooks](./instrumentation.md)
19
+ * [Examples](./examples.md)
20
+ * [Real-World CSV Files](./real_world_csv.md)
21
+ * [**SmarterCSV over the Years**](./history.md)
22
+ * [Release Notes](./releases/1.16.0/changes.md)
23
+
24
+ --------------
25
+
26
+ # SmarterCSV over the Years
27
+
28
+ ## Origin
29
+
30
+ SmarterCSV was born from a [StackOverflow question in 2011](https://stackoverflow.com/questions/7788618/update-mongodb-with-array-from-csv-join-table/7788746#7788746) about importing CSV data into MongoDB. The answer involved processing CSV rows as hashes — which turned out to be so useful that it became a gem.
31
+
32
+ The original write-up is preserved in [the original post](http://www.unixgods.org/Ruby/process_csv_as_hashes.html).
33
+
34
+ The first gem release was **v1.0.1 on 2012-07-30**.
35
+
36
+ ---
37
+
38
+ ## Key Milestones
39
+
40
+ | Version | Date | Highlight |
41
+ |---------|------------|-----------|
42
+ | 1.0.1 | 2012-07-30 | First release: CSV → array of hashes, batch processing, key mapping |
43
+ | 1.0.17 | 2014-01-13 | `row_sep: :auto` — automatic row separator detection |
44
+ | 1.0.18 | 2014-10-27 | Multi-line / embedded-newline field support |
45
+ | 1.1.0 | 2015-07-26 | `value_converters` — custom per-column type parsing (dates, money, …) |
46
+ | 1.4.0 | 2022-02-11 | Experimental `col_sep: :auto` detection; switched to MIT-only licence |
47
+ | 1.5.1 | 2022-04-27 | `duplicate_header_suffix` for CSV files with repeated headers |
48
+ | 1.6.0 | 2022-05-03 | Complete rewrite of the pure-Ruby line parser |
49
+ | **1.7.0** | **2022-06-26** | **First C extension — >10× speedup over 1.6.x announced** |
50
+ | 1.8.0 | 2023-03-18 | `col_sep: :auto` and `row_sep: :auto` made the **default** |
51
+ | 1.9.0 | 2023-09-04 | Structured error objects with programmatic key access |
52
+ | 1.10.0 | 2023-12-31 | Performance & memory improvements; stricter `user_provided_headers` |
53
+ | **1.11.0** | **2024-07-02** | **SmarterCSV::Writer** — CSV generation from hashes |
54
+ | **1.12.0** | **2024-07-09** | **Thread-safe `SmarterCSV::Reader` class**; docs site added |
55
+ | 1.13.0 | 2024-11-06 | Auto-generation of extra column names; improved quote robustness |
56
+ | 1.14.0 | 2025-04-07 | Advanced Writer options; `header_converter` |
57
+ | 1.14.3 | 2025-05-04 | C-extension fast path for unquoted fields; inline whitespace stripping |
58
+ | **1.15.0** | **2026-02-04** | **Major C-extension rewrite — ~5× faster than 1.14.4; 39% less memory** |
59
+ | 1.15.1 | 2026-02-17 | Fix for backslash in quoted fields (`quote_escaping:` option) |
60
+ | 1.15.2 | 2026-02-20 | Further C-path optimisations; 5.4×–37.4× faster than 1.14.4 |
61
+ | **1.16.0** | **2026-03-12** | **New `each`/`each_chunk` enumerator API; `SmarterCSV.parse`; bad row quarantine; column selection `headers: { only: }`; 1.8×–8.6× faster than Ruby CSV.read; new features for Reader and Writer; minor breaking: `quote_boundary: :standard`** |
62
+
63
+ ---
64
+
65
+ ## Performance Journey
66
+
67
+ Measured on Apple M1, Ruby 3.4.7. Best of 2 sessions × 30 runs.
68
+ All times are **C-accelerated** except the `1.6.1` column (no C extension existed).
69
+ `—` = not measured for that version.
70
+
71
+ | File | Rows | 1.6.1 Rb (s) | 1.7.1 C (s) | 1.14.4 C (s) | 1.15.2 C (s) | 1.16.0 C (s) | total gain |
72
+ |--------------------------------|------:|-------------:|------------:|-------------:|-------------:|-------------:|-----------:|
73
+ | PEOPLE_IMPORT_B.csv | 50k | 3.793 | 1.083 | 1.656 | 0.101 | 0.087 | **43.6×** |
74
+ | PEOPLE_IMPORT_C.csv | 50k | 21.612 | 2.763 | 8.172 | 0.207 | 0.169 | **127.8×** |
75
+ | PEOPLE_IMPORT_NB.csv | 50k | 3.746 | 1.053 | 1.605 | 0.086 | 0.080 | **46.9×** |
76
+ | PEOPLE_IMPORT_NC.csv | 50k | 3.831 | 1.018 | 1.495 | 0.076 | 0.063 | **60.8×** |
77
+ | uscities.csv | 31k | — | — | 1.058 | 0.113 | 0.108 | — |
78
+ | uszips.csv | 34k | — | — | 1.277 | 0.111 | 0.102 | — |
79
+ | worldcities.csv | 48k | — | — | 1.070 | 0.116 | 0.097 | — |
80
+ | fmap.csv | 50k | 2.130 | 0.873 | — | — | — | — |
81
+ | zipcode.csv | 44k | 1.572 | 0.797 | — | — | — | — |
82
+ | sample_10M.csv | 50k | 1.291 | 0.661 | 0.459 | 0.053 | 0.046 | **28.0×** |
83
+ | sensor_data_50krows_50cols.csv | 50k | — | — | 3.985 | 0.272 | 0.264 | — |
84
+ | embedded_newlines_20k.csv | 80k | 0.716 | 0.366 | 0.540 | 0.056 | 0.054 | **13.2×** |
85
+ | embedded_separators_20k.csv | 20k | 0.714 | 0.333 | 0.278 | 0.032 | 0.025 | **28.6×** |
86
+ | heavy_quoting_20k.csv | 20k | 1.309 | 0.484 | 0.522 | 0.054 | 0.036 | **36.5×** |
87
+ | long_fields_20k.csv | 20k | 5.698 | 1.112 | 2.960 | 0.110 | 0.045 | **126.6×** |
88
+ | many_empty_fields_20k.csv | 20k | 1.149 | 0.420 | 0.395 | 0.031 | 0.025 | **45.8×** |
89
+ | multi_char_separator_20k.csv | 20k | — | — | 0.539 | 0.033 | 0.026 | — |
90
+ | tab_separated_20k.tsv | 20k | — | — | 0.462 | 0.034 | 0.025 | — |
91
+ | utf8_multibyte_20k.csv | 20k | 0.709 | 0.305 | 0.228 | 0.020 | 0.017 | **41.7×** |
92
+ | whitespace_heavy_20k.csv | 20k | 1.335 | 0.393 | 0.536 | 0.036 | 0.028 | **47.5×** |
93
+ | wide_500_cols_20k.csv | 20k | 39.755 | 9.532 | 17.658 | 1.419 | 1.352 | **29.4×** |
94
+
95
+ `total gain` = v1.6.1 Ruby time / v1.16.0 C-accelerated time (files without 1.6.1 data show `—`)
96
+
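As a quick arithmetic check of the `total gain` definition above, the sketch below recomputes the ratio for three rows from the rounded timings printed in the table. The `timings` hash and its layout are only for illustration. The published figures appear to come from unrounded measurements, so recomputing other rows from the rounded values can differ in the last digit.

```ruby
# Seconds copied from the table above: [1.6.1 pure Ruby, 1.16.0 C-accelerated]
timings = {
  'PEOPLE_IMPORT_B.csv'   => [3.793, 0.087],
  'long_fields_20k.csv'   => [5.698, 0.045],
  'wide_500_cols_20k.csv' => [39.755, 1.352],
}

# total gain = v1.6.1 Ruby time / v1.16.0 C-accelerated time
gains = timings.transform_values { |(ruby_time, c_time)| (ruby_time / c_time).round(1) }

gains.each { |file, gain| puts format('%-24s %6.1fx', file, gain) }
```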
97
+ --------------
98
+
99
+ **Highlights:**
100
+ - `long_fields_20k` (long quoted fields): **126.6×** — `memchr`-based field scanning makes long quoted fields essentially free to skip.
101
+ - `PEOPLE_IMPORT_C` (116 columns): **127.8×** — wide rows multiply every per-field saving across all columns.
102
+ - `PEOPLE_IMPORT_NC` (17 columns): **60.8×** — Ruby-path optimisations #10 & #11 provide an extra boost on moderately wide files.
103
+ - `wide_500_cols_20k` went from **39.8 seconds → 1.35 seconds** — and with `headers: { only: }` keeping just 2 of those 500 columns it drops further to **~0.1 seconds** (an additional ~16× on top).
104
+ - `embedded_newlines` shows the smallest gain (**13.2×**) — multi-line stitching is bounded by I/O and the line-counting loop, not field parsing.
105
+
106
+ ---
107
+
108
+ ## Related Reading
109
+
110
+ - [Parsing CSV Files in Ruby with SmarterCSV](https://tilo-sloboda.medium.com/parsing-csv-files-in-ruby-with-smartercsv-6ce66fb6cf38)
111
+ - [SmarterCSV 1.15.2 — Faster than raw CSV arrays](https://tilo-sloboda.medium.com/smartercsv-1-15-2-faster-than-raw-csv-arrays-benchmarks-zsv-and-the-full-pipeline-2c12a798032e)
112
+ - [Processing 1.4 Million CSV Records in Ruby, fast](https://lcx.wien/blog/processing-14-million-csv-records-in-ruby/)
113
+ - [Faster Parsing CSV with Parallel Processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing) by [Jack Lin](https://github.com/xjlin0/)
114
+
115
+ --------------------
116
+
117
+ PREVIOUS: [Real-World CSV Files](./real_world_csv.md) | NEXT: [Release Notes](./releases/1.16.0/changes.md) | UP: [README](../README.md)
data/docs/instrumentation.md ADDED
@@ -0,0 +1,165 @@
1
+
2
+ ### Contents
3
+
4
+ * [Introduction](./_introduction.md)
5
+ * [Migrating from Ruby CSV](./migrating_from_csv.md)
6
+ * [Parsing Strategy](./parsing_strategy.md)
7
+ * [The Basic Read API](./basic_read_api.md)
8
+ * [The Basic Write API](./basic_write_api.md)
9
+ * [Batch Processing](./batch_processing.md)
10
+ * [Configuration Options](./options.md)
11
+ * [Row and Column Separators](./row_col_sep.md)
12
+ * [Header Transformations](./header_transformations.md)
13
+ * [Header Validations](./header_validations.md)
14
+ * [Column Selection](./column_selection.md)
15
+ * [Data Transformations](./data_transformations.md)
16
+ * [Value Converters](./value_converters.md)
17
+ * [Bad Row Quarantine](./bad_row_quarantine.md)
18
+ * [**Instrumentation Hooks**](./instrumentation.md)
19
+ * [Examples](./examples.md)
20
+ * [Real-World CSV Files](./real_world_csv.md)
21
+ * [SmarterCSV over the Years](./history.md)
22
+ * [Release Notes](./releases/1.16.0/changes.md)
23
+
24
+ --------------
25
+
26
+ # Instrumentation Hooks
27
+
28
+ SmarterCSV provides three optional callback hooks so you can observe file processing
29
+ without wrapping every call site in timing code. The hooks work with `SmarterCSV.process`
30
+ (library-controlled iteration). Enumerator modes (`each`, `each_chunk`) do not fire
31
+ hooks — in those modes the caller owns the lifecycle and should instrument their own loop.
32
+
33
+ ## The Three Hooks
34
+
35
+ | Hook | Fires when | Useful for |
36
+ |---------------|-----------------------------------------------------|---------------------------------------------|
37
+ | `on_start` | Once, before the first row is parsed | Logging intent, starting timers, counters |
38
+ | `on_chunk` | After each chunk is parsed, before block runs | Progress tracking, per-batch metrics |
39
+ | `on_complete` | Once, after the entire file is exhausted | Total duration, row counts, summary metrics |
40
+
41
+ `on_chunk` only fires when `chunk_size` is set. In non-chunked mode only `on_start` and
42
+ `on_complete` fire.
43
+
44
+ ## Usage
45
+
46
+ All three hooks are lambdas (or any callable) passed as options:
47
+
48
+ ```ruby
49
+ SmarterCSV.process('data.csv',
50
+ chunk_size: 500,
51
+
52
+ on_start: ->(info) {
53
+ Rails.logger.info "Starting CSV import: #{info[:input]} (#{info[:file_size]} bytes)"
54
+ Metrics.increment('csv.import.start')
55
+ },
56
+
57
+ on_chunk: ->(info) {
58
+ Rails.logger.debug "Chunk #{info[:chunk_number]}: #{info[:rows_in_chunk]} rows " \
59
+ "(#{info[:total_rows_so_far]} so far)"
60
+ },
61
+
62
+ on_complete: ->(stats) {
63
+ Rails.logger.info "Import complete: #{stats[:total_rows]} rows in #{stats[:duration].round(2)}s"
64
+ Metrics.histogram('csv.import.duration', stats[:duration])
65
+ Metrics.gauge('csv.import.rows', stats[:total_rows])
66
+ Metrics.increment('csv.import.bad_rows', stats[:bad_rows]) if stats[:bad_rows] > 0
67
+ },
68
+ ) { |chunk| MyModel.insert_all(chunk) }
69
+ ```
70
+
71
+ ## Hook Payloads
72
+
73
+ ### `on_start`
74
+
75
+ | Key | Type | Description |
76
+ |--------------|---------------|---------------------------------------------------------------------|
77
+ | `:input` | String | File path if input is a filename; class name (e.g. `"File"`) otherwise |
78
+ | `:file_size` | Integer / nil | File size in bytes if determinable; nil for IO objects |
79
+ | `:col_sep` | String | Effective column separator (after auto-detection) |
80
+ | `:row_sep` | String | Effective row separator (after auto-detection) |
81
+
82
+ ### `on_chunk`
83
+
84
+ | Key | Type | Description |
85
+ |-----------------------|---------|------------------------------------------------------|
86
+ | `:chunk_number` | Integer | 1-based index of this chunk |
87
+ | `:rows_in_chunk` | Integer | Number of rows in this chunk (≤ `chunk_size`) |
88
+ | `:total_rows_so_far` | Integer | Cumulative rows processed including this chunk |
89
+
90
+ ### `on_complete`
91
+
92
+ | Key | Type | Description |
93
+ |-----------------|---------|--------------------------------------------------------------------|
94
+ | `:total_rows` | Integer | Total rows successfully parsed |
95
+ | `:total_chunks` | Integer | Number of chunks yielded (0 in non-chunked mode) |
96
+ | `:duration` | Float | Elapsed seconds from `on_start` to `on_complete` |
97
+ | `:bad_rows` | Integer | Number of rows that triggered `on_bad_row` handling (0 if none) |
98
+
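A small sketch of how the `on_complete` payload might be turned into a one-line summary. It uses only the keys listed in the table above; the `summarize` lambda and the `sample` hash are hypothetical, and the sample stands in for what the library would pass to the hook.

```ruby
# Hypothetical formatter for the documented on_complete payload keys.
summarize = ->(stats) {
  # Guard against a zero duration on very small inputs.
  rate = stats[:duration] > 0 ? (stats[:total_rows] / stats[:duration]).round : 0
  "#{stats[:total_rows]} rows in #{stats[:total_chunks]} chunks, " \
    "#{rate} rows/s, #{stats[:bad_rows]} bad"
}

# Stand-in payload with the same shape as the table above.
sample = { total_rows: 1000, total_chunks: 2, duration: 0.5, bad_rows: 3 }
summary = summarize.call(sample)
puts summary
```

A lambda like this could be passed directly as `on_complete:`, since the hooks accept any callable.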
99
+ ## Non-chunked mode
100
+
101
+ When `chunk_size` is not set, `on_chunk` never fires. `on_start` and `on_complete`
102
+ still fire and give you the full-file summary:
103
+
104
+ ```ruby
105
+ SmarterCSV.process('data.csv',
106
+ on_start: ->(info) { @started_at = Time.now; log "Importing #{info[:input]}" },
107
+ on_complete: ->(stats) { log "Done: #{stats[:total_rows]} rows in #{stats[:duration].round(3)}s" },
108
+ )
109
+ ```
110
+
111
+ ## Execution order
112
+
113
+ ```
114
+ on_start
115
+ ├─ on_chunk (chunk 1 parsed) → block runs → returns
116
+ ├─ on_chunk (chunk 2 parsed) → block runs → returns
117
+ └─ on_chunk (chunk N parsed) → block runs → returns
118
+ on_complete
119
+ ```
120
+
121
+ `on_chunk` fires **before** the block receives the chunk, so you can record timing or
122
+ state before your processing logic runs.
123
+
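The ordering described above can be mimicked in plain Ruby. The sketch below is not SmarterCSV's implementation: `process_in_chunks` is a hypothetical helper that slices an in-memory array and fires the three callables in the documented order, so the sequence of events can be inspected.

```ruby
# Hypothetical helper, NOT SmarterCSV code: it only reproduces the documented
# hook ordering for an in-memory array of rows.
def process_in_chunks(rows, chunk_size:, on_start: nil, on_chunk: nil, on_complete: nil)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  on_start&.call({ input: rows.class.name })

  total_rows = 0
  total_chunks = 0
  rows.each_slice(chunk_size) do |chunk|
    total_chunks += 1
    total_rows += chunk.size
    # on_chunk fires after the chunk is assembled, before the block sees it.
    on_chunk&.call({ chunk_number: total_chunks, rows_in_chunk: chunk.size,
                     total_rows_so_far: total_rows })
    yield chunk
  end

  duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  on_complete&.call({ total_rows: total_rows, total_chunks: total_chunks, duration: duration })
end

events = []
process_in_chunks((1..5).to_a, chunk_size: 2,
  on_start:    ->(_info) { events << :start },
  on_chunk:    ->(info)  { events << [:chunk, info[:chunk_number], info[:rows_in_chunk]] },
  on_complete: ->(stats) { events << [:complete, stats[:total_rows], stats[:total_chunks]] },
) { |chunk| events << [:block, chunk] }

events.each { |event| p event }
```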
124
+ ## Without Rails / ActiveSupport
125
+
126
+ The hooks are plain callables — no dependency on Rails or any framework:
127
+
128
+ ```ruby
129
+ require 'logger'
130
+ logger = Logger.new($stdout)
131
+
132
+ SmarterCSV.process('import.csv',
133
+ on_start: ->(i) { logger.info "CSV import started: #{i[:input]}" },
134
+ on_complete: ->(s) { logger.info "CSV import done: #{s[:total_rows]} rows, #{s[:duration].round(2)}s" },
135
+ )
136
+ ```
137
+
138
+ ## With `ActiveSupport::Notifications` (Rails)
139
+
140
+ If you prefer Rails-style instrumentation, wrap the hooks yourself:
141
+
142
+ ```ruby
143
+ # config/initializers/smarter_csv_instrumentation.rb
144
+ ON_START = ->(info) {
145
+ ActiveSupport::Notifications.instrument('start.smarter_csv', info)
146
+ }
147
+ ON_COMPLETE = ->(stats) {
148
+ ActiveSupport::Notifications.instrument('complete.smarter_csv', stats)
149
+ }
150
+
151
+ # Subscribe once at startup:
152
+ ActiveSupport::Notifications.subscribe('complete.smarter_csv') do |*, payload|
153
+ StatsD.histogram('csv.duration', payload[:duration])
154
+ StatsD.gauge('csv.rows', payload[:total_rows])
155
+ end
156
+ ```
157
+
158
+ Then pass the cached lambdas to any `process` call:
159
+
160
+ ```ruby
161
+ SmarterCSV.process(file, on_start: ON_START, on_complete: ON_COMPLETE)
162
+ ```
163
+
164
+ --------------------
165
+ PREVIOUS: [Bad Row Quarantine](./bad_row_quarantine.md) | NEXT: [Examples](./examples.md) | UP: [README](../README.md)
data/docs/migrating_from_csv.md ADDED
@@ -0,0 +1,290 @@
1
+
2
+ ### Contents
3
+
4
+ * [Introduction](./_introduction.md)
5
+ * [**Migrating from Ruby CSV**](./migrating_from_csv.md)
6
+ * [Parsing Strategy](./parsing_strategy.md)
7
+ * [The Basic Read API](./basic_read_api.md)
8
+ * [The Basic Write API](./basic_write_api.md)
9
+ * [Batch Processing](./batch_processing.md)
10
+ * [Configuration Options](./options.md)
11
+ * [Row and Column Separators](./row_col_sep.md)
12
+ * [Header Transformations](./header_transformations.md)
13
+ * [Header Validations](./header_validations.md)
14
+ * [Column Selection](./column_selection.md)
15
+ * [Data Transformations](./data_transformations.md)
16
+ * [Value Converters](./value_converters.md)
17
+ * [Bad Row Quarantine](./bad_row_quarantine.md)
18
+ * [Instrumentation Hooks](./instrumentation.md)
19
+ * [Examples](./examples.md)
20
+ * [Real-World CSV Files](./real_world_csv.md)
21
+ * [SmarterCSV over the Years](./history.md)
22
+ * [Release Notes](./releases/1.16.0/changes.md)
23
+
24
+ --------------
25
+
26
+ # Migrating from Ruby CSV
27
+
28
+ Already using Ruby's built-in `CSV` library? Switching to SmarterCSV is typically a one- or
29
+ two-line change — and you get **1.7×–8.6× faster** end-to-end throughput vs `CSV.read`, plain Ruby
30
+ hashes with symbol keys, automatic type conversion, and a much richer feature set in return.
31
+
32
+ > **Medium article:** *"Switch from Ruby CSV to SmarterCSV in 5 Minutes"* — *(coming soon)*
33
+
34
+ ---
35
+
36
+ ## Performance
37
+
38
+ | Comparison | Range |
39
+ |---|---|
40
+ | SmarterCSV vs `CSV.read` † | **1.7×–8.6× faster** |
41
+ | SmarterCSV vs `CSV.table` ‡ | **7×–129× faster** |
42
+
43
+ _Benchmarks: 19 CSV files (20k–80k rows), Ruby 3.4.7, Apple M1._
44
+
45
+ _† `CSV.read` returns raw arrays of arrays, so hash construction, key normalization, and type conversion still have to happen afterwards; the comparison therefore understates the real cost difference._
46
+
47
+ _‡ `CSV.table` is the closest Ruby equivalent to SmarterCSV — both return symbol-keyed hashes._
48
+
49
+ ---
50
+
51
+ ## The one-line switch
52
+
53
+ ```ruby
54
+ # Before — Ruby CSV
55
+ rows = CSV.table('data.csv').map(&:to_h) # array of hashes with symbol keys
56
+
57
+ # After — SmarterCSV (drop-in, up to 129× faster)
58
+ rows = SmarterCSV.process('data.csv') # array of hashes with symbol keys
59
+ ```
60
+
61
+ That's it for the common case. Keep reading for the few behavior differences to be aware of.
62
+
63
+ ---
64
+
65
+ ## Parsing a CSV string
66
+
67
+ ```ruby
68
+ csv_string = "name,age\nAlice,30\nBob,25\n"
69
+
70
+ # Ruby CSV
71
+ rows = CSV.parse(csv_string, headers: true, header_converters: :symbol)
72
+
73
+ # SmarterCSV — direct string parsing
74
+ rows = SmarterCSV.parse(csv_string)
75
+ # => [{name: "Alice", age: 30}, {name: "Bob", age: 25}]
76
+ ```
77
+
78
+ `SmarterCSV.parse` is a convenience wrapper added in 1.16.0. Under the hood it wraps the
79
+ string in a `StringIO` — but you don't need to think about that.
80
+
81
+ ---
82
+
83
+ ## Row-by-row iteration
84
+
85
+ ```ruby
86
+ # Ruby CSV
87
+ CSV.foreach('data.csv', headers: true, header_converters: :symbol) do |row|
88
+ MyModel.create(row.to_h)
89
+ end
90
+
91
+ # SmarterCSV
92
+ SmarterCSV.each('data.csv') do |row|
93
+ MyModel.create(row) # row is already a plain Hash — no .to_h needed
94
+ end
95
+ ```
96
+
97
+ `SmarterCSV.each` returns an `Enumerator` when called without a block, so the full
98
+ `Enumerable` API is available:
99
+
100
+ ```ruby
101
+ names = SmarterCSV.each('data.csv').map { |row| row[:name] }
102
+ us_rows = SmarterCSV.each('data.csv').select { |row| row[:country] == 'US' }
103
+ first10 = SmarterCSV.each('data.csv').lazy.first(10)
104
+ ```
105
+
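The "returns an `Enumerator` when called without a block" pattern is standard Ruby and can be illustrated without the gem. In this sketch `each_row` is a hypothetical stand-in for a row source (not SmarterCSV code), and `counter` only exists to show how many rows were actually produced in the eager and lazy cases.

```ruby
# Hypothetical stand-in for a row source; NOT SmarterCSV code.
# `counter` records how many rows were actually produced.
def each_row(rows, counter)
  return enum_for(:each_row, rows, counter) unless block_given?

  rows.each do |row|
    counter[:yielded] += 1
    yield row
  end
end

rows = [
  { name: 'Alice', country: 'US' },
  { name: 'Bob',   country: 'DE' },
  { name: 'Carol', country: 'US' },
]

counter = { yielded: 0 }
names = each_row(rows, counter).map { |row| row[:name] }
eager_count = counter[:yielded] # map consumed every row

counter[:yielded] = 0
first_us = each_row(rows, counter).lazy.select { |row| row[:country] == 'US' }.first(1)
lazy_count = counter[:yielded]  # iteration stopped at the first match
```

With a lazy chain, rows after the first match are never produced, which is the property that makes early termination cheap on large inputs.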
106
+ ---
107
+
108
+ ## Key behavior differences
109
+
110
+ ### 1. Symbol keys (same as `CSV.table`, different from `CSV.read`)
111
+
112
+ SmarterCSV returns symbol keys by default — the same as `CSV.table`. If you were using
113
+ `CSV.read` with string keys, add `strings_as_keys: true`:
114
+
115
+ ```ruby
116
+ # Ruby CSV.read — string keys
117
+ rows = CSV.read('data.csv', headers: true)
118
+ rows.first['name'] # string key
119
+
120
+ # SmarterCSV default — symbol keys (same as CSV.table)
121
+ rows = SmarterCSV.process('data.csv')
122
+ rows.first[:name] # symbol key
123
+
124
+ # SmarterCSV with string keys — if you need to match CSV.read behaviour
125
+ rows = SmarterCSV.process('data.csv', strings_as_keys: true)
126
+ rows.first['name']
127
+ ```
128
+
129
+ ### 2. Numeric conversion is automatic
130
+
131
+ SmarterCSV converts numeric strings to `Integer` or `Float` automatically (the `:numeric`
132
+ converter in Ruby CSV terms). You get integers and floats back without requesting it:
133
+
134
+ ```ruby
135
+ # Ruby CSV — explicit converter needed with CSV.read (CSV.table applies :numeric by default)
136
+ CSV.read('data.csv', headers: true, converters: :numeric)
137
+
138
+ # SmarterCSV — automatic (convert_values_to_numeric: true is the default)
139
+ SmarterCSV.process('data.csv')
140
+ ```
141
+
142
+ To disable: `convert_values_to_numeric: false`.
143
+
144
+ To limit conversion to specific columns:
145
+ ```ruby
146
+ SmarterCSV.process('data.csv', convert_values_to_numeric: { only: [:age, :score] })
147
+ SmarterCSV.process('data.csv', convert_values_to_numeric: { except: [:zip_code] })
148
+ ```
149
+
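To see why restricting conversion by column can matter, here is a minimal sketch of a generic string-to-number conversion in plain Ruby. `to_numeric` is a hypothetical helper, not SmarterCSV's implementation, and the library's exact rules may differ. The point is only that a conversion of this kind turns a zip code such as `"02134"` into an integer and loses the leading zero.

```ruby
# Hypothetical helper, NOT SmarterCSV code: try Integer (base 10), then Float,
# and fall back to the original string when neither applies.
def to_numeric(value)
  Integer(value, 10, exception: false) || Float(value, exception: false) || value
end

age      = to_numeric('30')     # Integer
score    = to_numeric('3.5')    # Float
name     = to_numeric('Alice')  # left as a String
zip_code = to_numeric('02134')  # leading zero is lost once this becomes an Integer
```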
150
+ ### 3. Empty values are removed by default
151
+
152
+ SmarterCSV drops key/value pairs where the value is `nil` or blank
153
+ (`remove_empty_values: true` is the default). Ruby CSV keeps them as `nil`.
154
+
155
+ ```ruby
156
+ # CSV "Alice,,30" with header "name,city,age"
157
+
158
+ # Ruby CSV — nil values present
159
+ # => {name: "Alice", city: nil, age: 30}
160
+
161
+ # SmarterCSV default — nil removed
162
+ # => {name: "Alice", age: 30}
163
+
164
+ # SmarterCSV — keep nil values (match Ruby CSV behaviour)
165
+ SmarterCSV.process('data.csv', remove_empty_values: false)
166
+ # => {name: "Alice", city: nil, age: 30}
167
+ ```
168
+
169
+ ### 4. Plain Hash, not CSV::Row
170
+
171
+ Ruby CSV returns `CSV::Row` objects. SmarterCSV returns plain Ruby `Hash` objects.
172
+
173
+ `CSV::Row` is a hash-like object with extra methods (`.headers`, `.fields`, `.to_h`, `.to_a`).
174
+ With SmarterCSV you work directly with the hash — no wrapper, no `.to_h` needed.
175
+
176
+ ```ruby
177
+ # Ruby CSV — CSV::Row object
178
+ row = CSV.table('data.csv').first
179
+ row.class # => CSV::Row
180
+ row.headers # => [:name, :age]
181
+ row.to_h # => {name: "Alice", age: 30}
182
+
183
+ # SmarterCSV — plain Hash
184
+ row = SmarterCSV.process('data.csv').first
185
+ row.class # => Hash
186
+ row.keys # => [:name, :age]
187
+ row # => {name: "Alice", age: 30}
188
+ ```
189
+
190
+ ---
191
+
192
+ ## Date / DateTime conversion
193
+
194
+ Ruby CSV has built-in `:date` and `:date_time` converters. SmarterCSV intentionally omits
195
+ them because date formats are locale-dependent (`12/03/2020` means December 3rd in the US
196
+ but March 12th in Europe). Use a `value_converter` instead:
197
+
198
+ ```ruby
199
+ require 'date'
200
+
201
+ # ISO 8601 (YYYY-MM-DD) — unambiguous
202
+ iso_date = Class.new { def self.convert(v) = v ? Date.strptime(v, '%Y-%m-%d') : nil }
203
+
204
+ SmarterCSV.process('data.csv', value_converters: { birth_date: iso_date })
205
+ ```
206
+
207
+ See [Value Converters](./value_converters.md) for full details and examples for US/EU formats.
208
+
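The locale problem is easy to reproduce with `Date.strptime` alone. The sketch below defines two converter classes with the same `convert` class method shape as the ISO example above; the class names `UsDate` and `EuDate` are made up for illustration.

```ruby
require 'date'

# Hypothetical converters with the same `convert` interface as the ISO example.
class UsDate
  def self.convert(value)
    value ? Date.strptime(value, '%m/%d/%Y') : nil # month first
  end
end

class EuDate
  def self.convert(value)
    value ? Date.strptime(value, '%d/%m/%Y') : nil # day first
  end
end

us = UsDate.convert('12/03/2020') # December 3rd, 2020
eu = EuDate.convert('12/03/2020') # March 12th, 2020
puts us.iso8601
puts eu.iso8601
```

The same input string yields two different dates, which is why the format has to be chosen per column rather than guessed.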
209
+ ---
210
+
211
+ ## Sentinel values (NULL, NaN, #VALUE!)
212
+
213
+ Ruby CSV leaves these as strings. SmarterCSV lets you nil-ify them (and optionally remove
214
+ the key) in a single option:
215
+
216
+ ```ruby
217
+ # Set values matching NULL, NaN, or an Excel error to nil (the key is then dropped, since remove_empty_values: true is the default)
218
+ SmarterCSV.process('data.csv', nil_values_matching: /\A(NULL|NaN|#VALUE!)\z/)
219
+
220
+ # Keep the key but set the value to nil (useful for telling a NULL value apart from a column that is not in the file)
221
+ SmarterCSV.process('data.csv',
222
+ nil_values_matching: /\ANULL\z/,
223
+ remove_empty_values: false,
224
+ )
225
+ ```
226
+
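How the two options interact can be sketched on a single row in plain Ruby. `clean_row` is a hypothetical helper that mimics the behaviour described in this section (nil-ify matching values, then optionally drop empty ones); it is not SmarterCSV's implementation.

```ruby
# Hypothetical helper, NOT SmarterCSV code: mimics the documented effect of
# nil_values_matching combined with remove_empty_values on one row.
def clean_row(row, nil_values_matching:, remove_empty_values: true)
  cleaned = row.transform_values do |value|
    value.is_a?(String) && value.match?(nil_values_matching) ? nil : value
  end
  return cleaned unless remove_empty_values

  cleaned.reject { |_key, value| value.nil? || value.to_s.strip.empty? }
end

row     = { name: 'Alice', city: 'NULL', score: '#VALUE!', age: 30 }
pattern = /\A(NULL|NaN|#VALUE!)\z/

dropped = clean_row(row, nil_values_matching: pattern)
kept    = clean_row(row, nil_values_matching: pattern, remove_empty_values: false)
```

In the first call the sentinel columns disappear from the hash; in the second they stay present with `nil` values.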
227
+ ---
228
+
229
+ ## Malformed / bad rows
230
+
231
+ Ruby CSV has `liberal_parsing: true`, which attempts to parse some malformed input instead of raising an error, without telling you which rows were affected.
232
+ SmarterCSV gives you explicit control:
233
+
234
+ ```ruby
235
+ # Ruby CSV — malformed input is parsed leniently, with no record of which rows were affected
236
+ CSV.read('data.csv', liberal_parsing: true)
237
+
238
+ # SmarterCSV — collect bad rows so you can inspect them
239
+ reader = SmarterCSV::Reader.new('data.csv', on_bad_row: :collect)
240
+ good_rows = reader.process
241
+ bad_rows = reader.errors[:bad_rows] # inspect / log / quarantine
242
+ ```
243
+
244
+ See [Bad Row Quarantine](./bad_row_quarantine.md) for full details.
245
+
246
+ ---
247
+
248
+ ## Writing CSV
249
+
250
+ ```ruby
251
+ # Ruby CSV
252
+ CSV.open('out.csv', 'w', write_headers: true, headers: ['name','age']) do |csv|
253
+ csv << ['Alice', 30]
254
+ end
255
+
256
+ # SmarterCSV — takes hashes, discovers headers automatically
257
+ SmarterCSV.generate('out.csv') do |csv|
258
+ csv << {name: 'Alice', age: 30}
259
+ csv << {name: 'Bob', age: 25}
260
+ end
261
+ ```
262
+
263
+ SmarterCSV's writer also accepts any IO object (StringIO, open file handle) for streaming:
264
+
265
+ ```ruby
266
+ io = StringIO.new
267
+ SmarterCSV.generate(io) { |csv| records.each { |r| csv << r } }
268
+ send_data io.string, type: 'text/csv'
269
+ ```
270
+
271
+ ---
272
+
273
+ ## Quick reference
274
+
275
+ | Ruby CSV | SmarterCSV equivalent | Notes |
276
+ |---|---|---|
277
+ | `CSV.table(f)` | `SmarterCSV.process(f)` | Drop-in. Symbol keys, numeric conversion. |
278
+ | `CSV.read(f, headers: true)` | `SmarterCSV.process(f, strings_as_keys: true)` | Add `strings_as_keys:` for string keys. |
279
+ | `CSV.parse(str, headers: true, header_converters: :symbol)` | `SmarterCSV.parse(str)` | Direct string parsing. |
280
+ | `CSV.foreach(f, headers: true) { \|r\| }` | `SmarterCSV.each(f) { \|r\| }` | Row is already a plain Hash. |
281
+ | `converters: :numeric` | default | Automatic in SmarterCSV. |
282
+ | `converters: :date` | `value_converters: {col: DateConverter}` | See [Value Converters](./value_converters.md). |
283
+ | `liberal_parsing: true` | `on_bad_row: :collect` | Explicit quarantine is better. |
284
+ | `skip_blanks: true` | `remove_empty_hashes: true` | Default in SmarterCSV. |
285
+ | `row.to_h` | `row` | Already a plain Hash — no conversion needed. |
286
+ | `row.headers` | `reader.headers` | Available on the `Reader` instance. |
287
+
288
+ ---
289
+ PREVIOUS: [Introduction](./_introduction.md) | NEXT: [Parsing Strategy](./parsing_strategy.md) | UP: [README](../README.md)
290
+