smarter_csv 1.15.2 → 1.16.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (50)
  1. checksums.yaml +4 -4
  2. data/.rspec +2 -0
  3. data/.rubocop.yml +9 -0
  4. data/CHANGELOG.md +112 -1
  5. data/CONTRIBUTORS.md +4 -1
  6. data/Gemfile +1 -0
  7. data/README.md +129 -27
  8. data/docs/_introduction.md +45 -24
  9. data/docs/bad_row_quarantine.md +342 -0
  10. data/docs/basic_read_api.md +152 -9
  11. data/docs/basic_write_api.md +475 -59
  12. data/docs/batch_processing.md +162 -4
  13. data/docs/column_selection.md +184 -0
  14. data/docs/data_transformations.md +163 -29
  15. data/docs/examples.md +340 -46
  16. data/docs/header_transformations.md +94 -12
  17. data/docs/header_validations.md +57 -18
  18. data/docs/history.md +119 -0
  19. data/docs/instrumentation.md +166 -0
  20. data/docs/migrating_from_csv.md +565 -0
  21. data/docs/options.md +151 -87
  22. data/docs/parsing_strategy.md +64 -1
  23. data/docs/real_world_csv.md +263 -0
  24. data/docs/releases/1.16.0/benchmarks.md +223 -0
  25. data/docs/releases/1.16.0/changes.md +273 -0
  26. data/docs/releases/1.16.0/performance_notes.md +114 -0
  27. data/docs/row_col_sep.md +15 -5
  28. data/docs/ruby_csv_pitfalls.md +514 -0
  29. data/docs/value_converters.md +194 -57
  30. data/ext/smarter_csv/extconf.rb +3 -0
  31. data/ext/smarter_csv/smarter_csv.c +1017 -82
  32. data/images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.png +0 -0
  33. data/images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.svg +108 -0
  34. data/images/SmarterCSV_1.16.0_vs_previous_C-speedup.png +0 -0
  35. data/images/SmarterCSV_1.16.0_vs_previous_C-speedup.svg +141 -0
  36. data/images/SmarterCSV_1.16.0_vs_previous_Rb-speedup.png +0 -0
  37. data/images/SmarterCSV_1.16.0_vs_previous_Rb-speedup.svg +139 -0
  38. data/lib/smarter_csv/errors.rb +8 -0
  39. data/lib/smarter_csv/file_io.rb +1 -1
  40. data/lib/smarter_csv/hash_transformations.rb +14 -13
  41. data/lib/smarter_csv/header_transformations.rb +21 -2
  42. data/lib/smarter_csv/headers.rb +2 -1
  43. data/lib/smarter_csv/options.rb +124 -7
  44. data/lib/smarter_csv/parser.rb +358 -74
  45. data/lib/smarter_csv/reader.rb +494 -46
  46. data/lib/smarter_csv/version.rb +1 -1
  47. data/lib/smarter_csv/writer.rb +71 -19
  48. data/lib/smarter_csv.rb +134 -13
  49. data/smarter_csv.gemspec +20 -10
  50. metadata +38 -80
@@ -2,6 +2,8 @@
2
2
  ### Contents
3
3
 
4
4
  * [Introduction](./_introduction.md)
5
+ * [Migrating from Ruby CSV](./migrating_from_csv.md)
6
+ * [Ruby CSV Pitfalls](./ruby_csv_pitfalls.md)
5
7
  * [Parsing Strategy](./parsing_strategy.md)
6
8
  * [The Basic Read API](./basic_read_api.md)
7
9
  * [**The Basic Write API**](./basic_write_api.md)
@@ -10,10 +12,17 @@
10
12
  * [Row and Column Separators](./row_col_sep.md)
11
13
  * [Header Transformations](./header_transformations.md)
12
14
  * [Header Validations](./header_validations.md)
15
+ * [Column Selection](./column_selection.md)
13
16
  * [Data Transformations](./data_transformations.md)
14
17
  * [Value Converters](./value_converters.md)
15
-
16
- --------------
18
+ * [Bad Row Quarantine](./bad_row_quarantine.md)
19
+ * [Instrumentation Hooks](./instrumentation.md)
20
+ * [Examples](./examples.md)
21
+ * [Real-World CSV Files](./real_world_csv.md)
22
+ * [SmarterCSV over the Years](./history.md)
23
+ * [Release Notes](./releases/1.16.0/changes.md)
24
+
25
+ --------------
17
26
 
18
27
  # SmarterCSV Basic Write API
19
28
 
@@ -25,6 +34,72 @@ To generate a CSV file, we use the `<<` operator to append new data to the file.
25
34
 
26
35
  The input operator for adding data to a CSV file `<<` can handle single hashes, array-of-hashes, or array-of-arrays-of-hashes, and can be called one or multiple times in order to create a file.
27
36
 
37
+ ### Hashes, Not Arrays — and Why It Matters for Data Integrity
38
+
39
+ Ruby's `CSV` library lets you write raw arrays: `csv << ["Alice", 30, "NYC"]`. SmarterCSV
40
+ deliberately does not support this, because positional array writing is an open invitation
41
+ to silent data corruption.
42
+
43
+ Consider what happens when a column is added:
44
+
45
+ ```ruby
46
+ # Originally:
47
+ headers = [:name, :age, :city]
48
+
49
+ # Later, a column is inserted:
50
+ headers = [:name, :age, :country, :city]
51
+
52
+ # But the array rows were never updated:
53
+ csv << ["Alice", 30, "NYC"] # "NYC" now lands under :country, not :city
54
+ csv << ["Bob", 25, "London"] # same silent mis-alignment
55
+ ```
56
+
57
+ There is no error. The CSV looks valid. The data is wrong. This class of bug — a silent off-by-one column mis-alignment — is completely undetectable from the output file alone.
58
+
59
+ SmarterCSV avoids this entirely by requiring hashes, where every value is explicitly bound to its column name:
60
+
61
+ ```ruby
62
+ csv << { name: 'Alice', age: 30, city: 'NYC' }
63
+ ```
64
+
65
+ Adding or reordering columns cannot silently shift values. A missing key produces an empty
66
+ field in the correct column. The mapping is always explicit.
67
+
68
+ **Providing `headers:` enforces column order.** When you pass `headers:`, the Writer always
69
+ outputs columns in exactly that order — regardless of the order keys appear in the hash.
70
+ This is the right tool when column order matters:
71
+
72
+ ```ruby
73
+ options = { headers: [:country, :city, :name, :age] }
74
+
75
+ SmarterCSV.generate('output.csv', options) do |csv|
76
+ # Hash key order is irrelevant — output follows the headers order
77
+ csv << { name: 'Alice', age: 30, city: 'NYC', country: 'USA' }
78
+ csv << { name: 'Bob', age: 25, city: 'London', country: 'UK' }
79
+ end
80
+
81
+ # output:
82
+ # country,city,name,age
83
+ # USA,NYC,Alice,30
84
+ # UK,London,Bob,25
85
+ ```
86
+
87
+ This is the correct way to write CSV when column order matters: declare the headers
88
+ explicitly and let the Writer enforce them. No positional assumptions, no off-by-one risk.
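Conceptually, enforcing `headers:` means projecting each row hash onto a fixed key list. The plain-Ruby sketch below uses `Hash#values_at` to illustrate the documented behavior: values follow header order, and a missing key produces an empty field in the correct column. It is not SmarterCSV's implementation, and it leaves out quoting, escaping, and value converters. `HEADERS` and `project_row` are hypothetical names.

```ruby
# Illustration only: project row hashes onto a fixed header order.
# Quoting, escaping, and value converters are deliberately left out.
HEADERS = [:country, :city, :name, :age].freeze

def project_row(row, headers = HEADERS)
  # values_at returns values in header order; a missing key yields nil
  row.values_at(*headers)
end

rows = [
  { name: 'Alice', age: 30, city: 'NYC', country: 'USA' },
  { name: 'Bob', city: 'London', country: 'UK' } # :age is missing
]

# nil.to_s is "", so a missing key becomes an empty field in its own column
lines = [HEADERS.join(',')] + rows.map { |r| project_row(r).join(',') }
puts lines
# country,city,name,age
# USA,NYC,Alice,30
# UK,London,Bob,
```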
89
+
90
+ If you already have data in arrays, convert to hashes first using your headers as keys.
91
+ This forces the key-to-column mapping to be explicit and visible at the one place where
92
+ it can actually be verified — instead of being implicit in the position of every value:
93
+
94
+ ```ruby
95
+ headers = [:name, :age, :city]
96
+ rows = [["Alice", 30, "NYC"], ["Bob", 25, "London"]]
97
+
98
+ csv_string = SmarterCSV.generate do |csv|
99
+ rows.each { |row| csv << headers.zip(row).to_h }
100
+ end
101
+ ```
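One caveat with `headers.zip(row).to_h`: `Array#zip` does not check lengths. A short row is padded with `nil`, and a long row silently loses its extra values, which reintroduces the positional mistake at the conversion step. A minimal guard, assuming you want mismatched rows rejected rather than written, is a length check before zipping. The `strict_row_hash` helper below is hypothetical and not part of SmarterCSV.

```ruby
# Hypothetical helper (not part of SmarterCSV): like headers.zip(row).to_h,
# but raises instead of silently padding or truncating mismatched rows.
def strict_row_hash(headers, row)
  if row.size != headers.size
    raise ArgumentError,
          "expected #{headers.size} values for #{headers.inspect}, got #{row.size}: #{row.inspect}"
  end
  headers.zip(row).to_h
end

headers = [:name, :age, :city]

# What plain zip does with mismatched rows:
headers.zip(['Bob', 25]).to_h                # => { name: 'Bob', age: 25, city: nil }
[:name, :age].zip(['Alice', 30, 'NYC']).to_h # => { name: 'Alice', age: 30 }  ("NYC" is lost)

strict_row_hash(headers, ['Alice', 30, 'NYC']) # => { name: 'Alice', age: 30, city: 'NYC' }
begin
  strict_row_hash(headers, ['Bob', 25])
rescue ArgumentError => e
  warn e.message # report the short row instead of writing it
end
```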
102
+
28
103
  ### Auto-Discovery of Headers
29
104
 
30
105
  By default, the `SmarterCSV::Writer` discovers all keys that are present in the input data and, as they become known, appends them to the CSV headers. This ensures that all data will be included in the output CSV file.
@@ -46,32 +121,90 @@ In either case the corresponding field will be put in double-quotes.
46
121
 
47
122
  ### Simplified Interface
48
123
 
49
- The simplified interface takes a block:
124
+ The simplified interface takes a block. The first argument can be:
125
+
126
+ * **Omitted** — SmarterCSV writes to an internal `StringIO` and returns the CSV as a `String`.
127
+ * A **`String`** path — SmarterCSV opens the file and closes it when done.
128
+ * A **`Pathname`** (or any object responding to `#to_path`) — treated the same as a String path.
129
+ * Any **IO-like object** responding to `#write` (e.g. `StringIO`, an open `File` handle, a
130
+ socket) — SmarterCSV writes to it but does **not** close it; the caller retains ownership.
50
131
 
51
- ```
52
- SmarterCSV.generate(filename, options) do |csv_writer|
132
+ Passing anything else raises `ArgumentError` immediately.
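If you reproduce these rules in your own code, one Ruby subtlety matters: an open `File` responds to both `#write` and `#to_path`, and `Pathname` also defines `#write`, so the order of the checks decides the outcome. The sketch below shows one ordering that agrees with the documented behavior. `destination_kind` is a hypothetical method written for illustration; it is an assumption, not SmarterCSV's actual dispatch code.

```ruby
require 'pathname'
require 'stringio'

# Hypothetical classifier mirroring the documented argument rules.
# Returns :string (build and return a String), :path (open and close a file),
# or :io (write to the caller's object and leave it open).
def destination_kind(dest)
  case dest
  when nil, Hash then :string   # no destination; a Hash is treated as options
  when String, Pathname then :path
  when IO then :io              # File, sockets, $stdout; File also has #to_path
  else
    return :path if dest.respond_to?(:to_path)
    return :io if dest.respond_to?(:write) # StringIO and other duck-typed writers
    raise ArgumentError, "unsupported destination: #{dest.class}"
  end
end

destination_kind('output.csv')           # => :path
destination_kind(Pathname('output.csv')) # => :path
destination_kind(StringIO.new)           # => :io
destination_kind($stdout)                # => :io
```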
53
133
 
54
- MyModel.find_in_batches(batch_size: 100) do |batch|
55
- batch.pluck(:name, :description, :instructor).each do |record|
56
- csv_writer << record
57
- end
58
- end
134
+ **Generate a CSV String directly (no file argument):**
59
135
 
60
- end
61
- ```
136
+ ```ruby
137
+ csv_string = SmarterCSV.generate do |csv|
138
+ csv << { name: 'Alice', age: 30 }
139
+ csv << { name: 'Bob', age: 25 }
140
+ end
141
+ # => "name,age\nAlice,30\nBob,25\n"
142
+ ```
143
+
144
+ Options can be passed as the first argument when no destination is given:
145
+
146
+ ```ruby
147
+ csv_string = SmarterCSV.generate(col_sep: ';', row_sep: "\r\n") do |csv|
148
+ records.each { |r| csv << r }
149
+ end
150
+ ```
151
+
152
+ **Write to a file by path:**
153
+
154
+ ```ruby
155
+ SmarterCSV.generate('output.csv', options) do |csv|
156
+ MyModel.find_in_batches(batch_size: 100) do |batch|
157
+ batch.each { |record| csv << record.attributes }
158
+ end
159
+ end
160
+ ```
161
+
162
+ **Write to a file using a `Pathname`:**
163
+
164
+ ```ruby
165
+ require 'pathname'
166
+ SmarterCSV.generate(Pathname('output.csv'), options) do |csv|
167
+ records.each { |r| csv << r }
168
+ end
169
+ ```
170
+
171
+ **Write to a `StringIO` (e.g. for a Rails `send_data` response):**
172
+
173
+ ```ruby
174
+ io = StringIO.new
175
+ SmarterCSV.generate(io) do |csv|
176
+ records.each { |r| csv << r }
177
+ end
178
+ send_data io.string, type: 'text/csv', filename: 'export.csv'
179
+ ```
180
+
181
+ **Write to an already-open file handle:**
182
+
183
+ ```ruby
184
+ File.open('output.csv', 'w') do |f|
185
+ SmarterCSV.generate(f) do |csv|
186
+ records.each { |r| csv << r }
187
+ end
188
+ end
189
+ ```
62
190
 
63
191
  ### Full Interface
64
192
 
65
- ```
66
- csv_writer = SmarterCSV::Writer.new(file_path, options)
193
+ The full interface gives you direct access to the `Writer` instance, which is useful when you
194
+ need to call `finalize` explicitly or inspect the writer's state afterwards.
67
195
 
68
- MyModel.find_in_batches(batch_size: 100) do |batch|
69
- batch.pluck(:name, :description, :instructor).each do |record|
70
- csv_writer << record
71
- end
196
+ ```ruby
197
+ csv_writer = SmarterCSV::Writer.new(file_path_or_io, options)
72
198
 
73
- csv_writer.finalize
74
- ```
199
+ MyModel.find_in_batches(batch_size: 100) do |batch|
200
+ batch.each { |record| csv_writer << record.attributes }
201
+ end
202
+
203
+ csv_writer.finalize
204
+ ```
205
+
206
+ The full interface accepts the same argument types as the simplified interface: a String path,
207
+ a `Pathname`, or any IO-like object responding to `#write`.
75
208
 
76
209
  ## Advanced Features: Customizing the Output Format
77
210
 
@@ -95,67 +228,350 @@ Similar to the `headers` option, you can define `map_headers` in order to rename
95
228
 
96
229
  ### Per Key Value Converters
97
230
 
231
+ Using per-key value converters, you can control how specific hash keys in your data are
232
+ serialized in the output. Each converter is a lambda that receives the field value and
233
+ returns the value to write (usually a String).
234
+
235
+ **Boolean to string:**
236
+
237
+ ```ruby
238
+ SmarterCSV.generate('output.csv', value_converters: { active: ->(v) { v ? 'YES' : 'NO' } }) do |csv|
239
+ csv << { name: 'Alice', active: true }
240
+ csv << { name: 'Bob', active: false }
241
+ end
242
+ # output:
243
+ # name,active
244
+ # Alice,YES
245
+ # Bob,NO
246
+ ```
98
247
 
99
- Using per-key value converters, you can control how specific hash keys in your data are converted in the output.
248
+ **Date/Time formatting:**
100
249
 
101
- Example 1:
250
+ ```ruby
251
+ SmarterCSV.generate('output.csv', value_converters: { created_at: ->(v) { v&.strftime('%Y-%m-%d') } }) do |csv|
252
+ csv << { name: 'Alice', created_at: Time.now }
253
+ end
254
+ # output:
255
+ # name,created_at
256
+ # Alice,2026-03-09
257
+ ```
102
258
 
259
+ **Numeric formatting:**
260
+
261
+ ```ruby
262
+ balance_converter = ->(v) do
263
+ case v
264
+ when Float then '$%.2f' % v.round(2)
265
+ when Integer then "$#{v}"
266
+ else v.to_s
267
+ end
268
+ end
269
+
270
+ SmarterCSV.generate('output.csv', value_converters: { balance: balance_converter }) do |csv|
271
+ csv << { name: 'Alice', balance: 1234.5 }
272
+ csv << { name: 'Bob', balance: 500 }
273
+ end
274
+ # output:
275
+ # name,balance
276
+ # Alice,$1234.50
277
+ # Bob,$500
103
278
  ```
104
- options = {
105
- value_converters: {
106
- active: ->(v) { !!v ? 'YES' : 'NO' },
107
- }
108
- }
279
+
280
+ **Reusing the same converter across multiple keys:**
281
+
282
+ ```ruby
283
+ date_converter = ->(v) { v&.strftime('%Y-%m-%d') }
284
+
285
+ SmarterCSV.generate('output.csv', value_converters: { created_at: date_converter, updated_at: date_converter }) do |csv|
286
+ csv << { name: 'Alice', created_at: Time.now, updated_at: Time.now }
287
+ end
109
288
  ```
110
289
 
111
- This maps the boolean value of the hash key `:active` into strings `"YES"`, `"NO"`.
290
+ ### Global Value Converters
291
+
292
+ The special key `:_all` defines a transformation applied to every field, after any
293
+ per-key converters have run. It receives both the key and the value.
112
294
 
113
- Example 2:
295
+ **Stripping whitespace from all string fields:**
114
296
 
297
+ ```ruby
298
+ SmarterCSV.generate('output.csv', value_converters: { _all: ->(_k, v) { v.is_a?(String) ? v.strip : v } }) do |csv|
299
+ csv << { name: ' Alice ', city: ' NYC ' }
300
+ end
301
+ # output:
302
+ # name,city
303
+ # Alice,NYC
115
304
  ```
116
- options = {
117
- value_converters: {
118
- active: ->(v) { !!v ? '✅' : '❌' },
119
- balance: ->(v) do
120
- case v
121
- when Float
122
- '$%.2f' % v.round(2)
123
- when Integer
124
- "$#{v}"
125
- else
126
- v.to_s
127
- end
128
- end,
129
- }
130
- }
305
+
306
+ **Combining per-key and global converters** — per-key runs first, `:_all` runs after:
307
+
308
+ ```ruby
309
+ options = {
310
+ value_converters: {
311
+ active: ->(v) { v ? 'YES' : 'NO' },
312
+ _all: ->(_k, v) { v.to_s.upcase },
313
+ }
314
+ }
315
+
316
+ SmarterCSV.generate('output.csv', options) do |csv|
317
+ csv << { name: 'Alice', city: 'nyc', active: true }
318
+ end
319
+ # output:
320
+ # name,city,active
321
+ # ALICE,NYC,YES
131
322
  ```
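The ordering matters more than it might seem. The plain-Ruby sketch below models the documented pipeline (per-key converter first, then `:_all` with the key and the value) using a hypothetical `convert_row` helper. With `active: false`, the documented order gives `"NO"`. If `:_all` ran first, `false` would become the truthy string `"FALSE"` and the per-key converter would then return `"YES"`.

```ruby
# Hypothetical model of the documented order: per-key converter, then :_all.
def convert_row(row, converters)
  all = converters[:_all]
  row.to_h do |key, value|
    value = converters[key].call(value) if converters.key?(key)
    value = all.call(key, value) if all
    [key, value]
  end
end

converters = {
  active: ->(v) { v ? 'YES' : 'NO' },
  _all: ->(_k, v) { v.to_s.upcase },
}

convert_row({ name: 'Alice', city: 'nyc', active: true }, converters)
# => { name: 'ALICE', city: 'NYC', active: 'YES' }
convert_row({ name: 'Bob', city: 'paris', active: false }, converters)
# => { name: 'BOB', city: 'PARIS', active: 'NO' }

# Reversed order, for comparison: false -> "FALSE" (truthy) -> "YES"
converters[:active].call(converters[:_all].call(:active, false)) # => "YES"
```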
132
323
 
133
- This maps the hash key `:balance` to a string. Floats are rounded and displayed with 2 decimals and prefixed by `$`. Integers are prefixed by `$`.
134
- The boolean value of the key `:active` is mapped into an emoji.
324
+ **Custom quoting with `:_all`:** when you take manual control of quoting, disable
325
+ auto-quoting so that fields are not quoted twice:
326
+
327
+ ```ruby
328
+ options = {
329
+ disable_auto_quoting: true,
330
+ value_converters: {
331
+ active: ->(v) { v ? 'YES' : 'NO' },
332
+ _all: ->(_k, v) { v.is_a?(String) ? "\"#{v}\"" : v },
333
+ }
334
+ }
335
+ ```
135
336
 
136
- ### Global Value Converters
337
+ > **Note:** `disable_auto_quoting: true` is a top-level option, not part of
338
+ > `value_converters:`. Only disable it when you are taking full control of quoting yourself.
137
339
 
138
- You can also use the special keyword `:_all` to define transformations that are applied to each field of the CSV file.
340
+ ## Serializing Dates, Money, and Units
139
341
 
342
+ Ruby's default `to_s` is often not enough when writing dates, monetary values, or measured
343
+ quantities to CSV. The target format depends on your consumer — a downstream system, a
344
+ locale, or a spreadsheet audience. Use `value_converters:` to take explicit control.
345
+
346
+ ### Dates and Times
347
+
348
+ `Date#to_s` produces ISO 8601 (`2026-03-09`), which is unambiguous and safe as a default.
349
+ Use a converter when you need a different format:
350
+
351
+ ```ruby
352
+ # ISO 8601 (default to_s — shown for clarity)
353
+ iso = ->(v) { v&.strftime('%Y-%m-%d') }
354
+
355
+ # US format: MM/DD/YYYY
356
+ us = ->(v) { v&.strftime('%m/%d/%Y') }
357
+
358
+ # European format: DD.MM.YYYY
359
+ eu = ->(v) { v&.strftime('%d.%m.%Y') }
360
+
361
+ # Human-readable with time
362
+ full = ->(v) { v&.strftime('%d %b %Y %H:%M') }
363
+
364
+ SmarterCSV.generate('output.csv', value_converters: { issued_on: eu, expires_at: full }) do |csv|
365
+ csv << { name: 'Alice', issued_on: Date.new(2026, 3, 9), expires_at: Time.now }
366
+ end
367
+ # output:
368
+ # name,issued_on,expires_at
369
+ # Alice,09.03.2026,09 Mar 2026 14:32
370
+ ```
371
+
372
+ The `&.` safe-navigation operator ensures a `nil` date field produces an empty cell
373
+ rather than raising `NoMethodError`.
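Because these converters are plain lambdas, they can be checked without writing a file. A quick sketch using only Ruby's `date` library, with a fixed date and time instead of `Time.now` so the results are predictable:

```ruby
require 'date'

us   = ->(v) { v&.strftime('%m/%d/%Y') }
eu   = ->(v) { v&.strftime('%d.%m.%Y') }
full = ->(v) { v&.strftime('%d %b %Y %H:%M') }

date = Date.new(2026, 3, 9)
time = Time.new(2026, 3, 9, 14, 32)

us.call(date)   # => "03/09/2026"
eu.call(date)   # => "09.03.2026"
full.call(time) # => "09 Mar 2026 14:32"
eu.call(nil)    # => nil, which the Writer writes as an empty field by default
```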
374
+
375
+ ### Money
376
+
377
+ `Money#to_s` (from the [`money`](https://github.com/RubyMoney/money) gem) returns a bare
378
+ amount with no currency, and `Money#to_d` returns a `BigDecimal`, whose default `to_s` is
379
+ scientific notation (e.g. `"0.445e2"`) unless ActiveSupport is loaded. Always use an explicit converter:
380
+
381
+ ```ruby
382
+ # Raw decimal amount — most portable, easy to re-import
383
+ amount_only = ->(v) { v&.to_d&.to_s('F') } # "44.5" (plain decimal, never scientific notation)
384
+
385
+ # With currency symbol — for human-readable exports
386
+ with_symbol = ->(v) { v ? v.format : nil } # "$44.50", "€44,50" (locale-aware via money gem)
387
+
388
+ # Amount + currency code — for multi-currency files
389
+ with_code = ->(v) { v ? "#{v.currency.iso_code} #{v.to_d.to_s('F')}" : nil } # "USD 44.5", "EUR 12.0"
390
+ ```
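`Money#to_d` returns a `BigDecimal`. `BigDecimal#to_s` defaults to scientific notation (`"0.445e2"`) unless ActiveSupport is loaded, and `to_s('F')` drops trailing zeros (`"44.5"`). If a consumer expects exactly two fractional digits, one option that avoids converting through `Float` is a small helper. `two_decimals` below is hypothetical and assumes a two-decimal currency, so it would be wrong for a currency such as JPY.

```ruby
require 'bigdecimal'

# Hypothetical helper: render a BigDecimal with exactly two fractional digits,
# rounding with BigDecimal arithmetic instead of going through Float.
def two_decimals(amount)
  int, frac = amount.round(2).to_s('F').split('.')
  "#{int}.#{frac.ljust(2, '0')}"
end

amount = BigDecimal('44.5')
amount.to_s          # => "0.445e2" (without ActiveSupport)
amount.to_s('F')     # => "44.5"
two_decimals(amount) # => "44.50"

# As a converter, assuming v is a Money object from the money gem:
# amount_only = ->(v) { v && two_decimals(v.to_d) }
```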
391
+
392
+ Choose the right format for your consumer:
393
+
394
+ ```ruby
395
+ # Single-currency export (e.g. internal finance tool)
396
+ SmarterCSV.generate('export.csv', value_converters: { price: amount_only, tax: amount_only }) do |csv|
397
+ records.each { |r| csv << r }
398
+ end
399
+
400
+ # Multi-currency export (e.g. cross-border invoicing)
401
+ SmarterCSV.generate('export.csv', value_converters: { price: with_code, tax: with_code }) do |csv|
402
+ records.each { |r| csv << r }
403
+ end
404
+ ```
405
+
406
+ > **Tip:** for re-importable CSV files, prefer `amount_only` — a bare decimal is
407
+ > unambiguous and can be parsed back without stripping symbols or handling locale-specific
408
+ > separators. Reserve `with_symbol` for human-readable exports that will not be re-parsed.
409
+
410
+ ### Unit Conversions
411
+
412
+ Value converters are not limited to formatting — they can perform any transformation,
413
+ including unit conversions. A common case is exporting sensor or weather data that is
414
+ stored internally in one unit but must be delivered in another.
415
+
416
+ Notice how `map_headers:` and `value_converters:` work together as two sides of the same
417
+ coin: the converter transforms the data into the target unit, and the renamed header tells
418
+ the reader exactly what unit they are looking at. Neither is useful without the other —
419
+ correct data with a misleading header is just as wrong as a correct header with unconverted
420
+ data.
421
+
422
+ **Fahrenheit to Celsius:**
423
+
424
+ ```ruby
425
+ f_to_c = ->(v) { v ? ((v - 32) * 5.0 / 9).round(1) : nil }
426
+
427
+ options = {
428
+ map_headers: { temperature: :temperature_c },
429
+ value_converters: { temperature: f_to_c },
430
+ }
431
+
432
+ SmarterCSV.generate('weather.csv', options) do |csv|
433
+ csv << { city: 'New York', temperature: 32 } # freezing
434
+ csv << { city: 'Phoenix', temperature: 104 } # hot
435
+ csv << { city: 'Paris', temperature: 68 }
436
+ end
437
+ # output:
438
+ # city,temperature_c
439
+ # New York,0.0
440
+ # Phoenix,40.0
441
+ # Paris,20.0
442
+ ```
443
+
444
+ The same pattern applies to any unit pair — kilometers to miles, kilograms to pounds,
445
+ meters per second to km/h, and so on:
446
+
447
+ ```ruby
448
+ miles_to_km = ->(v) { v ? (v * 1.60934).round(2) : nil }
449
+ lbs_to_kg = ->(v) { v ? (v * 0.453592).round(2) : nil }
450
+
451
+ options = {
452
+ map_headers: { distance: :distance_km, weight: :weight_kg },
453
+ value_converters: { distance: miles_to_km, weight: lbs_to_kg },
454
+ }
455
+
456
+ SmarterCSV.generate('measurements.csv', options) do |csv|
457
+ records.each { |r| csv << r }
458
+ end
459
+ ```
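Since the header rename and the converter must always travel together, one way to keep them from drifting apart is to declare each unit column once and derive both options from that single table. `UNIT_COLUMNS` and `unit_options` are hypothetical names. The sketch only builds the options hash and assumes, as in the examples above, that converters are keyed by the original (pre-rename) column name.

```ruby
# Hypothetical single source of truth: one entry per converted column.
UNIT_COLUMNS = {
  temperature: { header: :temperature_c, convert: ->(v) { v ? ((v - 32) * 5.0 / 9).round(1) : nil } },
  distance:    { header: :distance_km,   convert: ->(v) { v ? (v * 1.60934).round(2) : nil } },
  weight:      { header: :weight_kg,     convert: ->(v) { v ? (v * 0.453592).round(2) : nil } },
}.freeze

# Derive map_headers: and value_converters: from the same table,
# so a renamed header can never lose its converter (or vice versa).
def unit_options(columns)
  {
    map_headers: columns.transform_values { |c| c[:header] },
    value_converters: columns.transform_values { |c| c[:convert] },
  }
end

options = unit_options(UNIT_COLUMNS)
options[:map_headers]
# => { temperature: :temperature_c, distance: :distance_km, weight: :weight_kg }
options[:value_converters][:temperature].call(212) # => 100.0
```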
460
+
461
+ ## Handling Nil, Empty, and Missing Values
462
+
463
+ By default, both `nil` values and empty-string values are written as an empty field.
464
+ Use the `write_nil_value:` and `write_empty_value:` options to substitute a different string.
465
+
466
+ ### `write_nil_value`
467
+
468
+ Specifies the string written when a hash value is `nil`. Defaults to `''` (empty field).
469
+
470
+ ```ruby
471
+ SmarterCSV.generate('output.csv', write_nil_value: 'N/A') do |csv|
472
+ csv << { name: 'Alice', score: nil }
473
+ csv << { name: 'Bob', score: 42 }
474
+ end
475
+ # output:
476
+ # name,score
477
+ # Alice,N/A
478
+ # Bob,42
479
+ ```
480
+
481
+ ### `write_empty_value`
482
+
483
+ Specifies the string written when a hash value is an empty string `''`. Defaults to `''`.
484
+ This also applies to **missing keys**: if the row hash does not contain a key that appears
485
+ in the headers, the field defaults to `''` and `write_empty_value:` is substituted.
486
+
487
+ ```ruby
488
+ SmarterCSV.generate('output.csv', write_empty_value: 'EMPTY') do |csv|
489
+ csv << { name: 'Alice', city: '' } # explicit empty string
490
+ csv << { name: 'Bob' } # :city key missing entirely
491
+ end
492
+ # output:
493
+ # name,city
494
+ # Alice,EMPTY
495
+ # Bob,EMPTY
496
+ ```
497
+
498
+ ### Using both together
499
+
500
+ ```ruby
501
+ options = { write_nil_value: 'NULL', write_empty_value: '-' }
502
+ SmarterCSV.generate('output.csv', options) do |csv|
503
+ csv << { name: 'Alice', score: nil, city: '' }
504
+ end
505
+ # output:
506
+ # name,score,city
507
+ # Alice,NULL,-
508
+ ```
509
+
510
+ > **Note:** `write_nil_value:` is applied first. `write_empty_value:` only fires when the
511
+ > value is a non-nil empty string, so the two options are independent.
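The substitution rules above can be summarized as a small decision function. `render_field` is a hypothetical plain-Ruby model of the documented behavior, useful for reasoning about edge cases; it is not SmarterCSV's code and ignores quoting and value converters.

```ruby
# Hypothetical model of the documented nil / empty / missing-key rules.
def render_field(row, key, write_nil_value: '', write_empty_value: '')
  return write_empty_value unless row.key?(key) # missing key: treated as ''
  value = row[key]
  return write_nil_value if value.nil?          # nil is checked first
  return write_empty_value if value == ''       # only a real empty string
  value.to_s
end

headers = [:name, :score, :city]
opts = { write_nil_value: 'NULL', write_empty_value: '-' }

headers.map { |h| render_field({ name: 'Alice', score: nil, city: '' }, h, **opts) }
# => ["Alice", "NULL", "-"]
headers.map { |h| render_field({ name: 'Bob' }, h, **opts) }
# => ["Bob", "-", "-"]
```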
512
+
513
+ ## File Encoding and BOM
514
+
515
+ ### `encoding`
516
+
517
+ Specifies the encoding used when opening the output file. Only applies when writing to a
518
+ file path or `Pathname`; ignored when an IO object is passed in. Defaults to the system
519
+ encoding.
520
+
521
+ **Simple encoding** — sets the external (file) encoding:
522
+
523
+ ```ruby
524
+ SmarterCSV.generate('output.csv', encoding: 'UTF-8') do |csv|
525
+ csv << { city: 'Ångström', country: 'Sweden' }
526
+ end
140
527
  ```
141
- options = {
142
- value_converters: {
143
- disable_auto_quoting: true, # ⚠️ Important: turn off auto-quoting because we're messing with it below
144
- active: ->(v) { !!v ? 'YES' : 'NO' },
145
- _all: ->(_k, v) { v.is_a?(String) ? "\"#{v}\"" : v } # only double-quote string fields
146
- }
147
- }
528
+
529
+ **Transcoding** — use `'external:internal'` notation to automatically transcode from your
530
+ Ruby strings' encoding to the target file encoding. This is Ruby's standard
531
+ `File.open` encoding syntax:
532
+
533
+ ```ruby
534
+ # Ruby strings are UTF-8; write a Windows-1252 file for legacy consumers.
535
+ # Ruby will transcode each string automatically on write.
536
+ SmarterCSV.generate('output.csv', encoding: 'Windows-1252:UTF-8') do |csv|
537
+ records.each { |r| csv << r }
538
+ end
148
539
  ```
149
540
 
150
- Using the `:_all` keyword, you can set up rules to convert all hash keys. This is applied after all per-key conversions are made.
541
+ ```ruby
542
+ # Transcode UTF-8 strings into ISO-8859-1
543
+ SmarterCSV.generate('output.csv', encoding: 'ISO-8859-1:UTF-8') do |csv|
544
+ records.each { |r| csv << r }
545
+ end
546
+ ```
151
547
 
152
- This example puts double-quotes around all String-value data, but leaves other types unchanged.
548
+ > **Note:** Transcoding raises `Encoding::UndefinedConversionError` if a character in your
549
+ > data cannot be represented in the target encoding (e.g. a Chinese character written to
550
+ > ISO-8859-1). Handle this with a value converter if you need lossy substitution.
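A minimal sketch of such a converter, assuming `?` is an acceptable placeholder: it round-trips each String through ISO-8859-1 with `String#encode`'s `undef: :replace` option. Characters that cannot be represented become `?`, while the value stays a UTF-8 string for the Writer to transcode. The name `latin1_safe` is hypothetical.

```ruby
# Hypothetical :_all converter: replace characters that ISO-8859-1 cannot
# represent with '?', but keep the result as a UTF-8 String.
latin1_safe = lambda do |_key, value|
  next value unless value.is_a?(String)

  value.encode('ISO-8859-1', invalid: :replace, undef: :replace, replace: '?')
       .encode('UTF-8')
end

latin1_safe.call(:city, 'Ångström') # => "Ångström" (representable, unchanged)
latin1_safe.call(:city, '東京')      # => "??"
latin1_safe.call(:count, 42)        # => 42 (non-strings pass through)

# Usage, assuming the options shown above:
# SmarterCSV.generate('output.csv', encoding: 'ISO-8859-1:UTF-8',
#                     value_converters: { _all: latin1_safe }) { |csv| ... }
```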
153
551
 
154
- Note that when you're customizing putting quote-chars around fields, you need to `disable_auto_quoting`.
552
+ ### `write_bom`
553
+
554
+ When `true`, prepends a UTF-8 BOM (`\xEF\xBB\xBF`) to the very beginning of the output.
555
+ Defaults to `false`.
556
+
557
+ A BOM is useful when the CSV will be opened in **Microsoft Excel**, which uses the BOM as a
558
+ signal to interpret the file as UTF-8 rather than the system code page. Without a BOM, Excel
559
+ may display accented characters and non-Latin scripts as garbage.
560
+
561
+ ```ruby
562
+ SmarterCSV.generate('export_for_excel.csv', encoding: 'UTF-8', write_bom: true) do |csv|
563
+ csv << { name: 'Ångström', value: 99 }
564
+ end
565
+ # The file begins with 0xEF 0xBB 0xBF followed by the header line.
566
+ ```
567
+
568
+ > **Note:** Only use `write_bom: true` with UTF-8 output. Adding a UTF-8 BOM to a
569
+ > non-UTF-8 file will corrupt it.
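To confirm that an export really starts with the BOM, or to read such a file back without the BOM ending up in the first header name, plain Ruby is enough. The sketch below writes a stand-in file by hand (it does not call SmarterCSV) and reads it with Ruby's `bom|utf-8` mode, which strips a leading BOM if one is present. `utf8_bom?` is a hypothetical helper.

```ruby
require 'tmpdir'

UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: true if the file starts with the 3-byte UTF-8 BOM.
def utf8_bom?(path)
  File.binread(path, 3) == UTF8_BOM
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'export_for_excel.csv')
  File.binwrite(path, UTF8_BOM + "name,value\n".b) # stand-in for a write_bom: true export

  utf8_bom?(path)                       # => true
  File.read(path, mode: 'r:bom|utf-8')  # => "name,value\n" (BOM stripped on read)
end
```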
155
570
 
156
571
  ## More Examples
157
572
 
158
573
  Check out the [RSpec tests](../spec/smarter_csv/writer_spec.rb) for more examples.
159
574
 
160
575
  ----------------
161
- PREVIOUS: [The Basic Read API](./basic_read_api.md) | NEXT: [Batch Processing](./batch_processing.md)
576
+
577
+ PREVIOUS: [The Basic Read API](./basic_read_api.md) | NEXT: [Batch Processing](./batch_processing.md) | UP: [README](../README.md)