smarter_csv 1.15.2 → 1.16.0
- checksums.yaml +4 -4
- data/.rubocop.yml +9 -0
- data/CHANGELOG.md +68 -1
- data/CONTRIBUTORS.md +3 -1
- data/Gemfile +1 -0
- data/README.md +123 -27
- data/docs/_introduction.md +40 -24
- data/docs/bad_row_quarantine.md +285 -0
- data/docs/basic_read_api.md +151 -9
- data/docs/basic_write_api.md +474 -59
- data/docs/batch_processing.md +161 -4
- data/docs/column_selection.md +183 -0
- data/docs/data_transformations.md +162 -29
- data/docs/examples.md +339 -46
- data/docs/header_transformations.md +93 -12
- data/docs/header_validations.md +56 -18
- data/docs/history.md +117 -0
- data/docs/instrumentation.md +165 -0
- data/docs/migrating_from_csv.md +290 -0
- data/docs/options.md +150 -87
- data/docs/parsing_strategy.md +63 -1
- data/docs/real_world_csv.md +262 -0
- data/docs/releases/1.16.0/benchmarks.md +223 -0
- data/docs/releases/1.16.0/changes.md +272 -0
- data/docs/releases/1.16.0/performance_notes.md +114 -0
- data/docs/row_col_sep.md +14 -5
- data/docs/value_converters.md +193 -57
- data/ext/smarter_csv/extconf.rb +3 -0
- data/ext/smarter_csv/smarter_csv.c +1007 -71
- data/images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.png +0 -0
- data/images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.svg +108 -0
- data/images/SmarterCSV_1.16.0_vs_previous_C-speedup.png +0 -0
- data/images/SmarterCSV_1.16.0_vs_previous_C-speedup.svg +141 -0
- data/images/SmarterCSV_1.16.0_vs_previous_Rb-speedup.png +0 -0
- data/images/SmarterCSV_1.16.0_vs_previous_Rb-speedup.svg +139 -0
- data/lib/smarter_csv/errors.rb +8 -0
- data/lib/smarter_csv/file_io.rb +1 -1
- data/lib/smarter_csv/hash_transformations.rb +14 -13
- data/lib/smarter_csv/header_transformations.rb +21 -2
- data/lib/smarter_csv/headers.rb +2 -1
- data/lib/smarter_csv/options.rb +124 -7
- data/lib/smarter_csv/parser.rb +362 -75
- data/lib/smarter_csv/reader.rb +494 -46
- data/lib/smarter_csv/version.rb +1 -1
- data/lib/smarter_csv/writer.rb +71 -19
- data/lib/smarter_csv.rb +95 -12
- data/smarter_csv.gemspec +20 -10
- metadata +37 -80
data/docs/basic_write_api.md
CHANGED
|
@@ -2,6 +2,7 @@
|
|
|
2
2
|
### Contents
|
|
3
3
|
|
|
4
4
|
* [Introduction](./_introduction.md)
|
|
5
|
+
* [Migrating from Ruby CSV](./migrating_from_csv.md)
|
|
5
6
|
* [Parsing Strategy](./parsing_strategy.md)
|
|
6
7
|
* [The Basic Read API](./basic_read_api.md)
|
|
7
8
|
* [**The Basic Write API**](./basic_write_api.md)
|
|
@@ -10,10 +11,17 @@
|
|
|
10
11
|
* [Row and Column Separators](./row_col_sep.md)
|
|
11
12
|
* [Header Transformations](./header_transformations.md)
|
|
12
13
|
* [Header Validations](./header_validations.md)
|
|
14
|
+
* [Column Selection](./column_selection.md)
|
|
13
15
|
* [Data Transformations](./data_transformations.md)
|
|
14
16
|
* [Value Converters](./value_converters.md)
|
|
15
|
-
|
|
16
|
-
|
|
17
|
+
* [Bad Row Quarantine](./bad_row_quarantine.md)
|
|
18
|
+
* [Instrumentation Hooks](./instrumentation.md)
|
|
19
|
+
* [Examples](./examples.md)
|
|
20
|
+
* [Real-World CSV Files](./real_world_csv.md)
|
|
21
|
+
* [SmarterCSV over the Years](./history.md)
|
|
22
|
+
* [Release Notes](./releases/1.16.0/changes.md)
|
|
23
|
+
|
|
24
|
+
--------------
|
|
17
25
|
|
|
18
26
|
# SmarterCSV Basic Write API
|
|
19
27
|
|
|
@@ -25,6 +33,72 @@ To generate a CSV file, we use the `<<` operator to append new data to the file.
|
|
|
25
33
|
|
|
26
34
|
The `<<` operator, which appends data to a CSV file, accepts single hashes, arrays of hashes, or arrays of arrays of hashes, and can be called one or more times to create a file.
|
|
27
35
|
|
|
36
|
+
### Hashes, Not Arrays — and Why It Matters for Data Integrity
|
|
37
|
+
|
|
38
|
+
Ruby's `CSV` library lets you write raw arrays: `csv << ["Alice", 30, "NYC"]`. SmarterCSV
|
|
39
|
+
deliberately does not support this, because positional array writing is an open invitation
|
|
40
|
+
to silent data corruption.
|
|
41
|
+
|
|
42
|
+
Consider what happens when a column is added:
|
|
43
|
+
|
|
44
|
+
```ruby
|
|
45
|
+
# Originally:
|
|
46
|
+
headers = [:name, :age, :city]
|
|
47
|
+
|
|
48
|
+
# Later, a column is inserted:
|
|
49
|
+
headers = [:name, :age, :country, :city]
|
|
50
|
+
|
|
51
|
+
# But the array rows were never updated:
|
|
52
|
+
csv << ["Alice", 30, "NYC"] # "NYC" now lands under :country, not :city
|
|
53
|
+
csv << ["Bob", 25, "London"] # same silent mis-alignment
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
There is no error. The CSV looks valid. The data is wrong. This class of bug — a silent off-by-one column mis-alignment — is completely undetectable from the output file alone.
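
The failure mode can be reproduced without any CSV library. The following is a minimal sketch in plain Ruby (it does not call SmarterCSV, and the variable names are illustrative only). It pairs a stale three-value row with the four-column header list, then contrasts that with a key-based lookup.

```ruby
# Plain-Ruby illustration; no CSV library involved.
headers = [:name, :age, :country, :city]   # :country was inserted later

# Positional pairing: the stale row silently shifts under the new headers.
stale_row  = ["Alice", 30, "NYC"]
positional = headers.zip(stale_row).to_h
# "NYC" is now paired with :country, and :city is nil.

# Key-based pairing: each value stays with its own column name.
keyed  = { name: "Alice", age: 30, city: "NYC" }
by_key = headers.to_h { |h| [h, keyed[h]] }
# :city is still "NYC"; the new :country column is simply nil.
```

Neither pairing raises an error, but only the key-based one keeps "NYC" in the city column.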
|
|
57
|
+
|
|
58
|
+
SmarterCSV avoids this entirely by requiring hashes, where every value is explicitly bound to its column name:
|
|
59
|
+
|
|
60
|
+
```ruby
|
|
61
|
+
csv << { name: 'Alice', age: 30, city: 'NYC' }
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
Adding or reordering columns cannot silently shift values. A missing key produces an empty
|
|
65
|
+
field in the correct column. The mapping is always explicit.
|
|
66
|
+
|
|
67
|
+
**Providing `headers:` enforces column order.** When you pass `headers:`, the Writer always
|
|
68
|
+
outputs columns in exactly that order — regardless of the order keys appear in the hash.
|
|
69
|
+
This is the right tool when column order matters:
|
|
70
|
+
|
|
71
|
+
```ruby
|
|
72
|
+
options = { headers: [:country, :city, :name, :age] }
|
|
73
|
+
|
|
74
|
+
SmarterCSV.generate('output.csv', options) do |csv|
|
|
75
|
+
# Hash key order is irrelevant — output follows the headers order
|
|
76
|
+
csv << { name: 'Alice', age: 30, city: 'NYC', country: 'USA' }
|
|
77
|
+
csv << { name: 'Bob', age: 25, city: 'London', country: 'UK' }
|
|
78
|
+
end
|
|
79
|
+
|
|
80
|
+
# output:
|
|
81
|
+
# country,city,name,age
|
|
82
|
+
# USA,NYC,Alice,30
|
|
83
|
+
# UK,London,Bob,25
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
This is the correct way to write CSV when column order matters: declare the headers
|
|
87
|
+
explicitly and let the Writer enforce them. No positional assumptions, no off-by-one risk.
|
|
88
|
+
|
|
89
|
+
If you already have data in arrays, convert to hashes first using your headers as keys.
|
|
90
|
+
This forces the key-to-column mapping to be explicit and visible at the one place where
|
|
91
|
+
it can actually be verified — instead of being implicit in the position of every value:
|
|
92
|
+
|
|
93
|
+
```ruby
|
|
94
|
+
headers = [:name, :age, :city]
|
|
95
|
+
rows = [["Alice", 30, "NYC"], ["Bob", 25, "London"]]
|
|
96
|
+
|
|
97
|
+
csv_string = SmarterCSV.generate do |csv|
|
|
98
|
+
rows.each { |row| csv << headers.zip(row).to_h }
|
|
99
|
+
end
|
|
100
|
+
```
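
One caveat with `zip`: a row that is shorter than `headers` is padded with `nil`, and surplus values in a longer row are dropped, both without warning. A possible guard is sketched below in plain Ruby. The helper name `rows_to_hashes` is hypothetical and not part of SmarterCSV; it simply checks the length before converting.

```ruby
# Hypothetical helper, not part of SmarterCSV: convert positional rows to
# hashes, refusing any row whose length does not match the headers.
def rows_to_hashes(headers, rows)
  rows.each_with_index.map do |row, index|
    unless row.size == headers.size
      raise ArgumentError,
            "row #{index} has #{row.size} values, expected #{headers.size}"
    end
    headers.zip(row).to_h
  end
end

headers = [:name, :age, :city]
rows    = [["Alice", 30, "NYC"], ["Bob", 25, "London"]]
hashes  = rows_to_hashes(headers, rows)
```

Each resulting hash can then be appended with `csv << hash`, as in the example above.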
|
|
101
|
+
|
|
28
102
|
### Auto-Discovery of Headers
|
|
29
103
|
|
|
30
104
|
By default, the `SmarterCSV::Writer` discovers all keys that are present in the input data and, as they become known, appends them to the CSV headers. This ensures that all data is included in the output CSV file.
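
As a rough illustration of what header discovery means, the sketch below computes the union of keys over a small data set in plain Ruby. It is a simplified model under the assumption that all rows are available up front, not the Writer's actual implementation, which discovers keys as rows arrive.

```ruby
rows = [
  { name: 'Alice', age: 30 },
  { name: 'Bob', city: 'London' },   # introduces a new key, :city
]

# Union of all keys, in order of first appearance.
discovered_headers = rows.flat_map(&:keys).uniq

# Every row is laid out against the full header list; absent keys become ''.
table = rows.map { |row| discovered_headers.map { |h| row.fetch(h, '') } }
```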
|
|
@@ -46,32 +120,90 @@ In either case the corresponding field will be put in double-quotes.
|
|
|
46
120
|
|
|
47
121
|
### Simplified Interface
|
|
48
122
|
|
|
49
|
-
The simplified interface takes a block:
|
|
123
|
+
The simplified interface takes a block. The first argument can be:
|
|
124
|
+
|
|
125
|
+
* **Omitted** — SmarterCSV writes to an internal `StringIO` and returns the CSV as a `String`.
|
|
126
|
+
* A **`String`** path — SmarterCSV opens the file and closes it when done.
|
|
127
|
+
* A **`Pathname`** (or any object responding to `#to_path`) — treated the same as a String path.
|
|
128
|
+
* Any **IO-like object** responding to `#write` (e.g. `StringIO`, an open `File` handle, a
|
|
129
|
+
socket) — SmarterCSV writes to it but does **not** close it; the caller retains ownership.
|
|
50
130
|
|
|
51
|
-
|
|
52
|
-
SmarterCSV.generate(filename, options) do |csv_writer|
|
|
131
|
+
Passing anything else raises `ArgumentError` immediately.
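
The dispatch rules listed above could be modeled roughly as follows. This is a sketch of the documented behavior, not SmarterCSV's source; `classify_destination` and the returned symbols are made up for illustration. The order of checks matters in this sketch, because an open `File` responds to both `#to_path` and `#write`, and `Pathname` has a `#write` method of its own.

```ruby
require 'pathname'
require 'stringio'

# Hypothetical helper: decide how a destination argument would be treated.
def classify_destination(dest)
  if dest.nil?
    :string_buffer          # nothing given: build and return a String
  elsif dest.is_a?(String)
    :path                   # open the file, close it when done
  elsif dest.is_a?(IO)
    :io                     # open File handle, socket: caller closes it
  elsif dest.respond_to?(:to_path)
    :path                   # Pathname and similar path-like objects
  elsif dest.respond_to?(:write)
    :io                     # StringIO and other IO-like objects
  else
    raise ArgumentError, "unsupported destination: #{dest.class}"
  end
end

kinds = [nil, 'out.csv', Pathname('out.csv'), StringIO.new].map do |dest|
  classify_destination(dest)
end
```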
|
|
53
132
|
|
|
54
|
-
|
|
55
|
-
batch.pluck(:name, :description, :instructor).each do |record|
|
|
56
|
-
csv_writer << record
|
|
57
|
-
end
|
|
58
|
-
end
|
|
133
|
+
**Generate a CSV String directly (no file argument):**
|
|
59
134
|
|
|
60
|
-
|
|
61
|
-
|
|
135
|
+
```ruby
|
|
136
|
+
csv_string = SmarterCSV.generate do |csv|
|
|
137
|
+
csv << { name: 'Alice', age: 30 }
|
|
138
|
+
csv << { name: 'Bob', age: 25 }
|
|
139
|
+
end
|
|
140
|
+
# => "name,age\nAlice,30\nBob,25\n"
|
|
141
|
+
```
|
|
142
|
+
|
|
143
|
+
Options can be passed as the first argument when no destination is given:
|
|
144
|
+
|
|
145
|
+
```ruby
|
|
146
|
+
csv_string = SmarterCSV.generate(col_sep: ';', row_sep: "\r\n") do |csv|
|
|
147
|
+
records.each { |r| csv << r }
|
|
148
|
+
end
|
|
149
|
+
```
|
|
150
|
+
|
|
151
|
+
**Write to a file by path:**
|
|
152
|
+
|
|
153
|
+
```ruby
|
|
154
|
+
SmarterCSV.generate('output.csv', options) do |csv|
|
|
155
|
+
MyModel.find_in_batches(batch_size: 100) do |batch|
|
|
156
|
+
batch.each { |record| csv << record.attributes }
|
|
157
|
+
end
|
|
158
|
+
end
|
|
159
|
+
```
|
|
160
|
+
|
|
161
|
+
**Write to a file using a `Pathname`:**
|
|
162
|
+
|
|
163
|
+
```ruby
|
|
164
|
+
require 'pathname'
|
|
165
|
+
SmarterCSV.generate(Pathname('output.csv'), options) do |csv|
|
|
166
|
+
records.each { |r| csv << r }
|
|
167
|
+
end
|
|
168
|
+
```
|
|
169
|
+
|
|
170
|
+
**Write to a `StringIO` (e.g. for Rails streaming responses):**
|
|
171
|
+
|
|
172
|
+
```ruby
|
|
173
|
+
io = StringIO.new
|
|
174
|
+
SmarterCSV.generate(io) do |csv|
|
|
175
|
+
records.each { |r| csv << r }
|
|
176
|
+
end
|
|
177
|
+
send_data io.string, type: 'text/csv', filename: 'export.csv'
|
|
178
|
+
```
|
|
179
|
+
|
|
180
|
+
**Write to an already-open file handle:**
|
|
181
|
+
|
|
182
|
+
```ruby
|
|
183
|
+
File.open('output.csv', 'w') do |f|
|
|
184
|
+
SmarterCSV.generate(f) do |csv|
|
|
185
|
+
records.each { |r| csv << r }
|
|
186
|
+
end
|
|
187
|
+
end
|
|
188
|
+
```
|
|
62
189
|
|
|
63
190
|
### Full Interface
|
|
64
191
|
|
|
65
|
-
|
|
66
|
-
|
|
192
|
+
The full interface gives you direct access to the `Writer` instance, which is useful when you
|
|
193
|
+
need to call `finalize` explicitly or inspect the writer's state afterwards.
|
|
67
194
|
|
|
68
|
-
|
|
69
|
-
|
|
70
|
-
csv_writer << record
|
|
71
|
-
end
|
|
195
|
+
```ruby
|
|
196
|
+
csv_writer = SmarterCSV::Writer.new(file_path_or_io, options)
|
|
72
197
|
|
|
73
|
-
|
|
74
|
-
|
|
198
|
+
MyModel.find_in_batches(batch_size: 100) do |batch|
|
|
199
|
+
batch.each { |record| csv_writer << record.attributes }
|
|
200
|
+
end
|
|
201
|
+
|
|
202
|
+
csv_writer.finalize
|
|
203
|
+
```
|
|
204
|
+
|
|
205
|
+
The full interface accepts the same argument types as the simplified interface: a String path,
|
|
206
|
+
a `Pathname`, or any IO-like object responding to `#write`.
|
|
75
207
|
|
|
76
208
|
## Advanced Features: Customizing the Output Format
|
|
77
209
|
|
|
@@ -95,67 +227,350 @@ Similar to the `headers` option, you can define `map_headers` in order to rename
|
|
|
95
227
|
|
|
96
228
|
### Per Key Value Converters
|
|
97
229
|
|
|
230
|
+
Using per-key value converters, you can control how specific hash keys in your data are
|
|
231
|
+
serialized in the output. Each converter is a lambda that receives the field value and
|
|
232
|
+
returns the string to write.
|
|
233
|
+
|
|
234
|
+
**Boolean to string:**
|
|
235
|
+
|
|
236
|
+
```ruby
|
|
237
|
+
SmarterCSV.generate('output.csv', value_converters: { active: ->(v) { v ? 'YES' : 'NO' } }) do |csv|
|
|
238
|
+
csv << { name: 'Alice', active: true }
|
|
239
|
+
csv << { name: 'Bob', active: false }
|
|
240
|
+
end
|
|
241
|
+
# output:
|
|
242
|
+
# name,active
|
|
243
|
+
# Alice,YES
|
|
244
|
+
# Bob,NO
|
|
245
|
+
```
|
|
98
246
|
|
|
99
|
-
|
|
247
|
+
**Date/Time formatting:**
|
|
100
248
|
|
|
101
|
-
|
|
249
|
+
```ruby
|
|
250
|
+
SmarterCSV.generate('output.csv', value_converters: { created_at: ->(v) { v&.strftime('%Y-%m-%d') } }) do |csv|
|
|
251
|
+
csv << { name: 'Alice', created_at: Time.now }
|
|
252
|
+
end
|
|
253
|
+
# output:
|
|
254
|
+
# name,created_at
|
|
255
|
+
# Alice,2026-03-09
|
|
256
|
+
```
|
|
102
257
|
|
|
258
|
+
**Numeric formatting:**
|
|
259
|
+
|
|
260
|
+
```ruby
|
|
261
|
+
balance_converter = ->(v) do
|
|
262
|
+
case v
|
|
263
|
+
when Float then '$%.2f' % v.round(2)
|
|
264
|
+
when Integer then "$#{v}"
|
|
265
|
+
else v.to_s
|
|
266
|
+
end
|
|
267
|
+
end
|
|
268
|
+
|
|
269
|
+
SmarterCSV.generate('output.csv', value_converters: { balance: balance_converter }) do |csv|
|
|
270
|
+
csv << { name: 'Alice', balance: 1234.5 }
|
|
271
|
+
csv << { name: 'Bob', balance: 500 }
|
|
272
|
+
end
|
|
273
|
+
# output:
|
|
274
|
+
# name,balance
|
|
275
|
+
# Alice,$1234.50
|
|
276
|
+
# Bob,$500
|
|
103
277
|
```
|
|
104
|
-
|
|
105
|
-
|
|
106
|
-
|
|
107
|
-
|
|
108
|
-
|
|
278
|
+
|
|
279
|
+
**Reusing the same converter across multiple keys:**
|
|
280
|
+
|
|
281
|
+
```ruby
|
|
282
|
+
date_converter = ->(v) { v&.strftime('%Y-%m-%d') }
|
|
283
|
+
|
|
284
|
+
SmarterCSV.generate('output.csv', value_converters: { created_at: date_converter, updated_at: date_converter }) do |csv|
|
|
285
|
+
csv << { name: 'Alice', created_at: Time.now, updated_at: Time.now }
|
|
286
|
+
end
|
|
109
287
|
```
|
|
110
288
|
|
|
111
|
-
|
|
289
|
+
### Global Value Converters
|
|
290
|
+
|
|
291
|
+
The special key `:_all` defines a transformation applied to every field, after any
|
|
292
|
+
per-key converters have run. It receives both the key and the value.
|
|
112
293
|
|
|
113
|
-
|
|
294
|
+
**Stripping whitespace from all string fields:**
|
|
114
295
|
|
|
296
|
+
```ruby
|
|
297
|
+
SmarterCSV.generate('output.csv', value_converters: { _all: ->(_k, v) { v.is_a?(String) ? v.strip : v } }) do |csv|
|
|
298
|
+
csv << { name: ' Alice ', city: ' NYC ' }
|
|
299
|
+
end
|
|
300
|
+
# output:
|
|
301
|
+
# name,city
|
|
302
|
+
# Alice,NYC
|
|
115
303
|
```
|
|
116
|
-
|
|
117
|
-
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
|
|
121
|
-
|
|
122
|
-
|
|
123
|
-
|
|
124
|
-
|
|
125
|
-
|
|
126
|
-
|
|
127
|
-
|
|
128
|
-
|
|
129
|
-
|
|
130
|
-
|
|
304
|
+
|
|
305
|
+
**Combining per-key and global converters** — per-key runs first, `:_all` runs after:
|
|
306
|
+
|
|
307
|
+
```ruby
|
|
308
|
+
options = {
|
|
309
|
+
value_converters: {
|
|
310
|
+
active: ->(v) { v ? 'YES' : 'NO' },
|
|
311
|
+
_all: ->(_k, v) { v.to_s.upcase },
|
|
312
|
+
}
|
|
313
|
+
}
|
|
314
|
+
|
|
315
|
+
SmarterCSV.generate('output.csv', options) do |csv|
|
|
316
|
+
csv << { name: 'Alice', city: 'nyc', active: true }
|
|
317
|
+
end
|
|
318
|
+
# output:
|
|
319
|
+
# name,city,active
|
|
320
|
+
# ALICE,NYC,YES
|
|
131
321
|
```
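
The ordering can be modeled in a few lines of plain Ruby. This is a sketch of the documented behavior (per-key converter first, then `:_all`), not the Writer's actual code; `apply_converters` is a hypothetical name.

```ruby
converters = {
  active: ->(v) { v ? 'YES' : 'NO' },
  _all: ->(_k, v) { v.to_s.upcase },
}

# Hypothetical model: run the per-key converter (if any), then :_all.
def apply_converters(row, converters)
  global = converters[:_all]
  row.to_h do |key, value|
    per_key = converters[key] unless key == :_all
    value = per_key.call(value) if per_key
    value = global.call(key, value) if global
    [key, value]
  end
end

converted = apply_converters({ name: 'Alice', city: 'nyc', active: true }, converters)
```

Here `active: true` first becomes `'YES'` and is then upcased by `:_all`, while `name` and `city` only pass through `:_all`.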
|
|
132
322
|
|
|
133
|
-
|
|
134
|
-
|
|
323
|
+
**Custom quoting with `:_all`** — when taking manual control of quoting, disable
|
|
324
|
+
auto-quoting to avoid double-quoting:
|
|
325
|
+
|
|
326
|
+
```ruby
|
|
327
|
+
options = {
|
|
328
|
+
disable_auto_quoting: true,
|
|
329
|
+
value_converters: {
|
|
330
|
+
active: ->(v) { v ? 'YES' : 'NO' },
|
|
331
|
+
_all: ->(_k, v) { v.is_a?(String) ? "\"#{v}\"" : v },
|
|
332
|
+
}
|
|
333
|
+
}
|
|
334
|
+
```
|
|
135
335
|
|
|
136
|
-
|
|
336
|
+
> **Note:** `disable_auto_quoting: true` is a top-level option, not part of
|
|
337
|
+
> `value_converters:`. Only disable it when you are taking full control of quoting yourself.
|
|
137
338
|
|
|
138
|
-
|
|
339
|
+
## Serializing Dates, Money, and Units
|
|
139
340
|
|
|
341
|
+
Ruby's default `to_s` is often not enough when writing dates, monetary values, or measured
|
|
342
|
+
quantities to CSV. The target format depends on your consumer — a downstream system, a
|
|
343
|
+
locale, or a spreadsheet audience. Use `value_converters:` to take explicit control.
|
|
344
|
+
|
|
345
|
+
### Dates and Times
|
|
346
|
+
|
|
347
|
+
`Date#to_s` produces ISO 8601 (`2026-03-09`), which is unambiguous and safe as a default.
|
|
348
|
+
Use a converter when you need a different format:
|
|
349
|
+
|
|
350
|
+
```ruby
|
|
351
|
+
# ISO 8601 (default to_s — shown for clarity)
|
|
352
|
+
iso = ->(v) { v&.strftime('%Y-%m-%d') }
|
|
353
|
+
|
|
354
|
+
# US format: MM/DD/YYYY
|
|
355
|
+
us = ->(v) { v&.strftime('%m/%d/%Y') }
|
|
356
|
+
|
|
357
|
+
# European format: DD.MM.YYYY
|
|
358
|
+
eu = ->(v) { v&.strftime('%d.%m.%Y') }
|
|
359
|
+
|
|
360
|
+
# Human-readable with time
|
|
361
|
+
full = ->(v) { v&.strftime('%d %b %Y %H:%M') }
|
|
362
|
+
|
|
363
|
+
SmarterCSV.generate('output.csv', value_converters: { issued_on: eu, expires_at: full }) do |csv|
|
|
364
|
+
csv << { name: 'Alice', issued_on: Date.new(2026, 3, 9), expires_at: Time.now }
|
|
365
|
+
end
|
|
366
|
+
# output:
|
|
367
|
+
# name,issued_on,expires_at
|
|
368
|
+
# Alice,09.03.2026,09 Mar 2026 14:32
|
|
369
|
+
```
|
|
370
|
+
|
|
371
|
+
The `&.` safe-navigation operator ensures a `nil` date field produces an empty cell
|
|
372
|
+
rather than raising `NoMethodError`.
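
A quick check of that behavior in plain Ruby, independent of SmarterCSV. The sketch only shows what the lambda itself returns; how a `nil` result is finally rendered is up to the Writer.

```ruby
require 'date'

eu = ->(v) { v&.strftime('%d.%m.%Y') }

formatted = eu.call(Date.new(2026, 3, 9))   # a real date is formatted
missing   = eu.call(nil)                     # nil short-circuits to nil
```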
|
|
373
|
+
|
|
374
|
+
### Money
|
|
375
|
+
|
|
376
|
+
`Money#to_s` (from the [`money`](https://github.com/RubyMoney/money) gem) returns the
|
|
377
|
+
fractional amount as a string (e.g. `"4450"` for $44.50 stored in cents) — almost never
|
|
378
|
+
what a CSV consumer expects. Always use an explicit converter:
|
|
379
|
+
|
|
380
|
+
```ruby
|
|
381
|
+
# Raw decimal amount — most portable, easy to re-import
|
|
382
|
+
amount_only = ->(v) { v&.to_d&.to_s } # "44.50"
|
|
383
|
+
|
|
384
|
+
# With currency symbol — for human-readable exports
|
|
385
|
+
with_symbol = ->(v) { v ? v.format : nil } # "$44.50", "€44,50" (locale-aware via money gem)
|
|
386
|
+
|
|
387
|
+
# Amount + currency code — for multi-currency files
|
|
388
|
+
with_code = ->(v) { v ? "#{v.currency.iso_code} #{v.to_d}" : nil } # "USD 44.50", "EUR 12.00"
|
|
389
|
+
```
|
|
390
|
+
|
|
391
|
+
Choose the right format for your consumer:
|
|
392
|
+
|
|
393
|
+
```ruby
|
|
394
|
+
# Single-currency export (e.g. internal finance tool)
|
|
395
|
+
SmarterCSV.generate('export.csv', value_converters: { price: amount_only, tax: amount_only }) do |csv|
|
|
396
|
+
records.each { |r| csv << r }
|
|
397
|
+
end
|
|
398
|
+
|
|
399
|
+
# Multi-currency export (e.g. cross-border invoicing)
|
|
400
|
+
SmarterCSV.generate('export.csv', value_converters: { price: with_code, tax: with_code }) do |csv|
|
|
401
|
+
records.each { |r| csv << r }
|
|
402
|
+
end
|
|
403
|
+
```
|
|
404
|
+
|
|
405
|
+
> **Tip:** for re-importable CSV files, prefer `amount_only` — a bare decimal is
|
|
406
|
+
> unambiguous and can be parsed back without stripping symbols or handling locale-specific
|
|
407
|
+
> separators. Reserve `with_symbol` for human-readable exports that will not be re-parsed.
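
If amounts are held as plain integer cents rather than `Money` objects (an assumption made only for this sketch; the helper name `cents_to_decimal` is hypothetical), a bare decimal string can be produced without going through floating point:

```ruby
# Hypothetical helper: format integer cents as a bare decimal string.
def cents_to_decimal(cents)
  return nil if cents.nil?

  sign = cents.negative? ? '-' : ''
  units, remainder = cents.abs.divmod(100)
  format('%s%d.%02d', sign, units, remainder)
end

price  = cents_to_decimal(4450)
refund = cents_to_decimal(-5)
```

A lambda such as `->(v) { cents_to_decimal(v) }` could then serve as the converter for a price column.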
|
|
408
|
+
|
|
409
|
+
### Unit Conversions
|
|
410
|
+
|
|
411
|
+
Value converters are not limited to formatting — they can perform any transformation,
|
|
412
|
+
including unit conversions. A common case is exporting sensor or weather data that is
|
|
413
|
+
stored internally in one unit but must be delivered in another.
|
|
414
|
+
|
|
415
|
+
Notice how `map_headers:` and `value_converters:` work together as two sides of the same
|
|
416
|
+
coin: the converter transforms the data into the target unit, and the renamed header tells
|
|
417
|
+
the reader exactly what unit they are looking at. Neither is useful without the other —
|
|
418
|
+
correct data with a misleading header is just as wrong as a correct header with unconverted
|
|
419
|
+
data.
|
|
420
|
+
|
|
421
|
+
**Fahrenheit to Celsius:**
|
|
422
|
+
|
|
423
|
+
```ruby
|
|
424
|
+
f_to_c = ->(v) { v ? ((v - 32) * 5.0 / 9).round(1) : nil }
|
|
425
|
+
|
|
426
|
+
options = {
|
|
427
|
+
map_headers: { temperature: :temperature_c },
|
|
428
|
+
value_converters: { temperature: f_to_c },
|
|
429
|
+
}
|
|
430
|
+
|
|
431
|
+
SmarterCSV.generate('weather.csv', options) do |csv|
|
|
432
|
+
csv << { city: 'New York', temperature: 32 } # freezing
|
|
433
|
+
csv << { city: 'Phoenix', temperature: 104 } # hot
|
|
434
|
+
csv << { city: 'Paris', temperature: 68 }
|
|
435
|
+
end
|
|
436
|
+
# output:
|
|
437
|
+
# city,temperature_c
|
|
438
|
+
# New York,0.0
|
|
439
|
+
# Phoenix,40.0
|
|
440
|
+
# Paris,20.0
|
|
441
|
+
```
|
|
442
|
+
|
|
443
|
+
The same pattern applies to any unit pair — kilometers to miles, kilograms to pounds,
|
|
444
|
+
meters per second to km/h, and so on:
|
|
445
|
+
|
|
446
|
+
```ruby
|
|
447
|
+
miles_to_km = ->(v) { v ? (v * 1.60934).round(2) : nil }
|
|
448
|
+
lbs_to_kg = ->(v) { v ? (v * 0.453592).round(2) : nil }
|
|
449
|
+
|
|
450
|
+
options = {
|
|
451
|
+
map_headers: { distance: :distance_km, weight: :weight_kg },
|
|
452
|
+
value_converters: { distance: miles_to_km, weight: lbs_to_kg },
|
|
453
|
+
}
|
|
454
|
+
|
|
455
|
+
SmarterCSV.generate('measurements.csv', options) do |csv|
|
|
456
|
+
records.each { |r| csv << r }
|
|
457
|
+
end
|
|
458
|
+
```
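
For the meters-per-second case mentioned above, a converter in the same style might look like this. The key names `wind_speed` and `wind_speed_kmh` are hypothetical, and the options hash only mirrors the pattern of the previous examples.

```ruby
# 1 m/s = 3.6 km/h
ms_to_kmh = ->(v) { v ? (v * 3.6).round(1) : nil }

options = {
  map_headers: { wind_speed: :wind_speed_kmh },
  value_converters: { wind_speed: ms_to_kmh },
}

# Exercise the converter directly on a few sample readings.
sample = [10, 2.5, nil].map { |v| options[:value_converters][:wind_speed].call(v) }
```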
|
|
459
|
+
|
|
460
|
+
## Handling Nil, Empty, and Missing Values
|
|
461
|
+
|
|
462
|
+
By default, both `nil` values and empty-string values are written as an empty field.
|
|
463
|
+
Use the `write_nil_value:` and `write_empty_value:` options to substitute a different string.
|
|
464
|
+
|
|
465
|
+
### `write_nil_value`
|
|
466
|
+
|
|
467
|
+
Specifies the string written when a hash value is `nil`. Defaults to `''` (empty field).
|
|
468
|
+
|
|
469
|
+
```ruby
|
|
470
|
+
SmarterCSV.generate('output.csv', write_nil_value: 'N/A') do |csv|
|
|
471
|
+
csv << { name: 'Alice', score: nil }
|
|
472
|
+
csv << { name: 'Bob', score: 42 }
|
|
473
|
+
end
|
|
474
|
+
# output:
|
|
475
|
+
# name,score
|
|
476
|
+
# Alice,N/A
|
|
477
|
+
# Bob,42
|
|
478
|
+
```
|
|
479
|
+
|
|
480
|
+
### `write_empty_value`
|
|
481
|
+
|
|
482
|
+
Specifies the string written when a hash value is an empty string `''`. Defaults to `''`.
|
|
483
|
+
This also applies to **missing keys**: if the row hash does not contain a key that appears
|
|
484
|
+
in the headers, the field defaults to `''` and `write_empty_value:` is substituted.
|
|
485
|
+
|
|
486
|
+
```ruby
|
|
487
|
+
SmarterCSV.generate('output.csv', write_empty_value: 'EMPTY') do |csv|
|
|
488
|
+
csv << { name: 'Alice', city: '' } # explicit empty string
|
|
489
|
+
csv << { name: 'Bob' } # :city key missing entirely
|
|
490
|
+
end
|
|
491
|
+
# output:
|
|
492
|
+
# name,city
|
|
493
|
+
# Alice,EMPTY
|
|
494
|
+
# Bob,EMPTY
|
|
495
|
+
```
|
|
496
|
+
|
|
497
|
+
### Using both together
|
|
498
|
+
|
|
499
|
+
```ruby
|
|
500
|
+
options = { write_nil_value: 'NULL', write_empty_value: '-' }
|
|
501
|
+
SmarterCSV.generate('output.csv', options) do |csv|
|
|
502
|
+
csv << { name: 'Alice', score: nil, city: '' }
|
|
503
|
+
end
|
|
504
|
+
# output:
|
|
505
|
+
# name,score,city
|
|
506
|
+
# Alice,NULL,-
|
|
507
|
+
```
|
|
508
|
+
|
|
509
|
+
> **Note:** `write_nil_value:` is applied first. `write_empty_value:` only fires when the
|
|
510
|
+
> value is a non-nil empty string, so the two options are independent.
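
The substitution rules described in this section can be summarized in a small plain-Ruby model. It is a sketch of the documented behavior, not SmarterCSV's source; `field_for` is a hypothetical name.

```ruby
# Hypothetical model of the documented rules, not SmarterCSV's source.
def field_for(row, key, nil_value: '', empty_value: '')
  value = row.fetch(key, '')          # a missing key is treated like ''
  return nil_value if value.nil?      # nil check comes first
  return empty_value if value == ''   # then the empty-string check
  value
end

headers = [:name, :score, :city, :zip]
row     = { name: 'Alice', score: nil, city: '' }   # :zip is missing

fields = headers.map do |h|
  field_for(row, h, nil_value: 'NULL', empty_value: '-')
end
```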
|
|
511
|
+
|
|
512
|
+
## File Encoding and BOM
|
|
513
|
+
|
|
514
|
+
### `encoding`
|
|
515
|
+
|
|
516
|
+
Specifies the encoding used when opening the output file. Only applies when writing to a
|
|
517
|
+
file path or `Pathname`; ignored when an IO object is passed in. Defaults to the system
|
|
518
|
+
encoding.
|
|
519
|
+
|
|
520
|
+
**Simple encoding** — sets the external (file) encoding:
|
|
521
|
+
|
|
522
|
+
```ruby
|
|
523
|
+
SmarterCSV.generate('output.csv', encoding: 'UTF-8') do |csv|
|
|
524
|
+
csv << { city: 'Ångström', country: 'Sweden' }
|
|
525
|
+
end
|
|
140
526
|
```
|
|
141
|
-
|
|
142
|
-
|
|
143
|
-
|
|
144
|
-
|
|
145
|
-
|
|
146
|
-
|
|
147
|
-
|
|
527
|
+
|
|
528
|
+
**Transcoding** — use `'external:internal'` notation to automatically transcode from your
|
|
529
|
+
Ruby strings' encoding to the target file encoding. This is Ruby's standard
|
|
530
|
+
`File.open` encoding syntax:
|
|
531
|
+
|
|
532
|
+
```ruby
|
|
533
|
+
# Ruby strings are UTF-8; write a Windows-1252 file for legacy consumers.
|
|
534
|
+
# Ruby will transcode each string automatically on write.
|
|
535
|
+
SmarterCSV.generate('output.csv', encoding: 'Windows-1252:UTF-8') do |csv|
|
|
536
|
+
records.each { |r| csv << r }
|
|
537
|
+
end
|
|
148
538
|
```
|
|
149
539
|
|
|
150
|
-
|
|
540
|
+
```ruby
|
|
541
|
+
# Transcode UTF-8 strings into ISO-8859-1
|
|
542
|
+
SmarterCSV.generate('output.csv', encoding: 'ISO-8859-1:UTF-8') do |csv|
|
|
543
|
+
records.each { |r| csv << r }
|
|
544
|
+
end
|
|
545
|
+
```
|
|
151
546
|
|
|
152
|
-
|
|
547
|
+
> **Note:** Transcoding raises `Encoding::UndefinedConversionError` if a character in your
|
|
548
|
+
> data cannot be represented in the target encoding (e.g. a Chinese character written to
|
|
549
|
+
> ISO-8859-1). Handle this with a value converter if you need lossy substitution.
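
One way such a lossy converter might look, using only Ruby's built-in `String#encode`. This is a sketch, and the name `lossy_latin1` is made up; it replaces characters that ISO-8859-1 cannot represent with `?` and converts back to UTF-8, so the value remains an ordinary Ruby string that can later be transcoded without error.

```ruby
# Replace characters that ISO-8859-1 cannot represent with '?', then return
# to UTF-8 so the value is still an ordinary Ruby string.
lossy_latin1 = lambda do |v|
  next v unless v.is_a?(String)

  v.encode('ISO-8859-1', undef: :replace, replace: '?').encode('UTF-8')
end

kept     = lossy_latin1.call("\u00C5ngstr\u00F6m")   # representable: unchanged
replaced = lossy_latin1.call("\u5317\u4EAC")          # two CJK characters
```

It could be attached to individual keys, or wrapped for `:_all` as `->(_k, v) { lossy_latin1.call(v) }`.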
|
|
153
550
|
|
|
154
|
-
|
|
551
|
+
### `write_bom`
|
|
552
|
+
|
|
553
|
+
When `true`, prepends a UTF-8 BOM (`\xEF\xBB\xBF`) to the very beginning of the output.
|
|
554
|
+
Defaults to `false`.
|
|
555
|
+
|
|
556
|
+
A BOM is useful when the CSV will be opened in **Microsoft Excel**, which uses the BOM as a
|
|
557
|
+
signal to interpret the file as UTF-8 rather than the system code page. Without a BOM, Excel
|
|
558
|
+
may display accented characters and non-Latin scripts as garbage.
|
|
559
|
+
|
|
560
|
+
```ruby
|
|
561
|
+
SmarterCSV.generate('export_for_excel.csv', encoding: 'UTF-8', write_bom: true) do |csv|
|
|
562
|
+
csv << { name: 'Ångström', value: 99 }
|
|
563
|
+
end
|
|
564
|
+
# The file begins with 0xEF 0xBB 0xBF followed by the header line.
|
|
565
|
+
```
|
|
566
|
+
|
|
567
|
+
> **Note:** Only use `write_bom: true` with UTF-8 output. Adding a UTF-8 BOM to a
|
|
568
|
+
> non-UTF-8 file will corrupt it.
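
To see what the BOM looks like at the byte level, the following plain-Ruby sketch builds a string by hand rather than through the Writer. The CSV text is a made-up sample.

```ruby
bom      = "\uFEFF"                       # U+FEFF, encoded in UTF-8 as EF BB BF
csv_text = bom + "name,value\n\u00C5ngstr\u00F6m,99\n"

first_bytes = csv_text.bytes.first(3)     # the three BOM bytes
without_bom = csv_text.delete_prefix(bom) # what a BOM-aware reader would see
```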
|
|
155
569
|
|
|
156
570
|
## More Examples
|
|
157
571
|
|
|
158
572
|
Check out the [RSpec tests](../spec/smarter_csv/writer_spec.rb) for more examples.
|
|
159
573
|
|
|
160
574
|
----------------
|
|
161
|
-
|
|
575
|
+
|
|
576
|
+
PREVIOUS: [The Basic Read API](./basic_read_api.md) | NEXT: [Batch Processing](./batch_processing.md) | UP: [README](../README.md)
|