smarter_csv 1.15.2 → 1.16.1

Files changed (50)
  1. checksums.yaml +4 -4
  2. data/.rspec +2 -0
  3. data/.rubocop.yml +9 -0
  4. data/CHANGELOG.md +112 -1
  5. data/CONTRIBUTORS.md +4 -1
  6. data/Gemfile +1 -0
  7. data/README.md +129 -27
  8. data/docs/_introduction.md +45 -24
  9. data/docs/bad_row_quarantine.md +342 -0
  10. data/docs/basic_read_api.md +152 -9
  11. data/docs/basic_write_api.md +475 -59
  12. data/docs/batch_processing.md +162 -4
  13. data/docs/column_selection.md +184 -0
  14. data/docs/data_transformations.md +163 -29
  15. data/docs/examples.md +340 -46
  16. data/docs/header_transformations.md +94 -12
  17. data/docs/header_validations.md +57 -18
  18. data/docs/history.md +119 -0
  19. data/docs/instrumentation.md +166 -0
  20. data/docs/migrating_from_csv.md +565 -0
  21. data/docs/options.md +151 -87
  22. data/docs/parsing_strategy.md +64 -1
  23. data/docs/real_world_csv.md +263 -0
  24. data/docs/releases/1.16.0/benchmarks.md +223 -0
  25. data/docs/releases/1.16.0/changes.md +273 -0
  26. data/docs/releases/1.16.0/performance_notes.md +114 -0
  27. data/docs/row_col_sep.md +15 -5
  28. data/docs/ruby_csv_pitfalls.md +514 -0
  29. data/docs/value_converters.md +194 -57
  30. data/ext/smarter_csv/extconf.rb +3 -0
  31. data/ext/smarter_csv/smarter_csv.c +1017 -82
  32. data/images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.png +0 -0
  33. data/images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.svg +108 -0
  34. data/images/SmarterCSV_1.16.0_vs_previous_C-speedup.png +0 -0
  35. data/images/SmarterCSV_1.16.0_vs_previous_C-speedup.svg +141 -0
  36. data/images/SmarterCSV_1.16.0_vs_previous_Rb-speedup.png +0 -0
  37. data/images/SmarterCSV_1.16.0_vs_previous_Rb-speedup.svg +139 -0
  38. data/lib/smarter_csv/errors.rb +8 -0
  39. data/lib/smarter_csv/file_io.rb +1 -1
  40. data/lib/smarter_csv/hash_transformations.rb +14 -13
  41. data/lib/smarter_csv/header_transformations.rb +21 -2
  42. data/lib/smarter_csv/headers.rb +2 -1
  43. data/lib/smarter_csv/options.rb +124 -7
  44. data/lib/smarter_csv/parser.rb +358 -74
  45. data/lib/smarter_csv/reader.rb +494 -46
  46. data/lib/smarter_csv/version.rb +1 -1
  47. data/lib/smarter_csv/writer.rb +71 -19
  48. data/lib/smarter_csv.rb +134 -13
  49. data/smarter_csv.gemspec +20 -10
  50. metadata +38 -80
@@ -2,6 +2,8 @@
2
2
  ### Contents
3
3
 
4
4
  * [Introduction](./_introduction.md)
5
+ * [Migrating from Ruby CSV](./migrating_from_csv.md)
6
+ * [Ruby CSV Pitfalls](./ruby_csv_pitfalls.md)
5
7
  * [Parsing Strategy](./parsing_strategy.md)
6
8
  * [The Basic Read API](./basic_read_api.md)
7
9
  * [The Basic Write API](./basic_write_api.md)
@@ -10,76 +12,211 @@
10
12
  * [Row and Column Separators](./row_col_sep.md)
11
13
  * [Header Transformations](./header_transformations.md)
12
14
  * [Header Validations](./header_validations.md)
15
+ * [Column Selection](./column_selection.md)
13
16
  * [Data Transformations](./data_transformations.md)
14
17
  * [**Value Converters**](./value_converters.md)
15
-
16
- --------------
18
+ * [Bad Row Quarantine](./bad_row_quarantine.md)
19
+ * [Instrumentation Hooks](./instrumentation.md)
20
+ * [Examples](./examples.md)
21
+ * [Real-World CSV Files](./real_world_csv.md)
22
+ * [SmarterCSV over the Years](./history.md)
23
+ * [Release Notes](./releases/1.16.0/changes.md)
24
+
25
+ --------------
17
26
 
18
27
  # Using Value Converters for Reading CSV
19
28
 
20
- Value Converters allow you to do custom transformations specific rows, to help you massage the data so it fits the expectations of your down-stream process, such as creating a DB record.
29
+ Value converters let you transform raw CSV strings into the types your downstream code
30
+ expects — dates, booleans, numbers, Money objects, whatever you need. They run per-key,
31
+ after SmarterCSV has parsed and mapped the headers.
32
+
33
+ A converter is either a **lambda** (for simple inline cases) or a **class** implementing
34
+ `self.convert(value)` (for reusable, independently testable converters). Both forms are
35
+ fully supported.
36
+
37
+ The examples throughout this page use the following fixture file:
38
+
39
+ ```
40
+ first,last,date,price,member
41
+ Ben,Miller,10/30/1998,$44.50,TRUE
42
+ Tom,Turner,2/1/2011,$15.99,False
43
+ Ken,Smith,01/09/2013,$199.99,true
44
+ ```
45
+
46
+ > **Key mapping interaction:** if you use `key_mapping:`, converters must reference the
47
+ > **mapped** key name, not the original CSV header name. The mapping runs first; converters
48
+ > see the final key.
49
+
50
+ ## Lambda Converters
21
51
 
22
- If you use `key_mappings` and `value_converters`, make sure that the value converters references the keys based on the final mapped name, not the original name in the CSV file.
52
+ Lambdas are the quickest way to define a converter inline.
53
+
54
+ **Boolean:**
23
55
 
24
56
  ```ruby
25
- $ cat spec/fixtures/with_dates.csv
26
- first,last,date,price,member
27
- Ben,Miller,10/30/1998,$44.50,TRUE
28
- Tom,Turner,2/1/2011,$15.99,False
29
- Ken,Smith,01/09/2013,$199.99,true
30
-
31
- $ irb
32
- > require 'smarter_csv'
33
- > require 'date'
34
-
35
- # define a custom converter class, which implements self.convert(value)
36
- class DateConverter
37
- def self.convert(value)
38
- Date.strptime( value, '%m/%d/%Y') # parses custom date format into Date instance
39
- end
40
- end
57
+ bool = ->(v) { v&.match?(/\Atrue\z/i) }
41
58
 
42
- class DollarConverter
43
- def self.convert(value)
44
- value.sub('$','').to_f # strips the dollar sign and creates a Float value
45
- end
46
- end
59
+ data = SmarterCSV.process('records.csv', value_converters: { active: bool, verified: bool })
60
+ # "TRUE" => true
61
+ # "false" => false
62
+ # nil => nil (the &. guard handles missing fields; "" => false)
63
+ ```
47
64
 
48
- require 'money'
49
- class MoneyConverter
50
- def self.convert(value)
51
- # depending on locale you might want to also remove the indicator for thousands, e.g. comma
52
- Money.from_amount(value.gsub(/[\s\$]/,'').to_f) # creates a Money instance (based on cents)
53
- end
54
- end
65
+ **Strip currency symbol and convert to Float:**
66
+
67
+ ```ruby
68
+ dollar = ->(v) { v&.sub('$', '')&.to_f }
69
+
70
+ data = SmarterCSV.process('records.csv', value_converters: { price: dollar, tax: dollar })
71
+ # "$44.50" => 44.5
72
+ # nil => nil
73
+ ```
74
+
75
+ **Reusing the same lambda across multiple keys:**
76
+
77
+ ```ruby
78
+ date = ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil }
79
+
80
+ data = SmarterCSV.process('records.csv', value_converters: { start_date: date, end_date: date })
81
+ ```
82
+
83
+ **`key_mapping` + `value_converters` — always use the mapped name:**
84
+
85
+ ```ruby
86
+ # CSV header is "MemberSince" — mapped to :member_since
87
+ options = {
88
+ key_mapping: { membersince: :member_since },
89
+ value_converters: { member_since: ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil } },
90
+ }
91
+ data = SmarterCSV.process('records.csv', options)
92
+ ```
93
+
94
+ ## Handling nil and Empty Fields
95
+
96
+ Converters receive the raw string value from the CSV field. If a field is blank or missing,
97
+ the value passed to your converter may be `nil` or `""`. Always guard against this:
98
+
99
+ ```ruby
100
+ # Safe: returns nil for nil fields instead of raising (an empty string yields 0.0)
101
+ price = ->(v) { v&.sub('$', '')&.to_f }
102
+
103
+ # Unsafe: raises NoMethodError when v is nil
104
+ price = ->(v) { v.sub('$', '').to_f }
105
+ ```
55
106
 
56
- class BooleanConverter
57
- def self.convert(value)
58
- case value
59
- when /true/i
60
- true
61
- when /false/i
62
- false
63
- else
64
- nil
65
- end
66
- end
107
+ For class-based converters, add an explicit guard at the top of `self.convert`:
108
+
109
+ ```ruby
110
+ def self.convert(value)
111
+ return nil if value.nil? || value.empty?
112
+ # ... rest of conversion
113
+ end
114
+ ```
115
+
116
+ ## Class-Based Converters
117
+
118
+ For converters you want to reuse across the codebase or test independently, define a class
119
+ with a `self.convert(value)` class method:
120
+
121
+ ```ruby
122
+ require 'date'
123
+
124
+ class DateConverter
125
+ def self.convert(value)
126
+ return nil if value.nil? || value.empty?
127
+ Date.strptime(value, '%m/%d/%Y')
128
+ end
129
+ end
130
+
131
+ class DollarConverter
132
+ def self.convert(value)
133
+ return nil if value.nil? || value.empty?
134
+ value.sub('$', '').to_f
135
+ end
136
+ end
137
+
138
+ class BooleanConverter
139
+ def self.convert(value)
140
+ case value
141
+ when /\Atrue\z/i then true
142
+ when /\Afalse\z/i then false
67
143
  end
144
+ end
145
+ end
68
146
 
69
- options = {value_converters: {date: DateConverter, price: DollarConverter, member: BooleanConverter}}
70
- data = SmarterCSV.process("spec/fixtures/with_dates.csv", options)
71
- first_record = data.first
72
- first_record[:date]
73
- => #<Date: 1998-10-30 ((2451117j,0s,0n),+0s,2299161j)>
74
- first_record[:date].class
75
- => Date
76
- first_record[:price]
77
- => 44.50
78
- first_record[:price].class
79
- => Float
80
- first_record[:member]
81
- => true
147
+ options = {
148
+ value_converters: {
149
+ date: DateConverter,
150
+ price: DollarConverter,
151
+ member: BooleanConverter,
152
+ }
153
+ }
154
+ data = SmarterCSV.process('spec/fixtures/with_dates.csv', options)
155
+
156
+ data.first[:date] #=> #<Date: 1998-10-30>
157
+ data.first[:price] #=> 44.5
158
+ data.first[:member] #=> true
82
159
  ```
83
160
 
161
+ ## Money Converter
162
+
163
+ For applications using the [`money`](https://github.com/RubyMoney/money) gem:
164
+
165
+ ```ruby
166
+ require 'money'
167
+
168
+ class MoneyConverter
169
+ def self.convert(value)
170
+ return nil if value.nil? || value.empty?
171
+ # remove currency symbol and thousands separators before converting
172
+ Money.from_amount(value.gsub(/[\s$,]/, '').to_f)
173
+ end
174
+ end
175
+
176
+ data = SmarterCSV.process('invoices.csv', value_converters: { amount: MoneyConverter })
177
+ ```
178
+
179
+ ## Why there are no built-in Date / Time / DateTime converters
180
+
181
+ SmarterCSV intentionally does not ship built-in date or time converters. The reason is
182
+ **localization (L10N)**: date formats vary widely across regions and there is no single
183
+ correct interpretation of a bare string like `"12/03/2020"` — it is December 3rd in the
184
+ United States but March 12th in most of Europe.
185
+
186
+ Ruby's standard library `Date.parse` / `DateTime.parse` handle ISO 8601 and a handful of
187
+ English-language formats, but they are not locale-aware and will silently produce the wrong
188
+ date for locale-specific formats. Shipping a built-in converter that is wrong for half the
189
+ world's locales would be worse than shipping none.
190
+
191
+ The right solution is a `value_converter` with an explicit format string tuned to your data:
192
+
193
+ ```ruby
194
+ require 'date'
195
+
196
+ # US format: MM/DD/YYYY
197
+ us_date = ->(v) { Date.strptime(v, '%m/%d/%Y') rescue v }
198
+
199
+ # European format: DD.MM.YYYY
200
+ eu_date = ->(v) { Date.strptime(v, '%d.%m.%Y') rescue v }
201
+
202
+ # ISO 8601 (unambiguous; the rescue still guards against nil or malformed input)
203
+ iso_date = ->(v) { Date.iso8601(v) rescue v }
204
+
205
+ options = {
206
+ value_converters: {
207
+ birth_date: eu_date,
208
+ created_at: iso_date,
209
+ invoiced_on: us_date,
210
+ }
211
+ }
212
+ data = SmarterCSV.process('records.csv', options)
213
+ ```
214
+
215
+ For locale-aware parsing of user-supplied date strings (e.g., "3. Oktober 2024" in German),
216
+ consider the [`delocalize`](https://github.com/clemens/delocalize) gem, which integrates
217
+ with Rails' I18n locale configuration. For natural-language date strings, consider
218
+ [`chronic`](https://github.com/mojombo/chronic).
219
+
84
220
  --------------------
85
- PREVIOUS: [Data Transformations](./data_transformations.md) | UP: [README](../README.md)
221
+
222
+ PREVIOUS: [Data Transformations](./data_transformations.md) | NEXT: [Bad Row Quarantine](./bad_row_quarantine.md) | UP: [README](../README.md)
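
The converter behavior documented in the hunk above can be checked without SmarterCSV installed. The following is a minimal plain-Ruby sketch, not the library's implementation: `apply_converter` and `convert_row` are hypothetical helpers introduced only for illustration, and the way they choose between `convert` and `call` is an assumption about how the two converter forms might be invoked. The lambdas and `BooleanConverter` mirror the ones added in the diff.

```ruby
require 'date'

# Lambda form, as in the diff: nil passes through untouched.
DOLLAR = ->(v) { v&.sub('$', '')&.to_f }

# Class form, mirroring the BooleanConverter added in the diff.
# Anything other than "true"/"false" (case-insensitive) falls through to nil.
class BooleanConverter
  def self.convert(value)
    case value
    when /\Atrue\z/i then true
    when /\Afalse\z/i then false
    end
  end
end

# Two explicit formats for the same day/month-ambiguous input.
US_DATE = ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil }
EU_DATE = ->(v) { v ? Date.strptime(v, '%d/%m/%Y') : nil }

CONVERTERS = { price: DOLLAR, member: BooleanConverter, date: US_DATE }.freeze

# Hypothetical helper (not part of SmarterCSV): accepts either converter form.
def apply_converter(converter, value)
  return value if converter.nil? # keys without a converter stay as-is
  converter.respond_to?(:convert) ? converter.convert(value) : converter.call(value)
end

# Hypothetical helper (not part of SmarterCSV): converts one already-parsed row.
def convert_row(row, converters)
  row.to_h { |key, value| [key, apply_converter(converters[key], value)] }
end

row = { first: 'Ben', price: '$44.50', member: 'TRUE', date: '10/30/1998' }
p convert_row(row, CONVERTERS)
```

Two details are worth noting when reviewing the documented examples. `DOLLAR.call(nil)` returns `nil`, but `DOLLAR.call('')` returns `0.0`, so the `&.` guard alone does not turn empty strings into `nil`. And `US_DATE.call('12/03/2020')` yields December 3, 2020, while `EU_DATE.call('12/03/2020')` yields March 12, 2020, which is the ambiguity the new "no built-in Date converters" section describes.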
@@ -11,6 +11,9 @@ end
11
11
 
12
12
  optflags = "-O3 -flto -fomit-frame-pointer -DNDEBUG".dup
13
13
  optflags << " -march=native" unless RUBY_PLATFORM.start_with?("arm64-darwin")
14
+ # -fno-semantic-interposition: GCC/Clang only (not MSVC). Allows intra-library
15
+ # calls to bypass the PLT on Linux and enables more aggressive LTO inlining.
16
+ optflags << " -fno-semantic-interposition" unless RUBY_PLATFORM.include?("mswin")
14
17
 
15
18
  append_cflags('-Wno-compound-token-split-by-macro')
16
19