smarter_csv 1.15.2 → 1.16.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (48)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +9 -0
  3. data/CHANGELOG.md +68 -1
  4. data/CONTRIBUTORS.md +3 -1
  5. data/Gemfile +1 -0
  6. data/README.md +123 -27
  7. data/docs/_introduction.md +40 -24
  8. data/docs/bad_row_quarantine.md +285 -0
  9. data/docs/basic_read_api.md +151 -9
  10. data/docs/basic_write_api.md +474 -59
  11. data/docs/batch_processing.md +161 -4
  12. data/docs/column_selection.md +183 -0
  13. data/docs/data_transformations.md +162 -29
  14. data/docs/examples.md +339 -46
  15. data/docs/header_transformations.md +93 -12
  16. data/docs/header_validations.md +56 -18
  17. data/docs/history.md +117 -0
  18. data/docs/instrumentation.md +165 -0
  19. data/docs/migrating_from_csv.md +290 -0
  20. data/docs/options.md +150 -87
  21. data/docs/parsing_strategy.md +63 -1
  22. data/docs/real_world_csv.md +262 -0
  23. data/docs/releases/1.16.0/benchmarks.md +223 -0
  24. data/docs/releases/1.16.0/changes.md +272 -0
  25. data/docs/releases/1.16.0/performance_notes.md +114 -0
  26. data/docs/row_col_sep.md +14 -5
  27. data/docs/value_converters.md +193 -57
  28. data/ext/smarter_csv/extconf.rb +3 -0
  29. data/ext/smarter_csv/smarter_csv.c +1007 -71
  30. data/images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.png +0 -0
  31. data/images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.svg +108 -0
  32. data/images/SmarterCSV_1.16.0_vs_previous_C-speedup.png +0 -0
  33. data/images/SmarterCSV_1.16.0_vs_previous_C-speedup.svg +141 -0
  34. data/images/SmarterCSV_1.16.0_vs_previous_Rb-speedup.png +0 -0
  35. data/images/SmarterCSV_1.16.0_vs_previous_Rb-speedup.svg +139 -0
  36. data/lib/smarter_csv/errors.rb +8 -0
  37. data/lib/smarter_csv/file_io.rb +1 -1
  38. data/lib/smarter_csv/hash_transformations.rb +14 -13
  39. data/lib/smarter_csv/header_transformations.rb +21 -2
  40. data/lib/smarter_csv/headers.rb +2 -1
  41. data/lib/smarter_csv/options.rb +124 -7
  42. data/lib/smarter_csv/parser.rb +362 -75
  43. data/lib/smarter_csv/reader.rb +494 -46
  44. data/lib/smarter_csv/version.rb +1 -1
  45. data/lib/smarter_csv/writer.rb +71 -19
  46. data/lib/smarter_csv.rb +95 -12
  47. data/smarter_csv.gemspec +20 -10
  48. metadata +37 -80
@@ -2,6 +2,7 @@
  ### Contents
 
  * [Introduction](./_introduction.md)
+ * [Migrating from Ruby CSV](./migrating_from_csv.md)
  * [Parsing Strategy](./parsing_strategy.md)
  * [The Basic Read API](./basic_read_api.md)
  * [The Basic Write API](./basic_write_api.md)
@@ -10,76 +11,211 @@
  * [Row and Column Separators](./row_col_sep.md)
  * [Header Transformations](./header_transformations.md)
  * [Header Validations](./header_validations.md)
+ * [Column Selection](./column_selection.md)
  * [Data Transformations](./data_transformations.md)
  * [**Value Converters**](./value_converters.md)
-
- --------------
+ * [Bad Row Quarantine](./bad_row_quarantine.md)
+ * [Instrumentation Hooks](./instrumentation.md)
+ * [Examples](./examples.md)
+ * [Real-World CSV Files](./real_world_csv.md)
+ * [SmarterCSV over the Years](./history.md)
+ * [Release Notes](./releases/1.16.0/changes.md)
+
+ --------------
 
  # Using Value Converters for Reading CSV
 
- Value Converters allow you to do custom transformations specific rows, to help you massage the data so it fits the expectations of your down-stream process, such as creating a DB record.
+ Value converters let you transform raw CSV strings into the types your downstream code
+ expects — dates, booleans, numbers, Money objects, whatever you need. They run per-key,
+ after SmarterCSV has parsed and mapped the headers.
+
+ A converter is either a **lambda** (for simple inline cases) or a **class** implementing
+ `self.convert(value)` (for reusable, independently testable converters). Both forms are
+ fully supported.
+
+ The examples throughout this page use the following fixture file:
+
+ ```
+ first,last,date,price,member
+ Ben,Miller,10/30/1998,$44.50,TRUE
+ Tom,Turner,2/1/2011,$15.99,False
+ Ken,Smith,01/09/2013,$199.99,true
+ ```
+
+ > **Key mapping interaction:** if you use `key_mapping:`, converters must reference the
+ > **mapped** key name, not the original CSV header name. The mapping runs first; converters
+ > see the final key.
+
+ ## Lambda Converters
 
- If you use `key_mappings` and `value_converters`, make sure that the value converters references the keys based on the final mapped name, not the original name in the CSV file.
+ Lambdas are the quickest way to define a converter inline.
+
+ **Boolean:**
 
  ```ruby
- $ cat spec/fixtures/with_dates.csv
- first,last,date,price,member
- Ben,Miller,10/30/1998,$44.50,TRUE
- Tom,Turner,2/1/2011,$15.99,False
- Ken,Smith,01/09/2013,$199.99,true
-
- $ irb
- > require 'smarter_csv'
- > require 'date'
-
- # define a custom converter class, which implements self.convert(value)
- class DateConverter
-   def self.convert(value)
-     Date.strptime( value, '%m/%d/%Y') # parses custom date format into Date instance
-   end
- end
+ bool = ->(v) { v&.match?(/\Atrue\z/i) }
 
- class DollarConverter
-   def self.convert(value)
-     value.sub('$','').to_f # strips the dollar sign and creates a Float value
-   end
- end
+ data = SmarterCSV.process('records.csv', value_converters: { active: bool, verified: bool })
+ # "TRUE" => true
+ # "false" => false
+ # nil => nil (& guard handles missing/empty fields)
+ ```
 
- require 'money'
- class MoneyConverter
-   def self.convert(value)
-     # depending on locale you might want to also remove the indicator for thousands, e.g. comma
-     Money.from_amount(value.gsub(/[\s\$]/,'').to_f) # creates a Money instance (based on cents)
-   end
- end
+ **Strip currency symbol and convert to Float:**
+
+ ```ruby
+ dollar = ->(v) { v&.sub('$', '')&.to_f }
+
+ data = SmarterCSV.process('records.csv', value_converters: { price: dollar, tax: dollar })
+ # "$44.50" => 44.5
+ # nil => nil
+ ```
+
+ **Reusing the same lambda across multiple keys:**
+
+ ```ruby
+ date = ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil }
+
+ data = SmarterCSV.process('records.csv', value_converters: { start_date: date, end_date: date })
+ ```
+
+ **`key_mapping` + `value_converters` — always use the mapped name:**
+
+ ```ruby
+ # CSV header is "MemberSince" — mapped to :member_since
+ options = {
+   key_mapping: { membersince: :member_since },
+   value_converters: { member_since: ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil } },
+ }
+ data = SmarterCSV.process('records.csv', options)
+ ```
+
+ ## Handling nil and Empty Fields
+
+ Converters receive the raw string value from the CSV field. If a field is blank or missing,
+ the value passed to your converter may be `nil` or `""`. Always guard against this:
+
+ ```ruby
+ # Safe: returns nil for blank fields instead of raising
+ price = ->(v) { v&.sub('$', '')&.to_f }
+
+ # Unsafe: raises NoMethodError when v is nil
+ price = ->(v) { v.sub('$', '').to_f }
+ ```
 
- class BooleanConverter
-   def self.convert(value)
-     case value
-     when /true/i
-       true
-     when /false/i
-       false
-     else
-       nil
-     end
-   end
+ For class-based converters, add an explicit guard at the top of `self.convert`:
+
+ ```ruby
+ def self.convert(value)
+   return nil if value.nil? || value.empty?
+   # ... rest of conversion
+ end
+ ```
+
+ ## Class-Based Converters
+
+ For converters you want to reuse across the codebase or test independently, define a class
+ with a `self.convert(value)` class method:
+
+ ```ruby
+ require 'date'
+
+ class DateConverter
+   def self.convert(value)
+     return nil if value.nil? || value.empty?
+     Date.strptime(value, '%m/%d/%Y')
+   end
+ end
+
+ class DollarConverter
+   def self.convert(value)
+     return nil if value.nil? || value.empty?
+     value.sub('$', '').to_f
+   end
+ end
+
+ class BooleanConverter
+   def self.convert(value)
+     case value
+     when /\Atrue\z/i then true
+     when /\Afalse\z/i then false
      end
+   end
+ end
 
- options = {value_converters: {date: DateConverter, price: DollarConverter, member: BooleanConverter}}
- data = SmarterCSV.process("spec/fixtures/with_dates.csv", options)
- first_record = data.first
- first_record[:date]
- => #<Date: 1998-10-30 ((2451117j,0s,0n),+0s,2299161j)>
- first_record[:date].class
- => Date
- first_record[:price]
- => 44.50
- first_record[:price].class
- => Float
- first_record[:member]
- => true
+ options = {
+   value_converters: {
+     date: DateConverter,
+     price: DollarConverter,
+     member: BooleanConverter,
+   }
+ }
+ data = SmarterCSV.process('spec/fixtures/with_dates.csv', options)
+
+ data.first[:date] #=> #<Date: 1998-10-30>
+ data.first[:price] #=> 44.5
+ data.first[:member] #=> true
  ```
 
+ ## Money Converter
+
+ For applications using the [`money`](https://github.com/RubyMoney/money) gem:
+
+ ```ruby
+ require 'money'
+
+ class MoneyConverter
+   def self.convert(value)
+     return nil if value.nil? || value.empty?
+     # remove currency symbol and thousands separators before converting
+     Money.from_amount(value.gsub(/[\s$,]/, '').to_f)
+   end
+ end
+
+ data = SmarterCSV.process('invoices.csv', value_converters: { amount: MoneyConverter })
+ ```
+
+ ## Why there are no built-in Date / Time / DateTime converters
+
+ SmarterCSV intentionally does not ship built-in date or time converters. The reason is
+ **localization (L10N)**: date formats vary widely across regions and there is no single
+ correct interpretation of a bare string like `"12/03/2020"` — it is December 3rd in the
+ United States but March 12th in most of Europe.
+
+ Ruby's standard library `Date.parse` / `DateTime.parse` handle ISO 8601 and a handful of
+ English-language formats, but they are not locale-aware and will silently produce the wrong
+ date for locale-specific formats. Shipping a built-in converter that is wrong for half the
+ world's locales would be worse than shipping none.
+
+ The right solution is a `value_converter` with an explicit format string tuned to your data:
+
+ ```ruby
+ require 'date'
+
+ # US format: MM/DD/YYYY
+ us_date = ->(v) { Date.strptime(v, '%m/%d/%Y') rescue v }
+
+ # European format: DD.MM.YYYY
+ eu_date = ->(v) { Date.strptime(v, '%d.%m.%Y') rescue v }
+
+ # ISO 8601 (unambiguous, safe to use without rescue)
+ iso_date = ->(v) { Date.iso8601(v) rescue v }
+
+ options = {
+   value_converters: {
+     birth_date: eu_date,
+     created_at: iso_date,
+     invoiced_on: us_date,
+   }
+ }
+ data = SmarterCSV.process('records.csv', options)
+ ```
+
+ For locale-aware parsing of user-supplied date strings (e.g., "3. Oktober 2024" in German),
+ consider the [`delocalize`](https://github.com/clemens/delocalize) gem, which integrates
+ with Rails' I18n locale configuration. For natural-language date strings, consider
+ [`chronic`](https://github.com/mojombo/chronic).
+
  --------------------
- PREVIOUS: [Data Transformations](./data_transformations.md) | UP: [README](../README.md)
+
+ PREVIOUS: [Data Transformations](./data_transformations.md) | NEXT: [Bad Row Quarantine](./bad_row_quarantine.md) | UP: [README](../README.md)
@@ -11,6 +11,9 @@ end
 
  optflags = "-O3 -flto -fomit-frame-pointer -DNDEBUG".dup
  optflags << " -march=native" unless RUBY_PLATFORM.start_with?("arm64-darwin")
+ # -fno-semantic-interposition: GCC/Clang only (not MSVC). Allows intra-library
+ # calls to bypass the PLT on Linux and enables more aggressive LTO inlining.
+ optflags << " -fno-semantic-interposition" unless RUBY_PLATFORM.include?("mswin")
 
  append_cflags('-Wno-compound-token-split-by-macro')