csv_party 0.0.1.pre8 → 1.0.0.rc5

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: f4ebba54721bd80642ce2a0d3d07add5d4cdb348
4
- data.tar.gz: cd77e182c9026980d9df49ce32a58bf6789dd570
2
+ SHA256:
3
+ metadata.gz: 174fc0ba77e52795b1763e2ff29eae5e5c3cfa3bcd181973aaba886a54a79eeb
4
+ data.tar.gz: 7676780e13435e10bf35b7c34c849484ea0efe0a86d7889ff2a4697f312a99c6
5
5
  SHA512:
6
- metadata.gz: b2d9374b60ca71b54ff36f35751245580dfa2f8904fab72c8010162a34e254d12c8f7b146e0ca532c03a3ad963fd1d19c880fe3e92776260e4d07d064d532bbd
7
- data.tar.gz: cffc09a0eacaf8639cd5dede503f97e8dd0560e9866b39d37c51254d337ab1ebb89c882c184ff037500df6d5f537597670e0ce84eb6971b9defeebb1d6eddffd
6
+ metadata.gz: ab4bc476717516d1fa2a9d674f47e60c95ef0b24541cb7cf6cf76948ca5b9ce9dad96b9b89d10edfd99f338d4a04b612fd241c38d37d746d049f6eedb8868120
7
+ data.tar.gz: 4041355ba68669a3cf3e6275bf099da83d3d92d759bfc328269080a41dcbee02fa9a77b9c95c3549eaf5330cfc9fbbdcfdea05f83d320d2f29b55e89553d3fea
data/LICENSE.md ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2018 Richard A. Jones
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,218 @@
1
+ [![Gem Version](https://badge.fury.io/rb/csv_party.svg)](https://badge.fury.io/rb/csv_party)
2
+ [![Build Status](https://travis-ci.org/toasterlovin/csv_party.svg?branch=master)](https://travis-ci.org/toasterlovin/csv_party)
3
+ [![Code Climate Maintainability](https://api.codeclimate.com/v1/badges/946d0dec172fda05d631/maintainability)](https://codeclimate.com/github/toasterlovin/csv_party/maintainability)
4
+ [![Code Climate Test Coverage](https://api.codeclimate.com/v1/badges/946d0dec172fda05d631/test_coverage)](https://codeclimate.com/github/toasterlovin/csv_party/test_coverage)
5
+
6
+ # Make importing CSV files a party
7
+
8
+ The point of this gem is to make it easier to focus on the business
9
+ logic of your CSV imports. You start by defining which columns you
10
+ will be importing, as well as how they will be parsed. Then, you
11
+ specify what you want to do with each row after it has been parsed.
12
+ That's it; CSVParty takes care of all the tedious stuff for you.
13
+
14
+ ## Defining Columns
15
+
16
+ This is what defining your import columns looks like:
17
+
18
+ class MyImporter < CSVParty
19
+ column :price, header: 'Nonsensical Column Name', as: :decimal
20
+ end
21
+
22
+ This will take the value in the 'Nonsensical Column Name' column,
23
+ parse it as a decimal, then make it available to your import logic
24
+ as a nice, sane variable named `price`.
25
+
26
+ The available built-in parsers are:
27
+
28
+ - `:raw` returns the value from the CSV file, unchanged
29
+ - `:string` strips whitespace and returns the resulting string
30
+ - `:integer` strips whitespace, then calls `to_i` on the resulting string
31
+ - `:decimal` strips all characters except `0-9` and `.`, then passes the
32
+ resulting string to `BigDecimal.new`
33
+ - `:boolean` strips whitespace, downcases, then returns `true` if the
34
+ resulting string is `'1'`, `'t'`, or `'true'`, otherwise it returns `false`
35
+
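As a rough illustration of the rules listed above, here is a standalone sketch that mimics each built-in parser with a lambda. The `SKETCH_PARSERS` name is hypothetical and this is not CSVParty's own implementation. It calls `BigDecimal()` rather than `BigDecimal.new`, because newer Rubies no longer provide the latter.

```ruby
require 'bigdecimal'

# Hypothetical stand-ins for the built-in parsers; not CSVParty's own code.
SKETCH_PARSERS = {
  raw:     ->(value) { value },
  string:  ->(value) { value.strip },
  integer: ->(value) { value.strip.to_i },
  # Keep only digits and dots, so '$9.00' becomes '9.00'.
  decimal: ->(value) { BigDecimal(value.gsub(/[^0-9.]/, '')) },
  boolean: ->(value) { %w[1 t true].include?(value.strip.downcase) }
}.freeze

SKETCH_PARSERS[:decimal].call('$9.00')  # a BigDecimal equal to 9
SKETCH_PARSERS[:boolean].call(' True ') # true
```

Note that, as described, the decimal rule also drops a leading minus sign, so negative amounts would need a custom parser.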
36
+ When defining a column, you can also pass a block if you need custom
37
+ parsing logic:
38
+
39
+ class MyImporter < CSVParty
40
+ column :product, header: 'Product' do |value|
41
+ Product.find_by(name: value)
42
+ end
43
+ end
44
+
45
+ Or, if you want to re-use a custom parser for multiple columns, just
46
+ define a method on your class with a name that ends in `_parser` and
47
+ you can use it the same way you use the built-in parsers:
48
+
49
+ class MyImporter < CSVParty
50
+ def dollars_to_cents_parser(value)
51
+ (BigDecimal.new(value) * 100).to_i
52
+ end
53
+
54
+ column :price_in_cents, header: 'Price in $', as: :dollars_to_cents
55
+ column :cost_in_cents, header: 'Cost in $', as: :dollars_to_cents
56
+ end
57
+
58
+ #### NOTE: Parsing nil and blank values
59
+
60
+ By default, CSVParty will intercept any values that are `nil` or which contain
61
+ only whitespace and coerce them to `nil` _without invoking the parser for that
62
+ column_. This applies to all parsers, including custom parsers which you
63
+ define, with one exception: the :raw parser. This is done as a convenience to
64
+ avoid pesky `NoMethodError`s that arise when a parser tries to do its thing
65
+ to a `nil` value that it wasn't expecting. You can turn this behavior off on a
66
+ given column by setting `intercept_blanks` to `false` in the options hash:
67
+
68
+ class MyImporter < CSVParty
69
+ column :price, header: 'Price', intercept_blanks: false do |value|
70
+ if value.nil?
71
+ 'n/a'
72
+ else
73
+ BigDecimal.new(value)
74
+ end
75
+ end
76
+ end
77
+
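To make the interception rule concrete, here is a minimal standalone sketch. The `parse_with_interception` helper is hypothetical and only imitates the behavior described above; it is not how CSVParty implements it.

```ruby
# Hypothetical helper that mimics the blank interception described above.
def parse_with_interception(value, intercept_blanks: true)
  blank = value.nil? || value.strip.empty?
  return nil if intercept_blanks && blank

  # Only reached when the value is not blank or interception is turned off.
  yield value
end

parse_with_interception('   ') { |v| v.upcase } # nil, the block never runs
parse_with_interception(nil, intercept_blanks: false) { |v| v.nil? ? 'n/a' : v } # 'n/a'
```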
78
+ #### NOTE: Parsers cannot reference each other
79
+
80
+ When using a custom parser to parse a column, the block or method that you
81
+ define has no way to reference the values from any other columns. So, this won't
82
+ work:
83
+
84
+ class MyImporter < CSVParty
85
+ column :product, header: 'Product' do |value|
86
+ Product.find_by(name: value)
87
+ end
88
+
89
+ column :price, header: 'Price' do |value|
90
+ # product is not defined...
91
+ product.price = BigDecimal.new(value)
92
+ end
93
+ end
94
+
95
+ Instead, you would do this in your row import logic. Which brings us to:
96
+
97
+ ## Importing Rows
98
+
99
+ Once you've defined all of your columns, you specify your logic for importing
100
+ rows by passing a block to the `rows` DSL method. That block will have access
101
+ to a `row` variable which contains all of the parsed values for your columns.
102
+ Here's what that looks like:
103
+
104
+ class MyImporter < CSVParty
105
+ rows do |row|
106
+ product = row.product
107
+ product.price = row.price
108
+ product.save
109
+ end
110
+ end
111
+
112
+ The `row` variable also provides access to two other things:
113
+
114
+ - The unparsed values for your columns
115
+ - The raw CSV string for that row
116
+
117
+ Here's how you access those:
118
+
119
+ class MyImporter < CSVParty
120
+ rows do |row|
121
+ row.price # parsed value: #<BigDecimal:7f88d92cb820,'0.9E1',9(18)>
122
+ row.unparsed.price # unparsed value: '$9.00'
123
+ row.string # raw CSV string: 'USB Cable,$9.00,Box,Blue'
124
+ end
125
+ end
126
+
127
+ ## Importing
128
+
129
+ Once your importer class is defined, you use it like this:
130
+
131
+ importer = MyImporter.new('path/to/file.csv')
132
+ importer.import!
133
+
134
+ You can also specify what should happen before and after your import by passing
135
+ a block to `import`, like so:
136
+
137
+ class MyImporter < CSVParty
138
+ # column definitions
139
+ # row import logic
140
+
141
+ import do
142
+ puts 'Starting import'
143
+ import_rows!
144
+ puts 'Import finished!'
145
+ end
146
+ end
147
+
148
+ You can do whatever you want inside of the `import` block, just make sure to
149
+ call `import_rows!` somewhere in there.
150
+
151
+ ## Handling Errors
152
+
153
+ One of the hallmarks of importing data from CSV files is that there are
154
+ inevitably rows with errors of some kind. You can handle error rows by
155
+ specifying an `errors` block:
156
+
157
+ class MyImporter < CSVParty
158
+ # column definitions
159
+ # row import logic
160
+
161
+ errors do |error, line_number|
162
+ # log error
163
+ end
164
+ end
165
+
166
+ Any row in your CSV file which results in an exception will be passed to this
167
+ block. Which means you can specify that there is an error with a given row by
168
+ raising an exception:
169
+
170
+ rows do |row|
171
+ # rows with price less than 0 will be treated as errors
172
+ raise if row.price < 0
173
+ end
174
+
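The flow described above, where any exception raised for a row is handed to the error handler along with a line number, can be sketched without the gem. Everything here is an assumption made for illustration: `import_rows_with_errors` is a hypothetical name, rows are plain hashes, and the line numbering assumes a header on line 1.

```ruby
# Hypothetical row loop; rows are plain hashes here rather than CSVParty rows.
def import_rows_with_errors(rows, on_error)
  rows.each_with_index do |row, index|
    begin
      yield row
    rescue StandardError => error
      # Assumes a header on line 1, so the first data row is line 2.
      on_error.call(error, index + 2)
    end
  end
end

failed_lines = []
log_error = ->(_error, line_number) { failed_lines << line_number }

import_rows_with_errors([{ price: 5 }, { price: -1 }], log_error) do |row|
  raise 'negative price' if row[:price] < 0
end

failed_lines # [3]
```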
175
+ ## External Dependencies
176
+
177
+ Sometimes you need access to external objects in your importer's logic. You can specify
178
+ what external objects your importer depends on with `depends_on`. Dependencies declared
179
+ this way will then be available in your parsers and your `rows`, `import`, and `errors`
180
+ blocks:
181
+
182
+ class MyImporter < CSVParty
183
+ # column definitions...
184
+
185
+ depends_on :product_import
186
+
187
+ rows do |row|
188
+ # do some stuff
189
+
190
+ # product_import is not provided by the class,
191
+ # but is passed in at runtime instead!
192
+ product_import.log_success(product)
193
+ end
194
+ end
195
+
196
+ Then, to pass the dependency in at runtime, you just add an option to `.new` with
197
+ the name and value of the dependency:
198
+
199
+ MyImporter.new(
200
+ 'path/to/csv',
201
+ product_import: @product_import
202
+ )
203
+
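For readers curious how a `depends_on` style macro could work, here is a minimal sketch under stated assumptions. `SketchImporter` and `ProductSketchImporter` are hypothetical classes, and raising on a missing dependency is a design choice of this sketch; CSVParty's real implementation may differ.

```ruby
# Hypothetical base class; CSVParty's real implementation may differ.
class SketchImporter
  def self.dependencies
    @dependencies ||= []
  end

  # Records each dependency name and defines accessors for it.
  def self.depends_on(*names)
    dependencies.concat(names)
    attr_accessor(*names)
  end

  def initialize(path, options = {})
    @path = path
    self.class.dependencies.each do |name|
      raise ArgumentError, "missing dependency: #{name}" unless options.key?(name)

      public_send("#{name}=", options[name])
    end
  end
end

class ProductSketchImporter < SketchImporter
  depends_on :product_import
end

importer = ProductSketchImporter.new('path/to/csv', product_import: :fake_log)
importer.product_import # :fake_log
```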
204
+ # Tested Rubies
205
+
206
+ CSVParty has been tested against the following Rubies:
207
+
208
+ MRI
209
+ - 2.5
210
+ - 2.4
211
+ - 2.3
212
+ - 2.2
213
+ - 2.1
214
+ - 2.0
215
+
216
+ # License
217
+
218
+ This project uses the MIT License. See LICENSE.md for details.
data/ROADMAP.md ADDED
@@ -0,0 +1,355 @@
1
+ Roadmap
2
+ -
3
+
4
+ - [1.1 Early Return While Parsing](#11-early-return-while-parsing)
5
+ - [1.2 Rows to Hash](#12-rows-to-hash)
6
+ - [1.3 Generate Unimported Rows CSV](#13-generate-unimported-rows-csv)
7
+ - [1.4 Batch API](#14-batch-api)
8
+ - [1.5 Runtime Configuration](#15-runtime-configuration)
9
+ - [1.6 CSV Parse Error Handling](#16-csv-parse-error-handling)
10
+ - [Someday Features](#someday-features)
11
+ - [Parse Row Access](#parse-row-access)
12
+ - [Deferred Parsing](#deferred-parsing)
13
+ - [Columns Macro](#columns-macro)
14
+ - [Column Numbers](#column-numbers)
15
+ - [Multi-column Parsing](#multi-column-parsing)
16
+ - [Parse Dependencies](#parse-dependencies)
17
+ - [Rails Generator](#rails-generator)
18
+
19
+ #### 1.1 Early Return While Parsing
20
+
21
+ Currently, CSVParty is pretty well thought out about what should happen when
22
+ either 1) one of the built in flow control methods (`next_row`, `skip_row`,
23
+ `abort_row`, and `abort_import`) is used, or 2) an error is raised while
24
+ the row importer block is being executed. However, all of these things can also
25
+ happen when the columns for a row are being parsed. When/if it does, most of the
26
+ flow control and error handling assumes that the row has been fully
27
+ parsed. So some design work should go into deciding what should happen in these
28
+ cases. And then tests should be written for all of the various scenarios.
29
+
30
+ #### 1.2 Rows to Hash
31
+
32
+ One of the primary use cases for importing CSV files is to insert their contents
33
+ into a database. Apparently this is common enough that the
34
+ [csv-importer](https://github.com/pcreux/csv-importer) gem, which almost
35
+ completely automates this process without much room for customization, is very
36
+ popular. So, in the case where there is a pretty simple correspondence between
37
+ the contents of a CSV file and ActiveRecord models, it should be dead simple to
38
+ get the job done.
39
+
40
+ What I have in mind is something like:
41
+
42
+ class MyImporter < CSVParty::Importer
43
+ column :product_id
44
+ column :quantity
45
+ column :price
46
+
47
+ rows do |row|
48
+ LineItem.create(row.attributes)
49
+ end
50
+ end
51
+
52
+ Where `row.attributes` returns a hash with all of the column names as keys and
53
+ all of the parsed values as values. So, with an importer like the one above,
54
+ `row.attributes` would return a hash like so:
55
+
56
+ { product_id: 42, quantity: 3, price: 9.99 }
57
+
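A minimal sketch of the proposed behavior, assuming a row object backed by a `Struct`. `SketchRow` is hypothetical, and `attributes` is the method proposed above rather than something that exists today.

```ruby
# Hypothetical row object; `attributes` is the proposed method, not an existing one.
SketchRow = Struct.new(:product_id, :quantity, :price) do
  def attributes
    # Struct#to_h maps each member name to its value.
    to_h
  end
end

row = SketchRow.new(42, 3, 9.99)
row.attributes # { product_id: 42, quantity: 3, price: 9.99 }
```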
58
+ #### 1.3 Generate Unimported Rows CSV
59
+
60
+ Most user inputs to an application are relatively constrained. CSV files, on the
61
+ other hand, are not. Users can, and will, put all kinds of erroneous data into
62
+ their CSV files. So, it is useful to be able to provide a user with a list of
63
+ the rows in their file that could not be imported, so that they can re-import
64
+ these rows after they have resolved whatever issues existed. And CSV is a
65
+ natural format for this, since the user can open the file in Excel and make
66
+ edits.
67
+
68
+ A motivated user of CSVParty can already achieve this by accessing the
69
+ `skipped_rows`, `aborted_rows`, and `error_rows` arrays and constructing one or
70
+ more CSV files from these, but it would be nice to provide a default
71
+ implementation that is only a method call away. What I have in mind is for the
72
+ CSV file that is created to have the exact same column structure as the original
73
+ file, but with three additional columns:
74
+
75
+ - The original row number
76
+ - The status (skipped, aborted, errored)
77
+ - A message explaining the reason for the status
78
+
79
+ Conveniently, all of these pieces of data are available for skipped, aborted,
80
+ and errored rows. Then, the file would be generated with a method, like so:
81
+
82
+ # all three combined
83
+ importer.unimported_rows_as_csv
84
+ # or separate
85
+ importer.skipped_rows_as_csv
86
+ importer.aborted_rows_as_csv
87
+ importer.error_rows_as_csv
88
+
89
+ #### 1.4 Batch API
90
+
91
+ It can be way more performant to batch imports so that expensive operations,
92
+ like persisting data, are only done every so often. This would add an API to
93
+ accumulate data, execute some logic every X number of rows, reset the
94
+ accumulators, then repeat. Here's a rough sketch of what that API might look
95
+ like:
96
+
97
+ rows do |row|
98
+ customers[row.customer_id] = { name: row.customer_name, phone: row.phone }
99
+ orders[row.order_id] = { customer_id: row.customer_id, invoice_number: row.invoice_number }
100
+ end
101
+
102
+ batch 50, customers: {}, orders: {} do
103
+ # insert customers into database
104
+ # insert orders into database
105
+ end
106
+
107
+ The first argument is how often the batch logic should be executed. In this
108
+ case, every 50 rows. Then there is a hash of accumulators, where the keys are
109
+ the names of the accumulators and the values are the initial values. Declaring
110
+ the accumulators accomplishes two things:
111
+
112
+ 1. It provides accessor methods so that the accumulators can be accessed from
113
+ within the row import block.
114
+ 2. It automatically resets the accumulators to their initial values each time
115
+ the batch block is executed.
116
+
117
+ So, it is essentially functionally identical to doing the following:
118
+
119
+ class MyImporter < CSVParty::Importer
120
+ attr_accessor :customers, :orders
121
+
122
+ def customers
123
+ @customers ||= {}
124
+ end
125
+
126
+ def orders
127
+ @orders ||= {}
128
+ end
129
+
130
+ rows do |row|
131
+ # add customer to customers accumulator
132
+ # add order to orders accumulator
133
+ end
134
+
135
+ batch 50 do
136
+ # insert customers into database
137
+ # insert orders into database
138
+ self.customers = {}
139
+ self.orders = {}
140
+ end
141
+ end
142
+
143
+ _Note:_ The following is a rough sketch of an API that would handle a use case
144
+ that has come up. However, some research should be done first to figure out if
145
+ the use case it addresses is common.
146
+
147
+ One use case that has been mentioned is when rows are grouped by their
148
+ relationship to a parent record and those rows need to be acted on as a group.
149
+ So, imagine a CSV file like so:
150
+
151
+ Customer,Address,Product,Quantity,Price
152
+ Joe Smith,123 Main St.,Birkenstocks,1,74.99
153
+ Joe Smith,123 Main St.,Air Jordans,1,129.99
154
+ Joe Smith,123 Main St.,Tevas,3,59.99
155
+ Jane Doe,713 Broadway,Converse All-Star,1,39.99
156
+ Jane Doe,713 Broadway,Toms,1,59.99
157
+
158
+ It might be useful to be able to specify the batch interval in terms of one of
159
+ the columns in the CSV file, rather than as a number of rows. So, you would be
160
+ able to do:
161
+
162
+ class MyImporter < CSVParty::Importer
163
+ column :customer
164
+ column :address
165
+ column :product
166
+ column :quantity, as: :integer
167
+ column :price, as: :decimal
168
+
169
+ rows do |row|
170
+ line_items << { product: row.product, quantity: row.quantity, price: row.price }
171
+ end
172
+
173
+ batch :customer, line_items: [] do |current_row|
174
+ Customer.create(name: current_row.customer, address: current_row.address)
175
+ line_items.each do |li|
176
+ LineItem.create(li)
177
+ end
178
+ end
179
+ end
180
+
181
+ In this case, the batch logic gets executed every time there is a change in the
182
+ `:customer` column from one row to the next, rather than every X number of rows.
183
+ The accumulator works the same way: accessors are made available for adding
184
+ records to the accumulator and then the accumulator is automatically reset to
185
+ its initial value each time the batch logic is executed.
186
+
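The "new batch whenever a column changes" idea can be tried out today with plain Ruby. This sketch uses `Enumerable#slice_when` on hashes that stand in for parsed rows; it is only an illustration of the grouping rule, not the proposed `batch` API.

```ruby
rows = [
  { customer: 'Joe Smith', product: 'Birkenstocks' },
  { customer: 'Joe Smith', product: 'Tevas' },
  { customer: 'Jane Doe',  product: 'Toms' }
]

# Start a new batch whenever the customer differs from the previous row.
batches = rows.slice_when do |previous, current|
  previous[:customer] != current[:customer]
end

summary = batches.map { |batch| [batch.first[:customer], batch.size] }
summary # [['Joe Smith', 2], ['Jane Doe', 1]]
```

As with the proposal, this only groups adjacent rows, so a file that is not sorted by customer would produce several batches for the same customer.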
187
+ #### 1.5 Runtime Configuration
188
+
189
+ Sometimes it is useful to be able to configure an importer at runtime, rather than
190
+ at code writing time. An obvious example of when this would be useful is in the
191
+ case of user defined column header names. So, imagine a UI in which the user
192
+ uploads their CSV file, then specifies which column is, for example, the product
193
+ column, which is the quantity column, and which is the price column. In a case
194
+ like this, there is no way to specify the column definitions ahead of time; we
195
+ have to wait for the header names from the user.
196
+
197
+ Here is a sketch of what the API for runtime configuration would look like:
198
+
199
+ class MyImporter < CSVParty::Importer
200
+ rows do |row|
201
+ # persist data
202
+ end
203
+ end
204
+
205
+ # then:
206
+
207
+ my_importer = MyImporter.new
208
+ my_importer.configure do
209
+ column :product, header: user_product_header
210
+ column :quantity, header: user_quantity_header, as: :integer
211
+ column :price, header: user_price_header, as: :decimal
212
+ end
213
+
214
+ An open question is whether all DSL methods should be configurable at runtime.
215
+
216
+ #### 1.6 CSV Parse Error Handling
217
+
218
+ Sometimes it is useful to be able to completely ignore parsing and encoding
219
+ errors raised by the `CSV` class. To be clear, doing so is dangerous, since the
220
+ parsing logic in the `CSV` class is not designed to continue operating after it
221
+ encounters an error and raises. But sometimes you don't want to let a single
222
+ improperly encoded character prevent you from importing an entire CSV file. So,
223
+ this feature would be an optional way to either ignore those errors or respond
224
+ to them, and then continue importing. The API would probably be similar to the
225
+ error handling API for non-parse errors. So:
226
+
227
+ parse_errors :ignore # silently continue importing the next row
228
+
229
+ parse_errors do |line_number|
230
+ # handle parse error
231
+ end
232
+
233
+ my_import.parse_error_rows # returns array of parse error rows
234
+
235
+ ## Someday Features
236
+
237
+ #### Parse Row Access
238
+
239
+ This feature would allow access to the `CSV::Row` object when parsing a column.
240
+ It could work something like this:
241
+
242
+ column :product do |value, row|
243
+ Product.find_by(name: row['Product'])
244
+ end
245
+
246
+ In theory, the CSVParty API would cover all use cases where somebody would need
247
+ to access the raw row data, but perhaps not. Sometimes it's nice to be able to
248
+ cut through the stuff in your way and just get at the raw internals.
249
+
250
+ Additionally, perhaps deferred parsing would allow access to parsed row values,
251
+ which would possibly enable some of the features below, Multi Column Parsing and
252
+ Parse Dependencies, without requiring additional code. That might look like:
253
+
254
+ column :product do |value, row|
255
+ Product.find_by(name: row.unparsed.row)
256
+ end
257
+
258
+ #### Deferred Parsing
259
+
260
+ Currently, CSVParty parses all columns before the row import logic is executed.
261
+ There are situations where parsing columns is expensive and you may want to
262
+ defer parsing columns until and unless you actually need them for a given
263
+ row. For example, say you have an import where you are either:
264
+
265
+ 1. Updating a value on existing records, or
266
+ 2. Setting a value on and creating new records
267
+
268
+ And when you create new records, you have to also set a bunch of other values
269
+ and some of those values require database queries.
270
+
271
+ In a situation like this, you would want to defer all of the database queries
272
+ that need to run when creating a new record so that they aren't done in cases
273
+ where you are updating an existing record.
274
+
275
+ #### Columns Macro
276
+
277
+ This feature would allow you to declare multiple columns in a single line. So,
278
+ rather than:
279
+
280
+ column :product
281
+ column :price
282
+ column :color
283
+
284
+ You could do:
285
+
286
+ columns :product, :price, :color
287
+
288
+ This is probably most useful when there are a bunch of columns that should all
289
+ be parsed as text. Though it might make sense to allow specifying parsers and
290
+ other options:
291
+
292
+ columns product: { as: :raw }, price: { as: :decimal }
293
+
294
+ It should probably also be possible to combine `columns` and `column` macros:
295
+
296
+ columns :product, :price, :color
297
+ column :size
298
+
299
+ #### Column Numbers
300
+
301
+ CSVParty is entirely oriented around a CSV file having a header. This is not
302
+ always the case, though. This would add the ability to specify columns using a
303
+ column number, rather than a header. A rough sketch of the API might look like:
304
+
305
+ class MyImporter < CSVParty::Importer
306
+ column :product, number: 7
307
+ column :quantity, number: 8, as: :integer
308
+ column :price, number: 9, as: :decimal
309
+ end
310
+
311
+ #### Multi-column Parsing
312
+
313
+ The whole idea behind custom parsers is that it makes for much cleaner code to
314
+ get all the logic related to parsing a raw value into a useful intermediate
315
+ object in one place, away from the larger logic of what needs to happen to each
316
+ row. Sometimes, though, you need access to multiple column values to create a
317
+ useful parsed value. Here is what an API for that might look like:
318
+
319
+ column :total, header: ['Price', 'Quantity'] do |price, quantity|
320
+ BigDecimal.new(price) * BigDecimal.new(quantity)
321
+ end
322
+
323
+ #### Parse Dependencies
324
+
325
+ Sometimes, while parsing a column, it would be useful to have access to the
326
+ parsed value from another column. This would make that possible. Here is what
327
+ that might look like:
328
+
329
+ class MyImporter < CSVParty::Importer
330
+ column :customer do |customer_id|
331
+ Customer.find(customer_id)
332
+ end
333
+
334
+ column :order, depends_on: :customer do |order_id, customer|
335
+ customer.orders.find(order_id)
336
+ end
337
+ end
338
+
339
+ #### Rails Generator
340
+
341
+ This feature would add a generator for Rails which creates an importer file. So
342
+ doing:
343
+
344
+ rails generate importer Product
345
+
346
+ Would generate the following file at `app/importers/product_importer.rb`:
347
+
348
+ class ProductImporter
349
+ include CSVParty
350
+
351
+ import do |row|
352
+ # import logic goes here
353
+ end
354
+ end
355
+
@@ -0,0 +1,82 @@
1
+ module CSVParty
2
+ class Configuration
3
+ attr_accessor :row_importer, :file_importer, :error_handler,
4
+ :skipped_row_handler, :aborted_row_handler
5
+
6
+ attr_reader :columns, :dependencies
7
+
8
+ def initialize
9
+ @columns = {}
10
+ @dependencies = []
11
+ end
12
+
13
+ def add_column(column, options = {}, &block)
14
+ raise_if_duplicate_column(column)
15
+ raise_if_reserved_column_name(column)
16
+
17
+ options = {
18
+ header: column_regex(column),
19
+ as: :string,
20
+ format: nil,
21
+ intercept_blanks: (options[:as] != :raw)
22
+ }.merge(options)
23
+
24
+ parser = if block_given?
25
+ block
26
+ else
27
+ "parse_#{options[:as]}".to_sym
28
+ end
29
+
30
+ columns[column] = {
31
+ header: options[:header],
32
+ parser: parser,
33
+ format: options[:format],
34
+ intercept_blanks: options[:intercept_blanks]
35
+ }
36
+ end
37
+
38
+ def add_dependency(*args)
39
+ args.each do |arg|
40
+ dependencies << arg
41
+ end
42
+ end
43
+
44
+ def columns_with_named_parsers
45
+ columns.select { |_name, options| options[:parser].is_a? Symbol }
46
+ end
47
+
48
+ def columns_with_regex_headers
49
+ columns.select { |_name, options| options[:header].is_a? Regexp }
50
+ end
51
+
52
+ def required_columns
53
+ columns.map { |_name, options| options[:header] }
54
+ end
55
+
56
+ private
57
+
58
+ def column_regex(column)
59
+ column = Regexp.escape(column.to_s)
60
+ underscored_or_whitespaced = "#{column}|#{column.tr('_', ' ')}"
61
+ /\A\s*(?:#{underscored_or_whitespaced})\s*\z/i
62
+ end
63
+
64
+ def raise_if_duplicate_column(name)
65
+ return unless columns.has_key?(name)
66
+
67
+ raise DuplicateColumnError.new(name)
68
+ end
69
+
70
+ RESERVED_COLUMN_NAMES = [:unparsed,
71
+ :csv_string,
72
+ :row_number,
73
+ :skip_message,
74
+ :abort_message].freeze
75
+
76
+ def raise_if_reserved_column_name(column)
77
+ return unless RESERVED_COLUMN_NAMES.include? column
78
+
79
+ raise ReservedColumnNameError.new(RESERVED_COLUMN_NAMES)
80
+ end
81
+ end
82
+ end
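To see which headers the default `header:` pattern is meant to accept, here is a small standalone sketch modeled on `column_regex`. The `sketch_header_regex` name is hypothetical, and wrapping the alternation in a non-capturing group is an assumption of this sketch, made so that both anchors apply to each alternative.

```ruby
# Sketch of the default header matching; grouping the alternation is an assumption here.
def sketch_header_regex(column)
  name = Regexp.escape(column.to_s)
  # Accept the column name with underscores or with spaces, ignoring case
  # and surrounding whitespace.
  /\A\s*(?:#{name}|#{name.tr('_', ' ')})\s*\z/i
end

regex = sketch_header_regex(:product_id)
regex.match?(' Product ID ')  # true
regex.match?('product_id')    # true
regex.match?('My Product ID') # false
```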