RubyGems - csv_party - Versions diffs - 0.0.1.pre9 → 1.0.0.rc4 - Mend

csv_party 0.0.1.pre9 → 1.0.0.rc4

Files changed (14)

checksums.yaml +4 -4
data/LICENSE.md +21 -0
data/README.md +218 -0
data/ROADMAP.md +271 -0
data/lib/csv_party.rb +45 -275
data/lib/csv_party/configuration.rb +82 -0
data/lib/csv_party/data_preparer.rb +45 -0
data/lib/csv_party/dsl.rb +38 -0
data/lib/csv_party/errors.rb +157 -0
data/lib/csv_party/parsers.rb +71 -0
data/lib/csv_party/row.rb +83 -0
data/lib/csv_party/runner.rb +219 -0
data/lib/csv_party/testing.rb +6 -0
metadata +14 -3

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: b70149f2976c65072cd5ba2e20b1187f250ed7cf
-  data.tar.gz: 64c20377be56bc6b01249b092d01e41f8caf756e
+  metadata.gz: 67d5895445a9fe397df95275260491ed6d9f6ce5
+  data.tar.gz: 542a1442466867afa33cf0ef883a32443777cfd5
 SHA512:
-  metadata.gz: fa5d31d3102e420b6f5f4d63bea3b13c8a9752a77047d28e32804abdc3ba07e2e21a68e9cf5ec909dd474537345bd4785750cdfaf0fb7112138fbae9d5082ec8
-  data.tar.gz: fd46166bafee072f6308212cd5c3d028d560078e7800f61b003d7c0a050afc6919ce14cf680465e4beb70860baa3630c44377c283e723c5440033fc1e8e28f48
+  metadata.gz: 1035dac76f5ec71d97015a5d9c2874074c28ad4b0ff7994ab565735fd6d49e539e4292a42ee15f44d8d2de2fb356e81a06f61685019fa3ad250d12ab9c160ba8
+  data.tar.gz: 93c748026adc2fa907f3e08825c0fef8b41fed5b2945e3066aff70978a86ed4c9d47df3a98abc30e25b2fb44a597cddb91efbb519699d2a9e3b4c580b4ef1100

data/LICENSE.md ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2018 Richard A. Jones
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,218 @@
+[![Gem Version](https://badge.fury.io/rb/csv_party.svg)](https://badge.fury.io/rb/csv_party)
+[![Build Status](https://travis-ci.org/toasterlovin/csv_party.svg?branch=master)](https://travis-ci.org/toasterlovin/csv_party)
+[![Code Climate Maintainability](https://api.codeclimate.com/v1/badges/946d0dec172fda05d631/maintainability)](https://codeclimate.com/github/toasterlovin/csv_party/maintainability)
+[![Code Climate Test Coverage](https://api.codeclimate.com/v1/badges/946d0dec172fda05d631/test_coverage)](https://codeclimate.com/github/toasterlovin/csv_party/test_coverage)
+# Make importing CSV files a party
+The point of this gem is to make it easier to focus on the business
+logic of your CSV imports. You start by defining which columns you
+will be importing, as well as how they will be parsed. Then, you
+specify what you want to do with each row after it has been parsed.
+That's it; CSVParty takes care of all the tedious stuff for you.
+## Defining Columns
+This is what defining your import columns looks like:
+    class MyImporter < CSVParty
+      column :price, header: 'Nonsensical Column Name', as: :decimal
+    end
+This will take the value in the 'Nonsensical Column Name' column,
+parse it as a decimal, then make it available to your import logic
+as a nice, sane variable named `price`.
+The available built-in parsers are:
+  - `:raw` returns the value from the CSV file, unchanged
+  - `:string` strips whitespace and returns the resulting string
+  - `:integer` strips whitespace, then calls `to_i` on the resulting string
+  - `:decimal` strips all characters except `0-9` and `.`, then passes the
+    resulting string to `BigDecimal.new`
+  - `:boolean` strips whitespace, downcases, then returns `true` if the
+    resulting string is `'1'`, `'t'`, or `'true'`, otherwise it returns `false`
+When defining a column, you can also pass a block if you need custom
+parsing logic:
+    class MyImporter < CSVParty
+      column :product, header: 'Product' do |value|
+        Product.find_by(name: value)
+      end
+    end
+Or, if you want to re-use a custom parser for multiple columns, just
+define a method on your class with a name that ends in `_parser` and
+you can use it the same way you use the built-in parsers:
+    class MyImporter < CSVParty
+      def dollars_to_cents_parser(value)
+        (BigDecimal.new(value) * 100).to_i
+      end
+      column :price_in_cents, header: 'Price in $', as: :dollars_to_cents
+      column :cost_in_cents, header: 'Cost in $', as: :dollars_to_cents
+    end
+#### NOTE: Parsing nil and blank values
+By default, CSVParty will intercept any values that are `nil` or which contain
+only whitespace and coerce them to `nil` _without invoking the parser for that
+column_. This applies to all parsers, including custom parsers which you
+define, with one exception: the `:raw` parser. This is done as a convenience to
+avoid pesky `NoMethodError`s that arise when a parser tries to do its thing
+to a `nil` value that it wasn't expecting. You can turn this behavior off on a
+given column by setting `intercept_blanks` to `false` in the options hash:
+    class MyImporter < CSVParty
+      column :price, header: 'Price', intercept_blanks: false do |value|
+        if value.nil?
+          'n/a'
+        else
+          BigDecimal.new(value)
+        end
+      end
+    end
+#### NOTE: Parsers cannot reference each other
+When using a custom parser to parse a column, the block or method that you
+define has no way to reference the values from any other columns. So, this won't
+work:
+    class MyImporter < CSVParty
+      column :product, header: 'Product' do |value|
+        Product.find_by(name: value)
+      end
+      column :price, header: 'Price' do |value|
+        # product is not defined...
+        product.price = BigDecimal.new(value)
+      end
+    end
+Instead, you would do this in your row import logic. Which brings us to:
+## Importing Rows
+Once you've defined all of your columns, you specify your logic for importing
+rows by passing a block to the `rows` DSL method. That block will have access
+to a `row` variable which contains all of the parsed values for your columns.
+Here's what that looks like:
+    class MyImporter < CSVParty
+      rows do |row|
+        product = row.product
+        product.price = row.price
+        product.save
+      end
+    end
+The `row` variable also provides access to two other things:
+- The unparsed values for your columns
+- The raw CSV string for that row
+Here's how you access those:
+    class MyImporter < CSVParty
+      rows do |row|
+        row.price           # parsed value: #<BigDecimal:7f88d92cb820,'0.9E1',9(18)>
+        row.unparsed.price  # unparsed value: '$9.00'
+        row.string          # raw CSV string: 'USB Cable,$9.00,Box,Blue'
+      end
+    end
+## Importing
+Once your importer class is defined, you use it like this:
+    importer = MyImporter.new('path/to/file.csv')
+    importer.import!
+You can also specify what should happen before and after your import by passing
+a block to `import`, like so:
+    class MyImporter < CSVParty
+      # column definitions
+      # row import logic
+      import do
+        puts 'Starting import'
+        import_rows!
+        puts 'Import finished!'
+      end
+    end
+You can do whatever you want inside of the `import` block, just make sure to
+call `import_rows!` somewhere in there.
+## Handling Errors
+One of the hallmarks of importing data from CSV files is that there are
+inevitably rows with errors of some kind. You can handle error rows by
+specifying an `errors` block:
+    class MyImporter < CSVParty
+      # column definitions
+      # row import logic
+      errors do |error, line_number|
+        # log error
+      end
+    end
+Any row in your CSV file which results in an exception will be passed to this
+block. This means you can signal that there is an error with a given row by
+raising an exception:
+    rows do |row|
+      # rows with price less than 0 will be treated as errors
+      raise if row.price < 0
+    end
+## External Dependencies
+Sometimes you need access to external objects in your importer's logic. You can specify
+what external objects your importer depends on with `depends_on`. Dependencies declared
+this way will then be available in your parsers and your `rows`, `import`, and `errors`
+blocks:
+    class MyImporter < CSVParty
+      # column definitions...
+      depends_on :product_import
+      rows do |row|
+        # do some stuff
+        # product_import is not provided by the class,
+        # but is passed in at runtime instead!
+        product_import.log_success(product)
+      end
+    end
+Then, to pass the dependency in at runtime, you just add an option to `.new` with
+the name and value of the dependency:
+    MyImporter.new(
+      'path/to/csv',
+      product_import: @product_import
+    )
+# Tested Rubies
+CSVParty has been tested against the following Rubies:
+MRI
+- 2.5
+- 2.4
+- 2.3
+- 2.2
+- 2.1
+- 2.0
+# License
+This project uses the MIT License. See LICENSE.md for details.
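
Note on the README above: the built-in parser list and the blank-interception note can be read as the following plain-Ruby sketch. It is not csv_party's implementation; `SKETCH_PARSERS` and `sketch_parse` are hypothetical names introduced here, and `BigDecimal()` is used in place of the README's `BigDecimal.new`, which newer versions of the bigdecimal library no longer provide.

```ruby
require 'bigdecimal'

# Hypothetical stand-ins for the built-in parsers described in the README.
# These are NOT csv_party's code, only a plain-Ruby reading of the prose.
SKETCH_PARSERS = {
  raw:     ->(value) { value },                                  # unchanged
  string:  ->(value) { value.strip },                            # strip whitespace
  integer: ->(value) { value.strip.to_i },                       # strip, then to_i
  decimal: ->(value) { BigDecimal(value.gsub(/[^0-9.]/, '')) },  # keep 0-9 and "."
  boolean: ->(value) { %w[1 t true].include?(value.strip.downcase) }
}.freeze

# Mirrors the "intercept blanks" note: nil or whitespace-only input becomes nil
# without the parser being called, except for :raw.
def sketch_parse(value, as:, intercept_blanks: true)
  blank = value.nil? || value.strip.empty?
  return nil if intercept_blanks && as != :raw && blank

  SKETCH_PARSERS.fetch(as).call(value)
end
```

Under these assumptions, `sketch_parse('$9.00', as: :decimal)` yields a BigDecimal equal to 9, and a whitespace-only value returns `nil` for every parser except `:raw`. Passing `intercept_blanks: false` hands `nil` straight to the parser, which is where the `NoMethodError` mentioned in the README would come from.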

data/ROADMAP.md ADDED

@@ -0,0 +1,271 @@
+Roadmap
+-
+- [1.1 Early Return While Parsing](#11-early-return-while-parsing)
+- [1.2 Rows to Hash](#12-rows-to-hash)
+- [1.3 Generate Unimported Rows CSV](#13-generate-unimported-rows-csv)
+- [1.4 Batch API](#14-batch-api)
+- [1.5 Runtime Configuration](#15-runtime-configuration)
+- [1.6 CSV Parse Error Handling](#16-csv-parse-error-handling)
+- [Someday Features](#someday-features)
+    - [Column Numbers](#column-numbers)
+    - [Multi-column Parsing](#multi-column-parsing)
+    - [Parse Dependencies](#parse-dependencies)
+#### 1.1 Early Return While Parsing
+Currently, CSVParty has well-defined behavior for what should happen when
+either 1) one of the built-in flow control methods (`next_row`, `skip_row`,
+`abort_row`, and `abort_import`) is used, or 2) an error is raised while
+the row importer block is being executed. However, all of these things can also
+happen while the columns for a row are being parsed. When they do, most of the
+flow control and error handling assumes that the row has already been fully
+parsed. So some design work should go into deciding what should happen in these
+cases. And then tests should be written for all of the various scenarios.
+#### 1.2 Rows to Hash
+One of the primary use cases for importing CSV files is to insert their contents
+into a database. Apparently this is common enough that the
+[csv-importer](https://github.com/pcreux/csv-importer) gem, which almost
+completely automates this process without much room for customization, is very
+popular. So, in the case where there is a pretty simple correspondence between
+the contents of a CSV file and ActiveRecord models, it should be dead simple to
+get the job done.
+What I have in mind is something like:
+    class MyImporter < CSVParty::Importer
+      column :product_id
+      column :quantity
+      column :price
+      rows do |row|
+        LineItem.create(row.attributes)
+      end
+    end
+Where `row.attributes` returns a hash with all of the column names as keys and
+all of the parsed values as values. So, with an importer like the one above,
+`row.attributes` would return a hash like so:
+    { product_id: 42, quantity: 3, price: 9.99 }
+#### 1.3 Generate Unimported Rows CSV
+Most user inputs to an application are relatively constrained. CSV files, on the
+other hand, are not. Users can, and will, put all kinds of erroneous data into
+their CSV files. So, it is useful to be able to provide a user with a list of
+the rows in their file that could not be imported, so that they can re-import
+these rows after they have resolved whatever issues existed. And CSV is a
+natural format for this, since the user can open the file in Excel and make
+edits.
+A motivated user of CSVParty can already achieve this by accessing the
+`skipped_rows`, `aborted_rows`, and `error_rows` arrays and constructing one or
+more CSV files from these, but it would be nice to provide a default
+implementation that is only a method call away. What I have in mind is for the
+CSV file that is created to have the exact same column structure as the original
+file, but with three additional columns:
+  - The original row number
+  - The status (skipped, aborted, errored)
+  - A message explaining the reason for the status
+Conveniently, all of these pieces of data are available for skipped, aborted,
+and errored rows. Then, the file would be generated with a method, like so:
+    # all three combined
+    importer.unimported_rows_as_csv
+    # or separate
+    importer.skipped_rows_as_csv
+    importer.aborted_rows_as_csv
+    importer.error_rows_as_csv
+#### 1.4 Batch API
+It can be way more performant to batch imports so that expensive operations,
+like persisting data, are only done every so often. This would add an API to
+accumulate data, execute some logic every X number of rows, reset the
+accumulators, then repeat. Here's a rough sketch of what that API might look
+like:
+    rows do |row|
+      customers[row.customer_id] = { name: row.customer_name, phone: row.phone }
+      orders[row.order_id] = { customer_id: row.customer_id, invoice_number: row.invoice_number }
+    end
+    batch 50, customers: {}, orders: {} do
+      # insert customers into database
+      # insert orders into database
+    end
+The first argument is how often the batch logic should be executed. In this
+case, every 50 rows. Then there is a hash of accumulators, where the keys are
+the names of the accumulators and the values are the initial values. Declaring
+the accumulators accomplishes two things:
+1. It provides accessor methods so that the accumulators can be accessed from
+   within the row import block.
+2. It automatically resets the accumulators to their initial values each time
+   the batch block is executed.
+So, it is functionally equivalent to doing the following:
+    class MyImporter < CSVParty::Importer
+      attr_accessor :customers, :orders
+      def customers
+        @customers ||= {}
+      end
+      def orders
+        @orders ||= {}
+      end
+      rows do |row|
+        # add customer to customers accumulator
+        # add order to orders accumulator
+      end
+      batch 50 do
+        # insert customers into database
+        # insert orders into database
+        self.customers = {}
+        self.orders = {}
+      end
+    end
+_Note:_ The following is a rough sketch of an API that would handle a use case
+that has come up. However, some research should be done first to figure out if
+the use case it addresses is common.
+One use case that has been mentioned is when rows are grouped by their
+relationship to a parent record and those rows need to be acted on as a group.
+So, imagine a CSV file like so:
+    Customer,Address,Product,Quantity,Price
+    Joe Smith,123 Main St.,Birkenstocks,1,74.99
+    Joe Smith,123 Main St.,Air Jordans,1,129.99
+    Joe Smith,123 Main St.,Tevas,3,59.99
+    Jane Doe,713 Broadway,Converse All-Star,1,39.99
+    Jane Doe,713 Broadway,Toms,1,59.99
+It might be useful to be able to specify the batch interval in terms of one of
+the columns in the CSV file, rather than as a number of rows. So, you would be
+able to do:
+    class MyImporter < CSVParty::Importer
+      column :customer
+      column :address
+      column :product
+      column :quantity, as: :integer
+      column :price, as: :decimal
+      rows do |row|
+        line_items << { product: row.product, quantity: row.quantity, price: row.price }
+      end
+      batch :customer, line_items: [] do |current_row|
+        Customer.create(name: current_row.customer, address: current_row.address)
+        line_items.each do |li|
+          LineItem.create(li)
+        end
+      end
+    end
+In this case, the batch logic gets executed every time there is a change in the
+`:customer` column from one row to the next, rather than every X number of rows.
+The accumulator works the same way: accessors are made available for adding
+records to the accumulator and then the accumulator is automatically reset to
+its initial value each time the batch logic is executed.
+#### 1.5 Runtime Configuration
+Sometimes it is useful to be able to configure an importer at runtime, rather than
+at code writing time. An obvious example of when this would be useful is in the
+case of user defined column header names. So, imagine a UI in which the user
+uploads their CSV file, then specifies which column is, for example, the product
+column, which is the quantity column, and which is the price column. In a case
+like this, there is no way to specify the column definitions ahead of time; we
+have to wait for the header names from the user.
+Here is a sketch of what the API for runtime configuration would look like:
+    class MyImporter < CSVParty::Importer
+      rows do |row|
+        # persist data
+      end
+    end
+    # then:
+    my_importer = MyImporter.new
+    my_importer.configure do
+      column :product, header: user_product_header
+      column :quantity, header: user_quantity_header, as: :integer
+      column :price, header: user_price_header, as: :decimal
+    end
+An open question is whether all DSL methods should be configurable at runtime.
+#### 1.6 CSV Parse Error Handling
+Sometimes it is useful to be able to completely ignore parsing and encoding
+errors raised by the `CSV` class. To be clear, doing so is dangerous, since the
+parsing logic in the `CSV` class is not designed to continue operating after it
+encounters an error and raises. But sometimes you don't want to let a single
+improperly encoded character prevent you from importing an entire CSV file. So,
+this feature would be an optional way to either ignore those errors or respond
+to them, and then continue importing. The API would probably be similar to the
+error handling API for non-parse errors. So:
+    parse_errors :ignore # silently continue importing the next row
+    parse_errors do |line_number|
+      # handle parse error
+    end
+    my_import.parse_error_rows # returns array of parse error rows
+## Someday Features
+#### Column Numbers
+CSVParty is entirely oriented around a CSV file having a header. This is not
+always the case, though. This would add the ability to specify columns using a
+column number, rather than a header. A rough sketch of the API might look like:
+    class MyImporter < CSVParty::Importer
+      column :product, number: 7
+      column :quantity, number: 8, as: :integer
+      column :price, number: 9, as: :decimal
+    end
+#### Multi-column Parsing
+The whole idea behind custom parsers is that it makes for much cleaner code to
+get all the logic related to parsing a raw value into a useful intermediate
+object in one place, away from the larger logic of what needs to happen to each
+row. Sometimes, though, you need access to multiple column values to create a
+useful parsed value. Here is what an API for that might look like:
+    column :total, header: ['Price', 'Quantity'] do |price, quantity|
+      BigDecimal.new(price) * BigDecimal.new(quantity)
+    end
+#### Parse Dependencies
+Sometimes, while parsing a column, it would be useful to have access to the
+parsed value from another column. This would make that possible. Here is what
+that might look like:
+    class MyImporter < CSVParty::Importer
+      column :customer do |customer_id|
+        Customer.find(customer_id)
+      end
+      column :order, depends_on: :customer do |order_id, customer|
+        customer.orders.find(order_id)
+      end
+    end
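
Note on section 1.4 above: the column-based batching idea (run the batch logic whenever the `:customer` value changes between consecutive rows) can be approximated today in plain Ruby. This is a minimal sketch, assuming rows have already been parsed into hashes; `each_batch_by` is a hypothetical helper, not part of csv_party, and it relies on `Enumerable#chunk_while`, which requires Ruby 2.3 or later.

```ruby
# Hypothetical helper, not part of csv_party: yields each run of consecutive
# rows that share the same value for `key`. This is the grouping behaviour
# the roadmap describes for `batch :customer`.
def each_batch_by(rows, key)
  return enum_for(:each_batch_by, rows, key) unless block_given?

  # chunk_while keeps extending the current batch while the block is true,
  # so a new batch starts whenever the key changes between neighbours.
  rows.chunk_while { |a, b| a[key] == b[key] }.each { |batch| yield batch }
end

# Sample data taken from the CSV shown in the roadmap (subset of columns).
rows = [
  { customer: 'Joe Smith', product: 'Birkenstocks',      quantity: 1 },
  { customer: 'Joe Smith', product: 'Air Jordans',       quantity: 1 },
  { customer: 'Joe Smith', product: 'Tevas',             quantity: 3 },
  { customer: 'Jane Doe',  product: 'Converse All-Star', quantity: 1 },
  { customer: 'Jane Doe',  product: 'Toms',              quantity: 1 }
]

# One batch per customer: three rows for Joe Smith, two for Jane Doe.
batch_sizes = each_batch_by(rows, :customer).map(&:size)
```

For the fixed-interval variant (`batch 50`), `rows.each_slice(50)` gives the equivalent grouping. Note that grouping by consecutive values only works if the file is already sorted by the grouping column, as it is in the roadmap's example.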