RubyGems - csv_party - Versions diffs - 0.0.1.pre8 → 1.0.0.rc5 - Mend

csv_party 0.0.1.pre8 → 1.0.0.rc5

Files changed (15)

checksums.yaml +5 -5
data/LICENSE.md +21 -0
data/README.md +218 -0
data/ROADMAP.md +355 -0
data/lib/csv_party/configuration.rb +82 -0
data/lib/csv_party/data_preparer.rb +48 -0
data/lib/csv_party/dsl.rb +40 -0
data/lib/csv_party/errors.rb +157 -0
data/lib/csv_party/parsers.rb +73 -0
data/lib/csv_party/row.rb +84 -0
data/lib/csv_party/runner.rb +186 -0
data/lib/csv_party/testing.rb +6 -0
data/lib/csv_party/validations.rb +41 -0
data/lib/csv_party.rb +43 -259
metadata +16 -5

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
-SHA1:
-  metadata.gz: f4ebba54721bd80642ce2a0d3d07add5d4cdb348
-  data.tar.gz: cd77e182c9026980d9df49ce32a58bf6789dd570
+SHA256:
+  metadata.gz: 174fc0ba77e52795b1763e2ff29eae5e5c3cfa3bcd181973aaba886a54a79eeb
+  data.tar.gz: 7676780e13435e10bf35b7c34c849484ea0efe0a86d7889ff2a4697f312a99c6
 SHA512:
-  metadata.gz: b2d9374b60ca71b54ff36f35751245580dfa2f8904fab72c8010162a34e254d12c8f7b146e0ca532c03a3ad963fd1d19c880fe3e92776260e4d07d064d532bbd
-  data.tar.gz: cffc09a0eacaf8639cd5dede503f97e8dd0560e9866b39d37c51254d337ab1ebb89c882c184ff037500df6d5f537597670e0ce84eb6971b9defeebb1d6eddffd
+  metadata.gz: ab4bc476717516d1fa2a9d674f47e60c95ef0b24541cb7cf6cf76948ca5b9ce9dad96b9b89d10edfd99f338d4a04b612fd241c38d37d746d049f6eedb8868120
+  data.tar.gz: 4041355ba68669a3cf3e6275bf099da83d3d92d759bfc328269080a41dcbee02fa9a77b9c95c3549eaf5330cfc9fbbdcfdea05f83d320d2f29b55e89553d3fea
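
The hunk above replaces the SHA1 digests with SHA256 ones and refreshes the SHA512 ones. As a minimal sketch of how such values could be recomputed for comparison, assuming `metadata.gz` and `data.tar.gz` have already been extracted from the `.gem` archive, the following uses Ruby's standard `digest` library. The helper name `archive_digests` is hypothetical and not part of RubyGems.

```ruby
require 'digest'

# Hypothetical helper: returns hex digests keyed the same way as the
# SHA256 and SHA512 sections of checksums.yaml.
def archive_digests(bytes)
  {
    'SHA256' => Digest::SHA256.hexdigest(bytes),
    'SHA512' => Digest::SHA512.hexdigest(bytes)
  }
end

# Usage, assuming the file exists in the current directory:
#   archive_digests(File.binread('metadata.gz'))
```

A SHA256 hex digest is 64 characters long and a SHA512 one is 128, which matches the lengths of the values shown in the hunk.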

data/LICENSE.md ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2018 Richard A. Jones
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,218 @@
+[![Gem Version](https://badge.fury.io/rb/csv_party.svg)](https://badge.fury.io/rb/csv_party)
+[![Build Status](https://travis-ci.org/toasterlovin/csv_party.svg?branch=master)](https://travis-ci.org/toasterlovin/csv_party)
+[![Code Climate Maintainability](https://api.codeclimate.com/v1/badges/946d0dec172fda05d631/maintainability)](https://codeclimate.com/github/toasterlovin/csv_party/maintainability)
+[![Code Climate Test Coverage](https://api.codeclimate.com/v1/badges/946d0dec172fda05d631/test_coverage)](https://codeclimate.com/github/toasterlovin/csv_party/test_coverage)
+# Make importing CSV files a party
+The point of this gem is to make it easier to focus on the business
+logic of your CSV imports. You start by defining which columns you
+will be importing, as well as how they will be parsed. Then, you
+specify what you want to do with each row after it has been parsed.
+That's it; CSVParty takes care of all the tedious stuff for you.
+## Defining Columns
+This is what defining your import columns looks like:
+    class MyImporter < CSVParty
+      column :price, header: 'Nonsensical Column Name', as: :decimal
+    end
+This will take the value in the 'Nonsensical Column Name' column,
+parse it as a decimal, then make it available to your import logic
+as a nice, sane variable named `price`.
+The available built-in parsers are:
+  - `:raw` returns the value from the CSV file, unchanged
+  - `:string` strips whitespace and returns the resulting string
+  - `:integer` strips whitespace, then calls `to_i` on the resulting string
+  - `:decimal` strips all characters except `0-9` and `.`, then passes the
+    resulting string to `BigDecimal.new`
+  - `:boolean` strips whitespace, downcases, then returns `true` if the
+    resulting string is `'1'`, `'t'`, or `'true'`, otherwise it returns `false`
+When defining a column, you can also pass a block if you need custom
+parsing logic:
+    class MyImporter < CSVParty
+      column :product, header: 'Product' do |value|
+        Product.find_by(name: value)
+      end
+    end
+Or, if you want to re-use a custom parser for multiple columns, just
+define a method on your class with a name that ends in `_parser` and
+you can use it the same way you use the built-in parsers:
+    class MyImporter < CSVParty
+      def dollars_to_cents_parser(value)
+        (BigDecimal.new(value) * 100).to_i
+      end
+      column :price_in_cents, header: 'Price in $', as: :dollars_to_cents
+      column :cost_in_cents, header: 'Cost in $', as: :dollars_to_cents
+    end
+#### NOTE: Parsing nil and blank values
+By default, CSVParty will intercept any values that are `nil` or which contain
+only whitespace and coerce them to `nil` _without invoking the parser for that
+column_. This applies to all parsers, including custom parsers which you
+define, with one exception: the `:raw` parser. This is done as a convenience to
+avoid pesky `NoMethodError`s that arise when a parser tries to do its thing
+to a `nil` value that it wasn't expecting. You can turn this behavior off on a
+given column by setting `intercept_blanks` to `false` in the options hash:
+    class MyImporter < CSVParty
+      column :price, header: 'Price', intercept_blanks: false do |value|
+        if value.nil?
+          'n/a'
+        else
+          BigDecimal.new(value)
+        end
+      end
+    end
+#### NOTE: Parsers cannot reference each other
+When using a custom parser to parse a column, the block or method that you
+define has no way to reference the values from any other columns. So, this won't
+work:
+    class MyImporter < CSVParty
+      column :product, header: 'Product' do |value|
+        Product.find_by(name: value)
+      end
+      column :price, header: 'Price' do |value|
+        # product is not defined...
+        product.price = BigDecimal.new(value)
+      end
+    end
+Instead, you would do this in your row import logic. Which brings us to:
+## Importing Rows
+Once you've defined all of your columns, you specify your logic for importing
+rows by passing a block to the `rows` DSL method. That block will have access
+to a `row` variable which contains all of the parsed values for your columns.
+Here's what that looks like:
+    class MyImporter < CSVParty
+      rows do |row|
+        product = row.product
+        product.price = row.price
+        product.save
+      end
+    end
+The `row` variable also provides access to two other things:
+- The unparsed values for your columns
+- The raw CSV string for that row
+Here's how you access those:
+    class MyImporter < CSVParty
+      rows do |row|
+        row.price           # parsed value: #<BigDecimal:7f88d92cb820,'0.9E1',9(18)>
+        row.unparsed.price  # unparsed value: '$9.00'
+        row.string          # raw CSV string: 'USB Cable,$9.00,Box,Blue'
+      end
+    end
+## Importing
+Once your importer class is defined, you use it like this:
+    importer = MyImporter.new('path/to/file.csv')
+    importer.import!
+You can also specify what should happen before and after your import by passing
+a block to `import`, like so:
+    class MyImporter < CSVParty
+      # column definitions
+      # row import logic
+      import do
+        puts 'Starting import'
+        import_rows!
+        puts 'Import finished!'
+      end
+    end
+You can do whatever you want inside of the `import` block, just make sure to
+call `import_rows!` somewhere in there.
+## Handling Errors
+One of the hallmarks of importing data from CSV files is that there are
+inevitably rows with errors of some kind. You can handle error rows by
+specifying an `errors` block:
+    class MyImporter < CSVParty
+      # column definitions
+      # row import logic
+      errors do |error, line_number|
+        # log error
+      end
+    end
+Any row in your CSV file which results in an exception will be passed to this
+block. Which means you can specify that there is an error with a given row by
+raising an exception:
+    rows do |row|
+      # rows with price less than 0 will be treated as errors
+      raise if row.price < 0
+    end
+## External Dependencies
+Sometimes you need access to external objects in your importer's logic. You can specify
+what external objects your importer depends on with `depends_on`. Dependencies declared
+this way will then be available in your parsers and your `rows`, `import`, and `errors`
+blocks:
+    class MyImporter < CSVParty
+      # column definitions...
+      depends_on :product_import
+      rows do |row|
+        # do some stuff
+        # product_import is not provided by the class,
+        # but is passed in at runtime instead!
+        product_import.log_success(product)
+      end
+    end
+Then, to pass the dependency in at runtime, you just add an option to `.new` with
+the name and value of the dependency:
+    MyImporter.new(
+      'path/to/csv',
+       product_import: @product_import
+    )
+# Tested Rubies
+CSVParty has been tested against the following Rubies:
+MRI
+- 2.5
+- 2.4
+- 2.3
+- 2.2
+- 2.1
+- 2.0
+# License
+This project uses the MIT License. See LICENSE.md for details.
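
The README above describes the built-in parsers and the blank-interception rule in prose only. The following is a rough sketch of that described behavior, not the gem's implementation (which lives in `lib/csv_party/parsers.rb` and is not shown in this part of the diff). The module name `ParserSketch` and the `parse` helper are hypothetical. It also uses `Kernel#BigDecimal` rather than the README's `BigDecimal.new`, since `BigDecimal.new` is no longer available in newer versions of the bigdecimal library.

```ruby
require 'bigdecimal'

# Hypothetical stand-ins for the parsers the README describes.
module ParserSketch
  module_function

  # :raw returns the value unchanged.
  def parse_raw(value)
    value
  end

  # :string strips surrounding whitespace.
  def parse_string(value)
    value.to_s.strip
  end

  # :integer strips whitespace, then calls to_i.
  def parse_integer(value)
    value.to_s.strip.to_i
  end

  # :decimal drops every character except 0-9 and ".".
  def parse_decimal(value)
    BigDecimal(value.to_s.gsub(/[^0-9.]/, ''))
  end

  # :boolean is true only for '1', 't', or 'true' (case-insensitive).
  def parse_boolean(value)
    %w[1 t true].include?(value.to_s.strip.downcase)
  end

  # Blank interception: nil and whitespace-only values become nil without
  # the parser being invoked. It is off by default only for :raw.
  def parse(value, as:, intercept_blanks: as != :raw)
    return nil if intercept_blanks && (value.nil? || value.strip.empty?)
    public_send("parse_#{as}", value)
  end
end
```

With this sketch, `ParserSketch.parse('   ', as: :string)` returns `nil` because the blank is intercepted, while `ParserSketch.parse('   ', as: :raw)` returns the whitespace unchanged.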

data/ROADMAP.md ADDED

@@ -0,0 +1,355 @@
+Roadmap
+-
+- [1.1 Early Return While Parsing](#11-early-return-while-parsing)
+- [1.2 Rows to Hash](#12-rows-to-hash)
+- [1.3 Generate Unimported Rows CSV](#13-generate-unimported-rows-csv)
+- [1.4 Batch API](#14-batch-api)
+- [1.5 Runtime Configuration](#15-runtime-configuration)
+- [1.6 CSV Parse Error Handling](#16-csv-parse-error-handling)
+- [Someday Features](#someday-features)
+    - [Parse Row Access](#parse-row-access)
+    - [Deferred Parsing](#deferred-parsing)
+    - [Columns Macro](#columns-macro)
+    - [Column Numbers](#column-numbers)
+    - [Multi-column Parsing](#multi-column-parsing)
+    - [Parse Dependencies](#parse-dependencies)
+    - [Rails Generator](#rails-generator)
+#### 1.1 Early Return While Parsing
+Currently, CSVParty is pretty well thought out about what should happen when
+either 1) one of the built in flow control methods (`next_row`, `skip_row`,
+`abort_row`, and `abort_import`) is used, or 2) an error is raised while
+the row importer block is being executed. However, all of these things can also
+happen when the columns for a row are being parsed. When/if it does, most of the
+flow control and error handling kind of assumes that the row has been fully
+parsed. So some design work should go into deciding what should happen in these
+cases. And then tests should be written for all of the various scenarios.
+#### 1.2 Rows to Hash
+One of the primary use cases for importing CSV files is to insert their contents
+into a database. Apparently this is common enough that the
+[csv-importer](https://github.com/pcreux/csv-importer) gem, which almost
+completely automates this process without much room for customization, is very
+popular. So, in the case where there is a pretty simple correspondence between
+the contents of a CSV file and ActiveRecord models, it should be dead simple to
+get the job done.
+What I have in mind is something like:
+    class MyImporter < CSVParty::Importer
+      column :product_id
+      column :quantity
+      column :price
+      rows do |row|
+        LineItem.create(row.attributes)
+      end
+    end
+Where `row.attributes` returns a hash with all of the column names as keys and
+all of the parsed values as values. So, with an importer like the one above,
+`row.attributes` would return a hash like so:
+    { product_id: 42, quantity: 3, price: 9.99 }
+#### 1.3 Generate Unimported Rows CSV
+Most user inputs to an application are relatively constrained. CSV files, on the
+other hand, are not. Users can, and will, put all kinds of erroneous data into
+their CSV files. So, it is useful to be able to provide a user with a list of
+the rows in their file that could not be imported, so that they can re-import
+these rows after they have resolved whatever issues existed. And CSV is a
+natural format for this, since the user can open the file in Excel and make
+edits.
+A motivated user of CSVParty can already achieve this by accessing the
+`skipped_rows`, `aborted_rows`, and `error_rows` arrays and constructing one or
+more CSV files from these, but it would be nice to provide a default
+implementation that is only a method call away. What I have in mind is for the
+CSV file that is created to have the exact same column structure as the original
+file, but with three additional columns:
+  - The original row number
+  - The status (skipped, aborted, errored)
+  - A message explaining the reason for the status
+Conveniently, all of these pieces of data are available for skipped, aborted,
+and errored rows. Then, the file would be generated with a method, like so:
+    # all three combined
+    importer.unimported_rows_as_csv
+    # or separate
+    importer.skipped_rows_as_csv
+    importer.aborted_rows_as_csv
+    importer.error_rows_as_csv
+#### 1.4 Batch API
+It can be way more performant to batch imports so that expensive operations,
+like persisting data, are only done every so often. This would add an API to
+accumulate data, execute some logic every X number of rows, reset the
+accumulators, then repeat. Here's a rough sketch of what that API might look
+like:
+    rows do |row|
+      customers[row.customer_id] = { name: row.customer_name, phone: row.phone }
+      orders[row.order_id] = { customer_id: row.customer_id, invoice_number: row.invoice_number }
+    end
+    batch 50, customers: {}, orders: {} do
+      # insert customers into database
+      # insert orders into database
+    end
+The first argument is how often the batch logic should be executed. In this
+case, every 50 rows. Then there is a hash of accumulators, where the keys are
+the names of the accumulators and the values are the initial values. Declaring
+the accumulators accomplishes two things:
+1. It provides accessor methods so that the accumulators can be accessed from
+   within the row import block.
+2. It automatically resets the accumulators to their initial values each time
+   the batch block is executed.
+So, it is essentially functionally identical to doing the following:
+    class MyImporter < CSVParty::Importer
+      attr_accessor :customers, :orders
+      def customers
+        @customers ||= {}
+      end
+      def orders
+        @orders ||= {}
+      end
+      rows do |row|
+        # add customer to customers accumulator
+        # add order to orders accumulator
+      end
+      batch 50 do
+        # insert customers into database
+        # insert orders into database
+        self.customers = {}
+        self.orders = {}
+      end
+    end
+_Note:_ The following is a rough sketch of an API that would handle a use case
+that has come up. However, some research should be done first to figure out if
+the use case it addresses is common.
+One use case that has been mentioned is when rows are grouped by their
+relationship to a parent record and those rows need to be acted on as a group.
+So, imagine a CSV file like so:
+    Customer,Address,Product,Quantity,Price
+    Joe Smith,123 Main St.,Birkenstocks,1,74.99
+    Joe Smith,123 Main St.,Air Jordans,1,129.99
+    Joe Smith,123 Main St.,Tevas,3,59.99
+    Jane Doe,713 Broadway,Converse All-Star,1,39.99
+    Jane Doe,713 Broadway,Toms,1,59.99
+It might be useful to be able to specify the batch interval in terms of one of
+the columns in the CSV file, rather than as a number of rows. So, you would be
+able to do:
+    class MyImporter < CSVParty::Importer
+      column :customer
+      column :address
+      column :product
+      column :quantity, as: :integer
+      column :price, as: :decimal
+      rows do |row|
+        line_items << { product: row.product, quantity: row.quantity, price: row.price }
+      end
+      batch :customer, line_items: [] do |current_row|
+        Customer.create(name: current_row.customer, address: current_row.address)
+        line_items.each do |li|
+          LineItem.create(li)
+        end
+      end
+    end
+In this case, the batch logic gets executed every time there is a change in the
+`:customer` column from one row to the next, rather than every X number of rows.
+The accumulator works the same way: accessors are made available for adding
+records to the accumulator and then the accumulator is automatically reset to
+its initial value each time the batch logic is executed.
+#### 1.5 Runtime Configuration
+Sometimes it is useful to be able to configure an importer at runtime, rather than
+at code writing time. An obvious example of when this would be useful is in the
+case of user defined column header names. So, imagine a UI in which the user
+uploads their CSV file, then specifies which column is, for example, the product
+column, which is the quantity column, and which is the price column. In a case
+like this, there is no way to specify the column definitions ahead of time; we
+have to wait for the header names from the user.
+Here is a sketch of what the API for runtime configuration would look like:
+    class MyImporter < CSVParty::Importer
+      rows do |row|
+        # persist data
+      end
+    end
+    # then:
+    my_importer = MyImporter.new
+    my_importer.configure do
+      column :product, header: user_product_header
+      column :quantity, header: user_quantity_header, as: :integer
+      column :price, header: user_price_header, as: :decimal
+    end
+An open question is whether all DSL methods should be configurable at runtime.
+#### 1.6 CSV Parse Error Handling
+Sometimes it is useful to be able to completely ignore parsing and encoding
+errors raised by the `CSV` class. To be clear, doing so is dangerous, since the
+parsing logic in the `CSV` class is not designed to continue operating after it
+encounters an error and raises. But sometimes you don't want to let a single
+improperly encoded character prevent you from importing an entire CSV file. So,
+this feature would be an optional way to either ignore those errors or respond
+to them, and then continue importing. The API would probably be similar to the
+error handling API for non-parse errors. So:
+    parse_errors :ignore # silently continue importing the next row
+    parse_errors do |line_number|
+      # handle parse error
+    end
+    my_import.parse_error_rows # returns array of parse error rows
+## Someday Features
+#### Parse Row Access
+This feature would allow access to the `CSV::Row` object when parsing a column.
+It could work something like this:
+    column :product do |value, row|
+      Product.find_by(name: row['Product'])
+    end
+In theory, the CSVParty API would cover all use cases where somebody would need
+to access the raw row data, but perhaps not. Sometimes it's nice to be able to
+cut through the stuff in your way and just get at the raw internals.
+Additionally, perhaps deferred parsing would allow access to parsed row values,
+which would possibly enable some of the features below, Multi Column Parsing and
+Parse Dependencies, without requiring additional code. That might look like:
+    column :product do |value, row|
+      Product.find_by(name: row.unparsed.row)
+    end
+#### Deferred Parsing
+Currently, CSVParty parses all columns before the row import logic is executed.
+There are situations where parsing columns is expensive and you may want to
+defer parsing columns until and unless you actually need them for a given
+row. For example, say you have an import where you are either:
+1. Updating a value on existing records, or
+2. Setting a value on and creating new records
+And when you create new records, you have to also set a bunch of other values
+and some of those values require database queries.
+In a situation like this, you would want to defer all of the database queries
+that need to run when creating a new record so that they aren't done in cases
+where you are updating an existing record.
+#### Columns Macro
+This feature would allow you to declare multiple columns in a single line. So,
+rather than:
+    column :product
+    column :price
+    column :color
+You could do:
+    columns :product, :price, :color
+This is probably most useful when there are a bunch of columns that should all
+be parsed as text. Though it might make sense to allow specifying parsers and
+other options:
+    columns product: { as: :raw }, price: { as: :decimal }
+It should probably also be possible to combine `columns` and `column` macros:
+    columns :product, :price, :color
+    column :size
+#### Column Numbers
+CSVParty is entirely oriented around a CSV file having a header. This is not
+always the case, though. This would add the ability to specify columns using a
+column number, rather than a header. A rough sketch of the API might look like:
+    class MyImporter < CSVParty::Importer
+      column :product, number: 7
+      column :quantity, number: 8, as: :integer
+      column :price, number: 9, as: :decimal
+    end
+#### Multi-column Parsing
+The whole idea behind custom parsers is that it makes for much cleaner code to
+get all the logic related to parsing a raw value into a useful intermediate
+object in one place, away from the larger logic of what needs to happen to each
+row. Sometimes, though, you need access to multiple column values to create a
+useful parsed value. Here is what an API for that might look like:
+    column :total, header: ['Price', 'Quantity'] do |price, quantity|
+      BigDecimal.new(price) * BigDecimal.new(quantity)
+    end
+#### Parse Dependencies
+Sometimes, while parsing a column, it would be useful to have access to the
+parsed value from another column. This would make that possible. Here is what
+that might look like:
+    class MyImporter < CSVParty::Importer
+      column :customer do |customer_id|
+        Customer.find(customer_id)
+      end
+      column :order, depends_on: :customer do |order_id, customer|
+        customer.orders.find(order_id)
+      end
+    end
+#### Rails Generator
+This feature would add a generator for Rails which creates an importer file. So
+doing:
+    rails generate importer Product
+Would generate the following file at `app/importers/product_importer.rb`:
+    class ProductImporter
+      include CSVParty
+      import do |row|
+        # import logic goes here
+      end
+    end
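
The Batch API item in the roadmap above proposes two batching modes: every X rows, and every time a column's value changes from one row to the next. Neither exists in the gem yet, so the following is only a plain-Ruby sketch of the two grouping rules using `Enumerable#each_slice` and `Enumerable#slice_when`. The method names `each_fixed_batch` and `each_keyed_batch` are hypothetical, and rows are modeled as hashes rather than CSVParty row objects.

```ruby
# Fixed-size batches: yield a batch every `size` rows, plus a final
# shorter batch if the row count is not a multiple of `size`.
def each_fixed_batch(rows, size)
  rows.each_slice(size) { |batch| yield batch }
end

# Key-based batches: start a new batch whenever the value under `key`
# differs between two consecutive rows.
def each_keyed_batch(rows, key)
  rows.slice_when { |a, b| a[key] != b[key] }.each { |batch| yield batch }
end

# Sample rows modeled on the roadmap's example CSV.
sample_rows = [
  { customer: 'Joe Smith', product: 'Birkenstocks' },
  { customer: 'Joe Smith', product: 'Air Jordans' },
  { customer: 'Joe Smith', product: 'Tevas' },
  { customer: 'Jane Doe',  product: 'Converse All-Star' },
  { customer: 'Jane Doe',  product: 'Toms' }
]

each_keyed_batch(sample_rows, :customer) do |batch|
  puts "#{batch.first[:customer]}: #{batch.map { |r| r[:product] }.join(', ')}"
end
```

Note that key-based grouping only compares neighboring rows, so a file in which one customer's rows are not contiguous would produce more than one batch for that customer.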

data/lib/csv_party/configuration.rb ADDED

@@ -0,0 +1,82 @@
+module CSVParty
+  class Configuration
+    attr_accessor :row_importer, :file_importer, :error_handler,
+                  :skipped_row_handler, :aborted_row_handler
+    attr_reader :columns, :dependencies
+    def initialize
+      @columns = {}
+      @dependencies = []
+    end
+    def add_column(column, options = {}, &block)
+      raise_if_duplicate_column(column)
+      raise_if_reserved_column_name(column)
+      options = {
+        header: column_regex(column),
+        as: :string,
+        format: nil,
+        intercept_blanks: (options[:as] != :raw)
+      }.merge(options)
+      parser = if block_given?
+                 block
+               else
+                 "parse_#{options[:as]}".to_sym
+               end
+      columns[column] = {
+        header: options[:header],
+        parser: parser,
+        format: options[:format],
+        intercept_blanks: options[:intercept_blanks]
+      }
+    end
+    def add_dependency(*args)
+      args.each do |arg|
+        dependencies << arg
+      end
+    end
+    def columns_with_named_parsers
+      columns.select { |_name, options| options[:parser].is_a? Symbol }
+    end
+    def columns_with_regex_headers
+      columns.select { |_name, options| options[:header].is_a? Regexp }
+    end
+    def required_columns
+      columns.map { |_name, options| options[:header] }
+    end
+    private
+    def column_regex(column)
+      column = Regexp.escape(column.to_s)
+      underscored_or_whitespaced = "#{column}|#{column.tr('_', ' ')}"
+      /\A\s*#{underscored_or_whitespaced}\s*\z/i
+    end
+    def raise_if_duplicate_column(name)
+      return unless columns.has_key?(name)
+      raise DuplicateColumnError.new(name)
+    end
+    RESERVED_COLUMN_NAMES = [:unparsed,
+                             :csv_string,
+                             :row_number,
+                             :skip_message,
+                             :abort_message].freeze
+    def raise_if_reserved_column_name(column)
+      return unless RESERVED_COLUMN_NAMES.include? column
+      raise ReservedColumnNameError.new(RESERVED_COLUMN_NAMES)
+    end
+  end
+end
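
When no `header:` option is given, `add_column` above falls back to `column_regex`, which builds a case-insensitive pattern from the column name and its space-separated variant. The sketch below copies that logic into a standalone method (`as_written_regex`) to show what it matches. Because the alternation is not grouped, `\A` applies only to the first alternative and `\z` only to the second, so some longer headers also match. `grouped_regex` is a hypothetical variant with a non-capturing group, offered as an assumption about the intent rather than as the gem's behavior.

```ruby
# Same logic as Configuration#column_regex in the hunk above.
def as_written_regex(column)
  column = Regexp.escape(column.to_s)
  underscored_or_whitespaced = "#{column}|#{column.tr('_', ' ')}"
  /\A\s*#{underscored_or_whitespaced}\s*\z/i
end

# Hypothetical variant: grouping the alternation makes both anchors
# apply to both spellings of the header.
def grouped_regex(column)
  column = Regexp.escape(column.to_s)
  /\A\s*(?:#{column}|#{column.tr('_', ' ')})\s*\z/i
end

['Product ID', ' product_id ', 'product_id_extra', 'My Product ID'].each do |header|
  puts format('%-20s as written: %-5s grouped: %s',
              header.inspect,
              as_written_regex(:product_id).match?(header),
              grouped_regex(:product_id).match?(header))
end
```

Both versions accept `'Product ID'` and `' product_id '`. Only the as-written pattern accepts `'product_id_extra'` and `'My Product ID'`.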