RubyGems - csv-diff - Versions diffs - 0.1 - Mend

csv-diff 0.1

Files changed (8)

data/LICENSE ADDED

@@ -0,0 +1,22 @@
+Copyright (c) 2013, Adam Gardiner
+All rights reserved.
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+* Redistributions of source code must retain the above copyright notice, this
+  list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright notice,
+  this list of conditions and the following disclaimer in the documentation
+  and/or other materials provided with the distribution.
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

data/README.md ADDED

@@ -0,0 +1,222 @@
+# CSV-Diff
+CSV-Diff is a small library for performing diffs of CSV data.
+Unlike a standard diff that compares line by line, and is sensitive to the
+ordering of records, CSV-Diff identifies common lines by key field(s), and
+then compares the contents of the fields in each line.
+Data may be supplied in the form of CSV files, or as an array of arrays. The
+diff process provides a fine level of control over what to diff, and can
+optionally ignore certain types of changes (e.g. changes in position).
+CSV-Diff is particularly well suited to data in parent-child format. Parent-
+child data does not lend itself well to standard text diffs, as small changes
+in the organisation of the tree at an upper level can lead to big movements
+in the position of descendant records. By instead matching records by key,
+CSV-Diff avoids this issue, while still being able to detect changes in
+sibling order.
+## Usage
+CSV-Diff is supplied as a gem, and has no dependencies. To install it, run:
+```
+gem install csv-diff
+```
+To compare two CSV files where the field names are in the first row of the file,
+and the first field contains the unique key for each record, simply use:
+```ruby
+require 'csv-diff'
+diff = CSVDiff.new(file1, file2)
+```
+The returned diff object can be queried for the differences that exist between
+the two files, e.g.:
+```ruby
+puts diff.summary.inspect   # Summary of the adds, deletes, updates, and moves
+puts diff.adds.inspect      # Details of the additions to file2
+puts diff.deletes.inspect   # Details of the deletions from file1
+puts diff.updates.inspect   # Details of the updates from file1 to file2
+puts diff.moves.inspect     # Details of the moves from file1 to file2
+puts diff.diffs.inspect     # Details of all differences
+puts diff.warnings.inspect  # Any warnings generated during the diff process
+```
+## Unique Row Identifiers
+CSVDiff is preferable to a standard line-by-line diff when row order is
+significantly impacted by small changes. The classic example is a parent-child
+file generated by a hierarchy traversal. A simple change in position of a parent
+member near the root of the hierarchy will have a large impact on the positions
+of all descendant rows. Consider the following example:
+```
+Root
+  |- A
+  |  |- A1
+  |  |- A2
+  |
+  |- B
+     |- B1
+     |- B2
+```
+A hierarchy traversal of this tree into a parent-child format would generate a CSV
+as follows:
+```
+Root,A
+A,A1
+A,A2
+Root,B
+B,B1
+B,B2
+```
+If the positions of A and B were swapped, a hierarchy traversal would now produce a CSV
+as follows:
+```
+Root,B
+B,B1
+B,B2
+Root,A
+A,A1
+A,A2
+```
+A simple diff using a diff utility would highlight this as 3 additions and 3 deletions.
+CSVDiff, however, would classify this as 2 moves (a change in sibling position for A and B).
+In order to do this, CSVDiff needs to know what field(s) confer uniqueness on each row.
+In this example, we could use the child field alone (since each member name only appears
+once); however, this would imply a flat structure, where all rows are children of a single
+parent. This in turn would cause CSVDiff to classify the above change as a Move (i.e. a
+change in order) of all 6 rows.
+The more correct specification of this file is that column 0 contains a unique parent
+identifier, and column 1 contains a unique child identifier. CSVDiff can then correctly
+deduce that there are in fact only two changes in order - the swap in positions of A and
+B below Root.
+Note: If you aren't interested in changes in the order of siblings, then you could use
+CSVDiff with a :key_field option of column 1, and specify the :ignore_moves option.
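
The move counts described above (2 moves with a parent/child key, 6 with a child-only key) can be reproduced with a small standalone sketch. This is not the gem's code: `sibling_index` and `moved_keys` are hypothetical helpers that group rows under a parent key and compare positions among the siblings common to both sides.

```ruby
# Minimal sketch, not the csv-diff implementation: group row keys under
# their parent key, preserving the order in which rows appear.
def sibling_index(rows, parent_cols, child_cols)
  index = Hash.new { |h, k| h[k] = [] }
  rows.each do |row|
    parent = row.values_at(*parent_cols).join('~')
    key = row.values_at(*(parent_cols + child_cols)).join('~')
    index[parent] << key
  end
  index
end

# Returns the keys whose position among the siblings common to both sides
# differs between the left and right inputs.
def moved_keys(left_rows, right_rows, parent_cols, child_cols)
  left = sibling_index(left_rows, parent_cols, child_cols)
  right = sibling_index(right_rows, parent_cols, child_cols)
  moves = []
  right.each do |parent, right_sibs|
    left_sibs = left.fetch(parent, [])
    left_common = left_sibs & right_sibs
    right_common = right_sibs & left_sibs
    right_common.each_with_index do |key, pos|
      moves << key if left_common.index(key) != pos
    end
  end
  moves
end

BEFORE = [%w[Root A], %w[A A1], %w[A A2], %w[Root B], %w[B B1], %w[B B2]]
AFTER  = [%w[Root B], %w[B B1], %w[B B2], %w[Root A], %w[A A1], %w[A A2]]

# Column 0 is the parent, column 1 is the child: only A and B move.
PARENT_CHILD_MOVES = moved_keys(BEFORE, AFTER, [0], [1])
# Child column only: every row shares one (empty) parent, so all rows move.
CHILD_ONLY_MOVES = moved_keys(BEFORE, AFTER, [], [1])

puts PARENT_CHILD_MOVES.inspect  # ["Root~B", "Root~A"]
puts CHILD_ONLY_MOVES.length     # 6
```

Comparing positions only among common siblings means an added or deleted sibling does not, by itself, register as a move of its neighbours.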
+## Warnings
+When processing and diffing files, CSVDiff may encounter problems with the data or
+the specifications it has been given. It will continue even in the face of problems,
+but will log details of the problems in a #warnings Array. The number of warnings
+will also be included in the Hash returned by the #summary method.
+Warnings may be raised for any of the following:
+* Missing fields: If the right/to file contains fields that are not present in the
+  left/from file, a warning is raised and the field is ignored for diff purposes.
+* Duplicate keys: If two rows are found that have the same values for the key field(s),
+  a warning is raised, and the duplicate values are ignored.
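
The duplicate-key rule can be illustrated with a short standalone sketch. It assumes, as the source in csv_source.rb suggests, that key values are upper-cased before comparison and that the first row with a given key is kept. `index_rows` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper (not part of csv-diff): index rows by key, keeping the
# first occurrence of each key and recording a warning for later duplicates.
def index_rows(rows, key_cols)
  lines = {}
  warnings = []
  rows.each_with_index do |row, i|
    # Keys are compared case-insensitively by upper-casing each value.
    key = row.values_at(*key_cols).map { |v| v.to_s.upcase }.join('~')
    if lines.key?(key)
      warnings << "Duplicate key '#{key}' encountered and ignored at line #{i + 1}"
    else
      lines[key] = row
    end
  end
  [lines, warnings]
end

ROWS = [
  ['A', 'A1', 'Account 1'],
  ['A', 'A2', 'Account 2'],
  ['a', 'a1', 'Same key as the first row, differing only in case']
]

LINES, WARNINGS = index_rows(ROWS, [0, 1])
puts LINES.size  # 2
puts WARNINGS    # Duplicate key 'A~A1' encountered and ignored at line 3
```

Because the later row is dropped rather than merged, any differences it carries never reach the diff; the warning is the only trace of it.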
+## Examples
+The simplest use case is as shown above, where the data to be diffed is in CSV files
+with the column names as the first record, and where the unique key is the first
+column in the data. In this case, a diff can be created simply via:
+```ruby
+diff = CSVDiff.new(file1, file2)
+```
+### Specifying Unique Row Identifiers
+Often, however, rows are not uniquely identifiable via the first column in the file.
+In a parent-child hierarchy, for example, combinations of parent and child may be
+necessary to uniquely identify a row. In these cases, it is necessary to indicate
+which fields are used to uniquely identify common rows across the two files. This
+can be done in several different ways.
+1. Using the :key_fields option with field numbers (these are 0-based):
+    ```ruby
+    diff = CSVDiff.new(file1, file2, key_fields: [0, 1])
+    ```
+2. Using the :key_fields option with column names:
+    ```ruby
+    diff = CSVDiff.new(file1, file2, key_fields: ['Parent', 'Child'])
+    ```
+3. Using the :parent_field and :child_fields options with field numbers:
+    ```ruby
+    diff = CSVDiff.new(file1, file2, parent_field: 1, child_fields: [2, 3])
+    ```
+4. Using the :parent_field and :child_fields options with column names:
+    ```ruby
+    diff = CSVDiff.new(file1, file2, parent_field: 'Date', child_fields: ['HomeTeam', 'AwayTeam'])
+    ```
+### Using Non-CSV File Sources
+Data from non-CSV sources can be diffed, as long as it can be supplied as an Array
+of Arrays:
+```ruby
+DATA1 = [
+    ['Parent', 'Child', 'Description'],
+    ['A', 'A1', 'Account 1'],
+    ['A', 'A2', 'Account 2']
+]
+DATA2 = [
+    ['Parent', 'Child', 'Description'],
+    ['A', 'A1', 'Account1'],
+    ['A', 'A2', 'Account2']
+]
+diff = CSVDiff.new(DATA1, DATA2, key_fields: [1, 0])
+```
+### Specifying Column Names
+If your data file does not include column headers, you can specify the names of
+each column when creating the diff. The names supplied are the keys used in the
+diff results:
+```ruby
+DATA1 = [
+    ['A', 'A1', 'Account 1'],
+    ['A', 'A2', 'Account 2']
+]
+DATA2 = [
+    ['A', 'A1', 'Account1'],
+    ['A', 'A2', 'Account2']
+]
+diff = CSVDiff.new(DATA1, DATA2, key_fields: [1, 0], field_names: ['Parent', 'Child', 'Description'])
+```
+If your data file does contain a header row, but you wish to use your own column
+names, you can specify the :field_names option and the :ignore_header option to
+ignore the first row.
+### Ignoring Fields
+If your data contains fields that you aren't interested in, these can be excluded
+from the diff process using the :ignore_fields option:
+```ruby
+diff = CSVDiff.new(file1, file2, parent_field: 'Date', child_fields: ['HomeTeam', 'AwayTeam'],
+                   ignore_fields: ['CreatedAt', 'UpdatedAt'])
+```
+### Ignoring Certain Changes
+CSVDiff identifies Adds, Updates, Moves and Deletes; any of these changes can be selectively
+ignored, e.g. if you are not interested in Deletes, you can pass the :ignore_deletes option:
+```ruby
+diff = CSVDiff.new(file1, file2, parent_field: 'Date', child_fields: ['HomeTeam', 'AwayTeam'],
+                   ignore_fields: ['CreatedAt', 'UpdatedAt'],
+                   ignore_deletes: true, ignore_moves: true)
+```

data/lib/csv-diff.rb ADDED

@@ -0,0 +1,4 @@
+require 'csv-diff/csv_source'
+require 'csv-diff/algorithm'
+require 'csv-diff/csv_diff'

data/lib/csv-diff/algorithm.rb ADDED

@@ -0,0 +1,124 @@
+class CSVDiff
+    # Implements the CSV diff algorithm.
+    module Algorithm
+        # Diffs two CSVSource structures.
+        #
+        # @param left [CSVSource] A CSVSource object containing the contents of
+        #   the left/from input.
+        # @param right [CSVSource] A CSVSource object containing the contents of
+        #   the right/to input.
+        # @param key_fields [Array] An array containing the names of the field(s)
+        #   that uniquely identify each row.
+        # @param diff_fields [Array] An array containing the names of the fields
+        #   to be diff-ed.
+        def diff_sources(left, right, key_fields, diff_fields, options = {})
+            left_index = left.index
+            left_values = left.lines
+            left_keys = left_values.keys
+            right_index = right.index
+            right_values = right.lines
+            right_keys = right_values.keys
+            parent_fields = left.parent_fields.length
+            include_adds = !options[:ignore_adds]
+            include_moves = !options[:ignore_moves]
+            include_updates = !options[:ignore_updates]
+            include_deletes = !options[:ignore_deletes]
+            diffs = Hash.new{ |h, k| h[k] = {} }
+            right_keys.each_with_index do |key, right_row_id|
+                key_vals = key.split('~')
+                parent = key_vals[0...parent_fields].join('~')
+                child = key_vals[parent_fields..-1].join('~')
+                left_parent = left_index[parent]
+                right_parent = right_index[parent]
+                left_value = left_values[key]
+                right_value = right_values[key]
+                left_idx = left_parent && left_parent.index(key)
+                right_idx = right_parent && right_parent.index(key)
+                id = {}
+                id[:row] = right_row_id + 1
+                id[:sibling_position] = right_idx + 1
+                key_fields.each do |field_name|
+                    id[field_name] = right_value[field_name]
+                end
+                if left_idx && right_idx
+                    if include_moves
+                        left_common = left_parent & right_parent
+                        right_common = right_parent & left_parent
+                        left_pos = left_common.index(key)
+                        right_pos = right_common.index(key)
+                        if left_pos != right_pos
+                            # Move
+                            diffs[key].merge!(id.merge!(:action => 'Move',
+                                              :sibling_position => [left_idx + 1, right_idx + 1]))
+                            #puts "Move #{left_idx} -> #{right_idx}: #{key}"
+                        end
+                    end
+                    if include_updates && (changes = diff_row(left_values[key], right_values[key], diff_fields))
+                        diffs[key].merge!(id.merge(changes.merge(:action => 'Update')))
+                        #puts "Change: #{key}"
+                    end
+                elsif include_adds && right_idx
+                    # Add
+                    diffs[key].merge!(id.merge(right_values[key].merge(:action => 'Add')))
+                    #puts "Add: #{key}"
+                end
+            end
+            # Now identify deletions
+            if include_deletes
+                (left_keys - right_keys).each do |key|
+                    # Delete
+                    key_vals = key.split('~')
+                    parent = key_vals[0...parent_fields].join('~')
+                    child = key_vals[parent_fields..-1].join('~')
+                    left_parent = left_index[parent]
+                    left_value = left_values[key]
+                    left_idx = left_parent.index(key)
+                    next unless left_idx
+                    id = {}
+                    id[:row] = left_keys.index(key) + 1
+                    id[:sibling_position] = left_idx + 1
+                    key_fields.each do |field_name|
+                        id[field_name] = left_value[field_name]
+                    end
+                    diffs[key].merge!(id.merge(left_values[key].merge(:action => 'Delete')))
+                    #puts "Delete: #{key}"
+                end
+            end
+            diffs
+        end
+        # Identifies the fields that are different between two versions of the
+        # same row.
+        #
+        # @param left_row [Hash] The version of the CSV row from the left/from
+        #   file.
+        # @param right_row [Hash] The version of the CSV row from the right/to
+        #   file.
+        # @return [Hash<String, Array>] A Hash whose keys are the fields that
+        #   contain differences, and whose values are a two-element array of
+        #   [left/from, right/to] values.
+        def diff_row(left_row, right_row, fields)
+            diffs = {}
+            fields.each do |attr|
+                right_val = right_row[attr]
+                right_val = nil if right_val == ""
+                left_val = left_row[attr]
+                left_val = nil if left_val == ""
+                if left_val != right_val
+                    diffs[attr] = [left_val, right_val]
+                    #puts "#{attr}: #{left_val} -> #{right_val}"
+                end
+            end
+            diffs if diffs.size > 0
+        end
+    end
+end
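
One consequence of the `'~'`-joined keys used above is worth a sketch. Reduced to its string handling, splitting a key back into parent and child parts only round-trips when no key value itself contains `'~'`. The names `SEP`, `build_key` and `split_key` are illustrative, and the input containing a tilde is an assumed edge case, not something the source discusses.

```ruby
# Illustrative reduction of the key handling; not the gem's API.
SEP = '~'

# Mirrors how a composite key is formed from upper-cased field values.
def build_key(values)
  values.map { |v| v.to_s.upcase }.join(SEP)
end

# Mirrors how a key is split back into parent and child parts, given the
# number of parent fields.
def split_key(key, parent_field_count)
  parts = key.split(SEP)
  parent = parts[0...parent_field_count].join(SEP)
  child = parts[parent_field_count..-1].join(SEP)
  [parent, child]
end

CLEAN = split_key(build_key(%w[Root A]), 1)
# A parent value that contains the separator is split in the wrong place.
AMBIGUOUS = split_key(build_key(['North~East', 'A1']), 1)

puts CLEAN.inspect      # ["ROOT", "A"]
puts AMBIGUOUS.inspect  # ["NORTH", "EAST~A1"]
```

In the second case the recovered parent (`NORTH`) no longer matches the parent key under which the row would have been indexed (`NORTH~EAST`), so data containing `'~'` in a key field may need escaping or a different separator.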

data/lib/csv-diff/csv_diff.rb ADDED

@@ -0,0 +1,142 @@
+# This library performs diffs of flat file content that contains structured data
+# in fields, with rows provided in a parent-child format.
+#
+# Parent-child data does not lend itself well to standard text diffs, as small
+# changes in the organisation of the tree at an upper level (e.g. re-ordering of
+# two ancestor nodes) can lead to big movements in the position of descendant
+# records - particularly when the parent-child data is generated by a hierarchy
+# traversal.
+#
+# Additionally, simple line-based diffs can identify that a line has changed,
+# but not which field(s) in the line have changed.
+#
+# Data may be supplied in the form of CSV files, or as an array of arrays. The
+# diff process provides a fine level of control over what to diff, and
+# can optionally ignore certain types of changes (e.g. changes in order).
+class CSVDiff
+    # @return [CSVSource] CSVSource object containing details of the left/from
+    #    input.
+    attr_reader :left
+    alias_method :from, :left
+    # @return [CSVSource] CSVSource object containing details of the right/to
+    #    input.
+    attr_reader :right
+    alias_method :to, :right
+    # @return [Hash<String, Hash>] A hash of differences, keyed by row key
+    attr_reader :diffs
+    # @return [Array<String>] An array of field names that are compared in the
+    #    diff process.
+    attr_reader :diff_fields
+    # @return [Array<String>] An array of field names identifying the key
+    #    fields that uniquely identify each row.
+    attr_reader :key_fields
+    # @return [Array<String>] An array of field names for the parent field(s).
+    attr_reader :parent_fields
+    # @return [Array<String>] An array of field names for the child field(s).
+    attr_reader :child_fields
+    # Generates a diff between two hierarchical tree structures, provided
+    # as +left+ and +right+, each of which consists of an array of lines in CSV
+    # format.
+    # An array of field indexes can also be specified as +key_fields+;
+    # a minimum of one field index must be specified; the last index is the
+    # child id, and the remaining fields (if any) are the parent field(s) that
+    # uniquely qualify the child instance.
+    #
+    # @param left [Array<Array<String>>] An Array of lines, each of which is in
+    #   turn an Array containing fields.
+    # @param right [Array<Array<String>>] An Array of lines, each of which is in
+    #   turn an Array containing fields.
+    # @param options [Hash] A hash containing options.
+    # @option options [Array<String>] :field_names An Array of field names for
+    #   each field in +left+ and +right+. If not provided, the first row is
+    #   assumed to contain field names.
+    # @option options [Boolean] :ignore_header If true, the first line of each
+    #   file is ignored. This option can only be true if :field_names is
+    #   specified.
+    # @option options [Array] :ignore_fields The names of any fields to be
+    #   ignored when performing the diff.
+    # @option options [String] :key_field The name of the field that uniquely
+    #   identifies each row.
+    # @option options [Array<String>] :key_fields The names of the fields
+    #   that uniquely identify each row.
+    # @option options [String] :parent_field The name of the field that
+    #   identifies a parent within which sibling order should be checked.
+    # @option options [String] :child_field The name of the field that
+    #   uniquely identifies a child of a parent.
+    # @option options [Boolean] :ignore_adds If true, records that appear in
+    #   the right/to file but not in the left/from file are not reported.
+    # @option options [Boolean] :ignore_updates If true, records that have been
+    #   updated are not reported.
+    # @option options [Boolean] :ignore_moves If true, changes in row position
+    #   amongst sibling rows are not reported.
+    # @option options [Boolean] :ignore_deletes If true, records that appear
+    #   in the left/from file but not in the right/to file are not reported.
+    def initialize(left, right, options = {})
+        @left = CSVSource.new(left, options)
+        raise "No field names found in left (from) source" unless @left.field_names && @left.field_names.size > 0
+        @right = CSVSource.new(right, options)
+        raise "No field names found in right (to) source" unless @right.field_names && @right.field_names.size > 0
+        @warnings = []
+        @diff_fields = get_diff_fields(@left.field_names, @right.field_names, options.fetch(:ignore_fields, []))
+        # Key field indexes refer to positions in the source field names
+        @key_fields = @left.key_fields.map{ |kf| @left.field_names[kf] }
+        diff(options)
+    end
+    # Performs a diff with the specified +options+.
+    def diff(options = {})
+        @summary = nil
+        @diffs = diff_sources(@left, @right, @key_fields, @diff_fields, options)
+    end
+    # Returns a summary of the number of adds, deletes, moves, and updates.
+    def summary
+        unless @summary
+            @summary = Hash.new{ |h, k| h[k] = 0 }
+            @diffs.each{ |k, v| @summary[v[:action]] += 1 }
+            @summary['Warnings'] = warnings.size if warnings.size > 0
+        end
+        @summary
+    end
+    [:adds, :deletes, :updates, :moves].each do |mthd|
+        define_method mthd do
+            action = mthd.to_s.chomp('s')
+            @diffs.select{ |k, v| v[:action].downcase == action }
+        end
+    end
+    # @return [Array<String>] an array of warning messages generated during the
+    #    diff process.
+    def warnings
+        @left.warnings + @right.warnings + @warnings
+    end
+    private
+    # Given two sets of field names, determines the common set of fields present
+    # in both, on which members can be diffed.
+    def get_diff_fields(left_fields, right_fields, ignore_fields)
+        diff_fields = []
+        right_fields.each do |fld|
+            if left_fields.include?(fld)
+                diff_fields << fld unless ignore_fields.include?(fld)
+            else
+                @warnings << "Field '#{fld}' is missing from the left (from) file, and won't be diffed"
+            end
+        end
+        diff_fields
+    end
+    include Algorithm
+end
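
The `summary` tally and the generated `adds`/`deletes`/`updates`/`moves` readers both work off the `:action` value stored with each diff. A minimal standalone sketch of that idea follows. `summarise` and `by_action` are hypothetical stand-ins and the sample `DIFFS` hash is invented for illustration.

```ruby
# Invented sample data in the shape of a diffs hash: row key => details.
DIFFS = {
  'ROOT~A' => { :action => 'Move' },
  'ROOT~B' => { :action => 'Move' },
  'A~A3'   => { :action => 'Add' },
  'B~B2'   => { :action => 'Delete' }
}

# Counts diffs per action, starting each count at zero.
def summarise(diffs)
  summary = Hash.new(0)
  diffs.each_value { |d| summary[d[:action]] += 1 }
  summary
end

# Derives the action name from a plural method name (:moves -> "move") and
# selects the diffs whose action matches, ignoring case.
def by_action(diffs, method_name)
  action = method_name.to_s.chomp('s')
  diffs.select { |_key, d| d[:action].downcase == action }
end

SUMMARY = summarise(DIFFS)
MOVES = by_action(DIFFS, :moves)

puts SUMMARY.inspect     # {"Move"=>2, "Add"=>1, "Delete"=>1}
puts MOVES.keys.inspect  # ["ROOT~A", "ROOT~B"]
```

Since each row key maps to a single hash with one `:action`, a row that is both moved and updated ends up tallied under whichever action was merged last.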

data/lib/csv-diff/csv_source.rb ADDED

@@ -0,0 +1,151 @@
+class CSVDiff
+    # Represents a CSV input (i.e. the left/from or right/to input) to the diff
+    # process.
+    class CSVSource
+        # @return [String] the path to the source file
+        attr_accessor :path
+        # @return [Array<String>] The names of the fields in the source file
+        attr_reader :field_names
+        # @return [Array<String>] The names of the field(s) that uniquely
+        #   identify each row.
+        attr_reader :key_fields
+        # @return [Array<String>] The names of the field(s) that identify a
+        #   common parent of child records.
+        attr_reader :parent_fields
+        # @return [Array<String>] The names of the field(s) that distinguish a
+        #   child of a parent record.
+        attr_reader :child_fields
+        # @return [Hash<String,Hash>] A hash containing each line of the source,
+        #   keyed on the values of the +key_fields+.
+        attr_reader :lines
+        # @return [Hash<String,Array<String>>] A hash containing each parent key,
+        #   and an Array of the child keys it is a parent of.
+        attr_reader :index
+        # @return [Array<String>] An array of any warnings encountered while
+        #   processing the source.
+        attr_reader :warnings
+        # Creates a new diff source.
+        #
+        # A diff source must contain at least one field that will be used as the
+        # key to identify the same record in a different version of this file.
+        # If not specified via one of the options, the first field is assumed to
+        # be the unique key.
+        #
+        # If multiple fields combine to form a unique key, the parent is assumed
+        # to be identified by all but the last field of the unique key. If finer
+        # control is required, use a combination of the :parent_fields and
+        # :child_fields options.
+        #
+        # All key options can be specified either by field name, or by field
+        # index (0 based).
+        #
+        # @param source [String|Array<Array>] Either a path to a CSV file, or an
+        #   Array of Arrays containing CSV data. If the :field_names option is
+        #   not specified, the first line must contain the names of the fields.
+        # @param options [Hash] An options hash.
+        # @option options [String] :mode_string The mode to use when opening the
+        #   CSV file. Defaults to 'r'.
+        # @option options [Hash] :csv_options Any options you wish to pass to
+        #   CSV.open, e.g. :col_sep.
+        # @option options [Array<String>] :field_names The names of each of the
+        #   fields in +source+.
+        # @option options [Boolean] :ignore_header If true, and :field_names has
+        #   been specified, then the first row of the file is ignored.
+        # @option options [String] :key_field The name of the field that uniquely
+        #   identifies each row.
+        # @option options [Array<String>] :key_fields The names of the fields
+        #   that uniquely identify each row.
+        # @option options [String] :parent_field The name of the field that
+        #   identifies a parent within which sibling order should be checked.
+        # @option options [String] :child_field The name of the field that
+        #   uniquely identifies a child of a parent.
+        def initialize(source, options = {})
+            if source.is_a?(String)
+                require 'csv'
+                mode_string = options.fetch(:mode_string, 'r')
+                csv_options = options.fetch(:csv_options, {})
+                @path = source
+                # Pass CSV options as keywords, and close the file once it has been read
+                source = CSV.open(@path, mode_string, **csv_options) { |csv| csv.readlines }
+            end
+            if kf = options.fetch(:key_field, options[:key_fields])
+                @key_fields = [kf].flatten
+                @parent_fields = @key_fields[0...-1]
+                @child_fields = @key_fields[-1..-1]
+            else
+                @parent_fields = [options.fetch(:parent_field, options.fetch(:parent_fields, []))].flatten
+                @child_fields = [options.fetch(:child_field, options.fetch(:child_fields, [0]))].flatten
+                @key_fields = @parent_fields + @child_fields
+            end
+            @field_names = options[:field_names]
+            @warnings = []
+            index_source(source, options)
+        end
+        # Returns the row in the CSV source corresponding to the supplied key.
+        #
+        # @param key [String] The unique key to use to lookup the row.
+        # @return [Hash] The fields for the line corresponding to +key+, or nil
+        #   if the key is not recognised.
+        def [](key)
+            @lines[key]
+        end
+        private
+        # Given an array of lines, where each line is an array of fields, indexes
+        # the array contents so that it can be looked up by key.
+        def index_source(lines, options)
+            @lines = {}
+            @index = Hash.new{ |h, k| h[k] = [] }
+            @key_fields = find_field_indexes(@key_fields, @field_names) if @field_names
+            line_num = 0
+            lines.each do |row|
+                line_num += 1
+                next if line_num == 1 && @field_names && options[:ignore_header]
+                unless @field_names
+                    @field_names = row
+                    @key_fields = find_field_indexes(@key_fields, @field_names)
+                    next
+                end
+                field_vals = row
+                line = {}
+                @field_names.each_with_index do |field, i|
+                    line[field] = field_vals[i]
+                end
+                key_values = @key_fields.map{ |kf| field_vals[kf].to_s.upcase }
+                key = key_values.join('~')
+                parent_key = key_values[0...(@parent_fields.length)].join('~')
+                if @lines[key]
+                    @warnings << "Duplicate key '#{key}' encountered and ignored at line #{line_num}"
+                else
+                    @index[parent_key] << key
+                    @lines[key] = line
+                end
+            end
+        end
+        # Converts an array of field names to an array of indexes of the fields
+        # matching those names.
+        def find_field_indexes(key_fields, field_names)
+            key_fields.map do |field|
+                if field.is_a?(Integer)
+                    field
+                else
+                    field_names.index{ |field_name| field.to_s.downcase == field_name.downcase } or
+                        raise ArgumentError, "Could not locate field '#{field}' in source field names: #{
+                            field_names.join(', ')}"
+                end
+            end
+        end
+    end
+end
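
Key options may mix 0-based indexes and field names, with names matched case-insensitively. A standalone sketch of that resolution step is shown below. `resolve_field_indexes` is a hypothetical name, and `Integer` is used for the type check since `Fixnum` is not available on current Ruby versions.

```ruby
# Hypothetical standalone version of the name-to-index lookup: integers are
# taken as 0-based indexes, anything else is matched against the field
# names without regard to case.
def resolve_field_indexes(key_fields, field_names)
  key_fields.map do |field|
    next field if field.is_a?(Integer)
    idx = field_names.index { |name| name.to_s.downcase == field.to_s.downcase }
    if idx.nil?
      raise ArgumentError,
            "Could not locate field '#{field}' in source field names: #{field_names.join(', ')}"
    end
    idx
  end
end

HEADERS = ['Parent', 'Child', 'Description']
RESOLVED = resolve_field_indexes(['child', 0], HEADERS)

puts RESOLVED.inspect  # [1, 0]
```

Resolving names to indexes once, up front, lets the per-row key extraction work purely by position.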

data/lib/csv_diff.rb ADDED

@@ -0,0 +1,2 @@
+require 'csv-diff'
+

metadata ADDED

@@ -0,0 +1,62 @@
+--- !ruby/object:Gem::Specification
+name: csv-diff
+version: !ruby/object:Gem::Version
+  version: '0.1'
+  prerelease:
+platform: ruby
+authors:
+- Adam Gardiner
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2014-05-30 00:00:00.000000000 Z
+dependencies: []
+description: ! "        This library performs diffs of CSV files.\n\n        Unlike
+  a standard diff that compares line by line, and is sensitive to the\n        ordering
+  of records, CSV-Diff identifies common lines by key field(s), and\n        then
+  compares the contents of the fields in each line.\n\n        Data may be supplied
+  in the form of CSV files, or as an array of arrays. The\n        diff process provides
+  a fine level of control over what to diff, and can\n        optionally ignore certain
+  types of changes (e.g. changes in position).\n\n        CSV-Diff is particularly
+  well suited to data in parent-child format. Parent-\n        child data does not
+  lend itself well to standard text diffs, as small changes\n        in the organisation
+  of the tree at an upper level can lead to big movements\n        in the position
+  of descendant records. By instead matching records by key,\n        CSV-Diff avoids
+  this issue, while still being able to detect changes in\n        sibling order.\n"
+email: adam.b.gardiner@gmail.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- README.md
+- LICENSE
+- lib/csv-diff/algorithm.rb
+- lib/csv-diff/csv_diff.rb
+- lib/csv-diff/csv_source.rb
+- lib/csv-diff.rb
+- lib/csv_diff.rb
+homepage: https://github.com/agardiner/csv-diff
+licenses: []
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ! '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ! '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 1.8.23
+signing_key:
+specification_version: 3
+summary: CSV Diff is a library for generating diffs from data in CSV format
+test_files: []