RubyGems - csv-diff-report - Versions diffs - 0.2 - Mend

csv-diff-report 0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

checksums.yaml +15 -0
data/LICENSE +22 -0
data/README.md +264 -0
data/bin/csvdiff +8 -0
data/lib/csv-diff-report.rb +6 -0
data/lib/csv-diff-report/cli.rb +122 -0
data/lib/csv-diff-report/excel.rb +154 -0
data/lib/csv-diff-report/html.rb +170 -0
data/lib/csv-diff-report/report.rb +226 -0
data/lib/csv_diff_report.rb +2 -0
metadata +116 -0

checksums.yaml ADDED

@@ -0,0 +1,15 @@
+---
+!binary "U0hBMQ==":
+  metadata.gz: !binary |-
+    ZTAxMGE1NTA2MjRhMjkzOGUyNThmMGQ1MjJmMGUwY2FjOWUzNzg2OQ==
+  data.tar.gz: !binary |-
+    NDdhNTE0M2M5MWFmNDUyYTE4NTgyMWIzMWY0NWNkOTYwMGFhM2FmNA==
+SHA512:
+  metadata.gz: !binary |-
+    YjU2ZGZjMTJlOTUzYzE1YTE5MmZiOTZkYTRhZjBhOWVmODRiMDlhMDI5ZDZh
+    ZTUzYjUxYzNlOGYzMDZiYTgzNjU1YTE2ZDkzZTk4Mjk1ZTQxMmI2MzAyYzM1
+    OGFmOWEzMTVjZDM0NThlYmExMWJlZjQwZmZkNWVkYWU1Nzk3ZTQ=
+  data.tar.gz: !binary |-
+    N2I3NjgzYjQ0YTc5OTJkOTY2YWM3MTQyMDdiZjJiODdmNTQ5MDQ2MjBjZTYx
+    MjRhMDBiMTY1NDE0ZjI1YTE2NTUwZWFjZDcyMDFiMDg1MzNkMmU3MjVkNDBi
+    MDZkMWY2Yzk4YmVlYjZlMWNiZDQyYzIyMTg5MjA3MzJlODYxZmQ=
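
The `!binary` scalars in this hunk are Base64 text. The following is a minimal sketch, not part of the package, of how two of them decode using Ruby's core `String#unpack1` with the `m` (Base64) directive. The two strings are copied from the hunk above; the variable names are illustrative only.

```ruby
# Values copied verbatim from the checksums.yaml hunk above.
algorithm_key = 'U0hBMQ=='
metadata_sha1 = 'ZTAxMGE1NTA2MjRhMjkzOGUyNThmMGQ1MjJmMGUwY2FjOWUzNzg2OQ=='

# The 'm' directive decodes Base64.
algorithm = algorithm_key.unpack1('m')
digest = metadata_sha1.unpack1('m')

puts algorithm      # name of the digest algorithm used as the YAML key
puts digest         # hex digest recorded for metadata.gz
puts digest.length  # a SHA-1 hex digest is 40 characters long
```

To check a downloaded gem against these values, the decoded string could be compared with `Digest::SHA1.file(path).hexdigest` from Ruby's `digest` library, where `path` points at the extracted `metadata.gz`.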

data/LICENSE ADDED

@@ -0,0 +1,22 @@
+Copyright (c) 2013, Adam Gardiner
+All rights reserved.
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+* Redistributions of source code must retain the above copyright notice, this
+  list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright notice,
+  this list of conditions and the following disclaimer in the documentation
+  and/or other materials provided with the distribution.
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

data/README.md ADDED

@@ -0,0 +1,264 @@
+# CSV-Diff Report
+CSV-Diff Report is a command-line tool for generating diff reports in Excel or
+HTML format from CSV files. It uses the CSV-Diff gem to perform diffs, and adds
+to that library the ability to generate formatted reports, and a command-line
+tool `csvdiff` for running diffs between files or directories.
+## CSV-Diff
+Unlike a standard diff that compares line by line, and is sensitive to the
+ordering of records, CSV-Diff identifies common lines by key field(s), and
+then compares the contents of the fields in each line.
+CSV-Diff is particularly well suited to data in parent-child format. Parent-
+child data does not lend itself well to standard text diffs, as small changes
+in the organisation of the tree at an upper level can lead to big movements
+in the position of descendant records. By instead matching records by key,
+CSV-Diff avoids this issue, while still being able to detect changes in
+sibling order.
+## Usage
+CSV-Diff Report is supplied as a gem, and has dependencies on a few small libraries.
+To install it, simply:
+```
+gem install csv-diff-report
+```
+To compare two CSV files where the field names are in the first row of the file,
+and the first field contains the unique key for each record, simply use:
+```
+csvdiff <file1> <file2>
+```
+The `csvdiff` command-line tool provides many options to control behaviour of
+the diff and reporting process. To see all available options, run:
+```
+csvdiff --help
+```
+This will display a help screen like the following:
+```
+CSV-Diff
+========
+Generate a diff report between two files using the CSV-Diff algorithm.
+USAGE
+-----
+  ruby /usr/local/opt/ruby/bin/csvdiff FROM TO [OPTIONS]
+  FROM    The file or dir to use as the left or from source in the diff
+  TO      The file or dir to use as the right or to source in the diff
+OPTIONS
+-------
+Source Options
+  --pattern PATTERN                A file name pattern to use to filter matching files if a directory diff is
+                                   being performed
+                                   [Default: *]
+  --field-names FIELD-NAMES        A comma-separated list of field names for each field in the source files
+  --parent-fields PARENT-FIELDS    The parent field name(s) or index(es)
+  --child-fields CHILD-FIELDS      The child field name(s) or index(es)
+  --key-fields KEY-FIELDS          The key field name(s) or index(es)
+  --encoding ENCODING              The encoding to use when opening the CSV files
+  --ignore-header                  If true, the first line in each source file is ignored; requires the use of
+                                   the --field-names option to name the fields
+Diff Options
+  --ignore-fields IGNORE-FIELDS    The names or indexes of any fields to be ignored during the diff
+  --ignore-adds                    If true, items in TO that are not in FROM are ignored
+  --ignore-deletes                 If true, items in FROM that are not in TO are ignored
+  --ignore-updates                 If true, changes to non-key properties are ignored
+  --ignore-moves                   If true, changes in an item's position are ignored
+Output Options
+  --format FORMAT                  The format in which to produce the diff report
+                                   [Default: HTML]
+  --output OUTPUT                  The path to save the diff report to. If not specified, the diff report will
+                                   be placed in the same directory as the FROM file, and will be named
+                                   Diff_<FROM>_to_<TO>.<FORMAT>
+```
+## Unique Row Identifiers
+CSVDiff is preferable to a standard line-by-line diff when row order is
+significantly impacted by small changes. The classic example is a parent-child
+file generated by a hierarchy traversal. A simple change in position of a parent
+member near the root of the hierarchy will have a large impact on the positions
+of all descendant rows. Consider the following example:
+```
+Root
+  |- A
+  |  |- A1
+  |  |- A2
+  |
+  |- B
+     |- B1
+     |- B2
+```
+A hierarchy traversal of this tree into a parent-child format would generate a CSV
+as follows:
+```
+Root,A
+A,A1
+A,A2
+Root,B
+B,B1
+B,B2
+```
+If the positions of A and B were swapped, a hierarchy traversal would now produce a CSV
+as follows:
+```
+Root,B
+B,B1
+B,B2
+Root,A
+A,A1
+A,A2
+```
+A simple diff using a diff utility would highlight this as 3 additions and 3 deletions.
+CSVDiff, however, would classify this as 2 moves (a change in sibling position for A and B).
+In order to do this, CSVDiff needs to know what field(s) confer uniqueness on each row.
+In this example, we could use the child field alone (since each member name only appears
+once); however, this would imply a flat structure, where all rows are children of a single
+parent. This in turn would cause CSVDiff to classify the above change as a Move (i.e. a
+change in order) of all 6 rows.
+The more correct specification of this file is that column 0 contains a unique parent
+identifier, and column 1 contains a unique child identifier. CSVDiff can then correctly
+deduce that there are in fact only two changes in order - the swap in positions of A and
+B below Root.
+Note: If you aren't interested in changes in the order of siblings, then you could use
+CSVDiff with a :key_field option of column 1, and specify the :ignore_moves option.
+## Warnings
+When processing and diffing files, CSVDiff may encounter problems with the data or
+the specifications it has been given. It will continue even in the face of problems,
+but will log details of the problems in a #warnings Array. The number of warnings
+will also be included in the Hash returned by the #summary method.
+Warnings may be raised for any of the following:
+* Missing fields: If the right/to file contains fields that are not present in the
+  left/from file, a warning is raised and the field is ignored for diff purposes.
+* Duplicate keys: If two rows are found that have the same values for the key field(s),
+  a warning is raised, and the duplicate values are ignored.
+## Examples
+The simplest use case is as shown above, where the data to be diffed is in CSV files
+with the column names as the first record, and where the unique key is the first
+column in the data. In this case, a diff can be created simply via:
+```ruby
+diff = CSVDiff.new(file1, file2)
+```
+### Specifying Unique Row Identifiers
+Often however, rows are not uniquely identifiable via the first column in the file.
+In a parent-child hierarchy, for example, combinations of parent and child may be
+necessary to uniquely identify a row. In these cases, it is necessary to indicate
+which fields are used to uniquely identify common rows across the two files. This
+can be done in several different ways.
+1. Using the :key_fields option with field numbers (these are 0-based):
+    ```ruby
+    diff = CSVDiff.new(file1, file2, key_fields: [0, 1])
+    ```
+2. Using the :key_fields option with column names:
+    ```ruby
+    diff = CSVDiff.new(file1, file2, key_fields: ['Parent', 'Child'])
+    ```
+3. Using the :parent_field and :child_fields options with field numbers:
+    ```ruby
+    diff = CSVDiff.new(file1, file2, parent_field: 1, child_fields: [2, 3])
+    ```
+4. Using the :parent_field and :child_fields options with column names:
+    ```ruby
+    diff = CSVDiff.new(file1, file2, parent_field: 'Date', child_fields: ['HomeTeam', 'AwayTeam'])
+    ```
+### Using Non-CSV File Sources
+Data from non-CSV sources can be diffed, as long as it can be supplied as an Array
+of Arrays:
+```ruby
+DATA1 = [
+    ['Parent', 'Child', 'Description'],
+    ['A', 'A1', 'Account 1'],
+    ['A', 'A2', 'Account 2']
+]
+DATA2 = [
+    ['Parent', 'Child', 'Description'],
+    ['A', 'A1', 'Account1'],
+    ['A', 'A2', 'Account2']
+]
+diff = CSVDiff.new(DATA1, DATA2, key_fields: [1, 0])
+```
+### Specifying Column Names
+If your data file does not include column headers, you can specify the names of
+each column when creating the diff. The names supplied are the keys used in the
+diff results:
+```ruby
+DATA1 = [
+    ['A', 'A1', 'Account 1'],
+    ['A', 'A2', 'Account 2']
+]
+DATA2 = [
+    ['A', 'A1', 'Account1'],
+    ['A', 'A2', 'Account2']
+]
+diff = CSVDiff.new(DATA1, DATA2, key_fields: [1, 0], field_names: ['Parent', 'Child', 'Description'])
+```
+If your data file does contain a header row, but you wish to use your own column
+names, you can specify the :field_names option and the :ignore_header option to
+ignore the first row.
+### Ignoring Fields
+If your data contains fields that you aren't interested in, these can be excluded
+from the diff process using the :ignore_fields option:
+```ruby
+diff = CSVDiff.new(file1, file2, parent_field: 'Date', child_fields: ['HomeTeam', 'AwayTeam'],
+                   ignore_fields: ['CreatedAt', 'UpdatedAt'])
+```
+### Ignoring Certain Changes
+CSVDiff identifies Adds, Updates, Moves and Deletes; any of these changes can be selectively
+ignored, e.g. if you are not interested in Deletes, you can pass the :ignore_deletes option:
+```ruby
+diff = CSVDiff.new(file1, file2, parent_field: 'Date', child_fields: ['HomeTeam', 'AwayTeam'],
+                   ignore_fields: ['CreatedAt', 'UpdatedAt'],
+                   ignore_deletes: true, ignore_moves: true)
+```
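
The README's "Unique Row Identifiers" section states that swapping A and B under Root should count as two moves once rows are keyed by parent and child. The sketch below illustrates that idea in plain Ruby. It is not the CSV-Diff implementation: `sibling_positions` and `moved_keys` are hypothetical helpers written for this illustration, and the only data used is the six-row example from the README.

```ruby
# Hypothetical helper: maps each [parent, child] key to its zero-based
# position among the rows that share the same parent.
def sibling_positions(rows)
  seen = Hash.new(0)
  rows.each_with_object({}) do |(parent, child), positions|
    positions[[parent, child]] = seen[parent]
    seen[parent] += 1
  end
end

# Hypothetical helper: keys present on both sides whose sibling position differs.
def moved_keys(from_rows, to_rows)
  before = sibling_positions(from_rows)
  after = sibling_positions(to_rows)
  before.keys.select { |key| after.key?(key) && after[key] != before[key] }
end

# The two traversals from the README example.
from_rows = [%w[Root A], %w[A A1], %w[A A2], %w[Root B], %w[B B1], %w[B B2]]
to_rows   = [%w[Root B], %w[B B1], %w[B B2], %w[Root A], %w[A A1], %w[A A2]]

moves = moved_keys(from_rows, to_rows)
puts moves.inspect
```

Because positions are counted per parent, the children of A and B keep their positions and only the two rows directly below Root are reported, which matches the behaviour the README describes.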

data/bin/csvdiff ADDED

@@ -0,0 +1,8 @@
+#!ruby -w
+require 'csv-diff-report'
+require 'csv-diff-report/cli'
+CSVDiff::CLI.new.run

data/lib/csv-diff-report.rb ADDED

@@ -0,0 +1,6 @@
+require 'pathname'
+require 'yaml'
+require 'csv-diff'
+require 'color-console'
+require 'csv-diff-report/report'

data/lib/csv-diff-report/cli.rb ADDED

@@ -0,0 +1,122 @@
+require 'arg-parser'
+require 'csv-diff-report'
+class CSVDiff
+    class CLI
+        include ArgParser::DSL
+        # Define an on_parse handler for field names or indexes. Splits the
+        # supplied argument value on commas, and converts numbers to Integers.
+        register_parse_handler(:parse_fields) do |val, arg, hsh|
+            val.split(',').map{ |fld| fld =~ /^\d+$/ ? fld.to_i : fld }
+        end
+        title 'CSV-Diff'
+        purpose <<-EOT
+            Generate a diff report between two files using the CSV-Diff algorithm.
+        EOT
+        positional_arg :from, 'The file or dir to use as the left or from source in the diff'
+        positional_arg :to, 'The file or dir to use as the right or to source in the diff'
+        positional_arg :pattern, 'A file name pattern to use to filter matching files if a directory ' +
+            'diff is being performed', default: '*'
+        usage_break 'Source Options'
+        keyword_arg :file_types, 'A comma-separated list of file-type names (supports wildcards) to process. ' +
+            'Requires the presence of a .csvdiff file in the FROM or current directory to define ' +
+            'the file type patterns',
+            short_key: 't', on_parse: :split_to_array
+        keyword_arg :exclude, 'A file name pattern of files to exclude from the diff if a directory ' +
+            'diff is being performed',
+            short_key: 'x'
+        keyword_arg :field_names, 'A comma-separated list of field names for each ' +
+            'field in the source files',
+            short_key: 'f', on_parse: :split_to_array
+        keyword_arg :parent_fields, 'The parent field name(s) or index(es)',
+            short_key: 'p', on_parse: :parse_fields
+        keyword_arg :child_fields, 'The child field name(s) or index(es)',
+            short_key: 'c', on_parse: :parse_fields
+        keyword_arg :key_fields, 'The key field name(s) or index(es)',
+            short_key: 'k', on_parse: :parse_fields
+        keyword_arg :encoding, 'The encoding to use when opening the CSV files',
+            short_key: 'e'
+        flag_arg :tab_delimited, 'If true, the file is assumed to be tab-delimited rather than comma-delimited'
+        flag_arg :ignore_header, 'If true, the first line in each source file is ignored; ' +
+            'requires the use of the --field-names option to name the fields'
+        usage_break 'Diff Options'
+        keyword_arg :ignore_fields, 'The names or indexes of any fields to be ignored during the diff',
+            short_key: 'i', on_parse: :parse_fields
+        flag_arg :ignore_adds, "If true, items in TO that are not in FROM are ignored",
+            short_key: 'A'
+        flag_arg :ignore_deletes, "If true, items in FROM that are not in TO are ignored",
+            short_key: 'D'
+        flag_arg :ignore_updates, "If true, changes to non-key properties are ignored",
+            short_key: 'U'
+        flag_arg :ignore_moves, "If true, changes in an item's position are ignored",
+            short_key: 'M'
+        usage_break 'Output Options'
+        keyword_arg :format, 'The format in which to produce the diff report',
+            default: 'HTML', validation: /^(html|xls(x)?)$/i
+        keyword_arg :output, 'The path to save the diff report to. If not specified, the diff ' +
+            'report will be placed in the same directory as the FROM file, and will be named ' +
+            'Diff_<FROM>_to_<TO>.<FORMAT>'
+        # Parses command-line options, and then performs the diff.
+        def run
+            if arguments = parse_arguments
+                begin
+                    process(arguments)
+                rescue RuntimeError => ex
+                    Console.puts ex.message, :red
+                    exit 1
+                end
+            else
+                if show_help?
+                    show_help(nil, Console.width).each do |line|
+                        Console.puts line, :cyan
+                    end
+                else
+                    show_usage(nil, Console.width).each do |line|
+                        Console.puts line, :yellow
+                    end
+                end
+                exit 2
+            end
+        end
+        # Process a CSVDiffReport using +arguments+ to determine all options.
+        def process(arguments)
+            options = {}
+            exclude_args = [:from, :to, :tab_delimited]
+            arguments.each_pair do |arg, val|
+                options[arg] = val if val && !exclude_args.include?(arg)
+            end
+            options[:csv_options] = {:col_sep => "\t"} if arguments.tab_delimited
+            rep = CSVDiff::Report.new
+            rep.diff(arguments.from, arguments.to, options)
+            output_dir = FileTest.directory?(arguments.from) ?
+                arguments.from : File.dirname(arguments.from)
+            left_name = File.basename(arguments.from, File.extname(arguments.from))
+            right_name = File.basename(arguments.to, File.extname(arguments.to))
+            output = arguments.output ||
+                "#{output_dir}/Diff_#{left_name}_to_#{right_name}.diff"
+            rep.output(output, arguments.format)
+        end
+    end
+end
+if __FILE__ == $0
+    CSVDiff::CLI.new.run
+end
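
Two pieces of `cli.rb` are easy to try in isolation: the `:parse_fields` handler and the default report name built in `#process`. The sketch below copies the handler body into a lambda and repeats the `File.basename`/`File.extname` calls. The file names are hypothetical, and the sketch assumes FROM is a file rather than a directory.

```ruby
# Stand-alone copy of the :parse_fields handler body from cli.rb:
# split on commas and turn all-digit entries into integers.
parse_fields = lambda do |val|
  val.split(',').map { |fld| fld =~ /^\d+$/ ? fld.to_i : fld }
end

by_index = parse_fields.call('0,1')
by_name = parse_fields.call('Parent,Child')
mixed = parse_fields.call('0,Child')

# Default report path, following the calls made in #process.
# These paths are hypothetical examples.
from = 'data/accounts_v1.csv'
to = 'data/accounts_v2.csv'
left_name = File.basename(from, File.extname(from))
right_name = File.basename(to, File.extname(to))
default_output = "#{File.dirname(from)}/Diff_#{left_name}_to_#{right_name}.diff"

puts by_index.inspect
puts by_name.inspect
puts mixed.inspect
puts default_output
```

The `.diff` extension is only a placeholder: `html_output` and `xl_output` strip the extension from the path they receive and append `.html` or `.xlsx` respectively.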

data/lib/csv-diff-report/excel.rb ADDED

@@ -0,0 +1,154 @@
+class CSVDiff
+    # Defines functionality for exporting a Diff report to Excel in XLSX format
+    # using the Axlsx library.
+    module Excel
+        private
+        # Generate a diff report in XLSX format.
+        def xl_output(output)
+            require 'axlsx'
+            # Create workbook
+            xl = xl_new
+            # Add a summary sheet and diff sheets for each diff
+            xl_summary_sheet(xl)
+            # Save workbook
+            path = "#{File.dirname(output)}/#{File.basename(output, File.extname(output))}.xlsx"
+            xl_save(xl, path)
+        end
+        # Create a new XL package object
+        def xl_new
+            @xl_styles = {}
+            xl = Axlsx::Package.new
+            xl.use_shared_strings = true
+            xl.workbook.styles do |s|
+                s.fonts[0].sz = 9
+                @xl_styles['Title'] = s.add_style(:b => true)
+                @xl_styles['Comma'] = s.add_style(:format_code => '#,##0')
+                @xl_styles['Right'] = s.add_style(:alignment => {:horizontal => :right})
+                @xl_styles['Add'] = s.add_style :fg_color => '00A000'
+                @xl_styles['Update'] = s.add_style :fg_color => '0000A0', :bg_color => 'F0F0FF'
+                @xl_styles['Move'] = s.add_style :fg_color => '4040FF'
+                @xl_styles['Delete'] = s.add_style :fg_color => 'FF0000', :strike => true
+            end
+            xl
+        end
+        # Add summary sheet
+        def xl_summary_sheet(xl)
+            compare_from = @left
+            compare_to = @right
+            xl.workbook.add_worksheet(name: 'Summary') do |sheet|
+                sheet.add_row do |row|
+                    row.add_cell 'From:', :style => @xl_styles['Title']
+                    row.add_cell compare_from
+                end
+                sheet.add_row do |row|
+                    row.add_cell 'To:', :style => @xl_styles['Title']
+                    row.add_cell compare_to
+                end
+                sheet.add_row
+                sheet.add_row ['Sheet', 'Adds', 'Deletes', 'Updates', 'Moves'], :style => @xl_styles['Title']
+                sheet.column_info.each do |ci|
+                    ci.width = 10
+                end
+                sheet.column_info.first.width = 20
+                @diffs.each do |file_diff|
+                    sheet.add_row([File.basename(file_diff.left.path, File.extname(file_diff.left.path)),
+                                   file_diff.summary['Add'], file_diff.summary['Delete'],
+                                   file_diff.summary['Update'], file_diff.summary['Move']])
+                    xl_diff_sheet(xl, file_diff) if file_diff.diffs.size > 0
+                end
+            end
+        end
+        # Add diff sheet
+        def xl_diff_sheet(xl, file_diff)
+            sheet_name = File.basename(file_diff.left.path, File.extname(file_diff.left.path))
+            all_fields = [:row, :action, :sibling_position] + file_diff.diff_fields
+            xl.workbook.add_worksheet(name: sheet_name) do |sheet|
+                sheet.add_row(all_fields.map{ |f| f.to_s }, :style => @xl_styles['Title'])
+                file_diff.diffs.sort_by{|k, v| v[:row] }.each do |key, diff|
+                    sheet.add_row do |row|
+                        chg = diff[:action]
+                        all_fields.each_with_index do |field, i|
+                            cell = nil
+                            comment = nil
+                            old = nil
+                            style = case chg
+                            when 'Add', 'Delete' then @xl_styles[chg]
+                            else 0
+                            end
+                            d = diff[field]
+                            if d.is_a?(Array)
+                                old = d.first
+                                new = d.last
+                                if old.nil?
+                                    style = @xl_styles['Add']
+                                else
+                                    style = @xl_styles[chg]
+                                    comment = old
+                                end
+                            else
+                                new = d
+                                style = @xl_styles[chg] if i == 1
+                            end
+                            case new
+                            when String
+                                cell = row.add_cell(new.encode('utf-8'), :style => style) #, :type => :string)
+                            #    cell = row.add_cell(new, :style => style)
+                            else
+                                cell = row.add_cell(new, :style => style)
+                            end
+                            sheet.add_comment(:ref => cell.r, :author => 'Current', :visible => false,
+                                              :text => old.to_s.encode('utf-8')) if comment
+                        end
+                    end
+                end
+                sheet.column_info.each do |ci|
+                    ci.width = 80 if ci.width > 80
+                end
+                xl_filter_and_freeze(sheet, 5)
+            end
+        end
+        # Freeze the top row and +freeze_cols+ of +sheet+.
+        def xl_filter_and_freeze(sheet, freeze_cols = 0)
+            sheet.auto_filter = "A1:#{Axlsx::cell_r(sheet.rows.first.cells.size - 1, sheet.rows.size - 1)}"
+            sheet.sheet_view do |sv|
+                sv.pane do |p|
+                    p.state = :frozen
+                    p.x_split = freeze_cols
+                    p.y_split = 1
+                end
+            end
+        end
+        # Save +xl+ package to +path+
+        def xl_save(xl, path)
+            begin
+                xl.serialize(path)
+                path
+            rescue RuntimeError => ex
+                Console.puts ex.message, :red
+                raise "Unable to replace existing Excel file #{path} - is it already open in Excel?"
+            end
+        end
+    end
+end
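
`xl_filter_and_freeze` builds its auto-filter range from the last column and row indexes. The sketch below shows how such a range could be derived in plain Ruby. It assumes, as the call in this file suggests, that `Axlsx::cell_r` takes zero-based column and row indexes and returns an A1-style reference. `column_letters`, `cell_ref` and `filter_range` are hypothetical stand-ins, not Axlsx methods.

```ruby
# Hypothetical helper: zero-based column index to spreadsheet letters
# (0 => "A", 25 => "Z", 26 => "AA").
def column_letters(col_index)
  letters = String.new
  n = col_index
  loop do
    letters.prepend((65 + n % 26).chr)
    n = n / 26 - 1
    break if n < 0
  end
  letters
end

# Hypothetical stand-in for an A1-style reference built from zero-based indexes.
def cell_ref(col_index, row_index)
  "#{column_letters(col_index)}#{row_index + 1}"
end

# Range covering a sheet with the given number of columns and rows,
# built the same way as the auto_filter string in xl_filter_and_freeze.
def filter_range(column_count, row_count)
  "A1:#{cell_ref(column_count - 1, row_count - 1)}"
end

puts filter_range(5, 10)
```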

data/lib/csv-diff-report/html.rb ADDED

@@ -0,0 +1,170 @@
+class CSVDiff
+    # Defines functionality for exporting a Diff report in HTML format.
+    module Html
+        private
+        # Generate a diff report in HTML format.
+        def html_output(output)
+            content = []
+            content << '<html>'
+            content << '<head>'
+            content << '<title>Diff Report</title>'
+            content << '<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">'
+            content << html_styles
+            content << '</head>'
+            content << '<body>'
+            html_summary(content)
+            @diffs.each do |file_diff|
+                html_diff(content, file_diff) if file_diff.diffs.size > 0
+            end
+            content << '</body>'
+            content << '</html>'
+            # Save HTML file
+            path = "#{File.dirname(output)}/#{File.basename(output, File.extname(output))}.html"
+            File.open(path, 'w'){ |f| f.write(content.join("\n")) }
+            path
+        end
+        # Returns the HTML head content, which contains the styles used for diffing.
+        def html_styles
+            style = <<-EOT
+                <style>
+                    @font-face {font-family: Calibri;}
+                    h1 {font-family: Calibri; font-size: 16pt;}
+                    h2 {font-family: Calibri; font-size: 14pt; margin: 1em 0em .2em;}
+                    h3 {font-family: Calibri; font-size: 12pt; margin: 1em 0em .2em;}
+                    body {font-family: Calibri; font-size: 11pt;}
+                    p {margin: .2em 0em;}
+                    table {font-family: Calibri; font-size: 10pt; line-height: 12pt; border-collapse: collapse;}
+                    th {background-color: #00205B; color: white; font-size: 11pt; font-weight: bold; text-align: left;
+                        border: 1px solid #DDDDFF; padding: 1px 5px;}
+                    td {border: 1px solid #DDDDFF; padding: 1px 5px;}
+                    .summary {font-size: 13pt;}
+                    .add {background-color: white; color: #33A000;}
+                    .delete {background-color: white; color: #FF0000; text-decoration: line-through;}
+                    .update {background-color: white; color: #0000A0;}
+                    .move {background-color: white; color: #0000A0;}
+                    .bold {font-weight: bold;}
+                    .center {text-align: center;}
+                    .right {text-align: right;}
+                    .separator {width: 200px; border-bottom: 1px gray solid;}
+                </style>
+            EOT
+            style
+        end
+        def html_summary(body)
+            body << '<h2>Summary</h2>'
+            body << '<p>Source Locations:</p>'
+            body << '<table>'
+            body << '<tbody>'
+            body << "<tr><th>From:</th><td>#{@left}</td></tr>"
+            body << "<tr><th>To:</th><td>#{@right}</td></tr>"
+            body << '</tbody>'
+            body << '</table>'
+            body << '<br>'
+            body << '<p>Differences:</p>'
+            body << '<table>'
+            body << '<thead><tr>'
+            body << '<th>File</th><th>Adds</th><th>Deletes</th><th>Updates</th><th>Moves</th>'
+            body << '</tr></thead>'
+            body << '<tbody>'
+            @diffs.each do |file_diff|
+                label = File.basename(file_diff.left.path)
+                body << '<tr>'
+                if file_diff.diffs.size > 0
+                    body << "<td><a href='##{label}'>#{label}</a></td>"
+                else
+                    body << "<td>#{label}</td>"
+                end
+                body << "<td class='right'>#{file_diff.summary['Add']}</td>"
+                body << "<td class='right'>#{file_diff.summary['Delete']}</td>"
+                body << "<td class='right'>#{file_diff.summary['Update']}</td>"
+                body << "<td class='right'>#{file_diff.summary['Move']}</td>"
+                body << '</tr>'
+            end
+            body << '</tbody>'
+            body << '</table>'
+        end
+        def html_diff(body, file_diff)
+            label = File.basename(file_diff.left.path)
+            body << "<h2 id='#{label}'>#{label}</h2>"
+            body << '<p>'
+            count = 0
+            if file_diff.summary['Add'] > 0
+                body << "<span class='add'>#{file_diff.summary['Add']} Adds</span>"
+                count += 1
+            end
+            if file_diff.summary['Delete'] > 0
+                body << ', ' if count > 0
+                body << "<span class='delete'>#{file_diff.summary['Delete']} Deletes</span>"
+                count += 1
+            end
+            if file_diff.summary['Update'] > 0
+                body << ', ' if count > 0
+                body << "<span class='update'>#{file_diff.summary['Update']} Updates</span>"
+                count += 1
+            end
+            if file_diff.summary['Move'] > 0
+                body << ', ' if count > 0
+                body << "<span class='move'>#{file_diff.summary['Move']} Moves</span>"
+            end
+            body << '</p>'
+            all_fields = [:row, :action, :sibling_position] + file_diff.diff_fields
+            body << '<table>'
+            body << '<thead><tr>'
+            all_fields.each do |fld|
+                body << "<th>#{fld.to_s}</th>"
+            end
+            body << '</tr></thead>'
+            body << '<tbody>'
+            file_diff.diffs.sort_by{|k, v| v[:row] }.each do |key, diff|
+                body << '<tr>'
+                chg = diff[:action]
+                all_fields.each_with_index do |field, i|
+                    old = nil
+                    style = case chg
+                    when 'Add', 'Delete' then chg.downcase
+                    end
+                    d = diff[field]
+                    if d.is_a?(Array)
+                        old = d.first
+                        new = d.last
+                        if old.nil?
+                            style = 'add'
+                        else
+                            style = chg.downcase
+                        end
+                    else
+                        new = d
+                        style = chg.downcase if i == 1
+                    end
+                    body << '<td>'
+                    body << "<span class='delete'>#{old}</span>" if old
+                    body << '<br>' if old && old.to_s.length > 10
+                    body << "<span#{style ? " class='#{style}'" : ''}>#{new}</span>"
+                    body << '</td>'
+                end
+                body << '</tr>'
+            end
+            body << '</tbody>'
+            body << '</table>'
+        end
+    end
+end
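
The per-cell styling rules in `html_diff` (and their counterpart in `xl_diff_sheet`) are easier to follow as a pure function. The sketch below restates the HTML variant. `classify_cell` is a hypothetical helper written for this illustration and does not exist in the package; the sample values are taken from the README's Account example.

```ruby
# Hypothetical helper mirroring the per-cell logic in html_diff.
# Returns [old_value, new_value, css_class].
def classify_cell(action, value, column_index)
  # Adds and Deletes style every cell in the row.
  css = %w[Add Delete].include?(action) ? action.downcase : nil
  if value.is_a?(Array)
    # A two-element array holds the old and new values of a changed field.
    old = value.first
    new_value = value.last
    css = old.nil? ? 'add' : action.downcase
  else
    old = nil
    new_value = value
    # Column 1 is the :action column, which is always styled.
    css = action.downcase if column_index == 1
  end
  [old, new_value, css]
end

changed = classify_cell('Update', ['Account 1', 'Account1'], 5)
unchanged = classify_cell('Update', 'A1', 4)
action_cell = classify_cell('Update', 'Update', 1)
deleted = classify_cell('Delete', 'A2', 4)

puts changed.inspect
puts unchanged.inspect
puts action_cell.inspect
puts deleted.inspect
```

In the report, a non-nil old value is rendered first with the `delete` class, followed by the new value in the returned class.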

data/lib/csv-diff-report/report.rb ADDED

@@ -0,0 +1,226 @@
+require 'csv-diff-report/excel'
+require 'csv-diff-report/html'
+class CSVDiff
+    # Defines a class for generating diff reports using CSVDiff.
+    #
+    # A diff report may contain multiple file diffs, and can be output as either an
+    # XLSX spreadsheet document, or an HTML file.
+    class Report
+        include Excel
+        include Html
+        # Instantiate a new diff report object.
+        def initialize
+            @diffs = []
+        end
+        # Add a CSVDiff object to this report.
+        def <<(diff)
+            if diff.is_a?(CSVDiff)
+                @diffs << diff
+            else
+                raise ArgumentError, "Only CSVDiff objects can be added to a CSVDiff::Report"
+            end
+        end
+        # Add a diff to the diff report.
+        #
+        # @param options [Hash] Options to be passed to the diff process.
+        def diff(left, right, options = {})
+            @left = Pathname.new(left)
+            @right = Pathname.new(right)
+            if @left.file? && @right.file?
+                Console.puts "Performing file diff:"
+                Console.puts "  From File:    #{@left}"
+                Console.puts "  To File:      #{@right}"
+                opt_file = load_opt_file(@left.dirname)
+                diff_file(@left.to_s, @right.to_s, options, opt_file)
+            elsif @left.directory? && @right.directory?
+                Console.puts "Performing directory diff:"
+                Console.puts "  From directory:  #{@left}"
+                Console.puts "  To directory:    #{@right}"
+                opt_file = load_opt_file(@left)
+                if fts = options[:file_types]
+                    file_types = find_matching_file_types(fts, opt_file)
+                    file_types.each do |file_type|
+                        hsh = opt_file[:file_types][file_type]
+                        ft_opts = options.merge(hsh)
+                        diff_dir(@left, @right, ft_opts, opt_file)
+                    end
+                else
+                    diff_dir(@left, @right, options, opt_file)
+                end
+            else
+                raise ArgumentError, "Left and right must both exist and be files or directories"
+            end
+        end
+        # Saves a diff report to +path+ in +format+.
+        #
+        # @param path [String] The path to the output report.
+        # @param format [Symbol] The output format for the report; one of :html or
+        #   :xlsx.
+        def output(path, format = :html)
+            path = case format.to_s
+            when /^html$/i
+                html_output(path)
+            when /^xls(x)?$/i
+                xl_output(path)
+            else
+                raise ArgumentError, "Unrecognised output format: #{format}"
+            end
+            Console.puts "Diff report saved to '#{path}'"
+        end
+        private
+        # Loads an options file from +dir+
+        def load_opt_file(dir)
+            opt_path = Pathname(dir + '.csvdiff')
+            opt_path = Pathname('.csvdiff') unless opt_path.exist?
+            if opt_path.exist?
+                Console.puts "Loading options from .csvdiff at '#{dir}'"
+                opt_file = YAML.load(IO.read(opt_path))
+                symbolize_keys(opt_file)
+            end
+        end
+        # Convert keys in hashes to lower-case symbols for consistency
+        def symbolize_keys(hsh)
+            Hash[hsh.map{ |k, v| [k.to_s.downcase.intern, v.is_a?(Hash) ?
+                symbolize_keys(v) : v] }]
+        end
+        # Locates the file types in +opt_file+ that match the +file_types+ list of
+        # file type names or patterns
+        def find_matching_file_types(file_types, opt_file)
+            matched_fts = []
+            if known_fts = opt_file && opt_file[:file_types] && opt_file[:file_types].keys
+                file_types.each do |ft|
+                    re = Regexp.new(ft.gsub('.', '\.').gsub('?', '.').gsub('*', '.*'), true)
+                    matches = known_fts.select{ |file_type| file_type.to_s =~ re }
+                    if matches.size > 0
+                        matched_fts.concat(matches)
+                    else
+                        Console.puts "No file type matching '#{ft}' defined in .csvdiff", :yellow
+                        Console.puts "Known file types are: #{opt_file[:file_types].keys.join(', ')}", :yellow
+                    end
+                end
+            else
+                if opt_file
+                    Console.puts "No file types are defined in .csvdiff", :yellow
+                else
+                    Console.puts "The file_types option can only be used when a " +
+                        ".csvdiff is present in the LEFT or current directory", :yellow
+                end
+            end
+            matched_fts.uniq
+        end
+        # Diff files that exist in both +left+ and +right+ directories.
+        def diff_dir(left, right, options, opt_file)
+            pattern = Pathname(options[:pattern] || '*')
+            exclude = options[:exclude]
+            Console.puts "  Include Pattern: #{pattern}"
+            Console.puts "  Exclude Pattern: #{exclude}" if exclude
+            left_files = Dir[left + pattern].sort
+            excludes = exclude ? Dir[left + exclude] : []
+            (left_files - excludes).each_with_index do |file, i|
+                right_file = right + File.basename(file)
+                if right_file.file?
+                    diff_file(file, right_file.to_s, options, opt_file)
+                else
+                    Console.puts "Skipping file '#{File.basename(file)}', as there is " +
+                        "no corresponding TO file", :yellow
+                end
+            end
+        end
+        # Diff two files, and add the results to the diff report.
+        #
+        # @param left [String] The path to the left file
+        # @param right [String] The path to the right file
+        # @param options [Hash] The options to be passed to CSVDiff.
+        def diff_file(left, right, options, opt_file)
+            settings = find_file_type_settings(left, opt_file)
+            return if settings[:ignore]
+            options = settings.merge(options)
+            from = open_source(left, :from, options)
+            to = open_source(right, :to, options)
+            diff = CSVDiff.new(left, right, options)
+            diff.diff_warnings.each{ |warn| Console.puts warn, :yellow }
+            Console.write "Found #{diff.diffs.size} differences"
+            diff.summary.each_with_index.map do |pair, i|
+                Console.write i == 0 ? ": " : ", "
+                k, v = pair
+                color = case k
+                        when 'Add' then :light_green
+                        when 'Delete' then :red
+                        when 'Update' then :cyan
+                        when 'Move' then :light_magenta
+                        when 'Warning' then :yellow
+                        end
+                Console.write "#{v} #{k}s", color
+            end
+            Console.puts
+            self << diff
+        end
+        # Locates any file type settings for +left+ in the +opt_file+ hash.
+        def find_file_type_settings(left, opt_file)
+            left = Pathname(left.gsub('\\', '/'))
+            settings = opt_file && opt_file[:defaults] || {}
+            opt_file && opt_file[:file_types] && opt_file[:file_types].each do |file_type, hsh|
+                unless hsh[:pattern]
+                    Console.puts "Invalid setting for file_type #{file_type} in .csvdiff; " +
+                        "missing a 'pattern' key to use to match files", :yellow
+                    hsh[:pattern] = '-'
+                end
+                next if hsh[:pattern] == '-'
+                unless hsh[:matched_files]
+                    hsh[:matched_files] = Dir[(left.dirname + hsh[:pattern]).to_s]
+                    hsh[:matched_files] -= Dir[(left.dirname + hsh[:exclude]).to_s] if hsh[:exclude]
+                end
+                if hsh[:matched_files].include?(left.to_s)
+                    settings.merge!(hsh)
+                    [:pattern, :exclude, :matched_files].each{ |k| settings.delete(k) }
+                    break
+                end
+            end
+            settings
+        end
+        # Opens a source file.
+        #
+        # @param src [String] A path to the file to be opened.
+        # @param options [Hash] An options hash to be passed to CSVSource.
+        def open_source(src, left_right, options)
+            Console.write "Opening #{left_right.to_s.upcase} file '#{File.basename(src)}'..."
+            csv_src = CSVDiff::CSVSource.new(src.to_s, options)
+            Console.puts "  #{csv_src.lines.size} lines read", :white
+            csv_src.warnings.each{ |warn| Console.puts warn, :yellow }
+            csv_src
+        end
+    end
+end
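
The file type matching in `find_matching_file_types` turns each shell-style pattern into a case-insensitive regular expression. The sketch below reproduces that conversion on its own, under stated assumptions: `glob_to_regexp` and `matching_types` are hypothetical names, and the list of known file types is made up for illustration. As in the code above, the resulting expression is unanchored, so a pattern can match anywhere inside a file type name.

```ruby
# Hypothetical helper (not part of the gem): convert a shell-style pattern
# into a case-insensitive Regexp, as find_matching_file_types does.
def glob_to_regexp(glob)
  source = glob.gsub('.') { '\.' }   # literal dots
               .gsub('?', '.')       # ? matches any single character
               .gsub('*', '.*')      # * matches any run of characters
  Regexp.new(source, Regexp::IGNORECASE)
end

# Hypothetical helper: select the known file types matched by any pattern.
def matching_types(globs, known)
  globs.flat_map do |glob|
    re = glob_to_regexp(glob)
    known.select { |file_type| file_type.to_s =~ re }
  end.uniq
end

known = [:customers, :orders, :order_items]  # assumed example data
p matching_types(['order*'], known)
p matching_types(['CUST*', 'item?'], known)
```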

data/lib/csv_diff_report.rb ADDED

@@ -0,0 +1,2 @@
+require 'csv-diff-report'
+

metadata ADDED

@@ -0,0 +1,116 @@
+--- !ruby/object:Gem::Specification
+name: csv-diff-report
+version: !ruby/object:Gem::Version
+  version: '0.2'
+platform: ruby
+authors:
+- Adam Gardiner
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2014-08-11 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: csv-diff
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0.1'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0.1'
+- !ruby/object:Gem::Dependency
+  name: arg-parser
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0.1'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0.1'
+- !ruby/object:Gem::Dependency
+  name: color-console
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0.1'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0.1'
+- !ruby/object:Gem::Dependency
+  name: axlsx
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: '1.3'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: '1.3'
+description: ! "        This library generates diff reports of CSV files, using the
+  diff capabilities\n        of the CSV Diff gem.\n\n        Unlike a standard diff
+  that compares line by line, and is sensitive to the\n        ordering of records,
+  CSV Diff identifies common lines by key field(s), and\n        then compares the
+  contents of the fields in each line.\n\n        CSV Diff Report takes the diff information
+  calculated by CSV Diff, and uses it to produce\n        Excel or HTML-based diff
+  reports. It also provides a command-line tool for generating\n        these diff
+  reports from CSV files.\n"
+email: adam.b.gardiner@gmail.com
+executables:
+- csvdiff
+extensions: []
+extra_rdoc_files: []
+files:
+- LICENSE
+- README.md
+- bin/csvdiff
+- lib/csv-diff-report.rb
+- lib/csv-diff-report/cli.rb
+- lib/csv-diff-report/excel.rb
+- lib/csv-diff-report/html.rb
+- lib/csv-diff-report/report.rb
+- lib/csv_diff_report.rb
+homepage: https://github.com/agardiner/csv-diff-report
+licenses: []
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ! '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ! '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.4.1
+signing_key:
+specification_version: 4
+summary: CSV Diff Report is a library for generating diff reports using the CSV Diff
+  gem
+test_files: []
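
The dependency constraints in the metadata above use two operators: `>=` for csv-diff, arg-parser and color-console, and the pessimistic operator `~>` for axlsx. A minimal sketch of how RubyGems evaluates them, using `Gem::Requirement` and `Gem::Version` from RubyGems itself; the candidate version numbers are arbitrary examples, not releases known to exist.

```ruby
require 'rubygems'

# Constraints copied from the metadata above.
axlsx_req    = Gem::Requirement.new('~> 1.3')   # means >= 1.3 and < 2.0
csv_diff_req = Gem::Requirement.new('>= 0.1')

# Arbitrary example versions, chosen only to show each side of the boundary.
%w[1.2.9 1.3.0 1.9.5 2.0.0].each do |v|
  puts "axlsx #{v}: #{axlsx_req.satisfied_by?(Gem::Version.new(v))}"
end
puts "csv-diff 5.0: #{csv_diff_req.satisfied_by?(Gem::Version.new('5.0'))}"
```

The `>=` constraints place no upper bound on the dependency version, while `~> 1.3` excludes any 2.x release of axlsx.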