csv-diff 0.3.1 → 0.6.1

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 87807b9af487947c60d18ac81d38b6782133dfdb
4
+ data.tar.gz: 3681d305dc566f49e7b5166fd22f5ea858f1260f
5
+ SHA512:
6
+ metadata.gz: d4b617cae1c1e2201633ba5bdb2752aa16af6e293ac32c75b812502c11e54f8d930b965d7e86a81fd5592b1a469db2498a71b635fd563d576039a4808b6dd776
7
+ data.tar.gz: a6561a7c91e8e4cb8ef8487032fd8bd46794bc79854dca274d7134e251741395f049c2e5c8ca603e257310a961788d6c1472fdc9bf82a171b099f869c59cefcf
data/LICENSE CHANGED
@@ -1,4 +1,4 @@
1
- Copyright (c) 2013, Adam Gardiner
1
+ Copyright (c) 2013-2016, Adam Gardiner
2
2
  All rights reserved.
3
3
 
4
4
  Redistribution and use in source and binary forms, with or without
data/README.md CHANGED
@@ -1,14 +1,19 @@
1
1
  # CSV-Diff
2
2
 
3
- CSV-Diff is a small library for performing diffs of CSV data.
3
+ CSV-Diff is a small library for performing diffs of tabular data, typically
4
+ data loaded from CSV files.
4
5
 
5
6
  Unlike a standard diff that compares line by line, and is sensitive to the
6
7
  ordering of records, CSV-Diff identifies common lines by key field(s), and
7
8
  then compares the contents of the fields in each line.
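The key-based comparison described above can be illustrated with a minimal sketch. This is not the gem's implementation: `index_rows` and `keyed_diff` are hypothetical helpers, and joining key values with `~` is assumed only because the algorithm source later in this diff splits keys on that character.

```ruby
# Index each row (an Array of field values) by the value(s) in its key column(s).
def index_rows(rows, key_cols)
  rows.each_with_object({}) do |row, idx|
    idx[key_cols.map { |c| row[c] }.join('~')] = row
  end
end

# Compare two row sets by key rather than by line position.
def keyed_diff(left, right, key_cols)
  l = index_rows(left, key_cols)
  r = index_rows(right, key_cols)
  {
    deleted: l.keys - r.keys,                                # keys only in left
    added:   r.keys - l.keys,                                # keys only in right
    updated: (l.keys & r.keys).reject { |k| l[k] == r[k] }   # same key, different fields
  }
end

LEFT_ROWS  = [['1', 'Alice', 'NY'], ['2', 'Bob', 'LA']]
RIGHT_ROWS = [['2', 'Bob', 'SF'], ['1', 'Alice', 'NY'], ['3', 'Cy', 'TX']]

p keyed_diff(LEFT_ROWS, RIGHT_ROWS, [0])
```

Rows 1 and 2 swap positions between the two inputs, but because they are matched on column 0, only the changed field in row 2 and the new row 3 are reported.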
8
9
 
9
- Data may be supplied in the form of CSV files, or as an array of arrays. The
10
- diff process provides a fine level of control over what to diff, and can
11
- optionally ignore certain types of changes (e.g. changes in position).
10
+ Data may be supplied in the form of CSV files, or as an array of arrays.
11
+ More complex usage also allows you to specify XPath expressions to extract
12
+ tabular data from XML documents for diffing.
13
+
14
+ The diff process provides a fine level of control over what to diff, and can
15
+ optionally ignore certain types of changes (e.g. adds, deletes, changes in
16
+ position etc).
12
17
 
13
18
  CSV-Diff is particularly well suited to data in parent-child format. Parent-
14
19
  child data does not lend itself well to standard text diffs, as small changes
@@ -21,17 +26,19 @@ sibling order.
21
26
  ## Usage
22
27
 
23
28
  CSV-Diff is supplied as a gem, and has no dependencies. To use it, simply:
24
- ```
25
- gem install csv-diff
26
- ```
29
+
30
+ ```
31
+ gem install csv-diff
32
+ ```
27
33
 
28
34
  To compare two CSV files where the field names are in the first row of the file,
29
35
  and the first field contains the unique key for each record, simply use:
30
- ```ruby
31
- require 'csv-diff'
32
36
 
33
- diff = CSVDiff.new(file1, file2)
34
- ```
37
+ ```ruby
38
+ require 'csv-diff'
39
+
40
+ diff = CSVDiff.new(file1, file2)
41
+ ```
35
42
 
36
43
  The returned diff object can be queried for the differences that exist between
37
44
  the two files, e.g.:
@@ -96,7 +103,7 @@ change in order) of all 6 rows.
96
103
 
97
104
  The more correct specification of this file is that column 0 contains a unique parent
98
105
  identifier, and column 1 contains a unique child identifier. CSVDiff can then correctly
99
- deduce that there is in fact only two changes in order - the swap in positions of A and
106
+ deduce that there are in fact only two changes in order - the swap in positions of A and
100
107
  B below Root.
101
108
 
102
109
  Note: If you aren't interested in changes in the order of siblings, then you could use
@@ -121,43 +128,59 @@ Warnings may be raised for any of the following:
121
128
  The simplest use case is as shown above, where the data to be diffed is in CSV files
122
129
  with the column names as the first record, and where the unique key is the first
123
130
  column in the data. In this case, a diff can be created simply via:
124
- ```ruby
125
- diff = CSVDiff.new(file1, file2)
126
- ```
127
131
 
128
- ### Specifynig Unique Row Identifiers
132
+ ```ruby
133
+ diff = CSVDiff.new(file1, file2)
134
+ ```
135
+
136
+ ### Specifying Unique Row Identifiers
129
137
 
130
138
  Often however, rows are not uniquely identifiable via the first column in the file.
131
139
  In a parent-child hierarchy, for example, combinations of parent and child may be
132
- necessary to uniquely identify a row. In these cases, it is necessary to indicate
133
- which fields are used to uniquely identify common rows across the two files. This
134
- can be done in several different ways.
140
+ necessary to uniquely identify a row, while in other cases a combination of fields
141
+ may be needed to derive a natural unique key or identifier for each row.
142
+ In these cases, it is necessary to indicate to CSVDiff which fields are needed to
143
+ uniquely identify common rows across the two files. This can be done in several
144
+ different ways.
145
+
146
+ #### :key_field(s)
135
147
 
136
- 1. Using the :key_fields option with field numbers (these are 0-based):
148
+ The first method is using the **key_fields** option (or key_field if you have only a
149
+ single key field). Use this option when your data represents a flat structure rather
150
+ than a parent-child hierarchy or flattened tree. You can specify key_fields using
151
+ field numbers/column indices (0-based):
137
152
 
138
153
  ```ruby
139
154
  diff = CSVDiff.new(file1, file2, key_fields: [0, 1])
140
155
  ```
141
156
 
142
- 2. Using the :key_fields options with column names:
157
+ Alternatively, you can use the :key_fields option with column names (provided CSVDiff
158
+ knows the names of your fields, either via the **field_names** option or from headers
159
+ in the file):
143
160
 
144
161
  ```ruby
145
- diff = CSVDiff.new(file1, file2, key_fields: ['Parent', 'Child'])
162
+ diff = CSVDiff.new(file1, file2, key_fields: ['First Name', 'Last Name'])
146
163
  ```
147
164
 
148
- 3. Using the :parent_fields and :child_fields with field numbers:
165
+ #### :parent_field(s)/:child_field(s)
166
+
167
+ The second method for identifying the unique identifiers in your file is to use the
168
+ :parent_fields and :child_fields options. Use these options when your data represents
169
+ a tree structure flattened to a table in parent-child form.
170
+
171
+ Using the :parent_fields and :child_fields options with field numbers:
149
172
 
150
173
  ```ruby
151
174
  diff = CSVDiff.new(file1, file2, parent_field: 1, child_fields: [2, 3])
152
175
  ```
153
176
 
154
- 4. Using the :parent_fields and :child_fields with column names:
177
+ Using the :parent_fields and :child_fields options with column names:
155
178
 
156
179
  ```ruby
157
180
  diff = CSVDiff.new(file1, file2, parent_field: 'Date', child_fields: ['HomeTeam', 'AwayTeam'])
158
181
  ```
159
182
 
160
- ### Using Non-CSV File Sources
183
+ ### Using Non-CSV Sources
161
184
 
162
185
  Data from non-CSV sources can be diffed, as long as it can be supplied as an Array
163
186
  of Arrays:
@@ -174,7 +197,53 @@ DATA2 = [
174
197
  ['A', 'A2', 'Account2']
175
198
  ]
176
199
 
177
- diff = CSVDiff.new(DATA1, DATA2, key_fields: [1, 0])
200
+ diff = CSVDiff.new(DATA1, DATA2, parent_field: 1, child_field: 0)
201
+ ```
202
+
203
+ Data can also be diffed if it is an XML source, although this requires a little
204
+ more effort to tell CSVDiff how to transform/extract content from the XML document
205
+ into an array-of-arrays form. It also introduces a dependency on Nokogiri - you
206
+ will need to install this gem to use CSVDiff with XML sources.
207
+
208
+ You use the CSVDiff::XMLSource class to define how to convert
209
+ your XML content to an array-of-arrays. The XMLSource class is quite flexible,
210
+ and can be used to convert single or multiple XML sources into a single data set
211
+ for diffing, and different documents may even have different layouts.
212
+
213
+ The first step is to create an XMLSource object, which requires a label to
214
+ identify the type of data it will generate:
215
+ ```ruby
216
+ xml_source1 = CSVDiff::XMLSource.new('My Label')
217
+ ```
218
+
219
+ Next, we pass XML documents to this source, and specify XPath expressions for each
220
+ row and column of data to produce via the `process(rec_xpath, field_maps, options)`
221
+ method:
222
+
223
+ * An XPath expression is provided to select each node value in the document that
224
+ will represent a row. Taking an HTML table as an example of something we wanted
225
+ to parse, your rec_xpath value might be something like the following:
226
+ `'//table/tbody/tr'`. This would locate all tables in the document, and create
227
+ a new row of data in the XMLSource every time a `<tr>` tag was encountered.
228
+ * A hash of field_maps is then provided to describe how to generate column values
229
+ for each row of data. The keys to field_maps are the names of the fields to be
230
+ output, while the values are the expression for how to generate values. Most
231
+ commonly, this will be another XPath expression that is evaluated in the context
232
+ of the node returned by the row XPath expression. So continuing our HTML example,
233
+ we might use `'./td[1]/text()'` as an expression to select the content of the
234
+ first `<td>` element within the `<tr>` representing the current row (XPath positions are 1-based).
235
+
236
+ ```ruby
237
+ xml_source1.process('//table/tbody/tr',
238
+ col_A: './td[1]/text()',
239
+ col_B: './td[2]/text()',
240
+ col_C: './td[3]/text()')
241
+ ```
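To make the row/column XPath idea concrete, here is a minimal sketch that turns a table into an array of arrays. It uses REXML, which ships with standard Ruby installs, rather than Nokogiri or the gem's XMLSource class; `extract_rows` and `SAMPLE_XML` are hypothetical names, and this is not how XMLSource is implemented. XPath positional predicates are 1-based, so the first cell is `td[1]`.

```ruby
require 'rexml/document'

SAMPLE_XML = <<~XML
  <table><tbody>
    <tr><td>A1</td><td>B1</td></tr>
    <tr><td>A2</td><td>B2</td></tr>
  </tbody></table>
XML

# rec_xpath selects one node per row; each field_maps value is an XPath
# expression evaluated relative to that row node.
def extract_rows(xml, rec_xpath, field_maps)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, rec_xpath).map do |node|
    field_maps.values.map do |expr|
      cell = REXML::XPath.first(node, expr)
      cell && cell.text
    end
  end
end

rows = extract_rows(SAMPLE_XML, '//table/tbody/tr',
                    { col_A: './td[1]', col_B: './td[2]' })
p rows
```

The resulting array of arrays has the same shape as the non-CSV sources shown earlier in this README.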
242
+
243
+ Finally, to diff two XML sources, we create a CSVDiff object with two XMLSource
244
+ objects as the source:
245
+ ```ruby
246
+ diff = CSVDiff.new(xml_source1, xml_source2, key_field: 'col_A')
178
247
  ```
179
248
 
180
249
  ### Specifying Column Names
@@ -211,6 +280,23 @@ diff = CSVDiff.new(file1, file2, parent_field: 'Date', child_fields: ['HomeTeam'
211
280
  ignore_fields: ['CreatedAt', 'UpdatedAt'])
212
281
  ```
213
282
 
283
+ ### Filtering Rows
284
+
285
+ If you need to filter source data before running the diff process, you can use the :include
286
+ and :exclude options to do so. Both options take a Hash as their value; the hash should have
287
+ keys that are the field names or indexes (0-based) on which to filter, and whose values are
288
+ regular expressions or lambdas to be applied to values of the corresponding field. Rows will
289
+ only be diffed if they satisfy :include conditions, and do not satisfy :exclude conditions.
290
+ ```ruby
291
+ # Generate a diff of Arsenal home games not refereed by Clattenburg
292
+ diff = CSVDiff.new(file1, file2, parent_field: 'Date', child_fields: ['HomeTeam', 'AwayTeam'],
293
+ include: {HomeTeam: 'Arsenal'}, exclude: {Referee: /Clattenburg/})
294
+
295
+ # Generate a diff of games played over the Xmas/New Year period
296
+ diff = CSVDiff.new(file1, file2, parent_field: 'Date', child_fields: ['HomeTeam', 'AwayTeam'],
297
+ include: {Date: lambda{ |d| holiday_period.include?(Date.strptime(d, '%y/%m/%d')) } })
298
+ ```
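As a rough illustration of how such filters behave, the sketch below applies :include and :exclude style conditions to a single row. It is not the gem's code: `field_matches?` and `keep_row?` are hypothetical helpers, rows are assumed to be hashes keyed by field name, and it assumes a row is dropped as soon as any :exclude condition matches.

```ruby
# True if the row's value for +field+ satisfies +test+ (a Regexp or a lambda).
def field_matches?(row, field, test)
  value = row[field.to_s]
  test.respond_to?(:call) ? test.call(value) : test.match?(value.to_s)
end

# Keep a row only if it meets all include conditions and no exclude condition.
def keep_row?(row, include_conds, exclude_conds)
  if include_conds
    return false unless include_conds.all? { |f, t| field_matches?(row, f, t) }
  end
  if exclude_conds
    return false if exclude_conds.any? { |f, t| field_matches?(row, f, t) }
  end
  true
end

game = { 'HomeTeam' => 'Arsenal', 'Referee' => 'M Clattenburg' }
puts keep_row?(game, { HomeTeam: /Arsenal/ }, nil)
puts keep_row?(game, { HomeTeam: /Arsenal/ }, { Referee: /Clattenburg/ })
```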
299
+
214
300
  ### Ignoring Certain Changes
215
301
 
216
302
  CSVDiff identifies Adds, Updates, Moves and Deletes; any of these changes can be selectively
@@ -1,3 +1,4 @@
1
+ require 'csv-diff/source'
1
2
  require 'csv-diff/csv_source'
2
3
  require 'csv-diff/algorithm'
3
4
  require 'csv-diff/csv_diff'
@@ -3,6 +3,55 @@ class CSVDiff
3
3
  # Implements the CSV diff algorithm.
4
4
  module Algorithm
5
5
 
6
+ # Holds the details of a single difference
7
+ class Diff
8
+
9
+ attr_accessor :diff_type
10
+ attr_reader :fields
11
+ attr_reader :row
12
+ attr_reader :sibling_position
13
+
14
+ def initialize(diff_type, fields, row_idx, pos_idx)
15
+ @diff_type = diff_type
16
+ @fields = fields
17
+ @row = row_idx + 1
18
+ self.sibling_position = pos_idx
19
+ end
20
+
21
+
22
+ def sibling_position=(pos_idx)
23
+ if pos_idx.is_a?(Array)
24
+ pos_idx.compact!
25
+ if pos_idx.first != pos_idx.last
26
+ @sibling_position = pos_idx.map{ |pos| pos + 1 }
27
+ else
28
+ @sibling_position = pos_idx.first + 1
29
+ end
30
+ else
31
+ @sibling_position = pos_idx + 1
32
+ end
33
+ end
34
+
35
+
36
+ # For backwards compatibility and access to fields with differences
37
+ def [](key)
38
+ case key
39
+ when :action
40
+ a = diff_type.to_s
41
+ a[0] = a[0].upcase
42
+ a
43
+ when :row
44
+ @row
45
+ when :sibling_position
46
+ @sibling_position
47
+ else
48
+ @fields[key] || @fields[key.to_s.intern] || @fields[key.to_s]
49
+ end
50
+ end
51
+
52
+ end
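The position handling in `sibling_position=` above converts 0-based indexes to 1-based positions, and collapses a [from, to] pair into a single value when both ends are the same. A standalone sketch of that logic follows; `normalise_position` is a hypothetical name, and unlike the original it uses the non-mutating `compact`.

```ruby
# Convert 0-based index(es) to 1-based position(s).
def normalise_position(pos_idx)
  return pos_idx + 1 unless pos_idx.is_a?(Array)

  positions = pos_idx.compact
  if positions.first != positions.last
    positions.map { |pos| pos + 1 }   # a real move: [from, to]
  else
    positions.first + 1               # same position (or only one known)
  end
end

p normalise_position(2)
p normalise_position([0, 2])
p normalise_position([1, 1])
p normalise_position([nil, 4])
```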
53
+
54
+
6
55
  # Diffs two CSVSource structures.
7
56
  #
8
57
  # @param left [CSVSource] A CSVSource object containing the contents of
@@ -22,43 +71,53 @@ class CSVDiff
22
71
  # items that exist in both +left+ and +right+.
23
72
  # @option options [Boolean] :ignore_deletes If set to true, we ignore any
24
73
  # new items that appear only in +left+.
74
+ # @option options [Hash<Object,Proc>] :equality_procs A Hash mapping fields
75
+ # to a 2-arg Proc that should be used to compare values in that field for
76
+ # equality.
25
77
  def diff_sources(left, right, key_fields, diff_fields, options = {})
26
78
  unless left.case_sensitive? == right.case_sensitive?
27
79
  raise ArgumentError, "Left and right must have same settings for case-sensitivity"
28
80
  end
29
- case_sensitive = left.case_sensitive?
81
+ unless left.parent_fields.length == right.parent_fields.length
82
+ raise ArgumentError, "Left and right must have same settings for parent/child fields"
83
+ end
84
+
85
+ # Ensure key fields are not also in the diff_fields
86
+ diff_fields = diff_fields - key_fields
87
+
30
88
  left_index = left.index
31
89
  left_values = left.lines
32
90
  left_keys = left_values.keys
33
91
  right_index = right.index
34
92
  right_values = right.lines
35
93
  right_keys = right_values.keys
36
- parent_fields = left.parent_fields.length
94
+ parent_field_count = left.parent_fields.length
37
95
 
38
96
  include_adds = !options[:ignore_adds]
39
97
  include_moves = !options[:ignore_moves]
40
98
  include_updates = !options[:ignore_updates]
41
99
  include_deletes = !options[:ignore_deletes]
42
100
 
43
- diffs = Hash.new{ |h, k| h[k] = {} }
101
+ @case_sensitive = left.case_sensitive?
102
+ @equality_procs = options.fetch(:equality_procs, {})
103
+
104
+ diffs = {}
105
+ potential_moves = Hash.new{ |h, k| h[k] = [] }
44
106
 
45
107
  # First identify deletions
46
108
  if include_deletes
47
109
  (left_keys - right_keys).each do |key|
48
110
  # Delete
49
111
  key_vals = key.split('~', -1)
50
- parent = key_vals[0...parent_fields].join('~')
112
+ parent = key_vals[0...parent_field_count].join('~')
113
+ child = key_vals[parent_field_count..-1].join('~')
51
114
  left_parent = left_index[parent]
52
115
  left_value = left_values[key]
53
- left_idx = left_parent.index(key)
54
- next unless left_idx
55
- id = {}
56
- id[:row] = left_keys.index(key) + 1
57
- id[:sibling_position] = left_idx + 1
58
- key_fields.each do |field_name|
59
- id[field_name] = left_value[field_name]
60
- end
61
- diffs[key].merge!(id.merge(left_values[key].merge(:action => 'Delete')))
116
+ row_idx = left_keys.index(key)
117
+ sib_idx = left_parent.index(key)
118
+ raise "Can't locate key #{key} in parent #{parent}" unless sib_idx
119
+ diffs[key] = Diff.new(:delete, left_value, row_idx, sib_idx)
120
+ potential_moves[child] << key
62
121
  #puts "Delete: #{key}"
63
122
  end
64
123
  end
@@ -66,7 +125,7 @@ class CSVDiff
66
125
  # Now identify adds/updates
67
126
  right_keys.each_with_index do |key, right_row_id|
68
127
  key_vals = key.split('~', -1)
69
- parent = key_vals[0...parent_fields].join('~')
128
+ parent = key_vals[0...parent_field_count].join('~')
70
129
  left_parent = left_index[parent]
71
130
  right_parent = right_index[parent]
72
131
  left_value = left_values[key]
@@ -74,13 +133,12 @@ class CSVDiff
74
133
  left_idx = left_parent && left_parent.index(key)
75
134
  right_idx = right_parent && right_parent.index(key)
76
135
 
77
- id = {}
78
- id[:row] = right_row_id + 1
79
- id[:sibling_position] = right_idx + 1
80
- key_fields.each do |field_name|
81
- id[field_name] = right_value[field_name]
82
- end
83
136
  if left_idx && right_idx
137
+ if include_updates && (changes = diff_row(left_value, right_value, diff_fields))
138
+ id = id_fields(key_fields, right_value)
139
+ diffs[key] = Diff.new(:update, id.merge!(changes), right_row_id, right_idx)
140
+ #puts "Change: #{key}"
141
+ end
84
142
  if include_moves
85
143
  left_common = left_parent & right_parent
86
144
  right_common = right_parent & left_parent
@@ -88,19 +146,31 @@ class CSVDiff
88
146
  right_pos = right_common.index(key)
89
147
  if left_pos != right_pos
90
148
  # Move
91
- diffs[key].merge!(id.merge!(:action => 'Move',
92
- :sibling_position => [left_idx + 1, right_idx + 1]))
149
+ if d = diffs[key]
150
+ d.sibling_position = [left_idx, right_idx]
151
+ else
152
+ id = id_fields(key_fields, right_value)
153
+ diffs[key] = Diff.new(:move, id, right_row_id, [left_idx, right_idx])
154
+ end
93
155
  #puts "Move #{left_idx} -> #{right_idx}: #{key}"
94
156
  end
95
157
  end
96
- if include_updates && (changes = diff_row(left_value, right_value, diff_fields, case_sensitive))
97
- diffs[key].merge!(id.merge(changes.merge(:action => 'Update')))
98
- #puts "Change: #{key}"
99
- end
100
- elsif include_adds && right_idx
158
+ elsif right_idx
101
159
  # Add
102
- diffs[key].merge!(id.merge(right_values[key].merge(:action => 'Add')))
103
- #puts "Add: #{key}"
160
+ child = key_vals[parent_field_count..-1].join('~')
161
+ if potential_moves.has_key?(child) && old_key = potential_moves[child].pop
162
+ diffs.delete(old_key)
163
+ if include_updates
164
+ left_value = left_values[old_key]
165
+ id = id_fields(right.child_fields, right_value)
166
+ changes = diff_row(left_value, right_value, left.parent_fields + diff_fields)
167
+ diffs[key] = Diff.new(:update, id.merge!(changes), right_row_id, right_idx)
168
+ #puts "Update Parent: #{key}"
169
+ end
170
+ elsif include_adds
171
+ diffs[key] = Diff.new(:add, right_value, right_row_id, right_idx)
172
+ #puts "Add: #{key}"
173
+ end
104
174
  end
105
175
  end
106
176
 
@@ -116,27 +186,41 @@ class CSVDiff
116
186
  # @param right_row [Hash] The version of the CSV row from the right/to
117
187
  # file.
118
188
  # @param fields [Array<String>] An array of field names to compare.
119
- # @param case_sensitive [Boolean] Whether field comparisons should be
120
- # case sensitive or not.
121
189
  # @return [Hash<String, Array>] A Hash whose keys are the fields that
122
190
  # contain differences, and whose values are a two-element array of
123
191
  # [left/from, right/to] values.
124
- def diff_row(left_row, right_row, fields, case_sensitive)
192
+ def diff_row(left_row, right_row, fields)
125
193
  diffs = {}
126
194
  fields.each do |attr|
195
+ eq_proc = @equality_procs[attr]
127
196
  right_val = right_row[attr]
128
197
  right_val = nil if right_val == ""
129
198
  left_val = left_row[attr]
130
199
  left_val = nil if left_val == ""
131
- if (case_sensitive && left_val != right_val) ||
132
- (left_val.to_s.upcase != right_val.to_s.upcase)
200
+ if eq_proc
201
+ diffs[attr] = [left_val, right_val] unless eq_proc.call(left_val, right_val)
202
+ elsif @case_sensitive
203
+ diffs[attr] = [left_val, right_val] unless left_val == right_val
204
+ elsif (left_val.to_s.upcase != right_val.to_s.upcase)
133
205
  diffs[attr] = [left_val, right_val]
134
- #puts "#{attr}: #{left_val} -> #{right_val}"
135
206
  end
136
207
  end
137
208
  diffs if diffs.size > 0
138
209
  end
139
210
 
211
+
212
+ private
213
+
214
+
215
+ # Return a hash containing just the key field values
216
+ def id_fields(key_fields, fields)
217
+ id = {}
218
+ key_fields.each do |field_name|
219
+ id[field_name] = fields[field_name]
220
+ end
221
+ id
222
+ end
223
+
140
224
  end
141
225
 
142
226
  end
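The comparison order in the new `diff_row` (custom equality proc first, then case-sensitive, then case-insensitive) can be sketched in isolation as follows. `values_equal?` and `NUMERIC_EQ` are hypothetical names used for illustration only. Judging from the `@equality_procs[attr]` lookup above, the :equality_procs hash appears to be keyed by field name, but that is an inference from this diff rather than documented behaviour.

```ruby
# Decide whether two field values should be treated as equal.
def values_equal?(left_val, right_val, eq_proc: nil, case_sensitive: true)
  # Empty strings are treated the same as missing values.
  left_val = nil if left_val == ''
  right_val = nil if right_val == ''
  if eq_proc
    eq_proc.call(left_val, right_val)
  elsif case_sensitive
    left_val == right_val
  else
    left_val.to_s.upcase == right_val.to_s.upcase
  end
end

# Example 2-arg proc: compare values numerically rather than as strings.
NUMERIC_EQ = ->(a, b) { a.to_f == b.to_f }

puts values_equal?('1.0', '1', eq_proc: NUMERIC_EQ)
puts values_equal?('abc', 'ABC')
puts values_equal?('abc', 'ABC', case_sensitive: false)
```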
@@ -81,13 +81,15 @@ class CSVDiff
81
81
  # @option options [Boolean] :ignore_deletes If true, records that appear
82
82
  # in the left/from file but not in the right/to file are not reported.
83
83
  def initialize(left, right, options = {})
84
- @left = left.is_a?(CSVSource) ? left : CSVSource.new(left, options)
84
+ @left = left.is_a?(Source) ? left : CSVSource.new(left, options)
85
+ @left.index_source if @left.lines.nil?
85
86
  raise "No field names found in left (from) source" unless @left.field_names && @left.field_names.size > 0
86
- @right = right.is_a?(CSVSource) ? right : CSVSource.new(right, options)
87
+ @right = right.is_a?(Source) ? right : CSVSource.new(right, options)
88
+ @right.index_source if @right.lines.nil?
87
89
  raise "No field names found in right (to) source" unless @right.field_names && @right.field_names.size > 0
88
90
  @warnings = []
89
91
  @diff_fields = get_diff_fields(@left.field_names, @right.field_names, options)
90
- @key_fields = @left.key_fields.map{ |kf| @diff_fields[kf] }
92
+ @key_fields = @left.key_fields
91
93
  diff(options)
92
94
  end
93
95
 
@@ -141,15 +143,13 @@ class CSVDiff
141
143
  ignore_fields = options.fetch(:ignore_fields, [])
142
144
  ignore_fields = [ignore_fields] unless ignore_fields.is_a?(Array)
143
145
  ignore_fields.map! do |f|
144
- (f.is_a?(Fixnum) ? right_fields[f] : f).upcase
146
+ (f.is_a?(Numeric) ? right_fields[f] : f).upcase
145
147
  end
146
148
  diff_fields = []
147
149
  if options[:diff_common_fields_only]
148
150
  right_fields.each_with_index do |fld, i|
149
151
  if left_fields.include?(fld)
150
152
  diff_fields << fld unless ignore_fields.include?(fld.upcase)
151
- else
152
- @warnings << "Field '#{fld}' is missing from the left (from) file, and won't be diffed"
153
153
  end
154
154
  end
155
155
  else
@@ -2,39 +2,7 @@ class CSVDiff
2
2
 
3
3
  # Represents a CSV input (i.e. the left/from or right/to input) to the diff
4
4
  # process.
5
- class CSVSource
6
-
7
- # @return [String] the path to the source file
8
- attr_accessor :path
9
- # @return [Array<String>] The names of the fields in the source file
10
- attr_reader :field_names
11
- # @return [Array<String>] The names of the field(s) that uniquely
12
- # identify each row.
13
- attr_reader :key_fields
14
- # @return [Array<String>] The names of the field(s) that identify a
15
- # common parent of child records.
16
- attr_reader :parent_fields
17
- # @return [Array<String>] The names of the field(s) that distinguish a
18
- # child of a parent record.
19
- attr_reader :child_fields
20
- # @return [Boolean] True if the source has been indexed with case-
21
- # sensitive keys, or false if it has been indexed using upper-case key
22
- # values.
23
- attr_reader :case_sensitive
24
- alias_method :case_sensitive?, :case_sensitive
25
- # @return [Boolean] True if leading/trailing whitespace should be stripped
26
- # from fields
27
- attr_reader :trim_whitespace
28
- # @return [Hash<String,Hash>] A hash containing each line of the source,
29
- # keyed on the values of the +key_fields+.
30
- attr_reader :lines
31
- # @return [Hash<String,Array<String>>] A hash containing each parent key,
32
- # and an Array of the child keys it is a parent of.
33
- attr_reader :index
34
- # @return [Array<String>] An array of any warnings encountered while
35
- # processing the source.
36
- attr_reader :warnings
37
-
5
+ class CSVSource < Source
38
6
 
39
7
  # Creates a new diff source.
40
8
  #
@@ -72,92 +40,32 @@ class CSVDiff
72
40
  # @option options [String] :child_field The name of the field(s) that
73
41
  # uniquely identify a child of a parent.
74
42
  # @option options [Boolean] :case_sensitive If true (the default), keys
75
- # are indexed as-is; if false, the index is built in upper-case for
76
- # case-insensitive comparisons.
43
+ # are indexed as-is; if false, the index is built in upper-case for
44
+ # case-insensitive comparisons.
45
+ # @option options [Hash] :include A hash of field name(s) or index(es) to
46
+ # regular expression(s). Only source rows whose field values satisfy the
47
+ # regular expressions will be indexed and included in the diff process.
48
+ # @option options [Hash] :exclude A hash of field name(s) or index(es) to
49
+ # regular expression(s). Source rows with a field value that satisfies
50
+ # the regular expressions will be excluded from the diff process.
77
51
  def initialize(source, options = {})
52
+ super(options)
78
53
  if source.is_a?(String)
79
54
  require 'csv'
80
55
  mode_string = options[:encoding] ? "r:#{options[:encoding]}" : 'r'
81
56
  csv_options = options.fetch(:csv_options, {})
82
57
  @path = source
83
- source = CSV.open(@path, mode_string, csv_options).readlines
84
- end
85
- if kf = options.fetch(:key_field, options[:key_fields])
86
- @key_fields = [kf].flatten
87
- @parent_fields = @key_fields[0...-1]
88
- @child_fields = @key_fields[-1..-1]
89
- else
90
- @parent_fields = [options.fetch(:parent_field, options[:parent_fields]) || []].flatten
91
- @child_fields = [options.fetch(:child_field, options[:child_fields]) || [0]].flatten
92
- @key_fields = @parent_fields + @child_fields
93
- end
94
- @field_names = options[:field_names]
95
- @warnings = []
96
- index_source(source, options)
97
- end
98
-
99
-
100
- # Returns the row in the CSV source corresponding to the supplied key.
101
- #
102
- # @param key [String] The unique key to use to lookup the row.
103
- # @return [Hash] The fields for the line corresponding to +key+, or nil
104
- # if the key is not recognised.
105
- def [](key)
106
- @lines[key]
107
- end
108
-
109
-
110
- private
111
-
112
- # Given an array of lines, where each line is an array of fields, indexes
113
- # the array contents so that it can be looked up by key.
114
- def index_source(lines, options)
115
- @lines = {}
116
- @index = Hash.new{ |h, k| h[k] = [] }
117
- @key_fields = find_field_indexes(@key_fields, @field_names) if @field_names
118
- @case_sensitive = options.fetch(:case_sensitive, true)
119
- @trim_whitespace = options.fetch(:trim_whitespace, false)
120
- line_num = 0
121
- lines.each do |row|
122
- line_num += 1
123
- next if line_num == 1 && @field_names && options[:ignore_header]
124
- unless @field_names
125
- @field_names = row
126
- @key_fields = find_field_indexes(@key_fields, @field_names)
127
- next
128
- end
129
- field_vals = row
130
- line = {}
131
- @field_names.each_with_index do |field, i|
132
- line[field] = field_vals[i]
133
- line[field].strip! if @trim_whitespace && line[field]
134
- end
135
- key_values = @key_fields.map{ |kf| field_vals[kf].to_s.upcase }
136
- key = key_values.join('~')
137
- parent_key = key_values[0...(@parent_fields.length)].join('~')
138
- parent_key.upcase! unless @case_sensitive
139
- if @lines[key]
140
- @warnings << "Duplicate key '#{key}' encountered and ignored at line #{line_num}"
141
- else
142
- @index[parent_key] << key
143
- @lines[key] = line
144
- end
145
- end
146
- end
147
-
148
-
149
- # Converts an array of field names to an array of indexes of the fields
150
- # matching those names.
151
- def find_field_indexes(key_fields, field_names)
152
- key_fields.map do |field|
153
- if field.is_a?(Fixnum)
154
- field
155
- else
156
- field_names.index{ |field_name| field.to_s.downcase == field_name.downcase } or
157
- raise ArgumentError, "Could not locate field '#{field}' in source field names: #{
158
- field_names.join(', ')}"
58
+ # When you call CSV.open, it's best to pass in a block so that after it's yielded,
59
+ # the underlying file handle is closed. Otherwise, you risk leaking the handle.
60
+ @data = CSV.open(@path, mode_string, csv_options) do |csv|
61
+ csv.readlines
159
62
  end
63
+ elsif source.is_a?(Enumerable) && (source.size == 0 || source.first.is_a?(Enumerable))
64
+ @data = source
65
+ else
66
+ raise ArgumentError, "source must be a path to a file or an Enumerable<Enumerable>"
160
67
  end
68
+ index_source
161
69
  end
162
70
 
163
71
  end
@@ -0,0 +1,289 @@
1
+ class CSVDiff
2
+
3
+ # Represents an input (i.e. the left/from or right/to input) to the diff
4
+ # process.
5
+ class Source
6
+
7
+ # @return [String] the path to the source file
8
+ attr_accessor :path
9
+ # @return [Array<Array>] The data for this source
10
+ attr_reader :data
11
+
12
+ # @return [Array<String>] The names of the fields in the source file
13
+ attr_reader :field_names
14
+ # @return [Array<String>] The names of the field(s) that uniquely
15
+ # identify each row.
16
+ attr_reader :key_fields
17
+ # @return [Array<String>] The names of the field(s) that identify a
18
+ # common parent of child records.
19
+ attr_reader :parent_fields
20
+ # @return [Array<String>] The names of the field(s) that distinguish a
21
+ # child of a parent record.
22
+ attr_reader :child_fields
23
+
24
+ # @return [Array<Fixnum>] The indexes of the key fields in the source
25
+ # file.
26
+ attr_reader :key_field_indexes
27
+ # @return [Array<Fixnum>] The indexes of the parent fields in the source
28
+ # file.
29
+ attr_reader :parent_field_indexes
30
+ # @return [Array<Fixnum>] The indexes of the child fields in the source
31
+ # file.
32
+ attr_reader :child_field_indexes
33
+
34
+ # @return [Boolean] True if the source has been indexed with case-
35
+ # sensitive keys, or false if it has been indexed using upper-case key
36
+ # values.
37
+ attr_reader :case_sensitive
38
+ alias_method :case_sensitive?, :case_sensitive
39
+ # @return [Boolean] True if leading/trailing whitespace should be stripped
40
+ # from fields
41
+ attr_reader :trim_whitespace
42
+ # @return [Hash<String,Hash>] A hash containing each line of the source,
43
+ # keyed on the values of the +key_fields+.
44
+ attr_reader :lines
45
+ # @return [Hash<String,Array<String>>] A hash containing each parent key,
46
+ # and an Array of the child keys it is a parent of.
47
+ attr_reader :index
48
+ # @return [Array<String>] An array of any warnings encountered while
49
+ # processing the source.
50
+ attr_reader :warnings
51
+ # @return [Fixnum] A count of the lines processed from this source.
52
+ # Excludes any header and duplicate records identified during indexing.
53
+ attr_reader :line_count
54
+ # @return [Fixnum] A count of the lines from this source that were skipped
55
+ # due to filter conditions.
56
+ attr_reader :skip_count
57
+ # @return [Fixnum] A count of the lines from this source that had the same
58
+ # key value as another line.
59
+ attr_reader :dup_count
60
+
61
+
62
+ # Creates a new diff source.
63
+ #
64
+ # A diff source must contain at least one field that will be used as the
65
+ # key to identify the same record in a different version of this file.
66
+ # If not specified via one of the options, the first field is assumed to
67
+ # be the unique key.
68
+ #
69
+ # If multiple fields combine to form a unique key, the combined fields
70
+ # are considered as a single unique identifier. If your key represents
71
+ # data that can be represented as a tree, you can instead break your key
72
+ # fields into :parent_fields and :child_fields. By doing this, if a child
73
+ # key is deleted from one parent, and added to another, that will be
74
+ # reported as an update, with a change to the parent key part(s) of the
75
+ # record.
76
+ #
77
+ # All key options can be specified either by field name, or by field
78
+ # index (0 based).
79
+ #
80
+ # @param options [Hash] An options hash.
81
+ # @option options [Array<String>] :field_names The names of each of the
82
+ # fields in +source+.
83
+ # @option options [Boolean] :ignore_header If true, and :field_names has
84
+ # been specified, then the first row of the file is ignored.
85
+ # @option options [String] :key_field The name of the field that uniquely
86
+ # identifies each row.
87
+ # @option options [Array<String>] :key_fields The names of the fields
88
+ # that uniquely identify each row.
89
+ # @option options [String] :parent_field The name of the field(s) that
90
+ # identify a parent within which sibling order should be checked.
91
+ # @option options [String] :child_field The name of the field(s) that
92
+ # uniquely identify a child of a parent.
93
+ # @option options [Boolean] :case_sensitive If true (the default), keys
94
+ # are indexed as-is; if false, the index is built in upper-case for
95
+ # case-insensitive comparisons.
96
+ # @option options [Hash] :include A hash of field name(s) or index(es) to
97
+ # regular expression(s). Only source rows whose field values satisfy the
98
+ # regular expressions will be indexed and included in the diff process.
99
+ # @option options [Hash] :exclude A hash of field name(s) or index(es) to
100
+ # regular expression(s). Source rows with a field value that satisfies
101
+ # the regular expressions will be excluded from the diff process.
102
+ def initialize(options = {})
103
+ if (options.keys & [:parent_field, :parent_fields, :child_field, :child_fields]).empty? &&
104
+ (kf = options.fetch(:key_field, options[:key_fields]))
105
+ @key_fields = [kf].flatten
106
+ @parent_fields = []
107
+ @child_fields = @key_fields
108
+ else
109
+ @parent_fields = [options.fetch(:parent_field, options[:parent_fields]) || []].flatten
110
+ @child_fields = [options.fetch(:child_field, options[:child_fields]) || [0]].flatten
111
+ @key_fields = @parent_fields + @child_fields
112
+ end
113
+ @field_names = options[:field_names]
114
+ @case_sensitive = options.fetch(:case_sensitive, true)
115
+ @trim_whitespace = options.fetch(:trim_whitespace, false)
116
+ @ignore_header = options[:ignore_header]
117
+ @include = options[:include]
118
+ @exclude = options[:exclude]
119
+ @path = options.fetch(:path, 'NA') unless @path
120
+ @warnings = []
121
+ end
122
+
123
+
124
+ def path?
125
+ @path != 'NA'
126
+ end
127
+
128
+
129
+ # Returns the row in the CSV source corresponding to the supplied key.
130
+ #
131
+ # @param key [String] The unique key to use to lookup the row.
132
+ # @return [Hash] The fields for the line corresponding to +key+, or nil
133
+ # if the key is not recognised.
134
+ def [](key)
135
+ @lines[key]
136
+ end
137
+
138
+
139
+ # Given an array of lines, where each line is an array of fields, indexes
140
+ # the array contents so that it can be looked up by key.
141
+ def index_source
142
+ @lines = {}
143
+ @index = Hash.new{ |h, k| h[k] = [] }
144
+ if @field_names
145
+ index_fields
146
+ include_filter = convert_filter(@include, @field_names)
147
+ exclude_filter = convert_filter(@exclude, @field_names)
148
+ end
149
+ @line_count = 0
150
+ @skip_count = 0
151
+ @dup_count = 0
152
+ line_num = 0
153
+ @data.each do |row|
154
+ line_num += 1
155
+ next if line_num == 1 && @field_names && @ignore_header
156
+ unless @field_names
157
+ if row.class.name == 'CSV::Row'
158
+ @field_names = row.headers.each_with_index.map{ |f, i| f || i.to_s }
159
+ else
160
+ @field_names = row.each_with_index.map{ |f, i| f || i.to_s }
161
+ end
162
+ index_fields
163
+ include_filter = convert_filter(@include, @field_names)
164
+ exclude_filter = convert_filter(@exclude, @field_names)
165
+ next
166
+ end
167
+ field_vals = row
168
+ line = {}
169
+ filter = false
170
+ @field_names.each_with_index do |field, i|
171
+ val = field_vals[i]
172
+ val = val.to_s.strip if val && @trim_whitespace
173
+ line[field] = val
174
+ if include_filter && f = include_filter[i]
175
+ filter = !check_filter(f, line[field])
176
+ end
177
+ if exclude_filter && f = exclude_filter[i]
178
+ filter = check_filter(f, line[field])
179
+ end
180
+ break if filter
181
+ end
182
+ if filter
183
+ @skip_count += 1
184
+ next
185
+ end
186
+ key_values = @key_field_indexes.map{ |kf| @case_sensitive ?
187
+ field_vals[kf].to_s :
188
+ field_vals[kf].to_s.upcase }
189
+ key = key_values.join('~')
190
+ parent_key = key_values[0...(@parent_fields.length)].join('~')
191
+ if @lines[key]
192
+ @warnings << "Duplicate key '#{key}' encountered at line #{line_num}"
193
+ @dup_count += 1
194
+ key += "[#{@dup_count}]"
195
+ end
196
+ @index[parent_key] << key
197
+ @lines[key] = line
198
+ @line_count += 1
199
+ end
200
+ end
201
+
202
+
203
+ # Save the data in this Source as a CSV at +file_path+.
204
+ #
205
+ # @param file_path [String] The target path to save the data to.
206
+ # @param options [Hash] A set of options to pass to CSV.open to control
207
+ # how the CSV is generated.
208
+ def save_csv(file_path, options = {})
209
+ require 'csv'
210
+ default_opts = {
211
+ headers: @field_names, write_headers: true
212
+ }
213
+ CSV.open(file_path, 'wb', default_opts.merge(options)) do |csv|
214
+ @data.each{ |rec| csv << rec }
215
+ end
216
+ end
217
+
218
+
219
+ # Convert the data in this source to Array<Hash> using the field names
220
+ # as keys for the Hash in each row.
221
+ def to_hash
222
+ @data.map do |row|
223
+ hsh = {}
224
+ @field_names.each_with_index{ |fld, i| hsh[fld] = row[i] }
225
+ hsh
226
+ end
227
+ end
228
+
229
+
230
+ private
231
+
232
+
233
+ def index_fields
234
+ @key_field_indexes = find_field_indexes(@key_fields, @field_names)
235
+ @parent_field_indexes = find_field_indexes(@parent_fields, @field_names)
236
+ @child_field_indexes = find_field_indexes(@child_fields, @field_names)
237
+ @key_fields = @key_field_indexes.map{ |i| @field_names[i] }
238
+ @parent_fields = @parent_field_indexes.map{ |i| @field_names[i] }
239
+ @child_fields = @child_field_indexes.map{ |i| @field_names[i] }
240
+ end
241
+
242
+
243
+ # Converts an array of field names to an array of indexes of the fields
244
+ # matching those names.
245
+ def find_field_indexes(key_fields, field_names)
246
+ key_fields.map do |field|
247
+ if field.is_a?(Integer)
248
+ field
249
+ else
250
+ field_names.index{ |field_name| field.to_s.downcase == field_name.to_s.downcase } or
251
+ raise ArgumentError, "Could not locate field '#{field}' in source field names: #{
252
+ field_names.join(', ')}"
253
+ end
254
+ end
255
+ end
256
+
257
+
258
+ def convert_filter(hsh, field_names)
259
+ return unless hsh
260
+ if !hsh.is_a?(Hash)
261
+ raise ArgumentError, ":include/:exclude option must be a Hash of field name(s)/index(es) to RegExp(s)"
262
+ end
263
+ keys = hsh.keys
264
+ idxs = find_field_indexes(keys, field_names)
265
+ Hash[keys.each_with_index.map{ |k, i| [idxs[i], hsh[k]] }]
266
+ end
267
+
268
+
269
+ def check_filter(filter, field_val)
270
+ case filter
271
+ when String
272
+ if @case_sensitive
273
+ filter == field_val
274
+ else
275
+ filter.downcase == field_val.to_s.downcase
276
+ end
277
+ when Regexp
278
+ filter.match(field_val)
279
+ when Proc
280
+ filter.call(field_val)
281
+ else
282
+ raise ArgumentError, "Unsupported filter expression: #{filter.inspect}"
283
+ end
284
+ end
285
+
286
+ end
287
+
288
+ end
289
+
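The key-building logic in `index_source` above can be illustrated with a minimal standalone sketch. This is not the gem's API: `build_index` and its arguments are hypothetical names introduced here, and filtering, header detection, and whitespace trimming are left out. It assumes every key field is given by name and only mirrors how the diff joins key values with `~`, derives the parent key from the leading parent fields, and suffixes duplicate keys with the running duplicate count.

```ruby
# Hypothetical standalone sketch; build_index is not part of csv-diff.
# It mirrors only the key/parent-key/duplicate handling shown in index_source.
def build_index(rows, field_names, parent_fields, child_fields, case_sensitive = true)
  key_fields = parent_fields + child_fields
  # Resolve each key field name to its column position
  key_idxs = key_fields.map do |f|
    field_names.index(f) or raise ArgumentError, "Unknown field '#{f}'"
  end
  lines = {}                              # key => row as a Hash
  index = Hash.new { |h, k| h[k] = [] }   # parent key => child keys in source order
  dup_count = 0
  rows.each do |row|
    key_values = key_idxs.map { |i| case_sensitive ? row[i].to_s : row[i].to_s.upcase }
    key = key_values.join('~')
    parent_key = key_values[0...parent_fields.length].join('~')
    if lines[key]
      # A repeated key is kept, but made unique with a numeric suffix
      dup_count += 1
      key += "[#{dup_count}]"
    end
    index[parent_key] << key
    lines[key] = field_names.zip(row).to_h
  end
  { lines: lines, index: index, dup_count: dup_count }
end

fields = %w[Parent Child Value]
rows = [%w[A 1 x], %w[A 2 y], %w[B 1 z], %w[A 1 again]]
result = build_index(rows, fields, ['Parent'], ['Child'])
p result[:lines].keys   # => ["A~1", "A~2", "B~1", "A~1[1]"]
p result[:index]        # => {"A"=>["A~1", "A~2", "A~1[1]"], "B"=>["B~1"]}
```

Keeping the per-parent list of child keys in source order is what would let a later step compare sibling positions between two versions of the data.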
@@ -0,0 +1,142 @@
1
+ require 'nokogiri'
2
+ require 'cgi'
3
+
4
+
5
+ class CSVDiff
6
+
7
+ # Convert XML content to CSV format using XPath selectors to identify the
8
+ # rows and field values in an XML document
9
+ class XMLSource < Source
10
+
11
+ attr_accessor :context
12
+
13
+ # Create a new XMLSource, identified by +path+. Normally this is a path
14
+ # to the XML document, but any value is fine, as it is just a label to
15
+ # identify this data set.
16
+ #
17
+ # @param path [String] A label for this data set (often a path to the
18
+ # XML document used as the source).
19
+ # @param options [Hash] An options hash.
20
+ # @option options [Array<String>] :field_names The names of each of the
21
+ # fields in +source+.
22
+ # @option options [Boolean] :ignore_header If true, and :field_names has
23
+ # been specified, then the first row of the file is ignored.
24
+ # @option options [String] :key_field The name of the field that uniquely
25
+ # identifies each row.
26
+ # @option options [Array<String>] :key_fields The names of the fields
27
+ # that together uniquely identify each row.
28
+ # @option options [String] :parent_field The name of the field(s) that
29
+ # identify a parent within which sibling order should be checked.
30
+ # @option options [String] :child_field The name of the field(s) that
31
+ # uniquely identify a child of a parent.
32
+ # @option options [Boolean] :case_sensitive If true (the default), keys
33
+ # are indexed as-is; if false, the index is built in upper-case for
34
+ # case-insensitive comparisons.
35
+ # @option options [Hash] :include A hash of field name(s) or index(es) to
36
+ # regular expression(s). Only source rows whose field values satisfy the
37
+ # regular expressions will be indexed and included in the diff process.
38
+ # @option options [Hash] :exclude A hash of field name(s) or index(es) to
39
+ # regular expression(s). Source rows with a field value that satisfies
40
+ # the regular expressions will be excluded from the diff process.
41
+ # @option options [String] :context A context value from which fields
42
+ # can be populated using a Regexp.
43
+ def initialize(path, options = {})
44
+ super(options)
45
+ @path = path
46
+ @context = options[:context]
47
+ @data = []
48
+ end
49
+
50
+
51
+ # Process a +source+, converting the XML into a table of data, using
52
+ # +rec_xpath+ to identify the nodes that correspond to each record that
53
+ # should appear in the output, and +field_maps+ to populate each field
54
+ # in each row.
55
+ #
56
+ # @param source [String|Array] may be a String containing XML content,
57
+ # an Array of paths to files containing XML content, or a path to
58
+ # a single file.
59
+ # @param rec_xpath [String] An XPath expression that selects all the
60
+ # items in the XML document that are to be converted into new rows.
61
+ # The returned items are not directly used to populate the fields,
62
+ # but provide a context for the field XPath expressions that populate
63
+ # each field's content.
64
+ # @param field_maps [Hash<String, String>] A map of field names to
65
+ # expressions that are evaluated in the context of each row node
66
+ # selected by +rec_xpath+. The field expressions are typically XPath
67
+ # expressions evaluated in the context of the nodes returned by the
68
+ # +rec_xpath+. Alternatively, a String that is not an XPath expression
69
+ # is used as a literal value for a field, while a Regexp can also
70
+ # be used to pull a value from any context specified in the +options+
71
+ # hash. The Regexp should include a single grouping, as the value used
72
+ # will be the result in $1 after the match is performed.
73
+ # @param context [String] An optional context for the XML to be processed.
74
+ # The value passed here can be referenced in field map expressions
75
+ # using a Regexp, with the value of the first grouping in the regex
76
+ # being the value returned for the field.
77
+ def process(source, rec_xpath, field_maps, context = nil)
78
+ @field_names = field_maps.keys unless @field_names
79
+ case source
80
+ when Nokogiri::XML::Document
81
+ add_data(source, rec_xpath, field_maps, context || @context)
82
+ when /<\?xml/
83
+ doc = Nokogiri::XML(source)
84
+ add_data(doc, rec_xpath, field_maps, context || @context)
85
+ when Array
86
+ source.each{ |f| process_file(f, rec_xpath, field_maps) }
87
+ when String
88
+ process_file(source, rec_xpath, field_maps)
89
+ else
90
+ raise ArgumentError, "Unhandled source type #{source.class.name}"
91
+ end
92
+ @data
93
+ end
94
+
95
+
96
+ private
97
+
98
+
99
+ # Load the XML document at +file_path+ and process it into rows of data.
100
+ def process_file(file_path, rec_xpath, field_maps)
101
+ begin
102
+ File.open(file_path) do |f|
103
+ doc = Nokogiri::XML(f)
104
+ add_data(doc, rec_xpath, field_maps, @context || file_path)
105
+ end
106
+ rescue
107
+ STDERR.puts "An error occurred while attempting to open #{file_path}"
108
+ raise
109
+ end
110
+ end
111
+
112
+
113
+ # Locate records in +doc+ using +rec_xpath+ to identify the nodes that
114
+ # correspond to a new record in the data, and +field_maps+ to populate
115
+ # the fields in each row.
116
+ def add_data(doc, rec_xpath, field_maps, context)
117
+ doc.xpath(rec_xpath).each do |rec_node|
118
+ rec = []
119
+ field_maps.each do |field_name, expr|
120
+ case expr
121
+ when Regexp # Match context against Regexp and extract first grouping
122
+ if context
123
+ context =~ expr
124
+ rec << $1
125
+ else
126
+ rec << nil
127
+ end
128
+ when %r{[/(.@]} # XPath expression
129
+ res = rec_node.xpath(expr)
130
+ rec << CGI.unescape_html(res.to_s)
131
+ else # Use expr as the value for this field
132
+ rec << expr
133
+ end
134
+ end
135
+ @data << rec
136
+ end
137
+ end
138
+
139
+ end
140
+
141
+ end
142
+
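The way `add_data` above chooses between a context Regexp, an XPath expression, and a literal value can be sketched without an XML parser. In the sketch below, `resolve_field` and the `xpath_eval` lambda are hypothetical stand-ins (the lambda takes the place of `rec_node.xpath(expr)`), and the sample path and field names are made up for illustration. Only the dispatch order and the `%r{[/(.@]}` test are taken from the diff.

```ruby
# Hypothetical helper; resolve_field is not part of csv-diff.
# xpath_eval stands in for rec_node.xpath(expr) so no XML library is needed.
def resolve_field(expr, context, xpath_eval)
  case expr
  when Regexp
    # Match the context and return the first capture group (nil if no match)
    m = context && expr.match(context)
    m && m[1]
  when %r{[/(.@]}
    # Any String containing /, (, . or @ is treated as an XPath expression
    xpath_eval.call(expr)
  else
    # Anything else is used as a literal field value
    expr
  end
end

fake_xpath = ->(expr) { "<xpath:#{expr}>" }
context = 'exports/2020-10-21/orders.xml'   # illustrative label only
field_maps = {
  'Date'   => %r{exports/([\d-]+)/},
  'Id'     => './@id',
  'Source' => 'ERP',
  'Label'  => 'v1.0'
}
row = field_maps.map { |_name, expr| resolve_field(expr, context, fake_xpath) }
p row   # => ["2020-10-21", "<xpath:./@id>", "ERP", "<xpath:v1.0>"]
```

Note the last field: because the pattern matches a dot, a literal such as `v1.0` would be evaluated as XPath rather than used as-is, so literal values appear to need to avoid those four characters.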
metadata CHANGED
@@ -1,69 +1,80 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: csv-diff
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.1
5
- prerelease:
4
+ version: 0.6.1
6
5
  platform: ruby
7
6
  authors:
8
7
  - Adam Gardiner
9
8
  autorequire:
10
9
  bindir: bin
11
10
  cert_chain: []
12
- date: 2016-01-26 00:00:00.000000000 Z
11
+ date: 2020-10-21 00:00:00.000000000 Z
13
12
  dependencies: []
14
- description: ! " This library performs diffs of CSV files.\n\n Unlike
15
- a standard diff that compares line by line, and is sensitive to the\n ordering
16
- of records, CSV-Diff identifies common lines by key field(s), and\n then
17
- compares the contents of the fields in each line.\n\n Data may be supplied
18
- in the form of CSV files, or as an array of arrays. The\n diff process provides
19
- a fine level of control over what to diff, and can\n optionally ignore certain
20
- types of changes (e.g. changes in position).\n\n CSV-Diff is particularly
21
- well suited to data in parent-child format. Parent-\n child data does not
22
- lend itself well to standard text diffs, as small changes\n in the organisation
23
- of the tree at an upper level can lead to big movements\n in the position
24
- of descendant records. By instead matching records by key,\n CSV-Diff avoids
25
- this issue, while still being able to detect changes in\n sibling order.\n\n
26
- \ This gem implements the core diff algorithm, and handles the loading and\n
27
- \ diffing of CSV files. It returns a CSVDiff object, that contains the details\n
28
- \ of differences in object form. This is useful for projects that need diff\n
29
- \ capability, but want to handle the reporting of differences themselves.
30
- For\n a pre-built diff reporting capability, see the csv-diff-report gem,
31
- which\n provides a command-line tool for generating diff reports in HTML
32
- or Excel\n format.\n"
13
+ description: |2
14
+ This library performs diffs of CSV data, or any table-like source.
15
+
16
+ Unlike a standard diff that compares line by line, and is sensitive to the
17
+ ordering of records, CSV-Diff identifies common lines by key field(s), and
18
+ then compares the contents of the fields in each line.
19
+
20
+ Data may be supplied in the form of CSV files, or as an array of arrays. The
21
+ diff process provides a fine level of control over what to diff, and can
22
+ optionally ignore certain types of changes (e.g. changes in position).
23
+
24
+ CSV-Diff is particularly well suited to data in parent-child format. Parent-
25
+ child data does not lend itself well to standard text diffs, as small changes
26
+ in the organisation of the tree at an upper level can lead to big movements
27
+ in the position of descendant records. By instead matching records by key,
28
+ CSV-Diff avoids this issue, while still being able to detect changes in
29
+ sibling order.
30
+
31
+ This gem implements the core diff algorithm, and handles the loading and
32
+ diffing of CSV files (or Arrays of Arrays). It also supports converting
33
+ data in XML format into tabular form, so that it can then be processed
34
+ like any other CSV or table-like source. It returns a CSVDiff object
35
+ containing the details of differences in object form. This is useful for
36
+ projects that need diff capability, but want to handle the reporting or
37
+ actioning of differences themselves.
38
+
39
+ For a pre-built diff reporting capability, see the csv-diff-report gem,
40
+ which provides a command-line tool for generating diff reports in HTML,
41
+ Excel, or text formats.
33
42
  email: adam.b.gardiner@gmail.com
34
43
  executables: []
35
44
  extensions: []
36
45
  extra_rdoc_files: []
37
46
  files:
38
- - README.md
39
47
  - LICENSE
48
+ - README.md
49
+ - lib/csv-diff.rb
40
50
  - lib/csv-diff/algorithm.rb
41
51
  - lib/csv-diff/csv_diff.rb
42
52
  - lib/csv-diff/csv_source.rb
43
- - lib/csv-diff.rb
53
+ - lib/csv-diff/source.rb
54
+ - lib/csv-diff/xml_source.rb
44
55
  - lib/csv_diff.rb
45
56
  homepage: https://github.com/agardiner/csv-diff
46
- licenses: []
57
+ licenses:
58
+ - MIT
59
+ metadata: {}
47
60
  post_install_message: For command-line tools and diff reports, 'gem install csv-diff-report'
48
61
  rdoc_options: []
49
62
  require_paths:
50
63
  - lib
51
64
  required_ruby_version: !ruby/object:Gem::Requirement
52
- none: false
53
65
  requirements:
54
- - - ! '>='
66
+ - - ">="
55
67
  - !ruby/object:Gem::Version
56
68
  version: '0'
57
69
  required_rubygems_version: !ruby/object:Gem::Requirement
58
- none: false
59
70
  requirements:
60
- - - ! '>='
71
+ - - ">="
61
72
  - !ruby/object:Gem::Version
62
73
  version: '0'
63
74
  requirements: []
64
75
  rubyforge_project:
65
- rubygems_version: 1.8.21
76
+ rubygems_version: 2.5.2.3
66
77
  signing_key:
67
- specification_version: 3
68
- summary: CSV Diff is a library for generating diffs from data in CSV format
78
+ specification_version: 4
79
+ summary: CSV Diff is a library for generating diffs from data in CSV or XML format
69
80
  test_files: []