csv-diff 0.3.0 → 0.6.0

checksums.yaml CHANGED
@@ -1,15 +1,7 @@
1
1
  ---
2
- !binary "U0hBMQ==":
3
- metadata.gz: !binary |-
4
- MzM3MmMzNzU5ZDE5ZTA5MGI4OGQxNTBkMTE0NWM1MzYzNDVkYTFjYQ==
5
- data.tar.gz: !binary |-
6
- MDQ3Yzk3ZDc4ZTZiNTMwMzc1NGMxMTU5ZTBkMzdjMTMyNzE5OTYwMg==
2
+ SHA1:
3
+ metadata.gz: 9dde7ded89bb58f75505ae9237c97b8acd365c42
4
+ data.tar.gz: adcf17af6b67797c9b5fbab80b6c7d421cd73d6e
7
5
  SHA512:
8
- metadata.gz: !binary |-
9
- YTYxMDM1MjUxNDk3ZjE3YWUxZWJjNGFmMzQzMDUyNGJlOGUzMmI2MDVlNjg4
10
- OWJiYmQxNTI2MjBlMmQzNGFkZTk0ZGY1Y2I0ZDBkODljYjc4NDQ1ZDY1ODky
11
- ZDU4MDUwZjVjYWU0MTE5NTExMmExYTM0NTY1ZGMzYzc5NDkxYmE=
12
- data.tar.gz: !binary |-
13
- YzU4YWM2MmZjMjE4MjlhZDgxN2IxNmI1NmU2YjFiZTcwN2ZlNTZlYzE5MzZm
14
- NGJkZDNkODUzNGNlNjA2NGZlZWIyMWFmMjljOTQyNjE3OGU4YWFmY2UyZWE2
15
- YzhlZmI0YTcxMGY3N2Y2ODQ5NTM3ODcyMTUzOTJkODhjNjY0YWM=
6
+ metadata.gz: 1aad2d1174758488d1e984239ae38917211bf1e185d2b37350a402e145bde652b44e437774f2695bb33c2b48531637f6c9ed13fb5fbaa0b610ac9d7810ec16ed
7
+ data.tar.gz: 3aab02344cfa4f111433c616caa445f43d3f4515c65e8ef7175f2287996d8f6bc592e1daca5683bef5d561e21f6f23e954a1115854577cced4551582ec4fa293
data/LICENSE CHANGED
@@ -1,4 +1,4 @@
1
- Copyright (c) 2013, Adam Gardiner
1
+ Copyright (c) 2013-2016, Adam Gardiner
2
2
  All rights reserved.
3
3
 
4
4
  Redistribution and use in source and binary forms, with or without
data/README.md CHANGED
@@ -1,14 +1,19 @@
1
1
  # CSV-Diff
2
2
 
3
- CSV-Diff is a small library for performing diffs of CSV data.
3
+ CSV-Diff is a small library for performing diffs of tabular data, typically
4
+ data loaded from CSV files.
4
5
 
5
6
  Unlike a standard diff that compares line by line, and is sensitive to the
6
7
  ordering of records, CSV-Diff identifies common lines by key field(s), and
7
8
  then compares the contents of the fields in each line.
8
9
 
9
- Data may be supplied in the form of CSV files, or as an array of arrays. The
10
- diff process provides a fine level of control over what to diff, and can
11
- optionally ignore certain types of changes (e.g. changes in position).
10
+ Data may be supplied in the form of CSV files, or as an array of arrays.
11
+ More complex usage also allows you to specify XPath expressions to extract
12
+ tabular data from XML documents for diffing.
13
+
14
+ The diff process provides a fine level of control over what to diff, and can
15
+ optionally ignore certain types of changes (e.g. adds, deletes, changes in
16
+ position etc).
12
17
 
13
18
  CSV-Diff is particularly well suited to data in parent-child format. Parent-
14
19
  child data does not lend itself well to standard text diffs, as small changes
@@ -21,17 +26,19 @@ sibling order.
21
26
  ## Usage
22
27
 
23
28
  CSV-Diff is supplied as a gem, and has no dependencies. To use it, simply:
24
- ```
25
- gem install csv-diff
26
- ```
29
+
30
+ ```
31
+ gem install csv-diff
32
+ ```
27
33
 
28
34
  To compare two CSV files where the field names are in the first row of the file,
29
35
  and the first field contains the unique key for each record, simply use:
30
- ```ruby
31
- require 'csv-diff'
32
36
 
33
- diff = CSVDiff.new(file1, file2)
34
- ```
37
+ ```ruby
38
+ require 'csv-diff'
39
+
40
+ diff = CSVDiff.new(file1, file2)
41
+ ```
35
42
 
36
43
  The returned diff object can be queried for the differences that exist between
37
44
  the two files, e.g.:
@@ -96,7 +103,7 @@ change in order) of all 6 rows.
96
103
 
97
104
  The more correct specification of this file is that column 0 contains a unique parent
98
105
  identifier, and column 1 contains a unique child identifier. CSVDiff can then correctly
99
- deduce that there is in fact only two changes in order - the swap in positions of A and
106
+ deduce that there are in fact only two changes in order - the swap in positions of A and
100
107
  B below Root.
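
As a rough illustration of how a change in sibling order might be detected, the sketch below compares the position of each key among the siblings that are common to both versions of a parent. The helper name `sibling_moves` is hypothetical and is not part of the CSVDiff API; it only mirrors the idea of comparing positions within the common set.

```ruby
# Hypothetical helper (not part of the csv-diff API): given the ordered child
# keys of one parent in the left and right sources, return the keys whose
# position among the *common* siblings has changed.
def sibling_moves(left_siblings, right_siblings)
  # Array#& keeps the order of the receiver, so each list retains its own order
  left_common = left_siblings & right_siblings
  right_common = right_siblings & left_siblings
  right_common.select do |key|
    left_common.index(key) != right_common.index(key)
  end
end

swapped = sibling_moves(%w[A B C], %w[B A C])
# An added sibling does not shift the relative order of the existing ones
with_add = sibling_moves(%w[A B], %w[X A B])
```

Because only the common siblings are compared, adds and deletes on their own are not reported as moves of the remaining rows.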
101
108
 
102
109
  Note: If you aren't interested in changes in the order of siblings, then you could use
@@ -121,43 +128,59 @@ Warnings may be raised for any of the following:
121
128
  The simplest use case is as shown above, where the data to be diffed is in CSV files
122
129
  with the column names as the first record, and where the unique key is the first
123
130
  column in the data. In this case, a diff can be created simply via:
124
- ```ruby
125
- diff = CSVDiff.new(file1, file2)
126
- ```
127
131
 
128
- ### Specifynig Unique Row Identifiers
132
+ ```ruby
133
+ diff = CSVDiff.new(file1, file2)
134
+ ```
135
+
136
+ ### Specifying Unique Row Identifiers
129
137
 
130
138
  Often however, rows are not uniquely identifiable via the first column in the file.
131
139
  In a parent-child hierarchy, for example, combinations of parent and child may be
132
- necessary to uniquely identify a row. In these cases, it is necessary to indicate
133
- which fields are used to uniquely identify common rows across the two files. This
134
- can be done in several different ways.
140
+ necessary to uniquely identify a row, while in other cases a combination of fields
141
+ may be needed to derive a natural unique key or identifier for each row.
142
+ In these cases, it is necessary to indicate to CSVDiff which fields are needed to
143
+ uniquely identify common rows across the two files. This can be done in several
144
+ different ways.
145
+
146
+ #### :key_field(s)
135
147
 
136
- 1. Using the :key_fields option with field numbers (these are 0-based):
148
+ The first method is using the **key_fields** option (or key_field if you have only a
149
+ single key field). Use this option when your data represents a flat structure rather
150
+ than a parent-child hierarchy or flattened tree. You can specify key_fields using
151
+ field numbers/column indices (0-based):
137
152
 
138
153
  ```ruby
139
154
  diff = CSVDiff.new(file1, file2, key_fields: [0, 1])
140
155
  ```
141
156
 
142
- 2. Using the :key_fields options with column names:
157
+ Alternatively, you can use the :key_fields option with column names (provided CSVDiff
158
+ knows the names of your fields, either via the **field_names** option or from headers
159
+ in the file):
143
160
 
144
161
  ```ruby
145
- diff = CSVDiff.new(file1, file2, key_fields: ['Parent', 'Child'])
162
+ diff = CSVDiff.new(file1, file2, key_fields: ['First Name', 'Last Name'])
146
163
  ```
147
164
 
148
- 3. Using the :parent_fields and :child_fields with field numbers:
165
+ #### :parent_field(s)/:child_field(s)
166
+
167
+ The second method for identifying the unique identifiers in your file is to use the
168
+ :parent_fields and :child_fields options. Use this option when your data represents
169
+ a tree structure flattened to a table in parent-child form.
170
+
171
+ Using the :parent_fields and :child_fields with field numbers:
149
172
 
150
173
  ```ruby
151
174
  diff = CSVDiff.new(file1, file2, parent_field: 1, child_fields: [2, 3])
152
175
  ```
153
176
 
154
- 4. Using the :parent_fields and :child_fields with column names:
177
+ Using the :parent_fields and :child_fields with column names:
155
178
 
156
179
  ```ruby
157
180
  diff = CSVDiff.new(file1, file2, parent_field: 'Date', child_fields: ['HomeTeam', 'AwayTeam'])
158
181
  ```
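
To make the role of parent and child fields more concrete, here is a minimal sketch of how rows could be indexed by a composite key, with a separate index from each parent key to its child keys. It assumes key values are joined with `~`, as the library source in this diff does; the helper `index_rows` and the sample data are hypothetical.

```ruby
# Hypothetical helper (not part of the csv-diff API): index rows by the joined
# values of their parent and child fields, and record which keys belong to
# each parent.
def index_rows(rows, parent_fields:, child_fields:)
  key_fields = parent_fields + child_fields
  lines = {}
  index = Hash.new { |h, k| h[k] = [] }
  rows.each do |row|
    key_values = key_fields.map { |i| row[i].to_s }
    key = key_values.join('~')
    parent_key = key_values[0...parent_fields.length].join('~')
    next if lines.key?(key) # skip rows that repeat an existing key
    lines[key] = row
    index[parent_key] << key
  end
  [lines, index]
end

# Made-up sample data: column 0 is the parent, column 1 is the child
rows = [
  ['Root', 'A', 'x'],
  ['Root', 'B', 'y'],
  ['A', 'A1', 'z']
]
lines, index = index_rows(rows, parent_fields: [0], child_fields: [1])
```

With an index of this shape, sibling order can be checked per parent, which is the distinction between using parent/child fields and a flat list of key fields.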
159
182
 
160
- ### Using Non-CSV File Sources
183
+ ### Using Non-CSV Sources
161
184
 
162
185
  Data from non-CSV sources can be diffed, as long as it can be supplied as an Array
163
186
  of Arrays:
@@ -174,7 +197,53 @@ DATA2 = [
174
197
  ['A', 'A2', 'Account2']
175
198
  ]
176
199
 
177
- diff = CSVDiff.new(DATA1, DATA2, key_fields: [1, 0])
200
+ diff = CSVDiff.new(DATA1, DATA2, parent_field: 1, child_field: 0)
201
+ ```
202
+
203
+ Data can also be diffed if it is an XML source, although this requires a little
204
+ more effort to tell CSVDiff how to transform/extract content from the XML document
205
+ into an array-of-arrays form. It also introduces a dependency on Nokogiri - you
206
+ will need to install this gem to use CSVDiff with XML sources.
207
+
208
+ The first step is to use the CSVDiff::XMLSource class to define how to convert
209
+ your XML content to an array-of-arrays. The XMLSource class is quite flexible,
210
+ and can be used to convert single or multiple XML sources into a single data set
211
+ for diffing, and different documents may even have different layouts.
212
+
213
+ Begin by creating an XMLSource object, which requires a label to
214
+ identify the type of data it will generate:
215
+ ```ruby
216
+ xml_source1 = CSVDiff::XMLSource.new('My Label')
217
+ ```
218
+
219
+ Next, we pass XML documents to this source, and specify XPath expressions for each
220
+ row and column of data to produce via the `process(rec_xpath, field_maps, options)`
221
+ method:
222
+
223
+ * An XPath expression is provided to select each node value in the document that
224
+ will represent a row. Taking an HTML table as an example of something we wanted
225
+ to parse, your rec_xpath value might be something like the following:
226
+ `'//table/tbody/tr'`. This would locate all tables in the document, and create
227
+ a new row of data in the XMLSource every time a `<tr>` tag was encountered.
228
+ * A hash of field_maps is then provided to describe how to generate column values
229
+ for each row of data. The keys to field_maps are the names of the fields to be
230
+ output, while the values are the expressions used to generate those values. Most
231
+ commonly, this will be another XPath expression that is evaluated in the context
232
+ of the node returned by the row XPath expression. So continuing our HTML example,
233
+ we might use `'./td[1]/text()'` (XPath positions are 1-based) as an expression to select the content of the
234
+ first `<td>` element within the `<tr>` representing the current row.
235
+
236
+ ```ruby
237
+ xml_source1.process('//table/tbody/tr',
238
+ col_A: './td[1]/text()',
239
+ col_B: './td[2]/text()',
240
+ col_C: './td[3]/text()')
241
+ ```
242
+
243
+ Finally, to diff two XML sources, we create a CSVDiff object with two XMLSource
244
+ objects as the source:
245
+ ```ruby
246
+ diff = CSVDiff.new(xml_source1, xml_source2, key_field: 'col_A')
178
247
  ```
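
The row and field mapping idea can be sketched independently of CSVDiff. The example below is only an illustration under stated assumptions: it uses Ruby's REXML library rather than Nokogiri, and the helper `extract_rows` is hypothetical, not the XMLSource implementation. It selects one node per row and then evaluates each field expression relative to that node, producing an array of arrays with a header row. Note that XPath positions are 1-based, so the first `<td>` is `td[1]`.

```ruby
require 'rexml/document'

# Hypothetical helper (not the XMLSource implementation): convert an XML
# document into an array of arrays using a row XPath and a field map.
def extract_rows(xml, rec_xpath, field_maps)
  doc = REXML::Document.new(xml)
  header = field_maps.keys.map(&:to_s)
  rows = REXML::XPath.match(doc, rec_xpath).map do |node|
    field_maps.values.map do |xpath|
      # Each field expression is evaluated in the context of the row node
      cell = REXML::XPath.first(node, xpath)
      cell && cell.text
    end
  end
  [header] + rows
end

# Made-up sample document
html = '<table><tbody>' \
       '<tr><td>A</td><td>A1</td></tr>' \
       '<tr><td>A</td><td>A2</td></tr>' \
       '</tbody></table>'
data = extract_rows(html, '//table/tbody/tr', col_A: 'td[1]', col_B: 'td[2]')
```

The resulting array of arrays has the same shape as the non-CSV data shown earlier in this section.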
179
248
 
180
249
  ### Specifying Column Names
@@ -211,6 +280,23 @@ diff = CSVDiff.new(file1, file2, parent_field: 'Date', child_fields: ['HomeTeam'
211
280
  ignore_fields: ['CreatedAt', 'UpdatedAt'])
212
281
  ```
213
282
 
283
+ ### Filtering Rows
284
+
285
+ If you need to filter source data before running the diff process, you can use the :include
286
+ and :exclude options to do so. Both options take a Hash as their value; the hash should have
287
+ keys that are the field names or indexes (0-based) on which to filter, and whose values are
288
+ regular expressions or lambdas to be applied to values of the corresponding field. Rows will
289
+ only be diffed if they satisfy :include conditions, and do not satisfy :exclude conditions.
290
+ ```ruby
291
+ # Generate a diff of Arsenal home games not refereed by Clattenburg
292
+ diff = CSVDiff.new(file1, file2, parent_field: 'Date', child_fields: ['HomeTeam', 'AwayTeam'],
293
+ include: {HomeTeam: 'Arsenal'}, exclude: {Referee: /Clattenburg/})
294
+
295
+ # Generate a diff of games played over the Xmas/New Year period
296
+ diff = CSVDiff.new(file1, file2, parent_field: 'Date', child_fields: ['HomeTeam', 'AwayTeam'],
297
+ include: {Date: lambda{ |d| holiday_period.include?(Date.strptime(d, '%y/%m/%d')) } })
298
+ ```
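
The filter semantics described above can be sketched as follows. This is an illustration only, not the library's implementation: the helpers `matches?` and `keep_row?` are hypothetical, rows are assumed to be hashes keyed by field name, and treating a plain string condition as an exact match is an assumption.

```ruby
# Hypothetical helpers (not part of the csv-diff API) showing how include and
# exclude conditions might be applied to a single row.
def matches?(condition, value)
  case condition
  when Regexp then !(condition =~ value.to_s).nil?
  when Proc then condition.call(value) ? true : false
  else condition.to_s == value.to_s # assumption: plain values match exactly
  end
end

def keep_row?(row, includes: {}, excludes: {})
  # A row is kept only if it satisfies every include condition...
  includes.all? { |field, cond| matches?(cond, row[field.to_s]) } &&
    # ...and satisfies none of the exclude conditions
    excludes.none? { |field, cond| matches?(cond, row[field.to_s]) }
end

# Made-up sample rows
rows = [
  { 'HomeTeam' => 'Arsenal', 'Referee' => 'Clattenburg' },
  { 'HomeTeam' => 'Arsenal', 'Referee' => 'Other' },
  { 'HomeTeam' => 'Chelsea', 'Referee' => 'Other' }
]
kept = rows.select do |row|
  keep_row?(row, includes: { HomeTeam: 'Arsenal' },
                 excludes: { Referee: /Clattenburg/ })
end
long_names = rows.select do |row|
  keep_row?(row, includes: { HomeTeam: lambda { |t| t.length > 6 } })
end
```

Filtering before indexing means excluded rows never appear as adds or deletes in the resulting diff.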
299
+
214
300
  ### Ignoring Certain Changes
215
301
 
216
302
  CSVDiff identifies Adds, Updates, Moves and Deletes; any of these changes can be selectively
@@ -1,3 +1,4 @@
1
+ require 'csv-diff/source'
1
2
  require 'csv-diff/csv_source'
2
3
  require 'csv-diff/algorithm'
3
4
  require 'csv-diff/csv_diff'
@@ -3,6 +3,55 @@ class CSVDiff
3
3
  # Implements the CSV diff algorithm.
4
4
  module Algorithm
5
5
 
6
+ # Holds the details of a single difference
7
+ class Diff
8
+
9
+ attr_accessor :diff_type
10
+ attr_reader :fields
11
+ attr_reader :row
12
+ attr_reader :sibling_position
13
+
14
+ def initialize(diff_type, fields, row_idx, pos_idx)
15
+ @diff_type = diff_type
16
+ @fields = fields
17
+ @row = row_idx + 1
18
+ self.sibling_position = pos_idx
19
+ end
20
+
21
+
22
+ def sibling_position=(pos_idx)
23
+ if pos_idx.is_a?(Array)
24
+ pos_idx.compact!
25
+ if pos_idx.first != pos_idx.last
26
+ @sibling_position = pos_idx.map{ |pos| pos + 1 }
27
+ else
28
+ @sibling_position = pos_idx.first + 1
29
+ end
30
+ else
31
+ @sibling_position = pos_idx + 1
32
+ end
33
+ end
34
+
35
+
36
+ # For backwards compatibility and access to fields with differences
37
+ def [](key)
38
+ case key
39
+ when :action
40
+ a = diff_type.to_s
41
+ a[0] = a[0].upcase
42
+ a
43
+ when :row
44
+ @row
45
+ when :sibling_position
46
+ @sibling_position
47
+ else
48
+ @fields[key]
49
+ end
50
+ end
51
+
52
+ end
53
+
54
+
6
55
  # Diffs two CSVSource structures.
7
56
  #
8
57
  # @param left [CSVSource] A CSVSource object containing the contents of
@@ -22,28 +71,61 @@ class CSVDiff
22
71
  # items that exist in both +left+ and +right+.
23
72
  # @option options [Boolean] :ignore_deletes If set to true, we ignore any
24
73
  # new items that appear only in +left+.
74
+ # @option options [Hash<Object,Proc>] :equality_procs A Hash mapping fields
75
+ # to a 2-arg Proc that should be used to compare values in that field for
76
+ # equality.
25
77
  def diff_sources(left, right, key_fields, diff_fields, options = {})
26
78
  unless left.case_sensitive? == right.case_sensitive?
27
79
  raise ArgumentError, "Left and right must have same settings for case-sensitivity"
28
80
  end
29
- case_sensitive = left.case_sensitive?
81
+ unless left.parent_fields.length == right.parent_fields.length
82
+ raise ArgumentError, "Left and right must have same settings for parent/child fields"
83
+ end
84
+
85
+ # Ensure key fields are not also in the diff_fields
86
+ diff_fields = diff_fields - key_fields
87
+
30
88
  left_index = left.index
31
89
  left_values = left.lines
32
90
  left_keys = left_values.keys
33
91
  right_index = right.index
34
92
  right_values = right.lines
35
93
  right_keys = right_values.keys
36
- parent_fields = left.parent_fields.length
94
+ parent_field_count = left.parent_fields.length
37
95
 
38
96
  include_adds = !options[:ignore_adds]
39
97
  include_moves = !options[:ignore_moves]
40
98
  include_updates = !options[:ignore_updates]
41
99
  include_deletes = !options[:ignore_deletes]
42
100
 
43
- diffs = Hash.new{ |h, k| h[k] = {} }
101
+ @case_sensitive = left.case_sensitive?
102
+ @equality_procs = options.fetch(:equality_procs, {})
103
+
104
+ diffs = {}
105
+ potential_moves = Hash.new{ |h, k| h[k] = [] }
106
+
107
+ # First identify deletions
108
+ if include_deletes
109
+ (left_keys - right_keys).each do |key|
110
+ # Delete
111
+ key_vals = key.split('~', -1)
112
+ parent = key_vals[0...parent_field_count].join('~')
113
+ child = key_vals[parent_field_count..-1].join('~')
114
+ left_parent = left_index[parent]
115
+ left_value = left_values[key]
116
+ row_idx = left_keys.index(key)
117
+ sib_idx = left_parent.index(key)
118
+ raise "Can't locate key #{key} in parent #{parent}" unless sib_idx
119
+ diffs[key] = Diff.new(:delete, left_value, row_idx, sib_idx)
120
+ potential_moves[child] << key
121
+ #puts "Delete: #{key}"
122
+ end
123
+ end
124
+
125
+ # Now identify adds/updates
44
126
  right_keys.each_with_index do |key, right_row_id|
45
- key_vals = key.split('~')
46
- parent = key_vals[0...parent_fields].join('~')
127
+ key_vals = key.split('~', -1)
128
+ parent = key_vals[0...parent_field_count].join('~')
47
129
  left_parent = left_index[parent]
48
130
  right_parent = right_index[parent]
49
131
  left_value = left_values[key]
@@ -51,13 +133,12 @@ class CSVDiff
51
133
  left_idx = left_parent && left_parent.index(key)
52
134
  right_idx = right_parent && right_parent.index(key)
53
135
 
54
- id = {}
55
- id[:row] = right_row_id + 1
56
- id[:sibling_position] = right_idx + 1
57
- key_fields.each do |field_name|
58
- id[field_name] = right_value[field_name]
59
- end
60
136
  if left_idx && right_idx
137
+ if include_updates && (changes = diff_row(left_value, right_value, diff_fields))
138
+ id = id_fields(key_fields, right_value)
139
+ diffs[key] = Diff.new(:update, id.merge!(changes), right_row_id, right_idx)
140
+ #puts "Change: #{key}"
141
+ end
61
142
  if include_moves
62
143
  left_common = left_parent & right_parent
63
144
  right_common = right_parent & left_parent
@@ -65,42 +146,34 @@ class CSVDiff
65
146
  right_pos = right_common.index(key)
66
147
  if left_pos != right_pos
67
148
  # Move
68
- diffs[key].merge!(id.merge!(:action => 'Move',
69
- :sibling_position => [left_idx + 1, right_idx + 1]))
149
+ if d = diffs[key]
150
+ d.sibling_position = [left_idx, right_idx]
151
+ else
152
+ id = id_fields(key_fields, right_value)
153
+ diffs[key] = Diff.new(:move, id, right_row_id, [left_idx, right_idx])
154
+ end
70
155
  #puts "Move #{left_idx} -> #{right_idx}: #{key}"
71
156
  end
72
157
  end
73
- if include_updates && (changes = diff_row(left_value, right_value, diff_fields, case_sensitive))
74
- diffs[key].merge!(id.merge(changes.merge(:action => 'Update')))
75
- #puts "Change: #{key}"
76
- end
77
- elsif include_adds && right_idx
158
+ elsif right_idx
78
159
  # Add
79
- diffs[key].merge!(id.merge(right_values[key].merge(:action => 'Add')))
80
- #puts "Add: #{key}"
81
- end
82
- end
83
-
84
- # Now identify deletions
85
- if include_deletes
86
- (left_keys - right_keys).each do |key|
87
- # Delete
88
- key_vals = key.split('~')
89
- parent = key_vals[0...parent_fields].join('~')
90
- left_parent = left_index[parent]
91
- left_value = left_values[key]
92
- left_idx = left_parent.index(key)
93
- next unless left_idx
94
- id = {}
95
- id[:row] = left_keys.index(key) + 1
96
- id[:sibling_position] = left_idx + 1
97
- key_fields.each do |field_name|
98
- id[field_name] = left_value[field_name]
160
+ child = key_vals[parent_field_count..-1].join('~')
161
+ if potential_moves.has_key?(child) && old_key = potential_moves[child].pop
162
+ diffs.delete(old_key)
163
+ if include_updates
164
+ left_value = left_values[old_key]
165
+ id = id_fields(right.child_fields, right_value)
166
+ changes = diff_row(left_value, right_value, left.parent_fields + diff_fields)
167
+ diffs[key] = Diff.new(:update, id.merge!(changes), right_row_id, right_idx)
168
+ #puts "Update Parent: #{key}"
169
+ end
170
+ elsif include_adds
171
+ diffs[key] = Diff.new(:add, right_value, right_row_id, right_idx)
172
+ #puts "Add: #{key}"
99
173
  end
100
- diffs[key].merge!(id.merge(left_values[key].merge(:action => 'Delete')))
101
- #puts "Delete: #{key}"
102
174
  end
103
175
  end
176
+
104
177
  diffs
105
178
  end
106
179
 
@@ -113,27 +186,41 @@ class CSVDiff
113
186
  # @param right_row [Hash] The version of the CSV row from the right/to
114
187
  # file.
115
188
  # @param fields [Array<String>] An array of field names to compare.
116
- # @param case_sensitive [Boolean] Whether field comparisons should be
117
- # case sensitive or not.
118
189
  # @return [Hash<String, Array>] A Hash whose keys are the fields that
119
190
  # contain differences, and whose values are a two-element array of
120
191
  # [left/from, right/to] values.
121
- def diff_row(left_row, right_row, fields, case_sensitive)
192
+ def diff_row(left_row, right_row, fields)
122
193
  diffs = {}
123
194
  fields.each do |attr|
195
+ eq_proc = @equality_procs[attr]
124
196
  right_val = right_row[attr]
125
197
  right_val = nil if right_val == ""
126
198
  left_val = left_row[attr]
127
199
  left_val = nil if left_val == ""
128
- if (case_sensitive && left_val != right_val) ||
129
- (left_val.to_s.upcase != right_val.to_s.upcase)
200
+ if eq_proc
201
+ diffs[attr] = [left_val, right_val] unless eq_proc.call(left_val, right_val)
202
+ elsif @case_sensitive
203
+ diffs[attr] = [left_val, right_val] unless left_val == right_val
204
+ elsif (left_val.to_s.upcase != right_val.to_s.upcase)
130
205
  diffs[attr] = [left_val, right_val]
131
- #puts "#{attr}: #{left_val} -> #{right_val}"
132
206
  end
133
207
  end
134
208
  diffs if diffs.size > 0
135
209
  end
136
210
 
211
+
212
+ private
213
+
214
+
215
+ # Return a hash containing just the key field values
216
+ def id_fields(key_fields, fields)
217
+ id = {}
218
+ key_fields.each do |field_name|
219
+ id[field_name] = fields[field_name]
220
+ end
221
+ id
222
+ end
223
+
137
224
  end
138
225
 
139
226
  end
@@ -81,13 +81,15 @@ class CSVDiff
81
81
  # @option options [Boolean] :ignore_deletes If true, records that appear
82
82
  # in the left/from file but not in the right/to file are not reported.
83
83
  def initialize(left, right, options = {})
84
- @left = left.is_a?(CSVSource) ? left : CSVSource.new(left, options)
84
+ @left = left.is_a?(Source) ? left : CSVSource.new(left, options)
85
+ @left.index_source if @left.lines.nil?
85
86
  raise "No field names found in left (from) source" unless @left.field_names && @left.field_names.size > 0
86
- @right = right.is_a?(CSVSource) ? right : CSVSource.new(right, options)
87
+ @right = right.is_a?(Source) ? right : CSVSource.new(right, options)
88
+ @right.index_source if @right.lines.nil?
87
89
  raise "No field names found in right (to) source" unless @right.field_names && @right.field_names.size > 0
88
90
  @warnings = []
89
91
  @diff_fields = get_diff_fields(@left.field_names, @right.field_names, options)
90
- @key_fields = @left.key_fields.map{ |kf| @diff_fields[kf] }
92
+ @key_fields = @left.key_fields
91
93
  diff(options)
92
94
  end
93
95
 
@@ -95,8 +97,8 @@ class CSVDiff
95
97
  # Performs a diff with the specified +options+.
96
98
  def diff(options = {})
97
99
  @summary = nil
98
- @diffs = diff_sources(@left, @right, @key_fields, @diff_fields, options)
99
100
  @options = options
101
+ @diffs = diff_sources(@left, @right, @key_fields, @diff_fields, options)
100
102
  end
101
103
 
102
104
 
@@ -138,20 +140,20 @@ class CSVDiff
138
140
  # Given two sets of field names, determines the common set of fields present
139
141
  # in both, on which members can be diffed.
140
142
  def get_diff_fields(left_fields, right_fields, options)
141
- ignore_fields = (options[:ignore_fields] || []).map do |f|
142
- f.is_a?(Fixnum) ? right_fields[f] : f
143
+ ignore_fields = options.fetch(:ignore_fields, [])
144
+ ignore_fields = [ignore_fields] unless ignore_fields.is_a?(Array)
145
+ ignore_fields.map! do |f|
146
+ (f.is_a?(Numeric) ? right_fields[f] : f).upcase
143
147
  end
144
148
  diff_fields = []
145
149
  if options[:diff_common_fields_only]
146
150
  right_fields.each_with_index do |fld, i|
147
151
  if left_fields.include?(fld)
148
- diff_fields << fld unless ignore_fields.include?(fld)
149
- else
150
- @warnings << "Field '#{fld}' is missing from the left (from) file, and won't be diffed"
152
+ diff_fields << fld unless ignore_fields.include?(fld.upcase)
151
153
  end
152
154
  end
153
155
  else
154
- diff_fields = (right_fields + left_fields).uniq.reject{ |fld| ignore_fields.include?(fld) }
156
+ diff_fields = (right_fields + left_fields).uniq.reject{ |fld| ignore_fields.include?(fld.upcase) }
155
157
  end
156
158
  diff_fields
157
159
  end
@@ -2,36 +2,7 @@ class CSVDiff
2
2
 
3
3
  # Represents a CSV input (i.e. the left/from or right/to input) to the diff
4
4
  # process.
5
- class CSVSource
6
-
7
- # @return [String] the path to the source file
8
- attr_accessor :path
9
- # @return [Array<String>] The names of the fields in the source file
10
- attr_reader :field_names
11
- # @return [Array<String>] The names of the field(s) that uniquely
12
- # identify each row.
13
- attr_reader :key_fields
14
- # @return [Array<String>] The names of the field(s) that identify a
15
- # common parent of child records.
16
- attr_reader :parent_fields
17
- # @return [Array<String>] The names of the field(s) that distinguish a
18
- # child of a parent record.
19
- attr_reader :child_fields
20
- # @return [Boolean] True if the source has been indexed with case-
21
- # sensitive keys, or false if it has been indexed using upper-case key
22
- # values.
23
- attr_reader :case_sensitive
24
- alias_method :case_sensitive?, :case_sensitive
25
- # @return [Hash<String,Hash>] A hash containing each line of the source,
26
- # keyed on the values of the +key_fields+.
27
- attr_reader :lines
28
- # @return [Hash<String,Array<String>>] A hash containing each parent key,
29
- # and an Array of the child keys it is a parent of.
30
- attr_reader :index
31
- # @return [Array<String>] An array of any warnings encountered while
32
- # processing the source.
33
- attr_reader :warnings
34
-
5
+ class CSVSource < Source
35
6
 
36
7
  # Creates a new diff source.
37
8
  #
@@ -69,90 +40,32 @@ class CSVDiff
69
40
  # @option options [String] :child_field The name of the field(s) that
70
41
  # uniquely identify a child of a parent.
71
42
  # @option options [Boolean] :case_sensitive If true (the default), keys
72
- # are indexed as-is; if false, the index is built in upper-case for
73
- # case-insensitive comparisons.
43
+ # are indexed as-is; if false, the index is built in upper-case for
44
+ # case-insensitive comparisons.
45
+ # @option options [Hash] :include A hash of field name(s) or index(es) to
46
+ # regular expression(s). Only source rows whose field values satisfy the
47
+ # regular expressions will be indexed and included in the diff process.
48
+ # @option options [Hash] :exclude A hash of field name(s) or index(es) to
49
+ # regular expression(s). Source rows with a field value that satisfies
50
+ # the regular expressions will be excluded from the diff process.
74
51
  def initialize(source, options = {})
52
+ super(options)
75
53
  if source.is_a?(String)
76
54
  require 'csv'
77
55
  mode_string = options[:encoding] ? "r:#{options[:encoding]}" : 'r'
78
56
  csv_options = options.fetch(:csv_options, {})
79
57
  @path = source
80
- source = CSV.open(@path, mode_string, csv_options).readlines
81
- end
82
- if kf = options.fetch(:key_field, options[:key_fields])
83
- @key_fields = [kf].flatten
84
- @parent_fields = @key_fields[0...-1]
85
- @child_fields = @key_fields[-1..-1]
86
- else
87
- @parent_fields = [options.fetch(:parent_field, options[:parent_fields]) || []].flatten
88
- @child_fields = [options.fetch(:child_field, options[:child_fields]) || [0]].flatten
89
- @key_fields = @parent_fields + @child_fields
90
- end
91
- @field_names = options[:field_names]
92
- @warnings = []
93
- index_source(source, options)
94
- end
95
-
96
-
97
- # Returns the row in the CSV source corresponding to the supplied key.
98
- #
99
- # @param key [String] The unique key to use to lookup the row.
100
- # @return [Hash] The fields for the line corresponding to +key+, or nil
101
- # if the key is not recognised.
102
- def [](key)
103
- @lines[key]
104
- end
105
-
106
-
107
- private
108
-
109
- # Given an array of lines, where each line is an array of fields, indexes
110
- # the array contents so that it can be looked up by key.
111
- def index_source(lines, options)
112
- @lines = {}
113
- @index = Hash.new{ |h, k| h[k] = [] }
114
- @key_fields = find_field_indexes(@key_fields, @field_names) if @field_names
115
- @case_sensitive = options.fetch(:case_sensitive, true)
116
- line_num = 0
117
- lines.each do |row|
118
- line_num += 1
119
- next if line_num == 1 && @field_names && options[:ignore_header]
120
- unless @field_names
121
- @field_names = row
122
- @key_fields = find_field_indexes(@key_fields, @field_names)
123
- next
124
- end
125
- field_vals = row
126
- line = {}
127
- @field_names.each_with_index do |field, i|
128
- line[field] = field_vals[i]
129
- end
130
- key_values = @key_fields.map{ |kf| field_vals[kf].to_s.upcase }
131
- key = key_values.join('~')
132
- parent_key = key_values[0...(@parent_fields.length)].join('~')
133
- parent_key.upcase! unless @case_sensitive
134
- if @lines[key]
135
- @warnings << "Duplicate key '#{key}' encountered and ignored at line #{line_num}"
136
- else
137
- @index[parent_key] << key
138
- @lines[key] = line
139
- end
140
- end
141
- end
142
-
143
-
144
- # Converts an array of field names to an array of indexes of the fields
145
- # matching those names.
146
- def find_field_indexes(key_fields, field_names)
147
- key_fields.map do |field|
148
- if field.is_a?(Fixnum)
149
- field
150
- else
151
- field_names.index{ |field_name| field.to_s.downcase == field_name.downcase } or
152
- raise ArgumentError, "Could not locate field '#{field}' in source field names: #{
153
- field_names.join(', ')}"
58
+ # When you call CSV.open, it's best to pass in a block so that after it's yielded,
59
+ # the underlying file handle is closed. Otherwise, you risk leaking the handle.
60
+ @data = CSV.open(@path, mode_string, csv_options) do |csv|
61
+ csv.readlines
154
62
  end
63
+ elsif source.is_a?(Enumerable) && (source.size == 0 || source.first.is_a?(Enumerable))
64
+ @data = source
65
+ else
66
+ raise ArgumentError, "source must be a path to a file or an Enumerable<Enumerable>"
155
67
  end
68
+ index_source
156
69
  end
157
70
 
158
71
  end
@@ -0,0 +1,278 @@
1
+ class CSVDiff
2
+
3
+ # Represents an input (i.e. the left/from or right/to input) to the diff
4
+ # process.
5
+ class Source
6
+
7
+ # @return [String] the path to the source file
8
+ attr_accessor :path
9
+ # @return [Array<Array>] The data for this source
10
+ attr_reader :data
11
+
12
+ # @return [Array<String>] The names of the fields in the source file
13
+ attr_reader :field_names
14
+ # @return [Array<String>] The names of the field(s) that uniquely
15
+ # identify each row.
16
+ attr_reader :key_fields
17
+ # @return [Array<String>] The names of the field(s) that identify a
18
+ # common parent of child records.
19
+ attr_reader :parent_fields
20
+ # @return [Array<String>] The names of the field(s) that distinguish a
21
+ # child of a parent record.
22
+ attr_reader :child_fields
23
+
24
+ # @return [Array<Fixnum>] The indexes of the key fields in the source
25
+ # file.
26
+ attr_reader :key_field_indexes
27
+ # @return [Array<Fixnum>] The indexes of the parent fields in the source
28
+ # file.
29
+ attr_reader :parent_field_indexes
30
+ # @return [Array<Fixnum>] The indexes of the child fields in the source
31
+ # file.
32
+ attr_reader :child_field_indexes
33
+
34
+ # @return [Boolean] True if the source has been indexed with case-
35
+ # sensitive keys, or false if it has been indexed using upper-case key
36
+ # values.
37
+ attr_reader :case_sensitive
38
+ alias_method :case_sensitive?, :case_sensitive
39
+ # @return [Boolean] True if leading/trailing whitespace should be stripped
40
+ # from fields
41
+ attr_reader :trim_whitespace
42
+ # @return [Hash<String,Hash>] A hash containing each line of the source,
43
+ # keyed on the values of the +key_fields+.
44
+ attr_reader :lines
45
+ # @return [Hash<String,Array<String>>] A hash containing each parent key,
46
+ # and an Array of the child keys it is a parent of.
47
+ attr_reader :index
48
+ # @return [Array<String>] An array of any warnings encountered while
49
+ # processing the source.
50
+ attr_reader :warnings
51
+ # @return [Fixnum] A count of the lines processed from this source.
52
+ # Excludes any header and duplicate records identified during indexing.
53
+ attr_reader :line_count
54
+ # @return [Fixnum] A count of the lines from this source that were skipped
55
+ # due to filter conditions.
56
+ attr_reader :skip_count
57
+ # @return [Fixnum] A count of the lines from this source that had the same
58
+ # key value as another line.
59
+ attr_reader :dup_count
60
+
61
+
62
+ # Creates a new diff source.
63
+ #
64
+ # A diff source must contain at least one field that will be used as the
65
+ # key to identify the same record in a different version of this file.
66
+ # If not specified via one of the options, the first field is assumed to
67
+ # be the unique key.
68
+ #
69
+ # If multiple fields combine to form a unique key, the combined fields
70
+ # are considered as a single unique identifier. If your key represents
71
+ # data that can be represented as a tree, you can instead break your key
72
+ # fields into :parent_fields and :child_fields. By doing this, if a child
73
+ # key is deleted from one parent, and added to another, that will be
74
+ # reported as an update, with a change to the parent key part(s) of the
75
+ # record.
76
+ #
77
+ # All key options can be specified either by field name, or by field
78
+ # index (0 based).
79
+ #
80
+ # @param options [Hash] An options hash.
81
+ # @option options [Array<String>] :field_names The names of each of the
82
+ # fields in +source+.
83
+ # @option options [Boolean] :ignore_header If true, and :field_names has
84
+ # been specified, then the first row of the file is ignored.
85
+ # @option options [String] :key_field The name of the field that uniquely
86
+ # identifies each row.
87
+ # @option options [Array<String>] :key_fields The names of the fields
88
+ # that uniquely identify each row.
89
+ # @option options [String] :parent_field The name of the field(s) that
90
+ # identify a parent within which sibling order should be checked.
91
+ # @option options [String] :child_field The name of the field(s) that
92
+ # uniquely identify a child of a parent.
93
+ # @option options [Boolean] :case_sensitive If true (the default), keys
94
+ # are indexed as-is; if false, the index is built in upper-case for
95
+ # case-insensitive comparisons.
96
+ # @option options [Hash] :include A hash of field name(s) or index(es) to
97
+ # regular expression(s). Only source rows whose field values satisfy the
98
+ # regular expressions will be indexed and included in the diff process.
99
+ # @option options [Hash] :exclude A hash of field name(s) or index(es) to
100
+ # regular expression(s). Source rows with a field value that satisfies
101
+ # the regular expressions will be excluded from the diff process.
102
+ def initialize(options = {})
103
+ if (options.keys & [:parent_field, :parent_fields, :child_field, :child_fields]).empty? &&
104
+ (kf = options.fetch(:key_field, options[:key_fields]))
105
+ @key_fields = [kf].flatten
106
+ @parent_fields = []
107
+ @child_fields = @key_fields
108
+ else
109
+ @parent_fields = [options.fetch(:parent_field, options[:parent_fields]) || []].flatten
110
+ @child_fields = [options.fetch(:child_field, options[:child_fields]) || [0]].flatten
111
+ @key_fields = @parent_fields + @child_fields
112
+ end
113
+ @field_names = options[:field_names]
114
+ @case_sensitive = options.fetch(:case_sensitive, true)
115
+ @trim_whitespace = options.fetch(:trim_whitespace, false)
116
+ @ignore_header = options[:ignore_header]
117
+ @include = options[:include]
118
+ @exclude = options[:exclude]
119
+ @path = options.fetch(:path, 'NA') unless @path
120
+ @warnings = []
121
+ end
122
+
123
+
124
+ def path?
125
+ @path != 'NA'
126
+ end
127
+
128
+
129
+ # Returns the row in the CSV source corresponding to the supplied key.
130
+ #
131
+ # @param key [String] The unique key to use to lookup the row.
132
+ # @return [Hash] The fields for the line corresponding to +key+, or nil
133
+ # if the key is not recognised.
134
+ def [](key)
135
+ @lines[key]
136
+ end
137
+
138
+
139
+ # Given an array of lines, where each line is an array of fields, indexes
140
+ # the array contents so that it can be looked up by key.
141
+ def index_source
142
+ @lines = {}
143
+ @index = Hash.new{ |h, k| h[k] = [] }
144
+ if @field_names
145
+ index_fields
146
+ include_filter = convert_filter(@include, @field_names)
147
+ exclude_filter = convert_filter(@exclude, @field_names)
148
+ end
149
+ @line_count = 0
150
+ @skip_count = 0
151
+ @dup_count = 0
152
+ line_num = 0
153
+ @data.each do |row|
154
+ line_num += 1
155
+ next if line_num == 1 && @field_names && @ignore_header
156
+ unless @field_names
157
+ if row.class.name == 'CSV::Row'
158
+ @field_names = row.headers.each_with_index.map{ |f, i| f || i.to_s }
159
+ else
160
+ @field_names = row.each_with_index.map{ |f, i| f || i.to_s }
161
+ end
162
+ index_fields
163
+ include_filter = convert_filter(@include, @field_names)
164
+ exclude_filter = convert_filter(@exclude, @field_names)
165
+ next
166
+ end
167
+ field_vals = row
168
+ line = {}
169
+ filter = false
170
+ @field_names.each_with_index do |field, i|
171
+ val = field_vals[i]
172
+ val = val.to_s.strip if val && @trim_whitespace
173
+ line[field] = val
174
+ if include_filter && f = include_filter[i]
175
+ filter = !check_filter(f, line[field])
176
+ end
177
+ if exclude_filter && f = exclude_filter[i]
178
+ filter = check_filter(f, line[field])
179
+ end
180
+ break if filter
181
+ end
182
+ if filter
183
+ @skip_count += 1
184
+ next
185
+ end
186
+ key_values = @key_field_indexes.map{ |kf| @case_sensitive ?
187
+ field_vals[kf].to_s :
188
+ field_vals[kf].to_s.upcase }
189
+ key = key_values.join('~')
190
+ parent_key = key_values[0...(@parent_fields.length)].join('~')
191
+ if @lines[key]
192
+ @warnings << "Duplicate key '#{key}' encountered at line #{line_num}"
193
+ @dup_count += 1
194
+ key += "[#{@dup_count}]"
195
+ end
196
+ @index[parent_key] << key
197
+ @lines[key] = line
198
+ @line_count += 1
199
+ end
200
+ end
201
+
202
+
203
+ # Save the data in this Source as a CSV at +file_path+.
204
+ #
205
+ # @param file_path [String] The target path to save the data to.
206
+ # @param options [Hash] A set of options to pass to CSV.open to control
207
+ # how the CSV is generated.
208
+ def save_csv(file_path, options = {})
209
+ require 'csv'
210
+ default_opts = {
211
+ headers: @field_names, write_headers: true
212
+ }
213
+ CSV.open(file_path, 'wb', default_opts.merge(options)) do |csv|
214
+ @data.each{ |rec| csv << rec }
215
+ end
216
+ end
217
+
218
+
219
+ private
220
+
221
+
222
+ def index_fields
223
+ @key_field_indexes = find_field_indexes(@key_fields, @field_names)
224
+ @parent_field_indexes = find_field_indexes(@parent_fields, @field_names)
225
+ @child_field_indexes = find_field_indexes(@child_fields, @field_names)
226
+ @key_fields = @key_field_indexes.map{ |i| @field_names[i] }
227
+ @parent_fields = @parent_field_indexes.map{ |i| @field_names[i] }
228
+ @child_fields = @child_field_indexes.map{ |i| @field_names[i] }
229
+ end
230
+
231
+
232
+ # Converts an array of field names to an array of indexes of the fields
233
+ # matching those names.
234
+ def find_field_indexes(key_fields, field_names)
235
+ key_fields.map do |field|
236
+ if field.is_a?(Integer)
237
+ field
238
+ else
239
+ field_names.index{ |field_name| field.to_s.downcase == field_name.to_s.downcase } or
240
+ raise ArgumentError, "Could not locate field '#{field}' in source field names: #{
241
+ field_names.join(', ')}"
242
+ end
243
+ end
244
+ end
245
+
246
+
247
+ def convert_filter(hsh, field_names)
248
+ return unless hsh
249
+ if !hsh.is_a?(Hash)
250
+ raise ArgumentError, ":include/:exclude option must be a Hash of field name(s)/index(es) to RegExp(s)"
251
+ end
252
+ keys = hsh.keys
253
+ idxs = find_field_indexes(keys, field_names)
254
+ Hash[keys.each_with_index.map{ |k, i| [idxs[i], hsh[k]] }]
255
+ end
256
+
257
+
258
+ def check_filter(filter, field_val)
259
+ case filter
260
+ when String
261
+ if @case_sensitive
262
+ filter == field_val
263
+ else
264
+ filter.downcase == field_val.to_s.downcase
265
+ end
266
+ when Regexp
267
+ filter.match(field_val)
268
+ when Proc
269
+ filter.call(field_val)
270
+ else
271
+ raise ArgumentError, "Unsupported filter expression: #{filter.inspect}"
272
+ end
273
+ end
274
+
275
+ end
276
+
277
+ end
278
+
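The key handling in index_source above can be illustrated in isolation. The following is a minimal sketch, not the gem's own code: build_index is a hypothetical helper, it takes plain arrays of field values, and it numbers lines from the first data row rather than from the header. It mirrors the steps shown in the diff: key values are joined with '~', upper-cased when case sensitivity is off, and a repeated key is recorded as a warning and stored under a suffixed key.

```ruby
# Hypothetical helper that mirrors the key-building steps of index_source.
# key_idxs lists the key field indexes (parent fields first); parent_count
# says how many of those leading indexes form the parent key.
def build_index(rows, key_idxs, parent_count, case_sensitive: true)
  lines = {}
  index = Hash.new { |h, k| h[k] = [] }
  warnings = []
  dup_count = 0
  rows.each_with_index do |row, i|
    key_values = key_idxs.map do |kf|
      val = row[kf].to_s
      case_sensitive ? val : val.upcase
    end
    key = key_values.join('~')
    parent_key = key_values[0...parent_count].join('~')
    if lines[key]
      # Same key seen before: warn, then store under a suffixed key
      warnings << "Duplicate key '#{key}' encountered at line #{i + 1}"
      dup_count += 1
      key += "[#{dup_count}]"
    end
    index[parent_key] << key
    lines[key] = row
  end
  { lines: lines, index: index, warnings: warnings }
end

rows = [
  ['Sales', 'alice', 'Manager'],
  ['Sales', 'bob', 'Analyst'],
  ['Sales', 'Bob', 'Contractor']
]
result = build_index(rows, [0, 1], 1, case_sensitive: false)
p result[:index]['SALES']
p result[:warnings]
```

With case sensitivity off, 'bob' and 'Bob' collapse to the same key, so the third row is kept under 'SALES~BOB[1]' and a warning is recorded for it.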
@@ -0,0 +1,142 @@
1
+ require 'nokogiri'
2
+ require 'cgi'
3
+
4
+
5
+ class CSVDiff
6
+
7
+ # Convert XML content to CSV format using XPath selectors to identify the
8
+ # rows and field values in an XML document
9
+ class XMLSource < Source
10
+
11
+ attr_accessor :context
12
+
13
+ # Create a new XMLSource, identified by +path+. Normally this is a path
14
+ # to the XML document, but any value is fine, as it is really just a label
15
+ # to identify this data set.
16
+ #
17
+ # @param path [String] A label for this data set (often a path to the
18
+ # XML document used as the source).
19
+ # @param options [Hash] An options hash.
20
+ # @option options [Array<String>] :field_names The names of each of the
21
+ # fields in +source+.
22
+ # @option options [Boolean] :ignore_header If true, and :field_names has
23
+ # been specified, then the first row of the file is ignored.
24
+ # @option options [String] :key_field The name of the field that uniquely
25
+ # identifies each row.
26
+ # @option options [Array<String>] :key_fields The names of the fields
27
+ # that uniquely identify each row.
28
+ # @option options [String] :parent_field The name of the field(s) that
29
+ # identify a parent within which sibling order should be checked.
30
+ # @option options [String] :child_field The name of the field(s) that
31
+ # uniquely identify a child of a parent.
32
+ # @option options [Boolean] :case_sensitive If true (the default), keys
33
+ # are indexed as-is; if false, the index is built in upper-case for
34
+ # case-insensitive comparisons.
35
+ # @option options [Hash] :include A hash of field name(s) or index(es) to
36
+ # regular expression(s). Only source rows whose field values satisfy the
37
+ # regular expressions will be indexed and included in the diff process.
38
+ # @option options [Hash] :exclude A hash of field name(s) or index(es) to
39
+ # regular expression(s). Source rows with a field value that satisfies
40
+ # the regular expressions will be excluded from the diff process.
41
+ # @option options [String] :context A context value from which fields
42
+ # can be populated using a Regexp.
43
+ def initialize(path, options = {})
44
+ super(options)
45
+ @path = path
46
+ @context = options[:context]
47
+ @data = []
48
+ end
49
+
50
+
51
+ # Process a +source+, converting the XML into a table of data, using
52
+ # +rec_xpath+ to identify the nodes that correspond to each record that
53
+ # should appear in the output, and +field_maps+ to populate each field
54
+ # in each row.
55
+ #
56
+ # @param source [String|Array] may be a String containing XML content,
57
+ # an Array of paths to files containing XML content, or a path to
58
+ # a single file.
59
+ # @param rec_xpath [String] An XPath expression that selects all the
60
+ # items in the XML document that are to be converted into new rows.
61
+ # The returned items are not directly used to populate the fields,
62
+ # but provide a context for the field XPath expressions that populate
63
+ # each field's content.
64
+ # @param field_maps [Hash<String, String>] A map of field names to
65
+ # expressions that are evaluated in the context of each row node
66
+ # selected by +rec_xpath+. The field expressions are typically XPath
67
+ # expressions evaluated in the context of the nodes returned by the
68
+ # +rec_xpath+. Alternatively, a String that is not an XPath expression
69
+ # is used as a literal value for a field, while a Regexp can also
70
+ # be used to pull a value from any context specified in the +options+
71
+ # hash. The Regexp should include a single grouping, as the value used
72
+ # will be the result in $1 after the match is performed.
73
+ # @param context [String] An optional context for the XML to be processed.
74
+ # The value passed here can be referenced in field map expressions
75
+ # using a Regexp, with the value of the first grouping in the regex
76
+ # being the value returned for the field.
77
+ def process(source, rec_xpath, field_maps, context = nil)
78
+ @field_names = field_maps.keys unless @field_names
79
+ case source
80
+ when Nokogiri::XML::Document
81
+ add_data(source, rec_xpath, field_maps, context || @context)
82
+ when /<\?xml/
83
+ doc = Nokogiri::XML(source)
84
+ add_data(doc, rec_xpath, field_maps, context || @context)
85
+ when Array
86
+ source.each{ |f| process_file(f, rec_xpath, field_maps) }
87
+ when String
88
+ process_file(source, rec_xpath, field_maps)
89
+ else
90
+ raise ArgumentError, "Unhandled source type #{source.class.name}"
91
+ end
92
+ @data
93
+ end
94
+
95
+
96
+ private
97
+
98
+
99
+ # Load the XML document at +file_path+ and process it into rows of data.
100
+ def process_file(file_path, rec_xpath, field_maps)
101
+ begin
102
+ File.open(file_path) do |f|
103
+ doc = Nokogiri::XML(f)
104
+ add_data(doc, rec_xpath, field_maps, @context || file_path)
105
+ end
106
+ rescue
107
+ STDERR.puts "An error occurred while attempting to open #{file_path}"
108
+ raise
109
+ end
110
+ end
111
+
112
+
113
+ # Locate records in +doc+ using +rec_xpath+ to identify the nodes that
114
+ # correspond to a new record in the data, and +field_maps+ to populate
115
+ # the fields in each row.
116
+ def add_data(doc, rec_xpath, field_maps, context)
117
+ doc.xpath(rec_xpath).each do |rec_node|
118
+ rec = []
119
+ field_maps.each do |field_name, expr|
120
+ case expr
121
+ when Regexp # Match context against Regexp and extract first grouping
122
+ if context
123
+ context =~ expr
124
+ rec << $1
125
+ else
126
+ rec << nil
127
+ end
128
+ when %r{[/(.@]} # XPath expression
129
+ res = rec_node.xpath(expr)
130
+ rec << CGI.unescape_html(res.to_s)
131
+ else # Use expr as the value for this field
132
+ rec << expr
133
+ end
134
+ end
135
+ @data << rec
136
+ end
137
+ end
138
+
139
+ end
140
+
141
+ end
142
+
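The way add_data above chooses between a context Regexp, an XPath expression, and a literal value can be sketched without parsing any XML. This is an illustration only: classify_field_expr is a hypothetical name, it reports which branch a field map expression would take instead of evaluating XPath, and it uses String#[] with a capture index in place of the =~ and $1 pair used in the source.

```ruby
# Hypothetical stand-in for the case statement in add_data.
# Returns the branch taken and the value that branch would work with.
def classify_field_expr(expr, context = nil)
  case expr
  when Regexp
    # Pull the first capture group out of the context, if there is one
    [:context, context ? context[expr, 1] : nil]
  when %r{[/(.@]}
    # Any string containing /, (, . or @ is treated as an XPath expression
    [:xpath, expr]
  else
    # Anything else is used as a literal field value
    [:literal, expr]
  end
end

p classify_field_expr(%r{/(\w+)\.xml$}, 'data/orders.xml')
p classify_field_expr('./@id')
p classify_field_expr('Order')
p classify_field_expr('v1.0')
```

The last call shows a consequence of the character-class test: a literal such as 'v1.0' contains a '.', so it takes the XPath branch rather than being used as a literal value.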
metadata CHANGED
@@ -1,34 +1,44 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: csv-diff
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.0
4
+ version: 0.6.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Adam Gardiner
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2014-08-14 00:00:00.000000000 Z
11
+ date: 2020-08-28 00:00:00.000000000 Z
12
12
  dependencies: []
13
- description: ! " This library performs diffs of CSV files.\n\n Unlike
14
- a standard diff that compares line by line, and is sensitive to the\n ordering
15
- of records, CSV-Diff identifies common lines by key field(s), and\n then
16
- compares the contents of the fields in each line.\n\n Data may be supplied
17
- in the form of CSV files, or as an array of arrays. The\n diff process provides
18
- a fine level of control over what to diff, and can\n optionally ignore certain
19
- types of changes (e.g. changes in position).\n\n CSV-Diff is particularly
20
- well suited to data in parent-child format. Parent-\n child data does not
21
- lend itself well to standard text diffs, as small changes\n in the organisation
22
- of the tree at an upper level can lead to big movements\n in the position
23
- of descendant records. By instead matching records by key,\n CSV-Diff avoids
24
- this issue, while still being able to detect changes in\n sibling order.\n\n
25
- \ This gem implements the core diff algorithm, and handles the loading and\n
26
- \ diffing of CSV files. It returns a CSVDiff object, that contains the details\n
27
- \ of differences in object form. This is useful for projects that need diff\n
28
- \ capability, but want to handle the reporting of differences themselves.
29
- For\n a pre-built diff reporting capability, see the csv-diff-report gem,
30
- which\n provides a command-line tool for generating diff reports in HTML
31
- or Excel\n format.\n"
13
+ description: |2
14
+ This library performs diffs of CSV data, or any table-like source.
15
+
16
+ Unlike a standard diff that compares line by line, and is sensitive to the
17
+ ordering of records, CSV-Diff identifies common lines by key field(s), and
18
+ then compares the contents of the fields in each line.
19
+
20
+ Data may be supplied in the form of CSV files, or as an array of arrays. The
21
+ diff process provides a fine level of control over what to diff, and can
22
+ optionally ignore certain types of changes (e.g. changes in position).
23
+
24
+ CSV-Diff is particularly well suited to data in parent-child format. Parent-
25
+ child data does not lend itself well to standard text diffs, as small changes
26
+ in the organisation of the tree at an upper level can lead to big movements
27
+ in the position of descendant records. By instead matching records by key,
28
+ CSV-Diff avoids this issue, while still being able to detect changes in
29
+ sibling order.
30
+
31
+ This gem implements the core diff algorithm, and handles the loading and
32
+ diffing of CSV files (or Arrays of Arrays). It also supports converting
33
+ data in XML format into tabular form, so that it can then be processed
34
+ like any other CSV or table-like source. It returns a CSVDiff object
35
+ containing the details of differences in object form. This is useful for
36
+ projects that need diff capability, but want to handle the reporting or
37
+ actioning of differences themselves.
38
+
39
+ For a pre-built diff reporting capability, see the csv-diff-report gem,
40
+ which provides a command-line tool for generating diff reports in HTML,
41
+ Excel, or text formats.
32
42
  email: adam.b.gardiner@gmail.com
33
43
  executables: []
34
44
  extensions: []
@@ -40,9 +50,12 @@ files:
40
50
  - lib/csv-diff/algorithm.rb
41
51
  - lib/csv-diff/csv_diff.rb
42
52
  - lib/csv-diff/csv_source.rb
53
+ - lib/csv-diff/source.rb
54
+ - lib/csv-diff/xml_source.rb
43
55
  - lib/csv_diff.rb
44
56
  homepage: https://github.com/agardiner/csv-diff
45
- licenses: []
57
+ licenses:
58
+ - MIT
46
59
  metadata: {}
47
60
  post_install_message: For command-line tools and diff reports, 'gem install csv-diff-report'
48
61
  rdoc_options: []
@@ -50,18 +63,18 @@ require_paths:
50
63
  - lib
51
64
  required_ruby_version: !ruby/object:Gem::Requirement
52
65
  requirements:
53
- - - ! '>='
66
+ - - ">="
54
67
  - !ruby/object:Gem::Version
55
68
  version: '0'
56
69
  required_rubygems_version: !ruby/object:Gem::Requirement
57
70
  requirements:
58
- - - ! '>='
71
+ - - ">="
59
72
  - !ruby/object:Gem::Version
60
73
  version: '0'
61
74
  requirements: []
62
75
  rubyforge_project:
63
- rubygems_version: 2.4.1
76
+ rubygems_version: 2.5.2.3
64
77
  signing_key:
65
78
  specification_version: 4
66
- summary: CSV Diff is a library for generating diffs from data in CSV format
79
+ summary: CSV Diff is a library for generating diffs from data in CSV or XML format
67
80
  test_files: []