compare-xml 0.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: eb7ad2fa6ba45154479d1129ad6ce84337b331fa
4
+ data.tar.gz: eb3a5b0f7caedccb403c1a5504baa80b4e5ff7a8
5
+ SHA512:
6
+ metadata.gz: d46798f576e812ad39b3604c8b41f01d531f6490ba1690facb1cd978c200139d8b1e58a17b98a2d47fbbd76cc011487f9850bb68f5a5e28f5d3156d41a7c9f7c
7
+ data.tar.gz: 91993f87fca6eb40ec302bb598a88b85fa36e1876e1c9ef88396b2f52360e8bed89597400d9803ceaa7a455e8d5930750a9239d4610f607efe46f01d03c5dc31
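The digests listed above can be checked locally. A `.gem` file is a tar archive that contains `metadata.gz`, `data.tar.gz` and `checksums.yaml.gz`, so after unpacking it the two inner files can be hashed and compared. The following is a minimal sketch using Ruby's standard `digest` library; the helper name `checksum_matches?` and the assumption that `data.tar.gz` sits in the current directory are illustrative and not part of the gem.

```ruby
require 'digest'

# Hypothetical helper (not part of compare-xml): returns true when the SHA512
# digest of the file at `path` equals `expected_hex` (case-insensitive).
def checksum_matches?(path, expected_hex)
  Digest::SHA512.file(path).hexdigest == expected_hex.downcase
end

# Usage, assuming data.tar.gz was extracted from the .gem archive
# into the current directory. The expected value is the SHA512 entry above.
if File.exist?('data.tar.gz')
  expected = '91993f87fca6eb40ec302bb598a88b85fa36e1876e1c9ef88396b2f52360e8be' \
             'd89597400d9803ceaa7a455e8d5930750a9239d4610f607efe46f01d03c5dc31'
  puts checksum_matches?('data.tar.gz', expected) ? 'checksum OK' : 'checksum MISMATCH'
end
```

The same approach works for the SHA1 entries by swapping in `Digest::SHA1`.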
@@ -0,0 +1,13 @@
1
+ *.DS_Store
2
+ *thumbs.db
3
+ /*.gem
4
+ /.bundle/
5
+ /.idea/
6
+ /.yardoc
7
+ /_yardoc/
8
+ /coverage/
9
+ /doc/
10
+ /Gemfile.lock
11
+ /pkg/
12
+ /spec/reports/
13
+ /tmp/
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in compare-xml.gemspec
4
+ gemspec
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2016 Vadim Kononov
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
@@ -0,0 +1,358 @@
1
+ # CompareXML
2
+
3
+ [![Gem Version](https://badge.fury.io/rb/compare-xml.svg)](https://rubygems.org/gems/compare-xml)
4
+
5
+ CompareXML is a fast, lightweight and feature-rich tool that will solve your XML/HTML comparison or diffing needs. Its purpose is to compare two instances of `Nokogiri::XML::Node` or `Nokogiri::XML::NodeSet` for equality or equivalency.
6
+
7
+ **Features**
8
+
9
+ - Fast, lightweight and highly customizable
10
+ - Compares XML/HTML documents and document fragments
11
+ - Can either produce a detailed list of discrepancies or execute silently
12
+ - Has the ability to exclude specific nodes or attributes from all comparisons
13
+
14
+
15
+
16
+ ## Installation
17
+
18
+ Add this line to your application's Gemfile:
19
+
20
+ ```ruby
21
+ gem 'compare-xml'
22
+ ```
23
+
24
+ And then execute:
25
+
26
+ $ bundle
27
+
28
+ Or install it yourself as:
29
+
30
+ $ gem install compare-xml
31
+
32
+
33
+
34
+ ## Usage
35
+
36
+ Using CompareXML is as simple as
37
+
38
+ ```ruby
39
+ CompareXML.equivalent?(doc1, doc2)
40
+ ```
41
+
42
+ where `doc1` and `doc2` are instances of `Nokogiri::XML::Node` or `Nokogiri::XML::NodeSet`.
43
+
44
+ **Example**
45
+
46
+ Suppose you have two files `1.html` and `2.html` that you would like to compare. You could do it as follows:
47
+
48
+ ```ruby
49
+ doc1 = Nokogiri::HTML(open('1.html'))
50
+ doc2 = Nokogiri::HTML(open('2.html'))
51
+ puts CompareXML.equivalent?(doc1, doc2)
52
+ ```
53
+
54
+ The above code will print `true` or `false` depending on the result of the comparison.
55
+
56
+ > If you are using CompareXML in a script, then you need to require it manually with:
57
+
58
+ ```ruby
59
+ require 'compare-xml'
60
+ ```
61
+
62
+
63
+ ## Options
64
+
65
+ CompareXML has a variety of options that can be invoked as an optional argument, e.g.:
66
+
67
+ ```ruby
68
+ CompareXML.equivalent?(doc1, doc2, {squeeze_whitespace: true, verbose: true})
69
+ ```
70
+
71
+
72
+ ----------
73
+
74
+
75
+ - ####`ignore_attr_order: {true|false}` default: **`true`**
76
+
77
+ When `true`, all attributes are sorted by name before comparison, and only attributes with the same name are compared to each other.
78
+
79
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_attr_order: true})`
80
+
81
+ **Example:** When `true` the following HTML strings are considered equal:
82
+
83
+ <a href="/admin" class="button" target="_blank">Link</a>
84
+ <a class="button" target="_blank" href="/admin">Link</a>
85
+
86
+ **Example:** When `false` the above HTML strings are compared as follows:
87
+
88
+ href="/admin" != class="button"
89
+
90
+ The comparison of the `<a>` element will stop at this point, since a discrepancy is found.
91
+
92
+ **Example:** When `true` the following HTML strings are compared as follows:
93
+
94
+ <a href="/admin" class="button" target="_blank">Link</a>
95
+ <a class="button" target="_blank" href="/admin" rel="nofollow">Link</a>
96
+
97
+ class="button" == class="button"
98
+ href="/admin" == href="/admin"
99
+ != rel="nofollow"
100
+ target="_blank" == target="_blank"
101
+
102
+
103
+ ----------
104
+
105
+
106
+ - ####`ignore_attrs: {css}` default: **`{}`**
107
+
108
+ When provided, ignores all **attributes** that satisfy a particular rule using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp).
109
+
110
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_attrs: ['a[rel="nofollow"]', 'input[type="hidden"]']})`
111
+
112
+ **Example:** With `ignore_attrs: ['a[rel="nofollow"]', 'a[target]']` the following HTML strings are considered equal:
113
+
114
+ <a href="/admin" class="button" target="_blank">Link</a>
115
+ <a href="/admin" class="button" target="_self" rel="nofollow">Link</a>
116
+
117
+ **Example:** With `ignore_attrs: ['a[href^="http"]', 'a[class*="button"]']` the following HTML strings are considered equal:
118
+
119
+ <a href="http://google.ca" class="primary button">Link</a>
120
+ <a href="https://google.com" class="primary button rounded">Link</a>
121
+
122
+
123
+ ----------
124
+
125
+
126
+ - ####`ignore_comments: {true|false}` default: **`true`**
127
+
128
+ When `true`, ignores comments, such as `<!-- This is a comment -->`.
129
+
130
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_comments: true})`
131
+
132
+ **Example:** When `true` the following HTML strings are considered equal:
133
+
134
+ <!-- This is a comment -->
135
+ <!-- This is another comment -->
136
+
137
+ **Example:** When `true` the following HTML strings are considered equal:
138
+
139
+ <a href="/admin"><!-- This is a comment -->Link</a>
140
+ <a href="/admin">Link</a>
141
+
142
+
143
+ ----------
144
+
145
+
146
+ - ####`ignore_nodes: {css}` default: **`{}`**
147
+
148
+ When provided, ignores all **nodes** that satisfy a particular rule using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp).
149
+
150
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_nodes: ['script', 'object']})`
151
+
152
+ **Example:** With `ignore_nodes: ['a[rel="nofollow"]', 'a[target]']` the following HTML strings are considered equal:
153
+
154
+ <a href="/admin" class="icon" target="_blank">Link 1</a>
155
+ <a href="/index" class="button" target="_self" rel="nofollow">Link 2</a>
156
+
157
+ **Example:** With `ignore_nodes: ['b', 'i']` the following HTML strings are considered equal:
158
+
159
+ <a href="/admin"><i class="icon bulb"></i><b>Warning:</b> Link</a>
160
+ <a href="/admin"><i class="icon info"></i><b>Message:</b> Link</a>
161
+
162
+
163
+ ----------
164
+
165
+
166
+ - ####`ignore_text_nodes: {true|false}` default: **`false`**
167
+
168
+ When `true`, ignores all text content. Text content is anything that is included between an opening and a closing tag, e.g. `<tag>THIS IS TEXT CONTENT</tag>`.
169
+
170
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_text_nodes: true})`
171
+
172
+ **Example:** When `true` the following HTML strings are considered equal:
173
+
174
+ <a href="/admin">SOME TEXT CONTENT</a>
175
+ <a href="/admin">DIFFERENT TEXT CONTENT</a>
176
+
177
+ **Example:** When `true` the following HTML strings are considered equal:
178
+
179
+ <i class="icon"></i> <b>Warning:</b>
180
+ <i class="icon"> </i> <b>Message:</b>
181
+
182
+
183
+ ----------
184
+
185
+
186
+ - ####`squeeze_whitespace: {true|false}` default: **`true`**
187
+
188
+ When `true`, all text content within the document is trimmed (i.e. space removed from left and right) and whitespace is squeezed (i.e. tabs, new lines, multiple whitespaces are all replaced by a single whitespace).
189
+
190
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {squeeze_whitespace: true})`
191
+
192
+ **Example:** When `true` the following HTML strings are considered equal:
193
+
194
+ <a href="/admin"> SOME TEXT CONTENT </a>
195
+ <a href="/admin">SOME   TEXT   CONTENT</a>
196
+
197
+ **Example:** When `true` the following HTML strings are considered equal:
198
+
199
+ <html>
200
+ <title>
201
+ This is my title
202
+ </title>
203
+ </html>
204
+
205
+ <html><title>This is my title</title></html>
206
+
207
+
208
+ ----------
209
+
210
+
211
+ - ####`verbose: {true|false}` default: **`false`**
212
+
213
+ When `true`, instead of returning a boolean value `CompareXML.equivalent?` returns an array of all errors encountered when performing a comparison.
214
+
215
+ > **Warning:** When `true`, the comparison takes longer! Not only because more processing is required to produce meaningful error messages, but also because in this mode, comparison does **NOT** stop when a first error is encountered, because the goal is to capture as many discrepancies as possible.
216
+
217
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {verbose: true})`
218
+
219
+ **Example:** When `true` given the following HTML strings:
220
+
221
+ <!DOCTYPE html>
222
+ <html lang="en">
223
+ <head><title>TITLE</title></head>
224
+ <body>
225
+ <h1>SOME HEADING</h1>
226
+ <div id="content">
227
+ <h2><i class="fa fa-cogs"></i> ANOTHER HEADING</h2>
228
+ <p>Extra content</p>
229
+ </div>
230
+ <div class="window">
231
+ <a href="/admin" rel="icon">Link</a>
232
+ </div>
233
+ <blockquote>Some fancy quote <cite>Author Name</cite></blockquote>
234
+ <p>Some more text</p>
235
+ <p>Yet more text</p>
236
+ <p>Too much text</p>
237
+ <!-- The footer is below -->
238
+ <p class="footer">FOOTER</p>
239
+ </body>
240
+ </html>
241
+
242
+ <!DOCTYPE html>
243
+ <html lang="en">
244
+ <head><title>ANOTHER TITLE</title></head>
245
+ <body>
246
+ <h1 id="main">SOME HEADING</h1>
247
+ <div id="content">
248
+ <h2><i class="fa fa-cogs"></i> ANOTHER HEADING</h2>
249
+ <p>Extra content</p>
250
+ </div>
251
+ <div class="window">
252
+ <a rel="button" href="/admin">Link</a>
253
+ </div>
254
+ <blockquote>Some fancy quote</blockquote>
255
+ <p>Some more text</p>
256
+ <p>Yet more text</p>
257
+ <p>Too much text</p>
258
+ <!-- This is the footer -->
259
+ <div class="footer">FOOTER</div>
260
+ </body>
261
+ </html>
262
+
263
+ `CompareXML.equivalent?(doc1, doc2, {verbose: true})` will produce an array shown below.
264
+
265
+ [
266
+ "html:head:title",
267
+ "TITLE",
268
+ 9,
269
+ "ANOTHER TITLE",
270
+ "html:head:title"
271
+ ],
272
+ [
273
+ "html:body:h1",
274
+ nil,
275
+ 2,
276
+ "id=\"main\"",
277
+ "html:body:h1"
278
+ ],
279
+ [
280
+ "html:body:div(2):a",
281
+ "rel=\"icon\"",
282
+ 4,
283
+ "rel=\"button\"",
284
+ "html:body:div(2):a"
285
+ ],
286
+ [
287
+ "html:body:blockquote:cite",
288
+ "cite",
289
+ 3,
290
+ nil,
291
+ "html:body:blockquote:cite"
292
+ ],
293
+ [
294
+ "html:body:p(4)",
295
+ "p",
296
+ 7,
297
+ "div",
298
+ "html:body:div(3)"
299
+ ]
300
+
301
+ The structure of the array is as follows:
302
+
303
+ [left_node_location, left_content, error_code, right_content, right_node_location]
304
+
305
+ **Node location** of `html:body:p(4)` means that the element in question is `<p>`, its hierarchical ancestors are `html > body`, and it is the **4th** `<p>` tag. That is, it could be found in
306
+
307
+ <html><body><p>one</p>...<p>two</p>...<p>three</p>...<p>TARGET</p></body></html>
308
+
309
+ > **Note:** `p(4)` means that it is the fourth tag of type `<p>`, but there could be many other tags of other types between `p(3)` and `p(4)`.
310
+
311
+ **Node content** displays the discrepancy in content (which could be the name of the tag, attributes, text content, comments, etc.).
312
+
313
+ **Error code** is a numeric value that indicates the type of a discrepancy. CompareXML implements the following error codes:
314
+
315
+ ```ruby
316
+ EQUIVALENT = 1 # nodes are equal (for internal use only)
317
+ MISSING_ATTRIBUTE = 2 # attribute is missing its counterpart
318
+ MISSING_NODE = 3 # node is missing its counterpart
319
+ UNEQUAL_ATTRIBUTES = 4 # attributes are not equal
320
+ UNEQUAL_COMMENTS = 5 # comment contents are not equal
321
+ UNEQUAL_DOCUMENTS = 6 # document types are not equal
322
+ UNEQUAL_ELEMENTS = 7 # nodes have the same type but are not equal
323
+ UNEQUAL_NODES_TYPES = 8 # nodes do not have the same type
324
+ UNEQUAL_TEXT_CONTENTS = 9 # text contents are not equal
325
+ ```
326
+
327
+ Here is an example of how these could be used:
328
+
329
+ ```ruby
330
+ case error_code
331
+ when CompareXML::UNEQUAL_ATTRIBUTES
332
+ '!='
333
+ when CompareXML::MISSING_ATTRIBUTE
334
+ '?'
335
+ end
336
+ ```
337
+
338
+
339
+
340
+ ## Contributing
341
+
342
+ 1. Fork it
343
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
344
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
345
+ 4. Push to the branch (`git push origin my-new-feature`)
346
+ 5. Create new Pull Request
347
+
348
+
349
+
350
+ ## Credits
351
+
352
+ This gem was inspired by [Michael B. Klein](https://github.com/mbklein)'s gem [`equivalent-xml`](https://github.com/mbklein/equivalent-xml) - another excellent tool for XML comparison.
353
+
354
+
355
+
356
+ ## License
357
+
358
+ The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
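The five-element error entries described in the README above lend themselves to simple post-processing. The sketch below turns such entries into one-line messages. It is an illustration only: the `ERROR_LABELS` table and the `describe_error` helper are hypothetical names that are not part of the gem, the numeric codes are copied from the README's constant list, and the sample entries are taken from the README's verbose example.

```ruby
# Error codes copied from the README's constant list; in a real script these
# would be read from the CompareXML module rather than hard-coded.
ERROR_LABELS = {
  2 => 'missing attribute',
  3 => 'missing node',
  4 => 'unequal attributes',
  5 => 'unequal comments',
  6 => 'unequal documents',
  7 => 'unequal elements',
  8 => 'unequal node types',
  9 => 'unequal text contents'
}.freeze

# Hypothetical helper: formats one entry of the form
# [left_node_location, left_content, error_code, right_content, right_node_location].
def describe_error(entry)
  left_path, left, code, right, right_path = entry
  label = ERROR_LABELS.fetch(code, "unknown code #{code}")
  "#{label}: #{left_path} (#{left.inspect}) vs #{right_path} (#{right.inspect})"
end

# Two entries from the verbose example in the README.
errors = [
  ['html:body:h1', nil, 2, 'id="main"', 'html:body:h1'],
  ['html:body:blockquote:cite', 'cite', 3, nil, 'html:body:blockquote:cite']
]
errors.each { |entry| puts describe_error(entry) }
```

Using `fetch` with a fallback keeps the formatter usable if a later release adds codes that this table does not know about.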
@@ -0,0 +1,2 @@
1
+ require 'bundler/gem_tasks'
2
+ task :default => :spec
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'bundler/setup'
4
+ require 'compare-xml'
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require 'pry'
11
+ # Pry.start
12
+
13
+ require 'irb'
14
+ IRB.start
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
@@ -0,0 +1,25 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'compare-xml/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = 'compare-xml'
8
+ spec.version = CompareXML::VERSION
9
+ spec.authors = ['Vadim Kononov']
10
+ spec.email = ['vadim@poetic.com']
11
+
12
+ spec.summary = %q{A customizable tool that compares two instances of Nokogiri::XML::Node for equality or equivalency.}
13
+ spec.description = %q{CompareXML is a fast, lightweight and feature-rich tool that will solve your XML/HTML comparison or diffing needs. Its purpose is to compare two instances of Nokogiri::XML::Node or Nokogiri::XML::NodeSet for equality or equivalency.}
14
+ spec.homepage = 'https://github.com/vkononov/compare-xml-xml'
15
+ spec.license = 'MIT'
16
+
17
+ spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
18
+ spec.bindir = 'exe'
19
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
20
+ spec.require_paths = ['lib']
21
+
22
+ spec.add_development_dependency 'bundler', '~> 1.11'
23
+ spec.add_development_dependency 'rake', '~> 11.1'
24
+ spec.add_runtime_dependency 'nokogiri', '~> 1.6'
25
+ end
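The `~>` ("pessimistic") constraints in the gemspec above allow only a bounded range of versions: `'~> 1.6'` accepts any nokogiri release from 1.6 up to, but not including, 2.0. A short sketch using RubyGems' own `Gem::Requirement` class shows how a few versions are treated; the version numbers tested are arbitrary examples chosen for illustration.

```ruby
require 'rubygems'

# '~> 1.6' is the runtime constraint the gemspec places on nokogiri.
nokogiri_req = Gem::Requirement.new('~> 1.6')

# Arbitrary sample versions, chosen only to show both sides of the range.
%w[1.5.11 1.6.0 1.8.5 2.0.0].each do |version|
  allowed = nokogiri_req.satisfied_by?(Gem::Version.new(version))
  puts "nokogiri #{version}: #{allowed ? 'allowed' : 'rejected'}"
end
```

By the same rule, the development dependency `'bundler', '~> 1.11'` is not satisfied by Bundler 2.x.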
@@ -0,0 +1,452 @@
1
+ require 'compare-xml/version'
2
+ require 'nokogiri'
3
+
4
+ module CompareXML
5
+
6
+ # default options used by the module; all of these can be overridden
7
+ DEFAULTS_OPTS = {
8
+ # when true, attribute order is not important (all attributes are sorted before comparison)
9
+ # when false, attributes are compared in order and comparison stops on the first mismatch
10
+ ignore_attr_order: true,
11
+
12
+ # contains an array of user-specified CSS rules used to perform attribute exclusions
13
+ # for this to work, a CSS rule MUST contain the attribute to be excluded,
14
+ # i.e. a[href] will exclude all "href" attributes contained in <a> tags.
15
+ ignore_attrs: {},
16
+
17
+ # when true ignores XML and HTML comments
18
+ # when false, all comments are compared to their counterparts
19
+ ignore_comments: true,
20
+
21
+ # contains an array of user-specified CSS rules used to perform node exclusions
22
+ ignore_nodes: {},
23
+
24
+ # when true, ignores all text nodes (although blank text nodes are always ignored)
25
+ # when false, all text nodes are compared to their counterparts (except the empty ones)
26
+ ignore_text_nodes: false,
27
+
28
+ # when true, trims and squeezes whitespace in text nodes and comments to a single space
29
+ # when false, all whitespace is preserved as it is without any changes
30
+ squeeze_whitespace: true,
31
+
32
+ # when true, provides a list of all error messages encountered in comparisons
33
+ # when false, execution stops when the first error is encountered with no error messages
34
+ verbose: false
35
+ }
36
+
37
+ # used internally only, in order to differentiate equivalence from inequivalence
38
+ EQUIVALENT = 1
39
+
40
+ # a list of all possible inequivalence types for nodes
41
+ # these are returned in the errors array to differentiate error types.
42
+ MISSING_ATTRIBUTE = 2 # attribute is missing its counterpart
43
+ MISSING_NODE = 3 # node is missing its counterpart
44
+ UNEQUAL_ATTRIBUTES = 4 # attributes are not equal
45
+ UNEQUAL_COMMENTS = 5 # comment contents are not equal
46
+ UNEQUAL_DOCUMENTS = 6 # document types are not equal
47
+ UNEQUAL_ELEMENTS = 7 # nodes have the same type but are not equal
48
+ UNEQUAL_NODES_TYPES = 8 # nodes do not have the same type
49
+ UNEQUAL_TEXT_CONTENTS = 9 # text contents are not equal
50
+
51
+
52
+ class << self
53
+
54
+ ##
55
+ # Determines whether two XML documents or fragments are equal to each other.
56
+ # The two parameters could be any type of XML documents, or fragments
57
+ # or node sets or even text nodes - any subclass of Nokogiri::XML::Node.
58
+ #
59
+ # @param [Nokogiri::XML::Node] n1 left node
60
+ # @param [Nokogiri::XML::Node] n2 right node
61
+ # @param [Hash] opts user-overridden options
62
+ #
63
+ # @return [Boolean, Array] true or false by default; in verbose mode, an array of discrepancies (empty when equivalent)
64
+ #
65
+ def equivalent?(n1, n2, opts = {})
66
+ opts, errors = DEFAULTS_OPTS.merge(opts), []
67
+ result = compareNodes(n1, n2, opts, errors)
68
+ opts[:verbose] ? errors : result == EQUIVALENT
69
+ end
70
+
71
+
72
+ private
73
+
74
+ ##
75
+ # Compares two nodes for equivalence. The nodes could be any subclass
76
+ # of Nokogiri::XML::Node including node sets and document fragments.
77
+ #
78
+ # @param [Nokogiri::XML::Node] n1 left attribute
79
+ # @param [Nokogiri::XML::Node] n2 right attribute
80
+ # @param [Hash] opts user-overridden options
81
+ # @param [Array] errors inequivalence messages
82
+ #
83
+ # @return type of equivalence (from equivalence constants)
84
+ #
85
+ def compareNodes(n1, n2, opts, errors, status = EQUIVALENT)
86
+ if n1.class == n2.class
87
+ case n1
88
+ when Nokogiri::XML::Comment
89
+ status = compareCommentNodes(n1, n2, opts, errors) unless opts[:ignore_comments]
90
+ when Nokogiri::HTML::Document
91
+ status = compareDocumentNodes(n1, n2, opts, errors)
92
+ when Nokogiri::XML::Element
93
+ status = compareElementNodes(n1, n2, opts, errors)
94
+ when Nokogiri::XML::Text
95
+ status = compareTextNodes(n1, n2, opts, errors)
96
+ else
97
+ status = compareChildren(n1.children, n2.children, opts, errors)
98
+ end
99
+ elsif n1.nil?
100
+ status = MISSING_NODE
101
+ errors << [nodePath(n2), nil, status, n2.name, nodePath(n2)] if opts[:verbose]
102
+ elsif n2.nil?
103
+ status = MISSING_NODE
104
+ errors << [nodePath(n1), n1.name, status, nil, nodePath(n1)] if opts[:verbose]
105
+ else
106
+ status = UNEQUAL_NODES_TYPES
107
+ errors << [nodePath(n1), n1.class, status, n2.class, nodePath(n2)] if opts[:verbose]
108
+ end
109
+ status
110
+ end
111
+
112
+
113
+ ##
114
+ # Compares two nodes of type Nokogiri::XML::Comment.
115
+ #
116
+ # @param [Nokogiri::XML::Comment] n1 left attribute
117
+ # @param [Nokogiri::XML::Comment] n2 right attribute
118
+ # @param [Hash] opts user-overridden options
119
+ # @param [Array] errors inequivalence messages
120
+ #
121
+ # @return type of equivalence (from equivalence constants)
122
+ #
123
+ def compareCommentNodes(n1, n2, opts, errors, status = EQUIVALENT)
124
+ return status if opts[:ignore_comments]
125
+ t1, t2 = n1.content, n2.content
126
+ t1, t2 = squeeze(t1), squeeze(t2) if opts[:squeeze_whitespace]
127
+ unless t1 == t2
128
+ status = UNEQUAL_COMMENTS
129
+ errors << [nodePath(n1.parent), t1, status, t2, nodePath(n2.parent)] if opts[:verbose]
130
+ end
131
+ status
132
+ end
133
+
134
+
135
+ ##
136
+ # Compares two nodes of type Nokogiri::HTML::Document.
137
+ #
138
+ # @param [Nokogiri::XML::Document] n1 left attribute
139
+ # @param [Nokogiri::XML::Document] n2 right attribute
140
+ # @param [Hash] opts user-overridden options
141
+ # @param [Array] errors inequivalence messages
142
+ #
143
+ # @return type of equivalence (from equivalence constants)
144
+ #
145
+ def compareDocumentNodes(n1, n2, opts, errors, status = EQUIVALENT)
146
+ if n1.name == n2.name
147
+ status = compareChildren(n1.children, n2.children, opts, errors)
148
+ else
149
+ status = UNEQUAL_DOCUMENTS
150
+ errors << [nodePath(n1), n1, status, n2, nodePath(n2)] if opts[:verbose]
151
+ end
152
+ status
153
+ end
154
+
155
+
156
+ ##
157
+ # Compares two sets of Nokogiri::XML::NodeSet elements.
158
+ #
159
+ # @param [Nokogiri::XML::NodeSet] n1_set left set of Nokogiri::XML::Node elements
160
+ # @param [Nokogiri::XML::NodeSet] n2_set right set of Nokogiri::XML::Node elements
161
+ # @param [Hash] opts user-overridden options
162
+ # @param [Array] errors inequivalence messages
163
+ #
164
+ # @return type of equivalence (from equivalence constants)
165
+ #
166
+ def compareChildren(n1_set, n2_set, opts, errors, status = EQUIVALENT)
167
+ i = 0; j = 0
168
+ while i < n1_set.length || j < n2_set.length
169
+ if !n1_set[i].nil? && nodeExcluded?(n1_set[i], opts)
170
+ i += 1 # increment counter if left node is excluded
171
+ elsif !n2_set[j].nil? && nodeExcluded?(n2_set[j], opts)
172
+ j += 1 # increment counter if right node is excluded
173
+ else
174
+ result = compareNodes(n1_set[i], n2_set[j], opts, errors)
175
+ status = result unless result == EQUIVALENT
176
+
177
+ # return early so that this subtree halts comparison on error,
178
+ # while neighbours of parents' subtrees can still be compared (in verbose mode)
179
+ return status if status == UNEQUAL_NODES_TYPES || status == UNEQUAL_ELEMENTS
180
+
181
+ # stop execution if a single error is found (unless in verbose mode)
182
+ break unless status == EQUIVALENT || opts[:verbose]
183
+
184
+ # increment both counters when both nodes have been compared
185
+ i += 1; j += 1
186
+ end
187
+ end
188
+ status
189
+ end
190
+
191
+
192
+ ##
193
+ # Compares two nodes of type Nokogiri::XML::Element.
194
+ # - compares element attributes
195
+ # - recursively compares element children
196
+ #
197
+ # @param [Nokogiri::XML::Element] n1 left attribute
198
+ # @param [Nokogiri::XML::Element] n2 right attribute
199
+ # @param [Hash] opts user-overridden options
200
+ # @param [Array] errors inequivalence messages
201
+ #
202
+ # @return type of equivalence (from equivalence constants)
203
+ #
204
+ def compareElementNodes(n1, n2, opts, errors, status = EQUIVALENT)
205
+ if n1.name == n2.name
206
+ result = compareAttributeSets(n1.attribute_nodes, n2.attribute_nodes, opts, errors)
207
+ status = result unless result == EQUIVALENT
208
+ result = compareChildren(n1.children, n2.children, opts, errors)
209
+ status = result unless result == EQUIVALENT
210
+ else
211
+ status = UNEQUAL_ELEMENTS
212
+ errors << [nodePath(n1), n1.name, status, n2.name, nodePath(n2)] if opts[:verbose]
213
+ end
214
+ status
215
+ end
216
+
217
+
218
+ ##
219
+ # Compares two nodes of type Nokogiri::XML::Text.
220
+ #
221
+ # @param [Nokogiri::XML::Text] n1 left attribute
222
+ # @param [Nokogiri::XML::Text] n2 right attribute
223
+ # @param [Hash] opts user-overridden options
224
+ # @param [Array] errors inequivalence messages
225
+ #
226
+ # @return type of equivalence (from equivalence constants)
227
+ #
228
+ def compareTextNodes(n1, n2, opts, errors, status = EQUIVALENT)
229
+ return status if opts[:ignore_text_nodes]
230
+ t1, t2 = n1.content, n2.content
231
+ t1, t2 = squeeze(t1), squeeze(t2) if opts[:squeeze_whitespace]
232
+ unless t1 == t2
233
+ status = UNEQUAL_TEXT_CONTENTS
234
+ errors << [nodePath(n1.parent), t1, status, t2, nodePath(n2.parent)] if opts[:verbose]
235
+ end
236
+ status
237
+ end
238
+
239
+
240
+ ##
241
+ # Compares two sets of Nokogiri::XML::Node attributes.
242
+ #
243
+ # @param [Array] a1_set left attribute set
244
+ # @param [Array] a2_set right attribute set
245
+ # @param [Hash] opts user-overridden options
246
+ # @param [Array] errors inequivalence messages
247
+ #
248
+ # @return type of equivalence (from equivalence constants)
249
+ #
250
+ def compareAttributeSets(a1_set, a2_set, opts, errors)
251
+ return MISSING_ATTRIBUTE unless a1_set.length == a2_set.length || opts[:verbose]
252
+ if opts[:ignore_attr_order]
253
+ compareSortedAttributeSets(a1_set, a2_set, opts, errors)
254
+ else
255
+ compareUnsortedAttributeSets(a1_set, a2_set, opts, errors)
256
+ end
257
+ end
258
+
259
+
260
+ ##
261
+ # Compares two sets of Nokogiri::XML::Node attributes by sorting them first.
262
+ # When the attributes are sorted, only attributes of the same type are compared
263
+ # to each other, and missing attributes can be easily detected.
264
+ #
265
+ # @param [Array] a1_set left attribute set
266
+ # @param [Array] a2_set right attribute set
267
+ # @param [Hash] opts user-overridden options
268
+ # @param [Array] errors inequivalence messages
269
+ #
270
+ # @return type of equivalence (from equivalence constants)
271
+ #
272
+ def compareSortedAttributeSets(a1_set, a2_set, opts, errors, status = EQUIVALENT)
273
+ a1_set, a2_set = a1_set.sort_by { |a| a.name }, a2_set.sort_by { |a| a.name }
274
+ i = j = 0
275
+
276
+ while i < a1_set.length || j < a2_set.length
277
+ if a1_set[i].nil?
278
+ result = compareAttributes(nil, a2_set[j], opts, errors); j += 1
279
+ elsif a2_set[j].nil?
280
+ result = compareAttributes(a1_set[i], nil, opts, errors); i += 1
281
+ elsif a1_set[i].name < a2_set[j].name
282
+ result = compareAttributes(a1_set[i], nil, opts, errors); i += 1
283
+ elsif a1_set[i].name > a2_set[j].name
284
+ result = compareAttributes(nil, a2_set[j], opts, errors); j += 1
285
+ else
286
+ result = compareAttributes(a1_set[i], a2_set[j], opts, errors); i += 1; j += 1
287
+ end
288
+ status = result unless result == EQUIVALENT
289
+ break unless status == EQUIVALENT || opts[:verbose]
290
+ end
291
+ status
292
+ end
293
+
294
+
295
+ ##
296
+ # Compares two sets of Nokogiri::XML::Node attributes without sorting them.
297
+ # As a result attributes of different types may be compared, and even if all
298
+ # attributes are identical in both sets, if their order is different,
299
+ # the comparison will stop as soon as two unequal attributes are found.
300
+ #
301
+ # @param [Array] a1_set left attribute set
302
+ # @param [Array] a2_set right attribute set
303
+ # @param [Hash] opts user-overridden options
304
+ # @param [Array] errors inequivalence messages
305
+ #
306
+ # @return type of equivalence (from equivalence constants)
307
+ #
308
+ def compareUnsortedAttributeSets(a1_set, a2_set, opts, errors, status = EQUIVALENT)
309
+ [a1_set.length, a2_set.length].max.times do |i|
310
+ result = compareAttributes(a1_set[i], a2_set[i], opts, errors)
311
+ status = result unless result == EQUIVALENT
312
+ break unless status == EQUIVALENT
313
+ end
314
+ status
315
+ end
316
+
317
+
318
+ ##
319
+ # Compares two attributes by name and value.
320
+ #
321
+ # @param [Nokogiri::XML::Attr] a1 left attribute
322
+ # @param [Nokogiri::XML::Attr] a2 right attribute
323
+ # @param [Hash] opts user-overridden options
324
+ # @param [Array] errors inequivalence messages
325
+ #
326
+ # @return type of equivalence (from equivalence constants)
327
+ #
328
+ def compareAttributes(a1, a2, opts, errors, status = EQUIVALENT)
329
+ if a1.nil?
330
+ status = MISSING_ATTRIBUTE
331
+ errors << [nodePath(a2.parent), nil, status, "#{a2.name}=\"#{a2.value}\"", nodePath(a2.parent)] if opts[:verbose]
332
+ elsif a2.nil?
333
+ status = MISSING_ATTRIBUTE
334
+ errors << [nodePath(a1.parent), "#{a1.name}=\"#{a1.value}\"", status, nil, nodePath(a1.parent)] if opts[:verbose]
335
+ elsif a1.name == a2.name
336
+ return status if attrsExcluded?(a1, a2, opts)
337
+ if a1.value != a2.value
338
+ status = UNEQUAL_ATTRIBUTES
339
+ errors << [nodePath(a1.parent), "#{a1.name}=\"#{a1.value}\"", status, "#{a2.name}=\"#{a2.value}\"", nodePath(a2.parent)] if opts[:verbose]
340
+ end
341
+ else
342
+ status = UNEQUAL_ATTRIBUTES
343
+ errors << [nodePath(a1.parent), a1.name, status, a2.name, nodePath(a2.parent)] if opts[:verbose]
344
+ end
345
+ status
346
+ end
347
+
348
+
349
+ ##
350
+ # Determines if a node should be excluded from the comparison. When a node is excluded,
351
+ # it is completely ignored, as if it did not exist.
352
+ #
353
+ # Several types of nodes are considered ignored:
354
+ # - comments (only in +ignore_comments+ mode)
355
+ # - text nodes (only in +ignore_text_nodes+ mode OR when a text node is empty)
356
+ # - node matches a user-specified css rule from +ignore_nodes+
357
+ #
358
+ # @param [Nokogiri::XML::Node] n node being tested for exclusion
359
+ # @param [Hash] opts user-overridden options
360
+ #
361
+ # @return true if excluded, false otherwise
362
+ #
363
+ def nodeExcluded?(n, opts)
364
+ return true if n.is_a?(Nokogiri::XML::Comment) && opts[:ignore_comments]
365
+ return true if n.is_a?(Nokogiri::XML::Text) && (opts[:ignore_text_nodes] || squeeze(n.content).empty?)
366
+ opts[:ignore_nodes].each do |css|
367
+ return true if n.xpath('../*').css(css).include?(n)
368
+ end
369
+ false
370
+ end
371
+
372
+
373
+ ##
374
+ # Checks whether two given attributes should be excluded, based on a user-specified css rule.
375
+ # If true, only the specified attributes are ignored; all remaining attributes are still compared.
376
+ # The CSS rule is used to locate the node that contains the attributes to be excluded.
377
+ # The CSS rule MUST contain the name of the attribute to be ignored.
378
+ #
379
+ # @param [Nokogiri::XML::Attr] a1 left attribute
380
+ # @param [Nokogiri::XML::Attr] a2 right attribute
381
+ # @param [Hash] opts user-overridden options
382
+ #
383
+ # @return true if excluded, false otherwise
384
+ #
385
+ def attrsExcluded?(a1, a2, opts)
386
+ opts[:ignore_attrs].each do |css|
387
+ if css.include?(a1.name) && css.include?(a2.name)
388
+ return true if a1.parent.xpath('../*').css(css).include?(a1.parent) && a2.parent.xpath('../*').css(css).include?(a2.parent)
389
+ end
390
+ end
391
+ false
392
+ end
393
+
394
+
395
+ ##
396
+ # Produces the hierarchical ancestral path of a node in the following format: <html:body:div(3):h2:b(2)>.
397
+ # This means that the element is located in:
398
+ #
399
+ # <html>
400
+ # <body>
401
+ # <div>...</div>
402
+ # <div>...</div>
403
+ # <div>
404
+ # <h2>
405
+ # <b>...</b>
406
+ # <b>TARGET</b>
407
+ # </h2>
408
+ # </div>
409
+ # </body>
410
+ # </html>
411
+ #
412
+ # Note that the counts of element locations only apply to elements of the same type. For example, div(3) means
413
+ # that it is the 3rd <div> element in the <body>, but there could be many other elements in between the three
414
+ # <div> elements.
415
+ #
416
+ # When +ignore_comments+ mode is disabled, mismatching comments will show up as <...:comment>.
417
+ #
418
+ # @param [Nokogiri::XML::Node] n node for which to determine a hierarchical path
419
+ #
420
+ # @return [String] hierarchical path of the node, e.g. html:body:div(3):h2:b(2)
421
+ #
422
+ def nodePath(n)
423
+ name = n.name
424
+
425
+ # find the index of the node if there are several of the same type
426
+ siblings = n.xpath("../#{name}")
427
+ name += "(#{siblings.index(n) + 1})" if siblings.length > 1
428
+
429
+ if defined? n.parent
430
+ status = "#{nodePath(n.parent)}:#{name}"
431
+ status = status[1..-1] if status[0] == ':'
432
+ status
433
+ end
434
+ end
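
To make the path format concrete without requiring Nokogiri, here is a minimal sketch of the same idea on a plain Ruby tree. `SketchNode` and `sketch_node_path` are hypothetical names introduced only for illustration. The real method works on `Nokogiri::XML::Node` and finds same-named siblings with XPath; this version selects them from the parent's children instead.

```ruby
# Hypothetical stand-in for a parsed node: a name, a parent, and children.
SketchNode = Struct.new(:name, :parent, :children) do
  # Appends a child with the given name and returns it.
  def add(child_name)
    child = SketchNode.new(child_name, self, [])
    children << child
    child
  end
end

# Builds a colon-separated ancestral path. A 1-based index is appended
# only when the node has same-named siblings.
def sketch_node_path(node)
  label = node.name
  return label if node.parent.nil?

  same_name = node.parent.children.select { |c| c.name == node.name }
  if same_name.length > 1
    position = same_name.index { |c| c.equal?(node) } + 1
    label = "#{label}(#{position})"
  end
  "#{sketch_node_path(node.parent)}:#{label}"
end

# Rebuild the tree from the comment above: the target is the 2nd <b>
# inside the <h2> of the 3rd <div>.
html = SketchNode.new('html', nil, [])
body = html.add('body')
body.add('div')
body.add('div')
h2 = body.add('div').add('h2')
h2.add('b')
target = h2.add('b')

puts sketch_node_path(target)  # prints html:body:div(3):h2:b(2)
```

As in the original, elements with no same-named siblings (`body`, `h2`) get no index.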
435
+
436
+
437
+ ##
438
+ # Strips the whitespace (from beginning and end) and squeezes it,
439
+ # i.e. multiple spaces, new lines and tabs are all squeezed to a single space.
440
+ #
441
+ # @param [String] text string to squeeze
442
+ #
443
+ # @return squeezed string
444
+ #
445
+ def squeeze(text)
446
+ text = text.to_s unless text.is_a? String
447
+ text.strip.gsub(/\s+/, ' ')
448
+ end
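
A short standalone illustration of the normalization described above. `squeeze_sketch` is a hypothetical copy of the method body, written so it can run outside the module. The empty-string case is the one `nodeExcluded?` relies on to drop whitespace-only text nodes.

```ruby
# Hypothetical standalone copy of the squeeze logic, for illustration only.
def squeeze_sketch(text)
  # to_s lets non-String input (nil, numbers) pass through safely.
  text.to_s.strip.gsub(/\s+/, ' ')
end

squeeze_sketch("  Hello \n\t  world  ")  # => "Hello world"
squeeze_sketch(" \n\t ")                 # => "" (whitespace-only text)
squeeze_sketch(nil)                      # => ""
squeeze_sketch(42)                       # => "42"
```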
449
+
450
+ end
451
+
452
+ end
@@ -0,0 +1,3 @@
1
+ module CompareXML
2
+ VERSION = '0.5.1'
3
+ end
metadata ADDED
@@ -0,0 +1,99 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: compare-xml
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.5.1
5
+ platform: ruby
6
+ authors:
7
+ - Vadim Kononov
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2016-04-05 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: bundler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.11'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.11'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '11.1'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '11.1'
41
+ - !ruby/object:Gem::Dependency
42
+ name: nokogiri
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '1.6'
48
+ type: :runtime
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '1.6'
55
+ description: CompareXML is a fast, lightweight and feature-rich tool that will solve
56
+ your XML/HTML comparison or diffing needs. Its purpose is to compare two instances
57
+ of Nokogiri::XML::Node or Nokogiri::XML::NodeSet for equality or equivalency.
58
+ email:
59
+ - vadim@poetic.com
60
+ executables: []
61
+ extensions: []
62
+ extra_rdoc_files: []
63
+ files:
64
+ - ".gitignore"
65
+ - Gemfile
66
+ - LICENSE.txt
67
+ - README.md
68
+ - Rakefile
69
+ - bin/console
70
+ - bin/setup
71
+ - compare-xml.gemspec
72
+ - lib/compare-xml.rb
73
+ - lib/compare-xml/version.rb
74
+ homepage: https://github.com/vkononov/compare-xml-xml
75
+ licenses:
76
+ - MIT
77
+ metadata: {}
78
+ post_install_message:
79
+ rdoc_options: []
80
+ require_paths:
81
+ - lib
82
+ required_ruby_version: !ruby/object:Gem::Requirement
83
+ requirements:
84
+ - - ">="
85
+ - !ruby/object:Gem::Version
86
+ version: '0'
87
+ required_rubygems_version: !ruby/object:Gem::Requirement
88
+ requirements:
89
+ - - ">="
90
+ - !ruby/object:Gem::Version
91
+ version: '0'
92
+ requirements: []
93
+ rubyforge_project:
94
+ rubygems_version: 2.5.2
95
+ signing_key:
96
+ specification_version: 4
97
+ summary: A customizable tool that compares two instances of Nokogiri::XML::Node for
98
+ equality or equivalency.
99
+ test_files: []