compare-xml 0.5.1

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: eb7ad2fa6ba45154479d1129ad6ce84337b331fa
4
+ data.tar.gz: eb3a5b0f7caedccb403c1a5504baa80b4e5ff7a8
5
+ SHA512:
6
+ metadata.gz: d46798f576e812ad39b3604c8b41f01d531f6490ba1690facb1cd978c200139d8b1e58a17b98a2d47fbbd76cc011487f9850bb68f5a5e28f5d3156d41a7c9f7c
7
+ data.tar.gz: 91993f87fca6eb40ec302bb598a88b85fa36e1876e1c9ef88396b2f52360e8bed89597400d9803ceaa7a455e8d5930750a9239d4610f607efe46f01d03c5dc31
@@ -0,0 +1,13 @@
1
+ *.DS_Store
2
+ *thumbs.db
3
+ /*.gem
4
+ /.bundle/
5
+ /.idea/
6
+ /.yardoc
7
+ /_yardoc/
8
+ /coverage/
9
+ /doc/
10
+ /Gemfile.lock
11
+ /pkg/
12
+ /spec/reports/
13
+ /tmp/
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in compare-xml.gemspec
4
+ gemspec
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2016 Vadim Kononov
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
@@ -0,0 +1,358 @@
1
+ # CompareXML
2
+
3
+ [![Gem Version](https://badge.fury.io/rb/compare-xml.svg)](https://rubygems.org/gems/compare-xml)
4
+
5
+ CompareXML is a fast, lightweight and feature-rich tool that will solve your XML/HTML comparison or diffing needs. Its purpose is to compare two instances of `Nokogiri::XML::Node` or `Nokogiri::XML::NodeSet` for equality or equivalency.
6
+
7
+ **Features**
8
+
9
+ - Fast, light-weight and highly customizable
10
+ - Compares XML/HTML documents and document fragments
11
+ - Can either report detailed discrepancies or run silently
12
+ - Has the ability to exclude specific nodes or attributes from all comparisons
13
+
14
+
15
+
16
+ ## Installation
17
+
18
+ Add this line to your application's Gemfile:
19
+
20
+ ```ruby
21
+ gem 'compare-xml'
22
+ ```
23
+
24
+ And then execute:
25
+
26
+ $ bundle
27
+
28
+ Or install it yourself as:
29
+
30
+ $ gem install compare-xml
31
+
32
+
33
+
34
+ ## Usage
35
+
36
+ Using CompareXML is as simple as
37
+
38
+ ```ruby
39
+ CompareXML.equivalent?(doc1, doc2)
40
+ ```
41
+
42
+ where `doc1` and `doc2` are instances of `Nokogiri::XML::Node` or `Nokogiri::XML::NodeSet`.
43
+
44
+ **Example**
45
+
46
+ Suppose you have two files `1.html` and `2.html` that you would like to compare. You could do it as follows:
47
+
48
+ ```ruby
49
+ doc1 = Nokogiri::HTML(open('1.html'))
50
+ doc2 = Nokogiri::HTML(open('2.html'))
51
+ puts CompareXML.equivalent?(doc1, doc2)
52
+ ```
53
+
54
+ The above code will print `true` or `false` depending on the result of the comparison.
55
+
56
+ > If you are using CompareXML in a script, then you need to require it manually with:
57
+
58
+ ```ruby
59
+ require 'compare-xml'
60
+ ```
61
+
62
+
63
+ ## Options
64
+
65
+ CompareXML has a variety of options that can be invoked as an optional argument, e.g.:
66
+
67
+ ```ruby
68
+ CompareXML.equivalent?(doc1, doc2, {squeeze_whitespace: true, verbose: true})
69
+ ```
70
+
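As a rough illustration of how the optional hash interacts with the defaults, the sketch below layers user-supplied options over a defaults hash with `Hash#merge`. The `DEFAULTS` constant and the `effective_options` helper are hypothetical names used only for this example; the values mirror the defaults documented in this README.

```ruby
# Hypothetical stand-in for the gem's default options (values as documented in the README).
DEFAULTS = {
  ignore_attr_order: true,
  ignore_attrs: {},
  ignore_comments: true,
  ignore_nodes: {},
  ignore_text_nodes: false,
  squeeze_whitespace: true,
  verbose: false
}.freeze

# Returns a new hash in which user-supplied keys override the defaults.
# Hash#merge does not mutate the receiver, so DEFAULTS stays untouched.
def effective_options(user_opts = {})
  DEFAULTS.merge(user_opts)
end

opts = effective_options(squeeze_whitespace: false, verbose: true)
puts opts[:verbose]            # overridden by the caller
puts opts[:ignore_comments]    # falls back to the default
```

Any option that is not passed explicitly keeps its default value, so callers only need to name the settings they want to change.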
71
+
72
+ ----------
73
+
74
+
75
+ - ####`ignore_attr_order: {true|false}` default: **`true`**
76
+
77
+ When `true`, all attributes are sorted by name before comparison, so only attributes with the same name are compared with each other.
78
+
79
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_attr_order: true})`
80
+
81
+ **Example:** When `true` the following HTML strings are considered equal:
82
+
83
+ <a href="/admin" class="button" target="_blank">Link</a>
84
+ <a class="button" target="_blank" href="/admin">Link</a>
85
+
86
+ **Example:** When `false` the above HTML strings are compared as follows:
87
+
88
+ href="/admin" != class="button"
89
+
90
+ The comparison of the `<a>` element will stop at this point, since a discrepancy is found.
91
+
92
+ **Example:** When `true` the following HTML strings are compared as follows:
93
+
94
+ <a href="/admin" class="button" target="_blank">Link</a>
95
+ <a class="button" target="_blank" href="/admin" rel="nofollow">Link</a>
96
+
97
+ class="button" == class="button"
98
+ href="/admin" == href="/admin"
99
+ != rel="nofollow"
100
+ target="_blank" == target="_blank"
101
+
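The order-insensitive comparison described above can be pictured as a merge over two name-sorted lists. The following is a minimal sketch in plain Ruby, assuming attributes are represented as `[name, value]` pairs rather than Nokogiri attribute objects; `compare_sorted_attrs` is a hypothetical helper, not part of the gem's API.

```ruby
# Walks two attribute lists in name order, pairing attributes that share a
# name and reporting the ones that have no counterpart on the other side.
def compare_sorted_attrs(left, right)
  left  = left.sort_by { |name, _value| name }
  right = right.sort_by { |name, _value| name }
  i = j = 0
  report = []

  while i < left.length || j < right.length
    l = left[i]
    r = right[j]
    if l.nil? || (r && l[0] > r[0])
      # the right attribute has no counterpart on the left
      report << "missing on left: #{r[0]}=\"#{r[1]}\""
      j += 1
    elsif r.nil? || l[0] < r[0]
      # the left attribute has no counterpart on the right
      report << "missing on right: #{l[0]}=\"#{l[1]}\""
      i += 1
    else
      # same name on both sides: compare the values
      op = l[1] == r[1] ? '==' : '!='
      report << "#{l[0]}=\"#{l[1]}\" #{op} #{r[0]}=\"#{r[1]}\""
      i += 1
      j += 1
    end
  end
  report
end

left  = [%w[href /admin], %w[class button], %w[target _blank]]
right = [%w[class button], %w[target _blank], %w[href /admin], %w[rel nofollow]]
puts compare_sorted_attrs(left, right)
```

Because both lists are sorted first, a missing attribute only shifts one cursor, and the remaining attributes still line up with their counterparts.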
102
+
103
+ ----------
104
+
105
+
106
+ - ####`ignore_attrs: {css}` default: **`{}`**
107
+
108
+ When provided, ignores all **attributes** that satisfy a particular rule using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp).
109
+
110
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_attrs: ['a[rel="nofollow"]', 'input[type="hidden"]']})`
111
+
112
+ **Example:** With `ignore_attrs: ['a[rel="nofollow"]', 'a[target]']` the following HTML strings are considered equal:
113
+
114
+ <a href="/admin" class="button" target="_blank">Link</a>
115
+ <a href="/admin" class="button" target="_self" rel="nofollow">Link</a>
116
+
117
+ **Example:** With `ignore_attrs: ['a[href^="http"]', 'a[class*="button"]']` the following HTML strings are considered equal:
118
+
119
+ <a href="http://google.ca" class="primary button">Link</a>
120
+ <a href="https://google.com" class="primary button rounded">Link</a>
121
+
122
+
123
+ ----------
124
+
125
+
126
+ - ####`ignore_comments: {true|false}` default: **`true`**
127
+
128
+ When `true`, ignores comments, such as `<!-- This is a comment -->`.
129
+
130
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_comments: true})`
131
+
132
+ **Example:** When `true` the following HTML strings are considered equal:
133
+
134
+ <!-- This is a comment -->
135
+ <!-- This is another comment -->
136
+
137
+ **Example:** When `true` the following HTML strings are considered equal:
138
+
139
+ <a href="/admin"><!-- This is a comment -->Link</a>
140
+ <a href="/admin">Link</a>
141
+
142
+
143
+ ----------
144
+
145
+
146
+ - ####`ignore_nodes: {css}` default: **`{}`**
147
+
148
+ When provided, ignores all **nodes** that satisfy a particular rule using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp).
149
+
150
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_nodes: ['script', 'object']})`
151
+
152
+ **Example:** With `ignore_nodes: ['a[rel="nofollow"]', 'a[target]']` the following HTML strings are considered equal:
153
+
154
+ <a href="/admin" class="icon" target="_blank">Link 1</a>
155
+ <a href="/index" class="button" target="_self" rel="nofollow">Link 2</a>
156
+
157
+ **Example:** With `ignore_nodes: ['b', 'i']` the following HTML strings are considered equal:
158
+
159
+ <a href="/admin"><i class="icon bulb"></i><b>Warning:</b> Link</a>
160
+ <a href="/admin"><i class="icon info"></i><b>Message:</b> Link</a>
161
+
162
+
163
+ ----------
164
+
165
+
166
+ - ####`ignore_text_nodes: {true|false}` default: **`false`**
167
+
168
+ When `true`, ignores all text content. Text content is anything that is included between an opening and a closing tag, e.g. `<tag>THIS IS TEXT CONTENT</tag>`.
169
+
170
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_text_nodes: true})`
171
+
172
+ **Example:** When `true` the following HTML strings are considered equal:
173
+
174
+ <a href="/admin">SOME TEXT CONTENT</a>
175
+ <a href="/admin">DIFFERENT TEXT CONTENT</a>
176
+
177
+ **Example:** When `true` the following HTML strings are considered equal:
178
+
179
+ <i class="icon"></i> <b>Warning:</b>
180
+ <i class="icon"> </i> <b>Message:</b>
181
+
182
+
183
+ ----------
184
+
185
+
186
+ - ####`squeeze_whitespace: {true|false}` default: **`true`**
187
+
188
+ When `true`, all text content within the document is trimmed (i.e. space removed from left and right) and whitespace is squeezed (i.e. tabs, new lines, multiple whitespaces are all replaced by a single whitespace).
189
+
190
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {squeeze_whitespace: true})`
191
+
192
+ **Example:** When `true` the following HTML strings are considered equal:
193
+
194
+ <a href="/admin"> SOME TEXT CONTENT </a>
195
+ <a href="/admin"> SOME TEXT CONTENT </a>
196
+
197
+ **Example:** When `true` the following HTML strings are considered equal:
198
+
199
+ <html>
200
+ <title>
201
+ This is my title
202
+ </title>
203
+ </html>
204
+
205
+ <html><title>This is my title</title></html>
206
+
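The trimming and squeezing described above amounts to a strip followed by a whitespace collapse. Below is a minimal sketch in plain Ruby, modelled on the private `squeeze` helper in the gem's source; it is shown standalone here only for illustration.

```ruby
# Strips leading/trailing whitespace and collapses every run of spaces,
# tabs and new lines into a single space.
def squeeze(text)
  text.to_s.strip.gsub(/\s+/, ' ')
end

multiline = "\n    This is   my\ttitle\n  "
puts squeeze(multiline) == squeeze('This is my title')
```

With this normalization applied to both sides, the indented and the single-line `<title>` variants yield the same text and therefore compare as equal.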
207
+
208
+ ----------
209
+
210
+
211
+ - ####`verbose: {true|false}` default: **`false`**
212
+
213
+ When `true`, instead of returning a boolean value `CompareXML.equivalent?` returns an array of all errors encountered when performing a comparison.
214
+
215
+ > **Warning:** When `true`, the comparison takes longer! Not only because more processing is required to produce meaningful error messages, but also because in this mode, comparison does **NOT** stop when a first error is encountered, because the goal is to capture as many discrepancies as possible.
216
+
217
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {verbose: true})`
218
+
219
+ **Example:** When `true` given the following HTML strings:
220
+
221
+ <!DOCTYPE html>
222
+ <html lang="en">
223
+ <head><title>TITLE</title></head>
224
+ <body>
225
+ <h1>SOME HEADING</h1>
226
+ <div id="content">
227
+ <h2><i class="fa fa-cogs"></i> ANOTHER HEADING</h2>
228
+ <p>Extra content</p>
229
+ </div>
230
+ <div class="window">
231
+ <a href="/admin" rel="icon">Link</a>
232
+ </div>
233
+ <blockquote>Some fancy quote <cite>Author Name</cite></blockquote>
234
+ <p>Some more text</p>
235
+ <p>Yet more text</p>
236
+ <p>Too much text</p>
237
+ <!-- The footer is below -->
238
+ <p class="footer">FOOTER</p>
239
+ </body>
240
+ </html>
241
+
242
+ <!DOCTYPE html>
243
+ <html lang="en">
244
+ <head><title>ANOTHER TITLE</title></head>
245
+ <body>
246
+ <h1 id="main">SOME HEADING</h1>
247
+ <div id="content">
248
+ <h2><i class="fa fa-cogs"></i> ANOTHER HEADING</h2>
249
+ <p>Extra content</p>
250
+ </div>
251
+ <div class="window">
252
+ <a rel="button" href="/admin">Link</a>
253
+ </div>
254
+ <blockquote>Some fancy quote</blockquote>
255
+ <p>Some more text</p>
256
+ <p>Yet more text</p>
257
+ <p>Too much text</p>
258
+ <!-- This is the footer -->
259
+ <div class="footer">FOOTER</div>
260
+ </body>
261
+ </html>
262
+
263
+ `CompareXML.equivalent?(doc1, doc2, {verbose: true})` will produce an array shown below.
264
+
265
+ [
266
+ "html:head:title",
267
+ "TITLE",
268
+ 9,
269
+ "ANOTHER TITLE",
270
+ "html:head:title"
271
+ ],
272
+ [
273
+ "html:body:h1",
274
+ nil,
275
+ 2,
276
+ "id=\"main\"",
277
+ "html:body:h1"
278
+ ],
279
+ [
280
+ "html:body:div(2):a",
281
+ "rel=\"icon\"",
282
+ 4,
283
+ "rel=\"button\"",
284
+ "html:body:div(2):a"
285
+ ],
286
+ [
287
+ "html:body:blockquote:cite",
288
+ "cite",
289
+ 3,
290
+ nil,
291
+ "html:body:blockquote:cite"
292
+ ],
293
+ [
294
+ "html:body:p(4)",
295
+ "p",
296
+ 7,
297
+ "div",
298
+ "html:body:div(3)"
299
+ ]
300
+
301
+ The structure of the array is as follows:
302
+
303
+ [left_node_location, left_content, error_code, right_content, right_node_location]
304
+
305
+ **Node location** of `html:body:p(4)` means that the element in question is `<p>`, its hierarchical ancestors are `html > body`, and it is the **4th** `<p>` tag. That is, it could be found in
306
+
307
+ <html><body><p>one</p>...<p>two</p>...<p>three</p>...<p>TARGET</p></body></html>
308
+
309
+ > **Note:** `p(4)` means that it is the fourth tag of type `<p>`, but there could be many other tags of other types between `p(3)` and `p(4)`.
310
+
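To make the node location format concrete, here is a small sketch that builds such a path for a toy tree. It assumes a hypothetical `Node` struct with `name`, `parent` and `children` members in place of real Nokogiri nodes, and `node_path` is an illustrative helper rather than the gem's own method.

```ruby
# Toy tree node; real code would work on Nokogiri::XML::Node instead.
Node = Struct.new(:name, :parent, :children) do
  # Appends a child with the given tag name and returns it.
  def add(child_name)
    child = self.class.new(child_name, self, [])
    children << child
    child
  end
end

# Builds a path such as "html:body:p(4)". The index is added only when the
# parent holds several children with the same tag name, and it counts only
# those same-named siblings.
def node_path(node)
  segments = []
  while node.parent
    same = node.parent.children.select { |c| c.name == node.name }
    label = node.name
    label += "(#{same.index { |c| c.equal?(node) } + 1})" if same.length > 1
    segments.unshift(label)
    node = node.parent
  end
  segments.join(':')
end

doc  = Node.new('document', nil, [])
body = doc.add('html').add('body')
body.add('p')
div = body.add('div')
body.add('p')
body.add('p')
target = body.add('p')

puts node_path(target)
puts node_path(div)
```

The `<div>` sitting between the paragraphs does not affect the count, which matches the note above: `p(4)` is the fourth `<p>` among its siblings, regardless of other tags in between.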
311
+ **Node content** displays the discrepancy in content (which could be the name of the tag, attributes, text content, comments, etc.).
312
+
313
+ **Error code** is a numeric value that indicates the type of a discrepancy. CompareXML implements the following error codes:
314
+
315
+ ```ruby
316
+ EQUIVALENT = 1 # nodes are equal (for internal use only)
317
+ MISSING_ATTRIBUTE = 2 # attribute is missing its counterpart
318
+ MISSING_NODE = 3 # node is missing its counterpart
319
+ UNEQUAL_ATTRIBUTES = 4 # attributes are not equal
320
+ UNEQUAL_COMMENTS = 5 # comment contents are not equal
321
+ UNEQUAL_DOCUMENTS = 6 # document types are not equal
322
+ UNEQUAL_ELEMENTS = 7 # nodes have the same type but are not equal
323
+ UNEQUAL_NODES_TYPES = 8 # nodes do not have the same type
324
+ UNEQUAL_TEXT_CONTENTS = 9 # text contents are not equal
325
+ ```
326
+
327
+ Here is an example of how these could be used:
328
+
329
+ ```ruby
330
+ case error_code
331
+ when CompareXML::UNEQUAL_ATTRIBUTES
332
+ '!='
333
+ when CompareXML::MISSING_ATTRIBUTE
334
+ '?'
335
+ end
336
+ ```
337
+
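Building on the array structure and error codes above, the sketch below turns one error entry into a readable line. The `ERROR_LABELS` hash and the `describe_error` helper are hypothetical; the numeric codes are copied from the list in this README so that the example runs without the gem installed.

```ruby
# Local copy of the documented error codes, mapped to short labels.
ERROR_LABELS = {
  2 => 'missing attribute',
  3 => 'missing node',
  4 => 'unequal attributes',
  5 => 'unequal comments',
  6 => 'unequal documents',
  7 => 'unequal elements',
  8 => 'unequal node types',
  9 => 'unequal text contents'
}.freeze

# Formats one entry of the verbose result:
# [left_node_location, left_content, error_code, right_content, right_node_location]
def describe_error(error)
  left_path, left, code, right, right_path = error
  label = ERROR_LABELS.fetch(code, "unknown code #{code}")
  where = left_path == right_path ? left_path : "#{left_path} / #{right_path}"
  "#{where} [#{label}] #{left || '(none)'} vs #{right || '(none)'}"
end

errors = [
  ['html:body:h1', nil, 2, 'id="main"', 'html:body:h1'],
  ['html:body:p(4)', 'p', 7, 'div', 'html:body:div(3)']
]
errors.each { |e| puts describe_error(e) }
```

In real use the `errors` array would come from `CompareXML.equivalent?(doc1, doc2, {verbose: true})`, and the labels could be keyed on the `CompareXML::` constants instead of literal numbers.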
338
+
339
+
340
+ ## Contributing
341
+
342
+ 1. Fork it
343
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
344
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
345
+ 4. Push to the branch (`git push origin my-new-feature`)
346
+ 5. Create new Pull Request
347
+
348
+
349
+
350
+ ## Credits
351
+
352
+ This gem was inspired by [Michael B. Klein](https://github.com/mbklein)'s gem [`equivalent-xml`](https://github.com/mbklein/equivalent-xml) - another excellent tool for XML comparison.
353
+
354
+
355
+
356
+ ## License
357
+
358
+ The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
@@ -0,0 +1,2 @@
1
+ require 'bundler/gem_tasks'
2
+ task :default => :spec
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'bundler/setup'
4
+ require 'compare-xml'
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require 'pry'
11
+ # Pry.start
12
+
13
+ require 'irb'
14
+ IRB.start
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
@@ -0,0 +1,25 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'compare-xml/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = 'compare-xml'
8
+ spec.version = CompareXML::VERSION
9
+ spec.authors = ['Vadim Kononov']
10
+ spec.email = ['vadim@poetic.com']
11
+
12
+ spec.summary = %q{A customizable tool that compares two instances of Nokogiri::XML::Node for equality or equivalency.}
13
+ spec.description = %q{CompareXML is a fast, lightweight and feature-rich tool that will solve your XML/HTML comparison or diffing needs. Its purpose is to compare two instances of Nokogiri::XML::Node or Nokogiri::XML::NodeSet for equality or equivalency.}
14
+ spec.homepage = 'https://github.com/vkononov/compare-xml-xml'
15
+ spec.license = 'MIT'
16
+
17
+ spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
18
+ spec.bindir = 'exe'
19
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
20
+ spec.require_paths = ['lib']
21
+
22
+ spec.add_development_dependency 'bundler', '~> 1.11'
23
+ spec.add_development_dependency 'rake', '~> 11.1'
24
+ spec.add_runtime_dependency 'nokogiri', '~> 1.6'
25
+ end
@@ -0,0 +1,452 @@
1
+ require 'compare-xml/version'
2
+ require 'nokogiri'
3
+
4
+ module CompareXML
5
+
6
+ # default options used by the module; all of these can be overridden
7
+ DEFAULTS_OPTS = {
8
+ # when true, attribute order is not important (all attributes are sorted before comparison)
9
+ # when false, attributes are compared in order and comparison stops on the first mismatch
10
+ ignore_attr_order: true,
11
+
12
+ # contains an array of user-specified CSS rules used to perform attribute exclusions
13
+ # for this to work, a CSS rule MUST contain the attribute to be excluded,
14
+ # i.e. a[href] will exclude all "href" attributes contained in <a> tags.
15
+ ignore_attrs: {},
16
+
17
+ # when true ignores XML and HTML comments
18
+ # when false, all comments are compared to their counterparts
19
+ ignore_comments: true,
20
+
21
+ # contains an array of user-specified CSS rules used to perform node exclusions
22
+ ignore_nodes: {},
23
+
24
+ # when true, ignores all text nodes (although blank text nodes are always ignored)
25
+ # when false, all text nodes are compared to their counterparts (except the empty ones)
26
+ ignore_text_nodes: false,
27
+
28
+ # when true, trims and squeezes whitespace in text nodes and comments to a single space
29
+ # when false, all whitespace is preserved as it is without any changes
30
+ squeeze_whitespace: true,
31
+
32
+ # when true, provides a list of all error messages encountered in comparisons
33
+ # when false, execution stops when the first error is encountered with no error messages
34
+ verbose: false
35
+ }
36
+
37
+ # used internally only in order to differentiate equivalence from inequivalence
38
+ EQUIVALENT = 1
39
+
40
+ # a list of all possible inequivalence types for nodes
41
+ # these are returned in the errors array to differentiate error types.
42
+ MISSING_ATTRIBUTE = 2 # attribute is missing its counterpart
43
+ MISSING_NODE = 3 # node is missing its counterpart
44
+ UNEQUAL_ATTRIBUTES = 4 # attributes are not equal
45
+ UNEQUAL_COMMENTS = 5 # comment contents are not equal
46
+ UNEQUAL_DOCUMENTS = 6 # document types are not equal
47
+ UNEQUAL_ELEMENTS = 7 # nodes have the same type but are not equal
48
+ UNEQUAL_NODES_TYPES = 8 # nodes do not have the same type
49
+ UNEQUAL_TEXT_CONTENTS = 9 # text contents are not equal
50
+
51
+
52
+ class << self
53
+
54
+ ##
55
+ # Determines whether two XML documents or fragments are equal to each other.
56
+ # The two parameters could be any type of XML documents, or fragments
57
+ # or node sets or even text nodes - any subclass of Nokogiri::XML::Node.
58
+ #
59
+ # @param [Nokogiri::XML::Node] n1 left node
60
+ # @param [Nokogiri::XML::Node] n2 right node
61
+ # @param [Hash] opts user-overridden options
62
+ #
63
+ # @return [Boolean] true if equivalent, false otherwise; in verbose mode, an [Array] of errors (empty when equivalent)
64
+ #
65
+ def equivalent?(n1, n2, opts = {})
66
+ opts, errors = DEFAULTS_OPTS.merge(opts), []
67
+ result = compareNodes(n1, n2, opts, errors)
68
+ opts[:verbose] ? errors : result == EQUIVALENT
69
+ end
70
+
71
+
72
+ private
73
+
74
+ ##
75
+ # Compares two nodes for equivalence. The nodes could be any subclass
76
+ # of Nokogiri::XML::Node including node sets and document fragments.
77
+ #
78
+ # @param [Nokogiri::XML::Node] n1 left attribute
79
+ # @param [Nokogiri::XML::Node] n2 right attribute
80
+ # @param [Hash] opts user-overridden options
81
+ # @param [Array] errors inequivalence messages
82
+ #
83
+ # @return type of equivalence (from equivalence constants)
84
+ #
85
+ def compareNodes(n1, n2, opts, errors, status = EQUIVALENT)
86
+ if n1.class == n2.class
87
+ case n1
88
+ when Nokogiri::XML::Comment
89
+ status = compareCommentNodes(n1, n2, opts, errors)
90
+ when Nokogiri::HTML::Document
91
+ status = compareDocumentNodes(n1, n2, opts, errors)
92
+ when Nokogiri::XML::Element
93
+ status = compareElementNodes(n1, n2, opts, errors)
94
+ when Nokogiri::XML::Text
95
+ status = compareTextNodes(n1, n2, opts, errors)
96
+ else
97
+ status = compareChildren(n1.children, n2.children, opts, errors)
98
+ end
99
+ elsif n1.nil?
100
+ status = MISSING_NODE
101
+ errors << [nodePath(n2), nil, status, n2.name, nodePath(n2)] if opts[:verbose]
102
+ elsif n2.nil?
103
+ status = MISSING_NODE
104
+ errors << [nodePath(n1), n1.name, status, nil, nodePath(n1)] if opts[:verbose]
105
+ else
106
+ status = UNEQUAL_NODES_TYPES
107
+ errors << [nodePath(n1), n1.class, status, n2.class, nodePath(n2)] if opts[:verbose]
108
+ end
109
+ status
110
+ end
111
+
112
+
113
+ ##
114
+ # Compares two nodes of type Nokogiri::XML::Comment.
115
+ #
116
+ # @param [Nokogiri::XML::Comment] n1 left attribute
117
+ # @param [Nokogiri::XML::Comment] n2 right attribute
118
+ # @param [Hash] opts user-overridden options
119
+ # @param [Array] errors inequivalence messages
120
+ #
121
+ # @return type of equivalence (from equivalence constants)
122
+ #
123
+ def compareCommentNodes(n1, n2, opts, errors, status = EQUIVALENT)
124
+ return status if opts[:ignore_comments]
125
+ t1, t2 = n1.content, n2.content
126
+ t1, t2 = squeeze(t1), squeeze(t2) if opts[:squeeze_whitespace]
127
+ unless t1 == t2
128
+ status = UNEQUAL_COMMENTS
129
+ errors << [nodePath(n1.parent), t1, status, t2, nodePath(n2.parent)] if opts[:verbose]
130
+ end
131
+ status
132
+ end
133
+
134
+
135
+ ##
136
+ # Compares two nodes of type Nokogiri::HTML::Document.
137
+ #
138
+ # @param [Nokogiri::XML::Document] n1 left attribute
139
+ # @param [Nokogiri::XML::Document] n2 right attribute
140
+ # @param [Hash] opts user-overridden options
141
+ # @param [Array] errors inequivalence messages
142
+ #
143
+ # @return type of equivalence (from equivalence constants)
144
+ #
145
+ def compareDocumentNodes(n1, n2, opts, errors, status = EQUIVALENT)
146
+ if n1.name == n2.name
147
+ status = compareChildren(n1.children, n2.children, opts, errors)
148
+ else
149
+ status = UNEQUAL_DOCUMENTS
150
+ errors << [nodePath(n1), n1, status, n2, nodePath(n2)] if opts[:verbose]
151
+ end
152
+ status
153
+ end
154
+
155
+
156
+ ##
157
+ # Compares two sets of Nokogiri::XML::NodeSet elements.
158
+ #
159
+ # @param [Nokogiri::XML::NodeSet] n1_set left set of Nokogiri::XML::Node elements
160
+ # @param [Nokogiri::XML::NodeSet] n2_set right set of Nokogiri::XML::Node elements
161
+ # @param [Hash] opts user-overridden options
162
+ # @param [Array] errors inequivalence messages
163
+ #
164
+ # @return type of equivalence (from equivalence constants)
165
+ #
166
+ def compareChildren(n1_set, n2_set, opts, errors, status = EQUIVALENT)
167
+ i = 0; j = 0
168
+ while i < n1_set.length || j < n2_set.length
169
+ if !n1_set[i].nil? && nodeExcluded?(n1_set[i], opts)
170
+ i += 1 # increment counter if left node is excluded
171
+ elsif !n2_set[j].nil? && nodeExcluded?(n2_set[j], opts)
172
+ j += 1 # increment counter if right node is excluded
173
+ else
174
+ result = compareNodes(n1_set[i], n2_set[j], opts, errors)
175
+ status = result unless result == EQUIVALENT
176
+
177
+ # return false so that this subtree could halt comparison on error
178
+ # but neighbours of parents' subtrees could still be compared (in verbose mode)
179
+ return false if status == UNEQUAL_NODES_TYPES || status == UNEQUAL_ELEMENTS
180
+
181
+ # stop execution if a single error is found (unless in verbose mode)
182
+ break unless status == EQUIVALENT || opts[:verbose]
183
+
184
+ # increment both counters when both nodes have been compared
185
+ i += 1; j += 1
186
+ end
187
+ end
188
+ status
189
+ end
190
+
191
+
192
+ ##
193
+ # Compares two nodes of type Nokogiri::XML::Element.
194
+ # - compares element attributes
195
+ # - recursively compares element children
196
+ #
197
+ # @param [Nokogiri::XML::Element] n1 left attribute
198
+ # @param [Nokogiri::XML::Element] n2 right attribute
199
+ # @param [Hash] opts user-overridden options
200
+ # @param [Array] errors inequivalence messages
201
+ #
202
+ # @return type of equivalence (from equivalence constants)
203
+ #
204
+ def compareElementNodes(n1, n2, opts, errors, status = EQUIVALENT)
205
+ if n1.name == n2.name
206
+ result = compareAttributeSets(n1.attribute_nodes, n2.attribute_nodes, opts, errors)
207
+ status = result unless result == EQUIVALENT
208
+ result = compareChildren(n1.children, n2.children, opts, errors)
209
+ status = result unless result == EQUIVALENT
210
+ else
211
+ status = UNEQUAL_ELEMENTS
212
+ errors << [nodePath(n1), n1.name, status, n2.name, nodePath(n2)] if opts[:verbose]
213
+ end
214
+ status
215
+ end
216
+
217
+
218
+ ##
219
+ # Compares two nodes of type Nokogiri::XML::Text.
220
+ #
221
+ # @param [Nokogiri::XML::Text] n1 left attribute
222
+ # @param [Nokogiri::XML::Text] n2 right attribute
223
+ # @param [Hash] opts user-overridden options
224
+ # @param [Array] errors inequivalence messages
225
+ #
226
+ # @return type of equivalence (from equivalence constants)
227
+ #
228
+ def compareTextNodes(n1, n2, opts, errors, status = EQUIVALENT)
229
+ return status if opts[:ignore_text_nodes]
230
+ t1, t2 = n1.content, n2.content
231
+ t1, t2 = squeeze(t1), squeeze(t2) if opts[:squeeze_whitespace]
232
+ unless t1 == t2
233
+ status = UNEQUAL_TEXT_CONTENTS
234
+ errors << [nodePath(n1.parent), t1, status, t2, nodePath(n2.parent)] if opts[:verbose]
235
+ end
236
+ status
237
+ end
238
+
239
+
240
+ ##
241
+ # Compares two sets of Nokogiri::XML::Node attributes.
242
+ #
243
+ # @param [Array] a1_set left attribute set
244
+ # @param [Array] a2_set right attribute set
245
+ # @param [Hash] opts user-overridden options
246
+ # @param [Array] errors inequivalence messages
247
+ #
248
+ # @return type of equivalence (from equivalence constants)
249
+ #
250
+ def compareAttributeSets(a1_set, a2_set, opts, errors)
251
+ return false unless a1_set.length == a2_set.length || opts[:verbose]
252
+ if opts[:ignore_attr_order]
253
+ compareSortedAttributeSets(a1_set, a2_set, opts, errors)
254
+ else
255
+ compareUnsortedAttributeSets(a1_set, a2_set, opts, errors)
256
+ end
257
+ end
258
+
259
+
260
+ ##
261
+ # Compares two sets of Nokogiri::XML::Node attributes by sorting them first.
262
+ # When the attributes are sorted, only attributes of the same type are compared
263
+ # to each other, and missing attributes can be easily detected.
264
+ #
265
+ # @param [Array] a1_set left attribute set
266
+ # @param [Array] a2_set right attribute set
267
+ # @param [Hash] opts user-overridden options
268
+ # @param [Array] errors inequivalence messages
269
+ #
270
+ # @return type of equivalence (from equivalence constants)
271
+ #
272
+ def compareSortedAttributeSets(a1_set, a2_set, opts, errors, status = EQUIVALENT)
273
+ a1_set, a2_set = a1_set.sort_by { |a| a.name }, a2_set.sort_by { |a| a.name }
274
+ i = j = 0
275
+
276
+ while i < a1_set.length || j < a2_set.length
277
+ if a1_set[i].nil?
278
+ result = compareAttributes(nil, a2_set[j], opts, errors); j += 1
279
+ elsif a2_set[j].nil?
280
+ result = compareAttributes(a1_set[i], nil, opts, errors); i += 1
281
+ elsif a1_set[i].name < a2_set[j].name
282
+ result = compareAttributes(a1_set[i], nil, opts, errors); i += 1
283
+ elsif a1_set[i].name > a2_set[j].name
284
+ result = compareAttributes(nil, a2_set[j], opts, errors); j += 1
285
+ else
286
+ result = compareAttributes(a1_set[i], a2_set[j], opts, errors); i += 1; j += 1
287
+ end
288
+ status = result unless result == EQUIVALENT
289
+ break unless status == EQUIVALENT || opts[:verbose]
290
+ end
291
+ status
292
+ end
293
+
294
+
295
+ ##
296
+ # Compares two sets of Nokogiri::XML::Node attributes without sorting them.
297
+ # As a result attributes of different types may be compared, and even if all
298
+ # attributes are identical in both sets, if their order is different,
299
+ # the comparison will stop as soon as two unequal attributes are found.
300
+ #
301
+ # @param [Array] a1_set left attribute set
302
+ # @param [Array] a2_set right attribute set
303
+ # @param [Hash] opts user-overridden options
304
+ # @param [Array] errors inequivalence messages
305
+ #
306
+ # @return type of equivalence (from equivalence constants)
307
+ #
308
+ def compareUnsortedAttributeSets(a1_set, a2_set, opts, errors, status = EQUIVALENT)
309
+ [a1_set.length, a2_set.length].max.times do |i|
310
+ result = compareAttributes(a1_set[i], a2_set[i], opts, errors)
311
+ status = result unless result == EQUIVALENT
312
+ break unless status == EQUIVALENT
313
+ end
314
+ status
315
+ end
316
+
317
+
318
+ ##
319
+ # Compares two attributes by name and value.
320
+ #
321
+ # @param [Nokogiri::XML::Attr] a1 left attribute
322
+ # @param [Nokogiri::XML::Attr] a2 right attribute
323
+ # @param [Hash] opts user-overridden options
324
+ # @param [Array] errors inequivalence messages
325
+ #
326
+ # @return type of equivalence (from equivalence constants)
327
+ #
328
+ def compareAttributes(a1, a2, opts, errors, status = EQUIVALENT)
329
+ if a1.nil?
330
+ status = MISSING_ATTRIBUTE
331
+ errors << [nodePath(a2.parent), nil, status, "#{a2.name}=\"#{a2.value}\"", nodePath(a2.parent)] if opts[:verbose]
332
+ elsif a2.nil?
333
+ status = MISSING_ATTRIBUTE
334
+ errors << [nodePath(a1.parent), "#{a1.name}=\"#{a1.value}\"", status, nil, nodePath(a1.parent)] if opts[:verbose]
335
+ elsif a1.name == a2.name
336
+ return status if attrsExcluded?(a1, a2, opts)
337
+ if a1.value != a2.value
338
+ status = UNEQUAL_ATTRIBUTES
339
+ errors << [nodePath(a1.parent), "#{a1.name}=\"#{a1.value}\"", status, "#{a2.name}=\"#{a2.value}\"", nodePath(a2.parent)] if opts[:verbose]
340
+ end
341
+ else
342
+ status = UNEQUAL_ATTRIBUTES
343
+ errors << [nodePath(a1.parent), a1.name, status, a2.name, nodePath(a2.parent)] if opts[:verbose]
344
+ end
345
+ status
346
+ end
347
+
348
+
349
+ ##
350
+ # Determines if a node should be excluded from the comparison. When a node is excluded,
351
+ # it is completely ignored, as if it did not exist.
352
+ #
353
+ # Several types of nodes are considered ignored:
354
+ # - comments (only in +ignore_comments+ mode)
355
+ # - text nodes (only in +ignore_text_nodes+ mode OR when a text node is empty)
356
+ # - nodes that match a user-specified css rule from +ignore_nodes+
357
+ #
358
+ # @param [Nokogiri::XML::Node] n node being tested for exclusion
359
+ # @param [Hash] opts user-overridden options
360
+ #
361
+ # @return true if excluded, false otherwise
362
+ #
363
+ def nodeExcluded?(n, opts)
364
+ return true if n.is_a?(Nokogiri::XML::Comment) && opts[:ignore_comments]
365
+ return true if n.is_a?(Nokogiri::XML::Text) && (opts[:ignore_text_nodes] || squeeze(n.content).empty?)
366
+ opts[:ignore_nodes].each do |css|
367
+ return true if n.xpath('../*').css(css).include?(n)
368
+ end
369
+ false
370
+ end
371
+
372
+
373
+ ##
374
+ # Checks whether two given attributes should be excluded, based on a user-specified css rule.
375
+ # If true, only the specified attributes are ignored; all remaining attributes are still compared.
376
+ # The CSS rule is used to locate the node that contains the attributes to be excluded.
377
+ # The CSS rule MUST contain the name of the attribute to be ignored.
378
+ #
379
+ # @param [Nokogiri::XML::Attr] a1 left attribute
380
+ # @param [Nokogiri::XML::Attr] a2 right attribute
381
+ # @param [Hash] opts user-overridden options
382
+ #
383
+ # @return true if excluded, false otherwise
384
+ #
385
+ def attrsExcluded?(a1, a2, opts)
386
+ opts[:ignore_attrs].each do |css|
387
+ if css.include?(a1.name) && css.include?(a2.name)
388
+ return true if a1.parent.xpath('../*').css(css).include?(a1.parent) && a2.parent.xpath('../*').css(css).include?(a2.parent)
389
+ end
390
+ end
391
+ false
392
+ end
393
+
394
+
395
+ ##
396
+ # Produces the hierarchical ancestral path of a node in the following format: <html:body:div(3):h2:b(2)>.
397
+ # This means that the element is located in:
398
+ #
399
+ # <html>
400
+ # <body>
401
+ # <div>...</div>
402
+ # <div>...</div>
403
+ # <div>
404
+ # <h2>
405
+ # <b>...</b>
406
+ # <b>TARGET</b>
407
+ # </h2>
408
+ # </div>
409
+ # </body>
410
+ # </html>
411
+ #
412
+ # Note that the counts of element locations only apply to elements of the same type. For example, div(3) means
413
+ # that it is the 3rd <div> element in the <body>, but there could be many other elements in between the three
414
+ # <div> elements.
415
+ #
416
+ # When +ignore_comments+ mode is disabled, mismatching comments will show up as <...:comment>.
417
+ #
418
+ # @param [Nokogiri::XML::Node] n node for which to determine a hierarchical path
419
+ #
420
+ # @return [String] hierarchical path of the node, e.g. html:body:div(3):h2:b(2)
421
+ #
422
+ def nodePath(n)
423
+ name = n.name
424
+
425
+ # find the index of the node if there are several of the same type
426
+ siblings = n.xpath("../#{name}")
427
+ name += "(#{siblings.index(n) + 1})" if siblings.length > 1
428
+
429
+ if defined? n.parent
430
+ status = "#{nodePath(n.parent)}:#{name}"
431
+ status = status[1..-1] if status[0] == ':'
432
+ status
433
+ end
434
+ end
435
+
436
+
437
+ ##
438
+ # Strips the whitespace (from beginning and end) and squeezes it,
439
+ # i.e. multiple spaces, new lines and tabs are all squeezed to a single space.
440
+ #
441
+ # @param [String] text string to squeeze
442
+ #
443
+ # @return squeezed string
444
+ #
445
+ def squeeze(text)
446
+ text = text.to_s unless text.is_a? String
447
+ text.strip.gsub(/\s+/, ' ')
448
+ end
449
+
450
+ end
451
+
452
+ end
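The whitespace squeezing and the path format described in the comments above can be tried without Nokogiri. The following is a minimal sketch, not the gem's implementation: SketchNode and sketch_path are hypothetical stand-ins for Nokogiri nodes and nodePath, and only squeeze mirrors the helper shown in the diff.

```ruby
# Illustrative stand-ins only: the gem itself operates on Nokogiri nodes.
# SketchNode and sketch_path are hypothetical names, not part of compare-xml.

# Same normalization as the squeeze helper: trim, then collapse whitespace runs.
def squeeze(text)
  text.to_s.strip.gsub(/\s+/, ' ')
end

# Minimal tree node: a name, a parent link and an ordered list of children.
class SketchNode
  attr_reader :name, :parent, :children

  def initialize(name, parent = nil)
    @name = name
    @parent = parent
    @children = []
  end

  # Appends a child with the given name and returns it.
  def add(child_name)
    child = SketchNode.new(child_name, self)
    @children << child
    child
  end
end

# Builds a path such as "html:body:div(3):h2:b(2)". An index is appended only
# when the node has same-named siblings, and it counts just those siblings.
def sketch_path(node)
  return node.name unless node.parent

  label = node.name
  same = node.parent.children.select { |c| c.name == node.name }
  label += "(#{same.index(node) + 1})" if same.length > 1
  "#{sketch_path(node.parent)}:#{label}"
end

# Rebuild the tree from the nodePath comment, with a <p> between the <div>s
# to show that other element types do not affect the div count.
html = SketchNode.new('html')
body = html.add('body')
body.add('div')
body.add('p')
body.add('div')
div3 = body.add('div')
h2 = div3.add('h2')
h2.add('b')
target = h2.add('b')

puts squeeze("  one \n\t two   three ")   # one two three
puts sketch_path(target)                   # html:body:div(3):h2:b(2)
```

The <p> element sits between the <div> elements on purpose: the third <div> is still reported as div(3), which matches the note that indices count only siblings of the same type.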
@@ -0,0 +1,3 @@
1
+ module CompareXML
2
+ VERSION = '0.5.1'
3
+ end
metadata ADDED
@@ -0,0 +1,99 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: compare-xml
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.5.1
5
+ platform: ruby
6
+ authors:
7
+ - Vadim Kononov
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2016-04-05 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: bundler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.11'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.11'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '11.1'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '11.1'
41
+ - !ruby/object:Gem::Dependency
42
+ name: nokogiri
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '1.6'
48
+ type: :runtime
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '1.6'
55
+ description: CompareXML is a fast, lightweight and feature-rich tool that will solve
56
+ your XML/HTML comparison or diffing needs. Its purpose is to compare two instances
57
+ of Nokogiri::XML::Node or Nokogiri::XML::NodeSet for equality or equivalency.
58
+ email:
59
+ - vadim@poetic.com
60
+ executables: []
61
+ extensions: []
62
+ extra_rdoc_files: []
63
+ files:
64
+ - ".gitignore"
65
+ - Gemfile
66
+ - LICENSE.txt
67
+ - README.md
68
+ - Rakefile
69
+ - bin/console
70
+ - bin/setup
71
+ - compare-xml.gemspec
72
+ - lib/compare-xml.rb
73
+ - lib/compare-xml/version.rb
74
+ homepage: https://github.com/vkononov/compare-xml-xml
75
+ licenses:
76
+ - MIT
77
+ metadata: {}
78
+ post_install_message:
79
+ rdoc_options: []
80
+ require_paths:
81
+ - lib
82
+ required_ruby_version: !ruby/object:Gem::Requirement
83
+ requirements:
84
+ - - ">="
85
+ - !ruby/object:Gem::Version
86
+ version: '0'
87
+ required_rubygems_version: !ruby/object:Gem::Requirement
88
+ requirements:
89
+ - - ">="
90
+ - !ruby/object:Gem::Version
91
+ version: '0'
92
+ requirements: []
93
+ rubyforge_project:
94
+ rubygems_version: 2.5.2
95
+ signing_key:
96
+ specification_version: 4
97
+ summary: A customizable tool that compares two instances of Nokogiri::XML::Node for
98
+ equality or equivalency.
99
+ test_files: []
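The dependency entries above all use the pessimistic "~>" operator. As a rough illustration of what the nokogiri constraint accepts, the sketch below checks a few version strings with RubyGems' own Gem::Requirement class. The sample versions are arbitrary and chosen only for the example; they are not taken from the gem.

```ruby
require 'rubygems'

# "~> 1.6" is the constraint recorded for nokogiri in the metadata above.
# It allows 1.6 and later, but stops before the next major version (2.0).
nokogiri_req = Gem::Requirement.new('~> 1.6')

# Arbitrary sample versions, used only to illustrate the range.
%w[1.5.9 1.6.0 1.6.7 1.9.0 2.0.0].each do |v|
  ok = nokogiri_req.satisfied_by?(Gem::Version.new(v))
  puts "#{v}: #{ok}"
end
```

The same reading applies to the development dependencies: '~> 1.11' for bundler stays below 2.0, and '~> 11.1' for rake stays below 12.0.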