saxy 0.5.2 → 0.6.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +22 -0
- data/README.md +51 -6
- data/lib/saxy/parser.rb +22 -7
- data/lib/saxy/parsing_error.rb +8 -2
- data/lib/saxy/version.rb +1 -1
- data/spec/fixtures/invalid.xml +9 -0
- data/spec/saxy/parser_spec.rb +15 -3
- data/spec/spec_helper.rb +1 -1
- data/spec/{fixtures → support}/io_like.rb +0 -0
- metadata +7 -4
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d90158e708f84a77ddd18b19c881fa528a839f28
+  data.tar.gz: 14e88af956179b23fff98fc80a2dac03cd36efa2
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d1380012654abe0e6c51095220d0be5349666de298294addbd81ae1954842249cea8e055514c3583eaa285834eb5ee9ac181b632647bc30d1fa8ebcfd6d049d9
+  data.tar.gz: 4bb6c121dd6543cb736d02278ee752eba2bf9a2e6105cdaf3bdff7fa617facda2738ba697de4626135a09984f77a100a4d3ccd7082c25a1ea880cf87aee1a127
data/CHANGELOG.md
ADDED
@@ -0,0 +1,22 @@
+# Saxy Changelog
+
+## 0.6.0
+
+* [BREAKING] `Saxy::ParsingError` now inherits from `StandardError`, not `Exception`.
+* [BREAKING] Forced encoding is now an option instead of third argument of `Saxy.parse` method.
+* Added `recovery` and `replace_entities` options that are internally passed to `Nokogiri::XML::SAX::ParserContext`
+* Added `context` method to `Saxy::ParsingError` that holds parser context at the time of error.
+
+## 0.5.2
+
+* Added optional `encoding` argument to `Saxy.parse`
+
+## 0.5.1
+
+* Removed `activesupport` dependency
+
+## 0.5.0
+
+* [BREAKING] Dropped support for ruby 1.9.2 and lower
+* [BREAKING] Yields hashes instead of `OpenStruct`s
+* Added support for `IO`-like objects
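The first breaking change listed for 0.6.0 matters to any caller that relies on a bare `rescue`, because a bare `rescue` in Ruby only catches `StandardError` and its descendants. The sketch below illustrates the difference with two hypothetical stand-in classes (`OldStyleError` and `NewStyleError` are assumptions for illustration, not Saxy's own classes):

```ruby
# Hypothetical stand-ins for an error class before and after the change.
class OldStyleError < Exception; end      # mirrors the pre-0.6.0 parent class
class NewStyleError < StandardError; end  # mirrors the 0.6.0 parent class

# Returns true when a bare `rescue` catches the given error class,
# false when the error slips past it and is only caught by `rescue Exception`.
def caught_by_bare_rescue?(error_class)
  begin
    raise error_class, "boom"
  rescue
    return true
  end
rescue Exception
  false
end

puts caught_by_bare_rescue?(NewStyleError) # prints "true"
puts caught_by_bare_rescue?(OldStyleError) # prints "false"
```

In practice this suggests that code written as `rescue => e` around a parse call would start seeing parsing errors after the upgrade, where previously it would not.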
data/README.md
CHANGED
@@ -5,7 +5,7 @@

 Memory-efficient XML parser. Finds object definitions in XML and translates them into Ruby objects.

-It uses SAX parser under the hood, which means that it doesn't load the whole XML file into memory. It goes once through it and yields objects along the way.
+It uses SAX parser (provided by Nokogiri gem) under the hood, which means that it doesn't load the whole XML file into memory. It goes once through it and yields objects along the way.

 In result the memory footprint of the parser remains small and more or less constant irrespective of the size of the XML file, be it few KB or hundreds of GB.

@@ -23,24 +23,51 @@ Or install it yourself as:

     $ gem install saxy

+## Requirements
+
 As of `0.5.0` version `saxy` requires ruby 1.9.3 or higher. Previous versions of the gem work with ruby 1.8 and 1.9.2 (see below), but they are not maintained anymore.

-
+### Ruby 1.8 support

 See `ruby-1.8` branch. Install with:

     gem 'saxy', '~> 0.3.0'

-
+### Ruby 1.9.2 support

 See `ruby-1.9.2` branch. Install with:

     gem 'saxy', '~> 0.4.0'

+## Changelog
+
+See `CHANGELOG.md` file.

 ## Usage

-
+You instantiate the parser by passing path to XML file or an IO-like object, object-identifying tag name and options hash (optionally) as its arguments.
+
+```ruby
+parser = Saxy.parse(path_or_io, object_tag, options = {})
+```
+
+Then iterate over it using `each` (or any of convenient methods provided by `Enumerable` mix-in).
+
+```ruby
+parser.each do |object|
+  ...
+end
+```
+
+### Options
+
+* `encoding` - Forces the parser to work in given encoding
+* `recovery` - Should this parser recover from structural errors? It will not stop processing file on structural errors if set to `true`.
+* `replace_entities` - Should this parser replace entities? `&amp;` will get converted to `&` if set to `true`.
+
+## Example
+
+Assume the XML file (an imaginary product feed):

 ````xml
 <?xml version='1.0' encoding='UTF-8'?>
@@ -63,8 +90,6 @@ Assume the XML file:
 </webstore>
 ````

-You instantiate the parser by passing path to XML file or an IO-like object and object-identyfing tag name as its arguments.
-
 The following will parse the XML, find product definitions (inside `<product>` and `</product>` tags), build `Hash`es and yield them inside the block.

 Usage with a file path:
@@ -119,6 +144,18 @@ webstore = Saxy.parse("filename.xml", "webstore").first
 webstore[:products][:product].size # => 2
 ````

+## Debugging
+
+Invalid XML files happen a lot and error messages are not always extremely helpful. In case of a parsing error, some additional information can be retrieved from parser's context.
+
+```ruby
+begin
+  Saxy.parse(...) { ... }
+rescue Saxy::ParsingError => e
+  puts "#{e.message} at #{e.context.line} line and #{e.context.column}"
+end
+```
+
 ## Contributing

 1. Fork it
@@ -126,3 +163,11 @@ webstore[:products][:product].size # => 2
 3. Commit your changes (`git commit -am 'Added some feature'`)
 4. Push to the branch (`git push origin my-new-feature`)
 5. Create new Pull Request
+
+## License
+
+See `LICENSE.txt` file.
+
+## Author
+
+Michał Szajbe, [@szajbus](https://twitter.com/szajbus), [szajbe.pl](http://szajbe.pl)
data/lib/saxy/parser.rb
CHANGED
@@ -16,10 +16,15 @@ module Saxy
     # Will yield objects inside the callback after they're built
     attr_reader :callback

-
-
+    # Parser context
+    attr_reader :context
+
+    # Parser options
+    attr_reader :options
+
+    def initialize(object, object_tag, options={})
+      @object, @object_tag, @options = object, object_tag, options
       @tags, @elements = [], []
-      @encoding = encoding
     end

     def start_element(tag, attributes=[])
@@ -56,7 +61,7 @@ module Saxy
     end

     def error(message)
-      raise ParsingError.new(message)
+      raise ParsingError.new(message, context)
     end

     def current_element
@@ -68,15 +73,25 @@ module Saxy

       @callback = blk

-      args = [self,
+      args = [self, options[:encoding]].compact

       parser = Nokogiri::XML::SAX::Parser.new(*args)

       if @object.respond_to?(:read) && @object.respond_to?(:close)
-        parser.parse_io(@object)
+        parser.parse_io(@object, &context_blk)
       else
-        parser.parse_file(@object)
+        parser.parse_file(@object, &context_blk)
       end
     end
+
+    def context_blk
+      proc { |context|
+        [:recovery, :replace_entities].each do |key|
+          context.send("#{key}=", options[key]) if options.has_key?(key)
+        end
+
+        @context = context
+      }
+    end
   end
 end
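The two idioms introduced in this file, compacting the argument list and forwarding only the options that were explicitly given, can be tried in isolation. The following is a minimal sketch under stated assumptions: `StubContext`, `build_parser_args` and `configure_context` are hypothetical names for illustration, and `StubContext` merely stands in for `Nokogiri::XML::SAX::ParserContext` so the snippet runs without Nokogiri.

```ruby
# Hypothetical stand-in for the parser context; only the two writable
# attributes touched by the diff are modelled here.
StubContext = Struct.new(:recovery, :replace_entities)

# Mirrors `[self, options[:encoding]].compact`: the encoding is only
# passed along when the caller actually supplied one.
def build_parser_args(handler, options)
  [handler, options[:encoding]].compact
end

# Mirrors the body of `context_blk`: a setter is called only for keys
# that are present in the options hash, so unset options keep the
# context's own defaults.
def configure_context(context, options)
  [:recovery, :replace_entities].each do |key|
    context.send("#{key}=", options[key]) if options.has_key?(key)
  end
  context
end

p build_parser_args(:handler, {})                    # => [:handler]
p build_parser_args(:handler, { encoding: "UTF-8" }) # => [:handler, "UTF-8"]

context = configure_context(StubContext.new, { recovery: true })
p context.recovery         # => true
p context.replace_entities # => nil
```

Checking `has_key?` rather than truthiness means an explicit `recovery: false` would still be forwarded to the context, which seems to be the point of that choice.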
data/lib/saxy/parsing_error.rb
CHANGED
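The body of this file's diff is not shown in this view. Going only by the `ParsingError.new(message, context)` call in `parser.rb` and the changelog entry, an error class that carries a context could look roughly like the sketch below. `ContextualError`, `PositionStub` and `describe_failure` are hypothetical names, and the line and column values are made up for illustration; this is not the gem's actual source.

```ruby
# Hypothetical stand-in for a parser context exposing a position.
PositionStub = Struct.new(:line, :column)

# A StandardError subclass that keeps the context it was raised with.
class ContextualError < StandardError
  attr_reader :context

  def initialize(message, context = nil)
    super(message)
    @context = context
  end
end

# Raises the error and formats it the way a caller might when debugging.
def describe_failure
  raise ContextualError.new("Opening and ending tag mismatch", PositionStub.new(9, 11))
rescue ContextualError => e
  "#{e.message} at line #{e.context.line}, column #{e.context.column}"
end

puts describe_failure
# prints "Opening and ending tag mismatch at line 9, column 11"
```

Note the `rescue ErrorClass => e` order: the class comes first and the variable after the `=>`.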
data/lib/saxy/version.rb
CHANGED
data/spec/saxy/parser_spec.rb
CHANGED
@@ -3,7 +3,8 @@ require 'spec_helper'
 describe Saxy::Parser do
   include FixturesHelper

-  let(:parser) { Saxy::Parser.new(fixture_file("webstore.xml"), "product"
+  let(:parser) { Saxy::Parser.new(fixture_file("webstore.xml"), "product") }
+  let(:invalid_parser) { Saxy::Parser.new(fixture_file("invalid.xml"), "product") }
   let(:file_io) { File.new(fixture_file("webstore.xml")) }
   let(:io_like) { IOLike.new(file_io) }

@@ -19,11 +20,18 @@ describe Saxy::Parser do
   end

   it "should accept optional force-encoding" do
-    parser = Saxy::Parser.new(file_io, "product",
+    parser = Saxy::Parser.new(file_io, "product", encoding: "UTF-8")
     expect(Nokogiri::XML::SAX::Parser).to receive(:new).with(parser, "UTF-8").and_call_original
     expect(parser.each.to_a.size).to eq(2)
   end

+  it "should pass options to parser context" do
+    parser = Saxy::Parser.new(file_io, "product", recovery: true, replace_entities: true)
+    parser.each.to_a
+    expect(parser.context.recovery).to be_truthy
+    expect(parser.context.replace_entities).to be_truthy
+  end
+
   it "should accept an IO-like for parsing" do
     parser = Saxy::Parser.new(io_like, "product")
     expect(parser.each.to_a.size).to eq(2)
@@ -158,7 +166,11 @@ describe Saxy::Parser do
   end

   it "should raise Saxy::ParsingError on error" do
-    expect {
+    expect { invalid_parser.each.to_a }.to raise_error { |error|
+      expect(error).to be_a(Saxy::ParsingError)
+      expect(error.message).to match(/Opening and ending tag mismatch/)
+      expect(error.context).to be_a(Nokogiri::XML::SAX::ParserContext)
+    }
   end

   it "should return Enumerator when calling #each without a block" do
data/spec/spec_helper.rb
CHANGED
File without changes
|
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: saxy
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.6.0
 platform: ruby
 authors:
 - Michał Szajbe
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-
+date: 2017-08-23 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri
@@ -66,6 +66,7 @@ files:
 - ".rspec"
 - ".travis.yml"
 - Appraisals
+- CHANGELOG.md
 - Gemfile
 - LICENSE.txt
 - README.md
@@ -78,13 +79,14 @@ files:
 - lib/saxy/parsing_error.rb
 - lib/saxy/version.rb
 - saxy.gemspec
-- spec/fixtures/
+- spec/fixtures/invalid.xml
 - spec/fixtures/webstore.xml
 - spec/fixtures_helper.rb
 - spec/saxy/element_spec.rb
 - spec/saxy/parser_spec.rb
 - spec/saxy_spec.rb
 - spec/spec_helper.rb
+- spec/support/io_like.rb
 homepage: http://github.com/humante/saxy
 licenses: []
 metadata: {}
@@ -110,10 +112,11 @@ specification_version: 4
 summary: Memory-efficient XML parser. Finds object definitions and translates them
   into Ruby objects.
 test_files:
-- spec/fixtures/
+- spec/fixtures/invalid.xml
 - spec/fixtures/webstore.xml
 - spec/fixtures_helper.rb
 - spec/saxy/element_spec.rb
 - spec/saxy/parser_spec.rb
 - spec/saxy_spec.rb
 - spec/spec_helper.rb
+- spec/support/io_like.rb