sax_stream 0.1.0
- data/LICENSE +21 -0
- data/README.markdown +96 -0
- data/lib/sax_stream/errors.rb +7 -0
- data/lib/sax_stream/internal/child_mapping.rb +65 -0
- data/lib/sax_stream/internal/combined_handler.rb +27 -0
- data/lib/sax_stream/internal/element_stack.rb +81 -0
- data/lib/sax_stream/internal/field_mapping.rb +33 -0
- data/lib/sax_stream/internal/handler_stack.rb +27 -0
- data/lib/sax_stream/internal/mapper_handler.rb +115 -0
- data/lib/sax_stream/internal/sax_handler.rb +26 -0
- data/lib/sax_stream/internal/singular_relationship_collector.rb +17 -0
- data/lib/sax_stream/mapper.rb +162 -0
- data/lib/sax_stream/naive_collector.rb +19 -0
- data/lib/sax_stream/parser.rb +18 -0
- metadata +71 -0
data/LICENSE
ADDED
@@ -0,0 +1,21 @@
== MIT License

Copyright (c) 2012, Craig Ambrose

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
data/README.markdown
ADDED
@@ -0,0 +1,96 @@
# Sax Stream

[![Build Status](https://secure.travis-ci.org/craigambrose/sax_stream.png)](http://travis-ci.org/craigambrose/sax_stream)

An XML parsing library built on Nokogiri's SAX parser. It uses an object mapper to build objects from XML nodes and lets you use each object as soon as it is built, rather than waiting until the whole file has been parsed.

The two main goals of this process are:

1. To avoid loading the entire XML file stream into memory at once.
2. To avoid loading all the mapped objects into memory simultaneously.

This is currently only for XML importing. Supporting exporting too would be nice if I need it.

## Status

Supports basic XML examples. Still needs to be tested with more complex XML.
Even slightly invalid XML is likely to cause an immediate exception.

## Installation

Not yet packaged as a gem; install it from GitHub.

## Usage

### Define your mapper classes

These are object definitions that you would like to extract from the XML. I recommend sticking with fairly thin classes for these. In the past, I've used similar libraries like ROXML and inherited my mapping classes from ActiveRecord or other base classes, and ended up with poorly designed code. Your mapper classes should do one thing only: provide access to structured in-memory data which can be streamed from XML.

```ruby
require 'sax_stream/mapper'

class Product
  include SaxStream::Mapper

  node 'product'
  map :id, :to => '@id'
  map :status, :to => '@status'
  map :name_confirmed, :to => 'name/@confirmed'
  map :name, :to => 'name'
end
```

In this example, Product is a mapping class. It maps to an XML node named "product". Each "attribute" on this product object is defined using the "map" class method and is read back with a string key, e.g. product['name']. The :to option uses a syntax which is similar to XPath, but not the same. Slashes separate levels in the XML node hierarchy, and paths are relative to the mapped node (so the name element is 'name', not 'product/name'). If the data is in an attribute, this is designated by the @ symbol. Attributes must be at the end of the path, as they have no children. This product class is used to parse XML like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<product id="123" status="new">
  <name confirmed="yes">iPhone 5G</name>
</product>
```
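
The following sketch is a simplified model of how the paths above line up with the XML, based on how this release's ElementStack and Mapper build keys. It is not sax_stream's own code. `map_events`, `PRODUCT_MAPPINGS` and `EXAMPLE_EVENTS` are hypothetical names, and the events are written out by hand rather than produced by Nokogiri. The mapped node's own attributes get keys like "@id". Each child element gets its path relative to the mapped node ("name"), and that element's attributes get the path plus "/@attr" ("name/@confirmed").

```ruby
# Simplified model, NOT sax_stream's code: replays hand-written SAX-style
# events for the example XML and shows which path keys reach the mappings.
PRODUCT_MAPPINGS = {
  '@id' => 'id',
  '@status' => 'status',
  'name' => 'name',
  'name/@confirmed' => 'name_confirmed'
}.freeze

# Events for the example document, in the order a SAX parser reports them.
EXAMPLE_EVENTS = [
  [:start, 'product', [%w[id 123], %w[status new]]],
  [:start, 'name', [%w[confirmed yes]]],
  [:characters, 'iPhone 5G'],
  [:end, 'name'],
  [:end, 'product']
].freeze

def map_events(events, node_name, mappings)
  result = {}
  stack = []     # open child elements: [name, attrs, content]
  inside = false # becomes true when the mapped node itself starts
  store = lambda do |key, value|
    field = mappings[key]
    result[field] = value if field # unmapped keys are ignored
  end

  events.each do |type, arg, attrs|
    case type
    when :start
      if !inside && arg == node_name
        inside = true
        # The mapped node's own attributes get keys like "@id".
        attrs.each { |key, value| store.call("@#{key}", value) }
      else
        stack.push([arg, attrs, nil])
      end
    when :characters
      top = stack.last
      top[2] = (top[2] || '') + arg if top
    when :end
      next if stack.empty? # this closes the mapped node itself
      path = stack.map(&:first).join('/') # relative to the mapped node
      _name, child_attrs, content = stack.last
      store.call(path, content)
      child_attrs.each { |key, value| store.call("#{path}/@#{key}", value) }
      stack.pop
    end
  end
  result
end

product = map_events(EXAMPLE_EVENTS, 'product', PRODUCT_MAPPINGS)
puts product['name']           # prints iPhone 5G
puts product['name_confirmed'] # prints yes
```

Note that the mapped node's name never appears in the keys. That is why the Product class uses 'name' and not 'product/name'.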

All data written to your object from the XML will be a string unless you convert it. To specify a converter on a field mapping, use the :as option.

```ruby
map :created_at, :to => '@createdAt', :as => DateTime
```

The :as option can be handed any object which supports a "parse" method that takes a string as its first and only parameter and returns the converted value. If the object doesn't respond to parse, map raises an ArgumentError when your mapper class is defined. It looks particularly good if this converter is also a class, which indicates that the converted value will be an instance of this class.
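
As a sketch, here is a converter that is not a class. It turns the confirmed="yes" attribute from the example XML into a boolean. `YesNo` is a hypothetical name and is not part of sax_stream. In this release, FieldMapping only checks that the :as value responds to parse, so a module with a parse method should be accepted.

```ruby
# Hypothetical converter (not part of sax_stream): any object that responds
# to parse(string) can be passed as :as, so a plain module works too.
module YesNo
  def self.parse(string)
    case string.strip.downcase
    when 'yes' then true
    when 'no' then false
    else
      raise ArgumentError, "expected yes or no, got #{string.inspect}"
    end
  end
end

# In the Product mapper this would read (not executed here):
#   map :name_confirmed, :to => 'name/@confirmed', :as => YesNo
puts YesNo.parse('yes') # prints true
```

Raising on unexpected input is a design choice here. It makes a malformed feed fail loudly instead of silently storing nil.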

This library doesn't include any converters, so the DateTime example above won't work unless DateTime is loaded. With plain Ruby, require 'date' provides Date.parse and DateTime.parse, and require 'time' adds Time.parse. If you are using the active_support gem, Date, Time & DateTime will all work as values for :as. If you're dealing with a file format that formats dates (or some other object) in an unusual way, simply define your own.

```ruby
class UnusualDate
  def self.parse(string)
    # convert this unusual string to a Date
    # No need to check for nil, sax_stream does that
  end
end
```
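
One way to fill in the skeleton, as a sketch, assumes a hypothetical feed that writes dates as day.month.year (for example "07.04.2012"). It uses `Date.strptime` from Ruby's standard date library. The format string is an assumption about that imaginary feed, so adjust it to your data.

```ruby
require 'date'

# Filled-in version of the skeleton above, assuming a hypothetical feed that
# writes dates as day.month.year, e.g. "07.04.2012".
class UnusualDate
  FORMAT = '%d.%m.%Y'.freeze

  # sax_stream skips nil values, so string is always a String here.
  def self.parse(string)
    Date.strptime(string.strip, FORMAT)
  end
end

puts UnusualDate.parse('07.04.2012') # prints 2012-04-07
```

Invalid input makes `Date.strptime` raise an error, which will surface during parsing.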

### Run the parser

The parser object must be supplied with a collector and an array of mapping classes to use.

```ruby
require 'sax_stream/parser'
require 'sax_stream/naive_collector'

collector = SaxStream::NaiveCollector.new
parser = SaxStream::Parser.new(collector, [Product])
File.open('products.xml') { |file| parser.parse_stream(file) }
```

The collector receives each object once it has been built. It can be any object which supports the "<<" operator (so yes, an Array works as a collector). SaxStream includes a NaiveCollector which you can use, but it's so named to remind you that this probably isn't what you want to do.

If you use the example above, you will get some memory benefits over using a library like ROXML or HappyMapper, because the XML file is at least being processed as a stream, and not held in memory. This lets you process a very large file. However, the naive collector holds every created object in memory, so its memory use grows as you parse the file.

To get the full benefits of this library, supply a collector which does something else with the objects as they are received, such as saving them to a database.
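
The collector below is a sketch of that idea, with an IO standing in for the database. `JsonLinesCollector` is a hypothetical class and is not shipped with sax_stream. It assumes the mapped objects respond to `attributes`, which objects including SaxStream::Mapper do in this release. Each object is written out as one JSON line and then dropped, so memory use does not grow with the number of objects.

```ruby
require 'json'
require 'stringio'

# Hypothetical collector (not part of sax_stream): writes each object to an
# IO as soon as it arrives and keeps only a running count.
class JsonLinesCollector
  attr_reader :count

  def initialize(io)
    @io = io
    @count = 0
  end

  # The parser only calls <<, once per completed object.
  def <<(object)
    @io.puts(JSON.generate(object.attributes))
    @count += 1
    self
  end
end

# Stand-in for a mapped object; real ones include SaxStream::Mapper.
Row = Struct.new(:attributes)
out = StringIO.new
collector = JsonLinesCollector.new(out)
collector << Row.new({ 'id' => '123', 'name' => 'iPhone 5G' })
print out.string # prints {"id":"123","name":"iPhone 5G"}

# With the real parser (not executed here):
#   parser = SaxStream::Parser.new(JsonLinesCollector.new($stdout), [Product])
```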

I plan to supply a batching collector which will collect a certain number of objects before passing them off to another collector you supply, so you can save objects in batches of 100 or whatever is optimal for your application.
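
Until that exists, the idea can be sketched in a few lines. `BatchingCollector` and its `flush` method are hypothetical and are not part of sax_stream 0.1.0. Objects are buffered until `batch_size` is reached, and then the whole batch is handed to the downstream collector with `<<`. The parser does not know about `flush`, so you would call it yourself after `parse_stream` returns so that a final partial batch is not lost.

```ruby
# Hypothetical sketch of the planned batching collector; not part of
# sax_stream 0.1.0. The downstream receives Arrays of objects via <<.
class BatchingCollector
  def initialize(downstream, batch_size = 100)
    raise ArgumentError, 'batch_size must be positive' unless batch_size > 0
    @downstream = downstream
    @batch_size = batch_size
    @buffer = []
  end

  def <<(object)
    @buffer << object
    flush if @buffer.size >= @batch_size
    self
  end

  # Hands any buffered objects downstream; call once after parsing.
  def flush
    return self if @buffer.empty?
    @downstream << @buffer
    @buffer = [] # start a fresh Array so the handed-off batch is untouched
    self
  end
end

batches = []
collector = BatchingCollector.new(batches, 2)
(1..5).each { |n| collector << n }
collector.flush # sax_stream will not call this for you
p batches # prints [[1, 2], [3, 4], [5]]
```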

## Author

Craig Ambrose

http://www.craigambrose.com

data/lib/sax_stream/internal/child_mapping.rb
ADDED
@@ -0,0 +1,65 @@
require 'sax_stream/internal/mapper_handler'
require 'sax_stream/internal/singular_relationship_collector'

module SaxStream
  module Internal
    class ChildMapping
      attr_reader :name

      # Supported options are :to, :as & :parent_collects. See Mapper.relate documentation for more details.
      def initialize(name, options)
        @name = name.to_s
        @parent_collects = options[:parent_collects]
        process_conversion_type(options[:as])
      end

      def handler_for(node_path, collector, handler_stack, parent_object)
        node_name = node_path.split('/').last
        @mapper_classes.each do |mapper_class|
          if mapper_class.maps_node?(node_name)
            return MapperHandler.new(mapper_class, child_collector(parent_object, collector), handler_stack)
          end
        end
        nil
      end

      def map_value_onto_object(object, value)
      end

      def build_empty_relation
        [] if @plural
      end

      private

      def child_collector(parent_object, collector)
        if @parent_collects
          if @plural
            parent_object.relations[name]
          else
            SingularRelationshipCollector.new(parent_object, @name)
          end
        else
          collector
        end
      end

      def arrayify(value)
        value.is_a?(Enumerable) ? value : [value]
      end

      def process_conversion_type(as)
        @plural = as.is_a?(Enumerable)
        @mapper_classes = arrayify(as).compact
        if @mapper_classes.empty?
          raise ":as options for #{@name} field is empty, for child nodes it must be a mapper class or array of mapper classes"
        end
        @mapper_classes.each do |mapper_class|
          unless mapper_class.respond_to?(:map_key_onto_object)
            raise ":as options for #{@name} field contains #{mapper_class.inspect} which does not appear to be a valid mapper class"
          end
        end
      end
    end
  end
end

data/lib/sax_stream/internal/combined_handler.rb
ADDED
@@ -0,0 +1,27 @@
require 'sax_stream/errors'

module SaxStream
  module Internal
    class CombinedHandler
      def initialize(stack, mapper_handlers)
        @stack = stack
        @handlers = mapper_handlers
      end

      def start_element(name, *other_params)
        handler = handle_for_element_name(name)
        raise UnexpectedNode, "Could not find a handler for the element start: #{name.inspect}" unless handler
        @stack.push(handler)
        handler.start_element(name, *other_params)
      end

      private

      def handle_for_element_name(name)
        @handlers.detect do |handler|
          handler.maps_node?(name)
        end
      end
    end
  end
end

data/lib/sax_stream/internal/element_stack.rb
ADDED
@@ -0,0 +1,81 @@
require 'sax_stream/errors'

module SaxStream
  module Internal
    class ElementStack
      class Element
        attr_accessor :name

        def initialize(name, attrs)
          @name = name
          @attrs = attrs
          @content = nil
        end

        def attributes(prefix = nil)
          prefix ||= @name
          @attrs.map do |key, value|
            ["#{prefix}/@#{key}", value]
          end
        end

        def content
          @content
        end

        def record_characters(string)
          @content ||= ''
          @content += string
        end
      end

      def initialize
        @elements = []
      end

      def top_name
        @elements.last.name if @elements.last
      end

      def push(name, attrs)
        @elements.push(Element.new(name, attrs))
        # indented_puts "push element #{name}"
      end

      def pop
        raise ProgramError, "attempting to pop an empty ElementStack" if @elements.empty?
        # indented_puts "pop element"
        @elements.pop
      end

      def empty?
        @elements.empty?
      end

      def path
        @elements.map(&:name).join('/')
      end

      def content
        @elements.last.content
      end

      def attributes
        @elements.last.attributes(path)
      end

      def record_characters(string)
        # indented_puts " record: #{string.inspect}"
        @elements.last.record_characters(string)
      end

      private

      def indented_puts(string)
        indent = ''
        @elements.length.times { indent << ' ' }
        puts indent + string
      end
    end
  end
end

data/lib/sax_stream/internal/field_mapping.rb
ADDED
@@ -0,0 +1,33 @@
module SaxStream
  module Internal
    class FieldMapping
      def initialize(name, options = {})
        @name = name.to_s
        @path = options[:to]
        process_conversion_type(options[:as])
      end

      def map_value_onto_object(object, value)
        if value && @parser
          value = @parser.parse(value)
        end
        object[@name] = value
      end

      def handler_for(name, collector, handler_stack, parent_object)
      end

      private

      def process_conversion_type(as)
        if as
          if as.respond_to?(:parse)
            @parser = as
          else
            raise ArgumentError, ":as options for #{@name} field is a #{as.inspect} which must respond to parse"
          end
        end
      end
    end
  end
end

data/lib/sax_stream/internal/handler_stack.rb
ADDED
@@ -0,0 +1,27 @@
module SaxStream
  module Internal
    class HandlerStack
      def initialize()
        @handlers = []
      end

      def root=(value)
        @handlers = [value]
      end

      def top
        @handlers.last
      end

      def push(handler)
        @handlers.push(handler)
      end

      def pop(handler = nil)
        raise ProgramError, "can't pop the last handler" if @handlers.length <= 1
        raise ProgramError, "popping handler that isn't the top" if handler && handler != @handlers.last
        @handlers.pop
      end
    end
  end
end

data/lib/sax_stream/internal/mapper_handler.rb
ADDED
@@ -0,0 +1,115 @@
require 'sax_stream/internal/element_stack'

module SaxStream
  module Internal
    # Handles SAX events on behalf of a mapper class. Expects the first event to be start_element, and then
    # uses other events to build the element until end_element is received, at which time the completed
    # object will be sent off to the collector.
    #
    # Also handles child elements which use their own mapper class, and will pass off SAX control to other
    # handlers to achieve this.
    class MapperHandler
      attr_accessor :stack
      attr_reader :mapper_class, :collector

      # mapper_class:: A class which has had SaxStream::Mapper included in it.
      #
      # collector:: The collector object used for this parsing run. This gets passed around to everything
      #             that needs it.
      #
      # handler_stack:: The current stack of SAX handling objects for this parsing run. This gets passed around
      #                 to everything that needs it. This class does not need to push itself onto the stack;
      #                 that has already been done for it. If it pushes other handlers onto the stack, then
      #                 it will no longer be handling SAX events itself until they get popped off.
      #
      # element_stack:: Used internally by this object to collect XML elements that have been parsed which may
      #                 be used when mapping this class. You don't need to pass this in except for dependency
      #                 injection purposes.
      def initialize(mapper_class, collector, handler_stack, element_stack = ElementStack.new)
        raise ArgumentError, "no collector" unless collector
        raise ArgumentError, "no mapper class" unless mapper_class
        raise ArgumentError, "no handler stack" unless handler_stack
        raise ArgumentError, "no element stack" unless element_stack

        @mapper_class = mapper_class
        @collector = collector
        @element_stack = element_stack
        @stack = handler_stack
      end

      def maps_node?(node_name)
        @mapper_class.maps_node?(node_name)
      end

      def start_element(name, attrs = [])
        start_current_object(name, attrs) || start_child_node(name, attrs) || start_child_data(name, attrs)
      end

      def end_element(name)
        pop_element_stack(name) || end_current_object(name)
      end

      def cdata_block(string)
        characters(string)
      end

      def characters(string)
        unless @element_stack.empty?
          @element_stack.record_characters(string)
        end
      end

      def current_object
        @current_object
      end

      private

      def start_current_object(name, attrs)
        if maps_node?(name)
          @current_object = @mapper_class.new
          attrs.each do |key, value|
            @mapper_class.map_attribute_onto_object(@current_object, key, value)
          end
          @current_object
        end
      end

      def start_child_node(name, attrs)
        handler = @mapper_class.child_handler_for(prefix_with_element_stack(name), @collector, @stack, @current_object)
        if handler
          @stack.push(handler)
          handler.start_element(name, attrs)
          handler
        end
      end

      def start_child_data(name, attrs)
        raise ProgramError, "received child element #{name.inspect} before receiving main expected node #{@mapper_class.node_name.inspect}" unless current_object
        @element_stack.push(name, attrs)
      end

      def pop_element_stack(name)
        unless @element_stack.empty?
          raise ProgramError, "received end element event for #{name.inspect} but currently processing #{@element_stack.top_name.inspect}" unless @element_stack.top_name == name
          @mapper_class.map_element_stack_top_onto_object(@current_object, @element_stack)
          @element_stack.pop
        end
      end

      def end_current_object(name)
        raise ProgramError unless @current_object
        raise ArgumentError, "received end element event for #{name.inspect} but currently processing #{@current_object.class.node_name.inspect}" unless @current_object.class.node_name == name
        if @current_object.class.should_collect?
          @collector << @current_object
        end
        @stack.pop(self)
        @current_object = nil
      end

      def prefix_with_element_stack(name)
        [@element_stack.path, name].reject {|part| part.nil? || part == ''}.join('/')
      end
    end
  end
end

data/lib/sax_stream/internal/sax_handler.rb
ADDED
@@ -0,0 +1,26 @@
require 'nokogiri'
require 'sax_stream/internal/handler_stack'
require 'sax_stream/internal/combined_handler'

module SaxStream
  module Internal
    class SaxHandler < Nokogiri::XML::SAX::Document
      def initialize(collector, mappers, handler_stack = HandlerStack.new)
        @handler_stack = handler_stack
        mapper_handlers = mappers.map do |mapper|
          MapperHandler.new(mapper, collector, @handler_stack)
        end
        @handler_stack.root = CombinedHandler.new(@handler_stack, mapper_handlers)
      end

      [:start_element, :end_element, :characters, :cdata_block].each do |key|
        code = <<-RUBY
          def #{key}(*params)
            @handler_stack.top.#{key}(*params)
          end
        RUBY
        module_eval(code)
      end
    end
  end
end

data/lib/sax_stream/internal/singular_relationship_collector.rb
ADDED
@@ -0,0 +1,17 @@
module SaxStream
  module Internal
    class SingularRelationshipCollector
      def initialize(parent, relation_name)
        @parent = parent
        @relation_name = relation_name
      end

      def <<(value)
        if @parent.relations[@relation_name]
          raise ProgramError, "found singular relationship #{@relation_name.inspect} occurring more than once. Existing is #{@parent.relations[@relation_name].inspect}"
        end
        @parent.relations[@relation_name] = value
      end
    end
  end
end

data/lib/sax_stream/mapper.rb
ADDED
@@ -0,0 +1,162 @@
require 'sax_stream/internal/field_mapping'
require 'sax_stream/internal/child_mapping'

module SaxStream
  # Include this module to make your class map an XML node. For usage examples, see the README.
  module Mapper
    def self.included(base)
      base.extend ClassMethods
    end

    module ClassMethods
      def node(name, options = {})
        @node_name = name
        @collect = options.has_key?(:collect) ? options[:collect] : true
      end

      def map(attribute_name, options = {})
        store_field_mapping(options[:to], Internal::FieldMapping.new(attribute_name, options))
      end

      # Define a relation to another object which is built from an XML node using another class
      # which also includes SaxStream::Mapper.
      #
      # attribute_name:: The name of the attribute on your object that the related objects will be stored in.
      # options:: An options hash which can accept the following:
      #           [:to] Default value: "*"
      #                 The path to the XML which defines an instance of this related node. This
      #                 is a bit like an XPath expression, but not quite. See the README for examples.
      #                 For relations, this can include a wildcard "*" as the last part of the path,
      #                 e.g. "product/review/*". If the path is just set to "*" then this will match
      #                 any immediate child of the current node, which is good for polymorphic collections.
      #           [:as] Required, no default value.
      #                 Needs to be a class which includes SaxStream::Mapper, or an array of such classes.
      #                 Using an array of classes, even if the array only has one item, denotes that an
      #                 array of related items is expected. Calling @object.relations['name'] will return
      #                 an array which will be empty if nothing is found. If a singular value is used for
      #                 the :as option, then the relation will be assumed to be singular, and so it will
      #                 be nil or the expected class (and will raise an error if multiples are in the file).
      #           [:parent_collects] Default value: false
      #                 Set to true if the object defining this relationship (i.e., the parent
      #                 in the relationship) needs to collect the defined children. If so, the
      #                 parent object will be used as the collector for these children, and they
      #                 will not be passed to the collector supplied to the parser. Use this when
      #                 the child objects are not something you want to process on their own, but
      #                 instead you want them all to be loaded into the parent which will then be
      #                 collected as normal. If this is left false, then the parent object will
      #                 not be informed of its children, because they will be passed to the collector
      #                 and then forgotten about. However, the children will know about their parent,
      #                 or at least what is known about it, but the parent will not be finished being
      #                 parsed. The parent will have already parsed all XML attributes though.
      def relate(attribute_name, options = {})
        store_relation_mapping(options[:to] || '*', Internal::ChildMapping.new(attribute_name, options))
      end

      def node_name
        @node_name
      end

      def maps_node?(name)
        @node_name == name
      end

      def map_attribute_onto_object(object, key, value)
        map_key_onto_object(object, "@#{key}", value)
      end

      def map_element_stack_top_onto_object(object, element_stack)
        map_key_onto_object(object, element_stack.path, element_stack.content)
        element_stack.attributes.each do |key, value|
          map_key_onto_object(object, key, value)
        end
      end

      def map_key_onto_object(object, key, value)
        mapping = field_mapping(key)
        if mapping
          mapping.map_value_onto_object(object, value)
        end
      end

      def child_handler_for(key, collector, handler_stack, current_object)
        mapping = field_mapping(key)
        if mapping
          mapping.handler_for(key, collector, handler_stack, current_object)
        end
      end

      def relation_mappings
        @relation_mappings ||= []
      end

      def should_collect?
        @collect
      end

      private

      def store_relation_mapping(key, mapping)
        relation_mappings << mapping
        store_field_mapping(key, mapping)
      end

      def store_field_mapping(key, mapping)
        if key.include?('*')
          regex_mappings << [Regexp.new('\A' + Regexp.escape(key).gsub('\*', '[^/]+') + '\z'), mapping]
        else
          mappings[key] = mapping
        end
      end

      def field_mapping(key)
        mappings[key] || regex_field_mapping(key)
      end

      def regex_field_mapping(key)
        regex_mappings.each do |regex, mapping|
          return mapping if regex =~ key
        end
        nil
      end

      def regex_mappings
        @regex_mappings ||= []
      end

      def mappings
        @mappings ||= {}
      end
    end

    def []=(key, value)
      attributes[key] = value
    end

    def [](key)
      attributes[key]
    end

    def inspect
      "#{self.class.name}: #{attributes.inspect}"
    end

    def attributes
      @attributes ||= {}
    end

    def relations
      @relations ||= build_empty_relations
    end

    private

    def build_empty_relations
      result = {}
      self.class.relation_mappings.each do |relation_mapping|
        result[relation_mapping.name] = relation_mapping.build_empty_relation
      end
      result
    end

  end
end

data/lib/sax_stream/naive_collector.rb
ADDED
@@ -0,0 +1,19 @@
module SaxStream
  class NaiveCollector
    def initialize
      @objects = []
    end

    def mapped_objects
      @objects
    end

    def <<(value)
      @objects << value
    end

    def for_type(klass)
      mapped_objects.select { |object| object.class == klass }
    end
  end
end

data/lib/sax_stream/parser.rb
ADDED
@@ -0,0 +1,18 @@
require 'sax_stream/errors'
require 'sax_stream/internal/mapper_handler'
require 'sax_stream/internal/sax_handler'

module SaxStream
  class Parser
    def initialize(collector, mappers)
      raise ArgumentError, "You must supply your parser with a collector" unless collector
      raise ArgumentError, "You must supply your parser with at least one mapper class" if mappers.empty?
      @sax_handler = Internal::SaxHandler.new(collector, mappers)
    end

    def parse_stream(io_stream)
      parser = Nokogiri::XML::SAX::Parser.new(@sax_handler)
      parser.parse(io_stream)
    end
  end
end
metadata
ADDED
@@ -0,0 +1,71 @@
--- !ruby/object:Gem::Specification
name: sax_stream
version: !ruby/object:Gem::Version
  version: 0.1.0
  prerelease:
platform: ruby
authors:
- Craig Ambrose
autorequire:
bindir: bin
cert_chain: []
date: 2012-04-07 00:00:00.000000000 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: nokogiri
  requirement: &70166827474840 !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ! '>='
      - !ruby/object:Gem::Version
        version: 1.4.0
  type: :runtime
  prerelease: false
  version_requirements: *70166827474840
description:
email:
- craig@craigambrose.com
executables: []
extensions: []
extra_rdoc_files: []
files:
- lib/sax_stream/errors.rb
- lib/sax_stream/internal/child_mapping.rb
- lib/sax_stream/internal/combined_handler.rb
- lib/sax_stream/internal/element_stack.rb
- lib/sax_stream/internal/field_mapping.rb
- lib/sax_stream/internal/handler_stack.rb
- lib/sax_stream/internal/mapper_handler.rb
- lib/sax_stream/internal/sax_handler.rb
- lib/sax_stream/internal/singular_relationship_collector.rb
- lib/sax_stream/mapper.rb
- lib/sax_stream/naive_collector.rb
- lib/sax_stream/parser.rb
- LICENSE
- README.markdown
homepage: http://github.com/craigambrose/sax_stream
licenses: []
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
    - !ruby/object:Gem::Version
      version: '0'
required_rubygems_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
    - !ruby/object:Gem::Version
      version: '0'
requirements: []
rubyforge_project:
rubygems_version: 1.8.6
signing_key:
specification_version: 3
summary: A streaming XML parser which builds objects and passes them to a collector
  as they are ready
test_files: []