sax_stream 0.1.0

data/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ == MIT License
2
+
3
+ Copyright (c) 2012, Craig Ambrose
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.markdown ADDED
@@ -0,0 +1,96 @@
1
+ # Sax Stream
2
+
3
+ [![Build Status](https://secure.travis-ci.org/craigambrose/sax_stream.png)](http://travis-ci.org/craigambrose/sax_stream)
4
+
5
+ An XML parsing library built on Nokogiri's SAX parser. It uses an object mapper to build objects from XML nodes and lets you use them as they are built, rather than waiting until the file is fully parsed.
6
+
7
+ The two main goals of this process are:
8
+
9
+ 1. To avoid loading the entire XML file stream into memory at once.
10
+ 2. To avoid loading all the mapped objects into memory simultaneously.
11
+
12
+ This is currently only for XML importing. Supporting exporting too would be nice if I need it.
13
+
14
+ ## Status
15
+
16
+ Supports basic XML examples. Still needs to be tested with more complex XML.
17
+ Even slightly invalid XML is likely to cause an immediate exception.
18
+
19
+ ## Installation
20
+
21
+ Not yet packaged as a gem. Install it from GitHub.
22
+
23
+ ## Usage
24
+
25
+ ### Define your mapper classes
26
+
27
+ These are definitions of the objects that you would like to extract from the XML. I recommend keeping these classes fairly thin. In the past, I've used similar libraries like ROXML and inherited my mapping classes from ActiveRecord or other base classes, and ended up with poorly designed code. Your mapper classes should do one thing only: provide access to structured in-memory data which can be streamed from XML.
28
+
29
+ ```ruby
30
+ require 'sax_stream/mapper'
31
+
32
+ class Product
33
+ include SaxStream::Mapper
34
+
35
+ node 'product'
36
+ map :id, :to => '@id'
37
+ map :status, :to => '@status'
38
+ map :name_confirmed, :to => 'name/@confirmed'
39
+ map :name, :to => 'name'
40
+ end
41
+ ```
42
+
43
+ In this example, Product is a mapping class. It maps to an XML node named "product". Each "attribute" on this product object is defined using the "map" class method. The :to option uses a syntax which is similar to XPath, but not the same. Slashes separate levels in the XML node hierarchy. If the data is in an attribute, this is designated by the @ symbol. Attributes must be at the end of the path, as they have no children. This product class is used to parse XML like this:
44
+
45
+ ```xml
46
+ <?xml version="1.0" encoding="UTF-8"?>
47
+ <product id="123" status="new">
48
+ <name confirmed="yes">iPhone 5G</name>
49
+ </product>
50
+ ```
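As a rough illustration of how these paths line up with the XML above, the sketch below derives the keys that a `:to` option would be matched against. It follows what the handler code in this release appears to do (attributes of the mapped node become `@name`, nested elements become slash-separated paths relative to the mapped node). `mapping_keys` is a hypothetical helper written for this example and is not part of sax_stream.

```ruby
# Hypothetical helper, for illustration only (not part of sax_stream).
#
# child_path:: names of the elements below the mapped node, e.g. ['name'].
#              An empty array means the mapped node itself.
# attrs::      the XML attributes found on the innermost element.
def mapping_keys(child_path, attrs)
  prefix = child_path.join('/')
  attribute_keys = attrs.keys.map do |key|
    prefix.empty? ? "@#{key}" : "#{prefix}/@#{key}"
  end
  # The mapped node itself has no content key, only attribute keys.
  prefix.empty? ? attribute_keys : [prefix] + attribute_keys
end

root_keys = mapping_keys([], 'id' => '123', 'status' => 'new')
name_keys = mapping_keys(['name'], 'confirmed' => 'yes')
```

Here `root_keys` lines up with the `'@id'` and `'@status'` mappings on Product, and `name_keys` with `'name'` and `'name/@confirmed'`.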
51
+
52
+ All data written to your object from the XML will be a string unless you convert it. To specify a converter on a field mapping, use the :as option.
53
+
54
+ ```ruby
55
+ map :created_at, :to => '@createdAt', :as => DateTime
56
+ ```
57
+
58
+ The :as option can be handed any object which supports a "parse" method that takes a string as its first and only parameter and returns the converted value. It reads particularly well if this converter is also a class, which indicates that the converted value will be an instance of this class.
59
+
60
+ This library doesn't include any converters, so the DateTime example above won't work out of the box. However, if you are using the active_support gem, then Date, Time & DateTime will all work as values for :as. If you're dealing with a file format that formats dates (or some other object) in an unusual way, simply define your own.
61
+
62
+ ```ruby
63
+ class UnusualDate
64
+ def self.parse(string)
65
+ # convert this unusual string to a Date
66
+ # No need to check for nil, sax_stream does that
67
+ end
68
+ end
69
+ ```
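To make the converter contract concrete, here is a minimal sketch of a filled-in converter. It assumes a feed that writes dates as "DD/MM/YYYY". Both that format and the `SlashDate` name are made up for this example.

```ruby
require 'date'

# Hypothetical converter for dates written as "DD/MM/YYYY".
# Anything with a class-level parse(string) method can be passed to :as.
class SlashDate
  def self.parse(string)
    # Date.strptime raises ArgumentError if the string does not match.
    Date.strptime(string, '%d/%m/%Y')
  end
end
```

A field mapping could then use it like any other converter, for example `map :released_on, :to => '@released', :as => SlashDate` (the field and attribute names are also hypothetical).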
70
+
71
+ ### Run the parser
72
+
73
+ The parser object must be supplied with a collector and an array of mapping classes to use.
74
+
75
+ ```ruby
76
+ require 'sax_stream/parser'
+ require 'sax_stream/naive_collector'
77
+
78
+ collector = SaxStream::NaiveCollector.new
79
+ parser = SaxStream::Parser.new(collector, [Product])
80
+
81
+ File.open('products.xml') { |file| parser.parse_stream(file) }
82
+ ```
83
+
84
+ The purpose of the collector is to be given the object once it has been built. It's basically any object which supports the "<<" operator (so yes, an Array works as a collector). SaxStream includes a NaiveCollector which you can use, but it's so named to remind you that this probably isn't what you want to do.
85
+
86
+ If you use the example above, you will get some memory benefits over using a library like ROXML or HappyMapper, because the XML file is at least being processed as a stream, and not held in memory. This lets you process a very large file. However, the naive collector is holding all the created objects in memory, so as you parse the file, this gets bigger.
87
+
88
+ To get the full benefits of this library, supply a collector which does something else with the objects as they are received, such as save them to a database.
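Since a collector only needs to respond to `<<`, one possible shape for such a collector is sketched below. `CallbackCollector` is a hypothetical class, not something shipped with sax_stream. It hands each finished object to a block and keeps only a running count, so nothing accumulates in memory.

```ruby
# Hypothetical collector: passes each finished object to a block
# instead of storing it.
class CallbackCollector
  attr_reader :count

  def initialize(&handler)
    raise ArgumentError, 'a block is required' unless handler
    @handler = handler
    @count = 0
  end

  # Called once for every object the parser has finished building.
  def <<(object)
    @handler.call(object)
    @count += 1
    self # allow chaining, like Array#<<
  end
end
```

It would be passed to the parser in place of the NaiveCollector, with the block doing whatever persistence your application needs.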
89
+
90
+ I plan to supply a batching collector which will collect a certain number of objects before passing them off to another collector you supply, so you can save objects in batches of 100 or whatever is optimal for your application.
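Until that exists, a batching collector could be approximated along these lines. This is only a sketch of the idea described above, not the planned implementation: `BatchingCollector`, its `flush` method and the default batch size are all assumptions made for this example.

```ruby
# Hypothetical batching collector: gathers objects and passes them on
# to another collector as arrays of up to batch_size items.
class BatchingCollector
  def initialize(target, batch_size = 100)
    @target = target          # any object that supports <<
    @batch_size = batch_size
    @batch = []
  end

  def <<(object)
    @batch << object
    flush if @batch.length >= @batch_size
    self
  end

  # Pass on whatever is left over. Call this once parsing has finished,
  # otherwise a final partial batch is never delivered.
  def flush
    return if @batch.empty?
    @target << @batch
    @batch = []
  end
end
```

The target collector receives whole arrays rather than single objects, so it can save each batch in one operation.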
91
+
92
+ ## Author
93
+
94
+ Craig Ambrose
95
+
96
+ http://www.craigambrose.com
@@ -0,0 +1,7 @@
1
+ module SaxStream
2
+ class ImportError < StandardError; end
3
+
4
+ class ProgramError < ImportError; end
5
+ class UnexpectedNode < ImportError; end
6
+ class UnexpectedAttribute < ImportError; end
7
+ end
@@ -0,0 +1,65 @@
1
+ require 'sax_stream/internal/mapper_handler'
2
+ require 'sax_stream/internal/singular_relationship_collector'
3
+
4
+ module SaxStream
5
+ module Internal
6
+ class ChildMapping
7
+ attr_reader :name
8
+
9
+ # Supported options are :to, :as & :parent_collects. See Mapper.relate documentation for more details.
10
+ def initialize(name, options)
11
+ @name = name.to_s
12
+ @parent_collects = options[:parent_collects]
13
+ process_conversion_type(options[:as])
14
+ end
15
+
16
+ def handler_for(node_path, collector, handler_stack, parent_object)
17
+ node_name = node_path.split('/').last
18
+ @mapper_classes.each do |mapper_class|
19
+ if mapper_class.maps_node?(node_name)
20
+ return MapperHandler.new(mapper_class, child_collector(parent_object, collector), handler_stack)
21
+ end
22
+ end
23
+ nil
24
+ end
25
+
26
+ def map_value_onto_object(object, value)
27
+ end
28
+
29
+ def build_empty_relation
30
+ [] if @plural
31
+ end
32
+
33
+ private
34
+
35
+ def child_collector(parent_object, collector)
36
+ if @parent_collects
37
+ if @plural
38
+ parent_object.relations[name]
39
+ else
40
+ SingularRelationshipCollector.new(parent_object, @name)
41
+ end
42
+ else
43
+ collector
44
+ end
45
+ end
46
+
47
+ def arrayify(value)
48
+ value.is_a?(Enumerable) ? value : [value]
49
+ end
50
+
51
+ def process_conversion_type(as)
52
+ @plural = as.is_a?(Enumerable)
53
+ @mapper_classes = arrayify(as).compact
54
+ if @mapper_classes.empty?
55
+ raise ":as options for #{@name} field is empty, for child nodes it must be a mapper class or array of mapper classes"
56
+ end
57
+ @mapper_classes.each do |mapper_class|
58
+ unless mapper_class.respond_to?(:map_key_onto_object)
59
+ raise ":as options for #{@name} field contains #{mapper_class.inspect} which does not appear to be a valid mapper class"
60
+ end
61
+ end
62
+ end
63
+ end
64
+ end
65
+ end
@@ -0,0 +1,27 @@
1
+ require 'sax_stream/errors'
2
+
3
+ module SaxStream
4
+ module Internal
5
+ class CombinedHandler
6
+ def initialize(stack, mapper_handlers)
7
+ @stack = stack
8
+ @handlers = mapper_handlers
9
+ end
10
+
11
+ def start_element(name, *other_params)
12
+ handler = handle_for_element_name(name)
13
+ raise UnexpectedNode, "Could not find a handler for the element start: #{name.inspect}" unless handler
14
+ @stack.push(handler)
15
+ handler.start_element(name, *other_params)
16
+ end
17
+
18
+ private
19
+
20
+ def handle_for_element_name(name)
21
+ @handlers.detect do |handler|
22
+ handler.maps_node?(name)
23
+ end
24
+ end
25
+ end
26
+ end
27
+ end
@@ -0,0 +1,81 @@
1
+ require 'sax_stream/errors'
2
+
3
+ module SaxStream
4
+ module Internal
5
+ class ElementStack
6
+ class Element
7
+ attr_accessor :name
8
+
9
+ def initialize(name, attrs)
10
+ @name = name
11
+ @attrs = attrs
12
+ @content = nil
13
+ end
14
+
15
+ def attributes(prefix = nil)
16
+ prefix ||= @name
17
+ @attrs.map do |key, value|
18
+ ["#{prefix}/@#{key}", value]
19
+ end
20
+ end
21
+
22
+ def content
23
+ @content
24
+ end
25
+
26
+ def record_characters(string)
27
+ @content ||= ''
28
+ @content += string
29
+ end
30
+ end
31
+
32
+ def initialize
33
+ @elements = []
34
+ end
35
+
36
+ def top_name
37
+ @elements.last.name if @elements.last
38
+ end
39
+
40
+ def push(name, attrs)
41
+ @elements.push(Element.new(name, attrs))
42
+ # indented_puts "push element #{name}"
43
+ end
44
+
45
+ def pop
46
+ raise ProgramError, "attempting to pop an empty ElementStack" if @elements.empty?
47
+ # indented_puts "pop element"
48
+ @elements.pop
49
+ end
50
+
51
+ def empty?
52
+ @elements.empty?
53
+ end
54
+
55
+ def path
56
+ @elements.map(&:name).join('/')
57
+ end
58
+
59
+ def content
60
+ @elements.last.content
61
+ end
62
+
63
+ def attributes
64
+ @elements.last.attributes(path)
65
+ end
66
+
67
+ def record_characters(string)
68
+ # indented_puts " record: #{string.inspect}"
69
+ @elements.last.record_characters(string)
70
+ end
71
+
72
+ private
73
+
74
+ def indented_puts(string)
75
+ indent = ''
76
+ @elements.length.times { indent << ' ' }
77
+ puts indent + string
78
+ end
79
+ end
80
+ end
81
+ end
@@ -0,0 +1,33 @@
1
+ module SaxStream
2
+ module Internal
3
+ class FieldMapping
4
+ def initialize(name, options = {})
5
+ @name = name.to_s
6
+ @path = options[:to]
7
+ process_conversion_type(options[:as])
8
+ end
9
+
10
+ def map_value_onto_object(object, value)
11
+ if value && @parser
12
+ value = @parser.parse(value)
13
+ end
14
+ object[@name] = value
15
+ end
16
+
17
+ def handler_for(name, collector, handler_stack, parent_object)
18
+ end
19
+
20
+ private
21
+
22
+ def process_conversion_type(as)
23
+ if as
24
+ if as.respond_to?(:parse)
25
+ @parser = as
26
+ else
27
+ raise ArgumentError, ":as options for #{@name} field is a #{as.inspect} which must respond to parse"
28
+ end
29
+ end
30
+ end
31
+ end
32
+ end
33
+ end
@@ -0,0 +1,27 @@
1
+ module SaxStream
2
+ module Internal
3
+ class HandlerStack
4
+ def initialize
5
+ @handlers = []
6
+ end
7
+
8
+ def root=(value)
9
+ @handlers = [value]
10
+ end
11
+
12
+ def top
13
+ @handlers.last
14
+ end
15
+
16
+ def push(handler)
17
+ @handlers.push(handler)
18
+ end
19
+
20
+ def pop(handler = nil)
21
+ raise ProgramError, "can't pop the last handler" if @handlers.length <= 1
22
+ raise ProgramError, "popping handler that isn't the top" if handler && handler != @handlers.last
23
+ @handlers.pop
24
+ end
25
+ end
26
+ end
27
+ end
@@ -0,0 +1,115 @@
1
+ require 'sax_stream/internal/element_stack'
2
+
3
+ module SaxStream
4
+ module Internal
5
+ # Handles SAX events on behalf of a mapper class. Expects the first event to be start_element, and then
6
+ # uses other events to build the element until end_element is received, at which time the completed
7
+ # object will be sent off to the collector.
8
+ #
9
+ # Also handles child elements which use their own mapper class, and will pass off SAX control to other
10
+ # handlers to achieve this.
11
+ class MapperHandler
12
+ attr_accessor :stack
13
+ attr_reader :mapper_class, :collector
14
+
15
+ # mapper_class:: A class which has had SaxStream::Mapper included in it.
16
+ #
17
+ # collector:: The collector object used for this parsing run. This gets passed around to everything
18
+ # that needs it.
19
+ #
20
+ # handler_stack:: The current stack of Sax handling objects for this parsing run. This gets passed around
21
+ # to everything that needs it. This class does not need to push itself onto the stack,
22
+ # that has already been done for it. If it pushes other handlers onto the stack, then
23
+ # it will no longer be handling SAX events itself until they get popped off.
24
+ #
25
+ # element_stack:: Used internally by this object to collect XML elements that have been parsed which may
26
+ # be used when mapping this class. You don't need to pass this in except for dependency
27
+ # injection purposes.
28
+ def initialize(mapper_class, collector, handler_stack, element_stack = ElementStack.new)
29
+ raise ArgumentError, "no collector" unless collector
30
+ raise ArgumentError, "no mapper class" unless mapper_class
31
+ raise ArgumentError, "no handler stack" unless handler_stack
32
+ raise ArgumentError, "no element stack" unless element_stack
33
+
34
+ @mapper_class = mapper_class
35
+ @collector = collector
36
+ @element_stack = element_stack
37
+ @stack = handler_stack
38
+ end
39
+
40
+ def maps_node?(node_name)
41
+ @mapper_class.maps_node?(node_name)
42
+ end
43
+
44
+ def start_element(name, attrs = [])
45
+ start_current_object(name, attrs) || start_child_node(name, attrs) || start_child_data(name, attrs)
46
+ end
47
+
48
+ def end_element(name)
49
+ pop_element_stack(name) || end_current_object(name)
50
+ end
51
+
52
+ def cdata_block(string)
53
+ characters(string)
54
+ end
55
+
56
+ def characters(string)
57
+ unless @element_stack.empty?
58
+ @element_stack.record_characters(string)
59
+ end
60
+ end
61
+
62
+ def current_object
63
+ @current_object
64
+ end
65
+
66
+ private
67
+
68
+ def start_current_object(name, attrs)
69
+ if maps_node?(name)
70
+ @current_object = @mapper_class.new
71
+ attrs.each do |key, value|
72
+ @mapper_class.map_attribute_onto_object(@current_object, key, value)
73
+ end
74
+ @current_object
75
+ end
76
+ end
77
+
78
+ def start_child_node(name, attrs)
79
+ handler = @mapper_class.child_handler_for(prefix_with_element_stack(name), @collector, @stack, @current_object)
80
+ if handler
81
+ @stack.push(handler)
82
+ handler.start_element(name, attrs)
83
+ handler
84
+ end
85
+ end
86
+
87
+ def start_child_data(name, attrs)
88
+ raise ProgramError, "received child element #{name.inspect} before receiving main expected node #{@mapper_class.node_name.inspect}" unless current_object
89
+ @element_stack.push(name, attrs)
90
+ end
91
+
92
+ def pop_element_stack(name)
93
+ unless @element_stack.empty?
94
+ raise ProgramError, "received end element event for #{name.inspect} but currently processing #{@element_stack.top_name.inspect}" unless @element_stack.top_name == name
95
+ @mapper_class.map_element_stack_top_onto_object(@current_object, @element_stack)
96
+ @element_stack.pop
97
+ end
98
+ end
99
+
100
+ def end_current_object(name)
101
+ raise ProgramError unless @current_object
102
+ raise ArgumentError, "received end element event for #{name.inspect} but currently processing #{@current_object.class.node_name.inspect}" unless @current_object.class.node_name == name
103
+ if @current_object.class.should_collect?
104
+ @collector << @current_object
105
+ end
106
+ @stack.pop(self)
107
+ @current_object = nil
108
+ end
109
+
110
+ def prefix_with_element_stack(name)
111
+ [@element_stack.path, name].reject {|part| part.nil? || part == ''}.join('/')
112
+ end
113
+ end
114
+ end
115
+ end
@@ -0,0 +1,26 @@
1
+ require 'nokogiri'
2
+ require 'sax_stream/internal/handler_stack'
3
+ require 'sax_stream/internal/combined_handler'
4
+
5
+ module SaxStream
6
+ module Internal
7
+ class SaxHandler < Nokogiri::XML::SAX::Document
8
+ def initialize(collector, mappers, handler_stack = HandlerStack.new)
9
+ @handler_stack = handler_stack
10
+ mapper_handlers = mappers.map do |mapper|
11
+ MapperHandler.new(mapper, collector, @handler_stack)
12
+ end
13
+ @handler_stack.root = CombinedHandler.new(@handler_stack, mapper_handlers)
14
+ end
15
+
16
+ [:start_element, :end_element, :characters, :cdata_block].each do |key|
17
+ code = <<-RUBY
18
+ def #{key}(*params)
19
+ @handler_stack.top.#{key}(*params)
20
+ end
21
+ RUBY
22
+ module_eval(code)
23
+ end
24
+ end
25
+ end
26
+ end
@@ -0,0 +1,17 @@
1
+ module SaxStream
2
+ module Internal
3
+ class SingularRelationshipCollector
4
+ def initialize(parent, relation_name)
5
+ @parent = parent
6
+ @relation_name = relation_name
7
+ end
8
+
9
+ def <<(value)
10
+ if @parent.relations[@relation_name]
11
+ raise ProgramError, "found singular relationship #{@relation_name.inspect} occurring more than once. Existing is #{@parent.relations[@relation_name].inspect}"
12
+ end
13
+ @parent.relations[@relation_name] = value
14
+ end
15
+ end
16
+ end
17
+ end
@@ -0,0 +1,162 @@
1
+ require 'sax_stream/internal/field_mapping'
2
+ require 'sax_stream/internal/child_mapping'
3
+
4
+ module SaxStream
5
+ # Include this module to make your class map an XML node. For usage examples, see the README.
6
+ module Mapper
7
+ def self.included(base)
8
+ base.extend ClassMethods
9
+ end
10
+
11
+ module ClassMethods
12
+ def node(name, options = {})
13
+ @node_name = name
14
+ @collect = options.has_key?(:collect) ? options[:collect] : true
15
+ end
16
+
17
+ def map(attribute_name, options = {})
18
+ store_field_mapping(options[:to], Internal::FieldMapping.new(attribute_name, options))
19
+ end
20
+
21
+ # Define a relation to another object which is built from an XML node using another class
22
+ # which also includes SaxStream::Mapper.
23
+ #
24
+ # attribute_name:: The name of the attribute on your object that the related objects will be stored in.
25
+ # options:: An options hash which can accept the following-
26
+ # [:to] Default value: "*"
27
+ # The path to the XML which defines an instance of this related node. This
28
+ # is a bit like an XPath expression, but not quite. See the README for examples.
29
+ # For relations, this can include a wildcard "*" as the last part of the path,
30
+ # eg: "product/review/*". If the path is just set to "*" then this will match
31
+ # any immediate child of the current node, which is good for polymorphic collections.
32
+ # [:as] Required, no default value.
33
+ # Needs to be a class which includes SaxStream::Mapper, or an array of such classes.
34
+ # Using an array of classes, even if the array only has one item, denotes that an
35
+ # array of related items are expected. Calling @object.relations['name'] will return
36
+ # an array which will be empty if nothing is found. If a singular value is used for
37
+ # the :as option, then the relation will be assumed to be singular, and so it will
38
+ # be nil or the expected class (and will raise an error if multiples are in the file).
39
+ # [:parent_collects] Default value: false
40
+ # Set to true if the object defining this relationship (ie, the parent
41
+ # in the relationship) needs to collect the defined children. If so, the
42
+ # parent object will be used as the collector for these children, and they
43
+ # will not be passed to the collector supplied to the parser. Use this when
44
+ # the child objects are not something you want to process on their own, but
45
+ # instead you want them all to be loaded into the parent which will then be
46
+ # collected as normal. If this is left false, then the parent object will
47
+ # not be informed of its children, because they will be passed to the collector
48
+ # and then forgotten about. However, the children will know about their parent,
49
+ # or at least what is known about it, but the parent will not be finished being
50
+ # parsed. The parent will have already parsed all XML attributes though.
51
+ def relate(attribute_name, options = {})
52
+ store_relation_mapping(options[:to] || '*', Internal::ChildMapping.new(attribute_name, options))
53
+ end
54
+
55
+ def node_name
56
+ @node_name
57
+ end
58
+
59
+ def maps_node?(name)
60
+ @node_name == name
61
+ end
62
+
63
+ def map_attribute_onto_object(object, key, value)
64
+ map_key_onto_object(object, "@#{key}", value)
65
+ end
66
+
67
+ def map_element_stack_top_onto_object(object, element_stack)
68
+ map_key_onto_object(object, element_stack.path, element_stack.content)
69
+ element_stack.attributes.each do |key, value|
70
+ map_key_onto_object(object, key, value)
71
+ end
72
+ end
73
+
74
+ def map_key_onto_object(object, key, value)
75
+ mapping = field_mapping(key)
76
+ if mapping
77
+ mapping.map_value_onto_object(object, value)
78
+ end
79
+ end
80
+
81
+ def child_handler_for(key, collector, handler_stack, current_object)
82
+ mapping = field_mapping(key)
83
+ if mapping
84
+ mapping.handler_for(key, collector, handler_stack, current_object)
85
+ end
86
+ end
87
+
88
+ def relation_mappings
89
+ @relation_mappings ||= []
90
+ end
91
+
92
+ def should_collect?
93
+ @collect
94
+ end
95
+
96
+ private
97
+
98
+ def store_relation_mapping(key, mapping)
99
+ relation_mappings << mapping
100
+ store_field_mapping(key, mapping)
101
+ end
102
+
103
+ def store_field_mapping(key, mapping)
104
+ if key.include?('*')
105
+ regex_mappings << [Regexp.new(key.gsub('*', '[^/]+')), mapping]
106
+ else
107
+ mappings[key] = mapping
108
+ end
109
+ end
110
+
111
+ def field_mapping(key)
112
+ mappings[key] || regex_field_mapping(key)
113
+ end
114
+
115
+ def regex_field_mapping(key)
116
+ regex_mappings.each do |regex, mapping|
117
+ return mapping if regex =~ key
118
+ end
119
+ nil
120
+ end
121
+
122
+ def regex_mappings
123
+ @regex_mappings ||= []
124
+ end
125
+
126
+ def mappings
127
+ @mappings ||= {}
128
+ end
129
+ end
130
+
131
+ def []=(key, value)
132
+ attributes[key] = value
133
+ end
134
+
135
+ def [](key)
136
+ attributes[key]
137
+ end
138
+
139
+ def inspect
140
+ "#{self.class.name}: #{attributes.inspect}"
141
+ end
142
+
143
+ def attributes
144
+ @attributes ||= {}
145
+ end
146
+
147
+ def relations
148
+ @relations ||= build_empty_relations
149
+ end
150
+
151
+ private
152
+
153
+ def build_empty_relations
154
+ result = {}
155
+ self.class.relation_mappings.each do |relation_mapping|
156
+ result[relation_mapping.name] = relation_mapping.build_empty_relation
157
+ end
158
+ result
159
+ end
160
+
161
+ end
162
+ end
@@ -0,0 +1,19 @@
1
+ module SaxStream
2
+ class NaiveCollector
3
+ def initialize
4
+ @objects = []
5
+ end
6
+
7
+ def mapped_objects
8
+ @objects
9
+ end
10
+
11
+ def <<(value)
12
+ @objects << value
13
+ end
14
+
15
+ def for_type(klass)
16
+ mapped_objects.select { |object| object.class == klass }
17
+ end
18
+ end
19
+ end
@@ -0,0 +1,18 @@
1
+ require 'sax_stream/errors'
2
+ require 'sax_stream/internal/mapper_handler'
3
+ require 'sax_stream/internal/sax_handler'
4
+
5
+ module SaxStream
6
+ class Parser
7
+ def initialize(collector, mappers)
8
+ raise ArgumentError, "You must supply your parser with a collector" unless collector
9
+ raise ArgumentError, "You must supply your parser with at least one mapper class" if mappers.empty?
10
+ @sax_handler = Internal::SaxHandler.new(collector, mappers)
11
+ end
12
+
13
+ def parse_stream(io_stream)
14
+ parser = Nokogiri::XML::SAX::Parser.new(@sax_handler)
15
+ parser.parse(io_stream)
16
+ end
17
+ end
18
+ end
metadata ADDED
@@ -0,0 +1,71 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: sax_stream
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ prerelease:
6
+ platform: ruby
7
+ authors:
8
+ - Craig Ambrose
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2012-04-07 00:00:00.000000000 Z
13
+ dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: nokogiri
16
+ requirement: &70166827474840 !ruby/object:Gem::Requirement
17
+ none: false
18
+ requirements:
19
+ - - ! '>='
20
+ - !ruby/object:Gem::Version
21
+ version: 1.4.0
22
+ type: :runtime
23
+ prerelease: false
24
+ version_requirements: *70166827474840
25
+ description:
26
+ email:
27
+ - craig@craigambrose.com
28
+ executables: []
29
+ extensions: []
30
+ extra_rdoc_files: []
31
+ files:
32
+ - lib/sax_stream/errors.rb
33
+ - lib/sax_stream/internal/child_mapping.rb
34
+ - lib/sax_stream/internal/combined_handler.rb
35
+ - lib/sax_stream/internal/element_stack.rb
36
+ - lib/sax_stream/internal/field_mapping.rb
37
+ - lib/sax_stream/internal/handler_stack.rb
38
+ - lib/sax_stream/internal/mapper_handler.rb
39
+ - lib/sax_stream/internal/sax_handler.rb
40
+ - lib/sax_stream/internal/singular_relationship_collector.rb
41
+ - lib/sax_stream/mapper.rb
42
+ - lib/sax_stream/naive_collector.rb
43
+ - lib/sax_stream/parser.rb
44
+ - LICENSE
45
+ - README.markdown
46
+ homepage: http://github.com/craigambrose/sax_stream
47
+ licenses: []
48
+ post_install_message:
49
+ rdoc_options: []
50
+ require_paths:
51
+ - lib
52
+ required_ruby_version: !ruby/object:Gem::Requirement
53
+ none: false
54
+ requirements:
55
+ - - ! '>='
56
+ - !ruby/object:Gem::Version
57
+ version: '0'
58
+ required_rubygems_version: !ruby/object:Gem::Requirement
59
+ none: false
60
+ requirements:
61
+ - - ! '>='
62
+ - !ruby/object:Gem::Version
63
+ version: '0'
64
+ requirements: []
65
+ rubyforge_project:
66
+ rubygems_version: 1.8.6
67
+ signing_key:
68
+ specification_version: 3
69
+ summary: A streaming XML parser which builds objects and passes them to a collector
70
+ as they are ready
71
+ test_files: []