saxerator 0.3.0 → 0.5.0

@@ -1,3 +1,4 @@
1
+ script: "rspec spec"
1
2
  language: ruby
2
3
  bundler_args: --without coverage
3
4
  rvm:
data/README.md CHANGED
@@ -1,10 +1,8 @@
1
1
  Saxerator [![Build Status](https://secure.travis-ci.org/soulcutter/saxerator.png?branch=master)](http://travis-ci.org/soulcutter/saxerator)
2
2
  =========
3
3
 
4
- Saxerator is a SAX-based xml-to-hash parser designed for parsing very large files into manageable chunks. Rather than
5
- dealing directly with SAX callback methods, Saxerator gives you Enumerable access to chunks of an xml document.
6
- This approach is ideal for large xml files containing a collection of elements that you can process
7
- independently.
4
+ Saxerator is a streaming xml-to-hash parser designed for working with very large xml files by
5
+ giving you Enumerable access to manageable chunks of the document.
8
6
 
9
7
  Each xml chunk is parsed into a JSON-like Ruby Hash structure for consumption.
10
8
 
@@ -21,14 +19,15 @@ The DSL consists of predicates that may be combined to describe which elements t
21
19
  Saxerator will only enumerate over chunks of xml that match all of the combined predicates (see Examples section
22
20
  for added clarity).
23
21
 
24
- | Predicate | Explanation |
25
- |:----------------|:------------|
26
- | `all` | Returns the entire document parsed into a hash. Cannot combine with other predicates
27
- | `for_tag(name)` | Elements whose name matches the given `name`
28
- | `at_depth(n)` | Elements `n` levels deep inside the root of an xml document. The root element itself is `n = 0`
29
- | `within(name)` | Elements nested anywhere within an element with the given `name`
30
- | `child_of(name)`| Elements that are direct children of an element with the given `name`
31
- | `with_attribute(name, value)` | Elements with a given `name` and `value`. If no `value` is given, matches any element with the specified attribute name present
22
+ | Predicate | Explanation |
23
+ |:-----------------|:------------|
24
+ | `all` | Returns the entire document parsed into a hash. Cannot combine with other predicates
25
+ | `for_tag(name)` | Elements whose name matches the given `name`
26
+ | `for_tags(names)`| Elements whose name is in the `names` Array
27
+ | `at_depth(n)` | Elements `n` levels deep inside the root of an xml document. The root element itself is `n = 0`
28
+ | `within(name)` | Elements nested anywhere within an element with the given `name`
29
+ | `child_of(name)` | Elements that are direct children of an element with the given `name`
30
+ | `with_attribute(name, value)` | Elements that have an attribute with a given `name` and `value`. If no `value` is given, matches any element with the specified attribute name present
32
31
 
33
32
 
34
33
  Examples
@@ -70,20 +69,14 @@ books = bookshelf_contents.for_tag(:book)
70
69
  magazines = bookshelf_contents.for_tag(:magazine)
71
70
 
72
71
  books.each do |book|
73
- # Some processing on a book
72
+ # ...
74
73
  end
75
74
 
76
75
  magazines.each do |magazine|
77
- # Do some work with a magazine
76
+ # ...
78
77
  end
79
78
  ```
80
79
 
81
- Don't care about memory/streaming, you just want your xml in one big hash? Saxerator can do that too.
82
-
83
- ```ruby
84
- parser.all # big, giant hash
85
- ```
86
-
87
80
  Known Issues
88
81
  ------------
89
82
  * JRuby closes the file stream at the end of parsing, therefore to perform multiple operations
@@ -2,12 +2,12 @@ require 'saxerator/version'
2
2
 
3
3
  require 'saxerator/full_document'
4
4
  require 'saxerator/document_fragment'
5
- require 'saxerator/string_with_attributes'
6
- require 'saxerator/hash_with_attributes'
5
+ require 'saxerator/string_element'
6
+ require 'saxerator/hash_element'
7
7
  require 'saxerator/xml_node'
8
8
 
9
9
  require 'saxerator/parser/accumulator'
10
- require 'saxerator/parser/for_tag_latch'
10
+ require 'saxerator/parser/for_tags_latch'
11
11
  require 'saxerator/parser/at_depth_latch'
12
12
  require 'saxerator/parser/within_latch'
13
13
  require 'saxerator/parser/latched_accumulator'
@@ -1,7 +1,12 @@
1
1
  module Saxerator
2
2
  module DSL
3
3
  def for_tag(tag)
4
- specify Parser::ForTagLatch.new(tag.to_s)
4
+ for_tags([tag])
5
+ end
6
+
7
+ def for_tags(tags)
8
+ raise ArgumentError, '#for_tags requires an Array argument' unless tags.is_a? Array
9
+ specify Parser::ForTagsLatch.new(tags.map(&:to_s))
5
10
  end
6
11
 
7
12
  def at_depth(depth)
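The `for_tag`/`for_tags` change above can be illustrated without Nokogiri. In this sketch, `ToyQuery`, its `matching_values` method and the `[name, text]` input pairs are hypothetical stand-ins, not saxerator's API. It only shows `for_tag` delegating to `for_tags`, and the Array guard raising an `ArgumentError`.

```ruby
# Hypothetical, standalone illustration of the DSL change: #for_tag now
# wraps its argument in an Array and delegates to #for_tags.
class ToyQuery
  def initialize(elements)
    @elements = elements # Array of [tag_name, text] pairs (assumed input shape)
    @names = nil
  end

  def for_tag(tag)
    for_tags([tag])
  end

  def for_tags(tags)
    # `raise ArgumentError, msg` raises the intended error; writing
    # `ArgumentError(msg)` would call an undefined method (NoMethodError).
    raise ArgumentError, '#for_tags requires an Array argument' unless tags.is_a?(Array)
    @names = tags.map(&:to_s) # symbols and strings are both accepted
    self
  end

  # Texts of the elements whose name is in the selected set
  def matching_values
    @elements.select { |name, _| @names.nil? || @names.include?(name) }.map { |_, text| text }
  end
end

elements = [%w[blurb1 one], %w[blurb2 two], %w[blurb3 three]]
ToyQuery.new(elements).for_tags(%w[blurb1 blurb3]).matching_values # => ["one", "three"]
ToyQuery.new(elements).for_tag(:blurb2).matching_values            # => ["two"]
```

Passing a bare String such as `for_tags('blurb1')` raises `ArgumentError` rather than being silently treated as a single tag.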
@@ -0,0 +1,11 @@
1
+ module Saxerator
2
+ class HashElement < Hash
3
+ attr_accessor :attributes
4
+ attr_accessor :name
5
+
6
+ def initialize(name, attributes)
7
+ self.name = name
8
+ self.attributes = attributes
9
+ end
10
+ end
11
+ end
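`HashElement` replaces `HashWithAttributes` and additionally records the element's own name. This sketch uses a local copy of the class so it runs without the gem. The `xml:lang` attribute is made-up sample data, and unlike the diff, the copy calls `super()` explicitly.

```ruby
# Local copy of the HashElement shape from the diff above.
class HashElement < Hash
  attr_accessor :attributes, :name

  def initialize(name, attributes)
    super() # plain empty Hash, no default value
    self.name = name
    self.attributes = attributes
  end
end

entry = HashElement.new('entry', { 'xml:lang' => 'en' })
entry['title'] = 'How to eat an airplane'

entry.name                   # => "entry"
entry.attributes['xml:lang'] # => "en"
entry.is_a?(Hash)            # => true, so ordinary Hash methods keep working
```

`Hash#==` compares only keys and values, so a `HashElement` still equals a plain Hash literal with the same contents. That is why the specs can compare parsed chunks against `{'name' => ...}` literals.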
@@ -18,7 +18,7 @@ module Saxerator
18
18
  last = @stack.pop
19
19
  @stack.last.add_node last
20
20
  else
21
- @block.call(@stack.pop.to_hash)
21
+ @block.call(@stack.pop.block_variable)
22
22
  end
23
23
  end
24
24
 
@@ -1,8 +1,6 @@
1
- require 'saxerator/parser/document_latch'
2
-
3
1
  module Saxerator
4
2
  module Parser
5
- class AtDepthLatch < DocumentLatch
3
+ class AtDepthLatch < Nokogiri::XML::SAX::Document
6
4
  def initialize(depth)
7
5
  @target_depth = depth
8
6
  @current_depth = -1
@@ -1,18 +1,14 @@
1
- require 'saxerator/parser/document_latch'
2
-
3
1
  module Saxerator
4
2
  module Parser
5
- class ChildOfLatch < DocumentLatch
3
+ class ChildOfLatch < Nokogiri::XML::SAX::Document
6
4
  def initialize(name)
7
5
  @name = name
8
6
  @depths = []
9
- @depth_within_element = 0
10
7
  end
11
8
 
12
9
  def start_element name, _
13
10
  if depth_within_element > 0
14
11
  increment_depth(1)
15
- resolve_open_status
16
12
  end
17
13
  if @name == name
18
14
  @depths.push 1
@@ -23,10 +19,13 @@ module Saxerator
23
19
  if depth_within_element > 0
24
20
  increment_depth(-1)
25
21
  @depths.pop if @depths.last == 0
26
- resolve_open_status
27
22
  end
28
23
  end
29
24
 
25
+ def open?
26
+ depth_within_element == 2
27
+ end
28
+
30
29
  def increment_depth(amount)
31
30
  @depths.map! { |depth| depth + amount }
32
31
  end
@@ -34,14 +33,6 @@ module Saxerator
34
33
  def depth_within_element
35
34
  @depths.size > 0 ? @depths.last : 0
36
35
  end
37
-
38
- def resolve_open_status
39
- if depth_within_element == 2
40
- open
41
- else
42
- close
43
- end
44
- end
45
36
  end
46
37
  end
47
38
  end
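The `@depths` stack in `ChildOfLatch` is easiest to follow by replaying SAX events by hand. This sketch copies the latch logic into a plain Ruby class (`ToyChildOfLatch` and `open_tags` are hypothetical, with no Nokogiri involved) and records which start tags leave the latch open. The first lines of `end_element` are hidden by the diff, so that method is a reconstruction from the visible hunk. The event list mirrors the `child_of` spec fixture.

```ruby
# Hypothetical standalone copy of ChildOfLatch's depth-stack logic.
# Each entry in @depths tracks how deep we are inside one open match.
class ToyChildOfLatch
  def initialize(name)
    @name = name
    @depths = []
  end

  def start_element(name)
    increment_depth(1) if depth_within_element > 0
    @depths.push(1) if name == @name # a new match starts its own counter
  end

  # Reconstructed from the visible hunk (assumption).
  def end_element(_name)
    return unless depth_within_element > 0
    increment_depth(-1)
    @depths.pop if @depths.last == 0 # the innermost match has closed
  end

  # Depth 2 means a direct child of the innermost matching element
  def open?
    depth_within_element == 2
  end

  private

  def increment_depth(amount)
    @depths.map! { |depth| depth + amount }
  end

  def depth_within_element
    @depths.empty? ? 0 : @depths.last
  end
end

# Replays [:start/:end, tag] events; returns tags whose start left the latch open.
def open_tags(latch, events)
  events.each_with_object([]) do |(kind, tag), opened|
    if kind == :start
      latch.start_element(tag)
      opened << tag if latch.open?
    else
      latch.end_element(tag)
    end
  end
end

# Same element structure as the child_of spec fixture.
EVENTS = [
  [:start, 'root'], [:start, 'children'],
  [:start, 'name'], [:end, 'name'],         # Rudy McMannis
  [:start, 'children'],
  [:start, 'name'], [:end, 'name'],         # Tom McMannis
  [:end, 'children'],
  [:start, 'grandchildren'],
  [:start, 'name'], [:end, 'name'],         # Mildred Marston
  [:end, 'grandchildren'],
  [:start, 'name'], [:end, 'name'],         # Anne Welsh
  [:end, 'children'], [:end, 'root']
].freeze

open_tags(ToyChildOfLatch.new('grandchildren'), EVENTS) # => ["name"]
open_tags(ToyChildOfLatch.new('children'), EVENTS)      # => ["name", "name", "grandchildren", "name"]
```

Keeping only the `name` entries from the second result leaves three elements (Rudy, Tom and Anne), which matches what the `for_tag(:name).child_of(:children)` spec expects.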
@@ -2,17 +2,17 @@ require 'saxerator/parser/document_latch'
2
2
 
3
3
  module Saxerator
4
4
  module Parser
5
- class ForTagLatch < DocumentLatch
6
- def initialize(name)
7
- @name = name
5
+ class ForTagsLatch < DocumentLatch
6
+ def initialize(names)
7
+ @names = names
8
8
  end
9
9
 
10
10
  def start_element name, _
11
- name == @name ? open : close
11
+ @names.include?(name) ? open : close
12
12
  end
13
13
 
14
14
  def end_element name
15
- close if name == @name
15
+ close if @names.include?(name)
16
16
  end
17
17
  end
18
18
  end
@@ -1,8 +1,6 @@
1
- require 'saxerator/parser/document_latch'
2
-
3
1
  module Saxerator
4
2
  module Parser
5
- class WithinLatch < DocumentLatch
3
+ class WithinLatch < Nokogiri::XML::SAX::Document
6
4
  def initialize(name)
7
5
  @name = name
8
6
  @depth_within_element = 0
@@ -11,16 +9,18 @@ module Saxerator
11
9
  def start_element name, _
12
10
  if name == @name || @depth_within_element > 0
13
11
  @depth_within_element += 1
14
- open if @depth_within_element == 2
15
12
  end
16
13
  end
17
14
 
18
15
  def end_element _
19
16
  if @depth_within_element > 0
20
17
  @depth_within_element -= 1
21
- close if @depth_within_element == 1
22
18
  end
23
19
  end
20
+
21
+ def open?
22
+ @depth_within_element > 1
23
+ end
24
24
  end
25
25
  end
26
26
  end
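`WithinLatch` now exposes `open?` instead of calling `open`/`close` itself. The single counter can be traced the same way. `ToyWithinLatch` is a hypothetical standalone copy of the counter logic, and the event list mirrors the `within` spec fixture.

```ruby
# Hypothetical standalone copy of the WithinLatch counter logic (no Nokogiri).
class ToyWithinLatch
  def initialize(name)
    @name = name
    @depth_within_element = 0
  end

  def start_element(name)
    @depth_within_element += 1 if name == @name || @depth_within_element > 0
  end

  def end_element(_name)
    @depth_within_element -= 1 if @depth_within_element > 0
  end

  # Open for anything nested inside the named element, not the element itself
  def open?
    @depth_within_element > 1
  end
end

latch = ToyWithinLatch.new('article')
opened = []
[
  [:start, 'magazine'],
  [:start, 'name'], [:end, 'name'],     # magazine title, outside <article>
  [:start, 'article'],
  [:start, 'name'], [:end, 'name'],     # article title
  [:start, 'author'], [:end, 'author'],
  [:end, 'article'],
  [:end, 'magazine']
].each do |kind, tag|
  if kind == :start
    latch.start_element(tag)
    opened << tag if latch.open?
  else
    latch.end_element(tag)
  end
end
opened # => ["name", "author"]
```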
@@ -0,0 +1,12 @@
1
+ module Saxerator
2
+ class StringElement < String
3
+ attr_accessor :attributes
4
+ attr_accessor :name
5
+
6
+ def initialize(str, name, attributes)
7
+ self.name = name
8
+ self.attributes = attributes
9
+ super(str)
10
+ end
11
+ end
12
+ end
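`StringElement` plays the same role for text-only nodes. Its constructor takes the text, the element name and the attributes. A local copy (runnable without the gem, sample values taken from the `nested_elements.xml` fixture) behaves like this:

```ruby
# Local copy of StringElement from the diff above.
class StringElement < String
  attr_accessor :attributes, :name

  def initialize(str, name, attributes)
    self.name = name
    self.attributes = attributes
    super(str) # String#initialize sets the text content
  end
end

content = StringElement.new('<p>Airplanes are very large</p>', 'content', { 'type' => 'html' })
content == '<p>Airplanes are very large</p>' # => true, compares as a normal String
content.name                                 # => "content"
content.attributes['type']                   # => "html"
```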
@@ -1,3 +1,3 @@
1
1
  module Saxerator
2
- VERSION = "0.3.0"
2
+ VERSION = "0.5.0"
3
3
  end
@@ -16,34 +16,31 @@ module Saxerator
16
16
  end
17
17
 
18
18
  def to_s
19
- string = StringWithAttributes.new(@children.join)
20
- string.attributes = @attributes
21
- string
19
+ StringElement.new(@children.join, @name, @attributes)
22
20
  end
23
21
 
24
22
  def to_hash
25
- if @text
26
- to_s
27
- else
28
- out = HashWithAttributes.new
29
- out.attributes = @attributes
30
-
31
- @children.each do |child|
32
- name = child.name
33
- element = child.to_hash
34
-
35
- if out[name]
36
- if !out[name].is_a?(Array)
37
- out[name] = [out[name]]
38
- end
39
- out[name] << element
40
- else
41
- out[name] = element
23
+ hash = HashElement.new(@name, @attributes)
24
+
25
+ @children.each do |child|
26
+ name = child.name
27
+ element = child.block_variable
28
+
29
+ if hash[name]
30
+ if !hash[name].is_a?(Array)
31
+ hash[name] = [hash[name]]
42
32
  end
33
+ hash[name] << element
34
+ else
35
+ hash[name] = element
43
36
  end
44
-
45
- out
46
37
  end
38
+
39
+ hash
40
+ end
41
+
42
+ def block_variable
43
+ @text ? to_s : to_hash
47
44
  end
48
45
  end
49
46
  end
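`XmlNode#to_hash` folds children that share a name into an Array, and `block_variable` returns a string for text nodes and a hash otherwise. The folding rule can be shown in isolation with a hypothetical helper `fold_children` over `[name, value]` pairs. It builds a plain Hash rather than a `HashElement`.

```ruby
# Hypothetical helper reproducing the folding rule in XmlNode#to_hash:
# the first child with a given name is stored as-is; a second one turns
# the entry into an Array, and later ones are appended to it.
def fold_children(children)
  children.each_with_object({}) do |(name, element), hash|
    if hash[name]
      hash[name] = [hash[name]] unless hash[name].is_a?(Array)
      hash[name] << element
    else
      hash[name] = element
    end
  end
end

result = fold_children([%w[blurb one], %w[blurb two], %w[blurb three], %w[notablurb four]])
result == { 'blurb' => %w[one two three], 'notablurb' => 'four' } # => true
```

This is the same shape the `all` spec expects. A name that appears once stays a scalar, so callers cannot assume an Array without checking.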
@@ -10,10 +10,8 @@ Gem::Specification.new do |s|
10
10
  s.homepage = 'https://github.com/soulcutter/saxerator'
11
11
  s.summary = 'A SAX-based XML-to-hash parser for parsing large files into manageable chunks'
12
12
  s.description = <<-eos
13
- Saxerator is a SAX-based xml-to-hash parser designed for parsing very large files into manageable chunks. Rather than
14
- dealing directly with SAX callback methods, Saxerator gives you Enumerable access to chunks of an xml document.
15
- This approach is ideal for large xml files containing a collection of elements that you can process
16
- independently.
13
+ Saxerator is a streaming xml-to-hash parser designed for working with very large xml files by
14
+ giving you Enumerable access to manageable chunks of the document.
17
15
  eos
18
16
  s.license = 'MIT'
19
17
 
@@ -37,6 +35,5 @@ Gem::Specification.new do |s|
37
35
 
38
36
  s.add_runtime_dependency 'nokogiri', '>= 1.4.0'
39
37
 
40
- s.add_development_dependency 'rake'
41
- s.add_development_dependency 'rspec'
38
+ s.add_development_dependency 'rspec', '>= 2.11.0'
42
39
  end
@@ -14,7 +14,7 @@
14
14
  <content type="html">&lt;p&gt;Airplanes are very large &#8212; this can present difficulty in digestion.&lt;/p&gt;</content>
15
15
  <media:thumbnail url="http://www.gravatar.com/avatar/a9eb6ba22e482b71b266daadf9c9a080?s=80"/>
16
16
  <author>
17
- <name>Soulcutter</name>
17
+ <name><![CDATA[Soul<utter]]></name>
18
18
  </author>
19
19
  <contributor type="primary">
20
20
  <name>Jane Doe</name>
@@ -0,0 +1,20 @@
1
+ require 'spec_helper'
2
+
3
+ describe "Saxerator::FullDocument#all" do
4
+ subject(:parser) { Saxerator.parser(xml) }
5
+
6
+ let(:xml) do
7
+ <<-eos
8
+ <blurbs>
9
+ <blurb>one</blurb>
10
+ <blurb>two</blurb>
11
+ <blurb>three</blurb>
12
+ <notablurb>four</notablurb>
13
+ </blurbs>
14
+ eos
15
+ end
16
+
17
+ it "should allow you to parse an entire document" do
18
+ parser.all.should == {'blurb' => ['one', 'two', 'three'], 'notablurb' => 'four'}
19
+ end
20
+ end
@@ -0,0 +1,34 @@
1
+ require 'spec_helper'
2
+
3
+ describe "Saxerator::DSL#at_depth" do
4
+ subject(:parser) { Saxerator.parser(xml) }
5
+
6
+ let(:xml) do
7
+ <<-eos
8
+ <publications>
9
+ <book>
10
+ <name>How to eat an airplane</name>
11
+ <author>Leviticus Alabaster</author>
12
+ </book>
13
+ <book>
14
+ <name>To wallop a horse in the face</name>
15
+ <author>Jeanne Clarewood</author>
16
+ </book>
17
+ </publications>
18
+ eos
19
+ end
20
+
21
+ it "should parse elements at the requested tag depth" do
22
+ parser.at_depth(2).inject([], :<<).should == [
23
+ 'How to eat an airplane', 'Leviticus Alabaster',
24
+ 'To wallop a horse in the face', 'Jeanne Clarewood'
25
+ ]
26
+ end
27
+
28
+ it "should work in combination with #for_tag" do
29
+ parser.at_depth(2).for_tag(:name).inject([], :<<).should == [
30
+ 'How to eat an airplane',
31
+ 'To wallop a horse in the face'
32
+ ]
33
+ end
34
+ end
@@ -0,0 +1,36 @@
1
+ require 'spec_helper'
2
+
3
+ describe "Saxerator::DSL#child_of" do
4
+ subject(:parser) { Saxerator.parser(xml) }
5
+
6
+ let(:xml) do
7
+ <<-eos
8
+ <root>
9
+ <children>
10
+ <name>Rudy McMannis</name>
11
+ <children>
12
+ <name>Tom McMannis</name>
13
+ </children>
14
+ <grandchildren>
15
+ <name>Mildred Marston</name>
16
+ </grandchildren>
17
+ <name>Anne Welsh</name>
18
+ </children>
19
+ </root>
20
+ eos
21
+ end
22
+
23
+ it "should only parse children of the specified tag" do
24
+ parser.child_of(:grandchildren).inject([], :<<).should == [
25
+ 'Mildred Marston'
26
+ ]
27
+ end
28
+
29
+ it "should work in combination with #for_tag" do
30
+ parser.for_tag(:name).child_of(:children).inject([], :<<).should == [
31
+ 'Rudy McMannis',
32
+ 'Tom McMannis',
33
+ 'Anne Welsh'
34
+ ]
35
+ end
36
+ end
@@ -0,0 +1,20 @@
1
+ require 'spec_helper'
2
+
3
+ describe "Saxerator::DSL#for_tag" do
4
+ subject(:parser) { Saxerator.parser(xml) }
5
+
6
+ let(:xml) do
7
+ <<-eos
8
+ <blurbs>
9
+ <blurb>one</blurb>
10
+ <blurb>two</blurb>
11
+ <blurb>three</blurb>
12
+ <notablurb>four</notablurb>
13
+ </blurbs>
14
+ eos
15
+ end
16
+
17
+ it "should only select the specified tag" do
18
+ parser.for_tag(:blurb).inject([], :<<).should == ['one', 'two', 'three']
19
+ end
20
+ end
@@ -0,0 +1,19 @@
1
+ require 'spec_helper'
2
+
3
+ describe "Saxerator::DSL#for_tags" do
4
+ subject(:parser) { Saxerator.parser(xml) }
5
+
6
+ let(:xml) do
7
+ <<-eos
8
+ <blurbs>
9
+ <blurb1>one</blurb1>
10
+ <blurb2>two</blurb2>
11
+ <blurb3>three</blurb3>
12
+ </blurbs>
13
+ eos
14
+ end
15
+
16
+ it "should only select the specified tags" do
17
+ parser.for_tags(%w(blurb1 blurb3)).inject([], :<<).should == ['one', 'three']
18
+ end
19
+ end
@@ -0,0 +1,28 @@
1
+ require 'spec_helper'
2
+
3
+ describe "Saxerator::DSL#with_attribute" do
4
+ subject(:parser) { Saxerator.parser(xml) }
5
+
6
+ let(:xml) do
7
+ <<-eos
8
+ <book>
9
+ <name>How to eat an airplane</name>
10
+ <author>
11
+ <name type="primary">Leviticus Alabaster</name>
12
+ <name type="foreword">Eunice Diesel</name>
13
+ </author>
14
+ </book>
15
+ eos
16
+ end
17
+
18
+ it "should match tags with the specified attributes" do
19
+ subject.with_attribute(:type).inject([], :<<).should == [
20
+ 'Leviticus Alabaster',
21
+ 'Eunice Diesel'
22
+ ]
23
+ end
24
+
25
+ it "should match tags with the specified attribute value" do
26
+ subject.with_attribute(:type, :primary).inject([], :<<).should == ['Leviticus Alabaster']
27
+ end
28
+ end
@@ -0,0 +1,28 @@
1
+ require 'spec_helper'
2
+
3
+ describe "Saxerator::DSL#within" do
4
+ subject(:parser) { Saxerator.parser(xml) }
5
+
6
+ let(:xml) do
7
+ <<-eos
8
+ <magazine>
9
+ <name>The Smarterest</name>
10
+ <article>
11
+ <name>Is our children learning?</name>
12
+ <author>Hazel Nutt</author>
13
+ </article>
14
+ </magazine>
15
+ eos
16
+ end
17
+
18
+ it "should only parse elements nested within the specified tag" do
19
+ parser.within(:article).inject([], :<<).should == [
20
+ 'Is our children learning?',
21
+ 'Hazel Nutt'
22
+ ]
23
+ end
24
+
25
+ it "should work in combination with #for_tag" do
26
+ parser.for_tag(:name).within(:article).inject([], :<<).should == ['Is our children learning?']
27
+ end
28
+ end
@@ -7,164 +7,69 @@ describe Saxerator do
7
7
  File.new(File.join(File.dirname(__FILE__), '..', 'fixtures', name))
8
8
  end
9
9
 
10
- context ".parser" do
11
- subject { parser }
12
- let(:parser) { Saxerator.parser(xml) }
10
+ context "#parser" do
11
+ subject(:parser) { Saxerator.parser(xml) }
13
12
 
14
- context "with a string with blurbs and one non-blurb" do
15
- let(:xml) do
16
- <<-eos
17
- <blurbs>
18
- <blurb>one</blurb>
19
- <blurb>two</blurb>
20
- <blurb>three</blurb>
21
- <notablurb>four</notablurb>
22
- </blurbs>
23
- eos
24
- end
25
-
26
- it "should parse simple strings" do
27
- subject.for_tag(:blurb).inject([], :<<).should == ['one', 'two', 'three']
28
- end
13
+ context "with a File argument" do
14
+ let(:xml) { fixture_file('flat_blurbs.xml') }
29
15
 
30
- it "should only parse the requested tag" do
31
- subject.for_tag(:notablurb).inject([], :<<).should == ['four']
16
+ it "should be able to parse it" do
17
+ parser.all.should == {'blurb' => ['one', 'two', 'three']}
32
18
  end
33
19
 
34
- it "should allow you to parse an entire document" do
35
- subject.all.should == {'blurb' => ['one', 'two', 'three'], 'notablurb' => 'four'}
20
+ it "should allow multiple operations on the same parser" do
21
+ # This exposes a bug where if a File is not reset only the first
22
+ # Enumerable method works as expected
23
+ parser.for_tag(:blurb).first.should == 'one'
24
+ parser.for_tag(:blurb).first.should == 'one'
36
25
  end
37
26
  end
38
27
 
39
- context "with a string with an element at multiple depths" do
28
+ context "with a String argument" do
40
29
  let(:xml) do
41
30
  <<-eos
42
- <publications>
43
- <book>
44
- <name>How to eat an airplane</name>
45
- <author>
46
- <name type="primary">Leviticus Alabaster</name>
47
- <name type="foreword">Eunice Diesel</name>
48
- </author>
49
- </book>
50
- <book>
51
- <name>To wallop a horse in the face</name>
52
- <author>
53
- <name>Jeanne Clarewood</name>
54
- </author>
55
- </book>
56
- <article>
57
- <name>Is our children learning?</name>
58
- <author>
59
- <name>Hazel Nutt</name>
60
- </author>
61
- </article>
62
- </publication>
31
+ <book>
32
+ <name>Illiterates that can read</name>
33
+ <author>Eunice Diesel</author>
34
+ </book>
63
35
  eos
64
36
  end
65
37
 
66
- it "should only parse the requested tag depth" do
67
- subject.at_depth(2).inject([], :<<).should == [
68
- 'How to eat an airplane', { 'name' => ['Leviticus Alabaster', 'Eunice Diesel'] },
69
- 'To wallop a horse in the face', { 'name' => 'Jeanne Clarewood' },
70
- 'Is our children learning?', { 'name' => 'Hazel Nutt' }
71
- ]
72
- end
73
-
74
- it "should only parse the requested tag depth and tag" do
75
- subject.at_depth(2).for_tag(:name).inject([], :<<).should == [
76
- 'How to eat an airplane',
77
- 'To wallop a horse in the face',
78
- 'Is our children learning?'
79
- ]
80
- end
81
-
82
- it "should only parse tags nested inside the specified tag" do
83
- subject.within(:article).inject([], :<<).should == [
84
- 'Is our children learning?',
85
- { 'name' => 'Hazel Nutt' }
86
- ]
87
- end
88
-
89
- it "should combine #for_tag and #within to parse the specified elements" do
90
- subject.for_tag(:name).within(:article).inject([], :<<).should == [
91
- 'Is our children learning?',
92
- 'Hazel Nutt'
93
- ]
94
- end
95
-
96
- it "should match tags with the specified attributes" do
97
- subject.with_attribute(:type).inject([], :<<).should == [
98
- 'Leviticus Alabaster',
99
- 'Eunice Diesel'
100
- ]
101
- end
102
-
103
- it "should match tags with the specified attributes" do
104
- subject.with_attribute(:type, :primary).inject([], :<<).should == ['Leviticus Alabaster']
38
+ it "should be able to parse it" do
39
+ parser.all.should == { 'name' => 'Illiterates that can read', 'author' => 'Eunice Diesel' }
105
40
  end
106
41
  end
42
+ end
107
43
 
108
- context "with a grand child" do
109
- let(:xml) do
110
- <<-eos
111
- <root>
112
- <children>
113
- <name>Rudy McMannis</name>
114
- <children>
115
- <name>Tom McMannis</name>
116
- </children>
117
- <grandchildren>
118
- <name>Mildred Marston</name>
119
- </grandchildren>
120
- <name>Anne Welsh</name>
121
- </children>
122
- </root>
123
- eos
124
- end
44
+ context "block_variable format" do
45
+ let(:xml) { fixture_file('nested_elements.xml') }
46
+ subject(:entry) { Saxerator.parser(xml).for_tag(:entry).first }
125
47
 
126
- it "should only parse children of the specified tag" do
127
- subject.child_of(:grandchildren).inject([], :<<).should == [
128
- 'Mildred Marston'
129
- ]
130
- end
48
+ # string
49
+ specify { entry['title'].should == 'How to eat an airplane' }
131
50
 
132
- it "should combine #for_tag and #child_of" do
133
- subject.for_tag(:name).child_of(:children).inject([], :<<).should == [
134
- 'Rudy McMannis',
135
- 'Tom McMannis',
136
- 'Anne Welsh'
137
- ]
138
- end
139
- end
51
+ # hash and cdata inside name
52
+ specify { entry['author'].should == {'name' => 'Soul<utter'} }
140
53
 
141
- context "with a file with blurbs" do
142
- let(:xml) { fixture_file('flat_blurbs.xml') }
54
+ # array of hashes
55
+ specify { entry['contributor'].should == [{'name' => 'Jane Doe'}, {'name' => 'Leviticus Alabaster'}] }
143
56
 
144
- it "should parse simple strings" do
145
- subject.for_tag(:blurb).inject([], :<<).should == ['one', 'two', 'three']
146
- end
57
+ # attributes on a hash
58
+ specify { entry['contributor'][0].attributes['type'].should == 'primary' }
147
59
 
148
- it "should allow multiple operations on the same parser" do
149
- # This exposes a bug where if a File is not reset only the first
150
- # Enumerable method works as expected
151
- subject.for_tag(:blurb).first.should == 'one'
152
- subject.for_tag(:blurb).first.should == 'one'
153
- end
154
- end
60
+ # attributes on a string
61
+ specify { entry['content'].attributes['type'].should == 'html' }
155
62
 
156
- # Verifying the basic parsing behaviors (strings, hashes, arrays, attributes, character entity decoding)
157
- context "with a file with nested elements" do
158
- let(:xml) { fixture_file('nested_elements.xml') }
159
- subject { parser.for_tag(:entry).first }
63
+ # name on a hash
64
+ specify { entry.name.should == 'entry' }
160
65
 
161
- specify { subject['title'].should == 'How to eat an airplane' }
162
- specify { subject['author'].should == {'name' => 'Soulcutter'} }
66
+ # name on a string
67
+ specify { entry['title'].name.should == 'title' }
163
68
 
164
- specify { subject['contributor'].should == [{'name' => 'Jane Doe'}, {'name' => 'Leviticus Alabaster'}] }
165
- specify { subject['contributor'][0].attributes['type'].should == 'primary' }
69
+ # character entity decoding
70
+ specify { entry['content'].should == "<p>Airplanes are very large — this can present difficulty in digestion.</p>"}
166
71
 
167
- specify { subject['content'].should == "<p>Airplanes are very large — this can present difficulty in digestion.</p>"}
168
- end
72
+ # empty element
73
+ specify { entry['media:thumbnail'].should == {} }
169
74
  end
170
75
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: saxerator
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.0
4
+ version: 0.5.0
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2012-08-25 00:00:00.000000000 Z
12
+ date: 2012-09-05 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: nokogiri
@@ -27,22 +27,6 @@ dependencies:
27
27
  - - ! '>='
28
28
  - !ruby/object:Gem::Version
29
29
  version: 1.4.0
30
- - !ruby/object:Gem::Dependency
31
- name: rake
32
- requirement: !ruby/object:Gem::Requirement
33
- none: false
34
- requirements:
35
- - - ! '>='
36
- - !ruby/object:Gem::Version
37
- version: '0'
38
- type: :development
39
- prerelease: false
40
- version_requirements: !ruby/object:Gem::Requirement
41
- none: false
42
- requirements:
43
- - - ! '>='
44
- - !ruby/object:Gem::Version
45
- version: '0'
46
30
  - !ruby/object:Gem::Dependency
47
31
  name: rspec
48
32
  requirement: !ruby/object:Gem::Requirement
@@ -50,7 +34,7 @@ dependencies:
50
34
  requirements:
51
35
  - - ! '>='
52
36
  - !ruby/object:Gem::Version
53
- version: '0'
37
+ version: 2.11.0
54
38
  type: :development
55
39
  prerelease: false
56
40
  version_requirements: !ruby/object:Gem::Requirement
@@ -58,12 +42,10 @@ dependencies:
58
42
  requirements:
59
43
  - - ! '>='
60
44
  - !ruby/object:Gem::Version
61
- version: '0'
62
- description: ! " Saxerator is a SAX-based xml-to-hash parser designed for parsing
63
- very large files into manageable chunks. Rather than\n dealing directly with
64
- SAX callback methods, Saxerator gives you Enumerable access to chunks of an xml
65
- document.\n This approach is ideal for large xml files containing a collection
66
- of elements that you can process\n independently.\n"
45
+ version: 2.11.0
46
+ description: ! " Saxerator is a streaming xml-to-hash parser designed for working
47
+ with very large xml files by\n giving you Enumerable access to manageable chunks
48
+ of the document.\n"
67
49
  email:
68
50
  - bradley.schaefer@gmail.com
69
51
  executables: []
@@ -80,21 +62,28 @@ files:
80
62
  - lib/saxerator/document_fragment.rb
81
63
  - lib/saxerator/dsl.rb
82
64
  - lib/saxerator/full_document.rb
83
- - lib/saxerator/hash_with_attributes.rb
65
+ - lib/saxerator/hash_element.rb
84
66
  - lib/saxerator/parser/accumulator.rb
85
67
  - lib/saxerator/parser/at_depth_latch.rb
86
68
  - lib/saxerator/parser/child_of_latch.rb
87
69
  - lib/saxerator/parser/document_latch.rb
88
- - lib/saxerator/parser/for_tag_latch.rb
70
+ - lib/saxerator/parser/for_tags_latch.rb
89
71
  - lib/saxerator/parser/latched_accumulator.rb
90
72
  - lib/saxerator/parser/with_attribute_latch.rb
91
73
  - lib/saxerator/parser/within_latch.rb
92
- - lib/saxerator/string_with_attributes.rb
74
+ - lib/saxerator/string_element.rb
93
75
  - lib/saxerator/version.rb
94
76
  - lib/saxerator/xml_node.rb
95
77
  - lib/saxerator.rb
96
78
  - spec/fixtures/flat_blurbs.xml
97
79
  - spec/fixtures/nested_elements.xml
80
+ - spec/lib/dsl/all_spec.rb
81
+ - spec/lib/dsl/at_depth_spec.rb
82
+ - spec/lib/dsl/child_of_spec.rb
83
+ - spec/lib/dsl/for_tag_spec.rb
84
+ - spec/lib/dsl/for_tags_spec.rb
85
+ - spec/lib/dsl/with_attribute_spec.rb
86
+ - spec/lib/dsl/within_spec.rb
98
87
  - spec/lib/saxerator_spec.rb
99
88
  - spec/spec_helper.rb
100
89
  - benchmark/benchmark.rb
@@ -127,5 +116,12 @@ summary: A SAX-based XML-to-hash parser for parsing large files into manageable
127
116
  test_files:
128
117
  - spec/fixtures/flat_blurbs.xml
129
118
  - spec/fixtures/nested_elements.xml
119
+ - spec/lib/dsl/all_spec.rb
120
+ - spec/lib/dsl/at_depth_spec.rb
121
+ - spec/lib/dsl/child_of_spec.rb
122
+ - spec/lib/dsl/for_tag_spec.rb
123
+ - spec/lib/dsl/for_tags_spec.rb
124
+ - spec/lib/dsl/with_attribute_spec.rb
125
+ - spec/lib/dsl/within_spec.rb
130
126
  - spec/lib/saxerator_spec.rb
131
127
  - spec/spec_helper.rb
@@ -1,5 +0,0 @@
1
- module Saxerator
2
- class HashWithAttributes < Hash
3
- attr_accessor :attributes
4
- end
5
- end
@@ -1,5 +0,0 @@
1
- module Saxerator
2
- class StringWithAttributes < String
3
- attr_accessor :attributes
4
- end
5
- end