xmlcodec 0.1.2 → 0.1.3
- data/README.rdoc +123 -0
- data/Rakefile +4 -4
- data/lib/XMLUtils.rb +7 -4
- data/test/test_helper.rb +3 -5
- metadata +5 -5
- data/README +0 -81
data/README.rdoc
ADDED
@@ -0,0 +1,123 @@
+= xmlcodec
+
+This is a framework to create importers/exporters of XML formats into Ruby objects. To create a new importer/exporter all you have to do is create a simple ruby class for each of the XML elements. This then gives you four main API interactions for free, all using the same objects:
+
+* Create a tree of ruby objects and export it as XML
+* Import a full XML document as a ruby tree of objects
+* Stream parse an XML document with events for elements as Ruby objects
+* Create unlimited sized XML documents with constant memory usage by partially writing out the XML at the same time the in-memory tree is being created.
+
+The first two APIs handle full trees at all times. The stream parser allows you to parse a very big XML file as a stream, like a SAX parser, but receiving fully-formed Ruby objects as events, so you can use the same object APIs without ever having the full tree in memory. The partial export API allows you to create huge XML files the same way you'd create a small one (by putting elements in the Ruby tree) but without having to create the whole tree in memory at any one time.
+
+This project was created as an extract of work done at {Arquivo Nacional da Torre do Tombo}[http://antt.dgarq.gov.pt/].
+
+== Usage
+
+To create an importer exporter for this XML format:
+
+  <root>
+    <firstelement>
+      <secondelement firstattr='1'>
+        some value
+      </secondelement>
+      <secondelement firstattr='2'>
+        some other value
+      </secondelement>
+    </firstelement>
+  </root>
+
+you would create the following classes:
+
+  require 'xmlcodec'
+
+  class Root < XMLCodec::XMLElement
+    elname 'root'
+    xmlsubel :firstelement
+  end
+
+  class FirstElement < XMLCodec::XMLElement
+    elname 'firstelement'
+    xmlsubel_mult :secondelement
+  end
+
+  class SecondElement < XMLCodec::XMLElement
+    elname 'secondelement'
+    elwithvalue
+    xmlattr :firstattr
+  end
+
+elname defines the name of the element in the XML DOM. xmlsubel defines a
+subelement that may exist only once. xmlsubel_mult defines a subelement that may
+appear several times. xmlattr defines an attribute for the element. The classes
+will respond to accessor methods with the names of the subelements and
+attributes.
+
+There is one more way to declare subelements:
+
+  class SomeOtherElement
+    elname 'stuff'
+    xmlsubelements
+  end
+
+This one defines an element that can have a bunch of elements of different types
+whose order is important. The class will have a #subelements method that gives
+access to a container with the collection of the elements.
+
+This is all you have to define to implement the importer/exporter for the
+format.
+
+To import XML just do:
+
+  # From text
+  Root.import_xml_text(File.new('file.xml'))
+
+  # From a REXML DOM
+  Root.import_xml(REXML::Document.new(File.new('file.xml')))
+
+To export do:
+
+  # To generate XML text
+  string = some_element.xml_text
+
+  # To generate REXML DOM
+  doc = some_element.create_xml(REXML::Document.new)
+
+All these calls require keeping the whole contents of the document in memory.
+The ones that use the REXML DOM will have it twice. To handle large documents with constant memory usage another set of APIs is available.
+
+To stream parse a large document you'd do something like:
+
+  class MyStreamListener
+    def el_secondelement(el)
+      obj = el.get_object
+
+      ... do something with obj ...
+
+      # To remove it from the stream so the parent
+      # doesn't include it and memory is freed.
+      el.consume
+    end
+  end
+
+  parser = XMLStreamObjectParser.new(MyStreamListener.new)
+  parser.parse(some_string_or_file)
+
+You can define as many listening methods as elements you'd like to listen to, and by calling el.consume the element is not kept around and memory is freed. Note that when you consume an element it will not be part of the parent when that event comes around.
+
+To produce very large XML files with constant memory usage you would do something like:
+
+  file = File.new('somefile.xml')
+  fe = FirstElement.new
+  10000.times do |i|
+    se = SecondElement.new(i)
+    fe.secondelement << se
+    se.partial_export(file)
+  end
+  fe.end_partial_export(file)
+
+Here 10000 instances of <secondelement> were written to the file. Because we did the partial_export calls inside the loop, each instance was written to file and removed from the parent, so at any one point we only have one instance of FirstElement and SecondElement in memory. Besides the calls to the partial_export methods, all the code is the same as you'd use to create the tree in memory.
+
+
+== Author
+
+Pedro Côrte-Real <pedro@pedrocr.net>
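The constant-memory export described at the end of the README can be pictured with plain Ruby and no xmlcodec at all. The following is a minimal sketch of the general technique only, not the gem's implementation: `IncrementalWriter`, `write_child`, `close_root` and `export_many` are hypothetical names chosen for illustration. It writes the parent's opening tag once, streams each child to the IO as soon as it is built, and never keeps a child around.

```ruby
require 'stringio'

# Hypothetical helper, not part of xmlcodec: streams children of one parent
# element to an IO so that no child has to stay in memory.
class IncrementalWriter
  def initialize(io, root_name)
    @io = io
    @root_name = root_name
    @opened = false
  end

  # Write one child element immediately instead of storing it.
  def write_child(name, value, attrs = {})
    open_root
    attr_text = attrs.map { |k, v| " #{k}='#{escape(v)}'" }.join
    @io << "<#{name}#{attr_text}>#{escape(value)}</#{name}>"
  end

  # Emit the parent's closing tag (opening it first if nothing was written).
  def close_root
    open_root
    @io << "</#{@root_name}>"
  end

  private

  def open_root
    return if @opened
    @io << "<#{@root_name}>"
    @opened = true
  end

  # Minimal escaping for text and single-quoted attribute values.
  def escape(value)
    value.to_s.gsub('&', '&amp;').gsub('<', '&lt;')
         .gsub('>', '&gt;').gsub("'", '&apos;')
  end
end

# Writes `count` <secondelement> children under one <firstelement>.
def export_many(io, count)
  writer = IncrementalWriter.new(io, 'firstelement')
  count.times { |i| writer.write_child('secondelement', "value #{i}", firstattr: i) }
  writer.close_root
  io
end
```

To write to disk instead of a `StringIO`, the file has to be opened for writing, for example `File.open('somefile.xml', 'w') { |f| export_many(f, 10_000) }`.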
data/Rakefile
CHANGED
@@ -1,5 +1,5 @@
 PKG_NAME = 'xmlcodec'
-PKG_VERSION = '0.1.
+PKG_VERSION = '0.1.3'
 
 require 'rake'
 require 'rake/testtask'
@@ -20,7 +20,7 @@ PKG_FILES = FileList[TEST_FILES,
 'Rakefile']
 
 RDOC_OPTIONS = ['-S', '-w 2', '-N']
-RDOC_EXTRA_FILES = ['README']
+RDOC_EXTRA_FILES = ['README.rdoc']
 
 spec = Gem::Specification.new do |s|
   s.platform = Gem::Platform::RUBY
@@ -50,10 +50,10 @@ Rake::TestTask.new do |t|
 end
 
 Rake::RDocTask.new do |rd|
-  rd.main = "README"
+  rd.main = "README.rdoc"
   rd.name = :docs
   rd.rdoc_files.include(RDOC_EXTRA_FILES, CODE_FILES)
-  rd.rdoc_dir = '
+  rd.rdoc_dir = 'doc'
   rd.title = "#{PKG_NAME} API"
   rd.options = RDOC_OPTIONS
 end
data/lib/XMLUtils.rb
CHANGED
@@ -60,10 +60,13 @@ module XMLUtils
   # Gets the xpath inside a given document that can either be a string or a
   # REXML::Document
   #
-  #
-  #
-  #
-  #
+  # Supported options (boolean):
+  # [:multiple]
+  #   fetch all the occurrences of the xpath
+  # [:with_attrs]
+  #   include the attribute contents in the result
+  # [:recursive]
+  #   recursively include all the subelements of the matches
   def self.get_xpath(path, doc, opts={})
     if doc.is_a? REXML::Document
       doc = doc
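The difference between a single match and the `:multiple` option documented above can be pictured with REXML's own XPath helpers. This is a rough sketch and not the body of `XMLUtils.get_xpath`, which is not shown in this diff: `xpath_texts` and `SAMPLE_XML` are hypothetical names, and only a `:multiple`-style option is approximated.

```ruby
require 'rexml/document'

# Hypothetical helper (not XMLUtils.get_xpath): returns the text of the
# first match, or the texts of every match when opts[:multiple] is true.
# Like the documented method, it accepts a string or a REXML::Document.
def xpath_texts(path, doc, opts = {})
  doc = REXML::Document.new(doc) unless doc.is_a?(REXML::Document)
  if opts[:multiple]
    REXML::XPath.match(doc, path).map { |el| el.text.to_s.strip }
  else
    el = REXML::XPath.first(doc, path)
    el && el.text.to_s.strip
  end
end

# Sample document taken from the README's example format.
SAMPLE_XML = "<root><firstelement>" \
             "<secondelement firstattr='1'>some value</secondelement>" \
             "<secondelement firstattr='2'>some other value</secondelement>" \
             "</firstelement></root>"
```

Calling `xpath_texts('//secondelement', SAMPLE_XML)` yields only the first element's text, while passing `multiple: true` yields an array with one entry per match.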
data/test/test_helper.rb
CHANGED
@@ -14,16 +14,14 @@ class Test::Unit::TestCase
   end
 
   def validate_well_formed
-    filename = filename || @temp_path
     assert(system("xmllint --version > /dev/null 2>&1"),
            "xmllint utility not installed"+
            "(on ubuntu/debian install package libxml2-utils)")
-    assert(system("xmllint #{
-           "Validation failed for #{
+    assert(system("xmllint #{@temp_path} >/dev/null"),
+           "Validation failed for #{@temp_path}")
   end
 
   def compare_xpath(value, path)
-
-    assert_equal(value.strip, XMLUtils::select_path(path, filename).strip)
+    assert_equal(value.strip, XMLUtils::select_path(path, @temp_path).strip)
   end
 end
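The helper above shells out to `xmllint`, so the tests fail on machines without libxml2's utilities. Where that dependency is unwanted, a rougher well-formedness check can be done in-process with REXML. This is an alternative sketch and not part of the gem's test suite: `well_formed?` is a hypothetical name, and REXML is more lenient than `xmllint` in places, so it is not an exact substitute.

```ruby
require 'rexml/document'

# Returns true when the XML text parses without error.
# REXML is a non-validating parser, so this checks well-formedness only.
def well_formed?(xml_text)
  REXML::Document.new(xml_text)
  true
rescue REXML::ParseException
  false
end
```

In a test it could be used as `assert(well_formed?(File.read(@temp_path)), "Validation failed for #{@temp_path}")`.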
metadata
CHANGED
@@ -1,13 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: xmlcodec
 version: !ruby/object:Gem::Version
-  hash:
+  hash: 29
   prerelease: false
   segments:
   - 0
   - 1
-  -
-  version: 0.1.
+  - 3
+  version: 0.1.3
 platform: ruby
 authors:
 - "Pedro C\xC3\xB4rte-Real"
@@ -28,7 +28,7 @@ executables: []
 extensions: []
 
 extra_rdoc_files:
-- README
+- README.rdoc
 files:
 - test/partial_export_test.rb
 - test/subelements_test.rb
@@ -45,7 +45,7 @@ files:
 - lib/xmlcodec.rb
 - lib/stream_object_parser.rb
 - lib/stream_parser.rb
-- README
+- README.rdoc
 - LICENSE
 - Rakefile
 has_rdoc: true
data/README
DELETED
@@ -1,81 +0,0 @@
-This is a library that helps create Ruby importers/exporters of XML. The core
-of it is XMLCodec::XMLElement. To create an importer exporter for this XML
-format:
-
-  <root>
-    <firstelement>
-      <secondelement firstattr='1'>
-        some value
-      </secondelement>
-      <secondelement firstattr='2'>
-        some other value
-      </secondelement>
-    </firstelement>
-  </root>
-
-you'd create the following classes
-
-  require 'xmlcodec'
-
-  class Root < XMLCodec::XMLElement
-    elname 'root'
-    xmlsubel :firstelement
-  end
-
-  class FirstElement < XMLCodec::XMLElement
-    elname 'firstelement'
-    xmlsubel_mult :secondelement
-  end
-
-  class SecondElement < XMLCodec::XMLElement
-    elname 'secondelement'
-    elwithvalue
-    xmlattr :firstattr
-  end
-
-elname defines the name of the element in the XML DOM. xmlsubel defines a
-subelement that may exist only once. xmlsubel_mult defines a subelement that may
-appear several times. xmlattr defines an attribute for the element. The classes
-will respond to accessor methods with the names of the subelements and
-attributes.
-
-There is one more way to declare subelements:
-
-  class SomeOtherElement
-    elname 'stuff'
-    xmlsubelements
-  end
-
-This one defines an element that can have a bunch of elements of different types
-whose order is important. The class will have a #subelements method that gives
-access to a container with the collection of the elements.
-
-This is all you have to define to implement the importer/exporter for the
-format.
-
-To import from a file just do:
-
-  Root.import_xml_text(File.new('somefilename.xml'))
-
-or from a REXML DOM:
-
-  Root.import_xml(REXML::Document.new(File.new('somefilename.xml')))
-
-To export into a REXML DOM Document or Element do:
-
-  somerootelement.create_xml(REXML::Document.new)
-
-or to some XML text:
-
-  text = somerootelement.xml_text
-
-All these calls require keeping the whole contents of the document in memory.
-The ones that use the REXML DOM will have it twice. To handle large documents
-with constant memory usage you should try importing with
-XMLCodec::XMLStreamObjectParser and exporting with
-XMLCodec::XMLElement#partial_export.
-
-
-Author:
-Pedro Côrte-Real
-<pedro@pedrocr.net>
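The removed README pointed to XMLCodec::XMLStreamObjectParser for constant-memory imports. The underlying idea, receiving one finished object per element while the document streams past, can be sketched with REXML's stream parser alone. This is an illustration of the technique and not how xmlcodec implements it: `SecondElementListener` and `collect_second_elements` are hypothetical names, and each element is reduced to a plain Hash rather than an XMLElement subclass.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: builds one small Hash per <secondelement> and hands
# it to the given block as soon as the closing tag arrives, so no tree for
# the whole document is ever built.
class SecondElementListener
  include REXML::StreamListener

  def initialize(&on_element)
    @on_element = on_element
    @current = nil
  end

  def tag_start(name, attrs)
    return unless name == 'secondelement'
    @current = { attr: attrs['firstattr'], value: String.new }
  end

  def text(data)
    @current[:value] << data if @current
  end

  def tag_end(name)
    return unless name == 'secondelement' && @current
    @current[:value] = @current[:value].strip
    @on_element.call(@current) # the consumer decides whether to keep it
    @current = nil
  end
end

# Convenience wrapper for small inputs: gathers every object in an array.
def collect_second_elements(xml)
  found = []
  listener = SecondElementListener.new { |obj| found << obj }
  REXML::Document.parse_stream(xml, listener)
  found
end
```

For a genuinely large file, the block would process each object and drop it instead of appending it to an array, which is what keeps memory usage flat.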