xmlcodec 0.1.2 → 0.1.3

data/README.rdoc ADDED
@@ -0,0 +1,123 @@
1
+ = xmlcodec
2
+
3
+ This is a framework for creating importers/exporters that map XML formats to Ruby objects. To create a new importer/exporter, all you have to do is write a simple Ruby class for each XML element. This gives you four main API interactions for free, all using the same objects:
4
+
5
+ * Create a tree of Ruby objects and export it as XML
6
+ * Import a full XML document as a tree of Ruby objects
7
+ * Stream parse an XML document with events for elements as Ruby objects
8
+ * Create XML documents of unlimited size with constant memory usage by writing out parts of the XML while the in-memory tree is still being built.
9
+
10
+ The first two APIs always handle full trees. The stream parser lets you parse a very large XML file as a stream, like a SAX parser, but delivers fully formed Ruby objects as events, so you use the same object APIs without ever holding the full tree in memory. The partial export API lets you create huge XML files the same way you'd create a small one (by adding elements to the Ruby tree) without ever holding the whole tree in memory.
11
+
12
+ This project was created as an extract of work done at {Arquivo Nacional da Torre do Tombo}[http://antt.dgarq.gov.pt/].
13
+
14
+ == Usage
15
+
16
+ To create an importer/exporter for this XML format:
17
+
18
+ <root>
19
+ <firstelement>
20
+ <secondelement firstattr='1'>
21
+ some value
22
+ </secondelement>
23
+ <secondelement firstattr='2'>
24
+ some other value
25
+ </secondelement>
26
+ </firstelement>
27
+ </root>
28
+
29
+ you would create the following classes:
30
+
31
+ require 'xmlcodec'
32
+
33
+ class Root < XMLCodec::XMLElement
34
+ elname 'root'
35
+ xmlsubel :firstelement
36
+ end
37
+
38
+ class FirstElement < XMLCodec::XMLElement
39
+ elname 'firstelement'
40
+ xmlsubel_mult :secondelement
41
+ end
42
+
43
+ class SecondElement < XMLCodec::XMLElement
44
+ elname 'secondelement'
45
+ elwithvalue
46
+ xmlattr :firstattr
47
+ end
48
+
49
+ elname defines the name of the element in the XML DOM. xmlsubel defines a
50
+ subelement that may exist only once. xmlsubel_mult defines a subelement that may
51
+ appear several times. xmlattr defines an attribute for the element. The classes
52
+ will respond to accessor methods with the names of the subelements and
53
+ attributes.
54
+
55
+ There is one more way to declare subelements:
56
+
57
+ class SomeOtherElement < XMLCodec::XMLElement
58
+ elname 'stuff'
59
+ xmlsubelements
60
+ end
61
+
62
+ This one defines an element that can have a bunch of elements of different types
63
+ whose order is important. The class will have a #subelements method that gives
64
+ access to a container with the collection of the elements.
65
+
66
+ This is all you have to define to implement the importer/exporter for the
67
+ format.
68
+
69
+ To import XML just do:
70
+
71
+ # From text
72
+ Root.import_xml_text(File.new('file.xml'))
73
+
74
+ # From a REXML DOM
75
+ Root.import_xml(REXML::Document.new(File.new('file.xml')))
76
+
77
+ To export do:
78
+
79
+ # To generate XML text
80
+ string = some_element.xml_text
81
+
82
+ # To generate REXML DOM
83
+ doc = some_element.create_xml(REXML::Document.new)
84
+
85
+ All these calls require keeping the whole contents of the document in memory.
86
+ The ones that use the REXML DOM will have it twice. To handle large documents with constant memory usage another set of APIs is available.
87
+
88
+ To stream parse a large document you'd do something like:
89
+
90
+ class MyStreamListener
91
+ def el_secondelement(el)
92
+ obj = el.get_object
93
+
94
+ # ... do something with obj ...
95
+
96
+ # To remove it from the stream so the parent
97
+ # doesn't include it and memory is freed.
98
+ el.consume
99
+ end
100
+ end
101
+
102
+ parser = XMLCodec::XMLStreamObjectParser.new(MyStreamListener.new)
103
+ parser.parse(some_string_or_file)
104
+
105
+ You can define a listener method for each element you want to handle. Calling el.consume discards the element so that its memory can be freed. Note that a consumed element will not be part of its parent when the parent's event comes around.
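The event-driven idea behind stream parsing can be sketched with REXML's own stream listener, which ships with Ruby. This is only an illustration of SAX-style callbacks under stated assumptions: SecondElementCollector is a hypothetical class, not part of xmlcodec, and it collects plain strings rather than the Ruby objects that xmlcodec delivers.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener (not part of xmlcodec): collects the text of each
# <secondelement> as it streams past, without ever building a DOM.
class SecondElementCollector
  include REXML::StreamListener

  attr_reader :values

  def initialize
    @values = []
    @buffer = nil
  end

  # Start buffering text when a <secondelement> opens.
  def tag_start(name, _attrs)
    @buffer = String.new if name == 'secondelement'
  end

  # Text outside a <secondelement> is ignored because no buffer is active.
  def text(data)
    @buffer << data if @buffer
  end

  # Store the finished value and drop the buffer when the element closes.
  def tag_end(name)
    return unless name == 'secondelement'

    @values << @buffer.strip
    @buffer = nil
  end
end

xml = <<~XML
  <root>
    <firstelement>
      <secondelement firstattr='1'>some value</secondelement>
      <secondelement firstattr='2'>some other value</secondelement>
    </firstelement>
  </root>
XML

collector = SecondElementCollector.new
REXML::Document.parse_stream(xml, collector)
p collector.values # => ["some value", "some other value"]
```

Only the element currently being read is buffered, which is the property that keeps memory usage flat on large inputs.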
106
+
107
+ To produce very large XML files with constant memory usage you would do something like:
108
+
109
+ file = File.new('somefile.xml', 'w')
110
+ fe = FirstElement.new
111
+ 10000.times do |i|
112
+ se = SecondElement.new(i)
113
+ fe.secondelement << se
114
+ se.partial_export(file)
115
+ end
116
+ fe.end_partial_export(file)
117
+
118
+ Here 10000 instances of <secondelement> were written to the file. Because the partial_export calls happen inside the loop, each instance is written to the file and removed from the parent, so at any point only one instance each of FirstElement and SecondElement is in memory. Apart from the calls to the partial export methods, the code is the same as what you'd use to build the tree in memory.
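The general technique of writing a parent element while emitting its children one at a time can be sketched in plain Ruby. This is a minimal illustration, not xmlcodec's implementation: write_incrementally is a hypothetical helper, and a StringIO stands in for a file opened for writing.

```ruby
require 'stringio'

# Hypothetical helper (not xmlcodec's implementation): writes the parent's
# opening tag, then each child as soon as it is produced, then the closing
# tag. No child is retained after it has been written.
def write_incrementally(io, parent, child, count)
  io << "<#{parent}>"
  count.times do |i|
    io << "<#{child}>#{i}</#{child}>"
  end
  io << "</#{parent}>"
  io
end

out = StringIO.new
write_incrementally(out, 'firstelement', 'secondelement', 3)
puts out.string
```

With a real File in place of the StringIO, the output goes to disk as it is produced, so the number of children does not affect how much is held in memory.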
119
+
120
+
121
+ == Author
122
+
123
+ Pedro Côrte-Real <pedro@pedrocr.net>
data/Rakefile CHANGED
@@ -1,5 +1,5 @@
1
1
  PKG_NAME = 'xmlcodec'
2
- PKG_VERSION = '0.1.2'
2
+ PKG_VERSION = '0.1.3'
3
3
 
4
4
  require 'rake'
5
5
  require 'rake/testtask'
@@ -20,7 +20,7 @@ PKG_FILES = FileList[TEST_FILES,
20
20
  'Rakefile']
21
21
 
22
22
  RDOC_OPTIONS = ['-S', '-w 2', '-N']
23
- RDOC_EXTRA_FILES = ['README']
23
+ RDOC_EXTRA_FILES = ['README.rdoc']
24
24
 
25
25
  spec = Gem::Specification.new do |s|
26
26
  s.platform = Gem::Platform::RUBY
@@ -50,10 +50,10 @@ Rake::TestTask.new do |t|
50
50
  end
51
51
 
52
52
  Rake::RDocTask.new do |rd|
53
- rd.main = "README"
53
+ rd.main = "README.rdoc"
54
54
  rd.name = :docs
55
55
  rd.rdoc_files.include(RDOC_EXTRA_FILES, CODE_FILES)
56
- rd.rdoc_dir = 'web/doc'
56
+ rd.rdoc_dir = 'doc'
57
57
  rd.title = "#{PKG_NAME} API"
58
58
  rd.options = RDOC_OPTIONS
59
59
  end
data/lib/XMLUtils.rb CHANGED
@@ -60,10 +60,13 @@ module XMLUtils
60
60
  # Gets the xpath inside a given document that can either be a string or a
61
61
  # REXML::Document
62
62
  #
63
- # opts can have:
64
- # :multiple: fetch all the occurences of the xpath
65
- # :with_attrs: include the attribute contents in the result
66
- # :recursive: recursively include all the subelements of the matches
63
+ # Supported options (boolean):
64
+ # [:multiple]
65
+ # fetch all the occurrences of the xpath
66
+ # [:with_attrs]
67
+ # include the attribute contents in the result
68
+ # [:recursive]
69
+ # recursively include all the subelements of the matches
67
70
  def self.get_xpath(path, doc, opts={})
68
71
  if doc.is_a? REXML::Document
69
72
  doc = doc
data/test/test_helper.rb CHANGED
@@ -14,16 +14,14 @@ class Test::Unit::TestCase
14
14
  end
15
15
 
16
16
  def validate_well_formed
17
- filename = filename || @temp_path
18
17
  assert(system("xmllint --version > /dev/null 2>&1"),
19
18
  "xmllint utility not installed"+
20
19
  "(on ubuntu/debian install package libxml2-utils)")
21
- assert(system("xmllint #{filename} >/dev/null"),
22
- "Validation failed for #{filename}")
20
+ assert(system("xmllint #{@temp_path} >/dev/null"),
21
+ "Validation failed for #{@temp_path}")
23
22
  end
24
23
 
25
24
  def compare_xpath(value, path)
26
- filename = filename || @temp_path
27
- assert_equal(value.strip, XMLUtils::select_path(path, filename).strip)
25
+ assert_equal(value.strip, XMLUtils::select_path(path, @temp_path).strip)
28
26
  end
29
27
  end
metadata CHANGED
@@ -1,13 +1,13 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: xmlcodec
3
3
  version: !ruby/object:Gem::Version
4
- hash: 31
4
+ hash: 29
5
5
  prerelease: false
6
6
  segments:
7
7
  - 0
8
8
  - 1
9
- - 2
10
- version: 0.1.2
9
+ - 3
10
+ version: 0.1.3
11
11
  platform: ruby
12
12
  authors:
13
13
  - "Pedro C\xC3\xB4rte-Real"
@@ -28,7 +28,7 @@ executables: []
28
28
  extensions: []
29
29
 
30
30
  extra_rdoc_files:
31
- - README
31
+ - README.rdoc
32
32
  files:
33
33
  - test/partial_export_test.rb
34
34
  - test/subelements_test.rb
@@ -45,7 +45,7 @@ files:
45
45
  - lib/xmlcodec.rb
46
46
  - lib/stream_object_parser.rb
47
47
  - lib/stream_parser.rb
48
- - README
48
+ - README.rdoc
49
49
  - LICENSE
50
50
  - Rakefile
51
51
  has_rdoc: true
data/README DELETED
@@ -1,81 +0,0 @@
1
- This is a library that helps create Ruby importers/exporters of XML. The core
2
- of it is XMLCodec::XMLElement. To create an importer exporter for this XML
3
- format:
4
-
5
- <root>
6
- <firstelement>
7
- <secondelement firstattr='1'>
8
- some value
9
- </secondelement>
10
- <secondelement firstattr='2'>
11
- some other value
12
- </secondelement>
13
- </firstelement>
14
- </root>
15
-
16
- you'd create the following classes
17
-
18
- require 'xmlcodec'
19
-
20
- class Root < XMLCodec::XMLElement
21
- elname 'root'
22
- xmlsubel :firstelement
23
- end
24
-
25
- class FirstElement < XMLCodec::XMLElement
26
- elname 'firstelement'
27
- xmlsubel_mult :secondelement
28
- end
29
-
30
- class SecondElement < XMLCodec::XMLElement
31
- elname 'secondelement'
32
- elwithvalue
33
- xmlattr :firstattr
34
- end
35
-
36
- elname defines the name of the element in the XML DOM. xmlsubel defines a
37
- subelement that may exist only once. xmlsubel_mult defines a subelement that may
38
- appear several times. xmlattr defines an attribute for the element. The classes
39
- will respond to accessor methods with the names of the subelements and
40
- attributes.
41
-
42
- There is one more way to declare subelements:
43
-
44
- class SomeOtherElement
45
- elname 'stuff'
46
- xmlsubelements
47
- end
48
-
49
- This one defines an element that can have a bunch of elements of different types
50
- whose order is important. The class will have a #subelements method that gives
51
- access to a container with the collection of the elements.
52
-
53
- This is all you have to define to implement the importer/exporter for the
54
- format.
55
-
56
- To import from a file just do:
57
-
58
- Root.import_xml_text(File.new('somefilename.xml'))
59
-
60
- or from a REXML DOM:
61
-
62
- Root.import_xml(REXML::Document.new(File.new('somefilename.xml')))
63
-
64
- To export into a REXML DOM Document or Element do:
65
-
66
- somerootelement.create_xml(REXML::Document.new)
67
-
68
- or to some XML text:
69
-
70
- text = somerootelement.xml_text
71
-
72
- All these calls require keeping the whole contents of the document in memory.
73
- The ones that use the REXML DOM will have it twice. To handle large documents
74
- with constant memory usage you should try importing with
75
- XMLCodec::XMLStreamObjectParser and exporting with
76
- XMLCodec::XMLElement#partial_export.
77
-
78
-
79
- Author:
80
- Pedro Côrte-Real
81
- <pedro@pedrocr.net>