flexparser 1.0.2

@@ -0,0 +1,200 @@
1
+ # Flexparser
2
+ `Flexparser` provides an easy-to-use DSL for flexible, robust XML parsers. The goal of Flexparser is to be able to write **One Parser to parse them all**.
3
+
4
+ ## Installation
5
+
6
+ Add this line to your application's Gemfile:
7
+
8
+ ```ruby
9
+ gem 'flexparser', path: 'vendor/gems/' # Or wherever you have flexparser stored
10
+ ```
11
+
12
+ Since this gem is private, it can only be vendored into a project right now.
13
+
14
+ **BEWARE:** Since I started working on this gem, another gem by the name of `flexparser` has been published to rubygems. So calling `gem install flexparser` will in fact install the wrong gem.
15
+
16
+ ## Usage
17
+ #### Basics:
18
+
19
+ Including the `Flexparser` module in any Class turns it into a parser.
20
+ Let's start simple:
21
+ ```ruby
22
+ class WebParser
23
+ include Flexparser
24
+
25
+ property 'url'
26
+ end
27
+ ```
28
+ Now this class is able to parse xml similar to this:
29
+ ```xml
30
+ <url>www.my-page.com</url>
31
+ ```
32
+ Now your parser can do this:
33
+ ```ruby
34
+ # Assuming the xml variable holds the xml code mentioned above
35
+ website = WebParser.parse xml
36
+ website.url #=> 'www.my-page.com'
37
+ ```
38
+
39
+ #### Collections
40
+ A `property` will only return the first value it finds. When you have multiple nodes that interest you, you can get a collection of them by passing `collection: true`.
41
+ ```ruby
42
+ books = '
43
+ <work author="H.P. Lovecraft">
44
+ <story>The Call of Cthulhu</story>
45
+ <story>Dagon</story>
46
+ <story>The Nameless City</story>
47
+ </work>
48
+ '
49
+
50
+ class LovecraftParser
51
+ include Flexparser
52
+
53
+ property 'story', collection: true
54
+ end
55
+
56
+ work = LovecraftParser.parse books
57
+ work.story #=> ['The Call of Cthulhu', 'Dagon', 'The Nameless City']
58
+ ```
59
+
60
+ #### Nested Parsers
61
+ Sometimes you want more than to just extract a String. By passing a block to `property`, you can make your parser return complex objects and nest parsers as deep as you like.
62
+ ```ruby
63
+ library = "
64
+ <book>
65
+ <author>J. R. R. Tolkien</author>
66
+ <title>The Hobbit</title>
67
+ </book>
68
+ <book>
69
+ <author>Suzanne Collins</author>
70
+ <title>The Hunger Games</title>
71
+ </book>"
72
+
73
+ class LibraryParser
74
+ include Flexparser
75
+
76
+ property 'book', collection: true do
77
+ attr_accessor :isbn
78
+ property 'author'
79
+ property 'title'
80
+ end
81
+ end
82
+
83
+ lib = LibraryParser.parse library
84
+ lib.book.last.author #=> 'Suzanne Collins'
85
+ lib.book.first.title #=> 'The Hobbit'
86
+ lib.book.first.isbn = '9780582186552'
87
+ lib.book.first.isbn #=> '9780582186552'
88
+ ```
89
+ With nested parsers, anonymous classes are defined inside an existing parser. Therefore you can define methods all you like (should the need arise).
90
+
91
+ #### Tag Definitions
92
+ You might not always know what the information you are looking for is called, or its name might vary between documents. If that is the case, you can define multiple tags for the same property. Here are a few examples:
93
+ ```ruby
94
+ class UniParser
95
+ include Flexparser
96
+
97
+ # Creates accessors called 'url' and 'url=' but will look for nodes with the name url, link and website. Will return the first thing it finds.
98
+ property %w[url link website]
99
+
100
+ # Creates a property called main_header and will look for message and title
101
+ property %w[message title], name: 'main_header'
102
+
103
+ # This will define a property called width and will look for an attribute of the same name
104
+ property '@width'
105
+
106
+ # This will define a property called `image_url` that will look for a node called 'image' and extract its 'url' attribute
107
+ property 'image/@url'
108
+
109
+ # This will look for a tag called encoded with the namespace content
110
+ property 'content:encoded'
111
+
112
+ # Here we define a transformation to make the parser return an integer
113
+ property 'height', transform: :to_i
114
+
115
+ # An alternative to the transformation is a type. The type must respond to `parse` (like `URI.parse`) and accept a string
116
+ property 'url', type: URI
117
+
118
+ # A little bit of everything
119
+ property %w[image picture @img media:image], name: 'visual', type: URI, collection: true
120
+ end
121
+ ```
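To make the naming rules in the comments above concrete, here is a small standalone sketch of how a tag expression can be turned into an accessor name. It mirrors the sanitising step in `define_accessors` from the gem's source (strip a leading `@`, collapse runs of punctuation into `_`). The helper name `accessor_name_for` is made up for this illustration and is not part of Flexparser.

```ruby
# Hypothetical helper (not part of Flexparser): derives an accessor name
# from a tag expression the same way `define_accessors` sanitises names.
def accessor_name_for(tag)
  tag.to_s.sub(/^@/, '').gsub(/([[:punct:]]|-)+/, '_')
end

accessor_name_for('@width')          # => "width"
accessor_name_for('image/@url')      # => "image_url"
accessor_name_for('content:encoded') # => "content_encoded"
```

When several tags are given, the accessor name comes from the `:name` option or, failing that, from the first tag in the list.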
122
+ ### Configuration
123
+ You can configure Flexparser by using a block (for example in an initializer) like so:
124
+ ```ruby
125
+ Flexparser.configure do |config|
126
+ config.option = value
127
+ end
128
+ ```
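The block style works because `configure` yields one shared, memoised settings object. The following is a minimal standalone sketch of that pattern, assuming the two options and the defaults documented in this README. `MiniParser` and its `Configuration` class are illustrative stand-ins, not the gem's real classes.

```ruby
# Standalone sketch; `MiniParser` is a made-up stand-in for Flexparser.
module MiniParser
  # Holds the two documented options with their documented defaults.
  class Configuration
    attr_accessor :explicit_property_naming, :retry_without_namespaces

    def initialize
      @explicit_property_naming = true
      @retry_without_namespaces = true
    end
  end

  class << self
    # Memoised so that every caller sees the same settings object.
    def configuration
      @configuration ||= Configuration.new
    end

    # Yields the shared configuration to the block and returns it.
    def configure
      yield(configuration)
      configuration
    end
  end
end

MiniParser.configure { |config| config.retry_without_namespaces = false }
MiniParser.configuration.retry_without_namespaces # => false
MiniParser.configuration.explicit_property_naming # => true
```

Because the object is shared, a setting changed in an initializer stays in effect for every parser defined afterwards.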
129
+ At the time of writing there are two options:
130
+
131
+ #### `explicit_property_naming`
132
+ **Default:** `true`
133
+ If this is `true`, you need to specify a `:name` for your `property` every time there is more than one tag in your tag list.
134
+ Example:
135
+ ```ruby
136
+ # Bad!
137
+ property %w[url link website]
138
+
139
+ # Good!
140
+ property %w[url link website], name: 'website'
141
+
142
+ # Don't care! Unambiguous!
143
+ property 'url'
144
+ property ['width']
145
+ ```
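The rule behind these three cases can be sketched as a single guard. The condition below is taken from `check_ambiguous_naming!` in the gem's source. The method name `check_naming!`, its keyword arguments and the locally defined error class are assumptions made so the example runs on its own.

```ruby
# Hypothetical stand-ins for illustration; only the condition itself
# follows `check_ambiguous_naming!`.
class AmbiguousNamingError < StandardError; end

# Returns the accessor name, or raises when the name would be ambiguous.
def check_naming!(tags, name: nil, explicit_property_naming: true)
  ambiguous = explicit_property_naming && name.nil? &&
              tags.respond_to?(:each) && tags.length > 1
  raise AmbiguousNamingError, "Specify :name for #{tags.inspect}" if ambiguous
  name || Array(tags).first
end

check_naming!(%w[url link website], name: 'website') # => "website"
check_naming!('url')                                 # => "url"
check_naming!(['width'])                             # => "width"

begin
  check_naming!(%w[url link website])
rescue AmbiguousNamingError => e
  warn e.message # several tags without a :name are rejected
end
```

With `explicit_property_naming` switched off, the same call falls back to the first tag in the list.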
146
+ #### `retry_without_namespaces`
147
+ **Default:** `true`
148
+ If `true`, `Flexparser` adds a second set of XPaths to the list of tags you specified; these ignore namespaces completely.
149
+ Example:
150
+ ```ruby
151
+ Flexparser.configure { |c| c.retry_without_namespaces = false }
152
+ class SomeParser
+ include Flexparser
153
+ property 'inventory'
154
+ end
155
+
156
+ xml = '<inventory xmlns="www.my-inventory.com">james</inventory>'
157
+
158
+ # The inventory can't be found because it is namespaced.
159
+ SomeParser.parse(xml).inventory #=> nil :(
160
+
161
+ Flexparser.configure { |c| c.retry_without_namespaces = true }
162
+ class SomeBetterParser
+ include Flexparser
163
+ property 'inventory'
164
+ end
165
+
166
+ xml = '<inventory xmlns="www.my-inventory.com">james</inventory>'
167
+
168
+ # The inventory can be found because we don't care.
169
+ SomeBetterParser.parse(xml).inventory #=> 'james'
170
+ ```
171
+ The XPath used here adheres to XPath 1.0 and matches on the `name()` function: `.//*[name()='inventory']`
172
+
173
+ ## Development
174
+
175
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
176
+
177
+ ### How it works (in a nutshell)
178
+ The `Flexparser` module defines certain class methods, most importantly `property`.
179
+ `property` takes a `String` or an array of strings as well as some options. It instantiates a `TagParser` (or a `CollectionParser` when `collection: true` is given) and adds it to the `parsers` list of the class that is including `Flexparser` (we'll call it MainClass from here on out), which holds all the `TagParser`s and `CollectionParser`s. It also defines accessors for the *name* of the property that the parser should extract.
180
+
181
+ The Parsers use an instance of `Flexparser::XPaths` to handle the array of tags that they are passed.
182
+ When everything is set up (i.e. the class is loaded), you can call `::parse` on your MainClass and pass it an XML string. At this point the MainClass instantiates itself and the `TagParser`s and `CollectionParser`s extract a value from the XML, which is then assigned to the newly created MainClass instance.
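
The flow described above (register a parser per property, define accessors, then fill a fresh instance on `parse`) can be sketched without any XML machinery. In this simplified illustration a `Hash` stands in for the XML fragment, and `MiniDsl`, `MiniProperty` and `PageParser` are made-up names. It is not the gem's actual implementation.

```ruby
# Standalone sketch of the registration/parse flow; a Hash replaces the XML.
module MiniDsl
  # One registered property: its name, the tags to try, an optional transform.
  MiniProperty = Struct.new(:name, :tags, :transform) do
    def parse(doc)
      tag = tags.find { |t| doc.key?(t) } # first tag that is present
      value = tag && doc[tag]
      transform && value ? value.public_send(transform) : value
    end
  end

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def parsers
      @parsers ||= []
    end

    # Registers a property and defines its reader and writer.
    def property(tags, name: nil, transform: nil)
      tags = Array(tags)
      name ||= tags.first
      attr_accessor name
      parsers << MiniProperty.new(name, tags, transform)
    end

    # Builds an instance and lets every registered property fill in its value.
    def parse(doc)
      new.tap do |instance|
        parsers.each { |p| instance.public_send("#{p.name}=", p.parse(doc)) }
      end
    end
  end
end

class PageParser
  include MiniDsl

  property %w[url link website], name: 'address'
  property 'height', transform: :to_i
end

page = PageParser.parse('link' => 'www.my-page.com', 'height' => '42')
page.address # => "www.my-page.com"
page.height  # => 42
```

Keeping the parsers in a class-level list is what lets `parse` stay generic: it only needs each parser's name and its `parse` method.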
183
+
184
+ #### Defining a parser with a block
185
+ Nested parsers are defined with a block, like this:
186
+ ```ruby
187
+ class ParserClass
188
+ include Flexparser
189
+
190
+ property 'story', collection: true do
191
+ property 'author'
192
+ property 'title'
193
+ end
194
+ end
195
+ ```
196
+ When passing a block to a parser definition, a new class is created that basically looks like this:
197
+ ```ruby
198
+ Class.new(Flexparser::AnonymousParser)
199
+ ```
200
+ The block is then evaluated in the context of this anonymous class (the current source uses `instance_eval`). That gives you a lot of flexibility in defining your parsers.
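
How the block is evaluated matters once you define methods inside it. The standalone sketch below compares `instance_eval` and `class_eval` on an anonymous class in plain Ruby. `Base` and its `property` method are stand-ins invented for this example. DSL calls such as `property` or `attr_accessor` behave the same in both cases, but a `def` inside an `instance_eval` block creates a class-level method rather than an instance method.

```ruby
# Standalone sketch; `Base` stands in for the anonymous parser's parent class.
class Base
  def self.property(name)
    attr_accessor name
  end
end

# Evaluated with instance_eval: DSL calls work, `def` lands on the class itself.
evaluated = Class.new(Base)
evaluated.instance_eval do
  property :title
  def shout
    'class-level'
  end
end

# Evaluated with class_eval: `def` creates ordinary instance methods.
class_evaluated = Class.new(Base)
class_evaluated.class_eval do
  property :title
  def shout
    'instance-level'
  end
end

evaluated.new.respond_to?(:title) # => true
evaluated.respond_to?(:shout)     # => true
evaluated.new.respond_to?(:shout) # => false
class_evaluated.new.shout         # => "instance-level"
```

If you need instance methods on a nested parser while the block is evaluated with `instance_eval`, `define_method` inside the block is one way to get them.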
@@ -0,0 +1,10 @@
1
+ require 'bundler/gem_tasks'
2
+ require 'rake/testtask'
3
+
4
+ Rake::TestTask.new(:test) do |t|
5
+ t.libs << 'test'
6
+ t.libs << 'lib'
7
+ t.test_files = FileList['test/**/*_test.rb']
8
+ end
9
+
10
+ task default: :test
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'bundler/setup'
4
+ require 'flexparser'
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require 'irb'
14
+ IRB.start
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
@@ -0,0 +1,32 @@
1
+ # coding: utf-8
2
+
3
+ lib = File.expand_path('../lib', __FILE__)
4
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
5
+ require 'flexparser/version'
6
+
7
+ Gem::Specification.new do |spec|
8
+ spec.name = 'flexparser'
9
+ spec.version = Flexparser::VERSION
10
+ spec.authors = ['Paul Martensen']
11
+ spec.email = ['paul.martensen@gmx.de']
12
+ spec.licenses = ['GPL-3.0']
13
+ spec.homepage = 'https://github.com/lokalportal/flexparser'
14
+ spec.summary = 'A xml-parser dsl'
15
+ spec.description = 'A flexible and robust parser-dsl.'
16
+
17
+ spec.files = `git ls-files -z`.split("\x0").reject do |f|
18
+ f.match(%r{^(test|spec|features)/})
19
+ end
20
+ spec.bindir = 'exe'
21
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
22
+ spec.require_paths = ['lib']
23
+
24
+ spec.add_dependency 'nokogiri', '~> 1.7'
25
+ spec.add_dependency 'xpath', '~> 2.1'
26
+ spec.add_development_dependency 'guard', '~> 2.14'
27
+ spec.add_development_dependency 'guard-minitest', '~> 2.4'
28
+ spec.add_development_dependency 'pry-byebug', '~> 3.4'
29
+ spec.add_development_dependency 'bundler', '~> 1.13'
30
+ spec.add_development_dependency 'rake', '~> 10.0'
31
+ spec.add_development_dependency 'minitest', '~> 5.0'
32
+ end
@@ -0,0 +1,37 @@
1
+ require 'xpath'
2
+ require 'nokogiri'
3
+ require 'forwardable'
4
+
5
+ require 'flexparser/version'
6
+ require 'flexparser/errors'
7
+ require 'flexparser/configuration'
8
+ require 'flexparser/fragment'
9
+ require 'flexparser/xpaths'
10
+ require 'flexparser/empty_fragment'
11
+ require 'flexparser/fragment_builder'
12
+ require 'flexparser/tag_parser'
13
+ require 'flexparser/collection_parser'
14
+ require 'flexparser/class_methods'
15
+ require 'flexparser/anonymous_parser'
16
+
17
+ #
18
+ # Main module that, when included, provides
19
+ # the including class access to the property building
20
+ # structure.
21
+ #
22
+ module Flexparser
23
+ class << self
24
+ def configuration
25
+ @configuration ||= Configuration.new
26
+ end
27
+
28
+ def configure
29
+ yield(configuration)
30
+ configuration
31
+ end
32
+ end
33
+
34
+ def self.included(base)
35
+ base.extend ClassMethods
36
+ end
37
+ end
@@ -0,0 +1,20 @@
1
+ module Flexparser
2
+ #
3
+ # A semi-anonymous class used for building the node
4
+ # structure.
5
+ #
6
+ class AnonymousParser
7
+ extend ClassMethods
8
+
9
+ def to_s
10
+ return super if self.class.parsers.empty?
11
+ to_h.to_s
12
+ end
13
+
14
+ def to_h
15
+ self.class.parsers.each_with_object({}) do |parser, hash|
16
+ hash[parser.name.to_sym] = public_send(parser.name)
17
+ end
18
+ end
19
+ end
20
+ end
@@ -0,0 +1,94 @@
1
+ module Flexparser
2
+ #
3
+ # The ClassMethods defined on the including base.
4
+ #
5
+ module ClassMethods
6
+ #
7
+ # Applies the previously set up property-structure to the given xml
8
+ # and returns Instances of the including Class.
9
+ #
10
+ def parse(xml, _options = {})
11
+ return if parsers.empty?
12
+ @doc = FragmentBuilder.build(xml)
13
+ new.tap do |instance|
14
+ parsers.each do |parser|
15
+ instance.public_send("#{parser.name}=", parser.parse(@doc))
16
+ end
17
+ end
18
+ end
19
+
20
+ # no-doc
21
+ def parsers
22
+ @parsers ||= []
23
+ end
24
+
25
+ protected
26
+
27
+ #
28
+ # Defines a TagParser belonging to the including class.
29
+ # This Tag parser is used in #parse to parse a piece of xml.
30
+ # @param [Array<String>] tags The list of tags the parser rattles through.
31
+ # @option opts [String] :name the name this property should have.
32
+ # Overrides the naming derived from the `tags` list.
33
+ # @option opts [Symbol] :transform a method that will be sent to the
34
+ # resulting value to transform it. Like `:to_i` or `:to_sym`.
35
+ # @option opts [Class] :type a class that implements `::parse` to
36
+ # return an instance of itself, parsed from the resulting value.
37
+ # @option opts [Boolean] :required if true, raises an
38
+ # `Flexparser::RequiredMissingError` if the resulting value is `nil`.
39
+ #
40
+ def property(tags, collection: false, **opts, &block)
41
+ check_ambiguous_naming!(tags, opts)
42
+ parser_klass = collection ? CollectionParser : TagParser
43
+ add_parser(parser_klass, tags, **opts, &block)
44
+ end
45
+
46
+ #
47
+ # Adds a parser with a given class and options to the list
48
+ # of parsers this class holds.
49
+ # @see self#property
50
+ # @param klass [Flexparser::TagParser] either a collection or a single
51
+ # {TagParser}
52
+ #
53
+ def add_parser(klass, tags, **opts, &block)
54
+ tags = Array(tags).flatten
55
+ define_accessors(opts[:name] || tags.first)
56
+ opts[:sub_parser] = new_parser(&block) if block_given?
57
+ parsers << klass.new(tags, opts)
58
+ end
59
+
60
+ #
61
+ # Creates a new anonymous Parser class
62
+ # based on {Flexparser::AnonymousParser}.
63
+ # @param block [Block] The block that holds the classes parser,
64
+ # methods and so on.
65
+ #
66
+ def new_parser(&block)
67
+ klass = Class.new(AnonymousParser)
68
+ klass.instance_eval(&block)
69
+ klass
70
+ end
71
+
72
+ #
73
+ # Defines accessors with a given name after sanitizing that name.
74
+ # @param name [String] the name the accessors should have.
75
+ # Will most likely be a tag.
76
+ #
77
+ def define_accessors(name)
78
+ attr_accessor name.to_s.sub(/^@/, '').gsub(/([[:punct:]]|-)+/, '_')
79
+ end
80
+
81
+ #
82
+ # Raises an error if the name of a parser is ambiguous and the options
83
+ # forbid it from being so.
84
+ #
85
+ def check_ambiguous_naming!(tags, opts)
86
+ return unless Flexparser.configuration.explicit_property_naming &&
87
+ opts[:name].nil? &&
88
+ tags.respond_to?(:each) && tags.length > 1
89
+ raise(AmbiguousNamingError,
90
+ "You need to specify a name for the property (#{tags})
91
+ with the :name option.")
92
+ end
93
+ end
94
+ end
@@ -0,0 +1,27 @@
1
+ module Flexparser
2
+ #
3
+ # A Parser similar to the TagParser but intended for a collection
4
+ # of properties.
5
+ # @param sub_parser [Class] all CollectionParsers need a subparser to
6
+ # deal with the content of the nodes parsed. This should ideally be
7
+ # a class that includes Flexparser and can parse the fragment it
8
+ # is dealt.
9
+ #
10
+ class CollectionParser < TagParser
11
+ def parse(doc)
12
+ content(doc).map do |n|
13
+ next sub_parser.parse(n) if sub_parser
14
+ next type.parse(n.text) if type
15
+ n.text
16
+ end
17
+ end
18
+
19
+ protected
20
+
21
+ def content(doc)
22
+ content = doc.xpath(xpaths.valid_paths(doc).reduce(&:union).to_s)
23
+ return content unless content.empty?
24
+ options[:required] ? raise_required_error(doc) : content
25
+ end
26
+ end
27
+ end