flexparser 1.0.2
- checksums.yaml +7 -0
- data/.gitignore +10 -0
- data/.rubocop.yml +1 -0
- data/.rubocop_todo.yml +22 -0
- data/.travis.yml +5 -0
- data/Gemfile +4 -0
- data/Guardfile +42 -0
- data/LICENSE +674 -0
- data/README.md +200 -0
- data/Rakefile +10 -0
- data/bin/console +14 -0
- data/bin/setup +8 -0
- data/flexparser.gemspec +32 -0
- data/lib/flexparser.rb +37 -0
- data/lib/flexparser/anonymous_parser.rb +20 -0
- data/lib/flexparser/class_methods.rb +94 -0
- data/lib/flexparser/collection_parser.rb +27 -0
- data/lib/flexparser/configuration.rb +23 -0
- data/lib/flexparser/empty_fragment.rb +23 -0
- data/lib/flexparser/errors.rb +21 -0
- data/lib/flexparser/fragment.rb +44 -0
- data/lib/flexparser/fragment_builder.rb +15 -0
- data/lib/flexparser/tag_parser.rb +65 -0
- data/lib/flexparser/version.rb +3 -0
- data/lib/flexparser/xpaths.rb +91 -0
- data/test.xml +11 -0
- metadata +181 -0
data/README.md
ADDED
@@ -0,0 +1,200 @@
# Flexparser
`Flexparser` provides an easy-to-use DSL for flexible, robust XML parsers. The goal of eflexparser is to be able to write **One Parser to parse them all**.

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'flexparser', path: 'vendor/gems/' # Or wherever you have eflexparser stored
```

Since this gem is private, it can only be vendored into a project right now.

**BEWARE:** Since I started working on this gem, another gem by the name of `flexparser` has been published to rubygems. So calling `gem install flexparser` will in fact install the wrong gem.

## Usage
#### Basics:

Including the `Flexparser` module in any class turns it into a parser.
Let's start simple:
```ruby
class WebParser
  include Flexparser

  property 'url'
end
```
Now this class is able to parse XML similar to this:
```xml
<url>www.my-page.com</url>
```
Now your parser can do this:
```ruby
# Assuming the xml variable holds the XML shown above
website = WebParser.parse xml
website.url #=> 'www.my-page.com'
```

#### Collections
A plain `property` only returns the first value it finds. When several nodes interest you, pass `collection: true` to get all of them as a collection.
```ruby
books = '
  <work author="H.P. Lovecraft">
    <story>The Call of Cthulhu</story>
    <story>Dagon</story>
    <story>The Nameless City</story>
  </work>
'

class LovecraftParser
  include Flexparser

  property 'story', collection: true
end

work = LovecraftParser.parse books
work.story #=> ['The Call of Cthulhu', 'Dagon', 'The Nameless City']
```

#### Nested Parsers
Sometimes you want to do more than just extract a String. By passing a block to `property`, you can make your parser return complex objects and nest parsers as deep as you like.
```ruby
library = "
  <book>
    <author>J. R. R. Tolkien</author>
    <title>The Hobbit</title>
  </book>
  <book>
    <author>Suzanne Collins</author>
    <title>The Hunger Games</title>
  </book>"

class LibraryParser
  include Flexparser

  property 'book', collection: true do
    attr_accessor :isbn
    property 'author'
    property 'title'
  end
end

lib = LibraryParser.parse library
lib.book[1].author #=> 'Suzanne Collins'
lib.book.first.title #=> 'The Hobbit'
lib.book.first.isbn = '9780582186552'
lib.book.first.isbn #=> '9780582186552'
```
Nested parsers are anonymous classes defined inside an existing parser, so the block can contain other class-level declarations as well, such as the `attr_accessor :isbn` above (should the need arise).

#### Tag Definitions
You might not always know what the information you are looking for is called, or the name might differ between documents. In that case, you can define multiple tags for the same property. Here are a few examples:
```ruby
class UniParser
  include Flexparser

  # Creates accessors called 'url' and 'url=' but will look for nodes with the name url, link and website.
  # Will return the first thing it finds. (Without a :name this needs `explicit_property_naming = false`, see Configuration.)
  property %w[url link website]

  # Creates a property called main_header and will look for message and title
  property %w[message title], name: 'main_header'

  # This will define a property called width and will look for an attribute of the same name
  property '@width'

  # This will define a property called `image_url` that will look for a node called 'image' and extract its 'url' attribute
  property 'image/@url'

  # This will look for a tag called encoded with the namespace content
  property 'content:encoded'

  # Here we define a transformation to make the parser return an integer
  property 'height', transform: :to_i

  # An alternative to the transformation is a type. The type must respond to `parse` and accept a string
  property 'url', type: URI

  # A little bit of everything
  property %w[image picture @img media:image], name: 'visual', type: URI, collection: true
end
```
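The accessor names mentioned in the comments above follow from how `define_accessors` in `lib/flexparser/class_methods.rb` sanitizes the first tag (or the `:name` option): a leading `@` is dropped and every run of punctuation becomes a single underscore. The sketch below reproduces only that substitution in plain Ruby; `accessor_name_for` is a hypothetical helper name used for illustration and is not part of the gem.

```ruby
# Standalone sketch: applies the same regexes as Flexparser's define_accessors.
# `accessor_name_for` is a hypothetical helper, not part of the gem's API.
def accessor_name_for(tag)
  tag.to_s.sub(/^@/, '').gsub(/([[:punct:]]|-)+/, '_')
end

# Print the accessor name each tag definition would produce.
%w[url @width image/@url content:encoded media:image].each do |tag|
  puts "#{tag.ljust(16)} -> #{accessor_name_for(tag)}"
end
```

Here `@width` becomes `width`, `image/@url` becomes `image_url`, and `content:encoded` becomes `content_encoded`.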
### Configuration
You can configure Flexparser by using a block (for example in an initializer) like so:
```ruby
Flexparser.configure do |config|
  config.option = value
end
```
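The block form relies on the memoize-and-yield pattern that `lib/flexparser.rb` uses for `configuration` and `configure`. The sketch below shows that pattern in isolation. `DemoParser` and `DemoConfiguration` are hypothetical stand-ins (the real `Flexparser::Configuration` class is not reproduced here); the two option names and their `true` defaults are the ones this README documents.

```ruby
# Hypothetical stand-in for Flexparser::Configuration; option names and
# defaults are taken from this README, everything else is an assumption.
class DemoConfiguration
  attr_accessor :explicit_property_naming, :retry_without_namespaces

  def initialize
    @explicit_property_naming = true
    @retry_without_namespaces = true
  end
end

# Hypothetical module that copies the configuration/configure pair.
module DemoParser
  class << self
    # Builds the configuration once and returns the same object afterwards.
    def configuration
      @configuration ||= DemoConfiguration.new
    end

    # Yields the shared configuration so callers can set options in a block.
    def configure
      yield(configuration)
      configuration
    end
  end
end

DemoParser.configure { |config| config.retry_without_namespaces = false }
puts DemoParser.configuration.explicit_property_naming # untouched default
puts DemoParser.configuration.retry_without_namespaces # changed by the block
```

Because the configuration object is memoized, every `configure` call changes the same instance, which is why a single initializer is enough.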
At the time of writing there are two options:

#### `explicit_property_naming`
**Default:** `true`

If this is `true`, you need to specify a `:name` for your `property` every time there is more than one tag in your tag list.
Example:
```ruby
# Bad!
property %w[url link website]

# Good!
property %w[url link website], name: 'website'

# Don't care! Unambiguous!
property 'url'
property ['width']
```
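The rule behind "Bad!" and "Good!" can be sketched as a standalone check modelled on `check_ambiguous_naming!` in `lib/flexparser/class_methods.rb`. The names `check_naming!` and `DemoAmbiguousNamingError` are hypothetical, and the configuration flag is passed as a keyword argument here instead of being read from `Flexparser.configuration`.

```ruby
# Hypothetical error class standing in for Flexparser::AmbiguousNamingError.
class DemoAmbiguousNamingError < StandardError; end

# Raises only when naming is explicit, no name was given, and the tag list
# holds more than one entry. A single String never counts as a list.
def check_naming!(tags, name: nil, explicit_property_naming: true)
  return unless explicit_property_naming &&
                name.nil? &&
                tags.respond_to?(:each) && tags.length > 1

  raise DemoAmbiguousNamingError,
        "You need to specify a name for the property (#{tags}) with the :name option."
end

check_naming!('url')                                 # single tag: fine
check_naming!(['width'])                             # one-element list: fine
check_naming!(%w[url link website], name: 'website') # explicit name: fine

begin
  check_naming!(%w[url link website])                # several tags, no name
rescue DemoAmbiguousNamingError => e
  puts "rejected: #{e.message}"
end
```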
#### `retry_without_namespaces`
**Default:** `true`

If `true`, `Flexparser` adds a second set of XPaths for the tags you specified that ignore namespaces completely.
Example:
```ruby
Flexparser.configure { |c| c.retry_without_namespaces = false }
class SomeParser
  include Flexparser

  property 'inventory'
end

xml = '<inventory xmlns="www.my-inventory.com">james</inventory>'

# The inventory can't be found because it is namespaced.
SomeParser.parse(xml).inventory #=> nil :(

Flexparser.configure { |c| c.retry_without_namespaces = true }
class SomeBetterParser
  include Flexparser

  property 'inventory'
end

xml = '<inventory xmlns="www.my-inventory.com">james</inventory>'

# The inventory can be found because we don't care.
SomeBetterParser.parse(xml).inventory #=> 'james'
```
The XPath used here adheres to XPath 1.0 and matches on the `name()` function, for example `.//*[name()='inventory']`.

## Development

After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.

### How it works (in a nutshell)
The `Flexparser` module defines certain class methods, most importantly `property`.
`property` takes a `String` or an array of strings as well as some options. It instantiates a `TagParser` (or a `CollectionParser` when `collection: true` is given) and adds it to the `@parsers` list of the class that is including `Flexparser` (we'll call it MainClass from here on out), which holds all the `TagParser`s and `CollectionParser`s. It also defines accessors named after the property that the parser should extract.

The parsers use an instance of `Flexparser::XPaths` to handle the array of tags that they are passed.
When everything is set up (i.e. the class is loaded), you can call `::parse` on your MainClass and pass it an XML string. At this point the MainClass instantiates itself, and each `TagParser` and `CollectionParser` extracts a value from the XML that is then assigned to the newly created MainClass instance.
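The register-then-assign flow described above can be illustrated with a toy model in plain Ruby. This is a sketch under loud assumptions, not the gem's code: a `Hash` stands in for the XML document, a key lookup stands in for the XPath query, and `ToyParser`, `ToyTagParser` and `ToyWebParser` are hypothetical names.

```ruby
# Toy model of the mechanism: a Hash replaces the XML document and a key
# lookup replaces the XPath query. All names here are hypothetical.
module ToyParser
  def self.included(base)
    base.extend(ClassMethods)
  end

  # Stand-in for TagParser: remembers its tags and the property name.
  ToyTagParser = Struct.new(:tags, :name) do
    # Returns the value of the first tag that is present in the document.
    def parse(doc)
      tag = tags.find { |t| doc.key?(t) }
      tag && doc[tag]
    end
  end

  module ClassMethods
    # One list of parsers per including class.
    def parsers
      @parsers ||= []
    end

    # Registers a parser and defines the accessors that will hold its result.
    def property(tags, name: nil)
      tags = Array(tags)
      name ||= tags.first
      attr_accessor name
      parsers << ToyTagParser.new(tags, name)
    end

    # Instantiates the class and assigns every parser's result to it.
    def parse(doc)
      new.tap do |instance|
        parsers.each do |parser|
          instance.public_send("#{parser.name}=", parser.parse(doc))
        end
      end
    end
  end
end

class ToyWebParser
  include ToyParser

  property %w[url link website], name: 'address'
  property 'title'
end

page = ToyWebParser.parse('link' => 'www.my-page.com', 'title' => 'Home')
puts page.address
puts page.title
```

The class body only registers parsers; no document is touched until `parse` is called, which mirrors the split between class loading and parsing described above.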

#### Defining a parser with a block
When defining nested parsers, you use a block, like this:
```ruby
class ParserClass
  include Flexparser

  property 'story', collection: true do
    property 'author'
    property 'title'
  end
end
```
When passing a block to a parser definition, a new class is created that basically looks like this:
```ruby
Class.new(Flexparser::AnonymousParser)
```
The block is then `instance_eval`ed on this anonymous class. That gives you a lot of flexibility in defining your parsers.
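For reference, `new_parser` in `lib/flexparser/class_methods.rb` builds the nested class with `Class.new(AnonymousParser)` and evaluates the block with `instance_eval`. The sketch below shows what that means for the block's contents, using a hypothetical `DemoBase` class and `build_parser` helper in place of the gem's classes: DSL calls and `attr_accessor` run with the anonymous class as `self`, while a `def` inside the block ends up on the class itself and not on its instances.

```ruby
# Hypothetical base class standing in for Flexparser::AnonymousParser.
class DemoBase
  # Minimal class-level DSL method so the block has something to call.
  def self.property(name)
    attr_accessor name
  end
end

# Same shape as new_parser: subclass anonymously, then evaluate the block
# with the new class as self.
def build_parser(&block)
  klass = Class.new(DemoBase)
  klass.instance_eval(&block)
  klass
end

book_parser = build_parser do
  property :title     # DSL call, resolved on the anonymous class
  attr_accessor :isbn # plain Ruby declaration, same receiver

  # Under instance_eval this becomes a method of the class, not of instances.
  def describe
    'anonymous book parser'
  end
end

book = book_parser.new
book.title = 'The Hobbit'
book.isbn = '9780582186552'
puts book.title
puts book_parser.describe
puts book.respond_to?(:describe)
```

The last line prints `false`: if instances need extra behaviour, declarations such as `attr_accessor` are the safer route inside such a block.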
data/Rakefile
ADDED
data/bin/console
ADDED
@@ -0,0 +1,14 @@
#!/usr/bin/env ruby

require 'bundler/setup'
require 'flexparser'

# You can add fixtures and/or initialization code here to make experimenting
# with your gem easier. You can also use a different console, if you like.

# (If you use this, don't forget to add pry to your Gemfile!)
# require "pry"
# Pry.start

require 'irb'
IRB.start
data/bin/setup
ADDED
data/flexparser.gemspec
ADDED
@@ -0,0 +1,32 @@
# coding: utf-8

lib = File.expand_path('../lib', __FILE__)
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
require 'flexparser/version'

Gem::Specification.new do |spec|
  spec.name = 'flexparser'
  spec.version = Flexparser::VERSION
  spec.authors = ['Paul Martensen']
  spec.email = ['paul.martensen@gmx.de']
  spec.licenses = ['GPL-3.0']
  spec.homepage = 'https://github.com/lokalportal/flexparser'
  spec.summary = 'A xml-parser dsl'
  spec.description = 'A flexible and robust parser-dsl.'

  spec.files = `git ls-files -z`.split("\x0").reject do |f|
    f.match(%r{^(test|spec|features)/})
  end
  spec.bindir = 'exe'
  spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
  spec.require_paths = ['lib']

  spec.add_dependency 'nokogiri', '~> 1.7'
  spec.add_dependency 'xpath', '~> 2.1'
  spec.add_development_dependency 'guard', '~> 2.14'
  spec.add_development_dependency 'guard-minitest', '~> 2.4'
  spec.add_development_dependency 'pry-byebug', '~> 3.4'
  spec.add_development_dependency 'bundler', '~> 1.13'
  spec.add_development_dependency 'rake', '~> 10.0'
  spec.add_development_dependency 'minitest', '~> 5.0'
end
data/lib/flexparser.rb
ADDED
@@ -0,0 +1,37 @@
require 'xpath'
require 'nokogiri'
require 'forwardable'

require 'flexparser/version'
require 'flexparser/errors'
require 'flexparser/configuration'
require 'flexparser/fragment'
require 'flexparser/xpaths'
require 'flexparser/empty_fragment'
require 'flexparser/fragment_builder'
require 'flexparser/tag_parser'
require 'flexparser/collection_parser'
require 'flexparser/class_methods'
require 'flexparser/anonymous_parser'

#
# Main module that, when included, provides
# the including class access to the property building
# structure.
#
module Flexparser
  class << self
    def configuration
      @configuration ||= Configuration.new
    end

    def configure
      yield(configuration)
      configuration
    end
  end

  def self.included(base)
    base.extend ClassMethods
  end
end
@@ -0,0 +1,20 @@
module Flexparser
  #
  # A semi-anonymous class used for building the node
  # structure.
  #
  class AnonymousParser
    extend ClassMethods

    def to_s
      return super if self.class.parsers.empty?
      to_h.to_s
    end

    def to_h
      self.class.parsers.each_with_object({}) do |parser, hash|
        hash[parser.name.to_sym] = public_send(parser.name)
      end
    end
  end
end
@@ -0,0 +1,94 @@
module Flexparser
  #
  # The ClassMethods defined on the including base.
  #
  module ClassMethods
    #
    # Applies the previously set up property-structure to the given xml
    # and returns Instances of the including Class.
    #
    def parse(xml, _options = {})
      return if parsers.empty?
      @doc = FragmentBuilder.build(xml)
      new.tap do |instance|
        parsers.each do |parser|
          instance.public_send("#{parser.name}=", parser.parse(@doc))
        end
      end
    end

    # no-doc
    def parsers
      @parsers ||= []
    end

    protected

    #
    # Defines a TagParser belonging to the including class.
    # This Tag parser is used in #parse to parse a piece of xml.
    # @param [Array<String>] tags The list of tags the parser rattles through.
    # @option opts [String] :name the name this property should have.
    #   Overrides the naming derived from the `tags` list.
    # @option opts [Symbol] :transform a method that will be sent to the
    #   resulting value to transform it. Like `:to_i` or `:to_sym`.
    # @option opts [Class] :type a class that implements `::parse` to
    #   return an instance of itself, parsed from the resulting value.
    # @option opts [Boolean] :required if true, raises an
    #   `Flexparser::RequiredMissingError` if the resulting value is `nil`.
    #
    def property(tags, collection: false, **opts, &block)
      check_ambiguous_naming!(tags, opts)
      parser_klass = collection ? CollectionParser : TagParser
      # Splat the options so the call matches add_parser's **opts signature.
      add_parser(parser_klass, tags, **opts, &block)
    end

    #
    # Adds a parser with a given class and options to the list
    # of parsers this class holds.
    # @see self#property
    # @param klass [Flexparser::TagParser] either a collection or a single
    #   {TagParser}
    #
    def add_parser(klass, tags, **opts, &block)
      tags = Array(tags).flatten
      define_accessors(opts[:name] || tags.first)
      opts[:sub_parser] = new_parser(&block) if block_given?
      parsers << klass.new(tags, opts)
    end

    #
    # Creates a new anonymous Parser class
    # based on {Flexparser::AnonymousParser}.
    # @param block [Block] The block that holds the classes parser,
    #   methods and so on.
    #
    def new_parser(&block)
      klass = Class.new(AnonymousParser)
      klass.instance_eval(&block)
      klass
    end

    #
    # Defines accessors with a given name after sanitizing that name.
    # @param name [String] the name the accessors should have.
    #   Will most likely be a tag.
    #
    def define_accessors(name)
      attr_accessor name.to_s.sub(/^@/, '').gsub(/([[:punct:]]|-)+/, '_')
    end

    #
    # Raises an error if the name of a parser is ambiguous and the options
    # forbid it from being so.
    #
    def check_ambiguous_naming!(tags, opts)
      return unless Flexparser.configuration.explicit_property_naming &&
                    opts[:name].nil? &&
                    tags.respond_to?(:each) && tags.length > 1
      raise(AmbiguousNamingError,
            "You need to specify a name for the property (#{tags})
            with the :name option.")
    end
  end
end
@@ -0,0 +1,27 @@
module Flexparser
  #
  # A Parser similar to the TagParser but intended for a collection
  # of properties.
  # @param sub_parser [Wrench] all CollectionParsers need a subparser to
  #   deal with the content of the nodes parsed. This should ideally be
  #   a class that includes Spigots::Wrench and can parse the fragment it
  #   is dealt.
  #
  class CollectionParser < TagParser
    def parse(doc)
      content(doc).map do |n|
        next sub_parser.parse(n) if sub_parser
        next type.parse(n.text) if type
        n.text
      end
    end

    protected

    def content(doc)
      content = doc.xpath(xpaths.valid_paths(doc).reduce(&:union).to_s)
      return content unless content.empty?
      options[:required] ? raise_required_error(doc) : content
    end
  end
end