rusty 0.1

@@ -0,0 +1,2 @@
1
+ coverage
2
+ rdoc
data/Gemfile ADDED
@@ -0,0 +1,9 @@
1
+ gemspec
2
+
3
+ group :development do
4
+ gem "rake"
5
+ gem "rdoc"
6
+ gem "test-unit"
7
+ gem "ruby-debug19"
8
+ gem "simplecov"
9
+ end
@@ -0,0 +1,44 @@
1
+ PATH
2
+ remote: .
3
+ specs:
4
+ rusty (0.1)
5
+ nokogiri
6
+
7
+ GEM
8
+ specs:
9
+ archive-tar-minitar (0.5.2)
10
+ columnize (0.3.6)
11
+ json (1.7.7)
12
+ linecache19 (0.5.12)
13
+ ruby_core_source (>= 0.1.4)
14
+ multi_json (1.6.1)
15
+ nokogiri (1.5.6)
16
+ rake (10.0.3)
17
+ rdoc (3.12.1)
18
+ json (~> 1.4)
19
+ ruby-debug-base19 (0.11.25)
20
+ columnize (>= 0.3.1)
21
+ linecache19 (>= 0.5.11)
22
+ ruby_core_source (>= 0.1.4)
23
+ ruby-debug19 (0.11.6)
24
+ columnize (>= 0.3.1)
25
+ linecache19 (>= 0.5.11)
26
+ ruby-debug-base19 (>= 0.11.19)
27
+ ruby_core_source (0.1.5)
28
+ archive-tar-minitar (>= 0.5.2)
29
+ simplecov (0.7.1)
30
+ multi_json (~> 1.0)
31
+ simplecov-html (~> 0.7.1)
32
+ simplecov-html (0.7.1)
33
+ test-unit (2.5.3)
34
+
35
+ PLATFORMS
36
+ ruby
37
+
38
+ DEPENDENCIES
39
+ rake
40
+ rdoc
41
+ ruby-debug19
42
+ rusty!
43
+ simplecov
44
+ test-unit
@@ -0,0 +1,26 @@
1
+ Parts of this package are copyrighted under the terms of the modified BSD license.
2
+
3
+ Copyright (c) 2011, 2012, @radiospiel.
4
+
5
+ All rights reserved.
6
+
7
+ Redistribution and use in source and binary forms, with or without
8
+ modification, are permitted provided that the following conditions are met:
9
+ * Redistributions of source code must retain the above copyright
10
+ notice, this list of conditions and the following disclaimer.
11
+ * Redistributions in binary form must reproduce the above copyright
12
+ notice, this list of conditions and the following disclaimer in the
13
+ documentation and/or other materials provided with the distribution.
14
+ * No names of the contributors may be used to endorse or promote products
15
+ derived from this software without specific prior written permission.
16
+
17
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
18
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
19
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
20
+ DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
21
+ DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
22
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
23
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
24
+ ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
25
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
26
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,200 @@
1
+ # Ever wanted to parse XML, but hated all the hassle?
2
+
3
+ Rusty is here to help. Let's start with a small example:
4
+
5
+ require "rusty"
6
+
7
+ # A simple RSS parser
8
+ module SimpleRSS
9
+ extend Rusty::RuleSet
10
+ helper Rusty::Helpers::Text
11
+
12
+ on "*" do end
13
+ on "rss channel *" do rss[node.name] = text(node) end
14
+ on "rss channel item" do rss.items << item end
15
+ on "rss channel item *" do item[node.name] = text(node) end
16
+ end
17
+
18
+ doc = Nokogiri.XML File.read("stressfaktor.xml")
19
+ data = SimpleRSS.transform! doc
20
+ p data.rss.to_ruby
21
+
22
+ Interested? Read on.
23
+
24
+ ## Transforming nodes
25
+
26
+ Each XML and HTML document, after being parsed by Nokogiri, is represented as a tree of nodes.
27
+ A transformation visits every node in the input document and does something with the data
28
+ in it. A trivial 1:1 transformation would recreate the tree with the same data. This is obviously
29
+ not what you want; what you want is probably to build a *different* tree, with only some of the
30
+ information, and/or to do something else entirely.
31
+
32
+ rusty is here to help you.
33
+
34
+ - It lets you define procedures to run on nodes, specified by CSS selectors, and
35
+ - it provides a simple name lookup for building the output data structure.
36
+
37
+ And this works miracles.
38
+
39
+ ## Defining callbacks
40
+
41
+ Rusty knows two kinds of callbacks: the `on` callback, which runs before a node's children are processed, and the `after` callback, which runs once all children have been visited.
42
+
43
+ module SimpleRSS
44
+ extend Rusty::RuleSet
45
+
46
+ on "rss channel item" do puts "Hu! An item node" end
47
+ on "rss channel item *" do puts "A child of an item" end
48
+ after "rss channel item" do puts "Now I have seen all of the item's children" end
49
+ end
50
+
51
+ There is an additional way to define a callback, which makes some sense if you need both an "on" and an "after" callback for the same nodes, and probably want to share some information between these:
52
+
53
+ module SimpleRSS
54
+ extend Rusty::RuleSet
55
+
56
+ on "rss channel item" do
57
+ start = Time.now
58
+ callback do
59
+ puts "Parsing the item took #{Time.now - start} secs."
60
+ end
61
+ end
62
+ end
63
+
64
+ When no state needs to be shared, the first example above can also be written with `callback`:
65
+
78
+ module SimpleRSS
79
+ extend Rusty::RuleSet
80
+
81
+ on "rss channel item" do
82
+ puts "Hu! An item node"
83
+ callback do
84
+ puts "Now I have seen all of the item's children"
85
+ end
86
+ end
87
+ end
88
+
89
+ `after` and `callback` callbacks can coexist.
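The order in which these callbacks fire can be illustrated without rusty at all. The following is only a minimal sketch: `TreeNode` and `walk` are hypothetical names, not part of rusty, and the returned lambda stands in for what `callback do ... end` does in a rule.

```ruby
# Hypothetical stand-in for a parsed node: a name plus child nodes.
TreeNode = Struct.new(:name, :children)

# Depth-first walk. The block runs before a node's children ("on");
# if it returns a proc, that proc runs after the children ("callback").
def walk(node, log, &enter)
  leave = enter.call(node, log)
  node.children.each { |child| walk(child, log, &enter) }
  leave.call if leave.respond_to?(:call)
  log
end

item = TreeNode.new("item", [TreeNode.new("title", []), TreeNode.new("link", [])])

events = walk(item, []) do |node, log|
  log << "on #{node.name}"
  mark = log.length # local state, shared with the closure below
  -> { log << "after #{node.name} (+#{log.length - mark} events)" }
end

puts events
```

The closure sees `mark` from the surrounding block, which is the same mechanism that lets `start` be shared in the timing example above.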
90
+
91
+ ## Creating output data
92
+
93
+ One reason to parse XML is to recreate some kind of data structure which resembles some or all of the XML input. To support this mode of operation rusty "mirrors" input nodes with output data nodes. To further help you, rusty comes with a nimble name lookup scheme in its callbacks. Whenever you use an undeclared name in a callback, rusty walks up from the current node through its parents to find a node with a matching name:
94
+
95
+ module SimpleRSS
96
+ extend Rusty::RuleSet
97
+
98
+ on "rss" do
99
+ rss.item_count = 0
100
+ callback do
101
+ puts "There are #{rss.item_count} items in the input"
102
+ end
103
+ end
104
+
105
+ on "rss channel item" do
106
+ rss.item_count += 1
107
+ end
108
+ end
109
+
110
+ What happens with the resulting data is up to you. By default rusty throws away all resulting data except what belongs to the top node of the document. In the above example SimpleRSS.transform! would return a hash:
111
+
112
+ { "item_count" => <some_number> }
113
+
114
+ If you want to keep a node's data you must put it somewhere, as in the following example:
115
+
116
+ on "rss channel item" do rss.items << item end
117
+
118
+ ## Node names
119
+
120
+ What is a matching name? While XML documents may come with names that might make sense, HTML usually does not. After all, a `<div>` is a `<div>` no matter what.
121
+
122
+ For that reason rusty matches both node names and node classes when looking up a node by name. (And yes, that means a node might have multiple names.) As of yet, node names that are not valid Ruby identifiers cannot be used in the callback block.
123
+
124
+ There is one special name, `document`, which refers to the top-most node.
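The lookup described above can be sketched in plain Ruby. This is not rusty's implementation: `Scope` and `LookupBinding` are hypothetical classes that only illustrate the idea of resolving an unknown identifier against a node's names (element name plus classes), walking up through the parents.

```ruby
# Hypothetical scope: one per input node, with its names, parent and data.
class Scope
  attr_reader :names, :parent, :data

  def initialize(names, parent = nil)
    @names, @parent, @data = names, parent, {}
  end

  # Walk from this scope towards the root; return the first scope that
  # answers to the given name, or nil if there is none.
  def lookup(name)
    scope = self
    scope = scope.parent until scope.nil? || scope.names.include?(name)
    scope
  end
end

# Callbacks are evaluated against this object; bare identifiers that are
# not methods resolve to the data of the matching scope.
class LookupBinding
  def initialize(scope)
    @scope = scope
  end

  def method_missing(sym, *args, &block)
    target = @scope.lookup(sym.to_s) if args.empty? && block.nil?
    target ? target.data : super
  end

  def respond_to_missing?(sym, include_private = false)
    !@scope.lookup(sym.to_s).nil? || super
  end
end

rss_scope  = Scope.new(["rss"])
item_scope = Scope.new(["item", "featured"], rss_scope) # name plus a class

LookupBinding.new(item_scope).instance_eval do
  rss[:item_count] = 1       # resolved one level up
  featured[:title] = "Hello" # the class name matches the current node
end
```

Note that the local variables outside the block are deliberately not called `rss` or `featured`: a local variable of the same name would shadow the lookup.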
125
+
126
+ ## Rusty data nodes
127
+
128
+ A Rusty data node (of type Rusty::DX) is a mongrel of a `Hash`, an `Array`, and `nil`. Unless set to something, i.e. as long as it is `nil`, it might turn into an Array or a Hash-like structure, depending on what you do to it.
129
+
130
+ The following makes `rss` a hash:
131
+
132
+ rss.key?(:foo)
133
+ rss.item_count = 0 # Hash entries are automatically created
134
+
135
+ while the following makes it an array
136
+
137
+ rss << 1
138
+ rss[5] = 25
139
+
140
+ To get back a plain Ruby object use the `.to_ruby` method, i.e.
141
+
142
+ rss.to_ruby # => [ 1, nil, nil, nil, nil, 25 ]
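The "nil until used" behaviour can be approximated in a few lines. The class below is a simplified sketch, not Rusty::DX itself: `Flexible` is a hypothetical name, it supports only a handful of operations, and it does not create nested entries automatically.

```ruby
# A value that has no storage until first used, then locks into either
# Hash or Array behaviour.
class Flexible
  def [](key)
    storage_for(key)[key]
  end

  def []=(key, value)
    storage_for(key)[key] = value
  end

  # Only arrays can be pushed to, so << selects Array storage.
  def <<(value)
    storage(Array) << value
    self
  end

  # Only hashes know key?, so it selects Hash storage.
  def key?(key)
    storage(Hash).key?(key)
  end

  def to_ruby
    @storage
  end

  private

  # Integer subscripts mean "array", everything else means "hash".
  def storage_for(key)
    storage(key.is_a?(Integer) ? Array : Hash)
  end

  def storage(klass)
    @storage ||= klass.new
    raise ArgumentError, "Cannot change type to #{klass}" unless @storage.is_a?(klass)
    @storage
  end
end

list = Flexible.new
list << 1
list[5] = 25
p list.to_ruby # => [1, nil, nil, nil, nil, 25]

dict = Flexible.new
dict["title"] = "Hello"
```

Once the storage type is chosen it cannot change: `dict << 1` raises an `ArgumentError` in this sketch.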
143
+
144
+ ## ..or something else?
145
+
146
+ Of course you are free to do whatever. After all, each callback is just a piece of ruby code.
147
+
148
+ module SimpleRSS
149
+ extend Rusty::RuleSet
150
+ helper Rusty::Helpers::Text
151
+
152
+ on "rss channel item *" do item[node.name] = text(node) end
153
+ after "rss channel item" do puts "Found an item: #{item.to_ruby}" end
154
+ end
155
+
156
+ ## The `*` callback
157
+
158
+ You will usually see a rule like this:
159
+
160
+ on "*" do end
161
+
162
+ The "*" selector has a very low weight, meaning it only applies to nodes that are not matched by any other rule. This is done to prevent rusty from warning about nodes without a matching rule.
163
+
164
+ During development you should not use a rule like that. Add it only after you feel confident you get all the data you need from the input.
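How a low weight makes `*` a fallback can be sketched as follows. The weighting used here (one point per named part, none for `*`) and the assumption that only the heaviest matching rule is used are illustrative guesses, not rusty's actual algorithm; `weight`, `matches?` and `best_rule` are hypothetical helpers, and only simple descendant selectors are handled.

```ruby
# Hypothetical weighting: one point per named part, none for "*".
def weight(selector)
  selector.split.count { |part| part != "*" }
end

def part_matches?(part, name)
  part == "*" || part == name
end

# Does a descendant selector such as "rss channel *" match a node, given
# the element names on the path from the root down to that node?
def matches?(selector, path)
  *ancestors, last = selector.split
  return false unless part_matches?(last, path.last)

  rest = path[0...-1]
  ancestors.reverse_each do |part|
    index = rest.rindex { |name| part_matches?(part, name) }
    return false unless index
    rest = rest[0...index]
  end
  true
end

# Among all matching selectors, the heaviest one wins.
def best_rule(selectors, path)
  selectors.select { |s| matches?(s, path) }.max_by { |s| weight(s) }
end

RULES = ["*", "rss channel *", "rss channel item", "rss channel item *"]

best_rule(RULES, %w[rss channel item title]) # => "rss channel item *"
best_rule(RULES, %w[rss channel title])      # => "rss channel *"
best_rule(RULES, %w[html body div])          # => "*"
```

With a weight of zero, `*` is chosen only when nothing more specific matches, which is why it silences the "no matching rule" warning without changing other results.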
165
+
166
+ ## Speeding up
167
+
168
+ Especially when parsing HTML you might find a number of nodes that belong to a subtree in the document which is completely irrelevant. For example, a page like http://www.google.com/movies contains tons of UI elements which, assuming you are interested in theater schedules, are just irrelevant. By skipping the entire subtree you might gain some speed when parsing the input:
169
+
170
+ on "#navbar, #left_nav" do
171
+ skip!
172
+ end
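The effect of `skip!` is that of pruning a depth-first traversal. A minimal sketch, assuming a hypothetical `Element` struct and `visit` function rather than rusty's own traversal code:

```ruby
# Hypothetical stand-in for an HTML element: an id plus child elements.
Element = Struct.new(:id, :children)

# Depth-first visit that does not descend into elements whose id is
# listed in `skip`. Returns the ids of all elements that were visited.
def visit(element, skip, visited = [])
  visited << element.id
  return visited if skip.include?(element.id)

  element.children.each { |child| visit(child, skip, visited) }
  visited
end

page = Element.new("body", [
  Element.new("navbar", [Element.new("logo", []), Element.new("search", [])]),
  Element.new("results", [Element.new("theater", [])])
])

visit(page, ["navbar"]) # => ["body", "navbar", "results", "theater"]
```

The skipped element itself is still visited; only its children are left out, so the saving grows with the size of the subtree.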
173
+
174
+ ## Helpers and the callback scope
175
+
176
+ Note that callbacks get a special scope. This scope, a Rusty::CallbackBinding, is responsible for looking up names up the node tree. The only value defined there (apart from things like `object_id`, `class`, etc.) is `node`, which refers to the input node.
177
+
178
+ If you need special functionality you should define helper methods and modules, as in the following example:
179
+
180
+ module SimpleRSS
181
+ extend Rusty::RuleSet
182
+ helper Rusty::Helpers::Text
183
+ helper do
184
+ def a_helper_method(*args)
185
+ end
186
+ end
187
+ on "rss channel item *" do a_helper_method 1, 2, 3 end
188
+ end
189
+
190
+ Rusty comes with the Rusty::Helpers::Text module, which provides a single helper method, `text`, which returns a node's text after cleaning it up.
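One way helper modules and `helper do ... end` blocks can end up in the callback scope is by including them into a subclass of the scope class. The sketch below illustrates that idea with hypothetical names (`CallbackScope`, `TextHelper`, `shout_helper`); the `text` helper here takes a plain string, unlike rusty's, which takes a node.

```ruby
class CallbackScope
  # Build a subclass that includes the given helper modules.
  def self.with_helpers(*helpers)
    Class.new(self) { include(*helpers) }
  end
end

module TextHelper
  # Collapse runs of whitespace and trim both ends.
  def text(string)
    string.gsub(/\s+/, " ").strip
  end
end

# A block of method definitions becomes an anonymous module.
shout_helper = Module.new do
  def shout(string)
    text(string).upcase
  end
end

scope  = CallbackScope.with_helpers(TextHelper, shout_helper).new
result = scope.instance_eval { shout("  hello \n world ") }
puts result # => HELLO WORLD
```

Because both modules are included into the same class, helpers can call each other, as `shout` does with `text`.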
191
+
192
+ # That is all.
193
+
194
+ Rusty does have a number of shortcomings.
195
+
196
+ - It does not support namespaces,
197
+ - its CSS selector matching could be faster,
198
+ - the selector weighting could be more correct.
199
+
200
+ Don't hesitate to fork away and send pull requests!
@@ -0,0 +1,13 @@
1
+ require "bundler"
2
+ Bundler.setup :development
3
+
4
+ Dir.glob("tasks/*.rake").each do |file|
5
+ load file
6
+ end
7
+
8
+ task :default => :test
9
+
10
+ # Add "rake release" and "rake install"
11
+ Bundler::GemHelper.install_tasks
12
+
13
+ task :default => [:test, :rdoc]
@@ -0,0 +1,27 @@
1
+ #!/usr/bin/env ruby
2
+ require 'rb-fsevent'
3
+
4
+ if ARGV.length < 2
5
+ STDERR.puts "watchr dir ruby-parameter(s)..."
6
+ abort
7
+ end
8
+
9
+ paths = ARGV.shift.split(",")
10
+ $args = ARGV.dup
11
+
12
+ def do_ruby
13
+ STDERR.puts $args.join(" ")
14
+ STDERR.puts "=" * 80
15
+
16
+ system(*$args)
17
+ end
18
+
19
+ puts "Initial run"
20
+ do_ruby
21
+
22
+ fsevent = FSEvent.new
23
+ fsevent.watch paths do |directories|
24
+ puts "Detected change inside: #{directories.inspect}"
25
+ do_ruby
26
+ end
27
+ fsevent.run
@@ -0,0 +1,16 @@
1
+ # This file is part of the rusty ruby gem.
2
+ #
3
+ # Copyright (c) 2013 @radiospiel
4
+ # Distributed under the terms of the modified BSD license, see LICENSE.BSD
5
+ require "nokogiri"
6
+
7
+ module Rusty; end
8
+
9
+ require_relative "rusty/version"
10
+ require_relative "rusty/nokogiri_ext"
11
+ require_relative "rusty/dx"
12
+ require_relative "rusty/scope"
13
+ require_relative "rusty/helpers"
14
+ require_relative "rusty/callback_binding"
15
+ require_relative "rusty/selector"
16
+ require_relative "rusty/rule_set"
@@ -0,0 +1,71 @@
1
+ # This file is part of the rusty ruby gem.
2
+ #
3
+ # Copyright (c) 2013 @radiospiel
4
+ # Distributed under the terms of the modified BSD license, see LICENSE.BSD
5
+
6
+ # CallbackBinding objects are used to provide name lookup for callbacks.
7
+ # They are set up in such a way that unknown identifiers might traverse
8
+ # up the parents of a node to find a finally matching node. This allows
9
+ # the "top" node in the following callback to be referred to by the name
10
+ # "top", like this:
11
+ #
12
+ # on("top p") { top.attribute = "p" }
13
+ #
14
+ class Rusty::CallbackBinding < Object #BasicObject
15
+ #
16
+ # Create a subclass with a given name and a set of helper modules.
17
+ def self.subclass_with_name_and_helpers(name, *helpers)
18
+ name = name.split("::").last
19
+
20
+ ::Class.new(self).tap do |klass|
21
+ const_set name, klass
22
+ klass.send :include, *helpers
23
+ end
24
+ end
25
+
26
+ # create an event scope which wraps a node scope.
27
+ def initialize(scope)
28
+ @scope = scope
29
+ end
30
+
31
+ # -- special attributes -----------------------------------------------------
32
+
33
+ # callback: set a callback proc which will be called after a node is
34
+ # processed completely.
35
+ def callback(&block)
36
+ @callback = block if block
37
+ @callback
38
+ end
39
+
40
+ # skip!: skip this node's children completely.
41
+ def skip!
42
+ @skip = true
43
+ end
44
+
45
+ # Should children be skipped?
46
+ def skip?
47
+ !!@skip
48
+ end
49
+
50
+ # If the missing method is an identifier and has no arguments, this method
51
+ # looks in this node scope and its parent scopes for a scope with that name.
52
+ # If there is no such target scope, the message is forwarded to the
53
+ # node scope (which has its own set of magic, see Rusty::DX)
54
+ def method_missing(sym, *args, &block)
55
+ target = if !block && args.empty? && sym =~ /^[A-Za-z_][A-Za-z_0-9]*$/
56
+ up_scope!(sym.to_s)
57
+ end
58
+
59
+ (target || @scope).send sym, *args, &block
60
+ end
61
+
62
+ private
63
+
64
+ def up_scope!(name)
65
+ @scope.up! do |scope|
66
+ return scope if scope.has_name?(name)
67
+ end
68
+
69
+ nil
70
+ end
71
+ end
@@ -0,0 +1,127 @@
1
+ # This file is part of the rusty ruby gem.
2
+ #
3
+ # Copyright (c) 2013 @radiospiel
4
+ # Distributed under the terms of the modified BSD license, see LICENSE.BSD
5
+
6
+ require "forwardable"
7
+
8
+ # A flexible data object which may act either as a hash or an array.
9
+ # It automatically initialises as array or hash, when used in a context
10
+ # which requires one or the other usage.
11
+ class Rusty::DX
12
+
13
+ # -- set up storage ---------------------------------------------------------
14
+
15
+ # The hash storage implementation.
16
+ class Dict < Hash
17
+ DELEGATE_METHODS = [:[], :[]=, :key?]
18
+ module Delegator
19
+ extend Forwardable
20
+ delegate DELEGATE_METHODS => :@storage
21
+ end
22
+
23
+ def initialize
24
+ super { |hash, key| hash[key] = Rusty::DX.new }
25
+ end
26
+ end
27
+
28
+ # The array storage implementation.
29
+ class List < Array
30
+ DELEGATE_METHODS = [:[], :[]=, :<<, :first]
31
+ module Delegator
32
+ extend Forwardable
33
+ delegate DELEGATE_METHODS => :@storage
34
+ end
35
+ end
36
+
37
+ def inspect
38
+ "<#{@storage ? @storage.inspect : "nil"}>"
39
+ end
40
+
41
+ private
42
+
43
+ def __storage(klass)
44
+ if @storage
45
+ raise ArgumentError, "Cannot change type to #{klass}" unless @storage.is_a?(klass)
46
+ @storage
47
+ else
48
+ extend klass::Delegator
49
+ @storage = klass.new
50
+ end
51
+ end
52
+
53
+ ## -- test mode, convert to ruby objects ------------------------------------
54
+
55
+ public
56
+
57
+ # Is this object in hash mode?
58
+ def dict?; @storage.is_a?(Dict); end
59
+
60
+ # Is this object in list mode?
61
+ def list?; @storage.is_a?(List); end
62
+
63
+
64
+ def self.to_ruby(object)
65
+ object.is_a?(Rusty::DX) ? object.to_ruby : object
66
+ end
67
+
68
+ # convert into a ruby object
69
+ def to_ruby
70
+ case @storage
71
+ when Dict
72
+ items = @storage.inject([]) do |ary, (k, v)|
73
+ ary << k << Rusty::DX.to_ruby(v)
74
+ end
75
+ Hash[*items]
76
+ when List
77
+ @storage.map { |v| Rusty::DX.to_ruby(v) }
78
+ end
79
+ end
80
+
81
+ # -- method missing magic ---------------------------------------------------
82
+
83
+ public
84
+
85
+ # method_missing automatically sets up storage, and matches method names
86
+ # with hash keys (when in Dict mode)
87
+ #
88
+ # When setting up storage for this object the storage type is determined
89
+ # by these conditions:
90
+ #
91
+ # - identifiers, as getters (i.e. with no arguments), result in Dict storage
92
+ # - identifiers, as setters (i.e. with one argument, ending in '='), result
93
+ # in Dict storage
94
+ # - the [] and []= operators result in List storage, if the argument is
95
+ # an integer, else in Dict storage.
96
+ # - Methods that are only implemented in the List or Dict storage determine
97
+ # the storage type accordingly. These are set up automatically by evaluating
98
+ # {List/Dict}::DELEGATE_METHODS. For example, you cannot push (<<) into a
99
+ # Hash, nor can you ask an Array whether a key exists (key?).
100
+ EXCLUSIVE_LIST_METHODS = List::DELEGATE_METHODS - Dict::DELEGATE_METHODS
101
+ EXCLUSIVE_DICT_METHODS = Dict::DELEGATE_METHODS - List::DELEGATE_METHODS
102
+
103
+ def method_missing(sym, *args, &block)
104
+ # A number of missing methods try to initialize this object either as
105
+ # a hash or an array, and then forward the message to storage.
106
+ case sym
107
+ when :[], :[]=
108
+ raise "#{self.class.name}##{sym}: Missing argument" unless args.length >= 1
109
+ return __storage(args.first.is_a?(Integer) ? List : Dict).send(sym, *args)
110
+ when /^([_A-Za-z][_A-Za-z0-9]*)$/
111
+ if args.length == 0 && block.nil?
112
+ return __storage(Dict)[sym.to_s]
113
+ end
114
+ when /^([_A-Za-z][_A-Za-z0-9]*)=$/
115
+ if args.length == 1 && block.nil?
116
+ return __storage(Dict)[$1] = args.first
117
+ end
118
+ else
119
+ return __storage(List).send sym, *args, &block if EXCLUSIVE_LIST_METHODS.include?(sym)
120
+ return __storage(Dict).send sym, *args, &block if EXCLUSIVE_DICT_METHODS.include?(sym)
121
+ end
122
+
123
+ # -- we could not set up nor delegate to storage; run super instead
124
+ # (this will raise a NoMethodError.)
125
+ super
126
+ end
127
+ end