ariel 0.0.1 → 0.1.0
- data/README +49 -83
- data/bin/ariel +29 -20
- data/examples/google_calculator/structure.rb +2 -2
- data/examples/google_calculator/structure.yaml +13 -15
- data/examples/raa/labeled/highline.html +5 -4
- data/examples/raa/labeled/mongrel.html +9 -8
- data/examples/raa/structure.rb +4 -2
- data/examples/raa/structure.yaml +94 -78
- data/lib/ariel.rb +71 -33
- data/lib/ariel/{candidate_selector.rb → candidate_refiner.rb} +39 -38
- data/lib/ariel/label_utils.rb +46 -18
- data/lib/ariel/labeled_document_loader.rb +77 -0
- data/lib/ariel/learner.rb +60 -38
- data/lib/ariel/log.rb +67 -0
- data/lib/ariel/node.rb +52 -0
- data/lib/ariel/node/extracted.rb +90 -0
- data/lib/ariel/node/structure.rb +91 -0
- data/lib/ariel/rule.rb +114 -32
- data/lib/ariel/rule_set.rb +34 -15
- data/lib/ariel/token.rb +9 -3
- data/lib/ariel/token_stream.rb +32 -17
- data/lib/ariel/wildcards.rb +19 -15
- data/test/fixtures.rb +45 -3
- data/test/specs/candidate_refiner_spec.rb +48 -0
- data/test/specs/label_utils_spec.rb +97 -0
- data/test/specs/learner_spec.rb +39 -0
- data/test/specs/node_extracted_spec.rb +90 -0
- data/test/specs/node_spec.rb +76 -0
- data/test/specs/node_structure_spec.rb +74 -0
- data/test/specs/rule_set_spec.rb +85 -0
- data/test/specs/rule_spec.rb +110 -0
- data/test/specs/token_stream_spec.rb +100 -7
- metadata +21 -28
- data/lib/ariel/example_document_loader.rb +0 -59
- data/lib/ariel/extracted_node.rb +0 -20
- data/lib/ariel/node_like.rb +0 -26
- data/lib/ariel/structure_node.rb +0 -75
- data/test/ariel_test_case.rb +0 -15
- data/test/test_candidate_selector.rb +0 -58
- data/test/test_example_document_loader.rb +0 -7
- data/test/test_label_utils.rb +0 -15
- data/test/test_learner.rb +0 -38
- data/test/test_rule.rb +0 -38
- data/test/test_structure_node.rb +0 -81
- data/test/test_token.rb +0 -16
- data/test/test_token_stream.rb +0 -82
- data/test/test_wildcards.rb +0 -18
data/README
CHANGED
@@ -1,98 +1,64 @@
|
|
1
|
-
= Ariel release 0.0
|
1
|
+
= Ariel release 0.1.0
|
2
|
+
|
3
|
+
== About - Ariel: A Ruby Information Extraction Library
|
4
|
+
Ariel is a library that allows you to extract information from semi-structured
|
5
|
+
documents (such as websites). It is different to existing tools because rather
|
6
|
+
than expecting the developer to write rules to extract the desired information,
|
7
|
+
Ariel will use a small number of labeled examples to generate and learn
|
8
|
+
effective extraction rules. It is developed by Alex Bradbury and released under
|
9
|
+
the MIT license. Ariel was started as a Google Summer of Code project mentored
|
10
|
+
by Austin Ziegler in 2006.
|
2
11
|
|
3
12
|
== Install
|
4
13
|
gem install ariel
|
5
14
|
|
6
15
|
== Announcement
|
7
|
-
This is the first public release of Ariel - A Ruby Information Extraction
|
8
|
-
Library. See my previous post, ruby-talk:200140[http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/200140]
|
9
|
-
for more background information. This release supports defining a tree document
|
10
|
-
structure and learning rules to extract each node of this tree. Handling of list
|
11
|
-
extraction and learning is not yet implemented, and is the next immediate
|
12
|
-
priority. See the examples directory included in this release and below for
|
13
|
-
discussion of the included examples. Rule learning is functional, and appears to
|
14
|
-
work well, but many refinements are possible. Look out for more updates and a
|
15
|
-
new release shortly.
|
16
|
-
|
17
|
-
== About Ariel
|
18
|
-
Ariel intends to assist in extracting information from semi-structured
|
19
|
-
documents including (but not in any way limited to) web pages. Although you
|
20
|
-
may use libraries such as Hpricot or Rubyful Soup, or even plain Regular
|
21
|
-
Expressions to achieve the same goal, Ariel approaches the problem very
|
22
|
-
differently. Ariel relies on the user labeling examples of the data they
|
23
|
-
want to extract, and then finds patterns across several such labeled
|
24
|
-
examples in order to produce a set of general rules for extracting this
|
25
|
-
information from any similar document. It uses the MIT license.
|
26
|
-
|
27
|
-
== Examples
|
28
|
-
This release includes two examples in the example directory (which should now
|
29
|
-
be in the directory to which rubygems installed ariel). The first is the
|
30
|
-
google_calculator directory (inspired by Justin Bailey's post to my Ariel
|
31
|
-
progress report). The structure is very simple, a calculation is extracted from
|
32
|
-
the page, and then the actual result is extracted from that calculation. 3
|
33
|
-
labeled examples are included. Ariel reads each of these, tokenizes them,
|
34
|
-
and extracts each label. 4 sets of rules are learnt:
|
35
|
-
1. Rules to locate the start of the calculation in the original document.
|
36
|
-
2. Rules to locate the end of the calculation in the original document (applied
|
37
|
-
from the end of the document).
|
38
|
-
3. Rules to locate the start of the result of the calculation from the
|
39
|
-
extracted calculation.
|
40
|
-
4. Rules to locate the end of the result of the calculation from the extracted
|
41
|
-
calculation (applied from the end of the calculation).
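The four rule sets above can be pictured with a small, self-contained sketch. It is not Ariel's implementation: `skip_to`, `apply_rule` and `extract` are hypothetical helpers, and the matching semantics (each landmark is a run of consecutive tokens, back rules scan from the end) are assumptions based on the `direction` and `landmarks` fields visible in the structure.yaml files later in this diff.

```ruby
# Hypothetical helper, not part of Ariel: consume each landmark in turn.
# A landmark is an array of tokens that must appear consecutively.
# Returns the index just past the last landmark, or nil if one is missing.
def skip_to(tokens, landmarks)
  pos = 0
  landmarks.each do |landmark|
    hit = (pos..tokens.size - landmark.size).find do |i|
      tokens[i, landmark.size] == landmark
    end
    return nil if hit.nil?
    pos = hit + landmark.size
  end
  pos
end

# Forward rules scan from the start. Back rules scan the reversed tokens,
# and the position is mapped back to an index in the original array.
def apply_rule(tokens, landmarks, direction)
  if direction == :forward
    skip_to(tokens, landmarks)
  else
    reversed = skip_to(tokens.reverse, landmarks.map(&:reverse))
    reversed && tokens.size - reversed
  end
end

# Extract the tokens between a start rule and an end rule.
def extract(tokens, start_landmarks, end_landmarks)
  first = apply_rule(tokens, start_landmarks, :forward)
  last = apply_rule(tokens, end_landmarks, :back)
  return nil if first.nil? || last.nil? || first > last
  tokens[first...last]
end

# Toy document, already tokenized (assumed for illustration).
page = %w[<b> 2 + 2 = 4 </b> <br>]

# Rules 1 and 2: locate the calculation in the full page.
calculation = extract(page, [["<b>"]], [["</b>"]])
# Rules 3 and 4: locate the result inside the extracted calculation only.
result = extract(calculation, [["="]], [])

puts calculation.join(" ")
puts result.join(" ")
```

The second call works on the already extracted calculation, which is why its rules can stay as simple as a single `=` landmark.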
|
42
|
-
|
43
|
-
Take note of 3 and 4 - this is the advantage of treating a document as a tree in
|
44
|
-
this way. Deeply nested elements can be located by generating a series of simple
|
45
|
-
rules, rather than generating a rule with complexity that increases at each
|
46
|
-
level. Sets of rules are generated because it may not be possible to generate a
|
47
|
-
single rule that will catch all cases. A rule is found that matches as many of
|
48
|
-
the examples as possible (and fails on the rest); these examples are then removed
|
49
|
-
and a rule is found that will match as many of the remaining examples and so on.
|
50
|
-
When it comes to applying these learnt rules, the rules are applied in order
|
51
|
-
until there is a rule that matches.
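The covering loop described in this paragraph can be sketched as follows. This is a minimal illustration under assumptions, not Ariel's learner: candidate rules are modelled as lambdas that return the extracted text or nil, "matching" simply means returning a non-nil value, and `learn_rule_set` and `apply_rule_set` are hypothetical names.

```ruby
# Hypothetical sketch: repeatedly pick the candidate that matches the most
# remaining examples, keep it, and drop the examples it covers.
def learn_rule_set(examples, candidates)
  remaining = examples.dup
  rule_set = []
  until remaining.empty?
    best = candidates.max_by { |rule| remaining.count { |ex| rule.call(ex) } }
    covered = best ? remaining.select { |ex| best.call(ex) } : []
    break if covered.empty? # no candidate matches anything that is left
    rule_set << best
    remaining -= covered
  end
  rule_set
end

# Apply the learnt rules in order and stop at the first one that matches.
def apply_rule_set(rule_set, document)
  rule_set.each do |rule|
    extracted = rule.call(document)
    return extracted if extracted
  end
  nil
end

# Two toy candidate rules (assumed for illustration).
bold_rule = ->(doc) { doc[/<b>(.*?)<\/b>/, 1] }
italic_rule = ->(doc) { doc[/<i>(.*?)<\/i>/, 1] }

examples = ["<b>1</b>", "<b>2</b>", "<i>3</i>"]
rules = learn_rule_set(examples, [italic_rule, bold_rule])

puts rules.size
puts apply_rule_set(rules, "<i>42</i>")
```

Because the most general rule is learnt first, it is also tried first at extraction time, and the later rules only handle the cases it fails on.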
|
52
|
-
|
53
|
-
To see this example for yourself just execute structure.rb in the
|
54
|
-
examples/google_calculator directory to create a locally writable
|
55
|
-
structure.yaml. Then do:
|
56
|
-
ariel -D -m learn -s structure.yaml -d /path/to/examples/google_calculator/labeled
|
57
|
-
|
58
|
-
You'll have to wait a while (see my note about performance below). At the end,
|
59
|
-
the learnt rules will be printed in YAML format, and structure.yaml will be
|
60
|
-
updated to include these rules. Apply these learnt rules to some unlabeled
|
61
|
-
documents by doing:
|
62
|
-
ariel -D -m extract -s structure.yaml -d /path/to/examples/google_calculator/unlabeled
|
63
|
-
|
64
|
-
You should see the results of a successful extraction printed to your terminal,
|
65
|
-
such as this one:
|
66
16
|
|
67
|
-
|
68
|
-
|
69
|
-
|
17
|
+
I'm happy to announce the release of Ariel 0.1.0, the result of my Summer of
|
18
|
+
Code work. This release should be easy to use, very functional, and hopefully
|
19
|
+
useful - so it's worth trying out. I've put a lot of effort into writing clear
|
20
|
+
and straightforward documentation to get you started, so take a look at the
|
21
|
+
docs available at http://ariel.rubyforge.org. In particular, flick through the
|
22
|
+
tutorial and quick start guide. If you're interested, you may also want to take
|
23
|
+
a look at the theory page where I've made a good start on describing the method
|
24
|
+
Ariel uses to learn extraction rules. If you have any problems or find any bugs,
|
25
|
+
just send me an email or add it to the issue tracker (see link below). Enjoy.
|
26
|
+
See the FAQ for a vim snippet to make labeling examples a little easier.
|
70
27
|
|
71
|
-
|
72
|
-
fewer than I'd recommend in most cases, but as it works... This example consists
|
73
|
-
of project entries in the Ruby Application Archive. The structure of the page is
|
74
|
-
very flat, so all rules are applied to the full page. Rules are learnt and
|
75
|
-
applied as shown above. The structure.yaml files included in the examples
|
76
|
-
directories already include rules generated by Ariel, use these if you just want
|
77
|
-
to see extraction working.
|
28
|
+
== Quickstart/Basic usage
|
78
29
|
|
79
|
-
|
80
|
-
|
30
|
+
* @require 'ariel'@
|
31
|
+
* Define a structure for the information you wish to extract:
|
32
|
+
structure = Ariel::Node::Structure.new do |r|
|
33
|
+
r.item :title
|
34
|
+
r.item :body
|
35
|
+
r.list :comments do |c|
|
36
|
+
c.list_item :comment do |d|
|
37
|
+
d.item :author
|
38
|
+
d.item :body
|
39
|
+
end
|
40
|
+
end
|
41
|
+
end
|
42
|
+
* Collect a few examples of the sort of document you wish to extract information
|
43
|
+
from (pages from the same website for instance).
|
44
|
+
* Label each example with tags such as <l:title>, <l:comment> and so on in the
|
45
|
+
relevant places.
|
46
|
+
* Ariel.learn structure, labeled_file1, labeled_file2, labeled_file3
|
47
|
+
* Find the documents you want to extract information from.
|
48
|
+
* extractions = Ariel.extract structure, unlabeled_file1,
|
49
|
+
unlabeled_file2
|
50
|
+
* extractions[0].search('comments/*/body').each {|e| puts e.extracted_text} =>
|
51
|
+
"Great stuff, loving it", "I love life", .....
|
52
|
+
* extractions[0].at('comments/34') => nil (there is no 34th comment, #at
|
53
|
+
returns the first result rather than an array of matches).
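The difference between the search and at calls in the last two steps can be tried out without Ariel installed. The sketch below uses a hypothetical `ToyNode` class as a stand-in for an extracted tree; the path handling (segments separated by `/`, with `*` matching every child) is an assumption drawn from the quickstart examples, not a description of Ariel's internals.

```ruby
# Hypothetical stand-in for an extracted tree, for illustration only.
class ToyNode
  attr_reader :name, :extracted_text, :children

  def initialize(name, extracted_text = nil)
    @name = name.to_s
    @extracted_text = extracted_text
    @children = {}
  end

  def add(child)
    @children[child.name] = child
    child
  end

  # Walk the path one segment at a time; '*' expands to every child.
  # Always returns an array, which is empty when nothing matches.
  def search(path)
    path.split('/').inject([self]) do |nodes, segment|
      nodes.flat_map do |node|
        segment == '*' ? node.children.values : [node.children[segment]].compact
      end
    end
  end

  # Same lookup, but only the first match (nil when there is none).
  def at(path)
    search(path).first
  end
end

root = ToyNode.new(:root)
comments = root.add(ToyNode.new(:comments))
["Great stuff, loving it", "I love life"].each_with_index do |text, index|
  comment = comments.add(ToyNode.new(index))
  comment.add(ToyNode.new(:body, text))
end

bodies = root.search('comments/*/body').map(&:extracted_text)
missing = root.at('comments/34')

puts bodies.inspect
puts missing.inspect
```

Returning an empty array from the search and nil from the single-result lookup lets callers chain `each` safely while still being able to test for a missing node.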
|
81
54
|
|
82
|
-
== Performance
|
83
|
-
Generating rules takes quite a long time. It is always going to be an intensive
|
84
|
-
operation, but there are some very simple and obvious improvements in efficiency
|
85
|
-
that can be made. For a start, the rule candidate refining process currently
|
86
|
-
re-applies the same rules over and over every time the remaining rule candidates
|
87
|
-
are ranked. This is where most time is spent, and caching these should make a
|
88
|
-
big difference. This will definitely be implemented. Other performance
|
89
|
-
enhancements are bound to be there, but my focus at this time is to get
|
90
|
-
something that works.
|
91
55
|
|
92
56
|
== Credits
|
93
57
|
Ariel is developed by Alex Bradbury as a Google Summer of Code project under the
|
94
58
|
mentoring of Austin Ziegler.
|
95
59
|
|
96
60
|
== Links
|
97
|
-
|
98
|
-
|
61
|
+
SVN Repository: http://rubyforge.org/projects/ariel
|
62
|
+
Issue tracker: http://code.google.com/p/ariel/issues/
|
63
|
+
Documentation/homepage: http://ariel.rubyforge.org
|
64
|
+
RDoc: http://ariel.rubyforge.org/rdoc/
|
data/bin/ariel
CHANGED
@@ -14,43 +14,52 @@ OptionParser.new do |opts|
|
|
14
14
|
end
|
15
15
|
|
16
16
|
opts.on('-d', '--dir=DIRECTORY', 'Directory to look for documents to operate on.') do |dir|
|
17
|
+
raise ArgumentError, "directory does not exist" unless FileTest.directory? dir
|
17
18
|
options[:dir]=dir
|
18
19
|
end
|
19
20
|
|
20
|
-
opts.on('-D', '--debug', '
|
21
|
+
opts.on('-D', '--debug', 'Enable debugging output.') do
|
21
22
|
$DEBUG=true
|
22
23
|
end
|
23
24
|
|
24
25
|
opts.on('-s', '--structure=STRUCTURE', 'YAML file in which the structure is defined') do |structure|
|
25
26
|
options[:structure]=structure
|
26
27
|
end
|
28
|
+
|
29
|
+
opts.on('-o', '--output-dir=DIRECTORY', 'Directory to output to') do |dir|
|
30
|
+
raise ArgumentError, "directory does not exist" unless FileTest.directory? dir
|
31
|
+
options[:output_dir]=dir
|
32
|
+
end
|
27
33
|
end.parse!
|
28
34
|
|
29
|
-
require 'ariel' #After option parsing
|
35
|
+
require 'ariel' #After option parsing so debug setting can take effect
|
36
|
+
|
37
|
+
files=Dir["#{options[:dir]}/*"].select {|file_name| File.file? file_name}
|
38
|
+
structure=YAML.load_file options[:structure]
|
30
39
|
|
31
40
|
case options[:mode]
|
32
41
|
when "learn"
|
33
|
-
structure
|
34
|
-
learnt_structure=Ariel::ExampleDocumentLoader.load_directory options[:dir], structure
|
42
|
+
Ariel.learn(structure, *files)
|
35
43
|
File.open(options[:structure], 'wb') do |file|
|
36
|
-
YAML.dump(
|
37
|
-
end
|
38
|
-
learnt_structure.each_descendant do |structure_node|
|
39
|
-
puts structure_node.meta.name.to_s
|
40
|
-
puts structure_node.ruleset.to_yaml
|
44
|
+
YAML.dump(structure, file)
|
41
45
|
end
|
46
|
+
|
42
47
|
when "extract"
|
43
|
-
|
44
|
-
|
45
|
-
|
46
|
-
|
47
|
-
|
48
|
-
|
49
|
-
|
50
|
-
|
51
|
-
|
48
|
+
extractions = Ariel.extract(structure, *files)
|
49
|
+
if options[:output_dir]
|
50
|
+
extractions.zip(files) do |extraction, file|
|
51
|
+
filename=File.join(options[:output_dir], File.basename(file)+'.yaml')
|
52
|
+
File.open(filename, 'wb') do |f|
|
53
|
+
YAML.dump(extraction, f)
|
54
|
+
end
|
55
|
+
end
|
56
|
+
else
|
57
|
+
puts "No --output-dir given, so printing extractions to stdout"
|
58
|
+
extractions.each do |extraction|
|
59
|
+
extraction.each_descendant do |node|
|
60
|
+
puts "#{node.node_name}: #{node.tokenstream.text}"
|
61
|
+
end
|
62
|
+
puts #Blank line looks prettier
|
52
63
|
end
|
53
|
-
puts
|
54
|
-
# puts root_node.to_yaml
|
55
64
|
end
|
56
65
|
end
|
data/examples/google_calculator/structure.rb
CHANGED
@@ -1,12 +1,12 @@
|
|
1
1
|
require 'ariel'
|
2
2
|
require 'yaml'
|
3
3
|
|
4
|
-
structure = Ariel::
|
4
|
+
structure = Ariel::Node::Structure.new do |r|
|
5
5
|
r.item :calculation do |c|
|
6
6
|
c.item :result
|
7
7
|
end
|
8
8
|
end
|
9
9
|
|
10
|
-
File.open('structure.yaml') do |file|
|
10
|
+
File.open('structure.yaml', 'w') do |file|
|
11
11
|
YAML.dump structure, file
|
12
12
|
end
|
data/examples/google_calculator/structure.yaml
CHANGED
@@ -1,46 +1,44 @@
|
|
1
|
-
--- &id002 !ruby/object:Ariel::
|
1
|
+
--- &id002 !ruby/object:Ariel::Node::Structure
|
2
2
|
children:
|
3
|
-
:calculation: &id001 !ruby/object:Ariel::
|
3
|
+
:calculation: &id001 !ruby/object:Ariel::Node::Structure
|
4
4
|
children:
|
5
|
-
:result: !ruby/object:Ariel::
|
5
|
+
:result: !ruby/object:Ariel::Node::Structure
|
6
6
|
children: {}
|
7
7
|
|
8
|
-
|
9
|
-
|
10
|
-
:node_type: :not_list
|
11
|
-
:name: :result
|
8
|
+
node_name: :result
|
9
|
+
node_type: :not_list
|
12
10
|
parent: *id001
|
13
11
|
ruleset: !ruby/object:Ariel::RuleSet
|
14
12
|
end_rules:
|
15
13
|
- !ruby/object:Ariel::Rule
|
16
14
|
direction: :back
|
15
|
+
exhaustive: false
|
17
16
|
landmarks: []
|
18
17
|
|
19
18
|
start_rules:
|
20
19
|
- !ruby/object:Ariel::Rule
|
21
20
|
direction: :forward
|
21
|
+
exhaustive: false
|
22
22
|
landmarks:
|
23
23
|
- - "="
|
24
|
-
|
25
|
-
|
26
|
-
:node_type: :not_list
|
27
|
-
:name: :calculation
|
24
|
+
node_name: :calculation
|
25
|
+
node_type: :not_list
|
28
26
|
parent: *id002
|
29
27
|
ruleset: !ruby/object:Ariel::RuleSet
|
30
28
|
end_rules:
|
31
29
|
- !ruby/object:Ariel::Rule
|
32
30
|
direction: :back
|
31
|
+
exhaustive: false
|
33
32
|
landmarks:
|
34
33
|
- - </b>
|
35
34
|
- - </b>
|
36
35
|
start_rules:
|
37
36
|
- !ruby/object:Ariel::Rule
|
38
37
|
direction: :forward
|
38
|
+
exhaustive: false
|
39
39
|
landmarks:
|
40
40
|
- - <b>
|
41
41
|
- - gif
|
42
42
|
- - <b>
|
43
|
-
|
44
|
-
|
45
|
-
:node_type: :not_list
|
46
|
-
:name: :root
|
43
|
+
node_name: :root
|
44
|
+
node_type: :not_list
|
data/examples/raa/labeled/highline.html
CHANGED
@@ -96,17 +96,18 @@ highline / <l:current_version>1.2.0</l:current_version>
|
|
96
96
|
|
97
97
|
<tr><th>Versions: </th>
|
98
98
|
<td>
|
99
|
-
<l:version_history>[<a
|
99
|
+
<l:version_history>[<a
|
100
|
+
href="project/highline/1.2.0"><l:version>1.2.0</l:version></a> (2006-03-23)]
|
100
101
|
|
101
102
|
[<a href="project/highline/1.0.2">1.0.2</a> (2006-02-20)]
|
102
103
|
|
103
|
-
[<a href="project/highline/1.0.1">1.0.1</a> (2005-07-07)]
|
104
|
+
[<a href="project/highline/1.0.1"><l:version>1.0.1</l:version></a> (2005-07-07)]
|
104
105
|
|
105
106
|
[<a href="project/highline/1.0.0">1.0.0</a> (2005-07-07)]
|
106
107
|
|
107
|
-
[<a href="project/highline/0.6.1">0.6.1</a> (2005-05-26)]
|
108
|
+
[<a href="project/highline/0.6.1"><l:version>0.6.1</l:version></a> (2005-05-26)]
|
108
109
|
|
109
|
-
[<a href="project/highline/0.6.0">0.6.0</a>
|
110
|
+
[<a href="project/highline/0.6.0"><l:version>0.6.0</l:version></a>
|
110
111
|
(2005-05-21)]</l:version_history>
|
111
112
|
|
112
113
|
</td>
|
data/examples/raa/labeled/mongrel.html
CHANGED
@@ -126,21 +126,22 @@ mongrel / <l:current_version>0.3.12</l:current_version>
|
|
126
126
|
|
127
127
|
<tr><th>Versions: </th>
|
128
128
|
<td>
|
129
|
-
<l:version_history>[<a
|
129
|
+
<l:version_history>[<a
|
130
|
+
href="project/mongrel/0.3.12"><l:version>0.3.12</l:version></a> (2006-03-30)]
|
130
131
|
|
131
|
-
[<a href="project/mongrel/0.3.11">0.3.11</a> (2006-03-15)]
|
132
|
+
[<a href="project/mongrel/0.3.11"><l:version>0.3.11</l:version></a> (2006-03-15)]
|
132
133
|
|
133
|
-
[<a href="project/mongrel/0.3.10">0.3.10</a> (2006-03-12)]
|
134
|
+
[<a href="project/mongrel/0.3.10"><l:version>0.3.10</l:version></a> (2006-03-12)]
|
134
135
|
|
135
|
-
[<a href="project/mongrel/0.3.9">0.3.9</a> (2006-03-06)]
|
136
|
+
[<a href="project/mongrel/0.3.9"><l:version>0.3.9</l:version></a> (2006-03-06)]
|
136
137
|
|
137
|
-
[<a href="project/mongrel/0.3.8">0.3.8</a> (2006-03-04)]
|
138
|
+
[<a href="project/mongrel/0.3.8"><l:version>0.3.8</l:version></a> (2006-03-04)]
|
138
139
|
|
139
|
-
[<a href="project/mongrel/0.3.6">0.3.6</a> (2006-02-23)]
|
140
|
+
[<a href="project/mongrel/0.3.6"><l:version>0.3.6</l:version></a> (2006-02-23)]
|
140
141
|
|
141
|
-
[<a href="project/mongrel/0.3.2">0.3.2</a> (2006-02-13)]
|
142
|
+
[<a href="project/mongrel/0.3.2"><l:version>0.3.2</l:version></a> (2006-02-13)]
|
142
143
|
|
143
|
-
[<a href="project/mongrel/0.3.1">0.3.1</a> (2006-02-12)]</l:version_history>
|
144
|
+
[<a href="project/mongrel/0.3.1"><l:version>0.3.1</l:version></a> (2006-02-12)]</l:version_history>
|
144
145
|
|
145
146
|
</td>
|
146
147
|
</tr>
|
data/examples/raa/structure.rb
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
require 'ariel'
|
2
2
|
require 'yaml'
|
3
3
|
|
4
|
-
structure = Ariel::
|
4
|
+
structure = Ariel::Node::Structure.new do |r|
|
5
5
|
r.item :name
|
6
6
|
r.item :current_version
|
7
7
|
r.item :short_description
|
@@ -9,7 +9,9 @@ structure = Ariel::StructureNode.new do |r|
|
|
9
9
|
r.item :owner
|
10
10
|
r.item :homepage
|
11
11
|
r.item :license
|
12
|
-
r.
|
12
|
+
r.list :version_history do |v|
|
13
|
+
v.list_item :version
|
14
|
+
end
|
13
15
|
end
|
14
16
|
|
15
17
|
File.open('structure.yaml', 'wb') do |file|
|
data/examples/raa/structure.yaml
CHANGED
@@ -1,38 +1,16 @@
|
|
1
|
-
--- &id001 !ruby/object:Ariel::
|
1
|
+
--- &id001 !ruby/object:Ariel::Node::Structure
|
2
2
|
children:
|
3
|
-
:
|
3
|
+
:short_description: !ruby/object:Ariel::Node::Structure
|
4
4
|
children: {}
|
5
5
|
|
6
|
-
|
7
|
-
|
8
|
-
:name: :version_history
|
9
|
-
:node_type: :not_list
|
10
|
-
parent: *id001
|
11
|
-
ruleset: !ruby/object:Ariel::RuleSet
|
12
|
-
end_rules:
|
13
|
-
- !ruby/object:Ariel::Rule
|
14
|
-
direction: :back
|
15
|
-
landmarks:
|
16
|
-
- - </td>
|
17
|
-
start_rules:
|
18
|
-
- !ruby/object:Ariel::Rule
|
19
|
-
direction: :forward
|
20
|
-
landmarks:
|
21
|
-
- - <td>
|
22
|
-
- - Versions
|
23
|
-
- - <td>
|
24
|
-
:short_description: !ruby/object:Ariel::StructureNode
|
25
|
-
children: {}
|
26
|
-
|
27
|
-
meta: !ruby/object:OpenStruct
|
28
|
-
table:
|
29
|
-
:name: :short_description
|
30
|
-
:node_type: :not_list
|
6
|
+
node_name: :short_description
|
7
|
+
node_type: :not_list
|
31
8
|
parent: *id001
|
32
9
|
ruleset: !ruby/object:Ariel::RuleSet
|
33
10
|
end_rules:
|
34
11
|
- !ruby/object:Ariel::Rule
|
35
12
|
direction: :back
|
13
|
+
exhaustive: false
|
36
14
|
landmarks:
|
37
15
|
- - </td>
|
38
16
|
- - Category
|
@@ -40,109 +18,109 @@ children:
|
|
40
18
|
start_rules:
|
41
19
|
- !ruby/object:Ariel::Rule
|
42
20
|
direction: :forward
|
21
|
+
exhaustive: false
|
43
22
|
landmarks:
|
44
23
|
- - <td>
|
45
|
-
:
|
24
|
+
:homepage: !ruby/object:Ariel::Node::Structure
|
46
25
|
children: {}
|
47
26
|
|
48
|
-
|
49
|
-
|
50
|
-
:name: :current_version
|
51
|
-
:node_type: :not_list
|
27
|
+
node_name: :homepage
|
28
|
+
node_type: :not_list
|
52
29
|
parent: *id001
|
53
30
|
ruleset: !ruby/object:Ariel::RuleSet
|
54
31
|
end_rules:
|
55
32
|
- !ruby/object:Ariel::Rule
|
56
33
|
direction: :back
|
34
|
+
exhaustive: false
|
57
35
|
landmarks:
|
58
|
-
- - </
|
59
|
-
- -
|
60
|
-
- - </
|
36
|
+
- - </a>
|
37
|
+
- - Download
|
38
|
+
- - </a>
|
61
39
|
start_rules:
|
62
40
|
- !ruby/object:Ariel::Rule
|
63
41
|
direction: :forward
|
42
|
+
exhaustive: false
|
64
43
|
landmarks:
|
65
|
-
- -
|
66
|
-
- -
|
67
|
-
- -
|
68
|
-
:
|
44
|
+
- - ">"
|
45
|
+
- - rubyforge
|
46
|
+
- - ">"
|
47
|
+
:category: !ruby/object:Ariel::Node::Structure
|
69
48
|
children: {}
|
70
49
|
|
71
|
-
|
72
|
-
|
73
|
-
:name: :homepage
|
74
|
-
:node_type: :not_list
|
50
|
+
node_name: :category
|
51
|
+
node_type: :not_list
|
75
52
|
parent: *id001
|
76
53
|
ruleset: !ruby/object:Ariel::RuleSet
|
77
54
|
end_rules:
|
78
55
|
- !ruby/object:Ariel::Rule
|
79
56
|
direction: :back
|
57
|
+
exhaustive: false
|
80
58
|
landmarks:
|
81
|
-
- - </
|
82
|
-
- -
|
83
|
-
- - </
|
59
|
+
- - </td>
|
60
|
+
- - Status
|
61
|
+
- - </td>
|
84
62
|
start_rules:
|
85
63
|
- !ruby/object:Ariel::Rule
|
86
64
|
direction: :forward
|
65
|
+
exhaustive: false
|
87
66
|
landmarks:
|
88
|
-
- -
|
89
|
-
- -
|
90
|
-
|
91
|
-
:category: !ruby/object:Ariel::StructureNode
|
67
|
+
- - <td>
|
68
|
+
- - <td>
|
69
|
+
:current_version: !ruby/object:Ariel::Node::Structure
|
92
70
|
children: {}
|
93
71
|
|
94
|
-
|
95
|
-
|
96
|
-
:name: :category
|
97
|
-
:node_type: :not_list
|
72
|
+
node_name: :current_version
|
73
|
+
node_type: :not_list
|
98
74
|
parent: *id001
|
99
75
|
ruleset: !ruby/object:Ariel::RuleSet
|
100
76
|
end_rules:
|
101
77
|
- !ruby/object:Ariel::Rule
|
102
78
|
direction: :back
|
79
|
+
exhaustive: false
|
103
80
|
landmarks:
|
104
|
-
- - </
|
105
|
-
- -
|
106
|
-
- - </
|
81
|
+
- - </p>
|
82
|
+
- - table
|
83
|
+
- - </p>
|
107
84
|
start_rules:
|
108
85
|
- !ruby/object:Ariel::Rule
|
109
86
|
direction: :forward
|
87
|
+
exhaustive: false
|
110
88
|
landmarks:
|
111
|
-
- -
|
112
|
-
- -
|
113
|
-
|
89
|
+
- - :anything
|
90
|
+
- - caption
|
91
|
+
- - /
|
92
|
+
:name: !ruby/object:Ariel::Node::Structure
|
114
93
|
children: {}
|
115
94
|
|
116
|
-
|
117
|
-
|
118
|
-
:name: :name
|
119
|
-
:node_type: :not_list
|
95
|
+
node_name: :name
|
96
|
+
node_type: :not_list
|
120
97
|
parent: *id001
|
121
98
|
ruleset: !ruby/object:Ariel::RuleSet
|
122
99
|
end_rules:
|
123
100
|
- !ruby/object:Ariel::Rule
|
124
101
|
direction: :back
|
102
|
+
exhaustive: false
|
125
103
|
landmarks:
|
126
104
|
- - </title>
|
127
105
|
start_rules:
|
128
106
|
- !ruby/object:Ariel::Rule
|
129
107
|
direction: :forward
|
108
|
+
exhaustive: false
|
130
109
|
landmarks:
|
131
110
|
- - "-"
|
132
111
|
- - RAA
|
133
112
|
- "-"
|
134
|
-
:owner: !ruby/object:Ariel::
|
113
|
+
:owner: !ruby/object:Ariel::Node::Structure
|
135
114
|
children: {}
|
136
115
|
|
137
|
-
|
138
|
-
|
139
|
-
:name: :owner
|
140
|
-
:node_type: :not_list
|
116
|
+
node_name: :owner
|
117
|
+
node_type: :not_list
|
141
118
|
parent: *id001
|
142
119
|
ruleset: !ruby/object:Ariel::RuleSet
|
143
120
|
end_rules:
|
144
121
|
- !ruby/object:Ariel::Rule
|
145
122
|
direction: :back
|
123
|
+
exhaustive: false
|
146
124
|
landmarks:
|
147
125
|
- - </a>
|
148
126
|
- - id
|
@@ -150,22 +128,22 @@ children:
|
|
150
128
|
start_rules:
|
151
129
|
- !ruby/object:Ariel::Rule
|
152
130
|
direction: :forward
|
131
|
+
exhaustive: false
|
153
132
|
landmarks:
|
154
133
|
- - ">"
|
155
134
|
- - Owner
|
156
135
|
- - ">"
|
157
|
-
:license: !ruby/object:Ariel::
|
136
|
+
:license: !ruby/object:Ariel::Node::Structure
|
158
137
|
children: {}
|
159
138
|
|
160
|
-
|
161
|
-
|
162
|
-
:name: :license
|
163
|
-
:node_type: :not_list
|
139
|
+
node_name: :license
|
140
|
+
node_type: :not_list
|
164
141
|
parent: *id001
|
165
142
|
ruleset: !ruby/object:Ariel::RuleSet
|
166
143
|
end_rules:
|
167
144
|
- !ruby/object:Ariel::Rule
|
168
145
|
direction: :back
|
146
|
+
exhaustive: false
|
169
147
|
landmarks:
|
170
148
|
- - </td>
|
171
149
|
- - Dependency
|
@@ -173,11 +151,49 @@ children:
|
|
173
151
|
start_rules:
|
174
152
|
- !ruby/object:Ariel::Rule
|
175
153
|
direction: :forward
|
154
|
+
exhaustive: false
|
176
155
|
landmarks:
|
177
156
|
- - <td>
|
178
157
|
- - License
|
179
158
|
- - <td>
|
180
|
-
|
181
|
-
|
182
|
-
|
183
|
-
|
159
|
+
:version_history: &id002 !ruby/object:Ariel::Node::Structure
|
160
|
+
children:
|
161
|
+
:version: !ruby/object:Ariel::Node::Structure
|
162
|
+
children: {}
|
163
|
+
|
164
|
+
node_name: :version
|
165
|
+
node_type: :list_item
|
166
|
+
parent: *id002
|
167
|
+
ruleset: !ruby/object:Ariel::RuleSet
|
168
|
+
end_rules:
|
169
|
+
- !ruby/object:Ariel::Rule
|
170
|
+
direction: :back
|
171
|
+
exhaustive: true
|
172
|
+
landmarks:
|
173
|
+
- - </a>
|
174
|
+
start_rules:
|
175
|
+
- !ruby/object:Ariel::Rule
|
176
|
+
direction: :forward
|
177
|
+
exhaustive: true
|
178
|
+
landmarks:
|
179
|
+
- - ">"
|
180
|
+
node_name: :version_history
|
181
|
+
node_type: :not_list
|
182
|
+
parent: *id001
|
183
|
+
ruleset: !ruby/object:Ariel::RuleSet
|
184
|
+
end_rules:
|
185
|
+
- !ruby/object:Ariel::Rule
|
186
|
+
direction: :back
|
187
|
+
exhaustive: false
|
188
|
+
landmarks:
|
189
|
+
- - </td>
|
190
|
+
start_rules:
|
191
|
+
- !ruby/object:Ariel::Rule
|
192
|
+
direction: :forward
|
193
|
+
exhaustive: false
|
194
|
+
landmarks:
|
195
|
+
- - <td>
|
196
|
+
- - Versions
|
197
|
+
- - <td>
|
198
|
+
node_name: :root
|
199
|
+
node_type: :not_list
|