ariel 0.0.1 → 0.1.0
- data/README +49 -83
- data/bin/ariel +29 -20
- data/examples/google_calculator/structure.rb +2 -2
- data/examples/google_calculator/structure.yaml +13 -15
- data/examples/raa/labeled/highline.html +5 -4
- data/examples/raa/labeled/mongrel.html +9 -8
- data/examples/raa/structure.rb +4 -2
- data/examples/raa/structure.yaml +94 -78
- data/lib/ariel.rb +71 -33
- data/lib/ariel/{candidate_selector.rb → candidate_refiner.rb} +39 -38
- data/lib/ariel/label_utils.rb +46 -18
- data/lib/ariel/labeled_document_loader.rb +77 -0
- data/lib/ariel/learner.rb +60 -38
- data/lib/ariel/log.rb +67 -0
- data/lib/ariel/node.rb +52 -0
- data/lib/ariel/node/extracted.rb +90 -0
- data/lib/ariel/node/structure.rb +91 -0
- data/lib/ariel/rule.rb +114 -32
- data/lib/ariel/rule_set.rb +34 -15
- data/lib/ariel/token.rb +9 -3
- data/lib/ariel/token_stream.rb +32 -17
- data/lib/ariel/wildcards.rb +19 -15
- data/test/fixtures.rb +45 -3
- data/test/specs/candidate_refiner_spec.rb +48 -0
- data/test/specs/label_utils_spec.rb +97 -0
- data/test/specs/learner_spec.rb +39 -0
- data/test/specs/node_extracted_spec.rb +90 -0
- data/test/specs/node_spec.rb +76 -0
- data/test/specs/node_structure_spec.rb +74 -0
- data/test/specs/rule_set_spec.rb +85 -0
- data/test/specs/rule_spec.rb +110 -0
- data/test/specs/token_stream_spec.rb +100 -7
- metadata +21 -28
- data/lib/ariel/example_document_loader.rb +0 -59
- data/lib/ariel/extracted_node.rb +0 -20
- data/lib/ariel/node_like.rb +0 -26
- data/lib/ariel/structure_node.rb +0 -75
- data/test/ariel_test_case.rb +0 -15
- data/test/test_candidate_selector.rb +0 -58
- data/test/test_example_document_loader.rb +0 -7
- data/test/test_label_utils.rb +0 -15
- data/test/test_learner.rb +0 -38
- data/test/test_rule.rb +0 -38
- data/test/test_structure_node.rb +0 -81
- data/test/test_token.rb +0 -16
- data/test/test_token_stream.rb +0 -82
- data/test/test_wildcards.rb +0 -18
data/README
CHANGED
@@ -1,98 +1,64 @@
|
|
1
|
-
= Ariel release 0.0
|
1
|
+
= Ariel release 0.1.0
|
2
|
+
|
3
|
+
== About - Ariel: A Ruby Information Extraction Library
|
4
|
+
Ariel is a library that allows you to extract information from semi-structured
|
5
|
+
documents (such as websites). It is different to existing tools because rather
|
6
|
+
than expecting the developer to write rules to extract the desired information,
|
7
|
+
Ariel will use a small number of labeled examples to generate and learn
|
8
|
+
effective extraction rules. It is developed by Alex Bradbury and released under
|
9
|
+
the MIT license. Ariel was started as a Google Summer of Code project mentored
|
10
|
+
by Austin Ziegler in 2006.
|
2
11
|
|
3
12
|
== Install
|
4
13
|
gem install ariel
|
5
14
|
|
6
15
|
== Announcement
|
7
|
-
This is the first public release of Ariel - A Ruby Information Extraction
|
8
|
-
Library. See my previous post, ruby-talk:200140[http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/200140]
|
9
|
-
for more background information. This release supports defining a tree document
|
10
|
-
structure and learning rules to extract each node of this tree. Handling of list
|
11
|
-
extraction and learning is not yet implemented, and is the next immediate
|
12
|
-
priority. See the examples directory included in this release and below for
|
13
|
-
discussion of the included examples. Rule learning is functional, and appears to
|
14
|
-
work well, but many refinements are possible. Look out for more updates and
|
15
|
-
new releases shortly.
|
16
|
-
|
17
|
-
== About Ariel
|
18
|
-
Ariel intends to assist in extracting information from semi-structured
|
19
|
-
documents including (but not in any way limited to) web pages. Although you
|
20
|
-
may use libraries such as Hpricot or Rubyful Soup, or even plain Regular
|
21
|
-
Expressions to achieve the same goal, Ariel approaches the problem very
|
22
|
-
differently. Ariel relies on the user labeling examples of the data they
|
23
|
-
want to extract, and then finds patterns across several such labeled
|
24
|
-
examples in order to produce a set of general rules for extracting this
|
25
|
-
information from any similar document. It uses the MIT license.
|
26
|
-
|
27
|
-
== Examples
|
28
|
-
This release includes two examples in the example directory (which should now
|
29
|
-
be in the directory to which rubygems installed ariel). The first is the
|
30
|
-
google_calculator directory (inspired by Justin Bailey's post to my Ariel
|
31
|
-
progress report). The structure is very simple, a calculation is extracted from
|
32
|
-
the page, and then the actual result is extracted from that calculation. 3
|
33
|
-
labeled examples are included. Ariel reads each of these, tokenizes them,
|
34
|
-
and extracts each label. 4 sets of rules are learnt:
|
35
|
-
1. Rules to locate the start of the calculation in the original document.
|
36
|
-
2. Rules to locate the end of the calculation in the original document (applied
|
37
|
-
from the end of the document).
|
38
|
-
3. Rules to locate the start of the result of the calculation from the
|
39
|
-
extracted calculation.
|
40
|
-
4. Rules to locate the end of the result of the calculation from the extracted
|
41
|
-
calculation (applied from the end of the calculation).
|
42
|
-
|
43
|
-
Take note of 3 and 4 - this is the advantage of treating a document as a tree in
|
44
|
-
this way. Deeply nested elements can be located by generating a series of simple
|
45
|
-
rules, rather than generating a rule with complexity that increases at each
|
46
|
-
level. Sets of rules are generated because it may not be possible to generate a
|
47
|
-
single rule that will catch all cases. A rule is found that matches as many of
|
48
|
-
the examples as possible (and fails on the rest), these examples are then removed
|
49
|
-
and a rule is found that will match as many of the remaining examples and so on.
|
50
|
-
When it comes to applying these learnt rules, the rules are applied in order
|
51
|
-
until there is a rule that matches.
|
52
|
-
|
53
|
-
To see this example for yourself just execute structure.rb in the
|
54
|
-
examples/google_calculator directory to create a locally writable
|
55
|
-
structure.yaml. Then do:
|
56
|
-
ariel -D -m learn -s structure.yaml -d /path/to/examples/google_calculator/labeled
|
57
|
-
|
58
|
-
You'll have to wait a while (see my note about performance below). At the end,
|
59
|
-
the learnt rules will be printed in YAML format, and structure.yaml will be
|
60
|
-
updated to include these rules. Apply these learnt rules to some unlabeled
|
61
|
-
documents by doing:
|
62
|
-
ariel -D -m extract -s structure.yaml -d /path/to/examples/google_calculator/unlabeled
|
63
|
-
|
64
|
-
You should see the results of a successful extraction printed to your terminal,
|
65
|
-
such as this one:
|
66
16
|
|
67
|
-
|
68
|
-
|
69
|
-
|
17
|
+
I'm happy to announce the release of Ariel 0.1.0, the result of my Summer of
|
18
|
+
Code work. This release should be easy to use, very functional, and hopefully
|
19
|
+
useful - so it's worth trying out. I've put a lot of effort into writing clear
|
20
|
+
and straightforward documentation to get you started, so take a look at the
|
21
|
+
docs available at http://ariel.rubyforge.org. In particular, flick through the
|
22
|
+
tutorial and quick start guide. If you're interested, you may also want to take
|
23
|
+
a look at the theory page where I've made a good start on describing the method
|
24
|
+
Ariel uses to learn extraction rules. If you have any problems or find any bugs,
|
25
|
+
just send me an email or add it to the issue tracker (see link below). Enjoy.
|
26
|
+
See the FAQ for a vim snippet to make labeling examples a little easier.
|
70
27
|
|
71
|
-
|
72
|
-
fewer than I'd recommend in most cases, but as it works... This example consists
|
73
|
-
of project entries in the Ruby Application Archive. The structure of the page is
|
74
|
-
very flat, so all rules are applied to the full page. Rules are learnt and
|
75
|
-
applied as shown above. The structure.yaml files included in the examples
|
76
|
-
directories already include rules generated by Ariel, use these if you just want
|
77
|
-
to see extraction working.
|
28
|
+
== Quickstart/Basic usage
|
78
29
|
|
79
|
-
|
80
|
-
|
30
|
+
* @require 'ariel'@
|
31
|
+
* Define a structure for the information you wish to extract:
|
32
|
+
structure = Ariel::Node::Structure.new do |r|
|
33
|
+
r.item :title
|
34
|
+
r.item :body
|
35
|
+
r.list :comments do |c|
|
36
|
+
c.list_item :comment do |d|
|
37
|
+
d.item :author
|
38
|
+
d.item :body
|
39
|
+
end
|
40
|
+
end
|
41
|
+
end
|
42
|
+
* Collect a few examples of the sort of document you wish to extract information
|
43
|
+
from (pages from the same website for instance).
|
44
|
+
* Label each example with tags such as <l:title>, <l:comment> and so on in the
|
45
|
+
relevant places.
|
46
|
+
* Ariel.learn structure, labeled_file1, labeled_file2, labeled_file3
|
47
|
+
* Find the documents you want to extract information from.
|
48
|
+
* extractions = Ariel.extract structure, unlabeled_file1,
|
49
|
+
unlabeled_file2
|
50
|
+
* extractions[0].search('comments/*/body').each {|e| puts e.extracted_text} =>
|
51
|
+
"Great stuff, loving it", "I love life", .....
|
52
|
+
* extractions[0].at('comments/34') => nil (there is no 34th comment, #at
|
53
|
+
returns the first result rather than an array of matches).
|
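The difference between `search` (all matches) and `at` (first match or nil) in the last two quickstart bullets can be tried without Ariel installed. This is a minimal sketch, assuming an extraction tree can be modelled as nested plain Hashes and that `*` matches any child name. `sample_tree`, `search_tree` and `at_tree` are hypothetical helpers, not Ariel's API, and keying list items by index is an assumption made for the example.

```ruby
# Hypothetical stand-in for an extraction tree: nested Hashes whose leaves
# are the extracted text. This is NOT Ariel's real data structure.
def sample_tree
  {
    'title' => 'A post',
    'comments' => {
      '0' => { 'author' => 'alice', 'body' => 'Great stuff, loving it' },
      '1' => { 'author' => 'bob',   'body' => 'I love life' }
    }
  }
end

# Return every node reached by following +path+ (e.g. 'a/*/b') from +tree+.
# '*' matches any child name; other segments must match a key exactly.
def search_tree(tree, path)
  path.split('/').inject([tree]) do |nodes, segment|
    nodes.flat_map do |node|
      next [] unless node.is_a?(Hash)
      if segment == '*'
        node.values
      else
        node.key?(segment) ? [node[segment]] : []
      end
    end
  end
end

# Like search_tree, but return only the first match (nil when there is none).
def at_tree(tree, path)
  search_tree(tree, path).first
end

search_tree(sample_tree, 'comments/*/body') # => ["Great stuff, loving it", "I love life"]
at_tree(sample_tree, 'comments/34')         # => nil
```

Returning an array from one method and a single value or nil from the other keeps call sites free of `.first` noise when only one node is expected.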
81
54
|
|
82
|
-
== Performance
|
83
|
-
Generating rules takes quite a long time. It is always going to be an intensive
|
84
|
-
operation, but there are some very simple and obvious improvements in efficiency
|
85
|
-
that can be made. For a start, the rule candidate refining process currently
|
86
|
-
re-applies the same rules over and over every time the remaining rule candidates
|
87
|
-
are ranked. This is where most time is spent, and caching these should make a
|
88
|
-
big difference. This will definitely be implemented. Other performance
|
89
|
-
enhancements are bound to be there, but my focus at this time is to get
|
90
|
-
something that works.
|
91
55
|
|
92
56
|
== Credits
|
93
57
|
Ariel is developed by Alex Bradbury as a Google Summer of Code project under the
|
94
58
|
mentoring of Austin Ziegler.
|
95
59
|
|
96
60
|
== Links
|
97
|
-
|
98
|
-
|
61
|
+
SVN Repository: http://rubyforge.org/projects/ariel
|
62
|
+
Issue tracker: http://code.google.com/p/ariel/issues/
|
63
|
+
Documentation/homepage: http://ariel.rubyforge.org
|
64
|
+
RDoc: http://ariel.rubyforge.org/rdoc/
|
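The 0.0.1 README text removed in this diff describes how a rule set is built: find the rule that matches the most examples, drop those examples, repeat on the rest, and at extraction time try the rules in order until one matches. The following is a minimal sketch of that greedy covering loop, assuming a rule can be modelled as a lambda that returns true when it matches an example. `learn_rule_set` and `apply_rule_set` are hypothetical names, not Ariel's API, and candidate generation (the hard part) is replaced by a fixed list.

```ruby
# Greedy covering: repeatedly pick the candidate that matches the most
# remaining examples, then drop the examples it covers.
def learn_rule_set(candidates, examples)
  return [] if candidates.empty?
  remaining = examples.dup
  rule_set = []
  until remaining.empty?
    best = candidates.max_by { |rule| remaining.count { |ex| rule.call(ex) } }
    covered = remaining.select { |ex| best.call(ex) }
    break if covered.empty? # no candidate matches what is left
    rule_set << best
    remaining -= covered
  end
  rule_set
end

# Try the rules in learnt order; return the index of the first that matches,
# or nil if none does.
def apply_rule_set(rule_set, example)
  rule_set.index { |rule| rule.call(example) }
end

candidates = [
  ->(doc) { doc.include?('<b>') },
  ->(doc) { doc.include?('<i>') }
]
examples = ['<b>1</b>', '<b>2</b>', '<i>3</i>']
rules = learn_rule_set(candidates, examples)
apply_rule_set(rules, '<i>x</i>') # => 1, the second rule is the first to match
```

Because the most general rule is learnt first, it is also tried first when extracting, so the common case is resolved with a single rule application.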
data/bin/ariel
CHANGED
@@ -14,43 +14,52 @@ OptionParser.new do |opts|
|
|
14
14
|
end
|
15
15
|
|
16
16
|
opts.on('-d', '--dir=DIRECTORY', 'Directory to look for documents to operate on.') do |dir|
|
17
|
+
raise ArgumentError, "directory does not exist" unless FileTest.directory? dir
|
17
18
|
options[:dir]=dir
|
18
19
|
end
|
19
20
|
|
20
|
-
opts.on('-D', '--debug', '
|
21
|
+
opts.on('-D', '--debug', 'Enable debugging output.') do
|
21
22
|
$DEBUG=true
|
22
23
|
end
|
23
24
|
|
24
25
|
opts.on('-s', '--structure=STRUCTURE', 'YAML file in which the structure is defined') do |structure|
|
25
26
|
options[:structure]=structure
|
26
27
|
end
|
28
|
+
|
29
|
+
opts.on('-o', '--output-dir=DIRECTORY', 'Directory to output to') do |dir|
|
30
|
+
raise ArgumentError, "directory does not exist" unless FileTest.directory? dir
|
31
|
+
options[:output_dir]=dir
|
32
|
+
end
|
27
33
|
end.parse!
|
28
34
|
|
29
|
-
require 'ariel' #After option parsing
|
35
|
+
require 'ariel' #After option parsing so debug setting can take effect
|
36
|
+
|
37
|
+
files=Dir["#{options[:dir]}/*"].select {|file_name| File.file? file_name}
|
38
|
+
structure=YAML.load_file options[:structure]
|
30
39
|
|
31
40
|
case options[:mode]
|
32
41
|
when "learn"
|
33
|
-
structure
|
34
|
-
learnt_structure=Ariel::ExampleDocumentLoader.load_directory options[:dir], structure
|
42
|
+
Ariel.learn(structure, *files)
|
35
43
|
File.open(options[:structure], 'wb') do |file|
|
36
|
-
YAML.dump(
|
37
|
-
end
|
38
|
-
learnt_structure.each_descendant do |structure_node|
|
39
|
-
puts structure_node.meta.name.to_s
|
40
|
-
puts structure_node.ruleset.to_yaml
|
44
|
+
YAML.dump(structure, file)
|
41
45
|
end
|
46
|
+
|
42
47
|
when "extract"
|
43
|
-
|
44
|
-
|
45
|
-
|
46
|
-
|
47
|
-
|
48
|
-
|
49
|
-
|
50
|
-
|
51
|
-
|
48
|
+
extractions = Ariel.extract(structure, *files)
|
49
|
+
if options[:output_dir]
|
50
|
+
extractions.zip(files) do |extraction, file|
|
51
|
+
filename=File.join(options[:output_dir], File.basename(file)+'.yaml')
|
52
|
+
File.open(filename, 'wb') do |f|
|
53
|
+
YAML.dump(extraction, f)
|
54
|
+
end
|
55
|
+
end
|
56
|
+
else
|
57
|
+
puts "No --output-dir given, so printing extractions to stdout"
|
58
|
+
extractions.each do |extraction|
|
59
|
+
extraction.each_descendant do |node|
|
60
|
+
puts "#{node.node_name}: #{node.tokenstream.text}"
|
61
|
+
end
|
62
|
+
puts #Blank line looks prettier
|
52
63
|
end
|
53
|
-
puts
|
54
|
-
# puts root_node.to_yaml
|
55
64
|
end
|
56
65
|
end
|
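The new `extract` branch pairs each extraction with the file it came from and writes it to `<basename>.yaml` in the output directory. Below is a minimal sketch of that pairing, assuming plain Hashes stand in for Ariel's extracted nodes. `dump_extractions` is a hypothetical helper, not part of the gem.

```ruby
require 'yaml'
require 'tmpdir'

# Write one YAML file per (extraction, source file) pair into +output_dir+
# and return the paths written. Mirrors the zip/File.join pattern in the diff.
def dump_extractions(extractions, files, output_dir)
  extractions.zip(files).map do |extraction, file|
    path = File.join(output_dir, File.basename(file) + '.yaml')
    File.open(path, 'wb') { |f| YAML.dump(extraction, f) }
    path
  end
end

Dir.mktmpdir do |dir|
  written = dump_extractions([{ 'title' => 'a' }, { 'title' => 'b' }],
                             ['/docs/one.html', '/docs/two.html'], dir)
  puts written.map { |p| File.basename(p) }.inspect
end
```

Note that two input files with the same basename in different directories would overwrite each other; the script in the diff avoids this only because it reads from a single directory.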
data/examples/google_calculator/structure.rb
CHANGED
@@ -1,12 +1,12 @@
|
|
1
1
|
require 'ariel'
|
2
2
|
require 'yaml'
|
3
3
|
|
4
|
-
structure = Ariel::
|
4
|
+
structure = Ariel::Node::Structure.new do |r|
|
5
5
|
r.item :calculation do |c|
|
6
6
|
c.item :result
|
7
7
|
end
|
8
8
|
end
|
9
9
|
|
10
|
-
File.open('structure.yaml') do |file|
|
10
|
+
File.open('structure.yaml', 'w') do |file|
|
11
11
|
YAML.dump structure, file
|
12
12
|
end
|
data/examples/google_calculator/structure.yaml
CHANGED
@@ -1,46 +1,44 @@
|
|
1
|
-
--- &id002 !ruby/object:Ariel::
|
1
|
+
--- &id002 !ruby/object:Ariel::Node::Structure
|
2
2
|
children:
|
3
|
-
:calculation: &id001 !ruby/object:Ariel::
|
3
|
+
:calculation: &id001 !ruby/object:Ariel::Node::Structure
|
4
4
|
children:
|
5
|
-
:result: !ruby/object:Ariel::
|
5
|
+
:result: !ruby/object:Ariel::Node::Structure
|
6
6
|
children: {}
|
7
7
|
|
8
|
-
|
9
|
-
|
10
|
-
:node_type: :not_list
|
11
|
-
:name: :result
|
8
|
+
node_name: :result
|
9
|
+
node_type: :not_list
|
12
10
|
parent: *id001
|
13
11
|
ruleset: !ruby/object:Ariel::RuleSet
|
14
12
|
end_rules:
|
15
13
|
- !ruby/object:Ariel::Rule
|
16
14
|
direction: :back
|
15
|
+
exhaustive: false
|
17
16
|
landmarks: []
|
18
17
|
|
19
18
|
start_rules:
|
20
19
|
- !ruby/object:Ariel::Rule
|
21
20
|
direction: :forward
|
21
|
+
exhaustive: false
|
22
22
|
landmarks:
|
23
23
|
- - "="
|
24
|
-
|
25
|
-
|
26
|
-
:node_type: :not_list
|
27
|
-
:name: :calculation
|
24
|
+
node_name: :calculation
|
25
|
+
node_type: :not_list
|
28
26
|
parent: *id002
|
29
27
|
ruleset: !ruby/object:Ariel::RuleSet
|
30
28
|
end_rules:
|
31
29
|
- !ruby/object:Ariel::Rule
|
32
30
|
direction: :back
|
31
|
+
exhaustive: false
|
33
32
|
landmarks:
|
34
33
|
- - </b>
|
35
34
|
- - </b>
|
36
35
|
start_rules:
|
37
36
|
- !ruby/object:Ariel::Rule
|
38
37
|
direction: :forward
|
38
|
+
exhaustive: false
|
39
39
|
landmarks:
|
40
40
|
- - <b>
|
41
41
|
- - gif
|
42
42
|
- - <b>
|
43
|
-
|
44
|
-
|
45
|
-
:node_type: :not_list
|
46
|
-
:name: :root
|
43
|
+
node_name: :root
|
44
|
+
node_type: :not_list
|
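The serialized rules above carry a `direction` and a list of `landmarks`, but this diff does not show how `Ariel::Rule` applies them. The sketch below assumes "skip-to" semantics: scan until each landmark is found in turn, and take the position just past the last one. It also assumes that a `:back` rule lists its landmarks in the order they are met when scanning from the end. `apply_forward` and `apply_back` are hypothetical helpers; wildcards such as `:anything` and the `exhaustive` flag are ignored.

```ruby
# Return the index just past the last landmark, or nil if any landmark is
# missing. Each landmark is an array of tokens that must appear consecutively.
def apply_forward(tokens, landmarks)
  pos = 0
  landmarks.each do |landmark|
    hit = (pos..tokens.length - landmark.length).find do |i|
      tokens[i, landmark.length] == landmark
    end
    return nil if hit.nil?
    pos = hit + landmark.length
  end
  pos
end

# A :back rule is the same search run over the reversed stream. The result is
# the index (in the original order) at which the extracted span ends.
def apply_back(tokens, landmarks)
  pos = apply_forward(tokens.reverse, landmarks)
  pos && tokens.length - pos
end

# Made-up token stream, shaped so the landmarks from the YAML above apply.
page = ['<b>', 'Web', '</b>', 'calc', 'gif', '<b>',
        '2', '+', '2', '=', '4', '</b>', '</b>']

first = apply_forward(page, [['<b>'], ['gif'], ['<b>']])
last  = apply_back(page, [['</b>'], ['</b>']])
calculation = page[first...last] # => ["2", "+", "2", "=", "4"]

# The :result rules are then applied to the extracted calculation only.
result = calculation[apply_forward(calculation, [['=']])...apply_back(calculation, [])]
# => ["4"]
```

An empty landmark list matches immediately, which is why the `:result` end rule with `landmarks: []` reads as "the result runs to the end of the calculation".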
data/examples/raa/labeled/highline.html
CHANGED
@@ -96,17 +96,18 @@ highline / <l:current_version>1.2.0</l:current_version>
|
|
96
96
|
|
97
97
|
<tr><th>Versions: </th>
|
98
98
|
<td>
|
99
|
-
<l:version_history>[<a
|
99
|
+
<l:version_history>[<a
|
100
|
+
href="project/highline/1.2.0"><l:version>1.2.0</l:version></a> (2006-03-23)]
|
100
101
|
|
101
102
|
[<a href="project/highline/1.0.2">1.0.2</a> (2006-02-20)]
|
102
103
|
|
103
|
-
[<a href="project/highline/1.0.1">1.0.1</a> (2005-07-07)]
|
104
|
+
[<a href="project/highline/1.0.1"><l:version>1.0.1</l:version></a> (2005-07-07)]
|
104
105
|
|
105
106
|
[<a href="project/highline/1.0.0">1.0.0</a> (2005-07-07)]
|
106
107
|
|
107
|
-
[<a href="project/highline/0.6.1">0.6.1</a> (2005-05-26)]
|
108
|
+
[<a href="project/highline/0.6.1"><l:version>0.6.1</l:version></a> (2005-05-26)]
|
108
109
|
|
109
|
-
[<a href="project/highline/0.6.0">0.6.0</a>
|
110
|
+
[<a href="project/highline/0.6.0"><l:version>0.6.0</l:version></a>
|
110
111
|
(2005-05-21)]</l:version_history>
|
111
112
|
|
112
113
|
</td>
|
data/examples/raa/labeled/mongrel.html
CHANGED
@@ -126,21 +126,22 @@ mongrel / <l:current_version>0.3.12</l:current_version>
|
|
126
126
|
|
127
127
|
<tr><th>Versions: </th>
|
128
128
|
<td>
|
129
|
-
<l:version_history>[<a
|
129
|
+
<l:version_history>[<a
|
130
|
+
href="project/mongrel/0.3.12"><l:version>0.3.12</l:version></a> (2006-03-30)]
|
130
131
|
|
131
|
-
[<a href="project/mongrel/0.3.11">0.3.11</a> (2006-03-15)]
|
132
|
+
[<a href="project/mongrel/0.3.11"><l:version>0.3.11</l:version></a> (2006-03-15)]
|
132
133
|
|
133
|
-
[<a href="project/mongrel/0.3.10">0.3.10</a> (2006-03-12)]
|
134
|
+
[<a href="project/mongrel/0.3.10"><l:version>0.3.10</l:version></a> (2006-03-12)]
|
134
135
|
|
135
|
-
[<a href="project/mongrel/0.3.9">0.3.9</a> (2006-03-06)]
|
136
|
+
[<a href="project/mongrel/0.3.9"><l:version>0.3.9</l:version></a> (2006-03-06)]
|
136
137
|
|
137
|
-
[<a href="project/mongrel/0.3.8">0.3.8</a> (2006-03-04)]
|
138
|
+
[<a href="project/mongrel/0.3.8"><l:version>0.3.8</l:version></a> (2006-03-04)]
|
138
139
|
|
139
|
-
[<a href="project/mongrel/0.3.6">0.3.6</a> (2006-02-23)]
|
140
|
+
[<a href="project/mongrel/0.3.6"><l:version>0.3.6</l:version></a> (2006-02-23)]
|
140
141
|
|
141
|
-
[<a href="project/mongrel/0.3.2">0.3.2</a> (2006-02-13)]
|
142
|
+
[<a href="project/mongrel/0.3.2"><l:version>0.3.2</l:version></a> (2006-02-13)]
|
142
143
|
|
143
|
-
[<a href="project/mongrel/0.3.1">0.3.1</a> (2006-02-12)]</l:version_history>
|
144
|
+
[<a href="project/mongrel/0.3.1"><l:version>0.3.1</l:version></a> (2006-02-12)]</l:version_history>
|
144
145
|
|
145
146
|
</td>
|
146
147
|
</tr>
|
data/examples/raa/structure.rb
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
require 'ariel'
|
2
2
|
require 'yaml'
|
3
3
|
|
4
|
-
structure = Ariel::
|
4
|
+
structure = Ariel::Node::Structure.new do |r|
|
5
5
|
r.item :name
|
6
6
|
r.item :current_version
|
7
7
|
r.item :short_description
|
@@ -9,7 +9,9 @@ structure = Ariel::StructureNode.new do |r|
|
|
9
9
|
r.item :owner
|
10
10
|
r.item :homepage
|
11
11
|
r.item :license
|
12
|
-
r.
|
12
|
+
r.list :version_history do |v|
|
13
|
+
v.list_item :version
|
14
|
+
end
|
13
15
|
end
|
14
16
|
|
15
17
|
File.open('structure.yaml', 'wb') do |file|
|
data/examples/raa/structure.yaml
CHANGED
@@ -1,38 +1,16 @@
|
|
1
|
-
--- &id001 !ruby/object:Ariel::
|
1
|
+
--- &id001 !ruby/object:Ariel::Node::Structure
|
2
2
|
children:
|
3
|
-
:
|
3
|
+
:short_description: !ruby/object:Ariel::Node::Structure
|
4
4
|
children: {}
|
5
5
|
|
6
|
-
|
7
|
-
|
8
|
-
:name: :version_history
|
9
|
-
:node_type: :not_list
|
10
|
-
parent: *id001
|
11
|
-
ruleset: !ruby/object:Ariel::RuleSet
|
12
|
-
end_rules:
|
13
|
-
- !ruby/object:Ariel::Rule
|
14
|
-
direction: :back
|
15
|
-
landmarks:
|
16
|
-
- - </td>
|
17
|
-
start_rules:
|
18
|
-
- !ruby/object:Ariel::Rule
|
19
|
-
direction: :forward
|
20
|
-
landmarks:
|
21
|
-
- - <td>
|
22
|
-
- - Versions
|
23
|
-
- - <td>
|
24
|
-
:short_description: !ruby/object:Ariel::StructureNode
|
25
|
-
children: {}
|
26
|
-
|
27
|
-
meta: !ruby/object:OpenStruct
|
28
|
-
table:
|
29
|
-
:name: :short_description
|
30
|
-
:node_type: :not_list
|
6
|
+
node_name: :short_description
|
7
|
+
node_type: :not_list
|
31
8
|
parent: *id001
|
32
9
|
ruleset: !ruby/object:Ariel::RuleSet
|
33
10
|
end_rules:
|
34
11
|
- !ruby/object:Ariel::Rule
|
35
12
|
direction: :back
|
13
|
+
exhaustive: false
|
36
14
|
landmarks:
|
37
15
|
- - </td>
|
38
16
|
- - Category
|
@@ -40,109 +18,109 @@ children:
|
|
40
18
|
start_rules:
|
41
19
|
- !ruby/object:Ariel::Rule
|
42
20
|
direction: :forward
|
21
|
+
exhaustive: false
|
43
22
|
landmarks:
|
44
23
|
- - <td>
|
45
|
-
:
|
24
|
+
:homepage: !ruby/object:Ariel::Node::Structure
|
46
25
|
children: {}
|
47
26
|
|
48
|
-
|
49
|
-
|
50
|
-
:name: :current_version
|
51
|
-
:node_type: :not_list
|
27
|
+
node_name: :homepage
|
28
|
+
node_type: :not_list
|
52
29
|
parent: *id001
|
53
30
|
ruleset: !ruby/object:Ariel::RuleSet
|
54
31
|
end_rules:
|
55
32
|
- !ruby/object:Ariel::Rule
|
56
33
|
direction: :back
|
34
|
+
exhaustive: false
|
57
35
|
landmarks:
|
58
|
-
- - </
|
59
|
-
- -
|
60
|
-
- - </
|
36
|
+
- - </a>
|
37
|
+
- - Download
|
38
|
+
- - </a>
|
61
39
|
start_rules:
|
62
40
|
- !ruby/object:Ariel::Rule
|
63
41
|
direction: :forward
|
42
|
+
exhaustive: false
|
64
43
|
landmarks:
|
65
|
-
- -
|
66
|
-
- -
|
67
|
-
- -
|
68
|
-
:
|
44
|
+
- - ">"
|
45
|
+
- - rubyforge
|
46
|
+
- - ">"
|
47
|
+
:category: !ruby/object:Ariel::Node::Structure
|
69
48
|
children: {}
|
70
49
|
|
71
|
-
|
72
|
-
|
73
|
-
:name: :homepage
|
74
|
-
:node_type: :not_list
|
50
|
+
node_name: :category
|
51
|
+
node_type: :not_list
|
75
52
|
parent: *id001
|
76
53
|
ruleset: !ruby/object:Ariel::RuleSet
|
77
54
|
end_rules:
|
78
55
|
- !ruby/object:Ariel::Rule
|
79
56
|
direction: :back
|
57
|
+
exhaustive: false
|
80
58
|
landmarks:
|
81
|
-
- - </
|
82
|
-
- -
|
83
|
-
- - </
|
59
|
+
- - </td>
|
60
|
+
- - Status
|
61
|
+
- - </td>
|
84
62
|
start_rules:
|
85
63
|
- !ruby/object:Ariel::Rule
|
86
64
|
direction: :forward
|
65
|
+
exhaustive: false
|
87
66
|
landmarks:
|
88
|
-
- -
|
89
|
-
- -
|
90
|
-
|
91
|
-
:category: !ruby/object:Ariel::StructureNode
|
67
|
+
- - <td>
|
68
|
+
- - <td>
|
69
|
+
:current_version: !ruby/object:Ariel::Node::Structure
|
92
70
|
children: {}
|
93
71
|
|
94
|
-
|
95
|
-
|
96
|
-
:name: :category
|
97
|
-
:node_type: :not_list
|
72
|
+
node_name: :current_version
|
73
|
+
node_type: :not_list
|
98
74
|
parent: *id001
|
99
75
|
ruleset: !ruby/object:Ariel::RuleSet
|
100
76
|
end_rules:
|
101
77
|
- !ruby/object:Ariel::Rule
|
102
78
|
direction: :back
|
79
|
+
exhaustive: false
|
103
80
|
landmarks:
|
104
|
-
- - </
|
105
|
-
- -
|
106
|
-
- - </
|
81
|
+
- - </p>
|
82
|
+
- - table
|
83
|
+
- - </p>
|
107
84
|
start_rules:
|
108
85
|
- !ruby/object:Ariel::Rule
|
109
86
|
direction: :forward
|
87
|
+
exhaustive: false
|
110
88
|
landmarks:
|
111
|
-
- -
|
112
|
-
- -
|
113
|
-
|
89
|
+
- - :anything
|
90
|
+
- - caption
|
91
|
+
- - /
|
92
|
+
:name: !ruby/object:Ariel::Node::Structure
|
114
93
|
children: {}
|
115
94
|
|
116
|
-
|
117
|
-
|
118
|
-
:name: :name
|
119
|
-
:node_type: :not_list
|
95
|
+
node_name: :name
|
96
|
+
node_type: :not_list
|
120
97
|
parent: *id001
|
121
98
|
ruleset: !ruby/object:Ariel::RuleSet
|
122
99
|
end_rules:
|
123
100
|
- !ruby/object:Ariel::Rule
|
124
101
|
direction: :back
|
102
|
+
exhaustive: false
|
125
103
|
landmarks:
|
126
104
|
- - </title>
|
127
105
|
start_rules:
|
128
106
|
- !ruby/object:Ariel::Rule
|
129
107
|
direction: :forward
|
108
|
+
exhaustive: false
|
130
109
|
landmarks:
|
131
110
|
- - "-"
|
132
111
|
- - RAA
|
133
112
|
- "-"
|
134
|
-
:owner: !ruby/object:Ariel::
|
113
|
+
:owner: !ruby/object:Ariel::Node::Structure
|
135
114
|
children: {}
|
136
115
|
|
137
|
-
|
138
|
-
|
139
|
-
:name: :owner
|
140
|
-
:node_type: :not_list
|
116
|
+
node_name: :owner
|
117
|
+
node_type: :not_list
|
141
118
|
parent: *id001
|
142
119
|
ruleset: !ruby/object:Ariel::RuleSet
|
143
120
|
end_rules:
|
144
121
|
- !ruby/object:Ariel::Rule
|
145
122
|
direction: :back
|
123
|
+
exhaustive: false
|
146
124
|
landmarks:
|
147
125
|
- - </a>
|
148
126
|
- - id
|
@@ -150,22 +128,22 @@ children:
|
|
150
128
|
start_rules:
|
151
129
|
- !ruby/object:Ariel::Rule
|
152
130
|
direction: :forward
|
131
|
+
exhaustive: false
|
153
132
|
landmarks:
|
154
133
|
- - ">"
|
155
134
|
- - Owner
|
156
135
|
- - ">"
|
157
|
-
:license: !ruby/object:Ariel::
|
136
|
+
:license: !ruby/object:Ariel::Node::Structure
|
158
137
|
children: {}
|
159
138
|
|
160
|
-
|
161
|
-
|
162
|
-
:name: :license
|
163
|
-
:node_type: :not_list
|
139
|
+
node_name: :license
|
140
|
+
node_type: :not_list
|
164
141
|
parent: *id001
|
165
142
|
ruleset: !ruby/object:Ariel::RuleSet
|
166
143
|
end_rules:
|
167
144
|
- !ruby/object:Ariel::Rule
|
168
145
|
direction: :back
|
146
|
+
exhaustive: false
|
169
147
|
landmarks:
|
170
148
|
- - </td>
|
171
149
|
- - Dependency
|
@@ -173,11 +151,49 @@ children:
|
|
173
151
|
start_rules:
|
174
152
|
- !ruby/object:Ariel::Rule
|
175
153
|
direction: :forward
|
154
|
+
exhaustive: false
|
176
155
|
landmarks:
|
177
156
|
- - <td>
|
178
157
|
- - License
|
179
158
|
- - <td>
|
180
|
-
|
181
|
-
|
182
|
-
|
183
|
-
|
159
|
+
:version_history: &id002 !ruby/object:Ariel::Node::Structure
|
160
|
+
children:
|
161
|
+
:version: !ruby/object:Ariel::Node::Structure
|
162
|
+
children: {}
|
163
|
+
|
164
|
+
node_name: :version
|
165
|
+
node_type: :list_item
|
166
|
+
parent: *id002
|
167
|
+
ruleset: !ruby/object:Ariel::RuleSet
|
168
|
+
end_rules:
|
169
|
+
- !ruby/object:Ariel::Rule
|
170
|
+
direction: :back
|
171
|
+
exhaustive: true
|
172
|
+
landmarks:
|
173
|
+
- - </a>
|
174
|
+
start_rules:
|
175
|
+
- !ruby/object:Ariel::Rule
|
176
|
+
direction: :forward
|
177
|
+
exhaustive: true
|
178
|
+
landmarks:
|
179
|
+
- - ">"
|
180
|
+
node_name: :version_history
|
181
|
+
node_type: :not_list
|
182
|
+
parent: *id001
|
183
|
+
ruleset: !ruby/object:Ariel::RuleSet
|
184
|
+
end_rules:
|
185
|
+
- !ruby/object:Ariel::Rule
|
186
|
+
direction: :back
|
187
|
+
exhaustive: false
|
188
|
+
landmarks:
|
189
|
+
- - </td>
|
190
|
+
start_rules:
|
191
|
+
- !ruby/object:Ariel::Rule
|
192
|
+
direction: :forward
|
193
|
+
exhaustive: false
|
194
|
+
landmarks:
|
195
|
+
- - <td>
|
196
|
+
- - Versions
|
197
|
+
- - <td>
|
198
|
+
node_name: :root
|
199
|
+
node_type: :not_list
|