ariel 0.0.1 → 0.1.0

Files changed (47)
  1. data/README +49 -83
  2. data/bin/ariel +29 -20
  3. data/examples/google_calculator/structure.rb +2 -2
  4. data/examples/google_calculator/structure.yaml +13 -15
  5. data/examples/raa/labeled/highline.html +5 -4
  6. data/examples/raa/labeled/mongrel.html +9 -8
  7. data/examples/raa/structure.rb +4 -2
  8. data/examples/raa/structure.yaml +94 -78
  9. data/lib/ariel.rb +71 -33
  10. data/lib/ariel/{candidate_selector.rb → candidate_refiner.rb} +39 -38
  11. data/lib/ariel/label_utils.rb +46 -18
  12. data/lib/ariel/labeled_document_loader.rb +77 -0
  13. data/lib/ariel/learner.rb +60 -38
  14. data/lib/ariel/log.rb +67 -0
  15. data/lib/ariel/node.rb +52 -0
  16. data/lib/ariel/node/extracted.rb +90 -0
  17. data/lib/ariel/node/structure.rb +91 -0
  18. data/lib/ariel/rule.rb +114 -32
  19. data/lib/ariel/rule_set.rb +34 -15
  20. data/lib/ariel/token.rb +9 -3
  21. data/lib/ariel/token_stream.rb +32 -17
  22. data/lib/ariel/wildcards.rb +19 -15
  23. data/test/fixtures.rb +45 -3
  24. data/test/specs/candidate_refiner_spec.rb +48 -0
  25. data/test/specs/label_utils_spec.rb +97 -0
  26. data/test/specs/learner_spec.rb +39 -0
  27. data/test/specs/node_extracted_spec.rb +90 -0
  28. data/test/specs/node_spec.rb +76 -0
  29. data/test/specs/node_structure_spec.rb +74 -0
  30. data/test/specs/rule_set_spec.rb +85 -0
  31. data/test/specs/rule_spec.rb +110 -0
  32. data/test/specs/token_stream_spec.rb +100 -7
  33. metadata +21 -28
  34. data/lib/ariel/example_document_loader.rb +0 -59
  35. data/lib/ariel/extracted_node.rb +0 -20
  36. data/lib/ariel/node_like.rb +0 -26
  37. data/lib/ariel/structure_node.rb +0 -75
  38. data/test/ariel_test_case.rb +0 -15
  39. data/test/test_candidate_selector.rb +0 -58
  40. data/test/test_example_document_loader.rb +0 -7
  41. data/test/test_label_utils.rb +0 -15
  42. data/test/test_learner.rb +0 -38
  43. data/test/test_rule.rb +0 -38
  44. data/test/test_structure_node.rb +0 -81
  45. data/test/test_token.rb +0 -16
  46. data/test/test_token_stream.rb +0 -82
  47. data/test/test_wildcards.rb +0 -18
data/README CHANGED
@@ -1,98 +1,64 @@
1
- = Ariel release 0.0.1
1
+ = Ariel release 0.1.0
2
+
3
+ == About - Ariel: A Ruby Information Extraction Library
4
+ Ariel is a library that allows you to extract information from semi-structured
5
+ documents (such as websites). It is different to existing tools because rather
6
+ than expecting the developer to write rules to extract the desired information,
7
+ Ariel will use a small number of labeled examples to generate and learn
8
+ effective extraction rules. It is developed by Alex Bradbury and released under
9
+ the MIT license. Ariel was started as a Google Summer of Code project mentored
10
+ by Austin Ziegler in 2006.
2
11
 
3
12
  == Install
4
13
  gem install ariel
5
14
 
6
15
  == Announcement
7
- This is the first public release of Ariel - A Ruby Information Extraction
8
- Library. See my previous post, ruby-talk:200140[http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/200140]
9
- for more background information. This release supports defining a tree document
10
- structure and learning rules to extract each node of this true. Handling of list
11
- extraction and learning is not yet implemented, and is the next immediate
12
- priority. See the examples directory included in this release and below for
13
- discussion of the included examples. Rule learning is functional, and appears to
14
- work well, but many refinements are possible. Look out for more updates and a
15
- new releases shortly.
16
-
17
- == About Ariel
18
- Ariel intends to assist in extracting information from semi-structured
19
- documents including (but not in any way limited to) web pages. Although you
20
- may use libraries such as Hpricot or Rubyful Soup, or even plain Regular
21
- Expressions to achieve the same goal, Ariel approaches the problem very
22
- differently. Ariel relies on the user labeling examples of the data they
23
- want to extract, and then finds patterns across several such labeled
24
- examples in order to produce a set of general rules for extracting this
25
- information from any similar document. It uses the MIT license.
26
-
27
- == Examples
28
- This release includes two examples in the example directory (which should now
29
- be in the directory to which rubygems installed ariel). The first is the
30
- google_calculator directory (inspired by Justin Bailey's post to my Ariel
31
- progress report). The structure is very simple, a calculation is extracted from
32
- the page, and then the actual result is extracted from that calculation. 3
33
- labeled examples are included. Ariel reads each of these, tokenizes them,
34
- and extracts each label. 4 sets of rules are learnt:
35
- 1. Rules to locate the start of the calculation in the original document.
36
- 2. Rules to locate the end of the calculation in the original document (applied
37
- from the end of the document).
38
- 3. Rules to locate the start of the result of the calculation from the
39
- extracted calculation.
40
- 4. Rules to locate the end of the result of the calculation from the extracted
41
- calculation (applied from the end of the calculation).
42
-
43
- Take note of 3 and 4 - this is the advantage of treating a document as a tree in
44
- this way. Deeply nested elements can be located by generating a series of simple
45
- rules, rather than generating a rule with complexity that increases at each
46
- level. Sets of rules are generated because it may not be possible to generate a
47
- single rule that will catch all cases. A rule is found that matches as many of
48
- the examples as possible (and fails on the rest), these examples are then removed
49
- and a rule is found that will match as many of the remaining examples and so on.
50
- When it comes to applying these learnt rules, the rules are applied in order
51
- until there is a rule that matches.
52
-
53
- To see this example for yourself just execute structure.rb in the
54
- examples/google_calculator directory to create a locally writable
55
- structure.yaml. Then do:
56
- ariel -D -m learn -s structure.yaml -d /path/to/examples/google_calculator/labeled
57
-
58
- You'll have to wait a while (see my note about performance below). At the end,
59
- the learnt rules will be printed in YAML format, and structure.yaml will be
60
- updated to include these rules. Apply these learnt rules to some unlabeled
61
- documents by doing:
62
- ariel -D -m extract -s structure.yaml -d /path/to/examples/google_calculator/unlabeled
63
-
64
- You should see the results of a successful extraction printed to your terminal,
65
- such as this one:
66
16
 
67
- Results for unlabeled/2:
68
- calculation: 3.5 U.S. dollars = 1.8486241 British pounds
69
- result: 1.8486241 British pounds
17
+ I'm happy to announce the release of Ariel 0.1.0, the result of my Summer of
18
+ Code work. This release should be easy to use, very functional, and hopefully
19
+ useful - so it's worth trying out. I've put a lot of effort into writing clear
20
+ and straightforward documentation to get you started, so take a look at the
21
+ docs available at http://ariel.rubyforge.org. In particular, flick through the
22
+ tutorial and quick start guide. If you're interested, you may also want to take
23
+ a look at the theory page where I've made a good start on describing the method
24
+ Ariel uses to learn extraction rules. If you have any problems or find any bugs,
25
+ just send me an email or add it to the issue tracker (see link below). Enjoy.
26
+ See the FAQ for a vim snippet to make labeling examples a little easier.
70
27
 
71
- The second example (raa) learns rules using just 2 labeled examples. This is probably
72
- fewer than I'd recommend in most cases, but as it works... This example consists
73
- of project entries in the Ruby Application Archive. The structure of the page is
74
- very flat, so all rules are applied to the full page. Rules are learnt and
75
- applied as shown above. The structure.yaml files included in the examples
76
- directories already include rules generated by Ariel, use these if you just want
77
- to see extraction working.
28
+ == Quickstart/Basic usage
78
29
 
79
- Note: The interface demonstrated by ariel above is not very flexible or
80
- friendly, it's just to serve as a demonstration for the moment.
30
+ * @require 'ariel'@
31
+ * Define a structure for the information you wish to extract:
32
+ structure = Ariel::Node::Structure.new do |r|
33
+ r.item :title
34
+ r.item :body
35
+ r.list :comments do |c|
36
+ c.list_item :comment do |d|
37
+ d.item :author
38
+ d.item :body
39
+ end
40
+ end
41
+ end
42
+ * Collect a few examples of the sort of document you wish to extract information
43
+ from (pages from the same website for instance).
44
+ * Label each example with tags such as <l:title>, <l:comment> and so on in the
45
+ relevant places.
46
+ * Ariel.learn structure, labeled_file1, labeled_file2, labeled_file3
47
+ * Find the documents you want to extract information from.
48
+ * extractions = Ariel.extract structure, unlabeled_file1,
49
+ unlabeled_file2
50
+ * extractions[0].search('comments/*/body').each {|e| puts e.extracted_text} =>
51
+ "Great stuff, loving it", "I love life", .....
52
+ * extractions[0].at('comments/34') => nil (there is no 34th comment, #at
53
+ returns the first result rather than an array of matches).
81
54
 
82
- == Performance
83
- Generating rules takes quite a long time. It is always going to be an intensive
84
- operation, but there are some very simple and obvious improvements in efficiency
85
- that can be made. For a start, the rule candidate refining process currently
86
- re-applies the same rules over and over every time the remaining rule candidates
87
- are ranked. This is where most time is spent, and caching these should make a
88
- big difference. This will definitely be implemented. Other performance
89
- enhancements are bound to be there, but my focus at this time is to get
90
- something that works.
91
55
 
92
56
  == Credits
93
57
  Ariel is developed by Alex Bradbury as a Google Summer of Code project under the
94
58
  mentoring of Austin Ziegler.
95
59
 
96
60
  == Links
97
- Watch my development through the subversion repository at http://rubyforge.org/projects/ariel
98
- I've also just started using the tracker at http://code.google.com/p/ariel/
61
+ SVN Repository: http://rubyforge.org/projects/ariel
62
+ Issue tracker: http://code.google.com/p/ariel/issues/
63
+ Documentation/homepage: http://ariel.rubyforge.org
64
+ RDoc: http://ariel.rubyforge.org/rdoc/
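The Quickstart in the new README queries extractions with slash-separated paths such as 'comments/*/body', and says #at returns the first match (or nil) rather than an array. The sketch below illustrates those semantics on a tiny hand-built tree. SketchNode and its add method are hypothetical stand-ins written for this note, not Ariel's classes, and naming list items by index is an assumption.

```ruby
# Hypothetical stand-in for an extracted node; NOT Ariel's implementation.
class SketchNode
  attr_reader :name, :extracted_text, :children

  def initialize(name, extracted_text = nil)
    @name = name.to_s
    @extracted_text = extracted_text
    @children = []
  end

  # Adds a child node and returns it so a subtree can be built step by step.
  def add(name, extracted_text = nil)
    child = SketchNode.new(name, extracted_text)
    @children << child
    child
  end

  # Returns all descendants matching a slash-separated path.
  # A '*' segment matches a child with any name.
  def search(path)
    path.split('/').inject([self]) do |nodes, segment|
      nodes.flat_map do |node|
        node.children.select { |c| segment == '*' || c.name == segment }
      end
    end
  end

  # Returns the first match only, or nil when nothing matches.
  def at(path)
    search(path).first
  end
end

root = SketchNode.new(:root)
comments = root.add(:comments)
first = comments.add(0)   # assumption: list items are named by index
first.add(:author, 'alice')
first.add(:body, 'Great stuff, loving it')
second = comments.add(1)
second.add(:body, 'I love life')

bodies = root.search('comments/*/body').map(&:extracted_text)
missing = root.at('comments/34')
```

Here bodies holds the two comment bodies in document order and missing is nil, which mirrors the behaviour the README describes for #search and #at.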
data/bin/ariel CHANGED
@@ -14,43 +14,52 @@ OptionParser.new do |opts|
14
14
  end
15
15
 
16
16
  opts.on('-d', '--dir=DIRECTORY', 'Directory to look for documents to operate on.') do |dir|
17
+ raise ArgumentError, "directory does not exist" unless FileTest.directory? dir
17
18
  options[:dir]=dir
18
19
  end
19
20
 
20
- opts.on('-D', '--debug', 'Directory to look for documents to operate on.') do
21
+ opts.on('-D', '--debug', 'Enable debugging output.') do
21
22
  $DEBUG=true
22
23
  end
23
24
 
24
25
  opts.on('-s', '--structure=STRUCTURE', 'YAML file in which the structure is defined') do |structure|
25
26
  options[:structure]=structure
26
27
  end
28
+
29
+ opts.on('-o', '--output-dir=DIRECTORY', 'Directory to output to') do |dir|
30
+ raise ArgumentError, "directory does not exist" unless FileTest.directory? dir
31
+ options[:output_dir]=dir
32
+ end
27
33
  end.parse!
28
34
 
29
- require 'ariel' #After option parsing to debug setting can take effect
35
+ require 'ariel' #After option parsing so debug setting can take effect
36
+
37
+ files=Dir["#{options[:dir]}/*"].select {|file_name| File.file? file_name}
38
+ structure=YAML.load_file options[:structure]
30
39
 
31
40
  case options[:mode]
32
41
  when "learn"
33
- structure=YAML.load_file options[:structure]
34
- learnt_structure=Ariel::ExampleDocumentLoader.load_directory options[:dir], structure
42
+ Ariel.learn(structure, *files)
35
43
  File.open(options[:structure], 'wb') do |file|
36
- YAML.dump(learnt_structure, file)
37
- end
38
- learnt_structure.each_descendant do |structure_node|
39
- puts structure_node.meta.name.to_s
40
- puts structure_node.ruleset.to_yaml
44
+ YAML.dump(structure, file)
41
45
  end
46
+
42
47
  when "extract"
43
- learnt_structure=YAML.load_file options[:structure]
44
- Dir.glob("#{options[:dir]}/*") do |file|
45
- tokenstream=Ariel::TokenStream.new
46
- tokenstream.tokenize File.read(file)
47
- root_node=Ariel::ExtractedNode.new :root, tokenstream, learnt_structure
48
- learnt_structure.apply_extraction_tree_on root_node
49
- puts "Results for #{file}:"
50
- root_node.each_descendant do |node|
51
- puts "#{node.meta.name}: #{node.tokenstream.text}"
48
+ extractions = Ariel.extract(structure, *files)
49
+ if options[:output_dir]
50
+ extractions.zip(files) do |extraction, file|
51
+ filename=File.join(options[:output_dir], File.basename(file)+'.yaml')
52
+ File.open(filename, 'wb') do |f|
53
+ YAML.dump(extraction, f)
54
+ end
55
+ end
56
+ else
57
+ puts "No --output-dir given, so printing extractions to stdout"
58
+ extractions.each do |extraction|
59
+ extraction.each_descendant do |node|
60
+ puts "#{node.node_name}: #{node.tokenstream.text}"
61
+ end
62
+ puts #Blank line looks prettier
52
63
  end
53
- puts
54
- # puts root_node.to_yaml
55
64
  end
56
65
  end
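The bin/ariel changes above add a directory check to -d, introduce -o/--output-dir, and write each extraction to <output_dir>/<input basename>.yaml. A minimal sketch of just those two pieces, using only Ruby's standard library, is shown here. The helper names parse_sketch_options and output_path are invented for illustration and do not exist in the gem.

```ruby
require 'optparse'
require 'tmpdir'

# Parses a small, hypothetical subset of the ariel flags from an argv array.
# Raises ArgumentError when a given directory does not exist, as in the diff.
def parse_sketch_options(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.on('-d', '--dir=DIRECTORY', 'Directory to look for documents') do |dir|
      raise ArgumentError, 'directory does not exist' unless File.directory?(dir)
      options[:dir] = dir
    end
    opts.on('-o', '--output-dir=DIRECTORY', 'Directory to output to') do |dir|
      raise ArgumentError, 'directory does not exist' unless File.directory?(dir)
      options[:output_dir] = dir
    end
  end
  parser.parse!(argv.dup) # work on a copy so the caller's array is untouched
  options
end

# Mirrors the naming in the extract branch: <output_dir>/<basename>.yaml
def output_path(output_dir, input_file)
  File.join(output_dir, File.basename(input_file) + '.yaml')
end

options = parse_sketch_options(['-d', Dir.tmpdir, '-o', Dir.tmpdir])
target = output_path(options[:output_dir], '/some/input/page.html')
```

Validating inside the option block means a bad path is reported before require 'ariel' runs, rather than partway through learning or extraction.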
data/examples/google_calculator/structure.rb CHANGED
@@ -1,12 +1,12 @@
1
1
  require 'ariel'
2
2
  require 'yaml'
3
3
 
4
- structure = Ariel::StructureNode.new do |r|
4
+ structure = Ariel::Node::Structure.new do |r|
5
5
  r.item :calculation do |c|
6
6
  c.item :result
7
7
  end
8
8
  end
9
9
 
10
- File.open('structure.yaml') do |file|
10
+ File.open('structure.yaml', 'w') do |file|
11
11
  YAML.dump structure, file
12
12
  end
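The one-line fix above matters because File.open defaults to read-only mode, so YAML.dump cannot write through a handle opened without 'w'. A small round-trip sketch follows, with a plain hash standing in for the structure object (an assumption, so the example runs without the gem).

```ruby
require 'yaml'
require 'tmpdir'

# Plain-hash stand-in for the calculation/result structure; not an Ariel object.
structure_stub = { 'calculation' => { 'result' => {} } }

round_tripped = Dir.mktmpdir do |dir|
  path = File.join(dir, 'structure.yaml')
  # 'w' creates the file if needed and opens it for writing.
  File.open(path, 'w') { |file| YAML.dump(structure_stub, file) }
  YAML.load_file(path)
end
```

The block form of Dir.mktmpdir removes the temporary directory afterwards and returns the block's value, so round_tripped is the hash read back from disk.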
data/examples/google_calculator/structure.yaml CHANGED
@@ -1,46 +1,44 @@
1
- --- &id002 !ruby/object:Ariel::StructureNode
1
+ --- &id002 !ruby/object:Ariel::Node::Structure
2
2
  children:
3
- :calculation: &id001 !ruby/object:Ariel::StructureNode
3
+ :calculation: &id001 !ruby/object:Ariel::Node::Structure
4
4
  children:
5
- :result: !ruby/object:Ariel::StructureNode
5
+ :result: !ruby/object:Ariel::Node::Structure
6
6
  children: {}
7
7
 
8
- meta: !ruby/object:OpenStruct
9
- table:
10
- :node_type: :not_list
11
- :name: :result
8
+ node_name: :result
9
+ node_type: :not_list
12
10
  parent: *id001
13
11
  ruleset: !ruby/object:Ariel::RuleSet
14
12
  end_rules:
15
13
  - !ruby/object:Ariel::Rule
16
14
  direction: :back
15
+ exhaustive: false
17
16
  landmarks: []
18
17
 
19
18
  start_rules:
20
19
  - !ruby/object:Ariel::Rule
21
20
  direction: :forward
21
+ exhaustive: false
22
22
  landmarks:
23
23
  - - "="
24
- meta: !ruby/object:OpenStruct
25
- table:
26
- :node_type: :not_list
27
- :name: :calculation
24
+ node_name: :calculation
25
+ node_type: :not_list
28
26
  parent: *id002
29
27
  ruleset: !ruby/object:Ariel::RuleSet
30
28
  end_rules:
31
29
  - !ruby/object:Ariel::Rule
32
30
  direction: :back
31
+ exhaustive: false
33
32
  landmarks:
34
33
  - - </b>
35
34
  - - </b>
36
35
  start_rules:
37
36
  - !ruby/object:Ariel::Rule
38
37
  direction: :forward
38
+ exhaustive: false
39
39
  landmarks:
40
40
  - - <b>
41
41
  - - gif
42
42
  - - <b>
43
- meta: !ruby/object:OpenStruct
44
- table:
45
- :node_type: :not_list
46
- :name: :root
43
+ node_name: :root
44
+ node_type: :not_list
data/examples/raa/labeled/highline.html CHANGED
@@ -96,17 +96,18 @@ highline / <l:current_version>1.2.0</l:current_version>
96
96
 
97
97
  <tr><th>Versions: </th>
98
98
  <td>
99
- <l:version_history>[<a href="project/highline/1.2.0">1.2.0</a> (2006-03-23)]
99
+ <l:version_history>[<a
100
+ href="project/highline/1.2.0"><l:version>1.2.0</l:version></a> (2006-03-23)]
100
101
 
101
102
  [<a href="project/highline/1.0.2">1.0.2</a> (2006-02-20)]
102
103
 
103
- [<a href="project/highline/1.0.1">1.0.1</a> (2005-07-07)]
104
+ [<a href="project/highline/1.0.1"><l:version>1.0.1</l:version></a> (2005-07-07)]
104
105
 
105
106
  [<a href="project/highline/1.0.0">1.0.0</a> (2005-07-07)]
106
107
 
107
- [<a href="project/highline/0.6.1">0.6.1</a> (2005-05-26)]
108
+ [<a href="project/highline/0.6.1"><l:version>0.6.1</l:version></a> (2005-05-26)]
108
109
 
109
- [<a href="project/highline/0.6.0">0.6.0</a>
110
+ [<a href="project/highline/0.6.0"><l:version>0.6.0</l:version></a>
110
111
  (2005-05-21)]</l:version_history>
111
112
 
112
113
  </td>
data/examples/raa/labeled/mongrel.html CHANGED
@@ -126,21 +126,22 @@ mongrel / <l:current_version>0.3.12</l:current_version>
126
126
 
127
127
  <tr><th>Versions: </th>
128
128
  <td>
129
- <l:version_history>[<a href="project/mongrel/0.3.12">0.3.12</a> (2006-03-30)]
129
+ <l:version_history>[<a
130
+ href="project/mongrel/0.3.12"><l:version>0.3.12</l:version></a> (2006-03-30)]
130
131
 
131
- [<a href="project/mongrel/0.3.11">0.3.11</a> (2006-03-15)]
132
+ [<a href="project/mongrel/0.3.11"><l:version>0.3.11</l:version></a> (2006-03-15)]
132
133
 
133
- [<a href="project/mongrel/0.3.10">0.3.10</a> (2006-03-12)]
134
+ [<a href="project/mongrel/0.3.10"><l:version>0.3.10</l:version></a> (2006-03-12)]
134
135
 
135
- [<a href="project/mongrel/0.3.9">0.3.9</a> (2006-03-06)]
136
+ [<a href="project/mongrel/0.3.9"><l:version>0.3.9</l:version></a> (2006-03-06)]
136
137
 
137
- [<a href="project/mongrel/0.3.8">0.3.8</a> (2006-03-04)]
138
+ [<a href="project/mongrel/0.3.8"><l:version>0.3.8</l:version></a> (2006-03-04)]
138
139
 
139
- [<a href="project/mongrel/0.3.6">0.3.6</a> (2006-02-23)]
140
+ [<a href="project/mongrel/0.3.6"><l:version>0.3.6</l:version></a> (2006-02-23)]
140
141
 
141
- [<a href="project/mongrel/0.3.2">0.3.2</a> (2006-02-13)]
142
+ [<a href="project/mongrel/0.3.2"><l:version>0.3.2</l:version></a> (2006-02-13)]
142
143
 
143
- [<a href="project/mongrel/0.3.1">0.3.1</a> (2006-02-12)]</l:version_history>
144
+ [<a href="project/mongrel/0.3.1"><l:version>0.3.1</l:version></a> (2006-02-12)]</l:version_history>
144
145
 
145
146
  </td>
146
147
  </tr>
data/examples/raa/structure.rb CHANGED
@@ -1,7 +1,7 @@
1
1
  require 'ariel'
2
2
  require 'yaml'
3
3
 
4
- structure = Ariel::StructureNode.new do |r|
4
+ structure = Ariel::Node::Structure.new do |r|
5
5
  r.item :name
6
6
  r.item :current_version
7
7
  r.item :short_description
@@ -9,7 +9,9 @@ structure = Ariel::StructureNode.new do |r|
9
9
  r.item :owner
10
10
  r.item :homepage
11
11
  r.item :license
12
- r.item :version_history
12
+ r.list :version_history do |v|
13
+ v.list_item :version
14
+ end
13
15
  end
14
16
 
15
17
  File.open('structure.yaml', 'wb') do |file|
data/examples/raa/structure.yaml CHANGED
@@ -1,38 +1,16 @@
1
- --- &id001 !ruby/object:Ariel::StructureNode
1
+ --- &id001 !ruby/object:Ariel::Node::Structure
2
2
  children:
3
- :version_history: !ruby/object:Ariel::StructureNode
3
+ :short_description: !ruby/object:Ariel::Node::Structure
4
4
  children: {}
5
5
 
6
- meta: !ruby/object:OpenStruct
7
- table:
8
- :name: :version_history
9
- :node_type: :not_list
10
- parent: *id001
11
- ruleset: !ruby/object:Ariel::RuleSet
12
- end_rules:
13
- - !ruby/object:Ariel::Rule
14
- direction: :back
15
- landmarks:
16
- - - </td>
17
- start_rules:
18
- - !ruby/object:Ariel::Rule
19
- direction: :forward
20
- landmarks:
21
- - - <td>
22
- - - Versions
23
- - - <td>
24
- :short_description: !ruby/object:Ariel::StructureNode
25
- children: {}
26
-
27
- meta: !ruby/object:OpenStruct
28
- table:
29
- :name: :short_description
30
- :node_type: :not_list
6
+ node_name: :short_description
7
+ node_type: :not_list
31
8
  parent: *id001
32
9
  ruleset: !ruby/object:Ariel::RuleSet
33
10
  end_rules:
34
11
  - !ruby/object:Ariel::Rule
35
12
  direction: :back
13
+ exhaustive: false
36
14
  landmarks:
37
15
  - - </td>
38
16
  - - Category
@@ -40,109 +18,109 @@ children:
40
18
  start_rules:
41
19
  - !ruby/object:Ariel::Rule
42
20
  direction: :forward
21
+ exhaustive: false
43
22
  landmarks:
44
23
  - - <td>
45
- :current_version: !ruby/object:Ariel::StructureNode
24
+ :homepage: !ruby/object:Ariel::Node::Structure
46
25
  children: {}
47
26
 
48
- meta: !ruby/object:OpenStruct
49
- table:
50
- :name: :current_version
51
- :node_type: :not_list
27
+ node_name: :homepage
28
+ node_type: :not_list
52
29
  parent: *id001
53
30
  ruleset: !ruby/object:Ariel::RuleSet
54
31
  end_rules:
55
32
  - !ruby/object:Ariel::Rule
56
33
  direction: :back
34
+ exhaustive: false
57
35
  landmarks:
58
- - - </p>
59
- - - table
60
- - - </p>
36
+ - - </a>
37
+ - - Download
38
+ - - </a>
61
39
  start_rules:
62
40
  - !ruby/object:Ariel::Rule
63
41
  direction: :forward
42
+ exhaustive: false
64
43
  landmarks:
65
- - - /
66
- - - caption
67
- - - /
68
- :homepage: !ruby/object:Ariel::StructureNode
44
+ - - ">"
45
+ - - rubyforge
46
+ - - ">"
47
+ :category: !ruby/object:Ariel::Node::Structure
69
48
  children: {}
70
49
 
71
- meta: !ruby/object:OpenStruct
72
- table:
73
- :name: :homepage
74
- :node_type: :not_list
50
+ node_name: :category
51
+ node_type: :not_list
75
52
  parent: *id001
76
53
  ruleset: !ruby/object:Ariel::RuleSet
77
54
  end_rules:
78
55
  - !ruby/object:Ariel::Rule
79
56
  direction: :back
57
+ exhaustive: false
80
58
  landmarks:
81
- - - </a>
82
- - - Download
83
- - - </a>
59
+ - - </td>
60
+ - - Status
61
+ - - </td>
84
62
  start_rules:
85
63
  - !ruby/object:Ariel::Rule
86
64
  direction: :forward
65
+ exhaustive: false
87
66
  landmarks:
88
- - - ">"
89
- - - rubyforge
90
- - - ">"
91
- :category: !ruby/object:Ariel::StructureNode
67
+ - - <td>
68
+ - - <td>
69
+ :current_version: !ruby/object:Ariel::Node::Structure
92
70
  children: {}
93
71
 
94
- meta: !ruby/object:OpenStruct
95
- table:
96
- :name: :category
97
- :node_type: :not_list
72
+ node_name: :current_version
73
+ node_type: :not_list
98
74
  parent: *id001
99
75
  ruleset: !ruby/object:Ariel::RuleSet
100
76
  end_rules:
101
77
  - !ruby/object:Ariel::Rule
102
78
  direction: :back
79
+ exhaustive: false
103
80
  landmarks:
104
- - - </td>
105
- - - Status
106
- - - </td>
81
+ - - </p>
82
+ - - table
83
+ - - </p>
107
84
  start_rules:
108
85
  - !ruby/object:Ariel::Rule
109
86
  direction: :forward
87
+ exhaustive: false
110
88
  landmarks:
111
- - - <td>
112
- - - <td>
113
- :name: !ruby/object:Ariel::StructureNode
89
+ - - :anything
90
+ - - caption
91
+ - - /
92
+ :name: !ruby/object:Ariel::Node::Structure
114
93
  children: {}
115
94
 
116
- meta: !ruby/object:OpenStruct
117
- table:
118
- :name: :name
119
- :node_type: :not_list
95
+ node_name: :name
96
+ node_type: :not_list
120
97
  parent: *id001
121
98
  ruleset: !ruby/object:Ariel::RuleSet
122
99
  end_rules:
123
100
  - !ruby/object:Ariel::Rule
124
101
  direction: :back
102
+ exhaustive: false
125
103
  landmarks:
126
104
  - - </title>
127
105
  start_rules:
128
106
  - !ruby/object:Ariel::Rule
129
107
  direction: :forward
108
+ exhaustive: false
130
109
  landmarks:
131
110
  - - "-"
132
111
  - - RAA
133
112
  - "-"
134
- :owner: !ruby/object:Ariel::StructureNode
113
+ :owner: !ruby/object:Ariel::Node::Structure
135
114
  children: {}
136
115
 
137
- meta: !ruby/object:OpenStruct
138
- table:
139
- :name: :owner
140
- :node_type: :not_list
116
+ node_name: :owner
117
+ node_type: :not_list
141
118
  parent: *id001
142
119
  ruleset: !ruby/object:Ariel::RuleSet
143
120
  end_rules:
144
121
  - !ruby/object:Ariel::Rule
145
122
  direction: :back
123
+ exhaustive: false
146
124
  landmarks:
147
125
  - - </a>
148
126
  - - id
@@ -150,22 +128,22 @@ children:
150
128
  start_rules:
151
129
  - !ruby/object:Ariel::Rule
152
130
  direction: :forward
131
+ exhaustive: false
153
132
  landmarks:
154
133
  - - ">"
155
134
  - - Owner
156
135
  - - ">"
157
- :license: !ruby/object:Ariel::StructureNode
136
+ :license: !ruby/object:Ariel::Node::Structure
158
137
  children: {}
159
138
 
160
- meta: !ruby/object:OpenStruct
161
- table:
162
- :name: :license
163
- :node_type: :not_list
139
+ node_name: :license
140
+ node_type: :not_list
164
141
  parent: *id001
165
142
  ruleset: !ruby/object:Ariel::RuleSet
166
143
  end_rules:
167
144
  - !ruby/object:Ariel::Rule
168
145
  direction: :back
146
+ exhaustive: false
169
147
  landmarks:
170
148
  - - </td>
171
149
  - - Dependency
@@ -173,11 +151,49 @@ children:
173
151
  start_rules:
174
152
  - !ruby/object:Ariel::Rule
175
153
  direction: :forward
154
+ exhaustive: false
176
155
  landmarks:
177
156
  - - <td>
178
157
  - - License
179
158
  - - <td>
180
- meta: !ruby/object:OpenStruct
181
- table:
182
- :name: :root
183
- :node_type: :not_list
159
+ :version_history: &id002 !ruby/object:Ariel::Node::Structure
160
+ children:
161
+ :version: !ruby/object:Ariel::Node::Structure
162
+ children: {}
163
+
164
+ node_name: :version
165
+ node_type: :list_item
166
+ parent: *id002
167
+ ruleset: !ruby/object:Ariel::RuleSet
168
+ end_rules:
169
+ - !ruby/object:Ariel::Rule
170
+ direction: :back
171
+ exhaustive: true
172
+ landmarks:
173
+ - - </a>
174
+ start_rules:
175
+ - !ruby/object:Ariel::Rule
176
+ direction: :forward
177
+ exhaustive: true
178
+ landmarks:
179
+ - - ">"
180
+ node_name: :version_history
181
+ node_type: :not_list
182
+ parent: *id001
183
+ ruleset: !ruby/object:Ariel::RuleSet
184
+ end_rules:
185
+ - !ruby/object:Ariel::Rule
186
+ direction: :back
187
+ exhaustive: false
188
+ landmarks:
189
+ - - </td>
190
+ start_rules:
191
+ - !ruby/object:Ariel::Rule
192
+ direction: :forward
193
+ exhaustive: false
194
+ landmarks:
195
+ - - <td>
196
+ - - Versions
197
+ - - <td>
198
+ node_name: :root
199
+ node_type: :not_list
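In the YAML above, the rules for the :version list item are the only ones marked exhaustive: true, with ">" as the start landmark and "</a>" as the end landmark. One plausible reading is that an exhaustive rule is applied repeatedly, so every match in the parent node yields a list item. The sketch below illustrates that reading only. The helper names and the token list are invented, and Ariel's actual matching may differ.

```ruby
# Returns every index at which the single-token landmark occurs.
def exhaustive_positions(tokens, landmark)
  tokens.each_index.select { |i| tokens[i] == landmark }
end

# Pairs each start (just after start_landmark) with the next end_landmark
# and returns the text between them; unmatched starts are dropped.
def extract_list_items(tokens, start_landmark, end_landmark)
  starts = exhaustive_positions(tokens, start_landmark).map { |i| i + 1 }
  ends = exhaustive_positions(tokens, end_landmark)
  items = starts.map do |s|
    e = ends.find { |candidate| candidate >= s }
    e && tokens[s...e].join(' ')
  end
  items.compact
end

# Invented tokens loosely shaped like the version_history cell.
history = ['[', '<a', 'href', '>', '1.2.0', '</a>', '(2006-03-23)', ']',
           '[', '<a', 'href', '>', '1.0.1', '</a>', '(2005-07-07)', ']']
versions = extract_list_items(history, '>', '</a>')
```

A non-exhaustive rule would stop at the first ">", which is why the parent :version_history node keeps exhaustive: false while its list item does not.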