rley 0.4.05 → 0.4.06

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 89fbee176719b82e8e1b1676ceb89b837e9ad249
4
- data.tar.gz: ff72f2224e98479729b441aa0a38b0da61fd34e4
3
+ metadata.gz: 57136a1796f3625b841a2f1fdb0965b4314b6224
4
+ data.tar.gz: 80fd87d6ad9f68314278d5b4c15ef85a241647c3
5
5
  SHA512:
6
- metadata.gz: a8eed1075570fac6f89f6ed16164c52026426dd0c497030fd31bbfd861e7873744be3acc14afa046da40f67ddf9d4987301dac76227ac4d0f504e40f46caa862
7
- data.tar.gz: 684e37a2a236bc9e34e4bacba4775b61608be5dab58b3e4af414da222cf0c79f3db98631b904969c23b3ed882368ea553c7c25f9099cd09f17d5d5105111ba21
6
+ metadata.gz: 3922fe824b8d721892b71022e64a79f4117ed0135f0b02fdaf76f5edfef083d345192328f3086a80fc0484e7dac14f7ec188a51c942bab2b14c8a196e4c9ffc4
7
+ data.tar.gz: 836625169012043daeba92c142fda1e80e6f4eaf085437c2d1bc006e57ca8de99d70619fce22f2d6d8f000468e9c93c65b8e073d80f68084619d7c2070e18197
data/CHANGELOG.md CHANGED
@@ -1,3 +1,17 @@
1
+ ### 0.4.06 / 2017-05-25
2
+ * [FIX] File `formatter\asciitree.rb` fixed inconsistency in comments that caused Yard warnings.
3
+ * [FIX] File `formatter\bracket_notation.rb` fixed inconsistency in comments that caused Yard warnings.
4
+ * [FIX] File `parser\parse_entry_set.rb` fixed inconsistency in comments that caused Yard warnings.
5
+ * [NEW] Method `Grammar#diagnose` performs a number of checks on the grammar. It detects whether:
6
+ there are undefined non-terminals (i.e. non-terminals without a rule that define them)
7
+ there are non-productive non-terminals (i.e. non-terminals that don't derive a sting of terminals)
8
+ there are nullable productions and non-terminals.
9
+ * [NEW] Method `GrmFlowGraph#traverse_df` performs depth-first traversal of the GFG.
10
+ * [NEW] Method `GrmFlowGraph#diagnose` determines which terminals are reachable from the start symbol.
11
+ * [NEW] Method `GrmSymbol#generative?` inidcates whether the grammar symbol can produce a sequence of terminals.
12
+ * [CHANGE] Class `GrammarBuilder` Improved the API documentation.
13
+
14
+
1
15
  ### 0.4.05 / 2017-05-06
2
16
  * [CHANGE] File `README.md` Added documentation on how to build parse trees and manipulate them.
3
17
  * [CHANGE] File `examples\NLP\mini_en_demo.rb` now emits different parse tree representations.
data/README.md CHANGED
@@ -182,9 +182,9 @@ creating a lexicon and tokenizer from scratch. Here are a few Ruby Part-of-Speec
182
182
 
183
183
  At this stage, we're done with parsing. What we need next are convenient means
184
184
  to exploit the parse result. As it is, the `result` variable in the last code snippet
185
- above is a data structure ("Earley item sets") that is tightly related to the intricate details
185
+ above is a data structure ("Earley item sets") that is highly depending on the intricate details
186
186
  of the Earley's parsing algorithm. Obviously, it contains all the necessary data to exploit
187
- the parsing results but it is too low-level and inconvenient from a programming viewpoint.
187
+ the parsing results but it is rather low-level and inconvenient from a programming viewpoint.
188
188
  Therefore, __Rley__ provides out of the box two convenient data structures for
189
189
  representing the parse outcome:
190
190
  - Parse tree (optimal when the parse is unambiguous)
@@ -201,10 +201,10 @@ OK. Now that we have the parse tree, what we can do with it?
201
201
  One option is to manipulate the parse tree and its node directly. For instance,
202
202
  one could write code to customize and transform the parse tree. This approach gives
203
203
  most the of flexibility needed for advanced applications. The other, more common
204
- option is to rely on the parse tree visitor instantiated from the `Rley::ParseTreeVisitor` class.
204
+ option is to use an `Rley::ParseTreeVisitor` instance.
205
205
  Such a visitor walks over the parse tree nodes and generates visit events that
206
206
  are dispatched to subscribed event listeners. All this may, at first, sound
207
- complicate but the coming code snippets will show it otherwise.
207
+ complicated but the coming code snippets show it otherwise.
208
208
 
209
209
  Let's do it by:
210
210
  - Creating a parse tree visitor
@@ -327,7 +327,7 @@ by yet another one:
327
327
 
328
328
  ```ruby
329
329
  # Let's create a formatter that will render the parse tree in labelled bracket notation
330
- renderer = Rley::Formatter::BracketNotation .new($stdout)
330
+ renderer = Rley::Formatter::BracketNotation.new($stdout)
331
331
 
332
332
  # Subscribe the formatter to the visitor's event and launch the visit
333
333
  renderer.render(visitor)
@@ -338,7 +338,7 @@ This results in the strange-looking output:
338
338
  [S [NP [Proper-Noun John]][VP [Verb saw][NP [Proper-Noun Mary]][PP [Preposition with][NP [Determiner a][Noun telescope]]]]]
339
339
  ```
340
340
 
341
- This output is in a format that is recognized by many NLP software.
341
+ This output is in a format that is recognized by many NLP softwares.
342
342
  The next diagram was created by copy-pasting the output above in the online tool
343
343
  [RSyntaxTree](http://yohasebe.com/rsyntaxtree/).
344
344
  By the way, this tool is also a Ruby gem, [rsyntaxtree](https://rubygems.org/gems/rsyntaxtree).
@@ -103,7 +103,7 @@ visitor = Rley::ParseTreeVisitor.new(ptree)
103
103
  renderer = Rley::Formatter::Asciitree.new($stdout)
104
104
 
105
105
  # Let's create a formatter that will render the parse tree in labelled bracket notation
106
- # renderer = Rley::Formatter::BracketNotation .new($stdout)
106
+ # renderer = Rley::Formatter::BracketNotation.new($stdout)
107
107
 
108
108
  # Subscribe the formatter to the visitor's event and launch the visit
109
109
  renderer.render(visitor)
@@ -3,7 +3,7 @@
3
3
 
4
4
  module Rley # Module used as a namespace
5
5
  # The version number of the gem.
6
- Version = '0.4.05'.freeze
6
+ Version = '0.4.06'.freeze
7
7
 
8
8
  # Brief description of the gem.
9
9
  Description = "Ruby implementation of the Earley's parsing algorithm".freeze
@@ -36,8 +36,8 @@ module Rley # This module is used as a namespace
36
36
  # Method called by a ParseTreeVisitor to which the formatter subscribed.
37
37
  # Notification of a visit event: the visitor is about to visit
38
38
  # the children of a non-terminal node
39
- # @param _parent [NonTerminalNode]
40
- # @param _children [Array] array of children nodes
39
+ # @param parent [NonTerminalNode]
40
+ # @param children [Array] array of children nodes
41
41
  def before_subnodes(parent, children)
42
42
  rank_of(parent)
43
43
  curr_path << parent
@@ -47,7 +47,7 @@ module Rley # This module is used as a namespace
47
47
  # Method called by a ParseTreeVisitor to which the formatter subscribed.
48
48
  # Notification of a visit event: the visitor is about to visit
49
49
  # a non-terminal node
50
- # @param nonterm [NonTerminalNode]
50
+ # @param aNonTerm [NonTerminalNode]
51
51
  def before_non_terminal(aNonTerm)
52
52
  emit(aNonTerm)
53
53
  end
@@ -56,7 +56,7 @@ module Rley # This module is used as a namespace
56
56
  # Method called by a ParseTreeVisitor to which the formatter subscribed.
57
57
  # Notification of a visit event: the visitor is about to visit
58
58
  # a terminal node
59
- # @param _term [TerminalNode]
59
+ # @param aTerm [TerminalNode]
60
60
  def before_terminal(aTerm)
61
61
  emit(aTerm, ": '#{aTerm.token.lexeme}'")
62
62
  end
@@ -23,7 +23,7 @@ module Rley # This module is used as a namespace
23
23
  # Method called by a ParseTreeVisitor to which the formatter subscribed.
24
24
  # Notification of a visit event: the visitor is about to visit
25
25
  # a non-terminal node
26
- # @param nonterm [NonTerminalNode]
26
+ # @param aNonTerm [NonTerminalNode]
27
27
  def before_non_terminal(aNonTerm)
28
28
  write("[#{aNonTerm.symbol.name} ")
29
29
  end
@@ -32,7 +32,7 @@ module Rley # This module is used as a namespace
32
32
  # Method called by a ParseTreeVisitor to which the formatter subscribed.
33
33
  # Notification of a visit event: the visitor is about to visit
34
34
  # a terminal node
35
- # @param _term [TerminalNode]
35
+ # @param aTerm [TerminalNode]
36
36
  def before_terminal(aTerm)
37
37
  write("[#{aTerm.symbol.name} ")
38
38
  end
@@ -40,7 +40,7 @@ module Rley # This module is used as a namespace
40
40
  # Method called by a ParseTreeVisitor to which the formatter subscribed.
41
41
  # Notification of a visit event: the visitor completed the visit of
42
42
  # a terminal node.
43
- # @param _term [TerminalNode]
43
+ # @param aTerm [TerminalNode]
44
44
  def after_terminal(aTerm)
45
45
  # Escape all opening and closing square brackets
46
46
  escape_lbrackets = aTerm.token.lexeme.gsub(/\[/, "\\[")
@@ -14,7 +14,7 @@ module Rley # This module is used as a namespace
14
14
  # Pre-condition: theSuccessor is an StartVertex
15
15
  def initialize(thePredecessor, theSuccessor)
16
16
  super(thePredecessor, theSuccessor)
17
- do_set_key(thePredecessor, theSuccessor)
17
+ do_set_key(thePredecessor, theSuccessor)
18
18
  end
19
19
 
20
20
  private
@@ -1,3 +1,4 @@
1
+ require 'set'
1
2
  require_relative 'start_vertex'
2
3
  require_relative 'end_vertex'
3
4
  require_relative 'item_vertex'
@@ -36,6 +37,76 @@ module Rley # This module is used as a namespace
36
37
  vertices.find { |a_vertex| a_vertex.label == aVertexLabel }
37
38
  end
38
39
 
40
+ # Perform a diagnosis of the grammar elements (symbols and rules)
41
+ # in order to detect:
42
+ # If one wants to remove useless rules, then do first:
43
+ # elimination of non-generating symbols
44
+ # then elimination of unreachable symbols
45
+ def diagnose
46
+ mark_unreachable_symbols
47
+ end
48
+
49
+ Branching = Struct.new(:vertex, :to_visit, :visited) do
50
+ def initialize(aVertex)
51
+ super(aVertex)
52
+ self.to_visit = aVertex.edges.dup
53
+ self.visited = []
54
+ end
55
+
56
+ def done?
57
+ self.to_visit.empty?
58
+ end
59
+
60
+ def next_edge
61
+ next_one = self.to_visit.shift
62
+ self.visited << next_one.successor unless next_one.nil?
63
+
64
+ return next_one
65
+ end
66
+ end
67
+
68
+ # Walk over all the vertices of the graph that are reachable from a given
69
+ # start vertex. This is a depth-first graph traversal.
70
+ # @param aStartVertex [StartVertex] the depth-first traversal begins from here
71
+ # @param visitAction [Proc] block code called when a new graph vertex is found
72
+ def traverse_df(aStartVertex, &visitAction)
73
+ visited = Set.new
74
+ stack = []
75
+ visitee = aStartVertex
76
+
77
+ begin
78
+ first_time = !visited.include?(visitee)
79
+ if first_time
80
+ visitAction.call(visitee)
81
+ visited << visitee
82
+ end
83
+
84
+ case visitee
85
+ when Rley::GFG::StartVertex
86
+ if first_time
87
+ stack.push(Branching.new(visitee))
88
+ curr_edge = stack.last.next_edge
89
+ else
90
+ # Skip start and end vertices
91
+ # Retrieve the corresponding return edge
92
+ curr_edge = get_matching_return(curr_edge)
93
+ end
94
+
95
+ when Rley::GFG::EndVertex
96
+ stack.pop if stack.last.done?
97
+ break if stack.empty?
98
+ curr_edge = stack.last.next_edge
99
+
100
+ else
101
+ # All other vertex types have only one successor
102
+ curr_edge = visitee.edges[0]
103
+ end
104
+ visitee = curr_edge.successor unless curr_edge.nil?
105
+ end until stack.empty?
106
+ # Now process the end vertex matching the initial start vertex
107
+ visitAction.call(end_vertex_for[aStartVertex.non_terminal])
108
+ end
109
+
39
110
  private
40
111
 
41
112
  def add_vertex(aVertex)
@@ -68,7 +139,15 @@ module Rley # This module is used as a namespace
68
139
  def build_all_starts_ends(theDottedItems)
69
140
  productions_raw = theDottedItems.map(&:production)
70
141
  productions = productions_raw.uniq
71
- productions.each { |prod| build_start_end_for(prod.lhs) }
142
+ all_nterms = Set.new
143
+ productions.each do |prod|
144
+ all_nterms << prod.lhs
145
+ nterms_of_rhs = prod.rhs.members.select do |symb|
146
+ symb.kind_of?(Syntax::NonTerminal)
147
+ end
148
+ all_nterms.merge(nterms_of_rhs)
149
+ end
150
+ all_nterms.each { |nterm| build_start_end_for(nterm) }
72
151
  end
73
152
 
74
153
  # if there is not yet a start vertex labelled .N in the GFG:
@@ -179,12 +258,43 @@ module Rley # This module is used as a namespace
179
258
  # Retrieve corresponding end vertex
180
259
  end_vertex = end_vertex_for[nt_symbol]
181
260
  # Create an edge end vertex -> return vertex
182
- ReturnEdge.new(end_vertex, return_vertex)
261
+ ReturnEdge.new(end_vertex, return_vertex) if end_vertex
183
262
  end
184
263
 
185
264
  def build_shortcut_edge(fromVertex, toVertex)
186
265
  ShortcutEdge.new(fromVertex, toVertex)
187
266
  end
267
+
268
+
269
+ # Retrieve the return edge that matches the given
270
+ # call edge.
271
+ def get_matching_return(aCallEdge)
272
+ # Calculate key of return edge from the key of call edge
273
+ ret_key = aCallEdge.key.sub(/CALL/, 'RET')
274
+
275
+ # Retrieve the corresponding end vertex
276
+ end_vertex = end_vertex_for[aCallEdge.successor.non_terminal]
277
+
278
+ # Retrieve the return edge with specified key
279
+ return_edge = end_vertex.edges.find { |edge| edge.key == ret_key }
280
+ end
281
+
282
+ # Mark non-terminal symbols that cannot be derived from the start symbol.
283
+ # In a GFG, a non-terminal symbol N is unreachable if there is no path
284
+ # from the start symbol to the start node .N
285
+ def mark_unreachable_symbols()
286
+ # Mark all non-terminals as unreachable
287
+ start_vertex_for.values.each do |a_vertex|
288
+ a_vertex.non_terminal.unreachable = true
289
+ end
290
+
291
+ # Now traverse graph from start vertex
292
+ # and make all visited non-terminals as reachable
293
+ traverse_df(start_vertex) do |a_vertex|
294
+ next unless a_vertex.kind_of?(StartVertex)
295
+ a_vertex.non_terminal.unreachable = false
296
+ end
297
+ end
188
298
  end # class
189
299
  end # module
190
300
  end # module
@@ -39,7 +39,7 @@ module Rley # This module is used as a namespace
39
39
 
40
40
  # Append the given entry (if it isn't yet in the set)
41
41
  # to the list of parse entries
42
- # @param aState [ParseEntry] the parse entry to push.
42
+ # @param anEntry [ParseEntry] the parse entry to push.
43
43
  # @return [ParseEntry] the passed parse entry it doesn't added
44
44
  def push_entry(anEntry)
45
45
  match = entries.find { |entry| entry == anEntry }
@@ -1,4 +1,4 @@
1
- # File: exceptions.rb
1
+ # File: rley_error.rb
2
2
 
3
3
  module Rley # Module used as a namespace
4
4
  # @abstract
@@ -1,15 +1,16 @@
1
1
  require 'set'
2
+ require_relative '../rley_error'
2
3
 
3
4
  module Rley # This module is used as a namespace
4
5
  module Syntax # This module is used as a namespace
5
6
  # A grammar specifies the syntax of a language.
6
- # Formally, a grammar has:
7
+ # Formally, a grammar has:
7
8
  # * One start symbol,
8
9
  # * One or more other production rules,
9
10
  # * Each production has a rhs that is a sequence of grammar symbols.
10
- # * Grammar symbols are categorized into
11
- # -terminal symbols
12
- # -non-terminal symbols
11
+ # * Grammar symbols are categorized into:
12
+ # -terminal symbols
13
+ # -non-terminal symbols
13
14
  class Grammar
14
15
  # A non-terminal symbol that represents all the possible strings
15
16
  # in the language.
@@ -30,15 +31,16 @@ module Rley # This module is used as a namespace
30
31
  @symbols = []
31
32
  @name2symbol = {}
32
33
  valid_productions = validate_productions(theProductions)
34
+ valid_productions.each { |prod| add_production(prod) }
35
+ diagnose
36
+
33
37
  # TODO: use topological sorting
34
38
  @start_symbol = valid_productions[0].lhs
35
- valid_productions.each { |prod| add_production(prod) }
36
- compute_nullable
37
39
  end
38
40
 
39
41
  # @return [Array] The list of non-terminals in the grammar.
40
42
  def non_terminals()
41
- return symbols.select { |s| s.kind_of?(NonTerminal) }
43
+ @non_terminals ||= symbols.select { |s| s.kind_of?(NonTerminal) }
42
44
  end
43
45
 
44
46
  # @return [Production] The start production of the grammar (i.e.
@@ -61,10 +63,101 @@ module Rley # This module is used as a namespace
61
63
  the_lhs = aProduction.lhs
62
64
  add_symbol(the_lhs)
63
65
 
64
- # TODO: remove quadratic execution time
65
66
  aProduction.rhs.each { |symb| add_symbol(symb) }
66
67
  end
67
68
 
69
+ # Perform some check of the grammar.
70
+ def diagnose()
71
+ mark_undefined
72
+ mark_generative
73
+ compute_nullable
74
+ end
75
+
76
+ # Check that each non-terminal appears at least once in lhs.
77
+ # If it is not the case, then mark it as undefined
78
+ def mark_undefined
79
+ defined = Set.new
80
+
81
+ # Defined non-terminals appear at least once as lhs of a production
82
+ rules.each { |prod| defined << prod.lhs }
83
+ defined.each { |n_term| n_term.undefined = false }
84
+
85
+ # Retrieve all non-terminals that aren't marked as non-undefined
86
+ undefined = non_terminals.select { |n_term| n_term.undefined?.nil? }
87
+
88
+ undefined.each { |n_term| n_term.undefined = true }
89
+ end
90
+
91
+
92
+
93
+ # Mark all non-terminals and production rules as
94
+ # generative or not.
95
+ # A production is generative when it can derive a string of terminals.
96
+ # A production is therefore generative when all its rhs members are
97
+ # themselves generatives.
98
+ # A non-terminal is generative if at least one of its defining production
99
+ # is itself generative.
100
+ def mark_generative
101
+ curr_marked = []
102
+
103
+ # Iterate until no new rule can be marked.
104
+ begin
105
+ prev_marked = curr_marked.dup
106
+
107
+ rules.each do |a_rule|
108
+ next unless a_rule.generative?.nil?
109
+ if a_rule.empty?
110
+ a_rule.generative = false
111
+ curr_marked << a_rule
112
+ could_mark_nterm_generative(a_rule)
113
+ next
114
+ end
115
+
116
+ last_considered = nil
117
+ a_rule.rhs.members.each do |symbol|
118
+ last_considered = symbol
119
+ break unless symbol.generative?
120
+ end
121
+ next if last_considered.generative?.nil?
122
+ a_rule.generative = last_considered.generative?
123
+ curr_marked << a_rule
124
+ could_mark_nterm_generative(a_rule)
125
+ end
126
+ end until prev_marked.size == curr_marked.size
127
+
128
+ # The nonterminals that are not marked yet are non-generative
129
+ non_terminals.each do |nterm|
130
+ nterm.generative = false if nterm.generative?.nil?
131
+ end
132
+ end
133
+
134
+ # Given a production rule with given non-terminal
135
+ # Check whether that non-terminal should be marked
136
+ # as generative or not.
137
+ # A non-terminal may be marked as generative if at
138
+ # least one of its defining production is generative.
139
+ def could_mark_nterm_generative(aRule)
140
+ nterm = aRule.lhs
141
+
142
+ # non-terminal already marked? If yes, nothing more to do...
143
+ return unless nterm.generative?.nil?
144
+
145
+ defining_rules = rules_for(nterm) # Retrieve all defining productions
146
+
147
+ all_false = true
148
+ defining_rules.each do |prod|
149
+ if prod.generative?
150
+ # One generative rule found!
151
+ nterm.generative = true
152
+ all_false = false
153
+ break
154
+ else
155
+ all_false = false if prod.generative?.nil?
156
+ end
157
+ end
158
+ nterm.generative = false if all_false
159
+ end
160
+
68
161
 
69
162
  # For each non-terminal determine whether it is nullable or not.
70
163
  # A nullable nonterminal is a nonterminal that can match an empty string.
@@ -118,6 +211,12 @@ module Rley # This module is used as a namespace
118
211
  @symbols << aSymbol
119
212
  @name2symbol[its_name] = aSymbol
120
213
  end
214
+
215
+ # Retrieve all the production rules that share the same symbol in lhs
216
+ def rules_for(aNonTerm)
217
+ rules.select { |a_rule| a_rule.lhs == aNonTerm }
218
+ end
219
+
121
220
  end # class
122
221
  end # module
123
222
  end # module
@@ -1,56 +1,74 @@
1
+ require 'set'
1
2
  require_relative 'verbatim_symbol'
2
3
  require_relative 'literal'
4
+ require_relative 'terminal'
3
5
  require_relative 'non_terminal'
4
6
  require_relative 'production'
5
7
  require_relative 'grammar'
6
8
 
7
9
  module Rley # This module is used as a namespace
8
10
  module Syntax # This module is used as a namespace
9
- # Builder GoF pattern. Builder pattern builds a complex object
10
- # (say, a grammar) from simpler objects (terminals and productions)
11
- # and using a step by step approach.
11
+ # Builder GoF pattern. Builder builds a complex object
12
+ # (say, a grammar) from simpler objects (terminals and productions)
13
+ # and using a step by step approach.
12
14
  class GrammarBuilder
13
- # The list of symbols of the language.
14
- # Grammar symbols are categorized into terminal (symbol)
15
- # and non-terminal (symbol).
15
+ # @return [Hash{String, GrmSymbol}] The mapping of grammar symbol names
16
+ # to the matching grammar symbol object.
16
17
  attr_reader(:symbols)
17
18
 
18
- # The list of production rules for the grammar to build
19
+ # @return [Array<Production>] The list of production rules for
20
+ # the grammar to build.
19
21
  attr_reader(:productions)
20
22
 
21
-
23
+ # Creates a new grammar builder.
24
+ # @param aBlock [Proc] code block used to build the grammar.
25
+ # @example Building a tiny English grammar
26
+ # builder = Rley::Syntax::GrammarBuilder.new do
27
+ # add_terminals('n', 'v', 'adj', 'det')
28
+ # rule 'S' => %w(NP VP)
29
+ # rule 'VP' => %w(v NP)
30
+ # rule 'NP' => %w(det n)
31
+ # rule 'NP' => %w(adj NP)
32
+ # end
33
+ # tiny_eng = builder.grammar
22
34
  def initialize(&aBlock)
23
35
  @symbols = {}
24
36
  @productions = []
25
37
 
26
38
  instance_exec(&aBlock) if block_given?
27
39
  end
28
-
40
+
29
41
  # Retrieve a grammar symbol from its name.
30
42
  # Raise an exception if not found.
31
- # @param aSymbolName [String] the name of a symbol grammar.
32
- # @return [GrmSymbol] the retrieved symbol.
43
+ # @param aSymbolName [String] the name of a grammar symbol.
44
+ # @return [GrmSymbol] the retrieved symbol object.
33
45
  def [](aSymbolName)
34
46
  return symbols[aSymbolName]
35
47
  end
36
48
 
37
49
  # Add the given terminal symbols to the grammar of the language
38
50
  # @param terminalSymbols [String or Terminal] 1..* terminal symbols.
51
+ # @return [void]
39
52
  def add_terminals(*terminalSymbols)
40
53
  new_symbs = build_symbols(Terminal, terminalSymbols)
41
54
  symbols.merge!(new_symbs)
42
55
  end
43
56
 
44
57
 
45
- # Add a production rule in the grammar given one
58
+ # Add a production rule in the grammar given one
46
59
  # key-value pair of the form: String => Array.
47
- # Where the key is the name of the non-terminal appearing in the
48
- # left side of the rule.
60
+ # Where the key is the name of the non-terminal appearing in the
61
+ # left side of the rule.
49
62
  # The value, an Array, is a sequence of grammar symbol names.
50
63
  # The rule is created and inserted in the grammar.
51
- # Example:
52
- # builder.add_production('A' => ['a', 'A', 'c'])
53
- # @param aProductionRepr [Hash] A Hash-based representation of production
64
+ # @example Equivalent call syntaxes
65
+ # builder.add_production('A' => ['a', 'A', 'c'])
66
+ # builder.rule('A' => ['a', 'A', 'c']) # 'rule' is a synonym
67
+ # builder.rule('A' => %w(a A c)) # Use %w syntax for Array of String
68
+ # builder.rule 'A' => %w(a A c) # Call parentheses are optional
69
+ # @param aProductionRepr [Hash{String, Array<String>}] A Hash-based representation
70
+ # of a production.
71
+ # @return [void]
54
72
  def add_production(aProductionRepr)
55
73
  aProductionRepr.each_pair do |(lhs_name, rhs_repr)|
56
74
  lhs = get_nonterminal(lhs_name)
@@ -69,26 +87,37 @@ module Rley # This module is used as a namespace
69
87
 
70
88
  # Given the grammar symbols and productions added to the builder,
71
89
  # build the resulting grammar (if not yet done).
90
+ # @return [Grammar] the created grammar object.
72
91
  def grammar()
73
92
  unless @grammar
74
93
  raise StandardError, 'No symbol found for grammar' if symbols.empty?
75
94
  if productions.empty?
76
95
  raise StandardError, 'No production found for grammar'
77
96
  end
78
-
79
- # Check that each non-terminal appears at least once in lhs.
80
- all_non_terminals = symbols.values.select { |s| s.is_a?(NonTerminal) }
81
- all_non_terminals.each do |n_term|
82
- next if productions.any? { |prod| n_term == prod.lhs }
83
- raise StandardError, "Nonterminal #{n_term.name} not rewritten"
97
+
98
+ # Check that each terminal appears at least in a rhs of a production
99
+ all_terminals = symbols.values.select do |a_symb|
100
+ a_symb.kind_of?(Terminal)
101
+ end
102
+ in_use = Set.new
103
+ productions.each do |prod|
104
+ prod.rhs.members.each do |symb|
105
+ in_use << symb if symb.kind_of?(Syntax::Terminal)
106
+ end
107
+ end
108
+
109
+ unused = all_terminals.reject { |a_term| in_use.include?(a_term) }
110
+ unless unused.empty?
111
+ suffix = "#{unused.map(&:name).join(', ')}."
112
+ raise StandardError, 'Useless terminal symbol(s): ' + suffix
84
113
  end
85
114
 
86
115
  @grammar = Grammar.new(productions.dup)
87
116
  end
88
-
117
+
89
118
  return @grammar
90
119
  end
91
-
120
+
92
121
  alias rule add_production
93
122
 
94
123
  private
@@ -125,7 +154,7 @@ module Rley # This module is used as a namespace
125
154
 
126
155
  return a_symbol
127
156
  end
128
-
157
+
129
158
  # Retrieve the non-terminal symbol with given name.
130
159
  # If it doesn't exist yet, then it is created on the fly.
131
160
  # @param aSymbolName [String] the name of the grammar symbol to retrieve
@@ -136,7 +165,7 @@ module Rley # This module is used as a namespace
136
165
  end
137
166
  return symbols[aSymbolName]
138
167
  end
139
-
168
+
140
169
  end # class
141
170
  end # module
142
171
  end # module
@@ -2,28 +2,37 @@ module Rley # This module is used as a namespace
2
2
  module Syntax # This module is used as a namespace
3
3
  # Abstract class for grammar symbols.
4
4
  # A grammar symbol is an element that appears in grammar rules.
5
- class GrmSymbol
5
+ class GrmSymbol
6
6
  # The name of the grammar symbol
7
7
  attr_reader(:name)
8
8
 
9
+ # An indicator that tells whether the grammar symbol can generate a
10
+ # non-empty string of terminals.
11
+ attr_writer(:generative)
12
+
9
13
  # Constructor.
10
14
  # aName [String] The name of the grammar symbol.
11
15
  def initialize(aName)
12
16
  @name = aName.dup
13
17
  end
14
-
15
- # Return true iff the symbol is a terminal
16
- def terminal?()
17
- # Default implementation to override if necessary
18
- return false
19
- end
20
-
18
+
21
19
  # The String representation of the grammar symbol
22
20
  # @return [String]
23
21
  def to_s()
24
22
  return name.to_s
25
23
  end
26
-
24
+
25
+ # @return [Boolean] true iff the symbol is a terminal
26
+ def terminal?()
27
+ # Default implementation to override if necessary
28
+ return false
29
+ end
30
+
31
+ # @return [Boolean] true iff the symbol is generative.
32
+ def generative?()
33
+ return @generative
34
+ end
35
+
27
36
  end # class
28
37
  end # module
29
38
  end # module
@@ -5,7 +5,16 @@ module Rley # This module is used as a namespace
5
5
  # A non-terminal symbol (sometimes called a syntactic variable) represents
6
6
  # a composition of terminal or non-terminal symbols
7
7
  class NonTerminal < GrmSymbol
8
+ # A non-terminal symbol is nullable if it can match an empty string.
8
9
  attr_writer(:nullable)
10
+
11
+ # A non-terminal symbol is undefined if no production rule in the grammar
12
+ # has that non-terminal symbol in its left-hand side.
13
+ attr_writer(:undefined)
14
+
15
+ # A non-terminal symbol is unreachable if it cannot be reached (derived)
16
+ # from the start symbol.
17
+ attr_writer(:unreachable)
9
18
 
10
19
  # Constructor.
11
20
  # @param aName [String] The name of the grammar symbol.
@@ -21,6 +30,18 @@ module Rley # This module is used as a namespace
21
30
  def nullable?()
22
31
  return @nullable
23
32
  end
33
+
34
+ # @return [false/true] Return true if the symbol doesn't appear
35
+ # on the left-hand side of any production rule.
36
+ def undefined?()
37
+ return @undefined
38
+ end
39
+
40
+ # @return [false/true] Return true if the symbol cannot be derived
41
+ # from the start symbol.
42
+ def unreachable?()
43
+ return @unreachable
44
+ end
24
45
  end # class
25
46
  end # module
26
47
  end # module
@@ -15,6 +15,10 @@ module Rley # This module is used as a namespace
15
15
 
16
16
  # The left-hand side of the rule. It must be a non-terminal symbol
17
17
  attr_reader(:lhs)
18
+
19
+ # A production is generative when all of its rhs members are generative (that as, they
20
+ # can each generate/derive a non-empty string of terminals).
21
+ attr_writer(:generative)
18
22
 
19
23
  # Provide common alternate names to lhs and rhs accessors
20
24
 
@@ -31,6 +35,14 @@ module Rley # This module is used as a namespace
31
35
  def empty?()
32
36
  return rhs.empty?
33
37
  end
38
+
39
+ # Return true iff the production is generative
40
+ def generative?()
41
+ if @generative.nil?
42
+ end
43
+
44
+ return @generative
45
+ end
34
46
 
35
47
  private
36
48
 
@@ -10,6 +10,7 @@ module Rley # This module is used as a namespace
10
10
  # aName [String] The name of the grammar symbol.
11
11
  def initialize(aName)
12
12
  super(aName)
13
+ self.generative = true
13
14
  end
14
15
 
15
16
  # Return true iff the symbol is a terminal
@@ -154,6 +154,102 @@ module Rley # Open this namespace to avoid module qualifier prefixes
154
154
  end
155
155
  end
156
156
  end # context
157
+
158
+ context 'Provided services:' do
159
+ let(:problematic_grammar) do
160
+ # Based on grammar example in book
161
+ # C. Fisher, R. LeBlanc, "Crafting a Compiler"; page 98
162
+ builder = Rley::Syntax::GrammarBuilder.new
163
+ builder.add_terminals('a', 'b', 'c')
164
+ builder.add_production('S' => 'A')
165
+ builder.add_production('S' => 'B')
166
+ builder.add_production('A' => 'a')
167
+ # There is no edge between .B and B => B . b => non-generative
168
+ builder.add_production('B' => %w(B b))
169
+
170
+ # Non-terminal symbol C is unreachable
171
+ builder.add_production('C' => 'c')
172
+
173
+ # And now build the grammar...
174
+ builder.grammar
175
+ end
176
+
177
+ it 'should provide depth-first traversal' do
178
+ result = []
179
+ subject.traverse_df(subject.start_vertex) do |vertex|
180
+ result << vertex.label
181
+ end
182
+
183
+ expected = [
184
+ '.S',
185
+ 'S => . A',
186
+ '.A',
187
+ 'A => . a A c',
188
+ 'A => a . A c',
189
+ 'A => a A . c',
190
+ 'A => a A c .',
191
+ 'A.',
192
+ 'A => . b',
193
+ 'A => b .',
194
+ 'S.'
195
+ ]
196
+ expect(result).to eq(expected)
197
+ end
198
+
199
+ it 'should perform a diagnosis of a correct grammar' do
200
+ expect { subject.diagnose }.not_to raise_error
201
+ grammar_abc.non_terminals.each do |nterm|
202
+ expect(nterm).not_to be_undefined
203
+ expect(nterm).not_to be_unreachable
204
+ end
205
+ end
206
+
207
+ it 'should detect when a non-terminal is unreachable' do
208
+ grammar = problematic_grammar
209
+ items = build_items_for_grammar(grammar)
210
+
211
+ graph = GrmFlowGraph.new(items)
212
+ expect { graph.diagnose }.not_to raise_error
213
+ grammar.non_terminals.each do |nterm|
214
+ expect(nterm).not_to be_undefined
215
+ end
216
+
217
+ unreachable = grammar.non_terminals.select do |nterm|
218
+ nterm.unreachable?
219
+ end
220
+ expect(unreachable.size).to eq(1)
221
+ expect(unreachable[0].name).to eq('C')
222
+ end
223
+ end # context
224
+
225
+ =begin
226
+ context 'Grammar without undefined symbols:' do
227
+ it 'should mark all its nonterminals as not undefined' do
228
+ nonterms = subject.non_terminals
229
+ nonterms.each do |nterm|
230
+ expect(nterm).not_to be_undefined
231
+ end
232
+ end
233
+ end # context
234
+
235
+ context 'Grammar with undefined symbols:' do
236
+ subject do
237
+ productions = [prod_S, prod_A1, prod_A2, prod_A3]
238
+ Grammar.new(productions)
239
+ end
240
+
241
+ it 'should detect its nonterminals that are undefined' do
242
+ nonterms = subject.non_terminals
243
+ culprits = nonterms.select do |nterm|
244
+ nterm.undefined?
245
+ end
246
+
247
+ expect(culprits.size).to eq(1)
248
+ expect(culprits[0]).to eq(nt_C)
249
+ end
250
+ end # context
251
+ =end
252
+
157
253
  end # describe
158
254
  end # module
159
255
  end # module
@@ -9,7 +9,7 @@ module Rley # Open this namespace to avoid module qualifier prefixes
9
9
  context 'Initialization without argument:' do
10
10
  it 'could be created without argument' do
11
11
  expect { GrammarBuilder.new }.not_to raise_error
12
- end
12
+ end
13
13
 
14
14
  it 'should have no grammar symbols at start' do
15
15
  expect(subject.symbols).to be_empty
@@ -19,12 +19,12 @@ module Rley # Open this namespace to avoid module qualifier prefixes
19
19
  expect(subject.productions).to be_empty
20
20
  end
21
21
  end # context
22
-
22
+
23
23
  context 'Initialization with argument:' do
24
24
  it 'could be created with a block argument' do
25
25
  expect do GrammarBuilder.new { nil }
26
26
  end.not_to raise_error
27
- end
27
+ end
28
28
 
29
29
  it 'could have grammar symbols from block argument' do
30
30
  instance = GrammarBuilder.new do
@@ -36,7 +36,7 @@ module Rley # Open this namespace to avoid module qualifier prefixes
36
36
  it 'should have no productions at start' do
37
37
  expect(subject.productions).to be_empty
38
38
  end
39
- end # context
39
+ end # context
40
40
 
41
41
  context 'Adding symbols:' do
42
42
  it 'should build terminals from their names' do
@@ -122,8 +122,8 @@ module Rley # Open this namespace to avoid module qualifier prefixes
122
122
  expect(subject.grammar).to be_kind_of(Grammar)
123
123
  grm = subject.grammar
124
124
  expect(grm.rules).to eq(subject.productions)
125
-
126
- # Invoking the factory method again should return
125
+
126
+ # Invoking the factory method again should return
127
127
  # the same grammar object
128
128
  second_time = subject.grammar
129
129
  expect(second_time).to eq(grm)
@@ -144,13 +144,18 @@ module Rley # Open this namespace to avoid module qualifier prefixes
144
144
  expect { instance.grammar }.to raise_error(err, msg)
145
145
  end
146
146
 
147
- it 'should complain when non-terminal has no production' do
148
- instance = GrammarBuilder.new
149
- instance.add_terminals('a', 'b', 'c')
150
- instance.add_production('S' => ['A'])
147
+ it 'should complain if one or more terminals are useless' do
148
+ # Add one useless terminal symbol
149
+ subject.add_terminals('d')
150
+
151
151
  err = StandardError
152
- msg = 'Nonterminal A not rewritten'
153
- expect { instance.grammar }.to raise_error(err, msg)
152
+ msg = 'Useless terminal symbol(s): d.'
153
+ expect { subject.grammar }.to raise_error(err, msg)
154
+
155
+ # Add another useless terminal
156
+ subject.add_terminals('e')
157
+ msg = 'Useless terminal symbol(s): d, e.'
158
+ expect { subject.grammar }.to raise_error(err, msg)
154
159
  end
155
160
 
156
161
  it 'should build a grammar with nullable nonterminals' do
@@ -70,12 +70,16 @@ module Rley # Open this namespace to avoid module qualifier prefixes
70
70
  # A ::= "b".
71
71
  let(:nt_S) { NonTerminal.new('S') }
72
72
  let(:nt_A) { NonTerminal.new('A') }
73
+ let(:nt_B) { NonTerminal.new('B') }
74
+ let(:nt_C) { NonTerminal.new('C') }
75
+ let(:nt_D) { NonTerminal.new('D') }
73
76
  let(:a_) { VerbatimSymbol.new('a') }
74
77
  let(:b_) { VerbatimSymbol.new('b') }
75
78
  let(:c_) { VerbatimSymbol.new('c') }
76
79
  let(:prod_S) { Production.new(nt_S, [nt_A]) }
77
80
  let(:prod_A1) { Production.new(nt_A, [a_, nt_A, c_]) }
78
81
  let(:prod_A2) { Production.new(nt_A, [b_]) }
82
+ let(:prod_A3) {Production.new(nt_A, [c_, nt_C] ) }
79
83
 
80
84
  =begin
81
85
  # Non-terminals that specify the lexicon of the language
@@ -137,16 +141,16 @@ module Rley # Open this namespace to avoid module qualifier prefixes
137
141
  it 'should know all its symbols' do
138
142
  expect(subject.symbols).to eq([nt_S, nt_A, a_, c_, b_])
139
143
  end
140
-
144
+
141
145
  it 'should know all its non-terminal symbols' do
142
- expect(subject.non_terminals).to eq([nt_S, nt_A])
146
+ expect(subject.non_terminals).to eq([nt_S, nt_A])
143
147
  end
144
-
148
+
145
149
  it 'should know its start production' do
146
150
  expect(subject.start_production).to eq(prod_S)
147
151
  end
148
152
  end # context
149
-
153
+
150
154
  context 'Provided services:' do
151
155
  it 'should retrieve its symbols from their name' do
152
156
  expect(subject.name2symbol['S']).to eq(nt_S)
@@ -156,9 +160,64 @@ module Rley # Open this namespace to avoid module qualifier prefixes
156
160
  expect(subject.name2symbol['c']).to eq(c_)
157
161
  end
158
162
  end # context
159
-
163
+
164
+ context 'Grammar diagnosis:' do
165
+ it 'should mark any non-terminal that has no production' do
166
+ # S ::= A.
167
+ # S ::= B.
168
+ # A ::= "a" .
169
+ # B ::= C "b". # C doesn't appear on lhs of a rule
170
+ prod_S1 = Rley::Syntax::Production.new(nt_S, [nt_A])
171
+ prod_S2 = Rley::Syntax::Production.new(nt_S, [nt_B])
172
+ prod_A = Rley::Syntax::Production.new(nt_A, [a_])
173
+ prod_B = Rley::Syntax::Production.new(nt_B, [nt_C, b_]) # C is undefined
174
+ instance = Grammar.new([prod_S1, prod_S2, prod_A, prod_B])
175
+ undefineds = instance.non_terminals.select(&:undefined?)
176
+ expect(undefineds.size).to eq(1)
177
+ expect(undefineds.first).to eq(nt_C)
178
+ end
179
+
180
+ it 'should mark any non-terminal as generative or not' do
181
+ # S ::= A.
182
+ # S ::= B.
183
+ # A ::= "a" .
184
+ # B ::= C "b". # C doesn't appear on lhs of a rule
185
+ prod_S1 = Rley::Syntax::Production.new(nt_S, [nt_A])
186
+ prod_S2 = Rley::Syntax::Production.new(nt_S, [nt_B])
187
+ prod_A = Rley::Syntax::Production.new(nt_A, [a_])
188
+ prod_B = Rley::Syntax::Production.new(nt_B, [nt_C, b_]) # C is undefined
189
+ instance = Grammar.new([prod_S1, prod_S2, prod_A, prod_B])
190
+ partitioning = instance.non_terminals.partition(&:generative?)
191
+ expect(partitioning[0].size).to eq(2)
192
+ expect(partitioning[0]).to eq([nt_S, nt_A])
193
+ expect(partitioning[1]).to eq([nt_B, nt_C])
194
+ end
195
+
196
+ it "should do a diagnosis even for 'loopy' grammars" do
197
+ # 'S' => 'A'
198
+ # 'S' => 'B'
199
+ # 'A' => 'a'
200
+ # 'B' => 'C'
201
+ # 'C' => 'D'
202
+ # 'D' => 'B'
203
+ prod_S1 = Rley::Syntax::Production.new(nt_S, [nt_A])
204
+ prod_S2 = Rley::Syntax::Production.new(nt_S, [nt_B])
205
+ prod_A = Rley::Syntax::Production.new(nt_A, [a_])
206
+ prod_B = Rley::Syntax::Production.new(nt_B, [nt_C])
207
+ prod_C = Rley::Syntax::Production.new(nt_C, [nt_D])
208
+ prod_D = Rley::Syntax::Production.new(nt_D, [nt_B])
209
+ instance = Grammar.new([prod_S1, prod_S2, prod_A, prod_B, prod_C, prod_D])
210
+ partitioning = instance.non_terminals.partition(&:generative?)
211
+ expect(partitioning[0].size).to eq(2)
212
+ expect(partitioning[0]).to eq([nt_S, nt_A])
213
+ expect(partitioning[1]).to eq([nt_B, nt_C, nt_D])
214
+
215
+ undefined = instance.non_terminals.select(&:undefined?)
216
+ expect(undefined).to be_empty
217
+ end
218
+ end # context
219
+
160
220
  context 'Non-nullable grammar:' do
161
-
162
221
  it 'should mark all its nonterminals as non-nullable' do
163
222
  nonterms = subject.non_terminals
164
223
  nonterms.each do |nterm|
@@ -166,14 +225,14 @@ module Rley # Open this namespace to avoid module qualifier prefixes
166
225
  end
167
226
  end
168
227
  end # context
169
-
228
+
170
229
  context 'Nullable grammars:' do
171
230
  subject do
172
- prod_A3 = Production.new(nt_A, [])
173
- productions = [prod_S, prod_A1, prod_A2, prod_A3]
231
+ prod_A4 = Production.new(nt_A, [])
232
+ productions = [prod_S, prod_A1, prod_A2, prod_A4]
174
233
  Grammar.new(productions)
175
234
  end
176
-
235
+
177
236
  it 'should mark its nullable nonterminals' do
178
237
  # In the default grammar, all nonterminals are nullable
179
238
  nonterms = subject.non_terminals
@@ -181,8 +240,8 @@ module Rley # Open this namespace to avoid module qualifier prefixes
181
240
  expect(nterm).to be_nullable
182
241
  end
183
242
  end
184
-
185
243
  end # context
244
+
186
245
  end # describe
187
246
  end # module
188
247
  end # module
@@ -32,6 +32,22 @@ module Rley # Open this namespace to avoid module qualifier prefixes
32
32
  subject.nullable = false
33
33
  expect(subject).not_to be_nullable
34
34
  end
35
+
36
+ it 'should know whether it is defined' do
37
+ expect(subject.undefined?).to be_nil
38
+ subject.undefined = true
39
+ expect(subject).to be_undefined
40
+ subject.undefined = false
41
+ expect(subject).not_to be_undefined
42
+ end
43
+
44
+ it 'should know whether it is generative' do
45
+ expect(subject.generative?).to be_nil
46
+ subject.generative = true
47
+ expect(subject).to be_generative
48
+ subject.generative = false
49
+ expect(subject).not_to be_generative
50
+ end
35
51
  end # context
36
52
  end # describe
37
53
  end # module
@@ -25,6 +25,10 @@ module Rley # Open this namespace to avoid module qualifier prefixes
25
25
  it "should know that isn't nullable" do
26
26
  expect(subject).not_to be_nullable
27
27
  end
28
+
29
+ it "should know that it is generative" do
30
+ expect(subject).to be_generative
31
+ end
28
32
  end # context
29
33
  end # describe
30
34
  end # module
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: rley
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.05
4
+ version: 0.4.06
5
5
  platform: ruby
6
6
  authors:
7
7
  - Dimitri Geshef
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2017-05-06 00:00:00.000000000 Z
11
+ date: 2017-05-25 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rake