rley 0.3.12 → 0.4.00
- checksums.yaml +4 -4
- data/CHANGELOG.md +7 -0
- data/README.md +69 -5
- data/examples/NLP/mini_en_demo.rb +5 -1
- data/examples/data_formats/JSON/JSON_demo.rb +1 -0
- data/examples/general/calc/calc_demo.rb +2 -1
- data/lib/rley/constants.rb +1 -1
- data/lib/rley/parser/dotted_item.rb +1 -1
- data/lib/rley/parser/error_reason.rb +106 -0
- data/lib/rley/parser/gfg_chart.rb +1 -24
- data/lib/rley/parser/gfg_earley_parser.rb +28 -57
- data/lib/rley/parser/gfg_parsing.rb +54 -30
- data/lib/rley/ptree/token_range.rb +0 -5
- data/lib/rley/rley_error.rb +10 -0
- data/lib/rley/sppf/parse_forest.rb +7 -9
- data/spec/rley/parser/error_reason_spec.rb +120 -0
- data/spec/rley/parser/gfg_chart_spec.rb +3 -54
- data/spec/rley/parser/gfg_earley_parser_spec.rb +74 -63
- data/spec/rley/parser/gfg_parsing_spec.rb +2 -3
- data/spec/rley/support/grammar_pb_helper.rb +48 -0
- metadata +7 -2
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: eb2c26370206f6c6eca059858ee0c8adedd32810
+  data.tar.gz: 77a42b3da998a2e8b073ec3a811287b71e6b3a3f
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b16495b26269ee208ed3151f820a296d801ed7ca01ea9c98cf29b554da4ceba55719d67a7a7e15dc4fee9b70b54b1f08881ae0dc499b217f47db493b873af4eb
+  data.tar.gz: e463f9697c3cf8b012c8bc8c7736e675d6d355d3f81197bac7fb23529bb0c9e66c791d45ad833f2d6fadeb7eb2adb1a5eed6b3415292bb31fe8a02a43d2fed94
data/CHANGELOG.md CHANGED
@@ -1,3 +1,10 @@
+### 0.4.00 / 2016-12-17
+* [CHANGE] Error reporting is vastly changed. Syntax errors don't raise exceptions.
+  A parse error can be retrieved via an `ErrorReason` object. Such an object is returned
+  by the `GFGParsing#failure_reason` method.
+* [CHANGE] File `README.md` updated to reflect the new error reporting.
+* [CHANGE] Examples updated to reflect the new error reporting.
+
 ### 0.3.12 / 2016-12-08
 * [NEW] Directory `examples\general\calc`. A simple arithmetic expression demo parser.
data/README.md CHANGED
@@ -64,7 +64,7 @@ Installing the latest stable version is simple:

 ## A whirlwind tour of Rley
 The purpose of this section is show how to create a parser for a minimalistic
-English language subset.
+English language subset.
 The tour is organized into the following steps:
 1. [Defining the language grammar](#defining-the-language-grammar)
 2. [Creating a lexicon](#creating-a-lexicon)
@@ -73,7 +73,7 @@ The tour is organized into the following steps:
 5. [Parsing some input](#parsing-some-input)
 6. [Generating the parse forest](#generating-the-parse-forest)

-The complete source code of the tour can be found in the
+The complete source code of the example used in this tour can be found in the
 [examples](https://github.com/famished-tiger/Rley/tree/master/examples/NLP/mini_en_demo.rb)
 directory
@@ -86,7 +86,7 @@ The subset of English grammar is based on an example from the NLTK book.
   # Instantiate a builder object that will build the grammar for us
   builder = Rley::Syntax::GrammarBuilder.new do
     # Terminal symbols (= word categories in lexicon)
-    add_terminals('Noun', 'Proper-Noun', 'Verb')
+    add_terminals('Noun', 'Proper-Noun', 'Verb')
     add_terminals('Determiner', 'Preposition')

     # Here we define the productions (= grammar rules)
@@ -97,7 +97,7 @@ The subset of English grammar is based on an example from the NLTK book.
     rule 'VP' => %w[Verb NP]
     rule 'VP' => %w[Verb NP PP]
     rule 'PP' => %w[Preposition NP]
-  end
+  end
   # And now, let's build the grammar...
   grammar = builder.grammar
 ```
@@ -178,11 +178,75 @@ creating a lexicon and tokenizer from scratch. Here are a few Ruby Part-of-Speech
 pforest = result.parse_forest
 ```

+## Error reporting
+__Rley__ is a non-violent parser, that is, it won't throw an exception when it
+detects a syntax error. Instead, the parse result will be marked as
+non-successful. The parse error can then be identified by calling the
+`GFGParsing#failure_reason` method. This method returns an error reason object
+which can help to produce an error message.
+
+Consider the example from the [Parsing some input](#parsing-some-input) section
+above and, as an error, let's delete the verb `saw` from the sentence to parse.
+
+```ruby
+# Verb has been removed from the sentence on next line
+input_to_parse = 'John Mary with a telescope'
+# Convert input text into a sequence of token objects...
+tokens = tokenizer(input_to_parse, grammar)
+result = parser.parse(tokens)
+
+puts "Parsing successful? #{result.success?}" # => Parsing successful? false
+exit(1)
+```
+
+As expected, the parse now fails.
+To get an error message, one just needs to retrieve the error reason and
+ask it to generate a message.
+```ruby
+# Show error message if parse fails...
+puts result.failure_reason.message unless result.success?
+```
+
+Re-running the example with the error results in the error message:
+```
+Syntax error at or near token 2 >>>Mary<<<
+Expected one 'Verb', found a 'Proper-Noun' instead.
+```
+
+The standard __Rley__ message not only informs about the location of
+the mistake, it also provides a hint by disclosing its expectations.
+
+Let's experiment again with the original sentence but without the word
+`telescope`.
+
+```ruby
+# Last word has been removed from the sentence on next line
+input_to_parse = 'John saw Mary with a '
+# Convert input text into a sequence of token objects...
+tokens = tokenizer(input_to_parse, grammar)
+result = parser.parse(tokens)

+puts "Parsing successful? #{result.success?}" # => Parsing successful? false
+unless result.success?
+  puts result.failure_reason.message
+  exit(1)
+end
+```
+
+This time, the following output is displayed:
+```
+Parsing successful? false
+Premature end of input after 'a' at position 5
+Expected one 'Noun'.
+```
+Again, the resulting error message is user-friendly.
+Remark: currently, Rley reports an error position as the index of the
+input token at which the error was detected.


 ## Examples

-The project source directory contains several example scripts that demonstrate
+The project source directory contains several example scripts that demonstrate
 how grammars are to be constructed and used.
data/examples/NLP/mini_en_demo.rb CHANGED
@@ -83,7 +83,11 @@ input_to_parse = 'John saw Mary with a telescope'
 tokens = tokenizer(input_to_parse, grammar)
 result = parser.parse(tokens)

-puts "Parsing successful? #{result.success?}"
+puts "Parsing successful? #{result.success?}"
+unless result.success?
+  puts result.failure_reason.message
+  exit(1)
+end

 ########################################
 # Step 6. Generating the parse forest
data/examples/general/calc/calc_demo.rb CHANGED
@@ -22,7 +22,8 @@ result = parser.parse_expression(ARGV[0])

 unless result.success?
   # Stop if the parse failed...
-  puts "Parsing of '#{
+  puts "Parsing of '#{ARGV[0]}' failed"
+  puts "Reason: #{result.failure_reason.message}"
   exit(1)
 end
data/lib/rley/constants.rb CHANGED
data/lib/rley/parser/dotted_item.rb CHANGED
@@ -115,7 +115,7 @@ module Rley # This module is used as a namespace

     private

-    # Return the given after its validation.
+    # Return the given position after its validation.
     def valid_position(aPosition)
       rhs_size = production.rhs.size
       if aPosition < 0 || aPosition > rhs_size
data/lib/rley/parser/error_reason.rb ADDED
@@ -0,0 +1,106 @@
+module Rley # Module used as a namespace
+  module Parser # This module is used as a namespace
+    # Abstract class. An instance represents an explanation describing
+    # the likely cause of a parse error
+    # detected by Rley.
+    class ErrorReason
+      # The position of the offending input token
+      attr_reader(:position)
+
+      # The failing production
+      attr_reader(:production)
+
+      def initialize(aPosition)
+        @position = aPosition
+      end
+
+      # Returns the result of invoking reason.to_s.
+      def message()
+        return self.to_s
+      end
+
+      # Return this reason's class name and message
+      def inspect
+        "#{self.class.name}: #{message}"
+      end
+    end # class
+
+
+    # This parse error occurs when no input for parsing was provided
+    # while the grammar requires some non-empty input.
+    class NoInput < ErrorReason
+      def initialize()
+        super(0)
+      end
+
+      # Returns the reason's message.
+      def to_s
+        'Input cannot be empty.'
+      end
+    end # class
+
+    # Abstract class and subclass of ErrorReason.
+    # This specialization represents errors in which the input
+    # didn't match one of the expected tokens.
+    class ExpectationNotMet < ErrorReason
+      # The last input token read when the error was detected
+      attr_reader(:last_token)
+
+      # The terminal symbols expected when the error occurred
+      attr_reader(:expected_terminals)
+
+      def initialize(aPosition, lastToken, expectedTerminals)
+        super(aPosition)
+        @last_token = lastToken.dup
+        @expected_terminals = expectedTerminals.dup
+      end
+
+      protected
+
+      # Emit a text explaining the expected terminal symbols
+      def expectations
+        term_names = expected_terminals.map(&:name)
+        explain = 'Expected one '
+        explain << if expected_terminals.size > 1
+                     "of: ['#{term_names.join("', '")}']"
+                   else
+                     "'#{term_names[0]}'"
+                   end
+        return explain
+      end
+    end # class
+
+
+    # This parse error occurs when the current token from the input
+    # is unexpected according to the grammar rules.
+    class UnexpectedToken < ExpectationNotMet
+      # Returns the reason's message.
+      def to_s
+        err_msg = "Syntax error at or near token #{position + 1} "
+        err_msg << ">>>#{last_token.lexeme}<<<\n"
+        err_msg << expectations
+        err_msg << ", found a '#{last_token.terminal.name}' instead."
+
+        return err_msg
+      end
+    end # class
+
+
+    # This parse error occurs when all input tokens were consumed
+    # but the parser still expected one or more tokens from the input.
+    class PrematureInputEnd < ExpectationNotMet
+      # Returns the reason's message.
+      def to_s
+        err_msg = "Premature end of input after '#{last_token.lexeme}'"
+        err_msg << " at position #{position + 1}\n"
+        err_msg << "#{expectations}."
+
+        return err_msg
+      end
+    end # class
+  end # module
+end # module
+
+# End of file
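To see what the message-building logic of the new `ErrorReason` subclasses amounts to, here is a standalone sketch (plain Ruby, not the gem itself) that reproduces the `expectations` pluralization and the `UnexpectedToken#to_s` assembly shown in the diff above. `Terminal` and `Token` are minimal stand-ins for Rley's own classes, defined here only for illustration.

```ruby
# Minimal stand-ins for Rley's Syntax::Terminal and Parser::Token.
Terminal = Struct.new(:name)
Token = Struct.new(:lexeme, :terminal)

# Mirrors ExpectationNotMet#expectations: use "one of: [...]" when several
# terminals would be acceptable, a single quoted name otherwise.
def expectations(expected_terminals)
  term_names = expected_terminals.map(&:name)
  explain = 'Expected one '
  explain << if expected_terminals.size > 1
               "of: ['#{term_names.join("', '")}']"
             else
               "'#{term_names[0]}'"
             end
  explain
end

# Mirrors UnexpectedToken#to_s: position is zero-based internally,
# reported one-based to the user.
def unexpected_token_message(position, last_token, expected)
  msg = "Syntax error at or near token #{position + 1} "
  msg << ">>>#{last_token.lexeme}<<<\n"
  msg << expectations(expected)
  msg << ", found a '#{last_token.terminal.name}' instead."
  msg
end

token = Token.new('-', Terminal.new('MINUS'))
expected = [Terminal.new('PLUS'), Terminal.new('LPAREN')]
puts unexpected_token_message(3, token, expected)
# => Syntax error at or near token 4 >>>-<<<
#    Expected one of: ['PLUS', 'LPAREN'], found a 'MINUS' instead.
```

This matches the expected message asserted in the `error_reason_spec.rb` file added later in this diff.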
data/lib/rley/parser/gfg_chart.rb CHANGED
@@ -12,17 +12,8 @@ module Rley # This module is used as a namespace
     # An array of entry sets (one per input token + 1)
     attr_reader(:sets)

-    # The level of trace details reported on stdout during the parse.
-    # The possible values are:
-    # 0: No trace output (default case)
-    # 1: Show trace of scanning and completion rules
-    # 2: Same as of 1 with the addition of the prediction rules
-    attr_reader(:tracer)
-
     # @param tokenCount [Fixnum] The number of lexemes in the input to parse.
-
-    def initialize(tokenCount, aGFGraph, aTracer)
-      @tracer = aTracer
+    def initialize(tokenCount, aGFGraph)
       @sets = Array.new(tokenCount + 1) { |_| ParseEntrySet.new }
       push_entry(aGFGraph.start_vertex, 0, 0, :start_rule)
     end
@@ -53,20 +44,6 @@ module Rley # This module is used as a namespace
     def push_entry(aVertex, anOrigin, anIndex, aReason)
       new_entry = ParseEntry.new(aVertex, anOrigin)
       pushed = self[anIndex].push_entry(new_entry)
-      if pushed == new_entry && tracer.level > 0
-        case aReason
-        when :start_rule, :prediction
-          tracer.trace_prediction(anIndex, new_entry)
-
-        when :scanning
-          tracer.trace_scanning(anIndex, new_entry)
-
-        when :completion
-          tracer.trace_completion(anIndex, new_entry)
-        else
-          raise NotImplementedError, "Unknown push_entry mode #{aReason}"
-        end
-      end

       return pushed
     end
data/lib/rley/parser/gfg_earley_parser.rb CHANGED
@@ -17,33 +17,34 @@ module Rley # This module is used as a namespace
     # Parse a sequence of input tokens.
     # @param aTokenSequence [Array] Array of Tokens objects returned by a
     # tokenizer/scanner/lexer.
-    # @param aTraceLevel [Fixnum] The specified trace level.
-    # The possible values are:
-    # 0: No trace output (default case)
-    # 1: Show trace of scanning and completion rules
-    # 2: Same as of 1 with the addition of the prediction rules
     # @return [Parsing] an object that embeds the parse results.
-    def parse(aTokenSequence
-
-      result = GFGParsing.new(gf_graph, aTokenSequence, tracer)
+    def parse(aTokenSequence)
+      result = GFGParsing.new(gf_graph, aTokenSequence)
       last_token_index = aTokenSequence.size
+      if last_token_index == 0 && !grammar.start_symbol.nullable?
+        return unexpected_empty_input(result)
+      end
+
       (0..last_token_index).each do |i|
-        handle_error(result) if result.chart[i].empty?
         result.chart[i].each do |entry|
           # Is entry of the form? [A => alpha . B beta, k]...
           next_symbol = entry.next_symbol
           if next_symbol && next_symbol.kind_of?(Syntax::NonTerminal)
             # ...apply the Call rule
-            call_rule(result, entry, i
+            call_rule(result, entry, i)
           end

-          exit_rule(result, entry, i
-          start_rule(result, entry, i
-          end_rule(result, entry, i
+          exit_rule(result, entry, i) if entry.exit_entry?
+          start_rule(result, entry, i) if entry.start_entry?
+          end_rule(result, entry, i) if entry.end_entry?
+        end
+        if i < last_token_index
+          scan_success = scan_rule(result, i)
+          break unless scan_success
         end
-        scan_rule(result, i, tracer) if i < last_token_index
       end
-
+
+      result.done # End of parsing process
       return result
     end

@@ -55,10 +56,7 @@ module Rley # This module is used as a namespace
     # Then the entry [.B, i] is added to the current sigma set.
     # Gist: when an entry expects the non-terminal symbol B, then
     # add an entry with start vertex .B
-    def call_rule(aParsing, anEntry, aPosition
-      if aTracer.level > 1
-        puts "Chart[#{aPosition}] Call rule applied upon #{anEntry}:"
-      end
+    def call_rule(aParsing, anEntry, aPosition)
       aParsing.call_rule(anEntry, aPosition)
     end

@@ -69,10 +67,7 @@ module Rley # This module is used as a namespace
     # is added to the current sigma set.
     # Gist: for an entry corresponding to a start vertex, add an entry
     # for each entry edge in the graph.
-    def start_rule(aParsing, anEntry, aPosition
-      if aTracer.level > 1
-        puts "Chart[#{aPosition}] Start rule applied upon #{anEntry}:"
-      end
+    def start_rule(aParsing, anEntry, aPosition)
       aParsing.start_rule(anEntry, aPosition)
     end

@@ -81,10 +76,7 @@ module Rley # This module is used as a namespace
     # production. Then entry [B., k] is added to the current entry set.
     # Gist: for an entry corresponding to a reduced production, add an entry
     # for each exit edge in the graph.
-    def exit_rule(aParsing, anEntry, aPosition
-      if aTracer.level > 1
-        puts "Chart[#{aPosition}] Exit rule applied upon #{anEntry}:"
-      end
+    def exit_rule(aParsing, anEntry, aPosition)
       aParsing.exit_rule(anEntry, aPosition)
     end

@@ -92,10 +84,7 @@ module Rley # This module is used as a namespace
     # is added to a parse entry set with index j.
     # then for every entry of the form [A => α . B γ, i] in the kth sigma set
     # the entry [A => α B . γ, i] is added to the jth sigma set.
-    def end_rule(aParsing, anEntry, aPosition
-      if aTracer.level > 1
-        puts "Chart[#{aPosition}] End rule applied upon #{anEntry}:"
-      end
+    def end_rule(aParsing, anEntry, aPosition)
       aParsing.end_rule(anEntry, aPosition)
     end

@@ -105,35 +94,17 @@ module Rley # This module is used as a namespace
     # and allow them to cross the edge, adding the node on the back side
     # of the edge as an entry to the next sigma set:
     # add an entry to the next sigma set [A => α t . γ, i]
-    def scan_rule(aParsing, aPosition
-      if aTracer.level > 1
-        prefix = "Chart[#{aPosition}] Scan rule applied upon "
-        puts prefix + aParsing.tokens[aPosition].to_s
-      end
+    def scan_rule(aParsing, aPosition)
       aParsing.scan_rule(aPosition)
     end
+
+    # Parse error detected: no input tokens provided while the grammar
+    # forbids this.
+    def unexpected_empty_input(aParsing)
+      aParsing.faulty(NoInput.new)
+      return aParsing
+    end

-    # Raise an exception to indicate a syntax error.
-    def handle_error(aParsing)
-      # Retrieve the first empty state set
-      pos = aParsing.chart.sets.find_index(&:empty?)
-      lexeme_at_pos = aParsing.tokens[pos - 1].lexeme
-      puts "chart index: #{pos - 1}"
-      terminals = aParsing.chart.sets[pos - 1].expected_terminals
-      puts "count expected terminals #{terminals.size}"
-      entries = aParsing.chart.sets[pos - 1].entries.map(&:to_s).join("\n")
-      puts "Items #{entries}"
-      term_names = terminals.map(&:name)
-      err_msg = "Syntax error at or near token #{pos}"
-      err_msg << ">>>#{lexeme_at_pos}<<<:\nExpected "
-      err_msg << if terminals.size > 1
-                   "one of: ['#{term_names.join("', '")}'],"
-                 else
-                   ": #{term_names[0]},"
-                 end
-      err_msg << " found a '#{aParsing.tokens[pos - 1].terminal.name}'"
-      raise StandardError, err_msg + ' instead.'
-    end
   end # class
 end # module
 end # module
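The essence of the change above is the control-flow shift: instead of raising `StandardError` from `handle_error`, the parse loop now stops at the first failed scan and records a reason on the result object. Here is a generic sketch of that pattern (plain Ruby, not Rley code; all names are illustrative) showing why callers then branch on `success?` rather than rescuing exceptions.

```ruby
# Illustrative result object: holds an optional failure reason
# instead of signalling errors by raising.
class ParseResult
  attr_reader :failure_reason

  def success?
    failure_reason.nil?
  end

  # Record the cause of failure (akin to GFGParsing#faulty).
  def faulty(reason)
    @failure_reason = reason
  end
end

# Scan each token; on the first mismatch, record the reason and stop,
# mirroring the `break unless scan_success` introduced in the parse loop.
def scan_all(result, tokens, vocabulary)
  tokens.each_with_index do |tok, i|
    unless vocabulary.include?(tok)
      result.faulty("unexpected token #{tok.inspect} at position #{i + 1}")
      return result
    end
  end
  result
end

res = scan_all(ParseResult.new, %w[John saw Mary], %w[John saw])
puts res.success?          # false
puts res.failure_reason    # unexpected token "Mary" at position 3
```

The caller pattern is then exactly the one the updated examples use: `puts result.failure_reason.message unless result.success?`.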
data/lib/rley/parser/gfg_parsing.rb CHANGED
@@ -1,4 +1,5 @@
 require_relative 'gfg_chart'
+require_relative 'error_reason'
 require_relative 'parse_entry_tracker'
 require_relative 'parse_forest_factory'

@@ -15,22 +16,21 @@ module Rley # This module is used as a namespace
     # The sequence of input tokens to parse
     attr_reader(:tokens)

-    # A Hash with pairs of the form:
+    # A Hash with pairs of the form:
     # parse entry => [ antecedent parse entries ]
     # It associates to every parse entry its antecedent(s), that is,
-    # the parse entry/ies that cause the key parse entry to be created
+    # the parse entry/ies that cause the key parse entry to be created
     # with one of the gfg rules
     attr_reader(:antecedence)

-    #
-
-
-
-
-    def initialize(theGFG, theTokens, aTracer)
+    # The reason of a parse failure
+    attr_reader(:failure_reason)
+
+
+    def initialize(theGFG, theTokens)
       @gf_graph = theGFG
       @tokens = theTokens.dup
-      @chart = GFGChart.new(tokens.size, gf_graph
+      @chart = GFGChart.new(tokens.size, gf_graph)
       @antecedence = Hash.new { |hash, key| hash[key] = [] }
       antecedence[chart[0].first]
     end
@@ -45,7 +45,7 @@ module Rley # This module is used as a namespace
       next_symbol = anEntry.next_symbol
       start_vertex = gf_graph.start_vertex_for[next_symbol]
       pos = aPosition
-      apply_rule(anEntry, start_vertex, pos, pos, :call_rule)
+      apply_rule(anEntry, start_vertex, pos, pos, :call_rule)
     end

     # Let the current sigma set be the ith parse entry set.
@@ -65,7 +65,7 @@ module Rley # This module is used as a namespace
     end

     # This method must be invoked when an entry is added to a parse entry set
-    # and is of the form [B => γ ., k] (the dot is at the end of the
+    # and is of the form [B => γ ., k] (the dot is at the end of the
     # production. Then entry [B., k] is added to the current entry set.
     # Gist: for an entry corresponding to a reduced production, add an entry
     # for each exit edge in the graph.
@@ -96,11 +96,12 @@ module Rley # This module is used as a namespace
     end

     # Given that the terminal t is at the specified position,
-    # Locate all entries in the current sigma set that expect t:
+    # locate all entries in the current sigma set that expect t:
     # [A => α . t γ, i]
     # and allow them to cross the edge, adding the node on the back side
     # of the edge as an entry to the next sigma set:
     # add an entry to the next sigma set [A => α t . γ, i]
+    # Returns true if the next token matches the expectations, false otherwise.
     def scan_rule(aPosition)
       terminal = tokens[aPosition].terminal

@@ -108,7 +109,10 @@ module Rley # This module is used as a namespace
       expecting_term = chart[aPosition].entries4term(terminal)

       # ... if the terminal isn't expected then we have an error
-
+      if expecting_term.empty?
+        unexpected_token(aPosition)
+        return false
+      end

       expecting_term.each do |ntry|
         # Get the vertices after the expected terminal
@@ -119,6 +123,8 @@ module Rley # This module is used as a namespace
         apply_rule(ntry, vertex_after_terminal, origin, pos, :scan_rule)
       end
       end
+
+      return true
     end

@@ -136,7 +142,7 @@ module Rley # This module is used as a namespace
     end

     # Factory method. Builds a ParseForest from the parse result.
-    # @return [ParseForest]
+    # @return [ParseForest]
     def parse_forest()
       factory = ParseForestFactory.new(self)

@@ -148,7 +154,7 @@ module Rley # This module is used as a namespace
     # with origin equal to zero.
     def initial_entry()
       return chart.initial_entry
-    end
+    end

     # Retrieve the accepting parse entry that represents
     # a complete, successful parse
@@ -158,25 +164,43 @@ module Rley # This module is used as a namespace
       return chart.accepting_entry
     end

+    # Mark the parse as erroneous
+    def faulty(aReason)
+      @failure_reason = aReason
+    end
+
+    # A notification that the parsing reached an end
+    def done
+      unless self.success? || self.failure_reason
+        # Parse not successful and no reason identified
+        # Assuming that parse failed because of a premature end
+        premature_end
+      end
+    end
+
     private

-    #
-
-
-
-
+    # Parse error detected: all input tokens were consumed,
+    # the parser didn't detect a syntax error meanwhile but
+    # could not reach the accepting state.
+    def premature_end
+      token_pos = tokens.size # One-based!
+      last_token = tokens[-1]
+      entry_set = chart.sets[tokens.size]
+      expected = entry_set.expected_terminals
+
+      reason = PrematureInputEnd.new(token_pos - 1, last_token, expected)
+      faulty(reason)
+    end

+    # Parse error detected: input token doesn't match
+    # the expectations set by grammar rules
+    def unexpected_token(aPosition)
+      unexpected = tokens[aPosition]
       expected = chart.sets[aPosition].expected_terminals
-
-
-
-      err_msg << if expected.size > 1
-                   "one of: ['#{term_names.join("', '")}'],"
-                 else
-                   ": #{term_names[0]},"
-                 end
-      err_msg << " found a '#{actual.name}'"
-      raise StandardError, err_msg + ' instead.'
+
+      reason = UnexpectedToken.new(aPosition, unexpected, expected)
+      faulty(reason)
     end

     def apply_rule(antecedentEntry, aVertex, anOrigin, aPosition, aRuleId)
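The `done`/`premature_end` additions above encode a simple post-parse classification: if a scan mismatch was found mid-stream, `unexpected_token` has already recorded a reason; otherwise, an unsuccessful parse with no recorded reason can only mean the input ended too early. A plain-Ruby sketch of that fallback logic (not the gem's classes; names are illustrative):

```ruby
# Illustrative outcome object reproducing the GFGParsing#done fallback:
# diagnose "premature end" only when no other reason was recorded.
class ParseOutcome
  attr_reader :failure_reason

  def initialize(accepted)
    @accepted = accepted
  end

  def success?
    @accepted
  end

  def faulty(reason)
    @failure_reason = reason
  end

  # Mirrors GFGParsing#done: called once parsing reached an end.
  def done
    faulty('Premature end of input') unless success? || failure_reason
  end
end

outcome = ParseOutcome.new(false)
outcome.done
puts outcome.failure_reason  # Premature end of input
```

Note that a reason recorded earlier (e.g. by an unexpected-token scan) is never overwritten by `done`.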
data/lib/rley/ptree/token_range.rb CHANGED
@@ -68,11 +68,6 @@ module Rley # This module is used as a namespace
       return "[#{low_text}, #{high_text}]"
     end

-    # Generate a String that represents a value-based identifier
-    def keystr()
-      return "#{low.object_id}-#{high.object_id}"
-    end
-
     private

     def assign_low(aRange)
data/lib/rley/sppf/parse_forest.rb CHANGED
@@ -4,15 +4,13 @@ require_relative 'alternative_node'

 module Rley # This module is used as a namespace
   module SPPF # This module is used as a namespace
-    #
-    # A parse
-    #
-    #
-    #
-    #
-    #
-    # during the parse.
-    # The root node corresponds to the main/start symbol of the grammar.
+    # In an ambiguous grammar there are valid inputs that can result in multiple
+    # parse trees. A set of parse trees is commonly referred to as a parse
+    # forest. More specifically a parse forest is a graph data
+    # structure designed to represent a set of equally syntactically correct
+    # parse trees. Parse forests generated by Rley are so-called Shared Packed
+    # Parse Forests (SPPF). SPPFs allow very compact representation of parse
+    # trees by sharing common sub-trees amongst the parse trees.
     class ParseForest
       # The root node of the forest
       attr_reader(:root)
data/spec/rley/parser/error_reason_spec.rb ADDED
@@ -0,0 +1,120 @@
+require_relative '../../spec_helper'
+require_relative '../../../lib/rley/parser/token'
+
+# Load the class under test
+require_relative '../../../lib/rley/parser/error_reason'
+module Rley # Open this namespace to avoid module qualifier prefixes
+  module Parser # Open this namespace to avoid module qualifier prefixes
+    describe NoInput do
+      context 'Initialization:' do
+        # Default instantiation rule
+        subject { NoInput.new }
+
+        it 'should be created without argument' do
+          expect { NoInput.new }.not_to raise_error
+        end
+
+        it 'should know the error position' do
+          expect(subject.position).to eq(0)
+        end
+      end # context
+
+      context 'Provided services:' do
+        it 'should emit a standard message' do
+          text = 'Input cannot be empty.'
+          expect(subject.to_s).to eq(text)
+          expect(subject.message).to eq(text)
+        end
+
+        it 'should give a clear inspection text' do
+          text = 'Rley::Parser::NoInput: Input cannot be empty.'
+          expect(subject.inspect).to eq(text)
+        end
+      end # context
+    end # describe
+
+    describe ExpectationNotMet do
+      let(:err_token) { double('fake-token') }
+      let(:terminals) do
+        ['PLUS', 'LPAREN'].map { |name| Syntax::Terminal.new(name) }
+      end
+
+      # Default instantiation rule
+      subject { ExpectationNotMet.new(3, err_token, terminals) }
+
+      context 'Initialization:' do
+        it 'should be created with arguments' do
+          expect { ExpectationNotMet.new(3, err_token, terminals) }.not_to raise_error
+        end
+
+        it 'should know the error position' do
+          expect(subject.position).to eq(3)
+        end
+
+        it 'should know the expected terminals' do
+          expect(subject.expected_terminals).to eq(terminals)
+        end
+      end # context
+    end # describe
+
+
+    describe UnexpectedToken do
+      let(:err_lexeme) { '-' }
+      let(:err_terminal) { Syntax::Terminal.new('MINUS') }
+      let(:err_token) { Token.new(err_lexeme, err_terminal) }
+      let(:terminals) do
+        ['PLUS', 'LPAREN'].map { |name| Syntax::Terminal.new(name) }
+      end
+
+      # Default instantiation rule
+      subject { UnexpectedToken.new(3, err_token, terminals) }
+
+      context 'Initialization:' do
+        it 'should be created with arguments' do
+          expect { UnexpectedToken.new(3, err_token, terminals) }.not_to raise_error
+        end
+      end # context
+
+      context 'Provided services:' do
+        it 'should emit a message' do
+          text = <<MSG_END
+Syntax error at or near token 4 >>>-<<<
+Expected one of: ['PLUS', 'LPAREN'], found a 'MINUS' instead.
+MSG_END
+          expect(subject.to_s).to eq(text.chomp)
+          expect(subject.message).to eq(text.chomp)
+        end
+      end # context
+    end # describe
+
+    describe PrematureInputEnd do
+      let(:err_lexeme) { '+' }
+      let(:err_terminal) { Syntax::Terminal.new('PLUS') }
+      let(:err_token) { Token.new(err_lexeme, err_terminal) }
+      let(:terminals) do
+        ['INT', 'LPAREN'].map { |name| Syntax::Terminal.new(name) }
+      end
+
+      # Default instantiation rule
+      subject { PrematureInputEnd.new(3, err_token, terminals) }
+
+      context 'Initialization:' do
+        it 'should be created with arguments' do
+          expect { PrematureInputEnd.new(3, err_token, terminals) }.not_to raise_error
+        end
+      end # context
+
+      context 'Provided services:' do
+        it 'should emit a message' do
+          text = <<MSG_END
+Premature end of input after '+' at position 4
+Expected one of: ['INT', 'LPAREN'].
+MSG_END
+          expect(subject.to_s).to eq(text.chomp)
+          expect(subject.message).to eq(text.chomp)
+        end
+      end # context
+    end # describe
+  end # module
+end # module
+# End of file
data/spec/rley/parser/gfg_chart_spec.rb
CHANGED
@@ -46,17 +46,16 @@ module Rley # Open this namespace to avoid module qualifier prefixes
       # from the abc grammar
       let(:items_from_grammar) { build_items_for_grammar(grammar_abc) }
       let(:sample_gfg) { GFG::GrmFlowGraph.new(items_from_grammar) }
-      let(:sample_tracer) { ParseTracer.new(0, output, token_seq) }
       let(:sample_start_symbol) { sample_gfg.start_vertex.non_terminal }
 
 
       # Default instantiation rule
-      subject { GFGChart.new(count_token, sample_gfg, sample_tracer) }
+      subject { GFGChart.new(count_token, sample_gfg) }
 
 
       context 'Initialization:' do
-        it 'should be created with start vertex, token count, tracer' do
-          expect { GFGChart.new(count_token, sample_gfg, sample_tracer) }
+        it 'should be created with start vertex, token count' do
+          expect { GFGChart.new(count_token, sample_gfg) }
             .not_to raise_error
         end
 
@@ -64,10 +63,6 @@ module Rley # Open this namespace to avoid module qualifier prefixes
           expect(subject.sets.size).to eq(count_token + 1)
         end
 
-        it 'should reference a tracer' do
-          expect(subject.tracer).to eq(sample_tracer)
-        end
-
         it 'should know the start symbol' do
           expect(subject.start_symbol).to eq(sample_start_symbol)
         end
@@ -83,52 +78,6 @@ module Rley # Open this namespace to avoid module qualifier prefixes
         end
 
 
-=end
-      end # context
-
-      context 'Provided services:' do
-=begin
-        let(:t_a) { Syntax::Terminal.new('a') }
-        let(:t_b) { Syntax::Terminal.new('b') }
-        let(:t_c) { Syntax::Terminal.new('c') }
-        let(:nt_sentence) { Syntax::NonTerminal.new('sentence') }
-
-        let(:sample_prod) do
-          Syntax::Production.new(nt_sentence, [t_a, t_b, t_c])
-        end
-
-        let(:origin_val) { 3 }
-        let(:dotted_rule) { DottedItem.new(sample_prod, 2) }
-        let(:complete_rule) { DottedItem.new(sample_prod, 3) }
-        let(:sample_parse_state) { ParseState.new(dotted_rule, origin_val) }
-        let(:sample_tracer) { ParseTracer.new(1, output, token_seq) }
-
-        # Factory method.
-        def parse_state(origin, aDottedRule)
-          ParseState.new(aDottedRule, origin)
-        end
-
-
-        it 'should trace its initialization' do
-          subject[0] # Force constructor call here
-          expectation = <<-SNIPPET
- ['I', 'saw', 'John', 'with', 'a', 'dog']
- |. I . saw . John . with . a . dog .|
- |> . . . . . .| [0:0] sentence => A B . C
-SNIPPET
-          expect(output.string).to eq(expectation)
-        end
-
-        it 'should trace parse state pushing' do
-          subject[0] # Force constructor call here
-          output.string = ''
-
-          subject.push_state(dotted_rule, 3, 5, :prediction)
-          expectation = <<-SNIPPET
- |. . . > .| [3:5] sentence => A B . C
-SNIPPET
-          expect(output.string).to eq(expectation)
-        end
 =end
       end # context
     end # describe
data/spec/rley/parser/gfg_earley_parser_spec.rb
CHANGED
@@ -7,8 +7,11 @@ require_relative '../../../lib/rley/syntax/grammar_builder'
 require_relative '../../../lib/rley/parser/token'
 require_relative '../../../lib/rley/parser/dotted_item'
 require_relative '../../../lib/rley/parser/gfg_parsing'
+
+# Load builders and lexers for sample grammars
 require_relative '../support/grammar_abc_helper'
 require_relative '../support/ambiguous_grammar_helper'
+require_relative '../support/grammar_pb_helper'
 require_relative '../support/grammar_helper'
 require_relative '../support/expectation_helper'
 
@@ -68,10 +71,10 @@ module Rley # Open this namespace to avoid module qualifier prefixes
     # for the language specified by grammar_expr
     def grm2_tokens()
       input_sequence = [
-        { '2' => 'integer' },
-        '+',
+        { '2' => 'integer' },
+        '+',
         { '3' => 'integer' },
-        '*',
+        '*',
         { '4' => 'integer' }
       ]
       return build_token_sequence(input_sequence, grammar_expr)
@@ -178,39 +181,6 @@ module Rley # Open this namespace to avoid module qualifier prefixes
       expect(entry_set_5.entries.size).to eq(4)
       compare_entry_texts(entry_set_5, expected)
     end
-=begin
-    it 'should trace a parse with level 1' do
-      # Substitute temporarily $stdout by a StringIO
-      prev_ostream = $stdout
-      $stdout = StringIO.new('', 'w')
-
-      trace_level = 1
-      subject.parse(grm1_tokens, trace_level)
-      expectations = <<-SNIPPET
- ['a', 'a', 'b', 'c', 'c']
- |. a . a . b . c . c .|
- |> . . . . .| [0:0] S => . A
- |> . . . . .| [0:0] A => . 'a' A 'c'
- |> . . . . .| [0:0] A => . 'b'
- |[---] . . . .| [0:1] A => 'a' . A 'c'
- |. > . . . .| [1:1] A => . 'a' A 'c'
- |. > . . . .| [1:1] A => . 'b'
- |. [---] . . .| [1:2] A => 'a' . A 'c'
- |. . > . . .| [2:2] A => . 'a' A 'c'
- |. . > . . .| [2:2] A => . 'b'
- |. . [---] . .| [2:3] A => 'b' .
- |. [-------> . .| [1:3] A => 'a' A . 'c'
- |. . . [---] .| [3:4] A => 'a' A 'c' .
- |[---------------> .| [0:4] A => 'a' A . 'c'
- |. . . . [---]| [4:5] A => 'a' A 'c' .
- |[===================]| [0:5] S => A .
-SNIPPET
-      expect($stdout.string).to eq(expectations)
-
-      # Restore standard ouput stream
-      $stdout = prev_ostream
-    end
-=end
 
     it 'should parse a valid simple expression' do
       instance = GFGEarleyParser.new(grammar_expr)
@@ -586,40 +556,81 @@ SNIPPET
     it 'should parse an invalid simple input' do
       # Parse an erroneous input (b is missing)
       wrong = build_token_sequence(%w(a a c c), grammar_abc)
-
+      parse_result = subject.parse(wrong)
+      expect(parse_result.success?).to eq(false)
       err_msg = <<-MSG
-Syntax error at or near token 3>>>c
+Syntax error at or near token 3 >>>c<<<
 Expected one of: ['a', 'b'], found a 'c' instead.
 MSG
-
-      expect { subject.parse(wrong) }
-        .to raise_error(err, err_msg.chomp)
+      expect(parse_result.failure_reason.message).to eq(err_msg.chomp)
     end
 
-    it 'should
-
-
-
-
-
-
+    it 'should report error when no input provided but was required' do
+      helper = GrammarPBHelper.new
+      grammar = helper.grammar
+      instance = GFGEarleyParser.new(grammar)
+      tokens = helper.tokenize('')
+      parse_result = instance.parse(tokens)
+      expect(parse_result.success?).to eq(false)
+      err_msg = 'Input cannot be empty.'
+      expect(parse_result.failure_reason.message).to eq(err_msg)
+    end
 
-
-
-
-
-
-
-
-
-
-
-
-      '
-
+    it 'should report error when input ends prematurely' do
+      helper = GrammarPBHelper.new
+      grammar = helper.grammar
+      instance = GFGEarleyParser.new(grammar)
+      tokens = helper.tokenize('1 +')
+      parse_result = instance.parse(tokens)
+      expect(parse_result.success?).to eq(false)
+      ###################### S(0) == . 1 +
+      # Expectation chart[0]:
+      expected = [
+        '.S | 0',                     # initialization
+        'S => . E | 0',               # start rule
+        '.E | 0',                     # call rule
+        'E => . int | 0',             # start rule
+        "E => . '(' E '+' E ')' | 0", # start rule
+        "E => . E '+' E | 0"          # start rule
       ]
-
-
+      compare_entry_texts(parse_result.chart[0], expected)
+
+      ###################### S(1) == 1 . +
+      # Expectation chart[1]:
+      expected = [
+        'E => int . | 0',     # scan '1'
+        'E. | 0',             # exit rule
+        'S => E . | 0',       # end rule
+        "E => E . '+' E | 0", # end rule
+        'S. | 0'              # exit rule
+      ]
+      compare_entry_texts(parse_result.chart[1], expected)
+
+      ###################### S(2) == 1 + .
+      # Expectation chart[2]:
+      expected = [
+        "E => E '+' . E | 0",         # scan '+'
+        '.E | 2',                     # exit rule
+        'E => . int | 2',             # start rule
+        "E => . '(' E '+' E ')' | 2", # start rule
+        "E => . E '+' E | 2"          # start rule
+      ]
+      compare_entry_texts(parse_result.chart[2], expected)
+
+      err_msg = "Premature end of input after '+' at position 2"
+      err_msg << "\nExpected one of: ['int', '(']."
+      expect(parse_result.failure_reason.message).to eq(err_msg)
+    end
+
+
+    it 'should parse a common sample' do
+      # Use grammar based on example found in paper of
+      # K. Pingali and G. Bilardi:
+      # "A Graphical Model for Context-Free Grammar Parsing"
+      helper = GrammarPBHelper.new
+      grammar = helper.grammar
+      instance = GFGEarleyParser.new(grammar)
+      tokens = helper.tokenize('7 + 8 + 9')
       parse_result = instance.parse(tokens)
       expect(parse_result.success?).to eq(true)
       ###################### S(0) == . 7 + 8 + 9
data/spec/rley/parser/gfg_parsing_spec.rb
CHANGED
@@ -53,16 +53,15 @@ module Rley # Open this namespace to avoid module qualifier prefixes
       let(:sample_gfg) { GFG::GrmFlowGraph.new(items_from_grammar) }
 
       let(:output) { StringIO.new('', 'w') }
-      let(:sample_tracer) { ParseTracer.new(0, output, grm1_tokens) }
 
       # Default instantiation rule
       subject do
-        GFGParsing.new(sample_gfg, grm1_tokens, sample_tracer)
+        GFGParsing.new(sample_gfg, grm1_tokens)
       end
 
       context 'Initialization:' do
         it 'should be created with a GFG, tokens, trace' do
-          expect { GFGParsing.new(sample_gfg, grm1_tokens, sample_tracer) }
+          expect { GFGParsing.new(sample_gfg, grm1_tokens) }
             .not_to raise_error
         end
 
data/spec/rley/support/grammar_pb_helper.rb
@@ -0,0 +1,48 @@
+# Load the builder class
+require_relative '../../../lib/rley/syntax/grammar_builder'
+require_relative '../../../lib/rley/parser/token'
+
+
+# Utility class.
+class GrammarPBHelper
+
+  # Factory method. Creates a grammar for a basic arithmetic
+  # expression based on example found in paper of
+  # K. Pingali and G. Bilardi:
+  # "A Graphical Model for Context-Free Grammar Parsing"
+  def grammar()
+    @grammar ||= begin
+      builder = Rley::Syntax::GrammarBuilder.new do
+        t_int = Rley::Syntax::Literal.new('int', /[-+]?\d+/)
+        t_plus = Rley::Syntax::VerbatimSymbol.new('+')
+        t_lparen = Rley::Syntax::VerbatimSymbol.new('(')
+        t_rparen = Rley::Syntax::VerbatimSymbol.new(')')
+        add_terminals(t_int, t_plus, t_lparen, t_rparen)
+        rule 'S' => 'E'
+        rule 'E' => 'int'
+        rule 'E' => %w(( E + E ))
+        rule 'E' => %w(E + E)
+      end
+      builder.grammar
+    end
+  end
+
+  # Basic expression tokenizer
+  def tokenize(aText)
+    tokens = aText.scan(/\S+/).map do |lexeme|
+      case lexeme
+      when '+', '(', ')'
+        terminal = @grammar.name2symbol[lexeme]
+      when /^[-+]?\d+$/
+        terminal = @grammar.name2symbol['int']
+      else
+        msg = "Unknown input text '#{lexeme}'"
+        raise StandardError, msg
+      end
+      Rley::Parser::Token.new(lexeme, terminal)
+    end
+
+    return tokens
+  end
+end # module
+# End of file
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: rley
 version: !ruby/object:Gem::Version
-  version: 0.3.12
+  version: 0.4.00
 platform: ruby
 authors:
 - Dimitri Geshef
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-12-
+date: 2016-12-17 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
@@ -161,6 +161,7 @@ files:
 - lib/rley/parser/chart.rb
 - lib/rley/parser/dotted_item.rb
 - lib/rley/parser/earley_parser.rb
+- lib/rley/parser/error_reason.rb
 - lib/rley/parser/gfg_chart.rb
 - lib/rley/parser/gfg_earley_parser.rb
 - lib/rley/parser/gfg_parsing.rb
@@ -183,6 +184,7 @@ files:
 - lib/rley/ptree/parse_tree_node.rb
 - lib/rley/ptree/terminal_node.rb
 - lib/rley/ptree/token_range.rb
+- lib/rley/rley_error.rb
 - lib/rley/sppf/alternative_node.rb
 - lib/rley/sppf/composite_node.rb
 - lib/rley/sppf/epsilon_node.rb
@@ -220,6 +222,7 @@ files:
 - spec/rley/parser/chart_spec.rb
 - spec/rley/parser/dotted_item_spec.rb
 - spec/rley/parser/earley_parser_spec.rb
+- spec/rley/parser/error_reason_spec.rb
 - spec/rley/parser/gfg_chart_spec.rb
 - spec/rley/parser/gfg_earley_parser_spec.rb
 - spec/rley/parser/gfg_parsing_spec.rb
@@ -250,6 +253,7 @@ files:
 - spec/rley/support/grammar_b_expr_helper.rb
 - spec/rley/support/grammar_helper.rb
 - spec/rley/support/grammar_l0_helper.rb
+- spec/rley/support/grammar_pb_helper.rb
 - spec/rley/support/grammar_sppf_helper.rb
 - spec/rley/syntax/grammar_builder_spec.rb
 - spec/rley/syntax/grammar_spec.rb
@@ -308,6 +312,7 @@ test_files:
 - spec/rley/parser/chart_spec.rb
 - spec/rley/parser/dotted_item_spec.rb
 - spec/rley/parser/earley_parser_spec.rb
+- spec/rley/parser/error_reason_spec.rb
 - spec/rley/parser/gfg_chart_spec.rb
 - spec/rley/parser/gfg_earley_parser_spec.rb
 - spec/rley/parser/gfg_parsing_spec.rb