rley 0.3.12 → 0.4.00
- checksums.yaml +4 -4
- data/CHANGELOG.md +7 -0
- data/README.md +69 -5
- data/examples/NLP/mini_en_demo.rb +5 -1
- data/examples/data_formats/JSON/JSON_demo.rb +1 -0
- data/examples/general/calc/calc_demo.rb +2 -1
- data/lib/rley/constants.rb +1 -1
- data/lib/rley/parser/dotted_item.rb +1 -1
- data/lib/rley/parser/error_reason.rb +106 -0
- data/lib/rley/parser/gfg_chart.rb +1 -24
- data/lib/rley/parser/gfg_earley_parser.rb +28 -57
- data/lib/rley/parser/gfg_parsing.rb +54 -30
- data/lib/rley/ptree/token_range.rb +0 -5
- data/lib/rley/rley_error.rb +10 -0
- data/lib/rley/sppf/parse_forest.rb +7 -9
- data/spec/rley/parser/error_reason_spec.rb +120 -0
- data/spec/rley/parser/gfg_chart_spec.rb +3 -54
- data/spec/rley/parser/gfg_earley_parser_spec.rb +74 -63
- data/spec/rley/parser/gfg_parsing_spec.rb +2 -3
- data/spec/rley/support/grammar_pb_helper.rb +48 -0
- metadata +7 -2
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: eb2c26370206f6c6eca059858ee0c8adedd32810
+  data.tar.gz: 77a42b3da998a2e8b073ec3a811287b71e6b3a3f
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b16495b26269ee208ed3151f820a296d801ed7ca01ea9c98cf29b554da4ceba55719d67a7a7e15dc4fee9b70b54b1f08881ae0dc499b217f47db493b873af4eb
+  data.tar.gz: e463f9697c3cf8b012c8bc8c7736e675d6d355d3f81197bac7fb23529bb0c9e66c791d45ad833f2d6fadeb7eb2adb1a5eed6b3415292bb31fe8a02a43d2fed94
data/CHANGELOG.md CHANGED
@@ -1,3 +1,10 @@
+### 0.4.00 / 2016-12-17
+* [CHANGE] Error reporting is vastly changed. Syntax errors don't raise exceptions.
+  A parse error can be retrieved via an `ErrorReason` object. Such an object is
+  returned by the `GFGParsing#failure_reason` method.
+* [CHANGE] File `README.md` updated to reflect the new error reporting.
+* [CHANGE] Examples updated to reflect the new error reporting.
+
 ### 0.3.12 / 2016-12-08
 * [NEW] Directory `examples\general\calc`. A simple arithmetic expression demo parser.
data/README.md CHANGED
@@ -64,7 +64,7 @@ Installing the latest stable version is simple:
 
 ## A whirlwind tour of Rley
 The purpose of this section is show how to create a parser for a minimalistic
-English language subset.
+English language subset.
 The tour is organized into the following steps:
 1. [Defining the language grammar](#defining-the-language-grammar)
 2. [Creating a lexicon](#creating-a-lexicon)
@@ -73,7 +73,7 @@ The tour is organized into the following steps:
 5. [Parsing some input](#parsing-some-input)
 6. [Generating the parse forest](#generating-the-parse-forest)
 
-The complete source code of the tour can be found in the
+The complete source code of the example used in this tour can be found in the
 [examples](https://github.com/famished-tiger/Rley/tree/master/examples/NLP/mini_en_demo.rb)
 directory
 
@@ -86,7 +86,7 @@ The subset of English grammar is based on an example from the NLTK book.
 # Instantiate a builder object that will build the grammar for us
 builder = Rley::Syntax::GrammarBuilder.new do
   # Terminal symbols (= word categories in lexicon)
-  add_terminals('Noun', 'Proper-Noun', 'Verb')
+  add_terminals('Noun', 'Proper-Noun', 'Verb')
   add_terminals('Determiner', 'Preposition')
 
   # Here we define the productions (= grammar rules)
@@ -97,7 +97,7 @@ The subset of English grammar is based on an example from the NLTK book.
   rule 'VP' => %w[Verb NP]
   rule 'VP' => %w[Verb NP PP]
   rule 'PP' => %w[Preposition NP]
-end
+end
 # And now, let's build the grammar...
 grammar = builder.grammar
 ```
@@ -178,11 +178,75 @@ creating a lexicon and tokenizer from scratch. Here are a few Ruby Part-of-Speech
 pforest = result.parse_forest
 ```
 
+## Error reporting
+__Rley__ is a non-violent parser, that is, it won't throw an exception when it
+detects a syntax error. Instead, the parse result will be marked as
+non-successful. The parse error can then be identified by calling the
+`GFGParsing#failure_reason` method. This method returns an error reason object
+which can help to produce an error message.
+
+Consider the example from the [Parsing some input](#parsing-some-input) section
+above and, as an error, we delete the verb `saw` in the sentence to parse.
+
+```ruby
+# Verb has been removed from the sentence on next line
+input_to_parse = 'John Mary with a telescope'
+# Convert input text into a sequence of token objects...
+tokens = tokenizer(input_to_parse, grammar)
+result = parser.parse(tokens)
+
+puts "Parsing successful? #{result.success?}" # => Parsing successful? false
+exit(1)
+```
+
+As expected, the parse now fails.
+To get an error message, one just needs to retrieve the error reason and
+ask it to generate a message.
+```ruby
+# Show error message if parse fails...
+puts result.failure_reason.message unless result.success?
+```
+
+Re-running the example with the error results in the error message:
+```
+Syntax error at or near token 2 >>>Mary<<<
+Expected one 'Verb', found a 'Proper-Noun' instead.
+```
+
+The standard __Rley__ message not only informs about the location of
+the mistake, it also provides a hint by disclosing its expectations.
+
+Let's experiment again with the original sentence but without the word
+`telescope`.
+
+```ruby
+# Last word has been removed from the sentence on next line
+input_to_parse = 'John saw Mary with a '
+# Convert input text into a sequence of token objects...
+tokens = tokenizer(input_to_parse, grammar)
+result = parser.parse(tokens)
+
+puts "Parsing successful? #{result.success?}" # => Parsing successful? false
+unless result.success?
+  puts result.failure_reason.message
+  exit(1)
+end
+```
+
+This time, the following output is displayed:
+```
+Parsing successful? false
+Premature end of input after 'a' at position 5
+Expected one 'Noun'.
+```
+Again, the resulting error message is user-friendly.
+Remark: currently, Rley reports the error position as the index of the
+input token at which the error was detected.
 
 
 ## Examples
 
-The project source directory contains several example scripts that demonstrate
+The project source directory contains several example scripts that demonstrate
 how grammars are to be constructed and used.
 
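The non-raising contract described in the README section above can be sketched without the gem itself. In the sketch below, `FakeParsing` and `FakeReason` are hypothetical stand-ins (not Rley classes) that expose the same `success?`/`failure_reason` query interface the examples rely on:

```ruby
# FakeReason and FakeParsing are illustrative stubs, NOT part of Rley.
# They mimic the query interface of GFGParsing described above.
FakeReason = Struct.new(:message)

class FakeParsing
  attr_reader :failure_reason

  def initialize(failure_reason = nil)
    @failure_reason = failure_reason
  end

  # A parse is successful exactly when no failure reason was recorded.
  def success?
    @failure_reason.nil?
  end
end

# The reporting idiom used throughout the updated examples.
def report(result)
  return 'Parsing successful? true' if result.success?

  "Parsing successful? false\n" + result.failure_reason.message
end

ok_parse  = FakeParsing.new
bad_parse = FakeParsing.new(FakeReason.new('Syntax error at or near token 2 >>>Mary<<<'))

puts report(ok_parse)  # => Parsing successful? true
puts report(bad_parse)
```

The same check-then-report shape appears in `mini_en_demo.rb` and `calc_demo.rb` later in this diff.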
data/examples/NLP/mini_en_demo.rb CHANGED
@@ -83,7 +83,11 @@ input_to_parse = 'John saw Mary with a telescope'
 tokens = tokenizer(input_to_parse, grammar)
 result = parser.parse(tokens)
 
-puts "Parsing successful? #{result.success?}"
+puts "Parsing successful? #{result.success?}"
+unless result.success?
+  puts result.failure_reason.message
+  exit(1)
+end
 
 ########################################
 # Step 6. Generating the parse forest
data/examples/general/calc/calc_demo.rb CHANGED
@@ -22,7 +22,8 @@ result = parser.parse_expression(ARGV[0])
 
 unless result.success?
   # Stop if the parse failed...
-  puts "Parsing of '#{ARGV[0]}' failed"
+  puts "Parsing of '#{ARGV[0]}' failed"
+  puts "Reason: #{result.failure_reason.message}"
   exit(1)
 end
 
data/lib/rley/parser/dotted_item.rb CHANGED
@@ -115,7 +115,7 @@ module Rley # This module is used as a namespace
 
     private
 
-    # Return the given after its validation.
+    # Return the given position after its validation.
     def valid_position(aPosition)
       rhs_size = production.rhs.size
       if aPosition < 0 || aPosition > rhs_size
data/lib/rley/parser/error_reason.rb ADDED
@@ -0,0 +1,106 @@
+module Rley # Module used as a namespace
+  module Parser # This module is used as a namespace
+    # Abstract class. An instance represents an explanation describing
+    # the likely cause of a parse error
+    # detected by Rley.
+    class ErrorReason
+      # The position of the offending input token
+      attr_reader(:position)
+
+      # The failing production
+      attr_reader(:production)
+
+      def initialize(aPosition)
+        @position = aPosition
+      end
+
+      # Returns the result of invoking reason.to_s.
+      def message()
+        return self.to_s
+      end
+
+      # Return this reason's class name and message
+      def inspect
+        "#{self.class.name}: #{message}"
+      end
+    end # class
+
+
+    # This parse error occurs when no input for parsing was provided
+    # while the grammar requires some non-empty input.
+    class NoInput < ErrorReason
+      def initialize()
+        super(0)
+      end
+
+      # Returns the reason's message.
+      def to_s
+        'Input cannot be empty.'
+      end
+    end # class
+
+    # Abstract class and subclass of ErrorReason.
+    # This specialization represents errors in which the input
+    # didn't match one of the expected tokens.
+    class ExpectationNotMet < ErrorReason
+      # The last input token read when the error was detected
+      attr_reader(:last_token)
+
+      # The terminal symbols expected when the error occurred
+      attr_reader(:expected_terminals)
+
+      def initialize(aPosition, lastToken, expectedTerminals)
+        super(aPosition)
+        @last_token = lastToken.dup
+        @expected_terminals = expectedTerminals.dup
+      end
+
+      protected
+
+      # Emit a text explaining the expected terminal symbols
+      def expectations
+        term_names = expected_terminals.map(&:name)
+        explain = 'Expected one '
+        explain << if expected_terminals.size > 1
+                     "of: ['#{term_names.join("', '")}']"
+                   else
+                     "'#{term_names[0]}'"
+                   end
+        return explain
+      end
+    end # class
+
+
+    # This parse error occurs when the current token from the input
+    # is unexpected according to the grammar rules.
+    class UnexpectedToken < ExpectationNotMet
+      # Returns the reason's message.
+      def to_s
+        err_msg = "Syntax error at or near token #{position + 1} "
+        err_msg << ">>>#{last_token.lexeme}<<<\n"
+        err_msg << expectations
+        err_msg << ", found a '#{last_token.terminal.name}' instead."
+
+        return err_msg
+      end
+    end # class
+
+
+    # This parse error occurs when all input tokens were consumed
+    # but the parser still expected one or more tokens from the input.
+    class PrematureInputEnd < ExpectationNotMet
+      # Returns the reason's message.
+      def to_s
+        err_msg = "Premature end of input after '#{last_token.lexeme}'"
+        err_msg << " at position #{position + 1}\n"
+        err_msg << "#{expectations}."
+
+        return err_msg
+      end
+    end # class
+  end # module
+end # module
+
+# End of file
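The message-formatting logic of the new `ErrorReason` hierarchy can be exercised in isolation. The sketch below condenses the classes from the diff above; `StubTerminal` and `StubToken` are illustrative stand-ins (an assumption) for Rley's `Syntax::Terminal` and `Parser::Token`:

```ruby
# StubTerminal/StubToken are stand-ins for Rley's Terminal and Token classes.
StubTerminal = Struct.new(:name)
StubToken = Struct.new(:lexeme, :terminal)

# Base class: records the position of the offending token.
class ErrorReason
  attr_reader :position

  def initialize(aPosition)
    @position = aPosition
  end

  # As in the real class, message delegates to to_s.
  def message
    to_s
  end
end

# Errors where the input didn't match the expected tokens.
class ExpectationNotMet < ErrorReason
  attr_reader :last_token, :expected_terminals

  def initialize(aPosition, lastToken, expectedTerminals)
    super(aPosition)
    @last_token = lastToken.dup
    @expected_terminals = expectedTerminals.dup
  end

  # Text explaining which terminal symbol(s) were expected.
  def expectations
    term_names = expected_terminals.map(&:name)
    if expected_terminals.size > 1
      "Expected one of: ['#{term_names.join("', '")}']"
    else
      "Expected one '#{term_names[0]}'"
    end
  end
end

# The current token is unexpected according to the grammar rules.
class UnexpectedToken < ExpectationNotMet
  def to_s
    "Syntax error at or near token #{position + 1} " \
    ">>>#{last_token.lexeme}<<<\n" \
    "#{expectations}, found a '#{last_token.terminal.name}' instead."
  end
end

bad_token = StubToken.new('-', StubTerminal.new('MINUS'))
expected  = [StubTerminal.new('PLUS'), StubTerminal.new('LPAREN')]
reason = UnexpectedToken.new(3, bad_token, expected)
puts reason.message
# => Syntax error at or near token 4 >>>-<<<
#    Expected one of: ['PLUS', 'LPAREN'], found a 'MINUS' instead.
```

This is exactly the message the new spec file (further down in this diff) asserts for `UnexpectedToken`.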
data/lib/rley/parser/gfg_chart.rb CHANGED
@@ -12,17 +12,8 @@ module Rley # This module is used as a namespace
     # An array of entry sets (one per input token + 1)
     attr_reader(:sets)
 
-    # The level of trace details reported on stdout during the parse.
-    # The possible values are:
-    # 0: No trace output (default case)
-    # 1: Show trace of scanning and completion rules
-    # 2: Same as of 1 with the addition of the prediction rules
-    attr_reader(:tracer)
-
     # @param tokenCount [Fixnum] The number of lexemes in the input to parse.
-
-    def initialize(tokenCount, aGFGraph, aTracer)
-      @tracer = aTracer
+    def initialize(tokenCount, aGFGraph)
       @sets = Array.new(tokenCount + 1) { |_| ParseEntrySet.new }
       push_entry(aGFGraph.start_vertex, 0, 0, :start_rule)
     end
@@ -53,20 +44,6 @@ module Rley # This module is used as a namespace
     def push_entry(aVertex, anOrigin, anIndex, aReason)
       new_entry = ParseEntry.new(aVertex, anOrigin)
       pushed = self[anIndex].push_entry(new_entry)
-      if pushed == new_entry && tracer.level > 0
-        case aReason
-          when :start_rule, :prediction
-            tracer.trace_prediction(anIndex, new_entry)
-
-          when :scanning
-            tracer.trace_scanning(anIndex, new_entry)
-
-          when :completion
-            tracer.trace_completion(anIndex, new_entry)
-          else
-            raise NotImplementedError, "Unknown push_entry mode #{aReason}"
-        end
-      end
 
       return pushed
     end
data/lib/rley/parser/gfg_earley_parser.rb CHANGED
@@ -17,33 +17,34 @@ module Rley # This module is used as a namespace
     # Parse a sequence of input tokens.
     # @param aTokenSequence [Array] Array of Tokens objects returned by a
     # tokenizer/scanner/lexer.
-    # @param aTraceLevel [Fixnum] The specified trace level.
-    # The possible values are:
-    # 0: No trace output (default case)
-    # 1: Show trace of scanning and completion rules
-    # 2: Same as of 1 with the addition of the prediction rules
     # @return [Parsing] an object that embeds the parse results.
-    def parse(aTokenSequence
-
-      result = GFGParsing.new(gf_graph, aTokenSequence, tracer)
+    def parse(aTokenSequence)
+      result = GFGParsing.new(gf_graph, aTokenSequence)
       last_token_index = aTokenSequence.size
+      if last_token_index == 0 && !grammar.start_symbol.nullable?
+        return unexpected_empty_input(result)
+      end
+
       (0..last_token_index).each do |i|
-        handle_error(result) if result.chart[i].empty?
         result.chart[i].each do |entry|
           # Is entry of the form? [A => alpha . B beta, k]...
           next_symbol = entry.next_symbol
           if next_symbol && next_symbol.kind_of?(Syntax::NonTerminal)
             # ...apply the Call rule
-            call_rule(result, entry, i
+            call_rule(result, entry, i)
           end
 
-          exit_rule(result, entry, i
-          start_rule(result, entry, i
-          end_rule(result, entry, i
+          exit_rule(result, entry, i) if entry.exit_entry?
+          start_rule(result, entry, i) if entry.start_entry?
+          end_rule(result, entry, i) if entry.end_entry?
+        end
+        if i < last_token_index
+          scan_success = scan_rule(result, i)
+          break unless scan_success
         end
-        scan_rule(result, i, tracer) if i < last_token_index
       end
-
+
+      result.done # End of parsing process
       return result
     end
 
@@ -55,10 +56,7 @@ module Rley # This module is used as a namespace
     # Then the entry [.B, i] is added to the current sigma set.
     # Gist: when an entry expects the non-terminal symbol B, then
     # add an entry with start vertex .B
-    def call_rule(aParsing, anEntry, aPosition
-      if aTracer.level > 1
-        puts "Chart[#{aPosition}] Call rule applied upon #{anEntry}:"
-      end
+    def call_rule(aParsing, anEntry, aPosition)
       aParsing.call_rule(anEntry, aPosition)
     end
 
@@ -69,10 +67,7 @@ module Rley # This module is used as a namespace
     # is added to the current sigma set.
     # Gist: for an entry corresponding to a start vertex, add an entry
     # for each entry edge in the graph.
-    def start_rule(aParsing, anEntry, aPosition
-      if aTracer.level > 1
-        puts "Chart[#{aPosition}] Start rule applied upon #{anEntry}:"
-      end
+    def start_rule(aParsing, anEntry, aPosition)
       aParsing.start_rule(anEntry, aPosition)
     end
 
@@ -81,10 +76,7 @@ module Rley # This module is used as a namespace
     # production. Then entry [B., k] is added to the current entry set.
     # Gist: for an entry corresponding to a reduced production, add an entry
     # for each exit edge in the graph.
-    def exit_rule(aParsing, anEntry, aPosition
-      if aTracer.level > 1
-        puts "Chart[#{aPosition}] Exit rule applied upon #{anEntry}:"
-      end
+    def exit_rule(aParsing, anEntry, aPosition)
       aParsing.exit_rule(anEntry, aPosition)
     end
 
@@ -92,10 +84,7 @@ module Rley # This module is used as a namespace
     # is added to a parse entry set with index j.
     # then for every entry of the form [A => α . B γ, i] in the kth sigma set
     # the entry [A => α B . γ, i] is added to the jth sigma set.
-    def end_rule(aParsing, anEntry, aPosition
-      if aTracer.level > 1
-        puts "Chart[#{aPosition}] End rule applied upon #{anEntry}:"
-      end
+    def end_rule(aParsing, anEntry, aPosition)
       aParsing.end_rule(anEntry, aPosition)
     end
 
@@ -105,35 +94,17 @@ module Rley # This module is used as a namespace
     # and allow them to cross the edge, adding the node on the back side
     # of the edge as an entry to the next sigma set:
     # add an entry to the next sigma set [A => α t . γ, i]
-    def scan_rule(aParsing, aPosition
-      if aTracer.level > 1
-        prefix = "Chart[#{aPosition}] Scan rule applied upon "
-        puts prefix + aParsing.tokens[aPosition].to_s
-      end
+    def scan_rule(aParsing, aPosition)
       aParsing.scan_rule(aPosition)
     end
+
+    # Parse error detected: no input tokens provided while the grammar
+    # forbids this.
+    def unexpected_empty_input(aParsing)
+      aParsing.faulty(NoInput.new)
+      return aParsing
+    end
 
-    # Raise an exception to indicate a syntax error.
-    def handle_error(aParsing)
-      # Retrieve the first empty state set
-      pos = aParsing.chart.sets.find_index(&:empty?)
-      lexeme_at_pos = aParsing.tokens[pos - 1].lexeme
-      puts "chart index: #{pos - 1}"
-      terminals = aParsing.chart.sets[pos - 1].expected_terminals
-      puts "count expected terminals #{terminals.size}"
-      entries = aParsing.chart.sets[pos - 1].entries.map(&:to_s).join("\n")
-      puts "Items #{entries}"
-      term_names = terminals.map(&:name)
-      err_msg = "Syntax error at or near token #{pos}"
-      err_msg << ">>>#{lexeme_at_pos}<<<:\nExpected "
-      err_msg << if terminals.size > 1
-                   "one of: ['#{term_names.join("', '")}'],"
-                 else
-                   ": #{term_names[0]},"
-                 end
-      err_msg << " found a '#{aParsing.tokens[pos - 1].terminal.name}'"
-      raise StandardError, err_msg + ' instead.'
-    end
   end # class
 end # module
 end # module
data/lib/rley/parser/gfg_parsing.rb CHANGED
@@ -1,4 +1,5 @@
 require_relative 'gfg_chart'
+require_relative 'error_reason'
 require_relative 'parse_entry_tracker'
 require_relative 'parse_forest_factory'
 
@@ -15,22 +16,21 @@ module Rley # This module is used as a namespace
     # The sequence of input token to parse
     attr_reader(:tokens)
 
-    # A Hash with pairs of the form:
+    # A Hash with pairs of the form:
     # parse entry => [ antecedent parse entries ]
     # It associates to a every parse entry its antecedent(s), that is,
-    # the parse entry/ies that causes the key parse entry to be created
+    # the parse entry/ies that causes the key parse entry to be created
     # with one the gfg rules
     attr_reader(:antecedence)
 
-    #
-
-
-
-
-    def initialize(theGFG, theTokens, aTracer)
+    # The reason of a parse failure
+    attr_reader(:failure_reason)
+
+
+    def initialize(theGFG, theTokens)
       @gf_graph = theGFG
       @tokens = theTokens.dup
-      @chart = GFGChart.new(tokens.size, gf_graph
+      @chart = GFGChart.new(tokens.size, gf_graph)
       @antecedence = Hash.new { |hash, key| hash[key] = [] }
       antecedence[chart[0].first]
     end
@@ -45,7 +45,7 @@ module Rley # This module is used as a namespace
       next_symbol = anEntry.next_symbol
       start_vertex = gf_graph.start_vertex_for[next_symbol]
       pos = aPosition
-      apply_rule(anEntry, start_vertex, pos, pos, :call_rule)
+      apply_rule(anEntry, start_vertex, pos, pos, :call_rule)
     end
 
     # Let the current sigma set be the ith parse entry set.
@@ -65,7 +65,7 @@ module Rley # This module is used as a namespace
     end
 
     # This method must be invoked when an entry is added to a parse entry set
-    # and is of the form [B => γ ., k] (the dot is at the end of the
+    # and is of the form [B => γ ., k] (the dot is at the end of the
     # production. Then entry [B., k] is added to the current entry set.
     # Gist: for an entry corresponding to a reduced production, add an entry
     # for each exit edge in the graph.
@@ -96,11 +96,12 @@ module Rley # This module is used as a namespace
     end
 
     # Given that the terminal t is at the specified position,
-    # Locate all entries in the current sigma set that expect t:
+    # Locate all entries in the current sigma set that expect t:
     # [A => α . t γ, i]
     # and allow them to cross the edge, adding the node on the back side
     # of the edge as an entry to the next sigma set:
     # add an entry to the next sigma set [A => α t . γ, i]
+    # Returns true if the next token matches the expectations, false otherwise.
     def scan_rule(aPosition)
       terminal = tokens[aPosition].terminal
 
@@ -108,7 +109,10 @@ module Rley # This module is used as a namespace
       expecting_term = chart[aPosition].entries4term(terminal)
 
       # ... if the terminal isn't expected then we have an error
-
+      if expecting_term.empty?
+        unexpected_token(aPosition)
+        return false
+      end
 
       expecting_term.each do |ntry|
         # Get the vertices after the expected terminal
@@ -119,6 +123,8 @@ module Rley # This module is used as a namespace
           apply_rule(ntry, vertex_after_terminal, origin, pos, :scan_rule)
         end
       end
+
+      return true
     end
 
 
@@ -136,7 +142,7 @@ module Rley # This module is used as a namespace
     end
 
     # Factory method. Builds a ParseForest from the parse result.
-    # @return [ParseForest]
+    # @return [ParseForest]
     def parse_forest()
       factory = ParseForestFactory.new(self)
 
@@ -148,7 +154,7 @@ module Rley # This module is used as a namespace
     # with origin equal to zero.
     def initial_entry()
       return chart.initial_entry
-    end
+    end
 
     # Retrieve the accepting parse entry that represents
     # a complete, successful parse
@@ -158,25 +164,43 @@ module Rley # This module is used as a namespace
       return chart.accepting_entry
     end
 
+    # Mark the parse as erroneous
+    def faulty(aReason)
+      @failure_reason = aReason
+    end
+
+    # A notification that the parsing reached an end
+    def done
+      unless self.success? || self.failure_reason
+        # Parse not successful and no reason identified
+        # Assuming that parse failed because of a premature end
+        premature_end
+      end
+    end
+
     private
 
-    #
-
-
-
-
+    # Parse error detected: all input tokens were consumed and
+    # the parser didn't detect a syntax error meanwhile but
+    # could not reach the accepting state.
+    def premature_end
+      token_pos = tokens.size # One-based!
+      last_token = tokens[-1]
+      entry_set = chart.sets[tokens.size]
+      expected = entry_set.expected_terminals
+
+      reason = PrematureInputEnd.new(token_pos - 1, last_token, expected)
+      faulty(reason)
+    end
 
+    # Parse error detected: input token doesn't match
+    # the expectations set by grammar rules
+    def unexpected_token(aPosition)
+      unexpected = tokens[aPosition]
       expected = chart.sets[aPosition].expected_terminals
-
-
-
-      err_msg << if expected.size > 1
-                   "one of: ['#{term_names.join("', '")}'],"
-                 else
-                   ": #{term_names[0]},"
-                 end
-      err_msg << " found a '#{actual.name}'"
-      raise StandardError, err_msg + ' instead.'
+
+      reason = UnexpectedToken.new(aPosition, unexpected, expected)
+      faulty(reason)
     end
 
     def apply_rule(antecedentEntry, aVertex, anOrigin, aPosition, aRuleId)
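The premature-end fallback wired in above (scan runs dry, `done` finds neither success nor a recorded reason, so a `PrematureInputEnd` is built from the last token and the still-expected terminals) can be sketched in isolation. `StubTerminal` and `StubToken` below are illustrative stand-ins (an assumption), not Rley classes:

```ruby
# StubTerminal/StubToken are illustrative stand-ins for Rley's classes.
StubTerminal = Struct.new(:name)
StubToken = Struct.new(:lexeme, :terminal)

# Condensed PrematureInputEnd: input ended while more tokens were expected.
class PrematureInputEnd
  attr_reader :position, :last_token, :expected_terminals

  def initialize(aPosition, lastToken, expectedTerminals)
    @position = aPosition
    @last_token = lastToken
    @expected_terminals = expectedTerminals
  end

  def to_s
    names = expected_terminals.map(&:name)
    expectation = if names.size > 1
                    "one of: ['#{names.join("', '")}']"
                  else
                    "one '#{names[0]}'"
                  end
    "Premature end of input after '#{last_token.lexeme}'" \
    " at position #{position + 1}\n" \
    "Expected #{expectation}."
  end
end

# Mirrors GFGParsing#premature_end: position is tokens.size - 1 (zero-based).
tokens   = [StubToken.new('a', StubTerminal.new('Determiner'))]
expected = [StubTerminal.new('Noun')]
reason = PrematureInputEnd.new(tokens.size - 1, tokens[-1], expected)
puts reason
# => Premature end of input after 'a' at position 1
#    Expected one 'Noun'.
```

Note how the one-based position in the message comes from `position + 1`, matching the "position 5" shown for the five-token README example.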
data/lib/rley/ptree/token_range.rb CHANGED
@@ -68,11 +68,6 @@ module Rley # This module is used as a namespace
       return "[#{low_text}, #{high_text}]"
     end
 
-    # Generate a String that represents a value-based identifier
-    def keystr()
-      return "#{low.object_id}-#{high.object_id}"
-    end
-
     private
 
     def assign_low(aRange)
data/lib/rley/sppf/parse_forest.rb CHANGED
@@ -4,15 +4,13 @@ require_relative 'alternative_node'
 
 module Rley # This module is used as a namespace
   module SPPF # This module is used as a namespace
-    #
-    # A parse
-    #
-    #
-    #
-    #
-    # during the parse.
-    # The root node corresponds to the main/start symbol of the grammar.
+    # In an ambiguous grammar there are valid inputs that can result in multiple
+    # parse trees. A set of parse trees is commonly referred to as a parse
+    # forest. More specifically a parse forest is a graph data
+    # structure designed to represent a set of equally syntactically correct
+    # parse trees. Parse forests generated by Rley are so-called Shared Packed
+    # Parse Forests (SPPF). SPPFs allow a very compact representation of parse
+    # trees by sharing common sub-trees amongst the parse trees.
     class ParseForest
       # The root node of the forest
       attr_reader(:root)
@@ -0,0 +1,120 @@
|
|
1
|
+
require_relative '../../spec_helper'
|
2
|
+
require_relative '../../../lib/rley/parser/token'
|
3
|
+
|
4
|
+
# Load the class under test
|
5
|
+
require_relative '../../../lib/rley/parser/error_reason'
|
6
|
+
module Rley # Open this namespace to avoid module qualifier prefixes
|
7
|
+
module Parser # Open this namespace to avoid module qualifier prefixes
|
8
|
+
describe NoInput do
|
9
|
+
context 'Initialization:' do
|
10
|
+
# Default instantiation rule
|
11
|
+
subject { NoInput.new }
|
12
|
+
|
13
|
+
it 'should be created without argument' do
|
14
|
+
expect { NoInput.new }.not_to raise_error
|
15
|
+
end
|
16
|
+
|
17
|
+
it 'should know the error position' do
|
18
|
+
expect(subject.position).to eq(0)
|
19
|
+
end
|
20
|
+
end # context
|
21
|
+
|
22
|
+
context 'Provided services:' do
|
23
|
+
it 'should emit a standard message' do
|
24
|
+
text = 'Input cannot be empty.'
|
25
|
+
expect(subject.to_s).to eq(text)
|
26
|
+
expect(subject.message).to eq(text)
|
27
|
+
end
|
28
|
+
|
29
|
+
it 'should give a clear inspection text' do
|
30
|
+
text = 'Rley::Parser::NoInput: Input cannot be empty.'
|
31
|
+
expect(subject.inspect).to eq(text)
|
32
|
+
end
|
33
|
+
end # context
|
34
|
+
end # describe
|
35
|
+
|
36
|
+
describe ExpectationNotMet do
|
37
|
+
let(:err_token) { double('fake-token') }
|
38
|
+
let(:terminals) do
|
39
|
+
['PLUS', 'LPAREN'].map { |name| Syntax::Terminal.new(name) }
|
40
|
+
end
|
41
|
+
|
42
|
+
# Default instantiation rule
|
43
|
+
subject { ExpectationNotMet.new(3, err_token, terminals) }
|
44
|
+
|
45
|
+
context 'Initialization:' do
|
46
|
+
it 'should be created with arguments' do
|
47
|
+
expect { ExpectationNotMet.new(3, err_token, terminals) }.not_to raise_error
|
48
|
+
end
|
49
|
+
|
50
|
+
it 'should know the error position' do
|
51
|
+
expect(subject.position).to eq(3)
|
52
|
+
end
|
53
|
+
|
54
|
+
it 'should know the expected terminals' do
|
55
|
+
expect(subject.expected_terminals).to eq(terminals)
|
56
|
+
end
|
57
|
+
end # context
|
58
|
+
end # describe
|
59
|
+
|
60
|
+
|
61
|
+
describe UnexpectedToken do
|
62
|
+
let(:err_lexeme) { '-'}
|
63
|
+
+    let(:err_terminal) { Syntax::Terminal.new('MINUS') }
+    let(:err_token) { Token.new(err_lexeme, err_terminal) }
+    let(:terminals) do
+      ['PLUS', 'LPAREN'].map { |name| Syntax::Terminal.new(name) }
+    end
+
+    # Default instantiation rule
+    subject { UnexpectedToken.new(3, err_token, terminals) }
+
+    context 'Initialization:' do
+      it 'should be created with arguments' do
+        expect { UnexpectedToken.new(3, err_token, terminals) }.not_to raise_error
+      end
+    end # context
+
+    context 'Provided services:' do
+      it 'should emit a message' do
+        text = <<MSG_END
+Syntax error at or near token 4 >>>-<<<
+Expected one of: ['PLUS', 'LPAREN'], found a 'MINUS' instead.
+MSG_END
+        expect(subject.to_s).to eq(text.chomp)
+        expect(subject.message).to eq(text.chomp)
+      end
+    end # context
+  end #describe
+
+  describe PrematureInputEnd do
+    let(:err_lexeme) { '+'}
+    let(:err_terminal) { Syntax::Terminal.new('PLUS') }
+    let(:err_token) { Token.new(err_lexeme, err_terminal) }
+    let(:terminals) do
+      ['INT', 'LPAREN'].map { |name| Syntax::Terminal.new(name) }
+    end
+
+    # Default instantiation rule
+    subject { PrematureInputEnd.new(3, err_token, terminals) }
+
+    context 'Initialization:' do
+      it 'should be created with arguments' do
+        expect { PrematureInputEnd.new(3, err_token, terminals) }.not_to raise_error
+      end
+    end # context
+
+    context 'Provided services:' do
+      it 'should emit a message' do
+        text = <<MSG_END
+Premature end of input after '+' at position 4
+Expected one of: ['INT', 'LPAREN'].
+MSG_END
+        expect(subject.to_s).to eq(text.chomp)
+        expect(subject.message).to eq(text.chomp)
+      end
+    end # context
+  end # describe
+end # module
+end # module
+# End of file
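The two specs above pin down the exact message formats of the new error-reason classes. The following is a minimal standalone sketch of those formats, not Rley's actual implementation; `UnexpectedTokenSketch` and `premature_end_message` are hypothetical names, and only the output strings are taken from the spec expectations:

```ruby
# Sketch of the message formats asserted in error_reason_spec.rb.
# Hypothetical stand-ins for Rley's UnexpectedToken / PrematureInputEnd.
class UnexpectedTokenSketch
  def initialize(rank, lexeme, found, expected)
    @rank = rank          # zero-based index of the offending token
    @lexeme = lexeme      # its text, e.g. '-'
    @found = found        # its terminal name, e.g. 'MINUS'
    @expected = expected  # terminal names that were legal at that point
  end

  def message
    list = @expected.map { |name| "'#{name}'" }.join(', ')
    "Syntax error at or near token #{@rank + 1} >>>#{@lexeme}<<<\n" \
    "Expected one of: [#{list}], found a '#{@found}' instead."
  end
end

def premature_end_message(rank, lexeme, expected)
  list = expected.map { |name| "'#{name}'" }.join(', ')
  "Premature end of input after '#{lexeme}' at position #{rank + 1}\n" \
  "Expected one of: [#{list}]."
end

puts UnexpectedTokenSketch.new(3, '-', 'MINUS', %w[PLUS LPAREN]).message
# Syntax error at or near token 4 >>>-<<<
# Expected one of: ['PLUS', 'LPAREN'], found a 'MINUS' instead.
```

Note that both messages report a one-based position (`rank + 1`), matching the `token 4` / `position 4` wording the specs expect for rank 3.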
data/spec/rley/parser/gfg_chart_spec.rb
CHANGED
@@ -46,17 +46,16 @@ module Rley # Open this namespace to avoid module qualifier prefixes
       # from the abc grammar
       let(:items_from_grammar) { build_items_for_grammar(grammar_abc) }
       let(:sample_gfg) { GFG::GrmFlowGraph.new(items_from_grammar) }
-      let(:sample_tracer) { ParseTracer.new(0, output, token_seq) }
       let(:sample_start_symbol) { sample_gfg.start_vertex.non_terminal }


       # Default instantiation rule
-      subject { GFGChart.new(count_token, sample_gfg
+      subject { GFGChart.new(count_token, sample_gfg) }


       context 'Initialization:' do
-        it 'should be created with start vertex, token count
-          expect { GFGChart.new(count_token, sample_gfg
+        it 'should be created with start vertex, token count' do
+          expect { GFGChart.new(count_token, sample_gfg) }
             .not_to raise_error
         end

@@ -64,10 +63,6 @@ module Rley # Open this namespace to avoid module qualifier prefixes
           expect(subject.sets.size).to eq(count_token + 1)
         end

-        it 'should reference a tracer' do
-          expect(subject.tracer).to eq(sample_tracer)
-        end
-
         it 'should know the start symbol' do
           expect(subject.start_symbol).to eq(sample_start_symbol)
         end
@@ -83,52 +78,6 @@ module Rley # Open this namespace to avoid module qualifier prefixes
         end


-=end
-      end # context
-
-      context 'Provided services:' do
-=begin
-        let(:t_a) { Syntax::Terminal.new('a') }
-        let(:t_b) { Syntax::Terminal.new('b') }
-        let(:t_c) { Syntax::Terminal.new('c') }
-        let(:nt_sentence) { Syntax::NonTerminal.new('sentence') }
-
-        let(:sample_prod) do
-          Syntax::Production.new(nt_sentence, [t_a, t_b, t_c])
-        end
-
-        let(:origin_val) { 3 }
-        let(:dotted_rule) { DottedItem.new(sample_prod, 2) }
-        let(:complete_rule) { DottedItem.new(sample_prod, 3) }
-        let(:sample_parse_state) { ParseState.new(dotted_rule, origin_val) }
-        let(:sample_tracer) { ParseTracer.new(1, output, token_seq) }
-
-        # Factory method.
-        def parse_state(origin, aDottedRule)
-          ParseState.new(aDottedRule, origin)
-        end
-
-
-        it 'should trace its initialization' do
-          subject[0] # Force constructor call here
-          expectation = <<-SNIPPET
-['I', 'saw', 'John', 'with', 'a', 'dog']
-|. I . saw . John . with . a . dog .|
-|> . . . . . .| [0:0] sentence => A B . C
-SNIPPET
-          expect(output.string).to eq(expectation)
-        end
-
-        it 'should trace parse state pushing' do
-          subject[0] # Force constructor call here
-          output.string = ''
-
-          subject.push_state(dotted_rule, 3, 5, :prediction)
-          expectation = <<-SNIPPET
-|. . . > .| [3:5] sentence => A B . C
-SNIPPET
-          expect(output.string).to eq(expectation)
-        end
=end
       end # context
     end # describe
data/spec/rley/parser/gfg_earley_parser_spec.rb
CHANGED
@@ -7,8 +7,11 @@ require_relative '../../../lib/rley/syntax/grammar_builder'
 require_relative '../../../lib/rley/parser/token'
 require_relative '../../../lib/rley/parser/dotted_item'
 require_relative '../../../lib/rley/parser/gfg_parsing'
+
+# Load builders and lexers for sample grammars
 require_relative '../support/grammar_abc_helper'
 require_relative '../support/ambiguous_grammar_helper'
+require_relative '../support/grammar_pb_helper'
 require_relative '../support/grammar_helper'
 require_relative '../support/expectation_helper'

@@ -68,10 +71,10 @@ module Rley # Open this namespace to avoid module qualifier prefixes
       # for the language specified by grammar_expr
       def grm2_tokens()
         input_sequence = [
-          { '2' => 'integer' },
-          '+',
+          { '2' => 'integer' },
+          '+',
           { '3' => 'integer' },
-          '*',
+          '*',
           { '4' => 'integer' }
         ]
         return build_token_sequence(input_sequence, grammar_expr)
@@ -178,39 +181,6 @@ module Rley # Open this namespace to avoid module qualifier prefixes
         expect(entry_set_5.entries.size).to eq(4)
         compare_entry_texts(entry_set_5, expected)
       end
-=begin
-      it 'should trace a parse with level 1' do
-        # Substitute temporarily $stdout by a StringIO
-        prev_ostream = $stdout
-        $stdout = StringIO.new('', 'w')
-
-        trace_level = 1
-        subject.parse(grm1_tokens, trace_level)
-        expectations = <<-SNIPPET
-['a', 'a', 'b', 'c', 'c']
-|. a . a . b . c . c .|
-|> . . . . .| [0:0] S => . A
-|> . . . . .| [0:0] A => . 'a' A 'c'
-|> . . . . .| [0:0] A => . 'b'
-|[---] . . . .| [0:1] A => 'a' . A 'c'
-|. > . . . .| [1:1] A => . 'a' A 'c'
-|. > . . . .| [1:1] A => . 'b'
-|. [---] . . .| [1:2] A => 'a' . A 'c'
-|. . > . . .| [2:2] A => . 'a' A 'c'
-|. . > . . .| [2:2] A => . 'b'
-|. . [---] . .| [2:3] A => 'b' .
-|. [-------> . .| [1:3] A => 'a' A . 'c'
-|. . . [---] .| [3:4] A => 'a' A 'c' .
-|[---------------> .| [0:4] A => 'a' A . 'c'
-|. . . . [---]| [4:5] A => 'a' A 'c' .
-|[===================]| [0:5] S => A .
-SNIPPET
-        expect($stdout.string).to eq(expectations)
-
-        # Restore standard ouput stream
-        $stdout = prev_ostream
-      end
-=end

       it 'should parse a valid simple expression' do
         instance = GFGEarleyParser.new(grammar_expr)
@@ -586,40 +556,81 @@ SNIPPET
       it 'should parse an invalid simple input' do
         # Parse an erroneous input (b is missing)
         wrong = build_token_sequence(%w(a a c c), grammar_abc)
-
+        parse_result = subject.parse(wrong)
+        expect(parse_result.success?).to eq(false)
         err_msg = <<-MSG
-Syntax error at or near token 3>>>c
+Syntax error at or near token 3 >>>c<<<
 Expected one of: ['a', 'b'], found a 'c' instead.
 MSG
-
-        expect { subject.parse(wrong) }
-          .to raise_error(err, err_msg.chomp)
+        expect(parse_result.failure_reason.message).to eq(err_msg.chomp)
       end

-      it 'should
-
-
-
-
-
+      it 'should report error when no input provided but was required' do
+        helper = GrammarPBHelper.new
+        grammar = helper.grammar
+        instance = GFGEarleyParser.new(grammar)
+        tokens = helper.tokenize('')
+        parse_result = instance.parse(tokens)
+        expect(parse_result.success?).to eq(false)
+        err_msg = 'Input cannot be empty.'
+        expect(parse_result.failure_reason.message).to eq(err_msg)
+      end

-
-
-
-
-
-
-
-
-
-
-
-      '
-
+      it 'should report error when input ends prematurely' do
+        helper = GrammarPBHelper.new
+        grammar = helper.grammar
+        instance = GFGEarleyParser.new(grammar)
+        tokens = helper.tokenize('1 +')
+        parse_result = instance.parse(tokens)
+        expect(parse_result.success?).to eq(false)
+        ###################### S(0) == . 1 +
+        # Expectation chart[0]:
+        expected = [
+          '.S | 0',                     # initialization
+          'S => . E | 0',               # start rule
+          '.E | 0',                     # call rule
+          'E => . int | 0',             # start rule
+          "E => . '(' E '+' E ')' | 0", # start rule
+          "E => . E '+' E | 0"          # start rule
         ]
-
-
+        compare_entry_texts(parse_result.chart[0], expected)
+
+        ###################### S(1) == 1 . +
+        # Expectation chart[1]:
+        expected = [
+          'E => int . | 0',     # scan '1'
+          'E. | 0',             # exit rule
+          'S => E . | 0',       # end rule
+          "E => E . '+' E | 0", # end rule
+          'S. | 0'              # exit rule
+        ]
+        compare_entry_texts(parse_result.chart[1], expected)
+
+        ###################### S(2) == 1 + .
+        # Expectation chart[2]:
+        expected = [
+          "E => E '+' . E | 0",         # scan '+'
+          '.E | 2',                     # exit rule
+          'E => . int | 2',             # start rule
+          "E => . '(' E '+' E ')' | 2", # start rule
+          "E => . E '+' E | 2"          # start rule
+        ]
+        compare_entry_texts(parse_result.chart[2], expected)
+
+        err_msg = "Premature end of input after '+' at position 2"
+        err_msg << "\nExpected one of: ['int', '(']."
+        expect(parse_result.failure_reason.message).to eq(err_msg)
+      end
+
+
+      it 'should parse a common sample' do
+        # Use grammar based on example found in paper of
+        # K. Pingali and G. Bilardi:
+        # "A Graphical Model for Context-Free Grammar Parsing"
+        helper = GrammarPBHelper.new
+        grammar = helper.grammar
+        instance = GFGEarleyParser.new(grammar)
+        tokens = helper.tokenize('7 + 8 + 9')
         parse_result = instance.parse(tokens)
         expect(parse_result.success?).to eq(true)
         ###################### S(0) == . 7 + 8 + 9
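The `expected` arrays in the specs above assert chart entries in Rley's textual notation: a dotted production plus the index of its origin set (e.g. `'E => int . | 0'`), alongside call/exit entries such as `'.E | 2'` and `'S. | 0'`. As a hypothetical illustration of the dotted-rule part only (this `DottedItem` struct is not Rley's class), such entry texts can be rendered like this:

```ruby
# Hypothetical sketch: render an Earley/GFG dotted rule as "lhs => alpha . beta | origin".
# `dot` is the number of right-hand-side symbols already recognized;
# `origin` is the chart set where recognition of this rule began.
DottedItem = Struct.new(:lhs, :rhs, :dot, :origin) do
  def to_s
    symbols = rhs.dup.insert(dot, '.')  # place the dot among the RHS symbols
    "#{lhs} => #{symbols.join(' ')} | #{origin}"
  end
end

puts DottedItem.new('E', %w[int], 1, 0)
# E => int . | 0
puts DottedItem.new('E', ['E', "'+'", 'E'], 1, 0)
# E => E . '+' E | 0
```

This matches the entries asserted for chart set S(1) after scanning `'1'` in the premature-end spec.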
data/spec/rley/parser/gfg_parsing_spec.rb
CHANGED
@@ -53,16 +53,15 @@ module Rley # Open this namespace to avoid module qualifier prefixes
       let(:sample_gfg) { GFG::GrmFlowGraph.new(items_from_grammar) }

       let(:output) { StringIO.new('', 'w') }
-      let(:sample_tracer) { ParseTracer.new(0, output, grm1_tokens) }

       # Default instantiation rule
       subject do
-        GFGParsing.new(sample_gfg, grm1_tokens
+        GFGParsing.new(sample_gfg, grm1_tokens)
       end

       context 'Initialization:' do
         it 'should be created with a GFG, tokens, trace' do
-          expect { GFGParsing.new(sample_gfg, grm1_tokens
+          expect { GFGParsing.new(sample_gfg, grm1_tokens) }
             .not_to raise_error
         end

data/spec/rley/support/grammar_pb_helper.rb
ADDED
@@ -0,0 +1,48 @@
+# Load the builder class
+require_relative '../../../lib/rley/syntax/grammar_builder'
+require_relative '../../../lib/rley/parser/token'
+
+
+# Utility class.
+class GrammarPBHelper
+
+  # Factory method. Creates a grammar for a basic arithmetic
+  # expression based on example found in paper of
+  # K. Pingali and G. Bilardi:
+  # "A Graphical Model for Context-Free Grammar Parsing"
+  def grammar()
+    @grammar ||= begin
+      builder = Rley::Syntax::GrammarBuilder.new do
+        t_int = Rley::Syntax::Literal.new('int', /[-+]?\d+/)
+        t_plus = Rley::Syntax::VerbatimSymbol.new('+')
+        t_lparen = Rley::Syntax::VerbatimSymbol.new('(')
+        t_rparen = Rley::Syntax::VerbatimSymbol.new(')')
+        add_terminals(t_int, t_plus, t_lparen, t_rparen)
+        rule 'S' => 'E'
+        rule 'E' => 'int'
+        rule 'E' => %w(( E + E ))
+        rule 'E' => %w(E + E)
+      end
+      builder.grammar
+    end
+  end
+
+  # Basic expression tokenizer
+  def tokenize(aText)
+    tokens = aText.scan(/\S+/).map do |lexeme|
+      case lexeme
+      when '+', '(', ')'
+        terminal = @grammar.name2symbol[lexeme]
+      when /^[-+]?\d+$/
+        terminal = @grammar.name2symbol['int']
+      else
+        msg = "Unknown input text '#{lexeme}'"
+        raise StandardError, msg
+      end
+      Rley::Parser::Token.new(lexeme, terminal)
+    end
+
+    return tokens
+  end
+end # module
+# End of file
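The tokenizer in the new helper file splits on whitespace and classifies each lexeme. The following is a standalone sketch of that logic; the real code maps lexemes to grammar symbol objects via `@grammar.name2symbol`, while here plain name strings stand in for those objects (`classify` is a hypothetical name):

```ruby
# Standalone sketch of GrammarPBHelper#tokenize's classification step.
def classify(text)
  text.scan(/\S+/).map do |lexeme|          # whitespace-delimited lexemes
    terminal =
      case lexeme
      when '+', '(', ')' then lexeme        # verbatim symbols keep their own name
      when /\A[-+]?\d+\z/ then 'int'        # signed integer literal
      else raise StandardError, "Unknown input text '#{lexeme}'"
      end
    [lexeme, terminal]                      # real code builds Rley::Parser::Token here
  end
end

p classify('7 + 8 + 9')
# [["7", "int"], ["+", "+"], ["8", "int"], ["+", "+"], ["9", "int"]]
```

Note the whitespace requirement: `'1+2'` would be rejected as a single unknown lexeme, which is why the specs write inputs as `'1 +'` and `'7 + 8 + 9'`.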
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: rley
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.4.00
 platform: ruby
 authors:
 - Dimitri Geshef
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-12-
+date: 2016-12-17 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
@@ -161,6 +161,7 @@ files:
 - lib/rley/parser/chart.rb
 - lib/rley/parser/dotted_item.rb
 - lib/rley/parser/earley_parser.rb
+- lib/rley/parser/error_reason.rb
 - lib/rley/parser/gfg_chart.rb
 - lib/rley/parser/gfg_earley_parser.rb
 - lib/rley/parser/gfg_parsing.rb
@@ -183,6 +184,7 @@ files:
 - lib/rley/ptree/parse_tree_node.rb
 - lib/rley/ptree/terminal_node.rb
 - lib/rley/ptree/token_range.rb
+- lib/rley/rley_error.rb
 - lib/rley/sppf/alternative_node.rb
 - lib/rley/sppf/composite_node.rb
 - lib/rley/sppf/epsilon_node.rb
@@ -220,6 +222,7 @@ files:
 - spec/rley/parser/chart_spec.rb
 - spec/rley/parser/dotted_item_spec.rb
 - spec/rley/parser/earley_parser_spec.rb
+- spec/rley/parser/error_reason_spec.rb
 - spec/rley/parser/gfg_chart_spec.rb
 - spec/rley/parser/gfg_earley_parser_spec.rb
 - spec/rley/parser/gfg_parsing_spec.rb
@@ -250,6 +253,7 @@ files:
 - spec/rley/support/grammar_b_expr_helper.rb
 - spec/rley/support/grammar_helper.rb
 - spec/rley/support/grammar_l0_helper.rb
+- spec/rley/support/grammar_pb_helper.rb
 - spec/rley/support/grammar_sppf_helper.rb
 - spec/rley/syntax/grammar_builder_spec.rb
 - spec/rley/syntax/grammar_spec.rb
@@ -308,6 +312,7 @@ test_files:
 - spec/rley/parser/chart_spec.rb
 - spec/rley/parser/dotted_item_spec.rb
 - spec/rley/parser/earley_parser_spec.rb
+- spec/rley/parser/error_reason_spec.rb
 - spec/rley/parser/gfg_chart_spec.rb
 - spec/rley/parser/gfg_earley_parser_spec.rb
 - spec/rley/parser/gfg_parsing_spec.rb