lrama 0.6.10 → 0.7.0
- checksums.yaml +4 -4
- data/.github/workflows/gh-pages.yml +46 -0
- data/.github/workflows/test.yaml +40 -8
- data/.gitignore +1 -0
- data/.rdoc_options +2 -0
- data/Gemfile +4 -2
- data/NEWS.md +125 -30
- data/README.md +44 -15
- data/Rakefile +13 -1
- data/Steepfile +5 -0
- data/doc/Index.md +58 -0
- data/doc/development/compressed_state_table/main.md +635 -0
- data/doc/development/compressed_state_table/parse.output +174 -0
- data/doc/development/compressed_state_table/parse.y +22 -0
- data/doc/development/compressed_state_table/parser.rb +282 -0
- data/lib/lrama/bitmap.rb +4 -1
- data/lib/lrama/command.rb +2 -1
- data/lib/lrama/context.rb +3 -3
- data/lib/lrama/counterexamples/derivation.rb +6 -5
- data/lib/lrama/counterexamples/example.rb +7 -4
- data/lib/lrama/counterexamples/path.rb +4 -0
- data/lib/lrama/counterexamples.rb +19 -9
- data/lib/lrama/digraph.rb +30 -0
- data/lib/lrama/grammar/binding.rb +47 -15
- data/lib/lrama/grammar/parameterizing_rule/rhs.rb +1 -1
- data/lib/lrama/grammar/rule.rb +8 -0
- data/lib/lrama/grammar/rule_builder.rb +4 -16
- data/lib/lrama/grammar/symbols/resolver.rb +4 -0
- data/lib/lrama/grammar.rb +10 -5
- data/lib/lrama/lexer/grammar_file.rb +8 -1
- data/lib/lrama/lexer/location.rb +17 -1
- data/lib/lrama/lexer/token/char.rb +1 -0
- data/lib/lrama/lexer/token/ident.rb +1 -0
- data/lib/lrama/lexer/token/instantiate_rule.rb +6 -1
- data/lib/lrama/lexer/token/tag.rb +3 -1
- data/lib/lrama/lexer/token/user_code.rb +6 -2
- data/lib/lrama/lexer/token.rb +14 -2
- data/lib/lrama/lexer.rb +5 -5
- data/lib/lrama/logger.rb +4 -0
- data/lib/lrama/option_parser.rb +10 -8
- data/lib/lrama/options.rb +2 -1
- data/lib/lrama/parser.rb +529 -490
- data/lib/lrama/state/reduce.rb +2 -3
- data/lib/lrama/state.rb +288 -1
- data/lib/lrama/states/item.rb +8 -0
- data/lib/lrama/states.rb +69 -2
- data/lib/lrama/trace_reporter.rb +17 -2
- data/lib/lrama/version.rb +1 -1
- data/lrama.gemspec +1 -1
- data/parser.y +42 -30
- data/rbs_collection.lock.yaml +10 -2
- data/sig/generated/lrama/bitmap.rbs +11 -0
- data/sig/generated/lrama/digraph.rbs +39 -0
- data/sig/generated/lrama/grammar/binding.rbs +34 -0
- data/sig/generated/lrama/lexer/grammar_file.rbs +28 -0
- data/sig/generated/lrama/lexer/location.rbs +52 -0
- data/sig/{lrama → generated/lrama}/lexer/token/char.rbs +2 -0
- data/sig/{lrama → generated/lrama}/lexer/token/ident.rbs +2 -0
- data/sig/{lrama → generated/lrama}/lexer/token/instantiate_rule.rbs +8 -0
- data/sig/{lrama → generated/lrama}/lexer/token/tag.rbs +3 -0
- data/sig/{lrama → generated/lrama}/lexer/token/user_code.rbs +6 -1
- data/sig/{lrama → generated/lrama}/lexer/token.rbs +26 -3
- data/sig/generated/lrama/logger.rbs +14 -0
- data/sig/generated/lrama/trace_reporter.rbs +25 -0
- data/sig/lrama/counterexamples/derivation.rbs +33 -0
- data/sig/lrama/counterexamples/example.rbs +45 -0
- data/sig/lrama/counterexamples/path.rbs +21 -0
- data/sig/lrama/counterexamples/production_path.rbs +11 -0
- data/sig/lrama/counterexamples/start_path.rbs +13 -0
- data/sig/lrama/counterexamples/state_item.rbs +10 -0
- data/sig/lrama/counterexamples/transition_path.rbs +11 -0
- data/sig/lrama/counterexamples/triple.rbs +20 -0
- data/sig/lrama/counterexamples.rbs +29 -0
- data/sig/lrama/grammar/rule_builder.rbs +0 -1
- data/sig/lrama/grammar/symbol.rbs +1 -1
- data/sig/lrama/grammar/symbols/resolver.rbs +3 -3
- data/sig/lrama/grammar.rbs +13 -0
- data/sig/lrama/options.rbs +1 -0
- data/sig/lrama/state/reduce_reduce_conflict.rbs +2 -2
- data/sig/lrama/state.rbs +79 -0
- data/sig/lrama/states.rbs +101 -0
- metadata +34 -14
- data/sig/lrama/bitmap.rbs +0 -7
- data/sig/lrama/digraph.rbs +0 -23
- data/sig/lrama/grammar/binding.rbs +0 -19
- data/sig/lrama/lexer/grammar_file.rbs +0 -17
- data/sig/lrama/lexer/location.rbs +0 -26
data/doc/Index.md
ADDED
@@ -0,0 +1,58 @@
|
|
1
|
+
# Lrama
|
2
|
+
|
3
|
+
[](https://badge.fury.io/rb/lrama)
|
4
|
+
[](https://github.com/ruby/lrama/actions/workflows/test.yaml)
|
5
|
+
|
6
|
+
|
7
|
+
## Overview
|
8
|
+
|
9
|
+
Lrama is an LALR(1) parser generator written in Ruby. The first goal of this project is to provide an error-tolerant parser for CRuby with minimal changes to CRuby's parse.y file.
|
10
|
+
|
11
|
+
## Installation
|
12
|
+
|
13
|
+
Lrama's installation is simple. You can install it via RubyGems.
|
14
|
+
|
15
|
+
```shell
|
16
|
+
$ gem install lrama
|
17
|
+
```
|
18
|
+
|
19
|
+
From source, you can install it as follows:
|
20
|
+
|
21
|
+
```shell
|
22
|
+
$ cd "$(lrama root)"
|
23
|
+
$ bundle install
|
24
|
+
$ bundle exec rake install
|
25
|
+
$ bundle exec lrama --version
|
26
|
+
lrama 0.7.0
|
27
|
+
```
|
28
|
+
## Usage
|
29
|
+
|
30
|
+
Lrama is a command line tool. You can generate a parser from a grammar file by running the `lrama` command.
|
31
|
+
|
32
|
+
```shell
|
33
|
+
# "y.tab.c" and "y.tab.h" are generated
|
34
|
+
$ lrama -d sample/parse.y
|
35
|
+
```
|
36
|
+
Specify the output file with the `-o` option. The following example generates "calc.c" and "calc.h".
|
37
|
+
|
38
|
+
```shell
|
39
|
+
# "calc", "calc.c", and "calc.h" are generated
|
40
|
+
$ lrama -d sample/calc.y -o calc.c && gcc -Wall calc.c -o calc && ./calc
|
41
|
+
Enter the formula:
|
42
|
+
1
|
43
|
+
=> 1
|
44
|
+
1+2*3
|
45
|
+
=> 7
|
46
|
+
(1+2)*3
|
47
|
+
=> 9
|
48
|
+
```
|
49
|
+
|
50
|
+
## Supported Ruby version
|
51
|
+
|
52
|
+
Lrama is executed with BASERUBY when building Ruby from source code. Therefore Lrama needs to support the BASERUBY version, currently 2.5, and later versions.
|
53
|
+
|
54
|
+
This also requires Lrama to be able to run with only default gems because BASERUBY runs with the `--disable=gems` option.
|
55
|
+
|
56
|
+
## License
|
57
|
+
|
58
|
+
See the [LEGAL.md](https://github.com/ruby/lrama/blob/master/LEGAL.md) file.
|
data/doc/development/compressed_state_table/main.md
ADDED
@@ -0,0 +1,635 @@
|
|
1
|
+
# Compressed State Table
|
2
|
+
|
3
|
+
An LR parser generator produces two large tables: the action table and the GOTO table.
|
4
|
+
The action table is a matrix of states and tokens. Each cell of the action table indicates the next action (shift, reduce, accept or error).
|
5
|
+
The GOTO table is a matrix of states and nonterminal symbols. Each cell of the GOTO table indicates the next state.
|
6
|
+
|
7
|
+
Action table of "parse.y":
|
8
|
+
|
9
|
+
| |EOF| LF|NUM|'+'|'*'|'('|')'|
|
10
|
+
|--------|--:|--:|--:|--:|--:|--:|--:|
|
11
|
+
|State 0| r1| | s1| | | s2| |
|
12
|
+
|State 1| r3| r3| r3| r3| r3| r3| r3|
|
13
|
+
|State 2| | | s1| | | s2| |
|
14
|
+
|State 3| s6| | | | | | |
|
15
|
+
|State 4| | s7| | s8| s9| | |
|
16
|
+
|State 5| | | | s8| s9| |s10|
|
17
|
+
|State 6|acc|acc|acc|acc|acc|acc|acc|
|
18
|
+
|State 7| r2| r2| r2| r2| r2| r2| r2|
|
19
|
+
|State 8| | | s1| | | s2| |
|
20
|
+
|State 9| | | s1| | | s2| |
|
21
|
+
|State 10| r6| r6| r6| r6| r6| r6| r6|
|
22
|
+
|State 11| | r4| | r4| s9| | r4|
|
23
|
+
|State 12| | r5| | r5| r5| | r5|
|
24
|
+
|
25
|
+
GOTO table of "parse.y":
|
26
|
+
|
27
|
+
| |$accept|program|expr|
|
28
|
+
|--------|------:|------:|---:|
|
29
|
+
|State 0| | g3| g4|
|
30
|
+
|State 1| | | |
|
31
|
+
|State 2| | | g5|
|
32
|
+
|State 3| | | |
|
33
|
+
|State 4| | | |
|
34
|
+
|State 5| | | |
|
35
|
+
|State 6| | | |
|
36
|
+
|State 7| | | |
|
37
|
+
|State 8| | | g11|
|
38
|
+
|State 9| | | g12|
|
39
|
+
|State 10| | | |
|
40
|
+
|State 11| | | |
|
41
|
+
|State 12| | | |
|
42
|
+
|
43
|
+
|
44
|
+
Both the action table and the GOTO table are sparse. Therefore an LR parser generator compresses both tables into the following tables.
|
45
|
+
|
46
|
+
* `yypact` & `yypgoto`
|
47
|
+
* `yytable`
|
48
|
+
* `yycheck`
|
49
|
+
* `yydefact` & `yydefgoto`
|
50
|
+
|
51
|
+
## Introduction to major tables
|
52
|
+
|
53
|
+
### `yypact` & `yypgoto`
|
54
|
+
|
55
|
+
`yypact` specifies the offset into `yytable` for the current state.
|
56
|
+
As an optimization, `yypact` also marks the states that have only a default reduce action.
|
57
|
+
The value is accessed by `state`. For example:
|
58
|
+
|
59
|
+
```ruby
|
60
|
+
offset = yypact[state]
|
61
|
+
```
|
62
|
+
|
63
|
+
If the value is `YYPACT_NINF` (Negative INFinity), the default reduce action is executed.
|
64
|
+
Otherwise the value is an offset in `yytable`.
|
65
|
+
|
66
|
+
`yypgoto` plays the same role as `yypact`.
|
67
|
+
But `yypgoto` is used for the GOTO table.
|
68
|
+
Therefore its index is a nonterminal symbol id.
|
69
|
+
In particular, `yypgoto` is used when a reduce happens.
|
70
|
+
|
71
|
+
```ruby
|
72
|
+
rule_for_reduce = rules[rule_id]
|
73
|
+
|
74
|
+
# lhs_id holds LHS nonterminal id of the rule used for reduce.
|
75
|
+
lhs_id = rule_for_reduce.lhs.id
|
76
|
+
|
77
|
+
offset = yypgoto[lhs_id]
|
78
|
+
|
79
|
+
# Validate access to yytable
|
80
|
+
if yycheck[offset + state] == state
|
81
|
+
next_state = yytable[offset + state]
|
82
|
+
end
|
83
|
+
```
|
84
|
+
|
85
|
+
### `yytable`
|
86
|
+
|
87
|
+
`yytable` is a mixture of action table and GOTO table.
|
88
|
+
|
89
|
+
#### For action table
|
90
|
+
|
91
|
+
For the action table, `yytable` specifies what to do in the current state.
|
92
|
+
|
93
|
+
A positive number means shift and specifies the next state.
|
94
|
+
For example, `yytable[yyn] == 1` means shift and next state is State 1.
|
95
|
+
|
96
|
+
`YYTABLE_NINF` (Negative INFinity) means syntax error.
|
97
|
+
For example, `yytable[yyn] == YYTABLE_NINF` means syntax error.
|
98
|
+
|
99
|
+
Zero and other negative numbers mean reduce with the rule whose number is the negation of the value.
|
100
|
+
For example, `yytable[yyn] == -1` means reduce with Rule 1.
|
101
|
+
|
102
|
+
#### For GOTO table
|
103
|
+
|
104
|
+
For the GOTO table, `yytable` specifies the next state for the given LHS nonterminal.
|
105
|
+
|
106
|
+
The value is always a positive number, which is the next state id.
|
107
|
+
It never becomes `YYTABLE_NINF`.
|
108
|
+
|
109
|
+
### `yycheck`
|
110
|
+
|
111
|
+
`yycheck` validates accesses to `yytable`.
|
112
|
+
|
113
|
+
Each row of the action table and the GOTO table is placed into the single array `yytable`.
|
114
|
+
Consider the case where action table has only two states.
|
115
|
+
In this case, if the second array is shifted to the right, they can be merged into one array without conflict.
|
116
|
+
|
117
|
+
```ruby
|
118
|
+
[
|
119
|
+
[ 'a', 'b', , , 'e'], # State 0
|
120
|
+
[ , 'B', 'C', , 'E'], # State 1
|
121
|
+
]
|
122
|
+
|
123
|
+
# => Shift the second array to the right
|
124
|
+
|
125
|
+
[
|
126
|
+
[ 'a', 'b', , , 'e'], # State 0
|
127
|
+
     [ , 'B', 'C', , 'E'], # State 1 (shifted right by one cell)
|
128
|
+
]
|
129
|
+
|
130
|
+
# => Merge them into single array
|
131
|
+
|
132
|
+
yytable = [
|
133
|
+
'a', 'b', 'B', 'C', 'e', 'E'
|
134
|
+
]
|
135
|
+
```
|
136
|
+
|
137
|
+
`yypact` is an array that holds the offset of each state.
|
138
|
+
|
139
|
+
```ruby
|
140
|
+
yypact = [
|
141
|
+
0, # State 0 is not shifted
|
142
|
+
1 # State 1 is shifted one to right
|
143
|
+
]
|
144
|
+
```
|
145
|
+
|
146
|
+
We can access the value of `state1[2]` by consulting `yypact`.
|
147
|
+
|
148
|
+
```ruby
|
149
|
+
yytable[yypact[1] + 2]
|
150
|
+
# => yytable[1 + 2]
|
151
|
+
# => 'C'
|
152
|
+
```
|
153
|
+
|
154
|
+
However, this approach doesn't work well when accessing a nil value like `state1[3]`,
|
155
|
+
because the lookup actually reads `state0[4]`.
|
156
|
+
|
157
|
+
```ruby
|
158
|
+
yytable[yypact[1] + 3]
|
159
|
+
# => yytable[1 + 3]
|
160
|
+
# => 'e'
|
161
|
+
```
|
162
|
+
|
163
|
+
This is why `yycheck` is needed.
|
164
|
+
`yycheck` stores valid indexes of the original table.
|
165
|
+
In the current example:
|
166
|
+
|
167
|
+
* 0, 1 and 4 are valid indexes of State 0
|
168
|
+
* 1, 2 and 4 are valid indexes of State 1
|
169
|
+
|
170
|
+
`yycheck` stores these indexes at the same offsets as `yytable`.
|
171
|
+
|
172
|
+
```ruby
|
173
|
+
# yytable
|
174
|
+
[
|
175
|
+
[ 'a', 'b', , , 'e'], # State 0
|
176
|
+
[ , 'B', 'C', , 'E'], # State 1
|
177
|
+
]
|
178
|
+
|
179
|
+
yytable = [
|
180
|
+
'a', 'b', 'B', 'C', 'e', 'E'
|
181
|
+
]
|
182
|
+
|
183
|
+
# yycheck
|
184
|
+
[
|
185
|
+
[ 0, 1, , , 4], # State 0
|
186
|
+
[ , 1, 2, , 4], # State 1
|
187
|
+
]
|
188
|
+
|
189
|
+
yycheck = [
|
190
|
+
0, 1, 1, 2, 4, 4
|
191
|
+
]
|
192
|
+
```
|
193
|
+
|
194
|
+
We can validate accesses to `yytable` by consulting `yycheck`.
|
195
|
+
`yycheck` stores the valid indexes of the original arrays, so validation compares `yycheck[index_for_yytable]` with `index_for_the_state`.
|
196
|
+
The access is valid if both values are the same.
|
197
|
+
|
198
|
+
```ruby
|
199
|
+
# Validate an access to state1[2]
|
200
|
+
yycheck[yypact[1] + 2] == 2
|
201
|
+
# => yycheck[1 + 2] == 2
|
202
|
+
# => 2 == 2
|
203
|
+
# => true (valid)
|
204
|
+
|
205
|
+
# Validate an access to state1[3]
|
206
|
+
yycheck[yypact[1] + 3] == 3
|
207
|
+
# => yycheck[1 + 3] == 3
|
208
|
+
# => 4 == 3
|
209
|
+
# => false (invalid)
|
210
|
+
```
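
The two-state example above can be turned into a small runnable sketch. This is only an illustration of the packing idea, not Lrama's implementation; the names `TOY_ROWS`, `pack_rows` and `toy_lookup` are made up for this sketch, and `nil` stands for an empty cell.

```ruby
# Rows of the toy table; nil marks an empty cell.
TOY_ROWS = [
  ['a', 'b', nil, nil, 'e'], # State 0
  [nil, 'B', 'C', nil, 'E'], # State 1
]
# Offsets from the example: State 0 is not shifted, State 1 is shifted by one.
TOY_YYPACT = [0, 1]

# Place every non-empty cell at (offset of its row + column).
# The check array remembers the original column of each packed cell.
def pack_rows(rows, offsets)
  table = []
  check = []
  rows.each_with_index do |row, state|
    row.each_with_index do |cell, col|
      next if cell.nil?
      idx = offsets[state] + col
      raise "cell #{idx} is already taken" unless check[idx].nil?
      table[idx] = cell
      check[idx] = col
    end
  end
  [table, check]
end

TOY_YYTABLE, TOY_YYCHECK = pack_rows(TOY_ROWS, TOY_YYPACT)

# Return the cell of `state` at `col`, or nil when the original cell was empty.
def toy_lookup(state, col)
  idx = TOY_YYPACT[state] + col
  return nil if idx < 0 || idx >= TOY_YYTABLE.size
  TOY_YYCHECK[idx] == col ? TOY_YYTABLE[idx] : nil
end
```

With these definitions, `toy_lookup(1, 2)` returns `'C'`, while `toy_lookup(1, 3)` returns `nil` instead of leaking `'e'` from State 0.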
|
211
|
+
|
212
|
+
### `yydefact` & `yydefgoto`
|
213
|
+
|
214
|
+
`yydefact` stores the rule id of the default action for each state.
|
215
|
+
`0` means syntax error; any other number N means reduce using Rule N.
|
216
|
+
|
217
|
+
```ruby
|
218
|
+
rule_id = yydefact[state]
|
219
|
+
# => 0 means syntax error, other number means reduce using Rule whose id is `rule_id`
|
220
|
+
```
|
221
|
+
|
222
|
+
`yydefgoto` stores the default GOTO for each nonterminal.
|
223
|
+
The number means next state.
|
224
|
+
|
225
|
+
```ruby
|
226
|
+
next_state = yydefgoto[lhs_id]
|
227
|
+
# => Next state id is `next_state`
|
228
|
+
```
|
229
|
+
|
230
|
+
## Example
|
231
|
+
|
232
|
+
Take a look at compressed tables of "parse.y".
|
233
|
+
See "parse.output" for detailed information of symbols and states.
|
234
|
+
|
235
|
+
### `yytable`
|
236
|
+
|
237
|
+
Original action table and GOTO table look like:
|
238
|
+
|
239
|
+
```ruby
|
240
|
+
# Action table is a matrix of states * terminals (rows are states)
|
241
|
+
[
|
242
|
+
# [ EOF, error, undef, LF, NUM, '+', '*', '(', ')'] (default reduce)
|
243
|
+
[ , , , , s1, , , s2, ], # State 0 (r1)
|
244
|
+
[ , , , , , , , , ], # State 1 (r3)
|
245
|
+
[ , , , , s1, , , s2, ], # State 2 ()
|
246
|
+
[ s6, , , , , , , , ], # State 3 ()
|
247
|
+
[ , , , s7, , s8, s9, , ], # State 4 ()
|
248
|
+
[ , , , , , s8, s9, , s10], # State 5 ()
|
249
|
+
[ , , , , , , , , ], # State 6 (accept)
|
250
|
+
[ , , , , , , , , ], # State 7 (r2)
|
251
|
+
[ , , , , s1, , , s2, ], # State 8 ()
|
252
|
+
[ , , , , s1, , , s2, ], # State 9 ()
|
253
|
+
[ , , , , , , , , ], # State 10 (r6)
|
254
|
+
[ , , , , , , s9, , ], # State 11 (r4)
|
255
|
+
[ , , , , , , , , ], # State 12 (r5)
|
256
|
+
]
|
257
|
+
|
258
|
+
# GOTO table is a matrix of nonterminals * states (rows are nonterminals)
|
259
|
+
[
|
260
|
+
# [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] State No (default goto)
|
261
|
+
[ , , , , , , , , , , , , ], # $accept (g0)
|
262
|
+
[ g3, , , , , , , , , , , , ], # program (g3)
|
263
|
+
[ g4, , g5, , , , , , g11, g12, , , ], # expr (g4)
|
264
|
+
]
|
265
|
+
|
266
|
+
# => Remove default goto
|
267
|
+
|
268
|
+
[
|
269
|
+
# [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] State No (default goto)
|
270
|
+
[ , , , , , , , , , , , , ], # $accept (g0)
|
271
|
+
[ , , , , , , , , , , , , ], # program (g3)
|
272
|
+
[ , , g5, , , , , , g11, g12, , , ], # expr (g4)
|
273
|
+
]
|
274
|
+
```
|
275
|
+
|
276
|
+
These are compressed into `yytable` as shown below.
|
277
|
+
If the offset equals `YYPACT_NINF`, the row contains only the default value, so the row can be ignored (commented out in this example).
|
278
|
+
|
279
|
+
```ruby
|
280
|
+
[
|
281
|
+
# Action table
|
282
|
+
# (offset, YYPACT_NINF = -4)
|
283
|
+
[ , , , , s1, , , s2, ], # State 0 ( 6)
|
284
|
+
# [ , , , , , , , , ], # State 1 (-4)
|
285
|
+
[ , , , , s1, , , s2, ], # State 2 ( 6)
|
286
|
+
[ s6, , , , , , , , ], # State 3 ( 1)
|
287
|
+
[ , , , s7, , s8, s9, , ], # State 4 (-1)
|
288
|
+
[ , , , , , s8, s9, , s10], # State 5 ( 3)
|
289
|
+
# [ , , , , , , , , ], # State 6 (-4)
|
290
|
+
# [ , , , , , , , , ], # State 7 (-4)
|
291
|
+
[ , , , , s1, , , s2, ], # State 8 ( 6)
|
292
|
+
[ , , , , s1, , , s2, ], # State 9 ( 6)
|
293
|
+
# [ , , , , , , , , ], # State 10 (-4)
|
294
|
+
[ , , , , , , s9, , ], # State 11 (-3)
|
295
|
+
# [ , , , , , , , , ], # State 12 (-4)
|
296
|
+
|
297
|
+
# GOTO table
|
298
|
+
# [ , , , , , , , , , , , , ], # $accept (-4)
|
299
|
+
# [ , , , , , , , , , , , , ], # program (-4)
|
300
|
+
[ , , g5, , , , , , g11, g12, , , ], # expr (-2)
|
301
|
+
]
|
302
|
+
|
303
|
+
# => compressed into single array
|
304
|
+
[ , , , g5, s6, s7, s9, s8, s9, g11, g12, s8, s9, s1, s10, , s2, ]
|
305
|
+
|
306
|
+
# => Cut blank cells on head and tail, remove 'g' and 's' prefix, fill blank with 0
|
307
|
+
# This is `yytable`
|
308
|
+
[ 5, 6, 7, 9, 8, 9, 11, 12, 8, 9, 1, 10, 0, 2]
|
309
|
+
```
|
310
|
+
|
311
|
+
`YYTABLE_NINF` is a negative number smaller than every value in `yytable`.
|
312
|
+
In this case, `0` is the minimum value in `yytable`, so `YYTABLE_NINF` is `-1`.
|
313
|
+
|
314
|
+
### `yycheck`
|
315
|
+
|
316
|
+
```ruby
|
317
|
+
[
|
318
|
+
# Action table valid indexes
|
319
|
+
# (offset, YYPACT_NINF = -4)
|
320
|
+
[ , , , , 4, , , 7, ], # State 0 ( 6)
|
321
|
+
# [ , , , , , , , , ], # State 1 (-4)
|
322
|
+
[ , , , , 4, , , 7, ], # State 2 ( 6)
|
323
|
+
[ 0, , , , , , , , ], # State 3 ( 1)
|
324
|
+
[ , , , 3, , 5, 6, , ], # State 4 (-1)
|
325
|
+
[ , , , , , 5, 6, , 8], # State 5 ( 3)
|
326
|
+
# [ , , , , , , , , ], # State 6 (-4)
|
327
|
+
# [ , , , , , , , , ], # State 7 (-4)
|
328
|
+
[ , , , , 4, , , 7, ], # State 8 ( 6)
|
329
|
+
[ , , , , 4, , , 7, ], # State 9 ( 6)
|
330
|
+
# [ , , , , , , , , ], # State 10 (-4)
|
331
|
+
[ , , , , , , 6, , ], # State 11 (-3)
|
332
|
+
# [ , , , , , , , , ], # State 12 (-4)
|
333
|
+
|
334
|
+
# GOTO table valid indexes
|
335
|
+
# [ , , , , , , , , , , , , ], # $accept (-4)
|
336
|
+
# [ , , , , , , , , , , , , ], # program (-4)
|
337
|
+
[ , , 2, , , , , , 8, 9, , , ], # expr (-2)
|
338
|
+
]
|
339
|
+
|
340
|
+
# => compressed into single array
|
341
|
+
[ , , , 2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, , 7, ]
|
342
|
+
|
343
|
+
# => Cut blank cells on head and tail, fill blank with -1 because no index can be -1 and comparison always fails
|
344
|
+
# This is `yycheck`
|
345
|
+
[ 2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, -1, 7]
|
346
|
+
```
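
As a cross-check of the two listings above, the following sketch rebuilds `yytable` and `yycheck` from the non-default rows and their offsets. It is a minimal illustration under assumptions: the offsets are taken as given (how to choose them is not covered here), the `g` and `s` prefixes are already stripped, and the names `ACTION_ROWS`, `GOTO_ROWS` and `pack` are invented for this example.

```ruby
# Each entry is [offset, { column => next state }]. Rows whose offset is
# YYPACT_NINF hold only defaults and are left out, as in the listing above.
ACTION_ROWS = [
  [6,  { 4 => 1, 7 => 2 }],          # State 0
  [6,  { 4 => 1, 7 => 2 }],          # State 2
  [1,  { 0 => 6 }],                  # State 3
  [-1, { 3 => 7, 5 => 8, 6 => 9 }],  # State 4
  [3,  { 5 => 8, 6 => 9, 8 => 10 }], # State 5
  [6,  { 4 => 1, 7 => 2 }],          # State 8
  [6,  { 4 => 1, 7 => 2 }],          # State 9
  [-3, { 6 => 9 }],                  # State 11
]
GOTO_ROWS = [
  [-2, { 2 => 5, 8 => 11, 9 => 12 }], # expr
]

# Write every cell to index (offset + column). Blank cells keep 0 in the
# table and -1 in the check array. Identical rows may share the same cells.
def pack(rows)
  size = rows.flat_map { |offset, cells| cells.keys.map { |col| offset + col } }.max + 1
  table = Array.new(size, 0)
  check = Array.new(size, -1)
  rows.each do |offset, cells|
    cells.each do |col, value|
      idx = offset + col
      taken = check[idx] != -1
      raise "conflict at index #{idx}" if taken && (check[idx] != col || table[idx] != value)
      table[idx] = value
      check[idx] = col
    end
  end
  [table, check]
end

PACKED_TABLE, PACKED_CHECK = pack(ACTION_ROWS + GOTO_ROWS)
```

States 0, 2, 8 and 9 have identical rows, which is why they can share offset `6` without a conflict.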
|
347
|
+
|
348
|
+
### `yypact` & `yypgoto`
|
349
|
+
|
350
|
+
`yypact` & `yypgoto` each hold a mixture of offsets into `yytable` and `YYPACT_NINF` (default reduce action).
|
351
|
+
Index in `yypact` is state id and index in `yypgoto` is nonterminal symbol id.
|
352
|
+
`YYPACT_NINF` is a negative number smaller than every offset.
|
353
|
+
In this case, `-3` is the minimum offset, so `YYPACT_NINF` is `-4`.
|
354
|
+
|
355
|
+
```ruby
|
356
|
+
YYPACT_NINF = -4
|
357
|
+
|
358
|
+
yypact = [
|
359
|
+
# 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 (State No)
|
360
|
+
6, -4, 6, 1, -1, 3, -4, -4, 6, 6, -4, -3, -4
|
361
|
+
]
|
362
|
+
|
363
|
+
yypgoto = [
|
364
|
+
# $accept, program, expr
|
365
|
+
-4, -4, -2
|
366
|
+
]
|
367
|
+
```
|
368
|
+
|
369
|
+
### `yydefact` & `yydefgoto`
|
370
|
+
|
371
|
+
`yydefact` & `yydefgoto` store default values.
|
372
|
+
|
373
|
+
`yydefact` specifies the rule id of the default action of each state.
|
374
|
+
Because `0` is reserved for syntax error, rule ids start at 1.
|
375
|
+
|
376
|
+
```
|
377
|
+
# In "parse.output"
|
378
|
+
Grammar
|
379
|
+
|
380
|
+
0 $accept: program "end of file"
|
381
|
+
|
382
|
+
1 program: ε
|
383
|
+
2 | expr LF
|
384
|
+
|
385
|
+
3 expr: NUM
|
386
|
+
4 | expr '+' expr
|
387
|
+
5 | expr '*' expr
|
388
|
+
6 | '(' expr ')'
|
389
|
+
|
390
|
+
# =>
|
391
|
+
|
392
|
+
# In `yydefact`
|
393
|
+
Grammar
|
394
|
+
|
395
|
+
0 Syntax Error
|
396
|
+
|
397
|
+
1 $accept: program "end of file"
|
398
|
+
|
399
|
+
2 program: ε
|
400
|
+
3 | expr LF
|
401
|
+
|
402
|
+
4 expr: NUM
|
403
|
+
5 | expr '+' expr
|
404
|
+
6 | expr '*' expr
|
405
|
+
7 | '(' expr ')'
|
406
|
+
```
|
407
|
+
|
408
|
+
For example, the default action for State 1 is 4 (`yydefact[1] == 4`).
|
409
|
+
This corresponds to Rule 3 (`3 expr: NUM`) in the "parse.output" file.
|
410
|
+
|
411
|
+
`yydefgoto` specifies the default next state id for each nonterminal.
|
412
|
+
|
413
|
+
```ruby
|
414
|
+
yydefact = [
|
415
|
+
# 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 (State No)
|
416
|
+
2, 4, 0, 0, 0, 0, 1, 3, 0, 0, 7, 5, 6
|
417
|
+
]
|
418
|
+
|
419
|
+
yydefgoto = [
|
420
|
+
# $accept, program, expr
|
421
|
+
0, 3, 4
|
422
|
+
]
|
423
|
+
```
|
424
|
+
|
425
|
+
### `yyr1` & `yyr2`
|
426
|
+
|
427
|
+
Both of them are tables for rules.
|
428
|
+
`yyr1` specifies the nonterminal symbol id of the rule's Left-Hand-Side.
|
429
|
+
`yyr2` specifies the length of the rule, that is, the number of symbols on the rule's Right-Hand-Side.
|
430
|
+
Index 0 is not used because rule ids start at 1.
|
431
|
+
|
432
|
+
```ruby
|
433
|
+
yyr1 = [
|
434
|
+
# 0, 1, 2, 3, 4, 5, 6, 7 (Rule id)
|
435
|
+
# no rule, $accept, program, program, expr, expr, expr, expr (LHS symbol id)
|
436
|
+
0, 9, 10, 10, 11, 11, 11, 11
|
437
|
+
]
|
438
|
+
|
439
|
+
yyr2 = [
|
440
|
+
# 0, 1, 2, 3, 4, 5, 6, 7 (Rule id)
|
441
|
+
0, 2, 0, 2, 1, 3, 3, 3
|
442
|
+
]
|
443
|
+
```
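
A small sketch of how the two rule tables can be read together. `SYMBOL_NAMES` and `describe_rule` are hypothetical helpers for this illustration; the symbol order follows the terminal columns of the action table above, followed by the three nonterminals, and `YYNTOKENS = 9` is the number of terminals.

```ruby
YYNTOKENS = 9

# Terminals (ids 0..8) followed by nonterminals (ids 9..11).
SYMBOL_NAMES = [
  'EOF', 'error', 'undef', 'LF', 'NUM', "'+'", "'*'", "'('", "')'",
  '$accept', 'program', 'expr'
]

YYR1 = [0, 9, 10, 10, 11, 11, 11, 11]
YYR2 = [0, 2, 0, 2, 1, 3, 3, 3]

# Summarize a rule by its id (1-based, as stored in yydefact).
def describe_rule(rule_id)
  lhs = YYR1[rule_id]
  {
    lhs: SYMBOL_NAMES[lhs],     # name of the LHS nonterminal
    nterm_id: lhs - YYNTOKENS,  # index usable with yypgoto / yydefgoto
    rhs_length: YYR2[rule_id]   # number of states to pop on reduce
  }
end
```

For instance, `describe_rule(5)` describes `expr: expr '+' expr`, whose LHS is `expr` (nonterminal id 2) and whose RHS has three symbols.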
|
444
|
+
|
445
|
+
## How to use tables
|
446
|
+
|
447
|
+
See also "parser.rb", which implements an LALR parser based on the "parse.y" file.
|
448
|
+
|
449
|
+
First, define the important constants and arrays:
|
450
|
+
|
451
|
+
```ruby
|
452
|
+
YYNTOKENS = 9
|
453
|
+
|
454
|
+
# The last index of yytable and yycheck
|
455
|
+
# The lengths of yytable and yycheck are always the same
|
456
|
+
YYLAST = 13
|
457
|
+
YYTABLE_NINF = -1
|
458
|
+
yytable = [ 5, 6, 7, 9, 8, 9, 11, 12, 8, 9, 1, 10, 0, 2]
|
459
|
+
yycheck = [ 2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, -1, 7]
|
460
|
+
|
461
|
+
YYPACT_NINF = -4
|
462
|
+
yypact = [ 6, -4, 6, 1, -1, 3, -4, -4, 6, 6, -4, -3, -4]
|
463
|
+
yypgoto = [ -4, -4, -2]
|
464
|
+
|
465
|
+
yydefact = [ 2, 4, 0, 0, 0, 0, 1, 3, 0, 0, 7, 5, 6]
|
466
|
+
yydefgoto = [ 0, 3, 4]
|
467
|
+
|
468
|
+
yyr1 = [ 0, 9, 10, 10, 11, 11, 11, 11]
|
469
|
+
yyr2 = [ 0, 2, 0, 2, 1, 3, 3, 3]
|
470
|
+
```
|
471
|
+
|
472
|
+
### Determine what to do next
|
473
|
+
|
474
|
+
Determine what to do next based on the current state (`state`) and the next token (`yytoken`).
|
475
|
+
|
476
|
+
The first step in deciding the action is to look up the `yypact` table by the current state.
|
477
|
+
If only a default reduce exists for the current state, `yypact` returns `YYPACT_NINF`.
|
478
|
+
|
479
|
+
```ruby
|
480
|
+
# Case 1: Only default reduce exists for the state
|
481
|
+
#
|
482
|
+
# State 7
|
483
|
+
#
|
484
|
+
# 2 program: expr LF •
|
485
|
+
#
|
486
|
+
# $default reduce using rule 2 (program)
|
487
|
+
|
488
|
+
state = 7
|
489
|
+
yytoken = nil # Do not use yytoken in this case
|
490
|
+
|
491
|
+
offset = yypact[state] # -4
|
492
|
+
if offset == YYPACT_NINF # true
|
493
|
+
next_action = :yydefault
|
494
|
+
return
|
495
|
+
end
|
496
|
+
```
|
497
|
+
|
498
|
+
If both a shift and a default reduce exist for the current state, `yypact` returns an offset into `yytable`.
|
499
|
+
The index is the sum of `offset` and `yytoken`.
|
500
|
+
The index must be checked by consulting `yycheck` before accessing `yytable`.
|
501
|
+
The index can be out of range because blank cells at the head and tail are omitted; see how `yycheck` is constructed in the example above.
|
502
|
+
Therefore the index must also be checked to be not less than 0 and not greater than `YYLAST`.
|
503
|
+
|
504
|
+
```ruby
|
505
|
+
# Case 2: Both shift and default reduce exist for the state
|
506
|
+
#
|
507
|
+
# State 11
|
508
|
+
#
|
509
|
+
# 4 expr: expr • '+' expr
|
510
|
+
# 4 | expr '+' expr • [LF, '+', ')']
|
511
|
+
# 5 | expr • '*' expr
|
512
|
+
#
|
513
|
+
# '*' shift, and go to state 9
|
514
|
+
#
|
515
|
+
# $default reduce using rule 4 (expr)
|
516
|
+
|
517
|
+
# Next token is '*' then shift it
|
518
|
+
state = 11
|
519
|
+
yytoken = nil
|
520
|
+
|
521
|
+
offset = yypact[state] # -3
|
522
|
+
if offset == YYPACT_NINF # false
|
523
|
+
next_action = :yydefault
|
524
|
+
break
|
525
|
+
end
|
526
|
+
|
527
|
+
unless yytoken
|
528
|
+
yytoken = yylex() # yylex returns 6 ('*')
|
529
|
+
end
|
530
|
+
|
531
|
+
idx = offset + yytoken # 3
|
532
|
+
if idx < 0 || YYLAST < idx # false
|
533
|
+
next_action = :yydefault
|
534
|
+
break
|
535
|
+
end
|
536
|
+
if yycheck[idx] != yytoken # false
|
537
|
+
next_action = :yydefault
|
538
|
+
break
|
539
|
+
end
|
540
|
+
|
541
|
+
act = yytable[idx] # 9
|
542
|
+
if act == YYTABLE_NINF # false
|
543
|
+
next_action = :syntax_error
|
544
|
+
break
|
545
|
+
end
|
546
|
+
if act > 0 # true
|
547
|
+
# Shift
|
548
|
+
next_action = :yyshift
|
549
|
+
break
|
550
|
+
else
|
551
|
+
# Reduce
|
552
|
+
next_action = :yyreduce
|
553
|
+
break
|
554
|
+
end
|
555
|
+
```
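
The two cases above can be folded into one function. This is a sketch rather than the code in "parser.rb": the tables are written as constants so that the method can see them, and `decide_action` is a name invented for this example. It returns the action as an array instead of using `break`.

```ruby
YYLAST = 13
YYTABLE_NINF = -1
YYPACT_NINF = -4
YYTABLE = [5, 6, 7, 9, 8, 9, 11, 12, 8, 9, 1, 10, 0, 2]
YYCHECK = [2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, -1, 7]
YYPACT  = [6, -4, 6, 1, -1, 3, -4, -4, 6, 6, -4, -3, -4]

# Returns [:yydefault], [:syntax_error], [:yyshift, next_state]
# or [:yyreduce, rule_id] for the given state and token id.
def decide_action(state, yytoken)
  offset = YYPACT[state]
  # Case 1: only a default reduce exists, the token is not consulted.
  return [:yydefault] if offset == YYPACT_NINF

  # Case 2: validate the index before reading yytable.
  idx = offset + yytoken
  return [:yydefault] if idx < 0 || YYLAST < idx
  return [:yydefault] if YYCHECK[idx] != yytoken

  act = YYTABLE[idx]
  return [:syntax_error] if act == YYTABLE_NINF
  act > 0 ? [:yyshift, act] : [:yyreduce, -act]
end
```

With token ids as in the tables above (`'+'` is 5, `'*'` is 6), `decide_action(11, 6)` yields `[:yyshift, 9]`, and `decide_action(11, 5)` falls back to `[:yydefault]` because `yycheck` rejects the index.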
|
556
|
+
|
557
|
+
### Execute (default) reduce
|
558
|
+
|
559
|
+
Once the next action is decided to be a default reduce, two things need to be determined:
|
560
|
+
|
561
|
+
1. the rule to be applied
|
562
|
+
2. the next state from GOTO table
|
563
|
+
|
564
|
+
Rule id for the default reduce is stored in `yydefact`.
|
565
|
+
`0` in `yydefact` means syntax error, so check that the value is not `0` before continuing.
|
566
|
+
|
567
|
+
Once the rule is determined, the length of the rule can be obtained from `yyr2` and the LHS nonterminal from `yyr1`.
|
568
|
+
|
569
|
+
The next state is determined by the LHS nonterminal and the state on top of the stack after the reduce.
|
570
|
+
The GOTO table is also compressed into `yytable`, so the process of deciding the next state is similar to the one for `yypact`.
|
571
|
+
|
572
|
+
1. Look up `yypgoto` by the LHS nonterminal. Note that `yypact` is indexed by state but `yypgoto` is indexed by nonterminal.
|
573
|
+
2. Check whether the value in `yypgoto` is `YYPACT_NINF` or not.
|
574
|
+
3. Check whether the index, the sum of the offset and the state, is out of range or not.
|
575
|
+
4. Check the `yycheck` table before accessing `yytable`.
|
576
|
+
|
577
|
+
Finally, push the state onto the stack.
|
578
|
+
|
579
|
+
```ruby
|
580
|
+
# State 11
|
581
|
+
#
|
582
|
+
# 4 expr: expr • '+' expr
|
583
|
+
# 4 | expr '+' expr • [LF, '+', ')']
|
584
|
+
# 5 | expr • '*' expr
|
585
|
+
#
|
586
|
+
# '*' shift, and go to state 9
|
587
|
+
#
|
588
|
+
# $default reduce using rule 4 (expr)
|
589
|
+
|
590
|
+
# Input is "1 + 2 + 3 LF" and next token is the second '+'.
|
591
|
+
# Current state stack is `[0, 4, 8, 11]`.
|
592
|
+
# What to do next is reduce with default action.
|
593
|
+
state = 11
|
594
|
+
yytoken = 5 # '+'
|
595
|
+
|
596
|
+
rule = yydefact[state] # 5
|
597
|
+
if rule == 0 # false
|
598
|
+
next_action = :syntax_error
|
599
|
+
break
|
600
|
+
end
|
601
|
+
|
602
|
+
rhs_length = yyr2[rule] # 3, because Rule 4 in "parse.output" (id 5 here) is "expr: expr '+' expr"
|
603
|
+
lhs_nterm = yyr1[rule] # 11 (expr)
|
604
|
+
lhs_nterm_id = lhs_nterm - YYNTOKENS # 11 - 9 = 2
|
605
|
+
|
606
|
+
case rule
|
607
|
+
when 1
|
608
|
+
# Execute Rule 1 action
|
609
|
+
when 2
|
610
|
+
# Execute Rule 2 action
|
611
|
+
#...
|
612
|
+
when 7
|
613
|
+
# Execute Rule 7 action
|
614
|
+
end
|
615
|
+
|
616
|
+
stack.pop(rhs_length) # state stack: `[0, 4, 8, 11]` -> `[0]`
|
617
|
+
state = stack[-1] # state = 0
|
618
|
+
|
619
|
+
offset = yypgoto[lhs_nterm_id] # -2
|
620
|
+
if offset == YYPACT_NINF # false
|
621
|
+
state = yydefgoto[lhs_nterm_id]
|
622
|
+
else
|
623
|
+
idx = offset + state # -2 + 0 = -2
|
624
|
+
if idx < 0 || YYLAST < idx # true
|
625
|
+
state = yydefgoto[lhs_nterm_id] # 4
|
626
|
+
elsif yycheck[idx] != state
|
627
|
+
state = yydefgoto[lhs_nterm_id]
|
628
|
+
else
|
629
|
+
state = yytable[idx]
|
630
|
+
end
|
631
|
+
end
|
632
|
+
|
633
|
+
# yyval = $$, yyloc = @$
|
634
|
+
push_state(state, yyval, yyloc) # state stack: [0, 4]
|
635
|
+
```
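
The GOTO lookup at the end of the listing above can be isolated as a function. This is a sketch under the same assumptions as before: the tables are the ones defined in this document, written as constants, and `goto_state` is a hypothetical name.

```ruby
YYLAST = 13
YYPACT_NINF = -4
YYTABLE   = [5, 6, 7, 9, 8, 9, 11, 12, 8, 9, 1, 10, 0, 2]
YYCHECK   = [2, 0, 3, 6, 5, 6, 8, 9, 5, 6, 4, 8, -1, 7]
YYPGOTO   = [-4, -4, -2]
YYDEFGOTO = [0, 3, 4]

# Next state after a reduce, given the state on top of the stack after
# popping and the nonterminal id (0: $accept, 1: program, 2: expr).
def goto_state(state, lhs_nterm_id)
  default = YYDEFGOTO[lhs_nterm_id]

  offset = YYPGOTO[lhs_nterm_id]
  return default if offset == YYPACT_NINF

  idx = offset + state
  return default if idx < 0 || YYLAST < idx
  return default if YYCHECK[idx] != state

  YYTABLE[idx]
end
```

For the walkthrough above, `goto_state(0, 2)` computes the index `-2`, which is out of range, so the default GOTO `4` is used. From State 8 the same nonterminal leads to State 11 through `yytable`.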
|