lrama 0.6.2 → 0.6.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: e4158de45c42ff62eacfb00737261feaa49d8f0cc646004e30da74ba4e2e69c6
-   data.tar.gz: 734830227f701e18df2e9e8bc3da55d15f49c890e08530e6ac55ef87ae5f952d
+   metadata.gz: ecd30d3fab4dd73442ed6d3b2802db5b463159cb6ddf1f1835d9b8e860d4c9dd
+   data.tar.gz: 79b6087e68d3c2e95db81fa1d25f58280a5543e9c5fa91f5b6ecc7c40b5599d7
  SHA512:
-   metadata.gz: 52ebbe4d099ae63d73aa995bddc8e966f989a4d00ad3b39634d2abe2448da404dd9bff8f15e0dedd0577716089329c804ef2c4edcadd39ca6ba47f8d293d101d
-   data.tar.gz: 72e91c79618071b5850c85335cfe3f1b63ff89f11cd332b0141623a4e2a7e2c2c389dd0db9afe38a5a84b7aa891ac1834fc7a2d6c6eed8f62d87734f6b99cbbf
+   metadata.gz: f3302156423399987015deb90afbaa0d6916e5e61b14c5297ff6a0e01ab9db3bbd164b334a11e26d38cf626425b7faee404e5d1cdec236a9b08b576ced4fe201
+   data.tar.gz: 380d8d31c93e5ae6c5a406c2b2eedad0d4b52dd311ecd742ca1412bd1acade7dae8fe12f7a6325ef8a0b4922ffd927b0dfa2caf3cd54c095669ab2b2cab85516
data/NEWS.md CHANGED
@@ -1,5 +1,39 @@
  # NEWS for Lrama

+ ## Lrama 0.6.3 (2024-02-15)
+
+ ### Bring Your Own Stack
+
+ Provide functionalities for Bring Your Own Stack.
+
+ Ruby’s Ripper library requires its own semantic value stack to manage Ruby objects returned by user-defined callback methods. Currently Ripper uses the semantic value stack (`yyvsa`) that the parser uses to manage Nodes. This hack imposes some limitations on Ripper. For example, Ripper cannot run semantic analysis that depends on the Node structure.
+
+ Lrama introduces two features that let parser generator users maintain another semantic value stack.
+
+ 1. Callback entry points
+
+ Users can emulate a semantic value stack with these callbacks.
+ Lrama provides the following five callbacks. A registered function is called when the corresponding event happens. For example, the `%after-shift` function is called when a shift happens on the original semantic value stack.
+
+ * `%after-shift` function_name
+ * `%before-reduce` function_name
+ * `%after-reduce` function_name
+ * `%after-shift-error-token` function_name
+ * `%after-pop-stack` function_name
+
+ 2. `$:n` variable to access the index of each grammar symbol
+
+ Users also need to access the semantic values on their own stack in grammar actions, and `$:n` provides a way to do so. `$:n` is translated to a negative index counted from the top of the stack.
+ For example:
+
+ ```
+ primary: k_if expr_value then compstmt if_tail k_end
+           {
+             /*% ripper: if!($:2, $:4, $:5) %*/
+             /* $:2 = -5, $:4 = -3, $:5 = -2. */
+           }
+ ```
+
  ## Lrama 0.6.2 (2024-01-27)

  ### %no-stdlib directive
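The `$:n` translation described in the NEWS entry above can be pictured with a plain Ruby array standing in for the user-managed stack. This is only an illustrative sketch: in a real grammar the hooks are C functions registered with `%after-shift` and friends, and the `UserStack` class and its method names here are hypothetical, not part of Lrama.

```ruby
# Hypothetical stand-in for a user-managed semantic value stack.
# The method names loosely mirror the callback directives; none of
# this is Lrama API.
class UserStack
  def initialize
    @values = []
  end

  # Mirrors %after-shift: push a value for the shifted token.
  def after_shift(value)
    @values.push(value)
  end

  # `$:n` becomes a negative offset from the top of the stack (-1 is the top).
  def at(offset)
    @values[offset]
  end

  # Mirrors a reduction: drop the RHS values, push one value for the LHS.
  def reduce(rhs_length, result)
    @values.pop(rhs_length)
    @values.push(result)
  end

  def size
    @values.size
  end
end

stack = UserStack.new
%w[k_if expr_value then compstmt if_tail k_end].each { |v| stack.after_shift(v) }

# For the six-symbol RHS in the NEWS example: $:2 = -5, $:4 = -3, $:5 = -2.
args = [stack.at(-5), stack.at(-3), stack.at(-2)]
stack.reduce(6, [:if, *args])
```

Ruby arrays already treat `-1` as the last element, which is why the negative offsets line up with "index from the top of the stack" without any extra arithmetic.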
data/README.md CHANGED
@@ -1,7 +1,23 @@
  # Lrama

+ [![Gem Version](https://badge.fury.io/rb/lrama.svg)](https://badge.fury.io/rb/lrama)
+ [![build](https://github.com/ruby/lrama/actions/workflows/test.yaml/badge.svg)](https://github.com/ruby/lrama/actions/workflows/test.yaml)
+
  Lrama is LALR (1) parser generator written by Ruby. The first goal of this project is providing error tolerant parser for CRuby with minimal changes on CRuby parse.y file.

+ * [Features](#features)
+ * [Installation](#installation)
+ * [Usage](#usage)
+ * [Versions and Branches](#versions-and-branches)
+ * [Supported Ruby version](#supported-ruby-version)
+ * [Development](#development)
+   * [How to generate parser.rb](#how-to-generate-parserrb)
+   * [Test](#test)
+   * [Profiling Lrama](#profiling-lrama)
+   * [Build Ruby](#build-ruby)
+   * [Release flow](#release-flow)
+ * [License](#license)
+
  ## Features

  * Bison style grammar file is supported with some assumptions
@@ -11,6 +27,9 @@ Lrama is LALR (1) parser generator written by Ruby. The first goal of this proje
      * b4_lac_if is always false
  * Error Tolerance parser
      * Subset of [Repairing Syntax Errors in LR Parsers (Corchuelo et al.)](https://idus.us.es/bitstream/handle/11441/65631/Repairing%20syntax%20errors.pdf) algorithm is supported
+ * Parameterizing rules
+     * The definition of a non-terminal symbol can be parameterized with other (terminal or non-terminal) symbols.
+     * Providing a generic definition of parameterizing rules as a [standard library](lib/lrama/grammar/stdlib.y).

  ## Installation

@@ -85,6 +104,8 @@ Running tests:
  ```shell
  $ bundle install
  $ bundle exec rspec
+ # or
+ $ bundle exec rake spec
  ```

  Running type check:
@@ -93,6 +114,8 @@ Running type check:
  $ bundle install
  $ bundle exec rbs collection install
  $ bundle exec steep check
+ # or
+ $ bundle exec rake steep
  ```

  Running both of them:
data/Steepfile CHANGED
@@ -11,12 +11,14 @@ target :lib do
    check "lib/lrama/grammar/error_token.rb"
    check "lib/lrama/grammar/parameterizing_rule"
    check "lib/lrama/grammar/parameterizing_rules"
+   check "lib/lrama/grammar/symbols"
    check "lib/lrama/grammar/percent_code.rb"
    check "lib/lrama/grammar/precedence.rb"
    check "lib/lrama/grammar/printer.rb"
    check "lib/lrama/grammar/reference.rb"
    check "lib/lrama/grammar/rule_builder.rb"
    check "lib/lrama/grammar/symbol.rb"
+   check "lib/lrama/grammar/type.rb"
    check "lib/lrama/lexer"
    check "lib/lrama/report"
    check "lib/lrama/bitmap.rb"
data/lib/lrama/context.rb CHANGED
@@ -265,9 +265,9 @@ module Lrama

          s = actions.each_with_index.map do |n, i|
            [i, n]
-         end.select do |i, n|
+         end.reject do |i, n|
            # Remove default_reduction_rule entries
-           n != 0
+           n == 0
          end

          if s.count != 0
@@ -462,7 +462,7 @@ module Lrama
        @yylast = high

        # replace_ninf
-       @yypact_ninf = (@base.select {|i| i != BaseMin } + [0]).min - 1
+       @yypact_ninf = (@base.reject {|i| i == BaseMin } + [0]).min - 1
        @base.map! do |i|
          case i
          when BaseMin
@@ -472,7 +472,7 @@ module Lrama
          end
        end

-       @yytable_ninf = (@table.compact.select {|i| i != ErrorActionNumber } + [0]).min - 1
+       @yytable_ninf = (@table.compact.reject {|i| i == ErrorActionNumber } + [0]).min - 1
        @table.map! do |i|
          case i
          when nil
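The context.rb hunks above swap `select` with a negated predicate for `reject` with the positive one. A small sketch, using made-up sample data and a hypothetical sentinel value, suggests why this is a behavior-preserving rewrite:

```ruby
# Sample action numbers, invented for illustration only.
actions = [0, 3, 0, -2, 5]
pairs = actions.each_with_index.map { |n, i| [i, n] }

# Old style: keep entries whose action number is not 0.
kept_by_select = pairs.select { |_i, n| n != 0 }
# New style: drop entries whose action number is 0.
kept_by_reject = pairs.reject { |_i, n| n == 0 }

# The "ninf" computation follows the same shape: drop a sentinel,
# append 0, and take one less than the minimum.
base_min = -1_000_000 # hypothetical sentinel, not Lrama's actual BaseMin
base = [4, base_min, -7, 2]
ninf = (base.reject { |i| i == base_min } + [0]).min - 1
```

Both filters yield the same pairs, so the change reads as a style cleanup rather than a functional one.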
@@ -6,18 +6,24 @@ module Lrama

          # * ($$) yylval
          # * (@$) yylloc
+         # * ($:$) error
          # * ($1) error
          # * (@1) error
+         # * ($:1) error
          def reference_to_c(ref)
            case
            when ref.type == :dollar && ref.name == "$" # $$
              "yylval"
            when ref.type == :at && ref.name == "$" # @$
              "yylloc"
+           when ref.type == :index && ref.name == "$" # $:$
+             raise "$:#{ref.value} can not be used in initial_action."
            when ref.type == :dollar # $n
              raise "$#{ref.value} can not be used in initial_action."
            when ref.type == :at # @n
              raise "@#{ref.value} can not be used in initial_action."
+           when ref.type == :index # $:n
+             raise "$:#{ref.value} can not be used in initial_action."
            else
              raise "Unexpected. #{self}, #{ref}"
            end
@@ -6,14 +6,18 @@ module Lrama

          # * ($$) error
          # * (@$) error
+         # * ($:$) error
          # * ($1) error
          # * (@1) error
+         # * ($:1) error
          def reference_to_c(ref)
            case
            when ref.type == :dollar # $$, $n
              raise "$#{ref.value} can not be used in #{type}."
            when ref.type == :at # @$, @n
              raise "@#{ref.value} can not be used in #{type}."
+           when ref.type == :index # $:$, $:n
+             raise "$:#{ref.value} can not be used in #{type}."
            else
              raise "Unexpected. #{self}, #{ref}"
            end
@@ -11,8 +11,10 @@ module Lrama

          # * ($$) *yyvaluep
          # * (@$) *yylocationp
+         # * ($:$) error
          # * ($1) error
          # * (@1) error
+         # * ($:1) error
          def reference_to_c(ref)
            case
            when ref.type == :dollar && ref.name == "$" # $$
@@ -20,10 +22,14 @@ module Lrama
              "((*yyvaluep).#{member})"
            when ref.type == :at && ref.name == "$" # @$
              "(*yylocationp)"
+           when ref.type == :index && ref.name == "$" # $:$
+             raise "$:#{ref.value} can not be used in #{type}."
            when ref.type == :dollar # $n
              raise "$#{ref.value} can not be used in #{type}."
            when ref.type == :at # @n
              raise "@#{ref.value} can not be used in #{type}."
+           when ref.type == :index # $:n
+             raise "$:#{ref.value} can not be used in #{type}."
            else
              raise "Unexpected. #{self}, #{ref}"
            end
@@ -11,8 +11,10 @@ module Lrama

          # * ($$) yyval
          # * (@$) yyloc
+         # * ($:$) error
          # * ($1) yyvsp[i]
          # * (@1) yylsp[i]
+         # * ($:1) i - 1
          #
          #
          # Consider a rule like
@@ -24,6 +26,8 @@ module Lrama
          #   "Rule"                class: keyword_class { $1 } tSTRING { $2 + $3 } keyword_end { $class = $1 + $keyword_end }
          #   "Position in grammar"        $1            $2     $3      $4          $5
          #   "Index for yyvsp"            -4            -3     -2      -1          0
+         #   "$:n"                        $:1           $:2    $:3     $:4         $:5
+         #   "index of $:n"               -5            -4     -3      -2          -1
          #
          #
          # For the first midrule action:
@@ -31,6 +35,7 @@ module Lrama
          #   "Rule"                class: keyword_class { $1 } tSTRING { $2 + $3 } keyword_end { $class = $1 + $keyword_end }
          #   "Position in grammar"        $1
          #   "Index for yyvsp"            0
+         #   "$:n"                        $:1
          def reference_to_c(ref)
            case
            when ref.type == :dollar && ref.name == "$" # $$
@@ -39,6 +44,8 @@ module Lrama
              "(yyval.#{tag.member})"
            when ref.type == :at && ref.name == "$" # @$
              "(yyloc)"
+           when ref.type == :index && ref.name == "$" # $:$
+             raise "$:$ is not supported"
            when ref.type == :dollar # $n
              i = -position_in_rhs + ref.index
              tag = ref.ex_tag || rhs[ref.index - 1].tag
@@ -47,6 +54,9 @@ module Lrama
            when ref.type == :at # @n
              i = -position_in_rhs + ref.index
              "(yylsp[#{i}])"
+           when ref.type == :index # $:n
+             i = -position_in_rhs + ref.index
+             "(#{i} - 1)"
            else
              raise "Unexpected. #{self}, #{ref}"
            end
@@ -70,7 +80,7 @@ module Lrama
          end

          def raise_tag_not_found_error(ref)
-           raise "Tag is not specified for '$#{ref.value}' in '#{@rule.to_s}'"
+           raise "Tag is not specified for '$#{ref.value}' in '#{@rule}'"
          end
        end
      end
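The index arithmetic in the hunks above can be checked in isolation. The sketch below is a simplified model, not Lrama code: the diff emits the C expression `"(#{i} - 1)"` as a string, whereas these hypothetical helpers (`yyvsp_index`, `stack_index`) evaluate the same arithmetic directly in Ruby.

```ruby
# position_in_rhs: 1-origin position of the last RHS symbol visible
#                  to the action (5 for the end-of-rule action above).
# n:               the number written in `$n` or `$:n`.

# Hypothetical helper: index used for yyvsp[i] / yylsp[i].
def yyvsp_index(position_in_rhs, n)
  -position_in_rhs + n
end

# Hypothetical helper: value that `$:n` evaluates to, one less than
# the yyvsp index so that the top of the stack is -1 rather than 0.
def stack_index(position_in_rhs, n)
  yyvsp_index(position_in_rhs, n) - 1
end

positions = (1..5).to_a
yyvsp_row = positions.map { |n| yyvsp_index(5, n) }
index_row = positions.map { |n| stack_index(5, n) }
```

With five symbols this reproduces the "Index for yyvsp" and "index of $:n" rows of the comment table, and with six symbols it gives the `-5`, `-3`, `-2` values quoted in the NEWS example.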
@@ -2,11 +2,12 @@ module Lrama
    class Grammar
      # type: :dollar or :at
      # name: String (e.g. $$, $foo, $expr.right)
-     # index: Integer (e.g. $1)
+     # number: Integer (e.g. $1)
+     # index: Integer
      # ex_tag: "$<tag>1" (Optional)
-     class Reference < Struct.new(:type, :name, :index, :ex_tag, :first_column, :last_column, keyword_init: true)
+     class Reference < Struct.new(:type, :name, :number, :index, :ex_tag, :first_column, :last_column, keyword_init: true)
        def value
-         name || index
+         name || number
        end
      end
    end
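To see how the new `number` member interacts with `value` and `index`, here is a standalone sketch. It copies the member list from the diff into a local struct (named `ref_struct` here purely for illustration) rather than loading Lrama itself.

```ruby
# Simplified local copy of the struct from the diff, for experimentation only.
ref_struct = Struct.new(
  :type, :name, :number, :index, :ex_tag, :first_column, :last_column,
  keyword_init: true
) do
  # A named reference reports its name; a numbered one reports its number.
  def value
    name || number
  end
end

named = ref_struct.new(type: :dollar, name: "$")
numbered = ref_struct.new(type: :index, number: 2)

# Mirrors the step in the rule builder hunk, where the index is
# currently just a copy of the number.
numbered.index = numbered.number if numbered.number
```

Keeping `number` (what the user wrote) apart from `index` (what is actually looked up) leaves room for the two to diverge later, as the TODO about inlining in the rule builder hunk anticipates.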
@@ -181,11 +181,18 @@ module Lrama
                  if referring_symbol[1] == 0 # Refers to LHS
                    ref.name = '$'
                  else
-                   ref.index = referring_symbol[1]
+                   ref.number = referring_symbol[1]
                  end
                end
              end

+             if ref.number
+               # TODO: When Inlining is implemented, for example, if `$1` is expanded to multiple RHS tokens,
+               #       `$2` needs to access `$2 + n` to actually access it. So, after the Inlining implementation,
+               #       it needs resolves from number to index.
+               ref.index = ref.number
+             end
+
              # TODO: Need to check index of @ too?
              next if ref.type == :at

@@ -11,7 +11,7 @@ module Lrama
        attr_reader :term
        attr_writer :eof_symbol, :error_symbol, :undef_symbol, :accept_symbol

-       def initialize(id:, alias_name: nil, number: nil, tag: nil, term:, token_id: nil, nullable: nil, precedence: nil, printer: nil)
+       def initialize(id:, term:, alias_name: nil, number: nil, tag: nil, token_id: nil, nullable: nil, precedence: nil, printer: nil)
          @id = id
          @alias_name = alias_name
          @number = number
@@ -0,0 +1,276 @@
+ module Lrama
+   class Grammar
+     class Symbols
+       class Resolver
+         attr_reader :terms, :nterms
+
+         def initialize
+           @terms = []
+           @nterms = []
+         end
+
+         def symbols
+           @symbols ||= (@terms + @nterms)
+         end
+
+         def sort_by_number!
+           symbols.sort_by!(&:number)
+         end
+
+         def add_term(id:, alias_name: nil, tag: nil, token_id: nil, replace: false)
+           if token_id && (sym = find_symbol_by_token_id(token_id))
+             if replace
+               sym.id = id
+               sym.alias_name = alias_name
+               sym.tag = tag
+             end
+
+             return sym
+           end
+
+           if (sym = find_symbol_by_id(id))
+             return sym
+           end
+
+           @symbols = nil
+           term = Symbol.new(
+             id: id, alias_name: alias_name, number: nil, tag: tag,
+             term: true, token_id: token_id, nullable: false
+           )
+           @terms << term
+           term
+         end
+
+         def add_nterm(id:, alias_name: nil, tag: nil)
+           return if find_symbol_by_id(id)
+
+           @symbols = nil
+           nterm = Symbol.new(
+             id: id, alias_name: alias_name, number: nil, tag: tag,
+             term: false, token_id: nil, nullable: nil,
+           )
+           @nterms << nterm
+           nterm
+         end
+
+         def find_symbol_by_s_value(s_value)
+           symbols.find { |s| s.id.s_value == s_value }
+         end
+
+         def find_symbol_by_s_value!(s_value)
+           find_symbol_by_s_value(s_value) || (raise "Symbol not found: #{s_value}")
+         end
+
+         def find_symbol_by_id(id)
+           symbols.find do |s|
+             s.id == id || s.alias_name == id.s_value
+           end
+         end
+
+         def find_symbol_by_id!(id)
+           find_symbol_by_id(id) || (raise "Symbol not found: #{id}")
+         end
+
+         def find_symbol_by_token_id(token_id)
+           symbols.find {|s| s.token_id == token_id }
+         end
+
+         def find_symbol_by_number!(number)
+           sym = symbols[number]
+
+           raise "Symbol not found: #{number}" unless sym
+           raise "[BUG] Symbol number mismatch. #{number}, #{sym}" if sym.number != number
+
+           sym
+         end
+
+         def fill_symbol_number
+           # YYEMPTY = -2
+           # YYEOF   =  0
+           # YYerror =  1
+           # YYUNDEF =  2
+           @number = 3
+           fill_terms_number
+           fill_nterms_number
+         end
+
+         def fill_nterm_type(types)
+           types.each do |type|
+             nterm = find_nterm_by_id!(type.id)
+             nterm.tag = type.tag
+           end
+         end
+
+         def fill_printer(printers)
+           symbols.each do |sym|
+             printers.each do |printer|
+               printer.ident_or_tags.each do |ident_or_tag|
+                 case ident_or_tag
+                 when Lrama::Lexer::Token::Ident
+                   sym.printer = printer if sym.id == ident_or_tag
+                 when Lrama::Lexer::Token::Tag
+                   sym.printer = printer if sym.tag == ident_or_tag
+                 else
+                   raise "Unknown token type. #{printer}"
+                 end
+               end
+             end
+           end
+         end
+
+         def fill_error_token(error_tokens)
+           symbols.each do |sym|
+             error_tokens.each do |token|
+               token.ident_or_tags.each do |ident_or_tag|
+                 case ident_or_tag
+                 when Lrama::Lexer::Token::Ident
+                   sym.error_token = token if sym.id == ident_or_tag
+                 when Lrama::Lexer::Token::Tag
+                   sym.error_token = token if sym.tag == ident_or_tag
+                 else
+                   raise "Unknown token type. #{token}"
+                 end
+               end
+             end
+           end
+         end
+
+         def token_to_symbol(token)
+           case token
+           when Lrama::Lexer::Token
+             find_symbol_by_id!(token)
+           else
+             raise "Unknown class: #{token}"
+           end
+         end
+
+         def validate!
+           validate_number_uniqueness!
+           validate_alias_name_uniqueness!
+         end
+
+         private
+
+         def find_nterm_by_id!(id)
+           @nterms.find do |s|
+             s.id == id
+           end || (raise "Symbol not found: #{id}")
+         end
+
+         def fill_terms_number
+           # Character literal in grammar file has
+           # token id corresponding to ASCII code by default,
+           # so start token_id from 256.
+           token_id = 256
+
+           @terms.each do |sym|
+             while used_numbers[@number] do
+               @number += 1
+             end
+
+             if sym.number.nil?
+               sym.number = @number
+               used_numbers[@number] = true
+               @number += 1
+             end
+
+             # If id is Token::Char, it uses ASCII code
+             if sym.token_id.nil?
+               if sym.id.is_a?(Lrama::Lexer::Token::Char)
+                 # Ignore ' on the both sides
+                 case sym.id.s_value[1..-2]
+                 when "\\b"
+                   sym.token_id = 8
+                 when "\\f"
+                   sym.token_id = 12
+                 when "\\n"
+                   sym.token_id = 10
+                 when "\\r"
+                   sym.token_id = 13
+                 when "\\t"
+                   sym.token_id = 9
+                 when "\\v"
+                   sym.token_id = 11
+                 when "\""
+                   sym.token_id = 34
+                 when "'"
+                   sym.token_id = 39
+                 when "\\\\"
+                   sym.token_id = 92
+                 when /\A\\(\d+)\z/
+                   unless (id = Integer($1, 8)).nil?
+                     sym.token_id = id
+                   else
+                     raise "Unknown Char s_value #{sym}"
+                   end
+                 when /\A(.)\z/
+                   unless (id = $1&.bytes&.first).nil?
+                     sym.token_id = id
+                   else
+                     raise "Unknown Char s_value #{sym}"
+                   end
+                 else
+                   raise "Unknown Char s_value #{sym}"
+                 end
+               else
+                 sym.token_id = token_id
+                 token_id += 1
+               end
+             end
+           end
+         end
+
+         def fill_nterms_number
+           token_id = 0
+
+           @nterms.each do |sym|
+             while used_numbers[@number] do
+               @number += 1
+             end
+
+             if sym.number.nil?
+               sym.number = @number
+               used_numbers[@number] = true
+               @number += 1
+             end
+
+             if sym.token_id.nil?
+               sym.token_id = token_id
+               token_id += 1
+             end
+           end
+         end
+
+         def used_numbers
+           return @used_numbers if defined?(@used_numbers)
+
+           @used_numbers = {}
+           symbols.map(&:number).each do |n|
+             @used_numbers[n] = true
+           end
+           @used_numbers
+         end
+
+         def validate_number_uniqueness!
+           invalid = symbols.group_by(&:number).select do |number, syms|
+             syms.count > 1
+           end
+
+           return if invalid.empty?
+
+           raise "Symbol number is duplicated. #{invalid}"
+         end
+
+         def validate_alias_name_uniqueness!
+           invalid = symbols.select(&:alias_name).group_by(&:alias_name).select do |alias_name, syms|
+             syms.count > 1
+           end
+
+           return if invalid.empty?
+
+           raise "Symbol alias name is duplicated. #{invalid}"
+         end
+       end
+     end
+   end
+ end
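The character-literal branch of `fill_terms_number` above maps a quoted literal to a token id. A condensed sketch of that mapping, written as a free-standing hypothetical helper (`char_token_id` is not an Lrama method), makes it easier to try individual literals:

```ruby
# Hypothetical helper condensing the `case` in fill_terms_number.
# s_value is the literal as written in the grammar, quotes included.
def char_token_id(s_value)
  escapes = {
    "\\b" => 8, "\\f" => 12, "\\n" => 10, "\\r" => 13,
    "\\t" => 9, "\\v" => 11, "\\\\" => 92
  }
  body = s_value[1..-2] # drop the surrounding single quotes

  return escapes[body] if escapes.key?(body)

  case body
  when /\A\\(\d+)\z/
    Integer($1, 8)   # octal escape such as '\101'
  when /\A(.)\z/
    $1.bytes.first   # single character: its first byte
  else
    raise ArgumentError, "Unknown Char s_value #{s_value}"
  end
end

plain = char_token_id("'a'")
newline = char_token_id("'\\n'")
octal = char_token_id("'\\101'")
```

Named tokens never go through this path: in the diff they receive consecutive ids starting from 256, which keeps them clear of the byte values used by character literals.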
@@ -0,0 +1 @@
+ require_relative "symbols/resolver"