lrama 0.6.2 → 0.6.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: e4158de45c42ff62eacfb00737261feaa49d8f0cc646004e30da74ba4e2e69c6
- data.tar.gz: 734830227f701e18df2e9e8bc3da55d15f49c890e08530e6ac55ef87ae5f952d
+ metadata.gz: ecd30d3fab4dd73442ed6d3b2802db5b463159cb6ddf1f1835d9b8e860d4c9dd
+ data.tar.gz: 79b6087e68d3c2e95db81fa1d25f58280a5543e9c5fa91f5b6ecc7c40b5599d7
  SHA512:
- metadata.gz: 52ebbe4d099ae63d73aa995bddc8e966f989a4d00ad3b39634d2abe2448da404dd9bff8f15e0dedd0577716089329c804ef2c4edcadd39ca6ba47f8d293d101d
- data.tar.gz: 72e91c79618071b5850c85335cfe3f1b63ff89f11cd332b0141623a4e2a7e2c2c389dd0db9afe38a5a84b7aa891ac1834fc7a2d6c6eed8f62d87734f6b99cbbf
+ metadata.gz: f3302156423399987015deb90afbaa0d6916e5e61b14c5297ff6a0e01ab9db3bbd164b334a11e26d38cf626425b7faee404e5d1cdec236a9b08b576ced4fe201
+ data.tar.gz: 380d8d31c93e5ae6c5a406c2b2eedad0d4b52dd311ecd742ca1412bd1acade7dae8fe12f7a6325ef8a0b4922ffd927b0dfa2caf3cd54c095669ab2b2cab85516
data/NEWS.md CHANGED
@@ -1,5 +1,39 @@
  # NEWS for Lrama

+ ## Lrama 0.6.3 (2024-02-15)
+
+ ### Bring Your Own Stack
+
+ Provide functionalities for Bring Your Own Stack.
+
+ Ruby’s Ripper library requires their own semantic value stack to manage Ruby Objects returned by user defined callback method. Currently Ripper uses semantic value stack (`yyvsa`) which is used by parser to manage Node. This hack introduces some limitation on Ripper. For example, Ripper can not execute semantic analysis depending on Node structure.
+
+ Lrama introduces two features to support another semantic value stack by parser generator users.
+
+ 1. Callback entry points
+
+ User can emulate semantic value stack by these callbacks.
+ Lrama provides these five callbacks. Registered functions are called when each event happen. For example %after-shift function is called when shift happens on original semantic value stack.
+
+ * `%after-shift` function_name
+ * `%before-reduce` function_name
+ * `%after-reduce` function_name
+ * `%after-shift-error-token` function_name
+ * `%after-pop-stack` function_name
+
+ 2. `$:n` variable to access index of each grammar symbols
+
+ User also needs to access semantic value of their stack in grammar action. `$:n` provides the way to access to it. `$:n` is translated to the minus index from the top of the stack.
+ For example
+
+ ```
+ primary: k_if expr_value then compstmt if_tail k_end
+ {
+ /*% ripper: if!($:2, $:4, $:5) %*/
+ /* $:2 = -5, $:4 = -3, $:5 = -2. */
+ }
+ ```
+
  ## Lrama 0.6.2 (2024-01-27)

  ### %no-stdlib directive
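The "Bring Your Own Stack" entry above can be pictured with a small Ruby model. This is only a sketch: a plain `Array` stands in for the user-managed stack, and the class `ShadowStack` and its methods are hypothetical names chosen for illustration, not part of Lrama or Ripper. The real callbacks are C functions registered through the `%after-shift` family of directives. The model shows why `$:n` works as a negative index counted from the top of the stack.

```ruby
# Illustrative model only: a Ruby Array stands in for the user-managed stack.
# The method names mirror the callback events; they are not Lrama APIs.
class ShadowStack
  def initialize
    @values = []
  end

  # Models the %after-shift event: push a value for the shifted token.
  def after_shift(value)
    @values.push(value)
  end

  # Models the %after-reduce event: drop the RHS values, push the result.
  def after_reduce(rhs_length, result)
    @values.pop(rhs_length)
    @values.push(result)
  end

  # Models $:n: a negative offset counted from the top of the stack.
  def at(offset)
    @values[offset]
  end

  def size
    @values.size
  end
end

stack = ShadowStack.new
%w[k_if expr_value then compstmt if_tail k_end].each { |sym| stack.after_shift(sym) }

# With six RHS symbols on the stack, $:2 = -5, $:4 = -3, $:5 = -2.
picked = [stack.at(-5), stack.at(-3), stack.at(-2)]

# Reducing the rule replaces the six RHS values with a single result.
stack.after_reduce(6, "primary")
```

Here `picked` holds the entries for `expr_value`, `compstmt` and `if_tail`, matching the offsets quoted in the NEWS example.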
data/README.md CHANGED
@@ -1,7 +1,23 @@
  # Lrama

+ [![Gem Version](https://badge.fury.io/rb/lrama.svg)](https://badge.fury.io/rb/lrama)
+ [![build](https://github.com/ruby/lrama/actions/workflows/test.yaml/badge.svg)](https://github.com/ruby/lrama/actions/workflows/test.yaml)
+
  Lrama is LALR (1) parser generator written by Ruby. The first goal of this project is providing error tolerant parser for CRuby with minimal changes on CRuby parse.y file.

+ * [Features](#features)
+ * [Installation](#installation)
+ * [Usage](#usage)
+ * [Versions and Branches](#versions-and-branches)
+ * [Supported Ruby version](#supported-ruby-version)
+ * [Development](#development)
+ * [How to generate parser.rb](#how-to-generate-parserrb)
+ * [Test](#test)
+ * [Profiling Lrama](#profiling-lrama)
+ * [Build Ruby](#build-ruby)
+ * [Release flow](#release-flow)
+ * [License](#license)
+
  ## Features

  * Bison style grammar file is supported with some assumptions
@@ -11,6 +27,9 @@ Lrama is LALR (1) parser generator written by Ruby. The first goal of this proje
  * b4_lac_if is always false
  * Error Tolerance parser
  * Subset of [Repairing Syntax Errors in LR Parsers (Corchuelo et al.)](https://idus.us.es/bitstream/handle/11441/65631/Repairing%20syntax%20errors.pdf) algorithm is supported
+ * Parameterizing rules
+ * The definition of a non-terminal symbol can be parameterized with other (terminal or non-terminal) symbols.
+ * Providing a generic definition of parameterizing rules as a [standard library](lib/lrama/grammar/stdlib.y).

  ## Installation

@@ -85,6 +104,8 @@ Running tests:
  ```shell
  $ bundle install
  $ bundle exec rspec
+ # or
+ $ bundle exec rake spec
  ```

  Running type check:
@@ -93,6 +114,8 @@ Running type check:
  $ bundle install
  $ bundle exec rbs collection install
  $ bundle exec steep check
+ # or
+ $ bundle exec rake steep
  ```

  Running both of them:
data/Steepfile CHANGED
@@ -11,12 +11,14 @@ target :lib do
  check "lib/lrama/grammar/error_token.rb"
  check "lib/lrama/grammar/parameterizing_rule"
  check "lib/lrama/grammar/parameterizing_rules"
+ check "lib/lrama/grammar/symbols"
  check "lib/lrama/grammar/percent_code.rb"
  check "lib/lrama/grammar/precedence.rb"
  check "lib/lrama/grammar/printer.rb"
  check "lib/lrama/grammar/reference.rb"
  check "lib/lrama/grammar/rule_builder.rb"
  check "lib/lrama/grammar/symbol.rb"
+ check "lib/lrama/grammar/type.rb"
  check "lib/lrama/lexer"
  check "lib/lrama/report"
  check "lib/lrama/bitmap.rb"
data/lib/lrama/context.rb CHANGED
@@ -265,9 +265,9 @@ module Lrama

  s = actions.each_with_index.map do |n, i|
  [i, n]
- end.select do |i, n|
+ end.reject do |i, n|
  # Remove default_reduction_rule entries
- n != 0
+ n == 0
  end

  if s.count != 0
@@ -462,7 +462,7 @@ module Lrama
  @yylast = high

  # replace_ninf
- @yypact_ninf = (@base.select {|i| i != BaseMin } + [0]).min - 1
+ @yypact_ninf = (@base.reject {|i| i == BaseMin } + [0]).min - 1
  @base.map! do |i|
  case i
  when BaseMin
@@ -472,7 +472,7 @@ module Lrama
  end
  end

- @yytable_ninf = (@table.compact.select {|i| i != ErrorActionNumber } + [0]).min - 1
+ @yytable_ninf = (@table.compact.reject {|i| i == ErrorActionNumber } + [0]).min - 1
  @table.map! do |i|
  case i
  when nil
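The three context.rb hunks above swap `select` with a negated condition for `reject` with the positive one. A minimal sketch of why this is behavior-preserving follows. The sentinel `base_min` and the sample array are made up for this example and are not taken from Lrama.

```ruby
# Stand-in sentinel and sample data, chosen only for this example.
base_min = -Float::INFINITY
base = [3, base_min, -7, 0, base_min]

# Old form: keep every element that is not the sentinel.
with_select = (base.select { |i| i != base_min } + [0]).min - 1

# New form: drop every element that is the sentinel.
with_reject = (base.reject { |i| i == base_min } + [0]).min - 1
```

Both expressions filter to the same elements, so the computed minimum is identical. The `reject` form states the intent ("remove the sentinel") without a negation.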
@@ -6,18 +6,24 @@ module Lrama

  # * ($$) yylval
  # * (@$) yylloc
+ # * ($:$) error
  # * ($1) error
  # * (@1) error
+ # * ($:1) error
  def reference_to_c(ref)
  case
  when ref.type == :dollar && ref.name == "$" # $$
  "yylval"
  when ref.type == :at && ref.name == "$" # @$
  "yylloc"
+ when ref.type == :index && ref.name == "$" # $:$
+ raise "$:#{ref.value} can not be used in initial_action."
  when ref.type == :dollar # $n
  raise "$#{ref.value} can not be used in initial_action."
  when ref.type == :at # @n
  raise "@#{ref.value} can not be used in initial_action."
+ when ref.type == :index # $:n
+ raise "$:#{ref.value} can not be used in initial_action."
  else
  raise "Unexpected. #{self}, #{ref}"
  end
@@ -6,14 +6,18 @@ module Lrama

  # * ($$) error
  # * (@$) error
+ # * ($:$) error
  # * ($1) error
  # * (@1) error
+ # * ($:1) error
  def reference_to_c(ref)
  case
  when ref.type == :dollar # $$, $n
  raise "$#{ref.value} can not be used in #{type}."
  when ref.type == :at # @$, @n
  raise "@#{ref.value} can not be used in #{type}."
+ when ref.type == :index # $:$, $:n
+ raise "$:#{ref.value} can not be used in #{type}."
  else
  raise "Unexpected. #{self}, #{ref}"
  end
@@ -11,8 +11,10 @@ module Lrama

  # * ($$) *yyvaluep
  # * (@$) *yylocationp
+ # * ($:$) error
  # * ($1) error
  # * (@1) error
+ # * ($:1) error
  def reference_to_c(ref)
  case
  when ref.type == :dollar && ref.name == "$" # $$
@@ -20,10 +22,14 @@ module Lrama
  "((*yyvaluep).#{member})"
  when ref.type == :at && ref.name == "$" # @$
  "(*yylocationp)"
+ when ref.type == :index && ref.name == "$" # $:$
+ raise "$:#{ref.value} can not be used in #{type}."
  when ref.type == :dollar # $n
  raise "$#{ref.value} can not be used in #{type}."
  when ref.type == :at # @n
  raise "@#{ref.value} can not be used in #{type}."
+ when ref.type == :index # $:n
+ raise "$:#{ref.value} can not be used in #{type}."
  else
  raise "Unexpected. #{self}, #{ref}"
  end
@@ -11,8 +11,10 @@ module Lrama

  # * ($$) yyval
  # * (@$) yyloc
+ # * ($:$) error
  # * ($1) yyvsp[i]
  # * (@1) yylsp[i]
+ # * ($:1) i - 1
  #
  #
  # Consider a rule like
@@ -24,6 +26,8 @@ module Lrama
  # "Rule" class: keyword_class { $1 } tSTRING { $2 + $3 } keyword_end { $class = $1 + $keyword_end }
  # "Position in grammar" $1 $2 $3 $4 $5
  # "Index for yyvsp" -4 -3 -2 -1 0
+ # "$:n" $:1 $:2 $:3 $:4 $:5
+ # "index of $:n" -5 -4 -3 -2 -1
  #
  #
  # For the first midrule action:
@@ -31,6 +35,7 @@ module Lrama
  # "Rule" class: keyword_class { $1 } tSTRING { $2 + $3 } keyword_end { $class = $1 + $keyword_end }
  # "Position in grammar" $1
  # "Index for yyvsp" 0
+ # "$:n" $:1
  def reference_to_c(ref)
  case
  when ref.type == :dollar && ref.name == "$" # $$
@@ -39,6 +44,8 @@ module Lrama
  "(yyval.#{tag.member})"
  when ref.type == :at && ref.name == "$" # @$
  "(yyloc)"
+ when ref.type == :index && ref.name == "$" # $:$
+ raise "$:$ is not supported"
  when ref.type == :dollar # $n
  i = -position_in_rhs + ref.index
  tag = ref.ex_tag || rhs[ref.index - 1].tag
@@ -47,6 +54,9 @@ module Lrama
  when ref.type == :at # @n
  i = -position_in_rhs + ref.index
  "(yylsp[#{i}])"
+ when ref.type == :index # $:n
+ i = -position_in_rhs + ref.index
+ "(#{i} - 1)"
  else
  raise "Unexpected. #{self}, #{ref}"
  end
@@ -70,7 +80,7 @@ module Lrama
  end

  def raise_tag_not_found_error(ref)
- raise "Tag is not specified for '$#{ref.value}' in '#{@rule.to_s}'"
+ raise "Tag is not specified for '$#{ref.value}' in '#{@rule}'"
  end
  end
  end
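The `$:n` branch in the hunks above emits `(i - 1)` where `i = -position_in_rhs + ref.index`. The arithmetic can be checked in isolation with a small sketch. The helper `index_ref_offset` is a hypothetical name introduced here and is not a method in Lrama. It assumes that `position_in_rhs` is the position of the action within the rule, which is what the "Index for yyvsp" rows in the comments imply.

```ruby
# Hypothetical helper mirroring the arithmetic of the `$:n` branch:
#   i = -position_in_rhs + n, and the emitted expression is (i - 1).
def index_ref_offset(position_in_rhs, n)
  (-position_in_rhs + n) - 1
end

# Final action of the five-position rule from the comment table:
# $:1..$:5 should map to -5..-1.
final_action = (1..5).map { |n| index_ref_offset(5, n) }

# First midrule action: only $1 is visible, so position_in_rhs is 1.
first_midrule = index_ref_offset(1, 1)

# Six-symbol rule from the NEWS example: $:2, $:4, $:5.
news_example = [2, 4, 5].map { |n| index_ref_offset(6, n) }
```

The results agree with the table added to the comments (`-5 -4 -3 -2 -1`) and with the NEWS example (`$:2 = -5, $:4 = -3, $:5 = -2`). The extra `- 1` relative to the `yyvsp` index is what turns the offset into a position counted from the top of the stack rather than from the current stack pointer.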
@@ -2,11 +2,12 @@ module Lrama
2
  class Grammar
  # type: :dollar or :at
  # name: String (e.g. $$, $foo, $expr.right)
- # index: Integer (e.g. $1)
+ # number: Integer (e.g. $1)
+ # index: Integer
  # ex_tag: "$<tag>1" (Optional)
- class Reference < Struct.new(:type, :name, :index, :ex_tag, :first_column, :last_column, keyword_init: true)
+ class Reference < Struct.new(:type, :name, :number, :index, :ex_tag, :first_column, :last_column, keyword_init: true)
  def value
- name || index
+ name || number
  end
  end
  end
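The hunk above splits the old `index` field into `number` (what the user wrote, such as the `1` in `$1`) and `index` (the resolved position), and `value` now falls back to `number`. A reduced sketch of that behavior follows. `Ref` is a hypothetical stand-in with only three of the fields, not the actual `Lrama::Grammar::Reference` class.

```ruby
# Hypothetical, trimmed-down stand-in for the struct in the hunk above.
Ref = Struct.new(:type, :name, :number, keyword_init: true) do
  # A named reference reports its name; a numbered one reports its number.
  def value
    name || number
  end
end

named    = Ref.new(type: :dollar, name: "$").value
numbered = Ref.new(type: :dollar, number: 1).value
```

With `keyword_init: true`, fields that are not passed default to `nil`, which is what lets `name || number` pick whichever of the two was set.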
@@ -181,11 +181,18 @@ module Lrama
181
  if referring_symbol[1] == 0 # Refers to LHS
  ref.name = '$'
  else
- ref.index = referring_symbol[1]
+ ref.number = referring_symbol[1]
  end
  end
  end

+ if ref.number
+ # TODO: When Inlining is implemented, for example, if `$1` is expanded to multiple RHS tokens,
+ # `$2` needs to access `$2 + n` to actually access it. So, after the Inlining implementation,
+ # it needs resolves from number to index.
+ ref.index = ref.number
+ end
+
  # TODO: Need to check index of @ too?
  next if ref.type == :at

@@ -11,7 +11,7 @@ module Lrama
11
  attr_reader :term
  attr_writer :eof_symbol, :error_symbol, :undef_symbol, :accept_symbol

- def initialize(id:, alias_name: nil, number: nil, tag: nil, term:, token_id: nil, nullable: nil, precedence: nil, printer: nil)
+ def initialize(id:, term:, alias_name: nil, number: nil, tag: nil, token_id: nil, nullable: nil, precedence: nil, printer: nil)
  @id = id
  @alias_name = alias_name
  @number = number
@@ -0,0 +1,276 @@
+ module Lrama
+   class Grammar
+     class Symbols
+       class Resolver
+         attr_reader :terms, :nterms
+
+         def initialize
+           @terms = []
+           @nterms = []
+         end
+
+         def symbols
+           @symbols ||= (@terms + @nterms)
+         end
+
+         def sort_by_number!
+           symbols.sort_by!(&:number)
+         end
+
+         def add_term(id:, alias_name: nil, tag: nil, token_id: nil, replace: false)
+           if token_id && (sym = find_symbol_by_token_id(token_id))
+             if replace
+               sym.id = id
+               sym.alias_name = alias_name
+               sym.tag = tag
+             end
+
+             return sym
+           end
+
+           if (sym = find_symbol_by_id(id))
+             return sym
+           end
+
+           @symbols = nil
+           term = Symbol.new(
+             id: id, alias_name: alias_name, number: nil, tag: tag,
+             term: true, token_id: token_id, nullable: false
+           )
+           @terms << term
+           term
+         end
+
+         def add_nterm(id:, alias_name: nil, tag: nil)
+           return if find_symbol_by_id(id)
+
+           @symbols = nil
+           nterm = Symbol.new(
+             id: id, alias_name: alias_name, number: nil, tag: tag,
+             term: false, token_id: nil, nullable: nil,
+           )
+           @nterms << nterm
+           nterm
+         end
+
+         def find_symbol_by_s_value(s_value)
+           symbols.find { |s| s.id.s_value == s_value }
+         end
+
+         def find_symbol_by_s_value!(s_value)
+           find_symbol_by_s_value(s_value) || (raise "Symbol not found: #{s_value}")
+         end
+
+         def find_symbol_by_id(id)
+           symbols.find do |s|
+             s.id == id || s.alias_name == id.s_value
+           end
+         end
+
+         def find_symbol_by_id!(id)
+           find_symbol_by_id(id) || (raise "Symbol not found: #{id}")
+         end
+
+         def find_symbol_by_token_id(token_id)
+           symbols.find {|s| s.token_id == token_id }
+         end
+
+         def find_symbol_by_number!(number)
+           sym = symbols[number]
+
+           raise "Symbol not found: #{number}" unless sym
+           raise "[BUG] Symbol number mismatch. #{number}, #{sym}" if sym.number != number
+
+           sym
+         end
+
+         def fill_symbol_number
+           # YYEMPTY = -2
+           # YYEOF = 0
+           # YYerror = 1
+           # YYUNDEF = 2
+           @number = 3
+           fill_terms_number
+           fill_nterms_number
+         end
+
+         def fill_nterm_type(types)
+           types.each do |type|
+             nterm = find_nterm_by_id!(type.id)
+             nterm.tag = type.tag
+           end
+         end
+
+         def fill_printer(printers)
+           symbols.each do |sym|
+             printers.each do |printer|
+               printer.ident_or_tags.each do |ident_or_tag|
+                 case ident_or_tag
+                 when Lrama::Lexer::Token::Ident
+                   sym.printer = printer if sym.id == ident_or_tag
+                 when Lrama::Lexer::Token::Tag
+                   sym.printer = printer if sym.tag == ident_or_tag
+                 else
+                   raise "Unknown token type. #{printer}"
+                 end
+               end
+             end
+           end
+         end
+
+         def fill_error_token(error_tokens)
+           symbols.each do |sym|
+             error_tokens.each do |token|
+               token.ident_or_tags.each do |ident_or_tag|
+                 case ident_or_tag
+                 when Lrama::Lexer::Token::Ident
+                   sym.error_token = token if sym.id == ident_or_tag
+                 when Lrama::Lexer::Token::Tag
+                   sym.error_token = token if sym.tag == ident_or_tag
+                 else
+                   raise "Unknown token type. #{token}"
+                 end
+               end
+             end
+           end
+         end
+
+         def token_to_symbol(token)
+           case token
+           when Lrama::Lexer::Token
+             find_symbol_by_id!(token)
+           else
+             raise "Unknown class: #{token}"
+           end
+         end
+
+         def validate!
+           validate_number_uniqueness!
+           validate_alias_name_uniqueness!
+         end
+
+         private
+
+         def find_nterm_by_id!(id)
+           @nterms.find do |s|
+             s.id == id
+           end || (raise "Symbol not found: #{id}")
+         end
+
+         def fill_terms_number
+           # Character literal in grammar file has
+           # token id corresponding to ASCII code by default,
+           # so start token_id from 256.
+           token_id = 256
+
+           @terms.each do |sym|
+             while used_numbers[@number] do
+               @number += 1
+             end
+
+             if sym.number.nil?
+               sym.number = @number
+               used_numbers[@number] = true
+               @number += 1
+             end
+
+             # If id is Token::Char, it uses ASCII code
+             if sym.token_id.nil?
+               if sym.id.is_a?(Lrama::Lexer::Token::Char)
+                 # Ignore ' on the both sides
+                 case sym.id.s_value[1..-2]
+                 when "\\b"
+                   sym.token_id = 8
+                 when "\\f"
+                   sym.token_id = 12
+                 when "\\n"
+                   sym.token_id = 10
+                 when "\\r"
+                   sym.token_id = 13
+                 when "\\t"
+                   sym.token_id = 9
+                 when "\\v"
+                   sym.token_id = 11
+                 when "\""
+                   sym.token_id = 34
+                 when "'"
+                   sym.token_id = 39
+                 when "\\\\"
+                   sym.token_id = 92
+                 when /\A\\(\d+)\z/
+                   unless (id = Integer($1, 8)).nil?
+                     sym.token_id = id
+                   else
+                     raise "Unknown Char s_value #{sym}"
+                   end
+                 when /\A(.)\z/
+                   unless (id = $1&.bytes&.first).nil?
+                     sym.token_id = id
+                   else
+                     raise "Unknown Char s_value #{sym}"
+                   end
+                 else
+                   raise "Unknown Char s_value #{sym}"
+                 end
+               else
+                 sym.token_id = token_id
+                 token_id += 1
+               end
+             end
+           end
+         end
+
+         def fill_nterms_number
+           token_id = 0
+
+           @nterms.each do |sym|
+             while used_numbers[@number] do
+               @number += 1
+             end
+
+             if sym.number.nil?
+               sym.number = @number
+               used_numbers[@number] = true
+               @number += 1
+             end
+
+             if sym.token_id.nil?
+               sym.token_id = token_id
+               token_id += 1
+             end
+           end
+         end
+
+         def used_numbers
+           return @used_numbers if defined?(@used_numbers)
+
+           @used_numbers = {}
+           symbols.map(&:number).each do |n|
+             @used_numbers[n] = true
+           end
+           @used_numbers
+         end
+
+         def validate_number_uniqueness!
+           invalid = symbols.group_by(&:number).select do |number, syms|
+             syms.count > 1
+           end
+
+           return if invalid.empty?
+
+           raise "Symbol number is duplicated. #{invalid}"
+         end
+
+         def validate_alias_name_uniqueness!
+           invalid = symbols.select(&:alias_name).group_by(&:alias_name).select do |alias_name, syms|
+             syms.count > 1
+           end
+
+           return if invalid.empty?
+
+           raise "Symbol alias name is duplicated. #{invalid}"
+         end
+       end
+     end
+   end
+ end
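The character-literal branch of `fill_terms_number` above maps a quoted literal to a token id: named escapes use a fixed table, `\NNN` is read as octal, and any other single character uses its first byte. A standalone sketch of that mapping follows. The function `char_token_id` is a hypothetical helper written for this example. It takes the raw literal string instead of a `Symbol` object, and its escape table is trimmed to three entries.

```ruby
# Hypothetical helper; the escape table is trimmed to three entries for brevity.
def char_token_id(literal)
  body = literal[1..-2] # drop the surrounding single quotes
  case body
  when "\\n" then 10
  when "\\t" then 9
  when "\\\\" then 92
  when /\A\\(\d+)\z/ then Integer($1, 8) # octal escape such as '\101'
  when /\A(.)\z/ then $1.bytes.first     # plain single character
  else raise "Unknown Char s_value #{literal}"
  end
end

plain   = char_token_id("'a'")
newline = char_token_id("'\\n'")
octal   = char_token_id("'\\101'")
```

Because these ids all fall below 256, the named tokens in the hunk start their automatically assigned `token_id` at 256 and cannot collide with a character literal.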
@@ -0,0 +1 @@
+ require_relative "symbols/resolver"