ruby_parser 3.14.2 → 3.17.0
- checksums.yaml +4 -4
- checksums.yaml.gz.sig +0 -0
- data.tar.gz.sig +0 -0
- data/History.rdoc +60 -0
- data/Manifest.txt +4 -0
- data/Rakefile +83 -16
- data/bin/ruby_parse_extract_error +8 -3
- data/compare/normalize.rb +45 -5
- data/debugging.md +172 -0
- data/lib/ruby20_parser.rb +2953 -2924
- data/lib/ruby20_parser.y +99 -59
- data/lib/ruby21_parser.rb +3008 -2977
- data/lib/ruby21_parser.y +99 -59
- data/lib/ruby22_parser.rb +3011 -2976
- data/lib/ruby22_parser.y +99 -59
- data/lib/ruby23_parser.rb +2955 -2923
- data/lib/ruby23_parser.y +99 -59
- data/lib/ruby24_parser.rb +3024 -2984
- data/lib/ruby24_parser.y +99 -59
- data/lib/ruby25_parser.rb +3023 -2984
- data/lib/ruby25_parser.y +99 -59
- data/lib/ruby26_parser.rb +2954 -2913
- data/lib/ruby26_parser.y +100 -59
- data/lib/ruby27_parser.rb +7393 -0
- data/lib/ruby27_parser.y +2715 -0
- data/lib/ruby30_parser.rb +7393 -0
- data/lib/ruby30_parser.y +2715 -0
- data/lib/ruby_lexer.rb +90 -39
- data/lib/ruby_lexer.rex +6 -7
- data/lib/ruby_lexer.rex.rb +7 -9
- data/lib/ruby_parser.rb +4 -0
- data/lib/ruby_parser.yy +164 -59
- data/lib/ruby_parser_extras.rb +57 -18
- data/test/test_ruby_lexer.rb +64 -16
- data/test/test_ruby_parser.rb +277 -3
- data/tools/munge.rb +9 -4
- metadata +55 -36
- metadata.gz.sig +0 -0
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ff6be95e278654e341f5279fed2fd7f0c9a96d93b2fd23ba1ff4b181d593be18
+  data.tar.gz: ab91b782eb2e77cdd855fa68f4699614b6160ebcca623dd8be25719b410b4206
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a469da9dadd1eeb35a48dbb34548e70feed8ca83b2d27e41c6bf940cf9dd779622fbddcc4b3c50534f46c6de42f1f085754739d76051413866ee6557fe84050d
+  data.tar.gz: d182a507b167a6c9af4a7a48e748f55066462888b646a38a08ab12c4365a57645c5216baa2d64c99c8d7b6ebcb4e4b22219a9da9a7d0bd945c00fc21104d7343

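`checksums.yaml` records SHA256 and SHA512 digests of the two archives inside the `.gem`: `metadata.gz` and `data.tar.gz`. Below is a minimal sketch of checking those digests with Ruby's standard library. It assumes the usual `.gem` layout, where `checksums.yaml.gz` sits next to the two archives in the outer tar. `verify_gem_checksums` is a hypothetical helper and not a RubyGems API. The `.sig` signature files listed above are a separate check that this sketch does not cover.

```ruby
require "digest"
require "rubygems/package"
require "yaml"
require "zlib"

# Hypothetical helper: returns a list of "ALGO member" strings whose digest
# does not match checksums.yaml (an empty list means everything matched).
def verify_gem_checksums(gem_path)
  members = {}
  File.open(gem_path, "rb") do |io|
    Gem::Package::TarReader.new(io).each do |entry|
      members[entry.full_name] = entry.read.to_s
    end
  end

  # checksums.yaml is stored gzipped inside the .gem
  sums = YAML.safe_load(Zlib.gunzip(members.fetch("checksums.yaml.gz")))
  algorithms = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }

  mismatches = []
  algorithms.each do |algo, digest|
    sums.fetch(algo, {}).each do |name, expected|
      actual = digest.hexdigest(members.fetch(name, ""))
      mismatches << "#{algo} #{name}" unless actual == expected.to_s
    end
  end
  mismatches
end
```

Usage would be `verify_gem_checksums("ruby_parser-3.17.0.gem")` on a downloaded gem. The file name is only an example.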
checksums.yaml.gz.sig
CHANGED
Binary file

data.tar.gz.sig
CHANGED
Binary file

data/History.rdoc
CHANGED
@@ -1,3 +1,63 @@
+=== 3.16.0 / 2021-05-15
+
+* 1 major enhancement:
+
+  * Added tentative 3.0 support.
+
+* 3 minor enhancements:
+
+  * Added lexing for "beginless range" (bdots).
+  * Added parsing for bdots.
+  * Updated rake compare task to download xz files, bumped versions, etc
+
+* 4 bug fixes:
+
+  * Bump rake dependency to >= 10, < 15. (presidentbeef)
+  * Bump sexp_processor dependency to 4.15.1+. (pravi)
+  * Fixed minor state mismatch at the end of parsing to make diffing a little cleaner.
+  * Fixed normalizer to deal with new bison token syntax
+
+=== 3.15.1 / 2021-01-10
+
+* 1 bug fix:
+
+  * Bumped ruby version to include < 4 (trunk).
+
+=== 3.15.0 / 2020-08-31
+
+* 1 major enhancement:
+
+  * Added tentative 2.7 support.
+
+* 1 minor enhancement:
+
+  * Improved ruby_parse_extract_error's handling of moving slow files out.
+
+* 22 bug fixes:
+
+  * Bumped ruby version to include 3.0 (trunk).
+  * Fix an error related to empty ensure bodies. (presidentbeef)
+  * Fix handling of bad magic encoding comment.
+  * Fixed SystemStackError when parsing a huoooge hash, caused by a splat arg.
+  * Fixed a number of errors parsing do blocks in strange edge cases.
+  * Fixed a string backslash lexing bug when the string is an invalid encoding. (nijikon, gmcgibbon)
+  * Fixed bug assigning line number to some arg nodes.
+  * Fixed bug concatenating string literals with differing encodings.
+  * Fixed bug lexing heredoc w/ nasty mix of \r\n and \n.
+  * Fixed bug lexing multiple codepoints in \u{0000 1111 2222} forms.
+  * Fixed bug setting line numbers in empty xstrings in some contexts.
+  * Fixed edge case on call w/ begin + do block as an arg.
+  * Fixed handling of UTF BOM.
+  * Fixed handling of lexer state across string interpolation braces.
+  * Fixed infinite loop when lexing backslash+cr+newline (aka dos-files)
+  * Fixed lambda + do block edge case.
+  * Fixed lexing of some ?\M... and ?\C... edge cases.
+  * Fixed more do/brace block edge case failures.
+  * Fixed parsing bug where splat was used in the middle of a list.
+  * Fixed parsing of interpolation in heredoc-like strings. (presidentbeef)
+  * Fixed parsing some esoteric edge cases in op_asgn.
+  * Fixed unicode processing in ident chars so now they better mix.
+
 === 3.14.2 / 2020-02-06
 
 * 1 minor enhancement:

data/Manifest.txt
CHANGED
data/Rakefile
CHANGED
@@ -8,11 +8,12 @@ Hoe.plugin :racc
 Hoe.plugin :isolate
 Hoe.plugin :rdoc
 
+Hoe.add_include_dirs "lib"
 Hoe.add_include_dirs "../../sexp_processor/dev/lib"
 Hoe.add_include_dirs "../../minitest/dev/lib"
 Hoe.add_include_dirs "../../oedipus_lex/dev/lib"
 
-V2 = %w[20 21 22 23 24 25 26]
+V2 = %w[20 21 22 23 24 25 26 27 30]
 V2.replace [V2.last] if ENV["FAST"] # HACK
 
 Hoe.spec "ruby_parser" do
@@ -20,11 +21,18 @@ Hoe.spec "ruby_parser" do
 
   license "MIT"
 
-  dependency "sexp_processor", "~> 4.
-  dependency "rake", "<
+  dependency "sexp_processor", ["~> 4.15", ">= 4.15.1"]
+  dependency "rake", [">= 10", "< 15"], :developer
   dependency "oedipus_lex", "~> 2.5", :developer
 
-
+  # NOTE: Ryan!!! Stop trying to fix this dependency! Isolate just
+  # can't handle having a faux-gem half-installed! Stop! Just `gem
+  # install racc` and move on. Revisit this ONLY once racc-compiler
+  # gets split out.
+
+  dependency "racc", "~> 1.5", :developer
+
+  require_ruby_version [">= 2.1", "< 4"]
 
   if plugin? :perforce then # generated files
     V2.each do |n|
@@ -56,6 +64,8 @@ end
 
 file "lib/ruby_lexer.rex.rb" => "lib/ruby_lexer.rex"
 
+task :generate => [:lexer, :parser]
+
 task :clean do
   rm_rf(Dir["**/*~"] +
         Dir["diff.diff"] + # not all diffs. bit me too many times
@@ -89,7 +99,7 @@ end
 
 def dl v
   dir = v[/^\d+\.\d+/]
-  url = "https://cache.ruby-lang.org/pub/ruby/#{dir}/ruby-#{v}.tar.
+  url = "https://cache.ruby-lang.org/pub/ruby/#{dir}/ruby-#{v}.tar.xz"
   path = File.basename url
   unless File.exist? path then
     system "curl -O #{url}"
@@ -101,7 +111,7 @@ def ruby_parse version
   rp_txt = "rp#{v}.txt"
   mri_txt = "mri#{v}.txt"
   parse_y = "parse#{v}.y"
-  tarball = "ruby-#{version}.tar.
+  tarball = "ruby-#{version}.tar.xz"
   ruby_dir = "ruby-#{version}"
   diff = "diff#{v}.diff"
   rp_out = "lib/ruby#{v}_parser.output"
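Both `dl` and `ruby_parse` now point at `.tar.xz` tarballs, and the extraction step below switches to `tar Jxf` (`J` selects xz). The download URL puts the file in a directory named after the version's major.minor prefix. Below is a sketch of that URL scheme as a standalone function. `tarball_url` and the error for unparseable versions are illustrative additions; the Rakefile itself does not raise.

```ruby
BASE_URL = "https://cache.ruby-lang.org/pub/ruby"

# Same scheme as the Rakefile's dl method: the directory is the
# major.minor prefix of the version, the file is the .tar.xz tarball.
def tarball_url(version)
  dir = version[/\A\d+\.\d+/] or raise ArgumentError, "bad version: #{version.inspect}"
  "#{BASE_URL}/#{dir}/ruby-#{version}.tar.xz"
end

tarball_url("2.7.4")      # => "https://cache.ruby-lang.org/pub/ruby/2.7/ruby-2.7.4.tar.xz"
tarball_url("2.0.0-p648") # => "https://cache.ruby-lang.org/pub/ruby/2.0/ruby-2.0.0-p648.tar.xz"
```

The sketch anchors with `\A` rather than `^`. That makes no difference for single-line version strings, but it states the intent more precisely.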
@@ -121,23 +131,40 @@ def ruby_parse version
     end
   end
 
+  desc "fetch all tarballs"
+  task :fetch => c_tarball
+
   file c_parse_y => c_tarball do
     in_compare do
-
+      extract_glob = case version
+                     when /2\.7|3\.0/
+                       "{id.h,parse.y,tool/{id2token.rb,lib/vpath.rb}}"
+                     else
+                       "{id.h,parse.y,tool/{id2token.rb,vpath.rb}}"
+                     end
+      system "tar Jxf #{tarball} #{ruby_dir}/#{extract_glob}"
+
       Dir.chdir ruby_dir do
         if File.exist? "tool/id2token.rb" then
           sh "ruby tool/id2token.rb --path-separator=.:./ id.h parse.y | expand > ../#{parse_y}"
         else
           sh "expand parse.y > ../#{parse_y}"
         end
+
+        ruby "-pi", "-e", 'gsub(/^%define\s+api\.pure/, "%pure-parser")', "../#{parse_y}"
       end
       sh "rm -rf #{ruby_dir}"
     end
   end
 
+  bison = Dir["/opt/homebrew/opt/bison/bin/bison",
+              "/usr/local/opt/bison/bin/bison",
+              `which bison`.chomp,
+             ].first
+
   file c_mri_txt => [c_parse_y, normalize] do
     in_compare do
-      sh "bison -r all #{parse_y}"
+      sh "#{bison} -r all #{parse_y}"
       sh "./normalize.rb parse#{v}.output > #{mri_txt}"
       rm ["parse#{v}.output", "parse#{v}.tab.c"]
     end
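The new `bison = Dir[...].first` line prefers a Homebrew-installed bison over whatever `which bison` finds. If none of the candidates exists, `bison` ends up `nil`. Below is a sketch of an equivalent lookup that makes the precedence explicit with `Array#find` and `File.executable?` instead of `Dir[]`, and that fails loudly when nothing is found. `find_bison` and its error are illustrative, not part of the Rakefile.

```ruby
# Hypothetical helper: return the first candidate that is an executable
# file. Array order is the precedence order. Raise if nothing qualifies.
def find_bison(candidates)
  found = candidates.find do |path|
    !path.empty? && File.file?(path) && File.executable?(path)
  end
  found or raise "no bison found in: #{candidates.inspect}"
end

# e.g. find_bison(["/opt/homebrew/opt/bison/bin/bison",
#                  "/usr/local/opt/bison/bin/bison",
#                  `which bison`.chomp])
```

The empty-string guard matters because `` `which bison`.chomp `` yields `""` when bison is not on the `PATH`.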
@@ -178,20 +205,54 @@ def ruby_parse version
   end
 end
 
+task :versions do
+  require "open-uri"
+  require "net/http" # avoid require issues in threads
+  require "net/https"
+
+  versions = %w[ 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 3.0 ]
+
+  base_url = "https://cache.ruby-lang.org/pub/ruby"
+
+  class Array
+    def human_sort
+      sort_by { |item| item.to_s.split(/(\d+)/).map { |e| [e.to_i, e] } }
+    end
+  end
+
+  versions = versions.map { |ver|
+    Thread.new {
+      URI
+        .parse("#{base_url}/#{ver}/")
+        .read
+        .scan(/ruby-\d+\.\d+\.\d+[-\w.]*?.tar.gz/)
+        .reject { |s| s =~ /-(?:rc|preview)\d/ }
+        .human_sort
+        .last
+        .delete_prefix("ruby-")
+        .delete_suffix ".tar.gz"
+    }
+  }.map(&:value).sort
+
+  puts versions.map { |v| "ruby_parse %p" % [v] }
+end
+
 ruby_parse "2.0.0-p648"
-ruby_parse "2.1.
-ruby_parse "2.2.
+ruby_parse "2.1.10"
+ruby_parse "2.2.10"
 ruby_parse "2.3.8"
-ruby_parse "2.4.
-ruby_parse "2.5.
-ruby_parse "2.6.
+ruby_parse "2.4.10"
+ruby_parse "2.5.9"
+ruby_parse "2.6.8"
+ruby_parse "2.7.4"
+ruby_parse "3.0.2"
 
 task :debug => :isolate do
   ENV["V"] ||= V2.last
   Rake.application[:parser].invoke # this way we can have DEBUG set
   Rake.application[:lexer].invoke # this way we can have DEBUG set
 
-
+  $:.unshift "lib"
   require "ruby_parser"
   require "pp"
 
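The new `task :versions` picks the newest final release in each version directory. It needs a natural sort for that, because plain string sorting puts `ruby-2.6.10` before `ruby-2.6.9`. Below is a sketch of the same selection as pure functions, with the HTTP fetch left out. `human_sort` follows the same key idea as the Rakefile's `Array#human_sort`, but it is written as a method rather than a monkey patch. `latest_release` is a hypothetical name. The sketch also escapes the dots in `.tar.gz`.

```ruby
# Natural sort: split into digit and non-digit runs so "10" compares
# as the number 10 rather than as the characters "1", "0".
def human_sort(items)
  items.sort_by { |item| item.to_s.split(/(\d+)/).map { |e| [e.to_i, e] } }
end

# Hypothetical helper: newest non-rc/non-preview release named in a
# directory listing (HTML or plain text), or nil if there is none.
def latest_release(listing)
  tarballs = listing
    .scan(/ruby-\d+\.\d+\.\d+[-\w.]*?\.tar\.gz/)
    .reject { |s| s =~ /-(?:rc|preview)\d/ }
  return nil if tarballs.empty?

  human_sort(tarballs).last.delete_prefix("ruby-").delete_suffix(".tar.gz")
end

%w[ruby-2.6.9 ruby-2.6.10].sort.last        # => "ruby-2.6.9"
human_sort(%w[ruby-2.6.9 ruby-2.6.10]).last # => "ruby-2.6.10"
```

In the Rakefile, each directory listing is fetched in its own `Thread`. `Thread#value` then joins each thread and collects its result, so `latest_release` corresponds to the per-thread body without the network call.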
@@ -214,8 +275,9 @@ task :debug => :isolate do
 
   begin
     pp parser.process(ruby, file, time)
-  rescue Racc::ParseError => e
+  rescue ArgumentError, Racc::ParseError => e
     p e
+    puts e.backtrace.join "\n "
     ss = parser.lexer.ss
     src = ss.string
     lines = src[0..ss.pos].split(/\n/)
@@ -232,12 +294,17 @@ task :debug3 do
 
   ENV.delete "V"
 
+  sh "ruby -v"
   sh "ruby -y #{file} 2>&1 | #{munge} > tmp/ruby"
   sh "./tools/ripper.rb -d #{file} | #{munge} > tmp/rip"
-  sh "rake debug F=#{file} DEBUG=1
+  sh "rake debug F=#{file} DEBUG=1 2>&1 | #{munge} > tmp/rp"
   sh "diff -U 999 -d tmp/{rip,rp}"
 end
 
+task :cmp do
+  sh %(emacsclient --eval '(ediff-files "tmp/ruby" "tmp/rp")')
+end
+
 task :cmp3 do
   sh %(emacsclient --eval '(ediff-files3 "tmp/ruby" "tmp/rip" "tmp/rp")')
 end

data/bin/ruby_parse_extract_error
CHANGED
@@ -104,9 +104,14 @@ rescue Timeout::Error
   warn "TIMEOUT parsing #{file}. Skipping."
 
   if $m then
-
-
-
+    base_dir, *rest = file.split("/")
+    base_dir.sub!(/\.slow\.?.*/, "")
+    base_dir += ".slow.#{time}"
+
+    new_file = File.join(base_dir, *rest)
+
+    FileUtils.mkdir_p File.dirname(new_file)
+    FileUtils.move file, new_file, verbose:true
   elsif $t then
     File.unlink file
   end

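The new `$m` branch moves a file that timed out into a sibling tree. That tree's top-level directory is renamed to `<dir>.slow.<time>`, and any earlier `.slow...` suffix is replaced. Below is the path computation as a pure function, so it can be checked without touching the filesystem. `slow_destination` is a hypothetical name, and it uses non-destructive `sub` where the script uses `sub!` and `+=`.

```ruby
# Hypothetical pure version of the $m path logic: rename the first path
# component to "<dir>.slow.<time>", dropping any previous ".slow..." suffix.
def slow_destination(file, time)
  base_dir, *rest = file.split("/")
  base_dir = base_dir.sub(/\.slow\.?.*/, "") + ".slow.#{time}"
  File.join(base_dir, *rest)
end

slow_destination("rails/app/models/user.rb", 12)        # => "rails.slow.12/app/models/user.rb"
slow_destination("rails.slow.5/app/models/user.rb", 12) # => "rails.slow.12/app/models/user.rb"
```

The logic assumes relative paths with at least one directory. A bare `user.rb` would become `user.rb.slow.12`, and the pattern also truncates names such as `lib.slowpoke` to `lib`.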
data/compare/normalize.rb
CHANGED
@@ -8,6 +8,10 @@ order = []
 
 def munge s
   renames = [
+    # unquote... wtf?
+    /`(.+?)'/, proc { $1 },
+    /"'(.+?)'"/, proc { "\"#{$1}\"" },
+
     "'='", "tEQL",
     "'!'", "tBANG",
     "'%'", "tPERCENT",
@@ -100,6 +104,43 @@ def munge s
 
     "kVARIABLE", "keyword_variable", # ugh: this is a rule name
 
+    # 2.7 changes:
+
+    '"global variable"', "tGVAR",
+    '"operator-assignment"', "tOP_ASGN",
+    '"back reference"', "tBACK_REF",
+    '"numbered reference"', "tNTH_REF",
+    '"local variable or method"', "tIDENTIFIER",
+    '"constant"', "tCONSTANT",
+
+    '"(.."', "tBDOT2",
+    '"(..."', "tBDOT3",
+    '"char literal"', "tCHAR",
+    '"literal content"', "tSTRING_CONTENT",
+    '"string literal"', "tSTRING_BEG",
+    '"symbol literal"', "tSYMBEG",
+    '"backtick literal"', "tXSTRING_BEG",
+    '"regexp literal"', "tREGEXP_BEG",
+    '"word list"', "tWORDS_BEG",
+    '"verbatim word list"', "tQWORDS_BEG",
+    '"symbol list"', "tSYMBOLS_BEG",
+    '"verbatim symbol list"', "tQSYMBOLS_BEG",
+
+    '"float literal"', "tFLOAT",
+    '"imaginary literal"', "tIMAGINARY",
+    '"integer literal"', "tINTEGER",
+    '"rational literal"', "tRATIONAL",
+
+    '"instance variable"', "tIVAR",
+    '"class variable"', "tCVAR",
+    '"terminator"', "tSTRING_END", # TODO: switch this?
+    '"method"', "tFID",
+    '"}"', "tSTRING_DEND",
+
+    '"do for block"', "kDO_BLOCK",
+    '"do for condition"', "kDO_COND",
+    '"do for lambda"', "kDO_LAMBDA",
+
     # UGH
     "k_LINE__", "k__LINE__",
     "k_FILE__", "k__FILE__",
@@ -107,13 +148,12 @@ def munge s
 
     '"defined?"', "kDEFINED",
 
-
     '"do (for condition)"', "kDO_COND",
     '"do (for lambda)"', "kDO_LAMBDA",
     '"do (for block)"', "kDO_BLOCK",
 
-    /\"(\w+) \(modifier\)
-    /\"(\w+)\"/,
+    /\"(\w+) \(?modifier\)?\"/, proc { |x| "k#{$1.upcase}_MOD" },
+    /\"(\w+)\"/, proc { |x| "k#{$1.upcase}" },
 
     /@(\d+)(\s+|$)/, "",
   ]
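`munge` treats `renames` as alternating pattern/replacement pairs. After these hunks, a replacement is either a plain string or a proc that builds a token name from the match. The 2.7 entries map bison's descriptive token labels, such as `"global variable"`, back to names like `tGVAR`. The body of `munge` is not in this diff, so the sketch below is an assumption about how such a list can be applied, not a copy of normalize.rb. It passes the `MatchData` to lambdas explicitly instead of reading `$1`: `$~` and `$1` are scoped to the method where a proc is written, so explicit passing is safer once the list lives outside the method doing the substitution.

```ruby
# Alternating [pattern, replacement] pairs: a small subset of normalize.rb's
# list. Lambdas receive the MatchData instead of reading $1.
RENAMES = [
  /`(.+?)'/,                ->(m) { m[1] },
  "'='",                    "tEQL",
  '"do for block"',         "kDO_BLOCK",
  /"(\w+) \(?modifier\)?"/, ->(m) { "k#{m[1].upcase}_MOD" },
  /"(\w+)"/,                ->(m) { "k#{m[1].upcase}" },
].freeze

# Apply each pair in order; String patterns are matched literally.
def apply_renames(line, renames = RENAMES)
  renames.each_slice(2).reduce(line) do |s, (pattern, replacement)|
    if replacement.respond_to?(:call)
      s.gsub(pattern) { replacement.call(Regexp.last_match) }
    else
      s.gsub(pattern, replacement)
    end
  end
end

apply_renames(%q{stmt "if (modifier)" expr}) # => "stmt kIF_MOD expr"
apply_renames(%q{arg '=' arg})               # => "arg tEQL arg"
```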
@@ -134,7 +174,7 @@ ARGF.each_line do |line|
 
   case line.strip
   when /^$/ then
-  when /^(\d+) (
+  when /^(\d+) (\$?[@\w]+): (.*)/ then # yacc
     rule = $2
     order << rule unless rules.has_key? rule
     rules[rule] << munge($3)
@@ -159,7 +199,7 @@ ARGF.each_line do |line|
   when /^\cL/ then # byacc
     break
   else
-    warn "unparsed: #{$.}: #{line.
+    warn "unparsed: #{$.}: #{line.strip.inspect}"
   end
 end
 

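The yacc branch's rule-line pattern now accepts an optional leading `$`. That lets it pick up bison's generated mid-rule names such as `$@1`. Below is a sketch of collecting rules with that pattern, plus a set-based way to list the productions MRI has that ruby_parser lacks, which correspond to the `-` lines in a compare diff. Both `collect_rules` and `missing_productions` are hypothetical reductions written for illustration. The real compare tasks diff the normalized text output instead.

```ruby
# Rule-line pattern from the hunk above: "N rule: rhs", where rule may
# be a bison mid-rule name like "$@1".
RULE_LINE = /^(\d+) (\$?[@\w]+): (.*)/

# Collect right-hand sides per rule, remembering first-seen rule order.
def collect_rules(lines)
  rules = Hash.new { |h, k| h[k] = [] }
  order = []
  lines.each do |line|
    case line.strip
    when RULE_LINE
      rule = $2
      order << rule unless rules.key?(rule)
      rules[rule] << $3
    end
  end
  [order, rules]
end

# Productions present in the MRI rules but absent from the ruby_parser rules.
def missing_productions(mri_rules, rp_rules)
  mri_rules.each_with_object({}) do |(rule, rhss), missing|
    gap = rhss - rp_rules.fetch(rule, [])
    missing[rule] = gap unless gap.empty?
  end
end
```

In practice the MRI side would come from bison's `.output` file and the ruby_parser side from racc's, each after munging.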
data/debugging.md
CHANGED
@@ -1,5 +1,44 @@
 # Quick Notes to Help with Debugging
 
+## Reducing
+
+One of the most important steps is reducing the code sample to a
+minimal reproduction. For example, one thing I'm debugging right now
+was reported as:
+
+```ruby
+a, b, c, d, e, f, g, h, i, j = 1, *[p1, p2, p3], *[p1, p2, p3], *[p4, p5, p6]
+```
+
+This original sample has 10 items on the left-hand side (LHS) and 1 +
+3 groups of 3 (calls) on the RHS + 3 arrays + 3 splats. That's a lot.
+
+It's already been reported (perhaps incorrectly) that this has to do
+with multiple splats on the RHS, so let's focus on that. At a minimum,
+the code can be reduced to 2 splats on the RHS, and some
+experimentation shows that it needs a non-splat item to fail:
+
+```
+_, _, _ = 1, *[2], *[3]
+```
+
+and some intuition further removed the arrays:
+
+```
+_, _, _ = 1, *2, *3
+```
+
+The difference is huge and will make things much easier when
+debugging.
+
+## Getting something to compare
+
+```
+% rake debug3 F=file.rb
+```
+
+TODO
+
 ## Comparing against ruby / ripper:
 
 ```
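The reduction in the new "Reducing" section was done by hand. The same idea can be automated with a greedy reducer that keeps deleting one element at a time while a check still reports the failure. This is a simplified form of delta debugging, not something ruby_parser provides. The block is a stand-in for whatever check reproduces the bug, for instance rebuilding `"_, _, _ = #{xs.join(", ")}"` and seeing whether the parser still blows up.

```ruby
# Greedy one-element-at-a-time reducer (a simplified delta-debugging pass).
# The block must return true while the candidate still shows the failure.
def reduce(items)
  raise ArgumentError, "block must hold for the full input" unless yield(items)

  current = items.dup
  loop do
    removed = false
    i = 0
    while i < current.size
      candidate = current.dup
      candidate.delete_at(i)
      if yield(candidate)
        current = candidate # element i was not needed; retry this index
        removed = true
      else
        i += 1              # element i is needed to reproduce; keep it
      end
    end
    break unless removed    # a full pass removed nothing
  end
  current
end

# Stand-in check: "fails" with at least two splats and one plain item.
rhs = ["1", "*[p1, p2, p3]", "*[p1, p2, p3]", "*[p4, p5, p6]"]
reduce(rhs) { |xs| xs.count { |x| x.start_with?("*") } >= 2 && xs.any? { |x| !x.start_with?("*") } }
# => ["1", "*[p1, p2, p3]", "*[p4, p5, p6]"]
```

The result is minimal only with respect to removing single elements. Shrinking `*[p1, p2, p3]` to `*2`, as done by hand above, needs a second pass over each element's own contents.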
@@ -16,3 +55,136 @@ From there? Good luck. I'm currently trying to backtrack from rule
 reductions to state change differences. I'd like to figure out a way
 to go from this sort of diff to a reasonable test that checks state
 changes but I don't have that set up at this point.
+
+## Adding New Grammar Productions
+
+Ruby adds stuff to the parser ALL THE TIME. It's actually hard to keep
+up with, but I've added some tools and shown what a typical workflow
+looks like. Let's say you want to add ruby 2.7's "beginless range" (e.g.
+`..42`).
+
+Whenever there's a language feature missing, I start with comparing
+the parse trees between MRI and RP:
+
+### Structural Comparing
+
+There's a bunch of rake tasks `compare27`, `compare26`, etc., that try
+to normalize and diff MRI's parse.y parse tree (just the structure of
+the tree in yacc) to ruby\_parser's parse tree (racc). It's the first
+thing I do when I'm adding a new version. Stub out all the version
+differences, and then start to diff the structure and move
+ruby\_parser towards the new changes.
+
+Some differences are just gonna be there... but here's an example of a
+real diff between MRI 2.7 and ruby_parser as of today:
+
+```diff
+  arg tDOT3 arg
+  arg tDOT2
+  arg tDOT3
+- tBDOT2 arg
+- tBDOT3 arg
+  arg tPLUS arg
+  arg tMINUS arg
+  arg tSTAR2 arg
+```
+
+This is a new language feature that ruby_parser doesn't handle yet.
+It's in MRI (the left-hand side of the diff) but not ruby\_parser (the
+right-hand side), so it is a `-` or missing line.
+
+Some other diffs will have both `+` and `-` lines. That usually
+happens when MRI has been refactoring the grammar. Sometimes I choose
+to adapt those refactorings, and sometimes it starts to get too
+difficult to maintain multiple versions of ruby parsing in a single
+file.
+
+But! This structural comparing is always a place you should look when
+ruby_parser is failing to parse something. Maybe it just hasn't been
+implemented yet and the easiest place to look is the diff.
+
+### Starting Test First
+
+The next thing I do is to add a parser test to cover that feature. I
+usually start with the parser and work backwards towards the lexer as
+needed, as I find it structures things properly and keeps things goal
+oriented.
+
+So, make a new parser test, usually in the versioned section of the
+parser tests.
+
+```
+def test_beginless2
+  rb = "..10\n; ..a\n; c"
+  pt = s(:block,
+         s(:dot2, nil, s(:lit, 10).line(1)).line(1),
+         s(:dot2, nil, s(:call, nil, :a).line(2)).line(2),
+         s(:call, nil, :c).line(3)).line(1)
+
+  assert_parse_line rb, pt, 1
+
+  flunk "not done yet"
+end
+```
+
+(In this case, I copied and modified the tests for open ranges from 2.6.)
+Then run it to get the first error:
+
+```
+% rake N=/beginless/
+
+...
+
+E
+
+Finished in 0.021814s, 45.8421 runs/s, 0.0000 assertions/s.
+
+  1) Error:
+TestRubyParserV27#test_whatevs:
+Racc::ParseError: (string):1 :: parse error on value ".." (tDOT2)
+    GEMS/2.7.0/gems/racc-1.5.0/lib/racc/parser.rb:538:in `on_error'
+    WORK/ruby_parser/dev/lib/ruby_parser_extras.rb:1304:in `on_error'
+    (eval):3:in `_racc_do_parse_c'
+    (eval):3:in `do_parse'
+    WORK/ruby_parser/dev/lib/ruby_parser_extras.rb:1329:in `block in process'
+    RUBY/lib/ruby/2.7.0/timeout.rb:95:in `block in timeout'
+    RUBY/lib/ruby/2.7.0/timeout.rb:33:in `block in catch'
+    RUBY/lib/ruby/2.7.0/timeout.rb:33:in `catch'
+    RUBY/lib/ruby/2.7.0/timeout.rb:33:in `catch'
+    RUBY/lib/ruby/2.7.0/timeout.rb:110:in `timeout'
+    WORK/ruby_parser/dev/lib/ruby_parser_extras.rb:1317:in `process'
+    WORK/ruby_parser/dev/test/test_ruby_parser.rb:4198:in `assert_parse'
+    WORK/ruby_parser/dev/test/test_ruby_parser.rb:4221:in `assert_parse_line'
+    WORK/ruby_parser/dev/test/test_ruby_parser.rb:4451:in `test_whatevs'
+```
+
+For starters, we know the missing production is for `tBDOT2 arg`. It
+is currently blowing up because it is getting `tDOT2` and simply
+doesn't know what to do with it, so it raises the error. As the diff
+suggests, that's the wrong token to begin with, so it is probably time
+to also create a lexer test:
+
+```
+def test_yylex_bdot2
+  assert_lex3("..42",
+              s(:dot2, nil, s(:lit, 42)),
+
+              :tBDOT2, "..", EXPR_BEG,
+              :tINTEGER, "42", EXPR_NUM)
+
+  flunk "not done yet"
+end
+```
+
+This one is mostly speculative at this point. It says "if we're lexing
+this string, we should get this sexp if we fully parse it, and the
+lexical stream should look like this"... That last bit is mostly made
+up at this point. Sometimes I don't know exactly what expression state
+things should be in until I start really digging in.
+
+At this point, I have 2 failing tests that are directing me in the
+right direction. It's now a matter of digging through
+`compare/parse26.y` to see how the lexer differs and implementing
+it...
+
+But this is a good start to the doco for now. I'll add more later.
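The lexer test expects `..` at the start of an expression to lex as `tBDOT2` and leave the lexer in `EXPR_BEG`. Meanwhile, `1..2` should keep producing `tDOT2`. In MRI 2.7's parse.y, the choice between the two tokens appears to hinge on whether the lexer is at the beginning of an expression. The toy lexer below only illustrates that state-based decision. Its two-state model (`:beg`/`:end`) is a big simplification, and the `:sep` token for `;` and newlines is a placeholder name. None of this is ruby_parser's `RubyLexer`.

```ruby
require "strscan"

# Toy lexer: integers, identifiers, "..", "...", ";" / newline, and spaces.
# state is :beg at the start of an expression and :end after an operand.
def toy_lex(src)
  ss = StringScanner.new(src)
  state = :beg
  tokens = []
  until ss.eos?
    if ss.skip(/[ \t]+/)
      next
    elsif (t = ss.scan(/\.\.\.?/))
      three = t == "..."
      type = if state == :beg
               three ? :tBDOT3 : :tBDOT2 # beginless range
             else
               three ? :tDOT3 : :tDOT2   # ordinary (or endless) range
             end
      tokens << [type, t]
      state = :beg
    elsif (t = ss.scan(/\d+/))
      tokens << [:tINTEGER, t]
      state = :end
    elsif (t = ss.scan(/[a-z_]\w*/))
      tokens << [:tIDENTIFIER, t]
      state = :end
    elsif (t = ss.scan(/[;\n]/))
      tokens << [:sep, t] # placeholder token name
      state = :beg
    else
      raise ArgumentError, "unexpected #{ss.peek(1).inspect} at #{ss.pos}"
    end
  end
  tokens
end

toy_lex("..42") # => [[:tBDOT2, ".."], [:tINTEGER, "42"]]
toy_lex("1..2") # => [[:tINTEGER, "1"], [:tDOT2, ".."], [:tINTEGER, "2"]]
```

Run on the parser test's input, `"..10\n; ..a\n; c"`, both ranges come out as `tBDOT2` because each one follows either the start of input or a separator.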