ruby_parser 3.12.0 → 3.18.1
- checksums.yaml +4 -4
- checksums.yaml.gz.sig +0 -0
- data/.autotest +18 -29
- data/History.rdoc +283 -0
- data/Manifest.txt +12 -4
- data/README.rdoc +4 -3
- data/Rakefile +189 -51
- data/bin/ruby_parse +3 -1
- data/bin/ruby_parse_extract_error +19 -36
- data/compare/normalize.rb +76 -4
- data/debugging.md +190 -0
- data/gauntlet.md +106 -0
- data/lib/rp_extensions.rb +14 -42
- data/lib/rp_stringscanner.rb +20 -51
- data/lib/ruby20_parser.rb +4659 -4218
- data/lib/ruby20_parser.y +953 -602
- data/lib/ruby21_parser.rb +4723 -4308
- data/lib/ruby21_parser.y +956 -605
- data/lib/ruby22_parser.rb +4762 -4337
- data/lib/ruby22_parser.y +960 -612
- data/lib/ruby23_parser.rb +4761 -4342
- data/lib/ruby23_parser.y +961 -613
- data/lib/ruby24_parser.rb +4791 -4341
- data/lib/ruby24_parser.y +968 -612
- data/lib/ruby25_parser.rb +4791 -4341
- data/lib/ruby25_parser.y +968 -612
- data/lib/ruby26_parser.rb +7287 -0
- data/lib/ruby26_parser.y +2749 -0
- data/lib/ruby27_parser.rb +8517 -0
- data/lib/ruby27_parser.y +3346 -0
- data/lib/ruby30_parser.rb +8751 -0
- data/lib/ruby30_parser.y +3472 -0
- data/lib/ruby3_parser.yy +3476 -0
- data/lib/ruby_lexer.rb +611 -826
- data/lib/ruby_lexer.rex +48 -40
- data/lib/ruby_lexer.rex.rb +122 -46
- data/lib/ruby_lexer_strings.rb +638 -0
- data/lib/ruby_parser.rb +38 -34
- data/lib/ruby_parser.yy +1710 -704
- data/lib/ruby_parser_extras.rb +987 -553
- data/test/test_ruby_lexer.rb +1718 -1539
- data/test/test_ruby_parser.rb +3957 -2164
- data/test/test_ruby_parser_extras.rb +39 -4
- data/tools/munge.rb +250 -0
- data/tools/ripper.rb +44 -0
- data.tar.gz.sig +0 -0
- metadata +68 -47
- metadata.gz.sig +0 -0
- data/lib/ruby18_parser.rb +0 -5793
- data/lib/ruby18_parser.y +0 -1908
- data/lib/ruby19_parser.rb +0 -6185
- data/lib/ruby19_parser.y +0 -2116
data/debugging.md
ADDED
@@ -0,0 +1,190 @@
# Quick Notes to Help with Debugging

## Reducing

One of the most important steps is reducing the code sample to a
minimal reproduction. For example, one thing I'm debugging right now
was reported as:

```ruby
a, b, c, d, e, f, g, h, i, j = 1, *[p1, p2, p3], *[p1, p2, p3], *[p4, p5, p6]
```

This original sample has 10 items on the left-hand side (LHS) and, on
the right-hand side (RHS), 1 + 3 groups of 3 calls, wrapped in 3 arrays
with 3 splats. That's a lot.

It's already been reported (perhaps incorrectly) that this has to do
with multiple splats on the RHS, so let's focus on that. At a minimum,
the code can be reduced to 2 splats on the RHS, and some
experimentation shows that it needs a non-splat item to fail:

```ruby
_, _, _ = 1, *[2], *[3]
```

and some intuition further removed the arrays:

```ruby
_, _, _ = 1, *2, *3
```

The difference is huge and will make debugging much easier.
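
The reduction above was done by hand. Here is a minimal sketch of automating the coarse part of it. It assumes you can express "does it still fail?" as a Ruby predicate, then greedily drops one RHS element at a time and keeps any smaller list that still fails. `reduce_items` and the toy predicate are hypothetical. In practice the predicate would rebuild the source and re-run ruby_parser on it.

```ruby
# Greedy one-element-at-a-time reducer (a much simplified cousin of
# delta debugging). Hypothetical helper, not part of ruby_parser.
def reduce_items(items, &still_fails)
  raise ArgumentError, "the original input must fail" unless still_fails.call(items)

  loop do
    smaller = items.each_index
                   .map { |i| items[0...i] + items[(i + 1)..] }
                   .find { |candidate| still_fails.call(candidate) }
    return items unless smaller # no single removal still fails: done

    items = smaller
  end
end

# Toy stand-in for "ruby_parser still fails": at least two splats plus
# at least one non-splat item (the shape the manual reduction arrived at).
toy_failure = lambda do |rhs|
  rhs.count { |x| x.start_with?("*") } >= 2 &&
    rhs.any? { |x| !x.start_with?("*") }
end

rhs = ["1", "*[p1, p2, p3]", "*[p1, p2, p3]", "*[p4, p5, p6]"]
minimal = reduce_items(rhs, &toy_failure)
puts "_, _, _ = #{minimal.join(", ")}"
```

This item-level reducer only drops whole elements. A simplification like turning `*[2]` into `*2` still takes the kind of intuition described above.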

## Getting something to compare

```
% rake debug3 F=file.rb
```

TODO

## Comparing against ruby / ripper

```
% rake cmp3 F=file.rb
```

This compiles the parser & lexer and then parses file.rb using all of
ruby, ripper, and ruby_parser in debug modes. The output is munged to
be as uniform and diffable as possible. I'm using emacs'
`ediff-files3` to compare these files (via `rake cmp3`) all at once,
but a regular `diff -u tmp/{ruby,rp}` will suffice for most tasks.
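
If you only want MRI's side of the comparison for a reduced sample, Ripper (part of Ruby's standard library) can dump both the token stream and the parse tree without the rake task. This is a quick sketch and not what `rake cmp3` runs. Note that Ripper's event names (`:on_op`, ...) differ from the yacc token names (`tDOT2`, `tBDOT2`, ...) shown in the debug output.

```ruby
require "ripper"
require "pp"

src = "_, _, _ = 1, *2, *3\n"

# Token stream: [[line, column], event, token, lexer state] per token
# (the state element is present on Ruby 2.5+).
Ripper.lex(src).each { |tok| p tok }

# Parse tree as nested arrays; Ripper.sexp returns nil on a syntax error.
tree = Ripper.sexp(src)
pp tree
```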

From there? Good luck. I'm currently trying to backtrack from rule
reductions to state change differences. I'd like to figure out a way
to go from this sort of diff to a reasonable test that checks state
changes, but I don't have that set up at this point.

## Adding New Grammar Productions

Ruby adds stuff to the parser ALL THE TIME. It's actually hard to keep
up with, but I've added some tools and shown what a typical workflow
looks like. Let's say you want to add ruby 2.7's "beginless range" (e.g.
`..42`).

Whenever there's a language feature missing, I start by comparing
the parse trees between MRI and RP:

### Structural Comparing

There are a bunch of rake tasks (`compare27`, `compare26`, etc.) that try
to normalize and diff MRI's parse.y parse tree (just the structure of
the tree in yacc) against ruby\_parser's parse tree (racc). It's the first
thing I do when I'm adding a new version. Stub out all the version
differences, and then start to diff the structure and move
ruby\_parser towards the new changes.

Some differences are just gonna be there... but here's an example of a
real diff between MRI 2.7 and ruby_parser as of today:

```diff
 arg tDOT3 arg
 arg tDOT2
 arg tDOT3
-tBDOT2 arg
-tBDOT3 arg
 arg tPLUS arg
 arg tMINUS arg
 arg tSTAR2 arg
```

This is a new language feature that ruby_parser doesn't handle yet.
It's in MRI (the left-hand side of the diff) but not in ruby\_parser (the
right-hand side), so it shows up as a `-` (missing) line.
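
Here is a sketch of how to read that diff as sets. It assumes both grammars are already normalized to one production per line, which is what the `compare*` tasks do for you. `compare_productions` is hypothetical and ignores ordering. Productions only MRI has are the `-` lines (still to implement). Productions only ruby_parser has are `+` lines.

```ruby
# Hypothetical helper: compare two normalized production lists as sets.
def compare_productions(mri, rp)
  {
    missing_in_rp: mri - rp, # "-" lines: still to implement
    extra_in_rp: rp - mri,   # "+" lines: refactored upstream or stale
  }
end

mri = ["arg tDOT3 arg", "arg tDOT2", "arg tDOT3",
       "tBDOT2 arg", "tBDOT3 arg", "arg tPLUS arg"]
rp  = ["arg tDOT3 arg", "arg tDOT2", "arg tDOT3", "arg tPLUS arg"]

result = compare_productions(mri, rp)
result[:missing_in_rp].each { |prod| puts "- #{prod}" }
result[:extra_in_rp].each { |prod| puts "+ #{prod}" }
```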

Some other diffs will have both `+` and `-` lines. That usually
happens when MRI has been refactoring the grammar. Sometimes I choose
to adopt those refactorings, and sometimes it starts to get too
difficult to maintain multiple versions of ruby parsing in a single
file.

But! This structural comparison is always a place you should look when
ruby_parser is failing to parse something. Maybe it just hasn't been
implemented yet, and the easiest place to check is the diff.

### Starting Test First

The next thing I do is add a parser test to cover that feature. I
usually start with the parser and work backwards towards the lexer as
needed, as I find it structures things properly and keeps things
goal-oriented.

So, make a new parser test, usually in the versioned section of the
parser tests:

```ruby
def test_beginless2
  rb = "..10\n; ..a\n; c"
  pt = s(:block,
         s(:dot2, nil, s(:lit, 10).line(1)).line(1),
         s(:dot2, nil, s(:call, nil, :a).line(2)).line(2),
         s(:call, nil, :c).line(3)).line(1)

  assert_parse_line rb, pt, 1

  flunk "not done yet"
end
```

(In this case, I copied and modified the tests for open ranges from 2.6.)
Then run it to get the first error:

```
% rake N=/beginless/

...

E

Finished in 0.021814s, 45.8421 runs/s, 0.0000 assertions/s.

  1) Error:
TestRubyParserV27#test_whatevs:
Racc::ParseError: (string):1 :: parse error on value ".." (tDOT2)
    GEMS/2.7.0/gems/racc-1.5.0/lib/racc/parser.rb:538:in `on_error'
    WORK/ruby_parser/dev/lib/ruby_parser_extras.rb:1304:in `on_error'
    (eval):3:in `_racc_do_parse_c'
    (eval):3:in `do_parse'
    WORK/ruby_parser/dev/lib/ruby_parser_extras.rb:1329:in `block in process'
    RUBY/lib/ruby/2.7.0/timeout.rb:95:in `block in timeout'
    RUBY/lib/ruby/2.7.0/timeout.rb:33:in `block in catch'
    RUBY/lib/ruby/2.7.0/timeout.rb:33:in `catch'
    RUBY/lib/ruby/2.7.0/timeout.rb:33:in `catch'
    RUBY/lib/ruby/2.7.0/timeout.rb:110:in `timeout'
    WORK/ruby_parser/dev/lib/ruby_parser_extras.rb:1317:in `process'
    WORK/ruby_parser/dev/test/test_ruby_parser.rb:4198:in `assert_parse'
    WORK/ruby_parser/dev/test/test_ruby_parser.rb:4221:in `assert_parse_line'
    WORK/ruby_parser/dev/test/test_ruby_parser.rb:4451:in `test_whatevs'
```

For starters, we know the missing production is for `tBDOT2 arg`. It
is currently blowing up because it is getting `tDOT2` and simply
doesn't know what to do with it, so it raises the error. As the diff
suggests, that's the wrong token to begin with, so it is probably time
to also create a lexer test:

```ruby
def test_yylex_bdot2
  assert_lex3("..42",
              s(:dot2, nil, s(:lit, 42)),

              :tBDOT2, "..", EXPR_BEG,
              :tINTEGER, "42", EXPR_NUM)

  flunk "not done yet"
end
```

This one is mostly speculative at this point. It says "if we're lexing
this string, we should get this sexp if we fully parse it, and the
lexical stream should look like this"... That last bit is mostly made
up at this point. Sometimes I don't know exactly what expression state
things should be in until I start really digging in.

At this point, I have 2 failing tests that are pointing me in the
right direction. It's now a matter of digging through
`compare/parse26.y` to see how the lexer differs and implementing
it...
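
The core of the lexer change is that the same `..` text has to become a different token depending on lexer state. Below is a toy sketch of that decision. It assumes a single "at the beginning of an expression" flag in place of ruby_parser's real lex states (`EXPR_BEG` and friends). `toy_lex` and the `:tSEMI` token name are made up for illustration, and this is not ruby_parser's lexer.

```ruby
require "strscan"

# [at_expression_start?, number_of_dots] => token type
RANGE_TOKENS = {
  [true, 2] => :tBDOT2, [true, 3] => :tBDOT3,
  [false, 2] => :tDOT2, [false, 3] => :tDOT3,
}.freeze

# Toy lexer: integers, identifiers, ";" and range operators only.
def toy_lex(src)
  ss = StringScanner.new(src)
  tokens = []
  at_beg = true # like EXPR_BEG: nothing on the left yet

  until ss.eos?
    next if ss.skip(/\s+/) # toy: all whitespace, even newlines, is ignored

    if (num = ss.scan(/\d+/))
      tokens << [:tINTEGER, num]
      at_beg = false
    elsif (id = ss.scan(/[a-z_]\w*/))
      tokens << [:tIDENTIFIER, id]
      at_beg = false
    elsif (dots = ss.scan(/\.\.\.?/))
      tokens << [RANGE_TOKENS.fetch([at_beg, dots.size]), dots]
      at_beg = true # whatever follows the operator starts a new operand
    elsif ss.skip(/;/)
      tokens << [:tSEMI, ";"] # made-up token name for this toy
      at_beg = true
    else
      raise ArgumentError, "unexpected #{ss.peek(1).inspect} at #{ss.pos}"
    end
  end

  tokens
end

p toy_lex("..42")  # prints [[:tBDOT2, ".."], [:tINTEGER, "42"]]
p toy_lex("1..42") # prints [[:tINTEGER, "1"], [:tDOT2, ".."], [:tINTEGER, "42"]]
```

In ruby_parser, the state after each token is roughly what the `EXPR_BEG` / `EXPR_NUM` arguments in `test_yylex_bdot2` pin down.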

But this is a good start to the doco for now. I'll add more later.
data/gauntlet.md
ADDED
@@ -0,0 +1,106 @@
# Running the Gauntlet

## Maintaining a Gem Mirror

I use rubygems-mirror to keep an archive of all the latest rubygems on
an external disk. Here is the config:

```
---
- from: https://rubygems.org
  to: /Volumes/StuffA/gauntlet/mirror
  parallelism: 10
  retries: 3
  delete: true
  skiperror: true
  hashdir: true
```

And I update using rake:

```
% cd ~/Work/git/rubygems/rubygems-mirror
% git down
% rake mirror:latest
% /Volumes/StuffA/gauntlet/bin/cleanup.rb
```

This rather quickly updates my mirror to the latest versions of
everything and then deletes all old versions. I then run a cleanup
script that fixes the file dates to their publication date and deletes
any gems that have invalid specs. This can argue with the mirror a
bit, but it is pretty minimal (currently ~20 bad gems).
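
`cleanup.rb` itself isn't included here. The following is a hypothetical sketch of the two jobs it is described as doing. It assumes the mirror is a tree of `.gem` files and uses RubyGems' own `Gem::Package` to read each spec. It sets each file's mtime to the spec's date and deletes anything whose spec can't be loaded.

```ruby
require "rubygems/package"

# Hypothetical sketch of a mirror cleanup pass (not the real cleanup.rb).
# Returns the list of deleted (or, with dry_run, would-be-deleted) paths.
def cleanup_mirror(root, dry_run: false)
  deleted = []

  Dir.glob(File.join(root, "**", "*.gem")).sort.each do |path|
    begin
      date = Gem::Package.new(path).spec.date # publication date, a Time
      File.utime(date, date, path) unless dry_run
    rescue StandardError => e # corrupt archive, missing metadata, etc.
      warn "bad gem #{path}: #{e.class}"
      File.delete(path) unless dry_run
      deleted << path
    end
  end

  deleted
end
```

Passing `dry_run: true` reports what would be deleted without touching anything.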

## Curating an Archive of Ruby Files

Next, I process the gem mirror into a much more digestible structure
using `hash.rb` (TODO: needs a better name):

```
% cd RP
% /Volumes/StuffA/gauntlet/bin/unpack_gems.rb
... waaaait ...
% mv hashed.noindex gauntlet.$(today).noindex
% lrztar gauntlet.$(today).noindex
% mv gauntlet.$(today).noindex.tar.lrz /Volumes/StuffA/gauntlet/
```

This script filters all the newer gems (TODO: WHY?), unpacks them,
finds all the files that look like they're valid ruby, ensures they're
valid ruby (using the current version of ruby to compile them), and
then moves them into a SHA dir structure that looks something like
this:

```
hashed.noindex/a/b/c/<full_file_sha>.rb
```

This removes all duplicates and puts everything in a fairly even,
wide, flat directory layout.
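
The hashing step can be sketched as follows. This is a hypothetical reconstruction, not `unpack_gems.rb`. It assumes SHA-256 (the text only says "SHA") and uses the MRI-specific `RubyVM::InstructionSequence.compile` as the "compile it with the current ruby" validity check. Because the file name is the digest of the contents, identical files land on the same path, which is where the deduplication comes from.

```ruby
require "digest"
require "fileutils"

# Is this source valid for the running (MRI) ruby? Compiles without running it.
def valid_ruby?(src)
  RubyVM::InstructionSequence.compile(src)
  true
rescue SyntaxError
  false
end

# hashed.noindex/a/b/c/<full_file_sha>.rb, using the first three hex digits.
def hashed_path(src, root = "hashed.noindex")
  sha = Digest::SHA256.hexdigest(src)
  File.join(root, sha[0], sha[1], sha[2], "#{sha}.rb")
end

# Stores a valid source once; returns its path, or nil if it doesn't compile.
def store(src, root = "hashed.noindex")
  return nil unless valid_ruby?(src)

  path = hashed_path(src, root)
  unless File.exist?(path) # same contents => same path => stored once
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, src)
  end
  path
end
```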

This process takes a very long time, even with a lot of
parallelization. There are currently about 160k gems in the mirror.
Unpacking, validating, and SHA'ing everything is disk and CPU intensive.
The `.noindex` extension stops Spotlight from indexing the continuous
churn of files being unpacked and moved, which saves time.

Finally, I rename and archive it all up (currently using lrztar, but
I'm not in love with it).

### Stats

```
9696 % find gauntlet.$(today).noindex -type f | lc
561270
3.5G gauntlet.2021-08-06.noindex
239M gauntlet.2021-08-06.noindex.tar.lrz
```

So I wind up with a little over half a million unique ruby files to
parse. It's about 3.5G but compresses very nicely down to 240M.

## Running the Gauntlet

Assuming you're starting from scratch, unpack the archive once:

```
% lrzuntar gauntlet.$(today).noindex.tar.lrz
```

Then, either run a single process (easier to read):

```
% ./gauntlet/bin/gauntlet.rb gauntlet/*.noindex/?
```

Or max out your machine using xargs (note the `-P 16` and set it to suit your core count):

```
% ls -d gauntlet/*.noindex/?/? | xargs -n 1 -P 16 ./gauntlet/bin/gauntlet.rb
```
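
`gauntlet.rb` isn't shown here either. Judging by the monitoring loop below, which prunes empty directories and counts the `.rb` files left, a plausible core is "parse every file, delete the ones that pass, leave the failures behind". That is an assumption. The sketch takes the parser as a block so that it stays self-contained.

```ruby
# Hypothetical core of a gauntlet runner: parse every .rb file under the
# given directories, delete the ones that parse, keep (and report) the rest.
def run_gauntlet(dirs)
  failures = {}

  dirs.each do |dir|
    Dir.glob(File.join(dir, "**", "*.rb")).sort.each do |path|
      begin
        yield File.read(path), path
        File.delete(path) # passed: nothing left to look at
      rescue StandardError => e
        failures[path] = "#{e.class}: #{e.message.lines.first.to_s.chomp}"
      end
    end
  end

  failures
end
```

Wired up to ruby_parser, the block would be something like `{ |src, path| RubyParser.new.process(src, path) }`. Running one process per directory via `xargs -P` then provides the parallelism without any threading in the script itself.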

In another terminal I usually monitor the progress like so:

```
% while true ; do clear; fd . -t d -t e gauntlet/*.noindex -X rmdir -p 2> /dev/null ; for D in gauntlet/*.noindex/? ; do echo -n "$D: "; fd .rb $D | wc -l ; done ; echo ; sleep 30 ; done
```
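
For readability, here is roughly what that one-liner does, written out in Ruby. It prunes empty directories deepest-first (the `fd ... -X rmdir -p` part), then counts the remaining `.rb` files under each single-character top-level directory. It's a sketch for understanding the loop, not a replacement for it. `gauntlet_progress` is a made-up name.

```ruby
# Prune empty directories under root, deepest first, then count the .rb
# files left under each single-character top-level directory.
def gauntlet_progress(root)
  Dir.glob(File.join(root, "**", "*/"))
     .sort_by { |d| -d.count("/") }
     .each { |d| Dir.rmdir(d) if Dir.empty?(d) }

  Dir.glob(File.join(root, "?")).sort.to_h { |d|
    [d, Dir.glob(File.join(d, "**", "*.rb")).size]
  }
end
```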
data/lib/rp_extensions.rb
CHANGED
@@ -10,71 +10,43 @@ class Regexp
     ENC_UTF8 = /x/u.options
   end
 end
+# :startdoc:
 
-
-
-
-    self
+class Array
+  def prepend *vals
+    self[0,0] = vals
   end
-end unless
+end unless [].respond_to?(:prepend)
+
+# :stopdoc:
+class Symbol
+  def end_with? o
+    self.to_s.end_with? o
+  end
+end unless :woot.respond_to?(:end_with?)
 # :startdoc:
 
 ############################################################
 # HACK HACK HACK HACK HACK HACK HACK HACK HACK HACK HACK HACK
 
-unless "".respond_to?(:grep) then
-  class String
-    def grep re
-      lines.grep re
-    end
-  end
-end
-
 class String
-  ##
-  # This is a hack used by the lexer to sneak in line numbers at the
-  # identifier level. This should be MUCH smaller than making
-  # process_token return [value, lineno] and modifying EVERYTHING that
-  # reduces tIDENTIFIER.
-
-  attr_accessor :lineno
-
   def clean_caller
-    self.sub(File.dirname(__FILE__), "
+    self.sub(File.dirname(__FILE__), "./lib").sub(/:in.*/, "")
   end if $DEBUG
 end
 
 require "sexp"
 
 class Sexp
-  attr_writer :paren
+  attr_writer :paren # TODO: retire
 
   def paren
     @paren ||= false
   end
 
-  def value
-    raise "multi item sexp" if size > 2
-    last
-  end
-
-  def to_sym
-    raise "no: #{self.inspect}.to_sym is a bug"
-    self.value.to_sym
-  end
-
-  alias :add :<<
-
-  def add_all x
-    self.concat x.sexp_body
-  end
-
   def block_pass?
     any? { |s| Sexp === s && s.sexp_type == :block_pass }
   end
-
-  alias :node_type :sexp_type
-  alias :values :sexp_body # TODO: retire
 end
 
 # END HACK
data/lib/rp_stringscanner.rb
CHANGED
@@ -1,64 +1,33 @@
 require "strscan"
 
 class RPStringScanner < StringScanner
-
-  # alias :old_getch :getch
-  # def getch
-  # warn({:getch => caller[0]}.inspect)
-  # old_getch
-  # end
-  # end
-
-  if "".respond_to? :encoding then
-    if "".respond_to? :byteslice then
-      def string_to_pos
-        string.byteslice(0, pos)
-      end
-    else
-      def string_to_pos
-        string.bytes.first(pos).pack("c*").force_encoding(string.encoding)
-      end
-    end
-
-    def charpos
-      string_to_pos.length
-    end
-  else
-    alias :charpos :pos
-
-    def string_to_pos
-      string[0..pos]
-    end
-  end
-
-  def unread_many str # TODO: remove this entirely - we should not need it
-    warn({:unread_many => caller[0]}.inspect) if ENV['TALLY']
-    begin
-      string[charpos, 0] = str
-    rescue IndexError
-      # HACK -- this is a bandaid on a dirty rag on an open festering wound
-    end
-  end
-
-  if ENV['DEBUG'] then
-    alias :old_getch :getch
+  if ENV["DEBUG"] || ENV["TALLY"] then
     def getch
-      c =
-
+      c = super
+      where = caller.drop_while { |s| s =~ /(getch|nextc).$/ }.first
+      where = where.split(/:/).first(2).join(":")
+      if ENV["TALLY"] then
+        d getch:where
+      else
+        d getch:[c, where]
+      end
       c
     end
 
-    alias :old_scan :scan
     def scan re
-      s =
-      where = caller
-
+      s = super
+      where = caller.drop_while { |x| x =~ /scan.$/ }.first
+      where = where.split(/:/).first(2).join(":")
+      if ENV["TALLY"] then
+        d scan:[where]
+      else
+        d scan:[s, where] if s
+      end
       s
     end
-  end
 
-
-
+    def d o
+      STDERR.puts o.inspect
+    end
   end
 end
-