ruby_parser 3.12.0 → 3.18.1

Files changed (52)
  1. checksums.yaml +4 -4
  2. checksums.yaml.gz.sig +0 -0
  3. data/.autotest +18 -29
  4. data/History.rdoc +283 -0
  5. data/Manifest.txt +12 -4
  6. data/README.rdoc +4 -3
  7. data/Rakefile +189 -51
  8. data/bin/ruby_parse +3 -1
  9. data/bin/ruby_parse_extract_error +19 -36
  10. data/compare/normalize.rb +76 -4
  11. data/debugging.md +190 -0
  12. data/gauntlet.md +106 -0
  13. data/lib/rp_extensions.rb +14 -42
  14. data/lib/rp_stringscanner.rb +20 -51
  15. data/lib/ruby20_parser.rb +4659 -4218
  16. data/lib/ruby20_parser.y +953 -602
  17. data/lib/ruby21_parser.rb +4723 -4308
  18. data/lib/ruby21_parser.y +956 -605
  19. data/lib/ruby22_parser.rb +4762 -4337
  20. data/lib/ruby22_parser.y +960 -612
  21. data/lib/ruby23_parser.rb +4761 -4342
  22. data/lib/ruby23_parser.y +961 -613
  23. data/lib/ruby24_parser.rb +4791 -4341
  24. data/lib/ruby24_parser.y +968 -612
  25. data/lib/ruby25_parser.rb +4791 -4341
  26. data/lib/ruby25_parser.y +968 -612
  27. data/lib/ruby26_parser.rb +7287 -0
  28. data/lib/ruby26_parser.y +2749 -0
  29. data/lib/ruby27_parser.rb +8517 -0
  30. data/lib/ruby27_parser.y +3346 -0
  31. data/lib/ruby30_parser.rb +8751 -0
  32. data/lib/ruby30_parser.y +3472 -0
  33. data/lib/ruby3_parser.yy +3476 -0
  34. data/lib/ruby_lexer.rb +611 -826
  35. data/lib/ruby_lexer.rex +48 -40
  36. data/lib/ruby_lexer.rex.rb +122 -46
  37. data/lib/ruby_lexer_strings.rb +638 -0
  38. data/lib/ruby_parser.rb +38 -34
  39. data/lib/ruby_parser.yy +1710 -704
  40. data/lib/ruby_parser_extras.rb +987 -553
  41. data/test/test_ruby_lexer.rb +1718 -1539
  42. data/test/test_ruby_parser.rb +3957 -2164
  43. data/test/test_ruby_parser_extras.rb +39 -4
  44. data/tools/munge.rb +250 -0
  45. data/tools/ripper.rb +44 -0
  46. data.tar.gz.sig +0 -0
  47. metadata +68 -47
  48. metadata.gz.sig +0 -0
  49. data/lib/ruby18_parser.rb +0 -5793
  50. data/lib/ruby18_parser.y +0 -1908
  51. data/lib/ruby19_parser.rb +0 -6185
  52. data/lib/ruby19_parser.y +0 -2116
data/debugging.md ADDED
@@ -0,0 +1,190 @@
1
+ # Quick Notes to Help with Debugging
2
+
3
+ ## Reducing
4
+
5
+ One of the most important steps is reducing the code sample to a
6
+ minimal reproduction. For example, one thing I'm debugging right now
7
+ was reported as:
8
+
9
+ ```ruby
10
+ a, b, c, d, e, f, g, h, i, j = 1, *[p1, p2, p3], *[p1, p2, p3], *[p4, p5, p6]
11
+ ```
12
+
13
+ This original sample has 10 items on the left-hand-side (LHS) and 1 +
14
+ 3 groups of 3 (calls) on the RHS + 3 arrays + 3 splats. That's a lot.
15
+
16
+ It's already been reported (perhaps incorrectly) that this has to do
17
+ with multiple splats on the RHS, so let's focus on that. At a minimum
18
+ the code can be reduced to 2 splats on the RHS and some
19
+ experimentation shows that it needs a non-splat item to fail:
20
+
21
+ ```
22
+ _, _, _ = 1, *[2], *[3]
23
+ ```
24
+
25
+ and some intuition further removed the arrays:
26
+
27
+ ```
28
+ _, _, _ = 1, *2, *3
29
+ ```
30
+
31
+ the difference is huge and will make a ton of difference when
32
+ debugging.
33
+
34
+ ## Getting something to compare
35
+
36
+ ```
37
+ % rake debug3 F=file.rb
38
+ ```
39
+
40
+ TODO
41
+
42
+ ## Comparing against ruby / ripper:
43
+
44
+ ```
45
+ % rake cmp3 F=file.rb
46
+ ```
47
+
48
+ This compiles the parser & lexer and then parses file.rb using
49
+ ruby, ripper, and ruby_parser in debug modes. The output is munged to
50
+ be as uniform as possible and diffable. I'm using emacs'
51
+ `ediff-files3` to compare these files (via `rake cmp3`) all at once,
52
+ but regular `diff -u tmp/{ruby,rp}` will suffice for most tasks.
53
+
54
+ From there? Good luck. I'm currently trying to backtrack from rule
55
+ reductions to state change differences. I'd like to figure out a way
56
+ to go from this sort of diff to a reasonable test that checks state
57
+ changes but I don't have that set up at this point.
58
+
59
+ ## Adding New Grammar Productions
60
+
61
+ Ruby adds stuff to the parser ALL THE TIME. It's actually hard to keep
62
+ up with, but I've added some tools and shown what a typical workflow
63
+ looks like. Let's say you want to add ruby 2.7's "beginless range" (eg
64
+ `..42`).
65
+
66
+ Whenever there's a language feature missing, I start with comparing
67
+ the parse trees between MRI and RP:
68
+
69
+ ### Structural Comparing
70
+
71
+ There's a bunch of rake tasks `compare27`, `compare26`, etc that try
72
+ to normalize and diff MRI's parse.y parse tree (just the structure of
73
+ the tree in yacc) to ruby\_parser's parse tree (racc). It's the first
74
+ thing I do when I'm adding a new version. Stub out all the version
75
+ differences, and then start to diff the structure and move
76
+ ruby\_parser towards the new changes.
77
+
78
+ Some differences are just gonna be there... but here's an example of a
79
+ real diff between MRI 2.7 and ruby_parser as of today:
80
+
81
+ ```diff
82
+ arg tDOT3 arg
83
+ arg tDOT2
84
+ arg tDOT3
85
+ - tBDOT2 arg
86
+ - tBDOT3 arg
87
+ arg tPLUS arg
88
+ arg tMINUS arg
89
+ arg tSTAR2 arg
90
+ ```
91
+
92
+ This is a new language feature that ruby_parser doesn't handle yet.
93
+ It's in MRI (the left hand side of the diff) but not ruby\_parser (the
94
+ right hand side) so it is a `-` or missing line.
95
+
96
+ Some other diffs will have both `+` and `-` lines. That usually
97
+ happens when MRI has been refactoring the grammar. Sometimes I choose
98
+ to adapt those refactorings and sometimes it starts to get too
99
+ difficult to maintain multiple versions of ruby parsing in a single
100
+ file.
101
+
102
+ But! This structural comparing is always a place you should look when
103
+ ruby_parser is failing to parse something. Maybe it just hasn't been
104
+ implemented yet and the easiest place to look is the diff.
105
+
106
+ ### Starting Test First
107
+
108
+ The next thing I do is to add a parser test to cover that feature. I
109
+ usually start with the parser and work backwards towards the lexer as
110
+ needed, as I find it structures things properly and keeps things goal
111
+ oriented.
112
+
113
+ So, make a new parser test, usually in the versioned section of the
114
+ parser tests.
115
+
116
+ ```
117
+ def test_beginless2
118
+ rb = "..10\n; ..a\n; c"
119
+ pt = s(:block,
120
+ s(:dot2, nil, s(:lit, 10).line(1)).line(1),
121
+ s(:dot2, nil, s(:call, nil, :a).line(2)).line(2),
122
+ s(:call, nil, :c).line(3)).line(1)
123
+
124
+ assert_parse_line rb, pt, 1
125
+
126
+ flunk "not done yet"
127
+ end
128
+ ```
129
+
130
+ (In this case I copied and modified the tests for open ranges from 2.6)
131
+ and run it to get my first error:
132
+
133
+ ```
134
+ % rake N=/beginless/
135
+
136
+ ...
137
+
138
+ E
139
+
140
+ Finished in 0.021814s, 45.8421 runs/s, 0.0000 assertions/s.
141
+
142
+ 1) Error:
143
+ TestRubyParserV27#test_whatevs:
144
+ Racc::ParseError: (string):1 :: parse error on value ".." (tDOT2)
145
+ GEMS/2.7.0/gems/racc-1.5.0/lib/racc/parser.rb:538:in `on_error'
146
+ WORK/ruby_parser/dev/lib/ruby_parser_extras.rb:1304:in `on_error'
147
+ (eval):3:in `_racc_do_parse_c'
148
+ (eval):3:in `do_parse'
149
+ WORK/ruby_parser/dev/lib/ruby_parser_extras.rb:1329:in `block in process'
150
+ RUBY/lib/ruby/2.7.0/timeout.rb:95:in `block in timeout'
151
+ RUBY/lib/ruby/2.7.0/timeout.rb:33:in `block in catch'
152
+ RUBY/lib/ruby/2.7.0/timeout.rb:33:in `catch'
153
+ RUBY/lib/ruby/2.7.0/timeout.rb:33:in `catch'
154
+ RUBY/lib/ruby/2.7.0/timeout.rb:110:in `timeout'
155
+ WORK/ruby_parser/dev/lib/ruby_parser_extras.rb:1317:in `process'
156
+ WORK/ruby_parser/dev/test/test_ruby_parser.rb:4198:in `assert_parse'
157
+ WORK/ruby_parser/dev/test/test_ruby_parser.rb:4221:in `assert_parse_line'
158
+ WORK/ruby_parser/dev/test/test_ruby_parser.rb:4451:in `test_whatevs'
159
+ ```
160
+
161
+ For starters, we know the missing production is for `tBDOT2 arg`. It
162
+ is currently blowing up because it is getting `tDOT2` and simply
163
+ doesn't know what to do with it, so it raises the error. As the diff
164
+ suggests, that's the wrong token to begin with, so it is probably time
165
+ to also create a lexer test:
166
+
167
+ ```
168
+ def test_yylex_bdot2
169
+ assert_lex3("..42",
170
+ s(:dot2, nil, s(:lit, 42)),
171
+
172
+ :tBDOT2, "..", EXPR_BEG,
173
+ :tINTEGER, "42", EXPR_NUM)
174
+
175
+ flunk "not done yet"
176
+ end
177
+ ```
178
+
179
+ This one is mostly speculative at this point. It says "if we're lexing
180
+ this string, we should get this sexp if we fully parse it, and the
181
+ lexical stream should look like this"... That last bit is mostly made
182
+ up at this point. Sometimes I don't know exactly what expression state
183
+ things should be in until I start really digging in.
184
+
185
+ At this point, I have 2 failing tests that are directing me in the
186
+ right direction. It's now a matter of digging through
187
+ `compare/parse26.y` to see how the lexer differs and implementing
188
+ it...
189
+
190
+ But this is a good start to the doco for now. I'll add more later.
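A quick way to sanity-check the lexer expectation in the `test_yylex_bdot2` sketch in debugging.md is to ask MRI's own lexer, via the stdlib `Ripper` library, what it sees for `..42`. This is only a rough sketch, assuming Ruby 2.7 or later. Ripper reports operator tokens generically as `:on_op`, so it will not show `tBDOT2` by name, and the helper name `lex_events` is made up for this note.

```ruby
require "ripper"

# Hypothetical helper (not part of ruby_parser): reduce Ripper's lexer
# output to [event, text] pairs so it is easy to eyeball or diff.
def lex_events src
  Ripper.lex(src).map { |_pos, event, text, *_rest| [event, text] }
end

p lex_events("..42")

# On Ruby 2.7+, a beginless range has a nil begin and the given end.
# eval keeps this file loadable on rubies that cannot parse `..42`.
r = eval("(..42)")
p r.begin, r.end
```

This only shows the token text and the generic event types; the expression states (`EXPR_BEG`, `EXPR_NUM`) in the lexer test still have to come from reading MRI's grammar and lexer.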
data/gauntlet.md ADDED
@@ -0,0 +1,106 @@
1
+ # Running the Gauntlet
2
+
3
+ ## Maintaining a Gem Mirror
4
+
5
+ I use rubygems-mirror to keep an archive of all the latest rubygems on
6
+ an external disk. Here is the config:
7
+
8
+ ```
9
+ ---
10
+ - from: https://rubygems.org
11
+ to: /Volumes/StuffA/gauntlet/mirror
12
+ parallelism: 10
13
+ retries: 3
14
+ delete: true
15
+ skiperror: true
16
+ hashdir: true
17
+ ```
18
+
19
+ And I update using rake:
20
+
21
+ ```
22
+ % cd ~/Work/git/rubygems/rubygems-mirror
23
+ % git down
24
+ % rake mirror:latest
25
+ % /Volumes/StuffA/gauntlet/bin/cleanup.rb
26
+ ```
27
+
28
+ This rather quickly updates my mirror to the latest versions of
29
+ everything and then deletes all old versions. I then run a cleanup
30
+ script that fixes the file dates to their publication date and deletes
31
+ any gems that have invalid specs. This can argue with the mirror a
32
+ bit, but it is pretty minimal (currently ~20 bad gems).
33
+
34
+ ## Curating an Archive of Ruby Files
35
+
36
+ Next, I process the gem mirror into a much more digestable structure
37
+ using `hash.rb` (TODO: needs a better name):
38
+
39
+ ```
40
+ % cd RP
41
+ % /Volumes/StuffA/gauntlet/bin/unpack_gems.rb
42
+ ... waaaait ...
43
+ % mv hashed.noindex gauntlet.$(today).noindex
44
+ % lrztar gauntlet.$(today).noindex
45
+ % mv gauntlet.$(today).noindex.lrz /Volumes/StuffA/gauntlet/
46
+ ```
47
+
48
+ This script filters all the newer gems (TODO: WHY?), unpacks them,
49
+ finds all the files that look like they're valid ruby, ensures they're
50
+ valid ruby (using the current version of ruby to compile them), and
51
+ then moves them into a SHA dir structure that looks something like
52
+ this:
53
+
54
+ ```
55
+ hashed.noindex/a/b/c/<full_file_sha>.rb
56
+ ```
57
+
58
+ This removes all duplicates and puts everything in a fairly even,
59
+ wide, flat directory layout.
60
+
61
+ This process takes a very long time, even with a lot of
62
+ parallelization. There are currently about 160k gems in the mirror.
63
+ Unpacking, validating, SHA'ing everything is disk and CPU intensive.
64
+ The `.noindex` extension stops spotlight from indexing the continous
65
+ churn of files being unpacked and moved and saves time.
66
+
67
+ Finally, I rename and archive it all up (currently using lrztar, but
68
+ I'm not in love with it).
69
+
70
+ ### Stats
71
+
72
+ ```
73
+ 9696 % find gauntlet.$(today).noindex -type f | lc
74
+ 561270
75
+ 3.5G gauntlet.2021-08-06.noindex
76
+ 239M gauntlet.2021-08-06.noindex.tar.lrz
77
+ ```
78
+
79
+ So I wind up with a little over half a million unique ruby files to
80
+ parse. It's about 3.5g but compresses very nicely down to 240m.
81
+
82
+ ## Running the Gauntlet
83
+
84
+ Assuming you're starting from scratch, unpack the archive once:
85
+
86
+ ```
87
+ % lrzuntar gauntlet.$(today).noindex.lrz
88
+ ```
89
+
90
+ Then, either run a single process (easier to read):
91
+
92
+ ```
93
+ % ./gauntlet/bin/gauntlet.rb gauntlet/*.noindex/?
94
+ ```
95
+
96
+ Or max out your machine using xargs (note the `-P 16` and choose accordingly):
97
+
98
+ ```
99
+ % ls -d gauntlet/*.noindex/?/? | xargs -n 1 -P 16 ./gauntlet/bin/gauntlet.rb
100
+ ```
101
+
102
+ In another terminal I usually monitor the progress like so:
103
+
104
+ ```
105
+ % while true ; do clear; fd . -t d -t e gauntlet/*.noindex -X rmdir -p 2> /dev/null ; for D in gauntlet/*.noindex/? ; do echo -n "$D: "; fd .rb $D | wc -l ; done ; echo ; sleep 30 ; done
106
+ ```
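To make the SHA layout described under "Curating an Archive of Ruby Files" concrete, here is a minimal sketch of the validate-and-hash step. It is not the author's `unpack_gems.rb`: the helper names, the choice of SHA-256, and the use of `RubyVM::InstructionSequence.compile` as the validity check (MRI-only) are all assumptions made for illustration.

```ruby
require "digest"

# Assumption: "valid ruby" means the running MRI can compile the source.
def valid_ruby? src
  RubyVM::InstructionSequence.compile src
  true
rescue SyntaxError
  false
end

# Assumption: SHA-256. The first three hex digits become the a/b/c
# directories and the full digest becomes the file name.
def hashed_path src, root = "hashed.noindex"
  sha = Digest::SHA256.hexdigest src
  File.join root, sha[0], sha[1], sha[2], "#{sha}.rb"
end

src = "puts 1 + 1\n"
puts hashed_path(src) if valid_ruby? src
```

Because the path depends only on file content, identical files from different gems map to the same path, which is how a layout like this removes duplicates.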
data/lib/rp_extensions.rb CHANGED
@@ -10,71 +10,43 @@ class Regexp
10
10
  ENC_UTF8 = /x/u.options
11
11
  end
12
12
  end
13
+ # :startdoc:
13
14
 
14
- # I hate ruby 1.9 string changes
15
- class Fixnum
16
- def ord
17
- self
15
+ class Array
16
+ def prepend *vals
17
+ self[0,0] = vals
18
18
  end
19
- end unless "a"[0] == "a"
19
+ end unless [].respond_to?(:prepend)
20
+
21
+ # :stopdoc:
22
+ class Symbol
23
+ def end_with? o
24
+ self.to_s.end_with? o
25
+ end
26
+ end unless :woot.respond_to?(:end_with?)
20
27
  # :startdoc:
21
28
 
22
29
  ############################################################
23
30
  # HACK HACK HACK HACK HACK HACK HACK HACK HACK HACK HACK HACK
24
31
 
25
- unless "".respond_to?(:grep) then
26
- class String
27
- def grep re
28
- lines.grep re
29
- end
30
- end
31
- end
32
-
33
32
  class String
34
- ##
35
- # This is a hack used by the lexer to sneak in line numbers at the
36
- # identifier level. This should be MUCH smaller than making
37
- # process_token return [value, lineno] and modifying EVERYTHING that
38
- # reduces tIDENTIFIER.
39
-
40
- attr_accessor :lineno
41
-
42
33
  def clean_caller
43
- self.sub(File.dirname(__FILE__), ".").sub(/:in.*/, "")
34
+ self.sub(File.dirname(__FILE__), "./lib").sub(/:in.*/, "")
44
35
  end if $DEBUG
45
36
  end
46
37
 
47
38
  require "sexp"
48
39
 
49
40
  class Sexp
50
- attr_writer :paren
41
+ attr_writer :paren # TODO: retire
51
42
 
52
43
  def paren
53
44
  @paren ||= false
54
45
  end
55
46
 
56
- def value
57
- raise "multi item sexp" if size > 2
58
- last
59
- end
60
-
61
- def to_sym
62
- raise "no: #{self.inspect}.to_sym is a bug"
63
- self.value.to_sym
64
- end
65
-
66
- alias :add :<<
67
-
68
- def add_all x
69
- self.concat x.sexp_body
70
- end
71
-
72
47
  def block_pass?
73
48
  any? { |s| Sexp === s && s.sexp_type == :block_pass }
74
49
  end
75
-
76
- alias :node_type :sexp_type
77
- alias :values :sexp_body # TODO: retire
78
50
  end
79
51
 
80
52
  # END HACK
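For readers unfamiliar with the idiom in the new `Array#prepend` fallback in rp_extensions.rb: assigning to a zero-length slice at index 0 splices values in at the front, in place. A small illustration (the variable names are only for this note):

```ruby
a = [3, 4]
a[0, 0] = [1, 2] # zero-length slice at index 0: insert, don't overwrite
p a

# The Symbol#end_with? fallback delegates to the String version the same way.
p :woot.to_s.end_with?("oot")
```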
data/lib/rp_stringscanner.rb CHANGED
@@ -1,64 +1,33 @@
1
1
  require "strscan"
2
2
 
3
3
  class RPStringScanner < StringScanner
4
- # if ENV['TALLY'] then
5
- # alias :old_getch :getch
6
- # def getch
7
- # warn({:getch => caller[0]}.inspect)
8
- # old_getch
9
- # end
10
- # end
11
-
12
- if "".respond_to? :encoding then
13
- if "".respond_to? :byteslice then
14
- def string_to_pos
15
- string.byteslice(0, pos)
16
- end
17
- else
18
- def string_to_pos
19
- string.bytes.first(pos).pack("c*").force_encoding(string.encoding)
20
- end
21
- end
22
-
23
- def charpos
24
- string_to_pos.length
25
- end
26
- else
27
- alias :charpos :pos
28
-
29
- def string_to_pos
30
- string[0..pos]
31
- end
32
- end
33
-
34
- def unread_many str # TODO: remove this entirely - we should not need it
35
- warn({:unread_many => caller[0]}.inspect) if ENV['TALLY']
36
- begin
37
- string[charpos, 0] = str
38
- rescue IndexError
39
- # HACK -- this is a bandaid on a dirty rag on an open festering wound
40
- end
41
- end
42
-
43
- if ENV['DEBUG'] then
44
- alias :old_getch :getch
4
+ if ENV["DEBUG"] || ENV["TALLY"] then
45
5
  def getch
46
- c = self.old_getch
47
- p :getch => [c, caller.first]
6
+ c = super
7
+ where = caller.drop_while { |s| s =~ /(getch|nextc).$/ }.first
8
+ where = where.split(/:/).first(2).join(":")
9
+ if ENV["TALLY"] then
10
+ d getch:where
11
+ else
12
+ d getch:[c, where]
13
+ end
48
14
  c
49
15
  end
50
16
 
51
- alias :old_scan :scan
52
17
  def scan re
53
- s = old_scan re
54
- where = caller[1].split(/:/).first(2).join(":")
55
- d :scan => [s, where] if s
18
+ s = super
19
+ where = caller.drop_while { |x| x =~ /scan.$/ }.first
20
+ where = where.split(/:/).first(2).join(":")
21
+ if ENV["TALLY"] then
22
+ d scan:[where]
23
+ else
24
+ d scan:[s, where] if s
25
+ end
56
26
  s
57
27
  end
58
- end
59
28
 
60
- def d o
61
- $stderr.puts o.inspect
29
+ def d o
30
+ STDERR.puts o.inspect
31
+ end
62
32
  end
63
33
  end
64
-
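The rewritten `getch` and `scan` in rp_stringscanner.rb replace the old `alias` chains with plain overrides that call `super` and then skip past their own frames in `caller`. A stripped-down sketch of that pattern follows. The `TracingScanner` class is made up for this note, and it collects entries in an array rather than printing to STDERR (an assumption made here so the result is easy to inspect).

```ruby
require "strscan"

class TracingScanner < StringScanner # hypothetical, for illustration only
  def log
    @log ||= []
  end

  def scan re
    s = super
    # Skip frames whose method name ends in "scan", then take the caller.
    where = caller.drop_while { |x| x =~ /scan.$/ }.first.to_s
    where = where.split(/:/).first(2).join(":") # keep "file:line"
    log << [s, where] if s
    s
  end
end

ss = TracingScanner.new "a = 1"
ss.scan(/\w+/) # matches, so it is logged
ss.scan(/\d+/) # no match at this position, so nothing is logged
p ss.log
```

Calling `super` avoids the `old_scan` style aliases, which matters if the file is ever loaded twice: re-aliasing would point `old_scan` at the already-wrapped method.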