ruby_parser 3.15.0 → 3.18.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- checksums.yaml.gz.sig +0 -0
- data/History.rdoc +101 -0
- data/Manifest.txt +5 -0
- data/README.rdoc +1 -0
- data/Rakefile +128 -30
- data/bin/ruby_parse_extract_error +1 -1
- data/compare/normalize.rb +8 -3
- data/debugging.md +133 -0
- data/gauntlet.md +106 -0
- data/lib/rp_extensions.rb +15 -36
- data/lib/rp_stringscanner.rb +20 -51
- data/lib/ruby20_parser.rb +3559 -3499
- data/lib/ruby20_parser.y +333 -248
- data/lib/ruby21_parser.rb +3650 -3614
- data/lib/ruby21_parser.y +328 -245
- data/lib/ruby22_parser.rb +3690 -3628
- data/lib/ruby22_parser.y +332 -247
- data/lib/ruby23_parser.rb +3629 -3573
- data/lib/ruby23_parser.y +332 -247
- data/lib/ruby24_parser.rb +3712 -3654
- data/lib/ruby24_parser.y +332 -247
- data/lib/ruby25_parser.rb +3712 -3654
- data/lib/ruby25_parser.y +332 -247
- data/lib/ruby26_parser.rb +3715 -3658
- data/lib/ruby26_parser.y +332 -246
- data/lib/ruby27_parser.rb +5009 -3722
- data/lib/ruby27_parser.y +928 -245
- data/lib/ruby30_parser.rb +8741 -0
- data/lib/ruby30_parser.y +3463 -0
- data/lib/ruby3_parser.yy +3467 -0
- data/lib/ruby_lexer.rb +273 -602
- data/lib/ruby_lexer.rex +28 -21
- data/lib/ruby_lexer.rex.rb +60 -24
- data/lib/ruby_lexer_strings.rb +638 -0
- data/lib/ruby_parser.rb +2 -0
- data/lib/ruby_parser.yy +969 -252
- data/lib/ruby_parser_extras.rb +297 -116
- data/test/test_ruby_lexer.rb +213 -129
- data/test/test_ruby_parser.rb +1288 -110
- data/tools/munge.rb +36 -8
- data/tools/ripper.rb +15 -10
- data.tar.gz.sig +0 -0
- metadata +48 -35
- metadata.gz.sig +1 -4
data/gauntlet.md
ADDED
```diff
@@ -0,0 +1,106 @@
+# Running the Gauntlet
+
+## Maintaining a Gem Mirror
+
+I use rubygems-mirror to keep an archive of all the latest rubygems on
+an external disk. Here is the config:
+
+```
+---
+- from: https://rubygems.org
+  to: /Volumes/StuffA/gauntlet/mirror
+  parallelism: 10
+  retries: 3
+  delete: true
+  skiperror: true
+  hashdir: true
+```
+
+And I update using rake:
+
+```
+% cd ~/Work/git/rubygems/rubygems-mirror
+% git down
+% rake mirror:latest
+% /Volumes/StuffA/gauntlet/bin/cleanup.rb
+```
+
+This rather quickly updates my mirror to the latest versions of
+everything and then deletes all old versions. I then run a cleanup
+script that fixes the file dates to their publication date and deletes
+any gems that have invalid specs. This can argue with the mirror a
+bit, but it is pretty minimal (currently ~20 bad gems).
```
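The text describes the cleanup step but does not show it, and the real `/Volumes/StuffA/gauntlet/bin/cleanup.rb` is not part of this diff. The following is a minimal sketch of what such a pass could look like. It makes three assumptions: gems are read with `Gem::Package#spec`, the spec's `date` field is an acceptable stand-in for the publication date, and any spec that fails to load counts as invalid. `clean_mirror` and its `load_spec:` keyword are hypothetical names. The loader is injectable so the logic can be tried without real `.gem` files.

```ruby
require "rubygems/package"
require "fileutils"

# Hypothetical cleanup pass (the real cleanup.rb is not in this diff).
# For each *.gem under +dir+, set its timestamps to the spec's date, or delete
# the gem when its spec cannot be loaded. +load_spec+ is injectable for testing.
def clean_mirror(dir, load_spec: ->(path) { Gem::Package.new(path).spec })
  deleted = []

  Dir.glob(File.join(dir, "**", "*.gem")).sort.each do |path|
    begin
      date = load_spec.call(path).date # Gem::Specification#date returns a Time
      File.utime(date, date, path)     # atime and mtime both become the spec date
    rescue StandardError => e
      warn "removing #{path}: #{e.class}: #{e.message}"
      FileUtils.rm_f(path)
      deleted << path
    end
  end

  deleted
end
```

Treating "spec fails to load" as "invalid" is the simplest possible rule. A stricter variant could also call `Gem::Specification#validate`, but that may reject gems that install fine.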
```diff
+
+## Curating an Archive of Ruby Files
+
+Next, I process the gem mirror into a much more digestable structure
+using `hash.rb` (TODO: needs a better name):
+
+```
+% cd RP
+% /Volumes/StuffA/gauntlet/bin/unpack_gems.rb
+... waaaait ...
+% mv hashed.noindex gauntlet.$(today).noindex
+% lrztar gauntlet.$(today).noindex
+% mv gauntlet.$(today).noindex.lrz /Volumes/StuffA/gauntlet/
+```
+
+This script filters all the newer gems (TODO: WHY?), unpacks them,
+finds all the files that look like they're valid ruby, ensures they're
+valid ruby (using the current version of ruby to compile them), and
+then moves them into a SHA dir structure that looks something like
+this:
+
+```
+hashed.noindex/a/b/c/<full_file_sha>.rb
+```
+
+This removes all duplicates and puts everything in a fairly even,
+wide, flat directory layout.
```
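The validate-and-hash step can be sketched for a single file as follows. This is not `unpack_gems.rb`. It assumes SHA-256 as the digest and takes the `a/b/c` levels to be the first three hex characters of that digest. It also uses `RubyVM::InstructionSequence.compile` (CRuby only) as the "compile with the current ruby" check. `hash_ruby_file` is a hypothetical name.

```ruby
require "digest"
require "fileutils"

# Hypothetical sketch of the hashing step (not the actual unpack_gems.rb).
# Assumptions: SHA-256 digest; the a/b/c levels are the digest's first three
# hex characters; CRuby's RubyVM::InstructionSequence is available.
def hash_ruby_file(src, out_dir)
  code = File.read(src)

  begin
    # "Compile with the current ruby": reject anything this Ruby cannot parse.
    # Compiling does not execute the code.
    RubyVM::InstructionSequence.compile(code)
  rescue SyntaxError, EncodingError
    return nil
  end

  sha  = Digest::SHA256.hexdigest(code)
  dest = File.join(out_dir, sha[0], sha[1], sha[2], "#{sha}.rb")
  return dest if File.exist?(dest) # same content already stored: deduplicated

  FileUtils.mkdir_p(File.dirname(dest))
  FileUtils.cp(src, dest)
  dest
end
```

The file name is the digest of the contents, so two byte-identical files map to the same path. That is why the layout deduplicates for free. Three hex levels spread files over at most 16³ = 4096 leaf directories, which keeps each directory small.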
```diff
+
+This process takes a very long time, even with a lot of
+parallelization. There are currently about 160k gems in the mirror.
+Unpacking, validating, SHA'ing everything is disk and CPU intensive.
+The `.noindex` extension stops spotlight from indexing the continous
+churn of files being unpacked and moved and saves time.
+
+Finally, I rename and archive it all up (currently using lrztar, but
+I'm not in love with it).
+
+### Stats
+
+```
+9696 % find gauntlet.$(today).noindex -type f | lc
+561270
+3.5G gauntlet.2021-08-06.noindex
+239M gauntlet.2021-08-06.noindex.tar.lrz
+```
+
+So I wind up with a little over half a million unique ruby files to
+parse. It's about 3.5g but compresses very nicely down to 240m
```
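To reproduce the file count and total size from Ruby instead of a shell pipeline, a small tree walk is enough. `tree_stats` is a hypothetical helper. It sums apparent file sizes, so its byte total can differ from a block-based disk-usage figure like the `3.5G` above.

```ruby
require "find"

# Hypothetical helper: count regular files under +dir+ and sum their
# apparent sizes in bytes, roughly `find DIR -type f | wc -l` plus a size total.
def tree_stats(dir)
  count = 0
  bytes = 0

  Find.find(dir) do |path|
    next unless File.file?(path)
    count += 1
    bytes += File.size(path)
  end

  [count, bytes]
end
```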
```diff
+
+## Running the Gauntlet
+
+Assuming you're starting from scratch, unpack the archive once:
+
+```
+% lrzuntar gauntlet.$(today).noindex.lrz
+```
+
+Then, either run a single process (easier to read):
+
+```
+% ./gauntlet/bin/gauntlet.rb gauntlet/*.noindex/?
+```
+
+Or max out your machine using xargs (note the `-P 16` and choose accordingly):
+
+```
+% ls -d gauntlet/*.noindex/?/? | xargs -n 1 -P 16 ./gauntlet/bin/gauntlet.rb
+```
```
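The `xargs -n 1 -P 16` invocation runs one `gauntlet.rb` per directory, with at most 16 running at a time. A rough Ruby equivalent built on `Process.spawn` and `Process.wait2` is sketched below. `run_parallel` is a hypothetical helper. It assumes its arguments are unique, because results are keyed by argument. It also assumes no other child processes are running in the calling process.

```ruby
# Hypothetical stand-in for `xargs -n 1 -P N`: run the command built by the
# block once per argument, with at most +max_procs+ children alive at a time.
# Assumes +args+ are unique and that no other children exist in this process.
def run_parallel(args, max_procs)
  queue = args.dup
  running = {} # pid => argument
  results = {} # argument => Process::Status

  until queue.empty? && running.empty?
    # Fill free slots while work remains.
    while !queue.empty? && running.size < max_procs
      arg = queue.shift
      running[Process.spawn(*yield(arg))] = arg
    end

    # Block until any child exits, then record it and free its slot.
    pid, status = Process.wait2
    results[running.delete(pid)] = status
  end

  results
end

# Usage mirroring the xargs line (paths taken from the text above):
#   dirs = Dir.glob("gauntlet/*.noindex/?/?").sort
#   run_parallel(dirs, 16) { |dir| ["./gauntlet/bin/gauntlet.rb", dir] }
```

Passing the command as an array means `Process.spawn` executes it directly rather than through a shell, so directory names need no quoting.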
```diff
+
+In another terminal I usually monitor the progress like so:
+
+```
+% while true ; do clear; fd . -t d -t e gauntlet/*.noindex -X rmdir -p 2> /dev/null ; for D in gauntlet/*.noindex/? ; do echo -n "$D: "; fd .rb $D | wc -l ; done ; echo ; sleep 30 ; done
+```
```
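The monitoring loop prunes empty directories and counts the `.rb` files left in each top-level shard. That suggests it relies on processed files being removed. Under that assumption, a Ruby version of one iteration might look like this. `progress` is a hypothetical name. Unlike `rmdir -p`, it never removes the root directory itself.

```ruby
# Hypothetical single iteration of the monitoring loop for one *.noindex dir.
# Assumes finished files are deleted, so shrinking counts mean progress.
def progress(root)
  # Remove empty directories, deepest first, so parents emptied along the way
  # are removed too (the shell version relies on `rmdir -p` for this).
  Dir.glob(File.join(root, "**", "*"))
     .select { |p| File.directory?(p) }
     .sort_by { |p| -p.count("/") }
     .each { |p| Dir.rmdir(p) if Dir.empty?(p) }

  # Count remaining .rb files under each single-character top-level shard.
  Dir.glob(File.join(root, "?"))
     .select { |p| File.directory?(p) }
     .sort
     .to_h { |shard| [shard, Dir.glob(File.join(shard, "**", "*.rb")).size] }
end
```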
data/lib/rp_extensions.rb
CHANGED
```diff
@@ -12,26 +12,24 @@ class Regexp
 end
 # :startdoc:
 
-
-
-
-unless "".respond_to?(:grep) then
-  class String
-    def grep re
-      lines.grep re
-    end
+class Array
+  def prepend *vals
+    self[0,0] = vals
   end
-end
+end unless [].respond_to?(:prepend)
 
-
-
-
-
-
-
+# :stopdoc:
+class Symbol
+  def end_with? o
+    self.to_s.end_with? o
+  end
+end unless :woot.respond_to?(:end_with?)
+# :startdoc:
 
-
+############################################################
+# HACK HACK HACK HACK HACK HACK HACK HACK HACK HACK HACK HACK
 
+class String
   def clean_caller
     self.sub(File.dirname(__FILE__), "./lib").sub(/:in.*/, "")
   end if $DEBUG
@@ -40,34 +38,15 @@ end
 require "sexp"
 
 class Sexp
-  attr_writer :paren
+  attr_writer :paren # TODO: retire
 
   def paren
     @paren ||= false
   end
 
-  def value
-    raise "multi item sexp" if size > 2
-    last
-  end
-
-  def to_sym
-    raise "no: #{self.inspect}.to_sym is a bug"
-    self.value.to_sym
-  end
-
-  alias :add :<<
-
-  def add_all x
-    self.concat x.sexp_body
-  end
-
   def block_pass?
     any? { |s| Sexp === s && s.sexp_type == :block_pass }
   end
-
-  alias :node_type :sexp_type
-  alias :values :sexp_body # TODO: retire
 end
 
 # END HACK
```
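Both new entries in `rp_extensions.rb` follow the same guard pattern: they open the core class only when the running Ruby lacks the method. `Array#prepend` arrived in Ruby 2.5 and `Symbol#end_with?` in 2.7. One detail is worth knowing. The backported `prepend` ends with an index assignment, and an assignment expression evaluates to its right-hand side. So on older Rubies it returns `vals` rather than the array. The hypothetical helpers below copy the two method bodies so they can be compared with the built-ins without reopening `Array` or `Symbol`.

```ruby
# Hypothetical helpers that copy the bodies of the two backports in the diff.
def backport_prepend(ary, *vals)
  ary[0, 0] = vals # splice vals in at the front, mutating ary
end

def backport_end_with?(sym, suffix)
  sym.to_s.end_with?(suffix)
end

a = [3, 4]
ret = backport_prepend(a, 1, 2)
p a   # the array is updated in place: [1, 2, 3, 4]
p ret # but the return value is the assignment's right-hand side: [1, 2]

b = [3, 4]
p b.prepend(1, 2).equal?(b) # the built-in Array#prepend returns the receiver
p backport_end_with?(:woot, "ot") == :woot.end_with?("ot")
```

This only matters if a caller uses the return value of `prepend`. Code that calls it for its side effect behaves the same either way.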
data/lib/rp_stringscanner.rb
CHANGED
```diff
@@ -1,64 +1,33 @@
 require "strscan"
 
 class RPStringScanner < StringScanner
-
-  # alias :old_getch :getch
-  # def getch
-  # warn({:getch => caller[0]}.inspect)
-  # old_getch
-  # end
-  # end
-
-  if "".respond_to? :encoding then
-    if "".respond_to? :byteslice then
-      def string_to_pos
-        string.byteslice(0, pos)
-      end
-    else
-      def string_to_pos
-        string.bytes.first(pos).pack("c*").force_encoding(string.encoding)
-      end
-    end
-
-    def charpos
-      string_to_pos.length
-    end
-  else
-    alias :charpos :pos
-
-    def string_to_pos
-      string[0..pos]
-    end
-  end
-
-  def unread_many str # TODO: remove this entirely - we should not need it
-    warn({:unread_many => caller[0]}.inspect) if ENV['TALLY']
-    begin
-      string[charpos, 0] = str
-    rescue IndexError
-      # HACK -- this is a bandaid on a dirty rag on an open festering wound
-    end
-  end
-
-  if ENV['DEBUG'] then
-    alias :old_getch :getch
+  if ENV["DEBUG"] || ENV["TALLY"] then
     def getch
-      c =
-
+      c = super
+      where = caller.drop_while { |s| s =~ /(getch|nextc).$/ }.first
+      where = where.split(/:/).first(2).join(":")
+      if ENV["TALLY"] then
+        d getch:where
+      else
+        d getch:[c, where]
+      end
       c
     end
 
-    alias :old_scan :scan
     def scan re
-      s =
-      where = caller
-
+      s = super
+      where = caller.drop_while { |x| x =~ /scan.$/ }.first
+      where = where.split(/:/).first(2).join(":")
+      if ENV["TALLY"] then
+        d scan:[where]
+      else
+        d scan:[s, where] if s
+      end
       s
     end
-  end
 
-
-
+    def d o
+      STDERR.puts o.inspect
+    end
   end
 end
-
```
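The new debugging path in `RPStringScanner` calls `super` directly instead of chaining through `alias :old_getch`. With `TALLY` set, it reduces each call to a `file:line` site so the output can be tallied. Below is a self-contained sketch of the same idea with two deliberate swaps. It uses `caller_locations` instead of filtering `caller` strings, and it accumulates counts in a `Hash` instead of printing to `STDERR`. `TallyScanner` is a hypothetical class, not part of ruby_parser.

```ruby
require "strscan"

# Hypothetical StringScanner subclass (not ruby_parser's RPStringScanner) that
# counts #scan calls per call site instead of printing them.
class TallyScanner < StringScanner
  attr_reader :tally

  def initialize(str)
    super
    @tally = Hash.new(0) # "path:line" => number of calls
  end

  def scan(re)
    s = super # plain super replaces the old alias-and-call chaining
    loc = caller_locations(1, 1).first # the frame that called #scan
    @tally["#{loc.path}:#{loc.lineno}"] += 1
    s
  end
end

ss = TallyScanner.new("foo bar")
ss.scan(/\w+/)
ss.scan(/\s+/)
ss.scan(/\w+/)
p ss.tally
```

Overriding with `super` works because the method is redefined in a subclass. The old `alias` approach was needed only when patching a method in place on the same class.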