prism 0.13.0 → 0.15.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +37 -1
- data/README.md +4 -1
- data/config.yml +96 -35
- data/docs/fuzzing.md +5 -10
- data/docs/prism.png +0 -0
- data/docs/serialization.md +10 -0
- data/ext/prism/api_node.c +239 -86
- data/ext/prism/extension.c +35 -48
- data/ext/prism/extension.h +1 -1
- data/include/prism/ast.h +170 -118
- data/include/prism/diagnostic.h +1 -0
- data/include/prism/node.h +8 -0
- data/include/prism/parser.h +26 -0
- data/include/prism/util/pm_buffer.h +3 -0
- data/include/prism/util/pm_constant_pool.h +21 -2
- data/include/prism/util/pm_string.h +2 -1
- data/include/prism/version.h +2 -2
- data/include/prism.h +1 -2
- data/lib/prism/compiler.rb +150 -141
- data/lib/prism/debug.rb +30 -26
- data/lib/prism/dispatcher.rb +42 -0
- data/lib/prism/dsl.rb +23 -8
- data/lib/prism/ffi.rb +4 -4
- data/lib/prism/lex_compat.rb +42 -8
- data/lib/prism/mutation_compiler.rb +18 -3
- data/lib/prism/node.rb +2061 -191
- data/lib/prism/node_ext.rb +44 -0
- data/lib/prism/parse_result.rb +32 -5
- data/lib/prism/pattern.rb +1 -1
- data/lib/prism/serialize.rb +95 -87
- data/lib/prism/visitor.rb +9 -0
- data/prism.gemspec +2 -3
- data/src/diagnostic.c +2 -1
- data/src/node.c +99 -32
- data/src/prettyprint.c +137 -80
- data/src/prism.c +1960 -843
- data/src/serialize.c +140 -79
- data/src/util/pm_buffer.c +9 -7
- data/src/util/pm_constant_pool.c +25 -11
- metadata +3 -4
- data/include/prism/unescape.h +0 -48
- data/src/unescape.c +0 -637
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1d02e58a6abcd72bbe920246599ed073e7c074929dff4f5b2408f87c9a222a51
+  data.tar.gz: 3c89a774748375ff30057d712465678ecff0936db4359d8fef10a819ba410d42
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c8b5d4de5b481f30fe65177f04b22d323cc697512d0a16077a44aefe393c821d1527343d369fc0a4a850cae725d7ef5a5f073c65c29d4cb49a558262e5c4decb
+  data.tar.gz: 6a487e4ca950684075ab462ce3cb9e04653cfa0e9af6888280866f08ca05653172ce86dcb89a8dd7bda1d4c5d4bd45f6f8cd106f428cca723ffe9787ff3d19b5
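The checksums above are SHA256 and SHA512 digests of the two archives inside the gem. As a minimal sketch of how such a pair could be checked with Ruby's standard `digest` library, the helper below compares a file against expected hex strings. The name `checksums_match?` and the throwaway payload are illustrative assumptions, not part of RubyGems or prism.

```ruby
require "digest/sha2"
require "tempfile"

# Supported algorithms, keyed by the names used in checksums.yaml.
DIGESTS = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }.freeze

# Hypothetical helper: true when every expected hex digest matches the file.
# `expected` maps an algorithm name to a hex digest string.
def checksums_match?(path, expected)
  expected.all? do |algorithm, hex|
    DIGESTS.fetch(algorithm).file(path).hexdigest == hex
  end
end

# Demonstration on a temporary file rather than a real gem archive.
Tempfile.create("data.tar.gz") do |file|
  file.write("example payload")
  file.flush

  expected = {
    "SHA256" => Digest::SHA256.hexdigest("example payload"),
    "SHA512" => Digest::SHA512.hexdigest("example payload"),
  }
  puts checksums_match?(file.path, expected)
end
```

To check a real gem, the expected values would come from the `+` lines of the diff and the path would point at the unpacked `data.tar.gz`.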
data/CHANGELOG.md
CHANGED
@@ -6,6 +6,40 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) a
 
 ## [Unreleased]
 
+## [0.15.0] - 2023-10-18
+
+### Added
+
+- `BackReferenceReadNode#name` is now provided.
+- `Index{Operator,And,Or}WriteNode` are introduced, split out from `Call{Operator,And,Or}WriteNode` when the method is `[]`.
+
+### Changed
+
+- Ensure `PM_NODE_FLAG_COMMON_MASK` into a constant expression to fix compile errors.
+- `super(&arg)` is now fixed.
+- Ensure the last encoding flag on regular expressions wins.
+- Fix the common whitespace calculation when embedded expressions begin on a line.
+- Capture groups in regular expressions now scan the unescaped version to get the correct local variables.
+- `*` and `&` are added to the local table when `...` is found in the parameters of a method definition.
+
+## [0.14.0] - 2023-10-13
+
+### Added
+
+- Syntax errors are added for invalid lambda local semicolon placement.
+- Lambda locals are now checked for duplicate names.
+- Destructured parameters are now checked for duplicate names.
+- `Constant{Read,Path,PathTarget}Node#full_name` and `Constant{Read,Path,PathTarget}Node#full_name_parts` are added to walk constant paths for you to find the full name of the constant.
+- Syntax errors are added when assigning to a numbered parameter.
+- `Node::type` is added, which matches the `Node#type` API.
+- Magic comments are now parsed as part of the parsing process and a new field is added in the form of `ParseResult#magic_comments` to access them.
+
+### Changed
+
+- **BREAKING**: `Call*Node#name` methods now return symbols instead of strings.
+- **BREAKING**: For loops now have their index value considered as part of the body, so depths of local variable assignments will be increased by 1.
+- Tilde heredocs now split up their lines into multiple string nodes to make them easier to dedent.
+
 ## [0.13.0] - 2023-09-29
 
 ### Added
@@ -161,7 +195,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) a
 
 - 🎉 Initial release! 🎉
 
-[unreleased]: https://github.com/ruby/prism/compare/v0.
+[unreleased]: https://github.com/ruby/prism/compare/v0.15.0...HEAD
+[0.15.0]: https://github.com/ruby/prism/compare/v0.14.0...v0.15.0
+[0.14.0]: https://github.com/ruby/prism/compare/v0.13.0...v0.14.0
 [0.13.0]: https://github.com/ruby/prism/compare/v0.12.0...v0.13.0
 [0.12.0]: https://github.com/ruby/prism/compare/v0.11.0...v0.12.0
 [0.11.0]: https://github.com/ruby/prism/compare/v0.10.0...v0.11.0
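The 0.14.0 entry for `full_name` and `full_name_parts` describes walking a constant path to recover the constant's full name. The sketch below illustrates that idea only. `ConstantRead` and `ConstantPath` are hypothetical stand-in structs, not Prism's node classes, and the real methods may differ in detail.

```ruby
# Hypothetical stand-ins for constant nodes; these are NOT Prism's classes.
ConstantRead = Struct.new(:name)
ConstantPath = Struct.new(:parent, :child)

# Collect the segments of a constant path from outermost to innermost.
def full_name_parts(node)
  parts = []
  while node.is_a?(ConstantPath)
    parts.unshift(node.child.name)
    node = node.parent
  end
  # A nil parent models a path that starts at the top level (::Foo).
  parts.unshift(node.name) if node
  parts
end

# Join the segments with the scope resolution operator.
def full_name(node)
  full_name_parts(node).join("::")
end

# Foo::Bar::Baz nests to the left: ((Foo)::Bar)::Baz.
path = ConstantPath.new(
  ConstantPath.new(ConstantRead.new(:Foo), ConstantRead.new(:Bar)),
  ConstantRead.new(:Baz)
)
puts full_name(path)
```

The loop walks parents rather than recursing, so deeply nested paths do not grow the call stack.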
data/README.md
CHANGED
@@ -1,4 +1,7 @@
-
+<h1 align="center">Prism Ruby parser</h1>
+<div align="center">
+<img alt="Prism Ruby parser" height="256px" src="https://github.com/ruby/prism/blob/main/docs/prism.png?raw=true">
+</div>
 
 This is a parser for the Ruby programming language. It is designed to be portable, error tolerant, and maintainable. It is written in C99 and has no dependencies. It is currently being integrated into [CRuby](https://github.com/ruby/ruby), [JRuby](https://github.com/jruby/jruby), [TruffleRuby](https://github.com/oracle/truffleruby), [Sorbet](https://github.com/sorbet/sorbet), and [Syntax Tree](https://github.com/ruby-syntax-tree/syntax_tree).
 
data/config.yml
CHANGED
@@ -361,6 +361,8 @@ flags:
         comment: "x - ignores whitespace and allows comments in regular expressions"
       - name: MULTI_LINE
         comment: "m - allows $ to match the end of lines within strings"
+      - name: ONCE
+        comment: "o - only interpolates values into the regular expression once"
       - name: EUC_JP
         comment: "e - forces the EUC-JP encoding"
       - name: ASCII_8BIT
@@ -369,12 +371,10 @@ flags:
         comment: "s - forces the Windows-31J encoding"
       - name: UTF_8
         comment: "u - forces the UTF-8 encoding"
-      - name: ONCE
-        comment: "o - only interpolates values into the regular expression once"
   - name: StringFlags
     values:
       - name: FROZEN
-        comment: "frozen by virtue of a frozen_string_literal comment"
+        comment: "frozen by virtue of a `frozen_string_literal` comment"
 nodes:
   - name: AliasGlobalVariableNode
     fields:
@@ -507,6 +507,9 @@ nodes:
           { **foo }
             ^^^^^
   - name: BackReferenceReadNode
+    fields:
+      - name: name
+        type: constant
     comment: |
       Represents reading a reference to a field in the previous match.
 
@@ -630,20 +633,13 @@ nodes:
         type: location?
       - name: message_loc
         type: location?
-      - name: opening_loc
-        type: location?
-      - name: arguments
-        type: node?
-        kind: ArgumentsNode
-      - name: closing_loc
-        type: location?
       - name: flags
         type: flags
         kind: CallNodeFlags
       - name: read_name
-        type:
+        type: constant
       - name: write_name
-        type:
+        type: constant
       - name: operator_loc
         type: location
       - name: value
@@ -674,7 +670,7 @@ nodes:
         type: flags
         kind: CallNodeFlags
       - name: name
-        type:
+        type: constant
     comment: |
       Represents a method call, in all of the various forms that can take.
 
@@ -703,20 +699,13 @@ nodes:
         type: location?
       - name: message_loc
         type: location?
-      - name: opening_loc
-        type: location?
-      - name: arguments
-        type: node?
-        kind: ArgumentsNode
-      - name: closing_loc
-        type: location?
       - name: flags
         type: flags
         kind: CallNodeFlags
       - name: read_name
-        type:
+        type: constant
       - name: write_name
-        type:
+        type: constant
       - name: operator
         type: constant
       - name: operator_loc
@@ -736,20 +725,13 @@ nodes:
         type: location?
       - name: message_loc
         type: location?
-      - name: opening_loc
-        type: location?
-      - name: arguments
-        type: node?
-        kind: ArgumentsNode
-      - name: closing_loc
-        type: location?
       - name: flags
         type: flags
         kind: CallNodeFlags
       - name: read_name
-        type:
+        type: constant
       - name: write_name
-        type:
+        type: constant
       - name: operator_loc
         type: location
       - name: value
@@ -1443,6 +1425,89 @@ nodes:
 
           case a; in b then c end
                   ^^^^^^^^^^^
+  - name: IndexAndWriteNode
+    fields:
+      - name: receiver
+        type: node?
+      - name: call_operator_loc
+        type: location?
+      - name: opening_loc
+        type: location
+      - name: arguments
+        type: node?
+        kind: ArgumentsNode
+      - name: closing_loc
+        type: location
+      - name: block
+        type: node?
+      - name: flags
+        type: flags
+        kind: CallNodeFlags
+      - name: operator_loc
+        type: location
+      - name: value
+        type: node
+    comment: |
+      Represents the use of the `&&=` operator on a call to the `[]` method.
+
+          foo.bar[baz] &&= value
+          ^^^^^^^^^^^^^^^^^^^^^^
+  - name: IndexOperatorWriteNode
+    fields:
+      - name: receiver
+        type: node?
+      - name: call_operator_loc
+        type: location?
+      - name: opening_loc
+        type: location
+      - name: arguments
+        type: node?
+        kind: ArgumentsNode
+      - name: closing_loc
+        type: location
+      - name: block
+        type: node?
+      - name: flags
+        type: flags
+        kind: CallNodeFlags
+      - name: operator
+        type: constant
+      - name: operator_loc
+        type: location
+      - name: value
+        type: node
+    comment: |
+      Represents the use of an assignment operator on a call to `[]`.
+
+          foo.bar[baz] += value
+          ^^^^^^^^^^^^^^^^^^^^^
+  - name: IndexOrWriteNode
+    fields:
+      - name: receiver
+        type: node?
+      - name: call_operator_loc
+        type: location?
+      - name: opening_loc
+        type: location
+      - name: arguments
+        type: node?
+        kind: ArgumentsNode
+      - name: closing_loc
+        type: location
+      - name: block
+        type: node?
+      - name: flags
+        type: flags
+        kind: CallNodeFlags
+      - name: operator_loc
+        type: location
+      - name: value
+        type: node
+    comment: |
+      Represents the use of the `||=` operator on a call to `[]`.
+
+          foo.bar[baz] ||= value
+          ^^^^^^^^^^^^^^^^^^^^^^
   - name: InstanceVariableAndWriteNode
     fields:
       - name: name
@@ -1772,7 +1837,6 @@ nodes:
         type: location
       - name: content_loc
         type: location
-        semantic_field: true # https://github.com/ruby/prism/issues/1452
       - name: closing_loc
         type: location
       - name: unescaped
@@ -2093,7 +2157,6 @@ nodes:
         type: location
       - name: content_loc
         type: location
-        semantic_field: true # https://github.com/ruby/prism/issues/1452
       - name: closing_loc
         type: location
       - name: unescaped
@@ -2287,10 +2350,8 @@ nodes:
         kind: StringFlags
       - name: opening_loc
         type: location?
-        semantic_field: true # https://github.com/ruby/prism/issues/1452
       - name: content_loc
         type: location
-        semantic_field: true # https://github.com/ruby/prism/issues/1452
       - name: closing_loc
         type: location?
       - name: unescaped
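The three `Index*WriteNode` definitions above cover `+=`-style, `&&=`, and `||=` assignments whose target is a call to `[]`. As a rough illustration of the runtime behaviour these forms stand for, the sketch below logs every `[]` and `[]=` call on a small wrapper. The `Recorder` class is a hypothetical helper written for this example and has nothing to do with prism's API.

```ruby
# Hypothetical wrapper that logs each [] and []= call in order.
class Recorder
  attr_reader :calls

  def initialize(data)
    @data = data
    @calls = []
  end

  def [](key)
    @calls << [:[], key]
    @data[key]
  end

  def []=(key, value)
    @calls << [:[]=, key, value]
    @data[key] = value
  end
end

store = Recorder.new({ a: 1 })
store[:a] += 2    # operator write: one read, then one write
store[:b] ||= 10  # or write: the read returns nil, so the write happens
store[:a] &&= 0   # and write: the read is truthy, so the write happens
store[:c] &&= 5   # and write: the read returns nil, so no write happens

store.calls.each { |call| p call }
```

Each form reads through `[]` first and only conditionally writes through `[]=`, which is why a dedicated node per operator kind is a reasonable way to model them.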
data/docs/fuzzing.md
CHANGED
@@ -6,8 +6,7 @@ We use fuzzing to test the various entrypoints to the library. The fuzzer we use
 fuzz
 ├── corpus
 │ ├── parse fuzzing corpus for parsing (a symlink to our fixtures)
-│
-│ └── unescape fuzzing corpus for unescaping strings
+│ └── regexp fuzzing corpus for regexp
 ├── dict a AFL++ dictionary containing various tokens
 ├── docker
 │ └── Dockerfile for building a container with the fuzzer toolchain
@@ -17,11 +16,9 @@ fuzz
 ├── parse.sh script to run parsing fuzzer
 ├── regexp.c fuzz handler for regular expression parsing
 ├── regexp.sh script to run regexp fuzzer
-
-
-
-├── unescape.c fuzz handler for unescape functionality
-└── unescape.sh script to run unescape fuzzer
+└── tools
+├── backtrace.sh generates backtrace files for a crash directory
+└── minimize.sh generates minimized crash or hang files
 ```
 
 ## Usage
@@ -30,14 +27,12 @@ There are currently three fuzzing targets
 
 - `pm_parse_serialize` (parse)
 - `pm_regexp_named_capture_group_names` (regexp)
-- `pm_unescape_manipulate_string` (unescape)
 
 Respectively, fuzzing can be performed with
 
 ```
 make fuzz-run-parse
 make fuzz-run-regexp
-make fuzz-run-unescape
 ```
 
 To end a fuzzing job, interrupt with CTRL+C. To enter a container with the fuzzing toolchain and debug utilities, run
@@ -60,7 +55,7 @@ Note, that this may make reproducing bugs difficult as they may depend on memory
 
 ```
 make fuzz-debug # enter the docker container with build tools
-make build/fuzz.heisenbug.parse # or .
+make build/fuzz.heisenbug.parse # or .regexp
 ./build/fuzz.heisenbug.parse path-to-problem-input
 ```
 
data/docs/prism.png
ADDED
Binary file
data/docs/serialization.md
CHANGED
@@ -31,6 +31,7 @@ This drastically cuts down on the size of the serialized string, especially when
 ### comment
 
 The comment type is one of:
+
 * 0=`INLINE` (`# comment`)
 * 1=`EMBEDDED_DOCUMENT` (`=begin`/`=end`)
 * 2=`__END__` (after `__END__`)
@@ -40,6 +41,13 @@ The comment type is one of:
 | `1` | comment type |
 | location | the location in the source of this comment |
 
+### magic comment
+
+| # bytes | field |
+| --- | --- |
+| location | the location of the key of the magic comment |
+| location | the location of the value of the magic comment |
+
 ### diagnostic
 
 | # bytes | field |
@@ -66,6 +74,8 @@ The header is structured like the following table:
 | string | the encoding name |
 | varint | number of comments |
 | comment* | comments |
+| varint | number of magic comments |
+| magic comment* | magic comments |
 | varint | number of errors |
 | diagnostic* | errors |
 | varint | number of warnings |
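The header rows added above place a varint count followed by that many magic comment records, each made of a key location and a value location. The reader below is a minimal sketch of that layout, not prism's deserializer. It assumes a varint is an unsigned LEB128 integer and a location is two varints (start offset, then length), neither of which is stated in this diff, and the method names are hypothetical.

```ruby
require "stringio"

# Read one unsigned LEB128 varint (assumed encoding) from an IO-like object.
def read_varint(io)
  result = 0
  shift = 0
  loop do
    byte = io.readbyte
    result |= (byte & 0x7f) << shift
    return result if (byte & 0x80).zero?
    shift += 7
  end
end

# Assumption: a location is a start offset followed by a length.
def read_location(io)
  { start: read_varint(io), length: read_varint(io) }
end

# A varint count, then that many (key location, value location) pairs.
def read_magic_comments(io)
  Array.new(read_varint(io)) do
    { key: read_location(io), value: read_location(io) }
  end
end

# One record: key at offset 2 with length 21, value at offset 300 with length 4.
# The offset 300 needs two bytes in LEB128: 0xac 0x02.
bytes = [1, 2, 21, 0xac, 0x02, 4].pack("C*")
p read_magic_comments(StringIO.new(bytes))
```

Reading from a `StringIO` keeps the cursor handling implicit, so each helper only has to describe the shape of the record it consumes.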