prism 0.17.1 → 0.19.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +60 -1
- data/Makefile +5 -5
- data/README.md +4 -3
- data/config.yml +214 -68
- data/docs/build_system.md +6 -6
- data/docs/building.md +10 -3
- data/docs/configuration.md +11 -9
- data/docs/encoding.md +92 -88
- data/docs/heredocs.md +1 -1
- data/docs/javascript.md +29 -1
- data/docs/local_variable_depth.md +229 -0
- data/docs/ruby_api.md +16 -0
- data/docs/serialization.md +18 -13
- data/ext/prism/api_node.c +411 -240
- data/ext/prism/extconf.rb +97 -127
- data/ext/prism/extension.c +97 -33
- data/ext/prism/extension.h +1 -1
- data/include/prism/ast.h +377 -159
- data/include/prism/defines.h +17 -0
- data/include/prism/diagnostic.h +38 -6
- data/include/prism/{enc/pm_encoding.h → encoding.h} +126 -64
- data/include/prism/options.h +2 -2
- data/include/prism/parser.h +62 -36
- data/include/prism/regexp.h +2 -2
- data/include/prism/util/pm_buffer.h +9 -1
- data/include/prism/util/pm_memchr.h +2 -2
- data/include/prism/util/pm_strpbrk.h +3 -3
- data/include/prism/version.h +3 -3
- data/include/prism.h +13 -15
- data/lib/prism/compiler.rb +15 -3
- data/lib/prism/debug.rb +13 -4
- data/lib/prism/desugar_compiler.rb +4 -3
- data/lib/prism/dispatcher.rb +70 -14
- data/lib/prism/dot_visitor.rb +4612 -0
- data/lib/prism/dsl.rb +77 -57
- data/lib/prism/ffi.rb +19 -6
- data/lib/prism/lex_compat.rb +19 -9
- data/lib/prism/mutation_compiler.rb +26 -6
- data/lib/prism/node.rb +1314 -522
- data/lib/prism/node_ext.rb +102 -19
- data/lib/prism/parse_result.rb +58 -27
- data/lib/prism/ripper_compat.rb +49 -34
- data/lib/prism/serialize.rb +251 -227
- data/lib/prism/visitor.rb +15 -3
- data/lib/prism.rb +21 -4
- data/prism.gemspec +7 -9
- data/rbi/prism.rbi +688 -284
- data/rbi/prism_static.rbi +3 -0
- data/sig/prism.rbs +426 -156
- data/sig/prism_static.rbs +1 -0
- data/src/diagnostic.c +280 -216
- data/src/encoding.c +5137 -0
- data/src/node.c +99 -21
- data/src/options.c +21 -2
- data/src/prettyprint.c +1743 -1241
- data/src/prism.c +1774 -831
- data/src/regexp.c +15 -15
- data/src/serialize.c +261 -164
- data/src/util/pm_buffer.c +10 -1
- data/src/util/pm_memchr.c +1 -1
- data/src/util/pm_strpbrk.c +4 -4
- metadata +8 -10
- data/src/enc/pm_big5.c +0 -53
- data/src/enc/pm_euc_jp.c +0 -59
- data/src/enc/pm_gbk.c +0 -62
- data/src/enc/pm_shift_jis.c +0 -57
- data/src/enc/pm_tables.c +0 -743
- data/src/enc/pm_unicode.c +0 -2369
- data/src/enc/pm_windows_31j.c +0 -57
data/docs/configuration.md
CHANGED
@@ -11,6 +11,7 @@ A lot of code in prism's repository is templated from a single configuration fil
|
|
11
11
|
* `java/org/prism/Nodes.java` - for defining the nodes in Java
|
12
12
|
* `lib/prism/compiler.rb` - for defining the compiler for the nodes in Ruby
|
13
13
|
* `lib/prism/dispatcher.rb` - for defining the dispatch visitors for the nodes in Ruby
|
14
|
+
* `lib/prism/dot_visitor.rb` - for defining the dot visitor for the nodes in Ruby
|
14
15
|
* `lib/prism/dsl.rb` - for defining the DSL for the nodes in Ruby
|
15
16
|
* `lib/prism/mutation_compiler.rb` - for defining the mutation compiler for the nodes in Ruby
|
16
17
|
* `lib/prism/node.rb` - for defining the nodes in Ruby
|
@@ -49,14 +50,15 @@ Optionally, every node can define a `child_nodes` key that is an array. This arr
|
|
49
50
|
|
50
51
|
The available values for `type` are:
|
51
52
|
|
52
|
-
* `node` - A
|
53
|
-
* `node?` - A
|
54
|
-
* `node[]` - A
|
55
|
-
* `string` - A
|
56
|
-
* `constant` - A
|
57
|
-
* `constant[]` - A
|
58
|
-
* `location` - A
|
59
|
-
* `location?` - A
|
60
|
-
* `
|
53
|
+
* `node` - A field that is a node. This is a `pm_node_t *` in C.
|
54
|
+
* `node?` - A field that is a node that is optionally present. This is also a `pm_node_t *` in C, but can be `NULL`.
|
55
|
+
* `node[]` - A field that is an array of nodes. This is a `pm_node_list_t` in C.
|
56
|
+
* `string` - A field that is a string. For example, this is used as the name of the method in a call node, since it cannot directly reference the source string (as in `@-` or `foo=`). This is a `pm_string_t` in C.
|
57
|
+
* `constant` - A field that is an integer that represents an index in the constant pool. This is a `pm_constant_id_t` in C.
|
58
|
+
* `constant[]` - A field that is an array of constants. This is a `pm_constant_id_list_t` in C.
|
59
|
+
* `location` - A field that is a location. This is a `pm_location_t` in C.
|
60
|
+
* `location?` - A field that is a location that is optionally present. This is a `pm_location_t` in C, but if the value is not present then the `start` and `end` fields will be `NULL`.
|
61
|
+
* `uint8` - A field that is an 8-bit unsigned integer. This is a `uint8_t` in C.
|
62
|
+
* `uint32` - A field that is a 32-bit unsigned integer. This is a `uint32_t` in C.
|
61
63
|
|
62
64
|
If the type is `node` or `node?` then the value also accepts an optional `kind` key (a string). This key is expected to match the name of another node type within `config.yml`. This changes a couple of places where code is templated out to use the more specific struct name instead of the generic `pm_node_t`. For example, with `kind: StatementsNode` the `pm_node_t *` in C becomes a `pm_statements_node_t *`.
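As a rough illustration of the mapping described above, the following Ruby sketch turns a field's `type` (and optional `kind`) into the C type named in the list. The `FIELD_C_TYPES` table and the `c_type` helper are hypothetical names used only for this example; they are not part of prism's templating code, and the snake-casing of `kind` is inferred from the single `StatementsNode` example.

```ruby
# Hypothetical lookup table: config.yml field type => C type, as listed above.
FIELD_C_TYPES = {
  "node"       => "pm_node_t *",
  "node?"      => "pm_node_t *",           # may be NULL
  "node[]"     => "pm_node_list_t",
  "string"     => "pm_string_t",
  "constant"   => "pm_constant_id_t",
  "constant[]" => "pm_constant_id_list_t",
  "location"   => "pm_location_t",
  "location?"  => "pm_location_t",         # start/end are NULL when absent
  "uint8"      => "uint8_t",
  "uint32"     => "uint32_t"
}.freeze

# Hypothetical helper: a `kind` on a node field selects the specific struct.
def c_type(type, kind: nil)
  if kind && %w[node node?].include?(type)
    # Assumed naming rule: StatementsNode -> pm_statements_node_t
    snake = kind.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
    "pm_#{snake}_t *"
  else
    FIELD_C_TYPES.fetch(type)
  end
end

c_type("node[]")                        # => "pm_node_list_t"
c_type("node", kind: "StatementsNode")  # => "pm_statements_node_t *"
```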
|
data/docs/encoding.md
CHANGED
@@ -12,94 +12,98 @@ If the file is not encoded in UTF-8, the user must specify the encoding in a "ma
|
|
12
12
|
|
13
13
|
The key of the comment can be either "encoding" or "coding". The value of the comment must be a string that is a valid encoding name. The encodings that prism supports by default are:
|
14
14
|
|
15
|
-
* `
|
16
|
-
* `
|
17
|
-
* `
|
18
|
-
* `
|
19
|
-
* `
|
20
|
-
* `
|
21
|
-
* `
|
22
|
-
* `
|
23
|
-
* `
|
24
|
-
* `
|
25
|
-
* `
|
26
|
-
* `
|
27
|
-
* `
|
28
|
-
* `
|
29
|
-
* `
|
30
|
-
* `
|
31
|
-
* `
|
32
|
-
* `
|
33
|
-
* `
|
34
|
-
* `
|
35
|
-
* `
|
36
|
-
* `
|
37
|
-
* `
|
38
|
-
* `
|
39
|
-
* `
|
40
|
-
* `
|
41
|
-
* `
|
42
|
-
* `
|
43
|
-
* `
|
44
|
-
* `
|
45
|
-
* `
|
46
|
-
|
47
|
-
|
48
|
-
|
49
|
-
|
50
|
-
|
51
|
-
|
52
|
-
|
53
|
-
|
54
|
-
|
55
|
-
|
56
|
-
|
57
|
-
|
58
|
-
|
59
|
-
|
60
|
-
|
61
|
-
|
62
|
-
|
63
|
-
|
64
|
-
|
65
|
-
|
66
|
-
|
67
|
-
|
68
|
-
|
69
|
-
|
70
|
-
|
71
|
-
|
72
|
-
|
73
|
-
|
74
|
-
|
75
|
-
|
76
|
-
|
77
|
-
|
78
|
-
|
79
|
-
|
80
|
-
|
81
|
-
|
82
|
-
|
83
|
-
|
84
|
-
|
85
|
-
|
86
|
-
|
87
|
-
|
88
|
-
|
89
|
-
|
90
|
-
|
91
|
-
|
92
|
-
|
93
|
-
|
94
|
-
|
95
|
-
|
96
|
-
|
97
|
-
|
98
|
-
|
99
|
-
|
100
|
-
|
101
|
-
|
102
|
-
|
15
|
+
* `ASCII-8BIT`
|
16
|
+
* `Big5`
|
17
|
+
* `Big5-HKSCS`
|
18
|
+
* `Big5-UAO`
|
19
|
+
* `CESU-8`
|
20
|
+
* `CP51932`
|
21
|
+
* `CP850`
|
22
|
+
* `CP852`
|
23
|
+
* `CP855`
|
24
|
+
* `CP949`
|
25
|
+
* `CP950`
|
26
|
+
* `CP951`
|
27
|
+
* `Emacs-Mule`
|
28
|
+
* `EUC-JP`
|
29
|
+
* `eucJP-ms`
|
30
|
+
* `EUC-JIS-2004`
|
31
|
+
* `EUC-KR`
|
32
|
+
* `EUC-TW`
|
33
|
+
* `GB12345`
|
34
|
+
* `GB18030`
|
35
|
+
* `GB1988`
|
36
|
+
* `GB2312`
|
37
|
+
* `GBK`
|
38
|
+
* `IBM437`
|
39
|
+
* `IBM720`
|
40
|
+
* `IBM737`
|
41
|
+
* `IBM775`
|
42
|
+
* `IBM852`
|
43
|
+
* `IBM855`
|
44
|
+
* `IBM857`
|
45
|
+
* `IBM860`
|
46
|
+
* `IBM861`
|
47
|
+
* `IBM862`
|
48
|
+
* `IBM863`
|
49
|
+
* `IBM864`
|
50
|
+
* `IBM865`
|
51
|
+
* `IBM866`
|
52
|
+
* `IBM869`
|
53
|
+
* `ISO-8859-1`
|
54
|
+
* `ISO-8859-2`
|
55
|
+
* `ISO-8859-3`
|
56
|
+
* `ISO-8859-4`
|
57
|
+
* `ISO-8859-5`
|
58
|
+
* `ISO-8859-6`
|
59
|
+
* `ISO-8859-7`
|
60
|
+
* `ISO-8859-8`
|
61
|
+
* `ISO-8859-9`
|
62
|
+
* `ISO-8859-10`
|
63
|
+
* `ISO-8859-11`
|
64
|
+
* `ISO-8859-13`
|
65
|
+
* `ISO-8859-14`
|
66
|
+
* `ISO-8859-15`
|
67
|
+
* `ISO-8859-16`
|
68
|
+
* `KOI8-R`
|
69
|
+
* `KOI8-U`
|
70
|
+
* `macCentEuro`
|
71
|
+
* `macCroatian`
|
72
|
+
* `macCyrillic`
|
73
|
+
* `macGreek`
|
74
|
+
* `macIceland`
|
75
|
+
* `MacJapanese`
|
76
|
+
* `macRoman`
|
77
|
+
* `macRomania`
|
78
|
+
* `macThai`
|
79
|
+
* `macTurkish`
|
80
|
+
* `macUkraine`
|
81
|
+
* `Shift_JIS`
|
82
|
+
* `SJIS-DoCoMo`
|
83
|
+
* `SJIS-KDDI`
|
84
|
+
* `SJIS-SoftBank`
|
85
|
+
* `stateless-ISO-2022-JP`
|
86
|
+
* `stateless-ISO-2022-JP-KDDI`
|
87
|
+
* `TIS-620`
|
88
|
+
* `US-ASCII`
|
89
|
+
* `UTF-8`
|
90
|
+
* `UTF8-MAC`
|
91
|
+
* `UTF8-DoCoMo`
|
92
|
+
* `UTF8-KDDI`
|
93
|
+
* `UTF8-SoftBank`
|
94
|
+
* `Windows-1250`
|
95
|
+
* `Windows-1251`
|
96
|
+
* `Windows-1252`
|
97
|
+
* `Windows-1253`
|
98
|
+
* `Windows-1254`
|
99
|
+
* `Windows-1255`
|
100
|
+
* `Windows-1256`
|
101
|
+
* `Windows-1257`
|
102
|
+
* `Windows-1258`
|
103
|
+
* `Windows-31J`
|
104
|
+
* `Windows-874`
|
105
|
+
|
106
|
+
For each of these encodings, prism provides functions for checking if the subsequent bytes can be interpreted as a character, and then if that character is alphabetic, alphanumeric, or uppercase.
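To make the magic comment format concrete, here is a minimal Ruby sketch that pulls the encoding name out of a comment line. The `MAGIC_ENCODING` pattern and the `magic_encoding` helper are assumptions made for this example; the pattern is deliberately simplified and is not the logic prism's lexer uses.

```ruby
# Simplified pattern: "encoding" or "coding", then ":" or "=", then the name.
MAGIC_ENCODING = /\b(?:en)?coding[:=]\s*([\w.-]+)/i

# Hypothetical helper: returns the encoding name, or nil if the line is not
# a comment or carries no encoding key.
def magic_encoding(line)
  return nil unless line.lstrip.start_with?("#")
  line[MAGIC_ENCODING, 1]
end

magic_encoding("# encoding: Shift_JIS")         # => "Shift_JIS"
magic_encoding("# -*- coding: iso-8859-1 -*-")  # => "iso-8859-1"
magic_encoding("puts 'coding: UTF-8'")          # => nil
```

The returned name would still need to be checked against the list of supported encodings above.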
|
103
107
|
|
104
108
|
## Getting notified when the encoding changes
|
105
109
|
|
data/docs/heredocs.md
CHANGED
@@ -12,7 +12,7 @@ When a heredoc identifier is encountered in the regular process of lexing, we pu
|
|
12
12
|
|
13
13
|
We also set the special `parser.next_start` field which is a pointer to the place in the source where we should start lexing the next token. This is set to the pointer of the character immediately following the next newline.
|
14
14
|
|
15
|
-
Note that if the `parser.heredoc_end` field is already set, then it means we have already encountered a heredoc on this line. In that case the `parser.next_start` field will be set to the `parser.heredoc_end` field. This is because we want to skip past the
|
15
|
+
Note that if the `parser.heredoc_end` field is already set, then it means we have already encountered a heredoc on this line. In that case the `parser.next_start` field will be set to the `parser.heredoc_end` field. This is because we want to skip past the previous heredocs on this line and instead lex the body of this heredoc.
|
16
16
|
|
17
17
|
## 2. Lexing the body
|
18
18
|
|
data/docs/javascript.md
CHANGED
@@ -24,7 +24,7 @@ const parse = await loadPrism();
|
|
24
24
|
|
25
25
|
## Browser
|
26
26
|
|
27
|
-
To use the package from the browser, you will need to do some additional work. The [javascript/example.html](javascript/example.html) file shows an example of running Prism in the browser. You will need to instantiate the WebAssembly module yourself and then pass it to the `parsePrism` function.
|
27
|
+
To use the package from the browser, you will need to do some additional work. The [javascript/example.html](../javascript/example.html) file shows an example of running Prism in the browser. You will need to instantiate the WebAssembly module yourself and then pass it to the `parsePrism` function.
|
28
28
|
|
29
29
|
First, get a shim for WASI since not all browsers support it yet.
|
30
30
|
|
@@ -74,6 +74,34 @@ A ParseResult object is very similar to the Prism::ParseResult object from Ruby.
|
|
74
74
|
console.log(JSON.stringify(parseResult.value, null, 2));
|
75
75
|
```
|
76
76
|
|
77
|
+
## Visitors
|
78
|
+
|
79
|
+
Prism allows you to traverse the AST of parsed Ruby code using visitors.
|
80
|
+
|
81
|
+
Here's an example of a custom `FooCalls` visitor:
|
82
|
+
|
83
|
+
```js
|
84
|
+
import { loadPrism, Visitor } from "@ruby/prism"
|
85
|
+
|
86
|
+
const parse = await loadPrism();
|
87
|
+
const parseResult = parse("foo()");
|
88
|
+
|
89
|
+
class FooCalls extends Visitor {
|
90
|
+
visitCallNode(node) {
|
91
|
+
if (node.name === "foo") {
|
92
|
+
// Do something with the node
|
93
|
+
}
|
94
|
+
|
95
|
+
// Call super so that the visitor continues walking the tree
|
96
|
+
super.visitCallNode(node);
|
97
|
+
}
|
98
|
+
}
|
99
|
+
|
100
|
+
const fooVisitor = new FooCalls();
|
101
|
+
|
102
|
+
parseResult.value.accept(fooVisitor);
|
103
|
+
```
|
104
|
+
|
77
105
|
## Building
|
78
106
|
|
79
107
|
To build the WASM package yourself, first obtain a copy of `wasi-sdk`. You can retrieve this here: <https://github.com/WebAssembly/wasi-sdk>. Next, run:
|
data/docs/local_variable_depth.md
ADDED
@@ -0,0 +1,229 @@
|
|
1
|
+
# Local variable depth
|
2
|
+
|
3
|
+
One feature of Prism is that it resolves local variables as it parses. It's necessary to do this because of ambiguities in the grammar. For example, consider the following code:
|
4
|
+
|
5
|
+
```ruby
|
6
|
+
foo / bar#/
|
7
|
+
```
|
8
|
+
|
9
|
+
If `foo` is a local variable, this is a call to `/` with `bar` as an argument, followed by a comment. If it's not a local variable, this is a method call to `foo` with a regular expression argument.
|
10
|
+
|
11
|
+
"Depth" refers to the number of visible scopes that Prism has to go up to find the declaration of a local variable.
|
12
|
+
Note that this follows the same scoping rules as Ruby, so a local variable is only visible in the scope it is declared in and in blocks nested in that scope.
|
13
|
+
It is important to understand the rules for calculating the depth, because they are not specified by the language and may therefore differ between Ruby implementations.
|
14
|
+
|
15
|
+
Prism uses the minimum number of scopes: it creates a scope only when the semantics require distinct scopes (which can be observed through `binding.local_variables`).
|
16
|
+
There are no "transparent/invisible" scopes in Prism.
|
17
|
+
Some Ruby implementations use those for some language constructs and need to adjust by maintaining a depth offset.
|
18
|
+
|
19
|
+
Below are the places where a local variable can be written/targeted, along with how the depth is calculated at that point.
|
20
|
+
|
21
|
+
## General
|
22
|
+
|
23
|
+
In the course of general Ruby code when reading a local variable, the depth is equal to the number of scopes to go up to find the declaration of that variable. For example:
|
24
|
+
|
25
|
+
```ruby
|
26
|
+
foo = 1
|
27
|
+
bar = 2
|
28
|
+
baz = 3
|
29
|
+
|
30
|
+
foo # depth 0
|
31
|
+
tap { bar } # depth 1
|
32
|
+
tap { tap { baz } } # depth 2
|
33
|
+
```
|
34
|
+
|
35
|
+
This also includes writing to a local variable, which could be writing to a local variable that is already declared. For example:
|
36
|
+
|
37
|
+
```ruby
|
38
|
+
foo = 1
|
39
|
+
bar = 2
|
40
|
+
|
41
|
+
foo = 3 # depth 0
|
42
|
+
tap { bar = 4 } # depth 1
|
43
|
+
```
|
44
|
+
|
45
|
+
This includes multiple assignment, where the same principle applies. For example:
|
46
|
+
|
47
|
+
```ruby
|
48
|
+
foo = 1
|
49
|
+
bar = 2
|
50
|
+
|
51
|
+
foo, bar = 3, 4 # depth 0
|
52
|
+
tap { foo, bar = 5, 6 } # depth 1
|
53
|
+
```
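
The depth rule in this section can be modeled as a walk up a chain of scopes. The sketch below is a toy model only: `Scope` and `depth_of` are hypothetical names for this example and do not correspond to prism's internal data structures.

```ruby
# Toy scope: a list of local names plus a link to the visible parent scope.
Scope = Struct.new(:locals, :parent)

# Returns the number of scopes to walk up to find `name`, or nil if the name
# is not declared in any visible scope.
def depth_of(scope, name)
  depth = 0
  while scope
    return depth if scope.locals.include?(name)
    scope = scope.parent
    depth += 1
  end
  nil
end

top   = Scope.new([:foo, :bar, :baz], nil)
block = Scope.new([], top)    # tap { ... }
inner = Scope.new([], block)  # tap { tap { ... } }

depth_of(top, :foo)    # => 0
depth_of(block, :bar)  # => 1
depth_of(inner, :baz)  # => 2
```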
|
54
|
+
|
55
|
+
## `for` loops
|
56
|
+
|
57
|
+
`for` loops in Ruby break down to calls to `.each` with a block.
|
58
|
+
However, in that case local variable reads and writes within the block are in the same scope as the scope surrounding the `for`, not in a deeper/separate scope (surprising, but this is Ruby semantics).
|
59
|
+
For example:
|
60
|
+
|
61
|
+
```ruby
|
62
|
+
foo = 1
|
63
|
+
|
64
|
+
for e in baz
|
65
|
+
foo # depth 0
|
66
|
+
bar = 2 # depth 0
|
67
|
+
end
|
68
|
+
|
69
|
+
p bar # depth 0, prints 2
|
70
|
+
```
|
71
|
+
|
72
|
+
The local variable(s) used for the index of the `for` are also at the same depth (as variables inside and outside the `for`):
|
73
|
+
|
74
|
+
```ruby
|
75
|
+
for e in [1, 2] # depth 0
|
76
|
+
e # depth 0
|
77
|
+
end
|
78
|
+
|
79
|
+
p e # depth 0, prints 2
|
80
|
+
```
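
The difference between `for` and an explicit `.each` block can be observed with `binding.local_variables`. The following is a small sketch; the method and variable names are made up for this example.

```ruby
def for_vs_each
  for e in [1, 2]
    inside_for = e
  end

  [1, 2].each do |x|
    inside_block = x
  end

  # `e` and `inside_for` live in the method's scope. `x` and `inside_block`
  # were local to the block and are not visible here.
  binding.local_variables
end

vars = for_vs_each
vars.include?(:inside_for)    # => true
vars.include?(:inside_block)  # => false
```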
|
81
|
+
|
82
|
+
## Pattern matching captures
|
83
|
+
|
84
|
+
You can target a local variable in a pattern matching expression using capture syntax. Using this syntax, you can target local variables in the current scope or in visible parent scopes. For example:
|
85
|
+
|
86
|
+
```ruby
|
87
|
+
42 => bar # depth 0
|
88
|
+
```
|
89
|
+
|
90
|
+
The example above writes to a local variable in the current scope. If the variable is already declared in a higher visible scope, it will be written to that scope instead. For example:
|
91
|
+
|
92
|
+
```ruby
|
93
|
+
foo = 1
|
94
|
+
tap { 42 => foo } # depth 1
|
95
|
+
```
|
96
|
+
|
97
|
+
## Named capture groups
|
98
|
+
|
99
|
+
You can target local variables through named capture groups in regular expressions if they are used on the left-hand side of a `=~` operator. For example:
|
100
|
+
|
101
|
+
```ruby
|
102
|
+
/(?<foo>\d+)/ =~ "42" # depth 0
|
103
|
+
```
|
104
|
+
|
105
|
+
This will write to a `foo` local variable. If the variable is already declared in a higher visible scope, it will be written to that scope instead. For example:
|
106
|
+
|
107
|
+
```ruby
|
108
|
+
foo = 1
|
109
|
+
tap { /(?<foo>\d+)/ =~ "42" } # depth 1
|
110
|
+
```
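
A short runnable check of the behavior described above, with `capture_demo` as a made-up method name: the named capture inside the block writes to the `foo` declared one scope up.

```ruby
def capture_demo
  foo = nil
  # The regular expression literal is on the left-hand side of `=~`, so the
  # named group `foo` targets the local variable declared in the outer scope.
  [1].each { /(?<foo>\d+)/ =~ "42" }
  foo
end

capture_demo  # => "42"
```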
|
111
|
+
|
112
|
+
## "interpolated once" regular expressions
|
113
|
+
|
114
|
+
Regular expressions that interpolate local variables (unrelated to capture group local variables) and have the `o` flag will only interpolate the local variables once for the runtime of the program.
|
115
|
+
In CRuby, this is implemented by compiling the regular expression within a nested instruction sequence, which means CRuby thinks the depth is one more than prism does. For example:
|
116
|
+
|
117
|
+
```
|
118
|
+
$ ruby --dump=insns -e 'foo = 1; /#{foo}/o'
|
119
|
+
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,18)> (catch: false)
|
120
|
+
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
|
121
|
+
[ 1] foo@0
|
122
|
+
0000 putobject_INT2FIX_1_ ( 1)[Li]
|
123
|
+
0001 setlocal_WC_0 foo@0
|
124
|
+
0003 once block in <main>, <is:0>
|
125
|
+
0006 leave
|
126
|
+
|
127
|
+
== disasm: #<ISeq:block in <main>@-e:1 (1,9)-(1,18)> (catch: false)
|
128
|
+
0000 putobject "" ( 1)
|
129
|
+
0002 getlocal_WC_1 foo@0
|
130
|
+
0004 dup
|
131
|
+
0005 objtostring <calldata!mid:to_s, argc:0, FCALL|ARGS_SIMPLE>
|
132
|
+
0007 anytostring
|
133
|
+
0008 toregexp 0, 2
|
134
|
+
0011 leave
|
135
|
+
```
|
136
|
+
|
137
|
+
In this case CRuby fetches the local variable with `getlocal_WC_1` as the second instruction of the "once" instruction sequence. When compiling for CRuby, prism will therefore adjust the depth to account for this difference.
|
138
|
+
|
139
|
+
## `rescue` clauses
|
140
|
+
|
141
|
+
In CRuby, `rescue` clauses are implemented as their own instruction sequence, and therefore CRuby thinks the depth is one more than prism does. For example:
|
142
|
+
|
143
|
+
```
|
144
|
+
$ ruby --dump=insns -e 'begin; foo = 1; rescue; foo; end'
|
145
|
+
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,32)> (catch: true)
|
146
|
+
== catch table
|
147
|
+
| catch type: rescue st: 0000 ed: 0004 sp: 0000 cont: 0005
|
148
|
+
| == disasm: #<ISeq:rescue in <main>@-e:1 (1,16)-(1,28)> (catch: true)
|
149
|
+
| local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
|
150
|
+
| [ 1] $!@0
|
151
|
+
| 0000 getlocal_WC_0 $!@0 ( 1)
|
152
|
+
| 0002 putobject StandardError
|
153
|
+
| 0004 checkmatch 3
|
154
|
+
| 0006 branchunless 11
|
155
|
+
| 0008 getlocal_WC_1 foo@0[Li]
|
156
|
+
| 0010 leave
|
157
|
+
| 0011 getlocal_WC_0 $!@0
|
158
|
+
| 0013 throw 0
|
159
|
+
| catch type: retry st: 0004 ed: 0005 sp: 0000 cont: 0000
|
160
|
+
|------------------------------------------------------------------------
|
161
|
+
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
|
162
|
+
[ 1] foo@0
|
163
|
+
0000 putobject_INT2FIX_1_ ( 1)[Li]
|
164
|
+
0001 dup
|
165
|
+
0002 setlocal_WC_0 foo@0
|
166
|
+
0004 nop
|
167
|
+
0005 leave
|
168
|
+
```
|
169
|
+
|
170
|
+
In the catch table, CRuby reads the `foo` local variable using `getlocal_WC_1` as the fifth instruction of the "rescue" instruction sequence. When compiling for CRuby, prism will therefore adjust the depth to account for this difference.
|
171
|
+
|
172
|
+
Note that this includes the error reference, which can target local variables, as in:
|
173
|
+
|
174
|
+
```
|
175
|
+
$ ruby --dump=insns -e 'foo = 1; begin; rescue => foo; end'
|
176
|
+
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,34)> (catch: true)
|
177
|
+
== catch table
|
178
|
+
| catch type: rescue st: 0003 ed: 0004 sp: 0000 cont: 0005
|
179
|
+
| == disasm: #<ISeq:rescue in <main>@-e:1 (1,16)-(1,30)> (catch: true)
|
180
|
+
| local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
|
181
|
+
| [ 1] $!@0
|
182
|
+
| 0000 getlocal_WC_0 $!@0 ( 1)
|
183
|
+
| 0002 putobject StandardError
|
184
|
+
| 0004 checkmatch 3
|
185
|
+
| 0006 branchunless 14
|
186
|
+
| 0008 getlocal_WC_0 $!@0
|
187
|
+
| 0010 setlocal_WC_1 foo@0
|
188
|
+
| 0012 putnil
|
189
|
+
| 0013 leave
|
190
|
+
| 0014 getlocal_WC_0 $!@0
|
191
|
+
| 0016 throw 0
|
192
|
+
| catch type: retry st: 0004 ed: 0005 sp: 0000 cont: 0003
|
193
|
+
|------------------------------------------------------------------------
|
194
|
+
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
|
195
|
+
[ 1] foo@0
|
196
|
+
0000 putobject_INT2FIX_1_ ( 1)[Li]
|
197
|
+
0001 setlocal_WC_0 foo@0
|
198
|
+
0003 putnil
|
199
|
+
0004 nop
|
200
|
+
0005 leave
|
201
|
+
```
|
202
|
+
|
203
|
+
Note that CRuby writes to the `foo` local variable using the `setlocal_WC_1` instruction as the sixth instruction of the "rescue" instruction sequence. When compiling for CRuby, prism will therefore adjust the depth to account for this difference.
|
204
|
+
|
205
|
+
## Post execution blocks
|
206
|
+
|
207
|
+
The `END {}` syntax allows executing code when the program exits. In CRuby, this is implemented as two nested instruction sequences. CRuby therefore thinks the depth is two more than prism does. For example:
|
208
|
+
|
209
|
+
```
|
210
|
+
$ ruby --dump=insns -e 'foo = 1; END { foo }'
|
211
|
+
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,20)> (catch: false)
|
212
|
+
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
|
213
|
+
[ 1] foo@0
|
214
|
+
0000 putobject_INT2FIX_1_ ( 1)[Li]
|
215
|
+
0001 setlocal_WC_0 foo@0
|
216
|
+
0003 once block in <main>, <is:0>
|
217
|
+
0006 leave
|
218
|
+
|
219
|
+
== disasm: #<ISeq:block in <main>@-e:0 (0,0)-(-1,-1)> (catch: false)
|
220
|
+
0000 putspecialobject 1 ( 1)
|
221
|
+
0002 send <calldata!mid:core#set_postexe, argc:0, FCALL>, block in <main>
|
222
|
+
0005 leave
|
223
|
+
|
224
|
+
== disasm: #<ISeq:block in <main>@-e:1 (1,9)-(1,20)> (catch: false)
|
225
|
+
0000 getlocal foo@0, 2 ( 1)[LiBc]
|
226
|
+
0003 leave [Br]
|
227
|
+
```
|
228
|
+
|
229
|
+
In the instruction sequence corresponding to the code that gets executed inside the `END` block, CRuby reads the `foo` local variable using `getlocal` with a depth of 2, as the first instruction of the innermost `"block in <main>"` instruction sequence. When compiling for CRuby, prism will therefore adjust the depth to account for this difference.
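
The adjustments described for "interpolated once" regular expressions, `rescue` clauses, and post execution blocks can be summarized as a per-construct offset. The sketch below is hypothetical: `CRUBY_EXTRA_DEPTH` and `cruby_depth` are names invented for this example, and summing the offsets of several enclosing constructs is an assumption, not something the text above states.

```ruby
# Extra instruction sequences CRuby nests for each construct, taken from the
# instruction dumps in this document.
CRUBY_EXTRA_DEPTH = {
  interpolated_once_regexp: 1,
  rescue_clause: 1,
  post_execution_block: 2
}.freeze

# Hypothetical helper: converts a prism depth into the depth CRuby would use,
# given the constructs that enclose the variable reference.
def cruby_depth(prism_depth, enclosing_constructs)
  prism_depth + enclosing_constructs.sum { |c| CRUBY_EXTRA_DEPTH.fetch(c) }
end

cruby_depth(0, [:rescue_clause])         # => 1, matches getlocal_WC_1
cruby_depth(0, [:post_execution_block])  # => 2, matches getlocal foo@0, 2
```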
|
data/docs/ruby_api.md
CHANGED
@@ -25,3 +25,19 @@ The full API is documented below.
|
|
25
25
|
* `Prism.load(source, serialized)` - load the serialized syntax tree using the source as a reference into a syntax tree
|
26
26
|
* `Prism.parse_comments(source)` - parse the comments corresponding to the given source string and return them
|
27
27
|
* `Prism.parse_file_comments(filepath)` - parse the comments corresponding to the given source file and return them
|
28
|
+
* `Prism.parse_success?(source)` - parse the syntax tree corresponding to the given source string and return true if it was parsed without errors
|
29
|
+
* `Prism.parse_file_success?(filepath)` - parse the syntax tree corresponding to the given source file and return true if it was parsed without errors
|
30
|
+
|
31
|
+
## Nodes
|
32
|
+
|
33
|
+
Once you have nodes in hand coming out of a parse result, there are a number of common APIs that are available on each instance. They are:
|
34
|
+
|
35
|
+
* `#accept(visitor)` - a method that will immediately call `visit_*` to specialize for the node type
|
36
|
+
* `#child_nodes` - a positional array of the child nodes of the node, with `nil` values for any missing children
|
37
|
+
* `#compact_child_nodes` - a positional array of the child nodes of the node with no `nil` values
|
38
|
+
* `#copy(**keys)` - a method that allows creating a shallow copy of the node with the given keys overridden
|
39
|
+
* `#deconstruct`/`#deconstruct_keys(keys)` - the pattern matching interface for nodes
|
40
|
+
* `#inspect` - a string representation that looks like the syntax tree of the node
|
41
|
+
* `#location` - a `Location` object that describes the location of the node in the source file
|
42
|
+
* `#to_dot` - convert the node's syntax tree into graphviz dot notation
|
43
|
+
* `#type` - a symbol that represents the type of the node, useful for quick comparisons
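
To show how `#accept`, `#child_nodes`, and `#compact_child_nodes` fit together, here is a toy sketch. `ToyCallNode` and `ToyCounter` are stand-ins written for this example; they only mirror the shape of the methods listed above and are not prism's classes.

```ruby
# Toy node, not prism's class.
class ToyCallNode
  attr_reader :receiver, :arguments

  def initialize(receiver, arguments)
    @receiver = receiver
    @arguments = arguments
  end

  def type
    :call_node
  end

  # Dispatches to the visit method specialized for this node type.
  def accept(visitor)
    visitor.visit_call_node(self)
  end

  # Positional children, with nil for any that are missing.
  def child_nodes
    [receiver, arguments]
  end

  def compact_child_nodes
    child_nodes.compact
  end
end

# Toy visitor that counts call nodes and keeps walking the tree.
class ToyCounter
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def visit_call_node(node)
    @calls += 1
    node.compact_child_nodes.each { |child| child.accept(self) }
  end
end

inner = ToyCallNode.new(nil, nil)
outer = ToyCallNode.new(inner, nil)
counter = ToyCounter.new
outer.accept(counter)
counter.calls  # => 2
```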
|
data/docs/serialization.md
CHANGED
@@ -9,24 +9,28 @@ The syntax tree still requires a copy of the original source, as for the most pa
|
|
9
9
|
|
10
10
|
Let us define some simple types for readability.
|
11
11
|
|
12
|
-
###
|
12
|
+
### varuint
|
13
13
|
|
14
|
-
A variable-length integer with the value fitting in `uint32_t` using between 1 and 5 bytes, using the [LEB128](https://en.wikipedia.org/wiki/LEB128) encoding.
|
14
|
+
A variable-length unsigned integer with the value fitting in `uint32_t` using between 1 and 5 bytes, using the [LEB128](https://en.wikipedia.org/wiki/LEB128) encoding.
|
15
15
|
This drastically cuts down on the size of the serialized string, especially when the source file is large.
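
For reference, a minimal sketch of unsigned LEB128 encoding and decoding in Ruby. The `encode_varuint` and `decode_varuint` names are chosen for this example and are not functions provided by prism.

```ruby
# Encodes an integer in the uint32_t range as 1 to 5 LEB128 bytes.
def encode_varuint(value)
  raise ArgumentError, "out of uint32_t range" unless (0..0xFFFF_FFFF).cover?(value)

  bytes = []
  loop do
    byte = value & 0x7F   # low 7 bits
    value >>= 7
    if value.zero?
      bytes << byte       # last byte: continuation bit clear
      break
    end
    bytes << (byte | 0x80) # more bytes follow: continuation bit set
  end
  bytes.pack("C*")
end

# Decodes a LEB128 value from the start of a binary string.
def decode_varuint(bytes)
  result = 0
  shift = 0
  bytes.each_byte do |b|
    result |= (b & 0x7F) << shift
    return result if (b & 0x80).zero?
    shift += 7
  end
  raise ArgumentError, "truncated varuint"
end

encode_varuint(127).bytes  # => [127]
encode_varuint(300).bytes  # => [172, 2]
decode_varuint(encode_varuint(300))  # => 300
```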
|
16
16
|
|
17
|
+
### varsint
|
18
|
+
|
19
|
+
A variable-length signed integer with the value fitting in `int32_t` using between 1 and 5 bytes, using [ZigZag encoding](https://protobuf.dev/programming-guides/encoding/#signed-ints) into [LEB128](https://en.wikipedia.org/wiki/LEB128).
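
A minimal sketch of the ZigZag step followed by LEB128, assuming 32-bit signed input. The helper names (`zigzag_encode`, `zigzag_decode`, `leb128`, `encode_varsint`) are invented for this example.

```ruby
# Maps a signed 32-bit integer to an unsigned one so that values of small
# magnitude stay small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
def zigzag_encode(n)
  raise ArgumentError, "out of int32_t range" unless (-2**31..2**31 - 1).cover?(n)
  ((n << 1) ^ (n >> 31)) & 0xFFFF_FFFF
end

# Inverse of zigzag_encode.
def zigzag_decode(u)
  (u >> 1) ^ -(u & 1)
end

# Unsigned LEB128, repeated here so the example is self-contained.
def leb128(value)
  bytes = []
  loop do
    byte = value & 0x7F
    value >>= 7
    if value.zero?
      bytes << byte
      break
    end
    bytes << (byte | 0x80)
  end
  bytes.pack("C*")
end

def encode_varsint(n)
  leb128(zigzag_encode(n))
end

encode_varsint(-1).bytes  # => [1]
encode_varsint(1).bytes   # => [2]
encode_varsint(64).bytes  # => [128, 1]
```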
|
20
|
+
|
17
21
|
### string
|
18
22
|
|
19
23
|
| # bytes | field |
|
20
24
|
| --- | --- |
|
21
|
-
|
|
25
|
+
| varuint | the length of the string in bytes |
|
22
26
|
| ... | the string bytes |
|
23
27
|
|
24
28
|
### location
|
25
29
|
|
26
30
|
| # bytes | field |
|
27
31
|
| --- | --- |
|
28
|
-
|
|
29
|
-
|
|
32
|
+
| varuint | byte offset into the source string where this location begins |
|
33
|
+
| varuint | length of the location in bytes in the source string |
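
The `string` and `location` layouts in the two tables above can be sketched as follows. The helper names are hypothetical, and `varuint` is a local LEB128 encoder included only to keep the example self-contained.

```ruby
# Unsigned LEB128 encoder (see the varuint description).
def varuint(value)
  bytes = []
  loop do
    byte = value & 0x7F
    value >>= 7
    if value.zero?
      bytes << byte
      break
    end
    bytes << (byte | 0x80)
  end
  bytes.pack("C*")
end

# string: varuint byte length, followed by the raw bytes.
def serialize_string(str)
  varuint(str.bytesize) + str.b
end

# location: varuint start offset, followed by varuint length.
def serialize_location(start_offset, length)
  varuint(start_offset) + varuint(length)
end

serialize_string("foo=").bytes     # => [4, 102, 111, 111, 61]
serialize_location(300, 3).bytes   # => [172, 2, 3]
```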
|
30
34
|
|
31
35
|
### comment
|
32
36
|
|
@@ -34,7 +38,6 @@ The comment type is one of:
|
|
34
38
|
|
35
39
|
* 0=`INLINE` (`# comment`)
|
36
40
|
* 1=`EMBEDDED_DOCUMENT` (`=begin`/`=end`)
|
37
|
-
* 2=`__END__` (after `__END__`)
|
38
41
|
|
39
42
|
| # bytes | field |
|
40
43
|
| --- | --- |
|
@@ -72,17 +75,18 @@ The header is structured like the following table:
|
|
72
75
|
| `1` | patch version number |
|
73
76
|
| `1` | 1 indicates only semantics fields were serialized, 0 indicates all fields were serialized (including location fields) |
|
74
77
|
| string | the encoding name |
|
75
|
-
|
|
76
|
-
|
|
78
|
+
| varsint | the start line |
|
79
|
+
| varuint | number of comments |
|
77
80
|
| comment* | comments |
|
78
|
-
|
|
81
|
+
| varuint | number of magic comments |
|
79
82
|
| magic comment* | magic comments |
|
80
|
-
|
|
83
|
+
| location? | the optional location of the `__END__` keyword and its contents |
|
84
|
+
| varuint | number of errors |
|
81
85
|
| diagnostic* | errors |
|
82
|
-
|
|
86
|
+
| varuint | number of warnings |
|
83
87
|
| diagnostic* | warnings |
|
84
88
|
| `4` | content pool offset |
|
85
|
-
|
|
89
|
+
| varuint | content pool size |
|
86
90
|
|
87
91
|
After the header comes the body of the serialized string.
|
88
92
|
The body consists of a sequence of nodes that is built using a prefix traversal order of the syntax tree.
|
@@ -103,6 +107,7 @@ Every field on the node is then appended to the serialized string. The fields ca
|
|
103
107
|
* `constant?` - An optional variable-length integer that represents an index in the constant pool. If it's not present, then a single `0` byte will be written in its place.
|
104
108
|
* `location` - A field that is a location. This is structured as a variable-length integer start followed by a variable-length integer length.
|
105
109
|
* `location?` - A field that is a location that is optionally present. If the location is not present, then a single `0` byte will be written in its place. If it is present, then it will be structured just like the `location` child node.
|
110
|
+
* `uint8` - A field that is an 8-bit unsigned integer. This is structured as a single byte.
|
106
111
|
* `uint32` - A field that is a 32-bit unsigned integer. This is structured as a variable-length integer.
|
107
112
|
|
108
113
|
After the syntax tree, the content pool is serialized. This is a list of constants that were referenced from within the tree. The content pool begins at the offset specified in the header. Constants can be either "owned" (in which case their contents are embedded in the serialization) or "shared" (in which case their contents represent a slice of the source string). The most significant bit of the constant indicates whether it is owned or shared.
|
@@ -159,7 +164,7 @@ serialize(const uint8_t *source, size_t length) {
|
|
159
164
|
}
|
160
165
|
```
|
161
166
|
|
162
|
-
The final argument to `pm_serialize_parse` is an optional string that controls the options to the parse function. This includes all of the normal options that could be passed to `pm_parser_init` through a `pm_options_t` struct, but serialized as a string to make it easier for callers through FFI. Note that no `
|
167
|
+
The final argument to `pm_serialize_parse` is an optional string that controls the options to the parse function. This includes all of the normal options that could be passed to `pm_parser_init` through a `pm_options_t` struct, but serialized as a string to make it easier for callers through FFI. Note that no `varuint` values are used here: this makes the data easier for the caller to produce, and the serialized size matters less in this case. The format of the data is structured as follows:
|
163
168
|
|
164
169
|
| # bytes | field |
|
165
170
|
| ------- | -------------------------- |
|