yarp 0.12.0 → 0.13.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +29 -8
- data/CONTRIBUTING.md +2 -2
- data/Makefile +5 -5
- data/README.md +11 -12
- data/config.yml +6 -2
- data/docs/build_system.md +21 -21
- data/docs/building.md +4 -4
- data/docs/configuration.md +25 -21
- data/docs/design.md +2 -2
- data/docs/encoding.md +17 -17
- data/docs/fuzzing.md +4 -4
- data/docs/heredocs.md +3 -3
- data/docs/mapping.md +94 -94
- data/docs/ripper.md +4 -4
- data/docs/ruby_api.md +11 -11
- data/docs/serialization.md +17 -16
- data/docs/testing.md +6 -6
- data/ext/prism/api_node.c +4725 -0
- data/ext/{yarp → prism}/api_pack.c +82 -82
- data/ext/{yarp → prism}/extconf.rb +13 -13
- data/ext/{yarp → prism}/extension.c +175 -168
- data/ext/prism/extension.h +18 -0
- data/include/prism/ast.h +1932 -0
- data/include/prism/defines.h +45 -0
- data/include/prism/diagnostic.h +231 -0
- data/include/{yarp/enc/yp_encoding.h → prism/enc/pm_encoding.h} +40 -40
- data/include/prism/node.h +41 -0
- data/include/prism/pack.h +141 -0
- data/include/{yarp → prism}/parser.h +143 -142
- data/include/prism/regexp.h +19 -0
- data/include/prism/unescape.h +48 -0
- data/include/prism/util/pm_buffer.h +51 -0
- data/include/{yarp/util/yp_char.h → prism/util/pm_char.h} +20 -20
- data/include/{yarp/util/yp_constant_pool.h → prism/util/pm_constant_pool.h} +26 -22
- data/include/{yarp/util/yp_list.h → prism/util/pm_list.h} +21 -21
- data/include/prism/util/pm_memchr.h +14 -0
- data/include/{yarp/util/yp_newline_list.h → prism/util/pm_newline_list.h} +11 -11
- data/include/prism/util/pm_state_stack.h +24 -0
- data/include/{yarp/util/yp_string.h → prism/util/pm_string.h} +20 -20
- data/include/prism/util/pm_string_list.h +25 -0
- data/include/{yarp/util/yp_strpbrk.h → prism/util/pm_strpbrk.h} +7 -7
- data/include/prism/version.h +4 -0
- data/include/prism.h +82 -0
- data/lib/prism/compiler.rb +465 -0
- data/lib/prism/debug.rb +157 -0
- data/lib/{yarp/desugar_visitor.rb → prism/desugar_compiler.rb} +4 -2
- data/lib/prism/dispatcher.rb +2051 -0
- data/lib/prism/dsl.rb +750 -0
- data/lib/{yarp → prism}/ffi.rb +66 -67
- data/lib/{yarp → prism}/lex_compat.rb +40 -43
- data/lib/{yarp/mutation_visitor.rb → prism/mutation_compiler.rb} +3 -3
- data/lib/{yarp → prism}/node.rb +2012 -2593
- data/lib/prism/node_ext.rb +55 -0
- data/lib/prism/node_inspector.rb +68 -0
- data/lib/{yarp → prism}/pack.rb +1 -1
- data/lib/{yarp → prism}/parse_result/comments.rb +1 -1
- data/lib/{yarp → prism}/parse_result/newlines.rb +1 -1
- data/lib/prism/parse_result.rb +266 -0
- data/lib/{yarp → prism}/pattern.rb +14 -14
- data/lib/{yarp → prism}/ripper_compat.rb +5 -5
- data/lib/{yarp → prism}/serialize.rb +12 -7
- data/lib/prism/visitor.rb +470 -0
- data/lib/prism.rb +64 -0
- data/lib/yarp.rb +2 -614
- data/src/diagnostic.c +213 -208
- data/src/enc/pm_big5.c +52 -0
- data/src/enc/pm_euc_jp.c +58 -0
- data/src/enc/{yp_gbk.c → pm_gbk.c} +16 -16
- data/src/enc/pm_shift_jis.c +56 -0
- data/src/enc/{yp_tables.c → pm_tables.c} +69 -69
- data/src/enc/{yp_unicode.c → pm_unicode.c} +40 -40
- data/src/enc/pm_windows_31j.c +56 -0
- data/src/node.c +1293 -1233
- data/src/pack.c +247 -247
- data/src/prettyprint.c +1479 -1479
- data/src/{yarp.c → prism.c} +5205 -5083
- data/src/regexp.c +132 -132
- data/src/serialize.c +1121 -1121
- data/src/token_type.c +169 -167
- data/src/unescape.c +106 -87
- data/src/util/pm_buffer.c +103 -0
- data/src/util/{yp_char.c → pm_char.c} +72 -72
- data/src/util/{yp_constant_pool.c → pm_constant_pool.c} +85 -64
- data/src/util/{yp_list.c → pm_list.c} +10 -10
- data/src/util/{yp_memchr.c → pm_memchr.c} +6 -4
- data/src/util/{yp_newline_list.c → pm_newline_list.c} +21 -21
- data/src/util/{yp_state_stack.c → pm_state_stack.c} +4 -4
- data/src/util/{yp_string.c → pm_string.c} +38 -38
- data/src/util/pm_string_list.c +29 -0
- data/src/util/{yp_strncasecmp.c → pm_strncasecmp.c} +1 -1
- data/src/util/{yp_strpbrk.c → pm_strpbrk.c} +8 -8
- data/yarp.gemspec +68 -59
- metadata +70 -61
- data/ext/yarp/api_node.c +0 -4728
- data/ext/yarp/extension.h +0 -18
- data/include/yarp/ast.h +0 -1929
- data/include/yarp/defines.h +0 -45
- data/include/yarp/diagnostic.h +0 -226
- data/include/yarp/node.h +0 -42
- data/include/yarp/pack.h +0 -141
- data/include/yarp/regexp.h +0 -19
- data/include/yarp/unescape.h +0 -44
- data/include/yarp/util/yp_buffer.h +0 -51
- data/include/yarp/util/yp_memchr.h +0 -14
- data/include/yarp/util/yp_state_stack.h +0 -24
- data/include/yarp/util/yp_string_list.h +0 -25
- data/include/yarp/version.h +0 -4
- data/include/yarp.h +0 -82
- data/src/enc/yp_big5.c +0 -52
- data/src/enc/yp_euc_jp.c +0 -58
- data/src/enc/yp_shift_jis.c +0 -56
- data/src/enc/yp_windows_31j.c +0 -56
- data/src/util/yp_buffer.c +0 -101
- data/src/util/yp_string_list.c +0 -29
data/docs/mapping.md
CHANGED
@@ -1,117 +1,117 @@
|
|
1
1
|
# Mapping
|
2
2
|
|
3
|
-
When considering the previous CRuby parser versus
|
3
|
+
When considering the previous CRuby parser versus prism, this document should be helpful to understand how various concepts are mapped.
|
4
4
|
|
5
5
|
## Nodes
|
6
6
|
|
7
|
-
The following table shows how the various CRuby nodes are mapped to
|
7
|
+
The following table shows how the various CRuby nodes are mapped to prism nodes.
|
8
8
|
|
9
|
-
| CRuby |
|
9
|
+
| CRuby | prism |
|
10
10
|
| --- | --- |
|
11
11
|
| `NODE_SCOPE` | |
|
12
12
|
| `NODE_BLOCK` | |
|
13
|
-
| `NODE_IF` | `
|
14
|
-
| `NODE_UNLESS` | `
|
15
|
-
| `NODE_CASE` | `
|
16
|
-
| `NODE_CASE2` | `
|
13
|
+
| `NODE_IF` | `PM_IF_NODE` |
|
14
|
+
| `NODE_UNLESS` | `PM_UNLESS_NODE` |
|
15
|
+
| `NODE_CASE` | `PM_CASE_NODE` |
|
16
|
+
| `NODE_CASE2` | `PM_CASE_NODE` (with a null predicate) |
|
17
17
|
| `NODE_CASE3` | |
|
18
|
-
| `NODE_WHEN` | `
|
19
|
-
| `NODE_IN` | `
|
20
|
-
| `NODE_WHILE` | `
|
21
|
-
| `NODE_UNTIL` | `
|
22
|
-
| `NODE_ITER` | `
|
23
|
-
| `NODE_FOR` | `
|
24
|
-
| `NODE_FOR_MASGN` | `
|
25
|
-
| `NODE_BREAK` | `
|
26
|
-
| `NODE_NEXT` | `
|
27
|
-
| `NODE_REDO` | `
|
28
|
-
| `NODE_RETRY` | `
|
29
|
-
| `NODE_BEGIN` | `
|
30
|
-
| `NODE_RESCUE` | `
|
18
|
+
| `NODE_WHEN` | `PM_WHEN_NODE` |
|
19
|
+
| `NODE_IN` | `PM_IN_NODE` |
|
20
|
+
| `NODE_WHILE` | `PM_WHILE_NODE` |
|
21
|
+
| `NODE_UNTIL` | `PM_UNTIL_NODE` |
|
22
|
+
| `NODE_ITER` | `PM_CALL_NODE` (with a non-null block) |
|
23
|
+
| `NODE_FOR` | `PM_FOR_NODE` |
|
24
|
+
| `NODE_FOR_MASGN` | `PM_FOR_NODE` (with a multi-write node as the index) |
|
25
|
+
| `NODE_BREAK` | `PM_BREAK_NODE` |
|
26
|
+
| `NODE_NEXT` | `PM_NEXT_NODE` |
|
27
|
+
| `NODE_REDO` | `PM_REDO_NODE` |
|
28
|
+
| `NODE_RETRY` | `PM_RETRY_NODE` |
|
29
|
+
| `NODE_BEGIN` | `PM_BEGIN_NODE` |
|
30
|
+
| `NODE_RESCUE` | `PM_RESCUE_NODE` |
|
31
31
|
| `NODE_RESBODY` | |
|
32
|
-
| `NODE_ENSURE` | `
|
33
|
-
| `NODE_AND` | `
|
34
|
-
| `NODE_OR` | `
|
35
|
-
| `NODE_MASGN` | `
|
36
|
-
| `NODE_LASGN` | `
|
37
|
-
| `NODE_DASGN` | `
|
38
|
-
| `NODE_GASGN` | `
|
39
|
-
| `NODE_IASGN` | `
|
40
|
-
| `NODE_CDECL` | `
|
41
|
-
| `NODE_CVASGN` | `
|
32
|
+
| `NODE_ENSURE` | `PM_ENSURE_NODE` |
|
33
|
+
| `NODE_AND` | `PM_AND_NODE` |
|
34
|
+
| `NODE_OR` | `PM_OR_NODE` |
|
35
|
+
| `NODE_MASGN` | `PM_MULTI_WRITE_NODE` |
|
36
|
+
| `NODE_LASGN` | `PM_LOCAL_VARIABLE_WRITE_NODE` |
|
37
|
+
| `NODE_DASGN` | `PM_LOCAL_VARIABLE_WRITE_NODE` |
|
38
|
+
| `NODE_GASGN` | `PM_GLOBAL_VARIABLE_WRITE_NODE` |
|
39
|
+
| `NODE_IASGN` | `PM_INSTANCE_VARIABLE_WRITE_NODE` |
|
40
|
+
| `NODE_CDECL` | `PM_CONSTANT_PATH_WRITE_NODE` |
|
41
|
+
| `NODE_CVASGN` | `PM_CLASS_VARIABLE_WRITE_NODE` |
|
42
42
|
| `NODE_OP_ASGN1` | |
|
43
43
|
| `NODE_OP_ASGN2` | |
|
44
|
-
| `NODE_OP_ASGN_AND` | `
|
45
|
-
| `NODE_OP_ASGN_OR` | `
|
44
|
+
| `NODE_OP_ASGN_AND` | `PM_OPERATOR_AND_ASSIGNMENT_NODE` |
|
45
|
+
| `NODE_OP_ASGN_OR` | `PM_OPERATOR_OR_ASSIGNMENT_NODE` |
|
46
46
|
| `NODE_OP_CDECL` | |
|
47
|
-
| `NODE_CALL` | `
|
48
|
-
| `NODE_OPCALL` | `
|
49
|
-
| `NODE_FCALL` | `
|
50
|
-
| `NODE_VCALL` | `
|
51
|
-
| `NODE_QCALL` | `
|
52
|
-
| `NODE_SUPER` | `
|
53
|
-
| `NODE_ZSUPER` | `
|
54
|
-
| `NODE_LIST` | `
|
55
|
-
| `NODE_ZLIST` | `
|
56
|
-
| `NODE_VALUES` | `
|
57
|
-
| `NODE_HASH` | `
|
58
|
-
| `NODE_RETURN` | `
|
59
|
-
| `NODE_YIELD` | `
|
60
|
-
| `NODE_LVAR` | `
|
61
|
-
| `NODE_DVAR` | `
|
62
|
-
| `NODE_GVAR` | `
|
63
|
-
| `NODE_IVAR` | `
|
64
|
-
| `NODE_CONST` | `
|
65
|
-
| `NODE_CVAR` | `
|
66
|
-
| `NODE_NTH_REF` | `
|
67
|
-
| `NODE_BACK_REF` | `
|
47
|
+
| `NODE_CALL` | `PM_CALL_NODE` |
|
48
|
+
| `NODE_OPCALL` | `PM_CALL_NODE` (with an operator as the method) |
|
49
|
+
| `NODE_FCALL` | `PM_CALL_NODE` (with a null receiver and parentheses) |
|
50
|
+
| `NODE_VCALL` | `PM_CALL_NODE` (with a null receiver and parentheses or arguments) |
|
51
|
+
| `NODE_QCALL` | `PM_CALL_NODE` (with a &. operator) |
|
52
|
+
| `NODE_SUPER` | `PM_SUPER_NODE` |
|
53
|
+
| `NODE_ZSUPER` | `PM_FORWARDING_SUPER_NODE` |
|
54
|
+
| `NODE_LIST` | `PM_ARRAY_NODE` |
|
55
|
+
| `NODE_ZLIST` | `PM_ARRAY_NODE` (with no child elements) |
|
56
|
+
| `NODE_VALUES` | `PM_ARGUMENTS_NODE` |
|
57
|
+
| `NODE_HASH` | `PM_HASH_NODE` |
|
58
|
+
| `NODE_RETURN` | `PM_RETURN_NODE` |
|
59
|
+
| `NODE_YIELD` | `PM_YIELD_NODE` |
|
60
|
+
| `NODE_LVAR` | `PM_LOCAL_VARIABLE_READ_NODE` |
|
61
|
+
| `NODE_DVAR` | `PM_LOCAL_VARIABLE_READ_NODE` |
|
62
|
+
| `NODE_GVAR` | `PM_GLOBAL_VARIABLE_READ_NODE` |
|
63
|
+
| `NODE_IVAR` | `PM_INSTANCE_VARIABLE_READ_NODE` |
|
64
|
+
| `NODE_CONST` | `PM_CONSTANT_PATH_READ_NODE` |
|
65
|
+
| `NODE_CVAR` | `PM_CLASS_VARIABLE_READ_NODE` |
|
66
|
+
| `NODE_NTH_REF` | `PM_NUMBERED_REFERENCE_READ_NODE` |
|
67
|
+
| `NODE_BACK_REF` | `PM_BACK_REFERENCE_READ_NODE` |
|
68
68
|
| `NODE_MATCH` | |
|
69
|
-
| `NODE_MATCH2` | `
|
70
|
-
| `NODE_MATCH3` | `
|
69
|
+
| `NODE_MATCH2` | `PM_CALL_NODE` (with regular expression as receiver) |
|
70
|
+
| `NODE_MATCH3` | `PM_CALL_NODE` (with regular expression as only argument) |
|
71
71
|
| `NODE_LIT` | |
|
72
|
-
| `NODE_STR` | `
|
73
|
-
| `NODE_DSTR` | `
|
74
|
-
| `NODE_XSTR` | `
|
75
|
-
| `NODE_DXSTR` | `
|
76
|
-
| `NODE_EVSTR` | `
|
77
|
-
| `NODE_DREGX` | `
|
72
|
+
| `NODE_STR` | `PM_STRING_NODE` |
|
73
|
+
| `NODE_DSTR` | `PM_INTERPOLATED_STRING_NODE` |
|
74
|
+
| `NODE_XSTR` | `PM_X_STRING_NODE` |
|
75
|
+
| `NODE_DXSTR` | `PM_INTERPOLATED_X_STRING_NODE` |
|
76
|
+
| `NODE_EVSTR` | `PM_STRING_INTERPOLATED_NODE` |
|
77
|
+
| `NODE_DREGX` | `PM_INTERPOLATED_REGULAR_EXPRESSION_NODE` |
|
78
78
|
| `NODE_ONCE` | |
|
79
|
-
| `NODE_ARGS` | `
|
79
|
+
| `NODE_ARGS` | `PM_PARAMETERS_NODE` |
|
80
80
|
| `NODE_ARGS_AUX` | |
|
81
|
-
| `NODE_OPT_ARG` | `
|
82
|
-
| `NODE_KW_ARG` | `
|
83
|
-
| `NODE_POSTARG` | `
|
81
|
+
| `NODE_OPT_ARG` | `PM_OPTIONAL_PARAMETER_NODE` |
|
82
|
+
| `NODE_KW_ARG` | `PM_KEYWORD_PARAMETER_NODE` |
|
83
|
+
| `NODE_POSTARG` | `PM_REQUIRED_PARAMETER_NODE` |
|
84
84
|
| `NODE_ARGSCAT` | |
|
85
85
|
| `NODE_ARGSPUSH` | |
|
86
|
-
| `NODE_SPLAT` | `
|
87
|
-
| `NODE_BLOCK_PASS` | `
|
88
|
-
| `NODE_DEFN` | `
|
89
|
-
| `NODE_DEFS` | `
|
90
|
-
| `NODE_ALIAS` | `
|
91
|
-
| `NODE_VALIAS` | `
|
92
|
-
| `NODE_UNDEF` | `
|
93
|
-
| `NODE_CLASS` | `
|
94
|
-
| `NODE_MODULE` | `
|
95
|
-
| `NODE_SCLASS` | `
|
96
|
-
| `NODE_COLON2` | `
|
97
|
-
| `NODE_COLON3` | `
|
98
|
-
| `NODE_DOT2` | `
|
99
|
-
| `NODE_DOT3` | `
|
100
|
-
| `NODE_FLIP2` | `
|
101
|
-
| `NODE_FLIP3` | `
|
102
|
-
| `NODE_SELF` | `
|
103
|
-
| `NODE_NIL` | `
|
104
|
-
| `NODE_TRUE` | `
|
105
|
-
| `NODE_FALSE` | `
|
86
|
+
| `NODE_SPLAT` | `PM_SPLAT_NODE` |
|
87
|
+
| `NODE_BLOCK_PASS` | `PM_BLOCK_ARGUMENT_NODE` |
|
88
|
+
| `NODE_DEFN` | `PM_DEF_NODE` (with a null receiver) |
|
89
|
+
| `NODE_DEFS` | `PM_DEF_NODE` (with a non-null receiver) |
|
90
|
+
| `NODE_ALIAS` | `PM_ALIAS_NODE` |
|
91
|
+
| `NODE_VALIAS` | `PM_ALIAS_NODE` (with a global variable first argument) |
|
92
|
+
| `NODE_UNDEF` | `PM_UNDEF_NODE` |
|
93
|
+
| `NODE_CLASS` | `PM_CLASS_NODE` |
|
94
|
+
| `NODE_MODULE` | `PM_MODULE_NODE` |
|
95
|
+
| `NODE_SCLASS` | `PM_S_CLASS_NODE` |
|
96
|
+
| `NODE_COLON2` | `PM_CONSTANT_PATH_NODE` |
|
97
|
+
| `NODE_COLON3` | `PM_CONSTANT_PATH_NODE` (with a null receiver) |
|
98
|
+
| `NODE_DOT2` | `PM_RANGE_NODE` (with a .. operator) |
|
99
|
+
| `NODE_DOT3` | `PM_RANGE_NODE` (with a ... operator) |
|
100
|
+
| `NODE_FLIP2` | `PM_RANGE_NODE` (with a .. operator) |
|
101
|
+
| `NODE_FLIP3` | `PM_RANGE_NODE` (with a ... operator) |
|
102
|
+
| `NODE_SELF` | `PM_SELF_NODE` |
|
103
|
+
| `NODE_NIL` | `PM_NIL_NODE` |
|
104
|
+
| `NODE_TRUE` | `PM_TRUE_NODE` |
|
105
|
+
| `NODE_FALSE` | `PM_FALSE_NODE` |
|
106
106
|
| `NODE_ERRINFO` | |
|
107
|
-
| `NODE_DEFINED` | `
|
108
|
-
| `NODE_POSTEXE` | `
|
109
|
-
| `NODE_DSYM` | `
|
110
|
-
| `NODE_ATTRASGN` | `
|
111
|
-
| `NODE_LAMBDA` | `
|
112
|
-
| `NODE_ARYPTN` | `
|
113
|
-
| `NODE_HSHPTN` | `
|
114
|
-
| `NODE_FNDPTN` | `
|
115
|
-
| `NODE_ERROR` | `
|
107
|
+
| `NODE_DEFINED` | `PM_DEFINED_NODE` |
|
108
|
+
| `NODE_POSTEXE` | `PM_POST_EXECUTION_NODE` |
|
109
|
+
| `NODE_DSYM` | `PM_INTERPOLATED_SYMBOL_NODE` |
|
110
|
+
| `NODE_ATTRASGN` | `PM_CALL_NODE` (with a message that ends with =) |
|
111
|
+
| `NODE_LAMBDA` | `PM_LAMBDA_NODE` |
|
112
|
+
| `NODE_ARYPTN` | `PM_ARRAY_PATTERN_NODE` |
|
113
|
+
| `NODE_HSHPTN` | `PM_HASH_PATTERN_NODE` |
|
114
|
+
| `NODE_FNDPTN` | `PM_FIND_PATTERN_NODE` |
|
115
|
+
| `NODE_ERROR` | `PM_MISSING_NODE` |
|
116
116
|
| `NODE_LAST` | |
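Several rows in the table above map more than one CRuby node type onto a single prism node type, with a field value telling them apart. The following is a minimal sketch of that many-to-one relationship, using a hand-copied subset of the table. The `MAPPING` constant and the `cruby_nodes_for` helper are illustrative names and are not part of prism.

```ruby
# Hand-copied subset of the mapping table; illustrative only, not a prism API.
MAPPING = {
  "NODE_CASE"   => "PM_CASE_NODE",
  "NODE_CASE2"  => "PM_CASE_NODE",
  "NODE_CALL"   => "PM_CALL_NODE",
  "NODE_OPCALL" => "PM_CALL_NODE",
  "NODE_FCALL"  => "PM_CALL_NODE",
  "NODE_QCALL"  => "PM_CALL_NODE",
  "NODE_DOT2"   => "PM_RANGE_NODE",
  "NODE_DOT3"   => "PM_RANGE_NODE",
}.freeze

# Return every CRuby node type in the subset that maps to the given prism type.
def cruby_nodes_for(prism_type)
  MAPPING.select { |_, prism| prism == prism_type }.keys
end

puts cruby_nodes_for("PM_CALL_NODE").inspect
```

A consumer porting code from the CRuby AST would therefore branch on the prism node's fields (receiver, operator, block) rather than on its type alone.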
|
117
117
|
|
data/docs/ripper.md
CHANGED
@@ -2,12 +2,12 @@
|
|
2
2
|
|
3
3
|
To test the parser, we compare against the output from `Ripper`, both for testing the lexer and testing the parser. The lexer test suite is much more feature complete at the moment.
|
4
4
|
|
5
|
-
To lex source code using `
|
5
|
+
To lex source code using `prism`, you typically would run `Prism.lex(source)`. If you want to instead get output that `Ripper` would normally produce, you can run `Prism.lex_compat(source)`. This will produce tokens that should be equivalent to `Ripper`.
|
6
6
|
|
7
|
-
To parse source code using `
|
7
|
+
To parse source code using `prism`, you typically would run `Prism.parse(source)`. If you instead want to use the `Ripper` streaming interface, you can inherit from `Prism::RipperCompat` and override the `on_*` methods. This will produce a syntax tree that should be equivalent to `Ripper`. That would look like:
|
8
8
|
|
9
9
|
```ruby
|
10
|
-
class ArithmeticRipper <
|
10
|
+
class ArithmeticRipper < Prism::RipperCompat
|
11
11
|
def on_binary(left, operator, right)
|
12
12
|
left.public_send(operator, right)
|
13
13
|
end
|
@@ -33,4 +33,4 @@ end
|
|
33
33
|
ArithmeticRipper.new("1 + 2 - 3").parse # => [0]
|
34
34
|
```
|
35
35
|
|
36
|
-
There are also APIs for building trees similar to the s-expression builders in `Ripper`. The method names are the same. These include `
|
36
|
+
There are also APIs for building trees similar to the s-expression builders in `Ripper`. The method names are the same. These include `Prism::RipperCompat.sexp_raw(source)` and `Prism::RipperCompat.sexp(source)`.
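For reference, the sketch below shows what the s-expression builders in `Ripper` itself return, using only Ruby's standard library. It does not call prism; per the sentence above, `Prism::RipperCompat.sexp(source)` and `Prism::RipperCompat.sexp_raw(source)` are meant to be used the same way, but their output is not shown here.

```ruby
require "ripper"
require "pp"

source = "1 + 2"

# Ripper's own s-expression builders from the standard library.
sexp = Ripper.sexp(source)         # simplified tree
sexp_raw = Ripper.sexp_raw(source) # raw tree, closer to the parser events

pp sexp
pp sexp_raw
```

Both calls return nested arrays whose first element is the event name, so the root of each tree starts with `:program`.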
|
data/docs/ruby_api.md
CHANGED
@@ -1,6 +1,6 @@
|
|
1
1
|
# Ruby API
|
2
2
|
|
3
|
-
The `
|
3
|
+
The `prism` gem provides a Ruby API for accessing the syntax tree.
|
4
4
|
|
5
5
|
For the most part, the API for accessing the tree mirrors that found in the [Syntax Tree](https://github.com/ruby-syntax-tree/syntax_tree) project. This means:
|
6
6
|
|
@@ -9,17 +9,17 @@ For the most part, the API for accessing the tree mirrors that found in the [Syn
|
|
9
9
|
* Nodes respond to the pattern matching interfaces `#deconstruct` and `#deconstruct_keys`
|
10
10
|
|
11
11
|
Every entry in `config.yml` will generate a Ruby class as well as the code that builds the nodes themselves.
|
12
|
-
Creating a syntax tree involves calling one of the class methods on the `
|
12
|
+
Creating a syntax tree involves calling one of the class methods on the `Prism` module.
|
13
13
|
The full API is documented below.
|
14
14
|
|
15
15
|
## API
|
16
16
|
|
17
|
-
* `
|
18
|
-
* `
|
19
|
-
* `
|
20
|
-
* `
|
21
|
-
* `
|
22
|
-
* `
|
23
|
-
* `
|
24
|
-
* `
|
25
|
-
* `
|
17
|
+
* `Prism.dump(source, filepath)` - parse the syntax tree corresponding to the given source string and filepath, and serialize it to a string. Filepath can be nil.
|
18
|
+
* `Prism.dump_file(filepath)` - parse the syntax tree corresponding to the given source file and serialize it to a string
|
19
|
+
* `Prism.lex(source)` - parse the tokens corresponding to the given source string and return them as an array within a parse result
|
20
|
+
* `Prism.lex_file(filepath)` - parse the tokens corresponding to the given source file and return them as an array within a parse result
|
21
|
+
* `Prism.parse(source)` - parse the syntax tree corresponding to the given source string and return it within a parse result
|
22
|
+
* `Prism.parse_file(filepath)` - parse the syntax tree corresponding to the given source file and return it within a parse result
|
23
|
+
* `Prism.parse_lex(source)` - parse the syntax tree corresponding to the given source string and return it within a parse result, along with the tokens
|
24
|
+
* `Prism.parse_lex_file(filepath)` - parse the syntax tree corresponding to the given source file and return it within a parse result, along with the tokens
|
25
|
+
* `Prism.load(source, serialized)` - load the serialized syntax tree using the source as a reference into a syntax tree
|
data/docs/serialization.md
CHANGED
@@ -1,6 +1,6 @@
|
|
1
1
|
# Serialization
|
2
2
|
|
3
|
-
|
3
|
+
Prism ships with the ability to serialize a syntax tree to a single string.
|
4
4
|
The string can then be deserialized back into a syntax tree using a language other than C.
|
5
5
|
This is useful for using the parsing logic in other tools without having to write a parser in that language.
|
6
6
|
The syntax tree still requires a copy of the original source, as for the most part it just contains byte offsets into the source string.
|
@@ -50,7 +50,7 @@ The comment type is one of:
|
|
50
50
|
## Structure
|
51
51
|
|
52
52
|
The serialized string representing the syntax tree is composed of three parts: the header, the body, and the constant pool.
|
53
|
-
The header contains information like the version of
|
53
|
+
The header contains information like the version of prism that serialized the tree.
|
54
54
|
The body contains the actual nodes in the tree.
|
55
55
|
The constant pool contains constants that were interned while parsing.
|
56
56
|
|
@@ -58,10 +58,11 @@ The header is structured like the following table:
|
|
58
58
|
|
59
59
|
| # bytes | field |
|
60
60
|
| --- | --- |
|
61
|
-
| `
|
61
|
+
| `5` | "PRISM" |
|
62
62
|
| `1` | major version number |
|
63
63
|
| `1` | minor version number |
|
64
64
|
| `1` | patch version number |
|
65
|
+
| `1` | 1 indicates only semantics fields were serialized, 0 indicates all fields were serialized (including location fields) |
|
65
66
|
| string | the encoding name |
|
66
67
|
| varint | number of comments |
|
67
68
|
| comment* | comments |
|
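The fixed-size fields at the start of the header table above (the magic bytes, the three version bytes, and the semantics-only flag) can be decoded as in the following sketch. `read_header_prefix` and `SerializedHeaderPrefix` are hypothetical names, not part of prism, and the sample bytes are built by hand rather than produced by the serializer. The variable-length fields that follow (encoding name, comments) are left out.

```ruby
# Hypothetical helper (not part of prism): decode the fixed-size prefix of a
# serialized tree's header, following the header table.
SerializedHeaderPrefix = Struct.new(:major, :minor, :patch, :semantics_only)

def read_header_prefix(serialized)
  bytes = serialized.b # work on raw bytes regardless of the string's encoding
  raise ArgumentError, "header too short" if bytes.bytesize < 9
  raise ArgumentError, "missing PRISM magic bytes" unless bytes.byteslice(0, 5) == "PRISM"

  # Four single-byte fields: major, minor, patch, semantics-only flag.
  major, minor, patch, flag = bytes.byteslice(5, 4).unpack("C4")
  SerializedHeaderPrefix.new(major, minor, patch, flag == 1)
end

# Hand-built sample bytes for illustration only; not real prism output.
sample = "PRISM".b + [0, 13, 0, 0].pack("C4")
header = read_header_prefix(sample)
puts "#{header.major}.#{header.minor}.#{header.patch} semantics_only=#{header.semantics_only}"
```

Checking the magic bytes and version before reading further lets a deserializer reject output from an incompatible version early.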
@@ -116,42 +117,42 @@ After the constant pool, the contents of the owned constants are serialized. Thi
|
|
116
117
|
The relevant APIs and struct definitions are listed below:
|
117
118
|
|
118
119
|
```c
|
119
|
-
// A
|
120
|
+
// A pm_buffer_t is a simple memory buffer that stores data in a contiguous
|
120
121
|
// block of memory. It is used to store the serialized representation of a
|
121
|
-
//
|
122
|
+
// prism tree.
|
122
123
|
typedef struct {
|
123
124
|
char *value;
|
124
125
|
size_t length;
|
125
126
|
size_t capacity;
|
126
|
-
}
|
127
|
+
} pm_buffer_t;
|
127
128
|
|
128
|
-
// Initialize a
|
129
|
-
bool
|
129
|
+
// Initialize a pm_buffer_t with its default values.
|
130
|
+
bool pm_buffer_init(pm_buffer_t *);
|
130
131
|
|
131
132
|
// Free the memory associated with the buffer.
|
132
|
-
void
|
133
|
+
void pm_buffer_free(pm_buffer_t *);
|
133
134
|
|
134
135
|
// Parse and serialize the AST represented by the given source to the given
|
135
136
|
// buffer.
|
136
|
-
void
|
137
|
+
void pm_parse_serialize(const uint8_t *source, size_t length, pm_buffer_t *buffer, const char *metadata);
|
137
138
|
```
|
138
139
|
|
139
|
-
Typically you would use a stack-allocated `
|
140
|
+
Typically you would use a stack-allocated `pm_buffer_t` and call `pm_parse_serialize`, as in:
|
140
141
|
|
141
142
|
```c
|
142
143
|
void
|
143
144
|
serialize(const uint8_t *source, size_t length) {
|
144
|
-
|
145
|
-
if (!
|
145
|
+
pm_buffer_t buffer;
|
146
|
+
if (!pm_buffer_init(&buffer)) return;
|
146
147
|
|
147
|
-
|
148
|
+
pm_parse_serialize(source, length, &buffer, NULL);
|
148
149
|
// Do something with the serialized string.
|
149
150
|
|
150
|
-
|
151
|
+
pm_buffer_free(&buffer);
|
151
152
|
}
|
152
153
|
```
|
153
154
|
|
154
|
-
The final argument to `
|
155
|
+
The final argument to `pm_parse_serialize` controls the metadata of the source.
|
155
156
|
This includes the filepath that the source is associated with, and any nested local variables scopes that are necessary to properly parse the file (in the case of parsing an `eval`).
|
156
157
|
Note that no `varint` values are used here. This makes it easier for the caller to produce the metadata, and serialized size is less important in this case.
|
157
158
|
The metadata is a serialized format itself, and is structured as follows:
|
data/docs/testing.md
CHANGED
@@ -1,6 +1,6 @@
|
|
1
1
|
# Testing
|
2
2
|
|
3
|
-
This document explains how to test
|
3
|
+
This document explains how to test prism, both locally, and against existing test suites.
|
4
4
|
|
5
5
|
## Test suite
|
6
6
|
|
@@ -8,13 +8,13 @@ This document explains how to test YARP, both locally, and against existing test
|
|
8
8
|
|
9
9
|
### Unit tests
|
10
10
|
|
11
|
-
These test specific
|
11
|
+
These test specific prism implementation details like comments, errors, and regular expressions. There are corresponding files for each thing being tested (like `test/errors_test.rb`).
|
12
12
|
|
13
13
|
### Snapshot tests
|
14
14
|
|
15
|
-
Snapshot tests ensure that parsed output is equivalent to previous parsed output. There are many categorized examples of valid syntax within the `test/
|
15
|
+
Snapshot tests ensure that parsed output is equivalent to previous parsed output. There are many categorized examples of valid syntax within the `test/prism/fixtures/` directory. When the test suite runs, it will parse all of this syntax, and compare it against corresponding files in the `test/prism/snapshots/` directory. For example, `test/prism/fixtures/strings.txt` has a corresponding `test/prism/snapshots/strings.txt`.
|
16
16
|
|
17
|
-
If the parsed files do not match, it will raise an error. If there is not a corresponding file in the `test/
|
17
|
+
If the parsed files do not match, it will raise an error. If there is not a corresponding file in the `test/prism/snapshots/` directory, one will be created so that it exists for the next test run.
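The snapshot behavior described above (compare against the stored file, raise on a mismatch, create the file when it is missing) can be sketched as follows. `check_snapshot` is a hypothetical helper written for illustration and is not prism's actual test code; the string `"(program)\n"` stands in for real parsed output.

```ruby
require "tmpdir"

# Hypothetical helper (not prism's test code): compare freshly produced output
# against a stored snapshot, creating the snapshot on the first run.
def check_snapshot(snapshot_path, actual)
  unless File.exist?(snapshot_path)
    File.write(snapshot_path, actual) # first run: record the snapshot
    return :created
  end

  expected = File.read(snapshot_path)
  raise "snapshot mismatch for #{snapshot_path}" unless expected == actual
  :matched
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "strings.txt")
  first = check_snapshot(path, "(program)\n")  # no snapshot yet, so one is written
  second = check_snapshot(path, "(program)\n") # same output, so it matches
  puts [first, second].inspect
end
```

Because a missing snapshot is created rather than treated as a failure, a newly added fixture passes on its first run and is only checked on later runs.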
|
18
18
|
|
19
19
|
### Testing against repositories
|
20
20
|
|
@@ -24,7 +24,7 @@ To test the parser against a repository, you can run `FILEPATHS='/path/to/reposi
|
|
24
24
|
|
25
25
|
As you are working, you will likely want to test your code locally. `test.rb` is ignored by git, so it can be used for local testing. There are also two executables which may help you:
|
26
26
|
|
27
|
-
1. **bin/lex** takes a filepath and compares
|
27
|
+
1. **bin/lex** takes a filepath and compares prism's lexed output to Ripper's lexed output. It prints any lexed output that doesn't match. It does some minor transformations to the lexed output in order to compare them, like splitting prism's heredoc tokens to mirror Ripper's.
|
28
28
|
|
29
29
|
```
|
30
30
|
$ bin/lex test.rb
|
@@ -42,7 +42,7 @@ $ VERBOSE=1 bin/lex test.rb
|
|
42
42
|
$ bin/lex -e "1 + 2"
|
43
43
|
```
|
44
44
|
|
45
|
-
2. **bin/parse** takes a filepath and outputs
|
45
|
+
2. **bin/parse** takes a filepath and outputs prism's parsed node structure generated from reading the file.
|
46
46
|
|
47
47
|
```
|
48
48
|
$ bin/parse test.rb
|