prism 0.16.0 → 0.17.0

Files changed (86)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +16 -1
  3. data/Makefile +6 -0
  4. data/README.md +1 -1
  5. data/config.yml +50 -35
  6. data/docs/fuzzing.md +1 -1
  7. data/docs/serialization.md +28 -29
  8. data/ext/prism/api_node.c +802 -770
  9. data/ext/prism/api_pack.c +20 -9
  10. data/ext/prism/extension.c +464 -162
  11. data/ext/prism/extension.h +1 -1
  12. data/include/prism/ast.h +3173 -763
  13. data/include/prism/defines.h +32 -9
  14. data/include/prism/diagnostic.h +36 -3
  15. data/include/prism/enc/pm_encoding.h +118 -28
  16. data/include/prism/node.h +38 -13
  17. data/include/prism/options.h +204 -0
  18. data/include/prism/pack.h +44 -33
  19. data/include/prism/parser.h +445 -200
  20. data/include/prism/prettyprint.h +12 -1
  21. data/include/prism/regexp.h +16 -2
  22. data/include/prism/util/pm_buffer.h +94 -16
  23. data/include/prism/util/pm_char.h +162 -48
  24. data/include/prism/util/pm_constant_pool.h +126 -32
  25. data/include/prism/util/pm_list.h +68 -38
  26. data/include/prism/util/pm_memchr.h +18 -3
  27. data/include/prism/util/pm_newline_list.h +70 -27
  28. data/include/prism/util/pm_state_stack.h +25 -7
  29. data/include/prism/util/pm_string.h +115 -27
  30. data/include/prism/util/pm_string_list.h +25 -6
  31. data/include/prism/util/pm_strncasecmp.h +32 -0
  32. data/include/prism/util/pm_strpbrk.h +31 -17
  33. data/include/prism/version.h +27 -2
  34. data/include/prism.h +224 -31
  35. data/lib/prism/compiler.rb +6 -3
  36. data/lib/prism/debug.rb +23 -7
  37. data/lib/prism/dispatcher.rb +33 -18
  38. data/lib/prism/dsl.rb +10 -5
  39. data/lib/prism/ffi.rb +132 -80
  40. data/lib/prism/lex_compat.rb +25 -15
  41. data/lib/prism/mutation_compiler.rb +10 -5
  42. data/lib/prism/node.rb +370 -135
  43. data/lib/prism/node_ext.rb +1 -1
  44. data/lib/prism/node_inspector.rb +1 -1
  45. data/lib/prism/pack.rb +79 -40
  46. data/lib/prism/parse_result/comments.rb +7 -2
  47. data/lib/prism/parse_result/newlines.rb +4 -0
  48. data/lib/prism/parse_result.rb +150 -30
  49. data/lib/prism/pattern.rb +11 -0
  50. data/lib/prism/ripper_compat.rb +28 -10
  51. data/lib/prism/serialize.rb +86 -54
  52. data/lib/prism/visitor.rb +10 -3
  53. data/lib/prism.rb +20 -2
  54. data/prism.gemspec +4 -2
  55. data/rbi/prism.rbi +104 -60
  56. data/rbi/prism_static.rbi +16 -2
  57. data/sig/prism.rbs +72 -43
  58. data/sig/prism_static.rbs +14 -1
  59. data/src/diagnostic.c +56 -53
  60. data/src/enc/pm_big5.c +1 -0
  61. data/src/enc/pm_euc_jp.c +1 -0
  62. data/src/enc/pm_gbk.c +1 -0
  63. data/src/enc/pm_shift_jis.c +1 -0
  64. data/src/enc/pm_tables.c +316 -80
  65. data/src/enc/pm_unicode.c +53 -8
  66. data/src/enc/pm_windows_31j.c +1 -0
  67. data/src/node.c +334 -321
  68. data/src/options.c +170 -0
  69. data/src/prettyprint.c +74 -47
  70. data/src/prism.c +1642 -856
  71. data/src/regexp.c +151 -95
  72. data/src/serialize.c +44 -20
  73. data/src/token_type.c +3 -1
  74. data/src/util/pm_buffer.c +45 -15
  75. data/src/util/pm_char.c +103 -57
  76. data/src/util/pm_constant_pool.c +51 -21
  77. data/src/util/pm_list.c +12 -4
  78. data/src/util/pm_memchr.c +5 -3
  79. data/src/util/pm_newline_list.c +20 -12
  80. data/src/util/pm_state_stack.c +9 -3
  81. data/src/util/pm_string.c +95 -85
  82. data/src/util/pm_string_list.c +14 -15
  83. data/src/util/pm_strncasecmp.c +10 -3
  84. data/src/util/pm_strpbrk.c +25 -19
  85. metadata +5 -3
  86. data/docs/prism.png +0 -0
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 498f20248401af377faf45d30b7154d819b8bea560be2747c030d5ae536851c9
4
- data.tar.gz: 8d3bb4fc0afb899869d8d6a55a526a40769fb46a582cf465035de750824d72d7
3
+ metadata.gz: 69cdca044f91ad2a198666562fdffd6035323906da465743ebff2cbd34ccac5d
4
+ data.tar.gz: 3da1460885f7cabda4b1ae6438274ab64a6e966536587d98eab2b3c764d0b6c0
5
5
  SHA512:
6
- metadata.gz: 2d71744c4342a671b578ed7ebf3dede7b0146404a61e40ed8b082c544f5d8e9d83dc4e1f0780555cab4c70bd052156e26f9d8b46a2f2012bcedf89bf5b21d54b
7
- data.tar.gz: 2a2178b5615fe4c55d7c16156901423fe38dee075e4f60cc893c367630e7c136cc7d6bc19f3fcf037a6958d158b2663f61389656754fe679570271503a5b35a2
6
+ metadata.gz: c01d8b62728fe1cbce99394d683c28b943b7962e6a1fc82d435b3190286612fc4d7e4aaab93a6bef7a75f8acdfa3b5a597acac3295e489a7dd89aada94b29b23
7
+ data.tar.gz: ce88826e2a46cb18fe5e89b1a7c18c8fba2387b9c9e5625ec9aac5e5bd1449b21f23624bd0a6b526c4a9fa156c679186f8b89c09d0a003d386663f4a6a33269e
data/CHANGELOG.md CHANGED
@@ -6,6 +6,20 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) a
6
6
 
7
7
  ## [Unreleased]
8
8
 
9
+ ## [0.17.0] - 2023-11-03
10
+
11
+ ### Added
12
+
13
+ - We now properly support forwarding arguments into arrays, like `def foo(*) = [*]`.
14
+ - We now have much better documentation for the C and Ruby APIs.
15
+ - We now properly provide an error message when attempting to assign to numbered parameters from within regular expression named capture groups, as in `/(?<_1>)/ =~ ""`.
16
+
17
+ ### Changed
18
+
19
+ - **BREAKING**: `KeywordParameterNode` is split into `OptionalKeywordParameterNode` and `RequiredKeywordParameterNode`. `RequiredKeywordParameterNode` has no `value` field.
20
+ - **BREAKING**: Most of the `Prism::` APIs now accept a bunch of keyword options. The options we now support are: `filepath`, `encoding`, `line`, `frozen_string_literal`, `verbose`, and `scopes`. See [the pull request](https://github.com/ruby/prism/pull/1763) for more details.
21
+ - **BREAKING**: Comments are now split into three different classes instead of a single class, and the `type` field has been removed. They are: `InlineComment`, `EmbDocComment`, and `DATAComment`.
22
+
9
23
  ## [0.16.0] - 2023-10-30
10
24
 
11
25
  ### Added
@@ -219,7 +233,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) a
219
233
 
220
234
  - 🎉 Initial release! 🎉
221
235
 
222
- [unreleased]: https://github.com/ruby/prism/compare/v0.16.0...HEAD
236
+ [unreleased]: https://github.com/ruby/prism/compare/v0.17.0...HEAD
237
+ [0.17.0]: https://github.com/ruby/prism/compare/v0.16.0...v0.17.0
223
238
  [0.16.0]: https://github.com/ruby/prism/compare/v0.15.1...v0.16.0
224
239
  [0.15.1]: https://github.com/ruby/prism/compare/v0.15.0...v0.15.1
225
240
  [0.15.0]: https://github.com/ruby/prism/compare/v0.14.0...v0.15.0
data/Makefile CHANGED
@@ -88,3 +88,9 @@ clean:
88
88
  all-no-debug: DEBUG_FLAGS := -DNDEBUG=1
89
89
  all-no-debug: OPTFLAGS := -O3
90
90
  all-no-debug: all
91
+
92
+ run: Makefile $(STATIC_OBJECTS) $(HEADERS) test.c
93
+ $(ECHO) "compiling test.c"
94
+ $(Q) $(CC) $(CPPFLAGS) $(CFLAGS) $(STATIC_OBJECTS) test.c
95
+ $(ECHO) "running test.c"
96
+ $(Q) ./a.out
data/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  <h1 align="center">Prism Ruby parser</h1>
2
2
  <div align="center">
3
- <img alt="Prism Ruby parser" height="256px" src="https://github.com/ruby/prism/blob/main/docs/prism.png?raw=true">
3
+ <img alt="Prism Ruby parser" height="256px" src="https://github.com/ruby/prism/blob/main/doc/images/prism.png?raw=true">
4
4
  </div>
5
5
 
6
6
  This is a parser for the Ruby programming language. It is designed to be portable, error tolerant, and maintainable. It is written in C99 and has no dependencies. It is currently being integrated into [CRuby](https://github.com/ruby/ruby), [JRuby](https://github.com/jruby/jruby), [TruffleRuby](https://github.com/oracle/truffleruby), [Sorbet](https://github.com/sorbet/sorbet), and [Syntax Tree](https://github.com/ruby-syntax-tree/syntax_tree).
data/config.yml CHANGED
@@ -59,11 +59,11 @@ tokens:
59
59
  - name: CONSTANT
60
60
  comment: "a constant"
61
61
  - name: DOT
62
- comment: "."
62
+ comment: "the . call operator"
63
63
  - name: DOT_DOT
64
- comment: ".."
64
+ comment: "the .. range operator"
65
65
  - name: DOT_DOT_DOT
66
- comment: "..."
66
+ comment: "the ... range operator or forwarding parameter"
67
67
  - name: EMBDOC_BEGIN
68
68
  comment: "=begin"
69
69
  - name: EMBDOC_END
@@ -311,9 +311,9 @@ tokens:
311
311
  - name: UCOLON_COLON
312
312
  comment: "unary ::"
313
313
  - name: UDOT_DOT
314
- comment: "unary .."
314
+ comment: "unary .. operator"
315
315
  - name: UDOT_DOT_DOT
316
- comment: "unary ..."
316
+ comment: "unary ... operator"
317
317
  - name: UMINUS
318
318
  comment: "-@"
319
319
  - name: UMINUS_NUM
@@ -333,12 +333,14 @@ flags:
333
333
  values:
334
334
  - name: KEYWORD_SPLAT
335
335
  comment: "if arguments contain keyword splat"
336
+ comment: Flags for arguments nodes.
336
337
  - name: CallNodeFlags
337
338
  values:
338
339
  - name: SAFE_NAVIGATION
339
340
  comment: "&. operator"
340
341
  - name: VARIABLE_CALL
341
342
  comment: "a call that could have been a local variable"
343
+ comment: Flags for call nodes.
342
344
  - name: IntegerBaseFlags
343
345
  values:
344
346
  - name: BINARY
@@ -349,14 +351,17 @@ flags:
349
351
  comment: "0d or no prefix"
350
352
  - name: HEXADECIMAL
351
353
  comment: "0x prefix"
354
+ comment: Flags for integer nodes that correspond to the base of the integer.
352
355
  - name: LoopFlags
353
356
  values:
354
357
  - name: BEGIN_MODIFIER
355
358
  comment: "a loop after a begin statement, so the body is executed first before the condition"
359
+ comment: Flags for while and until loop nodes.
356
360
  - name: RangeFlags
357
361
  values:
358
362
  - name: EXCLUDE_END
359
363
  comment: "... operator"
364
+ comment: Flags for range and flip-flop nodes.
360
365
  - name: RegularExpressionFlags
361
366
  values:
362
367
  - name: IGNORE_CASE
@@ -375,10 +380,12 @@ flags:
375
380
  comment: "s - forces the Windows-31J encoding"
376
381
  - name: UTF_8
377
382
  comment: "u - forces the UTF-8 encoding"
383
+ comment: Flags for regular expression and match last line nodes.
378
384
  - name: StringFlags
379
385
  values:
380
386
  - name: FROZEN
381
387
  comment: "frozen by virtue of a `frozen_string_literal` comment"
388
+ comment: Flags for string nodes.
382
389
  nodes:
383
390
  - name: AliasGlobalVariableNode
384
391
  fields:
@@ -777,10 +784,10 @@ nodes:
777
784
  comment: |
778
785
  Represents the use of a case statement.
779
786
 
780
- case true
781
- ^^^^^^^^^
782
- when false
783
- end
787
+ case true
788
+ when false
789
+ end
790
+ ^^^^^^^^^^
784
791
  - name: ClassNode
785
792
  fields:
786
793
  - name: locals
@@ -818,7 +825,7 @@ nodes:
818
825
  Represents the use of the `&&=` operator for assignment to a class variable.
819
826
 
820
827
  @@target &&= value
821
- ^^^^^^^^^^^^^^^^
828
+ ^^^^^^^^^^^^^^^^^^
822
829
  - name: ClassVariableOperatorWriteNode
823
830
  fields:
824
831
  - name: name
@@ -1183,13 +1190,13 @@ nodes:
1183
1190
  Represents a find pattern in pattern matching.
1184
1191
 
1185
1192
  foo in *bar, baz, *qux
1186
- ^^^^^^^^^^^^^^^^^^^^^^
1193
+ ^^^^^^^^^^^^^^^
1187
1194
 
1188
1195
  foo in [*bar, baz, *qux]
1189
- ^^^^^^^^^^^^^^^^^^^^^^^^
1196
+ ^^^^^^^^^^^^^^^^^
1190
1197
 
1191
1198
  foo in Foo(*bar, baz, *qux)
1192
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^
1199
+ ^^^^^^^^^^^^^^^^^^^^
1193
1200
  - name: FlipFlopNode
1194
1201
  fields:
1195
1202
  - name: left
@@ -1240,7 +1247,7 @@ nodes:
1240
1247
 
1241
1248
  def foo(...)
1242
1249
  bar(...)
1243
- ^^^^^^^^
1250
+ ^^^
1244
1251
  end
1245
1252
  - name: ForwardingParameterNode
1246
1253
  comment: |
@@ -1692,24 +1699,6 @@ nodes:
1692
1699
 
1693
1700
  foo(a: b)
1694
1701
  ^^^^
1695
- - name: KeywordParameterNode
1696
- fields:
1697
- - name: name
1698
- type: constant
1699
- - name: name_loc
1700
- type: location
1701
- - name: value
1702
- type: node?
1703
- comment: |
1704
- Represents a keyword parameter to a method, block, or lambda definition.
1705
-
1706
- def a(b:)
1707
- ^^
1708
- end
1709
-
1710
- def a(b: 1)
1711
- ^^^^
1712
- end
1713
1702
  - name: KeywordRestParameterNode
1714
1703
  fields:
1715
1704
  - name: name
@@ -1997,6 +1986,20 @@ nodes:
1997
1986
 
1998
1987
  $1
1999
1988
  ^^
1989
+ - name: OptionalKeywordParameterNode
1990
+ fields:
1991
+ - name: name
1992
+ type: constant
1993
+ - name: name_loc
1994
+ type: location
1995
+ - name: value
1996
+ type: node
1997
+ comment: |
1998
+ Represents an optional keyword parameter to a method, block, or lambda definition.
1999
+
2000
+ def a(b: 1)
2001
+ ^^^^
2002
+ end
2000
2003
  - name: OptionalParameterNode
2001
2004
  fields:
2002
2005
  - name: name
@@ -2184,6 +2187,18 @@ nodes:
2184
2187
 
2185
2188
  /foo/i
2186
2189
  ^^^^^^
2190
+ - name: RequiredKeywordParameterNode
2191
+ fields:
2192
+ - name: name
2193
+ type: constant
2194
+ - name: name_loc
2195
+ type: location
2196
+ comment: |
2197
+ Represents a required keyword parameter to a method, block, or lambda definition.
2198
+
2199
+ def a(b: )
2200
+ ^^
2201
+ end
2187
2202
  - name: RequiredParameterNode
2188
2203
  fields:
2189
2204
  - name: name
@@ -2206,8 +2221,8 @@ nodes:
2206
2221
  comment: |
2207
2222
  Represents an expression modified with a rescue.
2208
2223
 
2209
- foo rescue nil
2210
- ^^^^^^^^^^^^^^
2224
+ foo rescue nil
2225
+ ^^^^^^^^^^^^^^
2211
2226
  - name: RescueNode
2212
2227
  fields:
2213
2228
  - name: keyword_loc
@@ -2229,8 +2244,8 @@ nodes:
2229
2244
 
2230
2245
  begin
2231
2246
  rescue Foo, *splat, Bar => ex
2232
- ^^^^^^
2233
2247
  foo
2248
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2234
2249
  end
2235
2250
 
2236
2251
  `Foo, *splat, Bar` are in the `exceptions` field.
data/docs/fuzzing.md CHANGED
@@ -25,7 +25,7 @@ fuzz
25
25
 
26
26
  There are currently three fuzzing targets
27
27
 
28
- - `pm_parse_serialize` (parse)
28
+ - `pm_serialize_parse` (parse)
29
29
  - `pm_regexp_named_capture_group_names` (regexp)
30
30
 
31
31
  Respectively, fuzzing can be performed with
data/docs/serialization.md CHANGED
@@ -72,6 +72,7 @@ The header is structured like the following table:
72
72
  | `1` | patch version number |
73
73
  | `1` | 1 indicates only semantics fields were serialized, 0 indicates all fields were serialized (including location fields) |
74
74
  | string | the encoding name |
75
+ | varint | the start line |
75
76
  | varint | number of comments |
76
77
  | comment* | comments |
77
78
  | varint | number of magic comments |
@@ -136,56 +137,54 @@ typedef struct {
136
137
  size_t capacity;
137
138
  } pm_buffer_t;
138
139
 
139
- // Initialize a pm_buffer_t with its default values.
140
- bool pm_buffer_init(pm_buffer_t *);
141
-
142
140
  // Free the memory associated with the buffer.
143
141
  void pm_buffer_free(pm_buffer_t *);
144
142
 
145
143
  // Parse and serialize the AST represented by the given source to the given
146
144
  // buffer.
147
- void pm_parse_serialize(const uint8_t *source, size_t length, pm_buffer_t *buffer, const char *metadata);
145
+ void pm_serialize_parse(pm_buffer_t *buffer, const uint8_t *source, size_t length, const char *data);
148
146
  ```
149
147
 
150
- Typically you would use a stack-allocated `pm_buffer_t` and call `pm_parse_serialize`, as in:
148
+ Typically you would use a stack-allocated `pm_buffer_t` and call `pm_serialize_parse`, as in:
151
149
 
152
150
  ```c
153
151
  void
154
152
  serialize(const uint8_t *source, size_t length) {
155
- pm_buffer_t buffer;
156
- if (!pm_buffer_init(&buffer)) return;
153
+ pm_buffer_t buffer = { 0 };
154
+ pm_serialize_parse(&buffer, source, length, NULL);
157
155
 
158
- pm_parse_serialize(source, length, &buffer, NULL);
159
156
  // Do something with the serialized string.
160
157
 
161
158
  pm_buffer_free(&buffer);
162
159
  }
163
160
  ```
164
161
 
165
- The final argument to `pm_parse_serialize` controls the metadata of the source.
166
- This includes the filepath that the source is associated with, and any nested local variables scopes that are necessary to properly parse the file (in the case of parsing an `eval`).
167
- Note that no `varint` are used here to make it easier to produce the metadata for the caller, and also serialized size is less important here.
168
- The metadata is a serialized format itself, and is structured as follows:
162
+ The final argument to `pm_serialize_parse` is an optional string that controls the options to the parse function. This includes all of the normal options that could be passed to `pm_parser_init` through a `pm_options_t` struct, but serialized as a string to make it easier for callers through FFI. Note that no `varint` are used here to make it easier to produce the data for the caller, and also serialized size is less important here. The format of the data is structured as follows:
169
163
 
170
- | # bytes | field |
171
- | --- | --- |
172
- | `4` | the size of the filepath string |
173
- | | the filepath string |
174
- | `4` | the number of local variable scopes |
164
+ | # bytes | field |
165
+ | ------- | -------------------------- |
166
+ | `4` | the length of the filepath |
167
+ | ... | the filepath bytes |
168
+ | `4` | the line number |
169
+ | `4` | the length the encoding |
170
+ | ... | the encoding bytes |
171
+ | `1` | frozen string literal |
172
+ | `1` | suppress warnings |
173
+ | `4` | the number of scopes |
174
+ | ... | the scopes |
175
175
 
176
- Then, each local variable scope is encoded as:
176
+ Each scope is laid out as follows:
177
177
 
178
- | # bytes | field |
179
- | --- | --- |
180
- | `4` | the number of local variables in the scope |
181
- | | the local variables |
178
+ | # bytes | field |
179
+ | ------- | -------------------------- |
180
+ | `4` | the number of locals |
181
+ | ... | the locals |
182
182
 
183
- Each local variable within each scope is encoded as:
183
+ Each local is laid out as follows:
184
184
 
185
- | # bytes | field |
186
- | --- | --- |
187
- | `4` | the size of the local variable name |
188
- | | the local variable name |
185
+ | # bytes | field |
186
+ | ------- | -------------------------- |
187
+ | `4` | the length of the local |
188
+ | ... | the local bytes |
189
189
 
190
- The metadata can be `NULL` (as seen in the example above).
191
- If it is not null, then a minimal metadata string would be `"\0\0\0\0\0\0\0\0"` which would use 4 bytes to indicate an empty filepath string and 4 bytes to indicate that there were no local variable scopes.
190
+ The data can be `NULL` (as seen in the example above).
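
The options layout in the tables above can be sketched in Ruby as follows. This is a minimal illustration, not the gem's own code: `pack_sized` and `dump_options` are hypothetical helper names, and the tables give only field widths, so treating each 4-byte field as an unsigned 32-bit integer in native byte order (the `"L"` pack directive) is an assumption. The default values are also chosen only for illustration.

```ruby
# Sketch of building the options string described by the tables above.
# Assumption: each 4-byte field is an unsigned 32-bit integer in native
# byte order ("L"); the tables only specify the widths of the fields.

# Pack a length-prefixed byte string: a 4-byte length, then the raw bytes.
def pack_sized(string)
  bytes = string.to_s.b
  [bytes.bytesize].pack("L") + bytes
end

# Hypothetical helper: serialize the options into a single binary string.
# `scopes` is an array of scopes, each an array of local variable names.
def dump_options(filepath: "", line: 1, encoding: "",
                 frozen_string_literal: false, suppress_warnings: false,
                 scopes: [])
  data = "".b
  data << pack_sized(filepath)                         # filepath length + bytes
  data << [line].pack("L")                             # line number
  data << pack_sized(encoding)                         # encoding length + bytes
  data << [frozen_string_literal ? 1 : 0].pack("C")    # frozen string literal
  data << [suppress_warnings ? 1 : 0].pack("C")        # suppress warnings
  data << [scopes.length].pack("L")                    # number of scopes
  scopes.each do |locals|
    data << [locals.length].pack("L")                  # number of locals
    locals.each { |local| data << pack_sized(local) }  # local length + bytes
  end
  data
end

minimal = dump_options
with_scope = dump_options(filepath: "eval.rb", line: 10, scopes: [[:a, :bc]])
```

Under these assumptions, a string with an empty filepath, an empty encoding, and no scopes is 18 bytes long (4 + 4 + 4 + 1 + 1 + 4). Each scope adds 4 bytes for its local count, plus 4 bytes and the name bytes for every local.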