prism 0.29.0 → 1.3.0

Sign up to get free protection for your applications and to get access to all the features.
Files changed (92) hide show
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +115 -1
  3. data/CONTRIBUTING.md +0 -4
  4. data/Makefile +1 -1
  5. data/README.md +4 -0
  6. data/config.yml +920 -148
  7. data/docs/build_system.md +8 -11
  8. data/docs/fuzzing.md +1 -1
  9. data/docs/parsing_rules.md +4 -1
  10. data/docs/relocation.md +34 -0
  11. data/docs/ripper_translation.md +22 -0
  12. data/docs/serialization.md +3 -0
  13. data/ext/prism/api_node.c +2863 -2079
  14. data/ext/prism/extconf.rb +14 -37
  15. data/ext/prism/extension.c +241 -391
  16. data/ext/prism/extension.h +2 -2
  17. data/include/prism/ast.h +2156 -453
  18. data/include/prism/defines.h +58 -7
  19. data/include/prism/diagnostic.h +24 -6
  20. data/include/prism/node.h +0 -21
  21. data/include/prism/options.h +94 -3
  22. data/include/prism/parser.h +82 -40
  23. data/include/prism/regexp.h +18 -8
  24. data/include/prism/static_literals.h +3 -2
  25. data/include/prism/util/pm_char.h +1 -2
  26. data/include/prism/util/pm_constant_pool.h +0 -8
  27. data/include/prism/util/pm_integer.h +22 -15
  28. data/include/prism/util/pm_newline_list.h +11 -0
  29. data/include/prism/util/pm_string.h +28 -12
  30. data/include/prism/version.h +3 -3
  31. data/include/prism.h +47 -11
  32. data/lib/prism/compiler.rb +3 -0
  33. data/lib/prism/desugar_compiler.rb +111 -74
  34. data/lib/prism/dispatcher.rb +16 -1
  35. data/lib/prism/dot_visitor.rb +55 -34
  36. data/lib/prism/dsl.rb +660 -468
  37. data/lib/prism/ffi.rb +113 -8
  38. data/lib/prism/inspect_visitor.rb +296 -64
  39. data/lib/prism/lex_compat.rb +1 -1
  40. data/lib/prism/mutation_compiler.rb +11 -6
  41. data/lib/prism/node.rb +4262 -5023
  42. data/lib/prism/node_ext.rb +91 -14
  43. data/lib/prism/parse_result/comments.rb +0 -7
  44. data/lib/prism/parse_result/errors.rb +65 -0
  45. data/lib/prism/parse_result/newlines.rb +101 -11
  46. data/lib/prism/parse_result.rb +183 -6
  47. data/lib/prism/reflection.rb +12 -10
  48. data/lib/prism/relocation.rb +504 -0
  49. data/lib/prism/serialize.rb +496 -609
  50. data/lib/prism/string_query.rb +30 -0
  51. data/lib/prism/translation/parser/compiler.rb +185 -155
  52. data/lib/prism/translation/parser/lexer.rb +26 -4
  53. data/lib/prism/translation/parser.rb +9 -4
  54. data/lib/prism/translation/ripper.rb +23 -25
  55. data/lib/prism/translation/ruby_parser.rb +86 -17
  56. data/lib/prism/visitor.rb +3 -0
  57. data/lib/prism.rb +6 -8
  58. data/prism.gemspec +9 -5
  59. data/rbi/prism/dsl.rbi +521 -0
  60. data/rbi/prism/node.rbi +1115 -1120
  61. data/rbi/prism/parse_result.rbi +29 -0
  62. data/rbi/prism/string_query.rbi +12 -0
  63. data/rbi/prism/visitor.rbi +3 -0
  64. data/rbi/prism.rbi +36 -30
  65. data/sig/prism/dsl.rbs +190 -303
  66. data/sig/prism/mutation_compiler.rbs +1 -0
  67. data/sig/prism/node.rbs +678 -632
  68. data/sig/prism/parse_result.rbs +22 -0
  69. data/sig/prism/relocation.rbs +185 -0
  70. data/sig/prism/string_query.rbs +11 -0
  71. data/sig/prism/visitor.rbs +1 -0
  72. data/sig/prism.rbs +103 -64
  73. data/src/diagnostic.c +64 -28
  74. data/src/node.c +502 -1739
  75. data/src/options.c +76 -27
  76. data/src/prettyprint.c +188 -112
  77. data/src/prism.c +3376 -2293
  78. data/src/regexp.c +208 -71
  79. data/src/serialize.c +182 -50
  80. data/src/static_literals.c +64 -85
  81. data/src/token_type.c +4 -4
  82. data/src/util/pm_char.c +1 -1
  83. data/src/util/pm_constant_pool.c +0 -8
  84. data/src/util/pm_integer.c +53 -25
  85. data/src/util/pm_newline_list.c +29 -0
  86. data/src/util/pm_string.c +131 -80
  87. data/src/util/pm_strpbrk.c +32 -6
  88. metadata +11 -7
  89. data/include/prism/util/pm_string_list.h +0 -44
  90. data/lib/prism/debug.rb +0 -249
  91. data/lib/prism/translation/parser/rubocop.rb +0 -73
  92. data/src/util/pm_string_list.c +0 -28
data/docs/build_system.md CHANGED
@@ -16,9 +16,9 @@ The main solution for the second point seems a Makefile, otherwise many of the u
16
16
  ## General Design
17
17
 
18
18
  1. Templates are generated by `templates/template.rb`
19
- 4. The `Makefile` compiles both `libprism.a` and `libprism.{so,dylib,dll}` from the `src/**/*.c` and `include/**/*.h` files
20
- 5. The `Rakefile` `:compile` task ensures the above prerequisites are done, then calls `make`,
21
- and uses `Rake::ExtensionTask` to compile the C extension (using its `extconf.rb`), which uses `libprism.a`
19
+ 2. The `Makefile` compiles both `libprism.a` and `libprism.{so,dylib,dll}` from the `src/**/*.c` and `include/**/*.h` files
20
+ 3. The `Rakefile` `:compile` task ensures the above prerequisites are done, then calls `make`,
21
+ and uses `Rake::ExtensionTask` to compile the C extension (using its `extconf.rb`)
22
22
 
23
23
  This way there is minimal duplication, and each layer builds on the previous one and has its own responsibilities.
24
24
 
@@ -35,14 +35,11 @@ loaded per process (i.e., at most one version of the prism *gem* loaded in a pro
35
35
  ### Building the prism gem by `gem install/bundle install`
36
36
 
37
37
  The gem contains the pre-generated templates.
38
- When installing the gem, `extconf.rb` is used and that:
39
- * runs `make build/libprism.a`
40
- * compiles the C extension with mkmf
41
38
 
42
- When installing the gem on JRuby and TruffleRuby, no C extension is built, so instead of the last step,
43
- there is Ruby code using FFI which uses `libprism.{so,dylib,dll}`
44
- to implement the same methods as the C extension, but using serialization instead of many native calls/accesses
45
- (JRuby does not support C extensions, serialization is faster on TruffleRuby than the C extension).
39
+ When installing the gem on CRuby, `extconf.rb` is used and that compiles the C extension with mkmf, including both the extension files and the sources of prism itself.
40
+
41
+ When installing the gem on JRuby and TruffleRuby, no C extension is built, so instead the `extconf.rb` runs `make build/libprism.{so,dylib,dll}`.
42
+ There is Ruby code using FFI which uses `libprism.{so,dylib,dll}` to implement the same methods as the C extension, but using serialization instead of many native calls/accesses (JRuby does not support C extensions, serialization is faster on TruffleRuby than the C extension).
46
43
 
47
44
  ### Building the prism gem from git, e.g. `gem "prism", github: "ruby/prism"`
48
45
 
@@ -66,7 +63,7 @@ The script generates the templates when importing.
66
63
 
67
64
  Then when `mx build` builds TruffleRuby and the `prism` mx project inside, it runs `make`.
68
65
 
69
- Then the `prism bindings` mx project is built, which contains the [bindings](https://github.com/oracle/truffleruby/blob/master/src/main/c/prism_bindings/src/prism_bindings.c)
66
+ Then the `prism bindings` mx project is built, which contains the [bindings](https://github.com/oracle/truffleruby/blob/vm-24.1.1/src/main/c/yarp_bindings/src/yarp_bindings.c)
70
67
  and links to `libprism.a` (to avoid exporting symbols, so no conflict when installing the prism gem).
71
68
 
72
69
  ### Building prism as part of JRuby
data/docs/fuzzing.md CHANGED
@@ -26,7 +26,7 @@ fuzz
26
26
  There are currently three fuzzing targets
27
27
 
28
28
  - `pm_serialize_parse` (parse)
29
- - `pm_regexp_named_capture_group_names` (regexp)
29
+ - `pm_regexp_parse` (regexp)
30
30
 
31
31
  Respectively, fuzzing can be performed with
32
32
 
@@ -12,7 +12,10 @@ Constants in Ruby begin with an upper-case letter. This is followed by any numbe
12
12
 
13
13
  Most expressions in CRuby are non-void. This means the expression they represent resolves to a value. For example, `1 + 2` is a non-void expression, because it resolves to a method call. Even things like `class Foo; end` is a non-void expression, because it returns the last evaluated expression in the body of the class (or `nil`).
14
14
 
15
- Certain nodes, however, are void expressions, and cannot be combined to form larger expressions. For example, `BEGIN {}`, `END {}`, `alias foo bar`, and `undef foo`.
15
+ Certain nodes, however, are void expressions, and cannot be combined to form larger expressions.
16
+ * `BEGIN {}`, `END {}`, `alias foo bar`, and `undef foo` can only be at a statement position.
17
+ * The "jumps": `return`, `break`, `next`, `redo`, `retry` are void expressions.
18
+ * `value => pattern` is also considered a void expression.
16
19
 
17
20
  ## Identifiers
18
21
 
@@ -0,0 +1,34 @@
1
+ # Relocation
2
+
3
+ Prism parses deterministically for the same input. This provides a nice property that is exposed through the `#node_id` API on nodes. Effectively this means that for the same input, these values will remain consistent every time the source is parsed. This means we can reparse the source same with a `#node_id` value and find the exact same node again.
4
+
5
+ The `Relocation` module provides an API around this property. It allows you to "save" nodes and locations using a minimal amount of memory (just the node_id and a field identifier) and then reify them later. This minimizes the amount of memory you need to allocate to store this information because it does not keep around a pointer to the source string.
6
+
7
+ ## Getting started
8
+
9
+ To get started with the `Relocation` module, you would first instantiate a `Repository` object. You do this through a DSL that chains method calls for configuration. For example, if for every entry in the repository you want to store the start and end lines, the start and end code unit columns for in UTF-16, and the leading comments, you would:
10
+
11
+ ```ruby
12
+ repository = Prism::Relocation.filepath("path/to/file").lines.code_unit_columns(Encoding::UTF_16).leading_comments
13
+ ```
14
+
15
+ Now that you have the repository, you can pass it into any of the `save*` APIs on nodes or locations to create entries in the repository that will be lazily reified.
16
+
17
+ ```ruby
18
+ # assume that node is a Prism::ClassNode object
19
+ entry = node.constant_path.save(repository)
20
+ ```
21
+
22
+ Now that you have the entry object, you do not need to keep around a reference to the repository, it will be cleaned up on its own when the last entry is reified. Now, whenever you need to, you may call the associated field methods on the entry object, as in:
23
+
24
+ ```ruby
25
+ entry.start_line
26
+ entry.end_line
27
+
28
+ entry.start_code_units_column
29
+ entry.end_code_units_column
30
+
31
+ entry.leading_comments
32
+ ```
33
+
34
+ Note that if you had configured other fields to be saved, you would be able to access them as well. The first time one of these fields is accessed, the repository will reify every entry it knows about and then clean itself up. In this way, you can effectively treat them as if you had kept around lightweight versions of `Prism::Node` or `Prism::Location` objects.
@@ -48,3 +48,25 @@ ArithmeticRipper.new("1 + 2 - 3").parse # => [0]
48
48
  ```
49
49
 
50
50
  The exact names of the `on_*` methods are listed in the `Ripper` source.
51
+
52
+ ## Background
53
+
54
+ It is helpful to understand the differences between the `Ripper` library and the `Prism` library. Both libraries perform parsing and provide you with APIs to manipulate and understand the resulting syntax tree. However, there are a few key differences.
55
+
56
+ ### Design
57
+
58
+ `Ripper` is a streaming parser. This means as it is parsing Ruby code, it dispatches events back to the consumer. This allows quite a bit of flexibility. You can use it to build your own syntax tree or to find specific patterns in the code. `Prism` on the other hand returns to your the completed syntax tree _before_ it allows you to manipulate it. This means the tree that you get back is the only representation that can be generated by the parser _at parse time_ (but of course can be manipulated later).
59
+
60
+ ### Fields
61
+
62
+ We use the term "field" to mean a piece of information on a syntax tree node. `Ripper` provides the minimal number of fields to accurately represent the syntax tree for the purposes of compilation/interpretation. For example, in the callbacks for nodes that are based on keywords (`class`, `module`, `for`, `while`, etc.) you are not given the keyword itself, you need to attach it on your own. In other cases, tokens are not necessarily dispatched at all, meaning you need to find them yourself. `Prism` provides the opposite: the maximum number of fields on nodes is provided. As a tradeoff, this requires more memory, but this is chosen to make it easier on consumers.
63
+
64
+ ### Maintainability
65
+
66
+ The `Ripper` interface is not guaranteed in any way, and tends to change between patch versions of CRuby. This is largely due to the fact that `Ripper` is a by-product of the generated parser, as opposed to its own parser. As an example, in the expression `foo::bar = baz`, there are three different represents possible for the call operator, including:
67
+
68
+ * `:"::"` - Ruby 1.9 to Ruby 3.1.4
69
+ * `73` - Ruby 3.1.5 to Ruby 3.1.6
70
+ * `[:@op, "::", [lineno, column]]` - Ruby 3.2.0 and later
71
+
72
+ The `Prism` interface is guaranteed going forward to be the consistent, and the official Ruby syntax tree interface. This means you can rely on this interface without having to worry about individual changes between Ruby versions. It also is a gem, which means it is versioned based on the gem version, as opposed to being versioned based on the Ruby version. Finally, you can use `Prism` to parse multiple versions of Ruby, whereas `Ripper` is tied to the Ruby version it is running on.
@@ -116,7 +116,9 @@ Each node is structured like the following table:
116
116
  | # bytes | field |
117
117
  | --- | --- |
118
118
  | `1` | node type |
119
+ | varuint | node identifier |
119
120
  | location | node location |
121
+ | varuint | node flags |
120
122
 
121
123
  Every field on the node is then appended to the serialized string. The fields can be determined by referencing `config.yml`. Depending on the type of field, it could take a couple of different forms, described below:
122
124
 
@@ -199,6 +201,7 @@ The final argument to `pm_serialize_parse` is an optional string that controls t
199
201
  | `1` | frozen string literal |
200
202
  | `1` | command line flags |
201
203
  | `1` | syntax version, see [pm_options_version_t](https://github.com/ruby/prism/blob/main/include/prism/options.h) for valid values |
204
+ | `1` | whether or not the encoding is locked (should almost always be false) |
202
205
  | `4` | the number of scopes |
203
206
  | ... | the scopes |
204
207