ruby-ll 1.1.3-java → 2.0.0-java

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: a5a9dc1d9c35dced64f28eaf2340cd576d990ad5
- data.tar.gz: 97acb8a53b364b77ef58e4e4060b21366efff1de
+ metadata.gz: 0f307cb5b874f4f151092083942884b5077e5fab
+ data.tar.gz: 1a9dbbcd10bd6898a709aa5759d5f91030272e8f
  SHA512:
- metadata.gz: 5a2a4f2b163adf8d50072570cd955b412c810d27b1f0a27f5c3f9a138132b012ea6059df8d337e788844194f0450f1661833029c06cdbd42899afcec8a2c260d
- data.tar.gz: 531c4be54e3d700126c339bd44177c7fcdb1fc2b51c03d135de9fd1d1c6c8162a7628b2a3a9bccabff5159add4a8ab82298acd3032424107f990b7da0d078e63
+ metadata.gz: d397ac3dac1792bb4310508aaec7ee448cfe0f09f9fa1072daa5c53f1b8ea290c2ff342897e8edfcced314d5b9fa0715970dccadea8ac0c9f0b23be1dced959b
+ data.tar.gz: 76771a39c8fdebf2e64f5b0535a019c22ac5aff0907b96f2a8d91ec5d5f054f742aa3fc75dd847cb1c050c1e1a60c79269308a2eb6aa596cd9923479a21b3665
data/README.md CHANGED
@@ -278,8 +278,30 @@ return specific elements from it:
      numbers = A B { val[0] };
 
  Values returned by code blocks are passed to whatever other rule called it. This
- allows code blocks to be used for building ASTs and the likes. If no explicit
- code block is defined `val` is returned as is.
+ allows code blocks to be used for building ASTs and the likes.
+
+ If no explicit code block is defined then ruby-ll will generate one for you. If
+ a branch consists of only a single step (e.g. `A = B;`) then only the first
+ value is returned, otherwise all values are returned.
+
+ This means that in the following example the output will be whatever value "C"
+ contains:
+
+     A = B { p val[0] };
+     B = C;
+
+ However, here the output would be `[C, D]` as the `B` rule's branch contains
+ multiple steps:
+
+     A = B { p val[0] };
+     B = C D;
+
+ To summarize (`# =>` denotes the return value):
+
+     A = B; # => B
+     A = B C; # => [B, C]
+
+ You can override this behaviour simply by defining your own code block.
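
The default return value described in the added README text can be sketched in plain Ruby. This is only an illustration, not ruby-ll's own code: `Step`, `Epsilon`, `Branch` and `default_return_value` are hypothetical stand-ins, and explicit code blocks are modelled as lambdas. The exclusion of epsilon steps follows the compiler change further down in this diff.

```ruby
# Hypothetical stand-ins for illustration; not ruby-ll's own classes.
Step    = Struct.new(:name)
Epsilon = Class.new
Branch  = Struct.new(:steps, :action)

# Returns what a branch would hand to its caller for the given `val` array.
def default_return_value(branch, val)
  # An explicit code block (modelled here as a lambda) always wins.
  return branch.action.call(val) if branch.action

  single_step = branch.steps.length == 1 && !branch.steps[0].is_a?(Epsilon)

  # A single non-epsilon step returns its value directly, anything else
  # returns the whole array.
  single_step ? val[0] : val
end

single   = Branch.new([Step.new('B')], nil)
multiple = Branch.new([Step.new('B'), Step.new('C')], nil)
custom   = Branch.new([Step.new('B'), Step.new('C')], ->(val) { val[1] })

p default_return_value(single, ['B'])        # A = B;
p default_return_value(multiple, ['B', 'C']) # A = B C;
p default_return_value(custom, ['B', 'C'])   # A = B C { val[1] };
```

The three calls mirror the README summary: one step yields the bare value, several steps yield the array, and a custom block overrides both.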
 
  ruby-ll parsers recurse into rules before unwinding, this means that the
  inner-most rule is processed first.
@@ -301,6 +323,50 @@ name as a terminal, as such the following is invalid:
 
  It's also an error to re-define an existing rule.
 
+ ### Operators
+
+ Grammars can use two operators to define a sequence of terminals/non-terminals:
+ the star (`*`) and plus (`+`) operators.
+
+ The star operator indicates that something should occur 0 or more times. Here
+ the "B" identifier could occur 0 times, once, twice or many more times:
+
+     A = B*;
+
+ The plus operator indicates that something should occur at least once followed
+ by any number of more occurrences. For example, this grammar states that "B"
+ should occur at least once but can also occur, say, 10 times:
+
+     A = B+;
+
+ Operators can be applied either to a single terminal/rule or a series of
+ terminals/rules grouped together using parentheses. For example, both are
+ perfectly valid:
+
+     A = B+;
+     A = (B C)+;
+
+ When calling an operator on a single terminal/rule the corresponding entry in
+ the `val` array is simply set to the terminal/rule value. For example:
+
+     A = B+ { p val[0] };
+
+ For input `B B B` this would output `[B, B, B]`.
+
+ However, when grouping multiple terminals/rules using parentheses every
+ occurrence is wrapped in an Array. For example:
+
+     A = (B C)+ { p val[0] };
+
+ For input `B C B C` this would output `[[B, C], [B, C]]`. To work around this
+ you can simply move the group of identifiers to its own rule and only return
+ whatever you need:
+
+     A = A1+ { p val[0] };
+     A1 = B C { val[0] }; # only return "B"
+
+ For input `B C B C` this would output `[B, B]`.
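
The three value shapes described in the Operators section can be reproduced with a small plain-Ruby model. It does not use ruby-ll; `repeat_values` is a hypothetical helper that slices a flat token list into occurrences, and the optional block plays the role of the `A1` rule's code block.

```ruby
# Hypothetical helper: collects one entry per occurrence of a repeated group.
# `group_size` is the number of steps inside the group.
def repeat_values(tokens, group_size)
  tokens.each_slice(group_size).map do |occurrence|
    if block_given?
      # Models a separate rule whose code block picks what to return.
      yield(occurrence)
    elsif group_size == 1
      # A single terminal/rule contributes its value as-is.
      occurrence[0]
    else
      # A parenthesised group is wrapped in an Array per occurrence.
      occurrence
    end
  end
end

p repeat_values(%w[B B B], 1)                    # A = B+
p repeat_values(%w[B C B C], 2)                  # A = (B C)+
p repeat_values(%w[B C B C], 2) { |val| val[0] } # A = A1+; A1 = B C { val[0] };
```

This prints `["B", "B", "B"]`, `[["B", "C"], ["B", "C"]]` and `["B", "B"]`, matching the outputs listed in the README text.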
+
  ## Conflicts
 
  LL(1) grammars can have two kinds of conflicts in a rule:
@@ -0,0 +1,32 @@
+ # Driver Architecture
+
+ The actual parsing of input is handled by a so-called "driver" represented as
+ the class `LL::Driver`. This class is written in either C or Java depending on
+ the Ruby platform that's being used. The rationale for this is simple:
+ performance. While Ruby is a great language it's sadly not fast enough to handle
+ parsing of large inputs in a way that doesn't either require lots of memory,
+ time or both.
+
+ Both the C and Java drivers try to use native data structures as much as
+ possible instead of using Ruby structures. For example, their internal parsing
+ stacks are native stacks. In case of Java this is an ArrayDeque, in case of C
+ this is a vector created using the [kvec][kvec] library as C doesn't have a
+ native vector structure.
+
+ The driver operates by iterating over every token supplied by the `each_token`
+ method (this method must be defined by a parser itself). For every input token a
+ callback function in C/Java is executed that determines what to parse and how to
+ parse it.
+
+ The parsing process largely operates on integers, only using Ruby objects where
+ absolutely required. For example, all steps of a rule's branch are represented
+ as integers. Lookup tables are also simply arrays of integers with terminals
+ being mapped directly to the indexes of these arrays. See ruby-ll's own parser
+ for examples. Note that the integers for the `rules` Array are in reverse order,
+ so everything that comes first is processed last.
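
The reverse ordering mentioned in the paragraph above can be illustrated with a short plain-Ruby sketch. It is not taken from ruby-ll: `compile_row`, `processing_order` and the terminal indexes are hypothetical, and popping the value before its type is an assumption inferred from the drivers pushing a type followed by a value. Only the terminal type ID (1) comes from this diff.

```ruby
# Type ID for terminals as listed in this diff.
T_TERMINAL = 1

# Hypothetical helper: stores a branch's steps as [type, value] integer
# pairs in reverse order.
def compile_row(steps, indexes)
  row = []

  steps.reverse_each do |step|
    row << T_TERMINAL
    row << indexes.fetch(step)
  end

  row
end

# Hypothetical helper: pushes a row onto a stack and pops it again,
# returning the [type, value] pairs in the order they would be processed.
def processing_order(row)
  stack = []
  row.each { |number| stack.push(number) }

  order = []

  until stack.empty?
    value = stack.pop # assumption: the value is popped before its type
    type  = stack.pop

    order << [type, value]
  end

  order
end

indexes = { 'B' => 1, 'C' => 2 } # made-up terminal indexes
row     = compile_row(%w[B C], indexes)

p row                   # => [1, 2, 1, 1]
p processing_order(row) # => [[1, 1], [1, 2]]
```

Because the row is stored back to front, pushing it as-is onto a stack means the first step of the branch ("B") ends up on top and is processed first.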
+
+ For more information on the internals it's best to refer to the C driver code
+ located in `ext/c/driver.c`. The Java code is largely based on this code save
+ for some code comments here and there.
+
+ [kvec]: https://github.com/attractivechaos/klib/blob/master/kvec.h
@@ -5,6 +5,10 @@
  #define T_TERMINAL 1
  #define T_EPSILON 2
  #define T_ACTION 3
+ #define T_STAR 4
+ #define T_PLUS 5
+ #define T_ADD_VALUE_STACK 6
+ #define T_APPEND_VALUE_STACK 7
 
  ID id_config_const;
  ID id_each_token;
@@ -66,6 +70,8 @@ VALUE ll_driver_each_token(VALUE token, VALUE self)
  VALUE method;
  VALUE action_args;
  VALUE action_retval;
+ VALUE operator_buffer;
+ VALUE last_value;
  long num_args;
  long args_i;
 
@@ -113,8 +119,8 @@ VALUE ll_driver_each_token(VALUE token, VALUE self)
  }
  }
 
- /* Rule */
- if ( stack_type == T_RULE )
+ /* A rule or the "+" operator */
+ if ( stack_type == T_RULE || stack_type == T_PLUS )
  {
  production_i = state->config->table[stack_value][token_id];
 
@@ -132,6 +138,19 @@ VALUE ll_driver_each_token(VALUE token, VALUE self)
  }
  else
  {
+ /*
+ Append a "*" operator for all following occurrences as they are
+ optional
+ */
+ if ( stack_type == T_PLUS )
+ {
+ kv_push(long, state->stack, T_STAR);
+ kv_push(long, state->stack, stack_value);
+
+ kv_push(long, state->stack, T_APPEND_VALUE_STACK);
+ kv_push(long, state->stack, 0);
+ }
+
  FOR(rule_i, state->config->rule_lengths[production_i])
  {
  kv_push(
@@ -142,6 +161,56 @@ VALUE ll_driver_each_token(VALUE token, VALUE self)
  }
  }
  }
+ /* "*" operator */
+ else if ( stack_type == T_STAR )
+ {
+ production_i = state->config->table[stack_value][token_id];
+
+ if ( production_i != T_EOF )
+ {
+ kv_push(long, state->stack, T_STAR);
+ kv_push(long, state->stack, stack_value);
+
+ kv_push(long, state->stack, T_APPEND_VALUE_STACK);
+ kv_push(long, state->stack, 0);
+
+ FOR(rule_i, state->config->rule_lengths[production_i])
+ {
+ kv_push(
+ long,
+ state->stack,
+ state->config->rules[production_i][rule_i]
+ );
+ }
+ }
+ }
+ /*
+ Adds a new array to the value stack that can be used to group operator
+ values together
+ */
+ else if ( stack_type == T_ADD_VALUE_STACK )
+ {
+ operator_buffer = rb_ary_new();
+
+ kv_push(VALUE, state->value_stack, operator_buffer);
+
+ RB_GC_GUARD(operator_buffer);
+ }
+ /*
+ Appends the last value on the value stack to the operator buffer that
+ precedes it.
+ */
+ else if ( stack_type == T_APPEND_VALUE_STACK )
+ {
+ last_value = kv_pop(state->value_stack);
+
+ operator_buffer = kv_A(
+ state->value_stack,
+ kv_size(state->value_stack) - 1
+ );
+
+ rb_ary_push(operator_buffer, last_value);
+ }
  /* Terminal */
  else if ( stack_type == T_TERMINAL )
  {
@@ -166,7 +166,7 @@ VALUE ll_driver_config_set_actions(VALUE self, VALUE array)
 
  Data_Get_Struct(self, DriverConfig, config);
 
- config->action_names = ALLOC_N(ID, row_count);
+ config->action_names = ALLOC_N(VALUE, row_count);
  config->action_arg_amounts = ALLOC_N(long, row_count);
 
  FOR(rindex, row_count)
@@ -30,7 +30,7 @@ typedef struct
  long **table;
 
  /* Array containing action names as Symbols */
- ID *action_names;
+ VALUE *action_names;
 
  /* Array containing the arity for every action */
  long *action_arg_amounts;
@@ -27,11 +27,15 @@ import org.jruby.runtime.builtin.IRubyObject;
  @JRubyClass(name="LL::Driver", parent="Object")
  public class Driver extends RubyObject
  {
- private static long T_EOF = -1;
- private static long T_RULE = 0;
- private static long T_TERMINAL = 1;
- private static long T_EPSILON = 2;
- private static long T_ACTION = 3;
+ private static long T_EOF = -1;
+ private static long T_RULE = 0;
+ private static long T_TERMINAL = 1;
+ private static long T_EPSILON = 2;
+ private static long T_ACTION = 3;
+ private static long T_STAR = 4;
+ private static long T_PLUS = 5;
+ private static long T_ADD_VALUE_STACK = 6;
+ private static long T_APPEND_VALUE_STACK = 7;
 
  /**
  * The current Ruby runtime.
@@ -132,8 +136,8 @@ public class Driver extends RubyObject
  token_id = self.config.terminals.get(type);
  }
 
- // Rule
- if ( stack_type == self.T_RULE )
+ // A rule or the "+" operator
+ if ( stack_type == self.T_RULE || stack_type == self.T_PLUS )
  {
  Long production_i = self.config.table
  .get(stack_value.intValue())
@@ -152,6 +156,17 @@ public class Driver extends RubyObject
  }
  else
  {
+ // Append a "*" operator for all following
+ // occurrences as they are optional
+ if ( stack_type == self.T_PLUS )
+ {
+ stack.push(self.T_STAR);
+ stack.push(stack_value);
+
+ stack.push(self.T_APPEND_VALUE_STACK);
+ stack.push(Long.valueOf(0));
+ }
+
  ArrayList<Long> row = self.config.rules
  .get(production_i.intValue());
 
@@ -161,6 +176,47 @@ public class Driver extends RubyObject
  }
  }
  }
+ // "*" operator
+ else if ( stack_type == self.T_STAR )
+ {
+ Long production_i = self.config.table
+ .get(stack_value.intValue())
+ .get(token_id.intValue());
+
+ if ( production_i != self.T_EOF )
+ {
+ stack.push(self.T_STAR);
+ stack.push(stack_value);
+
+ stack.push(self.T_APPEND_VALUE_STACK);
+ stack.push(Long.valueOf(0));
+
+ ArrayList<Long> row = self.config.rules
+ .get(production_i.intValue());
+
+ for ( int index = 0; index < row.size(); index++ )
+ {
+ stack.push(row.get(index));
+ }
+ }
+ }
+ // Adds a new array to the value stack that can be used to
+ // group operator values together
+ else if ( stack_type == self.T_ADD_VALUE_STACK )
+ {
+ RubyArray operator_buffer = self.runtime.newArray();
+
+ value_stack.push(operator_buffer);
+ }
+ // Appends the last value on the value stack to the operator
+ // buffer that precedes it.
+ else if ( stack_type == self.T_APPEND_VALUE_STACK )
+ {
+ IRubyObject last_value = value_stack.pop();
+ RubyArray operator_buffer = (RubyArray) value_stack.peek();
+
+ operator_buffer.append(last_value);
+ }
  // Terminal
  else if ( stack_type == self.T_TERMINAL )
  {
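
How the `T_PLUS`, `T_STAR`, `T_ADD_VALUE_STACK` and `T_APPEND_VALUE_STACK` steps in the two driver hunks above cooperate can be modelled in a few lines of plain Ruby. This is a heavily simplified sketch and not the drivers' actual code. `parse_plus` is a hypothetical function, the operator's receiver is a single terminal identified by its name instead of a rule index, and a one-token lookahead on an Array stands in for the lookup table and the `each_token` callback. Only the type IDs are taken from this diff.

```ruby
# Type IDs as listed in this diff.
T_TERMINAL           = 1
T_STAR               = 4
T_PLUS               = 5
T_ADD_VALUE_STACK    = 6
T_APPEND_VALUE_STACK = 7

# Simplified, hypothetical model of `A = B+` where "B" is a terminal.
def parse_plus(tokens, terminal)
  input       = tokens.dup
  value_stack = []

  # Stored so that the value buffer is created before the operator runs.
  stack = [T_PLUS, terminal, T_ADD_VALUE_STACK, 0]

  until stack.empty?
    value = stack.pop
    type  = stack.pop

    case type
    when T_ADD_VALUE_STACK
      # Start a new buffer that groups the operator's values.
      value_stack.push([])
    when T_PLUS, T_STAR
      matches = input.first == value

      # "+" requires at least one occurrence, "*" may match nothing.
      raise ArgumentError, "expected #{value}" if type == T_PLUS && !matches
      next unless matches

      # Every further occurrence is optional, so schedule a "*" operator,
      # then the append step, then the receiver itself (processed first).
      stack.push(T_STAR, value, T_APPEND_VALUE_STACK, 0, T_TERMINAL, value)
    when T_APPEND_VALUE_STACK
      # Move the last value into the buffer that precedes it.
      last_value = value_stack.pop
      value_stack.last << last_value
    when T_TERMINAL
      value_stack.push(input.shift)
    end
  end

  value_stack.pop
end

p parse_plus(%w[B B B], 'B') # => ["B", "B", "B"]
```

Each matched occurrence re-queues a `T_STAR` step, so the loop ends as soon as the next token no longer matches and the buffer holds one entry per occurrence.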
Binary file
data/lib/ll.rb CHANGED
@@ -18,6 +18,7 @@ require_relative 'll/rule'
  require_relative 'll/branch'
  require_relative 'll/terminal'
  require_relative 'll/epsilon'
+ require_relative 'll/operator'
  require_relative 'll/message'
  require_relative 'll/ast/node'
  require_relative 'll/erb_context'
@@ -25,7 +25,13 @@ module LL
  def first_set
  first = steps[0]
 
- return first.is_a?(Rule) ? first.first_set : [first]
+ if first.is_a?(Rule)
+ return first.first_set
+ elsif first
+ return [first]
+ else
+ return []
+ end
  end
 
  ##
@@ -8,11 +8,15 @@ module LL
  # @return [Hash]
  #
  TYPES = {
- :eof => -1,
- :rule => 0,
- :terminal => 1,
- :epsilon => 2,
- :action => 3
+ :eof => -1,
+ :rule => 0,
+ :terminal => 1,
+ :epsilon => 2,
+ :action => 3,
+ :star => 4,
+ :plus => 5,
+ :add_value_stack => 6,
+ :append_value_stack => 7
  }.freeze
 
  ##
@@ -105,7 +109,21 @@ module LL
 
  grammar.rules.each do |rule|
  rule.branches.each do |branch|
- bodies[:"_rule_#{index}"] = branch.ruby_code || DEFAULT_RUBY_CODE
+ if branch.ruby_code
+ code = branch.ruby_code
+
+ # If a branch only contains a single, non-epsilon step we can just
+ # return that value as-is. This makes parsing code a little bit
+ # easier.
+ elsif !branch.ruby_code and branch.steps.length == 1 \
+ and !branch.steps[0].is_a?(Epsilon)
+ code = 'val[0]'
+
+ else
+ code = DEFAULT_RUBY_CODE
+ end
+
+ bodies[:"_rule_#{index}"] = code
 
  index += 1
  end
@@ -133,17 +151,24 @@ module LL
  action_index += 1
 
  branch.steps.reverse_each do |step|
- if step.is_a?(LL::Terminal)
+ if step.is_a?(Terminal)
  row << TYPES[:terminal]
  row << term_indices[step] + 1
 
- elsif step.is_a?(LL::Rule)
+ elsif step.is_a?(Rule)
  row << TYPES[:rule]
  row << rule_indices[step]
 
- elsif step.is_a?(LL::Epsilon)
+ elsif step.is_a?(Epsilon)
  row << TYPES[:epsilon]
  row << 0
+
+ elsif step.is_a?(Operator)
+ row << TYPES[step.type]
+ row << rule_indices[step.receiver]
+
+ row << TYPES[:add_value_stack]
+ row << 0
  end
  end