ruby-ll 1.1.3-java → 2.0.0-java

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: a5a9dc1d9c35dced64f28eaf2340cd576d990ad5
4
- data.tar.gz: 97acb8a53b364b77ef58e4e4060b21366efff1de
3
+ metadata.gz: 0f307cb5b874f4f151092083942884b5077e5fab
4
+ data.tar.gz: 1a9dbbcd10bd6898a709aa5759d5f91030272e8f
5
5
  SHA512:
6
- metadata.gz: 5a2a4f2b163adf8d50072570cd955b412c810d27b1f0a27f5c3f9a138132b012ea6059df8d337e788844194f0450f1661833029c06cdbd42899afcec8a2c260d
7
- data.tar.gz: 531c4be54e3d700126c339bd44177c7fcdb1fc2b51c03d135de9fd1d1c6c8162a7628b2a3a9bccabff5159add4a8ab82298acd3032424107f990b7da0d078e63
6
+ metadata.gz: d397ac3dac1792bb4310508aaec7ee448cfe0f09f9fa1072daa5c53f1b8ea290c2ff342897e8edfcced314d5b9fa0715970dccadea8ac0c9f0b23be1dced959b
7
+ data.tar.gz: 76771a39c8fdebf2e64f5b0535a019c22ac5aff0907b96f2a8d91ec5d5f054f742aa3fc75dd847cb1c050c1e1a60c79269308a2eb6aa596cd9923479a21b3665
data/README.md CHANGED
@@ -278,8 +278,30 @@ return specific elements from it:
278
278
  numbers = A B { val[0] };
279
279
 
280
280
  Values returned by code blocks are passed to whatever other rule called it. This
281
- allows code blocks to be used for building ASTs and the likes. If no explicit
282
- code block is defined `val` is returned as is.
281
+ allows code blocks to be used for building ASTs and the like.
282
+
283
+ If no explicit code block is defined then ruby-ll will generate one for you. If
284
+ a branch consists of only a single step (e.g. `A = B;`) then only the first
285
+ value is returned, otherwise all values are returned.
286
+
287
+ This means that in the following example the output will be whatever value "C"
288
+ contains:
289
+
290
+ A = B { p val[0] };
291
+ B = C;
292
+
293
+ However, here the output would be `[C, D]` as the `B` rule's branch contains
294
+ multiple steps:
295
+
296
+ A = B { p val[0] };
297
+ B = C D;
298
+
299
+ To summarize (`# =>` denotes the return value):
300
+
301
+ A = B; # => B
302
+ A = B C; # => [B, C]
303
+
304
+ You can override this behaviour simply by defining your own code block.
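
A minimal Ruby sketch of the return rule summarized above. The `return_value` helper and the `:epsilon` marker are hypothetical names used only for illustration; they are not part of ruby-ll's API, and the real behaviour comes from the code blocks ruby-ll generates.

```ruby
# Hypothetical helper that mimics the value a branch returns.
# steps  - the steps of the branch (plain symbols, for illustration only)
# values - the values matched for those steps (what a code block sees as `val`)
# block  - an optional explicit code block, which always takes precedence
def return_value(steps, values, &block)
  return block.call(values) if block

  # A single, non-epsilon step returns its value as is.
  return values[0] if steps.length == 1 && steps[0] != :epsilon

  # Otherwise all values are returned.
  values
end

p return_value([:B], ['b'])                           # single step
p return_value([:B, :C], ['b', 'c'])                  # multiple steps
p return_value([:B, :C], ['b', 'c']) { |val| val[1] } # explicit code block
```

The explicit block in the last call plays the role of a user-defined code block overriding the generated default.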
283
305
 
284
306
  ruby-ll parsers recurse into rules before unwinding, this means that the
285
307
  inner-most rule is processed first.
@@ -301,6 +323,50 @@ name as a terminal, as such the following is invalid:
301
323
 
302
324
  It's also an error to re-define an existing rule.
303
325
 
326
+ ### Operators
327
+
328
+ Grammars can use two operators to define a sequence of terminals/non-terminals:
329
+ the star (`*`) and plus (`+`) operators.
330
+
331
+ The star operator indicates that something should occur 0 or more times. Here
332
+ the "B" identifier could occur 0 times, once, twice or many more times:
333
+
334
+ A = B*;
335
+
336
+ The plus operator indicates that something should occur at least once followed
337
+ by any number of more occurrences. For example, this grammar states that "B"
338
+ should occur at least once but can also occur, say, 10 times:
339
+
340
+ A = B+;
341
+
342
+ Operators can be applied either to a single terminal/rule or a series of
343
+ terminals/rules grouped together using parentheses. For example, both are
344
+ perfectly valid:
345
+
346
+ A = B+;
347
+ A = (B C)+;
348
+
349
+ When calling an operator on a single terminal/rule the corresponding entry in
350
+ the `val` array is simply set to the terminal/rule value. For example:
351
+
352
+ A = B+ { p val[0] };
353
+
354
+ For input `B B B` this would output `[B, B, B]`.
355
+
356
+ However, when grouping multiple terminals/rules using parentheses every
357
+ occurrence is wrapped in an Array. For example:
358
+
359
+ A = (B C)+ { p val[0] };
360
+
361
+ For input `B C B C` this would output `[[B, C], [B, C]]`. To work around this
362
+ you can simply move the group of identifiers to its own rule and only return
363
+ whatever you need:
364
+
365
+ A = A1+ { p val[0] };
366
+ A1 = B C { val[0] }; # only return "B"
367
+
368
+ For input `B C B C` this would output `[B, B]`.
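
The value shapes described above can be mimicked in plain Ruby. This is only a sketch of what ends up in `val[0]`, assuming the input has already been matched; `collect_occurrences` is a hypothetical helper, not something ruby-ll provides, and no parsing takes place.

```ruby
# Hypothetical helper that models the shape of `val[0]` for an operator.
# tokens     - the matched input values, in order
# group_size - 1 for a single terminal/rule, N for a group such as (B C)
def collect_occurrences(tokens, group_size)
  tokens.each_slice(group_size).map do |group|
    # A single terminal/rule contributes its value directly; a group
    # contributes one Array per occurrence.
    group_size == 1 ? group[0] : group
  end
end

single  = collect_occurrences(%w[B B B], 1)
grouped = collect_occurrences(%w[B C B C], 2)

# Modelling the workaround: a dedicated rule that only returns "B".
only_b = grouped.map { |occurrence| occurrence[0] }

p single
p grouped
p only_b
```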
369
+
304
370
  ## Conflicts
305
371
 
306
372
  LL(1) grammars can have two kinds of conflicts in a rule:
@@ -0,0 +1,32 @@
1
+ # Driver Architecture
2
+
3
+ The actual parsing of input is handled by a so-called "driver" represented as
4
+ the class `LL::Driver`. This class is written in either C or Java depending on
5
+ the Ruby platform that's being used. The rationale for this is simple:
6
+ performance. While Ruby is a great language it's sadly not fast enough to handle
7
+ parsing of large inputs in a way that doesn't either require lots of memory,
8
+ time or both.
9
+
10
+ Both the C and Java drivers try to use native data structures as much as
11
+ possible instead of using Ruby structures. For example, their internal parsing
12
+ stacks are native stacks. In case of Java this is an ArrayDeque, in case of C
13
+ this is a vector created using the [kvec][kvec] library as C doesn't have a
14
+ native vector structure.
15
+
16
+ The driver operates by iterating over every token supplied by the `each_token`
17
+ method (this method must be defined by a parser itself). For every input token a
18
+ callback function in C/Java is executed that determines what to parse and how to
19
+ parse it.
20
+
21
+ The parsing process largely operates on integers, only using Ruby objects where
22
+ absolutely required. For example, all steps of a rule's branch are represented
23
+ as integers. Lookup tables are also simply arrays of integers with terminals
24
+ being mapped directly to the indexes of these arrays. See ruby-ll's own parser
25
+ for examples. Note that the integers for the `rules` Array are in reverse order,
26
+ so everything that comes first is processed last.
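
A small Ruby sketch of why the reversed order works. It assumes steps are stored as (type, value) integer pairs and that the driver pops the value before the type; both are assumptions made for illustration. `STEP_TYPES`, `build_row` and `processing_order` are hypothetical names, not part of `LL::Driver`.

```ruby
# Subset of the integer type codes, mirroring the values used in this diff.
STEP_TYPES = { rule: 0, terminal: 1 }.freeze

# Encodes the steps of a branch as a flat Array of integers, last step first.
def build_row(steps)
  row = []
  steps.reverse_each { |type, value| row << STEP_TYPES[type] << value }
  row
end

# Pushes a row onto a stack and pops it again, returning the order in which
# the steps would be processed (assumption: value is popped before type).
def processing_order(row)
  stack = row.dup
  order = []

  until stack.empty?
    value = stack.pop
    type  = stack.pop
    order << [type, value]
  end

  order
end

row = build_row([[:terminal, 1], [:rule, 2]])

p row                   # the stored, reversed integers
p processing_order(row) # the terminal is processed before the rule
```

Because a stack is last-in, first-out, storing the steps in reverse means the first step of a branch ends up on top once the whole row has been pushed.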
27
+
28
+ For more information on the internals it's best to refer to the C driver code
29
+ located in `ext/c/driver.c`. The Java code is largely based on this code save
30
+ for some code comments here and there.
31
+
32
+ [kvec]: https://github.com/attractivechaos/klib/blob/master/kvec.h
@@ -5,6 +5,10 @@
5
5
  #define T_TERMINAL 1
6
6
  #define T_EPSILON 2
7
7
  #define T_ACTION 3
8
+ #define T_STAR 4
9
+ #define T_PLUS 5
10
+ #define T_ADD_VALUE_STACK 6
11
+ #define T_APPEND_VALUE_STACK 7
8
12
 
9
13
  ID id_config_const;
10
14
  ID id_each_token;
@@ -66,6 +70,8 @@ VALUE ll_driver_each_token(VALUE token, VALUE self)
66
70
  VALUE method;
67
71
  VALUE action_args;
68
72
  VALUE action_retval;
73
+ VALUE operator_buffer;
74
+ VALUE last_value;
69
75
  long num_args;
70
76
  long args_i;
71
77
 
@@ -113,8 +119,8 @@ VALUE ll_driver_each_token(VALUE token, VALUE self)
113
119
  }
114
120
  }
115
121
 
116
- /* Rule */
117
- if ( stack_type == T_RULE )
122
+ /* A rule or the "+" operator */
123
+ if ( stack_type == T_RULE || stack_type == T_PLUS )
118
124
  {
119
125
  production_i = state->config->table[stack_value][token_id];
120
126
 
@@ -132,6 +138,19 @@ VALUE ll_driver_each_token(VALUE token, VALUE self)
132
138
  }
133
139
  else
134
140
  {
141
+ /*
142
+ Append a "*" operator for all following occurrences as they are
143
+ optional
144
+ */
145
+ if ( stack_type == T_PLUS )
146
+ {
147
+ kv_push(long, state->stack, T_STAR);
148
+ kv_push(long, state->stack, stack_value);
149
+
150
+ kv_push(long, state->stack, T_APPEND_VALUE_STACK);
151
+ kv_push(long, state->stack, 0);
152
+ }
153
+
135
154
  FOR(rule_i, state->config->rule_lengths[production_i])
136
155
  {
137
156
  kv_push(
@@ -142,6 +161,56 @@ VALUE ll_driver_each_token(VALUE token, VALUE self)
142
161
  }
143
162
  }
144
163
  }
164
+ /* "*" operator */
165
+ else if ( stack_type == T_STAR )
166
+ {
167
+ production_i = state->config->table[stack_value][token_id];
168
+
169
+ if ( production_i != T_EOF )
170
+ {
171
+ kv_push(long, state->stack, T_STAR);
172
+ kv_push(long, state->stack, stack_value);
173
+
174
+ kv_push(long, state->stack, T_APPEND_VALUE_STACK);
175
+ kv_push(long, state->stack, 0);
176
+
177
+ FOR(rule_i, state->config->rule_lengths[production_i])
178
+ {
179
+ kv_push(
180
+ long,
181
+ state->stack,
182
+ state->config->rules[production_i][rule_i]
183
+ );
184
+ }
185
+ }
186
+ }
187
+ /*
188
+ Adds a new array to the value stack that can be used to group operator
189
+ values together
190
+ */
191
+ else if ( stack_type == T_ADD_VALUE_STACK )
192
+ {
193
+ operator_buffer = rb_ary_new();
194
+
195
+ kv_push(VALUE, state->value_stack, operator_buffer);
196
+
197
+ RB_GC_GUARD(operator_buffer);
198
+ }
199
+ /*
200
+ Appends the last value on the value stack to the operator buffer that
201
+ precedes it.
202
+ */
203
+ else if ( stack_type == T_APPEND_VALUE_STACK )
204
+ {
205
+ last_value = kv_pop(state->value_stack);
206
+
207
+ operator_buffer = kv_A(
208
+ state->value_stack,
209
+ kv_size(state->value_stack) - 1
210
+ );
211
+
212
+ rb_ary_push(operator_buffer, last_value);
213
+ }
145
214
  /* Terminal */
146
215
  else if ( stack_type == T_TERMINAL )
147
216
  {
@@ -166,7 +166,7 @@ VALUE ll_driver_config_set_actions(VALUE self, VALUE array)
166
166
 
167
167
  Data_Get_Struct(self, DriverConfig, config);
168
168
 
169
- config->action_names = ALLOC_N(ID, row_count);
169
+ config->action_names = ALLOC_N(VALUE, row_count);
170
170
  config->action_arg_amounts = ALLOC_N(long, row_count);
171
171
 
172
172
  FOR(rindex, row_count)
@@ -30,7 +30,7 @@ typedef struct
30
30
  long **table;
31
31
 
32
32
  /* Array containing action names as Symbols */
33
- ID *action_names;
33
+ VALUE *action_names;
34
34
 
35
35
  /* Array containing the arity for every action */
36
36
  long *action_arg_amounts;
@@ -27,11 +27,15 @@ import org.jruby.runtime.builtin.IRubyObject;
27
27
  @JRubyClass(name="LL::Driver", parent="Object")
28
28
  public class Driver extends RubyObject
29
29
  {
30
- private static long T_EOF = -1;
31
- private static long T_RULE = 0;
32
- private static long T_TERMINAL = 1;
33
- private static long T_EPSILON = 2;
34
- private static long T_ACTION = 3;
30
+ private static long T_EOF = -1;
31
+ private static long T_RULE = 0;
32
+ private static long T_TERMINAL = 1;
33
+ private static long T_EPSILON = 2;
34
+ private static long T_ACTION = 3;
35
+ private static long T_STAR = 4;
36
+ private static long T_PLUS = 5;
37
+ private static long T_ADD_VALUE_STACK = 6;
38
+ private static long T_APPEND_VALUE_STACK = 7;
35
39
 
36
40
  /**
37
41
  * The current Ruby runtime.
@@ -132,8 +136,8 @@ public class Driver extends RubyObject
132
136
  token_id = self.config.terminals.get(type);
133
137
  }
134
138
 
135
- // Rule
136
- if ( stack_type == self.T_RULE )
139
+ // A rule or the "+" operator
140
+ if ( stack_type == self.T_RULE || stack_type == self.T_PLUS )
137
141
  {
138
142
  Long production_i = self.config.table
139
143
  .get(stack_value.intValue())
@@ -152,6 +156,17 @@ public class Driver extends RubyObject
152
156
  }
153
157
  else
154
158
  {
159
+ // Append a "*" operator for all following
160
+ // occurrences as they are optional
161
+ if ( stack_type == self.T_PLUS )
162
+ {
163
+ stack.push(self.T_STAR);
164
+ stack.push(stack_value);
165
+
166
+ stack.push(self.T_APPEND_VALUE_STACK);
167
+ stack.push(Long.valueOf(0));
168
+ }
169
+
155
170
  ArrayList<Long> row = self.config.rules
156
171
  .get(production_i.intValue());
157
172
 
@@ -161,6 +176,47 @@ public class Driver extends RubyObject
161
176
  }
162
177
  }
163
178
  }
179
+ // "*" operator
180
+ else if ( stack_type == self.T_STAR )
181
+ {
182
+ Long production_i = self.config.table
183
+ .get(stack_value.intValue())
184
+ .get(token_id.intValue());
185
+
186
+ if ( production_i != self.T_EOF )
187
+ {
188
+ stack.push(self.T_STAR);
189
+ stack.push(stack_value);
190
+
191
+ stack.push(self.T_APPEND_VALUE_STACK);
192
+ stack.push(Long.valueOf(0));
193
+
194
+ ArrayList<Long> row = self.config.rules
195
+ .get(production_i.intValue());
196
+
197
+ for ( int index = 0; index < row.size(); index++ )
198
+ {
199
+ stack.push(row.get(index));
200
+ }
201
+ }
202
+ }
203
+ // Adds a new array to the value stack that can be used to
204
+ // group operator values together
205
+ else if ( stack_type == self.T_ADD_VALUE_STACK )
206
+ {
207
+ RubyArray operator_buffer = self.runtime.newArray();
208
+
209
+ value_stack.push(operator_buffer);
210
+ }
211
+ // Appends the last value on the value stack to the operator
212
+ // buffer that precedes it.
213
+ else if ( stack_type == self.T_APPEND_VALUE_STACK )
214
+ {
215
+ IRubyObject last_value = value_stack.pop();
216
+ RubyArray operator_buffer = (RubyArray) value_stack.peek();
217
+
218
+ operator_buffer.append(last_value);
219
+ }
164
220
  // Terminal
165
221
  else if ( stack_type == self.T_TERMINAL )
166
222
  {
Binary file
data/lib/ll.rb CHANGED
@@ -18,6 +18,7 @@ require_relative 'll/rule'
18
18
  require_relative 'll/branch'
19
19
  require_relative 'll/terminal'
20
20
  require_relative 'll/epsilon'
21
+ require_relative 'll/operator'
21
22
  require_relative 'll/message'
22
23
  require_relative 'll/ast/node'
23
24
  require_relative 'll/erb_context'
@@ -25,7 +25,13 @@ module LL
25
25
  def first_set
26
26
  first = steps[0]
27
27
 
28
- return first.is_a?(Rule) ? first.first_set : [first]
28
+ if first.is_a?(Rule)
29
+ return first.first_set
30
+ elsif first
31
+ return [first]
32
+ else
33
+ return []
34
+ end
29
35
  end
30
36
 
31
37
  ##
@@ -8,11 +8,15 @@ module LL
8
8
  # @return [Hash]
9
9
  #
10
10
  TYPES = {
11
- :eof => -1,
12
- :rule => 0,
13
- :terminal => 1,
14
- :epsilon => 2,
15
- :action => 3
11
+ :eof => -1,
12
+ :rule => 0,
13
+ :terminal => 1,
14
+ :epsilon => 2,
15
+ :action => 3,
16
+ :star => 4,
17
+ :plus => 5,
18
+ :add_value_stack => 6,
19
+ :append_value_stack => 7
16
20
  }.freeze
17
21
 
18
22
  ##
@@ -105,7 +109,21 @@ module LL
105
109
 
106
110
  grammar.rules.each do |rule|
107
111
  rule.branches.each do |branch|
108
- bodies[:"_rule_#{index}"] = branch.ruby_code || DEFAULT_RUBY_CODE
112
+ if branch.ruby_code
113
+ code = branch.ruby_code
114
+
115
+ # If a branch only contains a single, non-epsilon step we can just
116
+ # return that value as-is. This makes parsing code a little bit
117
+ # easier.
118
+ elsif !branch.ruby_code and branch.steps.length == 1 \
119
+ and !branch.steps[0].is_a?(Epsilon)
120
+ code = 'val[0]'
121
+
122
+ else
123
+ code = DEFAULT_RUBY_CODE
124
+ end
125
+
126
+ bodies[:"_rule_#{index}"] = code
109
127
 
110
128
  index += 1
111
129
  end
@@ -133,17 +151,24 @@ module LL
133
151
  action_index += 1
134
152
 
135
153
  branch.steps.reverse_each do |step|
136
- if step.is_a?(LL::Terminal)
154
+ if step.is_a?(Terminal)
137
155
  row << TYPES[:terminal]
138
156
  row << term_indices[step] + 1
139
157
 
140
- elsif step.is_a?(LL::Rule)
158
+ elsif step.is_a?(Rule)
141
159
  row << TYPES[:rule]
142
160
  row << rule_indices[step]
143
161
 
144
- elsif step.is_a?(LL::Epsilon)
162
+ elsif step.is_a?(Epsilon)
145
163
  row << TYPES[:epsilon]
146
164
  row << 0
165
+
166
+ elsif step.is_a?(Operator)
167
+ row << TYPES[step.type]
168
+ row << rule_indices[step.receiver]
169
+
170
+ row << TYPES[:add_value_stack]
171
+ row << 0
147
172
  end
148
173
  end
149
174