kpeg 0.8.0 → 0.8.1

data/README.md CHANGED
@@ -23,16 +23,20 @@ After that a block of ruby code can be defined that will be added into the class
23
23
 
24
24
  ### Defining literals
25
25
 
26
- Literals are static declarations of characters or regular expressions designed for reuse in the grammar. These can be constants or variables.
26
+ Literals are static declarations of characters or regular expressions designed for reuse in the grammar. These can be constants or variables. Literals can take strings, regular expressions, or character ranges.
27
27
 
28
28
  ALPHA = /[A-Za-z]/
29
29
  DIGIT = /[0-9]/
30
30
  period = "."
31
+ string = "a string"
32
+ regex = /(regexs?)+/
33
+ char_range = [b-t]
31
34
 
32
35
  Literals can also accept multiple definitions
33
36
 
34
37
  vowel = "a" | "e" | "i" | "o" | "u"
35
38
  alpha = /[A-Z]/ | /[a-z]/
39
+
36
40
 
37
41
  ### Defining Rules for Values
38
42
 
@@ -41,29 +45,111 @@ Before you can start parsing a string you will need to define rules that you wil
41
45
  The most basic of these rules is a string capture
42
46
 
43
47
  alpha = < /[A-Za-z]/ > { text }
48
+
44
49
 
45
50
  While this looks very much like the ALPHA literal defined above, it differs in one important way: the text captured by the rule between the < and > symbols is set as the text variable in the block that follows. You can also explicitly name the variable you would like, but only with existing rules or literals.
46
51
 
52
+ letter = alpha:a { a }
53
+
54
+ Additionally blocks can return true or false values based upon an expression within the block. To return true if a test passes do the following:
55
+
56
+ match_greater_than_10 = < num:n > &{ n > 10 }
57
+
58
+ To test and return a false value if the test passes do the following:
59
+
60
+ do_not_match_greater_than_10 = < num:n > !{ n > 10 }
61
+
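The two predicate forms above can be pictured in plain Ruby. The sketch below mirrors the shape of the code the generator in this diff emits for `&{ ... }` and `!{ ... }`: the block result becomes the match result, and the not-predicate flips it to `nil` or `true`. The helper names `and_predicate` and `not_predicate` are hypothetical and are not part of kpeg.

```ruby
# Hypothetical helpers, not part of kpeg: they imitate how a block
# predicate decides whether a rule matches.

# &{ ... } succeeds when the block returns a truthy value.
def and_predicate
  yield
end

# !{ ... } succeeds when the block returns a falsy value.
def not_predicate
  yield ? nil : true
end

n = 12
matches_gt_10     = and_predicate { n > 10 }
not_matches_gt_10 = not_predicate { n > 10 }

puts matches_gt_10.inspect
puts not_matches_gt_10.inspect
```

With `n = 12` the first helper reports a match and the second reports a failure, which is the behaviour the two README rules describe.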
62
+ Rules can also act like functions and take parameters. An example of this is lifted from the [Email Address Validator](https://github.com/larb/email_address_validator), where an ASCII value is passed in and the character is evaluated against it, returning true if it matches.
63
+
64
+ d(num) = <.> &{ text[0] == num }
65
+
66
+ Rules support some regular expression syntax for matching
67
+
68
+ + maybe ?
69
+ + many +
70
+ + kleene *
71
+ + groupings ()
72
+
73
+ Examples
74
+
75
+ letters = alpha+
76
+ words = alpha+ space* period?
77
+ sentence = (letters+ | space+)+
78
+
79
+ Kpeg also allows a rule to define the acceptable number of matches as a range. In regular expressions this is often written with syntax like {0,3}. Kpeg writes match ranges as [min, max]; use * as the max when there is no upper bound.
80
+
81
+ matches_3_to_5_times = letter[3,5]
82
+ matches_3_to_any_times = letter[3,*]
83
+
84
+
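The range form can be sketched with Ruby's standard `StringScanner`. This is an illustration of the counting behaviour only (stop at the maximum, fail and rewind below the minimum), modelled on the counting loop in the generator changes in this diff. The helper `match_range` is hypothetical and is not a kpeg API.

```ruby
require "strscan"

# Hypothetical helper, not part of kpeg: match `pattern` between
# min and max times (max = nil means unbounded), restoring the scan
# position when fewer than min matches are found.
def match_range(scanner, pattern, min, max = nil)
  start = scanner.pos
  count = 0
  while scanner.scan(pattern)
    count += 1
    break if max && count == max
  end
  return true if count >= min
  scanner.pos = start
  nil
end

letter = /[A-Za-z]/

s = StringScanner.new("abcdefg")
match_range(s, letter, 3, 5)   # like letter[3,5]: stops after five letters
rest = s.rest

t = StringScanner.new("ab1")
match_range(t, letter, 3)      # like letter[3,*]: only two letters, so it fails
```

The first call leaves the remaining input in `rest`; the second call rewinds `t` to where it started.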
85
+ ### Defining Actions
86
+
87
+ As some of the examples above illustrate, kpeg allows you to perform actions based upon a match. The actions are described in the block provided or in the rule definition itself.
88
+
47
89
  num = /[1-9][0-9]*/
48
90
  sum = < num:n1 "+" num:n2 > { n1 + n2 }
91
+
92
+ ### Referencing an external grammar
93
+
94
+ Kpeg allows you to run a rule that is defined in an external grammar. This is useful if there is a defined set of rules that you would like to reuse in another parser. To do this, create your grammar and generate a parser using the kpeg command line tool.
95
+
96
+ kpeg literals.kpeg
97
+
98
+ Once you have the generated parser, include that file into your new grammar
99
+
100
+ %{
101
+ require "literals.kpeg.rb"
102
+ }
49
103
 
50
- Additionally blocks can return true or false values based upon an expression within the block. To test if something is true do the following:
104
+ Then create a variable to hold the foreign interface and pass it the class name of your parser. In this case my parser class name is Literal.
105
+
106
+ %foreign_grammer = Literal
51
107
 
52
- greater_than_10 = < num:n > &{ n > 10 }
108
+ You can then use rules defined in the foreign grammar in the local grammar file like so
109
+
110
+ sentence = (%foreign_grammer.alpha %foreign_grammer.space*)+ %foreign_grammer.period
111
+
112
+ ### Comments
113
+
114
+ Kpeg allows comments to be added to the grammar file by using the # symbol
115
+
116
+ # This is a comment in my grammar
53
117
 
54
- To test for a false value do the following:
118
+ ## Generating and running your parser
119
+
120
+ Before you can generate your parser you will need to define a root rule. This will be the first rule run against the string provided to the parser
55
121
 
56
- not_greater_than_10 = < num:n > !{ n > 10 }
122
+ root = sentence
57
123
 
58
- Rules can also act like functions and take parameters, an example of this is can be lifted from the [Email List Validator](https://github.com/andrewvc/email_address_validator), where an ascii value is passed in and the character is evaluated against it returning a true if it matches
124
+ To generate the parser run the kpeg command with the kpeg file(s) as an argument. This will generate a ruby file with the same name as your grammar file.
125
+
126
+ kpeg example.kpeg
59
127
 
60
- d(num) = <.> &{ text[0] == num }
128
+ Include your generated parser file in the application that will use the parser and run it. Create a new instance of the parser and pass in the string you want to evaluate. When parse is called on the parser instance it returns true if the string is matched, or false if it is not.
129
+
130
+ require "example.kpeg.rb"
61
131
 
132
+ parser = Example::Parser.new(string_to_evaluate)
133
+ parser.parse
134
+
135
+ ## Shortcuts and other techniques
136
+
137
+ Per vito, you can get the current line or current column in the following way
62
138
 
139
+ line = { current_line }
140
+ column = { current_column }
141
+ foo = line:line ... { # use line here }
142
+
143
+ ## Examples
144
+
145
+ There are several examples available in the /examples directory. The upper parser has a readme with a step by step description of the grammar.
63
146
 
64
147
  ## Projects using kpeg
65
148
 
66
149
  [Dang](https://github.com/veganstraightedge/dang)
67
- [Email Address Validator](https://github.com/andrewvc/email_address_validator)
150
+
151
+ [Email Address Validator](https://github.com/larb/email_address_validator)
152
+
68
153
  [Callisto](https://github.com/dwaite/Callisto)
69
- [Doodle](https://github.com/vito/doodle)
154
+
155
+ [Doodle](https://github.com/vito/doodle)
data/Rakefile CHANGED
@@ -19,6 +19,7 @@ task :grammar do
19
19
  gr.render(STDOUT)
20
20
  end
21
21
 
22
+ desc "rebuild parser"
22
23
  task :parser do
23
24
  sh "ruby -Ilib bin/kpeg -o lib/kpeg/format_parser.rb -s -f lib/kpeg/format.kpeg"
24
25
  end
data/kpeg.gemspec CHANGED
@@ -15,7 +15,7 @@ Gem::Specification.new do |s|
15
15
  rb = Dir["lib/**/*.rb"] << "bin/kpeg"
16
16
  docs = Dir["doc/**/*"]
17
17
 
18
- s.files = rb + docs + ["LICENSE", "README.md", "Rakefile", "kpeg.gemspec", "Gemfile"]
18
+ s.files = rb + docs + ["README.md", "Rakefile", "kpeg.gemspec", "Gemfile"]
19
19
  s.test_files = Dir["test/**/*.rb"]
20
20
  s.bindir = "bin"
21
21
  s.executables = ["kpeg"]
@@ -63,7 +63,11 @@ module KPeg
63
63
 
64
64
  methods = []
65
65
 
66
- @grammar.variables.each do |name, val|
66
+ vars = @grammar.variables.keys.sort
67
+
68
+ vars.each do |name|
69
+ val = @grammar.variables[name]
70
+
67
71
  if val.index("ast ") == 0
68
72
  unless output_node
69
73
  code << "\n"
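The hunk above replaces direct hash iteration with iteration over sorted keys. A likely motivation, stated here as an assumption rather than something the diff confirms, is reproducible generator output: Ruby 1.8 does not guarantee hash iteration order, so sorting the keys fixes the order in which variables are visited. A minimal sketch with made-up data:

```ruby
# Stand-in for @grammar.variables; the keys and values are made up.
variables = { "name" => "Example", "custom_initialize" => "true", "ast" => "node" }

# Visiting keys in sorted order gives the same sequence no matter how
# the hash was populated or which Ruby version runs the generator.
ordered = variables.keys.sort.map { |name| [name, variables[name]] }

ordered.each { |name, val| puts "#{name} = #{val}" }
```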
@@ -86,22 +90,32 @@ module KPeg
86
90
  end
87
91
  end
88
92
  end
89
-
90
- def output_op(code, op)
93
+
94
+ def indentify(code, indent)
95
+ "#{"  " * indent}#{code}"
96
+ end
97
+
98
+ # Default indent is 4 spaces (indent=2)
99
+ def output_op(code, op, indent=2)
91
100
  case op
92
101
  when Dot
93
- code << " _tmp = get_byte\n"
102
+ code << indentify("_tmp = get_byte\n", indent)
94
103
  when LiteralString
95
- code << " _tmp = match_string(#{op.string.dump})\n"
104
+ code << indentify("_tmp = match_string(#{op.string.dump})\n", indent)
96
105
  when LiteralRegexp
97
- lang = op.regexp.kcode.to_s[0,1]
98
- code << " _tmp = scan(/\\A#{op.regexp}/#{lang})\n"
106
+ if op.regexp.respond_to?(:kcode)
107
+ lang = op.regexp.kcode.to_s[0,1]
108
+ else
109
+ # Let default ruby string handling figure it out
110
+ lang = ""
111
+ end
112
+ code << indentify("_tmp = scan(/\\A#{op.regexp}/#{lang})\n", indent)
99
113
  when CharRange
100
114
  ss = save()
101
115
  if op.start.bytesize == 1 and op.fin.bytesize == 1
102
- code << " #{ss} = self.pos\n"
103
- code << " _tmp = get_byte\n"
104
- code << " if _tmp\n"
116
+ code << indentify("#{ss} = self.pos\n", indent)
117
+ code << indentify("_tmp = get_byte\n", indent)
118
+ code << indentify("if _tmp\n", indent)
105
119
 
106
120
  if op.start.respond_to? :getbyte
107
121
  left = op.start.getbyte 0
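The `respond_to?(:kcode)` guard in this hunk matters because `Regexp#kcode` exists on Ruby 1.8 but was removed in Ruby 1.9. The sketch below isolates that guard in a hypothetical helper, `regexp_lang_flag`, which is not part of kpeg, and builds a `scan` line the same way the generator does.

```ruby
# Hypothetical helper: returns the one-letter encoding flag on Rubies
# that still provide Regexp#kcode, and an empty string otherwise.
def regexp_lang_flag(regexp)
  if regexp.respond_to?(:kcode)
    regexp.kcode.to_s[0, 1]
  else
    # Let default Ruby string handling figure it out.
    ""
  end
end

regexp = /[0-9]+/
flag = regexp_lang_flag(regexp)

# Same string shape as the generator's LiteralRegexp branch.
source = "_tmp = scan(/\\A#{regexp}/#{flag})"
puts source
```

On a Ruby without `Regexp#kcode` the flag is empty, so the emitted regexp literal carries no encoding suffix.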
@@ -110,178 +124,178 @@ module KPeg
110
124
  left = op.start[0]
111
125
  right = op.fin[0]
112
126
  end
113
-
114
- code << " unless _tmp >= #{left} and _tmp <= #{right}\n"
115
- code << " self.pos = #{ss}\n"
116
- code << " _tmp = nil\n"
117
- code << " end\n"
118
- code << " end\n"
127
+
128
+ code << indentify(" unless _tmp >= #{left} and _tmp <= #{right}\n", indent)
129
+ code << indentify(" self.pos = #{ss}\n", indent)
130
+ code << indentify(" _tmp = nil\n", indent)
131
+ code << indentify(" end\n", indent)
132
+ code << indentify("end\n", indent)
119
133
  else
120
134
  raise "Unsupported char range - #{op.inspect}"
121
135
  end
122
136
  when Choice
123
137
  ss = save()
124
- code << "\n #{ss} = self.pos\n"
125
- code << " while true # choice\n"
138
+ code << "\n"
139
+ code << indentify("#{ss} = self.pos\n", indent)
140
+ code << indentify("while true # choice\n", indent)
126
141
  op.ops.each_with_index do |n,idx|
127
- output_op code, n
128
-
129
- code << " break if _tmp\n"
130
- code << " self.pos = #{ss}\n"
142
+ output_op code, n, (indent+1)
143
+
144
+ code << indentify(" break if _tmp\n", indent)
145
+ code << indentify(" self.pos = #{ss}\n", indent)
131
146
  if idx == op.ops.size - 1
132
- code << " break\n"
147
+ code << indentify(" break\n", indent)
133
148
  end
134
149
  end
135
- code << " end # end choice\n\n"
150
+ code << indentify("end # end choice\n\n", indent)
136
151
  when Multiple
137
152
  ss = save()
138
153
  if op.min == 0 and op.max == 1
139
- code << " #{ss} = self.pos\n"
140
- output_op code, op.op
154
+ code << indentify("#{ss} = self.pos\n", indent)
155
+ output_op code, op.op, indent
141
156
  if op.save_values
142
- code << " @result = nil unless _tmp\n"
157
+ code << indentify("@result = nil unless _tmp\n", indent)
143
158
  end
144
- code << " unless _tmp\n"
145
- code << " _tmp = true\n"
146
- code << " self.pos = #{ss}\n"
147
- code << " end\n"
159
+ code << indentify("unless _tmp\n", indent)
160
+ code << indentify(" _tmp = true\n", indent)
161
+ code << indentify(" self.pos = #{ss}\n", indent)
162
+ code << indentify("end\n", indent)
148
163
  elsif op.min == 0 and !op.max
149
164
  if op.save_values
150
- code << " _ary = []\n"
165
+ code << indentify("_ary = []\n", indent)
151
166
  end
152
167
 
153
- code << " while true\n"
154
- output_op code, op.op
168
+ code << indentify("while true\n", indent)
169
+ output_op code, op.op, (indent+1)
155
170
  if op.save_values
156
- code << " _ary << @result if _tmp\n"
171
+ code << indentify(" _ary << @result if _tmp\n", indent)
157
172
  end
158
- code << " break unless _tmp\n"
159
- code << " end\n"
160
- code << " _tmp = true\n"
173
+ code << indentify(" break unless _tmp\n", indent)
174
+ code << indentify("end\n", indent)
175
+ code << indentify("_tmp = true\n", indent)
161
176
 
162
177
  if op.save_values
163
- code << " @result = _ary\n"
178
+ code << indentify("@result = _ary\n", indent)
164
179
  end
165
180
 
166
181
  elsif op.min == 1 and !op.max
167
- code << " #{ss} = self.pos\n"
182
+ code << indentify("#{ss} = self.pos\n", indent)
168
183
  if op.save_values
169
- code << " _ary = []\n"
184
+ code << indentify("_ary = []\n", indent)
170
185
  end
171
- output_op code, op.op
172
- code << " if _tmp\n"
186
+ output_op code, op.op, indent
187
+ code << indentify("if _tmp\n", indent)
173
188
  if op.save_values
174
- code << " _ary << @result\n"
189
+ code << indentify(" _ary << @result\n", indent)
175
190
  end
176
- code << " while true\n"
177
- code << " "
178
- output_op code, op.op
191
+ code << indentify(" while true\n", indent)
192
+ output_op code, op.op, (indent+2)
179
193
  if op.save_values
180
- code << " _ary << @result if _tmp\n"
194
+ code << indentify(" _ary << @result if _tmp\n", indent)
181
195
  end
182
- code << " break unless _tmp\n"
183
- code << " end\n"
184
- code << " _tmp = true\n"
196
+ code << indentify(" break unless _tmp\n", indent)
197
+ code << indentify(" end\n", indent)
198
+ code << indentify(" _tmp = true\n", indent)
185
199
  if op.save_values
186
- code << " @result = _ary\n"
200
+ code << indentify(" @result = _ary\n", indent)
187
201
  end
188
- code << " else\n"
189
- code << " self.pos = #{ss}\n"
190
- code << " end\n"
202
+ code << indentify("else\n", indent)
203
+ code << indentify(" self.pos = #{ss}\n", indent)
204
+ code << indentify("end\n", indent)
191
205
  else
192
- code << " #{ss} = self.pos\n"
193
- code << " _count = 0\n"
194
- code << " while true\n"
195
- code << " "
196
- output_op code, op.op
197
- code << " if _tmp\n"
198
- code << " _count += 1\n"
199
- code << " break if _count == #{op.max}\n"
200
- code << " else\n"
201
- code << " break\n"
202
- code << " end\n"
203
- code << " end\n"
204
- code << " if _count >= #{op.min}\n"
205
- code << " _tmp = true\n"
206
- code << " else\n"
207
- code << " self.pos = #{ss}\n"
208
- code << " _tmp = nil\n"
209
- code << " end\n"
206
+ code << indentify("#{ss} = self.pos\n", indent)
207
+ code << indentify("_count = 0\n", indent)
208
+ code << indentify("while true\n", indent)
209
+ output_op code, op.op, (indent+1)
210
+ code << indentify(" if _tmp\n", indent)
211
+ code << indentify(" _count += 1\n", indent)
212
+ code << indentify(" break if _count == #{op.max}\n", indent)
213
+ code << indentify(" else\n", indent)
214
+ code << indentify(" break\n", indent)
215
+ code << indentify(" end\n", indent)
216
+ code << indentify("end\n", indent)
217
+ code << indentify("if _count >= #{op.min}\n", indent)
218
+ code << indentify(" _tmp = true\n", indent)
219
+ code << indentify("else\n", indent)
220
+ code << indentify(" self.pos = #{ss}\n", indent)
221
+ code << indentify(" _tmp = nil\n", indent)
222
+ code << indentify("end\n", indent)
210
223
  end
211
224
 
212
225
  when Sequence
213
226
  ss = save()
214
- code << "\n #{ss} = self.pos\n"
215
- code << " while true # sequence\n"
227
+ code << "\n"
228
+ code << indentify("#{ss} = self.pos\n", indent)
229
+ code << indentify("while true # sequence\n", indent)
216
230
  op.ops.each_with_index do |n, idx|
217
- output_op code, n
231
+ output_op code, n, (indent+1)
218
232
 
219
233
  if idx == op.ops.size - 1
220
- code << " unless _tmp\n"
221
- code << " self.pos = #{ss}\n"
222
- code << " end\n"
223
- code << " break\n"
234
+ code << indentify(" unless _tmp\n", indent)
235
+ code << indentify(" self.pos = #{ss}\n", indent)
236
+ code << indentify(" end\n", indent)
237
+ code << indentify(" break\n", indent)
224
238
  else
225
- code << " unless _tmp\n"
226
- code << " self.pos = #{ss}\n"
227
- code << " break\n"
228
- code << " end\n"
239
+ code << indentify(" unless _tmp\n", indent)
240
+ code << indentify(" self.pos = #{ss}\n", indent)
241
+ code << indentify(" break\n", indent)
242
+ code << indentify(" end\n", indent)
229
243
  end
230
244
  end
231
- code << " end # end sequence\n\n"
245
+ code << indentify("end # end sequence\n\n", indent)
232
246
  when AndPredicate
233
247
  ss = save()
234
- code << " #{ss} = self.pos\n"
248
+ code << indentify("#{ss} = self.pos\n", indent)
235
249
  if op.op.kind_of? Action
236
- code << " _tmp = begin; #{op.op.action}; end\n"
250
+ code << indentify("_tmp = begin; #{op.op.action}; end\n", indent)
237
251
  else
238
- output_op code, op.op
252
+ output_op code, op.op, indent
239
253
  end
240
- code << " self.pos = #{ss}\n"
254
+ code << indentify("self.pos = #{ss}\n", indent)
241
255
  when NotPredicate
242
256
  ss = save()
243
- code << " #{ss} = self.pos\n"
257
+ code << indentify("#{ss} = self.pos\n", indent)
244
258
  if op.op.kind_of? Action
245
- code << " _tmp = begin; #{op.op.action}; end\n"
259
+ code << indentify("_tmp = begin; #{op.op.action}; end\n", indent)
246
260
  else
247
- output_op code, op.op
261
+ output_op code, op.op, indent
248
262
  end
249
- code << " _tmp = _tmp ? nil : true\n"
250
- code << " self.pos = #{ss}\n"
263
+ code << indentify("_tmp = _tmp ? nil : true\n", indent)
264
+ code << indentify("self.pos = #{ss}\n", indent)
251
265
  when RuleReference
252
- code << " _tmp = apply(:#{method_name op.rule_name})\n"
266
+ code << indentify("_tmp = apply(:#{method_name op.rule_name})\n", indent)
253
267
  when InvokeRule
254
268
  if op.arguments
255
- code << " _tmp = #{method_name op.rule_name}#{op.arguments}\n"
269
+ code << indentify("_tmp = #{method_name op.rule_name}#{op.arguments}\n", indent)
256
270
  else
257
- code << " _tmp = #{method_name op.rule_name}()\n"
271
+ code << indentify("_tmp = #{method_name op.rule_name}()\n", indent)
258
272
  end
259
273
  when ForeignInvokeRule
260
274
  if op.arguments
261
- code << " _tmp = @_grammar_#{op.grammar_name}.external_invoke(self, :#{method_name op.rule_name}, #{op.arguments[1..-2]})\n"
275
+ code << indentify("_tmp = @_grammar_#{op.grammar_name}.external_invoke(self, :#{method_name op.rule_name}, #{op.arguments[1..-2]})\n", indent)
262
276
  else
263
- code << " _tmp = @_grammar_#{op.grammar_name}.external_invoke(self, :#{method_name op.rule_name})\n"
277
+ code << indentify("_tmp = @_grammar_#{op.grammar_name}.external_invoke(self, :#{method_name op.rule_name})\n", indent)
264
278
  end
265
279
  when Tag
266
280
  if op.tag_name and !op.tag_name.empty?
267
- output_op code, op.op
268
- code << " #{op.tag_name} = @result\n"
281
+ output_op code, op.op, indent
282
+ code << indentify("#{op.tag_name} = @result\n", indent)
269
283
  else
270
- output_op code, op.op
284
+ output_op code, op.op, indent
271
285
  end
272
286
  when Action
273
- code << " @result = begin; "
287
+ code << indentify("@result = begin; ", indent)
274
288
  code << op.action << "; end\n"
275
289
  if @debug
276
- code << " puts \" => \" #{op.action.dump} \" => \#{@result.inspect} \\n\"\n"
290
+ code << indentify("puts \" => \" #{op.action.dump} \" => \#{@result.inspect} \\n\"\n", indent)
277
291
  end
278
- code << " _tmp = true\n"
292
+ code << indentify("_tmp = true\n", indent)
279
293
  when Collect
280
- code << " _text_start = self.pos\n"
281
- output_op code, op.op
282
- code << " if _tmp\n"
283
- code << " text = get_text(_text_start)\n"
284
- code << " end\n"
294
+ code << indentify("_text_start = self.pos\n", indent)
295
+ output_op code, op.op, indent
296
+ code << indentify("if _tmp\n", indent)
297
+ code << indentify(" text = get_text(_text_start)\n", indent)
298
+ code << indentify("end\n", indent)
285
299
  else
286
300
  raise "Unknown op - #{op.class}"
287
301
  end
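The pattern running through this hunk is that each nested operation receives `indent+1` (or `indent+2`), so the generated Ruby is indented by nesting depth rather than by hand-counted spaces. A small sketch of the idea follows. It assumes the indent unit is two spaces, which is what the "4 spaces (indent=2)" comment implies, and `emit_loop` is a hypothetical miniature of `output_op`, not kpeg code.

```ruby
# Copy of the helper from this diff, assuming a two-space unit so that
# indent=2 yields the four spaces mentioned in the comment.
def indentify(code, indent)
  "#{"  " * indent}#{code}"
end

# Hypothetical miniature of output_op: emit a loop whose body sits one
# level deeper than its header.
def emit_loop(code, body, indent = 2)
  code << indentify("while true\n", indent)
  code << indentify(body, indent + 1)
  code << indentify("  break unless _tmp\n", indent)
  code << indentify("end\n", indent)
end

code = String.new
emit_loop(code, "_tmp = get_byte\n")
puts code
```

The body line and the `break` line end up at the same depth: one through `indent + 1`, the other through two literal spaces inside the string, which are the two styles the hunk mixes.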
@@ -339,13 +353,6 @@ module KPeg
339
353
  code << " @_grammar_#{name} = #{gram}.new(nil)\n"
340
354
  end
341
355
  code << " end\n"
342
-
343
- @grammar.foreign_grammars.each do |name, gram|
344
- code << "\n"
345
- code << " def invoke_#{name}(*args)\n"
346
- code << " @_grammar_#{name}.external_invoke(self, :_root, *args)\n"
347
- code << " end\n"
348
- end
349
356
  end
350
357
 
351
358
  render = GrammarRenderer.new(@grammar)