kpeg 0.8.0 → 0.8.1
- data/README.md +95 -9
- data/Rakefile +1 -0
- data/kpeg.gemspec +1 -1
- data/lib/kpeg/code_generator.rb +124 -117
- data/lib/kpeg/format_parser.rb +198 -25
- data/lib/kpeg/grammar.rb +20 -3
- data/lib/kpeg/version.rb +1 -1
- data/test/test_kpeg_code_generator.rb +182 -155
- data/test/test_kpeg_format.rb +25 -1
- data/test/test_kpeg_grammar_renderer.rb +12 -0
- metadata +4 -5
- data/LICENSE +0 -25
data/README.md
CHANGED
@@ -23,16 +23,20 @@ After that a block of ruby code can be defined that will be added into the class
|
|
23
23
|
|
24
24
|
### Defining literals
|
25
25
|
|
26
|
-
Literals are static declarations of characters or regular expressions designed for reuse in the grammar. These can be constants or variables.
|
26
|
+
Literals are static declarations of characters or regular expressions designed for reuse in the grammar. These can be constants or variables. Literals can take strings, regular expressions or character ranges.
|
27
27
|
|
28
28
|
ALPHA = /[A-Za-z]/
|
29
29
|
DIGIT = /[0-9]/
|
30
30
|
period = "."
|
31
|
+
string = "a string"
|
32
|
+
regex = /(regexs?)+/
|
33
|
+
char_range = [b-t]
|
31
34
|
|
32
35
|
Literals can also accept multiple definitions
|
33
36
|
|
34
37
|
vowel = "a" | "e" | "i" | "o" | "u"
|
35
38
|
alpha = /[A-Z]/ | /[a-z]/
|
39
|
+
|
36
40
|
|
37
41
|
### Defining Rules for Values
|
38
42
|
|
@@ -41,29 +45,111 @@ Before you can start parsing a string you will need to define rules that you wil
|
|
41
45
|
The most basic of these rules is a string capture
|
42
46
|
|
43
47
|
alpha = < /[A-Za-z]/ > { text }
|
48
|
+
|
44
49
|
|
45
50
|
While this looks very much like the ALPHA literal defined above, it differs in one important way: the text captured by the rule defined between the < and > symbols will be set as the text variable in the block that follows. You can also explicitly define the variable that you would like, but only with existing rules or literals.
|
46
51
|
|
52
|
+
letter = alpha:a { a }
|
53
|
+
|
54
|
+
Additionally blocks can return true or false values based upon an expression within the block. To return true if a test passes do the following:
|
55
|
+
|
56
|
+
match_greater_than_10 = < num:n > &{ n > 10 }
|
57
|
+
|
58
|
+
To test and return a false value if the test passes do the following:
|
59
|
+
|
60
|
+
do_not_match_greater_than_10 = < num:n > !{ n > 10 }
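The two predicate forms differ only in how the block's value becomes a match result. The following is a minimal plain-Ruby sketch of that logic, assuming (as the generated code later in this diff suggests) that `&{ }` succeeds on a truthy block value and `!{ }` inverts it. `and_predicate` and `not_predicate` are hypothetical helper names, not part of kpeg, and the sketch normalizes the result to `true`/`nil` for readability.

```ruby
# Hypothetical helpers (not kpeg API) mimicking semantic predicates.

# &{ ... } succeeds when the block returns a truthy value.
def and_predicate
  result = yield
  result ? true : nil
end

# !{ ... } succeeds when the block returns a falsy value.
def not_predicate
  result = yield
  result ? nil : true
end

n = 12
puts(and_predicate { n > 10 }.inspect)
puts(not_predicate { n > 10 }.inspect)
```

Neither form consumes input; both only decide whether the surrounding rule is allowed to continue.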
|
61
|
+
|
62
|
+
Rules can also act like functions and take parameters. An example of this is lifted from the [Email Address Validator](https://github.com/larb/email_address_validator), where an ASCII value is passed in and the character is evaluated against it, returning true if it matches.
|
63
|
+
|
64
|
+
d(num) = <.> &{ text[0] == num }
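The comparison `text[0] == num` yields an integer byte only on Ruby 1.8. On Ruby 1.9 and later, `String#[]` with an integer index returns a one-character string, so the check would never equal a number. A small sketch of a version-independent comparison, assuming `num` is an ASCII code; `first_byte_matches?` is a hypothetical helper, not something the README or kpeg defines:

```ruby
# Hypothetical helper (not part of kpeg) showing the byte comparison
# that d(num) performs on the captured text.
def first_byte_matches?(text, num)
  # getbyte returns an Integer on Ruby 1.9 and later; text[0] would
  # return a one-character String there instead.
  text.getbyte(0) == num
end

puts first_byte_matches?("A", 65)   # "A" is ASCII 65
puts first_byte_matches?("B", 65)
```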
|
65
|
+
|
66
|
+
Rules support some regular expression syntax for matching
|
67
|
+
|
68
|
+
+ maybe ?
|
69
|
+
+ many
|
70
|
+
+ kleene *
|
71
|
+
+ groupings ()
|
72
|
+
|
73
|
+
Examples
|
74
|
+
|
75
|
+
letters = alpha+
|
76
|
+
words = alpha+ space* period?
|
77
|
+
sentence = (letters+ | space+)+
|
78
|
+
|
79
|
+
Kpeg also allows a rule to define the acceptable number of matches in the form of a range. In regular expressions this is often denoted with syntax like {0,3}. Kpeg uses this syntax to accomplish match ranges [min, max].
|
80
|
+
|
81
|
+
matches_3_to_5_times = letter[3,5]
|
82
|
+
matches_3_to_any_times = letter[3,*]
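The `[min, max]` form can be pictured as a counting loop that rewinds when too few matches are found. The sketch below uses Ruby's standard `StringScanner` and is only an illustration of the idea; `match_range` is a hypothetical helper, not kpeg API, and `max = nil` stands in for the `*` upper bound.

```ruby
require "strscan"

# Hypothetical helper (not kpeg API): match `pattern` between min and
# max times, where max may be nil for "no upper bound" (the [3,*] form).
def match_range(scanner, pattern, min, max = nil)
  start = scanner.pos
  count = 0
  while scanner.scan(pattern)
    count += 1
    break if max && count == max
  end
  return true if count >= min
  scanner.pos = start   # rewind on failure so no input is consumed
  nil
end

letter = /[A-Za-z]/
puts match_range(StringScanner.new("abcd1"), letter, 3, 5).inspect
puts match_range(StringScanner.new("ab1"), letter, 3, 5).inspect
```

The first call finds four letters, which is within the range; the second finds only two and leaves the scanner at its starting position.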
|
83
|
+
|
84
|
+
|
85
|
+
### Defining Actions
|
86
|
+
|
87
|
+
As illustrated in some of the examples above, kpeg allows you to perform actions based upon a match. Actions are described in the block provided or in the rule definition itself.
|
88
|
+
|
47
89
|
num = /[1-9][0-9]*/
|
48
90
|
sum = < num:n1 "+" num:n2 > { n1 + n2 }
|
91
|
+
|
92
|
+
### Referencing an external grammar
|
93
|
+
|
94
|
+
Kpeg allows you to run a rule that is defined in an external grammar. This is useful if there is a defined set of rules that you would like to reuse in another parser. To do this, create your grammar and generate a parser using the kpeg command line tool.
|
95
|
+
|
96
|
+
kpeg literals.kpeg
|
97
|
+
|
98
|
+
Once you have the generated parser, include that file into your new grammar
|
99
|
+
|
100
|
+
%{
|
101
|
+
require "literals.kpeg.rb"
|
102
|
+
}
|
49
103
|
|
50
|
-
|
104
|
+
Then create a variable to hold the foreign interface and pass it the class name of your parser. In this case my parser class name is Literal.
|
105
|
+
|
106
|
+
%foreign_grammer = Literal
|
51
107
|
|
52
|
-
|
108
|
+
You can then use rules defined in the foreign grammar in the local grammar file like so
|
109
|
+
|
110
|
+
sentence = (%foreign_grammer.alpha %foreign_grammer.space*)+ %foreign_grammer.period
|
111
|
+
|
112
|
+
### Comments
|
113
|
+
|
114
|
+
Kpeg allows comments to be added to the grammar file by using the # symbol
|
115
|
+
|
116
|
+
# This is a comment in my grammar
|
53
117
|
|
54
|
-
|
118
|
+
## Generating and running your parser
|
119
|
+
|
120
|
+
Before you can generate your parser you will need to define a root rule. This will be the first rule run against the string provided to the parser
|
55
121
|
|
56
|
-
|
122
|
+
root = sentence
|
57
123
|
|
58
|
-
|
124
|
+
To generate the parser run the kpeg command with the kpeg file(s) as an argument. This will generate a ruby file with the same name as your grammar file.
|
125
|
+
|
126
|
+
kpeg example.kpeg
|
59
127
|
|
60
|
-
|
128
|
+
Include your generated parser file in the application that you want to use the parser in and run it. Create a new instance of the parser and pass in the string you want to evaluate. When parse is called on the parser instance it will return true if the string is matched, or false if it isn't.
|
129
|
+
|
130
|
+
require "example.kpeg.rb"
|
61
131
|
|
132
|
+
parser = Example::Parser.new(string_to_evaluate)
|
133
|
+
parser.parse
|
134
|
+
|
135
|
+
## Shortcuts and other techniques
|
136
|
+
|
137
|
+
Per vito, you can get the current line or current column in the following way
|
62
138
|
|
139
|
+
line = { current_line }
|
140
|
+
column = { current_column }
|
141
|
+
foo = line:line ... { # use line here }
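For readers curious what a line or column lookup involves, here is one way such values could be derived from an offset into the input. This is a sketch under stated assumptions, not kpeg's implementation: `line_at` and `column_at` are hypothetical names, and both are assumed to be 1-based.

```ruby
# Hypothetical helpers (not kpeg's implementation): derive a 1-based
# line and column from a character offset into the input string.
def line_at(string, pos)
  string[0, pos].count("\n") + 1
end

def column_at(string, pos)
  # Find the newline that ends the previous line, if there is one.
  last_newline = string.rindex("\n", pos - 1) if pos > 0
  last_newline ? pos - last_newline : pos + 1
end

input = "ab\ncd"
puts line_at(input, 4)
puts column_at(input, 4)
```

Offset 4 points at the `d`, which sits on the second line in the second column.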
|
142
|
+
|
143
|
+
## Examples
|
144
|
+
|
145
|
+
There are several examples available in the /examples directory. The upper parser has a readme with a step by step description of the grammar.
|
63
146
|
|
64
147
|
## Projects using kpeg
|
65
148
|
|
66
149
|
[Dang](https://github.com/veganstraightedge/dang)
|
67
|
-
|
150
|
+
|
151
|
+
[Email Address Validator](https://github.com/larb/email_address_validator)
|
152
|
+
|
68
153
|
[Callisto](https://github.com/dwaite/Callisto)
|
69
|
-
|
154
|
+
|
155
|
+
[Doodle](https://github.com/vito/doodle)
|
data/Rakefile
CHANGED
data/kpeg.gemspec
CHANGED
@@ -15,7 +15,7 @@ Gem::Specification.new do |s|
|
|
15
15
|
rb = Dir["lib/**/*.rb"] << "bin/kpeg"
|
16
16
|
docs = Dir["doc/**/*"]
|
17
17
|
|
18
|
-
s.files = rb + docs + ["
|
18
|
+
s.files = rb + docs + ["README.md", "Rakefile", "kpeg.gemspec", "Gemfile"]
|
19
19
|
s.test_files = Dir["test/**/*.rb"]
|
20
20
|
s.bindir = "bin"
|
21
21
|
s.executables = ["kpeg"]
|
data/lib/kpeg/code_generator.rb
CHANGED
@@ -63,7 +63,11 @@ module KPeg
|
|
63
63
|
|
64
64
|
methods = []
|
65
65
|
|
66
|
-
@grammar.variables.
|
66
|
+
vars = @grammar.variables.keys.sort
|
67
|
+
|
68
|
+
vars.each do |name|
|
69
|
+
val = @grammar.variables[name]
|
70
|
+
|
67
71
|
if val.index("ast ") == 0
|
68
72
|
unless output_node
|
69
73
|
code << "\n"
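The change in this hunk iterates the grammar variables by sorted key rather than in hash order, presumably so the generated output is the same on every run (Ruby 1.8 hashes have no defined order). A minimal sketch of the pattern, using a stand-in hash with hypothetical contents in place of `@grammar.variables`:

```ruby
# Stand-in for @grammar.variables (hypothetical contents).
variables = {
  "name"   => "Calc",
  "assign" => "ast Assignment(name, value)",
  "add"    => "ast Add(left, right)"
}

# Visit keys in sorted order so output does not depend on insertion order.
emitted = variables.keys.sort.map do |name|
  val = variables[name]
  "#{name} = #{val}"
end

puts emitted
```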
|
@@ -86,22 +90,32 @@ module KPeg
|
|
86
90
|
end
|
87
91
|
end
|
88
92
|
end
|
89
|
-
|
90
|
-
def
|
93
|
+
|
94
|
+
def indentify(code, indent)
|
95
|
+
"#{"  " * indent}#{code}"
|
96
|
+
end
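A standalone sketch of how a helper like this is used to build indented output. It assumes two spaces per indent level, which is what the comment about `indent=2` giving 4 spaces implies; the lines being emitted are illustrative only.

```ruby
# Sketch of the helper, assuming two spaces per indent level.
def indentify(code, indent)
  "#{"  " * indent}#{code}"
end

code = String.new
code << indentify("while true\n", 2)
code << indentify("  break if _tmp\n", 2)
code << indentify("end\n", 2)
puts code
```

Passing `indent + 1` when recursing into a nested operator is what lets each level of generated code sit one step further in.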
|
97
|
+
|
98
|
+
# Default indent is 4 spaces (indent=2)
|
99
|
+
def output_op(code, op, indent=2)
|
91
100
|
case op
|
92
101
|
when Dot
|
93
|
-
code << "
|
102
|
+
code << indentify("_tmp = get_byte\n", indent)
|
94
103
|
when LiteralString
|
95
|
-
code << "
|
104
|
+
code << indentify("_tmp = match_string(#{op.string.dump})\n", indent)
|
96
105
|
when LiteralRegexp
|
97
|
-
|
98
|
-
|
106
|
+
if op.regexp.respond_to?(:kcode)
|
107
|
+
lang = op.regexp.kcode.to_s[0,1]
|
108
|
+
else
|
109
|
+
# Let default ruby string handling figure it out
|
110
|
+
lang = ""
|
111
|
+
end
|
112
|
+
code << indentify("_tmp = scan(/\\A#{op.regexp}/#{lang})\n", indent)
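Two Ruby details are at work in this branch: `Regexp#kcode` exists only on Ruby 1.8, and interpolating a `Regexp` into a string calls `to_s`, which wraps the source in an option group. A short sketch of what the emitted text looks like, using a hypothetical pattern in place of `op.regexp`:

```ruby
regexp = /[0-9]+/   # hypothetical stand-in for op.regexp

# Regexp#kcode only exists on Ruby 1.8; elsewhere the flag stays empty.
lang = regexp.respond_to?(:kcode) ? regexp.kcode.to_s[0, 1] : ""

# Interpolating a Regexp calls to_s, which wraps the source in an
# option group such as (?-mix:...).
source = "/\\A#{regexp}/#{lang}"
puts source
```

On Ruby 1.9 or later this prints `/\A(?-mix:[0-9]+)/`, a pattern anchored at the current scan position.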
|
99
113
|
when CharRange
|
100
114
|
ss = save()
|
101
115
|
if op.start.bytesize == 1 and op.fin.bytesize == 1
|
102
|
-
code << "
|
103
|
-
code << "
|
104
|
-
code << "
|
116
|
+
code << indentify("#{ss} = self.pos\n", indent)
|
117
|
+
code << indentify("_tmp = get_byte\n", indent)
|
118
|
+
code << indentify("if _tmp\n", indent)
|
105
119
|
|
106
120
|
if op.start.respond_to? :getbyte
|
107
121
|
left = op.start.getbyte 0
|
@@ -110,178 +124,178 @@ module KPeg
|
|
110
124
|
left = op.start[0]
|
111
125
|
right = op.fin[0]
|
112
126
|
end
|
113
|
-
|
114
|
-
code << "
|
115
|
-
code << "
|
116
|
-
code << "
|
117
|
-
code << "
|
118
|
-
code << "
|
127
|
+
|
128
|
+
code << indentify(" unless _tmp >= #{left} and _tmp <= #{right}\n", indent)
|
129
|
+
code << indentify(" self.pos = #{ss}\n", indent)
|
130
|
+
code << indentify(" _tmp = nil\n", indent)
|
131
|
+
code << indentify(" end\n", indent)
|
132
|
+
code << indentify("end\n", indent)
|
119
133
|
else
|
120
134
|
raise "Unsupported char range - #{op.inspect}"
|
121
135
|
end
|
122
136
|
when Choice
|
123
137
|
ss = save()
|
124
|
-
code << "\n
|
125
|
-
code << "
|
138
|
+
code << "\n"
|
139
|
+
code << indentify("#{ss} = self.pos\n", indent)
|
140
|
+
code << indentify("while true # choice\n", indent)
|
126
141
|
op.ops.each_with_index do |n,idx|
|
127
|
-
output_op code, n
|
128
|
-
|
129
|
-
code << "
|
130
|
-
code << "
|
142
|
+
output_op code, n, (indent+1)
|
143
|
+
|
144
|
+
code << indentify(" break if _tmp\n", indent)
|
145
|
+
code << indentify(" self.pos = #{ss}\n", indent)
|
131
146
|
if idx == op.ops.size - 1
|
132
|
-
code << "
|
147
|
+
code << indentify(" break\n", indent)
|
133
148
|
end
|
134
149
|
end
|
135
|
-
code << "
|
150
|
+
code << indentify("end # end choice\n\n", indent)
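The loop emitted for a choice tries each alternative in order and rewinds the position after every failure. The sketch below hand-writes that shape for a three-way choice. `ChoiceSketch` is a hypothetical stand-in built on `StringScanner`, not kpeg's runtime, and `_save` takes the place of the variable name returned by `save()`.

```ruby
require "strscan"

# Hypothetical stand-in for a generated parser (not kpeg's runtime).
class ChoiceSketch
  def initialize(str)
    @scanner = StringScanner.new(str)
  end

  def pos
    @scanner.pos
  end

  def pos=(value)
    @scanner.pos = value
  end

  def match_string(str)
    @scanner.scan(Regexp.new(Regexp.escape(str))) ? str : nil
  end

  # Equivalent of the grammar rule: vowel = "a" | "e" | "i"
  def vowel
    _save = self.pos
    _tmp = nil
    while true # choice
      _tmp = match_string("a")
      break if _tmp
      self.pos = _save
      _tmp = match_string("e")
      break if _tmp
      self.pos = _save
      _tmp = match_string("i")
      break if _tmp
      self.pos = _save
      break
    end # end choice
    _tmp
  end
end

puts ChoiceSketch.new("e").vowel.inspect
puts ChoiceSketch.new("x").vowel.inspect
```

The `while true` wrapper never actually loops; it exists so that `break` can jump out as soon as one alternative succeeds.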
|
136
151
|
when Multiple
|
137
152
|
ss = save()
|
138
153
|
if op.min == 0 and op.max == 1
|
139
|
-
code << "
|
140
|
-
output_op code, op.op
|
154
|
+
code << indentify("#{ss} = self.pos\n", indent)
|
155
|
+
output_op code, op.op, indent
|
141
156
|
if op.save_values
|
142
|
-
code << "
|
157
|
+
code << indentify("@result = nil unless _tmp\n", indent)
|
143
158
|
end
|
144
|
-
code << "
|
145
|
-
code << "
|
146
|
-
code << "
|
147
|
-
code << "
|
159
|
+
code << indentify("unless _tmp\n", indent)
|
160
|
+
code << indentify(" _tmp = true\n", indent)
|
161
|
+
code << indentify(" self.pos = #{ss}\n", indent)
|
162
|
+
code << indentify("end\n", indent)
|
148
163
|
elsif op.min == 0 and !op.max
|
149
164
|
if op.save_values
|
150
|
-
code << "
|
165
|
+
code << indentify("_ary = []\n", indent)
|
151
166
|
end
|
152
167
|
|
153
|
-
code << "
|
154
|
-
output_op code, op.op
|
168
|
+
code << indentify("while true\n", indent)
|
169
|
+
output_op code, op.op, (indent+1)
|
155
170
|
if op.save_values
|
156
|
-
code << "
|
171
|
+
code << indentify(" _ary << @result if _tmp\n", indent)
|
157
172
|
end
|
158
|
-
code << "
|
159
|
-
code << "
|
160
|
-
code << "
|
173
|
+
code << indentify(" break unless _tmp\n", indent)
|
174
|
+
code << indentify("end\n", indent)
|
175
|
+
code << indentify("_tmp = true\n", indent)
|
161
176
|
|
162
177
|
if op.save_values
|
163
|
-
code << "
|
178
|
+
code << indentify("@result = _ary\n", indent)
|
164
179
|
end
|
165
180
|
|
166
181
|
elsif op.min == 1 and !op.max
|
167
|
-
code << "
|
182
|
+
code << indentify("#{ss} = self.pos\n", indent)
|
168
183
|
if op.save_values
|
169
|
-
code << "
|
184
|
+
code << indentify("_ary = []\n", indent)
|
170
185
|
end
|
171
|
-
output_op code, op.op
|
172
|
-
code << "
|
186
|
+
output_op code, op.op, indent
|
187
|
+
code << indentify("if _tmp\n", indent)
|
173
188
|
if op.save_values
|
174
|
-
code << "
|
189
|
+
code << indentify(" _ary << @result\n", indent)
|
175
190
|
end
|
176
|
-
code << "
|
177
|
-
code
|
178
|
-
output_op code, op.op
|
191
|
+
code << indentify(" while true\n", indent)
|
192
|
+
output_op code, op.op, (indent+2)
|
179
193
|
if op.save_values
|
180
|
-
code << "
|
194
|
+
code << indentify(" _ary << @result if _tmp\n", indent)
|
181
195
|
end
|
182
|
-
code << "
|
183
|
-
code << "
|
184
|
-
code << "
|
196
|
+
code << indentify(" break unless _tmp\n", indent)
|
197
|
+
code << indentify(" end\n", indent)
|
198
|
+
code << indentify(" _tmp = true\n", indent)
|
185
199
|
if op.save_values
|
186
|
-
code << "
|
200
|
+
code << indentify(" @result = _ary\n", indent)
|
187
201
|
end
|
188
|
-
code << "
|
189
|
-
code << "
|
190
|
-
code << "
|
202
|
+
code << indentify("else\n", indent)
|
203
|
+
code << indentify(" self.pos = #{ss}\n", indent)
|
204
|
+
code << indentify("end\n", indent)
|
191
205
|
else
|
192
|
-
code << "
|
193
|
-
code << "
|
194
|
-
code << "
|
195
|
-
code
|
196
|
-
|
197
|
-
code << "
|
198
|
-
code << "
|
199
|
-
code << "
|
200
|
-
code << "
|
201
|
-
code << "
|
202
|
-
code << "
|
203
|
-
code << "
|
204
|
-
code << "
|
205
|
-
code << "
|
206
|
-
code << "
|
207
|
-
code << "
|
208
|
-
code << "
|
209
|
-
code << " end\n"
|
206
|
+
code << indentify("#{ss} = self.pos\n", indent)
|
207
|
+
code << indentify("_count = 0\n", indent)
|
208
|
+
code << indentify("while true\n", indent)
|
209
|
+
output_op code, op.op, (indent+1)
|
210
|
+
code << indentify(" if _tmp\n", indent)
|
211
|
+
code << indentify(" _count += 1\n", indent)
|
212
|
+
code << indentify(" break if _count == #{op.max}\n", indent)
|
213
|
+
code << indentify(" else\n", indent)
|
214
|
+
code << indentify(" break\n", indent)
|
215
|
+
code << indentify(" end\n", indent)
|
216
|
+
code << indentify("end\n", indent)
|
217
|
+
code << indentify("if _count >= #{op.min}\n", indent)
|
218
|
+
code << indentify(" _tmp = true\n", indent)
|
219
|
+
code << indentify("else\n", indent)
|
220
|
+
code << indentify(" self.pos = #{ss}\n", indent)
|
221
|
+
code << indentify(" _tmp = nil\n", indent)
|
222
|
+
code << indentify("end\n", indent)
|
210
223
|
end
|
211
224
|
|
212
225
|
when Sequence
|
213
226
|
ss = save()
|
214
|
-
code << "\n
|
215
|
-
code << "
|
227
|
+
code << "\n"
|
228
|
+
code << indentify("#{ss} = self.pos\n", indent)
|
229
|
+
code << indentify("while true # sequence\n", indent)
|
216
230
|
op.ops.each_with_index do |n, idx|
|
217
|
-
output_op code, n
|
231
|
+
output_op code, n, (indent+1)
|
218
232
|
|
219
233
|
if idx == op.ops.size - 1
|
220
|
-
code << "
|
221
|
-
code << "
|
222
|
-
code << "
|
223
|
-
code << "
|
234
|
+
code << indentify(" unless _tmp\n", indent)
|
235
|
+
code << indentify(" self.pos = #{ss}\n", indent)
|
236
|
+
code << indentify(" end\n", indent)
|
237
|
+
code << indentify(" break\n", indent)
|
224
238
|
else
|
225
|
-
code << "
|
226
|
-
code << "
|
227
|
-
code << "
|
228
|
-
code << "
|
239
|
+
code << indentify(" unless _tmp\n", indent)
|
240
|
+
code << indentify(" self.pos = #{ss}\n", indent)
|
241
|
+
code << indentify(" break\n", indent)
|
242
|
+
code << indentify(" end\n", indent)
|
229
243
|
end
|
230
244
|
end
|
231
|
-
code << "
|
245
|
+
code << indentify("end # end sequence\n\n", indent)
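A sequence uses the same `while true` wrapper as a choice, but breaks out on the first failure instead of the first success. The sketch below hand-writes the emitted shape for a two-element sequence; `match_pair` is a hypothetical function over `StringScanner`, not kpeg's runtime.

```ruby
require "strscan"

# Hypothetical stand-in (not kpeg's runtime) for: pair = "a" "b"
def match_pair(scanner)
  _save = scanner.pos
  _tmp = nil
  while true # sequence
    _tmp = scanner.scan(/a/)
    unless _tmp
      scanner.pos = _save
      break
    end
    _tmp = scanner.scan(/b/)
    unless _tmp
      scanner.pos = _save
    end
    break
  end # end sequence
  _tmp
end

s = StringScanner.new("ac")
puts match_pair(s).inspect
puts s.pos
```

Here the first element matches and the second does not, so the result is `nil` and the position returns to where the sequence started.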
|
232
246
|
when AndPredicate
|
233
247
|
ss = save()
|
234
|
-
code << "
|
248
|
+
code << indentify("#{ss} = self.pos\n", indent)
|
235
249
|
if op.op.kind_of? Action
|
236
|
-
code << "
|
250
|
+
code << indentify("_tmp = begin; #{op.op.action}; end\n", indent)
|
237
251
|
else
|
238
|
-
output_op code, op.op
|
252
|
+
output_op code, op.op, indent
|
239
253
|
end
|
240
|
-
code << "
|
254
|
+
code << indentify("self.pos = #{ss}\n", indent)
|
241
255
|
when NotPredicate
|
242
256
|
ss = save()
|
243
|
-
code << "
|
257
|
+
code << indentify("#{ss} = self.pos\n", indent)
|
244
258
|
if op.op.kind_of? Action
|
245
|
-
code << "
|
259
|
+
code << indentify("_tmp = begin; #{op.op.action}; end\n", indent)
|
246
260
|
else
|
247
|
-
output_op code, op.op
|
261
|
+
output_op code, op.op, indent
|
248
262
|
end
|
249
|
-
code << "
|
250
|
-
code << "
|
263
|
+
code << indentify("_tmp = _tmp ? nil : true\n", indent)
|
264
|
+
code << indentify("self.pos = #{ss}\n", indent)
|
251
265
|
when RuleReference
|
252
|
-
code << "
|
266
|
+
code << indentify("_tmp = apply(:#{method_name op.rule_name})\n", indent)
|
253
267
|
when InvokeRule
|
254
268
|
if op.arguments
|
255
|
-
code << "
|
269
|
+
code << indentify("_tmp = #{method_name op.rule_name}#{op.arguments}\n", indent)
|
256
270
|
else
|
257
|
-
code << "
|
271
|
+
code << indentify("_tmp = #{method_name op.rule_name}()\n", indent)
|
258
272
|
end
|
259
273
|
when ForeignInvokeRule
|
260
274
|
if op.arguments
|
261
|
-
code << "
|
275
|
+
code << indentify("_tmp = @_grammar_#{op.grammar_name}.external_invoke(self, :#{method_name op.rule_name}, #{op.arguments[1..-2]})\n", indent)
|
262
276
|
else
|
263
|
-
code << "
|
277
|
+
code << indentify("_tmp = @_grammar_#{op.grammar_name}.external_invoke(self, :#{method_name op.rule_name})\n", indent)
|
264
278
|
end
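The `[1..-2]` slice in the foreign branch drops the outer parentheses from the argument text so it can follow the rule name inside `external_invoke(...)`. A small sketch of the two strings that result, assuming `method_name` prefixes rule names with an underscore; the rule name `d`, the grammar name `lit`, and the argument text are all hypothetical.

```ruby
arguments = "(1, name)"   # hypothetical argument text, parentheses included

# Local invocation appends the text as-is to the method name.
local_call = "_tmp = _d#{arguments}"

# Foreign invocation drops the outer parentheses so the arguments can
# follow the rule name inside external_invoke(...).
foreign_call = "_tmp = @_grammar_lit.external_invoke(self, :_d, #{arguments[1..-2]})"

puts local_call
puts foreign_call
```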
|
265
279
|
when Tag
|
266
280
|
if op.tag_name and !op.tag_name.empty?
|
267
|
-
output_op code, op.op
|
268
|
-
code << "
|
281
|
+
output_op code, op.op, indent
|
282
|
+
code << indentify("#{op.tag_name} = @result\n", indent)
|
269
283
|
else
|
270
|
-
output_op code, op.op
|
284
|
+
output_op code, op.op, indent
|
271
285
|
end
|
272
286
|
when Action
|
273
|
-
code << "
|
287
|
+
code << indentify("@result = begin; ", indent)
|
274
288
|
code << op.action << "; end\n"
|
275
289
|
if @debug
|
276
|
-
code << "
|
290
|
+
code << indentify("puts \" => \" #{op.action.dump} \" => \#{@result.inspect} \\n\"\n", indent)
|
277
291
|
end
|
278
|
-
code << "
|
292
|
+
code << indentify("_tmp = true\n", indent)
|
279
293
|
when Collect
|
280
|
-
code << "
|
281
|
-
output_op code, op.op
|
282
|
-
code << "
|
283
|
-
code << "
|
284
|
-
code << "
|
294
|
+
code << indentify("_text_start = self.pos\n", indent)
|
295
|
+
output_op code, op.op, indent
|
296
|
+
code << indentify("if _tmp\n", indent)
|
297
|
+
code << indentify(" text = get_text(_text_start)\n", indent)
|
298
|
+
code << indentify("end\n", indent)
|
285
299
|
else
|
286
300
|
raise "Unknown op - #{op.class}"
|
287
301
|
end
|
@@ -339,13 +353,6 @@ module KPeg
|
|
339
353
|
code << " @_grammar_#{name} = #{gram}.new(nil)\n"
|
340
354
|
end
|
341
355
|
code << " end\n"
|
342
|
-
|
343
|
-
@grammar.foreign_grammars.each do |name, gram|
|
344
|
-
code << "\n"
|
345
|
-
code << " def invoke_#{name}(*args)\n"
|
346
|
-
code << " @_grammar_#{name}.external_invoke(self, :_root, *args)\n"
|
347
|
-
code << " end\n"
|
348
|
-
end
|
349
356
|
end
|
350
357
|
|
351
358
|
render = GrammarRenderer.new(@grammar)
|