kpeg 0.8.0 → 0.8.1
- data/README.md +95 -9
- data/Rakefile +1 -0
- data/kpeg.gemspec +1 -1
- data/lib/kpeg/code_generator.rb +124 -117
- data/lib/kpeg/format_parser.rb +198 -25
- data/lib/kpeg/grammar.rb +20 -3
- data/lib/kpeg/version.rb +1 -1
- data/test/test_kpeg_code_generator.rb +182 -155
- data/test/test_kpeg_format.rb +25 -1
- data/test/test_kpeg_grammar_renderer.rb +12 -0
- metadata +4 -5
- data/LICENSE +0 -25
data/README.md
CHANGED
@@ -23,16 +23,20 @@ After that a block of ruby code can be defined that will be added into the class
|
|
23
23
|
|
24
24
|
### Defining literals
|
25
25
|
|
26
|
-
Literals are static declarations of characters or regular expressions designed for reuse in the grammar. These can be constants or variables.
|
26
|
+
Literals are static declarations of characters or regular expressions designed for reuse in the grammar. These can be constants or variables. Literals can take strings, regular expressions, or character ranges.
|
27
27
|
|
28
28
|
ALPHA = /[A-Za-z]/
|
29
29
|
DIGIT = /[0-9]/
|
30
30
|
period = "."
|
31
|
+
string = "a string"
|
32
|
+
regex = /(regexs?)+/
|
33
|
+
char_range = [b-t]
|
31
34
|
|
32
35
|
Literals can also accept multiple definitions
|
33
36
|
|
34
37
|
vowel = "a" | "e" | "i" | "o" | "u"
|
35
38
|
alpha = /[A-Z]/ | /[a-z]/
|
39
|
+
|
36
40
|
|
37
41
|
### Defining Rules for Values
|
38
42
|
|
@@ -41,29 +45,111 @@ Before you can start parsing a string you will need to define rules that you wil
|
|
41
45
|
The most basic of these rules is a string capture
|
42
46
|
|
43
47
|
alpha = < /[A-Za-z]/ > { text }
|
48
|
+
|
44
49
|
|
45
50
|
While this looks very much like the ALPHA literal defined above, it differs in one important way: the text captured between the < and > symbols is set as the text variable in the block that follows. You can also explicitly name the variable you would like to use, but only with existing rules or literals.
|
46
51
|
|
52
|
+
letter = alpha:a { a }
|
53
|
+
|
54
|
+
Additionally, blocks can return true or false values based upon an expression within the block. To return true if a test passes, do the following:
|
55
|
+
|
56
|
+
match_greater_than_10 = < num:n > &{ n > 10 }
|
57
|
+
|
58
|
+
To return false when the test passes, do the following:
|
59
|
+
|
60
|
+
do_not_match_greater_than_10 = < num:n > !{ n > 10 }
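The two predicate forms above can be pictured in plain Ruby. This is a minimal sketch, not kpeg's generated code: `and_predicate` and `not_predicate` are hypothetical helper names, and the only behavior assumed is what the README describes (`&{ ... }` succeeds when the block is truthy, `!{ ... }` succeeds when it is falsy).

```ruby
# Hypothetical helpers that mimic the semantic predicates described above.

# &{ ... } succeeds (true) when the block returns a truthy value.
def and_predicate
  result = yield
  result ? true : nil
end

# !{ ... } inverts the result: it succeeds when the block is falsy.
def not_predicate
  result = yield
  result ? nil : true
end

n = 12
puts and_predicate { n > 10 }.inspect  # prints true: the test passed
puts not_predicate { n > 10 }.inspect  # prints nil: the test passed, so no match
```

Returning `nil` rather than `false` on failure follows the convention visible in the generated code in this diff, where a failed match sets `_tmp = nil`.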
|
61
|
+
|
62
|
+
Rules can also act like functions and take parameters. An example of this is lifted from the [Email Address Validator](https://github.com/larb/email_address_validator), where an ASCII value is passed in and the character is evaluated against it, returning true if it matches.
|
63
|
+
|
64
|
+
d(num) = <.> &{ text[0] == num }
|
65
|
+
|
66
|
+
Rules support some regular expression syntax for matching
|
67
|
+
|
68
|
+
+ maybe ?
|
69
|
+
+ many +
|
70
|
+
+ kleene *
|
71
|
+
+ groupings ()
|
72
|
+
|
73
|
+
Examples
|
74
|
+
|
75
|
+
letters = alpha+
|
76
|
+
words = alpha+ space* period?
|
77
|
+
sentence = (letters+ | space+)+
|
78
|
+
|
79
|
+
Kpeg also allows a rule to define the acceptable number of matches in the form of a range. In regular expressions this is often denoted with syntax like {0,3}. Kpeg expresses match ranges with square brackets instead: [min, max].
|
80
|
+
|
81
|
+
matches_3_to_5_times = letter[3,5]
|
82
|
+
matches_3_to_any_times = letter[3,*]
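The `[min, max]` ranges above amount to a counted loop. The following is a minimal sketch of that idea using Ruby's standard `StringScanner`. `match_range` is a hypothetical helper, not part of kpeg, and passing `nil` for `max` is an assumed stand-in for the `*` upper bound.

```ruby
require "strscan"

# Hypothetical helper: apply `pattern` between min and max times.
# max = nil stands in for the `*` upper bound (no limit).
# Returns true on success; on failure rewinds the scanner and returns nil.
def match_range(scanner, pattern, min, max = nil)
  start = scanner.pos
  count = 0
  while scanner.scan(pattern)
    count += 1
    break if count == max
  end
  if count >= min
    true
  else
    scanner.pos = start
    nil
  end
end

letter = /[A-Za-z]/

s = StringScanner.new("abcdefg")
match_range(s, letter, 3, 5)
puts s.rest                              # prints fg: five letters were consumed

s = StringScanner.new("ab1")
puts match_range(s, letter, 3).inspect   # prints nil: only two letters available
```

Rewinding on failure keeps the scanner where it started, so an enclosing rule can try another alternative.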
|
83
|
+
|
84
|
+
|
85
|
+
### Defining Actions
|
86
|
+
|
87
|
+
As some of the examples above illustrate, kpeg lets you perform actions upon a match. The actions are described in the block provided or in the rule definition itself.
|
88
|
+
|
47
89
|
num = /[1-9][0-9]*/
|
48
90
|
sum = < num:n1 "+" num:n2 > { n1 + n2 }
|
91
|
+
|
92
|
+
### Referencing an external grammar
|
93
|
+
|
94
|
+
Kpeg allows you to run a rule that is defined in an external grammar. This is useful if there is a defined set of rules that you would like to reuse in another parser. To do this, create your grammar and generate a parser using the kpeg command line tool.
|
95
|
+
|
96
|
+
kpeg literals.kpeg
|
97
|
+
|
98
|
+
Once you have the generated parser, include that file into your new grammar
|
99
|
+
|
100
|
+
%{
|
101
|
+
require "literals.kpeg.rb"
|
102
|
+
}
|
49
103
|
|
50
|
-
|
104
|
+
Then create a variable to hold the foreign interface and pass it the class name of your parser. In this case my parser class name is Literal.
|
105
|
+
|
106
|
+
%foreign_grammer = Literal
|
51
107
|
|
52
|
-
|
108
|
+
You can then use rules defined in the foreign grammar in the local grammar file like so
|
109
|
+
|
110
|
+
sentence = (%foreign_grammer.alpha %foreign_grammer.space*)+ %foreign_grammer.period
|
111
|
+
|
112
|
+
### Comments
|
113
|
+
|
114
|
+
Kpeg allows comments to be added to the grammar file by using the # symbol
|
115
|
+
|
116
|
+
# This is a comment in my grammar
|
53
117
|
|
54
|
-
|
118
|
+
## Generating and running your parser
|
119
|
+
|
120
|
+
Before you can generate your parser you will need to define a root rule. This will be the first rule run against the string provided to the parser
|
55
121
|
|
56
|
-
|
122
|
+
root = sentence
|
57
123
|
|
58
|
-
|
124
|
+
To generate the parser, run the kpeg command with the kpeg file(s) as an argument. This will generate a Ruby file named after your grammar file with .rb appended (for example, example.kpeg.rb).
|
125
|
+
|
126
|
+
kpeg example.kpeg
|
59
127
|
|
60
|
-
|
128
|
+
Include your generated parser file in the application that will use the parser and run it. Create a new instance of the parser and pass in the string you want to evaluate. When parse is called on the parser instance, it returns true if the string is matched, or false if it is not.
|
129
|
+
|
130
|
+
require "example.kpeg.rb"
|
61
131
|
|
132
|
+
parser = Example::Parser.new(string_to_evaluate)
|
133
|
+
parser.parse
|
134
|
+
|
135
|
+
## Shortcuts and other techniques
|
136
|
+
|
137
|
+
Per vito, you can get the current line or current column in the following way
|
62
138
|
|
139
|
+
line = { current_line }
|
140
|
+
column = { current_column }
|
141
|
+
foo = line:line ... { # use line here }
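To make the idea of a "current line" and "current column" concrete, here is a minimal sketch of how both could be derived from an offset into the input. `line_at` and `column_at` are hypothetical helpers written for illustration. They are not kpeg's `current_line` or `current_column`, which may be implemented differently.

```ruby
# Hypothetical helpers, not kpeg's implementation: derive a 1-based line
# and column from a character offset into the input string.
def line_at(string, pos)
  string[0, pos].count("\n") + 1
end

def column_at(string, pos)
  # Find the newline that ends the previous line, if there is one.
  last_newline = pos.zero? ? nil : string.rindex("\n", pos - 1)
  last_newline ? pos - last_newline : pos + 1
end

input = "ab\ncd"
puts line_at(input, 4)    # prints 2: offset 4 is the "d" on the second line
puts column_at(input, 4)  # prints 2: "d" is the second character of its line
```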
|
142
|
+
|
143
|
+
## Examples
|
144
|
+
|
145
|
+
There are several examples available in the /examples directory. The upper parser has a readme with a step by step description of the grammar.
|
63
146
|
|
64
147
|
## Projects using kpeg
|
65
148
|
|
66
149
|
[Dang](https://github.com/veganstraightedge/dang)
|
67
|
-
|
150
|
+
|
151
|
+
[Email Address Validator](https://github.com/larb/email_address_validator)
|
152
|
+
|
68
153
|
[Callisto](https://github.com/dwaite/Callisto)
|
69
|
-
|
154
|
+
|
155
|
+
[Doodle](https://github.com/vito/doodle)
|
data/Rakefile
CHANGED
data/kpeg.gemspec
CHANGED
@@ -15,7 +15,7 @@ Gem::Specification.new do |s|
|
|
15
15
|
rb = Dir["lib/**/*.rb"] << "bin/kpeg"
|
16
16
|
docs = Dir["doc/**/*"]
|
17
17
|
|
18
|
-
s.files = rb + docs + ["
|
18
|
+
s.files = rb + docs + ["README.md", "Rakefile", "kpeg.gemspec", "Gemfile"]
|
19
19
|
s.test_files = Dir["test/**/*.rb"]
|
20
20
|
s.bindir = "bin"
|
21
21
|
s.executables = ["kpeg"]
|
data/lib/kpeg/code_generator.rb
CHANGED
@@ -63,7 +63,11 @@ module KPeg
|
|
63
63
|
|
64
64
|
methods = []
|
65
65
|
|
66
|
-
@grammar.variables.
|
66
|
+
vars = @grammar.variables.keys.sort
|
67
|
+
|
68
|
+
vars.each do |name|
|
69
|
+
val = @grammar.variables[name]
|
70
|
+
|
67
71
|
if val.index("ast ") == 0
|
68
72
|
unless output_node
|
69
73
|
code << "\n"
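The hunk above switches to iterating over the sorted variable names, presumably so that generated code comes out in a stable order regardless of how the hash was populated. A minimal sketch of that pattern follows. The `variables` hash is a hypothetical stand-in for `@grammar.variables`, and its entries are made up for illustration.

```ruby
# Hypothetical stand-in for @grammar.variables (directive name => value).
variables = {
  "simple"  => "ast Simple()",
  "name"    => "Example",
  "bracket" => "ast Bracket(receiver, argument)",
}

# Visit the keys in sorted order so the result does not depend on
# the order in which entries were added to the hash.
ast_names = []
variables.keys.sort.each do |name|
  val = variables[name]
  ast_names << name if val.index("ast ") == 0
end

puts ast_names.inspect  # prints ["bracket", "simple"]
```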
|
@@ -86,22 +90,32 @@ module KPeg
|
|
86
90
|
end
|
87
91
|
end
|
88
92
|
end
|
89
|
-
|
90
|
-
def
|
93
|
+
|
94
|
+
def indentify(code, indent)
|
95
|
+
"#{"  " * indent}#{code}"
|
96
|
+
end
|
97
|
+
|
98
|
+
# Default indent is 4 spaces (indent=2)
|
99
|
+
def output_op(code, op, indent=2)
|
91
100
|
case op
|
92
101
|
when Dot
|
93
|
-
code << "
|
102
|
+
code << indentify("_tmp = get_byte\n", indent)
|
94
103
|
when LiteralString
|
95
|
-
code << "
|
104
|
+
code << indentify("_tmp = match_string(#{op.string.dump})\n", indent)
|
96
105
|
when LiteralRegexp
|
97
|
-
|
98
|
-
|
106
|
+
if op.regexp.respond_to?(:kcode)
|
107
|
+
lang = op.regexp.kcode.to_s[0,1]
|
108
|
+
else
|
109
|
+
# Let default ruby string handling figure it out
|
110
|
+
lang = ""
|
111
|
+
end
|
112
|
+
code << indentify("_tmp = scan(/\\A#{op.regexp}/#{lang})\n", indent)
|
99
113
|
when CharRange
|
100
114
|
ss = save()
|
101
115
|
if op.start.bytesize == 1 and op.fin.bytesize == 1
|
102
|
-
code << "
|
103
|
-
code << "
|
104
|
-
code << "
|
116
|
+
code << indentify("#{ss} = self.pos\n", indent)
|
117
|
+
code << indentify("_tmp = get_byte\n", indent)
|
118
|
+
code << indentify("if _tmp\n", indent)
|
105
119
|
|
106
120
|
if op.start.respond_to? :getbyte
|
107
121
|
left = op.start.getbyte 0
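The `indentify` helper introduced in this hunk can be tried in isolation. The sketch below assumes two spaces per indent level, which is what the "Default indent is 4 spaces (indent=2)" comment implies. The emitted lines are illustrative and are not copied from a real generated parser.

```ruby
# Assumes two spaces per indent level, as implied by the
# "Default indent is 4 spaces (indent=2)" comment in the diff.
def indentify(code, indent)
  "#{"  " * indent}#{code}"
end

code = String.new
code << indentify("while true\n", 2)
code << indentify("  break unless _tmp\n", 2)  # inner lines carry their own extra spaces
code << indentify("end\n", 2)
puts code
```

Nested operators receive `indent+1` in the diff, so each level of nesting shifts the emitted block two further columns to the right under this assumption.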
|
@@ -110,178 +124,178 @@ module KPeg
|
|
110
124
|
left = op.start[0]
|
111
125
|
right = op.fin[0]
|
112
126
|
end
|
113
|
-
|
114
|
-
code << "
|
115
|
-
code << "
|
116
|
-
code << "
|
117
|
-
code << "
|
118
|
-
code << "
|
127
|
+
|
128
|
+
code << indentify(" unless _tmp >= #{left} and _tmp <= #{right}\n", indent)
|
129
|
+
code << indentify(" self.pos = #{ss}\n", indent)
|
130
|
+
code << indentify(" _tmp = nil\n", indent)
|
131
|
+
code << indentify(" end\n", indent)
|
132
|
+
code << indentify("end\n", indent)
|
119
133
|
else
|
120
134
|
raise "Unsupported char range - #{op.inspect}"
|
121
135
|
end
|
122
136
|
when Choice
|
123
137
|
ss = save()
|
124
|
-
code << "\n
|
125
|
-
code << "
|
138
|
+
code << "\n"
|
139
|
+
code << indentify("#{ss} = self.pos\n", indent)
|
140
|
+
code << indentify("while true # choice\n", indent)
|
126
141
|
op.ops.each_with_index do |n,idx|
|
127
|
-
output_op code, n
|
128
|
-
|
129
|
-
code << "
|
130
|
-
code << "
|
142
|
+
output_op code, n, (indent+1)
|
143
|
+
|
144
|
+
code << indentify(" break if _tmp\n", indent)
|
145
|
+
code << indentify(" self.pos = #{ss}\n", indent)
|
131
146
|
if idx == op.ops.size - 1
|
132
|
-
code << "
|
147
|
+
code << indentify(" break\n", indent)
|
133
148
|
end
|
134
149
|
end
|
135
|
-
code << "
|
150
|
+
code << indentify("end # end choice\n\n", indent)
|
136
151
|
when Multiple
|
137
152
|
ss = save()
|
138
153
|
if op.min == 0 and op.max == 1
|
139
|
-
code << "
|
140
|
-
output_op code, op.op
|
154
|
+
code << indentify("#{ss} = self.pos\n", indent)
|
155
|
+
output_op code, op.op, indent
|
141
156
|
if op.save_values
|
142
|
-
code << "
|
157
|
+
code << indentify("@result = nil unless _tmp\n", indent)
|
143
158
|
end
|
144
|
-
code << "
|
145
|
-
code << "
|
146
|
-
code << "
|
147
|
-
code << "
|
159
|
+
code << indentify("unless _tmp\n", indent)
|
160
|
+
code << indentify(" _tmp = true\n", indent)
|
161
|
+
code << indentify(" self.pos = #{ss}\n", indent)
|
162
|
+
code << indentify("end\n", indent)
|
148
163
|
elsif op.min == 0 and !op.max
|
149
164
|
if op.save_values
|
150
|
-
code << "
|
165
|
+
code << indentify("_ary = []\n", indent)
|
151
166
|
end
|
152
167
|
|
153
|
-
code << "
|
154
|
-
output_op code, op.op
|
168
|
+
code << indentify("while true\n", indent)
|
169
|
+
output_op code, op.op, (indent+1)
|
155
170
|
if op.save_values
|
156
|
-
code << "
|
171
|
+
code << indentify(" _ary << @result if _tmp\n", indent)
|
157
172
|
end
|
158
|
-
code << "
|
159
|
-
code << "
|
160
|
-
code << "
|
173
|
+
code << indentify(" break unless _tmp\n", indent)
|
174
|
+
code << indentify("end\n", indent)
|
175
|
+
code << indentify("_tmp = true\n", indent)
|
161
176
|
|
162
177
|
if op.save_values
|
163
|
-
code << "
|
178
|
+
code << indentify("@result = _ary\n", indent)
|
164
179
|
end
|
165
180
|
|
166
181
|
elsif op.min == 1 and !op.max
|
167
|
-
code << "
|
182
|
+
code << indentify("#{ss} = self.pos\n", indent)
|
168
183
|
if op.save_values
|
169
|
-
code << "
|
184
|
+
code << indentify("_ary = []\n", indent)
|
170
185
|
end
|
171
|
-
output_op code, op.op
|
172
|
-
code << "
|
186
|
+
output_op code, op.op, indent
|
187
|
+
code << indentify("if _tmp\n", indent)
|
173
188
|
if op.save_values
|
174
|
-
code << "
|
189
|
+
code << indentify(" _ary << @result\n", indent)
|
175
190
|
end
|
176
|
-
code << "
|
177
|
-
code
|
178
|
-
output_op code, op.op
|
191
|
+
code << indentify(" while true\n", indent)
|
192
|
+
output_op code, op.op, (indent+2)
|
179
193
|
if op.save_values
|
180
|
-
code << "
|
194
|
+
code << indentify(" _ary << @result if _tmp\n", indent)
|
181
195
|
end
|
182
|
-
code << "
|
183
|
-
code << "
|
184
|
-
code << "
|
196
|
+
code << indentify(" break unless _tmp\n", indent)
|
197
|
+
code << indentify(" end\n", indent)
|
198
|
+
code << indentify(" _tmp = true\n", indent)
|
185
199
|
if op.save_values
|
186
|
-
code << "
|
200
|
+
code << indentify(" @result = _ary\n", indent)
|
187
201
|
end
|
188
|
-
code << "
|
189
|
-
code << "
|
190
|
-
code << "
|
202
|
+
code << indentify("else\n", indent)
|
203
|
+
code << indentify(" self.pos = #{ss}\n", indent)
|
204
|
+
code << indentify("end\n", indent)
|
191
205
|
else
|
192
|
-
code << "
|
193
|
-
code << "
|
194
|
-
code << "
|
195
|
-
code
|
196
|
-
|
197
|
-
code << "
|
198
|
-
code << "
|
199
|
-
code << "
|
200
|
-
code << "
|
201
|
-
code << "
|
202
|
-
code << "
|
203
|
-
code << "
|
204
|
-
code << "
|
205
|
-
code << "
|
206
|
-
code << "
|
207
|
-
code << "
|
208
|
-
code << "
|
209
|
-
code << " end\n"
|
206
|
+
code << indentify("#{ss} = self.pos\n", indent)
|
207
|
+
code << indentify("_count = 0\n", indent)
|
208
|
+
code << indentify("while true\n", indent)
|
209
|
+
output_op code, op.op, (indent+1)
|
210
|
+
code << indentify(" if _tmp\n", indent)
|
211
|
+
code << indentify(" _count += 1\n", indent)
|
212
|
+
code << indentify(" break if _count == #{op.max}\n", indent)
|
213
|
+
code << indentify(" else\n", indent)
|
214
|
+
code << indentify(" break\n", indent)
|
215
|
+
code << indentify(" end\n", indent)
|
216
|
+
code << indentify("end\n", indent)
|
217
|
+
code << indentify("if _count >= #{op.min}\n", indent)
|
218
|
+
code << indentify(" _tmp = true\n", indent)
|
219
|
+
code << indentify("else\n", indent)
|
220
|
+
code << indentify(" self.pos = #{ss}\n", indent)
|
221
|
+
code << indentify(" _tmp = nil\n", indent)
|
222
|
+
code << indentify("end\n", indent)
|
210
223
|
end
|
211
224
|
|
212
225
|
when Sequence
|
213
226
|
ss = save()
|
214
|
-
code << "\n
|
215
|
-
code << "
|
227
|
+
code << "\n"
|
228
|
+
code << indentify("#{ss} = self.pos\n", indent)
|
229
|
+
code << indentify("while true # sequence\n", indent)
|
216
230
|
op.ops.each_with_index do |n, idx|
|
217
|
-
output_op code, n
|
231
|
+
output_op code, n, (indent+1)
|
218
232
|
|
219
233
|
if idx == op.ops.size - 1
|
220
|
-
code << "
|
221
|
-
code << "
|
222
|
-
code << "
|
223
|
-
code << "
|
234
|
+
code << indentify(" unless _tmp\n", indent)
|
235
|
+
code << indentify(" self.pos = #{ss}\n", indent)
|
236
|
+
code << indentify(" end\n", indent)
|
237
|
+
code << indentify(" break\n", indent)
|
224
238
|
else
|
225
|
-
code << "
|
226
|
-
code << "
|
227
|
-
code << "
|
228
|
-
code << "
|
239
|
+
code << indentify(" unless _tmp\n", indent)
|
240
|
+
code << indentify(" self.pos = #{ss}\n", indent)
|
241
|
+
code << indentify(" break\n", indent)
|
242
|
+
code << indentify(" end\n", indent)
|
229
243
|
end
|
230
244
|
end
|
231
|
-
code << "
|
245
|
+
code << indentify("end # end sequence\n\n", indent)
|
232
246
|
when AndPredicate
|
233
247
|
ss = save()
|
234
|
-
code << "
|
248
|
+
code << indentify("#{ss} = self.pos\n", indent)
|
235
249
|
if op.op.kind_of? Action
|
236
|
-
code << "
|
250
|
+
code << indentify("_tmp = begin; #{op.op.action}; end\n", indent)
|
237
251
|
else
|
238
|
-
output_op code, op.op
|
252
|
+
output_op code, op.op, indent
|
239
253
|
end
|
240
|
-
code << "
|
254
|
+
code << indentify("self.pos = #{ss}\n", indent)
|
241
255
|
when NotPredicate
|
242
256
|
ss = save()
|
243
|
-
code << "
|
257
|
+
code << indentify("#{ss} = self.pos\n", indent)
|
244
258
|
if op.op.kind_of? Action
|
245
|
-
code << "
|
259
|
+
code << indentify("_tmp = begin; #{op.op.action}; end\n", indent)
|
246
260
|
else
|
247
|
-
output_op code, op.op
|
261
|
+
output_op code, op.op, indent
|
248
262
|
end
|
249
|
-
code << "
|
250
|
-
code << "
|
263
|
+
code << indentify("_tmp = _tmp ? nil : true\n", indent)
|
264
|
+
code << indentify("self.pos = #{ss}\n", indent)
|
251
265
|
when RuleReference
|
252
|
-
code << "
|
266
|
+
code << indentify("_tmp = apply(:#{method_name op.rule_name})\n", indent)
|
253
267
|
when InvokeRule
|
254
268
|
if op.arguments
|
255
|
-
code << "
|
269
|
+
code << indentify("_tmp = #{method_name op.rule_name}#{op.arguments}\n", indent)
|
256
270
|
else
|
257
|
-
code << "
|
271
|
+
code << indentify("_tmp = #{method_name op.rule_name}()\n", indent)
|
258
272
|
end
|
259
273
|
when ForeignInvokeRule
|
260
274
|
if op.arguments
|
261
|
-
code << "
|
275
|
+
code << indentify("_tmp = @_grammar_#{op.grammar_name}.external_invoke(self, :#{method_name op.rule_name}, #{op.arguments[1..-2]})\n", indent)
|
262
276
|
else
|
263
|
-
code << "
|
277
|
+
code << indentify("_tmp = @_grammar_#{op.grammar_name}.external_invoke(self, :#{method_name op.rule_name})\n", indent)
|
264
278
|
end
|
265
279
|
when Tag
|
266
280
|
if op.tag_name and !op.tag_name.empty?
|
267
|
-
output_op code, op.op
|
268
|
-
code << "
|
281
|
+
output_op code, op.op, indent
|
282
|
+
code << indentify("#{op.tag_name} = @result\n", indent)
|
269
283
|
else
|
270
|
-
output_op code, op.op
|
284
|
+
output_op code, op.op, indent
|
271
285
|
end
|
272
286
|
when Action
|
273
|
-
code << "
|
287
|
+
code << indentify("@result = begin; ", indent)
|
274
288
|
code << op.action << "; end\n"
|
275
289
|
if @debug
|
276
|
-
code << "
|
290
|
+
code << indentify("puts \" => \" #{op.action.dump} \" => \#{@result.inspect} \\n\"\n", indent)
|
277
291
|
end
|
278
|
-
code << "
|
292
|
+
code << indentify("_tmp = true\n", indent)
|
279
293
|
when Collect
|
280
|
-
code << "
|
281
|
-
output_op code, op.op
|
282
|
-
code << "
|
283
|
-
code << "
|
284
|
-
code << "
|
294
|
+
code << indentify("_text_start = self.pos\n", indent)
|
295
|
+
output_op code, op.op, indent
|
296
|
+
code << indentify("if _tmp\n", indent)
|
297
|
+
code << indentify(" text = get_text(_text_start)\n", indent)
|
298
|
+
code << indentify("end\n", indent)
|
285
299
|
else
|
286
300
|
raise "Unknown op - #{op.class}"
|
287
301
|
end
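The `Choice` branch above emits a `while true` loop that tries each alternative, breaks on the first success, and rewinds the position otherwise. The following hand-written imitation shows that control flow for the choice `"ab" | "a"`. It is a sketch under assumptions: `match_string` here is a stand-in built on Ruby's `StringScanner`, not the generated parser's own helper, and `match_ab_or_a` is a hypothetical name.

```ruby
require "strscan"

# Stand-in for the generated parser's string matcher: returns the string
# on success, nil on failure (the scanner does not move on failure).
def match_string(scanner, str)
  scanner.scan(/#{Regexp.escape(str)}/) ? str : nil
end

# Imitates the loop shape emitted for the choice  "ab" | "a".
def match_ab_or_a(scanner)
  _save = scanner.pos
  while true # choice
    _tmp = match_string(scanner, "ab")
    break if _tmp
    scanner.pos = _save
    _tmp = match_string(scanner, "a")
    break if _tmp
    scanner.pos = _save
    break
  end # end choice
  _tmp
end

s = StringScanner.new("ac")
puts match_ab_or_a(s).inspect  # prints "a": "ab" fails, then "a" matches
puts s.pos                     # prints 1
```

The `while true` wrapper exists only so that `break` can act as an early exit, which is why the last alternative is followed by an unconditional `break`.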
|
@@ -339,13 +353,6 @@ module KPeg
|
|
339
353
|
code << " @_grammar_#{name} = #{gram}.new(nil)\n"
|
340
354
|
end
|
341
355
|
code << " end\n"
|
342
|
-
|
343
|
-
@grammar.foreign_grammars.each do |name, gram|
|
344
|
-
code << "\n"
|
345
|
-
code << " def invoke_#{name}(*args)\n"
|
346
|
-
code << " @_grammar_#{name}.external_invoke(self, :_root, *args)\n"
|
347
|
-
code << " end\n"
|
348
|
-
end
|
349
356
|
end
|
350
357
|
|
351
358
|
render = GrammarRenderer.new(@grammar)
|