citrus 1.2.1 → 1.2.2
- data/README +296 -36
- data/citrus.gemspec +4 -3
- data/doc/background.rdoc +72 -0
- data/doc/example.rdoc +128 -0
- data/doc/index.rdoc +15 -0
- data/doc/license.rdoc +21 -0
- data/doc/links.rdoc +18 -0
- data/doc/syntax.rdoc +96 -0
- data/lib/citrus.rb +3 -3
- data/lib/citrus/debug.rb +1 -1
- metadata +10 -4
data/README
CHANGED
@@ -7,59 +7,319 @@
 
 Citrus is a compact and powerful parsing library for Ruby that combines the
 elegance and expressiveness of the language with the simplicity and power of
-parsing
+parsing expressions.
 
-Citrus grammars look very much like Treetop grammars but take a completely
-different approach. Instead of generating parsers from your grammars, Citrus
-evaluates grammars and rules in memory as Ruby modules. In fact, you can even
-define your grammars as Ruby modules in the first place, entirely skipping the
-parsing/evaluation step.
 
-
-sequences, choices, labels, repetition, and lookahead (both positive and
-negative) are all included, as well as character classes and the dot-matches-
-anything symbol.
+** Installation **
 
-To try it out, fire up an IRB session from the root of the project and run one
-of the examples.
 
-
-> require 'citrus'
-=> true
-> Citrus.load 'examples/calc'
-=> [Calc]
-> match = Calc.parse '1 + 5'
-=> #<Citrus::Match ...
-> match.value
-=> 6
+Via RubyGems:
 
-
-some better visualization of the match results.
+    $ sudo gem install citrus
 
-
-fairly easy to understand for anyone who is familiar with parsing expressions.
+From a local copy:
 
+    $ git clone git://github.com/mjijackson/citrus.git
+    $ cd citrus
+    $ rake package && sudo rake install
 
-** Links **
 
+** Background **
 
-http://pdos.csail.mit.edu/~baford/packrat/
-http://en.wikipedia.org/wiki/Parsing_expression_grammar
-http://treetop.rubyforge.org/index.html
 
+In order to be able to use Citrus effectively, you must first understand the
+difference between syntax and semantics. Syntax is a set of rules that govern
+the way letters and punctuation may be used in a language. For example, English
+syntax dictates that proper nouns should start with a capital letter and that
+sentences should end with a period.
 
-
+Semantics are the rules by which meaning may be derived in a language. For
+example, as you read a book you are able to make some sense of the particular
+way in which words on a page are combined to form thoughts and express ideas
+because you understand what the words themselves mean and you can understand
+what they mean collectively.
 
+Computers use a similar process when interpreting code. First, the code must be
+parsed into recognizable symbols or tokens. These tokens may then be passed to
+an interpreter which is responsible for forming actual instructions from them.
 
-
+Citrus is a pure Ruby library that allows you to perform both lexical analysis
+and semantic interpretation quickly and easily. Using Citrus you can write
+powerful parsers that are simple to understand and easy to create and maintain.
 
-
+In Citrus, there are three main types of objects: rules, grammars, and matches.
 
-
+== Rules
+
+A rule is an object that specifies some matching behavior on a string. There are
+two types of rules: terminals and non-terminals. Terminals can be either Ruby
+strings or regular expressions that specify some input to match. For example, a
+terminal created from the string "end" would match any sequence of the
+characters "e", "n", and "d", in that order. A terminal created from a regular
+expression uses Ruby's regular expression engine to attempt to create a match.
+
+Non-terminals are rules that may contain other rules but do not themselves match
+directly on the input. For example, a Repeat is a non-terminal that may contain
+one other rule that will try and match a certain number of times. Several other
+types of non-terminals are available that will be discussed later.
+
+Rule objects may also have semantic information associated with them in the form
+of Ruby modules. These modules contain methods that will be used to extend any
+match objects created by the rule with which they are associated.
+
+== Grammars
+
+A grammar is a container for rules. Usually the rules in a grammar collectively
+form a complete specification for some language, or a well-defined subset
+thereof.
+
+A Citrus grammar is really just a souped-up Ruby module. These modules may be
+included in other grammar modules in the same way that Ruby modules are normally
+used. This property allows you to divide a complex grammar into reusable pieces
+that may be combined dynamically at runtime. Any grammar rule with the same name
+as a rule in an included grammar may access that rule with a mechanism similar
+to Ruby's super keyword.
+
+== Matches
+
+Matches are created by rule objects when they match on the input. A match
+contains the string of text that made up the match as well as its offset in the
+original input string. During a parse, matches are arranged in a tree structure
+where any match may contain any number of other matches. This structure is
+determined by the way in which the rule that generated each match is used in the
+grammar.
+
+For example, a match that is created from a non-terminal rule that contains
+several other terminals will likewise contain several matches, one for each
+terminal.
+
+Match objects may be extended with semantic information in the form of methods.
+These methods can interpret the text of a match using the wealth of information
+available to them including the text of the match, its position in the input,
+and any submatches.
+
+
+** Syntax **
+
+
+The most straightforward way to compose a Citrus grammar is to use Citrus' own
+custom grammar syntax. This syntax borrows heavily from Ruby, so it should
+already be familiar to Ruby programmers.
+
+== Terminals
+
+Terminals may be represented by a string or a regular expression. Both follow
+the same rules as Ruby string and regular expression literals.
+
+    'abc'
+    "abc\n"
+    /\xFF/
+
+Character classes and the dot (match anything) symbol are supported as well for
+compatibility with other parsing expression implementations.
+
+    [a-z0-9]       # match any lowercase letter or digit
+    [\x00-\xFF]    # match any octet
+    .              # match anything, even new lines
+
+== Repetition
+
+Quantifiers may be used after any expression to specify a number of times it
+must match. The universal form of a quantifier is N*M where N is the minimum and
+M is the maximum number of times the expression may match.
+
+    'abc'1*2    # match "abc" a minimum of one, maximum
+                # of two times
+    'abc'1*     # match "abc" at least once
+    'abc'*2     # match "abc" a maximum of twice
+
+The + and ? operators are supported as well for the common cases of 1* and *1
+respectively.
+
+    'abc'+    # match "abc" at least once
+    'abc'?    # match "abc" a maximum of once
+
+== Lookahead
+
+Both positive and negative lookahead are supported in Citrus. Use the & and !
+operators to indicate that an expression either should or should not match. In
+neither case is any input consumed.
+
+    &'a' 'b'    # match a "b" preceded by an "a"
+    !'a' 'b'    # match a "b" that is not preceded by an "a"
+    !'a' .      # match any character except for "a"
+
+== Sequences
+
+Sequences of expressions may be separated by a space to indicate that the rules
+should match in that order.
+
+    'a' 'b' 'c'    # match "a", then "b", then "c"
+    'a' [0-9]      # match "a", then a numeric digit
+
+== Choices
+
+Ordered choice is indicated by a vertical bar that separates two expressions.
+Note that any operator binds more tightly than the bar.
 
-
-
-
+    'a' | 'b'        # match "a" or "b"
+    'a' 'b' | 'c'    # match "a" then "b" (in sequence), or "c"
+
+== Super
+
+When including a grammar inside another, all rules in the child that have the
+same name as a rule in the parent also have access to the super keyword to
+invoke the parent rule.
+
+== Labels
+
+Match objects may be referred to by a different name than the rule that
+originally generated them. Labels are created by placing the label and a colon
+immediately preceding any expression.
+
+    chars:/[a-z]+/    # the characters matched by the regular
+                      # expression may be referred to as "chars"
+                      # in a block method
+
+
+** Example **
+
+
+Below is an example of a simple grammar that is able to parse strings of
+integers separated by any amount of white space and a + symbol.
+
+    grammar Addition
+      rule additive
+        number plus (additive | number)
+      end
+
+      rule number
+        [0-9]+ space
+      end
+
+      rule plus
+        '+' space
+      end
+
+      rule space
+        [ \t]*
+      end
+    end
+
+Several things to note about the above example:
+
+* Grammar and rule declarations end with the "end" keyword
+* A Sequence of rules is created by separating expressions with a space
+* Likewise, ordered choice is represented with a vertical bar
+* Parentheses may be used to override the natural binding order
+* Rules may refer to other rules in their own definitions simply by using the
+  other rule's name
+* Any expression may be followed by a quantifier
+
+== Interpretation
+
+The grammar above is able to parse simple mathematical expressions such as "1+2"
+and "1 + 2+3", but it does not have enough semantic information to be able to
+actually interpret these expressions.
+
+At this point, when the grammar parses a string it generates a tree of Match
+objects. Each match is created by a rule. A match will know what text it
+contains, its offset in the original input, and what submatches it contains.
+
+Submatches are created whenever a rule contains another rule. For example, in
+the grammar above the number rule matches a string of digits followed by white
+space. Thus, a match generated by the number rule will contain two submatches.
+
+We can use Ruby's block syntax to create a module that will be attached to these
+matches when they are created and is used to lazily extend them when we want to
+interpret them. The following example shows one way to do this.
+
+    grammar Addition
+      rule additive
+        (number plus term) {
+          def value
+            number.value + term.value
+          end
+        }
+      end
+
+      rule term
+        (additive | number) {
+          def value
+            first.value
+          end
+        }
+      end
+
+      rule number
+        ([0-9]+ space) {
+          def value
+            text.strip.to_i
+          end
+        }
+      end
+
+      rule plus
+        '+' space
+      end
+
+      rule space
+        [ \t]*
+      end
+    end
+
+In this version of the grammar the additive rule has been refactored to use the
+term rule. This makes it a little cleaner to define our semantic blocks. It's
+easiest to explain what is going on here by starting with the lowest level
+block, which is defined within the number rule.
+
+The semantic block associated with the number rule defines one method, value.
+This method will be present on all matches that result from this rule. Inside
+this method, we can see that the value of a number match is determined to be
+its text value, stripped of white space and converted to an integer.
+
+Similarly, the block that is applied to term matches also defines a value
+method. However, this method works a bit differently. Since a term matches an
+additive or a number a term match will contain one submatch, the match that
+resulted from either additive or number. The first method retrieves the first
+submatch. So, the value of a term is determined to be the value of its first
+submatch.
+
+Finally, the additive rule also extends its matches with a value method. Here,
+the value of an additive is determined to be the values of its number and term
+matches added together using Ruby's addition operator.
+
+Since additive is the first rule defined in the grammar, any match that results
+from parsing a string with this grammar will have a value method that can be
+used to recursively calculate the collective value of the entire match tree.
+
+To give it a try, save the code for the Addition grammar in a file called
+addition.citrus. Next, assuming you have the Citrus gem installed, try the
+following sequence of commands in a terminal.
+
+    $ irb
+    > require 'citrus'
+    => true
+    > Citrus.load 'addition'
+    => [Addition]
+    > m = Addition.parse '1 + 2 + 3'
+    => #<Citrus::Match ...
+    > m.value
+    => 6
+
+Congratulations! You just ran your first piece of Citrus code.
+
+Take a look at examples/calc.citrus for an example of a calculator that is able
+to parse and evaluate more complex mathematical expressions.
+
+
+** Links **
+
+
+http://mjijackson.com/citrus
+http://pdos.csail.mit.edu/~baford/packrat/
+http://en.wikipedia.org/wiki/Parsing_expression_grammar
+http://treetop.rubyforge.org/index.html
 
 
 ** License **
data/citrus.gemspec
CHANGED
@@ -1,7 +1,7 @@
 Gem::Specification.new do |s|
   s.name = 'citrus'
-  s.version = '1.2.
-  s.date = '2010-06-
+  s.version = '1.2.2'
+  s.date = '2010-06-09'
 
   s.summary = 'Parsing Expressions for Ruby'
   s.description = 'Parsing Expressions for Ruby'
@@ -14,6 +14,7 @@ Gem::Specification.new do |s|
   s.files = Dir['benchmark/*.rb'] +
     Dir['benchmark/*.citrus'] +
     Dir['benchmark/*.gnuplot'] +
+    Dir['doc/**/*'] +
     Dir['examples/**/*'] +
     Dir['extras/**/*'] +
     Dir['lib/**/*.rb'] +
@@ -29,5 +30,5 @@ Gem::Specification.new do |s|
   s.rdoc_options = %w< --line-numbers --inline-source --title Citrus --main Citrus >
   s.extra_rdoc_files = %w< README >
 
-  s.homepage = 'http://
+  s.homepage = 'http://mjijackson.com/citrus'
 end
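Note on the added Dir['doc/**/*'] entry: the ** segment makes the glob recurse, so everything under doc/ is packaged, and the result includes directories as well as files. The following is a minimal sketch of that behavior against a throwaway temp directory; the file names are made up for illustration and are not the real gem tree.

```ruby
require 'tmpdir'
require 'fileutils'

# Build a small, hypothetical directory tree and list what 'doc/**/*' matches.
def glob_doc_tree
  Dir.mktmpdir do |root|
    FileUtils.mkdir_p(File.join(root, 'doc', 'api'))
    FileUtils.touch(File.join(root, 'doc', 'index.rdoc'))
    FileUtils.touch(File.join(root, 'doc', 'api', 'classes.html'))

    Dir.chdir(root) do
      # '**' recurses into subdirectories; directories themselves are
      # returned alongside regular files.
      Dir['doc/**/*'].sort
    end
  end
end

DOC_FILES = glob_doc_tree
puts DOC_FILES
```

The nested doc/api directory appears in the result next to the two files, which is why a gemspec built this way lists directories too.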
data/doc/background.rdoc
ADDED
@@ -0,0 +1,72 @@
+= Background
+
+In order to be able to use Citrus effectively, you must first understand the
+difference between syntax and semantics. Syntax is a set of rules that govern
+the way letters and punctuation may be used in a language. For example, English
+syntax dictates that proper nouns should start with a capital letter and that
+sentences should end with a period.
+
+Semantics are the rules by which meaning may be derived in a language. For
+example, as you read a book you are able to make some sense of the particular
+way in which words on a page are combined to form thoughts and express ideas
+because you understand what the words themselves mean and you can understand
+what they mean collectively.
+
+Computers use a similar process when interpreting code. First, the code must be
+parsed into recognizable symbols or tokens. These tokens may then be passed to
+an interpreter which is responsible for forming actual instructions from them.
+
+Citrus is a pure Ruby library that allows you to perform both lexical analysis
+and semantic interpretation quickly and easily. Using Citrus you can write
+powerful parsers that are simple to understand and easy to create and maintain.
+
+In Citrus, there are three main types of objects: rules, grammars, and matches.
+
+== Rules
+
+A Rule[link:api/classes/Citrus/Rule.html] is an object that specifies some matching behavior on a string. There are
+two types of rules: terminals and non-terminals. Terminals can be either Ruby
+strings or regular expressions that specify some input to match. For example, a
+terminal created from the string "end" would match any sequence of the
+characters "e", "n", and "d", in that order. A terminal created from a regular
+expression uses Ruby's regular expression engine to attempt to create a match.
+
+Non-terminals are rules that may contain other rules but do not themselves match
+directly on the input. For example, a Repeat is a non-terminal that may contain
+one other rule that will try and match a certain number of times. Several other
+types of non-terminals are available that will be discussed later.
+
+Rule objects may also have semantic information associated with them in the form
+of Ruby modules. These modules contain methods that will be used to extend any
+match objects created by the rule with which they are associated.
+
+== Grammars
+
+A Grammar[link:api/classes/Citrus/Grammar.html] is a container for rules. Usually the rules in a grammar collectively
+form a complete specification for some language, or a well-defined subset
+thereof.
+
+A Citrus grammar is really just a souped-up Ruby module. These modules may be
+included in other grammar modules in the same way that Ruby modules are normally
+used. This property allows you to divide a complex grammar into reusable pieces
+that may be combined dynamically at runtime. Any grammar rule with the same name
+as a rule in an included grammar may access that rule with a mechanism similar
+to Ruby's super keyword.
+
+== Matches
+
+Matches are created by rule objects when they match on the input. A Match[link:api/classes/Citrus/Match.html]
+contains the string of text that made up the match as well as its offset in the
+original input string. During a parse, matches are arranged in a tree structure
+where any match may contain any number of other matches. This structure is
+determined by the way in which the rule that generated each match is used in the
+grammar.
+
+For example, a match that is created from a non-terminal rule that contains
+several other terminals will likewise contain several matches, one for each
+terminal.
+
+Match objects may be extended with semantic information in the form of methods.
+These methods can interpret the text of a match using the wealth of information
+available to them including the text of the match, its position in the input,
+and any submatches.
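Note on the Grammars section: it compares grammar inclusion to ordinary Ruby module inclusion and to the super keyword. The plain-Ruby sketch below shows only that underlying Ruby mechanism. The module, class, and method names are hypothetical, and nothing here is Citrus API.

```ruby
# Hypothetical modules standing in for two grammars.
module BaseNumbers
  def number_pattern
    '[0-9]+'
  end
end

module SignedNumbers
  include BaseNumbers

  # Same name as the method in the included module; super reaches it.
  def number_pattern
    '-?' + super
  end
end

# A host class is needed only to call the module methods in this sketch.
class PatternHost
  include SignedNumbers
end

SIGNED_PATTERN = PatternHost.new.number_pattern
puts SIGNED_PATTERN
```

The including module overrides the definition while still reusing the original through super, which is the property the text attributes to same-named rules in included grammars.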
data/doc/example.rdoc
ADDED
@@ -0,0 +1,128 @@
+= Example
+
+Below is an example of a simple grammar that is able to parse strings of
+integers separated by any amount of white space and a <tt>+</tt> symbol.
+
+    grammar Addition
+      rule additive
+        number plus (additive | number)
+      end
+
+      rule number
+        [0-9]+ space
+      end
+
+      rule plus
+        '+' space
+      end
+
+      rule space
+        [ \t]*
+      end
+    end
+
+Several things to note about the above example:
+
+* Grammar and rule declarations end with the <tt>end</tt> keyword
+* A Sequence of rules is created by separating expressions with a space
+* Likewise, ordered choice is represented with a vertical bar
+* Parentheses may be used to override the natural binding order
+* Rules may refer to other rules in their own definitions simply by using the
+  other rule's name
+* Any expression may be followed by a quantifier
+
+== Interpretation
+
+The grammar above is able to parse simple mathematical expressions such as "1+2"
+and "1 + 2+3", but it does not have enough semantic information to be able to
+actually interpret these expressions.
+
+At this point, when the grammar parses a string it generates a tree of Match[link:api/classes/Citrus/Match.html]
+objects. Each match is created by a rule. A match will know what text it
+contains, its offset in the original input, and what submatches it contains.
+
+Submatches are created whenever a rule contains another rule. For example, in
+the grammar above the number rule matches a string of digits followed by white
+space. Thus, a match generated by the number rule will contain two submatches.
+
+We can use Ruby's block syntax to create a module that will be attached to these
+matches when they are created and is used to lazily extend them when we want to
+interpret them. The following example shows one way to do this.
+
+    grammar Addition
+      rule additive
+        (number plus term) {
+          def value
+            number.value + term.value
+          end
+        }
+      end
+
+      rule term
+        (additive | number) {
+          def value
+            first.value
+          end
+        }
+      end
+
+      rule number
+        ([0-9]+ space) {
+          def value
+            text.strip.to_i
+          end
+        }
+      end
+
+      rule plus
+        '+' space
+      end
+
+      rule space
+        [ \t]*
+      end
+    end
+
+In this version of the grammar the additive rule has been refactored to use the
+term rule. This makes it a little cleaner to define our semantic blocks. It's
+easiest to explain what is going on here by starting with the lowest level
+block, which is defined within the number rule.
+
+The semantic block associated with the number rule defines one method, value.
+This method will be present on all matches that result from this rule. Inside
+this method, we can see that the value of a number match is determined to be
+its text value, stripped of white space and converted to an integer.
+
+Similarly, the block that is applied to term matches also defines a value
+method. However, this method works a bit differently. Since a term matches an
+additive or a number a term match will contain one submatch, the match that
+resulted from either additive or number. The first method retrieves the first
+submatch. So, the value of a term is determined to be the value of its first
+submatch.
+
+Finally, the additive rule also extends its matches with a value method. Here,
+the value of an additive is determined to be the values of its number and term
+matches added together using Ruby's addition operator.
+
+Since additive is the first rule defined in the grammar, any match that results
+from parsing a string with this grammar will have a value method that can be
+used to recursively calculate the collective value of the entire match tree.
+
+To give it a try, save the code for the Addition grammar in a file called
+addition.citrus. Next, assuming you have the Citrus gem installed, try the
+following sequence of commands in a terminal.
+
+    $ irb
+    > require 'citrus'
+    => true
+    > Citrus.load 'addition'
+    => [Addition]
+    > m = Addition.parse '1 + 2 + 3'
+    => #<Citrus::Match ...
+    > m.value
+    => 6
+
+Congratulations! You just ran your first piece of Citrus code.
+
+Take a look at examples/calc.citrus[http://github.com/mjijackson/citrus/blob/master/examples/calc.citrus] for an example of a calculator that is able
+to parse and evaluate more complex mathematical expressions.
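Note on the Interpretation section: it describes semantic blocks as modules that extend match objects with methods such as value. A rough plain-Ruby sketch of that idea follows, using Object#extend. SketchMatch and the two modules are hypothetical stand-ins rather than the Citrus classes, the tree is built by hand instead of by a parser, and the extension happens eagerly rather than lazily.

```ruby
# Hypothetical stand-in for a match: its text plus any submatches.
SketchMatch = Struct.new(:text, :submatches)

# Semantics for a number: strip white space and convert to an integer.
module NumberValue
  def value
    text.strip.to_i
  end
end

# Semantics for an additive: sum the values of the operand submatches,
# skipping submatches (such as the plus sign) that carry no value.
module AdditiveValue
  def value
    submatches.select { |m| m.respond_to?(:value) }.sum(&:value)
  end
end

def number(text)
  SketchMatch.new(text, []).extend(NumberValue)
end

# Hand-built tree for "1 + 2 + 3": additive(1, +, additive(2, +, 3)).
plus  = SketchMatch.new('+ ', [])
inner = SketchMatch.new('2 + 3', [number('2 '), plus, number('3')]).extend(AdditiveValue)
TREE  = SketchMatch.new('1 + 2 + 3', [number('1 '), plus, inner]).extend(AdditiveValue)

puts TREE.value
```

Calling value on the root walks the tree recursively, which mirrors how the text describes the value method on the match returned by the first rule.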
data/doc/index.rdoc
ADDED
@@ -0,0 +1,15 @@
+Citrus is a compact and powerful parsing library for Ruby[http://ruby-lang.org/] that combines the
+elegance and expressiveness of the language with the simplicity and power of
+parsing expressions.
+
+= Installation
+
+Via RubyGems[http://rubygems.org/]:
+
+    $ sudo gem install citrus
+
+From a local copy:
+
+    $ git clone git://github.com/mjijackson/citrus.git
+    $ cd citrus
+    $ rake package && sudo rake install
data/doc/license.rdoc
ADDED
@@ -0,0 +1,21 @@
+= License
+
+Copyright 2010 Michael Jackson
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
data/doc/links.rdoc
ADDED
@@ -0,0 +1,18 @@
+= Links
+
+The primary resource for all things to do with parsing expressions can be found
+at MIT.
+
+http://pdos.csail.mit.edu/~baford/packrat
+
+A useful summary of parsing expression grammars can be found on Wikipedia as
+well.
+
+http://en.wikipedia.org/wiki/Parsing_expression_grammar
+
+Citrus draws inspiration from another Ruby library for writing parsing
+expression grammars, Treetop. While Citrus' syntax is similar to that of
+Treetop, it's not identical. The link is included here for those who may wish to
+explore an alternative implementation.
+
+http://treetop.rubyforge.org
data/doc/syntax.rdoc
ADDED
@@ -0,0 +1,96 @@
+= Syntax
+
+The most straightforward way to compose a Citrus grammar is to use Citrus' own
+custom grammar syntax. This syntax borrows heavily from Ruby, so it should
+already be familiar to Ruby programmers.
+
+== Terminals
+
+Terminals may be represented by a string or a regular expression. Both follow
+the same rules as Ruby string and regular expression literals.
+
+    'abc'
+    "abc\n"
+    /\xFF/
+
+Character classes and the dot (match anything) symbol are supported as well for
+compatibility with other parsing expression implementations.
+
+    [a-z0-9]       # match any lowercase letter or digit
+    [\x00-\xFF]    # match any octet
+    .              # match anything, even new lines
+
+See FixedWidth[link:api/classes/Citrus/FixedWidth.html] and
+Expression[link:api/classes/Citrus/Expression.html] for more information.
+
+== Repetition
+
+Quantifiers may be used after any expression to specify a number of times it
+must match. The universal form of a quantifier is N*M where N is the minimum and
+M is the maximum number of times the expression may match.
+
+    'abc'1*2    # match "abc" a minimum of one, maximum
+                # of two times
+    'abc'1*     # match "abc" at least once
+    'abc'*2     # match "abc" a maximum of twice
+
+The + and ? operators are supported as well for the common cases of 1* and *1
+respectively.
+
+    'abc'+    # match "abc" at least once
+    'abc'?    # match "abc" a maximum of once
+
+See Repeat[link:api/classes/Citrus/Repeat.html] for more information.
+
+== Lookahead
+
+Both positive and negative lookahead are supported in Citrus. Use the & and !
+operators to indicate that an expression either should or should not match. In
+neither case is any input consumed.
+
+    &'a' 'b'    # match a "b" preceded by an "a"
+    !'a' 'b'    # match a "b" that is not preceded by an "a"
+    !'a' .      # match any character except for "a"
+
+See AndPredicate[link:api/classes/Citrus/AndPredicate.html] and
+NotPredicate[link:api/classes/Citrus/NotPredicate.html] for more information.
+
+== Sequences
+
+Sequences of expressions may be separated by a space to indicate that the rules
+should match in that order.
+
+    'a' 'b' 'c'    # match "a", then "b", then "c"
+    'a' [0-9]      # match "a", then a numeric digit
+
+See Sequence[link:api/classes/Citrus/Sequence.html] for more information.
+
+== Choices
+
+Ordered choice is indicated by a vertical bar that separates two expressions.
+Note that any operator binds more tightly than the bar.
+
+    'a' | 'b'        # match "a" or "b"
+    'a' 'b' | 'c'    # match "a" then "b" (in sequence), or "c"
+
+See Choice[link:api/classes/Citrus/Choice.html] for more information.
+
+== Super
+
+When including a grammar inside another, all rules in the child that have the
+same name as a rule in the parent also have access to the super keyword to
+invoke the parent rule.
+
+See Super[link:api/classes/Citrus/Super.html] for more information.
+
+== Labels
+
+Match objects may be referred to by a different name than the rule that
+originally generated them. Labels are created by placing the label and a colon
+immediately preceding any expression.
+
+    chars:/[a-z]+/    # the characters matched by the regular
+                      # expression may be referred to as "chars"
+                      # in a block method
+
+See Label[link:api/classes/Citrus/Label.html] for more information.
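Note on the Lookahead examples in this file: because a predicate consumes no input, the expression that follows it is tried at the same position. Read that way, &'a' 'b' appears unable to match at all (the next character cannot be both "a" and "b"), and the "preceded by" wording in the comments looks inaccurate. The sketch below illustrates the positional behavior with Ruby's own regular expression lookahead as an analogy. It does not use Citrus, and the equivalence with Citrus predicates is an assumption based on standard parsing expression semantics.

```ruby
# Ruby regex lookahead used as an analogy for parsing expression predicates.
AND_THEN_B   = /\A(?=a)b/   # like &'a' 'b': needs "a" and "b" at the same spot
NOT_THEN_B   = /\A(?!a)b/   # like !'a' 'b': "b" is never "a", so this is just "b"
NOT_THEN_ANY = /\A(?!a)./m  # like !'a' . : any single character except "a"
A_BEFORE_B   = /\Aa(?=b)/   # like 'a' &'b': an "a" that is followed by "b"

LOOKAHEAD_RESULTS = {
  and_then_b:   %w[ab b a].map { |s| !!(s =~ AND_THEN_B) },
  not_then_b:   %w[ab b].map { |s| !!(s =~ NOT_THEN_B) },
  not_then_any: %w[a b].map { |s| !!(s =~ NOT_THEN_ANY) },
  a_before_b:   'ab'[A_BEFORE_B]
}

LOOKAHEAD_RESULTS.each { |name, result| puts "#{name}: #{result.inspect}" }
```

If the intent of the first two examples was "followed by", placing the predicate after the consumed expression, as in the last pattern, expresses it: only the "a" is consumed and the "b" is merely checked.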
data/lib/citrus.rb
CHANGED
@@ -1,10 +1,10 @@
 # Citrus is a compact and powerful parsing library for Ruby that combines the
 # elegance and expressiveness of the language with the simplicity and power of
-# parsing
+# parsing expressions.
 #
-# http://
+# http://mjijackson.com/citrus
 module Citrus
-  VERSION = [1, 2,
+  VERSION = [1, 2, 2]
 
   Infinity = 1.0 / 0
 
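Note on the Infinity context line: 1.0 / 0 evaluates to Ruby's floating-point infinity. A plausible use, and this is an assumption rather than something shown in the diff, is as the open upper bound for quantifiers such as 1*. The sketch below shows how such a bound could drive a simple repetition check; repeat_count and repeats? are hypothetical helpers, not Citrus methods.

```ruby
INFINITY = 1.0 / 0 # same expression as in the diff; equals Float::INFINITY

# Count how many consecutive times +str+ occurs at the start of +input+,
# stopping once +max+ occurrences have been counted.
def repeat_count(input, str, max = INFINITY)
  count = 0
  pos = 0
  while count < max && input[pos, str.length] == str
    count += 1
    pos += str.length
  end
  count
end

# True when the greedy count reaches the minimum (a sketch of N*M semantics).
def repeats?(input, str, min = 0, max = INFINITY)
  repeat_count(input, str, max) >= min
end

puts repeats?('abcabcabc', 'abc', 1)     # analogous to 'abc'1*
puts repeat_count('abcabcabc', 'abc', 2) # analogous to 'abc'*2
```

Using infinity as the default maximum means the loop condition needs no special case for "no upper limit", since any integer count compares as less than it.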
data/lib/citrus/debug.rb
CHANGED
@@ -7,7 +7,7 @@ module Citrus
     # inspecting a nested match. The +xml+ argument may be a Hash of
     # Builder::XmlMarkup options.
     def to_markup(xml={})
-      if xml
+      if Hash === xml
         opt = { :indent => 2 }.merge(xml)
         xml = Builder::XmlMarkup.new(opt)
         xml.instruct!
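Note on the change from "if xml" to "if Hash === xml": every object other than nil and false is truthy in Ruby, so the old test could not tell an options hash apart from any other argument. Presumably the method can also receive an already-constructed builder (for example when called on nested matches), though the diff does not show that. A minimal sketch of the distinction, with a hypothetical stand-in class in place of Builder::XmlMarkup:

```ruby
# Hypothetical stand-in for an XML builder object.
class SketchBuilder
  attr_reader :options

  def initialize(options = {})
    @options = options
  end
end

# Build a new builder only when given an options Hash; otherwise reuse the
# object that was passed in.
def builder_for(xml = {})
  if Hash === xml
    opt = { :indent => 2 }.merge(xml)
    xml = SketchBuilder.new(opt)
  end
  xml
end

existing = SketchBuilder.new({ :indent => 4 })
puts builder_for.options.inspect
puts builder_for(existing).equal?(existing)
```

With a plain truthiness test, the existing builder would have been passed to merge as if it were an options hash; the class check avoids that.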
metadata
CHANGED
@@ -5,8 +5,8 @@ version: !ruby/object:Gem::Version
   segments:
   - 1
   - 2
-  -
-  version: 1.2.
+  - 2
+  version: 1.2.2
 platform: ruby
 authors:
 - Michael Jackson
@@ -14,7 +14,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2010-06-
+date: 2010-06-09 00:00:00 -06:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
@@ -53,6 +53,12 @@ files:
 - benchmark/seqpar.rb
 - benchmark/seqpar.citrus
 - benchmark/seqpar.gnuplot
+- doc/background.rdoc
+- doc/example.rdoc
+- doc/index.rdoc
+- doc/license.rdoc
+- doc/links.rdoc
+- doc/syntax.rdoc
 - examples/calc.citrus
 - examples/calc.rb
 - examples/calc_sugar.rb
@@ -83,7 +89,7 @@ files:
 - Rakefile
 - README
 has_rdoc: true
-homepage: http://
+homepage: http://mjijackson.com/citrus
 licenses: []
 
 post_install_message: