citrus 1.2.1 → 1.2.2

data/README CHANGED
@@ -7,59 +7,319 @@
7
7
 
8
8
  Citrus is a compact and powerful parsing library for Ruby that combines the
9
9
  elegance and expressiveness of the language with the simplicity and power of
10
- parsing expression grammars.
10
+ parsing expressions.
11
11
 
12
- Citrus grammars look very much like Treetop grammars but take a completely
13
- different approach. Instead of generating parsers from your grammars, Citrus
14
- evaluates grammars and rules in memory as Ruby modules. In fact, you can even
15
- define your grammars as Ruby modules in the first place, entirely skipping the
16
- parsing/evaluation step.
17
12
 
18
- Terminals are represented as either strings or regular expressions. Support for
19
- sequences, choices, labels, repetition, and lookahead (both positive and
20
- negative) are all included, as well as character classes and the dot-matches-
21
- anything symbol.
13
+ ** Installation **
22
14
 
23
- To try it out, fire up an IRB session from the root of the project and run one
24
- of the examples.
25
15
 
26
- $ irb -Ilib
27
- > require 'citrus'
28
- => true
29
- > Citrus.load 'examples/calc'
30
- => [Calc]
31
- > match = Calc.parse '1 + 5'
32
- => #<Citrus::Match ...
33
- > match.value
34
- => 6
16
+ Via RubyGems:
35
17
 
36
- Be sure to try requiring `citrus/debug' (instead of just `citrus') if you'd like
37
- some better visualization of the match results.
18
+ $ sudo gem install citrus
38
19
 
39
- The code base is very small and it's well-documented and tested, so it should be
40
- fairly easy to understand for anyone who is familiar with parsing expressions.
20
+ From a local copy:
41
21
 
22
+ $ git clone git://github.com/mjijackson/citrus.git
23
+ $ cd citrus
24
+ $ rake package && sudo rake install
42
25
 
43
- ** Links **
44
26
 
27
+ ** Background **
45
28
 
46
- http://pdos.csail.mit.edu/~baford/packrat/
47
- http://en.wikipedia.org/wiki/Parsing_expression_grammar
48
- http://treetop.rubyforge.org/index.html
49
29
 
30
+ In order to be able to use Citrus effectively, you must first understand the
31
+ difference between syntax and semantics. Syntax is a set of rules that govern
32
+ the way letters and punctuation may be used in a language. For example, English
33
+ syntax dictates that proper nouns should start with a capital letter and that
34
+ sentences should end with a period.
50
35
 
51
- ** Installation **
36
+ Semantics are the rules by which meaning may be derived in a language. For
37
+ example, as you read a book you are able to make some sense of the particular
38
+ way in which words on a page are combined to form thoughts and express ideas
39
+ because you understand what the words themselves mean and you can understand
40
+ what they mean collectively.
52
41
 
42
+ Computers use a similar process when interpreting code. First, the code must be
43
+ parsed into recognizable symbols or tokens. These tokens may then be passed to
44
+ an interpreter which is responsible for forming actual instructions from them.
53
45
 
54
- Via RubyGems:
46
+ Citrus is a pure Ruby library that allows you to perform both lexical analysis
47
+ and semantic interpretation quickly and easily. Using Citrus you can write
48
+ powerful parsers that are simple to understand and easy to create and maintain.
55
49
 
56
- $ sudo gem install citrus
50
+ In Citrus, there are three main types of objects: rules, grammars, and matches.
57
51
 
58
- From a local copy:
52
+ == Rules
53
+
54
+ A rule is an object that specifies some matching behavior on a string. There are
55
+ two types of rules: terminals and non-terminals. Terminals can be either Ruby
56
+ strings or regular expressions that specify some input to match. For example, a
57
+ terminal created from the string "end" would match any sequence of the
58
+ characters "e", "n", and "d", in that order. A terminal created from a regular
59
+ expression uses Ruby's regular expression engine to attempt to create a match.
60
+
61
+ Non-terminals are rules that may contain other rules but do not themselves match
62
+ directly on the input. For example, a Repeat is a non-terminal that may contain
63
+ one other rule that will try to match a certain number of times. Several other
64
+ types of non-terminals are available that will be discussed later.
65
+
66
+ Rule objects may also have semantic information associated with them in the form
67
+ of Ruby modules. These modules contain methods that will be used to extend any
68
+ match objects created by the rule with which they are associated.
69
+
70
+ == Grammars
71
+
72
+ A grammar is a container for rules. Usually the rules in a grammar collectively
73
+ form a complete specification for some language, or a well-defined subset
74
+ thereof.
75
+
76
+ A Citrus grammar is really just a souped-up Ruby module. These modules may be
77
+ included in other grammar modules in the same way that Ruby modules are normally
78
+ used. This property allows you to divide a complex grammar into reusable pieces
79
+ that may be combined dynamically at runtime. Any grammar rule with the same name
80
+ as a rule in an included grammar may access that rule with a mechanism similar
81
+ to Ruby's super keyword.
82
+
83
+ == Matches
84
+
85
+ Matches are created by rule objects when they match on the input. A match
86
+ contains the string of text that made up the match as well as its offset in the
87
+ original input string. During a parse, matches are arranged in a tree structure
88
+ where any match may contain any number of other matches. This structure is
89
+ determined by the way in which the rule that generated each match is used in the
90
+ grammar.
91
+
92
+ For example, a match that is created from a non-terminal rule that contains
93
+ several other terminals will likewise contain several matches, one for each
94
+ terminal.
95
+
96
+ Match objects may be extended with semantic information in the form of methods.
97
+ These methods can interpret the text of a match using the wealth of information
98
+ available to them including the text of the match, its position in the input,
99
+ and any submatches.
100
+
101
+
102
+ ** Syntax **
103
+
104
+
105
+ The most straightforward way to compose a Citrus grammar is to use Citrus' own
106
+ custom grammar syntax. This syntax borrows heavily from Ruby, so it should
107
+ already be familiar to Ruby programmers.
108
+
109
+ == Terminals
110
+
111
+ Terminals may be represented by a string or a regular expression. Both follow
112
+ the same rules as Ruby string and regular expression literals.
113
+
114
+ 'abc'
115
+ "abc\n"
116
+ /\xFF/
117
+
118
+ Character classes and the dot (match anything) symbol are supported as well for
119
+ compatibility with other parsing expression implementations.
120
+
121
+ [a-z0-9] # match any lowercase letter or digit
122
+ [\x00-\xFF] # match any octet
123
+ . # match anything, even newlines
124
+
125
+ == Repetition
126
+
127
+ Quantifiers may be used after any expression to specify a number of times it
128
+ must match. The universal form of a quantifier is N*M where N is the minimum and
129
+ M is the maximum number of times the expression may match.
130
+
131
+ 'abc'1*2 # match "abc" a minimum of one, maximum
132
+ # of two times
133
+ 'abc'1* # match "abc" at least once
134
+ 'abc'*2 # match "abc" a maximum of twice
135
+
136
+ The + and ? operators are supported as well for the common cases of 1* and *1
137
+ respectively.
138
+
139
+ 'abc'+ # match "abc" at least once
140
+ 'abc'? # match "abc" a maximum of once
141
+
142
+ == Lookahead
143
+
144
+ Both positive and negative lookahead are supported in Citrus. Use the & and !
145
+ operators to indicate that an expression either should or should not match. In
146
+ neither case is any input consumed.
147
+
148
+ &'a' 'b' # never matches: "a" and "b" are both tested at the same position
149
+ !'a' 'b' # match a "b"; the check that "a" does not come next is redundant here
150
+ !'a' . # match any character except for "a"
151
+
152
+ == Sequences
153
+
154
+ Sequences of expressions may be separated by a space to indicate that the rules
155
+ should match in that order.
156
+
157
+ 'a' 'b' 'c' # match "a", then "b", then "c"
158
+ 'a' [0-9] # match "a", then a numeric digit
159
+
160
+ == Choices
161
+
162
+ Ordered choice is indicated by a vertical bar that separates two expressions.
163
+ Note that any operator binds more tightly than the bar.
59
164
 
60
- $ git clone git://github.com/mjijackson/citrus.git
61
- $ cd citrus
62
- $ rake package && sudo rake install
165
+ 'a' | 'b' # match "a" or "b"
166
+ 'a' 'b' | 'c' # match "a" then "b" (in sequence), or "c"
167
+
168
+ == Super
169
+
170
+ When including a grammar inside another, all rules in the child that have the
171
+ same name as a rule in the parent also have access to the super keyword to
172
+ invoke the parent rule.
173
+
174
+ == Labels
175
+
176
+ Match objects may be referred to by a different name than the rule that
177
+ originally generated them. Labels are created by placing the label and a colon
178
+ immediately preceding any expression.
179
+
180
+ chars:/[a-z]+/ # the characters matched by the regular
181
+ # expression may be referred to as "chars"
182
+ # in a block method
183
+
184
+
185
+ ** Example **
186
+
187
+
188
+ Below is an example of a simple grammar that is able to parse strings of
189
+ integers separated by any amount of white space and a + symbol.
190
+
191
+ grammar Addition
192
+ rule additive
193
+ number plus (additive | number)
194
+ end
195
+
196
+ rule number
197
+ [0-9]+ space
198
+ end
199
+
200
+ rule plus
201
+ '+' space
202
+ end
203
+
204
+ rule space
205
+ [ \t]*
206
+ end
207
+ end
208
+
209
+ Several things to note about the above example:
210
+
211
+ * Grammar and rule declarations end with the "end" keyword
212
+ * A Sequence of rules is created by separating expressions with a space
213
+ * Likewise, ordered choice is represented with a vertical bar
214
+ * Parentheses may be used to override the natural binding order
215
+ * Rules may refer to other rules in their own definitions simply by using the
216
+ other rule's name
217
+ * Any expression may be followed by a quantifier
218
+
219
+ == Interpretation
220
+
221
+ The grammar above is able to parse simple mathematical expressions such as "1+2"
222
+ and "1 + 2+3", but it does not have enough semantic information to be able to
223
+ actually interpret these expressions.
224
+
225
+ At this point, when the grammar parses a string it generates a tree of Match
226
+ objects. Each match is created by a rule. A match will know what text it
227
+ contains, its offset in the original input, and what submatches it contains.
228
+
229
+ Submatches are created whenever a rule contains another rule. For example, in
230
+ the grammar above the number rule matches a string of digits followed by white
231
+ space. Thus, a match generated by the number rule will contain two submatches.
232
+
233
+ We can use Ruby's block syntax to create a module that will be attached to these
234
+ matches when they are created and is used to lazily extend them when we want to
235
+ interpret them. The following example shows one way to do this.
236
+
237
+ grammar Addition
238
+ rule additive
239
+ (number plus term) {
240
+ def value
241
+ number.value + term.value
242
+ end
243
+ }
244
+ end
245
+
246
+ rule term
247
+ (additive | number) {
248
+ def value
249
+ first.value
250
+ end
251
+ }
252
+ end
253
+
254
+ rule number
255
+ ([0-9]+ space) {
256
+ def value
257
+ text.strip.to_i
258
+ end
259
+ }
260
+ end
261
+
262
+ rule plus
263
+ '+' space
264
+ end
265
+
266
+ rule space
267
+ [ \t]*
268
+ end
269
+ end
270
+
271
+ In this version of the grammar the additive rule has been refactored to use the
272
+ term rule. This makes it a little cleaner to define our semantic blocks. It's
273
+ easiest to explain what is going on here by starting with the lowest level
274
+ block, which is defined within the number rule.
275
+
276
+ The semantic block associated with the number rule defines one method, value.
277
+ This method will be present on all matches that result from this rule. Inside
278
+ this method, we can see that the value of a number match is determined to be
279
+ its text value, stripped of white space and converted to an integer.
280
+
281
+ Similarly, the block that is applied to term matches also defines a value
282
+ method. However, this method works a bit differently. Since a term matches an
283
+ additive or a number a term match will contain one submatch, the match that
284
+ resulted from either additive or number. The first method retrieves the first
285
+ submatch. So, the value of a term is determined to be the value of its first
286
+ submatch.
287
+
288
+ Finally, the additive rule also extends its matches with a value method. Here,
289
+ the value of an additive is determined to be the values of its number and term
290
+ matches added together using Ruby's addition operator.
291
+
292
+ Since additive is the first rule defined in the grammar, any match that results
293
+ from parsing a string with this grammar will have a value method that can be
294
+ used to recursively calculate the collective value of the entire match tree.
295
+
296
+ To give it a try, save the code for the Addition grammar in a file called
297
+ addition.citrus. Next, assuming you have the Citrus gem installed, try the
298
+ following sequence of commands in a terminal.
299
+
300
+ $ irb
301
+ > require 'citrus'
302
+ => true
303
+ > Citrus.load 'addition'
304
+ => [Addition]
305
+ > m = Addition.parse '1 + 2 + 3'
306
+ => #<Citrus::Match ...
307
+ > m.value
308
+ => 6
309
+
310
+ Congratulations! You just ran your first piece of Citrus code.
311
+
312
+ Take a look at examples/calc.citrus for an example of a calculator that is able
313
+ to parse and evaluate more complex mathematical expressions.
314
+
315
+
316
+ ** Links **
317
+
318
+
319
+ http://mjijackson.com/citrus
320
+ http://pdos.csail.mit.edu/~baford/packrat/
321
+ http://en.wikipedia.org/wiki/Parsing_expression_grammar
322
+ http://treetop.rubyforge.org/index.html
63
323
 
64
324
 
65
325
  ** License **
data/citrus.gemspec CHANGED
@@ -1,7 +1,7 @@
1
1
  Gem::Specification.new do |s|
2
2
  s.name = 'citrus'
3
- s.version = '1.2.1'
4
- s.date = '2010-06-02'
3
+ s.version = '1.2.2'
4
+ s.date = '2010-06-09'
5
5
 
6
6
  s.summary = 'Parsing Expressions for Ruby'
7
7
  s.description = 'Parsing Expressions for Ruby'
@@ -14,6 +14,7 @@ Gem::Specification.new do |s|
14
14
  s.files = Dir['benchmark/*.rb'] +
15
15
  Dir['benchmark/*.citrus'] +
16
16
  Dir['benchmark/*.gnuplot'] +
17
+ Dir['doc/**/*'] +
17
18
  Dir['examples/**/*'] +
18
19
  Dir['extras/**/*'] +
19
20
  Dir['lib/**/*.rb'] +
@@ -29,5 +30,5 @@ Gem::Specification.new do |s|
29
30
  s.rdoc_options = %w< --line-numbers --inline-source --title Citrus --main Citrus >
30
31
  s.extra_rdoc_files = %w< README >
31
32
 
32
- s.homepage = 'http://github.com/mjijackson/citrus'
33
+ s.homepage = 'http://mjijackson.com/citrus'
33
34
  end
data/doc/background.rdoc ADDED
@@ -0,0 +1,72 @@
1
+ = Background
2
+
3
+ In order to be able to use Citrus effectively, you must first understand the
4
+ difference between syntax and semantics. Syntax is a set of rules that govern
5
+ the way letters and punctuation may be used in a language. For example, English
6
+ syntax dictates that proper nouns should start with a capital letter and that
7
+ sentences should end with a period.
8
+
9
+ Semantics are the rules by which meaning may be derived in a language. For
10
+ example, as you read a book you are able to make some sense of the particular
11
+ way in which words on a page are combined to form thoughts and express ideas
12
+ because you understand what the words themselves mean and you can understand
13
+ what they mean collectively.
14
+
15
+ Computers use a similar process when interpreting code. First, the code must be
16
+ parsed into recognizable symbols or tokens. These tokens may then be passed to
17
+ an interpreter which is responsible for forming actual instructions from them.
18
+
19
+ Citrus is a pure Ruby library that allows you to perform both lexical analysis
20
+ and semantic interpretation quickly and easily. Using Citrus you can write
21
+ powerful parsers that are simple to understand and easy to create and maintain.
22
+
23
+ In Citrus, there are three main types of objects: rules, grammars, and matches.
24
+
25
+ == Rules
26
+
27
+ A Rule[link:api/classes/Citrus/Rule.html] is an object that specifies some matching behavior on a string. There are
28
+ two types of rules: terminals and non-terminals. Terminals can be either Ruby
29
+ strings or regular expressions that specify some input to match. For example, a
30
+ terminal created from the string "end" would match any sequence of the
31
+ characters "e", "n", and "d", in that order. A terminal created from a regular
32
+ expression uses Ruby's regular expression engine to attempt to create a match.
33
+
34
+ Non-terminals are rules that may contain other rules but do not themselves match
35
+ directly on the input. For example, a Repeat is a non-terminal that may contain
36
+ one other rule that will try to match a certain number of times. Several other
37
+ types of non-terminals are available that will be discussed later.
38
+
39
+ Rule objects may also have semantic information associated with them in the form
40
+ of Ruby modules. These modules contain methods that will be used to extend any
41
+ match objects created by the rule with which they are associated.
42
+
43
+ == Grammars
44
+
45
+ A Grammar[link:api/classes/Citrus/Grammar.html] is a container for rules. Usually the rules in a grammar collectively
46
+ form a complete specification for some language, or a well-defined subset
47
+ thereof.
48
+
49
+ A Citrus grammar is really just a souped-up Ruby module. These modules may be
50
+ included in other grammar modules in the same way that Ruby modules are normally
51
+ used. This property allows you to divide a complex grammar into reusable pieces
52
+ that may be combined dynamically at runtime. Any grammar rule with the same name
53
+ as a rule in an included grammar may access that rule with a mechanism similar
54
+ to Ruby's super keyword.
55
+
56
+ == Matches
57
+
58
+ Matches are created by rule objects when they match on the input. A Match[link:api/classes/Citrus/Match.html]
59
+ contains the string of text that made up the match as well as its offset in the
60
+ original input string. During a parse, matches are arranged in a tree structure
61
+ where any match may contain any number of other matches. This structure is
62
+ determined by the way in which the rule that generated each match is used in the
63
+ grammar.
64
+
65
+ For example, a match that is created from a non-terminal rule that contains
66
+ several other terminals will likewise contain several matches, one for each
67
+ terminal.
68
+
69
+ Match objects may be extended with semantic information in the form of methods.
70
+ These methods can interpret the text of a match using the wealth of information
71
+ available to them including the text of the match, its position in the input,
72
+ and any submatches.
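The comparison to Ruby's super keyword in the Grammars section can be made concrete with ordinary Ruby modules. The sketch below does not use Citrus at all: BaseGrammar, HexGrammar, and Parser are hypothetical names chosen for illustration, and a "rule" is modeled simply as a method returning a regular expression. It only shows the plain-Ruby mechanism that grammar inclusion is being compared to.

```ruby
# Plain Ruby, no Citrus required. All names here are hypothetical.
module BaseGrammar
  # A "rule" modeled as a method that returns a regular expression.
  def number
    /[0-9]+/
  end
end

module HexGrammar
  include BaseGrammar

  # Same name as the rule in the included module: super reaches
  # BaseGrammar#number, so the new rule can build on the old one.
  def number
    Regexp.union(/0x[0-9a-f]+/, super)
  end
end

class Parser
  include HexGrammar
end

rule = Parser.new.number
puts rule.match('0xff')[0]   # prints 0xff
puts rule.match('42')[0]     # prints 42
```

Because HexGrammar sits before BaseGrammar in the ancestor chain, the overriding method is found first and super continues the lookup in the included module.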
data/doc/example.rdoc ADDED
@@ -0,0 +1,128 @@
1
+ = Example
2
+
3
+ Below is an example of a simple grammar that is able to parse strings of
4
+ integers separated by any amount of white space and a <tt>+</tt> symbol.
5
+
6
+ grammar Addition
7
+ rule additive
8
+ number plus (additive | number)
9
+ end
10
+
11
+ rule number
12
+ [0-9]+ space
13
+ end
14
+
15
+ rule plus
16
+ '+' space
17
+ end
18
+
19
+ rule space
20
+ [ \t]*
21
+ end
22
+ end
23
+
24
+ Several things to note about the above example:
25
+
26
+ * Grammar and rule declarations end with the <tt>end</tt> keyword
27
+ * A Sequence of rules is created by separating expressions with a space
28
+ * Likewise, ordered choice is represented with a vertical bar
29
+ * Parentheses may be used to override the natural binding order
30
+ * Rules may refer to other rules in their own definitions simply by using the
31
+ other rule's name
32
+ * Any expression may be followed by a quantifier
33
+
34
+ == Interpretation
35
+
36
+ The grammar above is able to parse simple mathematical expressions such as "1+2"
37
+ and "1 + 2+3", but it does not have enough semantic information to be able to
38
+ actually interpret these expressions.
39
+
40
+ At this point, when the grammar parses a string it generates a tree of Match[link:api/classes/Citrus/Match.html]
41
+ objects. Each match is created by a rule. A match will know what text it
42
+ contains, its offset in the original input, and what submatches it contains.
43
+
44
+ Submatches are created whenever a rule contains another rule. For example, in
45
+ the grammar above the number rule matches a string of digits followed by white
46
+ space. Thus, a match generated by the number rule will contain two submatches.
47
+
48
+ We can use Ruby's block syntax to create a module that will be attached to these
49
+ matches when they are created and is used to lazily extend them when we want to
50
+ interpret them. The following example shows one way to do this.
51
+
52
+ grammar Addition
53
+ rule additive
54
+ (number plus term) {
55
+ def value
56
+ number.value + term.value
57
+ end
58
+ }
59
+ end
60
+
61
+ rule term
62
+ (additive | number) {
63
+ def value
64
+ first.value
65
+ end
66
+ }
67
+ end
68
+
69
+ rule number
70
+ ([0-9]+ space) {
71
+ def value
72
+ text.strip.to_i
73
+ end
74
+ }
75
+ end
76
+
77
+ rule plus
78
+ '+' space
79
+ end
80
+
81
+ rule space
82
+ [ \t]*
83
+ end
84
+ end
85
+
86
+ In this version of the grammar the additive rule has been refactored to use the
87
+ term rule. This makes it a little cleaner to define our semantic blocks. It's
88
+ easiest to explain what is going on here by starting with the lowest level
89
+ block, which is defined within the number rule.
90
+
91
+ The semantic block associated with the number rule defines one method, value.
92
+ This method will be present on all matches that result from this rule. Inside
93
+ this method, we can see that the value of a number match is determined to be
94
+ its text value, stripped of white space and converted to an integer.
95
+
96
+ Similarly, the block that is applied to term matches also defines a value
97
+ method. However, this method works a bit differently. Since a term matches an
98
+ additive or a number, a term match will contain one submatch: the match that
99
+ resulted from either additive or number. The first method retrieves the first
100
+ submatch. So, the value of a term is determined to be the value of its first
101
+ submatch.
102
+
103
+ Finally, the additive rule also extends its matches with a value method. Here,
104
+ the value of an additive is determined to be the values of its number and term
105
+ matches added together using Ruby's addition operator.
106
+
107
+ Since additive is the first rule defined in the grammar, any match that results
108
+ from parsing a string with this grammar will have a value method that can be
109
+ used to recursively calculate the collective value of the entire match tree.
110
+
111
+ To give it a try, save the code for the Addition grammar in a file called
112
+ addition.citrus. Next, assuming you have the Citrus gem installed, try the
113
+ following sequence of commands in a terminal.
114
+
115
+ $ irb
116
+ > require 'citrus'
117
+ => true
118
+ > Citrus.load 'addition'
119
+ => [Addition]
120
+ > m = Addition.parse '1 + 2 + 3'
121
+ => #<Citrus::Match ...
122
+ > m.value
123
+ => 6
124
+
125
+ Congratulations! You just ran your first piece of Citrus code.
126
+
127
+ Take a look at examples/calc.citrus[http://github.com/mjijackson/citrus/blob/master/examples/calc.citrus] for an example of a calculator that is able
128
+ to parse and evaluate more complex mathematical expressions.
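The idea of attaching a module to a match and extending the match with it can be sketched in plain Ruby using Object#extend. This is an illustration of the technique only, not Citrus's implementation: Node, NumberValue, and AdditiveValue are hypothetical stand-ins, and the tree is built by hand (leaving out the plus and space submatches) rather than produced by a parser.

```ruby
# Plain Ruby, no Citrus required. Node is a hypothetical stand-in for a
# match object: it knows its text and its submatches, nothing more.
Node = Struct.new(:text, :children)

# Semantic modules, analogous to the blocks attached to rules.
module NumberValue
  def value
    text.strip.to_i
  end
end

module AdditiveValue
  def value
    children.inject(0) { |sum, child| sum + child.value }
  end
end

# Hand-built tree for "1 + 2 + 3": additive(number, additive(number, number)).
one   = Node.new('1 ', []).extend(NumberValue)
two   = Node.new('2 ', []).extend(NumberValue)
three = Node.new('3', []).extend(NumberValue)
inner = Node.new('2 + 3', [two, three]).extend(AdditiveValue)
tree  = Node.new('1 + 2 + 3', [one, inner]).extend(AdditiveValue)

puts tree.value   # prints 6
```

Only the objects that were extended gain a value method; a bare Node does not respond to it, which is what makes per-rule semantics possible on a shared match class.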
data/doc/index.rdoc ADDED
@@ -0,0 +1,15 @@
1
+ Citrus is a compact and powerful parsing library for Ruby[http://ruby-lang.org/] that combines the
2
+ elegance and expressiveness of the language with the simplicity and power of
3
+ parsing expressions.
4
+
5
+ = Installation
6
+
7
+ Via RubyGems[http://rubygems.org/]:
8
+
9
+ $ sudo gem install citrus
10
+
11
+ From a local copy:
12
+
13
+ $ git clone git://github.com/mjijackson/citrus.git
14
+ $ cd citrus
15
+ $ rake package && sudo rake install
data/doc/license.rdoc ADDED
@@ -0,0 +1,21 @@
1
+ = License
2
+
3
+ Copyright 2010 Michael Jackson
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/doc/links.rdoc ADDED
@@ -0,0 +1,18 @@
1
+ = Links
2
+
3
+ The primary resource for all things to do with parsing expressions can be found
4
+ at MIT.
5
+
6
+ http://pdos.csail.mit.edu/~baford/packrat
7
+
8
+ A useful summary of parsing expression grammars can be found on Wikipedia as
9
+ well.
10
+
11
+ http://en.wikipedia.org/wiki/Parsing_expression_grammar
12
+
13
+ Citrus draws inspiration from another Ruby library for writing parsing
14
+ expression grammars, Treetop. While Citrus' syntax is similar to that of
15
+ Treetop, it's not identical. The link is included here for those who may wish to
16
+ explore an alternative implementation.
17
+
18
+ http://treetop.rubyforge.org
data/doc/syntax.rdoc ADDED
@@ -0,0 +1,96 @@
1
+ = Syntax
2
+
3
+ The most straightforward way to compose a Citrus grammar is to use Citrus' own
4
+ custom grammar syntax. This syntax borrows heavily from Ruby, so it should
5
+ already be familiar to Ruby programmers.
6
+
7
+ == Terminals
8
+
9
+ Terminals may be represented by a string or a regular expression. Both follow
10
+ the same rules as Ruby string and regular expression literals.
11
+
12
+ 'abc'
13
+ "abc\n"
14
+ /\xFF/
15
+
16
+ Character classes and the dot (match anything) symbol are supported as well for
17
+ compatibility with other parsing expression implementations.
18
+
19
+ [a-z0-9] # match any lowercase letter or digit
20
+ [\x00-\xFF] # match any octet
21
+ . # match anything, even newlines
22
+
23
+ See FixedWidth[link:api/classes/Citrus/FixedWidth.html] and
24
+ Expression[link:api/classes/Citrus/Expression.html] for more information.
25
+
26
+ == Repetition
27
+
28
+ Quantifiers may be used after any expression to specify a number of times it
29
+ must match. The universal form of a quantifier is N*M where N is the minimum and
30
+ M is the maximum number of times the expression may match.
31
+
32
+ 'abc'1*2 # match "abc" a minimum of one, maximum
33
+ # of two times
34
+ 'abc'1* # match "abc" at least once
35
+ 'abc'*2 # match "abc" a maximum of twice
36
+
37
+ The + and ? operators are supported as well for the common cases of 1* and *1
38
+ respectively.
39
+
40
+ 'abc'+ # match "abc" at least once
41
+ 'abc'? # match "abc" a maximum of once
42
+
43
+ See Repeat[link:api/classes/Citrus/Repeat.html] for more information.
44
+
45
+ == Lookahead
46
+
47
+ Both positive and negative lookahead are supported in Citrus. Use the & and !
48
+ operators to indicate that an expression either should or should not match. In
49
+ neither case is any input consumed.
50
+
51
+ &'a' 'b' # never matches: "a" and "b" are both tested at the same position
52
+ !'a' 'b' # match a "b"; the check that "a" does not come next is redundant here
53
+ !'a' . # match any character except for "a"
54
+
55
+ See AndPredicate[link:api/classes/Citrus/AndPredicate.html] and
56
+ NotPredicate[link:api/classes/Citrus/NotPredicate.html] for more information.
57
+
58
+ == Sequences
59
+
60
+ Sequences of expressions may be separated by a space to indicate that the rules
61
+ should match in that order.
62
+
63
+ 'a' 'b' 'c' # match "a", then "b", then "c"
64
+ 'a' [0-9] # match "a", then a numeric digit
65
+
66
+ See Sequence[link:api/classes/Citrus/Sequence.html] for more information.
67
+
68
+ == Choices
69
+
70
+ Ordered choice is indicated by a vertical bar that separates two expressions.
71
+ Note that any operator binds more tightly than the bar.
72
+
73
+ 'a' | 'b' # match "a" or "b"
74
+ 'a' 'b' | 'c' # match "a" then "b" (in sequence), or "c"
75
+
76
+ See Choice[link:api/classes/Citrus/Choice.html] for more information.
77
+
78
+ == Super
79
+
80
+ When including a grammar inside another, all rules in the child that have the
81
+ same name as a rule in the parent also have access to the super keyword to
82
+ invoke the parent rule.
83
+
84
+ See Super[link:api/classes/Citrus/Super.html] for more information.
85
+
86
+ == Labels
87
+
88
+ Match objects may be referred to by a different name than the rule that
89
+ originally generated them. Labels are created by placing the label and a colon
90
+ immediately preceding any expression.
91
+
92
+ chars:/[a-z]+/ # the characters matched by the regular
93
+ # expression may be referred to as "chars"
94
+ # in a block method
95
+
96
+ See Label[link:api/classes/Citrus/Label.html] for more information.
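The operators described in syntax.rdoc follow the usual parsing expression semantics, which can be sketched with a toy matcher in plain Ruby. Everything below is hypothetical and written for illustration; none of these helpers is part of the Citrus API, and the real library builds rule objects and match trees rather than returning bare positions. Here an expression is a lambda that takes the input and a position and returns the position after the match, or nil on failure.

```ruby
# A toy parsing-expression matcher. All helper names are hypothetical.

def term(str)          # 'abc'
  ->(input, pos) { input[pos, str.length] == str ? pos + str.length : nil }
end

def any                # .
  ->(input, pos) { pos < input.length ? pos + 1 : nil }
end

def seq(*exprs)        # e1 e2: each must match where the previous one ended
  ->(input, pos) { exprs.reduce(pos) { |at, e| at && e.call(input, at) } }
end

def choice(*exprs)     # e1 | e2: ordered, the first success wins
  ->(input, pos) { exprs.reduce(nil) { |found, e| found || e.call(input, pos) } }
end

def repeat(expr, min, max = Float::INFINITY)   # e N*M
  lambda do |input, pos|
    count = 0
    while count < max
      nxt = expr.call(input, pos)
      break if nxt.nil?
      count += 1
      break if nxt == pos   # zero-width match: stop to avoid looping forever
      pos = nxt
    end
    count >= min ? pos : nil
  end
end

def and_pred(expr)     # &e: succeeds if e matches, consumes nothing
  ->(input, pos) { expr.call(input, pos) ? pos : nil }
end

def not_pred(expr)     # !e: succeeds if e fails, consumes nothing
  ->(input, pos) { expr.call(input, pos) ? nil : pos }
end

a, b, c = term('a'), term('b'), term('c')
p seq(a, b).call('ab', 0)                          # => 2
p choice(seq(a, b), c).call('c', 0)                # => 1
p repeat(term('abc'), 1, 2).call('abcabcabc', 0)   # => 6
p seq(not_pred(a), any).call('b', 0)               # => 1
p seq(not_pred(a), any).call('a', 0)               # => nil
p seq(and_pred(a), b).call('ab', 0)                # => nil
```

The last line shows why a lookahead followed by a different terminal cannot succeed: the predicate leaves the position unchanged, so the next expression is tried against the same character.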
data/lib/citrus.rb CHANGED
@@ -1,10 +1,10 @@
1
1
  # Citrus is a compact and powerful parsing library for Ruby that combines the
2
2
  # elegance and expressiveness of the language with the simplicity and power of
3
- # parsing expression grammars.
3
+ # parsing expressions.
4
4
  #
5
- # http://github.com/mjijackson/citrus
5
+ # http://mjijackson.com/citrus
6
6
  module Citrus
7
- VERSION = [1, 2, 1]
7
+ VERSION = [1, 2, 2]
8
8
 
9
9
  Infinity = 1.0 / 0
10
10
 
data/lib/citrus/debug.rb CHANGED
@@ -7,7 +7,7 @@ module Citrus
7
7
  # inspecting a nested match. The +xml+ argument may be a Hash of
8
8
  # Builder::XmlMarkup options.
9
9
  def to_markup(xml={})
10
- if xml.is_a?(Hash)
10
+ if Hash === xml
11
11
  opt = { :indent => 2 }.merge(xml)
12
12
  xml = Builder::XmlMarkup.new(opt)
13
13
  xml.instruct!
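The change from xml.is_a?(Hash) to Hash === xml should not alter behavior for ordinary objects, since Module#=== returns true when its argument is an instance of the receiver or of one of its subclasses. A small sketch in plain Ruby follows; hash_options? and OptionHash are hypothetical names used only for illustration and are not part of Citrus.

```ruby
# Hypothetical helper mirroring the check used in to_markup.
def hash_options?(arg)
  Hash === arg
end

# Hypothetical subclass, to show that subclasses are accepted too.
class OptionHash < Hash; end

[{}, { :indent => 2 }, OptionHash.new, [], nil, 'xml'].each do |arg|
  # Both forms agree for each of these values.
  puts "#{arg.inspect}: #{hash_options?(arg)} / #{arg.is_a?(Hash)}"
end
```

Operand order matters: Hash === xml asks the class whether xml is one of its instances, whereas xml === Hash would call the instance's own === method and compare it with the class object.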
metadata CHANGED
@@ -5,8 +5,8 @@ version: !ruby/object:Gem::Version
5
5
  segments:
6
6
  - 1
7
7
  - 2
8
- - 1
9
- version: 1.2.1
8
+ - 2
9
+ version: 1.2.2
10
10
  platform: ruby
11
11
  authors:
12
12
  - Michael Jackson
@@ -14,7 +14,7 @@ autorequire:
14
14
  bindir: bin
15
15
  cert_chain: []
16
16
 
17
- date: 2010-06-02 00:00:00 -06:00
17
+ date: 2010-06-09 00:00:00 -06:00
18
18
  default_executable:
19
19
  dependencies:
20
20
  - !ruby/object:Gem::Dependency
@@ -53,6 +53,12 @@ files:
53
53
  - benchmark/seqpar.rb
54
54
  - benchmark/seqpar.citrus
55
55
  - benchmark/seqpar.gnuplot
56
+ - doc/background.rdoc
57
+ - doc/example.rdoc
58
+ - doc/index.rdoc
59
+ - doc/license.rdoc
60
+ - doc/links.rdoc
61
+ - doc/syntax.rdoc
56
62
  - examples/calc.citrus
57
63
  - examples/calc.rb
58
64
  - examples/calc_sugar.rb
@@ -83,7 +89,7 @@ files:
83
89
  - Rakefile
84
90
  - README
85
91
  has_rdoc: true
86
- homepage: http://github.com/mjijackson/citrus
92
+ homepage: http://mjijackson.com/citrus
87
93
  licenses: []
88
94
 
89
95
  post_install_message: