parslet 0.9.0

data/Gemfile ADDED
@@ -0,0 +1,7 @@
1
+ # A sample Gemfile
2
+ source "http://rubygems.org"
3
+
4
+ group :development do
5
+ gem 'rspec'
6
+ gem 'flexmock'
7
+ end
data/HISTORY.txt ADDED
@@ -0,0 +1,21 @@
1
+ = 0.9.0 / ???
2
+ * More of everything: Examples, documentation, etc...
3
+
4
+ * Breaking change: Ruby's binary or ('|') is now used for alternatives,
5
+ instead of the division sign ('/') - this reduces the number of
6
+ parentheses needed for a grammar overall.
7
+
8
+ * parslet.maybe now yields the result or nil in case of parse failure. This
9
+ is probably better than the array it did before; the jury is still out on
10
+ that.
11
+
12
+ * parslet.repeat(min, max) is now valid syntax
13
+
14
+ = 0.1.0 / not released.
15
+
16
+ * Initial version. Classes for parsing, matching in the resulting trees
17
+ and transforming the trees into something more useful.
18
+
19
+ * Parses and outputs intermediary trees
20
+
21
+ * Matching of single elements and sequences
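The two behavioural changes listed for 0.9.0 (alternatives written with '|', and maybe yielding the result or nil) can be illustrated with a small self-contained sketch. MiniAtom and mini_str below are hypothetical stand-ins written for this note, not parslet's classes; the only point is that Ruby parses '>>' more tightly than '|', so a sequence followed by an alternative needs no parentheses.

```ruby
# Hypothetical mini-atoms, NOT parslet's implementation. Each atom wraps a
# block taking (input, pos) and returning [result, new_pos], or nil on failure.
class MiniAtom
  def initialize(&matcher)
    @matcher = matcher
  end

  def try(input, pos)
    @matcher.call(input, pos)
  end

  # Sequence: match self, then other; concatenate the matched strings.
  def >>(other)
    MiniAtom.new do |input, pos|
      left = try(input, pos)
      right = left && other.try(input, left[1])
      right && [left[0].to_s + right[0].to_s, right[1]]
    end
  end

  # Ordered choice: the right side is only tried when the left side fails.
  def |(other)
    MiniAtom.new { |input, pos| try(input, pos) || other.try(input, pos) }
  end

  # Optional match: yields the result, or nil without consuming any input.
  def maybe
    MiniAtom.new { |input, pos| try(input, pos) || [nil, pos] }
  end
end

# Hypothetical helper playing the role of str('...').
def mini_str(text)
  MiniAtom.new do |input, pos|
    [text, pos + text.length] if input[pos, text.length] == text
  end
end

# Parsed by Ruby as (a >> b) | c, because >> binds tighter than |.
grammar = mini_str('a') >> mini_str('b') | mini_str('c')

grammar.try('ab', 0)             # => ["ab", 2]
grammar.try('c', 0)              # => ["c", 1]
mini_str('a').maybe.try('b', 0)  # => [nil, 0]
```

With '/' the precedence runs the other way ('/' binds tighter than '>>'), which is why the older syntax needed parentheses around every sequence inside an alternative.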
data/LICENSE ADDED
@@ -0,0 +1,23 @@
1
+
2
+ Copyright (c) 2010 Kaspar Schiess
3
+
4
+ Permission is hereby granted, free of charge, to any person
5
+ obtaining a copy of this software and associated documentation
6
+ files (the "Software"), to deal in the Software without
7
+ restriction, including without limitation the rights to use,
8
+ copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the
10
+ Software is furnished to do so, subject to the following
11
+ conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
18
+ OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
20
+ HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
21
+ WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
22
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
23
+ OTHER DEALINGS IN THE SOFTWARE.
data/README ADDED
@@ -0,0 +1,101 @@
1
+ INTRODUCTION
2
+
3
+ A small library that implements a PEG grammar. PEG means Parsing Expression
4
+ Grammars [1]. These are a different kind of grammar that recognize almost the
5
+ same languages as your conventional LR parser, except that they are easier to
6
+ work with, since they haven't been conceived for generation, but for
7
+ recognition of languages. You can read the founding paper of the field by
8
+ Bryan Ford here [2].
9
+
10
+ Other Ruby projects that work on the same topic are:
11
+ http://wiki.github.com/luikore/rsec/
12
+ http://github.com/mjijackson/citrus
13
+ http://github.com/nathansobo/treetop
14
+
15
+ My goal here was to see how a parser/parser generator should be constructed to
16
+ allow clean AST construction and good error handling. It seems to me that most
17
+ often, parser generators only handle the success-case and forget about
18
+ debugging and error generation.
19
+
20
+ More specifically, this library is motivated by one of my compiler projects. I
21
+ started out using 'treetop' (see the link above), but found it unusable. It
22
+ was lacking in
23
+
24
+ * error reporting: Hard to see where a grammar fails.
25
+
26
+ * stability of generated trees: Intermediary trees were dictated by the
27
+ grammar. It was hard to define invariants in that system - what was
28
+ convenient when writing the grammar often wasn't in subsequent stages.
29
+
30
+ * clarity of parser code: The parser code is generated and is very hard
31
+ to read. Add that to the first point to understand my pain.
32
+
33
+ So parslet tries to be different. It doesn't generate the parser, but instead
34
+ defines it in a DSL which is very close to what you find in [2]. A successful
35
+ parse then generates a parser tree consisting entirely of hashes and arrays
36
+ and strings (read: unstable). This parser tree can then be converted to a real
37
+ AST (read: stable) using a pattern matcher that is also part of this library.
38
+
39
+ Error reporting is another area where parslet excels: It is able to print not
40
+ only the error you are used to seeing ('Parse failed because of REASON at line
41
+ 1 and char 2'), but also prints what led to that failure in the form of a
42
+ tree (#error_tree method).
43
+
44
+ [1] http://en.wikipedia.org/wiki/Parsing_expression_grammar
45
+ [2] http://pdos.csail.mit.edu/~baford/packrat/popl04/peg-popl04.pdf
46
+
47
+ SYNOPSIS
48
+
49
+ require 'parslet'
50
+ include Parslet
51
+
52
+ # Constructs a parser using a Parsing Expression Grammar-like DSL:
53
+ parser = str('"') >>
54
+ (
55
+ str('\\') >> any |
56
+ str('"').absnt? >> any
57
+ ).repeat.as(:string) >>
58
+ str('"')
59
+
60
+ # Parse the string and capture parts of the interpretation (:string above)
61
+ tree = parser.parse(%Q{
62
+ "This is a \\"String\\" in which you can escape stuff"
63
+ }.strip)
64
+
65
+ tree # => {:string=>"This is a \\\"String\\\" in which you can escape stuff"}
66
+
67
+ # Here's how you can grab results from that tree:
68
+ Pattern.new(:string => simple(:x)).each_match(tree) do |dictionary|
69
+ puts "String contents: #{dictionary[:x]}"
70
+ end
71
+
72
+ # Here's how to transform that tree into something else ----------------------
73
+
74
+ # Defines the classes of our new Syntax Tree
75
+ class StringLiteral < Struct.new(:text); end
76
+
77
+ # Defines a set of transformation rules on tree leaves
78
+ transform = Transform.new
79
+ transform.rule(:string => simple(:x)) { |d| StringLiteral.new(d[:x]) }
80
+
81
+ # Transforms the tree
82
+ transform.apply(tree)
83
+
84
+ # => #<struct StringLiteral text="This is a \\\"String\\\" ... escape stuff">
85
+
86
+ COMPATIBILITY
87
+
88
+ This library should work with both ruby 1.8 and ruby 1.9.
89
+
90
+ AUTHORS
91
+
92
+ My gigantic thanks go to the following cool guys and gals that help make this
93
+ rock:
94
+
95
+ Florian Hanke <florian.hanke@gmail.com>
96
+
97
+ STATUS
98
+
99
+ On the road to 1.0; improving documentation, packaging and upgrading to rspec2.
100
+
101
+ (c) 2010 Kaspar Schiess
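The README's second stage (turning the unstable hash/array/string tree into a stable AST) boils down to matching a pattern against the tree and handing the bindings to a constructor. The following is a minimal sketch of that idea only. Bind and match_tree are hypothetical names invented here and do not reflect how parslet's Pattern or Transform classes are written; StringLiteral mirrors the struct from the SYNOPSIS.

```ruby
# Hypothetical stand-ins, NOT parslet's Pattern/Transform classes.
Bind = Struct.new(:name)           # plays the role of simple(:x)
StringLiteral = Struct.new(:text)  # the "stable" AST node from the SYNOPSIS

# Matches a one-level hash pattern against a tree. Returns the bindings as a
# hash, or nil when the tree does not fit the pattern.
def match_tree(pattern, tree)
  return nil unless tree.is_a?(Hash)
  return nil unless tree.keys.sort_by(&:to_s) == pattern.keys.sort_by(&:to_s)

  pattern.each_with_object({}) do |(key, expected), bindings|
    value = tree[key]
    if expected.is_a?(Bind)
      # Only simple leaves bind; arrays and nested hashes are rejected.
      return nil if value.is_a?(Array) || value.is_a?(Hash)
      bindings[expected.name] = value
    elsif expected != value
      return nil
    end
  end
end

tree = { :string => 'plain text' }
bindings = match_tree({ :string => Bind.new(:x) }, tree)
node = StringLiteral.new(bindings[:x]) if bindings

node.text  # => "plain text"
```

Because later stages only ever see StringLiteral, the grammar can change the shape of the intermediary tree without touching them, as long as the pattern is updated to match.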
data/Rakefile ADDED
@@ -0,0 +1,73 @@
1
+
2
+ require "rubygems"
3
+ require "rake/gempackagetask"
4
+ require "rake/rdoctask"
5
+ require 'rspec/core/rake_task'
6
+
7
+ desc "Run all examples"
8
+ RSpec::Core::RakeTask.new
9
+
10
+ task :default => :spec
11
+
12
+ # This builds the actual gem. For details of what all these options
13
+ # mean, and other ones you can add, check the documentation here:
14
+ #
15
+ # http://rubygems.org/read/chapter/20
16
+ #
17
+ spec = Gem::Specification.new do |s|
18
+
19
+ # Change these as appropriate
20
+ s.name = "parslet"
21
+ s.version = "0.9.0"
22
+ s.summary = "Parser construction library with great error reporting in Ruby."
23
+ s.author = "Kaspar Schiess"
24
+ s.email = "kaspar.schiess@absurd.li"
25
+ s.homepage = "http://kschiess.github.com/parslet"
26
+
27
+ s.has_rdoc = true
28
+ s.extra_rdoc_files = %w(README)
29
+ s.rdoc_options = %w(--main README)
30
+
31
+ # Add any extra files to include in the gem
32
+ s.files = %w(Gemfile HISTORY.txt LICENSE Rakefile README) + Dir.glob("{spec,lib/**/*}")
33
+ s.require_paths = ["lib"]
34
+
35
+ # If you want to depend on other gems, add them here, along with any
36
+ # relevant versions
37
+ # s.add_dependency("some_other_gem", "~> 0.1.0")
38
+
39
+ # If your tests use any gems, include them here
40
+ s.add_development_dependency("rspec")
41
+ s.add_development_dependency("flexmock")
42
+ end
43
+
44
+ # This task actually builds the gem. We also regenerate a static
45
+ # .gemspec file, which is useful if something (i.e. GitHub) will
46
+ # be automatically building a gem for this project. If you're not
47
+ # using GitHub, edit as appropriate.
48
+ #
49
+ # To publish your gem online, install the 'gemcutter' gem; Read more
50
+ # about that here: http://gemcutter.org/pages/gem_docs
51
+ Rake::GemPackageTask.new(spec) do |pkg|
52
+ pkg.gem_spec = spec
53
+ end
54
+
55
+ desc "Build the gemspec file #{spec.name}.gemspec"
56
+ task :gemspec do
57
+ file = File.dirname(__FILE__) + "/#{spec.name}.gemspec"
58
+ File.open(file, "w") {|f| f << spec.to_ruby }
59
+ end
60
+
61
+ task :package => :gemspec
62
+
63
+ # Generate documentation
64
+ Rake::RDocTask.new do |rd|
65
+ rd.main = "README"
66
+ rd.rdoc_files.include("README", "lib/**/*.rb")
67
+ rd.rdoc_dir = "rdoc"
68
+ end
69
+
70
+ desc 'Clear out RDoc and generated packages'
71
+ task :clean => [:clobber_rdoc, :clobber_package] do
72
+ rm "#{spec.name}.gemspec"
73
+ end
data/lib/parslet.rb ADDED
@@ -0,0 +1,301 @@
1
+ require 'stringio'
2
+
3
+ # A simple parser generator library. Typical usage would look like this:
4
+ #
5
+ # require 'parslet'
6
+ #
7
+ # class MyParser
8
+ # include Parslet
9
+ #
10
+ # rule(:a) { str('a').repeat }
11
+ #
12
+ # def parse(str)
13
+ # a.parse(str)
14
+ # end
15
+ # end
16
+ #
17
+ # pp MyParser.new.parse('aaaa') # => 'aaaa'
18
+ # pp MyParser.new.parse('bbbb') # => Parslet::ParseFailed:
19
+ # # Don't know what to do with bbbb at line 1 char 1.
20
+ #
21
+ # The simple DSL allows you to define grammars in PEG-style. This kind of
22
+ # grammar construction does away with the ambiguities that usually come with
23
+ # parsers; instead, it allows you to construct grammars that are easier to
24
+ # debug, since less magic is involved.
25
+ #
26
+ # Parslet is typically used in stages:
27
+ #
28
+ #
29
+ # * Parsing the input string; this yields an intermediary tree
30
+ # * Transformation of the tree into something useful to you
31
+ #
32
+ # The first stage is traditionally intermingled with the second stage; output
33
+ # from the second stage is usually called the 'Abstract Syntax Tree' or AST.
34
+ #
35
+ # The stages are completely decoupled; You can change your grammar around
36
+ # and use the second stage to isolate the rest of your code from the changes
37
+ # you've effected.
38
+ #
39
+ # = Language Atoms
40
+ #
41
+ # PEG-style grammars build on a very small number of atoms, or parslets. In
42
+ # fact, only three types of parslets exist. Here's how to match a string:
43
+ #
44
+ # str('a string')
45
+ #
46
+ # This matches the string 'a string' literally and nothing else. If your input
47
+ # doesn't contain the string, it will fail. Here's how to match a character
48
+ # set:
49
+ #
50
+ # match('[abc]')
51
+ #
52
+ # This matches 'a', 'b' or 'c'. The string matched will always have a length
53
+ # of 1; to match longer strings, please see the title below. The last parslet
54
+ # of the three is 'any':
55
+ #
56
+ # any
57
+ #
58
+ # 'any' functions like the dot in regular expressions - it matches any single
59
+ # character.
60
+ #
61
+ # = Combination and Repetition
62
+ #
63
+ # Parslets only get useful when combined to grammars. To combine one parslet
64
+ # with the other, you have 4 kinds of methods available: repeat and maybe, >>
65
+ # (sequence), | (alternation), absnt? and prsnt?.
66
+ #
67
+ # str('a').repeat # any number of 'a's, including 0
68
+ # str('a').maybe # maybe there'll be an 'a', maybe not
69
+ #
70
+ # Parslets can be joined using >>. This means: Match the left parslet, then
71
+ # match the right parslet.
72
+ #
73
+ # str('a') >> str('b') # would match 'ab'
74
+ #
75
+ # Keep in mind that all combination and repetition operators themselves return
76
+ # a parslet. You can combine the result again:
77
+ #
78
+ # ( str('a') >> str('b') ) >> str('c') # would match 'abc'
79
+ #
80
+ # The pipe ('|') indicates alternatives:
81
+ #
82
+ # str('a') | str('b') # would match 'a' OR 'b'
83
+ #
84
+ # The left side of an alternative is matched first; if it matches, the right
85
+ # side is never looked at.
86
+ #
87
+ # The absnt? and prsnt? qualifiers allow looking at input without consuming
88
+ # it:
89
+ #
90
+ # str('a').prsnt? # will match if at the current position there is an 'a'.
91
+ # str('a').prsnt? >> str('b') # check for 'a' then match 'b'
92
+ #
93
+ # This means that the second example will not match any input; when the second
94
+ # part is parsed, the first part has asserted the presence of 'a', and thus
95
+ # str('b') cannot match. The absnt? method is the opposite of prsnt?, it
96
+ # asserts absence.
97
+ #
98
+ # More documentation on these methods can be found in Parslet::Atoms::Base.
99
+ #
100
+ # = Intermediary Parse Trees
101
+ #
102
+ # As you have probably seen above, you can hand input (strings or StringIOs) to
103
+ # your parslets like this:
104
+ #
105
+ # parslet.parse(str)
106
+ #
107
+ # This returns an intermediary parse tree or raises an exception
108
+ # (Parslet::ParseFailed) when the input is not well formed.
109
+ #
110
+ # Intermediary parse trees are essentially just Plain Old Ruby Objects. (PORO
111
+ # technology as we call it.) Parslets try very hard to return sensible stuff;
112
+ # it is quite easy to use the results for the later stages of your program.
113
+ #
114
+ # Here are a few examples and what their intermediary tree looks like:
115
+ #
116
+ # str('foo').parse('foo') # => 'foo'
117
+ # (str('f') >> str('o') >> str('o')).parse('foo') # => 'foo'
118
+ #
119
+ # Naming parslets
120
+ #
121
+ # Construction of lambda blocks
122
+ #
123
+ # = Intermediary Tree transformation
124
+ #
125
+ # The intermediary parse tree by itself is most often not very useful. Its
126
+ # form is volatile; changing your parser in the slightest might produce
127
+ # profound changes in the generated trees.
128
+ #
129
+ # Generally you will want to construct a more stable tree using your own
130
+ # carefully crafted representation of the domain. Parslet provides you with
131
+ # an elegant way of transmogrifying your intermediary tree into the output
132
+ # format you choose. This is achieved by transformation rules such as this
133
+ # one:
134
+ #
135
+ # transform.rule(:literal => {:string => :_x}) { |d|
136
+ # StringLit.new(*d.values) }
137
+ #
138
+ # The above rule will transform a subtree looking like this:
139
+ #
140
+ # :literal
141
+ # |
142
+ # :string
143
+ # |
144
+ # "somestring"
145
+ #
146
+ # into just this:
147
+ #
148
+ # StringLit
149
+ # value: "somestring"
150
+ #
151
+ #
152
+ # = Further documentation
153
+ #
154
+ # Please see the examples subdirectory of the distribution for more examples.
155
+ # Check out 'rooc' (github.com/kschiess/rooc) as well - it uses parslet for
156
+ # compiler construction.
157
+ #
158
+ module Parslet
159
+ def self.included(base)
160
+ base.extend(ClassMethods)
161
+ end
162
+
163
+ # This is raised when the parse failed to match or to consume all its input.
164
+ # It contains the message that should be presented to the user. If you want
165
+ # to display more error explanation, you can print the #error_tree that is
166
+ # stored in the parslet. This is a graphical representation of what went
167
+ # wrong.
168
+ #
169
+ # Example:
170
+ #
171
+ # begin
172
+ # parslet.parse(str)
173
+ # rescue Parslet::ParseFailed => failure
174
+ # puts parslet.error_tree.ascii_tree
175
+ # end
176
+ #
177
+ class ParseFailed < Exception
178
+ end
179
+
180
+ module ClassMethods
181
+ # Define the parser's #root function. This is the place where you start
182
+ # parsing; if you have a rule for 'file' that describes what should be
183
+ # in a file, this would be your root declaration:
184
+ # class Parser
185
+ # root :file
186
+ # rule(:file) { ... }
187
+ # end
188
+ #
189
+ # #root declares a 'parse' function that works just like the parse
190
+ # function that you can call on a simple parslet, taking a string as input
191
+ # and producing parse output.
192
+ #
193
+ # In a way, #root is a shorthand for:
194
+ #
195
+ # def parse(str)
196
+ # your_parser_root.parse(str)
197
+ # end
198
+ #
199
+ def root(name)
200
+ define_method(:root) do
201
+ self.send(name)
202
+ end
203
+ define_method(:parse) do |str|
204
+ root.parse(str)
205
+ end
206
+ end
207
+
208
+ # Define an entity for the parser. This generates a method of the same name
209
+ # that can be used as part of other patterns. Those methods can be freely
210
+ # mixed in your parser class with real ruby methods.
211
+ #
212
+ # Example:
213
+ #
214
+ # class MyParser
215
+ # include Parslet
216
+ #
217
+ # rule(:bar) { str('bar') }
218
+ # rule :twobar do
219
+ # bar >> bar
220
+ # end
221
+ #
222
+ # def parse(str)
223
+ # twobar.parse(str)
224
+ # end
225
+ # end
226
+ #
227
+ def rule(name, &definition)
228
+ define_method(name) do
229
+ @rules ||= {} # <name, rule> memoization
230
+ @rules[name] or
231
+ (@rules[name] = Atoms::Entity.new(name, self, definition))
232
+ end
233
+ end
234
+ end
235
+
236
+ # Returns an atom matching a character class. This is essentially a regular
237
+ # expression, but you should only match a single character.
238
+ #
239
+ # Example:
240
+ #
241
+ # match('[ab]') # will match either 'a' or 'b'
242
+ # match('[\n\s]') # will match newlines and spaces
243
+ #
244
+ def match(obj)
245
+ Atoms::Re.new(obj)
246
+ end
247
+ module_function :match
248
+
249
+ # Returns an atom matching the +str+ given.
250
+ #
251
+ # Example:
252
+ #
253
+ # str('class') # will match 'class'
254
+ #
255
+ def str(str)
256
+ Atoms::Str.new(str)
257
+ end
258
+ module_function :str
259
+
260
+ # Returns an atom matching any character.
261
+ #
262
+ def any
263
+ Atoms::Re.new('.')
264
+ end
265
+ module_function :any
266
+
267
+ # Returns a placeholder for a tree transformation that will only match a
268
+ # sequence of elements. The +symbol+ you specify will be the key for the
269
+ # matched sequence in the returned dictionary.
270
+ #
271
+ # Example:
272
+ #
273
+ # # This would match a body element that contains several declarations.
274
+ # { :body => sequence(:declarations) }
275
+ #
276
+ # The above example would match :body => ['a', 'b'], but not :body => 'a'.
277
+ #
278
+ def sequence(symbol)
279
+ Pattern::SequenceBind.new(symbol)
280
+ end
281
+ module_function :sequence
282
+
283
+ # Returns a placeholder for a tree transformation that will only match
284
+ # simple elements. This matches everything that #sequence doesn't match.
285
+ #
286
+ # Example:
287
+ #
288
+ # # Matches a single header.
289
+ # { :header => simple(:header) }
290
+ #
291
+ def simple(symbol)
292
+ Pattern::SimpleBind.new(symbol)
293
+ end
294
+ module_function :simple
295
+ end
296
+
297
+ require 'parslet/error_tree'
298
+ require 'parslet/atoms'
299
+ require 'parslet/pattern'
300
+ require 'parslet/pattern/binding'
301
+ require 'parslet/transform'
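The included hook and the rule class method above follow a common Ruby pattern: including the module extends the class with a DSL method, and that DSL method defines an instance method that builds its object once and then returns the cached copy. The sketch below reproduces just that mechanism in isolation. MiniDSL and its Struct-based Entity are hypothetical stand-ins, not Parslet::Atoms::Entity, and evaluating the stored block with instance_eval is an assumption about how such an entity would typically resolve its definition.

```ruby
# Hypothetical miniature of the included/ClassMethods#rule pattern.
module MiniDSL
  # Plain Struct stand-in for an entity: remembers its name, the parser
  # instance it belongs to, and the (not yet evaluated) definition block.
  Entity = Struct.new(:name, :context, :definition)

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def rule(name, &definition)
      define_method(name) do
        @rules ||= {}  # per-instance <name, entity> memoization
        @rules[name] ||= Entity.new(name, self, definition)
      end
    end
  end
end

class TwoBar
  include MiniDSL

  rule(:bar)    { 'bar' }
  rule(:twobar) { bar.name.to_s * 2 }
end

parser = TwoBar.new
entity = parser.twobar

parser.bar.equal?(parser.bar)                     # => true (same object)
entity.context.instance_eval(&entity.definition)  # => "barbar"
```

Storing the block instead of calling it right away is what lets rules refer to each other, including recursively: a rule's body runs only when the entity is actually used, by which time every rule method exists on the instance.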