parslet 0.9.0

data/Gemfile ADDED
@@ -0,0 +1,7 @@
1
+ # A sample Gemfile
2
+ source "http://rubygems.org"
3
+
4
+ group :development do
5
+ gem 'rspec'
6
+ gem 'flexmock'
7
+ end
data/HISTORY.txt ADDED
@@ -0,0 +1,21 @@
1
+ = 0.9.0 / ???
2
+ * More of everything: Examples, documentation, etc...
3
+
4
+ * Breaking change: Ruby's binary or ('|') is now used for alternatives,
5
+ instead of the division sign ('/') - this reduces the amount of
6
+ parentheses needed for a grammar overall.
7
+
8
+ * parslet.maybe now yields the result or nil in case of parse failure. This
9
+ is probably better than the array it did before; the jury is still out on
10
+ that.
11
+
12
+ * parslet.repeat(min, max) is now valid syntax
13
+
14
+ = 0.1.0 / not released.
15
+
16
+ * Initial version. Classes for parsing, matching in the resulting trees
17
+ and transforming the trees into something more useful.
18
+
19
+ * Parses and outputs intermediary trees
20
+
21
+ * Matching of single elements and sequences
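The breaking change noted for 0.9.0 follows from Ruby's operator precedence: '/' binds tighter than '>>', while '|' binds looser. The sketch below only illustrates that grouping. GroupingProbe is a hypothetical stand-in written for this note, not a parslet class, and it records how Ruby groups the operators without parsing anything.

```ruby
# Hypothetical probe (not part of parslet): each operator returns a new probe
# whose text shows how Ruby grouped the expression.
class GroupingProbe
  attr_reader :text

  def initialize(text)
    @text = text
  end

  def >>(other)
    GroupingProbe.new("(#{text} >> #{other.text})")
  end

  def |(other)
    GroupingProbe.new("(#{text} | #{other.text})")
  end

  def /(other)
    GroupingProbe.new("(#{text} / #{other.text})")
  end
end

a, b, c, d = %w(a b c d).map { |name| GroupingProbe.new(name) }

# '|' has lower precedence than '>>', so sequences group first.
puts((a >> b | c >> d).text)   # ((a >> b) | (c >> d))

# '/' has higher precedence than '>>', so the alternative groups first.
puts((a >> b / c >> d).text)   # ((a >> (b / c)) >> d)
```

With '/', an alternative between two sequences needs explicit parentheses around each sequence; with '|' it does not.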
data/LICENSE ADDED
@@ -0,0 +1,23 @@
1
+
2
+ Copyright (c) 2010 Kaspar Schiess
3
+
4
+ Permission is hereby granted, free of charge, to any person
5
+ obtaining a copy of this software and associated documentation
6
+ files (the "Software"), to deal in the Software without
7
+ restriction, including without limitation the rights to use,
8
+ copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the
10
+ Software is furnished to do so, subject to the following
11
+ conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
18
+ OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
20
+ HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
21
+ WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
22
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
23
+ OTHER DEALINGS IN THE SOFTWARE.
data/README ADDED
@@ -0,0 +1,101 @@
1
+ INTRODUCTION
2
+
3
+ A small library that implements a PEG grammar. PEG means Parsing Expression
4
+ Grammars [1]. These are a different kind of grammar that recognizes almost the
5
+ same languages as your conventional LR parser, except that they are easier to
6
+ work with, since they haven't been conceived for generation, but for
7
+ recognition of languages. You can read the founding paper of the field by
8
+ Bryan Ford here [2].
9
+
10
+ Other Ruby projects that work on the same topic are:
11
+ http://wiki.github.com/luikore/rsec/
12
+ http://github.com/mjijackson/citrus
13
+ http://github.com/nathansobo/treetop
14
+
15
+ My goal here was to see how a parser/parser generator should be constructed to
16
+ allow clean AST construction and good error handling. It seems to me that most
17
+ often, parser generators only handle the success-case and forget about
18
+ debugging and error generation.
19
+
20
+ More specifically, this library is motivated by one of my compiler projects. I
21
+ started out using 'treetop' (see the link above), but found it unusable. It
22
+ was lacking in
23
+
24
+ * error reporting: Hard to see where a grammar fails.
25
+
26
+ * stability of generated trees: Intermediary trees were dictated by the
27
+ grammar. It was hard to define invariants in that system - what was
28
+ convenient when writing the grammar often wasn't in subsequent stages.
29
+
30
+ * clarity of parser code: The parser code is generated and is very hard
31
+ to read. Add that to the first point to understand my pain.
32
+
33
+ So parslet tries to be different. It doesn't generate the parser, but instead
34
+ defines it in a DSL which is very close to what you find in [2]. A successful
35
+ parse then generates a parser tree consisting entirely of hashes and arrays
36
+ and strings (read: unstable). This parser tree can then be converted to a real
37
+ AST (read: stable) using a pattern matcher that is also part of this library.
38
+
39
+ Error reporting is another area where parslet excels: It is able to print not
40
+ only the error you are used to seeing ('Parse failed because of REASON at line
41
+ 1 and char 2'), but also prints what led to that failure in the form of a
42
+ tree (#error_tree method).
43
+
44
+ [1] http://en.wikipedia.org/wiki/Parsing_expression_grammar
45
+ [2] http://pdos.csail.mit.edu/~baford/packrat/popl04/peg-popl04.pdf
46
+
47
+ SYNOPSIS
48
+
49
+ require 'parslet'
50
+ include Parslet
51
+
52
+ # Constructs a parser using a Parsing Expression Grammar-like DSL:
53
+ parser = str('"') >>
54
+ (
55
+ str('\\') >> any |
56
+ str('"').absnt? >> any
57
+ ).repeat.as(:string) >>
58
+ str('"')
59
+
60
+ # Parse the string and capture parts of the interpretation (:string above)
61
+ tree = parser.parse(%Q{
62
+ "This is a \\"String\\" in which you can escape stuff"
63
+ }.strip)
64
+
65
+ tree # => {:string=>"This is a \\\"String\\\" in which you can escape stuff"}
66
+
67
+ # Here's how you can grab results from that tree:
68
+ Pattern.new(:string => simple(:x)).each_match(tree) do |dictionary|
69
+ puts "String contents: #{dictionary[:x]}"
70
+ end
71
+
72
+ # Here's how to transform that tree into something else ----------------------
73
+
74
+ # Defines the classes of our new Syntax Tree
75
+ class StringLiteral < Struct.new(:text); end
76
+
77
+ # Defines a set of transformation rules on tree leaves
78
+ transform = Transform.new
79
+ transform.rule(:string => simple(:x)) { |d| StringLiteral.new(d[:x]) }
80
+
81
+ # Transforms the tree
82
+ transform.apply(tree)
83
+
84
+ # => #<struct StringLiteral text="This is a \\\"String\\\" ... escape stuff">
85
+
86
+ COMPATIBILITY
87
+
88
+ This library should work with both ruby 1.8 and ruby 1.9.
89
+
90
+ AUTHORS
91
+
92
+ My gigantous thanks go to the following cool guys and gals that help make this
93
+ rock:
94
+
95
+ Florian Hanke <florian.hanke@gmail.com>
96
+
97
+ STATUS
98
+
99
+ On the road to 1.0; improving documentation, packaging and upgrading to rspec2.
100
+
101
+ (c) 2010 Kaspar Schiess
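The README describes a two-step flow: the parse yields a tree of hashes, arrays and strings, and a pattern matcher then rewrites that tree into stable AST objects. The following is a minimal sketch of that second step, assuming a depth-first rewrite. TinyTransform and its single-key rule form are hypothetical names invented for this illustration; they are not parslet's Transform or Pattern classes and do not reflect how the library implements matching.

```ruby
# AST node used by the sketch (mirrors the StringLiteral idea in the README).
StringLiteral = Struct.new(:text)

# Hypothetical, simplified transformer: a rule fires on a hash that has
# exactly one key and a plain String value, roughly what simple(:x) expresses.
class TinyTransform
  def initialize
    @rules = []
  end

  # Registers a block that receives the String stored under +key+.
  def rule(key, &block)
    @rules << [key, block]
    self
  end

  # Rewrites children first, then tries the rules on the rewritten hash.
  def apply(tree)
    case tree
    when Array
      tree.map { |element| apply(element) }
    when Hash
      rewritten = {}
      tree.each { |key, value| rewritten[key] = apply(value) }
      match = @rules.find do |key, _block|
        rewritten.keys == [key] && rewritten[key].is_a?(String)
      end
      match ? match[1].call(rewritten[match[0]]) : rewritten
    else
      tree # strings and other leaves pass through unchanged
    end
  end
end

transform = TinyTransform.new
transform.rule(:string) { |text| StringLiteral.new(text) }

tree = { :body => [{ :string => "foo" }, { :string => "bar" }] }
p transform.apply(tree)
```

Because the rules only look at the shape of each subtree, the grammar can change its nesting without touching the code that consumes StringLiteral objects.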
data/Rakefile ADDED
@@ -0,0 +1,73 @@
1
+
2
+ require "rubygems"
3
+ require "rake/gempackagetask"
4
+ require "rake/rdoctask"
5
+ require 'rspec/core/rake_task'
6
+
7
+ desc "Run all examples"
8
+ RSpec::Core::RakeTask.new
9
+
10
+ task :default => :spec
11
+
12
+ # This builds the actual gem. For details of what all these options
13
+ # mean, and other ones you can add, check the documentation here:
14
+ #
15
+ # http://rubygems.org/read/chapter/20
16
+ #
17
+ spec = Gem::Specification.new do |s|
18
+
19
+ # Change these as appropriate
20
+ s.name = "parslet"
21
+ s.version = "0.9.0"
22
+ s.summary = "Parser construction library with great error reporting in Ruby."
23
+ s.author = "Kaspar Schiess"
24
+ s.email = "kaspar.schiess@absurd.li"
25
+ s.homepage = "http://kschiess.github.com/parslet"
26
+
27
+ s.has_rdoc = true
28
+ s.extra_rdoc_files = %w(README)
29
+ s.rdoc_options = %w(--main README)
30
+
31
+ # Add any extra files to include in the gem
32
+ s.files = %w(Gemfile HISTORY.txt LICENSE Rakefile README) + Dir.glob("{spec,lib/**/*}")
33
+ s.require_paths = ["lib"]
34
+
35
+ # If you want to depend on other gems, add them here, along with any
36
+ # relevant versions
37
+ # s.add_dependency("some_other_gem", "~> 0.1.0")
38
+
39
+ # If your tests use any gems, include them here
40
+ s.add_development_dependency("rspec")
41
+ s.add_development_dependency("flexmock")
42
+ end
43
+
44
+ # This task actually builds the gem. We also regenerate a static
45
+ # .gemspec file, which is useful if something (e.g. GitHub) will
46
+ # be automatically building a gem for this project. If you're not
47
+ # using GitHub, edit as appropriate.
48
+ #
49
+ # To publish your gem online, install the 'gemcutter' gem; Read more
50
+ # about that here: http://gemcutter.org/pages/gem_docs
51
+ Rake::GemPackageTask.new(spec) do |pkg|
52
+ pkg.gem_spec = spec
53
+ end
54
+
55
+ desc "Build the gemspec file #{spec.name}.gemspec"
56
+ task :gemspec do
57
+ file = File.dirname(__FILE__) + "/#{spec.name}.gemspec"
58
+ File.open(file, "w") {|f| f << spec.to_ruby }
59
+ end
60
+
61
+ task :package => :gemspec
62
+
63
+ # Generate documentation
64
+ Rake::RDocTask.new do |rd|
65
+ rd.main = "README"
66
+ rd.rdoc_files.include("README", "lib/**/*.rb")
67
+ rd.rdoc_dir = "rdoc"
68
+ end
69
+
70
+ desc 'Clear out RDoc and generated packages'
71
+ task :clean => [:clobber_rdoc, :clobber_package] do
72
+ rm "#{spec.name}.gemspec"
73
+ end
data/lib/parslet.rb ADDED
@@ -0,0 +1,301 @@
1
+ require 'stringio'
2
+
3
+ # A simple parser generator library. Typical usage would look like this:
4
+ #
5
+ # require 'parslet'
6
+ #
7
+ # class MyParser
8
+ # include Parslet
9
+ #
10
+ # rule(:a) { str('a').repeat }
11
+ #
12
+ # def parse(str)
13
+ # a.parse(str)
14
+ # end
15
+ # end
16
+ #
17
+ # pp MyParser.new.parse('aaaa') # => 'aaaa'
18
+ # pp MyParser.new.parse('bbbb') # => Parslet::ParseFailed:
19
+ # # Don't know what to do with bbbb at line 1 char 1.
20
+ #
21
+ # The simple DSL allows you to define grammars in PEG-style. This kind of
22
+ # grammar construction does away with the ambiguities that usually come with
23
+ # parsers; instead, it allows you to construct grammars that are easier to
24
+ # debug, since less magic is involved.
25
+ #
26
+ # Parslet is typically used in stages:
27
+ #
28
+ #
29
+ # * Parsing the input string; this yields an intermediary tree
30
+ # * Transformation of the tree into something useful to you
31
+ #
32
+ # The first stage is traditionally intermingled with the second stage; output
33
+ # from the second stage is usually called the 'Abstract Syntax Tree' or AST.
34
+ #
35
+ # The stages are completely decoupled; You can change your grammar around
36
+ # and use the second stage to isolate the rest of your code from the changes
37
+ # you've effected.
38
+ #
39
+ # = Language Atoms
40
+ #
41
+ # PEG-style grammars build on a very small number of atoms, or parslets. In
42
+ # fact, only three types of parslets exist. Here's how to match a string:
43
+ #
44
+ # str('a string')
45
+ #
46
+ # This matches the string 'a string' literally and nothing else. If your input
47
+ # doesn't contain the string, it will fail. Here's how to match a character
48
+ # set:
49
+ #
50
+ # match('[abc]')
51
+ #
52
+ # This matches 'a', 'b' or 'c'. The string matched will always have a length
53
+ # of 1; to match longer strings, please see the next section. The last parslet
54
+ # of the three is 'any':
55
+ #
56
+ # any
57
+ #
58
+ # 'any' functions like the dot in regular expressions - it matches any single
59
+ # character.
60
+ #
61
+ # = Combination and Repetition
62
+ #
63
+ # Parslets only get useful when combined to grammars. To combine one parslet
64
+ # with the other, you have four kinds of methods available: repeat and maybe
64
+ # (repetition), >> (sequence), | (alternation), and absnt? and prsnt? (lookahead).
66
+ #
67
+ # str('a').repeat # any number of 'a's, including 0
68
+ # str('a').maybe # maybe there'll be an 'a', maybe not
69
+ #
70
+ # Parslets can be joined using >>. This means: Match the left parslet, then
71
+ # match the right parslet.
72
+ #
73
+ # str('a') >> str('b') # would match 'ab'
74
+ #
75
+ # Keep in mind that all combination and repetition operators themselves return
76
+ # a parslet. You can combine the result again:
77
+ #
78
+ # ( str('a') >> str('b') ) >> str('c') # would match 'abc'
79
+ #
80
+ # The pipe ('|') indicates alternatives:
81
+ #
82
+ # str('a') | str('b') # would match 'a' OR 'b'
83
+ #
84
+ # The left side of an alternative is matched first; if it matches, the right
85
+ # side is never looked at.
86
+ #
87
+ # The absnt? and prsnt? qualifiers allow looking at input without consuming
88
+ # it:
89
+ #
90
+ # str('a').prsnt? # will match if at the current position there is an 'a'.
91
+ # str('a').prsnt? >> str('b') # check for 'a' then match 'b'
92
+ #
93
+ # This means that the second example will not match any input; when the second
94
+ # part is parsed, the first part has asserted the presence of 'a', and thus
95
+ # str('b') cannot match. The absnt? method is the opposite of prsnt?: it
96
+ # asserts absence.
97
+ #
98
+ # More documentation on these methods can be found in Parslet::Atoms::Base.
99
+ #
100
+ # = Intermediary Parse Trees
101
+ #
102
+ # As you have probably seen above, you can hand input (strings or StringIOs) to
103
+ # your parslets like this:
104
+ #
105
+ # parslet.parse(str)
106
+ #
107
+ # This returns an intermediary parse tree or raises an exception
108
+ # (Parslet::ParseFailed) when the input is not well formed.
109
+ #
110
+ # Intermediary parse trees are essentially just Plain Old Ruby Objects. (PORO
111
+ # technology as we call it.) Parslets try very hard to return sensible stuff;
112
+ # it is quite easy to use the results for the later stages of your program.
113
+ #
114
+ # Here are a few examples and what their intermediary trees look like:
115
+ #
116
+ # str('foo').parse('foo') # => 'foo'
117
+ # (str('f') >> str('o') >> str('o')).parse('foo') # => 'foo'
118
+ #
119
+ # Naming parslets
120
+ #
121
+ # Construction of lambda blocks
122
+ #
123
+ # = Intermediary Tree transformation
124
+ #
125
+ # The intermediary parse tree by itself is most often not very useful. Its
126
+ # form is volatile; changing your parser in the slightest might produce
127
+ # profound changes in the generated trees.
128
+ #
129
+ # Generally you will want to construct a more stable tree using your own
130
+ # carefully crafted representation of the domain. Parslet provides you with
131
+ # an elegant way of transmogrifying your intermediary tree into the output
132
+ # format you choose. This is achieved by transformation rules such as this
133
+ # one:
134
+ #
135
+ # transform.rule(:literal => {:string => simple(:x)}) { |d|
136
+ # StringLit.new(*d.values) }
137
+ #
138
+ # The above rule will transform a subtree looking like this:
139
+ #
140
+ # :literal
141
+ # |
142
+ # :string
143
+ # |
144
+ # "somestring"
145
+ #
146
+ # into just this:
147
+ #
148
+ # StringLit
149
+ # value: "somestring"
150
+ #
151
+ #
152
+ # = Further documentation
153
+ #
154
+ # Please see the examples subdirectory of the distribution for more examples.
155
+ # Check out 'rooc' (github.com/kschiess/rooc) as well - it uses parslet for
156
+ # compiler construction.
157
+ #
158
+ module Parslet
159
+ def self.included(base)
160
+ base.extend(ClassMethods)
161
+ end
162
+
163
+ # This is raised when the parse failed to match or to consume all its input.
164
+ # It contains the message that should be presented to the user. If you want
165
+ # to display more error explanation, you can print the #error_tree that is
166
+ # stored in the parslet. This is a graphical representation of what went
167
+ # wrong.
168
+ #
169
+ # Example:
170
+ #
171
+ # begin
172
+ # parslet.parse(str)
173
+ # rescue Parslet::ParseFailed => failure
174
+ # puts parslet.error_tree.ascii_tree
175
+ # end
176
+ #
177
+ class ParseFailed < Exception
178
+ end
179
+
180
+ module ClassMethods
181
+ # Define the parser's #root function. This is the place where you start
182
+ # parsing; if you have a rule for 'file' that describes what should be
183
+ # in a file, this would be your root declaration:
184
+ # class Parser
185
+ # root :file
186
+ # rule(:file) { ... }
187
+ # end
188
+ #
189
+ # #root declares a 'parse' function that works just like the parse
190
+ # function that you can call on a simple parslet, taking a string as input
191
+ # and producing parse output.
192
+ #
193
+ # In a way, #root is a shorthand for:
194
+ #
195
+ # def parse(str)
196
+ # your_parser_root.parse(str)
197
+ # end
198
+ #
199
+ def root(name)
200
+ define_method(:root) do
201
+ self.send(name)
202
+ end
203
+ define_method(:parse) do |str|
204
+ root.parse(str)
205
+ end
206
+ end
207
+
208
+ # Define an entity for the parser. This generates a method of the same name
209
+ # that can be used as part of other patterns. Those methods can be freely
210
+ # mixed in your parser class with real ruby methods.
211
+ #
212
+ # Example:
213
+ #
214
+ # class MyParser
215
+ # include Parslet
216
+ #
217
+ # rule(:bar) { str('bar') }
218
+ # rule :twobar do
219
+ # bar >> bar
220
+ # end
221
+ #
222
+ # def parse(str)
223
+ # twobar.parse(str)
224
+ # end
225
+ # end
226
+ #
227
+ def rule(name, &definition)
228
+ define_method(name) do
229
+ @rules ||= {} # <name, rule> memoization
230
+ @rules[name] or
231
+ (@rules[name] = Atoms::Entity.new(name, self, definition))
232
+ end
233
+ end
234
+ end
235
+
236
+ # Returns an atom matching a character class. This is essentially a regular
237
+ # expression, but you should only match a single character.
238
+ #
239
+ # Example:
240
+ #
241
+ # match('[ab]') # will match either 'a' or 'b'
242
+ # match('[\n\s]') # will match newlines and spaces
243
+ #
244
+ def match(obj)
245
+ Atoms::Re.new(obj)
246
+ end
247
+ module_function :match
248
+
249
+ # Returns an atom matching the +str+ given.
250
+ #
251
+ # Example:
252
+ #
253
+ # str('class') # will match 'class'
254
+ #
255
+ def str(str)
256
+ Atoms::Str.new(str)
257
+ end
258
+ module_function :str
259
+
260
+ # Returns an atom matching any character.
261
+ #
262
+ def any
263
+ Atoms::Re.new('.')
264
+ end
265
+ module_function :any
266
+
267
+ # Returns a placeholder for a tree transformation that will only match a
268
+ # sequence of elements. The +symbol+ you specify will be the key for the
269
+ # matched sequence in the returned dictionary.
270
+ #
271
+ # Example:
272
+ #
273
+ # # This would match a body element that contains several declarations.
274
+ # { :body => sequence(:declarations) }
275
+ #
276
+ # The above example would match :body => ['a', 'b'], but not :body => 'a'.
277
+ #
278
+ def sequence(symbol)
279
+ Pattern::SequenceBind.new(symbol)
280
+ end
281
+ module_function :sequence
282
+
283
+ # Returns a placeholder for a tree transformation that will only match
284
+ # simple elements. This matches everything that #sequence doesn't match.
285
+ #
286
+ # Example:
287
+ #
288
+ # # Matches a single header.
289
+ # { :header => simple(:header) }
290
+ #
291
+ def simple(symbol)
292
+ Pattern::SimpleBind.new(symbol)
293
+ end
294
+ module_function :simple
295
+ end
296
+
297
+ require 'parslet/error_tree'
298
+ require 'parslet/atoms'
299
+ require 'parslet/pattern'
300
+ require 'parslet/pattern/binding'
301
+ require 'parslet/transform'
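The "Combination and Repetition" notes in lib/parslet.rb describe sequence, ordered alternation, repetition and the two lookahead qualifiers. The sketch below shows those semantics in plain Ruby so they can be tried without the gem. It is an illustration under stated assumptions, not parslet's implementation: TinyPeg and its function names (seq, alt, present, absent, matches?) are hypothetical, and matchers are modelled as lambdas that return the new position or nil.

```ruby
# Hypothetical mini-combinators (not parslet): a matcher is a lambda taking
# (input, pos) and returning the position after the match, or nil on failure.
module TinyPeg
  module_function

  def str(literal)
    lambda { |input, pos| input[pos, literal.size] == literal ? pos + literal.size : nil }
  end

  def any
    lambda { |input, pos| pos < input.size ? pos + 1 : nil }
  end

  # Sequence: every part must match, each starting where the previous ended.
  def seq(*parts)
    lambda do |input, pos|
      parts.inject(pos) { |current, part| current && part.call(input, current) }
    end
  end

  # Ordered choice: the first alternative that matches wins; later ones are
  # not tried.
  def alt(*choices)
    lambda do |input, pos|
      result = nil
      choices.each { |choice| result ||= choice.call(input, pos) }
      result
    end
  end

  # Zero or more repetitions; stops when the part fails or makes no progress.
  def repeat(part)
    lambda do |input, pos|
      while (nxt = part.call(input, pos)) && nxt > pos
        pos = nxt
      end
      pos
    end
  end

  # Positive lookahead: succeeds without consuming input.
  def present(part)
    lambda { |input, pos| part.call(input, pos) ? pos : nil }
  end

  # Negative lookahead: succeeds only if the part does NOT match here.
  def absent(part)
    lambda { |input, pos| part.call(input, pos) ? nil : pos }
  end

  # True when the matcher consumes the whole input.
  def matches?(matcher, input)
    matcher.call(input, 0) == input.size
  end
end

quote = TinyPeg.str('"')
escaped   = TinyPeg.seq(TinyPeg.str('\\'), TinyPeg.any)
unescaped = TinyPeg.seq(TinyPeg.absent(quote), TinyPeg.any)
literal   = TinyPeg.seq(quote, TinyPeg.repeat(TinyPeg.alt(escaped, unescaped)), quote)

p TinyPeg.matches?(literal, '"a \\"b\\" c"')  # => true
p TinyPeg.matches?(literal, '"abc')           # => false

# Asserting the presence of 'a' and then matching 'b' can never succeed.
p TinyPeg.matches?(TinyPeg.seq(TinyPeg.present(TinyPeg.str('a')), TinyPeg.str('b')), 'ab')  # => false
```

The literal matcher mirrors the shape of the README's string grammar: an escaped character is tried first, and the negative lookahead keeps the repetition from swallowing the closing quote.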