RubyGems - parslet - Versions diffs - 0.9.0 - Mend

parslet 0.9.0

Files changed (12)

data/Gemfile +7 -0
data/HISTORY.txt +21 -0
data/LICENSE +23 -0
data/README +101 -0
data/Rakefile +73 -0
data/lib/parslet.rb +301 -0
data/lib/parslet/atoms.rb +492 -0
data/lib/parslet/error_tree.rb +50 -0
data/lib/parslet/pattern.rb +144 -0
data/lib/parslet/pattern/binding.rb +40 -0
data/lib/parslet/transform.rb +118 -0
metadata +100 -0

data/Gemfile ADDED

@@ -0,0 +1,7 @@
+# A sample Gemfile
+source "http://rubygems.org"
+group :development do
+  gem 'rspec'
+  gem 'flexmock'
+end

data/HISTORY.txt ADDED

@@ -0,0 +1,21 @@
+= 0.9.0 / ???
+  * More of everything: Examples, documentation, etc...
+  * Breaking change: Ruby's binary or ('|') is now used for alternatives,
+    instead of the division sign ('/'); this reduces the number of
+    parentheses needed for a grammar overall.
+  * parslet.maybe now yields the result or nil in case of parse failure. This
+    is probably better than the array it did before; the jury is still out on
+    that.
+  * parslet.repeat(min, max) is now valid syntax
+= 0.1.0 / not released.
+  * Initial version. Classes for parsing, matching in the resulting trees
+    and transforming the trees into something more useful.
+  * Parses and outputs intermediary trees
+  * Matching of single elements and sequences
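The switch from '/' to '|' for alternatives comes down to Ruby's operator precedence: '/' binds more tightly than '>>', while '|' binds more loosely. The throwaway sketch below makes that grouping visible. The Expr class is hypothetical and has nothing to do with parslet; it only records how Ruby groups each expression.

```ruby
# Hypothetical Expr class: each operator returns a new Expr whose text shows
# the grouping Ruby's parser chose, so operator precedence becomes visible.
class Expr
  def initialize(text); @text = text; end
  def >>(other); Expr.new("(#{self} >> #{other})"); end
  def |(other); Expr.new("(#{self} | #{other})"); end
  def /(other); Expr.new("(#{self} / #{other})"); end
  def to_s; @text; end
end

a, b, c, d = %w[a b c d].map { |name| Expr.new(name) }
puts(a >> b | c >> d) # prints ((a >> b) | (c >> d))
puts(a >> b / c >> d) # prints ((a >> (b / c)) >> d)
```

So with '/' as the alternative operator, "a then b, or c then d" needs explicit parentheses, (a >> b) / (c >> d), whereas with '|' the natural reading needs none.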

data/LICENSE ADDED

@@ -0,0 +1,23 @@
+ Copyright (c) 2010 Kaspar Schiess
+ Permission is hereby granted, free of charge, to any person
+ obtaining a copy of this software and associated documentation
+ files (the "Software"), to deal in the Software without
+ restriction, including without limitation the rights to use,
+ copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the
+ Software is furnished to do so, subject to the following
+ conditions:
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ OTHER DEALINGS IN THE SOFTWARE.

data/README ADDED

@@ -0,0 +1,101 @@
+INTRODUCTION
+A small library for building PEG parsers. PEG stands for Parsing Expression
+Grammars [1]. These are a different kind of grammar that recognizes almost the
+same languages as your conventional LR parser, except that they are easier to
+work with, since they were conceived for recognizing languages rather than for
+generating them. You can read the founding paper of the field by Bryan Ford
+here [2].
+Other Ruby projects that work on the same topic are:
+http://wiki.github.com/luikore/rsec/
+http://github.com/mjijackson/citrus
+http://github.com/nathansobo/treetop
+My goal here was to see how a parser/parser generator should be constructed to
+allow clean AST construction and good error handling. It seems to me that most
+often, parser generators only handle the success case and forget about
+debugging and error generation.
+More specifically, this library is motivated by one of my compiler projects. I
+started out using 'treetop' (see the link above), but found it unusable. It
+was lacking in
+  * error reporting: Hard to see where a grammar fails.
+  * stability of generated trees: Intermediary trees were dictated by the
+    grammar. It was hard to define invariants in that system - what was
+    convenient when writing the grammar often wasn't in subsequent stages.
+  * clarity of parser code: The parser code is generated and is very hard
+    to read. Add that to the first point to understand my pain.
+So parslet tries to be different. It doesn't generate the parser, but instead
+defines it in a DSL which is very close to what you find in [2]. A successful
+parse then generates a parse tree consisting entirely of hashes, arrays
+and strings (read: unstable). This parse tree can then be converted to a real
+AST (read: stable) using a pattern matcher that is also part of this library.
+Error reporting is another area where parslet excels: it is able to print not
+only the error you are used to seeing ('Parse failed because of REASON at line
+1 and char 2'), but also what led to that failure, in the form of a
+tree (#error_tree method).
+[1] http://en.wikipedia.org/wiki/Parsing_expression_grammar
+[2] http://pdos.csail.mit.edu/~baford/packrat/popl04/peg-popl04.pdf
+SYNOPSIS
+  require 'parslet'
+  include Parslet
+  # Constructs a parser using a Parser Expression Grammar like DSL:
+  parser =  str('"') >>
+            (
+              str('\\') >> any |
+              str('"').absnt? >> any
+            ).repeat.as(:string) >>
+            str('"')
+  # Parse the string and capture parts of the interpretation (:string above)
+  tree = parser.parse(%Q{
+    "This is a \\"String\\" in which you can escape stuff"
+  }.strip)
+  tree # => {:string=>"This is a \\\"String\\\" in which you can escape stuff"}
+  # Here's how you can grab results from that tree:
+  Pattern.new(:string => simple(:x)).each_match(tree) do |dictionary|
+    puts "String contents: #{dictionary[:x]}"
+  end
+  # Here's how to transform that tree into something else ----------------------
+  # Defines the classes of our new Syntax Tree
+  class StringLiteral < Struct.new(:text); end
+  # Defines a set of transformation rules on tree leaves
+  transform = Transform.new
+  transform.rule(:string => simple(:x)) { |d| StringLiteral.new(d[:x]) }
+  # Transforms the tree
+  transform.apply(tree)
+  # => #<struct StringLiteral text="This is a \\\"String\\\" ... escape stuff">
+COMPATIBILITY
+This library should work with both ruby 1.8 and ruby 1.9.
+AUTHORS
+My gigantic thanks go to the following cool guys and gals who help make this
+rock:
+Florian Hanke <florian.hanke@gmail.com>
+STATUS
+On the road to 1.0; improving documentation, packaging and upgrading to rspec2.
+(c) 2010 Kaspar Schiess
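To make the SYNOPSIS grammar a little less magical, here is a minimal, self-contained sketch of how PEG combinators of this shape can work in plain Ruby. It is not parslet's implementation: the module MiniPeg and its classes (Atom, Str, Re, Seq, Alt, Rep, Look, Named) are hypothetical, and the way the values of a sequence are merged is a simplifying assumption. It only mirrors the DSL surface used in the SYNOPSIS and in HISTORY.txt: str, match, any, >>, |, repeat(min, max), maybe, absnt?, prsnt? and as.

```ruby
# A minimal PEG combinator sketch; every name here is hypothetical and this
# is not parslet's implementation. Each atom's try(input, pos) returns
# [value, new_pos] on success or nil on failure.
module MiniPeg
  class ParseFailed < StandardError; end
  # Sequence values: strings concatenate, hashes merge, and structured
  # values (hashes/arrays) win over bare strings. A simplification.
  def self.join(a, b)
    return a || b if a.nil? || b.nil?
    return a + b if a.is_a?(String) && b.is_a?(String)
    return a.merge(b) if a.is_a?(Hash) && b.is_a?(Hash)
    return b if a.is_a?(String)
    return a if b.is_a?(String)
    [a, b]
  end

  class Atom
    def >>(other); Seq.new(self, other); end
    def |(other); Alt.new(self, other); end
    def repeat(min = 0, max = nil); Rep.new(self, min, max); end
    def maybe; Rep.new(self, 0, 1, true); end
    def absnt?; Look.new(self, false); end
    def prsnt?; Look.new(self, true); end
    def as(name); Named.new(self, name); end
    # Succeeds only if the atom matches and consumes the whole input.
    def parse(input)
      result = try(input, 0)
      raise ParseFailed, "cannot parse #{input.inspect}" unless result && result[1] == input.size
      result[0]
    end
  end

  class Str < Atom # a literal string
    def initialize(str); @str = str; end
    def try(input, pos); [@str, pos + @str.size] if input[pos, @str.size] == @str; end
  end
  class Re < Atom # exactly one character that matches a regexp
    def initialize(re); @re = Regexp.new(re); end
    def try(input, pos)
      ch = input[pos]
      [ch, pos + 1] if ch && ch.match?(@re)
    end
  end

  class Seq < Atom # left, then right
    def initialize(left, right); @left, @right = left, right; end
    def try(input, pos)
      l = @left.try(input, pos) or return nil
      r = @right.try(input, l[1]) or return nil
      [MiniPeg.join(l[0], r[0]), r[1]]
    end
  end

  class Alt < Atom # ordered choice: right is tried only if left fails
    def initialize(left, right); @left, @right = left, right; end
    def try(input, pos); @left.try(input, pos) || @right.try(input, pos); end
  end

  class Rep < Atom # repeat(min, max); maybe yields the value or nil
    def initialize(atom, min, max, maybe = false); @atom, @min, @max, @maybe = atom, min, max, maybe; end
    def try(input, pos)
      values = []
      while @max.nil? || values.size < @max
        r = @atom.try(input, pos)
        break if r.nil? || r[1] == pos # stop on failure or a zero-width match
        values << r[0]
        pos = r[1]
      end
      return nil if values.size < @min
      return [values.first, pos] if @maybe
      [values.all?(String) ? values.join : values, pos]
    end
  end

  class Look < Atom # absnt?/prsnt?: succeeds or fails without consuming input
    def initialize(atom, present); @atom, @present = atom, present; end
    def try(input, pos); [nil, pos] if @atom.try(input, pos).nil? != @present; end
  end

  class Named < Atom # .as(name) wraps the value as {name => value}
    def initialize(atom, name); @atom, @name = atom, name; end
    def try(input, pos)
      r = @atom.try(input, pos) or return nil
      [{ @name => r[0] }, r[1]]
    end
  end

  module_function
  def str(s); Str.new(s); end
  def match(re); Re.new(re); end
  def any; Re.new('.'); end
end

# Usage: the quoted-string grammar from the SYNOPSIS.
include MiniPeg
parser = str('"') >> (str('\\') >> any | str('"').absnt? >> any).repeat.as(:string) >> str('"')
puts parser.parse('"ab\\"c"')[:string] # prints ab\"c
```

Two PEG properties show up directly in this sketch: alternatives are ordered, so str('a') | str('ab') commits to 'a' and then fails on the whole input 'ab', and absnt?/prsnt? never advance the input position.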

data/Rakefile ADDED

@@ -0,0 +1,73 @@
+require "rubygems"
+require "rake/gempackagetask"
+require "rake/rdoctask"
+require 'rspec/core/rake_task'
+desc "Run all examples"
+RSpec::Core::RakeTask.new
+task :default => :spec
+# This builds the actual gem. For details of what all these options
+# mean, and other ones you can add, check the documentation here:
+#
+#   http://rubygems.org/read/chapter/20
+#
+spec = Gem::Specification.new do |s|
+  # Change these as appropriate
+  s.name              = "parslet"
+  s.version           = "0.9.0"
+  s.summary           = "Parser construction library with great error reporting in Ruby."
+  s.author            = "Kaspar Schiess"
+  s.email             = "kaspar.schiess@absurd.li"
+  s.homepage          = "http://kschiess.github.com/parslet"
+  s.has_rdoc          = true
+  s.extra_rdoc_files  = %w(README)
+  s.rdoc_options      = %w(--main README)
+  # Add any extra files to include in the gem
+  s.files             = %w(Gemfile HISTORY.txt LICENSE Rakefile README) + Dir.glob("{spec,lib/**/*}")
+  s.require_paths     = ["lib"]
+  # If you want to depend on other gems, add them here, along with any
+  # relevant versions
+  # s.add_dependency("some_other_gem", "~> 0.1.0")
+  # If your tests use any gems, include them here
+  s.add_development_dependency("rspec")
+  s.add_development_dependency("flexmock")
+end
+# This task actually builds the gem. We also regenerate a static
+# .gemspec file, which is useful if something (e.g. GitHub) will
+# be automatically building a gem for this project. If you're not
+# using GitHub, edit as appropriate.
+#
+# To publish your gem online, install the 'gemcutter' gem; read more
+# about that here: http://gemcutter.org/pages/gem_docs
+Rake::GemPackageTask.new(spec) do |pkg|
+  pkg.gem_spec = spec
+end
+desc "Build the gemspec file #{spec.name}.gemspec"
+task :gemspec do
+  file = File.dirname(__FILE__) + "/#{spec.name}.gemspec"
+  File.open(file, "w") {|f| f << spec.to_ruby }
+end
+task :package => :gemspec
+# Generate documentation
+Rake::RDocTask.new do |rd|
+  rd.main = "README"
+  rd.rdoc_files.include("README", "lib/**/*.rb")
+  rd.rdoc_dir = "rdoc"
+end
+desc 'Clear out RDoc and generated packages'
+task :clean => [:clobber_rdoc, :clobber_package] do
+  rm "#{spec.name}.gemspec"
+end

data/lib/parslet.rb ADDED

@@ -0,0 +1,301 @@
+require 'stringio'
+# A simple parser generator library. Typical usage would look like this:
+#
+#   require 'parslet'
+#
+#   class MyParser
+#     include Parslet
+#
+#     rule(:a) { str('a').repeat }
+#
+#     def parse(str)
+#       a.parse(str)
+#     end
+#   end
+#
+#   pp MyParser.new.parse('aaaa')   # => 'aaaa'
+#   pp MyParser.new.parse('bbbb')   # => Parslet::ParseFailed:
+#                                   #    Don't know what to do with bbbb at line 1 char 1.
+#
+# The simple DSL allows you to define grammars in PEG-style. This kind of
+# grammar construction does away with the ambiguities that usually come with
+# parsers; instead, it allows you to construct grammars that are easier to
+# debug, since less magic is involved.
+#
+# Parslet is typically used in stages:
+#
+#
+# * Parsing the input string; this yields an intermediary tree
+# * Transformation of the tree into something useful to you
+#
+# The first stage is traditionally intermingled with the second stage; output
+# from the second stage is usually called the 'Abstract Syntax Tree' or AST.
+#
+# The stages are completely decoupled; you can change your grammar around
+# and use the second stage to isolate the rest of your code from the changes
+# you've effected.
+#
+# = Language Atoms
+#
+# PEG-style grammars build on a very small number of atoms, or parslets. In
+# fact, only three types of parslets exist. Here's how to match a string:
+#
+#   str('a string')
+#
+# This matches the string 'a string' literally and nothing else. If your input
+# doesn't contain the string, it will fail. Here's how to match a character
+# set:
+#
+#   match('[abc]')
+#
+# This matches 'a', 'b' or 'c'. The string matched will always have a length
+# of 1; to match longer strings, please see the section below. The last parslet
+# of the three is 'any':
+#
+#   any
+#
+# 'any' functions like the dot in regular expressions - it matches any single
+# character.
+#
+# = Combination and Repetition
+#
+# Parslets only become useful when combined into grammars. To combine parslets,
+# you have 4 kinds of methods available: repeat and maybe, >> (sequence),
+# | (alternation), and absnt? and prsnt?.
+#
+#   str('a').repeat     # any number of 'a's, including 0
+#   str('a').maybe      # maybe there'll be an 'a', maybe not
+#
+# Parslets can be joined using >>. This means: Match the left parslet, then
+# match the right parslet.
+#
+#   str('a') >> str('b')  # would match 'ab'
+#
+# Keep in mind that all combination and repetition operators themselves return
+# a parslet. You can combine the result again:
+#
+#   ( str('a') >> str('b') ) >> str('c')    # would match 'abc'
+#
+# The vertical bar ('|') indicates alternatives:
+#
+#   str('a') | str('b')   # would match 'a' OR 'b'
+#
+# The left side of an alternative is matched first; if it matches, the right
+# side is never looked at.
+#
+# The absnt? and prsnt? qualifiers allow looking at input without consuming
+# it:
+#
+#   str('a').absnt?               # will match if at the current position there is no 'a'.
+#   str('a').prsnt? >> str('b')   # check for 'a', then match 'b'
+#
+# This means that the second example will not match any input: prsnt? asserts
+# the presence of 'a' without consuming it, so str('b') is then tried against
+# that same 'a' and cannot match. The absnt? method is the opposite of prsnt?;
+# it asserts absence.
+#
+# More documentation on these methods can be found in Parslet::Atoms::Base.
+#
+# = Intermediary Parse Trees
+#
+# As you have probably seen above, you can hand input (strings or StringIOs) to
+# your parslets like this:
+#
+#   parslet.parse(str)
+#
+# This returns an intermediary parse tree or raises an exception
+# (Parslet::ParseFailed) when the input is not well formed.
+#
+# Intermediary parse trees are essentially just Plain Old Ruby Objects. (PORO
+# technology as we call it.) Parslets try very hard to return sensible stuff;
+# it is quite easy to use the results for the later stages of your program.
+#
+# Here are a few examples and what their intermediary trees look like:
+#
+#   str('foo').parse('foo')                           # => 'foo'
+#   (str('f') >> str('o') >> str('o')).parse('foo')   # => 'foo'
+#
+# Naming parslets
+#
+# Construction of lambda blocks
+#
+# = Intermediary Tree transformation
+#
+# The intermediary parse tree by itself is most often not very useful. Its
+# form is volatile; changing your parser in the slightest might produce
+# profound changes in the generated trees.
+#
+# Generally you will want to construct a more stable tree using your own
+# carefully crafted representation of the domain. Parslet provides you with
+# an elegant way of transmogrifying your intermediary tree into the output
+# format you choose. This is achieved by transformation rules such as this
+# one:
+#
+#   transform.rule(:literal => {:string => :_x}) { |d|
+#     StringLit.new(*d.values) }
+#
+# The above rule will transform a subtree looking like this:
+#
+#                              :literal
+#                                   |
+#                               :string
+#                                   |
+#                              "somestring"
+#
+# into just this:
+#
+#                               StringLit
+#                               value: "somestring"
+#
+#
+# = Further documentation
+#
+# Please see the examples subdirectory of the distribution for more examples.
+# Check out 'rooc' (github.com/kschiess/rooc) as well - it uses parslet for
+# compiler construction.
+#
+module Parslet
+  def self.included(base)
+    base.extend(ClassMethods)
+  end
+  # This is raised when the parse failed to match or to consume all its input.
+  # It contains the message that should be presented to the user. If you want
+  # to display more error explanation, you can print the #error_tree that is
+  # stored in the parslet. This is a graphical representation of what went
+  # wrong.
+  #
+  # Example:
+  #
+  #   begin
+  #     parslet.parse(str)
+  #   rescue Parslet::ParseFailed => failure
+  #     puts parslet.error_tree.ascii_tree
+  #   end
+  #
+  class ParseFailed < Exception
+  end
+  module ClassMethods
+    # Define the parser's #root function. This is the place where you start
+    # parsing; if you have a rule for 'file' that describes what should be
+    # in a file, this would be your root declaration:
+    #   class Parser
+    #     root :file
+    #     rule(:file) { ... }
+    #   end
+    #
+    # #root declares a 'parse' function that works just like the parse
+    # function that you can call on a simple parslet, taking a string as input
+    # and producing parse output.
+    #
+    # In a way, #root is a shorthand for:
+    #
+    #   def parse(str)
+    #     your_parser_root.parse(str)
+    #   end
+    #
+    def root(name)
+      define_method(:root) do
+        self.send(name)
+      end
+      define_method(:parse) do |str|
+        root.parse(str)
+      end
+    end
+    # Define an entity for the parser. This generates a method of the same name
+    # that can be used as part of other patterns. Those methods can be freely
+    # mixed in your parser class with real ruby methods.
+    #
+    # Example:
+    #
+    #   class MyParser
+    #     include Parslet
+    #
+    #     rule(:bar) { str('bar') }
+    #     rule :twobar do
+    #       bar >> bar
+    #     end
+    #
+    #     def parse(str)
+    #       twobar.parse(str)
+    #     end
+    #   end
+    #
+    def rule(name, &definition)
+      define_method(name) do
+        @rules ||= {}     # <name, rule> memoization
+        @rules[name] or
+          (@rules[name] = Atoms::Entity.new(name, self, definition))
+      end
+    end
+  end
+  # Returns an atom matching a character class. This is essentially a regular
+  # expression, but you should only match a single character.
+  #
+  # Example:
+  #
+  #   match('[ab]')     # will match either 'a' or 'b'
+#   match('[\n\s]')   # will match any whitespace, including newlines
+  #
+  def match(obj)
+    Atoms::Re.new(obj)
+  end
+  module_function :match
+  # Returns an atom matching the +str+ given.
+  #
+  # Example:
+  #
+  #   str('class')      # will match 'class'
+  #
+  def str(str)
+    Atoms::Str.new(str)
+  end
+  module_function :str
+  # Returns an atom matching any character.
+  #
+  def any
+    Atoms::Re.new('.')
+  end
+  module_function :any
+  # Returns a placeholder for a tree transformation that will only match a
+  # sequence of elements. The +symbol+ you specify will be the key for the
+  # matched sequence in the returned dictionary.
+  #
+  # Example:
+  #
+  #   # This would match a body element that contains several declarations.
+  #   { :body => sequence(:declarations) }
+  #
+  # The above example would match :body => ['a', 'b'], but not :body => 'a'.
+  #
+  def sequence(symbol)
+    Pattern::SequenceBind.new(symbol)
+  end
+  module_function :sequence
+  # Returns a placeholder for a tree transformation that will only match
+  # simple elements. This matches everything that #sequence doesn't match.
+  #
+  # Example:
+  #
+  #   # Matches a single header.
+  #   { :header => simple(:header) }
+  #
+  def simple(symbol)
+    Pattern::SimpleBind.new(symbol)
+  end
+  module_function :simple
+end
+require 'parslet/error_tree'
+require 'parslet/atoms'
+require 'parslet/pattern'
+require 'parslet/pattern/binding'
+require 'parslet/transform'
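The transformation stage described in the comments above, together with the simple and sequence placeholders, can be sketched in a few dozen lines without the gem. MiniTransform, Simple and Sequence below are hypothetical names, not parslet's Transform or Pattern classes. The sketch assumes that a simple placeholder binds any value that is neither a Hash nor an Array, that a sequence placeholder binds an Array of such values, and that the tree is rewritten depth-first with the first matching rule winning; parslet's transform.rb may differ in these details.

```ruby
# A minimal sketch of rule-based tree transformation. MiniTransform, Simple
# and Sequence are hypothetical names, not parslet's classes.
class MiniTransform
  Simple = Struct.new(:name)   # binds any value that is not a Hash or an Array
  Sequence = Struct.new(:name) # binds an Array whose elements are all simple

  def initialize
    @rules = []
  end

  # Registers a rule; the block receives a hash of bound names to values.
  def rule(pattern, &block)
    @rules << [pattern, block]
  end

  # Depth-first: children are transformed before their parent, and the first
  # matching rule replaces the node with the block's return value.
  def apply(tree)
    tree = case tree
           when Hash then tree.transform_values { |v| apply(v) }
           when Array then tree.map { |v| apply(v) }
           else tree
           end
    @rules.each do |pattern, block|
      bindings = {}
      return block.call(bindings) if bind(pattern, tree, bindings)
    end
    tree
  end

  private

  def simple?(value)
    !value.is_a?(Hash) && !value.is_a?(Array)
  end

  # Returns true (and fills +bindings+) if +tree+ has the shape of +pattern+.
  def bind(pattern, tree, bindings)
    case pattern
    when Simple
      return false unless simple?(tree)
      bindings[pattern.name] = tree
      true
    when Sequence
      return false unless tree.is_a?(Array) && tree.all? { |v| simple?(v) }
      bindings[pattern.name] = tree
      true
    when Hash
      tree.is_a?(Hash) && tree.size == pattern.size &&
        pattern.all? { |key, sub| tree.key?(key) && bind(sub, tree[key], bindings) }
    else
      pattern == tree
    end
  end
end

# Usage, mirroring the :literal/:string example in the comments above.
StringLit = Struct.new(:value)
transform = MiniTransform.new
transform.rule(literal: { string: MiniTransform::Simple.new(:x) }) { |d| StringLit.new(d[:x]) }
puts transform.apply({ literal: { string: 'somestring' } }).value # prints somestring
```

Transforming children first means a rule can match nodes that earlier rules have already turned into your own objects (a StringLit counts as simple here), which is how a stable AST gets built up from the leaves.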