RubyGems - mulparse - Versions diffs - 1.0.1 - Mend

mulparse 1.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

checksums.yaml +7 -0
data/README.md +59 -0
data/lib/mulparse.rb +7 -0
data/lib/mulparse/component.rb +134 -0
data/lib/mulparse/lang-configs/ruby.yaml +73 -0
data/lib/mulparse/mulparse_exception.rb +9 -0
data/lib/mulparse/parser.rb +222 -0
data/lib/mulparse/unmatched_exception.rb +17 -0
metadata +53 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 71ca881927e698064749fcceaba4af74476a83db
+  data.tar.gz: 7ec5d516bcc59cf06429887640f969e0bde126dc
+SHA512:
+  metadata.gz: b070fa38f8b5ca42280ca429b86e2ec9a16078850e292fe16f5a75ce760c9244210cfb6e9d694d54cf22fbc80f56dd98c0cda5e90f30bbae408b93af96cd204e
+  data.tar.gz: 71d954628f2be33a6db27a7431ac4ac7c1b441ff3d2a31fd16d0bd02eca72383e496fc2135955d93ac41c5d653cbfca762595692e8367ffb2de72828138f090f
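
For context, a .gem file is a tar archive whose members include metadata.gz and data.tar.gz, and the digests above are taken over those two members. The following is a minimal sketch of checking one of them, assuming the member has already been extracted to disk. The helper name digest_matches? and the path "data.tar.gz" are hypothetical and are not part of this package.

```ruby
require "digest"

# Hypothetical helper: compare a file's digest with an expected hex string.
# algorithm is a Digest class such as Digest::SHA1 or Digest::SHA512.
def digest_matches?(path, expected_hex, algorithm = Digest::SHA512)
  algorithm.file(path).hexdigest == expected_hex.downcase
end

# Assumed location of the extracted archive member.
path = "data.tar.gz"
if File.exist?(path)
  # SHA1 value copied from checksums.yaml above.
  puts digest_matches?(path, "7ec5d516bcc59cf06429887640f969e0bde126dc", Digest::SHA1)
end
```

The same call with Digest::SHA512 and the longer value would check the second digest.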

data/README.md ADDED

@@ -0,0 +1,59 @@
+snakeyworm John 3:16
+# Mulparse
+***
+ Mulparse is a multilingual parser for programming languages. The philosophy
+ behind this parser is to let you focus on the bigger aspects of your project;
+ parsing is a small and tedious one. This parser can parse anything that is
+ properly defined in a "lang-config". It also ships with predefined
+ "lang-configs" that provide simple and efficient parsing of the languages you
+ love. This parser was designed to be easier to learn than a parser like citrus.
+ The citrus parser has its own mini language to learn. Mulparse only has simple
+ hashes with a few fields that contain simple values.
+### Lang-config Syntax
+***
+ A lang-config is simply an array of hashes, where each hash defines how to
+ parse a component. A hash can contain the following fields, where each field
+ key is a symbol:
++ name (Required)
++ start (Required)
++ finish (Required)
++ unestable
++ hash
+#### name
+ This field is a string that represents the name of the component matched.
+#### start
+ A regular expression that matches the opening delimiter of a component.
+#### finish
+ Either a regular expression or a string. If it is a regular expression, it simply
+ matches the end delimiter of a component. Note the special treatment of any capture
+ named "escape" in this regular expression: a match is accepted only if the length of
+ that capture is even. This is useful for matching things such as strings that have an
+ escape character. If it is a string, then all named captures from the start regular
+ expression are passed to Regexp.escape(), the return values are formatted into the
+ string, and the result is converted to a regular expression. Note that when this
+ field is a string, "escape" is still specially treated.
+#### unestable
+ A boolean value. When true, the component it is applied to cannot nest any
+ other components.
+#### hash
+ This field is a hash. It is used when the finish field is a string. In this case each
+ named capture from start is looked up in this hash, and the value found is formatted
+ into finish before it is converted to a regular expression. If a capture has no entry
+ in the hash, the capture itself is used.
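
The field descriptions above can be illustrated with a small, self-contained sketch. It does not load the gem. The "percent-string" entry and the helper build_finish are hypothetical names chosen for illustration, and the helper only approximates how a string finish is combined with the start captures and the hash field.

```ruby
# Hypothetical lang-config with one entry, following the fields described above.
lang_config = [
  {
    name: "percent-string",
    start: /%q(?<delimiter>[\{\[<\/])/,
    finish: "(?<escape>\\\\*)%{delimiter}",
    unestable: true,
    hash: { "{" => "}", "[" => "]", "<" => ">" }
  }
]

# build_finish is an illustrative helper, not part of the gem's API.
# It turns a string finish into a Regexp using the captures of the start match.
def build_finish(entry, start_match)
  finish = entry[:finish]
  return finish if finish.is_a?(Regexp)

  mapping = entry[:hash] || {}
  values = {}
  start_match.names.each do |name|
    captured = start_match[name]
    next if captured.nil? # capture did not participate in the match
    # Map the capture through the hash, falling back to the capture itself.
    values[name.to_sym] = Regexp.escape(mapping[captured] || captured)
  end
  Regexp.new(finish % values)
end

entry = lang_config.first
src = "%q{hello}"
start_match = entry[:start].match(src)
finish = build_finish(entry, start_match)
finish_match = finish.match(src, start_match.end(0))

# Print the component text from the opening to the closing delimiter.
puts src[start_match.begin(0)...finish_match.end(0)]
```

Here the opening "{" is mapped to "}" through the hash field, so the closing regular expression looks for "}" rather than for a second "{".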

data/lib/mulparse.rb ADDED

@@ -0,0 +1,7 @@
+# Require needed files
+require "mulparse/mulparse_exception.rb"
+require "mulparse/unmatched_exception.rb"
+require "mulparse/component.rb"
+require "mulparse/parser.rb"

data/lib/mulparse/component.rb ADDED

@@ -0,0 +1,134 @@
+# snakeyworm John 3:16
+# This file defines a class that contains
+# data collected by the parser (commonly referred
+# to as components). Each instance of this class
+# is linked to a parent and zero or more child nodes.
+# TODO:
+#
+#   * Expand as needed to fit expanding lang config and
+#    parser.
+#
+class Component
+    # Class that wraps component data and provides hierarchy navigation
+    # Declare accessors to metadata
+    # Delimiters
+    attr_accessor :start_match
+    attr_accessor :finish_match
+    # Hierarchical
+    attr_reader :parent
+    attr_reader :children
+    # Lang config
+    attr_reader :name
+    attr_reader :start
+    attr_reader :finish
+    attr_reader :unestable
+    attr_reader :hash
+    def initialize(
+        next_start=nil,
+        parent=nil,
+        name: nil,
+        start: nil,
+        finish: nil,
+        unestable: nil,
+        hash: nil )
+        #   Initialize the Component with the
+        #   data provided.
+        @start_match = next_start # Store the component's opening match
+        # Set up lang config attributes
+        @name = name
+        @start = start
+        @finish = finish
+        @unestable = unestable
+        @hash = hash
+        @parent = parent # Set the Component's parent
+        @parent.children << self if parent
+        @children = [] # Set the children of this Component to an empty array
+    end
+    def sibling( offset=1 )
+        # Returns the sibling offset by the
+        # provided parameter.
+        return @parent.children[ @parent.children.index( self ) + offset ]
+    end
+    def has_sibling_at?( offset=1 )
+        # Returns true if a call to sibling() with the
+        # same parameter would return a sibling, i.e. the
+        # target index lies inside the parent's children array.
+        target = @parent.children.index( self ) + offset
+        return target >= 0 && target < @parent.children.length
+    end
+    def get_src( src )
+        # Return the matches and the content between
+        # them in the src provided.
+        begin
+            return src[ ( @start_match.begin( 0 )...@finish_match.end( 0 ) ) ]
+        rescue NoMethodError
+            return @start_match[0]
+        end
+    end
+    def to_s()
+        # Return a string representation of this Component
+        if @start_match then
+            # Abbreviate the matched text (not the capture count) to 11 characters
+            abbreviated_start_match = @start_match[0][0..10]
+            abbreviated_start_match += "..." if @start_match[0].length > 11
+        end
+        return "Component(\n" \
+            "    start_match=%p,\n" \
+            "    parent.name=%p,\n" \
+            "    name=%p,\n" \
+            "    start=%s,\n" \
+            "    finish=%s,\n" \
+            "    unestable=%p\n" \
+            ")" % [
+                abbreviated_start_match,
+                @parent&.name,
+                @name,
+                @start,
+                @finish || "nil",
+                @unestable
+            ]
+    end
+end

data/lib/mulparse/lang-configs/ruby.yaml ADDED

@@ -0,0 +1,73 @@
+---
+- :name: module
+  :start: !ruby/regexp "/\n        (?<=\\s)module[\\t ]+ # Keyword\n        (?:\\\\\\n+\\s*)*
+    # White space and line continuation\n        (?<name>[A-Z]\\w*)(?=\\s) # Name\n
+    \   /x"
+  :finish: !ruby/regexp /(?<=\s)end(?=\s)/
+- :name: class
+  :start: !ruby/regexp "/\n        (?<=\\s)class[\\t ]+ # Keyword\n        (?:\\\\\\n+[\\t
+    ]*)* # White space and line continuation\n        (?<nesting_class>(?:[A-Z]\\w*::(?:\\\\\\n+[\\t
+    ]*)*)*) # White space and line continuation\n        (?<name>[A-Z]\\w*) # Name\n
+    \       (?:[\\t ]*(?:[\\t ]+\\\\\\n+\\s*)* # White space and line continuation\n
+    \       <? # Inheritance operator\n        [\\t ]*(?:[\\t ]+\\\\\\n+\\s*)* # White
+    space and line continuation\n        (?<parent>[A-Z]\\w*))?(?=\\s) # Parent class\n
+    \   /x"
+  :finish: !ruby/regexp /(?<=\s)end(?=\s)/
+- :name: singleton-class
+  :start: !ruby/regexp "/\n        (?<=\\s)class[\\t ]+ # Keyword\n        (?:\\\\\\n+[\\t
+    ]*)* # White space and line continuation\n        [\\t ]*\n        << # Singleton
+    operator\n        [\\t ]*(?:[\\t ]+\\\\\\n+\\s*)* # White space and line continuation\n
+    \       (?<name>\\w*)(?=\\s) # Name\n    /x"
+  :finish: !ruby/regexp /(?<=\s)end(?=\s)/
+- :name: method
+  :start: !ruby/regexp "/\n        (?<=\\s)def[\\t ]+ # Keyword\n        (?:\\\\\\n+[\\t
+    ]*)* # White space and line continuation\n        (?:(?<class>\\w+) # Class\n
+    \       (?:\\\\\\n+[\\t ]*)*[\\t ]* # White space and line continuation\n        \\.
+    # Dot operator\n        [\\t ]*(?:\\\\\\n+[\\t ]*)*)? # White space and line continuation\n
+    \       (?<name>\\w+)(?=[\\s\\(]*) # Name\n    /x"
+  :finish: !ruby/regexp /(?<=\s)end(?=\s)/
+- :name: access-modifier
+  :start: !ruby/regexp /(?<=\s)(?:private|protected|public)(?=\s)/
+- :name: case
+  :start: !ruby/regexp /(?<=\s)case(?=[\(\t ])/
+  :finish: !ruby/regexp /(?<=\s)end(?=\s)/
+- :name: if
+  :start: !ruby/regexp "/\n        (?<=[\\n;])[\\t ]* # White space\n        if #
+    Key word\n        (?:\\\\\\n+[\\t ]*)*[\\s\\(] # White space and line continuation\n
+    \   /x"
+  :finish: !ruby/regexp /(?<=\s)end(?=\s)/
+- :name: unless
+  :start: !ruby/regexp "/\n        (?<=[\\n;])[\\t ]* # White space\n        unless
+    # Key word\n        (?:\\\\\\n+[\\t ]*)*[\\s\\(] # White space and line continuation\n
+    \   /x"
+  :finish: !ruby/regexp /(?<=\s)end(?=\s)/
+- :name: do-block
+  :start: !ruby/regexp "/\n        [\\)\\s]+ # White space\n        do # Key word\n
+    \       (?:\\\\\\n+[\\t ]*)*[\\s\\|] # White space and line continuation\n    /x"
+  :finish: !ruby/regexp /(?<=\s)end(?=\s)/
+- :name: begin
+  :start: !ruby/regexp "/\n        ;?\\s+ # White space\n        begin # Key word\n
+    \       [(\\s] # White space\n    /x"
+  :finish: !ruby/regexp /(?<=\s)end(?=\s)/
+- :name: multi-comment
+  :start: !ruby/regexp /^=begin(?<content>\s.*?)(?<=\n)=end(?=\s)/m
+- :name: single-comment
+  :start: !ruby/regexp /#(?<content>.*?)(?=\n)/
+- :name: string
+  :start: !ruby/regexp /(?:%[q|Q]?(?<delimiter>"|'|[\`\~\!\@\#\$\%\^\&\*\(\)\-\_\=\+\[\]\{\}\\\|\;\:\,\<\.\>\/\?]))|(?<delimiter>"|')/
+  :finish: "(?<escape>\\\\*)%{delimiter}"
+  :unestable: true
+  :hash: &1
+    "{": "}"
+    "<": ">"
+    "[": "]"
+    "/": "/"
+- :name: regular-expression
+  :start: !ruby/regexp /(?:%r(?<delimiter>"|'|[\`\~\!\@\#\$\%\^\&\*\(\)\-\_\=\+\[\]\{\}\\\|\;\:\,\<\.\>\/\?]))|(?<delimiter>\/)/
+  :finish: "(?<escape>\\\\*)%{delimiter}"
+  :unestable: true
+  :hash: *1
+- :name: here-document
+  :start: !ruby/regexp /[=\s]<<\-?(?<mode>['"`]?)(?<EOD>[A-Za-z_]\w*)\k<mode>/
+  :finish: "(?<escape>[\\\\]*)%{EOD}"
+  :unestable: true
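
The string, regular-expression and here-document entries above all put an "escape" capture in front of the closing delimiter. A minimal sketch of the even-length rule that the README describes follows; next_unescaped is a hypothetical helper name, and unlike the loop in parser.rb it also stops when no further match exists, which is an assumption added for this illustration.

```ruby
# Hypothetical helper: find the first closing delimiter at or after pos
# that is preceded by an even number of escape characters.
def next_unescaped(closing_regex, src, pos)
  match = closing_regex.match(src, pos)
  # An odd number of backslashes means the delimiter itself is escaped,
  # so keep searching after the rejected match.
  while match && match[:escape].length.odd?
    match = closing_regex.match(src, match.end(0))
  end
  match
end

# Closing regex for a double-quoted string: optional backslashes, then the quote.
closing = /(?<escape>\\*)"/
src = 'say "a \" b" done'
opening = src.index('"')

found = next_unescaped(closing, src, opening + 1)
puts src[opening...found.end(0)]
```

The escaped quote in the middle of the literal is skipped because exactly one backslash precedes it, and the search ends at the final quote.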

data/lib/mulparse/mulparse_exception.rb ADDED

@@ -0,0 +1,9 @@
+# snakeyworm John 3:16
+# This class provides a superclass for
+# all parsing exceptions.
+class MulparseException < StandardError # StandardError so a bare rescue catches parsing errors
+end

data/lib/mulparse/parser.rb ADDED

@@ -0,0 +1,222 @@
+# snakeyworm John 3:16
+# This file defines the class that actually
+# performs IO on the file(s) provided and
+# parses the input.
+require "yaml" # Lowercase name so the require also works on case-sensitive file systems
+class Parser
+    # This class is used to instantiate an instance that provides
+    # Enumerable interface for parsing.
+    #
+    # TODO:
+    #
+    #   * Ensure that the lang config usage is well documented.
+    #
+    include Enumerable
+    attr_reader :src
+    def initialize( src, lang_config )
+        # Initialize a parser and set lang_config and other parsing configurations
+        @src = src # src must be loaded from a text stream not a binary one
+        if lang_config.class == String then
+            # Open predefined lang-config
+            @lang_config = YAML.load( IO.read(
+                "#{File.absolute_path( File.expand_path( File.join( File.dirname( __FILE__ ), '..' ) ) )}" \
+                "/mulparse/lang-configs/#{lang_config}.yaml"
+            ) )
+        else
+            # Else use user provided lang-config
+            @lang_config = lang_config
+        end
+        @position = 0
+        # Store the current Component
+        @current = Component.new( finish: true )
+    end
+    def each( &block )
+        # Implements each method defined in Enumerable.
+        # This method passes the next component parsed
+        # in the source. The component is in the form
+        # of a Component.
+        loop do
+            # Either finish the current Component or open a new one on each iteration
+            # Get the next start delimiter and its corresponding lang-config entry
+            next_start, start_entry = get_next_start()
+            # Keep track of the existence of the finish attribute
+            has_finish = @current.finish && @current.finish != true # A true value indicates the root Component
+            if has_finish then
+                # If finish regex provided prepare closing regex
+                closing_regex = @current.finish
+                if closing_regex.is_a?( String ) then
+                    # Format the start captures into closing_regex, mapped through hash if provided
+                    if @current.hash then
+                        # Process hash argument if provided
+                        closing_regex = Regexp.new( closing_regex % Hash.new { |_, key|
+                            Regexp.escape( @current.hash[ @current.start_match[ key ] ] || @current.start_match[ key ] )
+                        } )
+                    else
+                        # Else process normally
+                        closing_regex = Regexp.new( closing_regex % Hash.new { |_, key|
+                            Regexp.escape( @current.start_match[ key ] )
+                        } )
+                    end
+                end
+                # Match for next closing delimiter
+                next_finish = closing_regex.match( @src, @position )
+                raise UnmatchedException.new(
+                    @current.name,
+                    @src[ 0..@current.start_match.end( 0 ) ].count( "\n" ) + 1
+                ) unless next_finish
+                # Ensure "escape" capture is valid
+                until next_finish[ :escape ].length.even?
+                    # Match until length of "escape" capture is even
+                    next_finish = closing_regex.match( @src, next_finish.end( 0 ) )
+                end if closing_regex.names.include?( "escape" )
+            end
+            if !next_start then
+                # If done parsing yield last Component
+                @current.finish_match = next_finish
+                yield @current
+                break
+            end
+            if @current&.parent && ( has_finish &&
+                ( @current.unestable || next_finish.begin( 0 ) < next_start.begin( 0 ) ) ) then
+                # If current Component should be finished then finish it
+                begin
+                    @position = next_finish.end( 0 ) # Update position
+                rescue NoMethodError
+                    # Raise UnmatchedException if next_finish is nil
+                    raise UnmatchedException.new(
+                        @current.name,
+                        @src[ 0..@current.start_match.end( 0 ) ].count( "\n" ) + 1
+                    )
+                end
+                # Update parsing data
+                @current.finish_match = next_finish
+                # Yield finished Component
+                yield @current
+                # Set current Component to the previous's parent
+                @current = @current.parent
+            else
+                # Else new Component is found set it as current
+                @current = Component.new( next_start, @current, **start_entry ) # Double splat so the entry is passed as keywords on Ruby 3+
+                @position = next_start.end( 0 ) # Update position
+                # Yield immediately if the Component has no finish (its start matches the whole component)
+                unless @current.finish && @current.finish != true then
+                    yield @current
+                    @current = @current.parent
+                end
+            end
+        end
+    end
+    # External/Internal Utility Methods
+    def line
+        # Return number of lines already parsed
+        return @src[ 0..@position ].count( "\n" )
+    end
+    # Internal Utility Methods
+    protected
+    def get_next_start
+        #   This method finds the nearest match of a
+        #   component's starting delimiter in @src.
+        #   Returns nil if there are no more matches.
+        next_start = nil # Store the next starting delimiter
+        lcEntry = nil # Store the lang config entry of next_start
+        for i in @lang_config do
+            # Test match against closest starting delimiter on each iteration
+            matchBuffer = i[ :start ].match( @src, @position ) # Get a match
+            next unless matchBuffer # If no match is found then skip to next iteration
+            # Set the value of next_start to the closer match
+            next_start = ( !next_start ||
+                          matchBuffer.begin( 0 ) < next_start.begin( 0 ) ) ?
+                          matchBuffer : next_start
+            if next_start == matchBuffer then lcEntry = i end # Update lcEntry if necessary
+        end
+        return next_start, lcEntry # Return the closest match
+    end
+end
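
The selection step in get_next_start can be sketched on its own: every start pattern is tried from the current position, and the match that begins earliest wins. The version below is a standalone approximation that does not use the Parser class. The three-entry lang_config and the name nearest_start are hypothetical.

```ruby
# Hypothetical stand-in for a lang-config: an array of hashes with :name and :start.
lang_config = [
  { name: "comment", start: /#/ },
  { name: "string",  start: /"/ },
  { name: "method",  start: /\bdef\b/ }
]

# nearest_start is an illustrative name, not part of the gem.
# Returns [match, entry] for the earliest start delimiter at or after
# position, or nil when no entry matches.
def nearest_start(lang_config, src, position)
  candidates = lang_config.map do |entry|
    match = entry[:start].match(src, position)
    [match, entry] if match
  end.compact
  # On a tie, min_by keeps the first candidate, so earlier entries win.
  candidates.min_by { |match, _entry| match.begin(0) }
end

src = 'x = "a" # def'
match, entry = nearest_start(lang_config, src, 0)
puts "#{entry[:name]} at #{match.begin(0)}"
```

Because ties go to the entry listed first, the order of entries in a lang-config can decide which component is opened when two start patterns match at the same position.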

data/lib/mulparse/unmatched_exception.rb ADDED

@@ -0,0 +1,17 @@
+# snakeyworm John 3:16
+# This exception is raised when a Component's
+# ending delimiter can't be found.
+#
+class UnmatchedException < MulparseException
+    def initialize( name, line )
+        # Initialize Exception with message
+        super( "UnmatchedException: finish_match for %p at line: #{line} not found" % name )
+    end
+end

metadata ADDED

@@ -0,0 +1,53 @@
+--- !ruby/object:Gem::Specification
+name: mulparse
+version: !ruby/object:Gem::Version
+  version: 1.0.1
+platform: ruby
+authors:
+- Caleb Loera
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2019-06-01 00:00:00.000000000 Z
+dependencies: []
+description: |
+  Mulparse is a multilingual parser for multiple coding languages. This parser is easily
+  extensible, so one may define his own language and easily parse it.
+email:
+executables: []
+extensions: []
+extra_rdoc_files:
+- README.md
+files:
+- README.md
+- lib/mulparse.rb
+- lib/mulparse/component.rb
+- lib/mulparse/lang-configs/ruby.yaml
+- lib/mulparse/mulparse_exception.rb
+- lib/mulparse/parser.rb
+- lib/mulparse/unmatched_exception.rb
+homepage: https://github.com/snakeyworm/mulparse
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: 1.9.0
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.5.2
+signing_key:
+specification_version: 4
+summary: Parses multiple languages and provides an easily searchable structure
+test_files: []