mulparse 1.0.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 71ca881927e698064749fcceaba4af74476a83db
4
+ data.tar.gz: 7ec5d516bcc59cf06429887640f969e0bde126dc
5
+ SHA512:
6
+ metadata.gz: b070fa38f8b5ca42280ca429b86e2ec9a16078850e292fe16f5a75ce760c9244210cfb6e9d694d54cf22fbc80f56dd98c0cda5e90f30bbae408b93af96cd204e
7
+ data.tar.gz: 71d954628f2be33a6db27a7431ac4ac7c1b441ff3d2a31fd16d0bd02eca72383e496fc2135955d93ac41c5d653cbfca762595692e8367ffb2de72828138f090f
data/README.md ADDED
@@ -0,0 +1,59 @@
1
+
2
+ snakeyworm John 3:16
3
+
4
+ # Mulparse
5
+ ***
6
+
7
+ Mulparse is a multilingual parser for coding languages. The philosophy
8
+ behind this parser is having the ability to focus on bigger aspects of
9
+ your project. Parsing is a small and tedious aspect. This parser has the
10
+ advantage of being able to parse anything if properly defined in a "lang-config".
11
+ This parser also has many predefined "lang-configs" that provide simple
12
+ and efficient parsing of the languages you love. This parser was designed
13
+ to be easier to learn than a parser like citrus. The citrus parser has its
14
+ own mini language to learn. Mulparse only has simple hashes with a few fields
15
+ that contain simple values.
16
+
17
+ ### Lang-config Syntax
18
+ ***
19
+
20
+ A lang-config is simply an array that contains hashes where each hash defines
21
+ how to parse a component. A hash can contain the following fields where each
22
+ field is a symbol:
23
+
24
+ + name (Required)
25
+ + start (Required)
26
+ + finish (Required)
27
+ + unestable
28
+ + hash
29
+
30
+ #### name
31
+
32
+ This field is a string that represents the name of the component matched.
33
+
34
+ #### start
35
+
36
+ A regular expression that matches the opening delimiter of a component.
37
+
38
+ #### finish
39
+
40
+ Either a regular expression or a string. If it is a regular expression, it simply
41
+ matches the end delimiter of a component. Note the special treatment of any capture
42
+ named "escape" in this regular expression: the regular expression does not match
43
+ unless the named capture's length is even. This is useful for matching things such as
44
+ strings that have an escape character. If it is a string, then all named captures
45
+ from the start regular expression are passed to Regexp.escape(), the return values
46
+ are formatted into the string, and the result is converted to a regular
47
+ expression. Note that when this field is a string, "escape" is still specially
48
+ treated.
49
+
50
+ #### unestable
51
+
52
+ A boolean value denoting whether the component it is applied to is unable to nest any
53
+ other components.
54
+
55
+ #### hash
56
+
57
+ This field is a hash. It is used when the finish field is a string. In this case each
58
+ named capture from start is looked up in this hash, and the value found (or the capture
59
+ itself, if the hash has no such key) is formatted into finish before it is converted to a regular expression.
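The field descriptions in the README can be made concrete with a small sketch. The entry below is hypothetical: it is modeled on the shipped `string` entry but reduced to a single named capture. `closing_regex_for` is an illustrative helper, not a mulparse method. It approximates, under those assumptions, how a string `finish` could be turned into a regular expression from the captures of `start`.

```ruby
# Hypothetical lang-config: an array of hashes with symbol keys.
lang_config = [
  {
    name: "string",
    start: /%q(?<delimiter>[\{<\[])/,       # opening delimiter, captured by name
    finish: "(?<escape>\\\\*)%{delimiter}", # string form: formatted, then compiled
    unestable: true,
    hash: { "{" => "}", "<" => ">", "[" => "]" } # maps an opener to its closer
  }
]

# Illustrative helper (an assumption, not the gem's API): build the closing
# regex for an entry from the MatchData of its start delimiter.
def closing_regex_for(entry, start_match)
  finish = entry[:finish]
  return finish if finish.is_a?(Regexp)

  lookup = entry[:hash] || {}
  # Look each named capture up in :hash, fall back to the capture itself,
  # and escape it so it is matched literally.
  values = start_match.names.each_with_object({}) do |name, acc|
    captured = start_match[name]
    acc[name.to_sym] = Regexp.escape(lookup[captured] || captured)
  end
  Regexp.new(finish % values)
end

src = "%q{hello}"
entry = lang_config.first
start_match = entry[:start].match(src)
closing = closing_regex_for(entry, start_match)
finish_match = closing.match(src, start_match.end(0))
puts src[start_match.begin(0)...finish_match.end(0)]
```

The explicit hash built from `start_match.names` is a design choice of this sketch. The parser itself formats the string through a `Hash.new` default block, which has the same effect for captures that exist.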
data/lib/mulparse.rb ADDED
@@ -0,0 +1,7 @@
1
+
2
+ # Require needed files
3
+
4
+ require "mulparse/mulparse_exception.rb"
5
+ require "mulparse/unmatched_exception.rb"
6
+ require "mulparse/component.rb"
7
+ require "mulparse/parser.rb"
data/lib/mulparse/component.rb ADDED
@@ -0,0 +1,134 @@
1
+
2
+ # snakeyworm John 3:16
3
+
4
+ # This file defines a class that contains
5
+ # data collected by the parser (commonly referred
6
+ # to as components). Each instance of this class
7
+ # is linked to a parent and one or more child nodes.
8
+
9
+ # TODO:
10
+ #
11
+ # * Expand as needed to fit expanding lang config and
12
+ # parser.
13
+ #
14
+
15
+ class Component
16
+
17
+ # Class that wraps component data and provides hierarchy navigation
18
+
19
+ # Declare accessors to metadata
20
+
21
+ # Delimiters
22
+
23
+ attr_accessor :start_match
24
+ attr_accessor :finish_match
25
+
26
+ # Hierarchical
27
+
28
+ attr_reader :parent
29
+ attr_reader :children
30
+
31
+ # Lang config
32
+
33
+ attr_reader :name
34
+
35
+ attr_reader :start
36
+ attr_reader :finish
37
+
38
+ attr_reader :unestable
39
+
40
+ attr_reader :hash
41
+
42
+ def initialize(
43
+ next_start=nil,
44
+ parent=nil,
45
+ name: nil,
46
+ start: nil,
47
+ finish: nil,
48
+ unestable: nil,
49
+ hash: nil )
50
+
51
+ # Initialize the Component with the
52
+ # data provided.
53
+
54
+ @start_match = next_start # Store the component's opening match
55
+
56
+ # Set up lang config attributes
57
+
58
+ @name = name
59
+
60
+ @start = start
61
+ @finish = finish
62
+
63
+ @unestable = unestable
64
+
65
+ @hash = hash
66
+
67
+ @parent = parent # Set the Component's parent
68
+ @parent.children << self if parent
69
+
70
+ @children = [] # Set the children of this Component to an empty array
71
+
72
+ end
73
+
74
+ def sibling( offset=1 )
75
+
76
+ # Returns the sibling offset by the
77
+ # provided parameter.
78
+
79
+ return @parent.children[ @parent.children.index( self ) + offset ]
80
+
81
+ end
82
+
83
+ def has_sibling_at?( offset=1 )
84
+
85
+ # Returns true if a call to sibling() with the
86
+ # same parameter would run without error.
87
+
88
+ return @parent.children.index( self ) + offset < @parent.children.length
89
+
90
+ end
91
+
92
+ def get_src( src )
93
+
94
+ # Return the matches and the content between
95
+ # them in the src provided.
96
+
97
+ begin
98
+ return src[ ( @start_match.begin( 0 )...@finish_match.end( 0 ) ) ]
99
+ rescue NoMethodError
100
+ return @start_match[0]
101
+ end
102
+
103
+ end
104
+
105
+ def to_s()
106
+
107
+ # Return a string representation of this Component
108
+
109
+ if @start_match then
110
+
111
+ abbreviated_start_match = @start_match[0][0..10]
112
+ abbreviated_start_match += "..." if @start_match[0].length > 11 # Compare the matched text's length, not the capture count
113
+
114
+ end
115
+
116
+ return "Component(\n" \
117
+ " start_match=%p,\n" \
118
+ " parent.name=%p,\n" \
119
+ " name=%p,\n" \
120
+ " start=%s,\n" \
121
+ " finish=%s,\n" \
122
+ " unestable=%p\n" \
123
+ ")" % [
124
+ abbreviated_start_match,
125
+ @parent&.name,
126
+ @name,
127
+ @start,
128
+ @finish || "nil",
129
+ @unestable
130
+ ]
131
+
132
+ end
133
+
134
+ end
data/lib/mulparse/lang-configs/ruby.yaml ADDED
@@ -0,0 +1,73 @@
1
+ ---
2
+ - :name: module
3
+ :start: !ruby/regexp "/\n (?<=\\s)module[\\t ]+ # Keyword\n (?:\\\\\\n+\\s*)*
4
+ # White space and line continuation\n (?<name>[A-Z]\\w*)(?=\\s) # Name\n
5
+ \ /x"
6
+ :finish: !ruby/regexp /(?<=\s)end(?=\s)/
7
+ - :name: class
8
+ :start: !ruby/regexp "/\n (?<=\\s)class[\\t ]+ # Keyword\n (?:\\\\\\n+[\\t
9
+ ]*)* # White space and line continuation\n (?<nesting_class>(?:[A-Z]\\w*::(?:\\\\\\n+[\\t
10
+ ]*)*)*) # White space and line continuation\n (?<name>[A-Z]\\w*) # Name\n
11
+ \ (?:[\\t ]*(?:[\\t ]+\\\\\\n+\\s*)* # White space and line continuation\n
12
+ \ <? # Inheritance operator\n [\\t ]*(?:[\\t ]+\\\\\\n+\\s*)* # White
13
+ space and line continuation\n (?<parent>[A-Z]\\w*))?(?=\\s) # Parent class\n
14
+ \ /x"
15
+ :finish: !ruby/regexp /(?<=\s)end(?=\s)/
16
+ - :name: singleton-class
17
+ :start: !ruby/regexp "/\n (?<=\\s)class[\\t ]+ # Keyword\n (?:\\\\\\n+[\\t
18
+ ]*)* # White space and line continuation\n [\\t ]*\n << # Singleton
19
+ operator\n [\\t ]*(?:[\\t ]+\\\\\\n+\\s*)* # White space and line continuation\n
20
+ \ (?<name>\\w*)(?=\\s) # Name\n /x"
21
+ :finish: !ruby/regexp /(?<=\s)end(?=\s)/
22
+ - :name: method
23
+ :start: !ruby/regexp "/\n (?<=\\s)def[\\t ]+ # Keyword\n (?:\\\\\\n+[\\t
24
+ ]*)* # White space and line continuation\n (?:(?<class>\\w+) # Class\n
25
+ \ (?:\\\\\\n+[\\t ]*)*[\\t ]* # White space and line continuation\n \\.
26
+ # Dot operator\n [\\t ]*(?:\\\\\\n+[\\t ]*)*)? # White space and line continuation\n
27
+ \ (?<name>\\w+)(?=[\\s\\(]*) # Name\n /x"
28
+ :finish: !ruby/regexp /(?<=\s)end(?=\s)/
29
+ - :name: access-modifier
30
+ :start: !ruby/regexp /(?<=\s)(?:private|protected|public)(?=\s)/
31
+ - :name: case
32
+ :start: !ruby/regexp /(?<=\s)case(?=[\(\t ])/
33
+ :finish: !ruby/regexp /(?<=\s)end(?=\s)/
34
+ - :name: if
35
+ :start: !ruby/regexp "/\n (?<=[\\n;])[\\t ]* # White space\n if #
36
+ Key word\n (?:\\\\\\n+[\\t ]*)*[\\s\\(] # White space and line continuation\n
37
+ \ /x"
38
+ :finish: !ruby/regexp /(?<=\s)end(?=\s)/
39
+ - :name: unless
40
+ :start: !ruby/regexp "/\n (?<=[\\n;])[\\t ]* # White space\n unless
41
+ # Key word\n (?:\\\\\\n+[\\t ]*)*[\\s\\(] # White space and line continuation\n
42
+ \ /x"
43
+ :finish: !ruby/regexp /(?<=\s)end(?=\s)/
44
+ - :name: do-block
45
+ :start: !ruby/regexp "/\n [\\)\\s]+ # White space\n do # Key word\n
46
+ \ (?:\\\\\\n+[\\t ]*)*[\\s\\|] # White space and line continuation\n /x"
47
+ :finish: !ruby/regexp /(?<=\s)end(?=\s)/
48
+ - :name: begin
49
+ :start: !ruby/regexp "/\n ;?\\s+ # White space\n begin # Key word\n
50
+ \ [(\\s] # White space\n /x"
51
+ :finish: !ruby/regexp /(?<=\s)end(?=\s)/
52
+ - :name: multi-comment
53
+ :start: !ruby/regexp /^=begin(?<content>\s.*?)(?<=\n)=end(?=\s)/m
54
+ - :name: single-comment
55
+ :start: !ruby/regexp /#(?<content>.*?)(?=\n)/
56
+ - :name: string
57
+ :start: !ruby/regexp /(?:%[q|Q]?(?<delimiter>"|'|[\`\~\!\@\#\$\%\^\&\*\(\)\-\_\=\+\[\]\{\}\\\|\;\:\,\<\.\>\/\?]))|(?<delimiter>"|')/
58
+ :finish: "(?<escape>\\\\*)%{delimiter}"
59
+ :unestable: true
60
+ :hash: &1
61
+ "{": "}"
62
+ "<": ">"
63
+ "[": "]"
64
+ "/": "/"
65
+ - :name: regular-expression
66
+ :start: !ruby/regexp /(?:%r(?<delimiter>"|'|[\`\~\!\@\#\$\%\^\&\*\(\)\-\_\=\+\[\]\{\}\\\|\;\:\,\<\.\>\/\?]))|(?<delimiter>\/)/
67
+ :finish: "(?<escape>\\\\*)%{delimiter}"
68
+ :unestable: true
69
+ :hash: *1
70
+ - :name: here-document
71
+ :start: !ruby/regexp /[=\s]<<\-?(?<mode>['"`]?)(?<EOD>[A-Za-z_]\w*)\k<mode>/
72
+ :finish: "(?<escape>[\\\\]*)%{EOD}"
73
+ :unestable: true
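The lang-config above depends on `!ruby/regexp` tags, symbol keys, and a YAML alias (`&1` / `*1`). As a hedged note: on Ruby versions that bundle Psych 4 (Ruby 3.1 and later), plain `YAML.load` is restricted to basic types and would likely reject a file like this one. The sketch below shows one way such a file could be loaded. `load_lang_config` is a hypothetical helper, not part of mulparse, and the YAML text is a shortened copy of the `case` entry.

```ruby
require "yaml"

# Shortened copy of one entry; the quoted heredoc keeps backslashes literal.
CONFIG_TEXT = <<~'CONFIG'
  ---
  - :name: case
    :start: !ruby/regexp /(?<=\s)case(?=[\(\t ])/
    :finish: !ruby/regexp /(?<=\s)end(?=\s)/
CONFIG

# Hypothetical helper: use the unrestricted loader when it exists and fall
# back to YAML.load on older Psych versions, where load is unrestricted.
# Only use this on trusted files, since it can instantiate arbitrary objects.
def load_lang_config(text)
  if YAML.respond_to?(:unsafe_load)
    YAML.unsafe_load(text)
  else
    YAML.load(text)
  end
end

config = load_lang_config(CONFIG_TEXT)
puts config.first[:name]
puts config.first[:start].class
```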
data/lib/mulparse/mulparse_exception.rb ADDED
@@ -0,0 +1,9 @@
1
+
2
+ # snakeyworm John 3:16
3
+
4
+ # This class provides a superclass for
5
+ # all parsing exceptions.
6
+
7
+ class MulparseException < StandardError
8
+
9
+ end
data/lib/mulparse/parser.rb ADDED
@@ -0,0 +1,222 @@
1
+
2
+ # snakeyworm John 3:16
3
+
4
+ # This file defines the class that actually
5
+ # performs IO on the file(s) provided and
6
+ # parses the input.
7
+
8
+ require "yaml"
9
+
10
+ class Parser
11
+
12
+ # This class is used to instantiate an instance that provides
13
+ # Enumerable interface for parsing.
14
+ #
15
+ # TODO:
16
+ #
17
+ # * Ensure that the lang config usage is well documented.
18
+ #
19
+
20
+ include Enumerable
21
+
22
+ attr_reader :src
23
+
24
+ def initialize( src, lang_config )
25
+
26
+ # Initialize a parser and set lang_config and other parsing configurations
27
+
28
+ @src = src # src must be loaded from a text stream not a binary one
29
+
30
+ if lang_config.class == String then
31
+
32
+ # Open predefined lang-config
33
+
34
+ @lang_config = YAML.load( IO.read(
35
+ "#{File.absolute_path( File.expand_path( File.join( File.dirname( __FILE__ ), '..' ) ) )}" \
36
+ "/mulparse/lang-configs/#{lang_config}.yaml"
37
+ ) )
38
+
39
+ else
40
+ # Else use user provided lang-config
41
+ @lang_config = lang_config
42
+ end
43
+
44
+ @position = 0
45
+
46
+ # Store the current Component
47
+ @current = Component.new( finish: true )
48
+
49
+ end
50
+
51
+ def each( &block )
52
+
53
+ # Implements each method defined in Enumerable.
54
+ # This method passes the next component parsed
55
+ # in the source. The component is in the form
56
+ # of a Component.
57
+
58
+ loop do
59
+
60
+ # Either finish the current Component or open a new one on each iteration
61
+
62
+ # Get the next start delimiter and its corresponding lcEntry
63
+
64
+ next_start, start_entry = get_next_start()
65
+
66
+ # Keep track of the existence of the finish attribute
67
+ has_finish = @current.finish && @current.finish != true # A true value indicates the root Component
68
+
69
+ if has_finish then
70
+
71
+ # If a finish regex is provided, prepare the closing regex
72
+
73
+ closing_regex = @current.finish
74
+
75
+ if closing_regex.is_a?( String ) then
76
+
77
+ # Format the named captures of start_match into closing_regex, mapped through hash if provided
78
+
79
+ if @current.hash then
80
+
81
+ # Process hash argument if provided
82
+
83
+ closing_regex = Regexp.new( closing_regex % Hash.new { |_, key|
84
+ Regexp.escape( @current.hash[ @current.start_match[ key ] ] || @current.start_match[ key ] )
85
+ } )
86
+
87
+ else
88
+
89
+ # Else process normally
90
+
91
+ closing_regex = Regexp.new( closing_regex % Hash.new { |_, key|
92
+ Regexp.escape( @current.start_match[ key ] )
93
+ } )
94
+
95
+ end
96
+
97
+ end
98
+
99
+ # Match for next closing delimiter
100
+ next_finish = closing_regex.match( @src, @position )
101
+
102
+ raise UnmatchedException.new(
103
+ @current.name,
104
+ @src[ 0..@current.start_match.end( 0 ) ].count( "\n" ) + 1
105
+ ) unless next_finish
106
+
107
+ # Ensure "escape" capture is valid
108
+
109
+ until next_finish[ :escape ].length.even?
110
+
111
+ # Match until length of "escape" capture is even
112
+ next_finish = closing_regex.match( @src, next_finish.end( 0 ) )
113
+
114
+ end if closing_regex.names.include?( "escape" )
115
+
116
+ end
117
+
118
+ if !next_start then
119
+
120
+ # If done parsing yield last Component
121
+
122
+ @current.finish_match = next_finish
123
+ yield @current
124
+
125
+ break
126
+
127
+ end
128
+
129
+ break if !next_start # Break if parsing is complete
130
+
131
+ if @current&.parent && ( has_finish &&
132
+ ( @current.unestable || next_finish.begin( 0 ) < next_start.begin( 0 ) ) ) then
133
+
134
+ # If current Component should be finished then finish it
135
+
136
+ begin
137
+ @position = next_finish.end( 0 ) # Update position
138
+ rescue NoMethodError
139
+
140
+ # Raise UnmatchedException if next_finish is nil
141
+
142
+ raise UnmatchedException.new(
143
+ @current.name,
144
+ @src[ 0..@current.start_match.end( 0 ) ].count( "\n" ) + 1
145
+ )
146
+
147
+ end
148
+
149
+ # Update parsing data
150
+ @current.finish_match = next_finish
151
+
152
+ # Yield finished Component
153
+ yield @current
154
+
155
+ # Set current Component to the previous's parent
156
+ @current = @current.parent
157
+
158
+ else
159
+
160
+ # Else new Component is found set it as current
161
+
162
+ @current = Component.new( next_start, @current, **start_entry ) # Splat the entry so it is passed as keyword arguments on Ruby 3 as well
163
+ @position = next_start.end( 0 ) # Update position
164
+
165
+ # Yield if Component uses syntactic sugar unestability
166
+
167
+ unless @current.finish && @current.finish != true then
168
+ yield @current
169
+ @current = @current.parent
170
+ end
171
+
172
+ end
173
+
174
+ end
175
+
176
+ end
177
+
178
+ # External/Internal Utility Methods
179
+
180
+ def line
181
+
182
+ # Return number of lines already parsed
183
+ return @src[ 0..@position ].count( "\n" )
184
+
185
+ end
186
+
187
+ # Internal Utility Methods
188
+
189
+ protected
190
+
191
+ def get_next_start
192
+
193
+ # This method finds the nearest match of a
194
+ # components starting delimiter in the @src.
195
+ # Returns nil if there are no more matches.
196
+
197
+ next_start = nil # Store the next starting delimiter
198
+ lcEntry = nil # Store the lang config entry of next_start
199
+
200
+ for i in @lang_config do
201
+
202
+ # Test match against closest starting delimiter on each iteration
203
+
204
+ matchBuffer = i[ :start ].match( @src, @position ) # Get a match
205
+
206
+ next unless matchBuffer # If no match is found then skip to next iteration
207
+
208
+ # Set the value of next_start to the closer match
209
+
210
+ next_start = ( !next_start ||
211
+ matchBuffer.begin( 0 ) < next_start.begin( 0 ) ) ?
212
+ matchBuffer : next_start
213
+
214
+ if next_start == matchBuffer then lcEntry = i end # Update lcEntry if necessary
215
+
216
+ end
217
+
218
+ return next_start, lcEntry # Return the closest match
219
+
220
+ end
221
+
222
+ end
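The special handling of the "escape" capture in `each` can be illustrated in isolation. The following is a minimal sketch under stated assumptions: `find_unescaped` is a hypothetical helper, not part of mulparse, and the closing regex is written by hand rather than built from a lang-config. It keeps searching until the run of backslashes before the delimiter has even length, and, unlike the loop in the parser, it returns `nil` when no such delimiter is left.

```ruby
# Hypothetical helper: return the first match of closing_regex at or after
# position whose "escape" capture has even length, or nil if there is none.
def find_unescaped(closing_regex, src, position)
  match = closing_regex.match(src, position)
  # An odd number of backslashes means the delimiter itself is escaped,
  # so resume the search just past that match.
  while match && match[:escape].length.odd?
    match = closing_regex.match(src, match.end(0))
  end
  match
end

closing = /(?<escape>\\*)"/
src = 'say \"hi\" now" tail'

finish = find_unescaped(closing, src, 0)
puts src[0...finish.end(0)]
```

Returning `nil` instead of raising leaves the decision to the caller, which could then raise something like `UnmatchedException` with the component name and line.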
data/lib/mulparse/unmatched_exception.rb ADDED
@@ -0,0 +1,17 @@
1
+
2
+ # snakeyworm John 3:16
3
+
4
+ # This exception is raised when a Component's
5
+ # ending delimiter can't be found.
6
+ #
7
+
8
+ class UnmatchedException < MulparseException
9
+
10
+ def initialize( name, line )
11
+
12
+ # Initialize Exception with message
13
+ super( "UnmatchedException: finish_match for %p at line: #{line} not found" % name )
14
+
15
+ end
16
+
17
+ end
metadata ADDED
@@ -0,0 +1,53 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: mulparse
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.1
5
+ platform: ruby
6
+ authors:
7
+ - Caleb Loera
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2019-06-01 00:00:00.000000000 Z
12
+ dependencies: []
13
+ description: |
14
+ Mulparse is a multilingual parser for multiple coding languages. This parser is easily
15
+ extensible, so one may define one's own language and easily parse it.
16
+ email:
17
+ executables: []
18
+ extensions: []
19
+ extra_rdoc_files:
20
+ - README.md
21
+ files:
22
+ - README.md
23
+ - lib/mulparse.rb
24
+ - lib/mulparse/component.rb
25
+ - lib/mulparse/lang-configs/ruby.yaml
26
+ - lib/mulparse/mulparse_exception.rb
27
+ - lib/mulparse/parser.rb
28
+ - lib/mulparse/unmatched_exception.rb
29
+ homepage: https://github.com/snakeyworm/mulparse
30
+ licenses:
31
+ - MIT
32
+ metadata: {}
33
+ post_install_message:
34
+ rdoc_options: []
35
+ require_paths:
36
+ - lib
37
+ required_ruby_version: !ruby/object:Gem::Requirement
38
+ requirements:
39
+ - - ">="
40
+ - !ruby/object:Gem::Version
41
+ version: 1.9.0
42
+ required_rubygems_version: !ruby/object:Gem::Requirement
43
+ requirements:
44
+ - - ">="
45
+ - !ruby/object:Gem::Version
46
+ version: '0'
47
+ requirements: []
48
+ rubyforge_project:
49
+ rubygems_version: 2.5.2
50
+ signing_key:
51
+ specification_version: 4
52
+ summary: Parses multiple languages and provides an easily searchable structure
53
+ test_files: []