yoga 0.2.0 → 0.2.1
- checksums.yaml +4 -4
- data/.travis.yml +2 -0
- data/Gemfile +1 -2
- data/README.md +233 -11
- data/lib/yoga/errors.rb +15 -0
- data/lib/yoga/parser/helpers.rb +2 -1
- data/lib/yoga/scanner.rb +23 -15
- data/lib/yoga/version.rb +1 -1
- metadata +3 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: aac664b4901d0613248ba9cb7169931dfb135480
+  data.tar.gz: 61185a381b8363cea814da9df4aa948464157523
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b7d22c7ba8033a31f4192486775bacbbde86471daca4d4e1ccec55a1b00505c91ed7b3f4f0bc6b02b5d7d7d0692dd2bc437e2a2d9fff8f6e5e65ffdc30a79079
+  data.tar.gz: 6ee507d8d3e3f989ca55c81211b93c143e8f1a2233280200d6f4db231d2ddccdec97f9b1a3127145ab35b8c3a65904a0f001c8fd76dc56dc8a39cbbe576617a0
data/.travis.yml
CHANGED
data/Gemfile
CHANGED
data/README.md
CHANGED
@@ -1,36 +1,258 @@
 # Yoga
+[![Build Status][build-status]][build-status-link] [![Coverage Status][coverage-status]][coverage-status-link]
 
-
+A helper for your Ruby parsers. This adds helper methods to make parsing
+(and scanning!) easier and more structured. If you're looking for an LALR
+parser generator, that isn't this. This is designed to help you construct
+Recursive Descent parsers - which are solely LL(k). If you want an LALR parser
+generator, see [_Antelope_](https://github.com/medcat/antelope) or
+[Bison](https://www.gnu.org/software/bison/).
 
-
+Yoga requires [Mixture](https://github.com/medcat/mixture) for parser node
+attributes. However, the use of the parser nodes included with Yoga is
+completely optional.
 
 ## Installation
 
 Add this line to your application's Gemfile:
 
 ```ruby
-gem
+gem "yoga"
 ```
 
 And then execute:
 
     $ bundle
 
-
+## Usage
 
-
+To begin your parser, you will first have to create a scanner. A scanner
+takes the source text and generates "tokens." These tokens are abstract
+representations of the source text of the document. For example, for the
+text `class A do`, you could have the tokens `:class`, `:CNAME`, and `:do`.
+The actual names of the tokens are completely up to you. These token names
+are later used in the parser to set up expectations - for example, for the
+definition of a class, you could expect a `:class`, `:CNAME`, and a `:do`
+token.
 
-
+Essentially, the scanner breaks up the text into usable, bite-sized pieces
+for the parser to chomp on. Here's what a scanner may look like:
+
+```ruby
+module MyLanguage
+  class Scanner
+    # All of the behavior from Yoga for scanners. This provides the
+    # `match/2` method, the `call/0` method, the `match_line/1` method,
+    # the `location/1` method, and the `emit/2` method. The major ones that
+    # are used are the `match/2`, the `call/0`, and the `match_line/1`
+    # methods.
+    include Yoga::Scanner
+
+    # This must be implemented. This is called for the next token. This
+    # should only return a Token, or true.
+    def scan
+      # Match with a string value escapes the string, then turns it into a
+      # regular expression.
+      match("[") || match("]") ||
+        # Match with a symbol escapes the symbol, and turns it into a regular
+        # expression, suffixing it with `symbol_negative_assertion`. This is
+        # to prevent issues with identifiers and keywords.
+        match(:class) || match(:func) ||
+        # With a regular expression, it's matched exactly. However, a token
+        # name is highly recommended.
+        match(/[a-z][a-zA-Z0-9_]*[!?=]?/, :IDENT)
+    end
+  end
+end
+```
+
+And that's it! You now have a fully functioning scanner. In order to use it,
+all you have to do is this:
+
+```ruby
+source = "class alpha [func a []]"
+MyLanguage::Scanner.new(source).call # => #<Enumerable ...>
+```
+
+Note that `Scanner#call` returns an enumerable. `#call` is aliased as `#each`.
+What this means is that tokens aren't generated until they're requested by the
+parser - each token is generated from the source incrementally. If you want
+to retrieve all of the tokens immediately, you have to first convert it into
+an array, or perform some other operation on the enumerable (since it isn't
+lazy):
+
+```ruby
+MyLanguage::Scanner.new(source).call.to_a # => [...]
+```
+
+The scanner also automatically adds location information to all of the tokens.
+This is handled automatically by `match/2` and `emit/2` - the only issue being
+that all regular expressions **must not** include a newline. Newlines should
+be matched with `match_line/1`; if lines must be emitted as a token, you can
+pass the kind of token to emit to `match_line/1` using the `kind:` keyword.
+
+You may notice that all of the tokens have `<anon>` set as the location's file.
+This is the default file name, which can be overridden through the
+initializer's second argument:
+
+```ruby
+MyLanguage::Scanner.new(source, "foo").call.first.location.to_s # => "foo:1.1-6"
+```
+
+Parsers are a little bit more complicated. Before we can pull up the parser,
+let's define a grammar and some node classes.
+
+```
+; This is the grammar.
+<root> = *<statement>
+<statement> = <expression> ';'
+<expression> = <expression> <op> <expression>
+<expression> /= <int> ; here, <int> is defined by the scanner.
+<op> = '+' / '-' / '*' / '/' / '^' / '%' / '='
+```
 
-
+```ruby
+module MyLanguage
+  class Parser
+    class Root < Yoga::Node
+      # An attribute on the node. This is required for Yoga nodes since the
+      # update syntax requires them. The type for the attribute is optional.
+      attribute :statements, type: [Yoga::Node]
+    end
+
+    class Expression < Yoga::Node
+    end
+
+    class Operation < Expression
+      attribute :operator, type: ::Symbol
+      attribute :left, type: Expression
+      attribute :right, type: Expression
+    end
+
+    class Literal < Expression
+      attribute :value, type: ::Integer
+    end
+  end
+end
+```
+
+With those out of the way, let's take a look at the parser itself.
+
+```ruby
+module MyLanguage
+  class Parser
+    # This provides all of the parser helpers. This is the same as adding
+    # `Yoga::Parser::Helpers` as an include statement as well.
+    include Yoga::Parser
+
+    # Like the `scan/0` method on the scanner, this must be implemented. This
+    # is the entry point for the parser. However, public usage should use the
+    # `call/0` method. This should return a node of some sort.
+    def parse_root
+      # This "collects" a series of nodes in sequence. It iterates until it
+      # reaches the `:EOF` token (in this case). The first parameter to
+      # collect is the "terminating token," and can be any value that
+      # `expect/1` or `peek?/1` accepts. The second, optional parameter to
+      # collect is the "joining token," and is required between each node.
+      # We're not using the semicolon as a joining token because that is
+      # required for _all_ statements. The joining token can be used for
+      # things like argument lists. The parameter can be any value that
+      # `expect/1` or `peek?/1` accepts.
+      children = collect(:EOF) { parse_statement }
+
+      # "Unions" the location of all of the statements in the list.
+      location = children.map(&:location).inject(:union)
+      Parser::Root.new(statements: children, location: location)
+    end
+
+    # Parses a statement. This is the same as the <statement> rule above.
+    def parse_statement
+      expression = parse_expression
+      # This says that the next token should be a semicolon. If the next token
+      # isn't, it throws an error with a detailed error message, denoting
+      # what was expected (in this case, a semicolon), what was given, and
+      # where the error was located in the source file.
+      expect(:";")
+
+      expression
+    end
26
178
|
|
27
|
-
## Development
|
28
179
|
|
29
|
-
|
180
|
+
# A switch statement, essentially. This is defined beforehand to make it
|
181
|
+
# _faster_ (not really; it's just useful). The first parameter to the
|
182
|
+
# switch function is the name of the switch. This is used later to
|
183
|
+
# actually perform the switch; it is also used to define a first set with
|
184
|
+
# the allowed tokens for the switch. The second parameter defines a key
|
185
|
+
# value pair. The keys are the tokens that are allowed; a symbol or an
|
186
|
+
# array of symbols can be used. The value is the block or the method that
|
187
|
+
# is executed upon encountering that token.
|
188
|
+
switch(:Operation,
|
189
|
+
"=": proc { |left| parse_operation(:"=", left) },
|
190
|
+
"+": proc { |left| parse_operation(:"+", left) },
|
191
|
+
"-": proc { |left| parse_operation(:"-", left) },
|
192
|
+
"*": proc { |left| parse_operation(:"*", left) },
|
193
|
+
"/": proc { |left| parse_operation(:"/", left) },
|
194
|
+
"^": proc { |left| parse_operation(:"^", left) },
|
195
|
+
"%": proc { |left| parse_operation(:"%", left) })
|
196
|
+
|
197
|
+
def parse_expression
|
198
|
+
# Parse a literal. All expressions must contain a literal of some sort;
|
199
|
+
# we're just going to use a numeric literal here.
|
200
|
+
left = parse_expression_literal
|
201
|
+
|
202
|
+
# Whenever the `.switch` function is called, it creates a
|
203
|
+
# "first set" that can be used like this. The first set consists of
|
204
|
+
# a set of tokens that are allowed for the switch statement. In this
|
205
|
+
# case, it just makes sure that the next token is an operator. If it
|
206
|
+
# is, it parses it as an operation.
|
207
|
+
if peek?(first(:Operation))
|
208
|
+
# Uses the switch defined below. If a token is found as a key, its
|
209
|
+
# block is executed; otherwise, it errors, giving a detailed error of
|
210
|
+
# what was expected.
|
211
|
+
switch(:Operation, left)
|
212
|
+
else
|
213
|
+
left
|
214
|
+
end
|
215
|
+
end
|
216
|
+
|
217
|
+
def parse_operation(op, left)
|
218
|
+
token = expect(op)
|
219
|
+
right = parse_expression
|
220
|
+
|
221
|
+
Parser::Operation.new(left: left, op: op, right: right, location:
|
222
|
+
left.location | op.location | right.location)
|
223
|
+
end
|
224
|
+
|
225
|
+
def parse_expression_literal
|
226
|
+
token = expect(:NUMERIC)
|
227
|
+
Parser::Literal.new(value: token.value, location: token.location)
|
228
|
+
end
|
229
|
+
end
|
230
|
+
end
|
231
|
+
```
+
+This parser can then be used as such:
+
+```ruby
+source = "a = 2;\nb = a + 2;\n"
+scanner = MyLanguage::Scanner.new(source).call
+MyLanguage::Parser.new(scanner).call # => #<MyLanguage::Parser::Root ...>
+```
+
+That's about it! If you have any questions, you can email me at
+<jeremy.rodi@medcat.me>, open an issue, or do what you like.
 
-
+For more documentation, see [the Documentation][documentation] - Yoga has a
+requirement of 100% documentation.
 
 ## Contributing
 
-Bug reports and pull requests are welcome on GitHub at
+Bug reports and pull requests are welcome on GitHub at
+<https://github.com/medcat/yoga>. This project is intended to be a safe,
+welcoming space for collaboration, and contributors are expected to adhere to
+the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
 
+[build-status]: https://travis-ci.org/medcat/yoga.svg?branch=master
+[documentation]: http://www.rubydoc.info/github/medcat/yoga/master
+[coverage-status]: https://coveralls.io/repos/github/medcat/yoga/badge.svg?branch=master
+[build-status-link]: https://travis-ci.org/medcat/yoga
+[coverage-status-link]: https://coveralls.io/github/medcat/yoga?branch=master
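
The README's parser walkthrough leans on Yoga's `collect`, `expect`, `peek?`, and `switch` helpers. As a rough, dependency-free sketch of the same recursive-descent shape, the code below assumes tokens are plain `[kind, value]` pairs ending with `[:EOF, nil]` and builds nested arrays instead of Yoga/Mixture nodes. `TinyParser` and all of its helpers are hypothetical stand-ins, not Yoga's API; in particular, this diff does not show whether Yoga's `collect` consumes the terminating token, and this sketch simply does.

```ruby
# Hypothetical stand-in for the README's parser; not Yoga's implementation.
# Tokens are assumed to be [kind, value] pairs terminated by [:EOF, nil].
class TinyParser
  OPERATORS = %i[= + - * / ^ %].freeze

  def initialize(tokens)
    @tokens = tokens
    @pos = 0
  end

  # Entry point, like `parse_root`: collect statements until :EOF.
  def call
    collect(:EOF) { parse_statement }
  end

  private

  def peek
    @tokens.fetch(@pos)
  end

  # True if the next token's kind is one of the given kinds.
  def peek?(*kinds)
    kinds.include?(peek[0])
  end

  # Consumes the next token if it matches; otherwise raises with a message
  # naming what was expected and what was found.
  def expect(*kinds)
    token = peek
    unless kinds.include?(token[0])
      raise ArgumentError, "expected one of #{kinds.inspect}, got #{token[0].inspect}"
    end

    @pos += 1
    token
  end

  # Runs the block until the terminating token is next, then consumes it.
  def collect(terminator)
    nodes = []
    until peek?(terminator)
      nodes << yield
    end
    expect(terminator)
    nodes
  end

  def parse_statement
    expression = parse_expression
    expect(:";")
    expression
  end

  # Literal first, then an optional operator and a right-hand expression,
  # which sidesteps the left recursion in the grammar.
  def parse_expression
    left = parse_literal
    return left unless peek?(*OPERATORS)

    op = expect(*OPERATORS)[0]
    [:op, op, left, parse_expression]
  end

  def parse_literal
    [:lit, expect(:NUMERIC)[1]]
  end
end
```

Like the README's `parse_expression`, this parses a literal first and only then checks for an operator, so `1 - 2 - 3` groups as `1 - (2 - 3)`; a fuller parser would add precedence and associativity handling.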
data/lib/yoga/errors.rb
CHANGED
@@ -33,6 +33,21 @@ module Yoga
     attr_reader :location
   end
 
+  # An error that occurred with scanning.
+  #
+  # @api private
+  class ScanError < LocationError; end
+
+  # An unexpected character was encountered while scanning.
+  #
+  # @api private
+  class UnexpectedCharacterError < LocationError
+    # (see Error#generate_message)
+    private def generate_message
+      "An unexpected character was encountered at #{@location}"
+    end
+  end
+
   # An error that occurred with parsing.
   #
   # @api private
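
The new `UnexpectedCharacterError` only overrides a private `generate_message` hook; `private def ...` works because `def` returns the method name as a symbol. The base `LocationError` is not part of this diff, so the sketch below assumes a hypothetical base class that takes a `location:` keyword and builds its message from that hook. `SketchLocationError` and `SketchUnexpectedCharacterError` are illustrative names only.

```ruby
# Hypothetical base class; Yoga's real LocationError is not shown in this diff.
class SketchLocationError < StandardError
  attr_reader :location

  def initialize(location:)
    @location = location
    # The message is produced by a private hook that subclasses override.
    super(generate_message)
  end

  private def generate_message
    "An error occurred at #{@location}"
  end
end

# Mirrors the shape of the new UnexpectedCharacterError in errors.rb.
class SketchUnexpectedCharacterError < SketchLocationError
  private def generate_message
    "An unexpected character was encountered at #{@location}"
  end
end
```

Raising an instance (`raise SketchUnexpectedCharacterError.new(location: loc)`) keeps the keyword argument explicit.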
data/lib/yoga/parser/helpers.rb
CHANGED
@@ -142,7 +142,8 @@ module Yoga
       # @return [::Object] The result of calling the block.
       def switch(name, *param)
         switch = self.class.switch(name)
-        block = switch
+        block = switch
+          .fetch(peek.kind) { switch.fetch(:$else) { error(switch.keys) } }
         instance_exec(*param, &block)
       end
 
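
The `switch` change adds a fallback: when the next token's kind has no entry, the `:$else` entry is used, and only if that is missing does it report an error listing the allowed keys. The nested `Hash#fetch` default blocks can be sketched on their own; `TABLE` and `dispatch` are hypothetical, and the diff's `error(...)` helper is replaced by a plain `KeyError` because that helper is not shown here.

```ruby
# Hypothetical dispatch table keyed by token kind; :$else is the fallback key.
TABLE = {
  "+": proc { |left| [:add, left] },
  "-": proc { |left| [:sub, left] },
  :$else => proc { |left| [:other, left] }
}.freeze

# Same lookup order as the new switch helper: exact kind, then :$else,
# then an error naming the keys that were allowed.
def dispatch(table, kind, *params)
  block = table.fetch(kind) do
    table.fetch(:$else) { raise KeyError, "expected one of #{table.keys.inspect}" }
  end
  block.call(*params)
end
```

Yoga's version runs the selected block with `instance_exec` so it can call parser methods; the sketch uses `Proc#call` to stay standalone.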
data/lib/yoga/scanner.rb
CHANGED
@@ -6,12 +6,20 @@ module Yoga
   # It is built to lazily scan whenever it is required, instead
   # of all at once. This integrates nicely with the parser.
   module Scanner
+    # The file of the scanner. This can be overwritten to provide a descriptor
+    # for the file.
+    #
+    # @return [::String]
+    attr_reader :file
+
     # Initializes the scanner with the given source. Once the
     # source is set, it shouldn't be changed.
     #
     # @param source [::String] The source.
-
+    # @param file [::String] The file the scanner comes from.
+    def initialize(source, file = "<anon>")
       @source = source
+      @file = file
       @line = 1
       @last_line_at = 0
     end
@@ -32,10 +40,10 @@ module Yoga
 
       until @scanner.eos?
         value = scan
-        yield value
+        yield value unless value == true || !value
       end
 
-      yield
+      yield eof_token
       self
     end
 
@@ -53,7 +61,7 @@ module Yoga
       fail NotImplementedError, "Please implement #{self.class}#scan"
     end
 
-
+    protected
 
     # Returns a location at the given location. If a size is given, it reduces
     # the column number by the size and returns the size from that.
@@ -115,12 +123,13 @@ module Yoga
     # such as line counting and caching, to be performed.
     #
     # @return [Boolean] If the line was matched.
-    def match_line(kind
-
-
-
-
-
+    def match_line(kind: false, required: false)
+      result = @scanner.scan(LINE_MATCHER)
+      (required ? (fail UnexpectedCharacterError, location: location) : return) \
+        unless result
+      @line += 1
+      @last_line_at = @scanner.charpos
+      (kind && emit(kind)) || true
     end
 
     # Returns the number of lines that have been covered so far in the scanner.
@@ -145,12 +154,11 @@ module Yoga
       "(?![a-zA-Z])"
     end
 
-    #
-    # for the file.
+    # Returns a token that denotes that the scanner is done scanning.
     #
-    # @return [::
-    def
-
+    # @return [Yoga::Token]
+    def eof_token
+      emit(:EOF, "")
     end
   end
 end
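
Taken together, the scanner changes give `call` a simple contract: `scan` may return a token, `true` (input consumed, nothing emitted), or a falsy value; only tokens are yielded, followed by a final `:EOF` token. `match_line` now takes `kind:` and `required:` keywords, and the file name defaults to `"<anon>"`. The sketch below imitates that flow with Ruby's standard `StringScanner`. `SketchScanner`, its `Token` struct, the whitespace skipping, and the simplified `"file:line.column"` location strings are all hypothetical; they do not reproduce Yoga's `Location` type or its exact error behavior.

```ruby
require "strscan"

# Hypothetical token type; Yoga's own Token and Location classes are not reproduced.
Token = Struct.new(:kind, :value, :location)

# Illustrative scanner mirroring the shape of the 0.2.1 changes; not Yoga's code.
class SketchScanner
  include Enumerable

  LINE_MATCHER = /\r\n|\n/
  # Same assertion string as scanner.rb: `:class` must not match inside "classy".
  SYMBOL_NEGATIVE_ASSERTION = "(?![a-zA-Z])"

  attr_reader :file

  def initialize(source, file = "<anon>")
    @source = source
    @file = file
  end

  # Yields tokens one at a time; without a block, returns an Enumerator.
  def call
    return to_enum(:call) unless block_given?

    @scanner = StringScanner.new(@source)
    @line = 1
    @last_line_at = 0
    until @scanner.eos?
      value = scan
      # `true` means "input consumed, no token"; falsy values are skipped too.
      yield value unless value == true || !value
    end
    yield emit(:EOF, "")
    self
  end
  alias_method :each, :call

  private

  def scan
    match_line ||
      match(/[ \t]+/) || # no kind given, so this only skips whitespace
      match("[") || match("]") || match(";") ||
      match(:class) ||
      match(/[0-9]+/, :NUMERIC) ||
      match(/[a-z][a-zA-Z0-9_]*/, :IDENT) ||
      raise(ArgumentError, "unexpected character at #{@file}:#{@line}")
  end

  # String: escaped literal; the kind defaults to the string as a symbol.
  # Symbol: escaped literal plus the negative assertion.
  # Regexp: used as-is; in this sketch, no kind means "consume, emit nothing".
  def match(pattern, kind = nil)
    case pattern
    when String
      regexp = Regexp.new(Regexp.escape(pattern))
      kind ||= pattern.to_sym
    when Symbol
      regexp = Regexp.new(Regexp.escape(pattern.to_s) + SYMBOL_NEGATIVE_ASSERTION)
      kind ||= pattern
    else
      regexp = pattern
    end
    text = @scanner.scan(regexp)
    return false unless text

    kind ? emit(kind, text) : true
  end

  # Consumes one newline and updates line tracking. Emits a token only when
  # `kind:` is given; raises only when `required:` is set and nothing matched.
  def match_line(kind: false, required: false)
    unless @scanner.scan(LINE_MATCHER)
      raise ArgumentError, "expected a newline at #{@file}:#{@line}" if required

      return false
    end
    @line += 1
    @last_line_at = @scanner.charpos
    (kind && emit(kind, "\n")) || true
  end

  # Builds a token whose location is a simplified "file:line.column" string.
  def emit(kind, text)
    column = @scanner.charpos - @last_line_at - text.length + 1
    Token.new(kind, text, "#{@file}:#{@line}.#{column}")
  end
end
```

Calling `to_a` scans the whole input eagerly, while `each` or `call` with a block produces one token at a time.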
data/lib/yoga/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: yoga
 version: !ruby/object:Gem::Version
-  version: 0.2.
+  version: 0.2.1
 platform: ruby
 authors:
 - Jeremy Rodi
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-03-
+date: 2017-03-10 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: mixture
@@ -142,7 +142,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.5.
+rubygems_version: 2.5.1
 signing_key:
 specification_version: 4
 summary: Ruby scanner and parser helpers.