yoga 0.2.0 → 0.2.1
- checksums.yaml +4 -4
- data/.travis.yml +2 -0
- data/Gemfile +1 -2
- data/README.md +233 -11
- data/lib/yoga/errors.rb +15 -0
- data/lib/yoga/parser/helpers.rb +2 -1
- data/lib/yoga/scanner.rb +23 -15
- data/lib/yoga/version.rb +1 -1
- metadata +3 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: aac664b4901d0613248ba9cb7169931dfb135480
+  data.tar.gz: 61185a381b8363cea814da9df4aa948464157523
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b7d22c7ba8033a31f4192486775bacbbde86471daca4d4e1ccec55a1b00505c91ed7b3f4f0bc6b02b5d7d7d0692dd2bc437e2a2d9fff8f6e5e65ffdc30a79079
+  data.tar.gz: 6ee507d8d3e3f989ca55c81211b93c143e8f1a2233280200d6f4db231d2ddccdec97f9b1a3127145ab35b8c3a65904a0f001c8fd76dc56dc8a39cbbe576617a0
data/.travis.yml
CHANGED
data/Gemfile
CHANGED
data/README.md
CHANGED
@@ -1,36 +1,258 @@
 # Yoga
+[![Build Status][build-status]][build-status-link] [![Coverage Status][coverage-status]][coverage-status-link]
 
-
+A helper for your Ruby parsers. This adds helper methods to make parsing
+(and scanning!) easier and more structured. If you're looking for an LALR
+parser generator, that isn't this. This is designed to help you construct
+Recursive Descent parsers - which are solely LL(k). If you want an LALR parser
+generator, see [_Antelope_](https://github.com/medcat/antelope) or
+[Bison](https://www.gnu.org/software/bison/).
 
-
+Yoga requires [Mixture](https://github.com/medcat/mixture) for parser node
+attributes. However, the use of the parser nodes included with Yoga is
+completely optional.
 
 ## Installation
 
 Add this line to your application's Gemfile:
 
 ```ruby
-gem
+gem "yoga"
 ```
 
 And then execute:
 
     $ bundle
 
-
+## Usage
 
-
+To begin your parser, you will first have to create a scanner. A scanner
+takes the source text and generates "tokens." These tokens are abstract
+representations of the source text of the document. For example, for the
+text `class A do`, you could have the tokens `:class`, `:CNAME`, and `:do`.
+The actual names of the tokens are completely up to you. These token names
+are later used in the parser to set up expectations - for example, for the
+definition of a class, you could expect a `:class`, `:CNAME`, and a `:do`
+token.
 
-
+Essentially, the scanner breaks up the text into usable, bite-sized pieces
+for the parser to chomp on. Here's what a scanner may look like:
+
+```ruby
+module MyLanguage
+  class Scanner
+    # All of the behavior from Yoga for scanners. This provides the
+    # `match/2` method, the `call/0` method, the `match_line/1` method,
+    # the `location/1` method, and the `emit/2` method. The major ones that
+    # are used are the `match/2`, the `call/0`, and the `match_line/1`
+    # methods.
+    include Yoga::Scanner
+
+    # This must be implemented. This is called for the next token. This
+    # should only return a Token, or true.
+    def scan
+      # Match with a string value escapes the string, then turns it into a
+      # regular expression.
+      match("[") || match("]") ||
+      # Match with a symbol escapes the symbol, and turns it into a regular
+      # expression, suffixing it with `symbol_negative_assertion`. This is
+      # to prevent issues with identifiers and keywords.
+      match(:class) || match(:func) ||
+      # With a regular expression, it's matched exactly. However, a token
+      # name is highly recommended.
+      match(/[a-z][a-zA-Z0-9_]*[!?=]?/, :IDENT)
+    end
+  end
+end
+```
+
+And that's it! You now have a fully functioning scanner. In order to use it,
+all you have to do is this:
+
+```ruby
+source = "class alpha [func a []]"
+MyLanguage::Scanner.new(source).call # => #<Enumerable ...>
+```
+
+Note that `Scanner#call` returns an enumerable. `#call` is aliased as `#each`.
+What this means is that tokens aren't generated until they're requested by the
+parser - each token is generated from the source incrementally. If you want
+to retrieve all of the tokens immediately, you have to first convert it into
+an array, or perform some other operation on the enumerable (since it isn't
+lazy):
+
+```ruby
+MyLanguage::Scanner.new(source).call.to_a # => [...]
+```
+
+The scanner also automatically adds location information to all of the tokens.
+This is handled automatically by `match/2` and `emit/2` - the only issue being
+that all regular expressions **must not** include a newline. Newlines should
+be matched with `match_line/1`; if lines must be emitted as a token, you can
+pass the kind of token to emit to `match_line/1` using the `kind:` keyword.
+
+You may notice that all of the tokens have `<anon>` set as the location's file.
+This is the default location, which is provided to the initializer:
+
+```ruby
+MyLanguage::Scanner.new(source, "foo").call.first.location.to_s # => "foo:1.1-6"
+```
+
+Parsers are a little bit more complicated. Before we can pull up the parser,
+let's define a grammar and some node classes.
+
+```
+; This is the grammar.
+<root> = *<statement>
+<statement> = <expression> ';'
+<expression> = <expression> <op> <expression>
+<expression> /= <int> ; here, <int> is defined by the scanner.
+<op> = '+' / '-' / '*' / '/' / '^' / '%' / '='
+```
 
-
+```ruby
+module MyLanguage
+  class Parser
+    class Root < Yoga::Node
+      # An attribute on the node. This is required for Yoga nodes since the
+      # update syntax requires them. The type for the attribute is optional.
+      attribute :statements, type: [Yoga::Node]
+    end
+
+    class Expression < Yoga::Node
+    end
+
+    class Operation < Expression
+      attribute :operator, type: ::Symbol
+      attribute :left, type: Expression
+      attribute :right, type: Expression
+    end
+
+    class Literal < Expression
+      attribute :value, type: ::Integer
+    end
+  end
+end
+```
+
+With those out of the way, let's take a look at the parser itself.
+
+```ruby
+module MyLanguage
+  class Parser
+    # This provides all of the parser helpers. This is the same as adding
+    # `Yoga::Parser::Helpers` as an include statement as well.
+    include Yoga::Parser
+
+    # Like the `scan/0` method on the scanner, this must be implemented. This
+    # is the entry point for the parser. However, public usage should use the
+    # `call/0` method. This should return a node of some sort.
+    def parse_root
+      # This "collects" a series of nodes in sequence. It iterates until it
+      # reaches the `:EOF` token (in this case). The first parameter to
+      # collect is the "terminating token," and can be any value that
+      # `expect/1` or `peek?/1` accepts. The second, optional parameter to
+      # collect is the "joining token," and is required between each node.
+      # We're not using the semicolon as a joining token because that is
+      # required for _all_ statements. The joining token can be used for
+      # things like argument lists. The parameter can be any value that
+      # `expect/1` or `peek?/1` accepts.
+      children = collect(:EOF) { parse_statement }
+
+      # "Unions" the location of all of the statements in the list.
+      location = children.map(&:location).inject(:union)
+      Parser::Root.new(statements: children, location: location)
+    end
+
+    # Parses a statement. This is the same as the <statement> rule as above.
+    def parse_statement
+      expression = parse_expression
+      # This says that the next token should be a semicolon. If the next token
+      # isn't, it throws an error with a detailed error message, denoting
+      # what was expected (in this case, a semicolon), what was given, and
+      # where the error was located in the source file.
+      expect(:";")
+
+      expression
+    end
 
-## Development
 
-
+    # A switch statement, essentially. This is defined beforehand to make it
+    # _faster_ (not really; it's just useful). The first parameter to the
+    # switch function is the name of the switch. This is used later to
+    # actually perform the switch; it is also used to define a first set with
+    # the allowed tokens for the switch. The second parameter defines a key
+    # value pair. The keys are the tokens that are allowed; a symbol or an
+    # array of symbols can be used. The value is the block or the method that
+    # is executed upon encountering that token.
+    switch(:Operation,
+      "=": proc { |left| parse_operation(:"=", left) },
+      "+": proc { |left| parse_operation(:"+", left) },
+      "-": proc { |left| parse_operation(:"-", left) },
+      "*": proc { |left| parse_operation(:"*", left) },
+      "/": proc { |left| parse_operation(:"/", left) },
+      "^": proc { |left| parse_operation(:"^", left) },
+      "%": proc { |left| parse_operation(:"%", left) })
+
+    def parse_expression
+      # Parse a literal. All expressions must contain a literal of some sort;
+      # we're just going to use a numeric literal here.
+      left = parse_expression_literal
+
+      # Whenever the `.switch` function is called, it creates a
+      # "first set" that can be used like this. The first set consists of
+      # a set of tokens that are allowed for the switch statement. In this
+      # case, it just makes sure that the next token is an operator. If it
+      # is, it parses it as an operation.
+      if peek?(first(:Operation))
+        # Uses the switch defined above. If a token is found as a key, its
+        # block is executed; otherwise, it errors, giving a detailed error of
+        # what was expected.
+        switch(:Operation, left)
+      else
+        left
+      end
+    end
+
+    def parse_operation(op, left)
+      token = expect(op)
+      right = parse_expression
+
+      Parser::Operation.new(left: left, operator: op, right: right, location:
+        left.location | token.location | right.location)
+    end
+
+    def parse_expression_literal
+      token = expect(:NUMERIC)
+      Parser::Literal.new(value: token.value, location: token.location)
+    end
+  end
+end
+```
+
+This parser can then be used as such:
+
+```ruby
+source = "a = 2;\nb = a + 2;\n"
+scanner = MyLanguage::Scanner.new(source).call
+MyLanguage::Parser.new(scanner).call # => #<MyLanguage::Parser::Root ...>
+```
+
+That's about it! If you have any questions, you can email me at
+<jeremy.rodi@medcat.me>, open an issue, or do what you like.
 
-
+For more documentation, see [the Documentation][documentation] - Yoga has a
+requirement of 100% documentation.
 
 ## Contributing
 
-Bug reports and pull requests are welcome on GitHub at
+Bug reports and pull requests are welcome on GitHub at
+<https://github.com/medcat/yoga>. This project is intended to be a safe,
+welcoming space for collaboration, and contributors are expected to adhere to
+the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
 
+[build-status]: https://travis-ci.org/medcat/yoga.svg?branch=master
+[documentation]: http://www.rubydoc.info/github/medcat/yoga/master
+[coverage-status]: https://coveralls.io/repos/github/medcat/yoga/badge.svg?branch=master
+[build-status-link]: https://travis-ci.org/medcat/yoga
+[coverage-status-link]: https://coveralls.io/github/medcat/yoga?branch=master
data/lib/yoga/errors.rb
CHANGED
@@ -33,6 +33,21 @@ module Yoga
     attr_reader :location
   end
 
+  # An error that occurred with scanning.
+  #
+  # @api private
+  class ScanError < LocationError; end
+
+  # An unexpected character was encountered while scanning.
+  #
+  # @api private
+  class UnexpectedCharacterError < LocationError
+    # (see Error#generate_message)
+    private def generate_message
+      "An unexpected character was encountered at #{@location}"
+    end
+  end
+
   # An error that occurred with parsing.
   #
   # @api private
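The added error classes follow a simple pattern: a subclass customizes its message by overriding a private `generate_message` hook. Below is a minimal, self-contained sketch of that pattern. `SketchLocationError` and its keyword initializer are hypothetical stand-ins, since the real `Yoga::LocationError` constructor is not shown in this diff.

```ruby
# Hypothetical stand-in for Yoga::LocationError; the real initializer is not
# visible in this diff, so the `location:` keyword here is an assumption.
class SketchLocationError < StandardError
  attr_reader :location

  def initialize(location:)
    @location = location
    # The message is built by a hook that subclasses may override.
    super(generate_message)
  end

  # Default message; subclasses override this private hook.
  private def generate_message
    "An error occurred at #{@location}"
  end
end

# Mirrors the shape of the added UnexpectedCharacterError.
class SketchUnexpectedCharacterError < SketchLocationError
  private def generate_message
    "An unexpected character was encountered at #{@location}"
  end
end

begin
  raise SketchUnexpectedCharacterError.new(location: "<anon>:1.1")
rescue SketchLocationError => e
  puts e.message
end
```

Rescuing the base class catches the more specific error as well, which is presumably why the new classes share a common ancestor.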
data/lib/yoga/parser/helpers.rb
CHANGED
@@ -142,7 +142,8 @@ module Yoga
       # @return [::Object] The result of calling the block.
       def switch(name, *param)
         switch = self.class.switch(name)
-        block = switch
+        block = switch
+          .fetch(peek.kind) { switch.fetch(:$else) { error(switch.keys) } }
         instance_exec(*param, &block)
       end
 
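The change above adds a `:$else` fallback through nested `Hash#fetch` calls: the token kind is looked up first, then the `:$else` key, and only then is an error raised. A minimal sketch of that lookup order, using a plain hash and lambdas, follows. The `dispatch` helper and `TABLE` are hypothetical names for illustration, and the sketch calls the handler directly rather than using `instance_exec` as the library does.

```ruby
# Hypothetical dispatch table keyed by token kind. `:$else` is the fallback
# key that the updated `switch` helper looks up.
TABLE = {
  "+": ->(left) { "add after #{left}" },
  "$else": ->(left) { "fallback after #{left}" }
}.freeze

# Looks up `kind`, then `:$else`, and raises only if both are missing.
def dispatch(table, kind, *args)
  handler = table.fetch(kind) do
    table.fetch(:$else) do
      raise KeyError, "expected one of #{table.keys.inspect}, got #{kind.inspect}"
    end
  end
  handler.call(*args)
end

puts dispatch(TABLE, :+, 1)
puts dispatch(TABLE, :-, 1)
```

A table without a `:$else` key keeps the earlier behavior: an unknown kind still raises.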
data/lib/yoga/scanner.rb
CHANGED
@@ -6,12 +6,20 @@ module Yoga
   # It is built to lazily scan whenever it is required, instead
   # of all at once. This integrates nicely with the parser.
   module Scanner
+    # The file of the scanner. This can be overwritten to provide a descriptor
+    # for the file.
+    #
+    # @return [::String]
+    attr_reader :file
+
     # Initializes the scanner with the given source. Once the
     # source is set, it shouldn't be changed.
     #
     # @param source [::String] The source.
-
+    # @param file [::String] The file the scanner comes from.
+    def initialize(source, file = "<anon>")
       @source = source
+      @file = file
       @line = 1
       @last_line_at = 0
     end
@@ -32,10 +40,10 @@ module Yoga
 
       until @scanner.eos?
         value = scan
-        yield value
+        yield value unless value == true || !value
       end
 
-      yield
+      yield eof_token
       self
     end
 
@@ -53,7 +61,7 @@ module Yoga
       fail NotImplementedError, "Please implement #{self.class}#scan"
     end
 
-
+    protected
 
     # Returns a location at the given location. If a size is given, it reduces
     # the column number by the size and returns the size from that.
@@ -115,12 +123,13 @@ module Yoga
     # such as line counting and caching, to be performed.
     #
     # @return [Boolean] If the line was matched.
-    def match_line(kind
-
-
-
-
-
+    def match_line(kind: false, required: false)
+      result = @scanner.scan(LINE_MATCHER)
+      (required ? (fail UnexpectedCharacterError, location: location) : return) \
+        unless result
+      @line += 1
+      @last_line_at = @scanner.charpos
+      (kind && emit(kind)) || true
     end
 
     # Returns the number of lines that have been covered so far in the scanner.
@@ -145,12 +154,11 @@ module Yoga
       "(?![a-zA-Z])"
     end
 
-    #
-    # for the file.
+    # Returns a token that denotes that the scanner is done scanning.
     #
-    # @return [::
-    def
-
+    # @return [Yoga::Token]
+    def eof_token
+      emit(:EOF, "")
     end
   end
 end
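Taken together, the scanner changes mean that `scan` may return `true` (or a falsy value) for input that was consumed but should not become a token, and that an `:EOF` token is always yielded last. The sketch below reproduces that control flow with only the standard library's `StringScanner`. `SketchScanner`, its `Token` struct, and the `LINE_MATCHER` pattern are assumptions for illustration; the real `Yoga::Scanner` internals are only partly visible in this diff.

```ruby
require "strscan"

# Hypothetical stand-in for a class that includes Yoga::Scanner.
class SketchScanner
  # Assumed pattern; the real LINE_MATCHER constant is not shown in this diff.
  LINE_MATCHER = /\r\n|\n|\r/

  Token = Struct.new(:kind, :value, :line)

  def initialize(source, file = "<anon>")
    @scanner = StringScanner.new(source)
    @file = file
    @line = 1
  end

  # Yields tokens one at a time; returns an Enumerator without a block.
  def call
    return to_enum(:call) unless block_given?

    until @scanner.eos?
      value = scan
      # Same guard as the diff: `true` and falsy results are not tokens.
      yield value unless value == true || !value
    end

    # Always finish with an EOF token, like `yield eof_token`.
    yield Token.new(:EOF, "", @line)
    self
  end

  private

  def scan
    if @scanner.scan(LINE_MATCHER)
      @line += 1 # count the line, but emit nothing
      true
    elsif @scanner.scan(/[ \t]+/)
      true # whitespace is consumed silently
    elsif (text = @scanner.scan(/[a-z]+/))
      Token.new(:IDENT, text, @line)
    else
      raise "unexpected character at #{@file}:#{@line}"
    end
  end
end

SketchScanner.new("ab\ncd").call.each do |token|
  puts "#{token.kind} #{token.value.inspect} (line #{token.line})"
end
```

Returning `true` for skipped input keeps the `until @scanner.eos?` loop moving without forcing every `scan` call to produce a token.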
data/lib/yoga/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: yoga
 version: !ruby/object:Gem::Version
-  version: 0.2.
+  version: 0.2.1
 platform: ruby
 authors:
 - Jeremy Rodi
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-03-
+date: 2017-03-10 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: mixture
@@ -142,7 +142,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.5.
+rubygems_version: 2.5.1
 signing_key:
 specification_version: 4
 summary: Ruby scanner and parser helpers.