ebnf 2.5.0 → 2.6.0
- checksums.yaml +4 -4
- data/README.md +11 -11
- data/VERSION +1 -1
- data/bin/ebnf +6 -1
- data/etc/ebnf.ebnf +6 -5
- data/etc/ebnf.html +8 -2
- data/etc/ebnf.ll1.rb +1 -1
- data/etc/ebnf.ll1.sxp +3 -5
- data/etc/ebnf.peg.rb +13 -12
- data/etc/ebnf.peg.sxp +12 -11
- data/etc/ebnf.sxp +3 -5
- data/etc/iso-ebnf.isoebnf +4 -5
- data/lib/ebnf/abnf.rb +4 -4
- data/lib/ebnf/base.rb +12 -16
- data/lib/ebnf/ebnf/meta.rb +8 -6
- data/lib/ebnf/isoebnf.rb +4 -4
- data/lib/ebnf/ll1/parser.rb +3 -3
- data/lib/ebnf/native.rb +5 -4
- data/lib/ebnf/parser.rb +39 -10
- data/lib/ebnf/peg/parser.rb +27 -21
- data/lib/ebnf/peg/rule.rb +59 -26
- data/lib/ebnf/terminals.rb +5 -4
- metadata +17 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 2cacadf02a11bd000711f0e3b68a343152fab195e04315210887a0cb9576a813
+data.tar.gz: 400eaa6a4dfdc177dcafc80cdb535a63a31c7af092c8f01b00e880d4d021b7bc
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 6968633fa5be00518afc4138f02f971ea88d74322f8a24f84023c7fcc306d8d38a930e6f23144c01f25745c9ad86c0659c30d739d65762071476a0830d03a5aa
+data.tar.gz: e3878b45bc9e553e78c60e9cc31ca84fda40e8a40986797764e1240ece08009c8ac092c8fdf852522e182d3928f7494636d960d70bf469d7e6d346e09bf7d2a9
data/README.md
CHANGED
@@ -26,10 +26,9 @@ As LL(1) grammars operate using `alt` and `seq` primitives, allowing for a match
 * Transform `a ::= b+` into `a ::= b b*`
 * Transform `a ::= b*` into `a ::= _empty | (b a)`
 * Transform `a ::= op1 (op2)` into two rules:
-
-
-
-```
+
+a ::= op1 _a_1
+_a_1_ ::= op2
 
 Of note in this implementation is that the tokenizer and parser are streaming, so that they can process inputs of arbitrary size.
 
@@ -75,7 +74,7 @@ Generate formatted grammar using HTML (requires [Haml][Haml] gem):
 
 ### Parsing an ISO/IEC 14977 Grammar
 
-The EBNF gem can also parse
+The EBNF gem can also parse [ISO/IEC 14977][] Grammars (ISOEBNF) to [S-Expressions][S-Expression].
 
 grammar = EBNF.parse(File.open('./etc/iso-ebnf.isoebnf'), format: :isoebnf)
 
@@ -96,7 +95,7 @@ The {EBNF::Writer} class can be used to write parsed grammars out, either as for
 The formatted HTML results are designed to be appropriate for including in specifications.
 
 ### Parser Errors
-On a parsing failure,
+On a parsing failure, an exception is raised with information that may be useful in determining the source of the error.
 
 ## EBNF Grammar
 The [EBNF][] variant used here is based on [W3C](https://w3.org/) [EBNF][]
@@ -104,7 +103,7 @@ The [EBNF][] variant used here is based on [W3C](https://w3.org/) [EBNF][]
 as defined in the
 [XML 1.0 recommendation](https://www.w3.org/TR/REC-xml/), with minor extensions:
 
-Note that the grammar includes an optional `[
+Note that the grammar includes an optional `[number]` in front of rule names, which can be in conflict with the `RANGE` terminal. It is typically not a problem, but if it comes up, try parsing with the `native` parser, add comments or sequences to disambiguate. EBNF does not have beginning of line checks as all whitespace is treated the same, so the common practice of identifying each rule inherently leads to such ambiguity.
 
 The character set for EBNF is UTF-8.
 
@@ -116,7 +115,7 @@ which can also be proceeded by an optional number enclosed in square brackets to
 
 [1] symbol ::= expression
 
-(Note,
+(Note, introduces an ambiguity if the previous rule ends in a range or enum and the current rule has no number. The parsers dynamically determine the terminal rules for the `LHS` (the identifier, symbol, and `::=`) and `RANGE`).
 
 Symbols are written in CAPITAL CASE if they are the start symbol of a regular language (terminals), otherwise with they are treated as non-terminal rules. Literal strings are quoted.
 
@@ -134,7 +133,7 @@ Within the expression on the right-hand side of a rule, the following expression
 <tr><td><code>[^abc], [^#xN#xN#xN]</code></td>
 <td>matches any UTF-8 R\_CHAR or HEX with a value not among the characters given. The last component may be '-'. Enumerations and ranges of excluded values may be mixed in one set of brackets.</td></tr>
 <tr><td><code>"string"</code></td>
-<td>matches a literal string matching that given inside the double quotes.</td></tr>
+<td>matches a literal string matching that given inside the double quotes case insensitively.</td></tr>
 <tr><td><code>'string'</code></td>
 <td>matches a literal string matching that given inside the single quotes.</td></tr>
 <tr><td><code>A (B | C)</code></td>
@@ -158,7 +157,8 @@ Within the expression on the right-hand side of a rule, the following expression
 </table>
 
 * Comments include `//` and `#` through end of line (other than hex character) and `/* ... */ (* ... *) which may cross lines`
-* All rules **MAY** start with an
+* All rules **MAY** start with an number, contained within square brackets. For example `[1] rule`, where the value within the brackets is a symbol `([a-z] | [A-Z] | [0-9] | "_" | ".")+`, which is not retained after parsing
+* Symbols **MAY** be enclosed in angle brackets `'<'` and `>`, which are dropped when parsing.
 * `@terminals` causes following rules to be treated as terminals. Any terminal which is all upper-case (eg`TERMINAL`), or any rules with expressions that match characters (`#xN`, `[a-z]`, `[^a-z]`, `[abc]`, `[^abc]`, `"string"`, `'string'`, or `A - B`), are also treated as terminals.
 * `@pass` defines the expression used to detect whitespace, which is removed in processing.
 * No support for `wfc` (well-formedness constraint) or `vc` (validity constraint).
@@ -177,7 +177,7 @@ Intermediate representations of the grammar may be serialized to Lisp-like [S-Ex
 
 is serialized as
 
-(rule ebnf
+(rule ebnf (star (alt declaration rule)))
 
 Different components of an EBNF rule expression are transformed into their own operator:
 
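The updated README states that a double-quoted `"string"` matches case-insensitively, while a single-quoted `'string'` matches exactly. The following is a minimal sketch of that distinction. It assumes literals are compiled to anchored regular expressions, and `literal_pattern` is a hypothetical helper, not part of the gem's API:

```ruby
# Hypothetical helper (not part of the ebnf gem): build an anchored pattern
# for a grammar literal. Per the updated README, double-quoted literals match
# case insensitively and single-quoted literals match exactly.
def literal_pattern(text, double_quoted:)
  flags = double_quoted ? Regexp::IGNORECASE : 0
  Regexp.new("\\A#{Regexp.escape(text)}\\z", flags)
end

dq = literal_pattern("select", double_quoted: true)   # like "select"
sq = literal_pattern("select", double_quoted: false)  # like 'select'

puts dq.match?("SELECT")  # true
puts sq.match?("SELECT")  # false
puts sq.match?("select")  # true
```

`Regexp.escape` keeps metacharacters in a literal (such as `.`) from being treated as regex syntax.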
data/VERSION
CHANGED
@@ -1 +1 @@
-2.
+2.6.0
data/bin/ebnf
CHANGED
@@ -9,6 +9,7 @@ $:.unshift(File.expand_path(File.join(File.dirname(__FILE__), "..", 'lib')))
 require 'rubygems'
 require 'getoptlong'
 require 'ebnf'
+require 'rdf/spec'
 
 options = {
 output_format: :sxp,
@@ -86,7 +87,11 @@ end
 
 input = File.open(ARGV[0]) if ARGV[0]
 
-
+logger = Logger.new(STDERR)
+logger.level = options[:level] || Logger::ERROR
+logger.formatter = lambda {|severity, datetime, progname, msg| "%5s %s\n" % [severity, msg]}
+
+ebnf = EBNF.parse(input || STDIN, logger: logger, **options)
 ebnf.make_bnf if options[:bnf] || options[:ll1]
 ebnf.make_peg if options[:peg]
 if options[:ll1]
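The new CLI code builds a `Logger` whose level defaults to `Logger::ERROR` and whose formatter prints only a right-aligned severity label and the message. The sketch below reproduces that configuration on its own. Writing to a `StringIO` instead of `STDERR` is an assumption made here so the output can be inspected:

```ruby
require 'logger'
require 'stringio'

# Write to memory rather than STDERR so the formatted output can be checked.
io = StringIO.new
logger = Logger.new(io)

options = {}  # no :level given, so the ERROR default applies
logger.level = options[:level] || Logger::ERROR
# "%5s" right-aligns the severity label in a five-character column.
logger.formatter = lambda {|severity, datetime, progname, msg| "%5s %s\n" % [severity, msg]}

logger.warn("ignored")         # below ERROR, so filtered out
logger.error("parse failed")   # emitted through the formatter

print io.string  # prints "ERROR parse failed"
```

The labels are padded to five characters, so `WARN` and `INFO` line up with `ERROR` and `DEBUG` in the output.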
data/etc/ebnf.ebnf
CHANGED
@@ -5,9 +5,8 @@
 
 # Use the LHS terminal to match the identifier, rule name and assignment due to
 # confusion between the identifier and RANGE.
-#
-#
-# In such case, best to enclose the rule in '()'.
+# The PEG parser has special rules for matching LHS and RANGE
+# so that RANGE is not confused with LHS.
 [3] rule ::= LHS expression
 
 [4] expression ::= alt
@@ -34,11 +33,13 @@
 
 [11] LHS ::= ('[' SYMBOL ']' ' '+)? SYMBOL ' '* '::='
 
-[12] SYMBOL ::=
+[12] SYMBOL ::= '<' O_SYMBOL '>' | O_SYMBOL
+
+[12a] O_SYMBOL ::= ([a-z] | [A-Z] | [0-9] | '_' | '.')+
 
 [13] HEX ::= '#x' ([a-f] | [A-F] | [0-9])+
 
-[14] RANGE ::= '[' ((R_CHAR '-' R_CHAR) | (HEX '-' HEX) | R_CHAR | HEX)+ '-'? ']'
+[14] RANGE ::= '[' ((R_CHAR '-' R_CHAR) | (HEX '-' HEX) | R_CHAR | HEX)+ '-'? ']'
 
 [15] O_RANGE ::= '[^' ((R_CHAR '-' R_CHAR) | (HEX '-' HEX) | R_CHAR | HEX)+ '-'? ']'
 
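Rules [12] and [12a] make the angle brackets around a symbol optional. As a rough illustration only, the two productions can be hand-translated into Ruby regular expressions. The constant names below are hypothetical, and the gem does not necessarily build its terminals this way:

```ruby
# Hand translation of [12a] O_SYMBOL ::= ([a-z] | [A-Z] | [0-9] | '_' | '.')+
O_SYMBOL_RE = /[a-zA-Z0-9_.]+/

# Hand translation of [12] SYMBOL ::= '<' O_SYMBOL '>' | O_SYMBOL,
# anchored so the whole string must be a single symbol.
SYMBOL_RE = /\A(?:<#{O_SYMBOL_RE}>|#{O_SYMBOL_RE})\z/

%w[rule <rule> _a_1 <rule ru-le].each do |s|
  puts "#{s}: #{SYMBOL_RE.match?(s)}"
end
# rule: true, <rule>: true, _a_1: true, <rule: false, ru-le: false
```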
data/etc/ebnf.html
CHANGED
@@ -1,4 +1,4 @@
-<!-- Generated with ebnf version 2.
+<!-- Generated with ebnf version 2.5.0. See https://github.com/dryruby/ebnf. -->
 <table class="grammar">
 <tbody id="grammar-productions" class="ebnf">
 <tr id="grammar-production-ebnf">
@@ -77,6 +77,12 @@
 <td>[12]</td>
 <td><code>SYMBOL</code></td>
 <td>::=</td>
+<td><code class="grammar-paren">(</code>'<code class="grammar-literal"><</code>' <a href="#grammar-production-O_SYMBOL">O_SYMBOL</a> '<code class="grammar-literal">></code>'<code class="grammar-paren">)</code> <code class="grammar-alt">|</code> <a href="#grammar-production-O_SYMBOL">O_SYMBOL</a></td>
+</tr>
+<tr id="grammar-production-O_SYMBOL">
+<td>[12a]</td>
+<td><code>O_SYMBOL</code></td>
+<td>::=</td>
 <td><code class="grammar-paren">(</code><code class="grammar-brac">[</code><code class="grammar-literal">a-z</code><code class="grammar-brac">]</code> <code class="grammar-alt">|</code> <code class="grammar-brac">[</code><code class="grammar-literal">A-Z</code><code class="grammar-brac">]</code> <code class="grammar-alt">|</code> <code class="grammar-brac">[</code><code class="grammar-literal">0-9</code><code class="grammar-brac">]</code> <code class="grammar-alt">|</code> '<code class="grammar-literal">_</code>' <code class="grammar-alt">|</code> '<code class="grammar-literal">.</code>'<code class="grammar-paren">)</code><code class="grammar-plus">+</code></td>
 </tr>
 <tr id="grammar-production-HEX">
@@ -89,7 +95,7 @@
 <td>[14]</td>
 <td><code>RANGE</code></td>
 <td>::=</td>
-<td>'<code class="grammar-literal">[</code>' <code class="grammar-paren">(</code><code class="grammar-paren">(</code><a href="#grammar-production-R_CHAR">R_CHAR</a> '<code class="grammar-literal">-</code>' <a href="#grammar-production-R_CHAR">R_CHAR</a><code class="grammar-paren">)</code> <code class="grammar-alt">|</code> <code class="grammar-paren">(</code><a href="#grammar-production-HEX">HEX</a> '<code class="grammar-literal">-</code>' <a href="#grammar-production-HEX">HEX</a><code class="grammar-paren">)</code> <code class="grammar-alt">|</code> <a href="#grammar-production-R_CHAR">R_CHAR</a> <code class="grammar-alt">|</code> <a href="#grammar-production-HEX">HEX</a><code class="grammar-paren">)</code><code class="grammar-plus">+</code> '<code class="grammar-literal">-</code>'<code class="grammar-opt">?</code>
+<td>'<code class="grammar-literal">[</code>' <code class="grammar-paren">(</code><code class="grammar-paren">(</code><a href="#grammar-production-R_CHAR">R_CHAR</a> '<code class="grammar-literal">-</code>' <a href="#grammar-production-R_CHAR">R_CHAR</a><code class="grammar-paren">)</code> <code class="grammar-alt">|</code> <code class="grammar-paren">(</code><a href="#grammar-production-HEX">HEX</a> '<code class="grammar-literal">-</code>' <a href="#grammar-production-HEX">HEX</a><code class="grammar-paren">)</code> <code class="grammar-alt">|</code> <a href="#grammar-production-R_CHAR">R_CHAR</a> <code class="grammar-alt">|</code> <a href="#grammar-production-HEX">HEX</a><code class="grammar-paren">)</code><code class="grammar-plus">+</code> '<code class="grammar-literal">-</code>'<code class="grammar-opt">?</code> '<code class="grammar-literal">]</code>'</td>
 </tr>
 <tr id="grammar-production-O_RANGE">
 <td>[15]</td>
data/etc/ebnf.ll1.rb
CHANGED
data/etc/ebnf.ll1.sxp
CHANGED
@@ -100,13 +100,11 @@
 (seq '@pass' expression))
 (terminals _terminals (seq))
 (terminal LHS "11" (seq (opt (seq '[' SYMBOL ']' (plus ' '))) SYMBOL (star ' ') '::='))
-(terminal SYMBOL "12" (
+(terminal SYMBOL "12" (alt (seq '<' O_SYMBOL '>') O_SYMBOL))
+(terminal O_SYMBOL "12a" (plus (alt (range "a-z") (range "A-Z") (range "0-9") '_' '.')))
 (terminal HEX "13" (seq '#x' (plus (alt (range "a-f") (range "A-F") (range "0-9")))))
 (terminal RANGE "14"
-(seq '['
-(plus (alt (seq R_CHAR '-' R_CHAR) (seq HEX '-' HEX) R_CHAR HEX))
-(opt '-')
-(diff ']' LHS)) )
+(seq '[' (plus (alt (seq R_CHAR '-' R_CHAR) (seq HEX '-' HEX) R_CHAR HEX)) (opt '-') ']'))
 (terminal O_RANGE "15"
 (seq '[^' (plus (alt (seq R_CHAR '-' R_CHAR) (seq HEX '-' HEX) R_CHAR HEX)) (opt '-') ']'))
 (terminal STRING1 "16" (seq '"' (star (diff CHAR '"')) '"'))
data/etc/ebnf.peg.rb
CHANGED
@@ -1,4 +1,4 @@
-# This file is automatically generated by ebnf version 2.
+# This file is automatically generated by ebnf version 2.5.0
 # Derived from etc/ebnf.ebnf
 module EBNFMeta
 RULES = [
@@ -25,24 +25,25 @@ module EBNFMeta
 EBNF::Rule.new(:_LHS_3, "11.3", [:seq, "[", :SYMBOL, "]", :_LHS_4], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_LHS_4, "11.4", [:plus, " "], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_LHS_2, "11.2", [:star, " "], kind: :terminal).extend(EBNF::PEG::Rule),
-EBNF::Rule.new(:SYMBOL, "12", [:
-EBNF::Rule.new(:_SYMBOL_1, "12.1", [:
-EBNF::Rule.new(:
-EBNF::Rule.new(:
-EBNF::Rule.new(:
+EBNF::Rule.new(:SYMBOL, "12", [:alt, :_SYMBOL_1, :O_SYMBOL], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_SYMBOL_1, "12.1", [:seq, "<", :O_SYMBOL, ">"], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:O_SYMBOL, "12a", [:plus, :_O_SYMBOL_1], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_O_SYMBOL_1, "12a.1", [:alt, :_O_SYMBOL_2, :_O_SYMBOL_3, :_O_SYMBOL_4, "_", "."], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_O_SYMBOL_2, "12a.2", [:range, "a-z"], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_O_SYMBOL_3, "12a.3", [:range, "A-Z"], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_O_SYMBOL_4, "12a.4", [:range, "0-9"], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:HEX, "13", [:seq, "#x", :_HEX_1], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_HEX_1, "13.1", [:plus, :_HEX_2], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_HEX_2, "13.2", [:alt, :_HEX_3, :_HEX_4, :_HEX_5], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_HEX_3, "13.3", [:range, "a-f"], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_HEX_4, "13.4", [:range, "A-F"], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_HEX_5, "13.5", [:range, "0-9"], kind: :terminal).extend(EBNF::PEG::Rule),
-EBNF::Rule.new(:RANGE, "14", [:seq, "[", :_RANGE_1, :_RANGE_2,
-EBNF::Rule.new(:_RANGE_1, "14.1", [:plus, :
-EBNF::Rule.new(:
-EBNF::Rule.new(:
-EBNF::Rule.new(:
+EBNF::Rule.new(:RANGE, "14", [:seq, "[", :_RANGE_1, :_RANGE_2, "]"], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_RANGE_1, "14.1", [:plus, :_RANGE_3], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_RANGE_3, "14.3", [:alt, :_RANGE_4, :_RANGE_5, :R_CHAR, :HEX], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_RANGE_4, "14.4", [:seq, :R_CHAR, "-", :R_CHAR], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_RANGE_5, "14.5", [:seq, :HEX, "-", :HEX], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_RANGE_2, "14.2", [:opt, "-"], kind: :terminal).extend(EBNF::PEG::Rule),
-EBNF::Rule.new(:_RANGE_3, "14.3", [:diff, "]", :LHS], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:O_RANGE, "15", [:seq, "[^", :_O_RANGE_1, :_O_RANGE_2, "]"], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_O_RANGE_1, "15.1", [:plus, :_O_RANGE_3], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_O_RANGE_3, "15.3", [:alt, :_O_RANGE_4, :_O_RANGE_5, :R_CHAR, :HEX], kind: :terminal).extend(EBNF::PEG::Rule),
data/etc/ebnf.peg.sxp
CHANGED
@@ -22,24 +22,25 @@
 (terminal _LHS_3 "11.3" (seq '[' SYMBOL ']' _LHS_4))
 (terminal _LHS_4 "11.4" (plus ' '))
 (terminal _LHS_2 "11.2" (star ' '))
-(terminal SYMBOL "12" (
-(terminal _SYMBOL_1 "12.1" (
-(terminal
-(terminal
-(terminal
+(terminal SYMBOL "12" (alt _SYMBOL_1 O_SYMBOL))
+(terminal _SYMBOL_1 "12.1" (seq '<' O_SYMBOL '>'))
+(terminal O_SYMBOL "12a" (plus _O_SYMBOL_1))
+(terminal _O_SYMBOL_1 "12a.1" (alt _O_SYMBOL_2 _O_SYMBOL_3 _O_SYMBOL_4 '_' '.'))
+(terminal _O_SYMBOL_2 "12a.2" (range "a-z"))
+(terminal _O_SYMBOL_3 "12a.3" (range "A-Z"))
+(terminal _O_SYMBOL_4 "12a.4" (range "0-9"))
 (terminal HEX "13" (seq '#x' _HEX_1))
 (terminal _HEX_1 "13.1" (plus _HEX_2))
 (terminal _HEX_2 "13.2" (alt _HEX_3 _HEX_4 _HEX_5))
 (terminal _HEX_3 "13.3" (range "a-f"))
 (terminal _HEX_4 "13.4" (range "A-F"))
 (terminal _HEX_5 "13.5" (range "0-9"))
-(terminal RANGE "14" (seq '[' _RANGE_1 _RANGE_2
-(terminal _RANGE_1 "14.1" (plus
-(terminal
-(terminal
-(terminal
+(terminal RANGE "14" (seq '[' _RANGE_1 _RANGE_2 ']'))
+(terminal _RANGE_1 "14.1" (plus _RANGE_3))
+(terminal _RANGE_3 "14.3" (alt _RANGE_4 _RANGE_5 R_CHAR HEX))
+(terminal _RANGE_4 "14.4" (seq R_CHAR '-' R_CHAR))
+(terminal _RANGE_5 "14.5" (seq HEX '-' HEX))
 (terminal _RANGE_2 "14.2" (opt '-'))
-(terminal _RANGE_3 "14.3" (diff ']' LHS))
 (terminal O_RANGE "15" (seq '[^' _O_RANGE_1 _O_RANGE_2 ']'))
 (terminal _O_RANGE_1 "15.1" (plus _O_RANGE_3))
 (terminal _O_RANGE_3 "15.3" (alt _O_RANGE_4 _O_RANGE_5 R_CHAR HEX))
data/etc/ebnf.sxp
CHANGED
@@ -12,13 +12,11 @@
 (rule pass "10" (seq '@pass' expression))
 (terminals _terminals (seq))
 (terminal LHS "11" (seq (opt (seq '[' SYMBOL ']' (plus ' '))) SYMBOL (star ' ') '::='))
-(terminal SYMBOL "12" (
+(terminal SYMBOL "12" (alt (seq '<' O_SYMBOL '>') O_SYMBOL))
+(terminal O_SYMBOL "12a" (plus (alt (range "a-z") (range "A-Z") (range "0-9") '_' '.')))
 (terminal HEX "13" (seq '#x' (plus (alt (range "a-f") (range "A-F") (range "0-9")))))
 (terminal RANGE "14"
-(seq '['
-(plus (alt (seq R_CHAR '-' R_CHAR) (seq HEX '-' HEX) R_CHAR HEX))
-(opt '-')
-(diff ']' LHS)) )
+(seq '[' (plus (alt (seq R_CHAR '-' R_CHAR) (seq HEX '-' HEX) R_CHAR HEX)) (opt '-') ']'))
 (terminal O_RANGE "15"
 (seq '[^' (plus (alt (seq R_CHAR '-' R_CHAR) (seq HEX '-' HEX) R_CHAR HEX)) (opt '-') ']'))
 (terminal STRING1 "16" (seq '"' (star (diff CHAR '"')) '"'))
data/etc/iso-ebnf.isoebnf
CHANGED
@@ -1,4 +1,3 @@
-(* W3C EBNF for ISO/IEC 14977 : 1996 EBNF *)
 (* Scoured from https://www.cl.cam.ac.uk/~mgk25/iso-14977.pdf *)
 
 syntax = syntax_rule, {syntax_rule} ;
@@ -44,10 +43,10 @@ repeated_sequence = start_repeat_symbol, definitions_list, end_repeat_symbol
 grouped_sequence = '(', definitions_list, ')'
 (* The brackets ( and ) allow any <definitions list> to be a <primary> *);
 
-terminal_string
-
-
-
+terminal_string = ("'", first_terminal_character, {first_terminal_character}, "'")
+| ('"', second_terminal_character, {second_terminal_character}, '"')
+(* A <terminal string> represents the
+<characters> between the quote symbols '_' or "_" *);
 
 meta_identifier = letter, {meta_identifier_character}
 (* A <meta identifier> is the name of a syntactic element of the language being defined *);
data/lib/ebnf/abnf.rb
CHANGED
@@ -234,10 +234,10 @@ module EBNF
 # @return [EBNFParser]
 def initialize(input, **options)
 # If the `level` option is set, instantiate a logger for collecting trace information.
-if options.
-options[:logger]
-
-
+if options.key?(:level)
+options[:logger] ||= Logger.new(STDERR).
+tap {|x| x.level = options[:level]}.
+tap {|x| x.formatter = lambda {|severity, datetime, progname, msg| "#{severity} #{msg}\n"}}
 end
 
 # Read input, if necessary, which will be used in a Scanner.
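The ABNF, ISO EBNF, and EBNF parsers now create a logger only when a `:level` option is present. Because the assignment uses `||=`, any logger the caller already supplied is kept. The sketch below pulls that option handling into a hypothetical `with_default_logger` method so it can run on its own:

```ruby
require 'logger'

# Hypothetical extraction of the option handling added to the parsers'
# initializers: build a STDERR logger only if :level is given and no
# :logger was passed in.
def with_default_logger(**options)
  if options.key?(:level)
    options[:logger] ||= Logger.new(STDERR).
      tap {|x| x.level = options[:level]}.
      tap {|x| x.formatter = lambda {|severity, datetime, progname, msg| "#{severity} #{msg}\n"}}
  end
  options
end

p with_default_logger.key?(:logger)                        # false
p with_default_logger(level: Logger::WARN)[:logger].level  # 2 (Logger::WARN)

mine = Logger.new(STDOUT, level: Logger::ERROR)
p with_default_logger(level: Logger::DEBUG, logger: mine)[:logger].equal?(mine)  # true
p mine.level                                               # 3 (still Logger::ERROR)
```

One consequence of `||=` is visible in the last line. When a caller passes its own `:logger`, the `tap` chain never runs, so `:level` is not applied to that logger.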
data/lib/ebnf/base.rb
CHANGED
@@ -106,8 +106,8 @@ module EBNF
 # Format of input, one of `:abnf`, `:ebnf`, `:isoebnf`, `:isoebnf`, `:native`, or `:sxp`.
 # Use `:native` for the native EBNF parser, rather than the PEG parser.
 # @param [Hash{Symbol => Object}] options
-# @option options [Boolean
-#
+# @option options [Boolean] :level
+# Trace level. 0(debug), 1(info), 2(warn), 3(error).
 # @option options [Boolean, Array] :validate
 # Validate resulting grammar.
 def initialize(input, format: :ebnf, **options)
@@ -311,13 +311,7 @@ module EBNF
 
 # Progress output, less than debugging
 def progress(*args, **options)
-
-depth = options[:depth] || @depth
-args << yield if block_given?
-message = "#{args.join(': ')}"
-str = "[#{@lineno}]#{' ' * depth}#{message}"
-@options[:debug] << str if @options[:debug].is_a?(Array)
-$stderr.puts(str) if @options[:progress] || @options[:debug] == true
+debug(*args, level: Logger::INFO, **options)
 end
 
 # Error output
@@ -325,10 +319,9 @@ module EBNF
 depth = options[:depth] || @depth
 args << yield if block_given?
 message = "#{args.join(': ')}"
+debug(message, level: Logger::ERROR, **options)
 @errors << message
-
-@options[:debug] << str if @options[:debug].is_a?(Array)
-$stderr.puts(str)
+$stderr.puts(message)
 end
 
 ##
@@ -342,14 +335,17 @@ module EBNF
 # @param [String] message ("")
 #
 # @yieldreturn [String] added to message
-def debug(*args, **options)
-return unless @options
+def debug(*args, level: Logger::DEBUG, **options)
+return unless @options.key?(:logger)
 depth = options[:depth] || @depth
 args << yield if block_given?
 message = "#{args.join(': ')}"
 str = "[#{@lineno}]#{' ' * depth}#{message}"
-
-
+if @options[:logger].respond_to?(:add)
+@options[:logger].add(level, str)
+elsif @options[:logger].respond_to?(:<<)
+@options[:logger] << "[#{lineno}] " + str
+end
 end
 end
 end
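`EBNF::Base#debug` now routes messages by duck typing. Anything that responds to `add` is treated as a `Logger` and receives a severity level. Anything that only responds to `<<`, such as an `Array`, simply collects strings. The sketch below is a reduced version of that dispatch. The name `emit` is hypothetical, and the sketch omits the depth indentation and line-number prefix that the real method adds:

```ruby
require 'logger'
require 'stringio'

# Hypothetical reduction of the dispatch in the new EBNF::Base#debug.
def emit(sink, str, level: Logger::DEBUG)
  if sink.respond_to?(:add)
    sink.add(level, str)      # Logger: its level decides whether this is written
  elsif sink.respond_to?(:<<)
    sink << str               # Array (or similar): always appended
  end
end

collected = []
emit(collected, "rule ebnf")
emit(collected, "rule expression", level: Logger::INFO)

io = StringIO.new
logger = Logger.new(io, level: Logger::INFO)
emit(logger, "debug detail")                     # filtered: DEBUG < INFO
emit(logger, "progress note", level: Logger::INFO)

p collected                                      # ["rule ebnf", "rule expression"]
p io.string.include?("progress note")            # true
p io.string.include?("debug detail")             # false
```

This dispatch is also why `progress` and `error` can delegate to `debug` with `Logger::INFO` and `Logger::ERROR`: the logger's level alone then decides what is printed.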
data/lib/ebnf/ebnf/meta.rb
CHANGED
@@ -1,4 +1,4 @@
-# This file is automatically generated by ebnf version 2.
+# This file is automatically generated by ebnf version 2.5.0
 # Derived from etc/ebnf.ebnf
 module EBNFMeta
 RULES = [
@@ -25,11 +25,13 @@ module EBNFMeta
 EBNF::Rule.new(:_LHS_3, "11.3", [:seq, "[", :SYMBOL, "]", :_LHS_4], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_LHS_4, "11.4", [:plus, " "], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_LHS_2, "11.2", [:star, " "], kind: :terminal).extend(EBNF::PEG::Rule),
-EBNF::Rule.new(:SYMBOL, "12", [:
-EBNF::Rule.new(:_SYMBOL_1, "12.1", [:
-EBNF::Rule.new(:
-EBNF::Rule.new(:
-EBNF::Rule.new(:
+EBNF::Rule.new(:SYMBOL, "12", [:alt, :_SYMBOL_1, :O_SYMBOL], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_SYMBOL_1, "12.1", [:seq, "<", :O_SYMBOL, ">"], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:O_SYMBOL, "12a", [:plus, :_O_SYMBOL_1], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_O_SYMBOL_1, "12a.1", [:alt, :_O_SYMBOL_2, :_O_SYMBOL_3, :_O_SYMBOL_4, "_", "."], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_O_SYMBOL_2, "12a.2", [:range, "a-z"], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_O_SYMBOL_3, "12a.3", [:range, "A-Z"], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_O_SYMBOL_4, "12a.4", [:range, "0-9"], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:HEX, "13", [:seq, "#x", :_HEX_1], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_HEX_1, "13.1", [:plus, :_HEX_2], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_HEX_2, "13.2", [:alt, :_HEX_3, :_HEX_4, :_HEX_5], kind: :terminal).extend(EBNF::PEG::Rule),
data/lib/ebnf/isoebnf.rb
CHANGED
@@ -196,10 +196,10 @@ module EBNF
 # @return [EBNFParser]
 def initialize(input, **options, &block)
 # If the `level` option is set, instantiate a logger for collecting trace information.
-if options.
-options[:logger]
-
-
+if options.key?(:level)
+options[:logger] ||= Logger.new(STDERR).
+tap {|x| x.level = options[:level]}.
+tap {|x| x.formatter = lambda {|severity, datetime, progname, msg| "#{severity} #{msg}\n"}}
 end
 
 # Read input, if necessary, which will be used in a Scanner.
data/lib/ebnf/ll1/parser.rb
CHANGED
@@ -603,7 +603,7 @@ module EBNF::LL1
 if handler
 # Create a new production data element, potentially allowing handler
 # to customize before pushing on the @prod_data stack
-
+progress("#{prod}(:start):#{@prod_data.length}") {@prod_data.last}
 data = {}
 begin
 self.class.eval_with_binding(self) {
@@ -617,12 +617,12 @@ module EBNF::LL1
 elsif [:merge, :star].include?(@cleanup[prod])
 # Save current data to merge later
 @prod_data << {}
-
+progress("#{prod}(:start}:#{@prod_data.length}:cleanup:#{@cleanup[prod]}") { get_token.inspect + (@recovering ? ' recovering' : '')}
 else
 # Make sure we push as many was we pop, even if there is no
 # explicit start handler
 @prod_data << {} if self.class.production_handlers[prod]
-
+progress("#{prod}(:start:#{@prod_data.length})") { get_token.inspect + (@recovering ? ' recovering' : '')}
 end
 #puts "prod_data(s): " + @prod_data.inspect
 end
data/lib/ebnf/native.rb
CHANGED
@@ -52,7 +52,7 @@ module EBNF
 yield r unless r.empty?
 #debug("eachRule(rule)") { "[#{cur_lineno}] #{s.inspect}" }
 @lineno = cur_lineno
-r = s
+r = s.gsub(/[<>]/, '') # Remove angle brackets
 else
 # Collect until end of line, or start of comment or quote
 s = scanner.scan_until(%r{(?:[/\(]\*)|#(?!x)|//|["']|$})
@@ -81,6 +81,7 @@ module EBNF
 num, sym = num_sym.split(']', 2).map(&:strip)
 num, sym = "", num if sym.nil?
 num = num[1..-1]
+sym = sym[1..-2] if sym.start_with?('<') && sym.end_with?('>')
 r = Rule.new(sym && sym.to_sym, num, expression(expr).first, ebnf: self)
 debug("ruleParts") { r.inspect }
 r
@@ -226,7 +227,7 @@ module EBNF
 # (a ' b c')
 #
 # >>> postfix("a? b c")
-# ((opt
+# ((opt a) ' b c')
 def postfix(s)
 debug("postfix") {"(#{s.inspect})"}
 e, s = depth {primary(s)}
@@ -297,8 +298,8 @@ module EBNF
 s.match(/(#x\h+)(.*)$/)
 l, s = $1, $2
 [[:hex, l], s]
-when /[\w\.]/ # SYMBOL
-s.match(
+when '<', /[\w\.]/ # SYMBOL
+s.match(/<?([\w\.]+)>?(.*)$/)
 l, s = $1, $2
 [l.to_sym, s]
 when '-'
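The native parser handles the optional angle brackets in two places: when splitting a rule into its number and symbol, and when scanning a symbol inside an expression. Both fragments appear below as standalone methods. The method names are hypothetical, but the regular expression and the slicing are copied from the diff. In the diff, the scanning branch is selected by `when '<', /[\w\.]/`, so the input is assumed to start with `<` or a symbol character:

```ruby
# Symbol branch of the new primary parsing: optional '<', the symbol
# characters, optional '>', and the unparsed remainder of the line.
def scan_symbol(s)
  s.match(/<?([\w\.]+)>?(.*)$/)
  l, s = $1, $2
  [l.to_sym, s]
end

# Rule-name handling: drop one enclosing pair of angle brackets.
def strip_angle_brackets(sym)
  sym = sym[1..-2] if sym.start_with?('<') && sym.end_with?('>')
  sym
end

p scan_symbol("<expression> ')'")   # [:expression, " ')'"]
p scan_symbol("alt | seq")          # [:alt, " | seq"]
p strip_angle_brackets("<rule>")    # "rule"
p strip_angle_brackets("rule")      # "rule"
```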
data/lib/ebnf/parser.rb
CHANGED
@@ -11,6 +11,12 @@ module EBNF
 # @return [Array<EBNF::Rule>]
 attr_reader :ast
 
+# Set on first rule
+attr_reader :lhs_includes_identifier
+
+# Regular expression to match a [...] range, which may be distinguisehd from an LHS
+attr_reader :range
+
 # ## Terminals
 # Define rules for Terminals, placing results on the input stack, making them available to upstream non-Terminal rules.
 #
@@ -26,15 +32,32 @@ module EBNF
 
 # Match the Left hand side of a rule or terminal
 #
-# [11] LHS ::= ('[' SYMBOL+ ']' ' '+)? SYMBOL ' '* '::='
+# [11] LHS ::= ('[' SYMBOL+ ']' ' '+)? <? SYMBOL >? ' '* '::='
 terminal(:LHS, LHS) do |value, prod|
-value.to_s.scan(/(?:\[([^\]]+)\])?\s
+md = value.to_s.scan(/(?:\[([^\]]+)\])?\s*<?(\w+)>?\s*::=/).first
+if @lhs_includes_identifier.nil?
+@lhs_includes_identifier = !md[0].nil?
+@range = md[0] ? RANGE_NOT_LHS : RANGE
+elsif @lhs_includes_identifier && !md[0]
+error("LHS",
+"Rule does not begin with a [xxx] identifier, which was established on the first rule",
+production: :LHS,
+rest: value)
+elsif !@lhs_includes_identifier && md[0]
+error("LHS",
+"Rule begins with a [xxx] identifier, which was not established on the first rule",
+production: :LHS,
+rest: value)
+end
+md
 end
 
 # Match `SYMBOL` terminal
 #
-# [12] SYMBOL ::=
+# [12] SYMBOL ::= '<' O_SYMBOL '>' | O_SYMBOL
+# [12a] O_SYMBOL ::= ([a-z] | [A-Z] | [0-9] | '_' | '.')+
 terminal(:SYMBOL, SYMBOL) do |value|
+value = value[1..-2] if value.start_with?('<') && value.end_with?('>')
 value.to_sym
 end
 
@@ -46,9 +69,10 @@ module EBNF
 end
 
 # Terminal for `RANGE` is matched as part of a `primary` rule.
+# Note that this won't match if rules include identifiers.
 #
-# [14] RANGE ::= '[' ((R_CHAR '-' R_CHAR) | (HEX '-' HEX) | R_CHAR | HEX)+ '-'? ']'
-terminal(:RANGE,
+# [14] RANGE ::= '[' ((R_CHAR '-' R_CHAR) | (HEX '-' HEX) | R_CHAR | HEX)+ '-'? ']'
+terminal(:RANGE, proc {@range}) do |value|
 [:range, value[1..-2]]
 end
 
@@ -128,7 +152,9 @@ module EBNF
 # Invoke callback
 id, sym = value[:LHS]
 expression = value[:expression]
-
+rule = EBNF::Rule.new(sym.to_sym, id, expression)
+progress(:rule, rule.to_sxp)
+callback.call(:rule, rule)
 nil
 end
 
@@ -266,12 +292,15 @@ module EBNF
  # @return [EBNFParser]
  def initialize(input, **options, &block)
  # If the `level` option is set, instantiate a logger for collecting trace information.
- if options.
- options[:logger]
-
-
+ if options.key?(:level)
+ options[:logger] ||= Logger.new(STDERR).
+ tap {|x| x.level = options[:level]}.
+ tap {|x| x.formatter = lambda {|severity, datetime, progname, msg| "#{severity} #{msg}\n"}}
  end

+ # This is established on the first rule.
+ self.class.instance_variable_set(:@lhs_includes_identifier, nil)
+
  # Read input, if necessary, which will be used in a Scanner.
  @input = input.respond_to?(:read) ? input.read : input.to_s

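The `initialize` hunk above creates a logger only when a `:level` option is present. Its formatter prints just the severity and the message. The sketch below reproduces that setup with Ruby's standard `Logger`. `build_logger` is a hypothetical wrapper. It writes to an injectable IO (here a `StringIO`) instead of `STDERR` so that the output can be inspected.

```ruby
require 'logger'
require 'stringio'

# Mirrors the hunk above: only when options has :level is a Logger created,
# with a formatter that emits "SEVERITY message".
# `build_logger` and its `io` parameter are hypothetical; the gem uses STDERR.
def build_logger(options, io = $stderr)
  if options.key?(:level)
    options[:logger] ||= Logger.new(io).
      tap {|x| x.level = options[:level]}.
      tap {|x| x.formatter = lambda {|severity, datetime, progname, msg| "#{severity} #{msg}\n"}}
  end
  options[:logger]
end

out = StringIO.new
logger = build_logger({level: Logger::INFO}, out)
logger.debug("hidden") # below INFO, so nothing is written
logger.info("rule a")
out.string # => "INFO rule a\n"
```

Because of `||=`, a `:logger` supplied by the caller is kept. A new logger is built only when none was passed in.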
data/lib/ebnf/peg/parser.rb
CHANGED
@@ -68,10 +68,9 @@ module EBNF::PEG
  #
  # @param [Symbol] term
  # The terminal name.
- # @param [Regexp] regexp
- # Pattern used to scan for this terminal
- #
- # If unset, the terminal rule is used for matching.
+ # @param [Regexp, Proc] regexp
+ # Pattern used to scan for this terminal.
+ # Passing a Proc will evaluate that proc to retrieve a regular expression.
  # @param [Hash] options
  # @option options [Boolean] :unescape
  # Cause strings and codepoints to be unescaped.
@@ -83,8 +82,8 @@ module EBNF::PEG
  # @yieldparam [Proc] block
  # Block passed to initialization for yielding to calling parser.
  # Should conform to the yield specs for #initialize
- def terminal(term, regexp
- terminal_regexps[term] = regexp
+ def terminal(term, regexp, **options, &block)
+ terminal_regexps[term] = regexp
  terminal_handlers[term] = block if block_given?
  terminal_options[term] = options.freeze
  end
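After this change `terminal` accepts either a `Regexp` or a `Proc`. rule.rb calls the proc at match time (`regexp = regexp.call() if regexp.is_a?(Proc)`). This presumably lets a pattern such as `proc {@range}` in parser.rb be chosen after the terminal is declared. The sketch below shows only the lazy-evaluation idea. `TerminalTable` is a hypothetical class, not part of `EBNF::PEG`.

```ruby
require 'strscan'

# Hypothetical terminal table whose patterns may be a Regexp or a Proc that
# returns a Regexp. Procs are evaluated each time the terminal is matched,
# so the effective pattern can change after declaration.
class TerminalTable
  def initialize
    @regexps = {}
  end

  def terminal(term, regexp)
    @regexps[term] = regexp
  end

  # Try to scan `term` at the scanner's current position; nil if no match.
  def scan(term, scanner)
    regexp = @regexps.fetch(term)
    regexp = regexp.call if regexp.is_a?(Proc)
    scanner.scan(regexp)
  end
end

strict = false
table = TerminalTable.new
table.terminal(:WORD, proc { strict ? /[a-z]+/ : /\w+/ })

table.scan(:WORD, StringScanner.new("abc123")) # => "abc123"
strict = true
table.scan(:WORD, StringScanner.new("abc123")) # => "abc"
```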
@@ -138,6 +137,8 @@ module EBNF::PEG
  # @yieldparam [Proc] block
  # Block passed to initialization for yielding to calling parser.
  # Should conform to the yield specs for #initialize
+ # @yieldparam [Hash] **options
+ # Other data that may be passed to the production
  # @yieldreturn [Object] the result of this production.
  # Yield to generate a triple
  def production(term, clear_packrat: false, &block)
@@ -183,6 +184,8 @@ module EBNF::PEG
  # Identify the symbol of the starting rule with `start`.
  # @param [Hash{Symbol => Object}] options
  # @option options[Integer] :high_water passed to lexer
+ # @option options[:upper, :lower] :insensitive_strings
+ # Perform case-insensitive match of strings not defined as terminals, and map to either upper or lower case.
  # @option options [Logger] :logger for errors/progress/debug.
  # @option options[Integer] :low_water passed to lexer
  # @option options[Boolean] :seq_hash (false)
@@ -201,7 +204,7 @@ module EBNF::PEG
  # or errors raised during processing callbacks. Internal
  # errors are raised using {Error}.
  # @todo FIXME implement seq_hash
- def parse(input = nil, start = nil, rules = nil, **options, &block)
+ def parse(input = nil, start = nil, rules = nil, insensitive_strings: nil, **options, &block)
  start ||= options[:start]
  rules ||= options[:rules] || []
  @rules = rules.inject({}) {|memo, rule| memo.merge(rule.sym => rule)}
@@ -230,7 +233,7 @@ module EBNF::PEG
  start_rule = @rules[start]
  raise Error, "Starting production #{start.inspect} not defined" unless start_rule

- result = start_rule.parse(scanner)
+ result = start_rule.parse(scanner, insensitive_strings: insensitive_strings)
  if result == :unmatched
  # Start rule wasn't matched, which is about the only error condition
  error("--top--", @furthest_failure.to_s,
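`parse` now takes an `insensitive_strings:` keyword and passes it to the start rule. In rule.rb, string literals are scanned with `Regexp::IGNORECASE` whenever the option is truthy. The matched text is upcased for `:upper`, downcased for `:lower`, and otherwise returned as written. Below is a standalone sketch of that matching step using only `StringScanner`. `match_string` is a hypothetical helper, not part of the gem's API.

```ruby
require 'strscan'

# Match a literal at the scanner position, optionally ignoring case, and
# normalize the matched text as rule.rb does for :upper / :lower.
# Returns :unmatched (and leaves the scanner unmoved) when there is no match.
def match_string(scanner, literal, insensitive_strings = nil)
  opts = insensitive_strings ? Regexp::IGNORECASE : 0
  s = scanner.scan(Regexp.new(Regexp.quote(literal), opts))
  case insensitive_strings
  when :lower then s && s.downcase
  when :upper then s && s.upcase
  else s
  end || :unmatched
end

match_string(StringScanner.new("Select *"), "SELECT")         # => :unmatched
match_string(StringScanner.new("Select *"), "SELECT", :upper) # => "SELECT"
match_string(StringScanner.new("Select *"), "select", :lower) # => "select"
```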
@@ -367,21 +370,17 @@ module EBNF::PEG
  # Start for production
  # Adds data avoiable during the processing of the production
  #
+ # @param [Symbol] prod
+ # @param [Hash] **options other options available for handlers
  # @return [Hash] composed of production options. Currently only `as_hash` is supported.
  # @see ClassMethods#start_production
- def onStart(prod)
+ def onStart(prod, **options)
  handler = self.class.start_handlers[prod]
  @productions << prod
- debug("#{prod}(:start)", "",
- lineno: (scanner.lineno if scanner),
- pos: (scanner.pos if scanner)
- ) do
- "#{prod}, pos: #{scanner ? scanner.pos : '?'}, rest: #{scanner ? scanner.rest[0..20].inspect : '?'}"
- end
  if handler
  # Create a new production data element, potentially allowing handler
  # to customize before pushing on the @prod_data stack
- data = {_production: prod}
+ data = {_production: prod}.merge(options)
  begin
  self.class.eval_with_binding(self) {
  handler.call(data, @parse_callback)
@@ -396,14 +395,21 @@ module EBNF::PEG
  # explicit start handler
  @prod_data << {_production: prod}
  end
+ progress("#{prod}(:start)", "",
+ lineno: (scanner.lineno if scanner),
+ pos: (scanner.pos if scanner)
+ ) do
+ "#{data.inspect}@(#{scanner ? scanner.pos : '?'}), rest: #{scanner ? scanner.rest[0..20].inspect : '?'}"
+ end
  return self.class.start_options.fetch(prod, {}) # any options on this production
  end

  # Finish of production
  #
  # @param [Object] result parse result
+ # @param [Hash] **options other options available for handlers
  # @return [Object] parse result, or the value returned from the handler
- def onFinish(result)
+ def onFinish(result, **options)
  #puts "prod_data(f): " + @prod_data.inspect
  prod = @productions.last
  handler, clear_packrat = self.class.production_handlers[prod]
@@ -415,14 +421,14 @@ module EBNF::PEG
  # Pop production data element from stack, potentially allowing handler to use it
  result = begin
  self.class.eval_with_binding(self) {
- handler.call(result, data, @parse_callback)
+ handler.call(result, data, @parse_callback, **options)
  }
  rescue ArgumentError, Error => e
  error("finish", "#{e.class}: #{e.message}", production: prod, backtrace: e.backtrace)
  @recovering = false
  end
  end
-
+ progress("#{prod}(:finish)", "",
  lineno: (scanner.lineno if scanner),
  level: result == :unmatched ? 0 : 1) do
  "#{result.inspect}@(#{scanner ? scanner.pos : '?'}), rest: #{scanner ? scanner.rest[0..20].inspect : '?'}"
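`onStart` and `onFinish` now take `**options`. The start side merges those options into the production data (`{_production: prod}.merge(options)`). The finish side forwards them to the production handler, so values such as `_rept_data` can reach user handlers. The class below is a hypothetical, stripped-down illustration of that plumbing. It is not the `EBNF::PEG::Parser` API, and handler bindings, packrat caching and error handling are omitted.

```ruby
# Hypothetical sketch: keyword options flow into the production data when a
# production starts, and into the handler when it finishes.
class MiniProductions
  def initialize
    @start_handlers = {}
    @finish_handlers = {}
    @prod_data = []
  end

  # Register a handler called with the production data when `prod` starts.
  def start_production(prod, &block)
    @start_handlers[prod] = block
  end

  # Register a handler called with (result, data, **options) when `prod` finishes.
  def production(prod, &block)
    @finish_handlers[prod] = block
  end

  def on_start(prod, **options)
    data = {_production: prod}.merge(options) # options become part of the data
    @start_handlers[prod]&.call(data)
    @prod_data << data
  end

  def on_finish(result, **options)
    data = @prod_data.pop
    handler = @finish_handlers[data[:_production]]
    handler ? handler.call(result, data, **options) : result
  end
end

mp = MiniProductions.new
mp.production(:item) do |result, data, **opts|
  {value: result, depth: data[:depth], seen: opts.fetch(:_rept_data, []).length}
end
mp.on_start(:item, depth: 2)
mp.on_finish("x", _rept_data: [1, 2]) # returns a Hash with value "x", depth 2, seen 2
```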
@@ -572,5 +578,5 @@ module EBNF::PEG
  super(message.to_s)
  end
  end # class Error
- end #
- end # module EBNF::
+ end # module Parser
+ end # module EBNF::PEG
data/lib/ebnf/peg/rule.rb
CHANGED
@@ -13,7 +13,7 @@ module EBNF::PEG
  ##
  # Parse a rule or terminal, invoking callbacks, as appropriate

- # If there
+ # If there are `start_production` and/or `production` handlers,
  # they are invoked with a `prod_data` stack, the input stream and offset.
  # Otherwise, the results are added as an array value
  # to a hash indexed by the rule name.
@@ -31,8 +31,9 @@ module EBNF::PEG
  # * `star`: returns an array of the values matched for the specified production. For Terminals, these are concatenated into a single string.
  #
  # @param [Scanner] input
+ # @param [Hash] **options Other data that may be passed to handlers.
  # @return [Hash{Symbol => Object}, :unmatched] A hash with keys for matched component of the expression. Returns :unmatched if the input does not match the production.
- def parse(input)
+ def parse(input, **options)
  # Save position and linenumber for backtracking
  pos, lineno = input.pos, input.lineno

@@ -48,6 +49,7 @@ module EBNF::PEG
  # use that to match the input,
  # otherwise,
  if regexp = parser.terminal_regexp(sym)
+ regexp = regexp.call() if regexp.is_a?(Proc)
  term_opts = parser.terminal_options(sym)
  if matched = input.scan(regexp)
  # Optionally map matched
@@ -71,12 +73,12 @@ module EBNF::PEG
  else
  eat_whitespace(input)
  end
- start_options = parser.onStart(sym)
+ start_options = options.merge(parser.onStart(sym, **options))
  string_regexp_opts = start_options[:insensitive_strings] ? Regexp::IGNORECASE : 0

  result = case expr.first
  when :alt
- # Return the first expression to match.
+ # Return the first expression to match. Look at strings before terminals before non-terminals, with strings ordered by longest first
  # Result is either :unmatched, or the value of the matching rule
  alt = :unmatched
  expr[1..-1].each do |prod|
@@ -84,14 +86,19 @@ module EBNF::PEG
  when Symbol
  rule = parser.find_rule(prod)
  raise "No rule found for #{prod}" unless rule
- rule.parse(input)
+ rule.parse(input, **options)
  when String
-
-
-
-
-
-
+ # If the input matches a terminal for which the string is a prefix, don't match the string
+ if terminal_also_matches(input, prod, string_regexp_opts)
+ :unmatched
+ else
+ s = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))
+ case start_options[:insensitive_strings]
+ when :lower then s && s.downcase
+ when :upper then s && s.upcase
+ else s
+ end || :unmatched
+ end
  end
  if alt == :unmatched
  # Update furthest failure for strings and terminals
@@ -127,9 +134,18 @@ module EBNF::PEG
  when Symbol
  rule = parser.find_rule(prod)
  raise "No rule found for #{prod}" unless rule
- rule.parse(input)
+ rule.parse(input, **options)
  when String
- input
+ if terminal_also_matches(input, prod, string_regexp_opts)
+ :unmatched
+ else
+ s = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))
+ case start_options[:insensitive_strings]
+ when :lower then s && s.downcase
+ when :upper then s && s.upcase
+ else s
+ end || :unmatched
+ end
  end
  if res != :unmatched
  # Update furthest failure for terminals
@@ -148,7 +164,7 @@ module EBNF::PEG
  when :plus
  # Result is an array of all expressions while they match,
  # at least one must match
- plus = rept(input, 1, '*', expr[1], string_regexp_opts)
+ plus = rept(input, 1, '*', expr[1], string_regexp_opts, **options)

  # Update furthest failure for strings and terminals
  parser.update_furthest_failure(input.pos, input.lineno, expr[1]) if terminal?
@@ -163,7 +179,7 @@ module EBNF::PEG
  when :rept
  # Result is an array of all expressions while they match,
  # an empty array of none match
- rept = rept(input, expr[1], expr[2], expr[3], string_regexp_opts)
+ rept = rept(input, expr[1], expr[2], expr[3], string_regexp_opts, **options)

  # # Update furthest failure for strings and terminals
  parser.update_furthest_failure(input.pos, input.lineno, expr[3]) if terminal?
@@ -176,14 +192,18 @@ module EBNF::PEG
  when Symbol
  rule = parser.find_rule(prod)
  raise "No rule found for #{prod}" unless rule
- rule.parse(input)
+ rule.parse(input, **options.merge(_rept_data: accumulator))
  when String
-
-
-
-
-
-
+ if terminal_also_matches(input, prod, string_regexp_opts)
+ :unmatched
+ else
+ s = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))
+ case start_options[:insensitive_strings]
+ when :lower then s && s.downcase
+ when :upper then s && s.upcase
+ else s
+ end || :unmatched
+ end
  end
  if res == :unmatched
  # Update furthest failure for strings and terminals
@@ -204,7 +224,7 @@ module EBNF::PEG
  when :star
  # Result is an array of all expressions while they match,
  # an empty array of none match
- star = rept(input, 0, '*', expr[1], string_regexp_opts)
+ star = rept(input, 0, '*', expr[1], string_regexp_opts, **options)

  # Update furthest failure for strings and terminals
  parser.update_furthest_failure(input.pos, input.lineno, expr[1]) if terminal?
@@ -214,10 +234,11 @@ module EBNF::PEG
  end

  if result == :unmatched
+ # Rewind input to entry point if unmatched.
  input.pos, input.lineno = pos, lineno
  end

- result = parser.onFinish(result)
+ result = parser.onFinish(result, **options)
  (parser.packrat[sym] ||= {})[pos] = {
  pos: input.pos,
  lineno: input.lineno,
@@ -229,7 +250,8 @@ module EBNF::PEG
  ##
  # Repitition, 0-1, 0-n, 1-n, ...
  #
- # Note, nil results are removed from the result, but count towards min/max calculations
+ # Note, nil results are removed from the result, but count towards min/max calculations.
+ # Saves temporary production data to prod_data stack.
  #
  # @param [Scanner] input
  # @param [Integer] min
@@ -245,11 +267,12 @@ module EBNF::PEG
  when Symbol
  rule = parser.find_rule(prod)
  raise "No rule found for #{prod}" unless rule
- while (max == '*' || result.length < max) && (res = rule.parse(input)) != :unmatched
+ while (max == '*' || result.length < max) && (res = rule.parse(input, **options.merge(_rept_data: result))) != :unmatched
  eat_whitespace(input) unless terminal?
  result << res
  end
  when String
+ # FIXME: don't match a string, if input matches a terminal
  while (res = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))) && (max == '*' || result.length < max)
  eat_whitespace(input) unless terminal?
  result << case options[:insensitive_strings]
@@ -263,6 +286,16 @@ module EBNF::PEG
  result.length < min ? :unmatched : result.compact
  end

+ ##
+ # See if a terminal could have a longer match than a string
+ def terminal_also_matches(input, prod, string_regexp_opts)
+ str_regex = Regexp.new(Regexp.quote(prod), string_regexp_opts)
+ input.match?(str_regex) && parser.class.terminal_regexps.any? do |sym, re|
+ re = re.call() if re.is_a?(Proc)
+ (match_len = input.match?(re)) && match_len > prod.length
+ end
+ end
+
  ##
  # Eat whitespace between non-terminal rules
  def eat_whitespace(input)
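`terminal_also_matches` stops a literal string from matching when some terminal would consume a longer prefix of the input. This is a longest-match rule: it keeps a keyword such as `if` from splitting an identifier such as `iffy`. Below is a standalone sketch of the same check. The `TERMINALS` table is hypothetical and stands in for `parser.class.terminal_regexps`.

```ruby
require 'strscan'

# Hypothetical terminal table standing in for parser.class.terminal_regexps.
# Entries may be a Regexp or a Proc returning one, as in the diff.
TERMINALS = {
  IDENT: /[a-z_][a-z0-9_]*/,
  NUMBER: proc { /\d+/ }
}.freeze

# True when `literal` matches at the scanner position but some terminal
# would match a longer prefix there (the longest match wins).
def terminal_also_matches?(scanner, literal, string_regexp_opts = 0)
  str_regex = Regexp.new(Regexp.quote(literal), string_regexp_opts)
  return false unless scanner.match?(str_regex)

  TERMINALS.any? do |_sym, re|
    re = re.call if re.is_a?(Proc)
    # StringScanner#match? returns the match length in bytes, or nil.
    (len = scanner.match?(re)) && len > literal.bytesize
  end
end

terminal_also_matches?(StringScanner.new("iffy = 1"), "if") # => true
terminal_also_matches?(StringScanner.new("if x"), "if")     # => false
```

The sketch compares against `literal.bytesize` because `StringScanner#match?` reports the match length in bytes. For ASCII literals, this is the same as `length`.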
data/lib/ebnf/terminals.rb
CHANGED
@@ -1,13 +1,14 @@
  # encoding: utf-8
  # Terminal definitions for the EBNF grammar
  module EBNF::Terminals
- SYMBOL_BASE = %r(\b[a-zA-Z0-9_\.]+\b)u.freeze
- SYMBOL = %r(
+ SYMBOL_BASE = %r(\b[a-zA-Z0-9_\.]+\b)u.freeze # Word boundaries
+ SYMBOL = %r((?:#{SYMBOL_BASE}|(?:<#{SYMBOL_BASE}>))(?!\s*::=))u.freeze
  HEX = %r(\#x\h+)u.freeze
  CHAR = %r([\u0009\u000A\u000D\u0020-\uD7FF\u{10000}-\u{10FFFF}])u.freeze
  R_CHAR = %r([\u0009\u000A\u000D\u0020-\u002C\u002E-\u005C\u005E-\uD7FF\u{10000}-\u{10FFFF}])u.freeze
-
-
+ LHS = %r((?:\[#{SYMBOL_BASE}\])?\s*<?#{SYMBOL_BASE}>?\s*::=)u.freeze
+ RANGE = %r(\[(?:(?:#{R_CHAR}\-#{R_CHAR})|(?:#{HEX}\-#{HEX})|#{R_CHAR}|#{HEX})+-?\])u.freeze
+ RANGE_NOT_LHS = %r(\[(?:(?:#{R_CHAR}\-#{R_CHAR})|(?:#{HEX}\-#{HEX})|#{R_CHAR}|#{HEX})+-?\](?!\s*<?#{SYMBOL_BASE}>?\s*::=))u.freeze
  O_RANGE = %r(\[\^(?:(?:#{R_CHAR}\-#{R_CHAR})|(?:#{HEX}\-#{HEX}|#{R_CHAR}|#{HEX}))+-?\])u.freeze
  STRING1 = %r("[\u0009\u000A\u000D\u0020\u0021\u0023-\uD7FF\u{10000}-\u{10FFFF}]*")u.freeze
  STRING2 = %r('[\u0009\u000A\u000D\u0020-\u0026\u0028-\uD7FF\u{10000}-\u{10FFFF}]*')u.freeze
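The terminals.rb hunk adds negative lookaheads for two cases. A `SYMBOL`, optionally written as `<name>`, is not taken when it actually begins the left-hand side of a rule. `RANGE_NOT_LHS` does not mistake a `[12]` rule identifier for a character range. The snippet below copies those patterns verbatim from the hunk and exercises them with `StringScanner`. The sample inputs and the `scan_at_start` helper are illustrative only.

```ruby
require 'strscan'

# Patterns copied from the terminals.rb hunk above.
SYMBOL_BASE = %r(\b[a-zA-Z0-9_\.]+\b)u.freeze
SYMBOL = %r((?:#{SYMBOL_BASE}|(?:<#{SYMBOL_BASE}>))(?!\s*::=))u.freeze
HEX = %r(\#x\h+)u.freeze
R_CHAR = %r([\u0009\u000A\u000D\u0020-\u002C\u002E-\u005C\u005E-\uD7FF\u{10000}-\u{10FFFF}])u.freeze
LHS = %r((?:\[#{SYMBOL_BASE}\])?\s*<?#{SYMBOL_BASE}>?\s*::=)u.freeze
RANGE = %r(\[(?:(?:#{R_CHAR}\-#{R_CHAR})|(?:#{HEX}\-#{HEX})|#{R_CHAR}|#{HEX})+-?\])u.freeze
RANGE_NOT_LHS = %r(\[(?:(?:#{R_CHAR}\-#{R_CHAR})|(?:#{HEX}\-#{HEX})|#{R_CHAR}|#{HEX})+-?\](?!\s*<?#{SYMBOL_BASE}>?\s*::=))u.freeze

# Hypothetical helper: match `re` anchored at the start of `text`, or nil.
def scan_at_start(re, text)
  StringScanner.new(text).scan(re)
end

scan_at_start(SYMBOL, "foo ::= bar")              # => nil (foo begins a rule)
scan_at_start(SYMBOL, "<foo> | x")                # => "<foo>"
scan_at_start(LHS, "[12a] <expr> ::= term")       # => "[12a] <expr> ::="
scan_at_start(RANGE, "[1] a ::= b")               # => "[1]"
scan_at_start(RANGE_NOT_LHS, "[1] a ::= b")       # => nil ([1] is a rule id)
scan_at_start(RANGE_NOT_LHS, "[a-z]+ | b")        # => "[a-z]"
```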
metadata
CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: ebnf
  version: !ruby/object:Gem::Version
- version: 2.
+ version: 2.6.0
  platform: ruby
  authors:
  - Gregg Kellogg
  autorequire:
  bindir: bin
  cert_chain: []
- date:
+ date: 2024-12-01 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: sxp
@@ -80,6 +80,20 @@ dependencies:
  - - "~>"
  - !ruby/object:Gem::Version
  version: '1.8'
+ - !ruby/object:Gem::Dependency
+ name: base64
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '0.2'
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '0.2'
  - !ruby/object:Gem::Dependency
  name: amazing_print
  requirement: !ruby/object:Gem::Requirement
@@ -293,7 +307,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.
+ rubygems_version: 3.5.22
  signing_key:
  specification_version: 4
  summary: EBNF parser and parser generator in Ruby.