ebnf 2.0.0 → 2.1.0
- checksums.yaml +4 -4
- data/README.md +81 -36
- data/VERSION +1 -1
- data/bin/ebnf +34 -18
- data/etc/abnf-core.ebnf +52 -0
- data/etc/abnf.abnf +121 -0
- data/etc/abnf.ebnf +124 -0
- data/etc/abnf.sxp +45 -0
- data/etc/ebnf.ebnf +19 -25
- data/etc/ebnf.html +251 -206
- data/etc/ebnf.ll1.rb +27 -103
- data/etc/ebnf.ll1.sxp +105 -102
- data/etc/ebnf.peg.rb +54 -62
- data/etc/ebnf.peg.sxp +53 -62
- data/etc/ebnf.sxp +22 -19
- data/etc/iso-ebnf.ebnf +140 -0
- data/etc/iso-ebnf.isoebnf +138 -0
- data/etc/iso-ebnf.sxp +65 -0
- data/etc/sparql.ebnf +4 -4
- data/etc/sparql.sxp +8 -7
- data/etc/turtle.ebnf +3 -3
- data/etc/turtle.sxp +22 -20
- data/lib/ebnf.rb +3 -0
- data/lib/ebnf/abnf.rb +301 -0
- data/lib/ebnf/abnf/core.rb +23 -0
- data/lib/ebnf/abnf/meta.rb +111 -0
- data/lib/ebnf/base.rb +87 -44
- data/lib/ebnf/ebnf/meta.rb +90 -0
- data/lib/ebnf/isoebnf.rb +229 -0
- data/lib/ebnf/isoebnf/meta.rb +75 -0
- data/lib/ebnf/ll1.rb +4 -7
- data/lib/ebnf/ll1/parser.rb +12 -4
- data/lib/ebnf/native.rb +320 -0
- data/lib/ebnf/parser.rb +285 -302
- data/lib/ebnf/peg.rb +1 -1
- data/lib/ebnf/peg/parser.rb +24 -5
- data/lib/ebnf/peg/rule.rb +77 -58
- data/lib/ebnf/rule.rb +352 -121
- data/lib/ebnf/terminals.rb +13 -10
- data/lib/ebnf/writer.rb +550 -78
- metadata +48 -6
checksums.yaml
CHANGED
```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: dc55292610eb978d5751361069f3b993d35db3a597442e2027cf0fd2ff886ba5
+data.tar.gz: cc74cd0257a36fa3591f54becfdb51dfffbf44662598f2d67c6a36bf4e969e61
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: bf7c7df32e027a0739b4830651dcff1f4b5186ff2177e1f57b4e952db33619660c4025a29ffc8dba5c7d0f5f5b95f2cbd432a379bc2ffb02d22f7ec6913a48e2
+data.tar.gz: 909a8ff438172431054a33fb067198e32d98d5b88fa630978831af4f1dd728f0197512ee1f341667eb3039662db618122d7da5d0cc4cc55838625f685f7d7a9b
```
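The checksums.yaml file records a SHA256 and a SHA512 digest for each of the two archives packed inside the gem, `metadata.gz` and `data.tar.gz`. A minimal sketch of recomputing those digests with Ruby's standard `Digest` library, assuming the archives have already been extracted from the `.gem` file; the helper names and the path argument are illustrative, not part of RubyGems.

```ruby
require "digest"

# Recompute the two digests that checksums.yaml records for one archive.
# `path` is a placeholder for an extracted metadata.gz or data.tar.gz.
def archive_digests(path)
  {
    "SHA256" => Digest::SHA256.file(path).hexdigest,
    "SHA512" => Digest::SHA512.file(path).hexdigest
  }
end

# True when the recomputed digest equals the expected hex string
# (compared case-insensitively, surrounding whitespace ignored).
def digest_matches?(path, algorithm, expected)
  archive_digests(path).fetch(algorithm).casecmp?(expected.strip)
end
```

For example, an extracted `data.tar.gz` from this release would be checked with `digest_matches?("data.tar.gz", "SHA256", "cc74cd0257a36fa3591f54becfdb51dfffbf44662598f2d67c6a36bf4e969e61")`.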
data/README.md
CHANGED
```diff
@@ -9,10 +9,17 @@
 ## Description
 This is a [Ruby][] implementation of an [EBNF][] and [BNF][] parser and parser generator.
 
+### [PEG][]/[Packrat][] Parser
+In the primary mode, it supports a Parsing Expression Grammar ([PEG][]) parser generator. This performs more minmal transformations on the parsed grammar to extract sub-productions, which allows each component of a rule to generate its own parsing event.
+
+The resulting {EBNF::PEG::Rule} objects then parse each associated rule according to the operator semantics and use a [Packrat][] memoizer to reduce extra work when backtracking.
+
+These rules are driven using the {EBNF::PEG::Parser} module which calls invokes the starting rule and ensures that all input is consumed.
+
 ### LL(1) Parser
-In
+In another mode, it parses [EBNF][] grammars to [BNF][], generates [First/Follow][] and Branch tables for [LL(1)][] grammars, which can be used with the stream [Tokenizer][] and [LL(1) Parser][].
 
-As LL(1) grammars operate using `alt` and `seq` primitives, allowing for a match on alternative productions or a sequence of productions, generating a parser requires turning the EBNF rules into BNF:
+As LL(1) grammars operate using `alt` and `seq` primitives, allowing for a match on alternative productions or a sequence of productions, generating a parser requires turning the [EBNF][] rules into [BNF][]:
 
 * Transform `a ::= b?` into `a ::= _empty | b`
 * Transform `a ::= b+` into `a ::= b b*`
```
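The two `b?` and `b+` rewrites in this hunk can be sketched on rule bodies written as nested Ruby arrays in the S-expression style the README uses for `(opt A)` and `(plus A)`. This is a hypothetical helper, not the gem's `make_bnf`: it rewrites only the outermost operator and does not create the named sub-rules that a full EBNF-to-BNF pass generates.

```ruby
# Hypothetical sketch: rewrite one EBNF operator into BNF-friendly alt/seq form.
# Rule bodies are nested arrays mirroring the README's S-expressions,
# e.g. [:opt, :b] for `b?` and [:plus, :b] for `b+`.
def ebnf_to_bnf(expr)
  return expr unless expr.is_a?(Array)

  op, *args = expr
  case op
  when :opt  then [:alt, :_empty, *args]                    # b?  =>  _empty | b
  when :plus then [:seq, args.first, [:star, args.first]]   # b+  =>  b b*
  else expr                                                 # other operators unchanged
  end
end
```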
```diff
@@ -25,58 +32,77 @@ As LL(1) grammars operate using `alt` and `seq` primitives, allowing for a match
 
 Of note in this implementation is that the tokenizer and parser are streaming, so that they can process inputs of arbitrary size.
 
-
+The _exception operator_ (`A - B`) is only supported on terminals.
 
-
-An additional Parsing Expression Grammar ([PEG][]) parser generator is also supported. This performs more minmal transformations on the parsed grammar to extract sub-productions, which allows each component of a rule to generate its own parsing event.
+See {EBNF::LL1} and {EBNF::LL1::Parser} for further information.
 
 ## Usage
 ### Parsing an EBNF Grammar
 
 require 'ebnf'
 
-
+grammar = EBNF.parse(File.open('./etc/ebnf.ebnf'))
 
-Output rules and terminals as S-Expressions, Turtle, HTML or BNF
+Output rules and terminals as [S-Expressions][S-Expression], [Turtle][], HTML or [BNF][]
 
-puts
-puts
-puts
-puts
+puts grammar.to_sxp
+puts grammar.to_ttl
+puts grammar.to_html
+puts grammar.to_s
 
-Transform EBNF to PEG (generates sub-rules for embedded expressions) and the RULES table as Ruby for parsing grammars:
+Transform [EBNF][] to [PEG][] (generates sub-rules for embedded expressions) and the RULES table as Ruby for parsing grammars:
 
-
-
+grammar.make_peg
+grammar.to_ruby
 
-Transform EBNF to BNF (generates sub-rules using `alt` or `seq` from `plus`, `star` or `opt`)
+Transform [EBNF][] to [BNF][] (generates sub-rules using `alt` or `seq` from `plus`, `star` or `opt`)
 
-
+grammar.make_bnf
 
 Generate [First/Follow][] rules for BNF grammars (using "ebnf" as the starting production):
 
-
+grammar.first_follow(:ebnf)
 
 Generate Terminal, [First/Follow][], Cleanup and Branch tables as Ruby for parsing grammars:
 
-
-
+grammar.build_tables
+grammar.to_ruby
 
 Generate formatted grammar using HTML (requires [Haml][Haml] gem):
 
-
+grammar.to_html
+
+### Parsing an ISO/IEC 14977 Grammar
+
+The EBNF gem can also parse [ISO/EIC 14977] Grammars (ISOEBNF) to [S-Expressions][S-Expression].
+
+grammar = EBNF.parse(File.open('./etc/iso-ebnf.isoebnf', format: :isoebnf))
+
+### Parsing an ABNF Grammar
+
+The EBNF gem can also parse [ABNF] Grammars to [S-Expressions][S-Expression].
+
+grammar = EBNF.parse(File.open('./etc/abnf.abnf', format: :abnf))
 
-### Parser
+### Parser Debugging
 
 Inevitably while implementing a parser for some specific grammar, a developer will need greater insight into the operation of the parser. While this can involve sorting through a tremendous amount of data, the parser can be provided a [Logger][] instance which will output messages at varying levels of detail to document the state of the parser at any given point. Most useful is likely the `INFO` level of debugging, but even more detail is revealed using the `DEBUG` level. `WARN` and `ERROR` statements will typically also be provided as part of an exception if parsing fails, but can be shown in the context of other parsing state with appropriate indentation as part of the logger.
 
-###
+### Writing Grammars
+
+The {EBNF::Writer} class can be used to write parsed grammars out, either as formatted text, or HTML. Because grammars are written from the Abstract Syntax Tree, represented as [S-Expressions][S-Expression], this provides a means of transforming between grammar formats (e.g., W3C [EBNF][] to [ABNF][]), although with some potential loss in semantic fidelity (case-insensitive string matching vs. case-sensitive matching).
+
+The formatted HTML results are designed to be appropriate for including in specifications.
+
+### Parser Errors
 On a parsing failure, and exception is raised with information that may be useful in determining the source of the error.
 
 ## EBNF Grammar
 The [EBNF][] variant used here is based on [W3C](https://w3.org/) [EBNF][] (see {file:etc/ebnf.ebnf EBNF grammar}) as defined in the
 [XML 1.0 recommendation](https://www.w3.org/TR/REC-xml/), with minor extensions:
 
+The character set for EBNF is UTF-8.
+
 The general form of a rule is:
 
 symbol ::= expression
```
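The new Writing Grammars text warns that converting between notations can lose semantic fidelity around case sensitivity. In W3C EBNF a quoted string matches case-sensitively, while in RFC 5234 ABNF a bare quoted string matches case-insensitively; the `%s` and `%i` prefixes come from the `case-sensitive-string` and `case-insensitive-string` rules of the abnf.abnf grammar added in this release. The toy writer below is hypothetical (it is not {EBNF::Writer}) and shows one way to keep the meaning intact in both notations, assuming string literals contain no double quotes.

```ruby
# Hypothetical mini-writer (not EBNF::Writer). Node shapes:
#   "abc"              case-sensitive string literal
#   [:istr, "abc"]     case-insensitive string literal
#   :name              rule reference
#   [:alt, a, b, ...]  alternation;  [:seq, a, b, ...]  sequence
# Assumes string literals contain no double-quote characters.

def write_ebnf(node)
  case node
  when String then %("#{node}")
  when Symbol then node.to_s
  else
    op, *args = node
    case op
    when :alt then args.map { |a| write_ebnf(a) }.join(" | ")
    when :seq then args.map { |a| group(a, write_ebnf(a)) }.join(" ")
    when :istr
      # W3C EBNF has no case-insensitive literal: expand each letter to a class.
      args.first.chars.map { |c| c.match?(/[a-z]/i) ? "[#{c.downcase}#{c.upcase}]" : %("#{c}") }.join(" ")
    else raise ArgumentError, "unsupported operator #{op.inspect}"
    end
  end
end

def write_abnf(node)
  case node
  when String then %(%s"#{node}") # a bare ABNF string would be case-insensitive
  when Symbol then node.to_s
  else
    op, *args = node
    case op
    when :alt  then args.map { |a| write_abnf(a) }.join(" / ")
    when :seq  then args.map { |a| group(a, write_abnf(a)) }.join(" ")
    when :istr then %("#{args.first}")
    else raise ArgumentError, "unsupported operator #{op.inspect}"
    end
  end
end

# Parenthesize an alternation nested inside a sequence.
def group(node, text)
  node.is_a?(Array) && node.first == :alt ? "(#{text})" : text
end
```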
```diff
@@ -85,7 +111,9 @@ which can also be proceeded by an optional number enclosed in square brackets to
 
 [1] symbol ::= expression
 
-
+(Note, this can introduce an ambiguity if the previous rule ends in a range or enum and the current rule has no identifier. In this case, enclosing `expression` within parentheses, or adding intervening comments can resolve the ambiguity.)
+
+Symbols are written in CAPITAL CASE if they are the start symbol of a regular language (terminals), otherwise with they are treated as non-terminal rules. Literal strings are quoted.
 
 Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters:
 
```
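The rule shape `[1] symbol ::= expression` and the CAPITAL CASE convention for terminals can be checked with a regular expression. This naive, hypothetical pattern also shows where the noted ambiguity comes from: a bracketed prefix is always taken as a rule number, so an enumeration such as `[abc]` left over from a preceding rule would be misread as one. The exact terminal-name pattern (capital letters, digits, `_` or `-`) is an assumption, not the gem's definition.

```ruby
# Hypothetical pattern for a rule header "[id] symbol ::= expression".
# The id is optional and may contain dots (e.g. "1.1").
RULE_HEADER   = /\A\s*(?:\[([\w.]+)\]\s*)?([A-Za-z_][\w-]*)\s*::=\s*(.*)\z/
# Assumed terminal naming: a capital letter, then capitals, digits, '_' or '-'.
TERMINAL_NAME = /\A[A-Z][A-Z0-9_-]*\z/

def rule_kind(symbol)
  symbol.to_s.match?(TERMINAL_NAME) ? :terminal : :rule
end

# Returns a Hash describing the rule header, or nil when the line is not one.
def parse_rule_header(line)
  m = RULE_HEADER.match(line.chomp) or return nil
  id, symbol, expression = m.captures
  { id: id, symbol: symbol, kind: rule_kind(symbol), expression: expression }
end
```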
```diff
@@ -93,13 +121,13 @@ Within the expression on the right-hand side of a rule, the following expression
 <tr><td><code>#xN</code></td>
 <td>where <code>N</code> is a hexadecimal integer, the expression matches the character whose number (code point) in ISO/IEC 10646 is <code>N</code>. The number of leading zeros in the <code>#xN</code> form is insignificant.</td></tr>
 <tr><td><code>[a-zA-Z], [#xN-#xN]</code>
-<td>matches any Char with a value in the range(s) indicated (inclusive).</td></tr>
+<td>matches any Char or HEX with a value in the range(s) indicated (inclusive).</td></tr>
 <tr><td><code>[abc], [#xN#xN#xN]</code></td>
-<td>matches any
+<td>matches any UTF-8 R\_CHAR or HEX with a value among the characters enumerated. The last component may be '-'. Enumerations and ranges may be mixed in one set of brackets.</td></tr>
 <tr><td><code>[^a-z], [^#xN-#xN]</code></td>
-<td>matches any Char
+<td>matches any UTF-8 Char or HEX a value outside the range indicated.</td></tr>
 <tr><td><code>[^abc], [^#xN#xN#xN]</code></td>
-<td>matches any
+<td>matches any UTF-8 R\_CHAR or HEX with a value not among the characters given. The last component may be '-'. Enumerations and ranges of excluded values may be mixed in one set of brackets.</td></tr>
 <tr><td><code>"string"</code></td>
 <td>matches a literal string matching that given inside the double quotes.</td></tr>
 <tr><td><code>'string'</code></td>
```
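The character-class forms in this table map closely onto Ruby regular-expression classes, with `#xN` becoming a code-point escape. A hypothetical converter, not part of the gem, that turns one bracketed class into an anchored `Regexp`; it assumes the class is well formed and treats a trailing `-` as a literal, as the table allows.

```ruby
# Hypothetical converter (not part of the gem) from one W3C EBNF character
# class, e.g. "[#x41-#x5A#x61-#x7A]" or "[^abc]", to an anchored Regexp that
# matches exactly one character.
def char_class_to_regexp(klass)
  m = /\A\[(\^?)(.+)\]\z/m.match(klass) or raise ArgumentError, "not a character class: #{klass}"
  negate, body = m.captures
  parts = body.scan(/#x\h+|./m).map do |tok|
    if tok.start_with?("#x")
      format("\\u{%X}", tok[2..].to_i(16)) # code point escape
    elsif tok == "-"
      "-"                                  # range operator, or a literal when last
    else
      Regexp.escape(tok)                   # any other character, taken literally
    end
  end
  Regexp.new("\\A[#{negate}#{parts.join}]\\z")
end
```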
```diff
@@ -113,7 +141,7 @@ Within the expression on the right-hand side of a rule, the following expression
 <tr><td><code>A | B</code></td>
 <td>matches <code>A</code> or <code>B</code>.</td></tr>
 <tr><td><code>A - B</code></td>
-<td>matches any string that matches <code>A</code> but does not match <code>B</code
+<td>matches any string that matches <code>A</code> but does not match <code>B</code>. (Only supported on Terminals in LL(1) BNF).</td></tr>
 <tr><td><code>A+</code></td>
 <td>matches one or more occurrences of <code>A</code>. Concatenation has higher precedence than alternation; thus <code>A+ | B+</code> is identical to <code>(A+) | (B+)</code>.</td></tr>
 <tr><td><code>A*</code></td>
```
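For terminals, `A - B` accepts a string that matches `A` and does not match `B`. Modeling each terminal as a Ruby regular expression that must match the whole candidate, a minimal sketch looks like this; the helper name and the identifier/keyword example are illustrative only.

```ruby
# Hypothetical sketch of `A - B` for terminals: a candidate is accepted when
# it matches A in full and does not match B in full.
def diff_terminal(a, b)
  whole_a = Regexp.new("\\A(?:#{a.source})\\z", a.options)
  whole_b = Regexp.new("\\A(?:#{b.source})\\z", b.options)
  ->(str) { whole_a.match?(str) && !whole_b.match?(str) }
end

# Example: a lowercase identifier that is not one of two keywords.
IDENT = diff_terminal(/[a-z]+/, /if|then/)
```

Anchoring both sides matters: without `\A...\z`, `"iffy"` would be rejected because it contains a match for `if`.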
```diff
@@ -130,10 +158,10 @@ Within the expression on the right-hand side of a rule, the following expression
 * `@pass` defines the expression used to detect whitespace, which is removed in processing.
 * No support for `wfc` (well-formedness constraint) or `vc` (validity constraint).
 
-Parsing this grammar yields an S-Expression version: {file:etc/ebnf.sxp} (or [LL(1)][] version {file:etc/ebnf.ll1.sxp} or [PEG][] version {file:etc/ebnf.peg.sxp}).
+Parsing this grammar yields an [S-Expression][] version: {file:etc/ebnf.sxp} (or [LL(1)][] version {file:etc/ebnf.ll1.sxp} or [PEG][] version {file:etc/ebnf.peg.sxp}).
 
 ### Parser S-Expressions
-Intermediate representations of the grammar may be serialized to Lisp-like S-Expressions. For example, the rule
+Intermediate representations of the grammar may be serialized to Lisp-like [S-Expressions][S-Expression]. For example, the rule
 
 [1] ebnf ::= (declaration | rule)*
 
```
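The S-expression form of a rule is a nested list printed with parentheses. A hypothetical serializer over nested Ruby arrays, where symbols print bare and strings print quoted, is enough to produce text in the same shape as the `(rule ...)` forms in this README; it is not the gem's own `to_sxp`, and the array below is an assumed encoding of `[1] ebnf ::= (declaration | rule)*`.

```ruby
# Hypothetical serializer from nested Ruby arrays to Lisp-like S-expressions:
# arrays become parenthesized lists, symbols print bare, strings print quoted.
def sxp_string(node)
  case node
  when Array  then "(#{node.map { |n| sxp_string(n) }.join(' ')})"
  when String then node.inspect
  else node.to_s
  end
end

rule = [:rule, :ebnf, "1", [:star, [:alt, :declaration, :rule]]]
puts sxp_string(rule) # prints (rule ebnf "1" (star (alt declaration rule)))
```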
```diff
@@ -155,13 +183,23 @@ Different components of an EBNF rule expression are transformed into their own o
 <tr><td><code>A?</code></td><td><code>(opt A)</code></td></tr>
 <tr><td><code>A B</code></td><td><code>(seq A B)</code></td></tr>
 <tr><td><code>A | B</code></td><td><code>(alt A B)</code></td></tr>
-<tr><td><code>A - B</code></td
+<tr><td><code>A - B</code></td>
+<td><code>(diff A B) for terminals.<br/>
+<code>(seq (not B) A) for non-terminals (PEG parsing only)</code></code></td></tr>
 <tr><td><code>A+</code></td><td><code>(plus A)</code></td></tr>
 <tr><td><code>A*</code></td><td><code>(star A)</code></td></tr>
-<tr><td><code>@pass " "*</code></td><td><code>(pass (star " "))</code></td></tr>
+<tr><td><code>@pass " "*</code></td><td><code>(pass _pass (star " "))</code></td></tr>
 <tr><td><code>@terminals</code></td><td></td></tr>
 </table>
 
+Other rule operators are not directly supported in [EBNF][], but are included to support other notations (e.g., [ABNF][] and [ISO/IEC 14977][]):
+
+<table>
+<tr><td><code>%i"StRiNg"</code></td><td><code>(istr "StRiNg")</code></td><td>Case-insensitive string matching</td></tr>
+<tr><td><code>'' - A</code></td><td><code>(not A)</code></td><td>Negative look-ahead, used for non-terminal uses of `B - A`.</td></tr>
+<tr><td><code>n*mA</code></td><td><code>(rept n m A)</code></td><td>Explicit repetition.</td></tr>
+</table>
+
 Additionally, rules defined with an UPPERCASE symbol are treated as terminals.
 
 For an [LL(1)][] parser generator, the {EBNF::BNF.make_bnf} method can be used to transform the EBNF rule into a BNF rule.
```
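The new `(rept n m A)` and `(istr ...)` operators have direct regular-expression counterparts when the operand is simple. A hypothetical sketch, assuming `A` can itself be written as a Ruby `Regexp`: `parse_repeat` follows the `repeat` rule of the abnf.abnf grammar in this release (`n*m`, `*m`, `n*`, `*`, or a bare `n`), and the resulting bounds become a `{n,m}` quantifier.

```ruby
# Hypothetical helpers for (rept n m A) and (istr "...").
# parse_repeat follows the ABNF `repeat` rule: "n*m", "*m", "n*", "*" or "n".
def parse_repeat(spec)
  if spec.include?("*")
    min, max = spec.split("*", 2)
    [min.empty? ? 0 : Integer(min, 10), max.empty? ? Float::INFINITY : Integer(max, 10)]
  else
    n = Integer(spec, 10)
    [n, n] # a bare number means exactly n occurrences
  end
end

# (rept min max A) as a whole-string match, with A given as a Regexp.
def rept_to_regexp(min, max, inner)
  upper = max == Float::INFINITY ? "" : max.to_s
  Regexp.new("\\A(?:#{inner.source}){#{min},#{upper}}\\z")
end

# (istr "...") as a whole-string, case-insensitive match.
def istr_to_regexp(str)
  Regexp.new("\\A#{Regexp.escape(str)}\\z", Regexp::IGNORECASE)
end
```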
```diff
@@ -179,12 +217,16 @@ For a [PEG][] parser generator, there is a simpler transformation that reduces r
 (rule _ebnf_1 "1.1" (alt declaration rule))
 
 ## Example parsers
-For a [PEG][] parser for a simple grammar implementing a calculator see [Calc example](
+For a [PEG][] parser for a simple grammar implementing a calculator see [Calc example](https://dryruby.github.io/ebnf/examples/calc/doc/calc.html)
 
-For an example parser built using this gem that parses the [EBNF][] grammar, see [EBNF PEG Parser example](
+For an example parser built using this gem that parses the [EBNF][] grammar, see [EBNF PEG Parser example](https://dryruby.github.io/ebnf/examples/ebnf-peg-parser/doc/parser.html). This example creates a parser for the [EBNF][] grammar which generates the same Abstract Syntax Tree as the built-in parser in the gem.
 
 There is also an
-[EBNF LL(1) Parser example](
+[EBNF LL(1) Parser example](https://dryruby.github.io/ebnf/examples/ebnf-peg-parser/doc/parser.html).
+
+The [ISO EBNF Parser](https://dryruby.github.io/ebnf/examples/isoebnf/doc/parser.html) example parses [ISO/IEC 14977][] into [S-Expressions][S-Expression], which can be used to parse compatible grammars using this parser (either [PEG][] or [LL(1)][]).
+
+The [ABNF Parser](https://dryruby.github.io/ebnf/examples/abnf/doc/parser.html) example parses [ABNF][] into [S-Expressions][S-Expression], which can be used to parse compatible grammars using this [PEG][] parser.
 
 ## Acknowledgements
 Much of this work, particularly the generic parser, is inspired by work originally done by
```
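The Packrat memoizer mentioned for the PEG parser caches the result of each rule at each input position, so an ordered choice that backtracks does not re-parse work it has already done. A self-contained toy in that style for a tiny calculator grammar; it is not the gem's {EBNF::PEG::Parser} API, and the grammar is chosen only for illustration.

```ruby
# Hypothetical packrat-style PEG parser (not the EBNF::PEG API) for:
#   expr <- term '+' expr / term
#   term <- num '*' term / num
#   num  <- [0-9]+
# Each rule returns [value, next_position] or nil; apply() memoizes results per
# (rule, position), so a failed first alternative never causes the same rule
# to be re-parsed at the same offset.
class TinyCalc
  attr_reader :evaluations

  def initialize(input)
    @input = input
    @memo = {}
    @evaluations = 0 # rule bodies actually run (memo misses)
  end

  # Like a PEG start rule, require that all input is consumed.
  def parse
    value, pos = apply(:expr, 0)
    raise ArgumentError, "unconsumed input at #{pos.inspect}" if value.nil? || pos != @input.length
    value
  end

  private

  def apply(rule, pos)
    key = [rule, pos]
    return @memo[key] if @memo.key?(key)
    @evaluations += 1
    @memo[key] = send(rule, pos)
  end

  def expr(pos)
    t = apply(:term, pos)
    if t && @input[t[1]] == "+" && (e = apply(:expr, t[1] + 1))
      return [t[0] + e[0], e[1]]
    end
    apply(:term, pos) # second alternative: answered from the memo table
  end

  def term(pos)
    n = apply(:num, pos)
    if n && @input[n[1]] == "*" && (t = apply(:term, n[1] + 1))
      return [n[0] * t[0], t[1]]
    end
    apply(:num, pos)
  end

  def num(pos)
    digits = @input[pos..].to_s[/\A[0-9]+/]
    digits && [digits.to_i, pos + digits.length]
  end
end
```

For `TinyCalc.new("2+3*4").parse` the result is 14, and only eight rule bodies run, one per distinct (rule, position) pair, even though each fallback alternative asks for `term` or `num` again.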
```diff
@@ -229,16 +271,19 @@ A copy of the [Turtle EBNF][] and derived parser files are included in the repos
 [YARD]: https://yardoc.org/
 [YARD-GS]: https://rubydoc.info/docs/yard/file/docs/GettingStarted.md
 [PDD]: https://lists.w3.org/Archives/Public/public-rdf-ruby/2010May/0013.html
+[ABNF]: https://www.rfc-editor.org/rfc/rfc5234
 [BNF]: https://en.wikipedia.org/wiki/Backus–Naur_form
 [EBNF]: https://www.w3.org/TR/REC-xml/#sec-notation
 [EBNF doc]: https://rubydoc.info/github/dryruby/ebnf
 [First/Follow]: https://en.wikipedia.org/wiki/LL_parser#Constructing_an_LL.281.29_parsing_table
+[ISO/IEC 14977]:https://www.iso.org/standard/26153.html
 [LL(1)]: https://www.csd.uwo.ca/~moreno//CS447/Lectures/Syntax.html/node14.html
 [LL(1) Parser]: https://en.wikipedia.org/wiki/LL_parser
 [Logger]: https://ruby-doc.org/stdlib-2.4.0/libdoc/logger/rdoc/Logger.html
+[S-expression]: https://en.wikipedia.org/wiki/S-expression
 [Tokenizer]: https://en.wikipedia.org/wiki/Lexical_analysis#Tokenizer
+[Turtle]: https://www.w3.org/TR/2012/WD-turtle-20120710/
 [Turtle EBNF]: https://dvcs.w3.org/hg/rdf/file/default/rdf-turtle/turtle.bnf
 [Packrat]: https://pdos.csail.mit.edu/~baford/packrat/thesis/
 [PEG]: https://en.wikipedia.org/wiki/Parsing_expression_grammar
-[Treetop]: https://rubygems.org/gems/treetop
 [Haml]: https://rubygems.org/gems/haml
```
data/VERSION
CHANGED
```diff
@@ -1 +1 @@
-2.
+2.1.0
```
data/bin/ebnf
CHANGED
```diff
@@ -15,6 +15,7 @@ options = {
 output_format: :sxp,
 prefix: "ttl",
 namespace: "http://www.w3.org/ns/formats/Turtle#",
+level: 4
 }
 
 input, out = nil, STDOUT
```
```diff
@@ -23,15 +24,17 @@ OPT_ARGS = [
 ["--debug", GetoptLong::NO_ARGUMENT, "Turn on debugging output"],
 ["--bnf", GetoptLong::NO_ARGUMENT, "Transform EBNF to BNF"],
 ["--evaluate","-e", GetoptLong::REQUIRED_ARGUMENT,"Evaluate argument as an EBNF document"],
+["--format", "-f", GetoptLong::REQUIRED_ARGUMENT,"Specify output format one of abnf, abnfh, ebnf, html, isoebnf, isoebnfh, ttl, sxp, or rb"],
+["--input-format", GetoptLong::REQUIRED_ARGUMENT,"Specify input format one of abnf, ebnf isoebnf, native, or sxp"],
 ["--ll1", GetoptLong::REQUIRED_ARGUMENT,"Generate First/Follow rules, argument is start symbol"],
-["--format", "-f", GetoptLong::REQUIRED_ARGUMENT,"Specify output format one of ebnf, html, ttl, sxp, or rb"],
-["--input-format", GetoptLong::REQUIRED_ARGUMENT,"Specify input format one of ebnf or sxp"],
 ["--mod-name", GetoptLong::REQUIRED_ARGUMENT,"Module name used when creating ruby tables"],
+["--namespace", "-n", GetoptLong::REQUIRED_ARGUMENT,"Namespace to use when generating Turtle"],
 ["--output", "-o", GetoptLong::REQUIRED_ARGUMENT,"Output to the specified file path"],
 ["--peg", GetoptLong::NO_ARGUMENT, "Transform EBNF to PEG"],
 ["--prefix", "-p", GetoptLong::REQUIRED_ARGUMENT,"Prefix to use when generating Turtle"],
 ["--progress", "-v", GetoptLong::NO_ARGUMENT, "Detail on execution"],
-["--
+["--renumber", GetoptLong::NO_ARGUMENT, "Renumber parsed reules"],
+["--validate", GetoptLong::NO_ARGUMENT, "Validate grammar"],
 ["--help", "-?", GetoptLong::NO_ARGUMENT, "This message"]
 ]
 def usage
```
```diff
@@ -54,27 +57,34 @@ opts = GetoptLong.new(*OPT_ARGS.map {|o| o[0..-2]})
 
 opts.each do |opt, arg|
 case opt
-when '--debug' then options[:
+when '--debug' then options[:level] = 0
 when '--bnf' then options[:bnf] = true
 when '--evaluate' then input = arg
-when '--input-format'
-
+when '--input-format'
+unless %w(abnf ebnf isoebnf native sxp).include?(arg)
+STDERR.puts("unrecognized input format #{arg}")
+usage
+end
+options[:format] = arg.to_sym
+when '--format'
+unless %w(abnf abnfh ebnf html isoebnf isoebnfh rb sxp).include?(arg)
+STDERR.puts("unrecognized output format #{arg}")
+usage
+end
+options[:output_format] = arg.to_sym
 when '--ll1' then (options[:ll1] ||= []) << arg.to_sym
 when '--mod-name' then options[:mod_name] = arg
 when '--output' then out = File.open(arg, "w")
 when '--peg' then options[:peg] = true
 when '--prefix' then options[:prefix] = arg
+when '--renumber' then options[:renumber] = true
 when '--namespace' then options[:namespace] = arg
-when '--progress' then options[:
+when '--progress' then options[:level] = 1 unless options[:level] == 0
+when '--validate' then options[:validate] = true
 when '--help' then usage
 end
 end
 
-if options[:output_format] == :rb && !(options[:ll1] || options[:peg])
-STDERR.puts "outputing in .rb format requires --ll or --peg"
-exit(1)
-end
-
 input = File.open(ARGV[0]) if ARGV[0]
 
 ebnf = EBNF.parse(input || STDIN, **options)
```
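In this version the `--format` help string lists `ttl`, and the output `case` has a `when :ttl` branch, but the `%w(abnf abnfh ebnf html isoebnf isoebnfh rb sxp)` check does not include `ttl`, so `--format ttl` is reported as an unrecognized output format and handed to `usage`. The `--input-format` help string is also missing a comma after `ebnf`. One way to keep the check and the help text in agreement is to drive both from a single list per direction; a hypothetical sketch (the constant and method names are illustrative), which also mirrors the new `level` handling where `--debug` takes precedence over `--progress`:

```ruby
# Hypothetical: one list per direction feeds both validation and help text,
# so the accepted formats cannot drift from the documented ones.
INPUT_FORMATS  = %i[abnf ebnf isoebnf native sxp].freeze
OUTPUT_FORMATS = %i[abnf abnfh ebnf html isoebnf isoebnfh ttl sxp rb].freeze

def check_format(arg, allowed, kind)
  fmt = arg.to_s.to_sym
  return fmt if allowed.include?(fmt)
  raise ArgumentError, "unrecognized #{kind} format #{arg}; expected one of #{allowed.join(', ')}"
end

def format_help(allowed, kind)
  "Specify #{kind} format one of #{allowed[0..-2].join(', ')}, or #{allowed.last}"
end

# Same rule as the 2.1.0 script: --debug sets 0, --progress sets 1 unless
# debugging is already on.
def apply_level_flag(options, flag)
  case flag
  when "--debug"    then options[:level] = 0
  when "--progress" then options[:level] = 1 unless options[:level] == 0
  end
  options
end
```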
```diff
@@ -85,13 +95,19 @@ if options[:ll1]
 ebnf.build_tables
 end
 
+ebnf.renumber! if options[:renumber]
+
 res = case options[:output_format]
-when :
-when :
-when :
-when :
-when :
-
+when :abnf then ebnf.to_s(format: :abnf)
+when :abnfh then ebnf.to_html(format: :abnf)
+when :ebnf then ebnf.to_s
+when :html then ebnf.to_html
+when :isoebnf then ebnf.to_s(format: :isoebnf)
+when :isoebnfh then ebnf.to_html(format: :isoebnf)
+when :sxp then ebnf.to_sxp
+when :ttl then ebnf.to_ttl(options[:prefix], options[:namespace])
+when :rb then ebnf.to_ruby(out, grammarFile: ARGV[0], **options)
+else ebnf.ast.inspect
 end
 
 out.puts res
```
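The script now carries a numeric `level` option: 4 by default, 0 with `--debug`, 1 with `--progress`. These values line up with the severities of Ruby's standard `Logger` (DEBUG = 0, INFO = 1, WARN = 2, ERROR = 3, FATAL = 4), the class the README suggests handing to the parser for debugging. Assuming the level is meant to be read that way, a minimal sketch of building a matching logger; `logger_for` is a hypothetical helper, not part of the gem.

```ruby
require "logger"

# Hypothetical: build a stdlib Logger from bin/ebnf's numeric `level` option,
# assuming it follows Logger's severities (DEBUG=0, INFO=1, ..., FATAL=4).
def logger_for(level, io = $stderr)
  Logger.new(io).tap do |logger|
    logger.level = level
    logger.formatter = proc { |severity, _time, _progname, msg| "#{severity}: #{msg}\n" }
  end
end

log = logger_for(1) # as with --progress: INFO and above
log.debug("suppressed at level 1")
log.info("shown at level 1")
```

With the default of 4, only FATAL messages would get through.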
data/etc/abnf-core.ebnf
ADDED
```diff
@@ -0,0 +1,52 @@
+# Core terminals available in uses of ABNF
+ALPHA ::= [#x41-#x5A#x61-#x7A] # A-Z | a-z
+
+BIT ::= '0' | '1'
+
+CHAR ::= [#x01-#x7F]
+# any 7-bit US-ASCII character,
+# excluding NUL
+CR ::= #x0D
+# carriage return
+
+CRLF ::= CR? LF
+# Internet standard newline
+
+CTL ::= [#x00-#x1F] | #x7F
+# controls
+
+DIGIT ::= [#x30-#x39]
+# 0-9
+
+DQUOTE ::= #x22
+# " (Double Quote)
+
+HEXDIG ::= DIGIT | [A-F] # [0-9A-F]
+
+HTAB ::= #x09
+# horizontal tab
+
+LF ::= #x0A
+# linefeed
+
+LWSP ::= (WSP | CRLF WSP)*
+# Use of this linear-white-space rule
+# permits lines containing only white
+# space that are no longer legal in
+# mail headers and have caused
+# interoperability problems in other
+# contexts.
+# Do not use when defining mail
+# headers and use with caution in
+# other contexts.
+
+OCTET ::= [#x00-#xFF]
+# 8 bits of data
+
+SP ::= #x20
+
+VCHAR ::= [#x21-#x7E]
+# visible (printing) characters
+
+WSP ::= SP | HTAB
+# white space
```
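Several of these core terminals translate directly into Ruby regular expressions. Two details stand out: `CRLF ::= CR? LF` also accepts a bare line feed (abnf.abnf in this release marks the same rule "Extended to allow only newline", whereas RFC 5234 defines `CRLF = CR LF`), and `HEXDIG ::= DIGIT | [A-F]` is uppercase-only, while ABNF's quoted `"A" / ... / "F"` alternatives are case-insensitive. A hypothetical mapping, matching whole strings:

```ruby
# Hypothetical whole-string regexes for several terminals in abnf-core.ebnf.
CORE_TERMINALS = {
  ALPHA:  /\A[\x41-\x5A\x61-\x7A]\z/,
  DIGIT:  /\A[\x30-\x39]\z/,
  HEXDIG: /\A(?:[\x30-\x39]|[A-F])\z/, # as written: uppercase A-F only
  CRLF:   /\A\r?\n\z/,                 # CR? LF, so a bare LF also matches
  WSP:    /\A[ \t]\z/,
  LWSP:   /\A(?:[ \t]|\r?\n[ \t])*\z/  # (WSP | CRLF WSP)*
}.freeze

# RFC 5234 quoted strings are case-insensitive, so the ABNF HEXDIG
# (DIGIT / "A" / ... / "F") also admits a-f.
HEXDIG_ABNF = /\A(?:[0-9]|[A-F])\z/i

def core_match?(name, str)
  CORE_TERMINALS.fetch(name).match?(str)
end
```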
data/etc/abnf.abnf
ADDED
```diff
@@ -0,0 +1,121 @@
+rulelist = 1*( rule / (*c-wsp c-nl) )
+
+rule = rulename defined-as elements c-nl
+; continues if next line starts
+; with white space
+
+rulename = ALPHA *(ALPHA / DIGIT / "-")
+
+defined-as = *c-wsp ("=" / "=/") *c-wsp
+; basic rules definition and
+; incremental alternatives
+
+elements = alternation *c-wsp
+
+c-wsp = WSP / (c-nl WSP)
+
+c-nl = comment / CRLF
+; comment or newline
+
+comment = ";" *(WSP / VCHAR) CRLF
+
+alternation = concatenation
+*(*c-wsp "/" *c-wsp concatenation)
+
+concatenation = repetition *(1*c-wsp repetition)
+
+repetition = [repeat] element
+
+repeat = (*DIGIT "*" *DIGIT) / 1*DIGIT
+
+element = rulename / group / option /
+char-val / num-val / prose-val
+
+group = "(" *c-wsp alternation *c-wsp ")"
+
+option = "[" *c-wsp alternation *c-wsp "]"
+
+char-val = case-insensitive-string /
+case-sensitive-string
+
+case-insensitive-string =
+[ "%i" ] quoted-string
+
+case-sensitive-string =
+"%s" quoted-string
+
+quoted-string = DQUOTE *(%x20-21 / %x23-7E) DQUOTE
+; quoted string of SP and VCHAR
+; without DQUOTE
+
+num-val = "%" (bin-val / dec-val / hex-val)
+
+bin-val = "b" 1*BIT
+[ 1*("." 1*BIT) / ("-" 1*BIT) ]
+; series of concatenated bit values
+; or single ONEOF range
+
+dec-val = "d" 1*DIGIT
+[ 1*("." 1*DIGIT) / ("-" 1*DIGIT) ]
+
+hex-val = "x" 1*HEXDIG
+[ 1*("." 1*HEXDIG) / ("-" 1*HEXDIG) ]
+
+prose-val = "<" *(%x20-3D / %x3F-7E) ">"
+; bracketed string of SP and VCHAR
+; without angles
+; prose description, to be used as
+; last resort
+
+ALPHA = %x41-5A / %x61-7A ; A-Z / a-z
+
+BIT = "0" / "1"
+
+CHAR = %x01-7F
+; any 7-bit US-ASCII character,
+; excluding NUL
+CR = %x0D
+; carriage return
+
+CRLF = [CR] LF
+; Internet standard newline
+; Extended to allow only newline
+
+CTL = %x00-1F / %x7F
+; controls
+
+DIGIT = %x30-39
+; 0-9
+
+DQUOTE = %x22
+; " (Double Quote)
+
+HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
+
+HTAB = %x09
+; horizontal tab
+
+LF = %x0A
+; linefeed
+
+LWSP = *(WSP / CRLF WSP)
+; Use of this linear-white-space rule
+; permits lines containing only white
+; space that are no longer legal in
+; mail headers and have caused
+; interoperability problems in other
+; contexts.
+; Do not use when defining mail
+; headers and use with caution in
+; other contexts.
+
+OCTET = %x00-FF
+; 8 bits of data
+
+SP = %x20
+
+VCHAR = %x21-7E
+; visible (printing) characters
+
+WSP = SP / HTAB
+; white space
```
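The `num-val` rules in this grammar (`bin-val`, `dec-val`, `hex-val`) describe a single value, a `.`-separated concatenation, or a `-` range. Because W3C EBNF writes code points as `#xN` and ranges as `[#xN-#xN]`, the three forms can be rewritten mechanically; compare `ALPHA = %x41-5A / %x61-7A` here with `ALPHA ::= [#x41-#x5A#x61-#x7A]` in abnf-core.ebnf. A hypothetical converter for a single `num-val`, not the gem's ABNF parser:

```ruby
# Hypothetical converter from one ABNF num-val (bin-val / dec-val / hex-val)
# to W3C EBNF notation: a value => #xN, "a.b.c" => a sequence, "a-b" => a range.
NUM_VAL = /\A%([bdx])(\h+)((?:\.\h+)+|-\h+)?\z/i
BASES   = { "b" => 2, "d" => 10, "x" => 16 }.freeze

def num_val_to_ebnf(text)
  m = NUM_VAL.match(text) or raise ArgumentError, "not a num-val: #{text}"
  base = BASES.fetch(m[1].downcase)
  # Integer() raises ArgumentError for digits invalid in the base (e.g. "%b12").
  to_hex = ->(digits) { format("#x%X", Integer(digits, base)) }
  first, rest = m[2], m[3]
  if rest.nil?
    to_hex.call(first)
  elsif rest.start_with?("-")
    "[#{to_hex.call(first)}-#{to_hex.call(rest[1..])}]"
  else
    ([first] + rest.split(".").drop(1)).map(&to_hex).join(" ")
  end
end
```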