ebnf 2.5.0 → 2.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 76a49d9fe4e2cf23f7bde8a55f600bfad24d0f3e55ecaf8a9c0a2e21b7e1310d
4
- data.tar.gz: fbf204b9ff0bbe4dff0056ba75c6ed535269584bac336d041f904d2d3fb2e571
3
+ metadata.gz: 2cacadf02a11bd000711f0e3b68a343152fab195e04315210887a0cb9576a813
4
+ data.tar.gz: 400eaa6a4dfdc177dcafc80cdb535a63a31c7af092c8f01b00e880d4d021b7bc
5
5
  SHA512:
6
- metadata.gz: 74963a92956d8cbf2fe24f6df8fe0584667f941f42fd523c8a5e3f7f39f3bd0c74eaf8480ed22cc54d358cfd350630fcd6c84449dfff6591dc41e95174fd06b9
7
- data.tar.gz: 02c0844b57ffd898764a4cd3dd163105b8e4d32f34df20f3874b2f7dd2590e14ddff98d72f8fd40f2573897f858327ba7eee8e6cff21f551f9882a9c4f82514f
6
+ metadata.gz: 6968633fa5be00518afc4138f02f971ea88d74322f8a24f84023c7fcc306d8d38a930e6f23144c01f25745c9ad86c0659c30d739d65762071476a0830d03a5aa
7
+ data.tar.gz: e3878b45bc9e553e78c60e9cc31ca84fda40e8a40986797764e1240ece08009c8ac092c8fdf852522e182d3928f7494636d960d70bf469d7e6d346e09bf7d2a9
data/README.md CHANGED
@@ -26,10 +26,9 @@ As LL(1) grammars operate using `alt` and `seq` primitives, allowing for a match
26
26
  * Transform `a ::= b+` into `a ::= b b*`
27
27
  * Transform `a ::= b*` into `a ::= _empty | (b a)`
28
28
  * Transform `a ::= op1 (op2)` into two rules:
29
- ```
30
- a ::= op1 _a_1
31
- _a_1_ ::= op2
32
- ```
29
+
30
+ a ::= op1 _a_1
31
+ _a_1_ ::= op2
33
32
 
34
33
  Of note in this implementation is that the tokenizer and parser are streaming, so that they can process inputs of arbitrary size.
35
34
 
@@ -75,7 +74,7 @@ Generate formatted grammar using HTML (requires [Haml][Haml] gem):
75
74
 
76
75
  ### Parsing an ISO/IEC 14977 Grammar
77
76
 
78
- The EBNF gem can also parse [ISO/EIC 14977] Grammars (ISOEBNF) to [S-Expressions][S-Expression].
77
+ The EBNF gem can also parse [ISO/IEC 14977][] Grammars (ISOEBNF) to [S-Expressions][S-Expression].
79
78
 
80
79
  grammar = EBNF.parse(File.open('./etc/iso-ebnf.isoebnf'), format: :isoebnf)
81
80
 
@@ -96,7 +95,7 @@ The {EBNF::Writer} class can be used to write parsed grammars out, either as for
96
95
  The formatted HTML results are designed to be appropriate for including in specifications.
97
96
 
98
97
  ### Parser Errors
99
- On a parsing failure, and exception is raised with information that may be useful in determining the source of the error.
98
+ On a parsing failure, an exception is raised with information that may be useful in determining the source of the error.
100
99
 
101
100
  ## EBNF Grammar
102
101
  The [EBNF][] variant used here is based on [W3C](https://w3.org/) [EBNF][]
@@ -104,7 +103,7 @@ The [EBNF][] variant used here is based on [W3C](https://w3.org/) [EBNF][]
104
103
  as defined in the
105
104
  [XML 1.0 recommendation](https://www.w3.org/TR/REC-xml/), with minor extensions:
106
105
 
107
- Note that the grammar includes an optional `[identifer]` in front of rule names, which can be in conflict with the `RANGE` terminal. It is typically not a problem, but if it comes up, try parsing with the `native` parser, add comments or sequences to disambiguate. EBNF does not have beginning of line checks as all whitespace is treated the same, so the common practice of identifying each rule inherently leads to such ambiguity.
106
+ Note that the grammar includes an optional `[number]` in front of rule names, which can be in conflict with the `RANGE` terminal. It is typically not a problem, but if it comes up, try parsing with the `native` parser, add comments or sequences to disambiguate. EBNF does not have beginning of line checks as all whitespace is treated the same, so the common practice of identifying each rule inherently leads to such ambiguity.
108
107
 
109
108
  The character set for EBNF is UTF-8.
110
109
 
@@ -116,7 +115,7 @@ which can also be proceeded by an optional number enclosed in square brackets to
116
115
 
117
116
  [1] symbol ::= expression
118
117
 
119
- (Note, this can introduce an ambiguity if the previous rule ends in a range or enum and the current rule has no identifier. In this case, enclosing `expression` within parentheses, or adding intervening comments can resolve the ambiguity.)
118
+ (Note: this introduces an ambiguity if the previous rule ends in a range or enum and the current rule has no number. The parsers dynamically determine the terminal rules for the `LHS` (the identifier, symbol, and `::=`) and `RANGE`.)
120
119
 
121
120
  Symbols are written in CAPITAL CASE if they are the start symbol of a regular language (terminals), otherwise with they are treated as non-terminal rules. Literal strings are quoted.
122
121
 
@@ -134,7 +133,7 @@ Within the expression on the right-hand side of a rule, the following expression
134
133
  <tr><td><code>[^abc], [^#xN#xN#xN]</code></td>
135
134
  <td>matches any UTF-8 R\_CHAR or HEX with a value not among the characters given. The last component may be '-'. Enumerations and ranges of excluded values may be mixed in one set of brackets.</td></tr>
136
135
  <tr><td><code>"string"</code></td>
137
- <td>matches a literal string matching that given inside the double quotes.</td></tr>
136
+ <td>matches a literal string matching that given inside the double quotes case insensitively.</td></tr>
138
137
  <tr><td><code>'string'</code></td>
139
138
  <td>matches a literal string matching that given inside the single quotes.</td></tr>
140
139
  <tr><td><code>A (B | C)</code></td>
@@ -158,7 +157,8 @@ Within the expression on the right-hand side of a rule, the following expression
158
157
  </table>
159
158
 
160
159
  * Comments include `//` and `#` through end of line (other than hex character) and `/* ... */ (* ... *) which may cross lines`
161
- * All rules **MAY** start with an identifier, contained within square brackets. For example `[1] rule`, where the value within the brackets is a symbol `([a-z] | [A-Z] | [0-9] | "_" | ".")+`
160
+ * All rules **MAY** start with a number, contained within square brackets. For example `[1] rule`, where the value within the brackets is a symbol `([a-z] | [A-Z] | [0-9] | "_" | ".")+`, which is not retained after parsing
161
+ * Symbols **MAY** be enclosed in angle brackets `'<'` and `>`, which are dropped when parsing.
162
162
  * `@terminals` causes following rules to be treated as terminals. Any terminal which is all upper-case (eg`TERMINAL`), or any rules with expressions that match characters (`#xN`, `[a-z]`, `[^a-z]`, `[abc]`, `[^abc]`, `"string"`, `'string'`, or `A - B`), are also treated as terminals.
163
163
  * `@pass` defines the expression used to detect whitespace, which is removed in processing.
164
164
  * No support for `wfc` (well-formedness constraint) or `vc` (validity constraint).
@@ -177,7 +177,7 @@ Intermediate representations of the grammar may be serialized to Lisp-like [S-Ex
177
177
 
178
178
  is serialized as
179
179
 
180
- (rule ebnf "1" (star (alt declaration rule)))
180
+ (rule ebnf (star (alt declaration rule)))
181
181
 
182
182
  Different components of an EBNF rule expression are transformed into their own operator:
183
183
 
data/VERSION CHANGED
@@ -1 +1 @@
1
- 2.5.0
1
+ 2.6.0
data/bin/ebnf CHANGED
@@ -9,6 +9,7 @@ $:.unshift(File.expand_path(File.join(File.dirname(__FILE__), "..", 'lib')))
9
9
  require 'rubygems'
10
10
  require 'getoptlong'
11
11
  require 'ebnf'
12
+ require 'rdf/spec'
12
13
 
13
14
  options = {
14
15
  output_format: :sxp,
@@ -86,7 +87,11 @@ end
86
87
 
87
88
  input = File.open(ARGV[0]) if ARGV[0]
88
89
 
89
- ebnf = EBNF.parse(input || STDIN, **options)
90
+ logger = Logger.new(STDERR)
91
+ logger.level = options[:level] || Logger::ERROR
92
+ logger.formatter = lambda {|severity, datetime, progname, msg| "%5s %s\n" % [severity, msg]}
93
+
94
+ ebnf = EBNF.parse(input || STDIN, logger: logger, **options)
90
95
  ebnf.make_bnf if options[:bnf] || options[:ll1]
91
96
  ebnf.make_peg if options[:peg]
92
97
  if options[:ll1]
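The logger setup added to `bin/ebnf` above can be tried in isolation. The following is a minimal sketch using only Ruby's standard `Logger`; the helper name `build_logger` and the `io` parameter are assumptions made for illustration and are not part of the gem.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper: configure a logger the same way the updated script
# does, but write to any IO object so the output can be inspected.
def build_logger(io, level = nil)
  logger = Logger.new(io)
  # Fall back to ERROR when no level is supplied, as the script does.
  logger.level = level || Logger::ERROR
  # Right-align the severity in a five-character column, then the message.
  logger.formatter = lambda { |severity, _datetime, _progname, msg| "%5s %s\n" % [severity, msg] }
  logger
end

io = StringIO.new
logger = build_logger(io)
logger.info("suppressed at the default ERROR level")
logger.error("unexpected token")
print io.string
```

With the default level, only the `error` call should reach the output; passing a lower level such as `Logger::INFO` would let the first message through as well.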
data/etc/ebnf.ebnf CHANGED
@@ -5,9 +5,8 @@
5
5
 
6
6
  # Use the LHS terminal to match the identifier, rule name and assignment due to
7
7
  # confusion between the identifier and RANGE.
8
- # Note, for grammars not using identifiers, it is still possible to confuse
9
- # a rule ending with a range the next rule, as it may be interpreted as an identifier.
10
- # In such case, best to enclose the rule in '()'.
8
+ # The PEG parser has special rules for matching LHS and RANGE
9
+ # so that RANGE is not confused with LHS.
11
10
  [3] rule ::= LHS expression
12
11
 
13
12
  [4] expression ::= alt
@@ -34,11 +33,13 @@
34
33
 
35
34
  [11] LHS ::= ('[' SYMBOL ']' ' '+)? SYMBOL ' '* '::='
36
35
 
37
- [12] SYMBOL ::= ([a-z] | [A-Z] | [0-9] | '_' | '.')+
36
+ [12] SYMBOL ::= '<' O_SYMBOL '>' | O_SYMBOL
37
+
38
+ [12a] O_SYMBOL ::= ([a-z] | [A-Z] | [0-9] | '_' | '.')+
38
39
 
39
40
  [13] HEX ::= '#x' ([a-f] | [A-F] | [0-9])+
40
41
 
41
- [14] RANGE ::= '[' ((R_CHAR '-' R_CHAR) | (HEX '-' HEX) | R_CHAR | HEX)+ '-'? ']' - LHS
42
+ [14] RANGE ::= '[' ((R_CHAR '-' R_CHAR) | (HEX '-' HEX) | R_CHAR | HEX)+ '-'? ']'
42
43
 
43
44
  [15] O_RANGE ::= '[^' ((R_CHAR '-' R_CHAR) | (HEX '-' HEX) | R_CHAR | HEX)+ '-'? ']'
44
45
 
data/etc/ebnf.html CHANGED
@@ -1,4 +1,4 @@
1
- <!-- Generated with ebnf version 2.4.0. See https://github.com/dryruby/ebnf. -->
1
+ <!-- Generated with ebnf version 2.5.0. See https://github.com/dryruby/ebnf. -->
2
2
  <table class="grammar">
3
3
  <tbody id="grammar-productions" class="ebnf">
4
4
  <tr id="grammar-production-ebnf">
@@ -77,6 +77,12 @@
77
77
  <td>[12]</td>
78
78
  <td><code>SYMBOL</code></td>
79
79
  <td>::=</td>
80
+ <td><code class="grammar-paren">(</code>'<code class="grammar-literal">&lt;</code>' <a href="#grammar-production-O_SYMBOL">O_SYMBOL</a> '<code class="grammar-literal">&gt;</code>'<code class="grammar-paren">)</code> <code class="grammar-alt">|</code> <a href="#grammar-production-O_SYMBOL">O_SYMBOL</a></td>
81
+ </tr>
82
+ <tr id="grammar-production-O_SYMBOL">
83
+ <td>[12a]</td>
84
+ <td><code>O_SYMBOL</code></td>
85
+ <td>::=</td>
80
86
  <td><code class="grammar-paren">(</code><code class="grammar-brac">[</code><code class="grammar-literal">a-z</code><code class="grammar-brac">]</code> <code class="grammar-alt">|</code> <code class="grammar-brac">[</code><code class="grammar-literal">A-Z</code><code class="grammar-brac">]</code> <code class="grammar-alt">|</code> <code class="grammar-brac">[</code><code class="grammar-literal">0-9</code><code class="grammar-brac">]</code> <code class="grammar-alt">|</code> '<code class="grammar-literal">_</code>' <code class="grammar-alt">|</code> '<code class="grammar-literal">.</code>'<code class="grammar-paren">)</code><code class="grammar-plus">+</code></td>
81
87
  </tr>
82
88
  <tr id="grammar-production-HEX">
@@ -89,7 +95,7 @@
89
95
  <td>[14]</td>
90
96
  <td><code>RANGE</code></td>
91
97
  <td>::=</td>
92
- <td>'<code class="grammar-literal">[</code>' <code class="grammar-paren">(</code><code class="grammar-paren">(</code><a href="#grammar-production-R_CHAR">R_CHAR</a> '<code class="grammar-literal">-</code>' <a href="#grammar-production-R_CHAR">R_CHAR</a><code class="grammar-paren">)</code> <code class="grammar-alt">|</code> <code class="grammar-paren">(</code><a href="#grammar-production-HEX">HEX</a> '<code class="grammar-literal">-</code>' <a href="#grammar-production-HEX">HEX</a><code class="grammar-paren">)</code> <code class="grammar-alt">|</code> <a href="#grammar-production-R_CHAR">R_CHAR</a> <code class="grammar-alt">|</code> <a href="#grammar-production-HEX">HEX</a><code class="grammar-paren">)</code><code class="grammar-plus">+</code> '<code class="grammar-literal">-</code>'<code class="grammar-opt">?</code> <code class="grammar-paren">(</code>'<code class="grammar-literal">]</code>' <code class="grammar-diff">-</code> <a href="#grammar-production-LHS">LHS</a><code class="grammar-paren">)</code></td>
98
+ <td>'<code class="grammar-literal">[</code>' <code class="grammar-paren">(</code><code class="grammar-paren">(</code><a href="#grammar-production-R_CHAR">R_CHAR</a> '<code class="grammar-literal">-</code>' <a href="#grammar-production-R_CHAR">R_CHAR</a><code class="grammar-paren">)</code> <code class="grammar-alt">|</code> <code class="grammar-paren">(</code><a href="#grammar-production-HEX">HEX</a> '<code class="grammar-literal">-</code>' <a href="#grammar-production-HEX">HEX</a><code class="grammar-paren">)</code> <code class="grammar-alt">|</code> <a href="#grammar-production-R_CHAR">R_CHAR</a> <code class="grammar-alt">|</code> <a href="#grammar-production-HEX">HEX</a><code class="grammar-paren">)</code><code class="grammar-plus">+</code> '<code class="grammar-literal">-</code>'<code class="grammar-opt">?</code> '<code class="grammar-literal">]</code>'</td>
93
99
  </tr>
94
100
  <tr id="grammar-production-O_RANGE">
95
101
  <td>[15]</td>
data/etc/ebnf.ll1.rb CHANGED
@@ -1,4 +1,4 @@
1
- # This file is automatically generated by ebnf version 2.4.0
1
+ # This file is automatically generated by ebnf version 2.5.0
2
2
  # Derived from etc/ebnf.ebnf
3
3
  module Meta
4
4
  START = :ebnf
data/etc/ebnf.ll1.sxp CHANGED
@@ -100,13 +100,11 @@
100
100
  (seq '@pass' expression))
101
101
  (terminals _terminals (seq))
102
102
  (terminal LHS "11" (seq (opt (seq '[' SYMBOL ']' (plus ' '))) SYMBOL (star ' ') '::='))
103
- (terminal SYMBOL "12" (plus (alt (range "a-z") (range "A-Z") (range "0-9") '_' '.')))
103
+ (terminal SYMBOL "12" (alt (seq '<' O_SYMBOL '>') O_SYMBOL))
104
+ (terminal O_SYMBOL "12a" (plus (alt (range "a-z") (range "A-Z") (range "0-9") '_' '.')))
104
105
  (terminal HEX "13" (seq '#x' (plus (alt (range "a-f") (range "A-F") (range "0-9")))))
105
106
  (terminal RANGE "14"
106
- (seq '['
107
- (plus (alt (seq R_CHAR '-' R_CHAR) (seq HEX '-' HEX) R_CHAR HEX))
108
- (opt '-')
109
- (diff ']' LHS)) )
107
+ (seq '[' (plus (alt (seq R_CHAR '-' R_CHAR) (seq HEX '-' HEX) R_CHAR HEX)) (opt '-') ']'))
110
108
  (terminal O_RANGE "15"
111
109
  (seq '[^' (plus (alt (seq R_CHAR '-' R_CHAR) (seq HEX '-' HEX) R_CHAR HEX)) (opt '-') ']'))
112
110
  (terminal STRING1 "16" (seq '"' (star (diff CHAR '"')) '"'))
data/etc/ebnf.peg.rb CHANGED
@@ -1,4 +1,4 @@
1
- # This file is automatically generated by ebnf version 2.4.0
1
+ # This file is automatically generated by ebnf version 2.5.0
2
2
  # Derived from etc/ebnf.ebnf
3
3
  module EBNFMeta
4
4
  RULES = [
@@ -25,24 +25,25 @@ module EBNFMeta
25
25
  EBNF::Rule.new(:_LHS_3, "11.3", [:seq, "[", :SYMBOL, "]", :_LHS_4], kind: :terminal).extend(EBNF::PEG::Rule),
26
26
  EBNF::Rule.new(:_LHS_4, "11.4", [:plus, " "], kind: :terminal).extend(EBNF::PEG::Rule),
27
27
  EBNF::Rule.new(:_LHS_2, "11.2", [:star, " "], kind: :terminal).extend(EBNF::PEG::Rule),
28
- EBNF::Rule.new(:SYMBOL, "12", [:plus, :_SYMBOL_1], kind: :terminal).extend(EBNF::PEG::Rule),
29
- EBNF::Rule.new(:_SYMBOL_1, "12.1", [:alt, :_SYMBOL_2, :_SYMBOL_3, :_SYMBOL_4, "_", "."], kind: :terminal).extend(EBNF::PEG::Rule),
30
- EBNF::Rule.new(:_SYMBOL_2, "12.2", [:range, "a-z"], kind: :terminal).extend(EBNF::PEG::Rule),
31
- EBNF::Rule.new(:_SYMBOL_3, "12.3", [:range, "A-Z"], kind: :terminal).extend(EBNF::PEG::Rule),
32
- EBNF::Rule.new(:_SYMBOL_4, "12.4", [:range, "0-9"], kind: :terminal).extend(EBNF::PEG::Rule),
28
+ EBNF::Rule.new(:SYMBOL, "12", [:alt, :_SYMBOL_1, :O_SYMBOL], kind: :terminal).extend(EBNF::PEG::Rule),
29
+ EBNF::Rule.new(:_SYMBOL_1, "12.1", [:seq, "<", :O_SYMBOL, ">"], kind: :terminal).extend(EBNF::PEG::Rule),
30
+ EBNF::Rule.new(:O_SYMBOL, "12a", [:plus, :_O_SYMBOL_1], kind: :terminal).extend(EBNF::PEG::Rule),
31
+ EBNF::Rule.new(:_O_SYMBOL_1, "12a.1", [:alt, :_O_SYMBOL_2, :_O_SYMBOL_3, :_O_SYMBOL_4, "_", "."], kind: :terminal).extend(EBNF::PEG::Rule),
32
+ EBNF::Rule.new(:_O_SYMBOL_2, "12a.2", [:range, "a-z"], kind: :terminal).extend(EBNF::PEG::Rule),
33
+ EBNF::Rule.new(:_O_SYMBOL_3, "12a.3", [:range, "A-Z"], kind: :terminal).extend(EBNF::PEG::Rule),
34
+ EBNF::Rule.new(:_O_SYMBOL_4, "12a.4", [:range, "0-9"], kind: :terminal).extend(EBNF::PEG::Rule),
33
35
  EBNF::Rule.new(:HEX, "13", [:seq, "#x", :_HEX_1], kind: :terminal).extend(EBNF::PEG::Rule),
34
36
  EBNF::Rule.new(:_HEX_1, "13.1", [:plus, :_HEX_2], kind: :terminal).extend(EBNF::PEG::Rule),
35
37
  EBNF::Rule.new(:_HEX_2, "13.2", [:alt, :_HEX_3, :_HEX_4, :_HEX_5], kind: :terminal).extend(EBNF::PEG::Rule),
36
38
  EBNF::Rule.new(:_HEX_3, "13.3", [:range, "a-f"], kind: :terminal).extend(EBNF::PEG::Rule),
37
39
  EBNF::Rule.new(:_HEX_4, "13.4", [:range, "A-F"], kind: :terminal).extend(EBNF::PEG::Rule),
38
40
  EBNF::Rule.new(:_HEX_5, "13.5", [:range, "0-9"], kind: :terminal).extend(EBNF::PEG::Rule),
39
- EBNF::Rule.new(:RANGE, "14", [:seq, "[", :_RANGE_1, :_RANGE_2, :_RANGE_3], kind: :terminal).extend(EBNF::PEG::Rule),
40
- EBNF::Rule.new(:_RANGE_1, "14.1", [:plus, :_RANGE_4], kind: :terminal).extend(EBNF::PEG::Rule),
41
- EBNF::Rule.new(:_RANGE_4, "14.4", [:alt, :_RANGE_5, :_RANGE_6, :R_CHAR, :HEX], kind: :terminal).extend(EBNF::PEG::Rule),
42
- EBNF::Rule.new(:_RANGE_5, "14.5", [:seq, :R_CHAR, "-", :R_CHAR], kind: :terminal).extend(EBNF::PEG::Rule),
43
- EBNF::Rule.new(:_RANGE_6, "14.6", [:seq, :HEX, "-", :HEX], kind: :terminal).extend(EBNF::PEG::Rule),
41
+ EBNF::Rule.new(:RANGE, "14", [:seq, "[", :_RANGE_1, :_RANGE_2, "]"], kind: :terminal).extend(EBNF::PEG::Rule),
42
+ EBNF::Rule.new(:_RANGE_1, "14.1", [:plus, :_RANGE_3], kind: :terminal).extend(EBNF::PEG::Rule),
43
+ EBNF::Rule.new(:_RANGE_3, "14.3", [:alt, :_RANGE_4, :_RANGE_5, :R_CHAR, :HEX], kind: :terminal).extend(EBNF::PEG::Rule),
44
+ EBNF::Rule.new(:_RANGE_4, "14.4", [:seq, :R_CHAR, "-", :R_CHAR], kind: :terminal).extend(EBNF::PEG::Rule),
45
+ EBNF::Rule.new(:_RANGE_5, "14.5", [:seq, :HEX, "-", :HEX], kind: :terminal).extend(EBNF::PEG::Rule),
44
46
  EBNF::Rule.new(:_RANGE_2, "14.2", [:opt, "-"], kind: :terminal).extend(EBNF::PEG::Rule),
45
- EBNF::Rule.new(:_RANGE_3, "14.3", [:diff, "]", :LHS], kind: :terminal).extend(EBNF::PEG::Rule),
46
47
  EBNF::Rule.new(:O_RANGE, "15", [:seq, "[^", :_O_RANGE_1, :_O_RANGE_2, "]"], kind: :terminal).extend(EBNF::PEG::Rule),
47
48
  EBNF::Rule.new(:_O_RANGE_1, "15.1", [:plus, :_O_RANGE_3], kind: :terminal).extend(EBNF::PEG::Rule),
48
49
  EBNF::Rule.new(:_O_RANGE_3, "15.3", [:alt, :_O_RANGE_4, :_O_RANGE_5, :R_CHAR, :HEX], kind: :terminal).extend(EBNF::PEG::Rule),
data/etc/ebnf.peg.sxp CHANGED
@@ -22,24 +22,25 @@
22
22
  (terminal _LHS_3 "11.3" (seq '[' SYMBOL ']' _LHS_4))
23
23
  (terminal _LHS_4 "11.4" (plus ' '))
24
24
  (terminal _LHS_2 "11.2" (star ' '))
25
- (terminal SYMBOL "12" (plus _SYMBOL_1))
26
- (terminal _SYMBOL_1 "12.1" (alt _SYMBOL_2 _SYMBOL_3 _SYMBOL_4 '_' '.'))
27
- (terminal _SYMBOL_2 "12.2" (range "a-z"))
28
- (terminal _SYMBOL_3 "12.3" (range "A-Z"))
29
- (terminal _SYMBOL_4 "12.4" (range "0-9"))
25
+ (terminal SYMBOL "12" (alt _SYMBOL_1 O_SYMBOL))
26
+ (terminal _SYMBOL_1 "12.1" (seq '<' O_SYMBOL '>'))
27
+ (terminal O_SYMBOL "12a" (plus _O_SYMBOL_1))
28
+ (terminal _O_SYMBOL_1 "12a.1" (alt _O_SYMBOL_2 _O_SYMBOL_3 _O_SYMBOL_4 '_' '.'))
29
+ (terminal _O_SYMBOL_2 "12a.2" (range "a-z"))
30
+ (terminal _O_SYMBOL_3 "12a.3" (range "A-Z"))
31
+ (terminal _O_SYMBOL_4 "12a.4" (range "0-9"))
30
32
  (terminal HEX "13" (seq '#x' _HEX_1))
31
33
  (terminal _HEX_1 "13.1" (plus _HEX_2))
32
34
  (terminal _HEX_2 "13.2" (alt _HEX_3 _HEX_4 _HEX_5))
33
35
  (terminal _HEX_3 "13.3" (range "a-f"))
34
36
  (terminal _HEX_4 "13.4" (range "A-F"))
35
37
  (terminal _HEX_5 "13.5" (range "0-9"))
36
- (terminal RANGE "14" (seq '[' _RANGE_1 _RANGE_2 _RANGE_3))
37
- (terminal _RANGE_1 "14.1" (plus _RANGE_4))
38
- (terminal _RANGE_4 "14.4" (alt _RANGE_5 _RANGE_6 R_CHAR HEX))
39
- (terminal _RANGE_5 "14.5" (seq R_CHAR '-' R_CHAR))
40
- (terminal _RANGE_6 "14.6" (seq HEX '-' HEX))
38
+ (terminal RANGE "14" (seq '[' _RANGE_1 _RANGE_2 ']'))
39
+ (terminal _RANGE_1 "14.1" (plus _RANGE_3))
40
+ (terminal _RANGE_3 "14.3" (alt _RANGE_4 _RANGE_5 R_CHAR HEX))
41
+ (terminal _RANGE_4 "14.4" (seq R_CHAR '-' R_CHAR))
42
+ (terminal _RANGE_5 "14.5" (seq HEX '-' HEX))
41
43
  (terminal _RANGE_2 "14.2" (opt '-'))
42
- (terminal _RANGE_3 "14.3" (diff ']' LHS))
43
44
  (terminal O_RANGE "15" (seq '[^' _O_RANGE_1 _O_RANGE_2 ']'))
44
45
  (terminal _O_RANGE_1 "15.1" (plus _O_RANGE_3))
45
46
  (terminal _O_RANGE_3 "15.3" (alt _O_RANGE_4 _O_RANGE_5 R_CHAR HEX))
data/etc/ebnf.sxp CHANGED
@@ -12,13 +12,11 @@
12
12
  (rule pass "10" (seq '@pass' expression))
13
13
  (terminals _terminals (seq))
14
14
  (terminal LHS "11" (seq (opt (seq '[' SYMBOL ']' (plus ' '))) SYMBOL (star ' ') '::='))
15
- (terminal SYMBOL "12" (plus (alt (range "a-z") (range "A-Z") (range "0-9") '_' '.')))
15
+ (terminal SYMBOL "12" (alt (seq '<' O_SYMBOL '>') O_SYMBOL))
16
+ (terminal O_SYMBOL "12a" (plus (alt (range "a-z") (range "A-Z") (range "0-9") '_' '.')))
16
17
  (terminal HEX "13" (seq '#x' (plus (alt (range "a-f") (range "A-F") (range "0-9")))))
17
18
  (terminal RANGE "14"
18
- (seq '['
19
- (plus (alt (seq R_CHAR '-' R_CHAR) (seq HEX '-' HEX) R_CHAR HEX))
20
- (opt '-')
21
- (diff ']' LHS)) )
19
+ (seq '[' (plus (alt (seq R_CHAR '-' R_CHAR) (seq HEX '-' HEX) R_CHAR HEX)) (opt '-') ']'))
22
20
  (terminal O_RANGE "15"
23
21
  (seq '[^' (plus (alt (seq R_CHAR '-' R_CHAR) (seq HEX '-' HEX) R_CHAR HEX)) (opt '-') ']'))
24
22
  (terminal STRING1 "16" (seq '"' (star (diff CHAR '"')) '"'))
data/etc/iso-ebnf.isoebnf CHANGED
@@ -1,4 +1,3 @@
1
- (* W3C EBNF for ISO/IEC 14977 : 1996 EBNF *)
2
1
  (* Scoured from https://www.cl.cam.ac.uk/~mgk25/iso-14977.pdf *)
3
2
 
4
3
  syntax = syntax_rule, {syntax_rule} ;
@@ -44,10 +43,10 @@ repeated_sequence = start_repeat_symbol, definitions_list, end_repeat_symbol
44
43
  grouped_sequence = '(', definitions_list, ')'
45
44
  (* The brackets ( and ) allow any <definitions list> to be a <primary> *);
46
45
 
47
- terminal_string = ("'", first_terminal_character, {first_terminal_character}, "'")
48
- | ('"', second_terminal_character, {second_terminal_character}, '"')
49
- (* A <terminal string> represents the
50
- <characters> between the quote symbols '_' or "_" *);
46
+ terminal_string = ("'", first_terminal_character, {first_terminal_character}, "'")
47
+ | ('"', second_terminal_character, {second_terminal_character}, '"')
48
+ (* A <terminal string> represents the
49
+ <characters> between the quote symbols '_' or "_" *);
51
50
 
52
51
  meta_identifier = letter, {meta_identifier_character}
53
52
  (* A <meta identifier> is the name of a syntactic element of the language being defined *);
data/lib/ebnf/abnf.rb CHANGED
@@ -234,10 +234,10 @@ module EBNF
234
234
  # @return [EBNFParser]
235
235
  def initialize(input, **options)
236
236
  # If the `level` option is set, instantiate a logger for collecting trace information.
237
- if options.has_key?(:level)
238
- options[:logger] = Logger.new(STDERR)
239
- options[:logger].level = options[:level]
240
- options[:logger].formatter = lambda {|severity, datetime, progname, msg| "#{severity} #{msg}\n"}
237
+ if options.key?(:level)
238
+ options[:logger] ||= Logger.new(STDERR).
239
+ tap {|x| x.level = options[:level]}.
240
+ tap {|x| x.formatter = lambda {|severity, datetime, progname, msg| "#{severity} #{msg}\n"}}
241
241
  end
242
242
 
243
243
  # Read input, if necessary, which will be used in a Scanner.
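The switch from `=` to `||=` in the hunk above means a logger passed in by the caller should no longer be overwritten when `:level` is also given. A small sketch of that behaviour, assuming a hypothetical stand-alone helper `ensure_logger` in place of the parser's initializer:

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for the initializer logic: create a logger only when
# :level is present and the caller has not already supplied :logger.
def ensure_logger(options, io = $stderr)
  if options.key?(:level)
    options[:logger] ||= Logger.new(io).
      tap { |x| x.level = options[:level] }.
      tap { |x| x.formatter = lambda { |severity, _datetime, _progname, msg| "#{severity} #{msg}\n" } }
  end
  options
end

supplied = Logger.new(StringIO.new)
opts = ensure_logger({level: Logger::DEBUG, logger: supplied})
puts opts[:logger].equal?(supplied)
```

Note that in this form a caller-supplied logger keeps its own level and formatter; `:level` only takes effect when the logger is created here.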
data/lib/ebnf/base.rb CHANGED
@@ -106,8 +106,8 @@ module EBNF
106
106
  # Format of input, one of `:abnf`, `:ebnf`, `:isoebnf`, `:isoebnf`, `:native`, or `:sxp`.
107
107
  # Use `:native` for the native EBNF parser, rather than the PEG parser.
108
108
  # @param [Hash{Symbol => Object}] options
109
- # @option options [Boolean, Array] :debug
110
- # Output debug information to an array or $stdout.
109
+ # @option options [Boolean] :level
110
+ # Trace level. 0(debug), 1(info), 2(warn), 3(error).
111
111
  # @option options [Boolean, Array] :validate
112
112
  # Validate resulting grammar.
113
113
  def initialize(input, format: :ebnf, **options)
@@ -311,13 +311,7 @@ module EBNF
311
311
 
312
312
  # Progress output, less than debugging
313
313
  def progress(*args, **options)
314
- return unless @options[:progress] || @options[:debug]
315
- depth = options[:depth] || @depth
316
- args << yield if block_given?
317
- message = "#{args.join(': ')}"
318
- str = "[#{@lineno}]#{' ' * depth}#{message}"
319
- @options[:debug] << str if @options[:debug].is_a?(Array)
320
- $stderr.puts(str) if @options[:progress] || @options[:debug] == true
314
+ debug(*args, level: Logger::INFO, **options)
321
315
  end
322
316
 
323
317
  # Error output
@@ -325,10 +319,9 @@ module EBNF
325
319
  depth = options[:depth] || @depth
326
320
  args << yield if block_given?
327
321
  message = "#{args.join(': ')}"
322
+ debug(message, level: Logger::ERROR, **options)
328
323
  @errors << message
329
- str = "[#{@lineno}]#{' ' * depth}#{message}"
330
- @options[:debug] << str if @options[:debug].is_a?(Array)
331
- $stderr.puts(str)
324
+ $stderr.puts(message)
332
325
  end
333
326
 
334
327
  ##
@@ -342,14 +335,17 @@ module EBNF
342
335
  # @param [String] message ("")
343
336
  #
344
337
  # @yieldreturn [String] added to message
345
- def debug(*args, **options)
346
- return unless @options[:debug]
338
+ def debug(*args, level: Logger::DEBUG, **options)
339
+ return unless @options.key?(:logger)
347
340
  depth = options[:depth] || @depth
348
341
  args << yield if block_given?
349
342
  message = "#{args.join(': ')}"
350
343
  str = "[#{@lineno}]#{' ' * depth}#{message}"
351
- @options[:debug] << str if @options[:debug].is_a?(Array)
352
- $stderr.puts(str) if @options[:debug] == true
344
+ if @options[:logger].respond_to?(:add)
345
+ @options[:logger].add(level, str)
346
+ elsif @options[:logger].respond_to?(:<<)
347
+ @options[:logger] << "[#{lineno}] " + str
348
+ end
353
349
  end
354
350
  end
355
351
  end
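The reworked `debug` method above accepts either a `Logger` or any object that responds to `<<`, such as an `Array`. The dispatch can be sketched on its own as follows; `emit` is a hypothetical helper name, and the line-number and depth formatting done by the real method is left out.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper mirroring the dispatch in `debug`: prefer Logger#add,
# which honours severity levels, and fall back to `<<` for plain collectors.
def emit(sink, level, str)
  if sink.respond_to?(:add)
    sink.add(level, str)
  elsif sink.respond_to?(:<<)
    sink << str
  end
end

collected = []
emit(collected, Logger::INFO, "[1] rule start")
p collected
```

Because an `Array` has no notion of severity, everything sent to it is kept, whereas a `Logger` filters messages below its configured level.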
@@ -1,4 +1,4 @@
1
- # This file is automatically generated by ebnf version 2.0.0
1
+ # This file is automatically generated by ebnf version 2.5.0
2
2
  # Derived from etc/ebnf.ebnf
3
3
  module EBNFMeta
4
4
  RULES = [
@@ -25,11 +25,13 @@ module EBNFMeta
25
25
  EBNF::Rule.new(:_LHS_3, "11.3", [:seq, "[", :SYMBOL, "]", :_LHS_4], kind: :terminal).extend(EBNF::PEG::Rule),
26
26
  EBNF::Rule.new(:_LHS_4, "11.4", [:plus, " "], kind: :terminal).extend(EBNF::PEG::Rule),
27
27
  EBNF::Rule.new(:_LHS_2, "11.2", [:star, " "], kind: :terminal).extend(EBNF::PEG::Rule),
28
- EBNF::Rule.new(:SYMBOL, "12", [:plus, :_SYMBOL_1], kind: :terminal).extend(EBNF::PEG::Rule),
29
- EBNF::Rule.new(:_SYMBOL_1, "12.1", [:alt, :_SYMBOL_2, :_SYMBOL_3, :_SYMBOL_4, "_", "."], kind: :terminal).extend(EBNF::PEG::Rule),
30
- EBNF::Rule.new(:_SYMBOL_2, "12.2", [:range, "a-z"], kind: :terminal).extend(EBNF::PEG::Rule),
31
- EBNF::Rule.new(:_SYMBOL_3, "12.3", [:range, "A-Z"], kind: :terminal).extend(EBNF::PEG::Rule),
32
- EBNF::Rule.new(:_SYMBOL_4, "12.4", [:range, "0-9"], kind: :terminal).extend(EBNF::PEG::Rule),
28
+ EBNF::Rule.new(:SYMBOL, "12", [:alt, :_SYMBOL_1, :O_SYMBOL], kind: :terminal).extend(EBNF::PEG::Rule),
29
+ EBNF::Rule.new(:_SYMBOL_1, "12.1", [:seq, "<", :O_SYMBOL, ">"], kind: :terminal).extend(EBNF::PEG::Rule),
30
+ EBNF::Rule.new(:O_SYMBOL, "12a", [:plus, :_O_SYMBOL_1], kind: :terminal).extend(EBNF::PEG::Rule),
31
+ EBNF::Rule.new(:_O_SYMBOL_1, "12a.1", [:alt, :_O_SYMBOL_2, :_O_SYMBOL_3, :_O_SYMBOL_4, "_", "."], kind: :terminal).extend(EBNF::PEG::Rule),
32
+ EBNF::Rule.new(:_O_SYMBOL_2, "12a.2", [:range, "a-z"], kind: :terminal).extend(EBNF::PEG::Rule),
33
+ EBNF::Rule.new(:_O_SYMBOL_3, "12a.3", [:range, "A-Z"], kind: :terminal).extend(EBNF::PEG::Rule),
34
+ EBNF::Rule.new(:_O_SYMBOL_4, "12a.4", [:range, "0-9"], kind: :terminal).extend(EBNF::PEG::Rule),
33
35
  EBNF::Rule.new(:HEX, "13", [:seq, "#x", :_HEX_1], kind: :terminal).extend(EBNF::PEG::Rule),
34
36
  EBNF::Rule.new(:_HEX_1, "13.1", [:plus, :_HEX_2], kind: :terminal).extend(EBNF::PEG::Rule),
35
37
  EBNF::Rule.new(:_HEX_2, "13.2", [:alt, :_HEX_3, :_HEX_4, :_HEX_5], kind: :terminal).extend(EBNF::PEG::Rule),
data/lib/ebnf/isoebnf.rb CHANGED
@@ -196,10 +196,10 @@ module EBNF
196
196
  # @return [EBNFParser]
197
197
  def initialize(input, **options, &block)
198
198
  # If the `level` option is set, instantiate a logger for collecting trace information.
199
- if options.has_key?(:level)
200
- options[:logger] = Logger.new(STDERR)
201
- options[:logger].level = options[:level]
202
- options[:logger].formatter = lambda {|severity, datetime, progname, msg| "#{severity} #{msg}\n"}
199
+ if options.key?(:level)
200
+ options[:logger] ||= Logger.new(STDERR).
201
+ tap {|x| x.level = options[:level]}.
202
+ tap {|x| x.formatter = lambda {|severity, datetime, progname, msg| "#{severity} #{msg}\n"}}
203
203
  end
204
204
 
205
205
  # Read input, if necessary, which will be used in a Scanner.
@@ -603,7 +603,7 @@ module EBNF::LL1
603
603
  if handler
604
604
  # Create a new production data element, potentially allowing handler
605
605
  # to customize before pushing on the @prod_data stack
606
- debug("#{prod}(:start):#{@prod_data.length}") {@prod_data.last}
606
+ progress("#{prod}(:start):#{@prod_data.length}") {@prod_data.last}
607
607
  data = {}
608
608
  begin
609
609
  self.class.eval_with_binding(self) {
@@ -617,12 +617,12 @@ module EBNF::LL1
617
617
  elsif [:merge, :star].include?(@cleanup[prod])
618
618
  # Save current data to merge later
619
619
  @prod_data << {}
620
- debug("#{prod}(:start}:#{@prod_data.length}:cleanup:#{@cleanup[prod]}") { get_token.inspect + (@recovering ? ' recovering' : '')}
620
+ progress("#{prod}(:start}:#{@prod_data.length}:cleanup:#{@cleanup[prod]}") { get_token.inspect + (@recovering ? ' recovering' : '')}
621
621
  else
622
622
  # Make sure we push as many was we pop, even if there is no
623
623
  # explicit start handler
624
624
  @prod_data << {} if self.class.production_handlers[prod]
625
- debug("#{prod}(:start:#{@prod_data.length})") { get_token.inspect + (@recovering ? ' recovering' : '')}
625
+ progress("#{prod}(:start:#{@prod_data.length})") { get_token.inspect + (@recovering ? ' recovering' : '')}
626
626
  end
627
627
  #puts "prod_data(s): " + @prod_data.inspect
628
628
  end
data/lib/ebnf/native.rb CHANGED
@@ -52,7 +52,7 @@ module EBNF
52
52
  yield r unless r.empty?
53
53
  #debug("eachRule(rule)") { "[#{cur_lineno}] #{s.inspect}" }
54
54
  @lineno = cur_lineno
55
- r = s
55
+ r = s.gsub(/[<>]/, '') # Remove angle brackets
56
56
  else
57
57
  # Collect until end of line, or start of comment or quote
58
58
  s = scanner.scan_until(%r{(?:[/\(]\*)|#(?!x)|//|["']|$})
@@ -81,6 +81,7 @@ module EBNF
81
81
  num, sym = num_sym.split(']', 2).map(&:strip)
82
82
  num, sym = "", num if sym.nil?
83
83
  num = num[1..-1]
84
+ sym = sym[1..-2] if sym.start_with?('<') && sym.end_with?('>')
84
85
  r = Rule.new(sym && sym.to_sym, num, expression(expr).first, ebnf: self)
85
86
  debug("ruleParts") { r.inspect }
86
87
  r
@@ -226,7 +227,7 @@ module EBNF
226
227
  # (a ' b c')
227
228
  #
228
229
  # >>> postfix("a? b c")
229
- # ((opt, a) ' b c')
230
+ # ((opt a) ' b c')
230
231
  def postfix(s)
231
232
  debug("postfix") {"(#{s.inspect})"}
232
233
  e, s = depth {primary(s)}
@@ -297,8 +298,8 @@ module EBNF
297
298
  s.match(/(#x\h+)(.*)$/)
298
299
  l, s = $1, $2
299
300
  [[:hex, l], s]
300
- when /[\w\.]/ # SYMBOL
301
- s.match(/([\w\.]+)(.*)$/)
301
+ when '<', /[\w\.]/ # SYMBOL
302
+ s.match(/<?([\w\.]+)>?(.*)$/)
302
303
  l, s = $1, $2
303
304
  [l.to_sym, s]
304
305
  when '-'
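The new SYMBOL pattern in the native parser treats the angle brackets as optional and captures only the name. A short sketch of that regular expression in isolation, wrapped in a hypothetical helper `leading_symbol` that is not part of the gem:

```ruby
# Hypothetical helper: split a leading symbol, with or without angle
# brackets, from the remainder of the expression string.
def leading_symbol(s)
  md = s.match(/<?([\w\.]+)>?(.*)$/)
  # md[1] is the bare symbol name; md[2] is whatever follows it.
  [md[1].to_sym, md[2]]
end

p leading_symbol("<expr> b c")
p leading_symbol("expr b c")
```

Since each bracket is optional independently, this pattern on its own would also accept an unbalanced form such as `<expr`.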
data/lib/ebnf/parser.rb CHANGED
@@ -11,6 +11,12 @@ module EBNF
11
11
  # @return [Array<EBNF::Rule>]
12
12
  attr_reader :ast
13
13
 
14
+ # Set on first rule
15
+ attr_reader :lhs_includes_identifier
16
+
17
+ # Regular expression to match a [...] range, which may be distinguished from an LHS
18
+ attr_reader :range
19
+
14
20
  # ## Terminals
15
21
  # Define rules for Terminals, placing results on the input stack, making them available to upstream non-Terminal rules.
16
22
  #
@@ -26,15 +32,32 @@ module EBNF
  # Match the Left hand side of a rule or terminal
  #
- # [11] LHS ::= ('[' SYMBOL+ ']' ' '+)? SYMBOL ' '* '::='
+ # [11] LHS ::= ('[' SYMBOL+ ']' ' '+)? <? SYMBOL >? ' '* '::='
  terminal(:LHS, LHS) do |value, prod|
- value.to_s.scan(/(?:\[([^\]]+)\])?\s*(\w+)\s*::=/).first
+ md = value.to_s.scan(/(?:\[([^\]]+)\])?\s*<?(\w+)>?\s*::=/).first
+ if @lhs_includes_identifier.nil?
+ @lhs_includes_identifier = !md[0].nil?
+ @range = md[0] ? RANGE_NOT_LHS : RANGE
+ elsif @lhs_includes_identifier && !md[0]
+ error("LHS",
+ "Rule does not begin with a [xxx] identifier, which was established on the first rule",
+ production: :LHS,
+ rest: value)
+ elsif !@lhs_includes_identifier && md[0]
+ error("LHS",
+ "Rule begins with a [xxx] identifier, which was not established on the first rule",
+ production: :LHS,
+ rest: value)
+ end
+ md
  end

  # Match `SYMBOL` terminal
  #
- # [12] SYMBOL ::= ([a-z] | [A-Z] | [0-9] | '_' | '.')+
+ # [12] SYMBOL ::= '<' O_SYMBOL '>' | O_SYMBOL
+ # [12a] O_SYMBOL ::= ([a-z] | [A-Z] | [0-9] | '_' | '.')+
  terminal(:SYMBOL, SYMBOL) do |value|
+ value = value[1..-2] if value.start_with?('<') && value.end_with?('>')
  value.to_sym
  end
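The LHS handler above fixes, on the first rule, whether rules carry a `[xxx]` identifier, and reports an error for any later rule that disagrees. A rough standalone sketch of that check follows. `LhsChecker` is a hypothetical class, and it raises `ArgumentError` where the gem calls its own `error` helper; only the scanning pattern is taken from the hunk.

```ruby
# Pattern copied from the hunk; everything else is illustrative.
LHS_PARTS = /(?:\[([^\]]+)\])?\s*<?(\w+)>?\s*::=/

class LhsChecker
  def initialize
    @includes_identifier = nil  # decided by the first rule seen
  end

  # Returns [identifier_or_nil, symbol_name] or raises on inconsistency.
  def check(lhs)
    id, sym = lhs.scan(LHS_PARTS).first
    if @includes_identifier.nil?
      @includes_identifier = !id.nil?
    elsif @includes_identifier != !id.nil?
      raise ArgumentError, "inconsistent use of [xxx] identifier in #{lhs.inspect}"
    end
    [id, sym]
  end
end

checker = LhsChecker.new
p checker.check("[1] ebnf ::=")          # establishes that identifiers are used
p checker.check("[2] <declaration> ::=") # consistent with the first rule
```

A subsequent `checker.check("expression ::=")` would raise in this sketch, since the first rule established that identifiers are present.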
@@ -46,9 +69,10 @@ module EBNF
  end

  # Terminal for `RANGE` is matched as part of a `primary` rule.
+ # Note that this won't match if rules include identifiers.
  #
- # [14] RANGE ::= '[' ((R_CHAR '-' R_CHAR) | (HEX '-' HEX) | R_CHAR | HEX)+ '-'? ']' - LHS
- terminal(:RANGE, RANGE) do |value|
+ # [14] RANGE ::= '[' ((R_CHAR '-' R_CHAR) | (HEX '-' HEX) | R_CHAR | HEX)+ '-'? ']'
+ terminal(:RANGE, proc {@range}) do |value|
  [:range, value[1..-2]]
  end
@@ -128,7 +152,9 @@ module EBNF
  # Invoke callback
  id, sym = value[:LHS]
  expression = value[:expression]
- callback.call(:rule, EBNF::Rule.new(sym.to_sym, id, expression))
+ rule = EBNF::Rule.new(sym.to_sym, id, expression)
+ progress(:rule, rule.to_sxp)
+ callback.call(:rule, rule)
  nil
  end
@@ -266,12 +292,15 @@ module EBNF
  # @return [EBNFParser]
  def initialize(input, **options, &block)
  # If the `level` option is set, instantiate a logger for collecting trace information.
- if options.has_key?(:level)
- options[:logger] = Logger.new(STDERR)
- options[:logger].level = options[:level]
- options[:logger].formatter = lambda {|severity, datetime, progname, msg| "#{severity} #{msg}\n"}
+ if options.key?(:level)
+ options[:logger] ||= Logger.new(STDERR).
+ tap {|x| x.level = options[:level]}.
+ tap {|x| x.formatter = lambda {|severity, datetime, progname, msg| "#{severity} #{msg}\n"}}
  end

+ # This is established on the first rule.
+ self.class.instance_variable_set(:@lhs_includes_identifier, nil)
+
  # Read input, if necessary, which will be used in a Scanner.
  @input = input.respond_to?(:read) ? input.read : input.to_s
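One visible effect of the logger change is that `||=` keeps a logger supplied by the caller, whereas the removed lines always replaced it when `:level` was given. A minimal sketch of the new construction, assuming a hypothetical `ensure_logger` helper and a plain options hash:

```ruby
require 'logger'

# Hypothetical helper; `options` plays the role of the parser's option hash.
def ensure_logger(options)
  if options.key?(:level)
    # Build a default logger only when the caller did not supply one.
    options[:logger] ||= Logger.new($stderr).
      tap { |l| l.level = options[:level] }.
      tap { |l| l.formatter = ->(severity, _time, _progname, msg) { "#{severity} #{msg}\n" } }
  end
  options
end

custom = Logger.new(nil)  # a caller-supplied logger that discards output
opts = ensure_logger({level: Logger::WARN, logger: custom})
puts opts[:logger].equal?(custom)
```

In this sketch a caller-supplied logger also keeps its own level and formatter, because the `tap` blocks only run on the newly created default.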
data/lib/ebnf/peg/parser.rb CHANGED
@@ -68,10 +68,9 @@ module EBNF::PEG
  #
  # @param [Symbol] term
  # The terminal name.
- # @param [Regexp] regexp (nil)
- # Pattern used to scan for this terminal,
- # defaults to the expression defined in the associated rule.
- # If unset, the terminal rule is used for matching.
+ # @param [Regexp, Proc] regexp
+ # Pattern used to scan for this terminal.
+ # Passing a Proc will evaluate that proc to retrieve a regular expression.
  # @param [Hash] options
  # @option options [Boolean] :unescape
  # Cause strings and codepoints to be unescaped.
@@ -83,8 +82,8 @@ module EBNF::PEG
  # @yieldparam [Proc] block
  # Block passed to initialization for yielding to calling parser.
  # Should conform to the yield specs for #initialize
- def terminal(term, regexp = nil, **options, &block)
- terminal_regexps[term] = regexp if regexp
+ def terminal(term, regexp, **options, &block)
+ terminal_regexps[term] = regexp
  terminal_handlers[term] = block if block_given?
  terminal_options[term] = options.freeze
  end
@@ -138,6 +137,8 @@ module EBNF::PEG
  # @yieldparam [Proc] block
  # Block passed to initialization for yielding to calling parser.
  # Should conform to the yield specs for #initialize
+ # @yieldparam [Hash] **options
+ # Other data that may be passed to the production
  # @yieldreturn [Object] the result of this production.
  # Yield to generate a triple
  def production(term, clear_packrat: false, &block)
@@ -183,6 +184,8 @@ module EBNF::PEG
  # Identify the symbol of the starting rule with `start`.
  # @param [Hash{Symbol => Object}] options
  # @option options[Integer] :high_water passed to lexer
+ # @option options[:upper, :lower] :insensitive_strings
+ # Perform case-insensitive match of strings not defined as terminals, and map to either upper or lower case.
  # @option options [Logger] :logger for errors/progress/debug.
  # @option options[Integer] :low_water passed to lexer
  # @option options[Boolean] :seq_hash (false)
@@ -201,7 +204,7 @@ module EBNF::PEG
  # or errors raised during processing callbacks. Internal
  # errors are raised using {Error}.
  # @todo FIXME implement seq_hash
- def parse(input = nil, start = nil, rules = nil, **options, &block)
+ def parse(input = nil, start = nil, rules = nil, insensitive_strings: nil, **options, &block)
  start ||= options[:start]
  rules ||= options[:rules] || []
  @rules = rules.inject({}) {|memo, rule| memo.merge(rule.sym => rule)}
@@ -230,7 +233,7 @@ module EBNF::PEG
  start_rule = @rules[start]
  raise Error, "Starting production #{start.inspect} not defined" unless start_rule

- result = start_rule.parse(scanner)
+ result = start_rule.parse(scanner, insensitive_strings: insensitive_strings)
  if result == :unmatched
  # Start rule wasn't matched, which is about the only error condition
  error("--top--", @furthest_failure.to_s,
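The `insensitive_strings` option threaded through `parse` controls how literal strings are scanned and normalised in the rule code. The following sketch isolates that step; `scan_string` is a hypothetical function, though the `case` expression follows the one that appears in `lib/ebnf/peg/rule.rb` in this diff.

```ruby
require 'strscan'

# Hypothetical standalone version of the string-matching step.
def scan_string(input, literal, insensitive_strings: nil)
  opts = insensitive_strings ? Regexp::IGNORECASE : 0
  s = input.scan(Regexp.new(Regexp.quote(literal), opts))
  # Map a successful match to the requested case; nil becomes :unmatched.
  case insensitive_strings
  when :lower then s && s.downcase
  when :upper then s && s.upcase
  else s
  end || :unmatched
end

p scan_string(StringScanner.new("Select *"), "SELECT", insensitive_strings: :upper)
p scan_string(StringScanner.new("Select *"), "SELECT", insensitive_strings: :lower)
p scan_string(StringScanner.new("Select *"), "SELECT")
```

With the option unset the match is case-sensitive, so the last call does not match the mixed-case input.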
@@ -367,21 +370,17 @@ module EBNF::PEG
  # Start for production
  # Adds data avoiable during the processing of the production
  #
+ # @param [Symbol] prod
+ # @param [Hash] **options other options available for handlers
  # @return [Hash] composed of production options. Currently only `as_hash` is supported.
  # @see ClassMethods#start_production
- def onStart(prod)
+ def onStart(prod, **options)
  handler = self.class.start_handlers[prod]
  @productions << prod
- debug("#{prod}(:start)", "",
- lineno: (scanner.lineno if scanner),
- pos: (scanner.pos if scanner)
- ) do
- "#{prod}, pos: #{scanner ? scanner.pos : '?'}, rest: #{scanner ? scanner.rest[0..20].inspect : '?'}"
- end
  if handler
  # Create a new production data element, potentially allowing handler
  # to customize before pushing on the @prod_data stack
- data = {_production: prod}
+ data = {_production: prod}.merge(options)
  begin
  self.class.eval_with_binding(self) {
  handler.call(data, @parse_callback)
@@ -396,14 +395,21 @@ module EBNF::PEG
  # explicit start handler
  @prod_data << {_production: prod}
  end
+ progress("#{prod}(:start)", "",
+ lineno: (scanner.lineno if scanner),
+ pos: (scanner.pos if scanner)
+ ) do
+ "#{data.inspect}@(#{scanner ? scanner.pos : '?'}), rest: #{scanner ? scanner.rest[0..20].inspect : '?'}"
+ end
  return self.class.start_options.fetch(prod, {}) # any options on this production
  end

  # Finish of production
  #
  # @param [Object] result parse result
+ # @param [Hash] **options other options available for handlers
  # @return [Object] parse result, or the value returned from the handler
- def onFinish(result)
+ def onFinish(result, **options)
  #puts "prod_data(f): " + @prod_data.inspect
  prod = @productions.last
  handler, clear_packrat = self.class.production_handlers[prod]
@@ -415,14 +421,14 @@ module EBNF::PEG
  # Pop production data element from stack, potentially allowing handler to use it
  result = begin
  self.class.eval_with_binding(self) {
- handler.call(result, data, @parse_callback)
+ handler.call(result, data, @parse_callback, **options)
  }
  rescue ArgumentError, Error => e
  error("finish", "#{e.class}: #{e.message}", production: prod, backtrace: e.backtrace)
  @recovering = false
  end
  end
- debug("#{prod}(:finish)", "",
+ progress("#{prod}(:finish)", "",
  lineno: (scanner.lineno if scanner),
  level: result == :unmatched ? 0 : 1) do
  "#{result.inspect}@(#{scanner ? scanner.pos : '?'}), rest: #{scanner ? scanner.rest[0..20].inspect : '?'}"
@@ -572,5 +578,5 @@ module EBNF::PEG
  super(message.to_s)
  end
  end # class Error
- end # class Parser
- end # module EBNF::LL1
+ end # module Parser
+ end # module EBNF::PEG
data/lib/ebnf/peg/rule.rb CHANGED
@@ -13,7 +13,7 @@ module EBNF::PEG
  ##
  # Parse a rule or terminal, invoking callbacks, as appropriate

- # If there is are `start_production` and/or `production`,
+ # If there are `start_production` and/or `production` handlers,
  # they are invoked with a `prod_data` stack, the input stream and offset.
  # Otherwise, the results are added as an array value
  # to a hash indexed by the rule name.
@@ -31,8 +31,9 @@ module EBNF::PEG
  # * `star`: returns an array of the values matched for the specified production. For Terminals, these are concatenated into a single string.
  #
  # @param [Scanner] input
+ # @param [Hash] **options Other data that may be passed to handlers.
  # @return [Hash{Symbol => Object}, :unmatched] A hash with keys for matched component of the expression. Returns :unmatched if the input does not match the production.
- def parse(input)
+ def parse(input, **options)
  # Save position and linenumber for backtracking
  pos, lineno = input.pos, input.lineno
@@ -48,6 +49,7 @@ module EBNF::PEG
  # use that to match the input,
  # otherwise,
  if regexp = parser.terminal_regexp(sym)
+ regexp = regexp.call() if regexp.is_a?(Proc)
  term_opts = parser.terminal_options(sym)
  if matched = input.scan(regexp)
  # Optionally map matched
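Accepting a Proc where a Regexp is expected lets the pattern be chosen at match time rather than at definition time, which is how the `RANGE` terminal can switch patterns after the first rule is seen. A small sketch of the idea with a hypothetical registry; the patterns are simplified stand-ins and none of the names below are the gem's:

```ruby
require 'strscan'

# Hypothetical registry: values are either a Regexp or a Proc returning one.
current_range = /\[[a-z]\]/  # stands in for state decided during parsing
registry = {
  RANGE: proc { current_range },  # resolved lazily, at match time
  HEX:   /\#x\h+/
}

# Resolve an entry to a Regexp, calling it first if it is a Proc.
def resolve(entry)
  entry.is_a?(Proc) ? entry.call : entry
end

scanner = StringScanner.new("[a]#x20")
p scanner.scan(resolve(registry[:RANGE]))
p scanner.scan(resolve(registry[:HEX]))
```

Because the Proc is called on every use, reassigning the variable it closes over changes which pattern later matches use.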
@@ -71,12 +73,12 @@ module EBNF::PEG
  else
  eat_whitespace(input)
  end
- start_options = parser.onStart(sym)
+ start_options = options.merge(parser.onStart(sym, **options))
  string_regexp_opts = start_options[:insensitive_strings] ? Regexp::IGNORECASE : 0

  result = case expr.first
  when :alt
- # Return the first expression to match.
+ # Return the first expression to match. Look at strings before terminals before non-terminals, with strings ordered by longest first
  # Result is either :unmatched, or the value of the matching rule
  alt = :unmatched
  expr[1..-1].each do |prod|
@@ -84,14 +86,19 @@ module EBNF::PEG
  when Symbol
  rule = parser.find_rule(prod)
  raise "No rule found for #{prod}" unless rule
- rule.parse(input)
+ rule.parse(input, **options)
  when String
- s = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))
- case start_options[:insensitive_strings]
- when :lower then s && s.downcase
- when :upper then s && s.upcase
- else s
- end || :unmatched
+ # If the input matches a terminal for which the string is a prefix, don't match the string
+ if terminal_also_matches(input, prod, string_regexp_opts)
+ :unmatched
+ else
+ s = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))
+ case start_options[:insensitive_strings]
+ when :lower then s && s.downcase
+ when :upper then s && s.upcase
+ else s
+ end || :unmatched
+ end
  end
  if alt == :unmatched
  # Update furthest failure for strings and terminals
@@ -127,9 +134,18 @@ module EBNF::PEG
  when Symbol
  rule = parser.find_rule(prod)
  raise "No rule found for #{prod}" unless rule
- rule.parse(input)
+ rule.parse(input, **options)
  when String
- input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts)) || :unmatched
+ if terminal_also_matches(input, prod, string_regexp_opts)
+ :unmatched
+ else
+ s = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))
+ case start_options[:insensitive_strings]
+ when :lower then s && s.downcase
+ when :upper then s && s.upcase
+ else s
+ end || :unmatched
+ end
  end
  if res != :unmatched
  # Update furthest failure for terminals
@@ -148,7 +164,7 @@ module EBNF::PEG
  when :plus
  # Result is an array of all expressions while they match,
  # at least one must match
- plus = rept(input, 1, '*', expr[1], string_regexp_opts)
+ plus = rept(input, 1, '*', expr[1], string_regexp_opts, **options)

  # Update furthest failure for strings and terminals
  parser.update_furthest_failure(input.pos, input.lineno, expr[1]) if terminal?
@@ -163,7 +179,7 @@ module EBNF::PEG
  when :rept
  # Result is an array of all expressions while they match,
  # an empty array of none match
- rept = rept(input, expr[1], expr[2], expr[3], string_regexp_opts)
+ rept = rept(input, expr[1], expr[2], expr[3], string_regexp_opts, **options)

  # # Update furthest failure for strings and terminals
  parser.update_furthest_failure(input.pos, input.lineno, expr[3]) if terminal?
@@ -176,14 +192,18 @@ module EBNF::PEG
  when Symbol
  rule = parser.find_rule(prod)
  raise "No rule found for #{prod}" unless rule
- rule.parse(input)
+ rule.parse(input, **options.merge(_rept_data: accumulator))
  when String
- s = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))
- case start_options[:insensitive_strings]
- when :lower then s && s.downcase
- when :upper then s && s.upcase
- else s
- end || :unmatched
+ if terminal_also_matches(input, prod, string_regexp_opts)
+ :unmatched
+ else
+ s = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))
+ case start_options[:insensitive_strings]
+ when :lower then s && s.downcase
+ when :upper then s && s.upcase
+ else s
+ end || :unmatched
+ end
  end
  if res == :unmatched
  # Update furthest failure for strings and terminals
@@ -204,7 +224,7 @@ module EBNF::PEG
  when :star
  # Result is an array of all expressions while they match,
  # an empty array of none match
- star = rept(input, 0, '*', expr[1], string_regexp_opts)
+ star = rept(input, 0, '*', expr[1], string_regexp_opts, **options)

  # Update furthest failure for strings and terminals
  parser.update_furthest_failure(input.pos, input.lineno, expr[1]) if terminal?
@@ -214,10 +234,11 @@ module EBNF::PEG
  end

  if result == :unmatched
+ # Rewind input to entry point if unmatched.
  input.pos, input.lineno = pos, lineno
  end

- result = parser.onFinish(result)
+ result = parser.onFinish(result, **options)
  (parser.packrat[sym] ||= {})[pos] = {
  pos: input.pos,
  lineno: input.lineno,
@@ -229,7 +250,8 @@ module EBNF::PEG
  ##
  # Repitition, 0-1, 0-n, 1-n, ...
  #
- # Note, nil results are removed from the result, but count towards min/max calculations
+ # Note, nil results are removed from the result, but count towards min/max calculations.
+ # Saves temporary production data to prod_data stack.
  #
  # @param [Scanner] input
  # @param [Integer] min
@@ -245,11 +267,12 @@ module EBNF::PEG
  when Symbol
  rule = parser.find_rule(prod)
  raise "No rule found for #{prod}" unless rule
- while (max == '*' || result.length < max) && (res = rule.parse(input)) != :unmatched
+ while (max == '*' || result.length < max) && (res = rule.parse(input, **options.merge(_rept_data: result))) != :unmatched
  eat_whitespace(input) unless terminal?
  result << res
  end
  when String
+ # FIXME: don't match a string, if input matches a terminal
  while (res = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))) && (max == '*' || result.length < max)
  eat_whitespace(input) unless terminal?
  result << case options[:insensitive_strings]
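The `_rept_data` option appears intended to let each iteration of a repetition see what earlier iterations produced. A simplified sketch of that pattern follows, assuming a hypothetical `rept_sketch` that consumes an array of tokens and yields to a block in place of `rule.parse`:

```ruby
# Illustrative only: tokens stand in for scanner input, the block for rule.parse.
def rept_sketch(tokens, min, max, **options)
  result = []
  while (max == '*' || result.length < max) && !tokens.empty?
    # Each iteration receives the results accumulated so far.
    res = yield(tokens.shift, **options.merge(_rept_data: result))
    break if res == :unmatched
    result << res
  end
  result.length < min ? :unmatched : result.compact
end

seen = []
out = rept_sketch(%w[a b c], 0, '*') do |tok, **opts|
  seen << opts[:_rept_data].length  # how many results preceded this one
  tok.upcase
end
p out
p seen
```

As in the hunk, the same array object is passed each time, so a handler that keeps a reference to it will observe later additions.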
@@ -263,6 +286,16 @@ module EBNF::PEG
  result.length < min ? :unmatched : result.compact
  end

+ ##
+ # See if a terminal could have a longer match than a string
+ def terminal_also_matches(input, prod, string_regexp_opts)
+ str_regex = Regexp.new(Regexp.quote(prod), string_regexp_opts)
+ input.match?(str_regex) && parser.class.terminal_regexps.any? do |sym, re|
+ re = re.call() if re.is_a?(Proc)
+ (match_len = input.match?(re)) && match_len > prod.length
+ end
+ end
+
  ##
  # Eat whitespace between non-terminal rules
  def eat_whitespace(input)
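The new `terminal_also_matches` check keeps a literal such as `"if"` from matching when some terminal would consume more of the input at the same position. A standalone sketch using `StringScanner#match?`, which returns the match length without advancing; the terminal table and the method name with a trailing `?` are hypothetical:

```ruby
require 'strscan'

# Hypothetical terminal table; the gem reads these from the parser class.
TERMINALS = { IDENT: /[a-z]+/ }

def terminal_also_matches?(input, literal, terminals)
  # The literal must match here, and some terminal must match strictly more.
  input.match?(Regexp.new(Regexp.quote(literal))) &&
    terminals.any? { |_sym, re| (len = input.match?(re)) && len > literal.length }
end

p terminal_also_matches?(StringScanner.new("iffy = 1"), "if", TERMINALS)
p terminal_also_matches?(StringScanner.new("if x"), "if", TERMINALS)
```

In the first call `IDENT` matches four characters against the literal's two, so the literal is rejected; in the second both match two characters and the literal is allowed.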
data/lib/ebnf/terminals.rb CHANGED
@@ -1,13 +1,14 @@
  # encoding: utf-8
  # Terminal definitions for the EBNF grammar
  module EBNF::Terminals
- SYMBOL_BASE = %r(\b[a-zA-Z0-9_\.]+\b)u.freeze
- SYMBOL = %r(#{SYMBOL_BASE}(?!\s*::=))u.freeze
+ SYMBOL_BASE = %r(\b[a-zA-Z0-9_\.]+\b)u.freeze # Word boundaries
+ SYMBOL = %r((?:#{SYMBOL_BASE}|(?:<#{SYMBOL_BASE}>))(?!\s*::=))u.freeze
  HEX = %r(\#x\h+)u.freeze
  CHAR = %r([\u0009\u000A\u000D\u0020-\uD7FF\u{10000}-\u{10FFFF}])u.freeze
  R_CHAR = %r([\u0009\u000A\u000D\u0020-\u002C\u002E-\u005C\u005E-\uD7FF\u{10000}-\u{10FFFF}])u.freeze
- RANGE = %r(\[(?:(?:#{R_CHAR}\-#{R_CHAR})|(?:#{HEX}\-#{HEX})|#{R_CHAR}|#{HEX})+-?\](?!\s+#{SYMBOL_BASE}\s*::=))u.freeze
- LHS = %r((?:\[#{SYMBOL_BASE}\])?\s*#{SYMBOL_BASE}\s*::=)u.freeze
+ LHS = %r((?:\[#{SYMBOL_BASE}\])?\s*<?#{SYMBOL_BASE}>?\s*::=)u.freeze
+ RANGE = %r(\[(?:(?:#{R_CHAR}\-#{R_CHAR})|(?:#{HEX}\-#{HEX})|#{R_CHAR}|#{HEX})+-?\])u.freeze
+ RANGE_NOT_LHS = %r(\[(?:(?:#{R_CHAR}\-#{R_CHAR})|(?:#{HEX}\-#{HEX})|#{R_CHAR}|#{HEX})+-?\](?!\s*<?#{SYMBOL_BASE}>?\s*::=))u.freeze
  O_RANGE = %r(\[\^(?:(?:#{R_CHAR}\-#{R_CHAR})|(?:#{HEX}\-#{HEX}|#{R_CHAR}|#{HEX}))+-?\])u.freeze
  STRING1 = %r("[\u0009\u000A\u000D\u0020\u0021\u0023-\uD7FF\u{10000}-\u{10FFFF}]*")u.freeze
  STRING2 = %r('[\u0009\u000A\u000D\u0020-\u0026\u0028-\uD7FF\u{10000}-\u{10FFFF}]*')u.freeze
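The difference between `RANGE` and `RANGE_NOT_LHS` is the trailing negative lookahead, which rejects a bracketed token that is really the `[xxx]` identifier of the next rule. The sketch below uses deliberately simplified stand-ins for the bracket contents (`RANGE_SIMPLE` and `RANGE_NOT_LHS_SIMPLE` are hypothetical names); only `SYMBOL_BASE` and the lookahead are taken from the hunk.

```ruby
SYMBOL_BASE = /\b[a-zA-Z0-9_\.]+\b/

# Simplified bracket body; the real patterns enumerate R_CHAR and HEX.
RANGE_SIMPLE         = /\[[^\]]+\]/
RANGE_NOT_LHS_SIMPLE = /\[[^\]]+\](?!\s*<?#{SYMBOL_BASE}>?\s*::=)/

rule_start = "[2] rule ::= 'a'"
char_class = "[a-z] other"

p RANGE_SIMPLE.match?(rule_start)          # plain RANGE would take "[2]" as a range
p RANGE_NOT_LHS_SIMPLE.match?(rule_start)  # lookahead sees `rule ::=` and rejects
p RANGE_NOT_LHS_SIMPLE.match?(char_class)  # a genuine character class still matches
```

This is why the parser can use the plain `RANGE` pattern when rules carry no identifiers: in that case nothing that looks like `[...]` can be the start of a rule.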
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: ebnf
  version: !ruby/object:Gem::Version
- version: 2.5.0
+ version: 2.6.0
  platform: ruby
  authors:
  - Gregg Kellogg
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2023-12-20 00:00:00.000000000 Z
+ date: 2024-12-01 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: sxp
@@ -80,6 +80,20 @@ dependencies:
  - - "~>"
  - !ruby/object:Gem::Version
  version: '1.8'
+ - !ruby/object:Gem::Dependency
+ name: base64
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '0.2'
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '0.2'
  - !ruby/object:Gem::Dependency
  name: amazing_print
  requirement: !ruby/object:Gem::Requirement
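The metadata above records `base64` as a new runtime dependency with a `~> 0.2` constraint. The gemspec source is not part of this diff, so the following is only a sketch of how such a dependency is typically declared; the gem name, version, summary and author below are placeholders.

```ruby
# Placeholder specification; only the base64 line reflects the metadata above.
spec = Gem::Specification.new do |gem|
  gem.name    = 'example-grammar-tool'
  gem.version = '0.0.1'
  gem.summary = 'Illustration only'
  gem.authors = ['Example Author']

  # Declared explicitly as a runtime dependency, matching `type: :runtime`.
  gem.add_runtime_dependency 'base64', '~> 0.2'
end

dep = spec.runtime_dependencies.find { |d| d.name == 'base64' }
puts "#{dep.name} #{dep.requirement}"
```

An explicit declaration like this is commonly added because `base64` is no longer a default gem as of Ruby 3.4, though the diff itself does not state the motivation.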
@@ -293,7 +307,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.4.19
+ rubygems_version: 3.5.22
  signing_key:
  specification_version: 4
  summary: EBNF parser and parser generator in Ruby.