RubyGems - ebnf - Versions diffs - 2.1.0 → 2.2.0 - Mend

ebnf 2.1.0 → 2.2.0

Files changed (18)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: dc55292610eb978d5751361069f3b993d35db3a597442e2027cf0fd2ff886ba5
-  data.tar.gz: cc74cd0257a36fa3591f54becfdb51dfffbf44662598f2d67c6a36bf4e969e61
+  metadata.gz: '08b2411d5c4d34425d00259126e0d6f55c086b2c60c74e8d3ddc6a099a60ec5e'
+  data.tar.gz: d8185780e437d3db9c2644d62f51d497b25be130d20b79d63e3101e222180408
 SHA512:
-  metadata.gz: bf7c7df32e027a0739b4830651dcff1f4b5186ff2177e1f57b4e952db33619660c4025a29ffc8dba5c7d0f5f5b95f2cbd432a379bc2ffb02d22f7ec6913a48e2
-  data.tar.gz: 909a8ff438172431054a33fb067198e32d98d5b88fa630978831af4f1dd728f0197512ee1f341667eb3039662db618122d7da5d0cc4cc55838625f685f7d7a9b
+  metadata.gz: b972788258b8261d6e59a093268f31be74d3db13535b0db63199aae9ed36b93602d6b8034439152ce62361797eb3990b747d33a55551baa01bd2d1a9aed6bf6f
+  data.tar.gz: cc3b0bb1ecd8c0f0e7135989e96bb826d35f1406ecc3482e88a00f77aaf1df0c8ab5c6f98cb11b37c2c83f77b2d9d762f227d20a0bec44a27cbe48616c31f4a4

data/README.md CHANGED

@@ -3,8 +3,9 @@
 [EBNF][] parser and generic parser generator.
 [![Gem Version](https://badge.fury.io/rb/ebnf.png)](https://badge.fury.io/rb/ebnf)
-[![Build Status](https://secure.travis-ci.org/dryruby/ebnf.png?branch=master)](https://travis-ci.org/dryruby/ebnf)
-[![Coverage Status](https://coveralls.io/repos/dryruby/ebnf/badge.svg)](https://coveralls.io/r/dryruby/ebnf)
+[![Build Status](https://github.com/dryruby/ebnf/workflows/CI/badge.svg?branch=develop)](https://github.com/dryruby/ebnf/actions?query=workflow%3ACI)
+[![Coverage Status](https://coveralls.io/repos/dryruby/ebnf/badge.svg?branch=develop)](https://coveralls.io/r/dryruby/ebnf?branch=develop)
+[![Gitter chat](https://badges.gitter.im/ruby-rdf/rdf.png)](https://gitter.im/ruby-rdf/rdf)
 ## Description
 This is a [Ruby][] implementation of an [EBNF][] and [BNF][] parser and parser generator.
@@ -101,6 +102,8 @@ On a parsing failure, and exception is raised with information that may be usefu
 The [EBNF][] variant used here is based on [W3C](https://w3.org/) [EBNF][] (see {file:etc/ebnf.ebnf EBNF grammar}) as defined in the
 [XML 1.0 recommendation](https://www.w3.org/TR/REC-xml/), with minor extensions:
+Note that the grammar includes an optional `[identifer]` in front of rule names, which can be in conflict with the `RANGE` terminal. It is typically not a problem, but if it comes up, try parsing with the `native` parser,  add comments or sequences to disambiguate. EBNF does not have beginning of line checks as all whitespace is treated the same, so the common practice of identifying each rule inherently leads to such ambiguity.
 The character set for EBNF is UTF-8.
 The general form of a rule is:
@@ -259,7 +262,8 @@ This repository uses [Git Flow](https://github.com/nvie/gitflow) to mange develo
   list in the the `README`. Alphabetical order applies.
 * Do note that in order for us to merge any non-trivial changes (as a rule
   of thumb, additions larger than about 15 lines of code), we need an
-  explicit [public domain dedication][PDD] on record from you.
+  explicit [public domain dedication][PDD] on record from you,
+  which you will be asked to agree to on the first commit to a repo within the organization.
 ## License
 This is free and unencumbered public domain software. For more information,
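
The README note added above, about the optional `[identifier]` prefix conflicting with the `RANGE` terminal, can be illustrated with a small sketch. The two patterns below are simplified, hypothetical approximations written for this example; they are not the terminal definitions from `etc/ebnf.ebnf`. The sketch only shows that a bracketed token such as `[2]` fits both shapes, so a tokenizer that treats all whitespace alike has no line-start cue to tell them apart.

```ruby
# Simplified, hypothetical patterns (assumptions for illustration only).
range_sketch   = /\A\[[^\]\^][^\]]*\]\z/     # e.g. [a-z], [xyz], [2]
rule_id_sketch = /\A\[[A-Za-z0-9_.]+\]\z/    # e.g. the "[2]" in "[2] b ::= ..."

source = "[1] a ::= b [xyz]\n[2] b ::= 'c'"

# Collect every bracketed token, ignoring where it sits on the line.
bracketed = source.scan(/\[[^\]]+\]/)

# Tokens that fit both shapes are the ambiguous ones.
ambiguous = bracketed.select do |tok|
  tok.match?(range_sketch) && tok.match?(rule_id_sketch)
end

puts ambiguous.inspect
```

In this sample every bracketed token is ambiguous on shape alone, which is why the note suggests the `native` parser, comments, or explicit sequences when a grammar runs into the problem.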

data/VERSION CHANGED

@@ -1 +1 @@
-2.1.0
+2.2.0

data/bin/ebnf CHANGED

@@ -34,7 +34,7 @@ OPT_ARGS = [
   ["--prefix", "-p",    GetoptLong::REQUIRED_ARGUMENT,"Prefix to use when generating Turtle"],
   ["--progress", "-v",  GetoptLong::NO_ARGUMENT,      "Detail on execution"],
   ["--renumber",        GetoptLong::NO_ARGUMENT,      "Renumber parsed reules"],
-  ["--validate",        GetoptLong::NO_ARGUMENT,      "Validate grammar"],
+  ["--validate",        GetoptLong::NO_ARGUMENT,      "Validate grammar and any generated HTML"],
   ["--help", "-?",      GetoptLong::NO_ARGUMENT,      "This message"]
 ]
 def usage
@@ -67,7 +67,7 @@ opts.each do |opt, arg|
     end
     options[:format] = arg.to_sym
   when '--format'
-    unless %w(abnf abnfh ebnf html isoebnf isoebnfh rb sxp).include?(arg)
+    unless %w(abnf abnfh ebnf html isoebnf isoebnfh rb sxp ttl).include?(arg)
       STDERR.puts("unrecognized output format #{arg}")
       usage
     end
@@ -99,11 +99,11 @@ ebnf.renumber! if options[:renumber]
 res = case options[:output_format]
 when :abnf      then ebnf.to_s(format: :abnf)
-when :abnfh     then ebnf.to_html(format: :abnf)
+when :abnfh     then ebnf.to_html(format: :abnf, validate: options[:validate])
 when :ebnf      then ebnf.to_s
-when :html      then ebnf.to_html
+when :html      then ebnf.to_html(validate: options[:validate])
 when :isoebnf   then ebnf.to_s(format: :isoebnf)
-when :isoebnfh  then ebnf.to_html(format: :isoebnf)
+when :isoebnfh  then ebnf.to_html(format: :isoebnf, validate: options[:validate])
 when :sxp       then ebnf.to_sxp
 when :ttl       then ebnf.to_ttl(options[:prefix], options[:namespace])
 when :rb        then ebnf.to_ruby(out, grammarFile: ARGV[0], **options)
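
The hunks above pass `options[:validate]` straight through as a keyword argument. A minimal sketch of what that means when `--validate` is absent: the hash lookup yields `nil`, which is passed explicitly and so bypasses the `false` default, yet it is still falsy. `render_html_sketch` is a hypothetical stand-in written for this example, not the gem's `Writer.html`, and its "validation" is only a placeholder flag.

```ruby
# Hypothetical stand-in for a writer entry point (assumption, not the gem's API).
def render_html_sketch(*rules, format: :ebnf, validate: false)
  html = "<table data-format=\"#{format}\" data-rules=\"#{rules.length}\"></table>"
  # Placeholder for a real validation step, which this diff does not show.
  checked = validate ? true : false
  [html, checked]
end

options = {}                                  # as if --validate was not given
_, checked = render_html_sketch(:a, :b, validate: options[:validate])
puts checked                                  # nil was passed, and nil is falsy

options[:validate] = true                     # as if --validate was given
html, checked_with_flag =
  render_html_sketch(:a, :b, format: :abnf, validate: options[:validate])
puts checked_with_flag
puts html
```

Because only truthiness is tested, forwarding the raw option value works without first coercing it to a Boolean.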

data/etc/doap.ttl CHANGED

@@ -12,11 +12,18 @@
   doap:name          "ebnf" ;
   doap:homepage      <https://github.com/dryruby/ebnf> ;
   doap:license       <https://unlicense.org/1.0/> ;
-  doap:shortdesc     "EBNF parser and parser generator"@en ;
-  doap:description   "EBNF is a Ruby parser for W3C EBNF and a parser generator for compliant LL(1) grammars."@en ;
+  doap:shortdesc     "EBNF parser and parser generator in Ruby."@en ;
+  doap:description   "EBNF is a Ruby parser for W3C EBNF and a parser generator for PEG and LL(1). Also includes parsing modes for ISO EBNF and ABNF."@en ;
   doap:created       "2011-08-29"^^xsd:date ;
   doap:programming-language "Ruby" ;
-  doap:implements    <http://dbpedia.org/resource/Compiler-compiler> ;
+  doap:implements    <http://dbpedia.org/resource/Compiler-compiler>,
+                     <https://en.wikipedia.org/wiki/LL_parser>,
+                     <https://en.wikipedia.org/wiki/Parsing_expression_grammar>,
+                     <https://pdos.csail.mit.edu/~baford/packrat/thesis/>,
+                     <https://www.w3.org/TR/REC-xml/#sec-notation>,
+                     <https://en.wikipedia.org/wiki/Backus–Naur_form>,
+                     <https://www.iso.org/standard/26153.html>,
+                     <https://www.rfc-editor.org/rfc/rfc5234>;
   doap:category      <http://dbpedia.org/resource/Resource_Description_Framework>,
                      <http://dbpedia.org/resource/Ruby_(programming_language)> ;
   doap:download-page <> ;
@@ -27,7 +34,4 @@
   doap:maintainer    <https://greggkellogg.net/foaf#me> ;
   doap:documenter    <https://greggkellogg.net/foaf#me> ;
   foaf:maker         <https://greggkellogg.net/foaf#me> ;
-  dc:title           "ebnf" ;
-  dc:description     "EBNF is a Ruby parser for W3C EBNF and a parser generator for compliant LL(1) grammars."@en ;
-  dc:date            "2011-08-29"^^xsd:date ;
   dc:creator         <https://greggkellogg.net/foaf#me> .

data/etc/ebnf.html CHANGED

@@ -11,7 +11,7 @@
       <td>[2]</td>
       <td><code>declaration</code></td>
       <td>::=</td>
-      <td>"@terminals" <code>|</code> <a href="#grammar-production-pass">pass</a></td>
+      <td>&quot;@terminals&quot; <code>|</code> <a href="#grammar-production-pass">pass</a></td>
     </tr>
     <tr id="grammar-production-rule">
       <td>[3]</td>
@@ -53,61 +53,24 @@
       <td>[9]</td>
       <td><code>primary</code></td>
       <td>::=</td>
-      <td><a href="#grammar-production-HEX">HEX</a></td>
-    </tr>
-    <tr>
-      <td>[9]</td>
-      <td><code></code></td>
-      <td>|</td>
-      <td><a href="#grammar-production-SYMBOL">SYMBOL</a></td>
-    </tr>
-    <tr>
-      <td>[9]</td>
-      <td><code></code></td>
-      <td>|</td>
-      <td><a href="#grammar-production-O_RANGE">O_RANGE</a></td>
-    </tr>
-    <tr>
-      <td>[9]</td>
-      <td><code></code></td>
-      <td>|</td>
-      <td><a href="#grammar-production-RANGE">RANGE</a></td>
-    </tr>
-    <tr>
-      <td>[9]</td>
-      <td><code></code></td>
-      <td>|</td>
-      <td><a href="#grammar-production-STRING1">STRING1</a></td>
-    </tr>
-    <tr>
-      <td>[9]</td>
-      <td><code></code></td>
-      <td>|</td>
-      <td><a href="#grammar-production-STRING2">STRING2</a></td>
-    </tr>
-    <tr>
-      <td>[9]</td>
-      <td><code></code></td>
-      <td>|</td>
-      <td><code>(</code> "<code class="grammar-literal">(</code>" <a href="#grammar-production-expression">expression</a> "<code class="grammar-literal">)</code>"<code>)</code> </td>
+      <td><a href="#grammar-production-HEX">HEX</a> <code>|</code> <a href="#grammar-production-SYMBOL">SYMBOL</a> <code>|</code> <a href="#grammar-production-O_RANGE">O_RANGE</a> <code>|</code> <a href="#grammar-production-RANGE">RANGE</a> <code>|</code> <a href="#grammar-production-STRING1">STRING1</a> <code>|</code> <a href="#grammar-production-STRING2">STRING2</a> <code>|</code> <code>(</code> "<code class="grammar-literal">(</code>" <a href="#grammar-production-expression">expression</a> "<code class="grammar-literal">)</code>"<code>)</code> </td>
     </tr>
     <tr id="grammar-production-pass">
       <td>[10]</td>
       <td><code>pass</code></td>
       <td>::=</td>
-      <td>"@pass" <a href="#grammar-production-expression">expression</a></td>
+      <td>&quot;@pass&quot; <a href="#grammar-production-expression">expression</a></td>
     </tr>
-    <tr id="grammar-production-">
-      <td>@terminals</td>
-      <td><code></code></td>
+    <tr>
+      <td colspan=2>@terminals</td>
       <td></td>
-      <td><strong>Productions for terminals</strong></td>
+      <td><strong># Productions for terminals</strong></td>
     </tr>
     <tr id="grammar-production-LHS">
       <td>[11]</td>
       <td><code>LHS</code></td>
       <td>::=</td>
-      <td><code>(</code> "<code class="grammar-literal">[</code>" <a href="#grammar-production-SYMBOL">SYMBOL</a> "<code class="grammar-literal">]</code>" <code class="grammar-char-escape"><abbr title="space">#x20</abbr></code><code>+</code> <code>)</code> <code>?</code>  <a href="#grammar-production-SYMBOL">SYMBOL</a> <code class="grammar-char-escape"><abbr title="space">#x20</abbr></code><code>*</code>  "::="</td>
+      <td><code>(</code> "<code class="grammar-literal">[</code>" <a href="#grammar-production-SYMBOL">SYMBOL</a> "<code class="grammar-literal">]</code>" <code class="grammar-char-escape"><abbr title="space">#x20</abbr></code><code>+</code> <code>)</code> <code>?</code>  <a href="#grammar-production-SYMBOL">SYMBOL</a> <code class="grammar-char-escape"><abbr title="space">#x20</abbr></code><code>*</code>  &quot;::=&quot;</td>
     </tr>
     <tr id="grammar-production-SYMBOL">
       <td>[12]</td>
@@ -119,91 +82,37 @@
       <td>[13]</td>
       <td><code>HEX</code></td>
       <td>::=</td>
-      <td>"#x" <code>(</code> <code>[</code> <code class="grammar-literal">a-f</code><code>]</code>  <code>|</code> <code>[</code> <code class="grammar-literal">A-F</code><code>]</code>  <code>|</code> <code>[</code> <code class="grammar-literal">0-9</code><code>]</code> <code>)</code> <code>+</code> </td>
+      <td>&quot;#x&quot; <code>(</code> <code>[</code> <code class="grammar-literal">a-f</code><code>]</code>  <code>|</code> <code>[</code> <code class="grammar-literal">A-F</code><code>]</code>  <code>|</code> <code>[</code> <code class="grammar-literal">0-9</code><code>]</code> <code>)</code> <code>+</code> </td>
     </tr>
     <tr id="grammar-production-RANGE">
       <td>[14]</td>
       <td><code>RANGE</code></td>
       <td>::=</td>
-      <td>"<code class="grammar-literal">[</code>"</td>
-    </tr>
-    <tr id="grammar-production-">
-      <td>[14]</td>
-      <td><code></code></td>
-      <td></td>
-      <td><code>(</code> <code>(</code> <a href="#grammar-production-R_CHAR">R_CHAR</a> "<code class="grammar-literal">-</code>" <a href="#grammar-production-R_CHAR">R_CHAR</a><code>)</code><code>(</code> <a href="#grammar-production-HEX">HEX</a> "<code class="grammar-literal">-</code>" <a href="#grammar-production-HEX">HEX</a><code>)</code>  <code>|</code> <a href="#grammar-production-R_CHAR">R_CHAR</a> <code>|</code> <a href="#grammar-production-HEX">HEX</a><code>)</code> <code>+</code></td>
-    </tr>
-    <tr id="grammar-production-">
-      <td>[14]</td>
-      <td><code></code></td>
-      <td></td>
-      <td>"<code class="grammar-literal">-</code>"<code>?</code></td>
-    </tr>
-    <tr id="grammar-production-">
-      <td>[14]</td>
-      <td><code></code></td>
-      <td></td>
-      <td><code>(</code> "<code class="grammar-literal">]</code>" <code>-</code> <a href="#grammar-production-LHS">LHS</a><code>)</code> </td>
+      <td>"<code class="grammar-literal">[</code>" <code>(</code> <code>(</code> <a href="#grammar-production-R_CHAR">R_CHAR</a> "<code class="grammar-literal">-</code>" <a href="#grammar-production-R_CHAR">R_CHAR</a><code>)</code>  <code>|</code> <code>(</code> <a href="#grammar-production-HEX">HEX</a> "<code class="grammar-literal">-</code>" <a href="#grammar-production-HEX">HEX</a><code>)</code>  <code>|</code> <a href="#grammar-production-R_CHAR">R_CHAR</a> <code>|</code> <a href="#grammar-production-HEX">HEX</a><code>)</code> <code>+</code>  "<code class="grammar-literal">-</code>"<code>?</code>  <code>(</code> "<code class="grammar-literal">]</code>" <code>-</code> <a href="#grammar-production-LHS">LHS</a><code>)</code> </td>
     </tr>
     <tr id="grammar-production-O_RANGE">
       <td>[15]</td>
       <td><code>O_RANGE</code></td>
       <td>::=</td>
-      <td>"[^"</td>
-    </tr>
-    <tr id="grammar-production-">
-      <td>[15]</td>
-      <td><code></code></td>
-      <td></td>
-      <td><code>(</code> <code>(</code> <a href="#grammar-production-R_CHAR">R_CHAR</a> "<code class="grammar-literal">-</code>" <a href="#grammar-production-R_CHAR">R_CHAR</a><code>)</code><code>(</code> <a href="#grammar-production-HEX">HEX</a> "<code class="grammar-literal">-</code>" <a href="#grammar-production-HEX">HEX</a><code>)</code>  <code>|</code> <a href="#grammar-production-R_CHAR">R_CHAR</a> <code>|</code> <a href="#grammar-production-HEX">HEX</a><code>)</code> <code>+</code></td>
-    </tr>
-    <tr id="grammar-production-">
-      <td>[15]</td>
-      <td><code></code></td>
-      <td></td>
-      <td>"<code class="grammar-literal">-</code>"<code>?</code></td>
-    </tr>
-    <tr id="grammar-production-">
-      <td>[15]</td>
-      <td><code></code></td>
-      <td></td>
-      <td>"<code class="grammar-literal">]</code>"</td>
+      <td>&quot;[^&quot; <code>(</code> <code>(</code> <a href="#grammar-production-R_CHAR">R_CHAR</a> "<code class="grammar-literal">-</code>" <a href="#grammar-production-R_CHAR">R_CHAR</a><code>)</code>  <code>|</code> <code>(</code> <a href="#grammar-production-HEX">HEX</a> "<code class="grammar-literal">-</code>" <a href="#grammar-production-HEX">HEX</a><code>)</code>  <code>|</code> <a href="#grammar-production-R_CHAR">R_CHAR</a> <code>|</code> <a href="#grammar-production-HEX">HEX</a><code>)</code> <code>+</code>  "<code class="grammar-literal">-</code>"<code>?</code>  "<code class="grammar-literal">]</code>"</td>
     </tr>
     <tr id="grammar-production-STRING1">
       <td>[16]</td>
       <td><code>STRING1</code></td>
       <td>::=</td>
-      <td>'<code class="grammar-literal">"</code>' <code>(</code> <a href="#grammar-production-CHAR">CHAR</a> <code>-</code> '<code class="grammar-literal">"</code>'<code>)</code> <code>*</code>  '<code class="grammar-literal">"</code>'</td>
+      <td>'<code class="grammar-literal">&quot;</code>' <code>(</code> <a href="#grammar-production-CHAR">CHAR</a> <code>-</code> '<code class="grammar-literal">&quot;</code>'<code>)</code> <code>*</code>  '<code class="grammar-literal">&quot;</code>'</td>
     </tr>
     <tr id="grammar-production-STRING2">
       <td>[17]</td>
       <td><code>STRING2</code></td>
       <td>::=</td>
-      <td>"<code class="grammar-literal">'</code>" <code>(</code> <a href="#grammar-production-CHAR">CHAR</a> <code>-</code> "<code class="grammar-literal">'</code>"<code>)</code> <code>*</code>  "<code class="grammar-literal">'</code>"</td>
+      <td>"<code class="grammar-literal">&apos;</code>" <code>(</code> <a href="#grammar-production-CHAR">CHAR</a> <code>-</code> "<code class="grammar-literal">&apos;</code>"<code>)</code> <code>*</code>  "<code class="grammar-literal">&apos;</code>"</td>
     </tr>
     <tr id="grammar-production-CHAR">
       <td>[18]</td>
       <td><code>CHAR</code></td>
       <td>::=</td>
-      <td><code>[</code> <code class="grammar-char-escape"><abbr title="horizontal tab">#x09</abbr></code><code class="grammar-char-escape"><abbr title="new line">#x0A</abbr></code><code class="grammar-char-escape"><abbr title="carriage return">#x0D</abbr></code><code>]</code></td>
-    </tr>
-    <tr>
-      <td>[18]</td>
-      <td><code></code></td>
-      <td>|</td>
-      <td><code>[</code> <code class="grammar-char-escape"><abbr title="space">#x20</abbr></code><code class="grammar-literal">-</code><code class="grammar-char-escape"><abbr title="unicode '퟿'">#xD7FF</abbr></code><code>]</code></td>
-    </tr>
-    <tr>
-      <td>[18]</td>
-      <td><code></code></td>
-      <td>|</td>
-      <td><code>[</code> <code class="grammar-char-escape"><abbr title="unicode ''">#xE000</abbr></code><code class="grammar-literal">-</code><code class="grammar-char-escape"><abbr title="unicode '�'">#xFFFD</abbr></code><code>]</code></td>
-    </tr>
-    <tr>
-      <td>[18]</td>
-      <td><code></code></td>
-      <td>|</td>
-      <td><code>[</code> <code class="grammar-char-escape"><abbr title="unicode '𐀀'">#x00010000</abbr></code><code class="grammar-literal">-</code><code class="grammar-char-escape"><abbr title="unicode '􏿿'">#x0010FFFF</abbr></code><code>]</code> </td>
+      <td><code>[</code> <code class="grammar-char-escape"><abbr title="horizontal tab">#x09</abbr></code><code class="grammar-char-escape"><abbr title="new line">#x0A</abbr></code><code class="grammar-char-escape"><abbr title="carriage return">#x0D</abbr></code><code>]</code>  <code>|</code> <code>[</code> <code class="grammar-char-escape"><abbr title="space">#x20</abbr></code><code class="grammar-literal">-</code><code class="grammar-char-escape"><abbr title="unicode 'Reserved'">#xD7FF</abbr></code><code>]</code>  <code>|</code> <code>[</code> <code class="grammar-char-escape"><abbr title="unicode 'Private-use'">#xE000</abbr></code><code class="grammar-literal">-</code><code class="grammar-char-escape"><abbr title="unicode 'Graphic'">#xFFFD</abbr></code><code>]</code>  <code>|</code> <code>[</code> <code class="grammar-char-escape"><abbr title="unicode 'Graphic'">#x00010000</abbr></code><code class="grammar-literal">-</code><code class="grammar-char-escape"><abbr title="unicode 'Noncharacter'">#x0010FFFF</abbr></code><code>]</code> </td>
     </tr>
     <tr id="grammar-production-R_CHAR">
       <td>[19]</td>
@@ -224,28 +133,24 @@
       <td><code>[</code> <code class="grammar-char-escape"><abbr title="horizontal tab">#x09</abbr></code><code class="grammar-char-escape"><abbr title="new line">#x0A</abbr></code><code class="grammar-char-escape"><abbr title="carriage return">#x0D</abbr></code><code class="grammar-char-escape"><abbr title="space">#x20</abbr></code><code>]</code></td>
     </tr>
     <tr>
-      <td>[21]</td>
-      <td><code></code></td>
+      <td colspan=2></td>
       <td>|</td>
-      <td><code>(</code> <code>(</code> <code>(</code> "<code class="grammar-literal">#</code>" <code>-</code> "#x"<code>)</code>  <code>|</code> "//"<code>)</code>  <code>[</code> <code class="grammar-literal">^</code><code class="grammar-char-escape"><abbr title="new line">#x0A</abbr></code><code class="grammar-char-escape"><abbr title="carriage return">#x0D</abbr></code><code>]</code> <code>*</code> <code>)</code></td>
+      <td><code>(</code> <code>(</code> <code>(</code> "<code class="grammar-literal">#</code>" <code>-</code> &quot;#x&quot;<code>)</code>  <code>|</code> &quot;//&quot;<code>)</code>  <code>[</code> <code class="grammar-literal">^</code><code class="grammar-char-escape"><abbr title="new line">#x0A</abbr></code><code class="grammar-char-escape"><abbr title="carriage return">#x0D</abbr></code><code>]</code> <code>*</code> <code>)</code></td>
     </tr>
     <tr>
-      <td>[21]</td>
-      <td><code></code></td>
+      <td colspan=2></td>
       <td>|</td>
-      <td><code>(</code> "/*" <code>(</code> <code>(</code> "<code class="grammar-literal">*</code>" <code>[</code> <code class="grammar-literal">^/</code><code>]</code> <code>)</code> <code>?</code>  <code>|</code> <code>[</code> <code class="grammar-literal">^*</code><code>]</code> <code>)</code> <code>*</code>  "*/"<code>)</code></td>
+      <td><code>(</code> &quot;/*&quot; <code>(</code> <code>(</code> "<code class="grammar-literal">*</code>" <code>[</code> <code class="grammar-literal">^/</code><code>]</code> <code>)</code> <code>?</code>  <code>|</code> <code>[</code> <code class="grammar-literal">^*</code><code>]</code> <code>)</code> <code>*</code>  &quot;*/&quot;<code>)</code></td>
     </tr>
     <tr>
-      <td>[21]</td>
-      <td><code></code></td>
+      <td colspan=2></td>
       <td>|</td>
-      <td><code>(</code> "(*" <code>(</code> <code>(</code> "<code class="grammar-literal">*</code>" <code>[</code> <code class="grammar-literal">^)</code><code>]</code> <code>)</code> <code>?</code>  <code>|</code> <code>[</code> <code class="grammar-literal">^*</code><code>]</code> <code>)</code> <code>*</code>  "*)"<code>)</code> </td>
+      <td><code>(</code> &quot;(*&quot; <code>(</code> <code>(</code> "<code class="grammar-literal">*</code>" <code>[</code> <code class="grammar-literal">^)</code><code>]</code> <code>)</code> <code>?</code>  <code>|</code> <code>[</code> <code class="grammar-literal">^*</code><code>]</code> <code>)</code> <code>*</code>  &quot;*)&quot;<code>)</code> </td>
     </tr>
-    <tr id="grammar-production-">
-      <td>@pass</td>
-      <td><code></code></td>
-      <td></td>
+    <tr>
+      <td colspan=2>@pass</td>
       <td></td>
+      <td><a href="#grammar-production-PASS">PASS</a></td>
     </tr>
   </tbody>
 </table>

data/etc/ebnf.ll1.rb CHANGED

@@ -1,4 +1,4 @@
-# This file is automatically generated by ebnf version 2.0.0
+# This file is automatically generated by ebnf version 2.1.2
 # Derived from etc/ebnf.ebnf
 module Meta
   START = :ebnf

data/etc/ebnf.peg.rb CHANGED

@@ -1,4 +1,4 @@
-# This file is automatically generated by ebnf version 2.0.0
+# This file is automatically generated by ebnf version 2.1.2
 # Derived from etc/ebnf.ebnf
 module EBNFMeta
   RULES = [

data/lib/ebnf/base.rb CHANGED

@@ -220,9 +220,10 @@ module EBNF
     # Output formatted EBNF as HTML
     #
     # @param [:abnf, :ebnf, :isoebnf] format (:ebnf)
+    # @param [Boolean] validate (false) validate generated HTML.
     # @return [String]
-    def to_html(format: :ebnf)
-      Writer.html(*ast, format: format)
+    def to_html(format: :ebnf, validate: false)
+      Writer.html(*ast, format: format, validate: validate)
     end
     ##

data/lib/ebnf/ll1/lexer.rb CHANGED

@@ -32,60 +32,12 @@ module EBNF::LL1
   # @see https://en.wikipedia.org/wiki/Lexical_analysis
   class Lexer
     include Enumerable
-    ESCAPE_CHARS         = {
-      '\\t'   => "\t",  # \u0009 (tab)
-      '\\n'   => "\n",  # \u000A (line feed)
-      '\\r'   => "\r",  # \u000D (carriage return)
-      '\\b'   => "\b",  # \u0008 (backspace)
-      '\\f'   => "\f",  # \u000C (form feed)
-      '\\"'  => '"',    # \u0022 (quotation mark, double quote mark)
-      "\\'"  => '\'',   # \u0027 (apostrophe-quote, single quote mark)
-      '\\\\' => '\\'    # \u005C (backslash)
-    }.freeze
-    ESCAPE_CHAR4        = /\\u(?:[0-9A-Fa-f]{4,4})/u.freeze    # \uXXXX
-    ESCAPE_CHAR8        = /\\U(?:[0-9A-Fa-f]{8,8})/u.freeze    # \UXXXXXXXX
-    ECHAR               = /\\./u.freeze                        # More liberal unescaping
-    UCHAR               = /#{ESCAPE_CHAR4}|#{ESCAPE_CHAR8}/n.freeze
+    include ::EBNF::Unescape
     ##
     # @return [Regexp] defines whitespace, including comments, otherwise whitespace must be explicit in terminals
     attr_reader :whitespace
-    ##
-    # Returns a copy of the given `input` string with all `\uXXXX` and
-    # `\UXXXXXXXX` Unicode codepoint escape sequences replaced with their
-    # unescaped UTF-8 character counterparts.
-    #
-    # @param  [String] string
-    # @return [String]
-    # @see    https://www.w3.org/TR/rdf-sparql-query/#codepointEscape
-    def self.unescape_codepoints(string)
-      string = string.dup
-      string.force_encoding(Encoding::ASCII_8BIT) if string.respond_to?(:force_encoding)
-      # Decode \uXXXX and \UXXXXXXXX code points:
-      string = string.gsub(UCHAR) do |c|
-        s = [(c[2..-1]).hex].pack('U*')
-        s.respond_to?(:force_encoding) ? s.force_encoding(Encoding::ASCII_8BIT) : s
-      end
-      string.force_encoding(Encoding::UTF_8) if string.respond_to?(:force_encoding)
-      string
-    end
-    ##
-    # Returns a copy of the given `input` string with all string escape
-    # sequences (e.g. `\n` and `\t`) replaced with their unescaped UTF-8
-    # character counterparts.
-    #
-    # @param  [String] input
-    # @return [String]
-    # @see    https://www.w3.org/TR/rdf-sparql-query/#grammarEscapes
-    def self.unescape_string(input)
-      input.gsub(ECHAR) { |escaped| ESCAPE_CHARS[escaped] || escaped[1..-1]}
-    end
     ##
     # Tokenizes the given `input` string or stream.
     #
@@ -338,7 +290,7 @@ module EBNF::LL1
       # @return [String]
       def unescape(string)
         if @options[:unescape]
-          Lexer.unescape_string(Lexer.unescape_codepoints(string))
+          EBNF::Unescape.unescape(string)
         else
           string
         end
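
The lexer hunks above remove the escape-handling helpers and delegate to `EBNF::Unescape`, whose source is not part of this excerpt. As a rough sketch, assuming the shared module behaves like the removed methods, the two passes (codepoint escapes first, then single-character escapes) could look like this. `UnescapeSketch` is a hypothetical name used only here, and the encoding juggling of the original is omitted on the assumption that the input is already UTF-8.

```ruby
# Hypothetical stand-in modelled on the methods removed from the lexer.
module UnescapeSketch
  ESCAPE_CHARS = {
    '\\t'  => "\t",   # tab
    '\\n'  => "\n",   # line feed
    '\\r'  => "\r",   # carriage return
    '\\b'  => "\b",   # backspace
    '\\f'  => "\f",   # form feed
    '\\"'  => '"',    # double quote
    "\\'"  => "'",    # single quote
    '\\\\' => '\\'    # backslash
  }.freeze
  UCHAR = /\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}/.freeze  # \uXXXX or \UXXXXXXXX
  ECHAR = /\\./.freeze                                  # any other escaped char

  # Replace \uXXXX and \UXXXXXXXX with the UTF-8 character they name.
  def self.unescape_codepoints(string)
    string.gsub(UCHAR) { |c| [c[2..-1].hex].pack('U') }
  end

  # Replace string escapes such as \n; unknown escapes keep the escaped char.
  def self.unescape_string(input)
    input.gsub(ECHAR) { |escaped| ESCAPE_CHARS[escaped] || escaped[1..-1] }
  end

  # Same order as the call that the diff replaces in Lexer#unescape.
  def self.unescape(string)
    unescape_string(unescape_codepoints(string))
  end
end

puts UnescapeSketch.unescape('caf\\u00E9').inspect
puts UnescapeSketch.unescape('a\\tb').inspect
```

Keeping both passes behind one `unescape` entry point mirrors the single call the lexer now makes, while `unescape_string` remains available on its own for callers such as the native parser.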

data/lib/ebnf/ll1/scanner.rb CHANGED

@@ -69,7 +69,6 @@ module EBNF::LL1
     # @return [String]
     def rest
       feed_me
-      @lineno += 1 if eos?
       encode_utf8 super
     end

data/lib/ebnf/native.rb CHANGED

@@ -287,10 +287,10 @@ module EBNF
       case m = s[0,1]
       when '"', "'" # STRING1 or STRING2
         l, s = s[1..-1].split(m.rstrip, 2)
-        [LL1::Lexer.unescape_string(l), s]
+        [Unescape.unescape_string(l), s]
       when '[' # RANGE, O_RANGE
         l, s = s[1..-1].split(/(?<=[^\\])\]/, 2)
-        [[:range, LL1::Lexer.unescape_string(l)], s]
+        [[:range, Unescape.unescape_string(l)], s]
       when '#' # HEX
         s.match(/(#x\h+)(.*)$/)
         l, s = $1, $2

data/lib/ebnf/peg/parser.rb CHANGED

@@ -55,6 +55,7 @@ module EBNF::PEG
       def production_handlers; (@production_handlers ||= {}); end
       def terminal_handlers; (@terminal_handlers ||= {}); end
       def terminal_regexps; (@terminal_regexps ||= {}); end
+      def terminal_options; (@terminal_options ||= {}); end
       ##
       # Defines the pattern for a terminal node and a block to be invoked
@@ -72,9 +73,6 @@ module EBNF::PEG
       #   defaults to the expression defined in the associated rule.
       #   If unset, the terminal rule is used for matching.
       # @param [Hash] options
-      # @option options [Hash{String => String}] :map ({})
-      #   A mapping from terminals, in lower-case form, to
-      #   their canonical value
       # @option options [Boolean] :unescape
       #   Cause strings and codepoints to be unescaped.
       # @yield [value, prod]
@@ -88,6 +86,7 @@ module EBNF::PEG
       def terminal(term, regexp = nil, **options, &block)
         terminal_regexps[term] = regexp if regexp
         terminal_handlers[term] = block if block_given?
+        terminal_options[term] = options.freeze
       end
       ##
@@ -102,6 +101,8 @@ module EBNF::PEG
       #   Options which are returned from {Parser#onStart}.
       # @option options [Boolean] :as_hash (false)
       #   If the production is a `seq`, causes the value to be represented as a single hash, rather than an array of individual hashes for each sub-production. Note that this is not always advisable due to the possibility of repeated productions within the sequence.
+      # @option options[:upper, :lower] :insensitive_strings
+      #   Perform case-insensitive match of strings not defined as terminals, and map to either upper or lower case.
       # @yield [data, block]
       # @yieldparam [Hash] data
       #   A Hash defined for the current production, during :start
@@ -184,6 +185,8 @@ module EBNF::PEG
     # @option options[Integer] :high_water passed to lexer
     # @option options [Logger] :logger for errors/progress/debug.
     # @option options[Integer] :low_water passed to lexer
+    # @option options[Boolean] :seq_hash (false)
+    #   If `true`, sets the default for the value sent to a production handler that is for a `seq` to a hash composed of the flattened consitutent hashes that are otherwise provided.
     # @option options [Symbol, Regexp] :whitespace
     #   Symbol of whitespace rule (defaults to `@pass`), or a regular expression
     #   for eating whitespace between non-terminal rules (strongly encouraged).
@@ -197,6 +200,7 @@ module EBNF::PEG
     # @raise [Exception] Raises exceptions for parsing errors
     #   or errors raised during processing callbacks. Internal
     #   errors are raised using {Error}.
+    # @todo FIXME implement seq_hash
     def parse(input = nil, start = nil, rules = nil, **options, &block)
       start ||= options[:start]
       rules ||= options[:rules] || []
@@ -269,7 +273,8 @@ module EBNF::PEG
     # @param [String] message Error string
     # @param [Hash{Symbol => Object}] options
     # @option options [URI, #to_s] :production
-    # @option options [Token] :token
+    # @option options [Boolean] :raise abort furhter processing
+    # @option options [Array] :backtrace state where error occured
     # @see #debug
     def error(node, message, **options)
       lineno = options[:lineno] || (scanner.lineno if scanner)
@@ -282,7 +287,11 @@ module EBNF::PEG
       @recovering = true
       debug(node, m, level: 3, **options)
       if options[:raise] || @options[:validate]
-        raise Error.new(m, lineno: lineno, rest: options[:rest], production: options[:production])
+        raise Error.new(m,
+                lineno: lineno,
+                rest: options[:rest],
+                production: options[:production],
+                backtrace: options[:backtrace])
       end
     end
@@ -365,25 +374,27 @@ module EBNF::PEG
       @productions << prod
       debug("#{prod}(:start)", "",
         lineno: (scanner.lineno if scanner),
-        pos: (scanner.pos if scanner),
-        depth: (depth + 1)) {"#{prod}, pos: #{scanner ? scanner.pos : '?'}, rest: #{scanner ? scanner.rest[0..20].inspect : '?'}"}
+        pos: (scanner.pos if scanner)
+      ) do
+          "#{prod}, pos: #{scanner ? scanner.pos : '?'}, rest: #{scanner ? scanner.rest[0..20].inspect : '?'}"
+      end
       if handler
         # Create a new production data element, potentially allowing handler
         # to customize before pushing on the @prod_data stack
-        data = {}
+        data = {_production: prod}
         begin
           self.class.eval_with_binding(self) {
             handler.call(data, @parse_callback)
           }
         rescue ArgumentError, Error => e
-          error("start", "#{e.class}: #{e.message}", production: prod)
+          error("start", "#{e.class}: #{e.message}", production: prod, backtrace: e.backtrace)
           @recovering = false
         end
         @prod_data << data
       elsif self.class.production_handlers[prod]
         # Make sure we push as many was we pop, even if there is no
         # explicit start handler
-        @prod_data << {}
+        @prod_data << {_production: prod}
       end
       return self.class.start_options.fetch(prod, {}) # any options on this production
     end
@@ -397,6 +408,9 @@ module EBNF::PEG
       prod = @productions.last
       handler, clear_packrat = self.class.production_handlers[prod]
       data = @prod_data.pop if handler || self.class.start_handlers[prod]
+      error("finish",
+        "prod_data production mismatch: expected #{prod.inspect}, got #{data[:_production].inspect}",
+        production: prod, prod_data: @prod_data) if data && prod != data[:_production]
       if handler && !@recovering && result != :unmatched
         # Pop production data element from stack, potentially allowing handler to use it
         result = begin
@@ -404,14 +418,13 @@ module EBNF::PEG
             handler.call(result, data, @parse_callback)
           }
         rescue ArgumentError, Error => e
-          error("finish", "#{e.class}: #{e.message}", production: prod)
+          error("finish", "#{e.class}: #{e.message}", production: prod, backtrace: e.backtrace)
           @recovering = false
         end
       end
-      progress("#{prod}(:finish)", "",
-               depth: (depth + 1),
-               lineno: (scanner.lineno if scanner),
-               level: result == :unmatched ? 0 : 1) do
+      debug("#{prod}(:finish)", "",
+             lineno: (scanner.lineno if scanner),
+             level: result == :unmatched ? 0 : 1) do
         "#{result.inspect}@(#{scanner ? scanner.pos : '?'}), rest: #{scanner ? scanner.rest[0..20].inspect : '?'}"
       end
       self.clear_packrat if clear_packrat
@@ -433,12 +446,12 @@ module EBNF::PEG
             handler.call(value, parentProd, @parse_callback)
           }
         rescue ArgumentError, Error => e
-          error("terminal", "#{e.class}: #{e.message}", value: value, production: prod)
+          error("terminal", "#{e.class}: #{e.message}", value: value, production: prod, backtrace: e.backtrace)
           @recovering = false
         end
       end
       progress("#{prod}(:terminal)", "",
-               depth: (depth + 2),
+               depth: (depth + 1),
                lineno: (scanner.lineno if scanner),
                level: value == :unmatched ? 0 : 1) do
         "#{value.inspect}@(#{scanner ? scanner.pos : '?'})"
@@ -460,10 +473,19 @@ module EBNF::PEG
     #
     # @param [Symbol] sym
     # @return [Regexp]
-    def find_terminal_regexp(sym)
+    def terminal_regexp(sym)
       self.class.terminal_regexps[sym]
     end
+    ##
+    # Find the options defined for a terminal
+    #
+    # @param [Symbol] sym
+    # @return [Hash]
+    def terminal_options(sym)
+      self.class.terminal_options[sym]
+    end
     ##
     # Record furthest failure.
     #
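
The `_production` tag added to each `@prod_data` entry lets `onFinish` detect a start/finish imbalance. A minimal standalone sketch of that bookkeeping follows; `ProdStack`, `MismatchError`, `on_start` and `on_finish` are hypothetical names used for illustration only, and the sketch raises where the gem calls `error`.

```ruby
# Hypothetical miniature of the start/finish bookkeeping shown in the hunks above.
# It is not the gem's parser; it only mirrors the tagging and the mismatch check.
class ProdStack
  MismatchError = Class.new(StandardError)

  # Exposed only so the example can simulate a corrupted stack.
  attr_reader :prod_data

  def initialize
    @productions = []
    @prod_data = []
  end

  # Push the production and a data hash tagged with the production name.
  def on_start(prod)
    @productions << prod
    @prod_data << {_production: prod}
  end

  # Pop both stacks and verify that the data belongs to the finished production.
  def on_finish
    prod = @productions.pop
    data = @prod_data.pop
    if data && prod != data[:_production]
      raise MismatchError,
        "prod_data production mismatch: expected #{prod.inspect}, got #{data[:_production].inspect}"
    end
    data
  end
end

stack = ProdStack.new
stack.on_start(:expression)
stack.on_finish # returns the hash tagged with :expression
```

Tagging the data hash itself, rather than keeping a parallel stack of names, means the check still works when a handler replaces or mutates entries between start and finish.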

data/lib/ebnf/peg/rule.rb CHANGED Viewed

@@ -1,6 +1,8 @@
 module EBNF::PEG
   # Behavior for parsing a PEG rule
   module Rule
+    include ::EBNF::Unescape
     ##
     # Initialized by parser when loading rules.
     # Used for finding rules and invoking elements of the parse process.
@@ -24,6 +26,7 @@ module EBNF::PEG
     # * `opt`: returns the value matched, or `nil` if unmatched.
     # * `plus`: returns an array of the values matched for the specified production, or `:unmatched`, if none are matched. For Terminals, these are concatenated into a single string.
     # * `range`: returns a string composed of the values matched, or `:unmatched`, if less than `min` are matched.
+    # * `rept`: returns an array of the values matched for the specified production, or `:unmatched`, if none are matched. For Terminals, these are concatenated into a single string.
     # * `seq`: returns an array composed of single-entry hashes for each matched production indexed by the production name, or `:unmatched` if any production fails to match. For Terminals, returns a string created by concatenating these values. Via option in a `production` or definition, the result can be a single hash with values for each matched production; note that this is not always possible due to the possibility of repeated productions within the sequence.
     # * `star`: returns an array of the values matched for the specified production. For Terminals, these are concatenated into a single string.
     #
@@ -44,9 +47,18 @@ module EBNF::PEG
         # If the terminal is defined with a regular expression,
         # use that to match the input,
         # otherwise,
-        if regexp = parser.find_terminal_regexp(sym)
-          matched = input.scan(regexp)
+        if regexp = parser.terminal_regexp(sym)
+          term_opts = parser.terminal_options(sym)
+          if matched = input.scan(regexp)
+            # Optionally map matched
+            matched = term_opts.fetch(:map, {}).fetch(matched.downcase, matched)
+            # Optionally unescape matched
+            matched = unescape(matched) if term_opts[:unescape]
+          end
           result = parser.onTerminal(sym, (matched ? matched : :unmatched))
           # Update furthest failure for strings and terminals
           parser.update_furthest_failure(input.pos, input.lineno, sym) if result == :unmatched
           parser.packrat[sym][pos] = {
@@ -60,6 +72,7 @@ module EBNF::PEG
         eat_whitespace(input)
       end
       start_options = parser.onStart(sym)
+      string_regexp_opts = start_options[:insensitive_strings] ? Regexp::IGNORECASE : 0
       result = case expr.first
       when :alt
@@ -73,7 +86,12 @@ module EBNF::PEG
             raise "No rule found for #{prod}" unless rule
             rule.parse(input)
           when String
-            input.scan(Regexp.new(Regexp.quote(prod))) || :unmatched
+            s = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))
+            case start_options[:insensitive_strings]
+            when :lower then s && s.downcase
+            when :upper then s && s.upcase
+            else s
+            end || :unmatched
           end
           if alt == :unmatched
             # Update furthest failure for strings and terminals
@@ -111,7 +129,7 @@ module EBNF::PEG
           raise "No rule found for #{prod}" unless rule
           rule.parse(input)
         when String
-          input.scan(Regexp.new(Regexp.quote(prod))) || :unmatched
+          input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts)) || :unmatched
         end
         if res != :unmatched
           # Update furthest failure for terminals
@@ -122,7 +140,7 @@ module EBNF::PEG
         end
       when :opt
         # Result is the matched value or nil
-        opt = rept(input, 0, 1, expr[1])
+        opt = rept(input, 0, 1, expr[1], string_regexp_opts, **start_options)
         # Update furthest failure for strings and terminals
         parser.update_furthest_failure(input.pos, input.lineno, expr[1]) if terminal?
@@ -130,7 +148,7 @@ module EBNF::PEG
       when :plus
         # Result is an array of all expressions while they match,
         # at least one must match
-        plus = rept(input, 1, '*', expr[1])
+        plus = rept(input, 1, '*', expr[1], string_regexp_opts)
         # Update furthest failure for strings and terminals
         parser.update_furthest_failure(input.pos, input.lineno, expr[1]) if terminal?
@@ -142,6 +160,14 @@ module EBNF::PEG
           parser.update_furthest_failure(input.pos, input.lineno, expr[1])
           :unmatched
         end
+      when :rept
+        # Result is an array of all expressions while they match,
+        # an empty array if none match
+        rept = rept(input, expr[1], expr[2], expr[3], string_regexp_opts)
+        # Update furthest failure for strings and terminals
+        parser.update_furthest_failure(input.pos, input.lineno, expr[3]) if terminal?
+        rept.is_a?(Array) && terminal? ? rept.join("") : rept
       when :seq
         # Evaluate each expression into an array of hashes where each hash contains a key from the associated production and the value is the parsed value of that production. Returns :unmatched if the input does not match the production. Value ordering is ensured by native Hash ordering.
         seq = expr[1..-1].each_with_object([]) do |prod, accumulator|
@@ -152,7 +178,12 @@ module EBNF::PEG
             raise "No rule found for #{prod}" unless rule
             rule.parse(input)
           when String
-            input.scan(Regexp.new(Regexp.quote(prod))) || :unmatched
+            s = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))
+            case start_options[:insensitive_strings]
+            when :lower then s && s.downcase
+            when :upper then s && s.upcase
+            else s
+            end || :unmatched
           end
           if res == :unmatched
             # Update furthest failure for strings and terminals
@@ -173,7 +204,7 @@ module EBNF::PEG
       when :star
         # Result is an array of all expressions while they match,
         # an empty array if none match
-        star = rept(input, 0, '*', expr[1])
+        star = rept(input, 0, '*', expr[1], string_regexp_opts)
         # Update furthest failure for strings and terminals
         parser.update_furthest_failure(input.pos, input.lineno, expr[1]) if terminal?
@@ -205,8 +236,9 @@ module EBNF::PEG
     # @param [Integer] max
     #   If it is an integer, it stops matching after max entries.
     # @param [Symbol, String] prod
+    # @param [Integer] string_regexp_opts
     # @return [:unmatched, Array]
-    def rept(input, min, max, prod)
+    def rept(input, min, max, prod, string_regexp_opts, **options)
       result = []
       case prod
@@ -218,9 +250,13 @@ module EBNF::PEG
           result << res
         end
       when String
-        while (res = input.scan(Regexp.new(Regexp.quote(prod)))) && (max == '*' || result.length < max)
+        while (res = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))) && (max == '*' || result.length < max)
           eat_whitespace(input) unless terminal?
-          result << res
+          result << case options[:insensitive_strings]
+          when :lower then res.downcase
+          when :upper then res.upcase
+          else res
+          end
         end
       end
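
The `:insensitive_strings` handling above combines two steps: the literal is matched with `Regexp::IGNORECASE`, and the matched text is then optionally normalized to lower or upper case. The sketch below reproduces that logic outside the parser; `scan_literal` is a hypothetical helper, not part of the gem, and `mode` stands in for `start_options[:insensitive_strings]`.

```ruby
require 'strscan'

# Hypothetical helper mirroring the String branch of the rule parser.
# mode: nil/false (case-sensitive), true (case-insensitive, text kept as
# written), :lower or :upper (case-insensitive, result normalized).
def scan_literal(input, literal, mode)
  regexp_opts = mode ? Regexp::IGNORECASE : 0
  matched = input.scan(Regexp.new(Regexp.quote(literal), regexp_opts))
  normalized = case mode
  when :lower then matched && matched.downcase
  when :upper then matched && matched.upcase
  else matched
  end
  normalized || :unmatched
end

scan_literal(StringScanner.new("Select * from t"), "SELECT", :upper) # => "SELECT"
scan_literal(StringScanner.new("Select * from t"), "SELECT", nil)    # => :unmatched
```

Normalizing the result lets production handlers compare against a single spelling of each keyword instead of handling every casing the input might use.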

data/lib/ebnf/unescape.rb ADDED Viewed

@@ -0,0 +1,62 @@
+# encoding: utf-8
+# Unescape strings
+module EBNF::Unescape
+  ESCAPE_CHARS         = {
+    '\\t'   => "\t",  # \u0009 (tab)
+    '\\n'   => "\n",  # \u000A (line feed)
+    '\\r'   => "\r",  # \u000D (carriage return)
+    '\\b'   => "\b",  # \u0008 (backspace)
+    '\\f'   => "\f",  # \u000C (form feed)
+    '\\"'  => '"',    # \u0022 (quotation mark, double quote mark)
+    "\\'"  => '\'',   # \u0027 (apostrophe-quote, single quote mark)
+    '\\\\' => '\\'    # \u005C (backslash)
+  }.freeze
+  ESCAPE_CHAR4        = /\\u(?:[0-9A-Fa-f]{4,4})/u.freeze    # \uXXXX
+  ESCAPE_CHAR8        = /\\U(?:[0-9A-Fa-f]{8,8})/u.freeze    # \UXXXXXXXX
+  ECHAR               = /\\./u.freeze                        # More liberal unescaping
+  UCHAR               = /#{ESCAPE_CHAR4}|#{ESCAPE_CHAR8}/n.freeze
+  ##
+  # Returns a copy of the given `input` string with all `\uXXXX` and
+  # `\UXXXXXXXX` Unicode codepoint escape sequences replaced with their
+  # unescaped UTF-8 character counterparts.
+  #
+  # @param  [String] string
+  # @return [String]
+  # @see    https://www.w3.org/TR/rdf-sparql-query/#codepointEscape
+  def unescape_codepoints(string)
+    string = string.dup
+    string.force_encoding(Encoding::ASCII_8BIT) if string.respond_to?(:force_encoding)
+    # Decode \uXXXX and \UXXXXXXXX code points:
+    string = string.gsub(UCHAR) do |c|
+      s = [(c[2..-1]).hex].pack('U*')
+      s.respond_to?(:force_encoding) ? s.force_encoding(Encoding::ASCII_8BIT) : s
+    end
+    string.force_encoding(Encoding::UTF_8) if string.respond_to?(:force_encoding)
+    string
+  end
+  module_function :unescape_codepoints
+  ##
+  # Returns a copy of the given `input` string with all string escape
+  # sequences (e.g. `\n` and `\t`) replaced with their unescaped UTF-8
+  # character counterparts.
+  #
+  # @param  [String] input
+  # @return [String]
+  # @see    https://www.w3.org/TR/rdf-sparql-query/#grammarEscapes
+  def unescape_string(input)
+    input.gsub(ECHAR) { |escaped| ESCAPE_CHARS[escaped] || escaped[1..-1]}
+  end
+  module_function :unescape_string
+  # Perform string and codepoint unescaping if defined for this terminal
+  # @param [String] string
+  # @return [String]
+  def unescape(string)
+    unescape_string(unescape_codepoints(string))
+  end
+  module_function :unescape
+end
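
To see the two unescaping passes in isolation, here is a simplified standalone sketch. `UnescapeSketch` is a hypothetical module written for this example; it follows the same order as `EBNF::Unescape#unescape` (codepoint escapes first, then single-character escapes) but omits the binary-encoding round trip used in the file above.

```ruby
# Hypothetical, simplified re-implementation for illustration only.
module UnescapeSketch
  ESCAPE_CHARS = {
    '\\t' => "\t", '\\n' => "\n", '\\r' => "\r",
    '\\b' => "\b", '\\f' => "\f",
    '\\"' => '"', "\\'" => "'", '\\\\' => '\\'
  }.freeze
  UCHAR = /\\u\h{4}|\\U\h{8}/.freeze # \uXXXX or \UXXXXXXXX
  ECHAR = /\\./.freeze               # any backslash-escaped character

  module_function

  # Replace \uXXXX and \UXXXXXXXX with the UTF-8 character they name.
  def unescape_codepoints(string)
    string.gsub(UCHAR) { |c| [c[2..-1].hex].pack('U') }
  end

  # Replace known escapes; for unknown escapes keep only the escaped character.
  def unescape_string(input)
    input.gsub(ECHAR) { |escaped| ESCAPE_CHARS[escaped] || escaped[1..-1] }
  end

  # Codepoint escapes are decoded first, then single-character escapes.
  def unescape(string)
    unescape_string(unescape_codepoints(string))
  end
end

# Single quotes keep the backslashes literal, as they would appear in parsed input.
UnescapeSketch.unescape('caf\u00E9\tok') # => "café", a tab, then "ok"
```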

data/lib/ebnf/writer.rb CHANGED Viewed

@@ -2,12 +2,14 @@
 require 'rdf'
 require 'strscan'    unless defined?(StringScanner)
 require "ostruct"
+require 'unicode/types'
 ##
 # Serialize ruleset back to EBNF
 module EBNF
   class Writer
     LINE_LENGTH = 80
+    LINE_LENGTH_HTML = 200
     # ASCII escape names
     ASCII_ESCAPE_NAMES = [
@@ -85,22 +87,23 @@ module EBNF
     #
     # @param  [Array<Rule>] rules
     # @param [:abnf, :ebnf, :isoebnf] format (:ebnf)
+    # @param [Boolean] validate (false) validate generated HTML.
     # @return [Object]
-    def self.html(*rules, format: :ebnf)
+    def self.html(*rules, format: :ebnf, validate: false)
       require 'stringio' unless defined?(StringIO)
       buf = StringIO.new
-      Writer.new(rules, out: buf, html: true, format: format)
+      Writer.new(rules, out: buf, html: true, format: format, validate: validate)
       buf.string
     end
     ##
     # @param [Array<Rule>] rules
+    # @param [:abnf, :ebnf, :isoebnf] format (:ebnf)
+    # @param [Boolean] html (false) generate HTML output
+    # @param [Boolean] validate (false) validate generated HTML.
     # @param [Hash{Symbol => Object}] options
     # @param [#write] out ($stdout)
-    # @param [:abnf, :ebnf, :isoebnf] format (:ebnf)
-    # @option options [Symbol] format
-    # @option options [Boolean] html (false)
-    def initialize(rules, out: $stdout, html: false, format: :ebnf, **options)
+    def initialize(rules, out: $stdout, html: false, format: :ebnf, validate: false, **options)
       @options = options.merge(html: html)
       return if rules.empty?
@@ -118,19 +121,24 @@ module EBNF
         lhs_fmt = "%<id>-#{max_id+2}s " + lhs_fmt
         lhs_length += max_id + 3
       end
-      rhs_length = LINE_LENGTH - lhs_length
+      rhs_length = (html ? LINE_LENGTH_HTML : LINE_LENGTH) - lhs_length
       if html
         # Output as formatted HTML
         begin
           require 'erubis'
+          require 'htmlentities'
+          @coder = HTMLEntities.new
           eruby = Erubis::Eruby.new(ERB_DESC)
           formatted_rules = rules.map do |rule|
             if rule.kind == :terminals || rule.kind == :pass
               OpenStruct.new(id: ("@#{rule.kind}"),
                              sym: nil,
                              assign: nil,
-                             formatted: ("<strong>Productions for terminals</strong>" if rule.kind == :terminals))
+                             formatted: (
+                               rule.kind == :terminals ?
+                                 "<strong># Productions for terminals</strong>" :
+                                 self.send(format_meth, rule.expr)))
             else
               formatted_expr = self.send(format_meth, rule.expr)
               # Measure text without markup
@@ -151,7 +159,7 @@ module EBNF
                     formatted.sub!(%r{\s*<code>\|</code>\s*}, '')
                     (ndx > 0 ? (rule.alt? ? '|' : '') : '=')
                   end
-                  lines << OpenStruct.new(id: ("[#{rule.id}]" if rule.id),
+                  lines << OpenStruct.new(id: ((ndx == 0 ? "[#{rule.id}]" : "") if rule.id),
                                           sym: (rule.sym if ndx == 0 || format == :abnf),
                                           assign: assign,
                                           formatted: formatted)
@@ -168,10 +176,24 @@ module EBNF
               end
             end
           end.flatten
-          out.write eruby.evaluate(format: format, rules: formatted_rules)
+          html_result = eruby.evaluate(format: format, rules: formatted_rules)
+          if validate
+            begin
+              # Validate the output HTML
+              doc = Nokogiri::HTML5("<!DOCTYPE html>" + html_result, max_errors: 10)
+              raise EncodingError, "Errors found in generated HTML:\n  " +
+                doc.errors.map(&:to_s).join("\n  ") unless doc.errors.empty?
+            rescue LoadError, NoMethodError
+              # Skip
+            end
+          end
+          out.write html_result
           return
         rescue LoadError
-          $stderr.puts "Generating HTML requires erubis gem to be loaded"
+          $stderr.puts "Generating HTML requires erubis and htmlentities gems to be loaded"
         end
       end
@@ -216,7 +238,7 @@ module EBNF
     # Format the expression part of a rule
     def format_ebnf(expr, sep: nil, embedded: false)
-      return (@options[:html] ? %(<a href="#grammar-production-#{expr}">#{expr}</a>) : expr.to_s) if expr.is_a?(Symbol)
+      return (@options[:html] ? %(<a href="#grammar-production-#{@coder.encode expr}">#{@coder.encode expr}</a>) : expr.to_s) if expr.is_a?(Symbol)
       if expr.is_a?(String)
         return expr.length == 1 ?
           format_ebnf_char(expr) :
@@ -290,10 +312,10 @@ module EBNF
     # Format a single-character string, preferring hex for non-main ASCII
     def format_ebnf_char(c)
       case c.ord
-      when (0x21)         then (@options[:html] ? %("<code class="grammar-literal">#{c}</code>") : %{"#{c}"})
-      when 0x22           then (@options[:html] ? %('<code class="grammar-literal">"</code>') : %{'"'})
-      when (0x23..0x7e)   then (@options[:html] ? %("<code class="grammar-literal">#{c}</code>") : %{"#{c}"})
-      when (0x80..0xFFFD) then (@options[:html] ? %("<code class="grammar-literal">#{c}</code>") : %{"#{c}"})
+      when (0x21)         then (@options[:html] ? %("<code class="grammar-literal">#{@coder.encode c}</code>") : %{"#{c}"})
+      when 0x22           then (@options[:html] ? %('<code class="grammar-literal">&quot;</code>') : %{'"'})
+      when (0x23..0x7e)   then (@options[:html] ? %("<code class="grammar-literal">#{@coder.encode c}</code>") : %{"#{c}"})
+      when (0x80..0xFFFD) then (@options[:html] ? %("<code class="grammar-literal">#{@coder.encode c}</code>") : %{"#{c}"})
       else                     escape_ebnf_hex(c)
       end
     end
@@ -308,7 +330,7 @@ module EBNF
       while !s.eos?
         case
         when s.scan(/\A[!"\u0024-\u007e]+/)
-          buffer << (@options[:html] ? %(<code class="grammar-literal">#{s.matched}</code>) : s.matched)
+          buffer << (@options[:html] ? %(<code class="grammar-literal">#{@coder.encode s.matched}</code>) : s.matched)
         when s.scan(/\A#x\h+/)
           buffer << escape_ebnf_hex(s.matched[2..-1].hex.chr(Encoding::UTF_8))
         else
@@ -328,7 +350,8 @@ module EBNF
         end
       end
-      "#{quote}#{string}#{quote}"
+      res = "#{quote}#{string}#{quote}"
+      @options[:html] ? @coder.encode(res) : res
     end
     def escape_ebnf_hex(u)
@@ -340,16 +363,20 @@ module EBNF
       end
       char = fmt % u.ord
       if @options[:html]
-        if u.ord <= 0x20
-          char = %(<abbr title="#{ASCII_ESCAPE_NAMES[u.ord]}">#{char}</abbr>)
+        char = if u.ord <= 0x20
+          %(<abbr title="#{ASCII_ESCAPE_NAMES[u.ord]}">#{@coder.encode char}</abbr>)
+        elsif u.ord == 0x22
+          %(<abbr title="quot">&quot;</abbr>)
         elsif u.ord < 0x7F
-          char = %(<abbr title="ascii '#{u}'">#{char}</abbr>)
+          %(<abbr title="ascii '#{@coder.encode u}'">#{@coder.encode char}</abbr>)
         elsif u.ord == 0x7F
-          char = %(<abbr title="delete">#{char}</abbr>)
+          %(<abbr title="delete">#{@coder.encode char}</abbr>)
         elsif u.ord <= 0xFF
-          char = %(<abbr title="extended ascii '#{u}'">#{char}</abbr>)
+          %(<abbr title="extended ascii '#{@coder.encode char}'">#{char}</abbr>)
+        elsif (%w(Control Private-use Surrogate Noncharacter Reserved) - ::Unicode::Types.of(u)).empty?
+          %(<abbr title="unicode '#{u}'">#{char}</abbr>)
         else
-          char = %(<abbr title="unicode '#{u}'">#{char}</abbr>)
+          %(<abbr title="unicode '#{::Unicode::Types.of(u).first}'">#{char}</abbr>)
         end
         %(<code class="grammar-char-escape">#{char}</code>)
       else
@@ -363,7 +390,7 @@ module EBNF
     # Format the expression part of a rule
     def format_abnf(expr, sep: nil, embedded: false, sensitive: true)
-      return (@options[:html] ? %(<a href="#grammar-production-#{expr}">#{expr}</a>) : expr.to_s) if expr.is_a?(Symbol)
+      return (@options[:html] ? %(<a href="#grammar-production-#{@coder.encode expr}">#{@coder.encode expr}</a>) : expr.to_s) if expr.is_a?(Symbol)
       if expr.is_a?(String)
         if expr.length == 1
           return format_abnf_char(expr)
@@ -380,7 +407,7 @@ module EBNF
           seq.unshift(:seq)
           return format_abnf(seq, sep: nil, embedded: false)
         else
-          return (@options[:html] ? %("<code class="grammar-literal">#{'%s' if sensitive}#{expr}</code>") : %(#{'%s' if sensitive}"#{expr}"))
+          return (@options[:html] ? %("<code class="grammar-literal">#{'%s' if sensitive}#{@coder.encode expr}</code>") : %(#{'%s' if sensitive}"#{expr}"))
         end
       end
       parts = {
@@ -448,7 +475,7 @@ module EBNF
     # Format a single-character string, preferring hex for non-main ASCII
     def format_abnf_char(c)
       if /[\x20-\x21\x23-\x7E]/.match?(c)
-        c.inspect
+        @options[:html] ? %("<code class="grammar-literal">#{@coder.encode c}</code>") : c.inspect
       else
         escape_abnf_hex(c)
       end
@@ -528,15 +555,17 @@ module EBNF
       char =  "%x" + (fmt % u.ord)
       if @options[:html]
         if u.ord <= 0x20
-          char = %(<abbr title="#{ASCII_ESCAPE_NAMES[u.ord]}">#{char}</abbr>)
-        elsif u.ord <= 0x7F
-          char = %(<abbr title="ascii '#{u}'">#{char}</abbr>)
+          char = %(<abbr title="#{ASCII_ESCAPE_NAMES[u.ord]}">#{@coder.encode char}</abbr>)
+        elsif u.ord == 0x22
+          char = %(<abbr title="quot">&quot;</abbr>)
+        elsif u.ord < 0x7F
+          char = %(<abbr title="ascii '#{u}'">#{@coder.encode char}</abbr>)
         elsif u.ord == 0x7F
-          char = %(<abbr title="delete">#{char}</abbr>)
+          char = %(<abbr title="delete">#{@coder.encode char}</abbr>)
         elsif u.ord <= 0xFF
           char = %(<abbr title="extended ascii '#{u}'">#{char}</abbr>)
         else
-          char = %(<abbr title="unicode '#{u}'">#{char}</abbr>)
+          char = %(<abbr title="unicode '#{u.unicode_normalize}'">#{char}</abbr>)
         end
         %(<code class="grammar-char-escape">#{char}</code>)
       else
@@ -550,7 +579,7 @@ module EBNF
     # Format the expression part of a rule
     def format_isoebnf(expr, sep: nil, embedded: false)
-      return (@options[:html] ? %(<a href="#grammar-production-#{expr}">#{expr}</a>) : expr.to_s) if expr.is_a?(Symbol)
+      return (@options[:html] ? %(<a href="#grammar-production-#{@coder.encode expr}">#{@coder.encode expr}</a>) : expr.to_s) if expr.is_a?(Symbol)
       if expr.is_a?(String)
         expr = expr[2..-1].hex.chr if expr =~ /\A#x\h+/
         expr.chars.each do |c|
@@ -558,9 +587,9 @@ module EBNF
             ISOEBNF::TERMINAL_CHARACTER.match?(c)
         end
         if expr =~ /"/
-          return (@options[:html] ? %('<code class="grammar-literal">#{expr}</code>') : %('#{expr}'))
+          return (@options[:html] ? %('<code class="grammar-literal">#{@coder.encode expr}</code>') : %('#{expr}'))
         else
-          return (@options[:html] ? %("<code class="grammar-literal">#{expr}</code>") : %("#{expr}"))
+          return (@options[:html] ? %("<code class="grammar-literal">#{@coder.encode expr}</code>") : %("#{expr}"))
         end
       end
       parts = {
@@ -679,11 +708,13 @@ module EBNF
       <table class="grammar">
         <tbody id="grammar-productions" class="<%= @format %>">
           <% for rule in @rules %>
-          <tr<%= %{ id="grammar-production-#{rule.sym}"} unless %w(=/ |).include?(rule.assign)%>>
+          <tr<%= %{ id="grammar-production-#{rule.sym}"} unless %w(=/ |).include?(rule.assign) || rule.sym.nil?%>>
             <% if rule.id %>
-            <td><%= rule.id %></td>
+            <td<%= " colspan=2" unless rule.sym %>><%= rule.id %></td>
             <% end %>
+            <% if rule.sym %>
             <td><code><%== rule.sym %></code></td>
+            <% end %>
             <td><%= rule.assign %></td>
             <td><%= rule.formatted %></td>
           </tr>
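
Most of the writer changes above route literal text through `@coder.encode` before embedding it in markup, so that characters such as `<` and `&` in a grammar cannot break the generated HTML. The sketch below illustrates the idea using `CGI.escapeHTML` from the Ruby standard library as a stand-in for the `htmlentities` gem; `html_literal` is a hypothetical helper, and the two encoders may differ in how they render some characters.

```ruby
require 'cgi'

# Hypothetical helper: wrap a grammar literal in the markup the writer emits,
# escaping HTML-significant characters first. CGI.escapeHTML is a stand-in
# for HTMLEntities#encode here, not what the gem itself calls.
def html_literal(text)
  %("<code class="grammar-literal">#{CGI.escapeHTML(text)}</code>")
end

html_literal("a<b") # => "<code class="grammar-literal">a&lt;b</code>" (with the surrounding quotes)
```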

data/lib/ebnf.rb CHANGED Viewed

@@ -9,6 +9,7 @@ module EBNF
   autoload :PEG,      "ebnf/peg"
   autoload :Rule,     "ebnf/rule"
   autoload :Terminals,"ebnf/terminals"
+  autoload :Unescape, "ebnf/unescape"
   autoload :Writer,   "ebnf/writer"
   autoload :VERSION,  "ebnf/version"

metadata CHANGED Viewed

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: ebnf
 version: !ruby/object:Gem::Version
-  version: 2.1.0
+  version: 2.2.0
 platform: ruby
 authors:
 - Gregg Kellogg
-autorequire:
+autorequire:
 bindir: bin
 cert_chain: []
-date: 2020-07-13 00:00:00.000000000 Z
+date: 2021-08-25 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: sxp
@@ -52,6 +52,48 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '3.1'
+- !ruby/object:Gem::Dependency
+  name: htmlentities
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '4.3'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '4.3'
+- !ruby/object:Gem::Dependency
+  name: unicode-types
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.6'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.6'
+- !ruby/object:Gem::Dependency
+  name: amazing_print
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.2'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.2'
 - !ruby/object:Gem::Dependency
   name: rdf-spec
   requirement: !ruby/object:Gem::Requirement
@@ -81,47 +123,47 @@ dependencies:
       - !ruby/object:Gem::Version
         version: '3.1'
 - !ruby/object:Gem::Dependency
-  name: erubis
+  name: nokogiri
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '2.7'
+        version: '1.10'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '2.7'
+        version: '1.10'
 - !ruby/object:Gem::Dependency
-  name: nokogiri
+  name: erubis
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.10'
+        version: '2.7'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.10'
+        version: '2.7'
 - !ruby/object:Gem::Dependency
   name: rspec
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '3.9'
+        version: '3.10'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '3.9'
+        version: '3.10'
 - !ruby/object:Gem::Dependency
   name: rspec-its
   requirement: !ruby/object:Gem::Requirement
@@ -164,8 +206,8 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '13.0'
-description: EBNF is a Ruby parser for W3C EBNF and a parser generator for compliant
-  LL(1) grammars.
+description: EBNF is a Ruby parser for W3C EBNF and a parser generator for PEG and
+  LL(1). Also includes parsing modes for ISO EBNF and ABNF.
 email: public-rdf-ruby@w3.org
 executables:
 - ebnf
@@ -226,13 +268,14 @@ files:
 - lib/ebnf/peg/rule.rb
 - lib/ebnf/rule.rb
 - lib/ebnf/terminals.rb
+- lib/ebnf/unescape.rb
 - lib/ebnf/version.rb
 - lib/ebnf/writer.rb
 homepage: https://github.com/dryruby/ebnf
 licenses:
 - Unlicense
 metadata: {}
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -247,8 +290,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.1.3
-signing_key:
+rubygems_version: 3.2.15
+signing_key:
 specification_version: 4
-summary: EBNF parser and parser generator.
+summary: EBNF parser and parser generator in Ruby.
 test_files: []