tdparser 1.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 0b0d873d5eee490ac295ad3a31d7df50c8477335aeac7f94d1614e429de40b05
4
+ data.tar.gz: 4e247c68e1d59d4931d97a32990cda0f0d708a291d09a50ba4b8ac0dffb777d1
5
+ SHA512:
6
+ metadata.gz: 80fd063170357063e7c2bc1ad0479246cd3ffd85b5f08586403b7523f42dc2d974d21f9f398d71553bc2848f9bc425461e1cf1696dfdf218e87e2aae26ef38fc
7
+ data.tar.gz: dbecb62bcc8104bc21fa5c08dd6bd6896c6430d473617f0b171df379e707939d0266718abaa8271cfda7244ac8a7fda1586a64e0b5ea2216c02cb0c0858e1bce
data/.dir-locals.el ADDED
@@ -0,0 +1,9 @@
1
+ ;;; Directory Local Variables -*- no-byte-compile: t -*-
2
+ ;;; For more information see (info "(emacs) Directory Variables")
3
+
4
+ ((nil
5
+ . ((eval
6
+ . (progn
7
+ (require 'grep)
8
+ (add-to-list 'grep-find-ignored-directories "html")
9
+ (add-to-list 'grep-find-ignored-directories "coverage"))))))
data/.envrc ADDED
@@ -0,0 +1,2 @@
1
+ watch_file manifest.scm
2
+ use guix
data/.rubocop.yml ADDED
@@ -0,0 +1,8 @@
1
+ AllCops:
2
+ TargetRubyVersion: 3.1
3
+ NewCops: enable
4
+ DisabledByDefault: true
5
+
6
+ # Method "fail" defined.
7
+ Style/SignalException:
8
+ Enabled: false
data/CHANGELOG.md ADDED
@@ -0,0 +1,10 @@
1
+ ## [Unreleased]
2
+
3
+ ## [1.5.0] - 2024-11-08
4
+
5
+ * Support Ruby 3.1 or later.
6
+ * Fix and add tests.
7
+ * Format documents.
8
+ * Rename some modules: moved `TDPUtils` and `TDPXML` into `TDParser`.
9
+ * Rename require path from `tdp` to `tdparser`.
10
+ * Rename the gem from TDP4R to TDParser.
data/COPYING ADDED
@@ -0,0 +1,25 @@
1
+ Copyright (c) 2003,2004,2005,2006 Takaaki Tateishi <ttate@ttsky.net>
2
+ Copyright (c) 2024 gemmaro <gemmaro.dev@gmail.com>
3
+ All rights reserved.
4
+
5
+ Redistribution and use in source and binary forms, with or without
6
+ modification, are permitted provided that the following conditions
7
+ are met:
8
+ 1. Redistributions of source code must retain the above copyright
9
+ notice, this list of conditions and the following disclaimer.
10
+ 2. Redistributions in binary form must reproduce the above copyright
11
+ notice, this list of conditions and the following disclaimer in the
12
+ documentation and/or other materials provided with the distribution.
13
+ 3. The name of the author may not be used to endorse or promote products
14
+ derived from this software without specific prior written permission.
15
+
16
+ THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
17
+ IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
18
+ OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
19
+ IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
20
+ INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
21
+ NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
22
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
23
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
24
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
25
+ THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
data/README ADDED
@@ -0,0 +1,31 @@
1
+ = TDParser
2
+
3
+ This is a top-down parser combinator library for Ruby (an LL(k)
4
+ parser) and the successor of TDP4R.
5
+
6
+ == Description
7
+
8
+ TDParser is a Ruby library that helps us construct a top-down
9
+ parser using recursive method calls, also known as a recursive
10
+ descent parser. Its main features are
11
+
12
+ 1. constructing a parser using combinators as in Parsec (Daan Leijen:
13
+ Parsec (Monadic Parser Combinator Library for Haskell),
14
+ http://www.cs.uu.nl/~daan/parsec.html),
15
+ 2. backtracking parse algorithm with unlimited lookahead (Bryan Ford:
16
+ "Packrat Parsing: Simple, Powerful, Lazy, Linear Time", ICFP,
17
+ 2002.), and
18
+ 3. writing EBNF grammars using Ruby's objects.
19
+
20
+ Feature (1) lets us change some production rules of a grammar at
21
+ runtime and package a set of production rules as a component. Thanks
22
+ to feature (2), we do not have to worry about preventing conflicts
23
+ among production rules. Because of (3), TDParser can also be viewed
24
+ as an internal DSL for writing LL(k) grammars.
25
+
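As a rough illustration of point (1), production rules are ordinary methods, so a set of rules can be grouped in a module and individual rules overridden. This is only a sketch; the module, class, and rule names below (Literals, SumParser, number) are made up for this example and are not part of the gem.

  require 'tdparser'

  # A reusable set of production rules.
  module Literals
    def number
      token(/\d+/) >> proc{|x| x[0].to_i }
    end
  end

  class SumParser
    include TDParser
    include Literals

    # expr := number ('+' number)*
    def expr
      rule(:number) - (token("+") - rule(:number))*0 >> proc{|x|
        x[1].inject(x[0]){|sum, y| sum + y[1] }
      }
    end
  end

  SumParser.new.expr.parse(%w[1 + 2 + 3])   # expected to evaluate to 6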
26
+ == License
27
+
28
+ Copyright(C) 2003, 2004, 2005, 2006 Takaaki Tateishi <ttate@ttsky.net>
29
+ Copyright(C) 2024 gemmaro <gemmaro.dev@gmail.com>
30
+
31
+ See COPYING.
data/Rakefile ADDED
@@ -0,0 +1,18 @@
1
+ # frozen_string_literal: true
2
+
3
+ $LOAD_PATH << File.join(__dir__, '../lib')
4
+
5
+ require 'bundler/gem_tasks'
6
+ require 'rdoc/task'
7
+ require 'rake/testtask'
8
+
9
+ RDoc::Task.new do |rdoc|
10
+ readme = 'README'
11
+ rdoc.main = readme
12
+ rdoc.rdoc_files.include('lib/**/*.rb', readme, 'doc/*.rdoc')
13
+ end
14
+
15
+ Rake::TestTask.new do |t|
16
+ t.libs << 'samples' << 'test'
17
+ t.test_files = FileList['test/*_test.rb']
18
+ end
data/doc/faq.rdoc ADDED
@@ -0,0 +1,36 @@
1
+ = How do I write a rule for left/right-associative infix operators?
2
+
3
+ A good example is an arithmetic expression over <tt>*</tt>,
4
+ <tt>/</tt>, <tt>+</tt> and <tt>-</tt>. With Racc (a Yacc-style
5
+ parser generator for Ruby), you would write the following rule:
6
+
7
+ prechigh
8
+ left '*','/'
9
+ left '+','-'
10
+ preclow
11
+ ...
12
+ expr : expr '*' expr { result = val[0] * val[2]}
13
+ | expr '/' expr { result = val[0] / val[2]}
14
+ | expr '+' expr { result = val[0] + val[2]}
15
+ | expr '-' expr { result = val[0] - val[2]}
16
+ | NUMBER { result = val[0].to_i() }
17
+
18
+ In TDParser, you can write the above rule as follows:
19
+
20
+ TDParser.define{|g|
21
+ g.expr = chainl(NUMBER >> Proc.new{|x| x[0].to_i},
22
+ token("*")|token("/"),
23
+ token("+")|token("-")){|x|
24
+ case x[1]
25
+ when "*"
26
+ x[0] * x[2]
27
+ when "/"
28
+ x[0] / x[2]
29
+ when "+"
30
+ x[0] + x[2]
31
+ when "-"
32
+ x[0] - x[2]
33
+ end
34
+ }
35
+ # ...
36
+ }
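As a hedged usage sketch: NUMBER is not defined in the snippet above, so a plain digit rule stands in for it here, and the variable names are illustrative only. Driving the resulting grammar could look like this:

  parser = TDParser.define{|g|
    number = g.token(/\d+/) >> proc{|x| x[0].to_i }   # stands in for NUMBER
    g.expr = g.chainl(number,
                      g.token("*") | g.token("/"),
                      g.token("+") | g.token("-")){|x|
      case x[1]
      when "*" then x[0] * x[2]
      when "/" then x[0] / x[2]
      when "+" then x[0] + x[2]
      when "-" then x[0] - x[2]
      end
    }
  }

  # chainl builds a left-associative chain, so "8 - 2 - 1" should
  # group as (8 - 2) - 1 and evaluate to 5.
  parser.expr.parse(%w[8 - 2 - 1])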
data/doc/guide.rdoc ADDED
@@ -0,0 +1,150 @@
1
+ = TDParser Programmers Guide
2
+
3
+ TDParser is a Ruby component that helps us construct a top-down
3
+ parser using method calls. This document describes how to use TDParser
4
+ in two styles, both of which look similar to the style of JavaCC on
5
+ the surface. In the first style, we define the rules of a grammar as
6
+ methods (as shown in +sample4.rb+). In the second style, each rule is
7
+ defined as if it were a property of a grammar object (see also
8
+ +sample5.rb+).
10
+
11
+ == Defining Rules in Module
12
+
13
+ The following class is a parser that accepts expressions
14
+ consisting of digits and <tt>+</tt>.
15
+
16
+ class MyParser
17
+ include TDParser
18
+
19
+ def expr
20
+ token(/\d+/) - token("+") - rule(:expr) >> proc{|x| x[0].to_i + x[2] } |
21
+ token(/\d+/) >> proc{|x| x[0].to_i }
22
+ end
23
+ end
24
+
25
+ In this class, the method +expr+ represents the following production
26
+ rule.
27
+
28
+ expr := int '+' expr
29
+ | int
30
+
31
+ In addition, at the first line of the method +expr+, values accepted
32
+ by <tt>token(/\d+/)</tt>, <tt>token("+")</tt> and <tt>rule(:expr)</tt>
33
+ are assigned to <tt>x[0]</tt>, <tt>x[1]</tt> and <tt>x[2]</tt>
34
+ respectively. Then, to parse <tt>1 + 2</tt>, we first
35
+ split it into an array of tokens like <tt>["1", "+", "2"]</tt>, and
36
+ then call the +parse+ method of a parser object, which is created by
37
+ <tt>MyParser.new()</tt>, as follows.
38
+
39
+ parser = MyParser.new()
40
+ parser.expr.parse(["1", "+", "2"])
41
+
42
+ Note that we can pass one of the following objects to the parse method.
43
+
44
+ - an Enumerable object. E.g.: <tt>expr.parse(["1", "+", "2"])</tt>
45
+
46
+ - an object that has the methods <tt>shift</tt> and <tt>unshift</tt>.
47
+ E.g.:
48
+
49
+ expr.parse(TDParser::TokenGenerator.new{|x|
50
+ x.yield("1"); x.yield("+"); x.yield("2")
51
+ })
52
+
53
+ - a block. E.g.: <tt>expr.parse{|x| x.yield("1"); x.yield("+");
54
+ x.yield("2") }</tt>
55
+
56
+ In that grammar, <tt>+</tt> is right-associative. Note, however,
57
+ that we <i>cannot</i> write the rule as follows.
58
+
59
+ def expr
60
+ rule(:expr) - token("+") - token(/\d+/) >> proc{|x| x[0].to_i + x[2].to_i } |
61
+ token(/\d+/) >> proc{|x| x[0].to_i }
62
+ end
63
+
64
+ This is known as the left-recursion problem, so we have to use one
65
+ of the following rules instead.
66
+
67
+ def expr
68
+ token(/\d+/) - (token("+") - token(/\d+/))*0 >> proc{|x|
69
+ x[1].inject(x[0]){|acc,y|
70
+ case y[0]
71
+ when "+"
72
+ acc + y[1]
73
+ end
74
+ }
75
+ }
76
+ end
77
+
78
+ def expr # javacc style
79
+ n = nil
80
+ (token(/\d+/) >> proc{|x| n = x[0].to_i }) -
81
+ (token("+") - token(/\d+/) >> proc{|y|
82
+ case y[0]
83
+ when "+"
84
+ n += y[1].to_i
85
+ end
86
+ })*0 >> proc{|x| n }
87
+ end
88
+
89
+ In these rules, <tt>(...)*N</tt> represents <i>N</i> or more repetitions
90
+ of the rule <tt>(...)</tt>, and <tt>x[1]</tt> holds the sequences of
91
+ tokens accepted by <tt>(...)*0</tt>. For example, if <tt>["1",
92
+ "+","1","+","2"]</tt> is parsed by the rule <tt>token(/\d+/) -
93
+ (token("+") - token(/\d+/))*0</tt>, then <tt>x[1]</tt> is
94
+ <tt>[["+", "1"], ["+", "2"]]</tt>.
95
+
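The same iteration pattern extends to several operators. The following is a hedged sketch (it is not one of the gem's samples) that handles both "+" and "-" left-associatively:

  def expr
    token(/\d+/) - ((token("+") | token("-")) - token(/\d+/))*0 >> proc{|x|
      x[1].inject(x[0].to_i){|acc, y|
        case y[0]
        when "+" then acc + y[1].to_i
        when "-" then acc - y[1].to_i
        end
      }
    }
  end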
96
+ == Defining Rules using <tt>TDParser.define()</tt>
97
+
98
+ The rule defined in the previous section can also be written as
99
+ follows.
100
+
101
+ parser = TDParser.define{|g|
102
+ g.expr =
103
+ g.token(/\d+/) - g.token("+") - g.expr >> proc{|x| x[0].to_i + x[2] } |
104
+ g.token(/\d+/) >> proc{|x| x[0].to_i }
105
+ }
106
+
107
+ (See also <tt>sample5.rb</tt> and <tt>sample6.rb</tt>)
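As with the module style, the object returned by TDParser.define exposes each rule, so (as a hedged sketch, mirroring the earlier example) it is used the same way:

  parser.expr.parse(["1", "+", "2"])   # expected to evaluate to 3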
108
+
109
+ == Parser Combinators
110
+
111
+ * Constructors
112
+ * <tt>token(obj)</tt>
113
+ * <tt>rule(method)</tt>
114
+ * <tt>any()</tt>:: any token
115
+ * <tt>none()</tt>:: no more token
116
+ * <tt>empty()</tt>:: empty
117
+ * <tt>fail()</tt>:: failure
118
+ * <tt>backref(label)</tt>:: back reference
119
+ * <tt>stackref(stack)</tt>:: stack reference
120
+ * Operators
121
+ * <tt>rule - rule</tt>:: sequence
122
+ * <tt>rule | rule</tt>:: choice
123
+ * <tt>rule * n</tt>:: iteration
124
+ * <tt>rule * n..m</tt>:: iteration
125
+ * <tt>rule / label</tt>:: label
126
+ * <tt>rule % stack</tt>:: stack
127
+ * <tt>~ rule</tt>:: negative lookahead
128
+ * Utility Functions
129
+ * <tt>leftrec(base, rule1, ..., ruleN, &action)</tt>:: This constructs the following rule:
130
+
131
+ base - ruleN* >> action' |
132
+ ... |
133
+ base - rule1* >> action' |
134
+ fail()
135
+
136
+ * <tt>rightrec(rule1, ..., ruleN, base, &action)</tt>:: This constructs the following rule:
137
+
138
+ ruleN* - base >> action' |
139
+ ... |
140
+ rule1* - base >> action' |
141
+ fail()
142
+
143
+ * <tt>chainl(base, infix1, ..., infixN, &action)</tt>
144
+ * <tt>chainr(base, infix1, ..., infixN, &action)</tt>
145
+
146
+ == <tt>StringTokenizer</tt>
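A short hedged sketch combining the sequence, choice, and iteration operators listed above; the rule name and token patterns are illustrative, not taken from the gem:

  # zero or more comma-separated words, e.g. ["foo", ",", "bar"]
  def word_list
    token(/\w+/) - (token(",") - token(/\w+/))*0 >> proc{|x|
      [x[0]] + x[1].map{|y| y[1] }
    }
  end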
147
+
148
+ There is a simple tokenizer called TDParser::StringTokenizer in the
149
+ library <tt>tdparser/utils</tt>. (See <tt>MyParser#parse</tt> in
150
+ <tt>sample2.rb</tt>)
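Its interface can be read off the implementation shipped in this release: the constructor takes a hash mapping regexps to token kinds, and generate returns a token generator that a rule's parse method accepts. A hedged sketch follows; the rule hash and the comments are assumptions, while the require path comes from the guide text above.

  require 'tdparser'
  require 'tdparser/utils'

  tokenizer = TDParser::StringTokenizer[
    /\d+/     => :int,
    /[+\-*\/]/ => :op
  ]
  tokens = tokenizer.generate("1 + 2 * 3")   # whitespace is skipped by default
  # each matched token is a TDParser::Token with a kind (:int or :op) and a value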
@@ -0,0 +1,91 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'tdparser'
4
+
5
+ module TDParser
6
+ class Token
7
+ attr_accessor :kind, :value
8
+
9
+ def initialize(kind, value)
10
+ @kind = kind
11
+ @value = value
12
+ end
13
+
14
+ def ==(other)
15
+ (other.class == self.class) &&
16
+ (@kind == other.kind) &&
17
+ (@value == other.value)
18
+ end
19
+
20
+ def ===(other)
21
+ super(other) || (@kind == other)
22
+ end
23
+
24
+ def =~(other)
25
+ @kind == other
26
+ end
27
+ end
28
+
29
+ class BasicStringTokenizer
30
+ def self.[](rule, ignore = nil)
31
+ new(rule, ignore)
32
+ end
33
+
34
+ def initialize(rule, ignore = nil)
35
+ require('strscan')
36
+ @rule = rule
37
+ @scan_pattern = Regexp.new(@rule.keys.join('|'))
38
+ @ignore_pattern = ignore
39
+ end
40
+
41
+ def generate(str)
42
+ scanner = StringScanner.new(str)
43
+ TDParser::TokenGenerator.new do |x|
44
+ until scanner.empty?
45
+ if @ignore_pattern
46
+ while scanner.scan(@ignore_pattern)
47
+ end
48
+ end
49
+ sstr = scanner.scan(@scan_pattern)
50
+ if sstr
51
+ @rule.each do |reg, kind|
52
+ next unless reg =~ sstr
53
+
54
+ x.yield(Token.new(kind, sstr))
55
+ yielded = true
56
+ break
57
+ end
58
+ else
59
+ c = scanner.scan(/./)
60
+ x.yield(c)
61
+ end
62
+ end
63
+ end
64
+ end
65
+ end
66
+
67
+ class StringTokenizer < BasicStringTokenizer
68
+ def initialize(rule, ignore = nil)
69
+ super(rule, ignore || /\s+/)
70
+ end
71
+ end
72
+
73
+ class WaitingTokenGenerator < TDParser::TokenGenerator
74
+ def initialize(*args)
75
+ super(*args)
76
+ @terminated = false
77
+ end
78
+
79
+ def terminate
80
+ @terminated = true
81
+ end
82
+
83
+ def shift
84
+ return nil if @terminated
85
+
86
+ while empty?
87
+ end
88
+ super()
89
+ end
90
+ end
91
+ end
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module TDParser
4
+ VERSION = '1.5.0'
5
+ end
@@ -0,0 +1,180 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'tdparser'
4
+ require 'rexml/parsers/pullparser'
5
+ require 'rexml/document'
6
+
7
+ module TDParser
8
+ module XMLParser
9
+ class XMLTokenGenerator < TDParser::TokenGenerator
10
+ def initialize(src)
11
+ @xparser = REXML::Parsers::BaseParser.new(src)
12
+ super() do |g|
13
+ while @xparser.has_next?
14
+ e = @xparser.pull
15
+ g.yield(e)
16
+ end
17
+ end
18
+ end
19
+ end
20
+
21
+ class XArray < Array
22
+ def ===(ary)
23
+ return true if super(ary)
24
+ return false unless ary.is_a?(Array)
25
+
26
+ each_with_index do |v, idx|
27
+ case ary[idx]
28
+ when v
29
+ else
30
+ return false
31
+ end
32
+ end
33
+ true
34
+ end
35
+ end
36
+
37
+ class XHash < Hash
38
+ def ===(h)
39
+ return true if super(h)
40
+ return false unless h.is_a?(Hash)
41
+
42
+ each do |k, v|
43
+ case h[k]
44
+ when v
45
+ else
46
+ return false
47
+ end
48
+ end
49
+ true
50
+ end
51
+ end
52
+
53
+ def start_element(name = String)
54
+ token(XArray[:start_element, name, Hash])
55
+ end
56
+
57
+ def end_element(name = String)
58
+ token(XArray[:end_element, name])
59
+ end
60
+
61
+ def element(elem = String, &inner)
62
+ crule = if inner
63
+ inner.call | empty
64
+ else
65
+ empty
66
+ end
67
+ (start_element(elem) - crule - end_element(elem)) >> proc do |x|
68
+ name = x[0][1]
69
+ attrs = x[0][2]
70
+ node = REXML::Element.new
71
+ node.name = name
72
+ node.attributes.merge!(attrs)
73
+ [node, x[1]]
74
+ end
75
+ end
76
+
77
+ def text(match = String)
78
+ token(XArray[:text, match]) >> proc do |x|
79
+ REXML::Text.new(x[0][1])
80
+ end
81
+ end
82
+
83
+ def pi
84
+ token(XArray[:processing_instruction, String, String]) >> proc do |x|
85
+ REXML::Instruction.new(x[0][1], x[0][2])
86
+ end
87
+ end
88
+
89
+ def cdata(match = String)
90
+ token(XArray[:cdata, match]) >> proc do |x|
91
+ REXML::CData.new(x[0][1])
92
+ end
93
+ end
94
+
95
+ def comment(match = String)
96
+ token(XArray[:comment, match]) >> proc do |x|
97
+ REXML::Comment.new(x[0][1])
98
+ end
99
+ end
100
+
101
+ def xmldecl
102
+ token(XArray[:xmldecl]) >> proc do |x|
103
+ REXML::XMLDecl.new(x[0][1], x[0][2], x[0][3])
104
+ end
105
+ end
106
+
107
+ def start_doctype(name = String)
108
+ token(XArray[:start_doctype, name])
109
+ end
110
+
111
+ def end_doctype
112
+ token(XArray[:end_doctype])
113
+ end
114
+
115
+ def doctype(name = String, &inner)
116
+ crule = if inner
117
+ inner.call | empty
118
+ else
119
+ empty
120
+ end
121
+ (start_doctype(name) - crule - end_doctype) >> proc do |x|
122
+ node = REXML::DocType.new(x[0][1..])
123
+ [node, x[1]]
124
+ end
125
+ end
126
+
127
+ def externalentity(entity = String)
128
+ token(XArray[:externalentity, entity]) >> proc do |x|
129
+ REXML::ExternalEntity.new(x[0][1])
130
+ end
131
+ end
132
+
133
+ def elementdecl(elem = String)
134
+ token(XArray[:elementdecl, elem]) >> proc do |x|
135
+ REXML::ElementDecl.new(x[0][1])
136
+ end
137
+ end
138
+
139
+ def entitydecl(_entity = String)
140
+ token(XArray[:entitydecl, elem]) >> proc do |x|
141
+ REXML::Entity.new(x[0])
142
+ end
143
+ end
144
+
145
+ def attlistdecl(_decl = String)
146
+ token(XArray[:attlistdecl]) >> proc do |x|
147
+ REXML::AttlistDecl.new(x[0][1..])
148
+ end
149
+ end
150
+
151
+ def notationdecl(_decl = String)
152
+ token(XArray[:notationdecl]) >> proc do |x|
153
+ REXML::NotationDecl.new(*x[0][1..])
154
+ end
155
+ end
156
+
157
+ def any_node(&)
158
+ (element(&) | doctype(&) | text | pi | cdata |
159
+ comment | xmldecl | externalentity | elementdecl |
160
+ entitydecl | attlistdecl | notationdecl) >> proc { |x| x[2] }
161
+ end
162
+
163
+ def dom_constructor(&act)
164
+ proc do |x|
165
+ node = x[0][0]
166
+ child = x[0][1]
167
+ if child.is_a?(Array)
168
+ child.each { |c| node.add(c) }
169
+ else
170
+ node.add(child)
171
+ end
172
+ if act
173
+ act[node]
174
+ else
175
+ node
176
+ end
177
+ end
178
+ end
179
+ end
180
+ end
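A hedged sketch of how these XML combinators might be used. The require path, the mixin pattern, and the sample document are assumptions based on the code above, not documented usage:

  require 'tdparser'
  require 'tdparser/xml'   # assumed require path for this file

  class TitleParser
    include TDParser
    include TDParser::XMLParser

    # matches <title>...</title> and builds an REXML::Element from it
    def title
      element("title") { text } >> dom_constructor
    end
  end

  tokens = TDParser::XMLParser::XMLTokenGenerator.new("<title>TDParser</title>")
  TitleParser.new.title.parse(tokens)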