tokn 0.0.4
- checksums.yaml +7 -0
- data/README.txt +194 -0
- data/bin/tokncompile +16 -0
- data/bin/toknprocess +26 -0
- data/figures/sample_dfa.pdf +0 -0
- data/lib/tokn/code_set.rb +392 -0
- data/lib/tokn/dfa.rb +196 -0
- data/lib/tokn/dfa_builder.rb +261 -0
- data/lib/tokn/range_partition.rb +233 -0
- data/lib/tokn/reg_parse.rb +379 -0
- data/lib/tokn/state.rb +320 -0
- data/lib/tokn/token_defn_parser.rb +156 -0
- data/lib/tokn/tokenizer.rb +211 -0
- data/lib/tokn/tokn_const.rb +29 -0
- data/lib/tokn/tools.rb +186 -0
- data/lib/tokn.rb +1 -0
- data/test/data/sampletext.txt +11 -0
- data/test/data/sampletokens.txt +32 -0
- data/test/simple.rb +33 -0
- data/test/test.rb +519 -0
- data/test/testcmds +4 -0
- metadata +69 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@

---
SHA1:
  metadata.gz: d44494c850d61cd0ab5e3e588bbee398d85f7902
  data.tar.gz: f5b35f65f7fb8f0df3adbcd4ff6d5df483ab3ce4
SHA512:
  metadata.gz: 8432678eb42bcbacfa3db0c04b6f1cf728516e69da13fab1f79ab88002b15bb360cdd31ee76b40f148f2250b3dc78c4263037496c33611c482f1289fdf4998cd
  data.tar.gz: c581302f4b0e77840d2a6e4d9e30657387b913b5c4f7c24376b083576fc4fdd76c28bdff401927528094caab059252e214ccc1979d170f268e27917ec1442708
data/README.txt
ADDED
@@ -0,0 +1,194 @@

'tokn' : A ruby gem for constructing DFAs and using them to tokenize text files.

Written and (c) by Jeff Sember, March 2013.
================================================================================


Description of the problem
================================================================================

For a simple example, suppose a particular text file is designed to have
tokens of the following three types:

1) 'a' followed by any number of 'a' or 'b'
2) 'b' followed by either 'aa' or zero or more 'b'
3) 'bbb'

We will also allow an additional token, one or more spaces, to separate them.
These four token types can be written using regular expressions as:

  sep: \s
  tku: a(a|b)*
  tkv: b(aa|b*)
  tkw: bbb

We've given each token definition a name (to the left of the colon).

Now suppose your program needs to read a text file and interpret the tokens it
finds there. This can be done using the DFA (deterministic finite state automaton)
shown in figures/sample_dfa.pdf. The token extraction algorithm is as follows:

1) Begin at the start state, S0.
2) Look at the next character in the source (text) file. If there is an arrow
   (edge) labelled with that character, follow it to another state (it may lead
   to the same state; that's okay), and advance the cursor to the next character
   in the source file.
3) If there's an arrow labelled with a negative number N, don't follow the edge,
   but instead remember the lowest (i.e., most negative) such N found.
4) Continue steps 2 and 3 until no further progress is possible.
5) At this point, N indicates the name of the token found. The cursor should be
   restored to the point it was at when that N was recorded. The token's text
   consists of the characters from the starting cursor position to that point.
6) If no N value was recorded, then the source text doesn't match any of the
   tokens, which is considered an error.
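The six steps above can be sketched in plain Ruby. The DFA here is hand-built
as nested Hashes (state => {character => next state}), with a separate Hash
mapping accepting states to their negative token ids; these structures and the
ids are illustrative only, not the format tokn actually uses internally.

```ruby
def extract_token(edges, accept, text, start)
  state = 0                # step 1: begin at the start state S0
  best_id = nil            # lowest (most negative) token id seen so far
  best_end = start         # cursor position when best_id was recorded
  cursor = start
  loop do
    # step 3: note the token id (negative N) this state offers; keep the
    # lowest, and let an equal id seen later extend the match
    id = accept[state]
    if id && (best_id.nil? || id <= best_id)
      best_id = id
      best_end = cursor
    end
    # step 2: follow an edge labelled with the next character, if any
    ch = text[cursor]
    nxt = ch && edges[state][ch]
    break unless nxt       # step 4: no further progress is possible
    state = nxt
    cursor += 1
  end
  raise "no token matches at offset #{start}" if best_id.nil?  # step 6
  # step 5: restore the cursor; the token runs from start to best_end
  [best_id, text[start...best_end]]
end
```

Calling this repeatedly, advancing `start` past each returned token, tokenizes
a whole string.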

The tokn module provides a simple and efficient way to perform this tokenization
process. Its major accomplishment is not just performing the above six steps, but
rather that it can also construct, from a set of token definitions, the DFA to be
used in these steps. Such DFAs are very useful, and can be used by non-Ruby
programs as well.


Using the tokn module in a Ruby program
===================================================================================

There are three object classes of interest: DFA, Tokenizer, and Token. A DFA is
compiled once from a script containing token definitions (e.g., "tku: b(aa|b*) ..."),
and can then be stored (either in memory, or on disk as a JSON string) for later use.

When tokens need to be extracted from a source file (or simple string), a Tokenizer
is constructed. It requires both the DFA and the source file as input. Once this is
done, individual Token objects can be read from the Tokenizer.

Here's some example Ruby code showing how a text file "source.txt" can be split into
tokens. We'll assume there's a text file "tokendefs.txt" that contains the
definitions shown earlier.

  require "Tokenizer"

  dfa = dfa_from_script(readTextFile("tokendefs.txt"))

  t = Tokenizer.new(dfa, readTextFile("source.txt"))

  while t.hasNext

    k = t.read                   # read token

    if t.typeOf(k) == "sep"      # skip 'whitespace'
      next
    end

    # ... do something with the token ...
  end

If later, another file needs to be tokenized, a new Tokenizer object can be
constructed and given the same dfa object as earlier.
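Because the compiled DFA is just a JSON string, it can also be cached to disk so
the token definitions need not be recompiled on every run (this is what the
tokncompile/toknprocess scripts below do with DFA#serialize and dfa_from_file).
The round trip can be sketched without the gem itself, using a plain Hash with
hypothetical contents as a stand-in for the DFA's JSON form:

```ruby
require "json"
require "tmpdir"

# Stand-in for a compiled DFA's JSON form (contents are hypothetical):
fake_dfa = { "version" => 1, "tokennames" => ["sep", "tku", "tkv", "tkw"] }

# Compile once, cache to disk (analogous to dfa.serialize()):
path = File.join(Dir.tmpdir, "compileddfa.txt")
File.write(path, JSON.generate(fake_dfa))

# Later, possibly in another process, reload without recompiling
# (analogous to dfa_from_file):
reloaded = JSON.parse(File.read(path))
```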


Using the tokn command line utilities
===================================================================================

The module has two utility scripts: tokncompile and toknprocess. These can be
found in the bin/ directory.

The tokncompile script reads a token definition script from standard input, and
compiles it to a DFA. For example, if you are in the tokn directory, you can
type:

  tokncompile < sampletokens.txt > compileddfa.txt

It will produce the JSON encoding of the appropriate DFA. For a description of
how this JSON string represents the DFA, see dfa.rb.

The toknprocess script takes two arguments: the name of a file containing a
previously compiled DFA, and the name of a source file. It extracts the sequence
of tokens from the source file to the standard output:

  toknprocess compileddfa.txt sampletext.txt

This will produce the following output:

  WS 1 1 // Example source file that can be tokenized

  WS 2 1

  ID 3 1 speed
  WS 3 6
  ASSIGN 3 7 =
  WS 3 8
  INT 3 9 42
  WS 3 11
  WS 3 14 // speed of object

  WS 4 1

  ID 5 1 gravity
  WS 5 8
  ASSIGN 5 9 =
  WS 5 10
  DBL 5 11 -9.80
  WS 5 16


  ID 7 1 title
  WS 7 6
  ASSIGN 7 7 =
  WS 7 8
  LBL 7 9 'This is a string with \' an escaped delimiter'
  WS 7 56


  IF 9 1 if
  WS 9 3
  ID 9 4 gravity
  WS 9 11
  EQUIV 9 12 ==
  WS 9 14
  INT 9 15 12
  WS 9 17
  BROP 9 18 {
  WS 9 19

  DO 10 3 do
  WS 10 5
  ID 10 6 something
  WS 10 15

  BRCL 11 1 }
  WS 11 2

The extra linefeeds are the result of a token containing a linefeed.
+
FAQ
|
164
|
+
===================================================================================
|
165
|
+
|
166
|
+
1) Why can't I just use Ruby's regular expressions for tokenizing text?
|
167
|
+
|
168
|
+
You could construct a regular expression describing each possible token, and use that
|
169
|
+
to extract a token from the start of a string; you could then remove that token from the
|
170
|
+
string, and repeat. The trouble is that the regular expression has no easy way to indicate
|
171
|
+
which individual token's expression was matched. You would then (presumably) have to match
|
172
|
+
the returned token with each individual regular expression to identify the token type.
|
173
|
+
|
174
|
+
Another reason why standard regular expressions can be troublesome is that their
|
175
|
+
implementations actually 'recognize' a richer class of tokens than the ones described
|
176
|
+
here. This extra power can come at a cost; in some pathological cases, the running time
|
177
|
+
can become exponential.
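The workaround described above can be sketched with plain Ruby regexps: anchor
each token's pattern at the current position, try them all, and keep the longest
match (with ties going to the earlier definition here). It works, but every
pattern is re-tested at every position, which is the repeated work the DFA
avoids.

```ruby
# The four token definitions from earlier, each anchored with \A so it can
# only match at the start of the remaining text:
TOKEN_DEFS = [
  ["sep", /\A\s/],
  ["tku", /\Aa(a|b)*/],
  ["tkv", /\Ab(aa|b*)/],
  ["tkw", /\Abbb/],
]

# Return [name, matched_text] for the longest token matching at the start
# of text, or nil if none matches.
def naive_next_token(text)
  best = nil
  TOKEN_DEFS.each do |name, pat|
    m = pat.match(text)
    best = [name, m[0]] if m && (best.nil? || m[0].length > best[1].length)
  end
  best
end
```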

2) Is tokn compatible with Unicode?

The tokn tool is capable of extracting tokens made up of characters that have
codes in the entire Unicode range: 0 through 0x10ffff (hex). In fact, the labels
on the DFA edges can be viewed as sets of any nonnegative integers (negative
values are reserved for the token identifiers). Note however that the current
implementation only reads Ruby characters from the input, which I believe are
only 8 bits wide.

3) What do I do if I have some ideas for enhancing tokn, or want to point out
some problems with it?

Well, I can be reached as jpsember at gmail dot com.

data/bin/tokncompile
ADDED
@@ -0,0 +1,16 @@

#!/usr/local/bin/ruby

# Compile a DFA from a token definition script,
# then serialize that DFA to stdout
#
# Example usage (for Unix):
#
#   tokncompile < sampletokens.txt > dfa.txt
#

require 'tokn'

puts dfa_from_script(ARGF.read).serialize()
data/bin/toknprocess
ADDED
@@ -0,0 +1,26 @@

#!/usr/local/bin/ruby

# Given a compiled DFA file and a source file,
# extract all tokens from the source file.
#
# Example usage (for Unix); assumes tokncompile
# has been run beforehand:
#
#   toknprocess dfa.txt sampletext.txt
#

require 'tokn'

if ARGV.size != 2
  puts "Usage: toknprocess <dfa file> <source file>"
  abort
end

dfa = dfa_from_file(ARGV[0])
tk = Tokenizer.new(dfa, readTextFile(ARGV[1]))

while tk.hasNext()
  t = tk.read
  printf("%s %d %d %s\n", tk.nameOf(t), t.lineNumber, t.column, t.text)
end
data/figures/sample_dfa.pdf
ADDED
Binary file

data/lib/tokn/code_set.rb
ADDED
@@ -0,0 +1,392 @@

require_relative 'tools'

req('tokn_const')


# A CodeSet is an ordered set of character or token codes that
# are used as labels on DFA edges.
#
# In addition to unicode character codes 0...0x10ffff, they
# also represent epsilon transitions (-1), or token identifiers ( < -1).
#
# Each CodeSet is represented as an array with 2n elements;
# each pair represents a closed lower and open upper range of values.
#
# Thus a value x is within the set [a1,a2,b1,b2,..]
# iff (a1 <= x < a2) or (b1 <= x < b2) or ...
#
class CodeSet

  include Tokn

  # Construct a copy of this set
  #
  def makeCopy
    c = CodeSet.new
    c.setTo(self)
    c
  end

  # Initialize set; optionally add an initial contiguous range
  #
  def initialize(lower = nil, upper = nil)
    @elem = []
    if lower
      add(lower, upper)
    end
  end

  # Replace this set with a copy of another
  #
  def setTo(otherSet)
    @elem.replace(otherSet.array)
  end

  # Get the array containing the code set range pairs
  #
  def array
    return @elem
  end

  # Replace this set's array
  # @param a array to point to (does not make a copy of it)
  #
  def setArray(a)
    @elem = a
  end

  def hash
    return @elem.hash
  end

  # Determine if this set is equivalent to another
  #
  def eql?(other)
    @elem == other.array
  end

  # Add a contiguous range of values to the set
  # @param lower min value in range
  # @param upper one plus max value in range
  #
  def add(lower, upper = nil)
    if not upper
      upper = lower + 1
    end

    if lower >= upper
      raise RangeError
    end

    newSet = []
    i = 0
    while i < @elem.size and @elem[i] < lower
      newSet.push(@elem[i])
      i += 1
    end

    if (i & 1) == 0
      newSet.push(lower)
    end

    while i < @elem.size and @elem[i] <= upper
      i += 1
    end

    if (i & 1) == 0
      newSet.push(upper)
    end

    while i < @elem.size
      newSet.push(@elem[i])
      i += 1
    end

    @elem = newSet
  end

  # Remove a contiguous range of values from the set
  # @param lower min value in range
  # @param upper one plus max value in range
  #
  def remove(lower, upper = nil)
    if not upper
      upper = lower + 1
    end

    if lower >= upper
      raise RangeError
    end

    newSet = []
    i = 0
    while i < @elem.size and @elem[i] < lower
      newSet.push(@elem[i])
      i += 1
    end

    if (i & 1) == 1
      newSet.push(lower)
    end

    while i < @elem.size and @elem[i] <= upper
      i += 1
    end

    if (i & 1) == 1
      newSet.push(upper)
    end

    while i < @elem.size
      newSet.push(@elem[i])
      i += 1
    end

    setArray(newSet)
  end

  # Replace this set with itself minus another
  #
  def difference!(s)
    setTo(difference(s))
  end

  # Calculate difference of this set minus another
  def difference(s)
    combineWith(s, 'd')
  end

  # Calculate the intersection of this set and another
  def intersect(s)
    combineWith(s, 'i')
  end

  # Set this set equal to its intersection with another
  def intersect!(s)
    setTo(intersect(s))
  end

  # Add every value from another CodeSet to this one
  def addSet(s)
    sa = s.array
    (0 ... sa.length).step(2) { |i| add(sa[i], sa[i+1]) }
  end

  # Determine if this set contains a particular value
  def contains?(val)
    ret = false
    i = 0
    while i < @elem.size
      if val < @elem[i]
        break
      end
      if val < @elem[i+1]
        ret = true
        break
      end
      i += 2
    end
    ret
  end

  # Get string representation of set, treating values (where
  # possible) as printable ASCII characters
  #
  def to_s
    s = ''
    i = 0
    while i < @elem.size
      if s.size > 0
        s += ' '
      end

      lower = @elem[i]
      upper = @elem[i+1]
      s += dbStr(lower)
      if upper != 1+lower
        s += '..' + dbStr(upper-1)
      end
      i += 2
    end
    return s
  end

  def inspect
    to_s
  end

  # Get string representation of set, treating values
  # as integers
  #
  def to_s_alt
    s = ''
    i = 0
    while i < @elem.size
      if s.length > 0
        s += ' '
      end
      low = @elem[i]
      upr = @elem[i+1]
      s += low.to_s
      if upr > low+1
        s += '..'
        s += (upr-1).to_s
      end
      i += 2
    end
    return s
  end

  # Negate the inclusion of a contiguous range of values
  #
  # @param lower min value in range
  # @param upper one plus max value in range
  #
  def negate(lower = 0, upper = CODEMAX)
    if lower >= upper
      raise RangeError
    end

    newSet = []
    i = 0
    while i < @elem.size and @elem[i] <= lower
      newSet.push(@elem[i])
      i += 1
    end

    if i > 0 and newSet[i-1] == lower
      newSet.pop
    else
      newSet.push(lower)
    end

    while i < @elem.size and @elem[i] <= upper
      newSet.push(@elem[i])
      i += 1
    end

    if newSet.length > 0 and newSet.last == upper
      newSet.pop
    else
      newSet.push(upper)
    end

    while i < @elem.size
      newSet.push(@elem[i])
      i += 1
    end

    @elem = newSet
  end

  # Determine how many distinct values are represented by this set
  def cardinality
    c = 0
    i = 0
    while i < @elem.length
      c += @elem[i+1] - @elem[i]
      i += 2
    end
    c
  end

  # Determine if this set is empty
  #
  def empty?
    @elem.empty?
  end

  private

  # Get a debug description of a value within a CodeSet, suitable
  # for including within a .dot label
  #
  def dbStr(charCode)
    # Unless it corresponds to a non-confusing printable ASCII value,
    # just print its decimal equivalent
    s = charCode.to_s
    if charCode == EPSILON
      s = "(e)"
    elsif (charCode > 32 && charCode < 0x7f && !"'\"\\[]{}()".index(charCode.chr))
      s = charCode.chr
    end
    return s
  end

  # Combine this set (a) with another (b) according to a particular operation
  # > s     other set (b)
  # > oper  'i': intersection, a^b
  #         'd': difference, a-b
  #
  def combineWith(s, oper)
    sa = array
    sb = s.array

    i = 0
    j = 0
    c = []

    wasInside = false

    while i < sa.length || j < sb.length

      if i == sa.length
        v = sb[j]
      elsif j == sb.length
        v = sa[i]
      else
        v = [sa[i], sb[j]].min
      end

      if i < sa.length && v == sa[i]
        i += 1
      end
      if j < sb.length && v == sb[j]
        j += 1
      end

      case oper
      when 'i'
        inside = ((i & 1) == 1) && ((j & 1) == 1)
      when 'd'
        inside = ((i & 1) == 1) && ((j & 1) == 0)
      else
        raise ArgumentError, "illegal operation: #{oper}"
      end

      if inside != wasInside
        c.push v
        wasInside = inside
      end
    end
    ret = CodeSet.new()
    ret.setArray(c)
    ret
  end

end