string_splitter 0.7.0 → 0.7.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +9 -0
- data/README.md +9 -9
- data/lib/string_splitter.rb +92 -69
- data/lib/string_splitter/split.rb +51 -0
- data/lib/string_splitter/version.rb +1 -1
- metadata +3 -16
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 799ba605477bc50679baaa0ae5d12ac8077fc3a57611f69beddb3396a45e3a13
+  data.tar.gz: 0fbdf7225b69ea52b615ac7523bd15266dc9b0dbbe541e7b3802027a0a8c6c36
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c8fc9cf7bbd351013091918f5398c27efcda0b9b8c1f66294af76f1864e911d2fc0520b653fe1bdf3d11fb912dd0615b0954e38176f87fbf2a6cc931d0bdf6be
+  data.tar.gz: 98bd2cdeae3a27f9f54bb982b75033c9180e688419c0f5209682462a27e1792d6c8ec6d16ec6340c359c22373cdcad07c05a8ced5b03811060cf492d09a1c13b
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,12 @@
+## 0.7.1 - 2020-08-22
+
+#### Changes
+
+- performance improvements
+- delegate to `String#split` where possible
+- use a regular class for Split rather than values.rb
+- create Split objects directly rather than allocating intermediate hashes
+
 ## 0.7.0 - 2020-08-21
 
 #### Breaking Changes
data/README.md
CHANGED
@@ -11,7 +11,7 @@
 - [DESCRIPTION](#description)
 - [WHY?](#why)
 - [CAVEATS](#caveats)
-- [Differences from String#split](#differences-from-
+- [Differences from String#split](#differences-from-stringsplit)
 - [COMPATIBILITY](#compatibility)
 - [VERSION](#version)
 - [SEE ALSO](#see-also)
@@ -130,7 +130,7 @@ end
 Many languages have built-in `split` functions/methods for strings. They behave
 similarly (notwithstanding the occasional
 [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)), and
-handle a few common cases e.g.:
+handle a few common cases, e.g.:
 
 * limiting the number of splits
 * including the separator(s) in the results
@@ -140,7 +140,7 @@ But, because the API is squeezed into two overloaded parameters (the delimiter
 and the limit), achieving the desired results can be tricky. For instance,
 while `String#split` removes empty trailing fields (by default), it provides no
 way to remove *all* empty fields. Likewise, the cramped API means there's no
-way to e.g
+way to, e.g., combine a limit (positive integer) with the option to preserve
 empty fields (negative integer), or use backreferences in a delimiter pattern
 without including its captured subexpressions in the result.
 
@@ -192,7 +192,7 @@ to a regex or a full-blown parser.
 As an example, the nominally unstructured output of many Unix commands is often
 formatted in a way that's tantalizingly close to being
 [machine-readable](https://en.wikipedia.org/wiki/Delimiter-separated_values),
-apart from a few pesky exceptions e.g.:
+apart from a few pesky exceptions, e.g.:
 
 ```bash
 $ ls -l
@@ -205,7 +205,7 @@ drwxr-xr-x 3 user users 4096 Jun 19 22:56 lib
 ```
 
 These lines can *almost* be parsed into an array of fields by splitting them on
-whitespace. The exception is the date (columns 6-8) i.e.:
+whitespace. The exception is the date (columns 6-8), i.e.:
 
 ```ruby
 line = "-rw-r--r-- 1 user users 87 Jun 18 18:16 CHANGELOG.md"
@@ -224,7 +224,7 @@ instead of:
 ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
 ```
 
-One way to work around this is to parse the whole line e.g.:
+One way to work around this is to parse the whole line, e.g.:
 
 ```ruby
 line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+ \S+) \s+ (.+)$/x)
@@ -232,7 +232,7 @@ line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+
 
 But that requires us to specify *everything*. What we really want is a version
 of `split` which allows us to veto splitting for the 6th and 7th delimiters
-(and to stop after the 8th delimiter) i.e. control over which splits are
+(and to stop after the 8th delimiter), i.e. control over which splits are
 accepted, rather than being restricted to the single, baked-in strategy
 provided by the `limit` parameter.
 
@@ -258,7 +258,7 @@ ss.split(line, at: [1..5, 8])
 ## Differences from String#split
 
 Unlike `String#split`, StringSplitter doesn't trim the string before splitting
-
+if the delimiter is omitted or a single space, e.g.:
 
 ```ruby
 " foo bar baz ".split # => ["foo", "bar", "baz"]
@@ -297,7 +297,7 @@ currently, Ruby 2.5 and above.
 
 # VERSION
 
-0.7.
+0.7.1
 
 # SEE ALSO
 
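For comparison with the README's `ls -l` discussion above, here is a minimal plain-Ruby sketch of doing the same job without the gem: split on whitespace, then glue the three date fields back together by hand. It uses only `String#split` and array slicing; the variable names are illustrative, and it assumes the file name contains no whitespace.

```ruby
line = "-rw-r--r-- 1 user users 87 Jun 18 18:16 CHANGELOG.md"

# a plain split spreads the date over three separate fields
fields = line.split

# rejoin the month, day and time (the 6th to 8th fields)
date = fields[5, 3].join(' ')

# rebuild the row: first five fields, the merged date, then the file name
merged = fields[0, 5] + [date] + fields[8..-1]

p merged
```

The manual slicing is what the `at: [1..5, 8]` call shown in the README is meant to replace: the caller says which delimiters to split at instead of repairing the result afterwards.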
data/lib/string_splitter.rb
CHANGED
@@ -1,8 +1,8 @@
 # frozen_string_literal: true
 
 require 'set'
-require 'values'
 
+require_relative 'string_splitter/split'
 require_relative 'string_splitter/version'
 
 # This class extends the functionality of +String#split+ by:
@@ -17,9 +17,10 @@ require_relative 'string_splitter/version'
 # These enhancements allow splits to handle many cases that otherwise require bigger
 # guns, e.g. regex matching or parsing.
 #
-# Implementation-wise, we split the string with
-#
-#
+# Implementation-wise, we split the string either with String#split, or with a custom
+# scanner if the delimiter may contain captures (since String#split doesn't handle
+# them correctly) and parse the resulting tokens into an array of Split objects with
+# the following attributes:
 #
 # - captures: separator substrings captured by parentheses in the delimiter pattern
 # - count: the number of splits
@@ -43,42 +44,6 @@ class StringSplitter
   DEFAULT_DELIMITER = /\s+/.freeze
   REMOVE = [].freeze
 
-  Split = Value.new(:captures, :count, :index, :lhs, :rhs, :separator) do
-    def position
-      index + 1
-    end
-
-    alias_method :pos, :position
-
-    # 0-based index relative to the end of the array, e.g. for 5 items:
-    #
-    #   index | rindex
-    #   ------|-------
-    #   0     | 4
-    #   1     | 3
-    #   2     | 2
-    #   3     | 1
-    #   4     | 0
-    def rindex
-      count - position
-    end
-
-    # 1-based position relative to the end of the array, e.g. for 5 items:
-    #
-    #   position  | rposition
-    #   ----------|----------
-    #   1         | 5
-    #   2         | 4
-    #   3         | 3
-    #   4         | 2
-    #   5         | 1
-    def rposition
-      count + 1 - position
-    end
-
-    alias_method :rpos, :rposition
-  end
-
   # simulate an enum. the value is returned by the case statement
   # in the generated block if the positions match
   module Action
@@ -130,9 +95,10 @@ class StringSplitter
 
     return result unless splits
 
-    splits.
-
-
+    result << splits.first.lhs
+
+    splits.each_with_index do |split, index|
+      split.update!(count: count, index: index)
 
       if accept.call(split)
         result << split.captures << split.rhs
@@ -166,9 +132,10 @@ class StringSplitter
 
     return result unless splits
 
-    splits.
-
-
+    result.unshift(splits.last.rhs)
+
+    splits.reverse_each.with_index do |split, index|
+      split.update!(count: count, index: index)
 
       if accept.call(split)
         # [lhs + captures] + result
@@ -190,7 +157,7 @@ class StringSplitter
   # the following fields:
   #
   # - result: the array of separated strings to return from +split+ or +rsplit+.
-  #   if the splits
+  #   if the splits array is empty, the caller returns this array immediately
   #   without any further processing
   #
   # - splits: an array of hashes containing the lhs, rhs, separator and captured
@@ -202,23 +169,76 @@ class StringSplitter
   # accepted (true) or rejected (false)
   #
   def init(string:, delimiter:, select:, reject:, block:)
-    if
-
-
-
-
-
+    return [[]] if string.empty?
+
+    unless block
+      if reject
+        positions = reject
+        action = Action::REJECT
+      elsif select
+        positions = select
+        action = Action::SELECT
+      else
+        block = ACCEPT_ALL
+      end
     end
 
-
+    # use String#split if we can
+    #
+    # NOTE +reject!+ is no faster than +reject+ on MRI and significantly slower
+    # on TruffleRuby
+
+    if delimiter.is_a?(String)
+      limit = -1
+
+      if delimiter == ' '
+        delimiter = / / # don't trim
+      elsif delimiter.empty?
+        limit = 0 # remove the trailing empty string
+      end
+
+      result = string.split(delimiter, limit)
+
+      return [result] if result.length == 1 # delimiter not found: no splits
+
+      if block == ACCEPT_ALL # return the (2 or more) fields
+        result = result.reject(&:empty?) if @remove_empty_fields
+        return [result]
+      end
+
+      splits = []
+
+      result.each_cons(2) do |lhs, rhs| # 2 or more fields
+        splits << Split.new(
+          captures: [],
+          lhs: lhs,
+          rhs: rhs,
+          separator: delimiter
+        )
+      end
+    elsif delimiter == DEFAULT_DELIMITER && block == ACCEPT_ALL
+      # non-empty separators so -1 is safe
+
+      if @remove_empty_fields
+        result = []
+        string.split(delimiter, -1) do |field|
+          result << field unless field.empty?
+        end
+      else
+        result = string.split(delimiter, -1)
+      end
 
-    if splits.empty?
-      result = string.empty? ? [] : [string]
       return [result]
+    else
+      splits = parse(string, delimiter)
     end
 
-
-
+    count = splits.length
+
+    return [[string]] if count.zero?
+
+    block ||= compile(positions, action, count)
+    [[], splits, count, block]
   end
 
   def render(values)
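The `String#split` fast path in the hunk above leans on a few standard-library behaviours that are easy to misremember. The following is a small sketch of just those behaviours in plain Ruby; it does not call StringSplitter, and the variable names are illustrative only.

```ruby
str = ' foo bar '

# a single-space string delimiter trims the string and collapses runs of whitespace
awk_style = str.split(' ')

# an equivalent regexp delimiter splits on every space and, with a negative
# limit, keeps the leading and trailing empty fields (hence the "don't trim" swap)
literal = str.split(/ /, -1)

# with an empty delimiter, a negative limit leaves a trailing empty string...
chars_all = 'abc'.split('', -1)

# ...which a limit of 0 removes
chars_trim = 'abc'.split('', 0)

# each_cons(2) pairs every field with its right-hand neighbour, which is how the
# hunk turns a list of fields into lhs/rhs pairs
pairs = %w[a b c].each_cons(2).to_a

p awk_style, literal, chars_all, chars_trim, pairs
```

Three fields produce two pairs, i.e. one pair per delimiter, which matches the "2 or more fields" comments in the diff.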
@@ -227,6 +247,7 @@ class StringSplitter
         value.empty? && @remove_empty_fields ? REMOVE : [value]
       elsif @include_captures
         if @spread_captures
+          # TODO make sure compact can return a Capture
           @spread_captures == :compact ? value.compact : value
         elsif value.empty?
           # we expose non-captures (string delimiters or regexps with no
@@ -247,7 +268,7 @@ class StringSplitter
   # the delimiter, returning an array of objects (hashes) representing each split.
   # e.g. for:
   #
-  #   parse
+  #   parse("foo:bar:baz:quux", ":")
   #
   # we return:
   #
@@ -258,6 +279,7 @@ class StringSplitter
   #   ]
   #
   def parse(string, delimiter)
+    # has_names = delimiter.is_a?(Regexp) && !delimiter.names.empty?
     result = []
     start = 0
 
@@ -273,21 +295,23 @@ class StringSplitter
       next if separator.empty? && (index.zero? || after == string.length)
 
       lhs = string.slice(start, index - start)
-      result.last
+      result.last.rhs = lhs unless result.empty?
 
       # this is correct for the last/only match, but gets updated to the next
       # match's lhs for other matches
       rhs = match.post_match
 
-
+      # captures = (has_names ? Captures.new(match) : match.captures)
+
+      result << Split.new(
         captures: match.captures,
         lhs: lhs,
         rhs: rhs,
-        separator: separator
-
+        separator: separator
+      )
 
-      #
-      # last character of the separator
+      # advance the start index (the start of the next lhs) to the position
+      # after the last character of the separator
       start = after
     end
 
@@ -297,8 +321,8 @@ class StringSplitter
   # returns a lambda which splits at (i.e. accepts or rejects splits at, depending
   # on the action) the supplied positions
   #
-  # positions are preprocessed to support
-  #
+  # positions are preprocessed to support negative indices, infinite ranges, and
+  # descending ranges, e.g.:
   #
   #   ss.split("foo:bar:baz:quux", ":", at: -1)
   #
@@ -309,9 +333,8 @@ class StringSplitter
   # and
   #
   #   ss.split("1:2:3:4:5:6:7:8:9", ":", -3..)
-  #   ss.split("1:2:3:4:5:6:7:8:9", ":", -3..)
   #
-  #
+  # translates to:
   #
   #   ss.split("foo:bar:baz:quux", ":", at: 6..8)
   #
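The comment above says positions are preprocessed so that negative indices and endless ranges become plain 1-based positions. The gem's `compile` method is not shown in this diff, so the following is only a sketch of that arithmetic under stated assumptions: `normalize_position` and `normalize_range` are hypothetical helper names, and `count` is the number of splits (delimiters), as in the `count: the number of splits` attribute.

```ruby
# hypothetical helper (not the gem's code): map a negative 1-based position
# to its positive equivalent, e.g. -1 is the last of `count` splits
def normalize_position(position, count)
  position.negative? ? count + 1 + position : position
end

# hypothetical helper (not the gem's code): resolve both ends of a range,
# treating an endless range as running up to the last split
def normalize_range(range, count)
  first = normalize_position(range.begin, count)
  last = range.end.nil? ? count : normalize_position(range.end, count)
  (first..last)
end

# "foo:bar:baz:quux" has 3 delimiters, so at: -1 means the 3rd
last_of_three = normalize_position(-1, 3)

# "1:2:3:4:5:6:7:8:9" has 8 delimiters, so -3.. means the 6th to the 8th
tail_of_eight = normalize_range((-3..), 8)

p last_of_three, tail_of_eight
```

Both results agree with the two "translates to" examples in the comment (`at: 3` and `at: 6..8`). Descending ranges, which the comment also mentions, are left out of this sketch.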
data/lib/string_splitter/split.rb
ADDED
@@ -0,0 +1,51 @@
+# frozen_string_literal: true
+
+class StringSplitter
+  class Split
+    attr_reader :captures, :count, :index, :lhs, :position, :rhs, :separator
+    attr_writer :rhs
+    alias pos position
+
+    def initialize(captures:, lhs:, rhs:, separator:)
+      @captures = captures
+      @lhs = lhs
+      @rhs = rhs
+      @separator = separator
+    end
+
+    # 0-based index relative to the end of the array, e.g. for 5 items:
+    #
+    #   index | rindex
+    #   ------|-------
+    #   0     | 4
+    #   1     | 3
+    #   2     | 2
+    #   3     | 1
+    #   4     | 0
+    def rindex
+      @count - @position
+    end
+
+    # 1-based position relative to the end of the array, e.g. for 5 items:
+    #
+    #   position  | rposition
+    #   ----------|----------
+    #   1         | 5
+    #   2         | 4
+    #   3         | 3
+    #   4         | 2
+    #   5         | 1
+    def rposition
+      @count + 1 - @position
+    end
+
+    alias rpos rposition
+
+    def update!(count:, index:)
+      @count = count
+      @index = index
+      @position = index + 1
+      freeze
+    end
+  end
+end
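To make the `rindex`/`rposition` tables in the new class concrete, here is a reduced sketch of the same arithmetic. `SplitSketch` is a hypothetical stand-in written for this note, not the gem's `StringSplitter::Split`; it keeps only the counters and copies the shape of `update!` from the diff.

```ruby
# SplitSketch is a hypothetical stand-in for StringSplitter::Split,
# reduced to the fields needed to show the position arithmetic
class SplitSketch
  attr_reader :count, :index, :position

  # same shape as update! in the diff: set the counters, then freeze
  def update!(count:, index:)
    @count = count
    @index = index
    @position = index + 1
    freeze
  end

  # 0-based index counted from the end
  def rindex
    @count - @position
  end

  # 1-based position counted from the end
  def rposition
    @count + 1 - @position
  end
end

# the first and last of 5 splits, matching the first and last table rows
first = SplitSketch.new.update!(count: 5, index: 0)
last = SplitSketch.new.update!(count: 5, index: 4)

p [first.position, first.rindex, first.rposition]
p [last.position, last.rindex, last.rposition]
```

Because `freeze` returns the receiver, `update!` can be chained as above, and the object is immutable by the time it reaches a caller's block.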
metadata
CHANGED
@@ -1,29 +1,15 @@
 --- !ruby/object:Gem::Specification
 name: string_splitter
 version: !ruby/object:Gem::Version
-  version: 0.7.
+  version: 0.7.1
 platform: ruby
 authors:
 - chocolateboy
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2020-08-
+date: 2020-08-22 00:00:00.000000000 Z
 dependencies:
-- !ruby/object:Gem::Dependency
-  name: values
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '1.8'
-  type: :runtime
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '1.8'
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
@@ -104,6 +90,7 @@ files:
 - LICENSE.md
 - README.md
 - lib/string_splitter.rb
+- lib/string_splitter/split.rb
 - lib/string_splitter/version.rb
 homepage: https://github.com/chocolateboy/string_splitter
 licenses: