string_splitter 0.7.0 → 0.7.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +9 -0
- data/README.md +9 -9
- data/lib/string_splitter.rb +92 -69
- data/lib/string_splitter/split.rb +51 -0
- data/lib/string_splitter/version.rb +1 -1
- metadata +3 -16
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 799ba605477bc50679baaa0ae5d12ac8077fc3a57611f69beddb3396a45e3a13
+  data.tar.gz: 0fbdf7225b69ea52b615ac7523bd15266dc9b0dbbe541e7b3802027a0a8c6c36
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c8fc9cf7bbd351013091918f5398c27efcda0b9b8c1f66294af76f1864e911d2fc0520b653fe1bdf3d11fb912dd0615b0954e38176f87fbf2a6cc931d0bdf6be
+  data.tar.gz: 98bd2cdeae3a27f9f54bb982b75033c9180e688419c0f5209682462a27e1792d6c8ec6d16ec6340c359c22373cdcad07c05a8ced5b03811060cf492d09a1c13b
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,12 @@
+## 0.7.1 - 2020-08-22
+
+#### Changes
+
+- performance improvements
+- delegate to `String#split` where possible
+- use a regular class for Split rather than values.rb
+- create Split objects directly rather than allocating intermediate hashes
+
 ## 0.7.0 - 2020-08-21
 
 #### Breaking Changes
data/README.md
CHANGED
@@ -11,7 +11,7 @@
 - [DESCRIPTION](#description)
 - [WHY?](#why)
 - [CAVEATS](#caveats)
-- [Differences from String#split](#differences-from-
+- [Differences from String#split](#differences-from-stringsplit)
 - [COMPATIBILITY](#compatibility)
 - [VERSION](#version)
 - [SEE ALSO](#see-also)
@@ -130,7 +130,7 @@ end
 Many languages have built-in `split` functions/methods for strings. They behave
 similarly (notwithstanding the occasional
 [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)), and
-handle a few common cases e.g.:
+handle a few common cases, e.g.:
 
 * limiting the number of splits
 * including the separator(s) in the results
@@ -140,7 +140,7 @@ But, because the API is squeezed into two overloaded parameters (the delimiter
 and the limit), achieving the desired results can be tricky. For instance,
 while `String#split` removes empty trailing fields (by default), it provides no
 way to remove *all* empty fields. Likewise, the cramped API means there's no
-way to e.g
+way to, e.g., combine a limit (positive integer) with the option to preserve
 empty fields (negative integer), or use backreferences in a delimiter pattern
 without including its captured subexpressions in the result.
 
@@ -192,7 +192,7 @@ to a regex or a full-blown parser.
 As an example, the nominally unstructured output of many Unix commands is often
 formatted in a way that's tantalizingly close to being
 [machine-readable](https://en.wikipedia.org/wiki/Delimiter-separated_values),
-apart from a few pesky exceptions e.g.:
+apart from a few pesky exceptions, e.g.:
 
 ```bash
 $ ls -l
@@ -205,7 +205,7 @@ drwxr-xr-x 3 user users 4096 Jun 19 22:56 lib
 ```
 
 These lines can *almost* be parsed into an array of fields by splitting them on
-whitespace. The exception is the date (columns 6-8) i.e.:
+whitespace. The exception is the date (columns 6-8), i.e.:
 
 ```ruby
 line = "-rw-r--r-- 1 user users 87 Jun 18 18:16 CHANGELOG.md"
@@ -224,7 +224,7 @@ instead of:
 ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
 ```
 
-One way to work around this is to parse the whole line e.g.:
+One way to work around this is to parse the whole line, e.g.:
 
 ```ruby
 line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+ \S+) \s+ (.+)$/x)
@@ -232,7 +232,7 @@ line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+
 
 But that requires us to specify *everything*. What we really want is a version
 of `split` which allows us to veto splitting for the 6th and 7th delimiters
-(and to stop after the 8th delimiter) i.e. control over which splits are
+(and to stop after the 8th delimiter), i.e. control over which splits are
 accepted, rather than being restricted to the single, baked-in strategy
 provided by the `limit` parameter.
 
@@ -258,7 +258,7 @@ ss.split(line, at: [1..5, 8])
 ## Differences from String#split
 
 Unlike `String#split`, StringSplitter doesn't trim the string before splitting
-
+if the delimiter is omitted or a single space, e.g.:
 
 ```ruby
 " foo bar baz ".split # => ["foo", "bar", "baz"]
@@ -297,7 +297,7 @@ currently, Ruby 2.5 and above.
 
 # VERSION
 
-0.7.
+0.7.1
 
 # SEE ALSO
 
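The README hunk under "Differences from String#split" contrasts StringSplitter with the trimming that core `String#split` performs when the delimiter is omitted or a single space. As a rough illustration of that core behaviour only, the sketch below uses plain Ruby and does not load the gem; the regex call is an assumption about how untrimmed output can be obtained from `String#split` itself, not a statement of StringSplitter's return value.

```ruby
line = " foo bar baz "

# Default (awk-style) splitting ignores leading and trailing whitespace.
trimmed = line.split

# A single-space string delimiter selects the same awk-style mode.
also_trimmed = line.split(" ")

# A regex delimiter with a negative limit keeps every empty field.
untrimmed = line.split(/ /, -1)

p trimmed    # ["foo", "bar", "baz"]
p untrimmed  # ["", "foo", "bar", "baz", ""]
```

The last call is the same substitution the new `init` code makes when it replaces a `' '` delimiter with `/ /`.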
data/lib/string_splitter.rb
CHANGED
@@ -1,8 +1,8 @@
 # frozen_string_literal: true
 
 require 'set'
-require 'values'
 
+require_relative 'string_splitter/split'
 require_relative 'string_splitter/version'
 
 # This class extends the functionality of +String#split+ by:
@@ -17,9 +17,10 @@ require_relative 'string_splitter/version'
 # These enhancements allow splits to handle many cases that otherwise require bigger
 # guns, e.g. regex matching or parsing.
 #
-# Implementation-wise, we split the string with
-#
-#
+# Implementation-wise, we split the string either with String#split, or with a custom
+# scanner if the delimiter may contain captures (since String#split doesn't handle
+# them correctly) and parse the resulting tokens into an array of Split objects with
+# the following attributes:
 #
 # - captures: separator substrings captured by parentheses in the delimiter pattern
 # - count: the number of splits
@@ -43,42 +44,6 @@ class StringSplitter
   DEFAULT_DELIMITER = /\s+/.freeze
   REMOVE = [].freeze
 
-  Split = Value.new(:captures, :count, :index, :lhs, :rhs, :separator) do
-    def position
-      index + 1
-    end
-
-    alias_method :pos, :position
-
-    # 0-based index relative to the end of the array, e.g. for 5 items:
-    #
-    #   index | rindex
-    #   ------|-------
-    #   0     | 4
-    #   1     | 3
-    #   2     | 2
-    #   3     | 1
-    #   4     | 0
-    def rindex
-      count - position
-    end
-
-    # 1-based position relative to the end of the array, e.g. for 5 items:
-    #
-    #   position  | rposition
-    #   ----------|----------
-    #   1         | 5
-    #   2         | 4
-    #   3         | 3
-    #   4         | 2
-    #   5         | 1
-    def rposition
-      count + 1 - position
-    end
-
-    alias_method :rpos, :rposition
-  end
-
   # simulate an enum. the value is returned by the case statement
   # in the generated block if the positions match
   module Action
@@ -130,9 +95,10 @@ class StringSplitter
 
     return result unless splits
 
-    splits.
-
-
+    result << splits.first.lhs
+
+    splits.each_with_index do |split, index|
+      split.update!(count: count, index: index)
 
       if accept.call(split)
         result << split.captures << split.rhs
@@ -166,9 +132,10 @@ class StringSplitter
 
     return result unless splits
 
-    splits.
-
-
+    result.unshift(splits.last.rhs)
+
+    splits.reverse_each.with_index do |split, index|
+      split.update!(count: count, index: index)
 
       if accept.call(split)
         # [lhs + captures] + result
@@ -190,7 +157,7 @@ class StringSplitter
   # the following fields:
   #
   # - result: the array of separated strings to return from +split+ or +rsplit+.
-  #   if the splits
+  #   if the splits array is empty, the caller returns this array immediately
   #   without any further processing
   #
   # - splits: an array of hashes containing the lhs, rhs, separator and captured
@@ -202,23 +169,76 @@ class StringSplitter
   #   accepted (true) or rejected (false)
   #
   def init(string:, delimiter:, select:, reject:, block:)
-    if
-
-
-
-
-
+    return [[]] if string.empty?
+
+    unless block
+      if reject
+        positions = reject
+        action = Action::REJECT
+      elsif select
+        positions = select
+        action = Action::SELECT
+      else
+        block = ACCEPT_ALL
+      end
     end
 
-
+    # use String#split if we can
+    #
+    # NOTE +reject!+ is no faster than +reject+ on MRI and significantly slower
+    # on TruffleRuby
+
+    if delimiter.is_a?(String)
+      limit = -1
+
+      if delimiter == ' '
+        delimiter = / / # don't trim
+      elsif delimiter.empty?
+        limit = 0 # remove the trailing empty string
+      end
+
+      result = string.split(delimiter, limit)
+
+      return [result] if result.length == 1 # delimiter not found: no splits
+
+      if block == ACCEPT_ALL # return the (2 or more) fields
+        result = result.reject(&:empty?) if @remove_empty_fields
+        return [result]
+      end
+
+      splits = []
+
+      result.each_cons(2) do |lhs, rhs| # 2 or more fields
+        splits << Split.new(
+          captures: [],
+          lhs: lhs,
+          rhs: rhs,
+          separator: delimiter
+        )
+      end
+    elsif delimiter == DEFAULT_DELIMITER && block == ACCEPT_ALL
+      # non-empty separators so -1 is safe
+
+      if @remove_empty_fields
+        result = []
+        string.split(delimiter, -1) do |field|
+          result << field unless field.empty?
+        end
+      else
+        result = string.split(delimiter, -1)
+      end
 
-    if splits.empty?
-      result = string.empty? ? [] : [string]
       return [result]
+    else
+      splits = parse(string, delimiter)
     end
 
-
-
+    count = splits.length
+
+    return [[string]] if count.zero?
+
+    block ||= compile(positions, action, count)
+    [[], splits, count, block]
   end
 
   def render(values)
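The `init` hunk above leans on how core `String#split` treats its `limit` argument, and on pairing adjacent fields into (lhs, rhs) tuples. The following is a minimal sketch of those two building blocks using only core Ruby; the sample strings are made up for illustration and nothing here calls the gem.

```ruby
# A negative limit preserves trailing empty fields.
keep_all = "a:b::".split(":", -1)  # ["a", "b", "", ""]

# The default limit (0) drops trailing empty fields.
dropped = "a:b::".split(":")       # ["a", "b"]

# With an empty delimiter the hunk switches to limit 0; per its own
# comment, this removes a trailing empty string.
chars = "abc".split("", 0)         # ["a", "b", "c"]

# Adjacent fields pair up as (lhs, rhs), one pair per split point.
pairs = %w[foo bar baz].each_cons(2).to_a

p keep_all
p dropped
p chars
p pairs # [["foo", "bar"], ["bar", "baz"]]
```

Three fields yield two pairs, which matches the hunk's `each_cons(2)` loop creating one Split per delimiter occurrence.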
@@ -227,6 +247,7 @@ class StringSplitter
         value.empty? && @remove_empty_fields ? REMOVE : [value]
       elsif @include_captures
         if @spread_captures
+          # TODO make sure compact can return a Capture
           @spread_captures == :compact ? value.compact : value
         elsif value.empty?
           # we expose non-captures (string delimiters or regexps with no
@@ -247,7 +268,7 @@ class StringSplitter
   # the delimiter, returning an array of objects (hashes) representing each split.
   # e.g. for:
   #
-  #   parse
+  #   parse("foo:bar:baz:quux", ":")
   #
   # we return:
   #
@@ -258,6 +279,7 @@ class StringSplitter
   #   ]
   #
   def parse(string, delimiter)
+    # has_names = delimiter.is_a?(Regexp) && !delimiter.names.empty?
     result = []
     start = 0
 
@@ -273,21 +295,23 @@ class StringSplitter
       next if separator.empty? && (index.zero? || after == string.length)
 
       lhs = string.slice(start, index - start)
-      result.last
+      result.last.rhs = lhs unless result.empty?
 
       # this is correct for the last/only match, but gets updated to the next
       # match's lhs for other matches
       rhs = match.post_match
 
-
+      # captures = (has_names ? Captures.new(match) : match.captures)
+
+      result << Split.new(
         captures: match.captures,
         lhs: lhs,
         rhs: rhs,
-        separator: separator
-
+        separator: separator
+      )
 
-      #
-      # last character of the separator
+      # advance the start index (the start of the next lhs) to the position
+      # after the last character of the separator
       start = after
     end
 
@@ -297,8 +321,8 @@ class StringSplitter
   # returns a lambda which splits at (i.e. accepts or rejects splits at, depending
   # on the action) the supplied positions
   #
-  # positions are preprocessed to support
-  #
+  # positions are preprocessed to support negative indices, infinite ranges, and
+  # descending ranges, e.g.:
   #
   #   ss.split("foo:bar:baz:quux", ":", at: -1)
   #
@@ -309,9 +333,8 @@ class StringSplitter
   # and
   #
   #   ss.split("1:2:3:4:5:6:7:8:9", ":", -3..)
-  #   ss.split("1:2:3:4:5:6:7:8:9", ":", -3..)
   #
-  #
+  # translates to:
   #
   #   ss.split("foo:bar:baz:quux", ":", at: 6..8)
   #
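The comment in the last two hunks says positions are preprocessed so that `at: -1` on a string with 3 splits becomes `at: 3`, and `-3..` on a string with 8 splits becomes `6..8`. One way that arithmetic could look is sketched below. Both helper names are hypothetical and are not taken from the gem, descending and end-exclusive ranges are deliberately left out, and the endless-range literal assumes Ruby 2.6 or later.

```ruby
# Hypothetical helper: map a 1-based position, possibly negative, onto 1..count.
def normalize_position(position, count)
  position.negative? ? count + 1 + position : position
end

# Hypothetical helper: resolve negative and endless bounds of an inclusive range.
def normalize_range(range, count)
  first = normalize_position(range.begin, count)
  last = range.end.nil? ? count : normalize_position(range.end, count)
  first..last
end

p normalize_position(-1, 3)   # 3
p normalize_range((-3..), 8)  # 6..8
```

Resolving positions once, up front, would let the generated accept/reject lambda compare plain positive integers for every split.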
data/lib/string_splitter/split.rb
ADDED
@@ -0,0 +1,51 @@
+# frozen_string_literal: true
+
+class StringSplitter
+  class Split
+    attr_reader :captures, :count, :index, :lhs, :position, :rhs, :separator
+    attr_writer :rhs
+    alias pos position
+
+    def initialize(captures:, lhs:, rhs:, separator:)
+      @captures = captures
+      @lhs = lhs
+      @rhs = rhs
+      @separator = separator
+    end
+
+    # 0-based index relative to the end of the array, e.g. for 5 items:
+    #
+    #   index | rindex
+    #   ------|-------
+    #   0     | 4
+    #   1     | 3
+    #   2     | 2
+    #   3     | 1
+    #   4     | 0
+    def rindex
+      @count - @position
+    end
+
+    # 1-based position relative to the end of the array, e.g. for 5 items:
+    #
+    #   position  | rposition
+    #   ----------|----------
+    #   1         | 5
+    #   2         | 4
+    #   3         | 3
+    #   4         | 2
+    #   5         | 1
+    def rposition
+      @count + 1 - @position
+    end
+
+    alias rpos rposition
+
+    def update!(count:, index:)
+      @count = count
+      @index = index
+      @position = index + 1
+      freeze
+    end
+  end
+end
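The two comment tables in the new Split class can be reproduced with a few lines of arithmetic. The sketch below uses a hypothetical stand-in class, `SplitSketch`, that copies only the position bookkeeping from the hunk above (no captures, lhs, rhs or separator), so it runs without the gem.

```ruby
# Hypothetical stand-in for the Split class shown above; only the
# count/index/position bookkeeping is reproduced.
class SplitSketch
  attr_reader :count, :index, :position

  def update!(count:, index:)
    @count = count
    @index = index
    @position = index + 1
    freeze # returns self, so the call can be chained
  end

  def rindex
    @count - @position
  end

  def rposition
    @count + 1 - @position
  end
end

splits = Array.new(5) { |i| SplitSketch.new.update!(count: 5, index: i) }
rows = splits.map { |s| [s.index, s.rindex, s.position, s.rposition] }
rows.each { |row| p row }
# first row: [0, 4, 1, 5]; last row: [4, 0, 5, 1]
```

Because `update!` ends with `freeze`, each object is immutable by the time it reaches the accept/reject block, which seems to be what lets a plain class stand in for the value object previously built with values.rb.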
metadata
CHANGED
@@ -1,29 +1,15 @@
 --- !ruby/object:Gem::Specification
 name: string_splitter
 version: !ruby/object:Gem::Version
-  version: 0.7.
+  version: 0.7.1
 platform: ruby
 authors:
 - chocolateboy
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2020-08-
+date: 2020-08-22 00:00:00.000000000 Z
 dependencies:
-- !ruby/object:Gem::Dependency
-  name: values
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '1.8'
-  type: :runtime
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '1.8'
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
@@ -104,6 +90,7 @@ files:
 - LICENSE.md
 - README.md
 - lib/string_splitter.rb
+- lib/string_splitter/split.rb
 - lib/string_splitter/version.rb
 homepage: https://github.com/chocolateboy/string_splitter
 licenses: