string_splitter 0.7.0 → 0.7.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 400534de6c3143ef81b2ad46a3a6432b7d83ef0900024ebdde3f06a4e1714890
4
- data.tar.gz: 643f5af7b9e13321dfa97b045b124d0c5ea576868b13141c264122bc96baea5e
3
+ metadata.gz: 799ba605477bc50679baaa0ae5d12ac8077fc3a57611f69beddb3396a45e3a13
4
+ data.tar.gz: 0fbdf7225b69ea52b615ac7523bd15266dc9b0dbbe541e7b3802027a0a8c6c36
5
5
  SHA512:
6
- metadata.gz: 35bed8fe69b33314813fbd68a8da0e8f4799b7891275ac601b157caeb0e0a3780f37ec7e7876d808b8dfcbfdf7527f45c3af0dc0d679e133865e96949a1d9ce3
7
- data.tar.gz: 8186e40d57654daf1a481ab74c128910f7aa346bc343a0a9933dc39b7cceeb204c1a55ac39b39321df46f7d02420fd87f93dd4a708be0a985d94833df018da87
6
+ metadata.gz: c8fc9cf7bbd351013091918f5398c27efcda0b9b8c1f66294af76f1864e911d2fc0520b653fe1bdf3d11fb912dd0615b0954e38176f87fbf2a6cc931d0bdf6be
7
+ data.tar.gz: 98bd2cdeae3a27f9f54bb982b75033c9180e688419c0f5209682462a27e1792d6c8ec6d16ec6340c359c22373cdcad07c05a8ced5b03811060cf492d09a1c13b
data/CHANGELOG.md CHANGED
@@ -1,3 +1,12 @@
1
+ ## 0.7.1 - 2020-08-22
2
+
3
+ #### Changes
4
+
5
+ - performance improvements
6
+ - delegate to `String#split` where possible
7
+ - use a regular class for Split rather than values.rb
8
+ - create Split objects directly rather than allocating intermediate hashes
9
+
1
10
  ## 0.7.0 - 2020-08-21
2
11
 
3
12
  #### Breaking Changes
data/README.md CHANGED
@@ -11,7 +11,7 @@
11
11
  - [DESCRIPTION](#description)
12
12
  - [WHY?](#why)
13
13
  - [CAVEATS](#caveats)
14
- - [Differences from String#split](#differences-from-string%23split)
14
+ - [Differences from String#split](#differences-from-stringsplit)
15
15
  - [COMPATIBILITY](#compatibility)
16
16
  - [VERSION](#version)
17
17
  - [SEE ALSO](#see-also)
@@ -130,7 +130,7 @@ end
130
130
  Many languages have built-in `split` functions/methods for strings. They behave
131
131
  similarly (notwithstanding the occasional
132
132
  [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)), and
133
- handle a few common cases e.g.:
133
+ handle a few common cases, e.g.:
134
134
 
135
135
  * limiting the number of splits
136
136
  * including the separator(s) in the results
@@ -140,7 +140,7 @@ But, because the API is squeezed into two overloaded parameters (the delimiter
140
140
  and the limit), achieving the desired results can be tricky. For instance,
141
141
  while `String#split` removes empty trailing fields (by default), it provides no
142
142
  way to remove *all* empty fields. Likewise, the cramped API means there's no
143
- way to e.g. combine a limit (positive integer) with the option to preserve
143
+ way to, e.g., combine a limit (positive integer) with the option to preserve
144
144
  empty fields (negative integer), or use backreferences in a delimiter pattern
145
145
  without including its captured subexpressions in the result.
146
146
 
@@ -192,7 +192,7 @@ to a regex or a full-blown parser.
192
192
  As an example, the nominally unstructured output of many Unix commands is often
193
193
  formatted in a way that's tantalizingly close to being
194
194
  [machine-readable](https://en.wikipedia.org/wiki/Delimiter-separated_values),
195
- apart from a few pesky exceptions e.g.:
195
+ apart from a few pesky exceptions, e.g.:
196
196
 
197
197
  ```bash
198
198
  $ ls -l
@@ -205,7 +205,7 @@ drwxr-xr-x 3 user users 4096 Jun 19 22:56 lib
205
205
  ```
206
206
 
207
207
  These lines can *almost* be parsed into an array of fields by splitting them on
208
- whitespace. The exception is the date (columns 6-8) i.e.:
208
+ whitespace. The exception is the date (columns 6-8), i.e.:
209
209
 
210
210
  ```ruby
211
211
  line = "-rw-r--r-- 1 user users 87 Jun 18 18:16 CHANGELOG.md"
@@ -224,7 +224,7 @@ instead of:
224
224
  ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
225
225
  ```
226
226
 
227
- One way to work around this is to parse the whole line e.g.:
227
+ One way to work around this is to parse the whole line, e.g.:
228
228
 
229
229
  ```ruby
230
230
  line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+ \S+) \s+ (.+)$/x)
@@ -232,7 +232,7 @@ line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+
232
232
 
233
233
  But that requires us to specify *everything*. What we really want is a version
234
234
  of `split` which allows us to veto splitting for the 6th and 7th delimiters
235
- (and to stop after the 8th delimiter) i.e. control over which splits are
235
+ (and to stop after the 8th delimiter), i.e. control over which splits are
236
236
  accepted, rather than being restricted to the single, baked-in strategy
237
237
  provided by the `limit` parameter.
238
238
 
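As a rough illustration of what "vetoing" individual splits means, here is a minimal pure-Ruby sketch. The `split_at` helper and its `at:` keyword are hypothetical names used only for this example; they are not part of string_splitter and say nothing about how the gem implements the feature. The sketch assumes whitespace delimiters and 1-based delimiter positions.

```ruby
# Hypothetical helper (not part of string_splitter): split on whitespace, but
# only at the 1-based delimiter positions listed in `at`. Rejected delimiters
# are glued back into the current field.
def split_at(string, at:)
  # the capture group makes String#split return the separators as well:
  # [field, separator, field, separator, ..., field]
  tokens = string.split(/(\s+)/)
  result = [tokens.first.to_s.dup]

  tokens.drop(1).each_slice(2).with_index(1) do |(separator, field), position|
    field = field.to_s # a trailing separator has no field after it

    if at.include?(position)
      result << field # accept the split: start a new field
    else
      result.last << separator << field # veto the split: keep extending
    end
  end

  result
end

line = "-rw-r--r-- 1 user users 87 Jun 18 18:16 CHANGELOG.md"
fields = split_at(line, at: [1, 2, 3, 4, 5, 8])
# => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
```

Delimiters 6 and 7 are rejected, so the date stays in one field, and nothing follows the 8th delimiter, so splitting stops there.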
@@ -258,7 +258,7 @@ ss.split(line, at: [1..5, 8])
258
258
  ## Differences from String#split
259
259
 
260
260
  Unlike `String#split`, StringSplitter doesn't trim the string before splitting
261
- (with `String#strip`) if the delimiter is omitted or a single space, e.g.:
261
+ if the delimiter is omitted or a single space, e.g.:
262
262
 
263
263
  ```ruby
264
264
  " foo bar baz ".split # => ["foo", "bar", "baz"]
@@ -297,7 +297,7 @@ currently, Ruby 2.5 and above.
297
297
 
298
298
  # VERSION
299
299
 
300
- 0.7.0
300
+ 0.7.1
301
301
 
302
302
  # SEE ALSO
303
303
 
data/lib/string_splitter.rb CHANGED
@@ -1,8 +1,8 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  require 'set'
4
- require 'values'
5
4
 
5
+ require_relative 'string_splitter/split'
6
6
  require_relative 'string_splitter/version'
7
7
 
8
8
  # This class extends the functionality of +String#split+ by:
@@ -17,9 +17,10 @@ require_relative 'string_splitter/version'
17
17
  # These enhancements allow splits to handle many cases that otherwise require bigger
18
18
  # guns, e.g. regex matching or parsing.
19
19
  #
20
- # Implementation-wise, we split the string with a scanner which works in a similar
21
- # way to +String#split+ and parse the resulting tokens into an array of Split objects
22
- # with the following fields:
20
+ # Implementation-wise, we split the string either with String#split, or with a custom
21
+ # scanner if the delimiter may contain captures (since String#split doesn't handle
22
+ # them correctly) and parse the resulting tokens into an array of Split objects with
23
+ # the following attributes:
23
24
  #
24
25
  # - captures: separator substrings captured by parentheses in the delimiter pattern
25
26
  # - count: the number of splits
@@ -43,42 +44,6 @@ class StringSplitter
43
44
  DEFAULT_DELIMITER = /\s+/.freeze
44
45
  REMOVE = [].freeze
45
46
 
46
- Split = Value.new(:captures, :count, :index, :lhs, :rhs, :separator) do
47
- def position
48
- index + 1
49
- end
50
-
51
- alias_method :pos, :position
52
-
53
- # 0-based index relative to the end of the array, e.g. for 5 items:
54
- #
55
- # index | rindex
56
- # ------|-------
57
- # 0 | 4
58
- # 1 | 3
59
- # 2 | 2
60
- # 3 | 1
61
- # 4 | 0
62
- def rindex
63
- count - position
64
- end
65
-
66
- # 1-based position relative to the end of the array, e.g. for 5 items:
67
- #
68
- # position | rposition
69
- # ----------|----------
70
- # 1 | 5
71
- # 2 | 4
72
- # 3 | 3
73
- # 4 | 2
74
- # 5 | 1
75
- def rposition
76
- count + 1 - position
77
- end
78
-
79
- alias_method :rpos, :rposition
80
- end
81
-
82
47
  # simulate an enum. the value is returned by the case statement
83
48
  # in the generated block if the positions match
84
49
  module Action
@@ -130,9 +95,10 @@ class StringSplitter
130
95
 
131
96
  return result unless splits
132
97
 
133
- splits.each_with_index do |hash, index|
134
- split = Split.with(hash.merge({ count: count, index: index }))
135
- result << split.lhs if result.empty?
98
+ result << splits.first.lhs
99
+
100
+ splits.each_with_index do |split, index|
101
+ split.update!(count: count, index: index)
136
102
 
137
103
  if accept.call(split)
138
104
  result << split.captures << split.rhs
@@ -166,9 +132,10 @@ class StringSplitter
166
132
 
167
133
  return result unless splits
168
134
 
169
- splits.reverse_each.with_index do |hash, index|
170
- split = Split.with(hash.merge({ count: count, index: index }))
171
- result.unshift(split.rhs) if result.empty?
135
+ result.unshift(splits.last.rhs)
136
+
137
+ splits.reverse_each.with_index do |split, index|
138
+ split.update!(count: count, index: index)
172
139
 
173
140
  if accept.call(split)
174
141
  # [lhs + captures] + result
@@ -190,7 +157,7 @@ class StringSplitter
190
157
  # the following fields:
191
158
  #
192
159
  # - result: the array of separated strings to return from +split+ or +rsplit+.
193
- # if the splits arry is empty, the caller returns this array immediately
160
+ # if the splits array is empty, the caller returns this array immediately
194
161
  # without any further processing
195
162
  #
196
163
  # - splits: an array of hashes containing the lhs, rhs, separator and captured
@@ -202,23 +169,76 @@ class StringSplitter
202
169
  # accepted (true) or rejected (false)
203
170
  #
204
171
  def init(string:, delimiter:, select:, reject:, block:)
205
- if reject
206
- positions = reject
207
- action = Action::REJECT
208
- elsif select
209
- positions = select
210
- action = Action::SELECT
172
+ return [[]] if string.empty?
173
+
174
+ unless block
175
+ if reject
176
+ positions = reject
177
+ action = Action::REJECT
178
+ elsif select
179
+ positions = select
180
+ action = Action::SELECT
181
+ else
182
+ block = ACCEPT_ALL
183
+ end
211
184
  end
212
185
 
213
- splits = parse(string, delimiter)
186
+ # use String#split if we can
187
+ #
188
+ # NOTE +reject!+ is no faster than +reject+ on MRI and significantly slower
189
+ # on TruffleRuby
190
+
191
+ if delimiter.is_a?(String)
192
+ limit = -1
193
+
194
+ if delimiter == ' '
195
+ delimiter = / / # don't trim
196
+ elsif delimiter.empty?
197
+ limit = 0 # remove the trailing empty string
198
+ end
199
+
200
+ result = string.split(delimiter, limit)
201
+
202
+ return [result] if result.length == 1 # delimiter not found: no splits
203
+
204
+ if block == ACCEPT_ALL # return the (2 or more) fields
205
+ result = result.reject(&:empty?) if @remove_empty_fields
206
+ return [result]
207
+ end
208
+
209
+ splits = []
210
+
211
+ result.each_cons(2) do |lhs, rhs| # 2 or more fields
212
+ splits << Split.new(
213
+ captures: [],
214
+ lhs: lhs,
215
+ rhs: rhs,
216
+ separator: delimiter
217
+ )
218
+ end
219
+ elsif delimiter == DEFAULT_DELIMITER && block == ACCEPT_ALL
220
+ # non-empty separators so -1 is safe
221
+
222
+ if @remove_empty_fields
223
+ result = []
224
+ string.split(delimiter, -1) do |field|
225
+ result << field unless field.empty?
226
+ end
227
+ else
228
+ result = string.split(delimiter, -1)
229
+ end
214
230
 
215
- if splits.empty?
216
- result = string.empty? ? [] : [string]
217
231
  return [result]
232
+ else
233
+ splits = parse(string, delimiter)
218
234
  end
219
235
 
220
- block ||= positions ? compile(positions, action, splits.length) : ACCEPT_ALL
221
- [[], splits, splits.length, block]
236
+ count = splits.length
237
+
238
+ return [[string]] if count.zero?
239
+
240
+ block ||= compile(positions, action, count)
241
+ [[], splits, count, block]
222
242
  end
223
243
 
224
244
  def render(values)
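The fast path in this hunk depends on a few `String#split` behaviours that are easy to misremember. The sketch below uses only core Ruby and does not call the gem; the variable names are illustrative. It shows why a single-space delimiter is swapped for `/ /`, why the empty delimiter uses a limit of 0, and how `each_cons(2)` pairs adjacent fields into lhs/rhs.

```ruby
padded = ' foo bar '

# a single-space string delimiter is special-cased by String#split:
# leading/trailing whitespace is ignored and runs of whitespace collapse
trimmed = padded.split(' ')          # => ["foo", "bar"]

# the equivalent regex has no special case, and a limit of -1 keeps
# trailing empty fields
untrimmed = padded.split(/ /, -1)    # => ["", "foo", "bar", ""]

# splitting on the empty string with a negative limit leaves a trailing
# empty string; a limit of 0 drops it
chars_all  = 'foo'.split('', -1)     # => ["f", "o", "o", ""]
chars_trim = 'foo'.split('', 0)      # => ["f", "o", "o"]

# adjacent fields paired up, as in the each_cons(2) loop above
pairs = 'foo:bar:baz'.split(':', -1).each_cons(2).to_a
# => [["foo", "bar"], ["bar", "baz"]]
```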
@@ -227,6 +247,7 @@ class StringSplitter
227
247
  value.empty? && @remove_empty_fields ? REMOVE : [value]
228
248
  elsif @include_captures
229
249
  if @spread_captures
250
+ # TODO make sure compact can return a Capture
230
251
  @spread_captures == :compact ? value.compact : value
231
252
  elsif value.empty?
232
253
  # we expose non-captures (string delimiters or regexps with no
@@ -247,7 +268,7 @@ class StringSplitter
247
268
  # the delimiter, returning an array of objects (hashes) representing each split.
248
269
  # e.g. for:
249
270
  #
250
- # parse.split("foo:bar:baz:quux", ":")
271
+ # parse("foo:bar:baz:quux", ":")
251
272
  #
252
273
  # we return:
253
274
  #
@@ -258,6 +279,7 @@ class StringSplitter
258
279
  # ]
259
280
  #
260
281
  def parse(string, delimiter)
282
+ # has_names = delimiter.is_a?(Regexp) && !delimiter.names.empty?
261
283
  result = []
262
284
  start = 0
263
285
 
@@ -273,21 +295,23 @@ class StringSplitter
273
295
  next if separator.empty? && (index.zero? || after == string.length)
274
296
 
275
297
  lhs = string.slice(start, index - start)
276
- result.last[:rhs] = lhs unless result.empty?
298
+ result.last.rhs = lhs unless result.empty?
277
299
 
278
300
  # this is correct for the last/only match, but gets updated to the next
279
301
  # match's lhs for other matches
280
302
  rhs = match.post_match
281
303
 
282
- result << {
304
+ # captures = (has_names ? Captures.new(match) : match.captures)
305
+
306
+ result << Split.new(
283
307
  captures: match.captures,
284
308
  lhs: lhs,
285
309
  rhs: rhs,
286
- separator: separator,
287
- }
310
+ separator: separator
311
+ )
288
312
 
289
- # move the start index (the start of the next lhs) to the index after the
290
- # last character of the separator
313
+ # advance the start index (the start of the next lhs) to the position
314
+ # after the last character of the separator
291
315
  start = after
292
316
  end
293
317
 
@@ -297,8 +321,8 @@ class StringSplitter
297
321
  # returns a lambda which splits at (i.e. accepts or rejects splits at, depending
298
322
  # on the action) the supplied positions
299
323
  #
300
- # positions are preprocessed to support additional features: negative
301
- # ranges, infinite ranges, and descending ranges, e.g.:
324
+ # positions are preprocessed to support negative indices, infinite ranges, and
325
+ # descending ranges, e.g.:
302
326
  #
303
327
  # ss.split("foo:bar:baz:quux", ":", at: -1)
304
328
  #
@@ -309,9 +333,8 @@ class StringSplitter
309
333
  # and
310
334
  #
311
335
  # ss.split("1:2:3:4:5:6:7:8:9", ":", -3..)
312
- # ss.split("1:2:3:4:5:6:7:8:9", ":", -3..)
313
336
  #
314
- # translate to:
337
+ # translates to:
315
338
  #
316
339
  # ss.split("foo:bar:baz:quux", ":", at: 6..8)
317
340
  #
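The position preprocessing described in this comment can be sketched roughly as follows. `normalize_position` is a hypothetical helper written for this note, not the gem's `compile` method. It assumes that negative positions count back from the last split, that an open-ended range runs to the last split, and it does not handle descending ranges.

```ruby
# Hypothetical helper (not the gem's implementation): map a 1-based position,
# which may be negative or an open-ended range, onto concrete positions for a
# string containing `count` splits.
def normalize_position(position, count)
  case position
  when Integer
    # -1 is the last split, -2 the one before it, and so on
    position.negative? ? count + 1 + position : position
  when Range
    first = normalize_position(position.begin || 1, count)
    last  = normalize_position(position.end || count, count)
    first..last
  else
    raise ArgumentError, "unsupported position: #{position.inspect}"
  end
end

count = '1:2:3:4:5:6:7:8:9'.count(':') # 8 delimiters, so 8 possible splits

last_split = normalize_position(-1, count)     # => 8
last_three = normalize_position((-3..), count) # => 6..8
```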
data/lib/string_splitter/split.rb ADDED
@@ -0,0 +1,51 @@
1
+ # frozen_string_literal: true
2
+
3
+ class StringSplitter
4
+ class Split
5
+ attr_reader :captures, :count, :index, :lhs, :position, :rhs, :separator
6
+ attr_writer :rhs
7
+ alias pos position
8
+
9
+ def initialize(captures:, lhs:, rhs:, separator:)
10
+ @captures = captures
11
+ @lhs = lhs
12
+ @rhs = rhs
13
+ @separator = separator
14
+ end
15
+
16
+ # 0-based index relative to the end of the array, e.g. for 5 items:
17
+ #
18
+ # index | rindex
19
+ # ------|-------
20
+ # 0 | 4
21
+ # 1 | 3
22
+ # 2 | 2
23
+ # 3 | 1
24
+ # 4 | 0
25
+ def rindex
26
+ @count - @position
27
+ end
28
+
29
+ # 1-based position relative to the end of the array, e.g. for 5 items:
30
+ #
31
+ # position | rposition
32
+ # ----------|----------
33
+ # 1 | 5
34
+ # 2 | 4
35
+ # 3 | 3
36
+ # 4 | 2
37
+ # 5 | 1
38
+ def rposition
39
+ @count + 1 - @position
40
+ end
41
+
42
+ alias rpos rposition
43
+
44
+ def update!(count:, index:)
45
+ @count = count
46
+ @index = index
47
+ @position = index + 1
48
+ freeze
49
+ end
50
+ end
51
+ end
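To make the relationship between `index`, `position`, `rindex` and `rposition` concrete, the following sketch uses a condensed copy of the class added above, declared at the top level so the snippet runs without the gem. The sample fields and the `table` variable are illustrative only.

```ruby
# Condensed copy of the Split class from this diff, for illustration only
class Split
  attr_reader :captures, :count, :index, :lhs, :position, :rhs, :separator
  attr_writer :rhs

  def initialize(captures:, lhs:, rhs:, separator:)
    @captures = captures
    @lhs = lhs
    @rhs = rhs
    @separator = separator
  end

  def rindex
    @count - @position
  end

  def rposition
    @count + 1 - @position
  end

  def update!(count:, index:)
    @count = count
    @index = index
    @position = index + 1
    freeze # no further changes once the counters are set
  end
end

# 4 fields produce 3 splits
splits = %w[foo bar baz quux].each_cons(2).map do |lhs, rhs|
  Split.new(captures: [], lhs: lhs, rhs: rhs, separator: ':')
end

splits.each_with_index do |split, index|
  split.update!(count: splits.length, index: index)
end

# one row per split: [index, position, rindex, rposition]
table = splits.map { |s| [s.index, s.position, s.rindex, s.rposition] }
# => [[0, 1, 2, 3], [1, 2, 1, 2], [2, 3, 0, 1]]
```

Because `update!` ends with `freeze`, assigning `rhs` after that point raises a `FrozenError`, so the setter is only usable while the splits are still being built.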
data/lib/string_splitter/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  class StringSplitter
4
- VERSION = '0.7.0'
4
+ VERSION = '0.7.1'
5
5
  end
metadata CHANGED
@@ -1,29 +1,15 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: string_splitter
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.7.0
4
+ version: 0.7.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - chocolateboy
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2020-08-21 00:00:00.000000000 Z
11
+ date: 2020-08-22 00:00:00.000000000 Z
12
12
  dependencies:
13
- - !ruby/object:Gem::Dependency
14
- name: values
15
- requirement: !ruby/object:Gem::Requirement
16
- requirements:
17
- - - "~>"
18
- - !ruby/object:Gem::Version
19
- version: '1.8'
20
- type: :runtime
21
- prerelease: false
22
- version_requirements: !ruby/object:Gem::Requirement
23
- requirements:
24
- - - "~>"
25
- - !ruby/object:Gem::Version
26
- version: '1.8'
27
13
  - !ruby/object:Gem::Dependency
28
14
  name: bundler
29
15
  requirement: !ruby/object:Gem::Requirement
@@ -104,6 +90,7 @@ files:
104
90
  - LICENSE.md
105
91
  - README.md
106
92
  - lib/string_splitter.rb
93
+ - lib/string_splitter/split.rb
107
94
  - lib/string_splitter/version.rb
108
95
  homepage: https://github.com/chocolateboy/string_splitter
109
96
  licenses: