string_splitter 0.7.0 → 0.7.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 400534de6c3143ef81b2ad46a3a6432b7d83ef0900024ebdde3f06a4e1714890
4
- data.tar.gz: 643f5af7b9e13321dfa97b045b124d0c5ea576868b13141c264122bc96baea5e
3
+ metadata.gz: 799ba605477bc50679baaa0ae5d12ac8077fc3a57611f69beddb3396a45e3a13
4
+ data.tar.gz: 0fbdf7225b69ea52b615ac7523bd15266dc9b0dbbe541e7b3802027a0a8c6c36
5
5
  SHA512:
6
- metadata.gz: 35bed8fe69b33314813fbd68a8da0e8f4799b7891275ac601b157caeb0e0a3780f37ec7e7876d808b8dfcbfdf7527f45c3af0dc0d679e133865e96949a1d9ce3
7
- data.tar.gz: 8186e40d57654daf1a481ab74c128910f7aa346bc343a0a9933dc39b7cceeb204c1a55ac39b39321df46f7d02420fd87f93dd4a708be0a985d94833df018da87
6
+ metadata.gz: c8fc9cf7bbd351013091918f5398c27efcda0b9b8c1f66294af76f1864e911d2fc0520b653fe1bdf3d11fb912dd0615b0954e38176f87fbf2a6cc931d0bdf6be
7
+ data.tar.gz: 98bd2cdeae3a27f9f54bb982b75033c9180e688419c0f5209682462a27e1792d6c8ec6d16ec6340c359c22373cdcad07c05a8ced5b03811060cf492d09a1c13b
data/CHANGELOG.md CHANGED
@@ -1,3 +1,12 @@
1
+ ## 0.7.1 - 2020-08-22
2
+
3
+ #### Changes
4
+
5
+ - performance improvements
6
+ - delegate to `String#split` where possible
7
+ - use a regular class for Split rather than values.rb
8
+ - create Split objects directly rather than allocating intermediate hashes
9
+
1
10
  ## 0.7.0 - 2020-08-21
2
11
 
3
12
  #### Breaking Changes
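The first changelog item, delegating to `String#split` where possible, can be sketched in plain Ruby. This is a simplified approximation of the String-delimiter branch added to `init` in `lib/string_splitter.rb`, not the gem's code. `split_fields` is a hypothetical name, only String delimiters are handled, and the select/reject logic is omitted. As in the diff, a single-space delimiter is swapped for `/ /` so Ruby doesn't trim the string, and an empty delimiter uses a limit of 0 to drop the trailing empty string.

```ruby
# Hypothetical helper, not part of the StringSplitter API: a cut-down version
# of the "delegate to String#split" fast path for String delimiters only
def split_fields(string, delimiter, remove_empty_fields: false)
  return [] if string.empty?

  limit = -1 # keep empty fields, including trailing ones

  if delimiter == ' '
    delimiter = / / # a regex avoids String#split's awk-style trimming
  elsif delimiter.empty?
    limit = 0 # drop the trailing empty string an empty delimiter would add
  end

  fields = string.split(delimiter, limit)
  remove_empty_fields ? fields.reject(&:empty?) : fields
end

p split_fields("foo:bar::baz:", ":")                            # => ["foo", "bar", "", "baz", ""]
p split_fields("foo:bar::baz:", ":", remove_empty_fields: true) # => ["foo", "bar", "baz"]
p split_fields(" foo bar ", " ")                                # => ["", "foo", "bar", ""]
p split_fields("abc", "")                                       # => ["a", "b", "c"]
```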
data/README.md CHANGED
@@ -11,7 +11,7 @@
11
11
  - [DESCRIPTION](#description)
12
12
  - [WHY?](#why)
13
13
  - [CAVEATS](#caveats)
14
- - [Differences from String#split](#differences-from-string%23split)
14
+ - [Differences from String#split](#differences-from-stringsplit)
15
15
  - [COMPATIBILITY](#compatibility)
16
16
  - [VERSION](#version)
17
17
  - [SEE ALSO](#see-also)
@@ -130,7 +130,7 @@ end
130
130
  Many languages have built-in `split` functions/methods for strings. They behave
131
131
  similarly (notwithstanding the occasional
132
132
  [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)), and
133
- handle a few common cases e.g.:
133
+ handle a few common cases, e.g.:
134
134
 
135
135
  * limiting the number of splits
136
136
  * including the separator(s) in the results
@@ -140,7 +140,7 @@ But, because the API is squeezed into two overloaded parameters (the delimiter
140
140
  and the limit), achieving the desired results can be tricky. For instance,
141
141
  while `String#split` removes empty trailing fields (by default), it provides no
142
142
  way to remove *all* empty fields. Likewise, the cramped API means there's no
143
- way to e.g. combine a limit (positive integer) with the option to preserve
143
+ way to, e.g., combine a limit (positive integer) with the option to preserve
144
144
  empty fields (negative integer), or use backreferences in a delimiter pattern
145
145
  without including its captured subexpressions in the result.
146
146
 
@@ -192,7 +192,7 @@ to a regex or a full-blown parser.
192
192
  As an example, the nominally unstructured output of many Unix commands is often
193
193
  formatted in a way that's tantalizingly close to being
194
194
  [machine-readable](https://en.wikipedia.org/wiki/Delimiter-separated_values),
195
- apart from a few pesky exceptions e.g.:
195
+ apart from a few pesky exceptions, e.g.:
196
196
 
197
197
  ```bash
198
198
  $ ls -l
@@ -205,7 +205,7 @@ drwxr-xr-x 3 user users 4096 Jun 19 22:56 lib
205
205
  ```
206
206
 
207
207
  These lines can *almost* be parsed into an array of fields by splitting them on
208
- whitespace. The exception is the date (columns 6-8) i.e.:
208
+ whitespace. The exception is the date (columns 6-8), i.e.:
209
209
 
210
210
  ```ruby
211
211
  line = "-rw-r--r-- 1 user users 87 Jun 18 18:16 CHANGELOG.md"
@@ -224,7 +224,7 @@ instead of:
224
224
  ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
225
225
  ```
226
226
 
227
- One way to work around this is to parse the whole line e.g.:
227
+ One way to work around this is to parse the whole line, e.g.:
228
228
 
229
229
  ```ruby
230
230
  line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+ \S+) \s+ (.+)$/x)
@@ -232,7 +232,7 @@ line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+
232
232
 
233
233
  But that requires us to specify *everything*. What we really want is a version
234
234
  of `split` which allows us to veto splitting for the 6th and 7th delimiters
235
- (and to stop after the 8th delimiter) i.e. control over which splits are
235
+ (and to stop after the 8th delimiter), i.e. control over which splits are
236
236
  accepted, rather than being restricted to the single, baked-in strategy
237
237
  provided by the `limit` parameter.
238
238
 
@@ -258,7 +258,7 @@ ss.split(line, at: [1..5, 8])
258
258
  ## Differences from String#split
259
259
 
260
260
  Unlike `String#split`, StringSplitter doesn't trim the string before splitting
261
- (with `String#strip`) if the delimiter is omitted or a single space, e.g.:
261
+ if the delimiter is omitted or a single space, e.g.:
262
262
 
263
263
  ```ruby
264
264
  " foo bar baz ".split # => ["foo", "bar", "baz"]
@@ -297,7 +297,7 @@ currently, Ruby 2.5 and above.
297
297
 
298
298
  # VERSION
299
299
 
300
- 0.7.0
300
+ 0.7.1
301
301
 
302
302
  # SEE ALSO
303
303
 
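The trimming difference described in the README's "Differences from String#split" section can be reproduced with core Ruby alone. This snippet only exercises `String#split` and doesn't call StringSplitter; it illustrates why the 0.7.1 fast path replaces a `' '` delimiter with the regex `/ /` before delegating to `String#split`.

```ruby
s = " foo bar baz "

# with no delimiter (or a single-space string), String#split uses awk-style
# splitting: leading whitespace is ignored and whitespace runs act as one separator
p s.split      # => ["foo", "bar", "baz"]
p s.split(" ") # => ["foo", "bar", "baz"]

# a one-space regex splits on every space without trimming;
# a negative limit also keeps the trailing empty field
p s.split(/ /)     # => ["", "foo", "bar", "baz"]
p s.split(/ /, -1) # => ["", "foo", "bar", "baz", ""]
```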
data/lib/string_splitter.rb CHANGED
@@ -1,8 +1,8 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  require 'set'
4
- require 'values'
5
4
 
5
+ require_relative 'string_splitter/split'
6
6
  require_relative 'string_splitter/version'
7
7
 
8
8
  # This class extends the functionality of +String#split+ by:
@@ -17,9 +17,10 @@ require_relative 'string_splitter/version'
17
17
  # These enhancements allow splits to handle many cases that otherwise require bigger
18
18
  # guns, e.g. regex matching or parsing.
19
19
  #
20
- # Implementation-wise, we split the string with a scanner which works in a similar
21
- # way to +String#split+ and parse the resulting tokens into an array of Split objects
22
- # with the following fields:
20
+ # Implementation-wise, we split the string either with String#split, or with a custom
21
+ # scanner if the delimiter may contain captures (since String#split doesn't handle
22
+ # them correctly) and parse the resulting tokens into an array of Split objects with
23
+ # the following attributes:
23
24
  #
24
25
  # - captures: separator substrings captured by parentheses in the delimiter pattern
25
26
  # - count: the number of splits
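The revised comment says delimiters that may contain captures still go through a custom scanner. One reason is visible in core Ruby: `String#split` mixes captured groups into the returned fields. The sketch below is a simplified stand-in for such a scanner, not the gem's `parse` method. `scan_splits` is a hypothetical name, it returns plain hashes rather than `Split` objects, and it assumes the delimiter never matches an empty string.

```ruby
# String#split mixes the captured groups into the list of fields
p "foo:bar=baz".split(/([:=])/) # => ["foo", ":", "bar", "=", "baz"]

# Hypothetical helper (not StringSplitter#parse). Returns one hash per
# separator, keeping the fields, separator and captures apart.
# Assumption: the delimiter never matches the empty string.
def scan_splits(string, delimiter)
  splits = []
  start = 0 # offset of the current lhs field

  while (match = string.match(delimiter, start))
    separator = match[0]
    raise ArgumentError, 'empty separators are not handled' if separator.empty?

    lhs = string[start...match.begin(0)]

    # the previous split's rhs is really just this lhs
    splits.last[:rhs] = lhs unless splits.empty?

    splits << {
      lhs: lhs,
      separator: separator,
      captures: match.captures,
      rhs: match.post_match # correct only for the last split
    }

    start = match.end(0) # resume after the separator
  end

  splits
end

scan_splits("foo:bar=baz", /([:=])/).each do |split|
  p split.values_at(:lhs, :separator, :captures, :rhs)
end
# prints ["foo", ":", [":"], "bar"] then ["bar", "=", ["="], "baz"]
```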
@@ -43,42 +44,6 @@ class StringSplitter
43
44
  DEFAULT_DELIMITER = /\s+/.freeze
44
45
  REMOVE = [].freeze
45
46
 
46
- Split = Value.new(:captures, :count, :index, :lhs, :rhs, :separator) do
47
- def position
48
- index + 1
49
- end
50
-
51
- alias_method :pos, :position
52
-
53
- # 0-based index relative to the end of the array, e.g. for 5 items:
54
- #
55
- # index | rindex
56
- # ------|-------
57
- # 0 | 4
58
- # 1 | 3
59
- # 2 | 2
60
- # 3 | 1
61
- # 4 | 0
62
- def rindex
63
- count - position
64
- end
65
-
66
- # 1-based position relative to the end of the array, e.g. for 5 items:
67
- #
68
- # position | rposition
69
- # ----------|----------
70
- # 1 | 5
71
- # 2 | 4
72
- # 3 | 3
73
- # 4 | 2
74
- # 5 | 1
75
- def rposition
76
- count + 1 - position
77
- end
78
-
79
- alias_method :rpos, :rposition
80
- end
81
-
82
47
  # simulate an enum. the value is returned by the case statement
83
48
  # in the generated block if the positions match
84
49
  module Action
@@ -130,9 +95,10 @@ class StringSplitter
130
95
 
131
96
  return result unless splits
132
97
 
133
- splits.each_with_index do |hash, index|
134
- split = Split.with(hash.merge({ count: count, index: index }))
135
- result << split.lhs if result.empty?
98
+ result << splits.first.lhs
99
+
100
+ splits.each_with_index do |split, index|
101
+ split.update!(count: count, index: index)
136
102
 
137
103
  if accept.call(split)
138
104
  result << split.captures << split.rhs
@@ -166,9 +132,10 @@ class StringSplitter
166
132
 
167
133
  return result unless splits
168
134
 
169
- splits.reverse_each.with_index do |hash, index|
170
- split = Split.with(hash.merge({ count: count, index: index }))
171
- result.unshift(split.rhs) if result.empty?
135
+ result.unshift(splits.last.rhs)
136
+
137
+ splits.reverse_each.with_index do |split, index|
138
+ split.update!(count: count, index: index)
172
139
 
173
140
  if accept.call(split)
174
141
  # [lhs + captures] + result
@@ -190,7 +157,7 @@ class StringSplitter
190
157
  # the following fields:
191
158
  #
192
159
  # - result: the array of separated strings to return from +split+ or +rsplit+.
193
- # if the splits arry is empty, the caller returns this array immediately
160
+ # if the splits array is empty, the caller returns this array immediately
194
161
  # without any further processing
195
162
  #
196
163
  # - splits: an array of hashes containing the lhs, rhs, separator and captured
@@ -202,23 +169,76 @@ class StringSplitter
202
169
  # accepted (true) or rejected (false)
203
170
  #
204
171
  def init(string:, delimiter:, select:, reject:, block:)
205
- if reject
206
- positions = reject
207
- action = Action::REJECT
208
- elsif select
209
- positions = select
210
- action = Action::SELECT
172
+ return [[]] if string.empty?
173
+
174
+ unless block
175
+ if reject
176
+ positions = reject
177
+ action = Action::REJECT
178
+ elsif select
179
+ positions = select
180
+ action = Action::SELECT
181
+ else
182
+ block = ACCEPT_ALL
183
+ end
211
184
  end
212
185
 
213
- splits = parse(string, delimiter)
186
+ # use String#split if we can
187
+ #
188
+ # NOTE +reject!+ is no faster than +reject+ on MRI and significantly slower
189
+ # on TruffleRuby
190
+
191
+ if delimiter.is_a?(String)
192
+ limit = -1
193
+
194
+ if delimiter == ' '
195
+ delimiter = / / # don't trim
196
+ elsif delimiter.empty?
197
+ limit = 0 # remove the trailing empty string
198
+ end
199
+
200
+ result = string.split(delimiter, limit)
201
+
202
+ return [result] if result.length == 1 # delimiter not found: no splits
203
+
204
+ if block == ACCEPT_ALL # return the (2 or more) fields
205
+ result = result.reject(&:empty?) if @remove_empty_fields
206
+ return [result]
207
+ end
208
+
209
+ splits = []
210
+
211
+ result.each_cons(2) do |lhs, rhs| # 2 or more fields
212
+ splits << Split.new(
213
+ captures: [],
214
+ lhs: lhs,
215
+ rhs: rhs,
216
+ separator: delimiter
217
+ )
218
+ end
219
+ elsif delimiter == DEFAULT_DELIMITER && block == ACCEPT_ALL
220
+ # non-empty separators so -1 is safe
221
+
222
+ if @remove_empty_fields
223
+ result = []
224
+ string.split(delimiter, -1) do |field|
225
+ result << field unless field.empty?
226
+ end
227
+ else
228
+ result = string.split(delimiter, -1)
229
+ end
214
230
 
215
- if splits.empty?
216
- result = string.empty? ? [] : [string]
217
231
  return [result]
232
+ else
233
+ splits = parse(string, delimiter)
218
234
  end
219
235
 
220
- block ||= positions ? compile(positions, action, splits.length) : ACCEPT_ALL
221
- [[], splits, splits.length, block]
236
+ count = splits.length
237
+
238
+ return [[string]] if count.zero?
239
+
240
+ block ||= compile(positions, action, count)
241
+ [[], splits, count, block]
222
242
  end
223
243
 
224
244
  def render(values)
@@ -227,6 +247,7 @@ class StringSplitter
227
247
  value.empty? && @remove_empty_fields ? REMOVE : [value]
228
248
  elsif @include_captures
229
249
  if @spread_captures
250
+ # TODO make sure compact can return a Capture
230
251
  @spread_captures == :compact ? value.compact : value
231
252
  elsif value.empty?
232
253
  # we expose non-captures (string delimiters or regexps with no
@@ -247,7 +268,7 @@ class StringSplitter
247
268
  # the delimiter, returning an array of objects (hashes) representing each split.
248
269
  # e.g. for:
249
270
  #
250
- # parse.split("foo:bar:baz:quux", ":")
271
+ # parse("foo:bar:baz:quux", ":")
251
272
  #
252
273
  # we return:
253
274
  #
@@ -258,6 +279,7 @@ class StringSplitter
258
279
  # ]
259
280
  #
260
281
  def parse(string, delimiter)
282
+ # has_names = delimiter.is_a?(Regexp) && !delimiter.names.empty?
261
283
  result = []
262
284
  start = 0
263
285
 
@@ -273,21 +295,23 @@ class StringSplitter
273
295
  next if separator.empty? && (index.zero? || after == string.length)
274
296
 
275
297
  lhs = string.slice(start, index - start)
276
- result.last[:rhs] = lhs unless result.empty?
298
+ result.last.rhs = lhs unless result.empty?
277
299
 
278
300
  # this is correct for the last/only match, but gets updated to the next
279
301
  # match's lhs for other matches
280
302
  rhs = match.post_match
281
303
 
282
- result << {
304
+ # captures = (has_names ? Captures.new(match) : match.captures)
305
+
306
+ result << Split.new(
283
307
  captures: match.captures,
284
308
  lhs: lhs,
285
309
  rhs: rhs,
286
- separator: separator,
287
- }
310
+ separator: separator
311
+ )
288
312
 
289
- # move the start index (the start of the next lhs) to the index after the
290
- # last character of the separator
313
+ # advance the start index (the start of the next lhs) to the position
314
+ # after the last character of the separator
291
315
  start = after
292
316
  end
293
317
 
@@ -297,8 +321,8 @@ class StringSplitter
297
321
  # returns a lambda which splits at (i.e. accepts or rejects splits at, depending
298
322
  # on the action) the supplied positions
299
323
  #
300
- # positions are preprocessed to support additional features: negative
301
- # ranges, infinite ranges, and descending ranges, e.g.:
324
+ # positions are preprocessed to support negative indices, infinite ranges, and
325
+ # descending ranges, e.g.:
302
326
  #
303
327
  # ss.split("foo:bar:baz:quux", ":", at: -1)
304
328
  #
@@ -309,9 +333,8 @@ class StringSplitter
309
333
  # and
310
334
  #
311
335
  # ss.split("1:2:3:4:5:6:7:8:9", ":", -3..)
312
- # ss.split("1:2:3:4:5:6:7:8:9", ":", -3..)
313
336
  #
314
- # translate to:
337
+ # translates to:
315
338
  #
316
339
  # ss.split("foo:bar:baz:quux", ":", at: 6..8)
317
340
  #
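The position preprocessing described in the comment above can be approximated as follows, assuming 1-based positions and inclusive ranges. `normalize_position` is a hypothetical helper, not the gem's `compile` method: it maps a negative position `n` to `count + 1 + n`, fills in the missing end of endless (or beginless) ranges, and flips descending ranges. With 8 splits, as in `"1:2:3:4:5:6:7:8:9"`, `-3..` becomes `6..8`.

```ruby
# Hypothetical helper (not StringSplitter's compile method). Maps a position
# or range of 1-based positions onto positive positions for +count+ splits.
# Assumption: ranges are inclusive (exclusive ranges are rejected).
def normalize_position(position, count)
  # a negative position counts back from the last split: -1 => count
  resolve = ->(n) { n.negative? ? count + 1 + n : n }

  case position
  when Integer
    resolve.call(position)
  when Range
    raise ArgumentError, 'exclusive ranges are not handled' if position.exclude_end?

    first = resolve.call(position.begin || 1)    # beginless range: from the first split
    last = resolve.call(position.end || count)   # endless range: to the last split
    first <= last ? (first..last) : (last..first) # flip descending ranges
  else
    raise ArgumentError, "invalid position: #{position.inspect}"
  end
end

p normalize_position(-1, 3)     # => 3
p normalize_position((-3..), 8) # => 6..8
p normalize_position(5..3, 8)   # => 3..5
```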
data/lib/string_splitter/split.rb ADDED
@@ -0,0 +1,51 @@
1
+ # frozen_string_literal: true
2
+
3
+ class StringSplitter
4
+ class Split
5
+ attr_reader :captures, :count, :index, :lhs, :position, :rhs, :separator
6
+ attr_writer :rhs
7
+ alias pos position
8
+
9
+ def initialize(captures:, lhs:, rhs:, separator:)
10
+ @captures = captures
11
+ @lhs = lhs
12
+ @rhs = rhs
13
+ @separator = separator
14
+ end
15
+
16
+ # 0-based index relative to the end of the array, e.g. for 5 items:
17
+ #
18
+ # index | rindex
19
+ # ------|-------
20
+ # 0 | 4
21
+ # 1 | 3
22
+ # 2 | 2
23
+ # 3 | 1
24
+ # 4 | 0
25
+ def rindex
26
+ @count - @position
27
+ end
28
+
29
+ # 1-based position relative to the end of the array, e.g. for 5 items:
30
+ #
31
+ # position | rposition
32
+ # ----------|----------
33
+ # 1 | 5
34
+ # 2 | 4
35
+ # 3 | 3
36
+ # 4 | 2
37
+ # 5 | 1
38
+ def rposition
39
+ @count + 1 - @position
40
+ end
41
+
42
+ alias rpos rposition
43
+
44
+ def update!(count:, index:)
45
+ @count = count
46
+ @index = index
47
+ @position = index + 1
48
+ freeze
49
+ end
50
+ end
51
+ end
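The new `Split` class stays mutable while `parse` runs (it assigns `rhs` on the previous split) and is frozen once `update!` supplies the count and index. The cut-down stand-in below is hypothetical (`SplitSketch` is not part of the gem) and keeps only the fields needed to show that lifecycle and the `rindex`/`rposition` arithmetic from the comment tables.

```ruby
# Hypothetical, cut-down stand-in for StringSplitter::Split: rhs is writable
# during parsing, then update! derives the position fields and freezes the object
class SplitSketch
  attr_reader :lhs, :rhs, :count, :index, :position
  attr_writer :rhs

  def initialize(lhs:, rhs:)
    @lhs = lhs
    @rhs = rhs
  end

  def update!(count:, index:)
    @count = count
    @index = index
    @position = index + 1
    freeze # returns self
  end

  # 0-based index counted from the end
  def rindex
    @count - @position
  end

  # 1-based position counted from the end
  def rposition
    @count + 1 - @position
  end
end

first = SplitSketch.new(lhs: 'foo', rhs: 'bar:baz')
first.rhs = 'bar' # patched during parsing, like result.last.rhs = lhs
first.update!(count: 2, index: 0)

p [first.position, first.rindex, first.rposition] # => [1, 1, 2]

begin
  first.rhs = 'oops'
rescue FrozenError => e
  puts "rejected: #{e.class}" # prints "rejected: FrozenError"
end
```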
data/lib/string_splitter/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  class StringSplitter
4
- VERSION = '0.7.0'
4
+ VERSION = '0.7.1'
5
5
  end
metadata CHANGED
@@ -1,29 +1,15 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: string_splitter
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.7.0
4
+ version: 0.7.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - chocolateboy
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2020-08-21 00:00:00.000000000 Z
11
+ date: 2020-08-22 00:00:00.000000000 Z
12
12
  dependencies:
13
- - !ruby/object:Gem::Dependency
14
- name: values
15
- requirement: !ruby/object:Gem::Requirement
16
- requirements:
17
- - - "~>"
18
- - !ruby/object:Gem::Version
19
- version: '1.8'
20
- type: :runtime
21
- prerelease: false
22
- version_requirements: !ruby/object:Gem::Requirement
23
- requirements:
24
- - - "~>"
25
- - !ruby/object:Gem::Version
26
- version: '1.8'
27
13
  - !ruby/object:Gem::Dependency
28
14
  name: bundler
29
15
  requirement: !ruby/object:Gem::Requirement
@@ -104,6 +90,7 @@ files:
104
90
  - LICENSE.md
105
91
  - README.md
106
92
  - lib/string_splitter.rb
93
+ - lib/string_splitter/split.rb
107
94
  - lib/string_splitter/version.rb
108
95
  homepage: https://github.com/chocolateboy/string_splitter
109
96
  licenses: