string_splitter 0.3.0 → 0.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 128e1b2cc29cb122f3d5040f7c9e115688532c7e68c59f0a5373291e995642f9
4
- data.tar.gz: 6b8729b7fb59aa984c1940ff0f9a1a308dded8b77c7db2b3c2a2ad4cdbd8bd52
3
+ metadata.gz: 9d97ccb956fe51694359cdb0d3a997d6574de088bac6ed5a8e572f92bb5ed54a
4
+ data.tar.gz: 845cefeb5efd5d01baa45759cb05ff7ae5e9a457c1f148b340bb24c038bd259e
5
5
  SHA512:
6
- metadata.gz: 3aa949fb5ac46369af379e2fd28bc18f7c93746515aea76005d606a4cb9f20426dec353bef8406264078cdccf89cdaca99902d961687f12fb72be08d9f2b0072
7
- data.tar.gz: acd982d39a003be78b4548992cf108141e89e51a58f918e10515fb01dd1fd562db319e5c0dc5475d3e74739726e1fd752bbee4850b823705fb9520ff6b05e99f
6
+ metadata.gz: 7a935a6e0f3434801dcae6a32575779e1d2eb706f8f208087a208e7fdba39ac5b49928f8b7617aec60493a8db5988a013028650f8b2ced01fadb620bfd4c77e5
7
+ data.tar.gz: d76c18a283c1e113c8bffb73b813eb6074481faa7ea339811dc9a7424a5e24fdc3efbe9afa941459e566cde8271c3cd19a97e3a37a8cf90d36a65a7bf8fd6dcf
@@ -1,18 +1,74 @@
1
+ ## 0.6.0 - 2020-08-20
2
+
3
+ #### Breaking Changes
4
+
5
+ - `ss.split(str, " ")` is no longer treated the same as `ss.split(str)` i.e.
6
+ unlike Ruby's `String#split` (but like Crystal's), the former no longer
7
+ strips the string before splitting
8
+ - rename the `remove_empty` option `remove_empty_fields`
9
+ - rename the `exclude` option `except` (alias for `reject`)
10
+
11
+ #### Fixes
12
+
13
+ - correctly handle backreferences in delimiter patterns
14
+
15
+ #### Features
16
+
17
+ - add support for descending, negative, and infinite ranges,
18
+ e.g. `ss.split(str, ":", at: [..4, 4..., 3..1, -1..-3])` etc.
19
+
20
+ ## 0.5.1 - 2018-07-01
21
+
22
+ #### Changes
23
+
24
+ - set StringSplitter::VERSION when `string_splitter.rb` is loaded
25
+
26
+ ## 0.5.0 - 2018-06-26
27
+
28
+ #### Fixes
29
+
30
+ - don't treat string delimiters as patterns
31
+
32
+ #### Features
33
+
34
+ - add a `reject`/`exclude` option which rejects splits at the specified positions
35
+ - add a `select` alias for `at`
36
+
37
+ ## 0.4.0 - 2018-06-24
38
+
39
+ #### Breaking Changes
40
+
41
+ - remove the `offset` alias for `split.index`
42
+
43
+ ## 0.3.1 - 2018-06-24
44
+
45
+ #### Fixes
46
+
47
+ - remove trailing empty field when the separator is empty
48
+ ([#1](https://github.com/chocolateboy/string_splitter/issues/1))
49
+
1
50
  ## 0.3.0 - 2018-06-23
2
51
 
3
- - **breaking change**: rename the `default_separator` option to `default_delimiter`
4
- - to avoid ambiguity in the code, refer to the input pattern/string as the
5
- "delimiter" and the matched string as the "separator"
52
+ #### Breaking Changes
53
+
54
+ - rename the `default_separator` option `default_delimiter`
6
55
 
7
56
  ## 0.2.0 - 2018-06-22
8
57
 
9
- - **breaking change**: make `index` (AKA `offset`) 0-based and add `position`
10
- (AKA `pos`) as the 1-based accessor
58
+ #### Breaking Changes
59
+
60
+ - make `index` (AKA `offset`) 0-based and add `position` (AKA `pos`) as the
61
+ 1-based accessor
11
62
 
12
63
  ## 0.1.0 - 2018-06-22
13
64
 
14
- - **breaking change**: the block now takes a single `split` object with an
15
- `index` accessor, rather than separate `index` and `split` arguments
65
+ #### Breaking Changes
66
+
67
+ - the block now takes a single `split` object with an `index` accessor, rather
68
+ than separate `index` and `split` arguments
69
+
70
+ #### Features
71
+
16
72
  - add support for negative indices in the value supplied to the `at` option
17
73
  - add a `count` field to the split object containing the total number of splits
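The 0.6.0 entry in the changelog above adds descending, negative, and infinite ranges to the `at` option. One way to picture this is as a normalization step that turns each range into an ascending range of 1-based positions. The sketch below is an illustration under that assumption, not the gem's code: `normalize_range` is a hypothetical helper, and it ignores exclusive (`...`) ranges for brevity.

```ruby
# Hypothetical helper (not part of string_splitter): map a range of split
# positions onto an ascending range of positive, 1-based positions.
def normalize_range(range, nsplits)
  # negative positions count back from the last split: -1 is the last one
  resolve = ->(int) { int.negative? ? nsplits + 1 + int : int }

  from = range.begin.nil? ? 1 : resolve[range.begin]   # beginless: start at 1
  to   = range.end.nil? ? nsplits : resolve[range.end] # endless: stop at the last split
  from, to = to, from if to < from                     # descending: swap the bounds

  (from..to)
end

nsplits = 8 # "1:2:3:4:5:6:7:8:9" contains eight ":" separators

negative   = normalize_range((-3..-1), nsplits) # 6..8
descending = normalize_range((5..3), nsplits)   # 3..5
endless    = normalize_range((-2..), nsplits)   # 7..8
```

Resolving negative positions against the number of splits, rather than with modulo arithmetic, means an out-of-range value such as `-42` stays out of range instead of wrapping around to a valid position.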
18
74
 
data/README.md CHANGED
@@ -3,14 +3,16 @@
3
3
  [![Build Status](https://travis-ci.org/chocolateboy/string_splitter.svg)](https://travis-ci.org/chocolateboy/string_splitter)
4
4
  [![Gem Version](https://img.shields.io/gem/v/string_splitter.svg)](https://rubygems.org/gems/string_splitter)
5
5
 
6
- <!-- START doctoc generated TOC please keep comment here to allow auto update -->
7
- <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
6
+ <!-- toc -->
8
7
 
9
8
  - [NAME](#name)
10
9
  - [INSTALLATION](#installation)
11
10
  - [SYNOPSIS](#synopsis)
12
11
  - [DESCRIPTION](#description)
13
12
  - [WHY?](#why)
13
+ - [CAVEATS](#caveats)
14
+ - [Differences from String#split](#differences-from-string%23split)
15
+ - [COMPATIBILITY](#compatibility)
14
16
  - [VERSION](#version)
15
17
  - [SEE ALSO](#see-also)
16
18
  - [Gems](#gems)
@@ -18,7 +20,7 @@
18
20
  - [AUTHOR](#author)
19
21
  - [COPYRIGHT AND LICENSE](#copyright-and-license)
20
22
 
21
- <!-- END doctoc generated TOC please keep comment here to allow auto update -->
23
+ <!-- tocstop -->
22
24
 
23
25
  # NAME
24
26
 
@@ -36,65 +38,137 @@ gem "string_splitter"
36
38
  require "string_splitter"
37
39
 
38
40
  ss = StringSplitter.new
41
+ ```
42
+
43
+ **Same as `String#split`**
44
+
45
+ ```ruby
46
+ ss.split("foo bar baz")
47
+ ss.split(" foo bar baz ")
48
+ # => ["foo", "bar", "baz"]
49
+ ```
50
+
51
+ ```ruby
52
+ ss.split("foo", "")
53
+ ss.split("foo", //)
54
+ # => ["f", "o", "o"]
55
+ ```
39
56
 
40
- # same as String#split
41
- ss.split("foo bar baz quux")
42
- ss.split("foo bar baz quux", " ")
43
- ss.split("foo bar baz quux", /\s+/)
44
- # => ["foo", "bar", "baz", "quux"]
57
+ ```ruby
58
+ ss.split("", "...")
59
+ ss.split("", /.../)
60
+ # => []
61
+ ```
45
62
 
46
- # split at the first delimiter
63
+ **Split at the first delimiter**
64
+
65
+ ```ruby
47
66
  ss.split("foo:bar:baz:quux", ":", at: 1)
67
+ ss.split("foo:bar:baz:quux", ":", select: 1)
48
68
  # => ["foo", "bar:baz:quux"]
69
+ ```
70
+
71
+ **Split at the last delimiter**
49
72
 
50
- # split at the last delimiter
73
+ ```ruby
51
74
  ss.split("foo:bar:baz:quux", ":", at: -1)
52
75
  # => ["foo:bar:baz", "quux"]
76
+ ```
53
77
 
54
- # split at multiple delimiter positions
55
- ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -2])
56
- # => ["1", "2", "3", "4:5:6:7", "8:9"]
78
+ **Split at multiple delimiter positions**
57
79
 
58
- # split from the right
80
+ ```ruby
81
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -1])
82
+ # => ["1", "2", "3", "4:5:6:7:8", "9"]
83
+ ```
84
+
85
+ **Split at all but the first and last delimiters**
86
+
87
+ ```ruby
88
+ ss.split("1:2:3:4:5:6", ":", except: [1, -1])
89
+ ss.split("1:2:3:4:5:6", ":", reject: [1, -1])
90
+ # => ["1:2", "3", "4", "5:6"]
91
+ ```
92
+
93
+ **Split from the right**
94
+
95
+ ```ruby
59
96
  ss.rsplit("1:2:3:4:5:6:7:8:9", ":", at: [1..3, 5])
60
97
  # => ["1:2:3:4", "5:6", "7", "8", "9"]
98
+ ```
61
99
 
62
- # full control via a block
63
- result = ss.split('a:a:a:b:c:c:e:a:a:d:c', ":") do |split|
64
- split.index > 0 && split.lhs == split.rhs
100
+ **Split with negative, descending, and infinite ranges**
101
+
102
+ ```ruby
103
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: 4...)
104
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: [4...])
105
+ # => ["1:2:3:4", "5", "6", "7", "8:9"]
106
+ ```
107
+
108
+ ```ruby
109
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: ..-3)
110
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: [..-3])
111
+ # => ["1", "2", "3", "4", "5", "6", "7:8:9"]
112
+ ```
113
+
114
+ ```ruby
115
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1, 5..3, -2..])
116
+ # => ["1", "2:3", "4", "5", "6:7", "8", "9"]
117
+ ```
118
+
119
+ **Full control via a block**
120
+
121
+ ```ruby
122
+ result = ss.split("1:2:3:4:5:6:7:8", ":") do |split|
123
+ split.pos % 2 == 0
65
124
  end
66
- # => ["a:a", "a:b:c", "c:e:a", "a:d:c"]
125
+ # => ["1:2", "3:4", "5:6", "7:8"]
126
+ ```
127
+
128
+ ```ruby
129
+ string = "banana".chars.sort.join # "aaabnn"
130
+
131
+ ss.split(string, "") do |split|
132
+ split.rhs != split.lhs
133
+ end
134
+ # => ["aaa", "b", "nn"]
67
135
  ```
68
136
 
69
137
  # DESCRIPTION
70
138
 
71
- Many languages have built-in string `split` functions/methods. They behave similarly
72
- (notwithstanding the occasional [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)),
73
- and handle a few common cases e.g.:
139
+ Many languages have built-in `split` functions/methods for strings. They behave
140
+ similarly (notwithstanding the occasional
141
+ [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)), and
142
+ handle a few common cases e.g.:
74
143
 
75
144
  * limiting the number of splits
76
- * including the separators in the results
145
+ * including the separator(s) in the results
77
146
  * removing (some) empty fields
78
147
 
79
- But, because the API is squeezed into two overloaded parameters (the delimiter and the limit),
80
- achieving the desired effects can be tricky. For instance, while `String#split` removes empty
81
- trailing fields (by default), it provides no way to remove *all* empty fields. Likewise, the
82
- cramped API means there's no way to e.g. combine a limit (positive integer) with the option
83
- to preserve empty fields (negative integer), or use backreferences in a delimiter pattern
148
+ But, because the API is squeezed into two overloaded parameters (the delimiter
149
+ and the limit), achieving the desired results can be tricky. For instance,
150
+ while `String#split` removes empty trailing fields (by default), it provides no
151
+ way to remove *all* empty fields. Likewise, the cramped API means there's no
152
+ way to e.g. combine a limit (positive integer) with the option to preserve
153
+ empty fields (negative integer), or use backreferences in a delimiter pattern
84
154
  without including its captured subexpressions in the result.
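The overloading described in the paragraph above can be seen with core Ruby alone. This is a minimal sketch that uses only the built-in `String#split`; the variable names are illustrative.

```ruby
text = ":foo:bar:baz:"

# Trailing empty fields are dropped by default; the leading one is kept.
default_fields = text.split(":")       # ["", "foo", "bar", "baz"]

# A negative limit switches to "keep trailing empty fields".
all_fields = text.split(":", -1)       # ["", "foo", "bar", "baz", ""]

# A positive limit uses the same parameter to cap the number of fields.
limited = text.split(":", 2)           # ["", "foo:bar:baz:"]

# Groups in a delimiter pattern are always included in the result.
with_captures = "foo:bar".split(/(:)/) # ["foo", ":", "bar"]
```

The sign of one integer selects between two unrelated behaviours, and a capture group cannot be used without its match appearing in the output.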
85
155
 
86
- If `split` was being written from scratch, without the baggage of its legacy API,
87
- it's possible that some of these options would be made explicit rather than overloading
88
- the parameters. And, indeed, this is possible in some implementations,
89
- e.g. in Crystal:
156
+ If `split` was being written from scratch, without the baggage of its legacy
157
+ API, it's possible that some of these options would be made explicit rather
158
+ than overloading the parameters. And, indeed, this is possible in some
159
+ implementations, e.g. in Crystal:
90
160
 
91
161
  ```ruby
92
- ":foo:bar:baz:".split(":", remove_empty: false) # => ["", "foo", "bar", "baz", ""]
93
- ":foo:bar:baz:".split(":", remove_empty: true) # => ["foo", "bar", "baz"]
162
+ ":foo:bar:baz:".split(":", remove_empty: false)
163
+ # => ["", "foo", "bar", "baz", ""]
164
+
165
+ ":foo:bar:baz:".split(":", remove_empty: true)
166
+ # => ["foo", "bar", "baz"]
94
167
  ````
95
168
 
96
- StringSplitter takes this one step further by moving the configuration out of the method altogether
97
- and delegating the strategy — i.e. which splits should be accepted or rejected — to a block:
169
+ StringSplitter takes this one step further by moving the configuration out of
170
+ the method altogether and delegating the strategy — i.e. which splits should be
171
+ accepted or rejected — to a block:
98
172
 
99
173
  ```ruby
100
174
  ss = StringSplitter.new
@@ -102,22 +176,29 @@ ss = StringSplitter.new
102
176
  ss.split("foo:bar:baz", ":") { |split| split.index == 0 }
103
177
  # => ["foo", "bar:baz"]
104
178
 
105
- ss.split("foo:bar:baz", ":") { |split| split.position == split.count }
106
- # => ["foo:bar", "baz"]
179
+ ss.split("foo:bar:baz:quux", ":") do |split|
180
+ split.position == 1 || split.position == 3
181
+ end
182
+ # => ["foo", "bar:baz", "quux"]
107
183
  ```
108
184
 
109
- As a shortcut, the common case of splitting on delimiters at one or more positions is supported by an option:
185
+ As a shortcut, the common case of splitting on delimiters at one or more
186
+ positions is supported by an option:
110
187
 
111
188
  ```ruby
112
- ss.split('foo:bar:baz:quux', ':', at: [1, -1]) # => ["foo", "bar:baz", "quux"]
189
+ ss.split("foo:bar:baz:quux", ":", at: [1, -1])
190
+ # => ["foo", "bar:baz", "quux"]
113
191
  ```
114
192
 
115
193
  # WHY?
116
194
 
117
- I wanted to split semi-structured output into fields without having to resort to a regex or a full-blown parser.
195
+ I wanted to split semi-structured output into fields without having to resort
196
+ to a regex or a full-blown parser.
118
197
 
119
- As an example, the nominally unstructured output of many Unix commands is often formatted in a way
120
- that's tantalizingly close to being machine-readable, apart from a few pesky exceptions e.g.:
198
+ As an example, the nominally unstructured output of many Unix commands is often
199
+ formatted in a way that's tantalizingly close to being
200
+ [machine-readable](https://en.wikipedia.org/wiki/Delimiter-separated_values),
201
+ apart from a few pesky exceptions e.g.:
121
202
 
122
203
  ```bash
123
204
  $ ls -l
@@ -129,8 +210,8 @@ drwxr-xr-x 3 user users 4096 Jun 19 22:56 lib
129
210
  -rw-r--r-- 1 user users 3134 Jun 19 22:59 README.md
130
211
  ```
131
212
 
132
- These lines can *almost* be parsed into an array of fields by splitting them on whitespace. The exception is the
133
- date (columns 6-8) i.e.:
213
+ These lines can *almost* be parsed into an array of fields by splitting them on
214
+ whitespace. The exception is the date (columns 6-8) i.e.:
134
215
 
135
216
  ```ruby
136
217
  line = "-rw-r--r-- 1 user users 87 Jun 18 18:16 CHANGELOG.md"
@@ -155,13 +236,14 @@ One way to work around this is to parse the whole line e.g.:
155
236
  line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+ \S+) \s+ (.+)$/x)
156
237
  ```
157
238
 
158
- But that requires us to specify *everything*. What we really want is a version of `split`
159
- which allows us to veto splitting for the 6th and 7th delimiters i.e. control over which
160
- splits are accepted, rather than being restricted to the single, baked-in strategy provided
161
- by the `limit` parameter.
239
+ But that requires us to specify *everything*. What we really want is a version
240
+ of `split` which allows us to veto splitting for the 6th and 7th delimiters
241
+ (and to stop after the 8th delimiter) i.e. control over which splits are
242
+ accepted, rather than being restricted to the single, baked-in strategy
243
+ provided by the `limit` parameter.
162
244
 
163
- By providing a simple way to accept or reject each split, StringSplitter makes cases like
164
- this easy to handle, either via a block:
245
+ By providing a simple way to accept or reject each split, StringSplitter makes
246
+ cases like this easy to handle, either via a block:
165
247
 
166
248
  ```ruby
167
249
  ss.split(line) do |split|
@@ -177,9 +259,42 @@ ss.split(line, at: [1..5, 8])
177
259
  # => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
178
260
  ```
179
261
 
262
+ # CAVEATS
263
+
264
+ ## Differences from String#split
265
+
266
+ StringSplitter shares `String#split`'s behavior of trimming the string before
267
+ splitting if the delimiter is omitted, e.g.:
268
+
269
+ ```ruby
270
+ " foo bar baz ".split # => ["foo", "bar", "baz"]
271
+ ss.split(" foo bar baz ") # => ["foo", "bar", "baz"]
272
+ ```
273
+
274
+ However, unlike `String#split`, this doesn't also apply if a delimiter of `" "`
275
+ is supplied, e.g.:
276
+
277
+ ```ruby
278
+ " foo bar baz ".split(" ") # => ["foo", "bar", "baz"]
279
+ ss.split(" foo bar baz ", " ") # => ["", "foo", "bar", "baz", ""]
280
+ ```
281
+
282
+ It also doesn't apply if a custom default-delimiter is defined:
283
+
284
+ ```ruby
285
+ ss = StringSplitter.new(default_delimiter: /\s+/)
286
+ ss.split(" foo bar baz ") # => ["", "foo", "bar", "baz", ""]
287
+ ```
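The `String#split` side of these caveats can be checked without the gem. The sketch below uses core Ruby only. Its last line mimics, as an assumption about the general approach rather than a copy of the implementation, stripping the string before splitting on `/\s+/`.

```ruby
line = " foo bar baz "

# With no argument, or with a single space, String#split skips leading
# whitespace and treats runs of whitespace as one delimiter.
awk_default = line.split                # ["foo", "bar", "baz"]
awk_space   = line.split(" ")           # ["foo", "bar", "baz"]

# A regex delimiter is applied literally: the leading match yields an empty
# first field, and a limit of -1 keeps the trailing empty field as well.
literal = line.split(/\s+/, -1)         # ["", "foo", "bar", "baz", ""]

# Stripping first removes both empty fields.
stripped = line.strip.split(/\s+/, -1)  # ["foo", "bar", "baz"]
```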
288
+
289
+ # COMPATIBILITY
290
+
291
+ StringSplitter is tested and supported on all versions of Ruby [supported by
292
+ the ruby-core team](https://www.ruby-lang.org/en/downloads/branches/), i.e.,
293
+ currently, Ruby 2.5 and above.
294
+
180
295
  # VERSION
181
296
 
182
- 0.3.0
297
+ 0.6.0
183
298
 
184
299
  # SEE ALSO
185
300
 
@@ -197,7 +312,7 @@ ss.split(line, at: [1..5, 8])
197
312
 
198
313
  # COPYRIGHT AND LICENSE
199
314
 
200
- Copyright © 2018 by chocolateboy.
315
+ Copyright © 2018-2020 by chocolateboy.
201
316
 
202
317
  This is free software; you can redistribute it and/or modify it under the
203
- terms of the [Artistic License 2.0](http://www.opensource.org/licenses/artistic-license-2.0.php).
318
+ terms of the [Artistic License 2.0](https://www.opensource.org/licenses/artistic-license-2.0.php).
@@ -1,204 +1,354 @@
1
1
  # frozen_string_literal: true
2
2
 
3
+ require 'set'
3
4
  require 'values'
5
+ require_relative 'string_splitter/version'
4
6
 
5
7
  # This class extends the functionality of +String#split+ by:
6
8
  #
7
9
  # - providing full control over which splits are accepted or rejected
10
+ #
8
11
  # - adding support for splitting from right-to-left
9
- # - encapsulating splitting options/preferences in instances rather than trying to
10
- # cram them into overloaded method parameters
12
+ #
13
+ # - encapsulating splitting options/preferences in the splitter rather
14
+ # than trying to cram them into overloaded method parameters
11
15
  #
12
16
  # These enhancements allow splits to handle many cases that otherwise require bigger
13
- # guns e.g. regex matching or parsing.
17
+ # guns, e.g. regex matching or parsing.
18
+ #
19
+ # Implementation-wise, we effectively use the built-in +String#split+ method as a
20
+ # tokenizer, and parse the resulting tokens into an array of Split objects with the
21
+ # following fields:
22
+ #
23
+ # - captures: separator substrings captured by parentheses in the delimiter pattern
24
+ # - count: the number of splits
25
+ # - index: the 0-based index of the split in the array
26
+ # - lhs: the string to the left of the separator (back to the previous split candidate)
27
+ # - position: the 1-based index of the split in the array (alias: pos)
28
+ # - rhs: the string to the right of the separator (up to the next split candidate)
29
+ # - rindex: the 0-based index of the split relative to the end of the array
30
+ # - rposition: the 1-based index of the split relative to the end of the array (alias: rpos)
31
+ # - separator: the string matched by the delimiter pattern/string
32
+ #
14
33
  class StringSplitter
15
- ACCEPT = ->(_split) { true }
16
- DEFAULT_DELIMITER = /\s+/
17
- NO_SPLITS = []
34
+ # terminology: the delimiter is what we provide and the separators are what we get
35
+ # back (if we capture them). e.g. for:
36
+ #
37
+ # ss.split("foo:bar::baz", /(\W+)/)
38
+ #
39
+ # the delimiter is /(\W+)/ and the separators are ":" and "::"
40
+
41
+ ACCEPT_ALL = ->(_split) { true }
42
+ DEFAULT_DELIMITER = /\s+/.freeze
18
43
 
19
44
  Split = Value.new(:captures, :count, :index, :lhs, :rhs, :separator) do
20
45
  def position
21
46
  index + 1
22
47
  end
23
48
 
24
- alias_method :offset, :index
25
49
  alias_method :pos, :position
50
+
51
+ # 0-based index relative to the end of the array, e.g. for 5 items:
52
+ #
53
+ # index | rindex
54
+ # ------|-------
55
+ # 0 | 4
56
+ # 1 | 3
57
+ # 2 | 2
58
+ # 3 | 1
59
+ # 4 | 0
60
+ def rindex
61
+ count - position
62
+ end
63
+
64
+ # 1-based position relative to the end of the array, e.g. for 5 items:
65
+ #
66
+ # position | rposition
67
+ # ----------|----------
68
+ # 1 | 5
69
+ # 2 | 4
70
+ # 3 | 3
71
+ # 4 | 2
72
+ # 5 | 1
73
+ def rposition
74
+ count + 1 - position
75
+ end
76
+
77
+ alias_method :rpos, :rposition
26
78
  end
27
79
 
80
+ # simulate an enum. the value is returned by the case statement
81
+ # in the generated block if the positions match
82
+ module Action
83
+ SELECT = true
84
+ REJECT = false
85
+ end
86
+
87
+ private_constant :Action
88
+
28
89
  def initialize(
29
90
  default_delimiter: DEFAULT_DELIMITER,
30
91
  include_captures: true,
31
- remove_empty: false,
92
+ remove_empty: false, # TODO remove this
93
+ remove_empty_fields: remove_empty,
32
94
  spread_captures: true
33
95
  )
34
96
  @default_delimiter = default_delimiter
35
97
  @include_captures = include_captures
36
- @remove_empty = remove_empty
98
+ @remove_empty_fields = remove_empty_fields
37
99
  @spread_captures = spread_captures
38
100
  end
39
101
 
40
- attr_reader :default_delimiter, :include_captures, :remove_empty, :spread_captures
41
-
42
- def split(string, delimiter = @default_delimiter, at: nil, &block)
43
- result, block, splits, count, index = split_common(string, delimiter, at, block)
102
+ attr_reader(
103
+ :default_delimiter,
104
+ :include_captures,
105
+ :remove_empty_fields,
106
+ :spread_captures
107
+ )
44
108
 
45
- splits.each do |split|
46
- split = Split.with(split.merge({ index: (index += 1), count: count }))
109
+ # TODO remove this
110
+ alias remove_empty remove_empty_fields
111
+
112
+ def split(
113
+ string,
114
+ delimiter = @default_delimiter,
115
+ at: nil, # alias for select
116
+ except: nil, # alias for reject
117
+ select: at,
118
+ reject: except,
119
+ &block
120
+ )
121
+ result, splits, count, accept = init(
122
+ string: string,
123
+ delimiter: delimiter,
124
+ select: select,
125
+ reject: reject,
126
+ block: block
127
+ )
128
+
129
+ return result unless splits
130
+
131
+ splits.each_with_index do |hash, index|
132
+ split = Split.with(hash.merge({ count: count, index: index }))
47
133
  result << split.lhs if result.empty?
48
134
 
49
- if block.call(split)
50
- if @include_captures
51
- if @spread_captures
52
- result += split.captures
53
- else
54
- result << split.captures
55
- end
56
- end
57
-
58
- result << split.rhs
135
+ if accept.call(split)
136
+ result << split.captures << split.rhs
59
137
  else
60
138
  # append the rhs
61
139
  result[-1] = result[-1] + split.separator + split.rhs
62
140
  end
63
141
  end
64
142
 
65
- result
143
+ render(result)
66
144
  end
67
145
 
68
146
  alias lsplit split
69
147
 
70
- def rsplit(string, delimiter = @default_delimiter, at: nil, &block)
71
- result, block, splits, count, index = split_common(string, delimiter, at, block)
72
-
73
- splits.reverse!.each do |split|
74
- split = Split.with(split.merge({ index: (index += 1), count: count }))
148
+ def rsplit(
149
+ string,
150
+ delimiter = @default_delimiter,
151
+ at: nil, # alias for select
152
+ except: nil, # alias for reject
153
+ select: at,
154
+ reject: except,
155
+ &block
156
+ )
157
+ result, splits, count, accept = init(
158
+ string: string,
159
+ delimiter: delimiter,
160
+ select: select,
161
+ reject: reject,
162
+ block: block
163
+ )
164
+
165
+ return result unless splits
166
+
167
+ splits.reverse_each.with_index do |hash, index|
168
+ split = Split.with(hash.merge({ count: count, index: index }))
75
169
  result.unshift(split.rhs) if result.empty?
76
170
 
77
- if block.call(split)
78
- if @include_captures
79
- if @spread_captures
80
- result = split.captures + result
81
- else
82
- result.unshift(split.captures)
83
- end
84
- end
85
-
86
- result.unshift(split.lhs)
171
+ if accept.call(split)
172
+ # [lhs + captures] + result
173
+ result.unshift(split.lhs, split.captures)
87
174
  else
88
175
  # prepend the lhs
89
176
  result[0] = split.lhs + split.separator + result[0]
90
177
  end
91
178
  end
92
179
 
93
- result
180
+ render(result)
94
181
  end
95
182
 
96
183
  private
97
184
 
98
- def splits_for(parts, ncaptures)
99
- result = []
100
- splits = []
101
-
102
- until parts.empty?
103
- lhs = parts.shift
104
- separator = parts.shift
105
- captures = parts.shift(ncaptures)
106
- rhs = parts.length == 1 ? parts.shift : parts.first
107
-
108
- if @remove_empty && (lhs.empty? || rhs.empty?)
109
- if lhs.empty? && rhs.empty?
110
- # do nothing
111
- elsif parts.empty? # last split
112
- result << (!lhs.empty? ? lhs : rhs) if splits.empty?
113
- elsif rhs.empty?
114
- # replace the empty rhs with the non-empty lhs
115
- parts[0] = lhs
116
- end
185
+ # initialisation common to +split+ and +rsplit+
186
+ #
187
+ # takes a hash of options passed to +split+ or +rsplit+ and returns an array with
188
+ # the following fields:
189
+ #
190
+ # - result: the array of separated strings to return from +split+ or +rsplit+.
191
+ # if the splits array is empty, the caller returns this array immediately
192
+ # without any further processing
193
+ #
194
+ # - splits: an array of hashes containing the lhs, rhs, separator and captured
195
+ # separator substrings for each split
196
+ #
197
+ # - count: the number of splits
198
+ #
199
+ # - accept: a proc whose return value determines whether each split should be
200
+ # accepted (true) or rejected (false)
201
+ #
202
+ def init(string:, delimiter:, select:, reject:, block:)
203
+ if delimiter.equal?(DEFAULT_DELIMITER)
204
+ string = string.strip
205
+ end
117
206
 
118
- next
119
- end
207
+ if reject
208
+ positions = reject
209
+ action = Action::REJECT
210
+ elsif select
211
+ positions = select
212
+ action = Action::SELECT
213
+ end
120
214
 
121
- splits << {
122
- lhs: lhs,
123
- rhs: rhs,
124
- separator: separator,
125
- captures: captures,
126
- }
215
+ splits = parse(string, delimiter)
216
+
217
+ if splits.empty?
218
+ result = string.empty? ? [] : [string]
219
+ return [result]
127
220
  end
128
221
 
129
- [result, splits]
222
+ block ||= positions ? compile(positions, action, splits.length) : ACCEPT_ALL
223
+ [[], splits, splits.length, block]
130
224
  end
131
225
 
132
- # setup common to both split methods
133
- def split_common(string, delimiter, at, block)
134
- unless (match = string.match(delimiter))
135
- result = (@remove_empty && string.empty?) ? [] : [string]
136
- return [result, block, NO_SPLITS, 0, -1]
226
+ def render(result)
227
+ if @remove_empty_fields
228
+ result.reject! { |it| it.is_a?(String) && it.empty? }
137
229
  end
138
230
 
139
- ncaptures = match.captures.length
140
-
141
- if delimiter.is_a?(Regexp) && ncaptures > 0
142
- # increment back-references so they remain valid when the outer capture
143
- # is added e.g. to split on:
144
- #
145
- # - <foo-comment> ... </foo-comment>
146
- # - <bar-comment> ... </bar-comment>
147
- #
148
- # etc.
149
- #
150
- # before:
151
- #
152
- # %r| <(\w+-comment)> [^<]* </\1> |x
153
- #
154
- # after:
155
- #
156
- # %r| ( <(\w+-comment)> [^<]* </\2> ) |x
157
-
158
- delimiter = delimiter.to_s.gsub(/\\(?:(\d+)|.)/) do
159
- match = Regexp.last_match
160
- match[1] ? '\\' + match[1].to_i.next.to_s : match[0]
161
- end
231
+ unless @include_captures
232
+ return result.reject! { |it| it.is_a?(Array) }
162
233
  end
163
234
 
164
- parts = string.split(/(#{delimiter})/, -1)
165
- result, splits = splits_for(parts, ncaptures)
166
- count = splits.length
167
-
168
- unless block
169
- if at
170
- at = Array(at).map do |index|
171
- if index.is_a?(Integer) && index.negative?
172
- # translate 1-based negative indices to 1-based positive
173
- # indices e.g:
174
- #
175
- # ss.split("foo:bar:baz:quux", ":", at: -1)
176
- #
177
- # translates to:
178
- #
179
- # ss.split("foo:bar:baz:quux", ":", at: 3)
180
- #
181
- # XXX note: we don't use modulo, because we don't want
182
- # out-of-bounds indices to silently work e.g. we don't want:
183
- #
184
- # ss.split("foo:bar:baz:quux", ":", -42)
185
- #
186
- # to mysteriously match when the index is 2
187
-
188
- count + 1 + index
189
- else
190
- index
191
- end
192
- end
235
+ result.flat_map do |value|
236
+ next [value] unless value.is_a?(Array) && @spread_captures
237
+ @spread_captures == :compact ? value.compact : value
238
+ end
239
+ end
240
+
241
+ # takes a string and a delimiter pattern (regex or string) and splits it along
242
+ # the delimiter, returning an array of objects (hashes) representing each split.
243
+ # e.g. for:
244
+ #
245
+ # parse("foo:bar:baz:quux", ":")
246
+ #
247
+ # we return:
248
+ #
249
+ # [
250
+ # { lhs: "foo", rhs: "bar", separator: ":", captures: [] },
251
+ # { lhs: "bar", rhs: "baz", separator: ":", captures: [] },
252
+ # { lhs: "baz", rhs: "quux", separator: ":", captures: [] },
253
+ # ]
254
+ #
255
+ def parse(string, pattern)
256
+ result = []
257
+ start = 0
258
+
259
+ # we don't use the argument passed to the +scan+ block here because it's a
260
+ # string (the separator) if there are no captures, rather than an empty
261
+ # array. we use match.captures instead to get the array
262
+ string.scan(pattern) do
263
+ match = Regexp.last_match
264
+ index, after = match.offset(0)
265
+ separator = match[0]
266
+
267
+ # ignore empty separators at the beginning and/or end of the string
268
+ next if separator.empty? && (index.zero? || after == string.length)
269
+
270
+ lhs = string.slice(start, index - start)
271
+ result.last[:rhs] = lhs unless result.empty?
272
+
273
+ # this is correct for the last/only match, but gets updated to the next
274
+ # match's lhs for other matches
275
+ rhs = match.post_match
276
+
277
+ result << {
278
+ captures: match.captures,
279
+ lhs: lhs,
280
+ rhs: rhs,
281
+ separator: separator,
282
+ }
283
+
284
+ # move the start index (the start of the lhs) to the index after the last
285
+ # character of the separator
286
+ start = after
287
+ end
288
+
289
+ result
290
+ end
193
291
 
194
- block = lambda do |split|
195
- case split.position when *at then true else false end
292
+ # returns a lambda which splits at (i.e. accepts or rejects splits at, depending
293
+ # on the action) the supplied positions
294
+ #
295
+ # positions are preprocessed to support an additional feature: negative indices
296
+ # are translated to 1-based non-negative indices, e.g:
297
+ #
298
+ # ss.split("foo:bar:baz:quux", ":", at: -1)
299
+ #
300
+ # translates to:
301
+ #
302
+ # ss.split("foo:bar:baz:quux", ":", at: 3)
303
+ #
304
+ # and
305
+ #
306
+ # ss.split("1:2:3:4:5:6:7:8:9", ":", at: -3..)
307
+ # ss.split("1:2:3:4:5:6:7:8:9", ":", at: [-3..])
308
+ #
309
+ # translate to:
310
+ #
311
+ # ss.split("1:2:3:4:5:6:7:8:9", ":", at: 6..8)
312
+ #
313
+ def compile(positions, action, nsplits)
314
+ # XXX note: we don't use modulo, because we don't want
315
+ # out-of-bounds indices to silently work, e.g. we don't want:
316
+ #
317
+ # ss.split("foo:bar:baz:quux", ":", at: -42)
318
+ #
319
+ # to mysteriously match when the index/position is 0/1
320
+ #
321
+ resolve = ->(int) { int.negative? ? nsplits + 1 + int : int }
322
+
323
+ # don't use Array(...) to wrap these as we don't want to convert ranges
324
+ positions = positions.is_a?(Array) ? positions : [positions]
325
+
326
+ positions = positions.map do |position|
327
+ if position.is_a?(Integer)
328
+ resolve[position]
329
+ elsif position.is_a?(Range)
330
+ rbegin = position.begin
331
+ rend = position.end
332
+ rexc = position.exclude_end?
333
+
334
+ if rbegin.nil?
335
+ Range.new(1, resolve[rend], rexc)
336
+ elsif rend.nil?
337
+ Range.new(resolve[rbegin], nsplits, rexc)
338
+ elsif rbegin.negative? || rend.negative? || (rend - rbegin).negative?
339
+ from = resolve[rbegin]
340
+ to = resolve[rend]
341
+ to < from ? Range.new(to, from, rexc) : Range.new(from, to, rexc)
342
+ else
343
+ position
196
344
  end
345
+ elsif position.is_a?(Set)
346
+ position.map { |it| resolve[it] }.to_set
197
347
  else
198
- block = ACCEPT
348
+ position
199
349
  end
200
350
  end
201
351
 
202
- [result, block, splits, count, -1]
352
+ ->(split) { case split.position when *positions then action else !action end }
203
353
  end
204
354
  end
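The new `parse` method in the diff above walks the string with `String#scan` and reads each match from `Regexp.last_match`, instead of tokenizing with `String#split`. The standalone sketch below shows that technique in isolation. `separator_offsets` and `fields_between` are hypothetical helpers written for this note, not methods of the gem.

```ruby
# Hypothetical helper: collect each separator with its start/end offsets.
def separator_offsets(string, pattern)
  offsets = []

  string.scan(pattern) do
    match = Regexp.last_match            # MatchData for the current match
    offsets << [match[0], *match.offset(0)]
  end

  offsets
end

# Hypothetical helper: slice out the text between consecutive separators.
def fields_between(string, pattern)
  fields = []
  start = 0

  string.scan(pattern) do
    index, after = Regexp.last_match.offset(0)
    fields << string[start...index]      # text to the left of this separator
    start = after                        # next field begins after the separator
  end

  fields << string[start..-1]            # text after the last separator
end

offsets = separator_offsets("foo:bar::baz", /:+/)
# [[":", 3, 4], ["::", 7, 9]]

fields = fields_between("foo:bar::baz", /:+/)
# ["foo", "bar", "baz"]
```

Because the offsets come from the original pattern, no extra capture group has to be wrapped around the delimiter, so backreferences inside the pattern keep their numbering.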
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  class StringSplitter
4
- VERSION = '0.3.0'
4
+ VERSION = '0.6.0'
5
5
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: string_splitter
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.0
4
+ version: 0.6.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - chocolateboy
8
- autorequire:
8
+ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-06-23 00:00:00.000000000 Z
11
+ date: 2020-08-20 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: values
@@ -30,42 +30,42 @@ dependencies:
30
30
  requirements:
31
31
  - - "~>"
32
32
  - !ruby/object:Gem::Version
33
- version: '1.16'
33
+ version: '2.1'
34
34
  type: :development
35
35
  prerelease: false
36
36
  version_requirements: !ruby/object:Gem::Requirement
37
37
  requirements:
38
38
  - - "~>"
39
39
  - !ruby/object:Gem::Version
40
- version: '1.16'
40
+ version: '2.1'
41
41
  - !ruby/object:Gem::Dependency
42
42
  name: minitest
43
43
  requirement: !ruby/object:Gem::Requirement
44
44
  requirements:
45
45
  - - "~>"
46
46
  - !ruby/object:Gem::Version
47
- version: '5.11'
47
+ version: '5.0'
48
48
  type: :development
49
49
  prerelease: false
50
50
  version_requirements: !ruby/object:Gem::Requirement
51
51
  requirements:
52
52
  - - "~>"
53
53
  - !ruby/object:Gem::Version
54
- version: '5.11'
54
+ version: '5.0'
55
55
  - !ruby/object:Gem::Dependency
56
56
  name: minitest-power_assert
57
57
  requirement: !ruby/object:Gem::Requirement
58
58
  requirements:
59
59
  - - "~>"
60
60
  - !ruby/object:Gem::Version
61
- version: 0.3.0
61
+ version: '0.3'
62
62
  type: :development
63
63
  prerelease: false
64
64
  version_requirements: !ruby/object:Gem::Requirement
65
65
  requirements:
66
66
  - - "~>"
67
67
  - !ruby/object:Gem::Version
68
- version: 0.3.0
68
+ version: '0.3'
69
69
  - !ruby/object:Gem::Dependency
70
70
  name: minitest-reporters
71
71
  requirement: !ruby/object:Gem::Requirement
@@ -86,29 +86,15 @@ dependencies:
86
86
  requirements:
87
87
  - - "~>"
88
88
  - !ruby/object:Gem::Version
89
- version: '10.0'
89
+ version: '13.0'
90
90
  type: :development
91
91
  prerelease: false
92
92
  version_requirements: !ruby/object:Gem::Requirement
93
93
  requirements:
94
94
  - - "~>"
95
95
  - !ruby/object:Gem::Version
96
- version: '10.0'
97
- - !ruby/object:Gem::Dependency
98
- name: rubocop
99
- requirement: !ruby/object:Gem::Requirement
100
- requirements:
101
- - - "~>"
102
- - !ruby/object:Gem::Version
103
- version: 0.54.0
104
- type: :development
105
- prerelease: false
106
- version_requirements: !ruby/object:Gem::Requirement
107
- requirements:
108
- - - "~>"
109
- - !ruby/object:Gem::Version
110
- version: 0.54.0
111
- description:
96
+ version: '13.0'
97
+ description:
112
98
  email: chocolate@cpan.org
113
99
  executables: []
114
100
  extensions: []
@@ -127,7 +113,7 @@ metadata:
127
113
  bug_tracker_uri: https://github.com/chocolateboy/string_splitter/issues
128
114
  changelog_uri: https://github.com/chocolateboy/string_splitter/blob/master/CHANGELOG.md
129
115
  source_code_uri: https://github.com/chocolateboy/string_splitter
130
- post_install_message:
116
+ post_install_message:
131
117
  rdoc_options: []
132
118
  require_paths:
133
119
  - lib
@@ -135,16 +121,15 @@ required_ruby_version: !ruby/object:Gem::Requirement
135
121
  requirements:
136
122
  - - ">="
137
123
  - !ruby/object:Gem::Version
138
- version: '0'
124
+ version: '2.3'
139
125
  required_rubygems_version: !ruby/object:Gem::Requirement
140
126
  requirements:
141
127
  - - ">="
142
128
  - !ruby/object:Gem::Version
143
129
  version: '0'
144
130
  requirements: []
145
- rubyforge_project:
146
- rubygems_version: 2.7.7
147
- signing_key:
131
+ rubygems_version: 3.1.4
132
+ signing_key:
148
133
  specification_version: 4
149
134
  summary: String#split on steroids
150
135
  test_files: []