string_splitter 0.5.0 → 0.7.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: ce843d4e98a02b464296d78d6be21f9fd7ce179b8156889f799280594250072c
- data.tar.gz: 4a378aaa5c951224b3091c3e1bf5d55c5a61f6c5395f9cf820ba008b6b1b7af6
+ metadata.gz: aa94b66f61dc3f1970d92b7fa4fb01ba0932262338c124592f539a2f3b4e7ca3
+ data.tar.gz: 0b415f82cfe372bdc9fd0e72d26beeb5071fc02f31e6fa5c02606d3837680506
  SHA512:
- metadata.gz: 68f9104ab3c5145322e5027814062a52692e854240a47df3e456d3a5d5c04f213a7875741aa00c6dd7bd5abb71e48f9e09b3d9dc5eb9bb488ef314849952c277
- data.tar.gz: b2cdb8d72ef41459085580192db7e2792025ed16a84bda7a27b82e148bcd40eddb9b9fe2b5b866d9417085b8560905e1667516db843cd57aa8712639462766f9
+ metadata.gz: 5efaba21a21e25b4b3f54a505f35cf16bba346427dfe4694554de6a1e917160e08cab9be65e80fd7d7407ba82cf5e876fff21ca2607e2da4fc137f3907b2322a
+ data.tar.gz: 37ac5c38859a8ed4036e68d9e588c58c8aa63e6ce2a47066dc1d02b115564cc187c505bec4b066a20981eedb21d210f01d019715ab9d91c9dc88f6e320763a3c
data/CHANGELOG.md CHANGED
@@ -1,32 +1,96 @@
1
+ ## 0.7.2 - 2020-08-22
2
+
3
+ #### Fixes
4
+
5
+ - fix/test default delimiter + `remove_empty_fields`
6
+
7
+ ## 0.7.1 - 2020-08-22
8
+
9
+ #### Changes
10
+
11
+ - performance improvements
12
+ - delegate to `String#split` where possible
13
+ - use a regular class for Split rather than values.rb
14
+ - create Split objects directly rather than allocating intermediate hashes
15
+
16
+ ## 0.7.0 - 2020-08-21
17
+
18
+ #### Breaking Changes
19
+
20
+ - `String#split` incompatibility: we no longer trim the string (with
21
+ `String#strip`) before splitting if the delimiter is omitted
22
+
23
+ ## 0.6.0 - 2020-08-20
24
+
25
+ #### Breaking Changes
26
+
27
+ - `ss.split(str, " ")` is no longer treated the same as `ss.split(str)` i.e.
28
+ unlike Ruby's `String#split`, the former no longer strips the string before
29
+ splitting
30
+ - rename the `remove_empty` option `remove_empty_fields`
31
+ - rename the `exclude` option `except` (alias for `reject`)
32
+
33
+ #### Features
34
+
35
+ - add support for descending, negative, and infinite ranges,
36
+ e.g. `ss.split(str, ":", at: [..4, 4..., 3..1, -1..-3])` etc.
37
+
38
+ #### Fixes
39
+
40
+ - correctly handle backreferences in delimiter patterns
41
+
42
+ ## 0.5.1 - 2018-07-01
43
+
44
+ #### Changes
45
+
46
+ - set StringSplitter::VERSION when `string_splitter.rb` is loaded
47
+
1
48
  ## 0.5.0 - 2018-06-26
2
49
 
3
- - don't treat string delimiters as patterns
50
+ #### Features
51
+
4
52
  - add a `reject`/`exclude` option which rejects splits at the specified positions
5
53
  - add a `select` alias for `at`
6
54
 
55
+ #### Fixes
56
+
57
+ - don't treat string delimiters as patterns
58
+
7
59
  ## 0.4.0 - 2018-06-24
8
60
 
9
- - **breaking change**: remove the `offset` alias for `split.index`
61
+ #### Breaking Changes
62
+
63
+ - remove the `offset` alias for `split.index`
10
64
 
11
65
  ## 0.3.1 - 2018-06-24
12
66
 
13
- - remove trailing empty field when the separator is empty ([#1](https://github.com/chocolateboy/string_splitter/issues/1))
67
+ #### Fixes
68
+
69
+ - remove trailing empty field when the separator is empty
70
+ ([#1](https://github.com/chocolateboy/string_splitter/issues/1))
14
71
 
15
72
  ## 0.3.0 - 2018-06-23
16
73
 
17
- - **breaking change**: rename the `default_separator` option to `default_delimiter`
18
- - to avoid ambiguity in the code, refer to the input pattern/string as the
19
- "delimiter" and the matched string as the "separator"
74
+ #### Breaking Changes
75
+
76
+ - rename the `default_separator` option `default_delimiter`
20
77
 
21
78
  ## 0.2.0 - 2018-06-22
22
79
 
23
- - **breaking change**: make `index` (AKA `offset`) 0-based and add `position`
24
- (AKA `pos`) as the 1-based accessor
80
+ #### Breaking Changes
81
+
82
+ - make `index` (AKA `offset`) 0-based and add `position` (AKA `pos`) as the
83
+ 1-based accessor
25
84
 
26
85
  ## 0.1.0 - 2018-06-22
27
86
 
28
- - **breaking change**: the block now takes a single `split` object with an
29
- `index` accessor, rather than seperate `index` and `split` arguments
87
+ #### Breaking Changes
88
+
89
+ - the block now takes a single `split` object with an `index` accessor, rather
90
+ than separate `index` and `split` arguments
91
+
92
+ #### Features
93
+
30
94
  - add support for negative indices in the value supplied to the `at` option
31
95
  - add a `count` field to the split object containing the total number of splits
32
96
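The trimming and empty-delimiter entries above (0.7.0, 0.6.0 and 0.3.1) all rest on how Ruby's built-in `String#split` behaves. The following is a minimal sketch using only core Ruby, with no StringSplitter calls; the variable names are illustrative and not taken from the gem.

```ruby
padded = " foo bar baz "

# String#split trims surrounding whitespace when the delimiter is omitted
# or is a single space
trimmed_default = padded.split
trimmed_space = padded.split(" ")

# a regexp delimiter does not trim; the negative limit keeps the trailing
# empty field, which matches the untrimmed behaviour the changelog describes
untrimmed = padded.split(/ /, -1)

# an empty delimiter with a negative limit leaves a trailing empty field,
# while a limit of 0 drops it
with_trailing = "foo".split("", -1)
without_trailing = "foo".split("", 0)

p trimmed_default  # => ["foo", "bar", "baz"]
p trimmed_space    # => ["foo", "bar", "baz"]
p untrimmed        # => ["", "foo", "bar", "baz", ""]
p with_trailing    # => ["f", "o", "o", ""]
p without_trailing # => ["f", "o", "o"]
```

This is why a string delimiter of `" "` has to be treated specially by any wrapper that wants untrimmed results: passing it straight through to `String#split` would silently strip the input.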
 
data/README.md CHANGED
@@ -3,14 +3,16 @@
3
3
  [![Build Status](https://travis-ci.org/chocolateboy/string_splitter.svg)](https://travis-ci.org/chocolateboy/string_splitter)
4
4
  [![Gem Version](https://img.shields.io/gem/v/string_splitter.svg)](https://rubygems.org/gems/string_splitter)
5
5
 
6
- <!-- START doctoc generated TOC please keep comment here to allow auto update -->
7
- <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
6
+ <!-- toc -->
8
7
 
9
8
  - [NAME](#name)
10
9
  - [INSTALLATION](#installation)
11
10
  - [SYNOPSIS](#synopsis)
12
11
  - [DESCRIPTION](#description)
13
12
  - [WHY?](#why)
13
+ - [CAVEATS](#caveats)
14
+ - [Differences from String#split](#differences-from-stringsplit)
15
+ - [COMPATIBILITY](#compatibility)
14
16
  - [VERSION](#version)
15
17
  - [SEE ALSO](#see-also)
16
18
  - [Gems](#gems)
@@ -18,7 +20,7 @@
18
20
  - [AUTHOR](#author)
19
21
  - [COPYRIGHT AND LICENSE](#copyright-and-license)
20
22
 
21
- <!-- END doctoc generated TOC please keep comment here to allow auto update -->
23
+ <!-- tocstop -->
22
24
 
23
25
  # NAME
24
26
 
@@ -36,65 +38,128 @@ gem "string_splitter"
36
38
  require "string_splitter"
37
39
 
38
40
  ss = StringSplitter.new
41
+ ```
42
+
43
+ **Same as `String#split`**
39
44
 
40
- # same as String#split
41
- ss.split("foo bar baz quux")
42
- ss.split("foo bar baz quux", " ")
43
- ss.split("foo bar baz quux", /\s+/)
44
- # => ["foo", "bar", "baz", "quux"]
45
+ ```ruby
46
+ ss.split("foo bar baz")
47
+ ss.split("foo bar baz", " ")
48
+ ss.split("foo bar baz", /\s+/)
49
+ # => ["foo", "bar", "baz"]
50
+
51
+ ss.split("foo", "")
52
+ ss.split("foo", //)
53
+ # => ["f", "o", "o"]
54
+
55
+ ss.split("", "...")
56
+ ss.split("", /.../)
57
+ # => []
58
+ ```
45
59
 
46
- # split at the first delimiter
60
+ **Split at the first delimiter**
61
+
62
+ ```ruby
47
63
  ss.split("foo:bar:baz:quux", ":", at: 1)
64
+ ss.split("foo:bar:baz:quux", ":", select: 1)
48
65
  # => ["foo", "bar:baz:quux"]
66
+ ```
49
67
 
50
- # split at the last delimiter
68
+ **Split at the last delimiter**
69
+
70
+ ```ruby
51
71
  ss.split("foo:bar:baz:quux", ":", at: -1)
52
72
  # => ["foo:bar:baz", "quux"]
73
+ ```
74
+
75
+ **Split at multiple delimiter positions**
76
+
77
+ ```ruby
78
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -1])
79
+ # => ["1", "2", "3", "4:5:6:7:8", "9"]
80
+ ```
53
81
 
54
- # split at multiple delimiter positions
55
- ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -2])
56
- # => ["1", "2", "3", "4:5:6:7", "8:9"]
82
+ **Split at all but the first and last delimiters**
57
83
 
58
- # split from the right
84
+ ```ruby
85
+ ss.split("1:2:3:4:5:6", ":", except: [1, -1])
86
+ ss.split("1:2:3:4:5:6", ":", reject: [1, -1])
87
+ # => ["1:2", "3", "4", "5:6"]
88
+ ```
89
+
90
+ **Split from the right**
91
+
92
+ ```ruby
59
93
  ss.rsplit("1:2:3:4:5:6:7:8:9", ":", at: [1..3, 5])
60
94
  # => ["1:2:3:4", "5:6", "7", "8", "9"]
95
+ ```
96
+
97
+ **Split with negative, descending, and infinite ranges**
98
+
99
+ ```ruby
100
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: ..-3)
101
+ # => ["1", "2", "3", "4", "5", "6", "7:8:9"]
102
+
103
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: 4...)
104
+ # => ["1:2:3:4", "5", "6", "7", "8:9"]
105
+
106
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1, 5..3, -2..])
107
+ # => ["1", "2:3", "4", "5", "6:7", "8", "9"]
108
+ ```
109
+
110
+ **Full control via a block**
61
111
 
62
- # full control via a block
63
- result = ss.split('a:a:a:b:c:c:e:a:a:d:c', ":") do |split|
64
- split.index > 0 && split.lhs == split.rhs
112
+ ```ruby
113
+ result = ss.split("1:2:3:4:5:6:7:8", ":") do |split|
114
+ split.pos % 2 == 0
65
115
  end
66
- # => ["a:a", "a:b:c", "c:e:a", "a:d:c"]
116
+ # => ["1:2", "3:4", "5:6", "7:8"]
117
+ ```
118
+
119
+ ```ruby
120
+ string = "banana".chars.sort.join # "aaabnn"
121
+
122
+ ss.split(string, "") do |split|
123
+ split.rhs != split.lhs
124
+ end
125
+ # => ["aaa", "b", "nn"]
67
126
  ```
68
127
 
69
128
  # DESCRIPTION
70
129
 
71
- Many languages have built-in string `split` functions/methods. They behave similarly
72
- (notwithstanding the occasional [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)),
73
- and handle a few common cases e.g.:
130
+ Many languages have built-in `split` functions/methods for strings. They behave
131
+ similarly (notwithstanding the occasional
132
+ [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)), and
133
+ handle a few common cases, e.g.:
74
134
 
75
135
  * limiting the number of splits
76
- * including the separators in the results
136
+ * including the separator(s) in the results
77
137
  * removing (some) empty fields
78
138
 
79
- But, because the API is squeezed into two overloaded parameters (the delimiter and the limit),
80
- achieving the desired effects can be tricky. For instance, while `String#split` removes empty
81
- trailing fields (by default), it provides no way to remove *all* empty fields. Likewise, the
82
- cramped API means there's no way to e.g. combine a limit (positive integer) with the option
83
- to preserve empty fields (negative integer), or use backreferences in a delimiter pattern
139
+ But, because the API is squeezed into two overloaded parameters (the delimiter
140
+ and the limit), achieving the desired results can be tricky. For instance,
141
+ while `String#split` removes empty trailing fields (by default), it provides no
142
+ way to remove *all* empty fields. Likewise, the cramped API means there's no
143
+ way to, e.g., combine a limit (positive integer) with the option to preserve
144
+ empty fields (negative integer), or use backreferences in a delimiter pattern
84
145
  without including its captured subexpressions in the result.
85
146
 
86
- If `split` was being written from scratch, without the baggage of its legacy API,
87
- it's possible that some of these options would be made explicit rather than overloading
88
- the parameters. And, indeed, this is possible in some implementations,
89
- e.g. in Crystal:
147
+ If `split` was being written from scratch, without the baggage of its legacy
148
+ API, it's possible that some of these options would be made explicit rather
149
+ than overloading the parameters. And, indeed, this is possible in some
150
+ implementations, e.g. in Crystal:
90
151
 
91
152
  ```ruby
92
- ":foo:bar:baz:".split(":", remove_empty: false) # => ["", "foo", "bar", "baz", ""]
93
- ":foo:bar:baz:".split(":", remove_empty: true) # => ["foo", "bar", "baz"]
153
+ ":foo:bar:baz:".split(":", remove_empty: false)
154
+ # => ["", "foo", "bar", "baz", ""]
155
+
156
+ ":foo:bar:baz:".split(":", remove_empty: true)
157
+ # => ["foo", "bar", "baz"]
94
158
  ````
95
159
 
96
- StringSplitter takes this one step further by moving the configuration out of the method altogether
97
- and delegating the strategy — i.e. which splits should be accepted or rejected — to a block:
160
+ StringSplitter takes this one step further by moving the configuration out of
161
+ the method altogether and delegating the strategy — i.e. which splits should be
162
+ accepted or rejected — to a block:
98
163
 
99
164
  ```ruby
100
165
  ss = StringSplitter.new
@@ -102,22 +167,32 @@ ss = StringSplitter.new
102
167
  ss.split("foo:bar:baz", ":") { |split| split.index == 0 }
103
168
  # => ["foo", "bar:baz"]
104
169
 
105
- ss.split("foo:bar:baz", ":") { |split| split.position == split.count }
106
- # => ["foo:bar", "baz"]
170
+ ss.split("foo:bar:baz:quux", ":") do |split|
171
+ split.position == 1 || split.position == 3
172
+ end
173
+ # => ["foo", "bar:baz", "quux"]
107
174
  ```
108
175
 
109
- As a shortcut, the common case of splitting on delimiters at one or more positions is supported by an option:
176
+ As a shortcut, the common case of splitting (or not splitting) at one or more
177
+ positions is supported by dedicated options:
110
178
 
111
179
  ```ruby
112
- ss.split('foo:bar:baz:quux', ':', at: [1, -1]) # => ["foo", "bar:baz", "quux"]
180
+ ss.split("foo:bar:baz:quux", ":", select: [1, -1])
181
+ # => ["foo", "bar:baz", "quux"]
182
+
183
+ ss.split("foo:bar:baz:quux", ":", reject: [1, -1])
184
+ # => ["foo:bar", "baz:quux"]
113
185
  ```
114
186
 
115
187
  # WHY?
116
188
 
117
- I wanted to split semi-structured output into fields without having to resort to a regex or a full-blown parser.
189
+ I wanted to split semi-structured output into fields without having to resort
190
+ to a regex or a full-blown parser.
118
191
 
119
- As an example, the nominally unstructured output of many Unix commands is often formatted in a way
120
- that's tantalizingly close to being machine-readable, apart from a few pesky exceptions e.g.:
192
+ As an example, the nominally unstructured output of many Unix commands is often
193
+ formatted in a way that's tantalizingly close to being
194
+ [machine-readable](https://en.wikipedia.org/wiki/Delimiter-separated_values),
195
+ apart from a few pesky exceptions, e.g.:
121
196
 
122
197
  ```bash
123
198
  $ ls -l
@@ -129,8 +204,8 @@ drwxr-xr-x 3 user users 4096 Jun 19 22:56 lib
129
204
  -rw-r--r-- 1 user users 3134 Jun 19 22:59 README.md
130
205
  ```
131
206
 
132
- These lines can *almost* be parsed into an array of fields by splitting them on whitespace. The exception is the
133
- date (columns 6-8) i.e.:
207
+ These lines can *almost* be parsed into an array of fields by splitting them on
208
+ whitespace. The exception is the date (columns 6-8), i.e.:
134
209
 
135
210
  ```ruby
136
211
  line = "-rw-r--r-- 1 user users 87 Jun 18 18:16 CHANGELOG.md"
@@ -149,19 +224,20 @@ instead of:
149
224
  ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
150
225
  ```
151
226
 
152
- One way to work around this is to parse the whole line e.g.:
227
+ One way to work around this is to parse the whole line, e.g.:
153
228
 
154
229
  ```ruby
155
230
  line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+ \S+) \s+ (.+)$/x)
156
231
  ```
157
232
 
158
- But that requires us to specify *everything*. What we really want is a version of `split`
159
- which allows us to veto splitting for the 6th and 7th delimiters i.e. control over which
160
- splits are accepted, rather than being restricted to the single, baked-in strategy provided
161
- by the `limit` parameter.
233
+ But that requires us to specify *everything*. What we really want is a version
234
+ of `split` which allows us to veto splitting for the 6th and 7th delimiters
235
+ (and to stop after the 8th delimiter), i.e. control over which splits are
236
+ accepted, rather than being restricted to the single, baked-in strategy
237
+ provided by the `limit` parameter.
162
238
 
163
- By providing a simple way to accept or reject each split, StringSplitter makes cases like
164
- this easy to handle, either via a block:
239
+ By providing a simple way to accept or reject each split, StringSplitter makes
240
+ cases like this easy to handle, either via a block:
165
241
 
166
242
  ```ruby
167
243
  ss.split(line) do |split|
@@ -177,9 +253,51 @@ ss.split(line, at: [1..5, 8])
177
253
  # => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
178
254
  ```
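The `at: [1..5, 8]` call above can be approximated in plain Ruby to show the idea of accepting or rejecting each split by position. This is a rough sketch rather than the gem's implementation: `split_at` is a hypothetical helper, and it assumes a regexp delimiter without capture groups and positive, 1-based positions (integers or ranges).

```ruby
# Hypothetical helper (not part of string_splitter): split `string` on
# `delimiter` only at the 1-based delimiter positions matched by `positions`.
def split_at(string, delimiter, positions)
  return [] if string.empty?

  fields = string.split(delimiter, -1) # keep trailing empty fields
  separators = string.scan(delimiter)  # the matched separator strings
  result = [fields.first]

  separators.each_with_index do |separator, index|
    field = fields[index + 1]

    if positions.any? { |position| position === index + 1 }
      result << field # accepted: start a new field
    else
      result[-1] += separator + field # rejected: glue the separator back in
    end
  end

  result
end

line = "-rw-r--r-- 1 user users 87 Jun 18 18:16 CHANGELOG.md"

p split_at(line, /\s+/, [1..5, 8])
# => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
```

Using `===` lets a single check cover both integers and ranges, which is why `[1..5, 8]` can mix the two.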
179
255
 
256
+ # CAVEATS
257
+
258
+ ## Differences from String#split
259
+
260
+ Unlike `String#split`, StringSplitter doesn't trim the string before splitting
261
+ if the delimiter is omitted or a single space, e.g.:
262
+
263
+ ```ruby
264
+ " foo bar baz ".split # => ["foo", "bar", "baz"]
265
+ " foo bar baz ".split(" ") # => ["foo", "bar", "baz"]
266
+
267
+ ss.split(" foo bar baz ") # => ["", "foo", "bar", "baz", ""]
268
+ ss.split(" foo bar baz ", " ") # => ["", "foo", "bar", "baz", ""]
269
+ ```
270
+
271
+ `String#split` omits the `nil` values of unmatched optional captures:
272
+
273
+ ```ruby
274
+ "foo:bar:baz".scan(/(:)|(-)/) # => [[":", nil], [":", nil]]
275
+ "foo:bar:baz".split(/(:)|(-)/) # => ["foo", ":", "bar", ":", "baz"]
276
+ ```
277
+
278
+ StringSplitter preserves them whenever captures are included (`include_captures`
279
+ is true by default), though they can be omitted from spread captures by passing
280
+ `:compact` as the value of the `spread_captures` option:
281
+
282
+ ```ruby
283
+ s1 = StringSplitter.new(spread_captures: true)
284
+ s2 = StringSplitter.new(spread_captures: false)
285
+ s3 = StringSplitter.new(spread_captures: :compact)
286
+
287
+ s1.split("foo:bar:baz", /(:)|(-)/) # => ["foo", ":", nil, "bar", ":", nil, "baz"]
288
+ s2.split("foo:bar:baz", /(:)|(-)/) # => ["foo", [":", nil], "bar", [":", nil], "baz"]
289
+ s3.split("foo:bar:baz", /(:)|(-)/) # => ["foo", ":", "bar", ":", "baz"]
290
+ ```
291
+
292
+ # COMPATIBILITY
293
+
294
+ StringSplitter is tested and supported on all versions of Ruby [supported by
295
+ the ruby-core team](https://www.ruby-lang.org/en/downloads/branches/), i.e.,
296
+ currently, Ruby 2.5 and above.
297
+
180
298
  # VERSION
181
299
 
182
- 0.5.0
300
+ 0.7.2
183
301
 
184
302
  # SEE ALSO
185
303
 
@@ -197,7 +315,7 @@ ss.split(line, at: [1..5, 8])
197
315
 
198
316
  # COPYRIGHT AND LICENSE
199
317
 
200
- Copyright © 2018 by chocolateboy.
318
+ Copyright © 2018-2020 by chocolateboy.
201
319
 
202
320
  This is free software; you can redistribute it and/or modify it under the
203
- terms of the [Artistic License 2.0](http://www.opensource.org/licenses/artistic-license-2.0.php).
321
+ terms of the [Artistic License 2.0](https://www.opensource.org/licenses/artistic-license-2.0.php).
data/lib/string_splitter.rb CHANGED
@@ -1,53 +1,91 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'values'
3
+ require 'set'
4
+
5
+ require_relative 'string_splitter/split'
6
+ require_relative 'string_splitter/version'
4
7
 
5
8
  # This class extends the functionality of +String#split+ by:
6
9
  #
7
10
  # - providing full control over which splits are accepted or rejected
11
+ #
8
12
  # - adding support for splitting from right-to-left
9
- # - encapsulating splitting options/preferences in instances rather than trying to
10
- # cram them into overloaded method parameters
13
+ #
14
+ # - encapsulating splitting options/preferences in the splitter rather
15
+ # than trying to cram them into overloaded method parameters
11
16
  #
12
17
  # These enhancements allow splits to handle many cases that otherwise require bigger
13
- # guns e.g. regex matching or parsing.
18
+ # guns, e.g. regex matching or parsing.
19
+ #
20
+ # Implementation-wise, we split the string either with String#split, or with a custom
21
+ # scanner if the delimiter may contain captures (since String#split doesn't handle
22
+ # them correctly) and parse the resulting tokens into an array of Split objects with
23
+ # the following attributes:
24
+ #
25
+ # - captures: separator substrings captured by parentheses in the delimiter pattern
26
+ # - count: the number of splits
27
+ # - index: the 0-based index of the split in the array
28
+ # - lhs: the string to the left of the separator (back to the previous split candidate)
29
+ # - position: the 1-based index of the split in the array (alias: pos)
30
+ # - rhs: the string to the right of the separator (up to the next split candidate)
31
+ # - rindex: the 0-based index of the split relative to the end of the array
32
+ # - rposition: the 1-based index of the split relative to the end of the array (alias: rpos)
33
+ # - separator: the string matched by the delimiter pattern/string
34
+ #
14
35
  class StringSplitter
15
- ACCEPT_ALL = ->(_split) { true }
16
- DEFAULT_DELIMITER = /\s+/
17
- NO_SPLITS = []
18
-
19
- Split = Value.new(:captures, :count, :index, :lhs, :rhs, :separator) do
20
- def position
21
- index + 1
22
- end
36
+ # terminology: the delimiter is what we provide and the separators are what we get
37
+ # back (if we capture them). e.g. for:
38
+ #
39
+ # ss.split("foo:bar::baz", /(\W+)/)
40
+ #
41
+ # the delimiter is /(\W+)/ and the separators are ":" and "::"
23
42
 
24
- alias_method :pos, :position
43
+ ACCEPT_ALL = ->(_split) { true }
44
+ DEFAULT_DELIMITER = /\s+/.freeze
45
+ REMOVE = [].freeze
46
+
47
+ # simulate an enum. the value is returned by the case statement
48
+ # in the generated block if the positions match
49
+ module Action
50
+ SELECT = true
51
+ REJECT = false
25
52
  end
26
53
 
54
+ private_constant :Action
55
+
27
56
  def initialize(
28
57
  default_delimiter: DEFAULT_DELIMITER,
29
58
  include_captures: true,
30
- remove_empty: false,
59
+ remove_empty: false, # TODO remove this
60
+ remove_empty_fields: remove_empty,
31
61
  spread_captures: true
32
62
  )
33
63
  @default_delimiter = default_delimiter
34
64
  @include_captures = include_captures
35
- @remove_empty = remove_empty
65
+ @remove_empty_fields = remove_empty_fields
36
66
  @spread_captures = spread_captures
37
67
  end
38
68
 
39
- attr_reader :default_delimiter, :include_captures, :remove_empty, :spread_captures
69
+ attr_reader(
70
+ :default_delimiter,
71
+ :include_captures,
72
+ :remove_empty_fields,
73
+ :spread_captures
74
+ )
75
+
76
+ # TODO remove this
77
+ alias remove_empty remove_empty_fields
40
78
 
41
79
  def split(
42
80
  string,
43
81
  delimiter = @default_delimiter,
44
- at: nil,
82
+ at: nil, # alias for select
83
+ except: nil, # alias for reject
45
84
  select: at,
46
- exclude: nil,
47
- reject: exclude,
85
+ reject: except,
48
86
  &block
49
87
  )
50
- result, block, splits, count, index = split_common(
88
+ result, splits, count, accept = init(
51
89
  string: string,
52
90
  delimiter: delimiter,
53
91
  select: select,
@@ -55,27 +93,22 @@ class StringSplitter
55
93
  block: block
56
94
  )
57
95
 
58
- splits.each do |split|
59
- split = Split.with(split.merge({ index: (index += 1), count: count }))
60
- result << split.lhs if result.empty?
61
-
62
- if block.call(split)
63
- if @include_captures
64
- if @spread_captures
65
- result += split.captures
66
- else
67
- result << split.captures
68
- end
69
- end
96
+ return result unless splits
70
97
 
71
- result << split.rhs
98
+ result << splits.first.lhs
99
+
100
+ splits.each_with_index do |split, index|
101
+ split.update!(count: count, index: index)
102
+
103
+ if accept.call(split)
104
+ result << split.captures << split.rhs
72
105
  else
73
106
  # append the rhs
74
107
  result[-1] = result[-1] + split.separator + split.rhs
75
108
  end
76
109
  end
77
110
 
78
- result
111
+ render(result)
79
112
  end
80
113
 
81
114
  alias lsplit split
@@ -83,13 +116,13 @@ class StringSplitter
83
116
  def rsplit(
84
117
  string,
85
118
  delimiter = @default_delimiter,
86
- at: nil,
119
+ at: nil, # alias for select
120
+ except: nil, # alias for reject
87
121
  select: at,
88
- exclude: nil,
89
- reject: exclude,
122
+ reject: except,
90
123
  &block
91
124
  )
92
- result, block, splits, count, index = split_common(
125
+ result, splits, count, accept = init(
93
126
  string: string,
94
127
  delimiter: delimiter,
95
128
  select: select,
@@ -97,195 +130,262 @@ class StringSplitter
97
130
  block: block
98
131
  )
99
132
 
100
- splits.reverse!.each do |split|
101
- split = Split.with(split.merge({ index: (index += 1), count: count }))
102
- result.unshift(split.rhs) if result.empty?
103
-
104
- if block.call(split)
105
- if @include_captures
106
- if @spread_captures
107
- result = split.captures + result
108
- else
109
- result.unshift(split.captures)
110
- end
111
- end
133
+ return result unless splits
134
+
135
+ result.unshift(splits.last.rhs)
112
136
 
113
- result.unshift(split.lhs)
137
+ splits.reverse_each.with_index do |split, index|
138
+ split.update!(count: count, index: index)
139
+
140
+ if accept.call(split)
141
+ # [lhs + captures] + result
142
+ result.unshift(split.lhs, split.captures)
114
143
  else
115
144
  # prepend the lhs
116
145
  result[0] = split.lhs + split.separator + result[0]
117
146
  end
118
147
  end
119
148
 
120
- result
149
+ render(result)
121
150
  end
122
151
 
123
152
  private
124
153
 
125
- def splits_for(parts, ncaptures)
126
- result = []
127
- splits = []
128
-
129
- until parts.empty?
130
- lhs = parts.shift
131
- separator = parts.shift
132
- captures = parts.shift(ncaptures)
133
- rhs = parts.length == 1 ? parts.shift : parts.first
134
-
135
- if @remove_empty && (lhs.empty? || rhs.empty?)
136
- if lhs.empty? && rhs.empty?
137
- # do nothing
138
- elsif parts.empty? # last split
139
- result << (!lhs.empty? ? lhs : rhs) if splits.empty?
140
- elsif rhs.empty?
141
- # replace the empty rhs with the non-empty lhs
142
- parts[0] = lhs
143
- end
154
+ # initialisation common to +split+ and +rsplit+
155
+ #
156
+ # takes a hash of options passed to +split+ or +rsplit+ and returns a tuple with
157
+ # the following fields:
158
+ #
159
+ # - result: the array of separated strings to return from +split+ or +rsplit+.
160
+ # if there are no splits (+splits+ is nil), the caller returns this array immediately
161
+ # without any further processing
162
+ #
163
+ # - splits: an array of Split objects containing the lhs, rhs, separator and
164
+ # captured separator substrings for each split
165
+ #
166
+ # - count: the number of splits
167
+ #
168
+ # - accept: a proc whose return value determines whether each split should be
169
+ # accepted (true) or rejected (false)
170
+ #
171
+ def init(string:, delimiter:, select:, reject:, block:)
172
+ return [[]] if string.empty?
173
+
174
+ unless block
175
+ if reject
176
+ positions = reject
177
+ action = Action::REJECT
178
+ elsif select
179
+ positions = select
180
+ action = Action::SELECT
181
+ else
182
+ block = ACCEPT_ALL
183
+ end
184
+ end
185
+
186
+ # use String#split if we can
187
+ #
188
+ # NOTE +reject!+ is no faster than +reject+ on MRI and significantly slower
189
+ # on TruffleRuby
144
190
 
145
- next
191
+ if delimiter.is_a?(String)
192
+ limit = -1
193
+
194
+ if delimiter == ' '
195
+ delimiter = / / # don't trim
196
+ elsif delimiter.empty?
197
+ limit = 0 # remove the trailing empty string
146
198
  end
147
199
 
148
- splits << {
149
- lhs: lhs,
150
- rhs: rhs,
151
- separator: separator,
152
- captures: captures,
153
- }
154
- end
200
+ result = string.split(delimiter, limit)
155
201
 
156
- [result, splits]
157
- end
202
+ return [result] if result.length == 1 # delimiter not found: no splits
158
203
 
159
- # setup common to both split methods
160
- def split_common(string:, delimiter:, select:, reject:, block:)
161
- unless (match = string.match(delimiter))
162
- result = (@remove_empty && string.empty?) ? [] : [string]
163
- return [result, block, NO_SPLITS, 0, -1]
164
- end
204
+ if block == ACCEPT_ALL # return the (2 or more) fields
205
+ result = result.reject(&:empty?) if @remove_empty_fields
206
+ return [result]
207
+ end
165
208
 
166
- select = Array(select)
167
- reject = Array(reject)
209
+ splits = []
168
210
 
169
- if !reject.empty?
170
- positions = reject
171
- action = :reject
172
- elsif !select.empty?
173
- positions = select
174
- action = :select
211
+ result.each_cons(2) do |lhs, rhs| # 2 or more fields
212
+ splits << Split.new(
213
+ captures: [],
214
+ lhs: lhs,
215
+ rhs: rhs,
216
+ separator: delimiter
217
+ )
218
+ end
219
+ elsif delimiter == DEFAULT_DELIMITER && block == ACCEPT_ALL
220
+ # non-empty separators so -1 is safe
221
+
222
+ # XXX String#split with block was introduced in Ruby 2.6:
223
+ #
224
+ # - https://rubyreferences.github.io/rubychanges/2.6.html#stringsplit-with-block
225
+ #
226
+ # rather than sniffing, we'll just use the compatible version for now
227
+ #
228
+ # if @remove_empty_fields
229
+ # result = []
230
+ #
231
+ # string.split(delimiter, -1) do |field|
232
+ # result << field unless field.empty?
233
+ # end
234
+ # else
235
+ # result = string.split(delimiter, -1)
236
+ # end
237
+
238
+ result = string.split(delimiter, -1)
239
+ result = result.reject(&:empty?) if @remove_empty_fields
240
+ return [result]
241
+ else
242
+ splits = parse(string, delimiter)
175
243
  end
176
244
 
177
- ncaptures = match.captures.length
178
- delimiter = Regexp.quote(delimiter) if delimiter.is_a?(String)
179
- delimiter = increment_backrefs(delimiter, ncaptures)
180
- parts = string.split(/(#{delimiter})/, -1)
181
- remove_trailing_empty_field!(parts, ncaptures)
182
- result, splits = splits_for(parts, ncaptures)
183
245
  count = splits.length
184
- block ||= positions ? match_positions(positions, action, count) : ACCEPT_ALL
185
246
 
186
- [result, block, splits, count, -1]
247
+ return [[string]] if count.zero?
248
+
249
+ block ||= compile(positions, action, count)
250
+ [[], splits, count, block]
187
251
  end
188
252
 
189
- # increment back-references so they remain valid when the outer capture
190
- # is added.
191
- #
192
- # e.g. to split on:
193
- #
194
- # - <foo-comment> ... </foo-comment>
195
- # - <bar-comment> ... </bar-comment>
196
- #
197
- # etc.
253
+ def render(values)
254
+ values.flat_map do |value|
255
+ if value.is_a?(String)
256
+ value.empty? && @remove_empty_fields ? REMOVE : [value]
257
+ elsif @include_captures
258
+ if @spread_captures
259
+ # TODO make sure compact can return a Capture
260
+ @spread_captures == :compact ? value.compact : value
261
+ elsif value.empty?
262
+ # we expose non-captures (string delimiters or regexps with no
263
+ # captures) as empty arrays inside the block, so the type is
264
+ # consistent, but it doesn't make sense to keep them in the
265
+ # result
266
+ REMOVE
267
+ else
268
+ [value]
269
+ end
270
+ else
271
+ REMOVE
272
+ end
273
+ end
274
+ end
275
+
276
+ # takes a string and a delimiter pattern (regex or string) and splits it along
277
+ # the delimiter, returning an array of Split objects (shown as hashes below) representing each split.
278
+ # e.g. for:
198
279
  #
199
- # before:
280
+ # parse("foo:bar:baz:quux", ":")
200
281
  #
201
- # %r| <(\w+-comment)> [^<]* </\1-comment> |x
282
+ # we return:
202
283
  #
203
- # after:
284
+ # [
285
+ # { lhs: "foo", rhs: "bar", separator: ":", captures: [] },
286
+ # { lhs: "bar", rhs: "baz", separator: ":", captures: [] },
287
+ # { lhs: "baz", rhs: "quux", separator: ":", captures: [] },
288
+ # ]
204
289
  #
205
- # %r| ( <(\w+-comment)> [^<]* </\2-comment> ) |x
290
+ def parse(string, delimiter)
291
+ # has_names = delimiter.is_a?(Regexp) && !delimiter.names.empty?
292
+ result = []
293
+ start = 0
206
294
 
207
- def increment_backrefs(delimiter, ncaptures)
208
- if delimiter.is_a?(Regexp) && ncaptures > 0
209
- delimiter = delimiter.to_s.gsub(/\\(?:(\d+)|.)/) do
210
- match = Regexp.last_match
211
- match[1] ? '\\' + match[1].to_i.next.to_s : match[0]
212
- end
295
+ # we don't use the argument passed to the +scan+ block here because it's a
296
+ # string (the separator) if there are no captures, rather than an empty
297
+ # array. we use match.captures instead to get the array
298
+ string.scan(delimiter) do
299
+ match = Regexp.last_match
300
+ index, after = match.offset(0)
301
+ separator = match[0]
302
+
303
+ # ignore empty separators at the beginning and/or end of the string
304
+ next if separator.empty? && (index.zero? || after == string.length)
305
+
306
+ lhs = string.slice(start, index - start)
307
+ result.last.rhs = lhs unless result.empty?
308
+
309
+ # this is correct for the last/only match, but gets updated to the next
310
+ # match's lhs for other matches
311
+ rhs = match.post_match
312
+
313
+ # captures = (has_names ? Captures.new(match) : match.captures)
314
+
315
+ result << Split.new(
316
+ captures: match.captures,
317
+ lhs: lhs,
318
+ rhs: rhs,
319
+ separator: separator
320
+ )
321
+
322
+ # advance the start index (the start of the next lhs) to the position
323
+ # after the last character of the separator
324
+ start = after
213
325
  end
214
326
 
215
- delimiter
327
+ result
216
328
  end
217
329
 
218
- # work around Ruby's (and Perl's and Groovy's) unhelpful behavior when splitting
219
- # on an empty string/pattern without removing trailing empty fields e.g.:
330
+ # returns a lambda which splits at (i.e. accepts or rejects splits at, depending
331
+ # on the action) the supplied positions
220
332
  #
221
- # "foobar".split("", -1)
222
- # "foobar".split(//, -1)
223
- # # => ["f", "o", "o", "b", "a", "r", ""]
333
+ # positions are preprocessed to support negative indices, infinite ranges, and
334
+ # descending ranges, e.g.:
224
335
  #
225
- # "foobar".split(/()/, -1)
226
- # # => ["f", "", "o", "", "o", "", "b", "", "a", "", "r", "", ""]
336
+ # ss.split("foo:bar:baz:quux", ":", at: -1)
227
337
  #
228
- # "foobar".split(/(())/, -1)
229
- # # => ["f", "", "", "o", "", "", "o", "", "", "b", "", "", "a", "", "", "r", "", "", ""]
338
+ # translates to:
230
339
  #
231
- # *there is no such thing as an empty field whose separator is empty*, so
232
- # if String#split's result ends with an empty separator, 0 or more (empty)
233
- # captures and an empty field, we can safely remove them.
234
-
235
- def remove_trailing_empty_field!(parts, ncaptures)
236
- # the trailing field is at index -1. if there are 0 captures, the separator
237
- # is at -2:
238
- #
239
- # [empty_separator, empty_field]
240
- #
241
- # if there is 1 capture, the separator is at -3:
242
- #
243
- # [empty_separator, capture, empty_field]
340
+ # ss.split("foo:bar:baz:quux", ":", at: 3)
341
+ #
342
+ # and
343
+ #
344
+ # ss.split("1:2:3:4:5:6:7:8:9", ":", -3..)
345
+ #
346
+ # translates to:
347
+ #
348
+ # ss.split("foo:bar:baz:quux", ":", at: 6..8)
349
+ #
350
+ def compile(positions, action, count)
351
+ # XXX note: we don't use modulo, because we don't want
352
+ # out-of-bounds indices to silently work, e.g. we don't want:
244
353
  #
245
- # etc. therefore we find the separator by walking back
354
+ # ss.split("foo:bar:baz:quux", ":", at: -42)
246
355
  #
247
- # 1 (empty field)
248
- # + ncaptures
249
- # + 1 (separator)
356
+ # to mysteriously match when the index/position is 0/1
250
357
  #
251
- # steps from the end of the array i.e. ncaptures + 2
252
- count = ncaptures + 2
253
- separator_index = count * -1
254
-
255
- return unless parts[-1].empty? && parts[separator_index].empty?
256
-
257
- # drop the empty separator, the (empty) captures, and the trailing empty field
258
- parts.pop(count)
259
- end
260
-
261
- def match_positions(positions, action, nsplits)
262
- positions = Array(positions).map do |position|
263
- if position.is_a?(Integer) && position.negative?
264
- # translate negative indices to 1-based non-negative indices e.g:
265
- #
266
- # ss.split("foo:bar:baz:quux", ":", at: -1)
267
- #
268
- # translates to:
269
- #
270
- # ss.split("foo:bar:baz:quux", ":", at: 3)
271
- #
272
- # XXX note: we don't use modulo, because we don't want
273
- # out-of-bounds indices to silently work e.g. we don't want:
274
- #
275
- # ss.split("foo:bar:baz:quux", ":", -42)
276
- #
277
- # to mysteriously match when the position is 2
278
-
279
- nsplits + 1 + position
358
+ resolve = ->(int) { int.negative? ? count + 1 + int : int }
359
+
360
+ # don't use Array(...) to wrap these as we don't want to convert ranges
361
+ positions = positions.is_a?(Array) ? positions : [positions]
362
+
363
+ positions = positions.map do |position|
364
+ if position.is_a?(Integer)
365
+ resolve[position]
366
+ elsif position.is_a?(Range)
367
+ rbegin = position.begin
368
+ rend = position.end
369
+ rexc = position.exclude_end?
370
+
371
+ if rbegin.nil?
372
+ Range.new(1, resolve[rend], rexc)
373
+ elsif rend.nil?
374
+ Range.new(resolve[rbegin], count, rexc)
375
+ elsif rbegin.negative? || rend.negative? || (rend - rbegin).negative?
376
+ from = resolve[rbegin]
377
+ to = resolve[rend]
378
+ to < from ? Range.new(to, from, rexc) : Range.new(from, to, rexc)
379
+ else
380
+ position
381
+ end
382
+ elsif position.is_a?(Set)
383
+ position.map { |it| resolve[it] }.to_set
280
384
  else
281
385
  position
282
386
  end
283
387
  end
284
388
 
285
- match = action == :select
286
-
287
- lambda do |split|
288
- case split.position when *positions then match else !match end
289
- end
389
+ ->(split) { case split.position when *positions then action else !action end }
290
390
  end
291
391
  end
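Note on `compile`: the comments in this hunk describe how negative positions and open-ended or descending ranges are normalised to 1-based positions. Below is a minimal standalone sketch of that arithmetic. It assumes `count` is the number of splits found (3 for "foo:bar:baz:quux", 8 for "1:2:3:4:5:6:7:8:9", which is the string the `-3..` example resolves against). `resolve_position` and `resolve_range` are hypothetical helper names, not part of the gem's API, and the range handling is simplified relative to the branches in the diff.

```ruby
# Hypothetical helpers for illustration only; the gem keeps this logic
# inside StringSplitter#compile (the +resolve+ lambda and the Range branch).

# Map a negative position to its 1-based equivalent. Non-negative
# positions pass through unchanged. No modulo is applied, so an
# out-of-bounds negative position stays out of bounds.
def resolve_position(position, count)
  position.negative? ? count + 1 + position : position
end

# Simplified range normalisation: fill in open ends, resolve negative
# bounds, and flip descending ranges so they ascend.
def resolve_range(range, count)
  from = range.begin.nil? ? 1 : resolve_position(range.begin, count)
  to = range.end.nil? ? count : resolve_position(range.end, count)
  from, to = to, from if to < from
  Range.new(from, to, range.exclude_end?)
end

resolve_position(-1, 3)    # => 3
resolve_position(-42, 3)   # => -38 (never equals a 1-based position)
resolve_range((-3..), 8)   # => 6..8
resolve_range((-1..-3), 3) # => 1..3
```

Because `-42` resolves to `-38` rather than wrapping around, it cannot match any split position, which is the behaviour the "we don't use modulo" comment asks for.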
@@ -0,0 +1,51 @@
1
+ # frozen_string_literal: true
2
+
3
+ class StringSplitter
4
+ class Split
5
+ attr_reader :captures, :count, :index, :lhs, :position, :rhs, :separator
6
+ attr_writer :rhs
7
+ alias pos position
8
+
9
+ def initialize(captures:, lhs:, rhs:, separator:)
10
+ @captures = captures
11
+ @lhs = lhs
12
+ @rhs = rhs
13
+ @separator = separator
14
+ end
15
+
16
+ # 0-based index relative to the end of the array, e.g. for 5 items:
17
+ #
18
+ # index | rindex
19
+ # ------|-------
20
+ # 0 | 4
21
+ # 1 | 3
22
+ # 2 | 2
23
+ # 3 | 1
24
+ # 4 | 0
25
+ def rindex
26
+ @count - @position
27
+ end
28
+
29
+ # 1-based position relative to the end of the array, e.g. for 5 items:
30
+ #
31
+ # position | rposition
32
+ # ----------|----------
33
+ # 1 | 5
34
+ # 2 | 4
35
+ # 3 | 3
36
+ # 4 | 2
37
+ # 5 | 1
38
+ def rposition
39
+ @count + 1 - @position
40
+ end
41
+
42
+ alias rpos rposition
43
+
44
+ def update!(count:, index:)
45
+ @count = count
46
+ @index = index
47
+ @position = index + 1
48
+ freeze
49
+ end
50
+ end
51
+ end
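Note on `Split#rindex` and `Split#rposition`: both tables in the comments follow from `position = index + 1`, as set in `update!`. Below is a minimal sketch that reproduces them for 5 items. `Item` is a hypothetical stand-in class, not the gem's `Split`, and it keeps only the two fields the calculation needs.

```ruby
# Hypothetical stand-in for StringSplitter::Split, reduced to the fields
# used by the reverse counters.
class Item
  attr_reader :index, :count

  def initialize(index, count)
    @index = index
    @count = count
  end

  # 1-based position, mirroring what Split#update! assigns
  def position
    index + 1
  end

  # 0-based index counted from the end
  def rindex
    count - position
  end

  # 1-based position counted from the end
  def rposition
    count + 1 - position
  end
end

items = Array.new(5) { |i| Item.new(i, 5) }

items.map(&:rindex)    # => [4, 3, 2, 1, 0]
items.map(&:rposition) # => [5, 4, 3, 2, 1]
```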
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  class StringSplitter
4
- VERSION = '0.5.0'
4
+ VERSION = '0.7.2'
5
5
  end
metadata CHANGED
@@ -1,71 +1,57 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: string_splitter
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.5.0
4
+ version: 0.7.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - chocolateboy
8
- autorequire:
8
+ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-06-25 00:00:00.000000000 Z
11
+ date: 2020-08-22 00:00:00.000000000 Z
12
12
  dependencies:
13
- - !ruby/object:Gem::Dependency
14
- name: values
15
- requirement: !ruby/object:Gem::Requirement
16
- requirements:
17
- - - "~>"
18
- - !ruby/object:Gem::Version
19
- version: '1.8'
20
- type: :runtime
21
- prerelease: false
22
- version_requirements: !ruby/object:Gem::Requirement
23
- requirements:
24
- - - "~>"
25
- - !ruby/object:Gem::Version
26
- version: '1.8'
27
13
  - !ruby/object:Gem::Dependency
28
14
  name: bundler
29
15
  requirement: !ruby/object:Gem::Requirement
30
16
  requirements:
31
17
  - - "~>"
32
18
  - !ruby/object:Gem::Version
33
- version: '1.16'
19
+ version: '2.1'
34
20
  type: :development
35
21
  prerelease: false
36
22
  version_requirements: !ruby/object:Gem::Requirement
37
23
  requirements:
38
24
  - - "~>"
39
25
  - !ruby/object:Gem::Version
40
- version: '1.16'
26
+ version: '2.1'
41
27
  - !ruby/object:Gem::Dependency
42
28
  name: minitest
43
29
  requirement: !ruby/object:Gem::Requirement
44
30
  requirements:
45
31
  - - "~>"
46
32
  - !ruby/object:Gem::Version
47
- version: '5.11'
33
+ version: '5.0'
48
34
  type: :development
49
35
  prerelease: false
50
36
  version_requirements: !ruby/object:Gem::Requirement
51
37
  requirements:
52
38
  - - "~>"
53
39
  - !ruby/object:Gem::Version
54
- version: '5.11'
40
+ version: '5.0'
55
41
  - !ruby/object:Gem::Dependency
56
42
  name: minitest-power_assert
57
43
  requirement: !ruby/object:Gem::Requirement
58
44
  requirements:
59
45
  - - "~>"
60
46
  - !ruby/object:Gem::Version
61
- version: 0.3.0
47
+ version: '0.3'
62
48
  type: :development
63
49
  prerelease: false
64
50
  version_requirements: !ruby/object:Gem::Requirement
65
51
  requirements:
66
52
  - - "~>"
67
53
  - !ruby/object:Gem::Version
68
- version: 0.3.0
54
+ version: '0.3'
69
55
  - !ruby/object:Gem::Dependency
70
56
  name: minitest-reporters
71
57
  requirement: !ruby/object:Gem::Requirement
@@ -86,29 +72,15 @@ dependencies:
86
72
  requirements:
87
73
  - - "~>"
88
74
  - !ruby/object:Gem::Version
89
- version: '10.0'
90
- type: :development
91
- prerelease: false
92
- version_requirements: !ruby/object:Gem::Requirement
93
- requirements:
94
- - - "~>"
95
- - !ruby/object:Gem::Version
96
- version: '10.0'
97
- - !ruby/object:Gem::Dependency
98
- name: rubocop
99
- requirement: !ruby/object:Gem::Requirement
100
- requirements:
101
- - - "~>"
102
- - !ruby/object:Gem::Version
103
- version: 0.54.0
75
+ version: '13.0'
104
76
  type: :development
105
77
  prerelease: false
106
78
  version_requirements: !ruby/object:Gem::Requirement
107
79
  requirements:
108
80
  - - "~>"
109
81
  - !ruby/object:Gem::Version
110
- version: 0.54.0
111
- description:
82
+ version: '13.0'
83
+ description:
112
84
  email: chocolate@cpan.org
113
85
  executables: []
114
86
  extensions: []
@@ -118,6 +90,7 @@ files:
118
90
  - LICENSE.md
119
91
  - README.md
120
92
  - lib/string_splitter.rb
93
+ - lib/string_splitter/split.rb
121
94
  - lib/string_splitter/version.rb
122
95
  homepage: https://github.com/chocolateboy/string_splitter
123
96
  licenses:
@@ -127,7 +100,7 @@ metadata:
127
100
  bug_tracker_uri: https://github.com/chocolateboy/string_splitter/issues
128
101
  changelog_uri: https://github.com/chocolateboy/string_splitter/blob/master/CHANGELOG.md
129
102
  source_code_uri: https://github.com/chocolateboy/string_splitter
130
- post_install_message:
103
+ post_install_message:
131
104
  rdoc_options: []
132
105
  require_paths:
133
106
  - lib
@@ -135,16 +108,15 @@ required_ruby_version: !ruby/object:Gem::Requirement
135
108
  requirements:
136
109
  - - ">="
137
110
  - !ruby/object:Gem::Version
138
- version: '0'
111
+ version: '2.3'
139
112
  required_rubygems_version: !ruby/object:Gem::Requirement
140
113
  requirements:
141
114
  - - ">="
142
115
  - !ruby/object:Gem::Version
143
116
  version: '0'
144
117
  requirements: []
145
- rubyforge_project:
146
- rubygems_version: 2.7.7
147
- signing_key:
118
+ rubygems_version: 3.1.4
119
+ signing_key:
148
120
  specification_version: 4
149
121
  summary: String#split on steroids
150
122
  test_files: []