string_splitter 0.5.0 → 0.7.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: ce843d4e98a02b464296d78d6be21f9fd7ce179b8156889f799280594250072c
4
- data.tar.gz: 4a378aaa5c951224b3091c3e1bf5d55c5a61f6c5395f9cf820ba008b6b1b7af6
3
+ metadata.gz: aa94b66f61dc3f1970d92b7fa4fb01ba0932262338c124592f539a2f3b4e7ca3
4
+ data.tar.gz: 0b415f82cfe372bdc9fd0e72d26beeb5071fc02f31e6fa5c02606d3837680506
5
5
  SHA512:
6
- metadata.gz: 68f9104ab3c5145322e5027814062a52692e854240a47df3e456d3a5d5c04f213a7875741aa00c6dd7bd5abb71e48f9e09b3d9dc5eb9bb488ef314849952c277
7
- data.tar.gz: b2cdb8d72ef41459085580192db7e2792025ed16a84bda7a27b82e148bcd40eddb9b9fe2b5b866d9417085b8560905e1667516db843cd57aa8712639462766f9
6
+ metadata.gz: 5efaba21a21e25b4b3f54a505f35cf16bba346427dfe4694554de6a1e917160e08cab9be65e80fd7d7407ba82cf5e876fff21ca2607e2da4fc137f3907b2322a
7
+ data.tar.gz: 37ac5c38859a8ed4036e68d9e588c58c8aa63e6ce2a47066dc1d02b115564cc187c505bec4b066a20981eedb21d210f01d019715ab9d91c9dc88f6e320763a3c
@@ -1,32 +1,96 @@
1
+ ## 0.7.2 - 2020-08-22
2
+
3
+ #### Fixes
4
+
5
+ - fix/test default delimiter + `remove_empty_fields`
6
+
7
+ ## 0.7.1 - 2020-08-22
8
+
9
+ #### Changes
10
+
11
+ - performance improvements
12
+ - delegate to `String#split` where possible
13
+ - use a regular class for Split rather than values.rb
14
+ - create Split objects directly rather than allocating intermediate hashes
15
+
16
+ ## 0.7.0 - 2020-08-21
17
+
18
+ #### Breaking Changes
19
+
20
+ - `String#split` incompatibility: we no longer trim the string (with
21
+ `String#strip`) before splitting if the delimiter is omitted
22
+
23
+ ## 0.6.0 - 2020-08-20
24
+
25
+ #### Breaking Changes
26
+
27
+ - `ss.split(str, " ")` is no longer treated the same as `ss.split(str)` i.e.
28
+ unlike Ruby's `String#split`, the former no longer strips the string before
29
+ splitting
30
+ - rename the `remove_empty` option `remove_empty_fields`
31
+ - rename the `exclude` option `except` (alias for `reject`)
32
+
33
+ #### Features
34
+
35
+ - add support for descending, negative, and infinite ranges,
36
+ e.g. `ss.split(str, ":", at: [..4, 4..., 3..1, -1..-3])` etc.
37
+
38
+ #### Fixes
39
+
40
+ - correctly handle backreferences in delimiter patterns
41
+
42
+ ## 0.5.1 - 2018-07-01
43
+
44
+ #### Changes
45
+
46
+ - set StringSplitter::VERSION when `string_splitter.rb` is loaded
47
+
1
48
  ## 0.5.0 - 2018-06-26
2
49
 
3
- - don't treat string delimiters as patterns
50
+ #### Features
51
+
4
52
  - add a `reject`/`exclude` option which rejects splits at the specified positions
5
53
  - add a `select` alias for `at`
6
54
 
55
+ #### Fixes
56
+
57
+ - don't treat string delimiters as patterns
58
+
7
59
  ## 0.4.0 - 2018-06-24
8
60
 
9
- - **breaking change**: remove the `offset` alias for `split.index`
61
+ #### Breaking Changes
62
+
63
+ - remove the `offset` alias for `split.index`
10
64
 
11
65
  ## 0.3.1 - 2018-06-24
12
66
 
13
- - remove trailing empty field when the separator is empty ([#1](https://github.com/chocolateboy/string_splitter/issues/1))
67
+ #### Fixes
68
+
69
+ - remove trailing empty field when the separator is empty
70
+ ([#1](https://github.com/chocolateboy/string_splitter/issues/1))
14
71
 
15
72
  ## 0.3.0 - 2018-06-23
16
73
 
17
- - **breaking change**: rename the `default_separator` option to `default_delimiter`
18
- - to avoid ambiguity in the code, refer to the input pattern/string as the
19
- "delimiter" and the matched string as the "separator"
74
+ #### Breaking Changes
75
+
76
+ - rename the `default_separator` option `default_delimiter`
20
77
 
21
78
  ## 0.2.0 - 2018-06-22
22
79
 
23
- - **breaking change**: make `index` (AKA `offset`) 0-based and add `position`
24
- (AKA `pos`) as the 1-based accessor
80
+ #### Breaking Changes
81
+
82
+ - make `index` (AKA `offset`) 0-based and add `position` (AKA `pos`) as the
83
+ 1-based accessor
25
84
 
26
85
  ## 0.1.0 - 2018-06-22
27
86
 
28
- - **breaking change**: the block now takes a single `split` object with an
29
- `index` accessor, rather than seperate `index` and `split` arguments
87
+ #### Breaking Changes
88
+
89
+ - the block now takes a single `split` object with an `index` accessor, rather
90
+ than separate `index` and `split` arguments
91
+
92
+ #### Features
93
+
30
94
  - add support for negative indices in the value supplied to the `at` option
31
95
  - add a `count` field to the split object containing the total number of splits
32
96
 
data/README.md CHANGED
@@ -3,14 +3,16 @@
3
3
  [![Build Status](https://travis-ci.org/chocolateboy/string_splitter.svg)](https://travis-ci.org/chocolateboy/string_splitter)
4
4
  [![Gem Version](https://img.shields.io/gem/v/string_splitter.svg)](https://rubygems.org/gems/string_splitter)
5
5
 
6
- <!-- START doctoc generated TOC please keep comment here to allow auto update -->
7
- <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
6
+ <!-- toc -->
8
7
 
9
8
  - [NAME](#name)
10
9
  - [INSTALLATION](#installation)
11
10
  - [SYNOPSIS](#synopsis)
12
11
  - [DESCRIPTION](#description)
13
12
  - [WHY?](#why)
13
+ - [CAVEATS](#caveats)
14
+ - [Differences from String#split](#differences-from-stringsplit)
15
+ - [COMPATIBILITY](#compatibility)
14
16
  - [VERSION](#version)
15
17
  - [SEE ALSO](#see-also)
16
18
  - [Gems](#gems)
@@ -18,7 +20,7 @@
18
20
  - [AUTHOR](#author)
19
21
  - [COPYRIGHT AND LICENSE](#copyright-and-license)
20
22
 
21
- <!-- END doctoc generated TOC please keep comment here to allow auto update -->
23
+ <!-- tocstop -->
22
24
 
23
25
  # NAME
24
26
 
@@ -36,65 +38,128 @@ gem "string_splitter"
36
38
  require "string_splitter"
37
39
 
38
40
  ss = StringSplitter.new
41
+ ```
42
+
43
+ **Same as `String#split`**
39
44
 
40
- # same as String#split
41
- ss.split("foo bar baz quux")
42
- ss.split("foo bar baz quux", " ")
43
- ss.split("foo bar baz quux", /\s+/)
44
- # => ["foo", "bar", "baz", "quux"]
45
+ ```ruby
46
+ ss.split("foo bar baz")
47
+ ss.split("foo bar baz", " ")
48
+ ss.split("foo bar baz", /\s+/)
49
+ # => ["foo", "bar", "baz"]
50
+
51
+ ss.split("foo", "")
52
+ ss.split("foo", //)
53
+ # => ["f", "o", "o"]
54
+
55
+ ss.split("", "...")
56
+ ss.split("", /.../)
57
+ # => []
58
+ ```
45
59
 
46
- # split at the first delimiter
60
+ **Split at the first delimiter**
61
+
62
+ ```ruby
47
63
  ss.split("foo:bar:baz:quux", ":", at: 1)
64
+ ss.split("foo:bar:baz:quux", ":", select: 1)
48
65
  # => ["foo", "bar:baz:quux"]
66
+ ```
49
67
 
50
- # split at the last delimiter
68
+ **Split at the last delimiter**
69
+
70
+ ```ruby
51
71
  ss.split("foo:bar:baz:quux", ":", at: -1)
52
72
  # => ["foo:bar:baz", "quux"]
73
+ ```
74
+
75
+ **Split at multiple delimiter positions**
76
+
77
+ ```ruby
78
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -1])
79
+ # => ["1", "2", "3", "4:5:6:7:8", "9"]
80
+ ```
53
81
 
54
- # split at multiple delimiter positions
55
- ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -2])
56
- # => ["1", "2", "3", "4:5:6:7", "8:9"]
82
+ **Split at all but the first and last delimiters**
57
83
 
58
- # split from the right
84
+ ```ruby
85
+ ss.split("1:2:3:4:5:6", ":", except: [1, -1])
86
+ ss.split("1:2:3:4:5:6", ":", reject: [1, -1])
87
+ # => ["1:2", "3", "4", "5:6"]
88
+ ```
89
+
90
+ **Split from the right**
91
+
92
+ ```ruby
59
93
  ss.rsplit("1:2:3:4:5:6:7:8:9", ":", at: [1..3, 5])
60
94
  # => ["1:2:3:4", "5:6", "7", "8", "9"]
95
+ ```
96
+
97
+ **Split with negative, descending, and infinite ranges**
98
+
99
+ ```ruby
100
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: ..-3)
101
+ # => ["1", "2", "3", "4", "5", "6", "7:8:9"]
102
+
103
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: 4...)
104
+ # => ["1:2:3:4", "5", "6", "7", "8:9"]
105
+
106
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1, 5..3, -2..])
107
+ # => ["1", "2:3", "4", "5", "6:7", "8", "9"]
108
+ ```
109
+
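The position handling shown above can be approximated in plain Ruby. The following is a minimal sketch rather than the gem's implementation: `normalize_positions` and `split_at` are hypothetical helpers, only non-empty string delimiters are handled, and the treatment of exclusive ranges (the excluded bound is dropped, so `4...` stops one short of the last delimiter) is an assumption inferred from the README output above. Beginless range literals need Ruby 2.7 or later.

```ruby
require 'set'

# Hypothetical helper (not part of the gem): expand integers and ranges into a
# set of 1-based delimiter positions, given the total number of delimiters.
def normalize_positions(positions, count)
  # map a negative position onto its positive equivalent, e.g. -1 => count
  resolve = ->(position) { position.negative? ? count + 1 + position : position }
  list = positions.is_a?(Array) ? positions : [positions]

  list.each_with_object(Set.new) do |position, set|
    unless position.is_a?(Range)
      set << resolve.call(position)
      next
    end

    first = resolve.call(position.begin || 1)  # beginless: start at the first delimiter
    last = resolve.call(position.end || count) # endless: stop at the last delimiter

    if position.exclude_end?
      next if first == last

      last += (first < last ? -1 : 1) # step the excluded bound back towards first
    end

    low, high = [first, last].minmax # a descending range covers the same positions
    set.merge(low..high)
  end
end

# Hypothetical helper: split on a non-empty string delimiter, but only at the
# selected positions; the other delimiters are left in place.
def split_at(string, delimiter, at:)
  fields = string.split(delimiter, -1)
  return fields if fields.length < 2

  wanted = normalize_positions(at, fields.length - 1)
  result = [fields.first]

  fields.drop(1).each_with_index do |field, index|
    if wanted.include?(index + 1)
      result << field
    else
      result[-1] = result[-1] + delimiter + field
    end
  end

  result
end

split_at("1:2:3:4:5:6:7:8:9", ":", at: [1, (5..3), (-2..)])
# => ["1", "2:3", "4", "5", "6:7", "8", "9"]
```

Resolving every position to a positive integer up front keeps the per-split check a simple set lookup.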
110
+ **Full control via a block**
61
111
 
62
- # full control via a block
63
- result = ss.split('a:a:a:b:c:c:e:a:a:d:c', ":") do |split|
64
- split.index > 0 && split.lhs == split.rhs
112
+ ```ruby
113
+ result = ss.split("1:2:3:4:5:6:7:8", ":") do |split|
114
+ split.pos % 2 == 0
65
115
  end
66
- # => ["a:a", "a:b:c", "c:e:a", "a:d:c"]
116
+ # => ["1:2", "3:4", "5:6", "7:8"]
117
+ ```
118
+
119
+ ```ruby
120
+ string = "banana".chars.sort.join # "aaabnn"
121
+
122
+ ss.split(string, "") do |split|
123
+ split.rhs != split.lhs
124
+ end
125
+ # => ["aaa", "b", "nn"]
67
126
  ```
68
127
 
69
128
  # DESCRIPTION
70
129
 
71
- Many languages have built-in string `split` functions/methods. They behave similarly
72
- (notwithstanding the occasional [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)),
73
- and handle a few common cases e.g.:
130
+ Many languages have built-in `split` functions/methods for strings. They behave
131
+ similarly (notwithstanding the occasional
132
+ [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)), and
133
+ handle a few common cases, e.g.:
74
134
 
75
135
  * limiting the number of splits
76
- * including the separators in the results
136
+ * including the separator(s) in the results
77
137
  * removing (some) empty fields
78
138
 
79
- But, because the API is squeezed into two overloaded parameters (the delimiter and the limit),
80
- achieving the desired effects can be tricky. For instance, while `String#split` removes empty
81
- trailing fields (by default), it provides no way to remove *all* empty fields. Likewise, the
82
- cramped API means there's no way to e.g. combine a limit (positive integer) with the option
83
- to preserve empty fields (negative integer), or use backreferences in a delimiter pattern
139
+ But, because the API is squeezed into two overloaded parameters (the delimiter
140
+ and the limit), achieving the desired results can be tricky. For instance,
141
+ while `String#split` removes empty trailing fields (by default), it provides no
142
+ way to remove *all* empty fields. Likewise, the cramped API means there's no
143
+ way to, e.g., combine a limit (positive integer) with the option to preserve
144
+ empty fields (negative integer), or use backreferences in a delimiter pattern
84
145
  without including its captured subexpressions in the result.
85
146
 
86
- If `split` was being written from scratch, without the baggage of its legacy API,
87
- it's possible that some of these options would be made explicit rather than overloading
88
- the parameters. And, indeed, this is possible in some implementations,
89
- e.g. in Crystal:
147
+ If `split` was being written from scratch, without the baggage of its legacy
148
+ API, it's possible that some of these options would be made explicit rather
149
+ than overloading the parameters. And, indeed, this is possible in some
150
+ implementations, e.g. in Crystal:
90
151
 
91
152
  ```ruby
92
- ":foo:bar:baz:".split(":", remove_empty: false) # => ["", "foo", "bar", "baz", ""]
93
- ":foo:bar:baz:".split(":", remove_empty: true) # => ["foo", "bar", "baz"]
153
+ ":foo:bar:baz:".split(":", remove_empty: false)
154
+ # => ["", "foo", "bar", "baz", ""]
155
+
156
+ ":foo:bar:baz:".split(":", remove_empty: true)
157
+ # => ["foo", "bar", "baz"]
94
158
  ````
95
159
 
96
- StringSplitter takes this one step further by moving the configuration out of the method altogether
97
- and delegating the strategy — i.e. which splits should be accepted or rejected — to a block:
160
+ StringSplitter takes this one step further by moving the configuration out of
161
+ the method altogether and delegating the strategy — i.e. which splits should be
162
+ accepted or rejected — to a block:
98
163
 
99
164
  ```ruby
100
165
  ss = StringSplitter.new
@@ -102,22 +167,32 @@ ss = StringSplitter.new
102
167
  ss.split("foo:bar:baz", ":") { |split| split.index == 0 }
103
168
  # => ["foo", "bar:baz"]
104
169
 
105
- ss.split("foo:bar:baz", ":") { |split| split.position == split.count }
106
- # => ["foo:bar", "baz"]
170
+ ss.split("foo:bar:baz:quux", ":") do |split|
171
+ split.position == 1 || split.position == 3
172
+ end
173
+ # => ["foo", "bar:baz", "quux"]
107
174
  ```
108
175
 
109
- As a shortcut, the common case of splitting on delimiters at one or more positions is supported by an option:
176
+ As a shortcut, the common case of splitting (or not splitting) at one or more
177
+ positions is supported by dedicated options:
110
178
 
111
179
  ```ruby
112
- ss.split('foo:bar:baz:quux', ':', at: [1, -1]) # => ["foo", "bar:baz", "quux"]
180
+ ss.split("foo:bar:baz:quux", ":", select: [1, -1])
181
+ # => ["foo", "bar:baz", "quux"]
182
+
183
+ ss.split("foo:bar:baz:quux", ":", reject: [1, -1])
184
+ # => ["foo:bar", "baz:quux"]
113
185
  ```
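To make the accept/reject idea concrete, here is a minimal plain-Ruby sketch of a block-driven splitter. It is not the gem's code: `SketchSplit` and `split_with_block` are hypothetical names, only a subset of the split attributes is modelled, and the sketch assumes a non-empty string delimiter and a left-to-right scan.

```ruby
# Hypothetical stand-in for the object passed to the block
SketchSplit = Struct.new(:lhs, :rhs, :separator, :index, :count) do
  # 1-based counterpart of the 0-based index
  def position
    index + 1
  end
end

# Split on every delimiter, then let the block decide which splits to keep.
# A rejected split is undone by gluing the field back onto the previous one.
def split_with_block(string, delimiter)
  fields = string.split(delimiter, -1)
  return fields if fields.length < 2 # delimiter not found: nothing to decide

  count = fields.length - 1
  result = [fields.first]

  fields.each_cons(2).with_index do |(lhs, rhs), index|
    split = SketchSplit.new(lhs, rhs, delimiter, index, count)

    if yield(split)
      result << rhs
    else
      result[-1] = result[-1] + delimiter + rhs
    end
  end

  result
end

split_with_block("foo:bar:baz:quux", ":") do |split|
  split.position == 1 || split.position == 3
end
# => ["foo", "bar:baz", "quux"]
```

Because the decision is taken per split, options such as `select` and `reject` can be layered on top by generating the block from a list of positions.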
114
186
 
115
187
  # WHY?
116
188
 
117
- I wanted to split semi-structured output into fields without having to resort to a regex or a full-blown parser.
189
+ I wanted to split semi-structured output into fields without having to resort
190
+ to a regex or a full-blown parser.
118
191
 
119
- As an example, the nominally unstructured output of many Unix commands is often formatted in a way
120
- that's tantalizingly close to being machine-readable, apart from a few pesky exceptions e.g.:
192
+ As an example, the nominally unstructured output of many Unix commands is often
193
+ formatted in a way that's tantalizingly close to being
194
+ [machine-readable](https://en.wikipedia.org/wiki/Delimiter-separated_values),
195
+ apart from a few pesky exceptions, e.g.:
121
196
 
122
197
  ```bash
123
198
  $ ls -l
@@ -129,8 +204,8 @@ drwxr-xr-x 3 user users 4096 Jun 19 22:56 lib
129
204
  -rw-r--r-- 1 user users 3134 Jun 19 22:59 README.md
130
205
  ```
131
206
 
132
- These lines can *almost* be parsed into an array of fields by splitting them on whitespace. The exception is the
133
- date (columns 6-8) i.e.:
207
+ These lines can *almost* be parsed into an array of fields by splitting them on
208
+ whitespace. The exception is the date (columns 6-8), i.e.:
134
209
 
135
210
  ```ruby
136
211
  line = "-rw-r--r-- 1 user users 87 Jun 18 18:16 CHANGELOG.md"
@@ -149,19 +224,20 @@ instead of:
149
224
  ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
150
225
  ```
151
226
 
152
- One way to work around this is to parse the whole line e.g.:
227
+ One way to work around this is to parse the whole line, e.g.:
153
228
 
154
229
  ```ruby
155
230
  line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+ \S+) \s+ (.+)$/x)
156
231
  ```
157
232
 
158
- But that requires us to specify *everything*. What we really want is a version of `split`
159
- which allows us to veto splitting for the 6th and 7th delimiters i.e. control over which
160
- splits are accepted, rather than being restricted to the single, baked-in strategy provided
161
- by the `limit` parameter.
233
+ But that requires us to specify *everything*. What we really want is a version
234
+ of `split` which allows us to veto splitting for the 6th and 7th delimiters
235
+ (and to stop after the 8th delimiter), i.e. control over which splits are
236
+ accepted, rather than being restricted to the single, baked-in strategy
237
+ provided by the `limit` parameter.
162
238
 
163
- By providing a simple way to accept or reject each split, StringSplitter makes cases like
164
- this easy to handle, either via a block:
239
+ By providing a simple way to accept or reject each split, StringSplitter makes
240
+ cases like this easy to handle, either via a block:
165
241
 
166
242
  ```ruby
167
243
  ss.split(line) do |split|
@@ -177,9 +253,51 @@ ss.split(line, at: [1..5, 8])
177
253
  # => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
178
254
  ```
179
255
 
256
+ # CAVEATS
257
+
258
+ ## Differences from String#split
259
+
260
+ Unlike `String#split`, StringSplitter doesn't trim the string before splitting
261
+ if the delimiter is omitted or a single space, e.g.:
262
+
263
+ ```ruby
264
+ " foo bar baz ".split # => ["foo", "bar", "baz"]
265
+ " foo bar baz ".split(" ") # => ["foo", "bar", "baz"]
266
+
267
+ ss.split(" foo bar baz ") # => ["", "foo", "bar", "baz", ""]
268
+ ss.split(" foo bar baz ", " ") # => ["", "foo", "bar", "baz", ""]
269
+ ```
270
+
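The same untrimmed result can be reproduced with `String#split` alone by passing a regexp delimiter and a negative limit. This sketch uses only core Ruby; the variable names are illustrative.

```ruby
line = " foo bar baz "

# awk-style: with no delimiter (or a single space), String#split skips leading
# whitespace and drops trailing empty fields
trimmed = line.split
also_trimmed = line.split(" ")

# a regexp delimiter with a negative limit keeps the empty fields at both ends
untrimmed = line.split(/\s+/, -1)
# => ["", "foo", "bar", "baz", ""]

# dropping the empty fields afterwards recovers the trimmed result
recovered = untrimmed.reject(&:empty?)
# => ["foo", "bar", "baz"]
```

When the leading and trailing empty fields are unwanted, the `remove_empty_fields` option described in the changelog is the StringSplitter counterpart of the final `reject` step.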
271
+ `String#split` omits the `nil` values of unmatched optional captures:
272
+
273
+ ```ruby
274
+ "foo:bar:baz".scan(/(:)|(-)/) # => [[":", nil], [":", nil]]
275
+ "foo:bar:baz".split(/(:)|(-)/) # => ["foo", ":", "bar", ":", "baz"]
276
+ ```
277
+
278
+ StringSplitter preserves them (if `include_captures` is true, as it
279
+ is by default), though they can be omitted from spread captures by passing
280
+ `:compact` as the value of the `spread_captures` option:
281
+
282
+ ```ruby
283
+ s1 = StringSplitter.new(spread_captures: true)
284
+ s2 = StringSplitter.new(spread_captures: false)
285
+ s3 = StringSplitter.new(spread_captures: :compact)
286
+
287
+ s1.split("foo:bar:baz", /(:)|(-)/) # => ["foo", ":", nil, "bar", ":", nil, "baz"]
288
+ s2.split("foo:bar:baz", /(:)|(-)/) # => ["foo", [":", nil], "bar", [":", nil], "baz"]
289
+ s3.split("foo:bar:baz", /(:)|(-)/) # => ["foo", ":", "bar", ":", "baz"]
290
+ ```
291
+
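The three capture modes can be imitated with `String#scan` and `Regexp.last_match`, which expose unmatched optional groups as `nil`. This is a rough sketch under stated assumptions: `split_with_captures` and its `:spread`, `:nested` and `:compact` mode names are hypothetical, and empty matches are not handled.

```ruby
# Hypothetical helper: split on a pattern and include its captures in the
# result, either spread into the result, compacted, or nested as an array
def split_with_captures(string, pattern, mode)
  result = []
  start = 0

  string.scan(pattern) do
    match = Regexp.last_match
    from, to = match.offset(0) # start and end offsets of the separator
    result << string[start...from]

    case mode
    when :spread then result.concat(match.captures)           # keep nils
    when :compact then result.concat(match.captures.compact)  # drop nils
    when :nested then result << match.captures                # one array per split
    end

    start = to
  end

  result << string[start..-1] # the field after the last separator
end

split_with_captures("foo:bar:baz", /(:)|(-)/, :spread)
# => ["foo", ":", nil, "bar", ":", nil, "baz"]
```

The `:compact` mode gives the same output as `String#split` for this input, which is the behaviour the `spread_captures: :compact` option is described as providing.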
292
+ # COMPATIBILITY
293
+
294
+ StringSplitter is tested and supported on all versions of Ruby [supported by
295
+ the ruby-core team](https://www.ruby-lang.org/en/downloads/branches/), i.e.,
296
+ currently, Ruby 2.5 and above.
297
+
180
298
  # VERSION
181
299
 
182
- 0.5.0
300
+ 0.7.2
183
301
 
184
302
  # SEE ALSO
185
303
 
@@ -197,7 +315,7 @@ ss.split(line, at: [1..5, 8])
197
315
 
198
316
  # COPYRIGHT AND LICENSE
199
317
 
200
- Copyright © 2018 by chocolateboy.
318
+ Copyright © 2018-2020 by chocolateboy.
201
319
 
202
320
  This is free software; you can redistribute it and/or modify it under the
203
- terms of the [Artistic License 2.0](http://www.opensource.org/licenses/artistic-license-2.0.php).
321
+ terms of the [Artistic License 2.0](https://www.opensource.org/licenses/artistic-license-2.0.php).
@@ -1,53 +1,91 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'values'
3
+ require 'set'
4
+
5
+ require_relative 'string_splitter/split'
6
+ require_relative 'string_splitter/version'
4
7
 
5
8
  # This class extends the functionality of +String#split+ by:
6
9
  #
7
10
  # - providing full control over which splits are accepted or rejected
11
+ #
8
12
  # - adding support for splitting from right-to-left
9
- # - encapsulating splitting options/preferences in instances rather than trying to
10
- # cram them into overloaded method parameters
13
+ #
14
+ # - encapsulating splitting options/preferences in the splitter rather
15
+ # than trying to cram them into overloaded method parameters
11
16
  #
12
17
  # These enhancements allow splits to handle many cases that otherwise require bigger
13
- # guns e.g. regex matching or parsing.
18
+ # guns, e.g. regex matching or parsing.
19
+ #
20
+ # Implementation-wise, we split the string either with String#split, or with a custom
21
+ # scanner if the delimiter may contain captures (since String#split doesn't handle
22
+ # them correctly) and parse the resulting tokens into an array of Split objects with
23
+ # the following attributes:
24
+ #
25
+ # - captures: separator substrings captured by parentheses in the delimiter pattern
26
+ # - count: the number of splits
27
+ # - index: the 0-based index of the split in the array
28
+ # - lhs: the string to the left of the separator (back to the previous split candidate)
29
+ # - position: the 1-based index of the split in the array (alias: pos)
30
+ # - rhs: the string to the right of the separator (up to the next split candidate)
31
+ # - rindex: the 0-based index of the split relative to the end of the array
32
+ # - rposition: the 1-based index of the split relative to the end of the array (alias: rpos)
33
+ # - separator: the string matched by the delimiter pattern/string
34
+ #
14
35
  class StringSplitter
15
- ACCEPT_ALL = ->(_split) { true }
16
- DEFAULT_DELIMITER = /\s+/
17
- NO_SPLITS = []
18
-
19
- Split = Value.new(:captures, :count, :index, :lhs, :rhs, :separator) do
20
- def position
21
- index + 1
22
- end
36
+ # terminology: the delimiter is what we provide and the separators are what we get
37
+ # back (if we capture them). e.g. for:
38
+ #
39
+ # ss.split("foo:bar::baz", /(\W+)/)
40
+ #
41
+ # the delimiter is /(\W+)/ and the separators are ":" and "::"
23
42
 
24
- alias_method :pos, :position
43
+ ACCEPT_ALL = ->(_split) { true }
44
+ DEFAULT_DELIMITER = /\s+/.freeze
45
+ REMOVE = [].freeze
46
+
47
+ # simulate an enum. the value is returned by the case statement
48
+ # in the generated block if the positions match
49
+ module Action
50
+ SELECT = true
51
+ REJECT = false
25
52
  end
26
53
 
54
+ private_constant :Action
55
+
27
56
  def initialize(
28
57
  default_delimiter: DEFAULT_DELIMITER,
29
58
  include_captures: true,
30
- remove_empty: false,
59
+ remove_empty: false, # TODO remove this
60
+ remove_empty_fields: remove_empty,
31
61
  spread_captures: true
32
62
  )
33
63
  @default_delimiter = default_delimiter
34
64
  @include_captures = include_captures
35
- @remove_empty = remove_empty
65
+ @remove_empty_fields = remove_empty_fields
36
66
  @spread_captures = spread_captures
37
67
  end
38
68
 
39
- attr_reader :default_delimiter, :include_captures, :remove_empty, :spread_captures
69
+ attr_reader(
70
+ :default_delimiter,
71
+ :include_captures,
72
+ :remove_empty_fields,
73
+ :spread_captures
74
+ )
75
+
76
+ # TODO remove this
77
+ alias remove_empty remove_empty_fields
40
78
 
41
79
  def split(
42
80
  string,
43
81
  delimiter = @default_delimiter,
44
- at: nil,
82
+ at: nil, # alias for select
83
+ except: nil, # alias for reject
45
84
  select: at,
46
- exclude: nil,
47
- reject: exclude,
85
+ reject: except,
48
86
  &block
49
87
  )
50
- result, block, splits, count, index = split_common(
88
+ result, splits, count, accept = init(
51
89
  string: string,
52
90
  delimiter: delimiter,
53
91
  select: select,
@@ -55,27 +93,22 @@ class StringSplitter
55
93
  block: block
56
94
  )
57
95
 
58
- splits.each do |split|
59
- split = Split.with(split.merge({ index: (index += 1), count: count }))
60
- result << split.lhs if result.empty?
61
-
62
- if block.call(split)
63
- if @include_captures
64
- if @spread_captures
65
- result += split.captures
66
- else
67
- result << split.captures
68
- end
69
- end
96
+ return result unless splits
70
97
 
71
- result << split.rhs
98
+ result << splits.first.lhs
99
+
100
+ splits.each_with_index do |split, index|
101
+ split.update!(count: count, index: index)
102
+
103
+ if accept.call(split)
104
+ result << split.captures << split.rhs
72
105
  else
73
106
  # append the rhs
74
107
  result[-1] = result[-1] + split.separator + split.rhs
75
108
  end
76
109
  end
77
110
 
78
- result
111
+ render(result)
79
112
  end
80
113
 
81
114
  alias lsplit split
@@ -83,13 +116,13 @@ class StringSplitter
83
116
  def rsplit(
84
117
  string,
85
118
  delimiter = @default_delimiter,
86
- at: nil,
119
+ at: nil, # alias for select
120
+ except: nil, # alias for reject
87
121
  select: at,
88
- exclude: nil,
89
- reject: exclude,
122
+ reject: except,
90
123
  &block
91
124
  )
92
- result, block, splits, count, index = split_common(
125
+ result, splits, count, accept = init(
93
126
  string: string,
94
127
  delimiter: delimiter,
95
128
  select: select,
@@ -97,195 +130,262 @@ class StringSplitter
97
130
  block: block
98
131
  )
99
132
 
100
- splits.reverse!.each do |split|
101
- split = Split.with(split.merge({ index: (index += 1), count: count }))
102
- result.unshift(split.rhs) if result.empty?
103
-
104
- if block.call(split)
105
- if @include_captures
106
- if @spread_captures
107
- result = split.captures + result
108
- else
109
- result.unshift(split.captures)
110
- end
111
- end
133
+ return result unless splits
134
+
135
+ result.unshift(splits.last.rhs)
112
136
 
113
- result.unshift(split.lhs)
137
+ splits.reverse_each.with_index do |split, index|
138
+ split.update!(count: count, index: index)
139
+
140
+ if accept.call(split)
141
+ # [lhs + captures] + result
142
+ result.unshift(split.lhs, split.captures)
114
143
  else
115
144
  # prepend the lhs
116
145
  result[0] = split.lhs + split.separator + result[0]
117
146
  end
118
147
  end
119
148
 
120
- result
149
+ render(result)
121
150
  end
122
151
 
123
152
  private
124
153
 
125
- def splits_for(parts, ncaptures)
126
- result = []
127
- splits = []
128
-
129
- until parts.empty?
130
- lhs = parts.shift
131
- separator = parts.shift
132
- captures = parts.shift(ncaptures)
133
- rhs = parts.length == 1 ? parts.shift : parts.first
134
-
135
- if @remove_empty && (lhs.empty? || rhs.empty?)
136
- if lhs.empty? && rhs.empty?
137
- # do nothing
138
- elsif parts.empty? # last split
139
- result << (!lhs.empty? ? lhs : rhs) if splits.empty?
140
- elsif rhs.empty?
141
- # replace the empty rhs with the non-empty lhs
142
- parts[0] = lhs
143
- end
154
+ # initialisation common to +split+ and +rsplit+
155
+ #
156
+ # takes a hash of options passed to +split+ or +rsplit+ and returns a tuple with
157
+ # the following fields:
158
+ #
159
+ # - result: the array of separated strings to return from +split+ or +rsplit+.
160
+ # if the splits array is empty, the caller returns this array immediately
161
+ # without any further processing
162
+ #
163
+ # - splits: an array of Split objects containing the lhs, rhs, separator and captured
164
+ # separator substrings for each split
165
+ #
166
+ # - count: the number of splits
167
+ #
168
+ # - accept: a proc whose return value determines whether each split should be
169
+ # accepted (true) or rejected (false)
170
+ #
171
+ def init(string:, delimiter:, select:, reject:, block:)
172
+ return [[]] if string.empty?
173
+
174
+ unless block
175
+ if reject
176
+ positions = reject
177
+ action = Action::REJECT
178
+ elsif select
179
+ positions = select
180
+ action = Action::SELECT
181
+ else
182
+ block = ACCEPT_ALL
183
+ end
184
+ end
185
+
186
+ # use String#split if we can
187
+ #
188
+ # NOTE +reject!+ is no faster than +reject+ on MRI and significantly slower
189
+ # on TruffleRuby
144
190
 
145
- next
191
+ if delimiter.is_a?(String)
192
+ limit = -1
193
+
194
+ if delimiter == ' '
195
+ delimiter = / / # don't trim
196
+ elsif delimiter.empty?
197
+ limit = 0 # remove the trailing empty string
146
198
  end
147
199
 
148
- splits << {
149
- lhs: lhs,
150
- rhs: rhs,
151
- separator: separator,
152
- captures: captures,
153
- }
154
- end
200
+ result = string.split(delimiter, limit)
155
201
 
156
- [result, splits]
157
- end
202
+ return [result] if result.length == 1 # delimiter not found: no splits
158
203
 
159
- # setup common to both split methods
160
- def split_common(string:, delimiter:, select:, reject:, block:)
161
- unless (match = string.match(delimiter))
162
- result = (@remove_empty && string.empty?) ? [] : [string]
163
- return [result, block, NO_SPLITS, 0, -1]
164
- end
204
+ if block == ACCEPT_ALL # return the (2 or more) fields
205
+ result = result.reject(&:empty?) if @remove_empty_fields
206
+ return [result]
207
+ end
165
208
 
166
- select = Array(select)
167
- reject = Array(reject)
209
+ splits = []
168
210
 
169
- if !reject.empty?
170
- positions = reject
171
- action = :reject
172
- elsif !select.empty?
173
- positions = select
174
- action = :select
211
+ result.each_cons(2) do |lhs, rhs| # 2 or more fields
212
+ splits << Split.new(
213
+ captures: [],
214
+ lhs: lhs,
215
+ rhs: rhs,
216
+ separator: delimiter
217
+ )
218
+ end
219
+ elsif delimiter == DEFAULT_DELIMITER && block == ACCEPT_ALL
220
+ # non-empty separators so -1 is safe
221
+
222
+ # XXX String#split with block was introduced in Ruby 2.6:
223
+ #
224
+ # - https://rubyreferences.github.io/rubychanges/2.6.html#stringsplit-with-block
225
+ #
226
+ # rather than sniffing, we'll just use the compatible version for now
227
+ #
228
+ # if @remove_empty_fields
229
+ # result = []
230
+ #
231
+ # string.split(delimiter, -1) do |field|
232
+ # result << field unless field.empty?
233
+ # end
234
+ # else
235
+ # result = string.split(delimiter, -1)
236
+ # end
237
+
238
+ result = string.split(delimiter, -1)
239
+ result = result.reject(&:empty?) if @remove_empty_fields
240
+ return [result]
241
+ else
242
+ splits = parse(string, delimiter)
175
243
  end
176
244
 
177
- ncaptures = match.captures.length
178
- delimiter = Regexp.quote(delimiter) if delimiter.is_a?(String)
179
- delimiter = increment_backrefs(delimiter, ncaptures)
180
- parts = string.split(/(#{delimiter})/, -1)
181
- remove_trailing_empty_field!(parts, ncaptures)
182
- result, splits = splits_for(parts, ncaptures)
183
245
  count = splits.length
184
- block ||= positions ? match_positions(positions, action, count) : ACCEPT_ALL
185
246
 
186
- [result, block, splits, count, -1]
247
+ return [[string]] if count.zero?
248
+
249
+ block ||= compile(positions, action, count)
250
+ [[], splits, count, block]
187
251
  end
188
252
 
189
- # increment back-references so they remain valid when the outer capture
190
- # is added.
191
- #
192
- # e.g. to split on:
193
- #
194
- # - <foo-comment> ... </foo-comment>
195
- # - <bar-comment> ... </bar-comment>
196
- #
197
- # etc.
253
+ def render(values)
254
+ values.flat_map do |value|
255
+ if value.is_a?(String)
256
+ value.empty? && @remove_empty_fields ? REMOVE : [value]
257
+ elsif @include_captures
258
+ if @spread_captures
259
+ # TODO make sure compact can return a Capture
260
+ @spread_captures == :compact ? value.compact : value
261
+ elsif value.empty?
262
+ # we expose non-captures (string delimiters or regexps with no
263
+ # captures) as empty arrays inside the block, so the type is
264
+ # consistent, but it doesn't make sense to keep them in the
265
+ # result
266
+ REMOVE
267
+ else
268
+ [value]
269
+ end
270
+ else
271
+ REMOVE
272
+ end
273
+ end
274
+ end
275
+
276
+ # takes a string and a delimiter pattern (regex or string) and splits it along
277
+ # the delimiter, returning an array of objects (hashes) representing each split.
278
+ # e.g. for:
198
279
  #
199
- # before:
280
+ # parse("foo:bar:baz:quux", ":")
200
281
  #
201
- # %r| <(\w+-comment)> [^<]* </\1-comment> |x
282
+ # we return:
202
283
  #
203
- # after:
284
+ # [
285
+ # { lhs: "foo", rhs: "bar", separator: ":", captures: [] },
286
+ # { lhs: "bar", rhs: "baz", separator: ":", captures: [] },
287
+ # { lhs: "baz", rhs: "quux", separator: ":", captures: [] },
288
+ # ]
204
289
  #
205
- # %r| ( <(\w+-comment)> [^<]* </\2-comment> ) |x
290
+ def parse(string, delimiter)
291
+ # has_names = delimiter.is_a?(Regexp) && !delimiter.names.empty?
292
+ result = []
293
+ start = 0
206
294
 
207
- def increment_backrefs(delimiter, ncaptures)
208
- if delimiter.is_a?(Regexp) && ncaptures > 0
209
- delimiter = delimiter.to_s.gsub(/\\(?:(\d+)|.)/) do
210
- match = Regexp.last_match
211
- match[1] ? '\\' + match[1].to_i.next.to_s : match[0]
212
- end
295
+ # we don't use the argument passed to the +scan+ block here because it's a
296
+ # string (the separator) if there are no captures, rather than an empty
297
+ # array. we use match.captures instead to get the array
298
+ string.scan(delimiter) do
299
+ match = Regexp.last_match
300
+ index, after = match.offset(0)
301
+ separator = match[0]
302
+
303
+ # ignore empty separators at the beginning and/or end of the string
304
+ next if separator.empty? && (index.zero? || after == string.length)
305
+
306
+ lhs = string.slice(start, index - start)
307
+ result.last.rhs = lhs unless result.empty?
308
+
309
+ # this is correct for the last/only match, but gets updated to the next
310
+ # match's lhs for other matches
311
+ rhs = match.post_match
312
+
313
+ # captures = (has_names ? Captures.new(match) : match.captures)
314
+
315
+ result << Split.new(
316
+ captures: match.captures,
317
+ lhs: lhs,
318
+ rhs: rhs,
319
+ separator: separator
320
+ )
321
+
322
+ # advance the start index (the start of the next lhs) to the position
323
+ # after the last character of the separator
324
+ start = after
213
325
  end
214
326
 
215
- delimiter
327
+ result
216
328
  end
217
329
 
218
- # work around Ruby's (and Perl's and Groovy's) unhelpful behavior when splitting
219
- # on an empty string/pattern without removing trailing empty fields e.g.:
330
+ # returns a lambda which splits at (i.e. accepts or rejects splits at, depending
331
+ # on the action) the supplied positions
220
332
  #
221
- # "foobar".split("", -1)
222
- # "foobar".split(//, -1)
223
- # # => ["f", "o", "o", "b", "a", "r", ""]
333
+ # positions are preprocessed to support negative indices, infinite ranges, and
334
+ # descending ranges, e.g.:
224
335
  #
225
- # "foobar".split(/()/, -1)
226
- # # => ["f", "", "o", "", "o", "", "b", "", "a", "", "r", "", ""]
336
+ # ss.split("foo:bar:baz:quux", ":", at: -1)
227
337
  #
228
- # "foobar".split(/(())/, -1)
229
- # # => ["f", "", "", "o", "", "", "o", "", "", "b", "", "", "a", "", "", "r", "", "", ""]
338
+ # translates to:
230
339
  #
231
- # *there is no such thing as an empty field whose separator is empty*, so
232
- # if String#split's result ends with an empty separator, 0 or more (empty)
233
- # captures and an empty field, we can safely remove them.
234
-
235
- def remove_trailing_empty_field!(parts, ncaptures)
236
- # the trailing field is at index -1. if there are 0 captures, the separator
237
- # is at -2:
238
- #
239
- # [empty_separator, empty_field]
240
- #
241
- # if there is 1 capture, the separator is at -3:
242
- #
243
- # [empty_separator, capture, empty_field]
340
+ # ss.split("foo:bar:baz:quux", ":", at: 3)
341
+ #
342
+ # and
343
+ #
344
+ # ss.split("1:2:3:4:5:6:7:8:9", ":", at: -3..)
345
+ #
346
+ # translates to:
347
+ #
348
+ # ss.split("1:2:3:4:5:6:7:8:9", ":", at: 6..8)
349
+ #
350
+ def compile(positions, action, count)
351
+ # XXX note: we don't use modulo, because we don't want
352
+ # out-of-bounds indices to silently work, e.g. we don't want:
244
353
  #
245
- # etc. therefore we find the separator by walking back
354
+ # ss.split("foo:bar:baz:quux", ":", at: -42)
246
355
  #
247
- # 1 (empty field)
248
- # + ncaptures
249
- # + 1 (separator)
356
+ # to mysteriously match when the index/position is 0/1
250
357
  #
251
- # steps from the end of the array i.e. ncaptures + 2
252
- count = ncaptures + 2
253
- separator_index = count * -1
254
-
255
- return unless parts[-1].empty? && parts[separator_index].empty?
256
-
257
- # drop the empty separator, the (empty) captures, and the trailing empty field
258
- parts.pop(count)
259
- end
260
-
261
- def match_positions(positions, action, nsplits)
262
- positions = Array(positions).map do |position|
263
- if position.is_a?(Integer) && position.negative?
264
- # translate negative indices to 1-based non-negative indices e.g:
265
- #
266
- # ss.split("foo:bar:baz:quux", ":", at: -1)
267
- #
268
- # translates to:
269
- #
270
- # ss.split("foo:bar:baz:quux", ":", at: 3)
271
- #
272
- # XXX note: we don't use modulo, because we don't want
273
- # out-of-bounds indices to silently work e.g. we don't want:
274
- #
275
- # ss.split("foo:bar:baz:quux", ":", -42)
276
- #
277
- # to mysteriously match when the position is 2
278
-
279
- nsplits + 1 + position
358
+ resolve = ->(int) { int.negative? ? count + 1 + int : int }
359
+
360
+ # don't use Array(...) to wrap these as we don't want to convert ranges
361
+ positions = positions.is_a?(Array) ? positions : [positions]
362
+
363
+ positions = positions.map do |position|
364
+ if position.is_a?(Integer)
365
+ resolve[position]
366
+ elsif position.is_a?(Range)
367
+ rbegin = position.begin
368
+ rend = position.end
369
+ rexc = position.exclude_end?
370
+
371
+ if rbegin.nil?
372
+ Range.new(1, resolve[rend], rexc)
373
+ elsif rend.nil?
374
+ Range.new(resolve[rbegin], count, rexc)
375
+ elsif rbegin.negative? || rend.negative? || (rend - rbegin).negative?
376
+ from = resolve[rbegin]
377
+ to = resolve[rend]
378
+ to < from ? Range.new(to, from, rexc) : Range.new(from, to, rexc)
379
+ else
380
+ position
381
+ end
382
+ elsif position.is_a?(Set)
383
+ position.map { |it| resolve[it] }.to_set
280
384
  else
281
385
  position
282
386
  end
283
387
  end
284
388
 
285
- match = action == :select
286
-
287
- lambda do |split|
288
- case split.position when *positions then match else !match end
289
- end
389
+ ->(split) { case split.position when *positions then action else !action end }
290
390
  end
291
391
  end
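The position preprocessing described in the `compile` comments above can be tried in isolation. The following is a minimal sketch, not the gem's API: `normalize_positions` and `position_match?` are hypothetical names, and `count` is assumed to be the total number of splits. It follows the same rules as the diff: negative values count back from the last split without modulo, open-ended ranges are closed against `1` or `count`, and descending ranges are flipped. The range literals assume a Ruby version that supports endless ranges.

```ruby
require 'set'

# Hypothetical standalone helper (not part of the gem): resolve user-supplied
# positions against a known number of splits.
def normalize_positions(positions, count)
  # negative positions count back from the last split; no modulo is applied,
  # so out-of-bounds values stay out of bounds and never match
  resolve = ->(int) { int.negative? ? count + 1 + int : int }

  # wrap manually: Array(1..3) would expand the range into [1, 2, 3]
  positions = positions.is_a?(Array) ? positions : [positions]

  positions.map do |position|
    case position
    when Integer
      resolve[position]
    when Range
      first = position.begin
      last = position.end
      exclusive = position.exclude_end?

      if first.nil?
        Range.new(1, resolve[last], exclusive)
      elsif last.nil?
        Range.new(resolve[first], count, exclusive)
      else
        from = resolve[first]
        to = resolve[last]
        to < from ? Range.new(to, from, exclusive) : Range.new(from, to, exclusive)
      end
    when Set
      position.map { |n| resolve[n] }.to_set
    else
      position
    end
  end
end

# integers, ranges and sets all respond to ===, so the normalized list can be
# splatted into a case/when to test a 1-based split position
def position_match?(position, normalized)
  case position when *normalized then true else false end
end
```

With 9 fields there are 8 splits, so `normalize_positions(-3.., 8)` yields `[6..8]`, which matches the translation given in the comment. An out-of-range value such as `-42` resolves to a negative number and therefore matches no position.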
@@ -0,0 +1,51 @@
1
+ # frozen_string_literal: true
2
+
3
+ class StringSplitter
4
+ class Split
5
+ attr_reader :captures, :count, :index, :lhs, :position, :rhs, :separator
6
+ attr_writer :rhs
7
+ alias pos position
8
+
9
+ def initialize(captures:, lhs:, rhs:, separator:)
10
+ @captures = captures
11
+ @lhs = lhs
12
+ @rhs = rhs
13
+ @separator = separator
14
+ end
15
+
16
+ # 0-based index relative to the end of the array, e.g. for 5 items:
17
+ #
18
+ # index | rindex
19
+ # ------|-------
20
+ # 0 | 4
21
+ # 1 | 3
22
+ # 2 | 2
23
+ # 3 | 1
24
+ # 4 | 0
25
+ def rindex
26
+ @count - @position
27
+ end
28
+
29
+ # 1-based position relative to the end of the array, e.g. for 5 items:
30
+ #
31
+ # position | rposition
32
+ # ----------|----------
33
+ # 1 | 5
34
+ # 2 | 4
35
+ # 3 | 3
36
+ # 4 | 2
37
+ # 5 | 1
38
+ def rposition
39
+ @count + 1 - @position
40
+ end
41
+
42
+ alias rpos rposition
43
+
44
+ def update!(count:, index:)
45
+ @count = count
46
+ @index = index
47
+ @position = index + 1
48
+ freeze
49
+ end
50
+ end
51
+ end
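The two tables in the `Split` comments above can be reproduced with a small stand-in. This is a sketch only: `SplitSketch` is a hypothetical struct that keeps just `count` and `index`, and derives `position`, `rindex` and `rposition` with the same arithmetic as the class in the diff.

```ruby
# Hypothetical stand-in for the gem's Split class, reduced to the fields
# needed for the reverse index/position arithmetic.
SplitSketch = Struct.new(:count, :index) do
  # 1-based position, as assigned by update! in the diff
  def position
    index + 1
  end

  # 0-based index relative to the end of the array
  def rindex
    count - position
  end

  # 1-based position relative to the end of the array
  def rposition
    count + 1 - position
  end
end

splits = Array.new(5) { |index| SplitSketch.new(5, index) }

rindexes = splits.map(&:rindex)       # => [4, 3, 2, 1, 0]
rpositions = splits.map(&:rposition)  # => [5, 4, 3, 2, 1]
```

For every split, `index + rindex` equals `count - 1` and `position + rposition` equals `count + 1`, which is a quick way to sanity-check either value.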
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  class StringSplitter
4
- VERSION = '0.5.0'
4
+ VERSION = '0.7.2'
5
5
  end
metadata CHANGED
@@ -1,71 +1,57 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: string_splitter
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.5.0
4
+ version: 0.7.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - chocolateboy
8
- autorequire:
8
+ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-06-25 00:00:00.000000000 Z
11
+ date: 2020-08-22 00:00:00.000000000 Z
12
12
  dependencies:
13
- - !ruby/object:Gem::Dependency
14
- name: values
15
- requirement: !ruby/object:Gem::Requirement
16
- requirements:
17
- - - "~>"
18
- - !ruby/object:Gem::Version
19
- version: '1.8'
20
- type: :runtime
21
- prerelease: false
22
- version_requirements: !ruby/object:Gem::Requirement
23
- requirements:
24
- - - "~>"
25
- - !ruby/object:Gem::Version
26
- version: '1.8'
27
13
  - !ruby/object:Gem::Dependency
28
14
  name: bundler
29
15
  requirement: !ruby/object:Gem::Requirement
30
16
  requirements:
31
17
  - - "~>"
32
18
  - !ruby/object:Gem::Version
33
- version: '1.16'
19
+ version: '2.1'
34
20
  type: :development
35
21
  prerelease: false
36
22
  version_requirements: !ruby/object:Gem::Requirement
37
23
  requirements:
38
24
  - - "~>"
39
25
  - !ruby/object:Gem::Version
40
- version: '1.16'
26
+ version: '2.1'
41
27
  - !ruby/object:Gem::Dependency
42
28
  name: minitest
43
29
  requirement: !ruby/object:Gem::Requirement
44
30
  requirements:
45
31
  - - "~>"
46
32
  - !ruby/object:Gem::Version
47
- version: '5.11'
33
+ version: '5.0'
48
34
  type: :development
49
35
  prerelease: false
50
36
  version_requirements: !ruby/object:Gem::Requirement
51
37
  requirements:
52
38
  - - "~>"
53
39
  - !ruby/object:Gem::Version
54
- version: '5.11'
40
+ version: '5.0'
55
41
  - !ruby/object:Gem::Dependency
56
42
  name: minitest-power_assert
57
43
  requirement: !ruby/object:Gem::Requirement
58
44
  requirements:
59
45
  - - "~>"
60
46
  - !ruby/object:Gem::Version
61
- version: 0.3.0
47
+ version: '0.3'
62
48
  type: :development
63
49
  prerelease: false
64
50
  version_requirements: !ruby/object:Gem::Requirement
65
51
  requirements:
66
52
  - - "~>"
67
53
  - !ruby/object:Gem::Version
68
- version: 0.3.0
54
+ version: '0.3'
69
55
  - !ruby/object:Gem::Dependency
70
56
  name: minitest-reporters
71
57
  requirement: !ruby/object:Gem::Requirement
@@ -86,29 +72,15 @@ dependencies:
86
72
  requirements:
87
73
  - - "~>"
88
74
  - !ruby/object:Gem::Version
89
- version: '10.0'
90
- type: :development
91
- prerelease: false
92
- version_requirements: !ruby/object:Gem::Requirement
93
- requirements:
94
- - - "~>"
95
- - !ruby/object:Gem::Version
96
- version: '10.0'
97
- - !ruby/object:Gem::Dependency
98
- name: rubocop
99
- requirement: !ruby/object:Gem::Requirement
100
- requirements:
101
- - - "~>"
102
- - !ruby/object:Gem::Version
103
- version: 0.54.0
75
+ version: '13.0'
104
76
  type: :development
105
77
  prerelease: false
106
78
  version_requirements: !ruby/object:Gem::Requirement
107
79
  requirements:
108
80
  - - "~>"
109
81
  - !ruby/object:Gem::Version
110
- version: 0.54.0
111
- description:
82
+ version: '13.0'
83
+ description:
112
84
  email: chocolate@cpan.org
113
85
  executables: []
114
86
  extensions: []
@@ -118,6 +90,7 @@ files:
118
90
  - LICENSE.md
119
91
  - README.md
120
92
  - lib/string_splitter.rb
93
+ - lib/string_splitter/split.rb
121
94
  - lib/string_splitter/version.rb
122
95
  homepage: https://github.com/chocolateboy/string_splitter
123
96
  licenses:
@@ -127,7 +100,7 @@ metadata:
127
100
  bug_tracker_uri: https://github.com/chocolateboy/string_splitter/issues
128
101
  changelog_uri: https://github.com/chocolateboy/string_splitter/blob/master/CHANGELOG.md
129
102
  source_code_uri: https://github.com/chocolateboy/string_splitter
130
- post_install_message:
103
+ post_install_message:
131
104
  rdoc_options: []
132
105
  require_paths:
133
106
  - lib
@@ -135,16 +108,15 @@ required_ruby_version: !ruby/object:Gem::Requirement
135
108
  requirements:
136
109
  - - ">="
137
110
  - !ruby/object:Gem::Version
138
- version: '0'
111
+ version: '2.3'
139
112
  required_rubygems_version: !ruby/object:Gem::Requirement
140
113
  requirements:
141
114
  - - ">="
142
115
  - !ruby/object:Gem::Version
143
116
  version: '0'
144
117
  requirements: []
145
- rubyforge_project:
146
- rubygems_version: 2.7.7
147
- signing_key:
118
+ rubygems_version: 3.1.4
119
+ signing_key:
148
120
  specification_version: 4
149
121
  summary: String#split on steroids
150
122
  test_files: []