string_splitter 0.5.1 → 0.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 582dd9d8bae0421a49348bf0ccade081a4cc448e8e27943dcb67004b1b684f6d
4
- data.tar.gz: 10990476dec6bf7edc909cd8558d0404fd9295820238ac527ebf3294454815a2
3
+ metadata.gz: 9d97ccb956fe51694359cdb0d3a997d6574de088bac6ed5a8e572f92bb5ed54a
4
+ data.tar.gz: 845cefeb5efd5d01baa45759cb05ff7ae5e9a457c1f148b340bb24c038bd259e
5
5
  SHA512:
6
- metadata.gz: 666914aa76ca9f425dc7ef60b0110dbb1239fad3ae44ac49ba0ee59531b93d800cb2ca475c524ee359dbde4b21a0b97a89fa3f6910bb78d1b6737729ffddc1a9
7
- data.tar.gz: 4c9522bcc4e858a98e4b9c79abe2ecf845b0a8209479b802637936215c0a5c02e9c0853f103779618636774ec5ce55a7157ea8144eaadaa97f918a94e062d4e9
6
+ metadata.gz: 7a935a6e0f3434801dcae6a32575779e1d2eb706f8f208087a208e7fdba39ac5b49928f8b7617aec60493a8db5988a013028650f8b2ced01fadb620bfd4c77e5
7
+ data.tar.gz: d76c18a283c1e113c8bffb73b813eb6074481faa7ea339811dc9a7424a5e24fdc3efbe9afa941459e566cde8271c3cd19a97e3a37a8cf90d36a65a7bf8fd6dcf
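`checksums.yaml` pins SHA256 and SHA512 digests for the gem's two archive members, `metadata.gz` and `data.tar.gz`. Below is a minimal verification sketch that uses Ruby's standard `digest` library. It assumes the `.gem` file, which is a plain tar archive, has already been extracted so both members are on disk. `digest_matches?` is a hypothetical helper, not part of RubyGems.

```ruby
require 'digest'

# Hypothetical helper: compare a file's hex digest with a recorded value.
# algorithm is :sha256 or :sha512, matching the keys in checksums.yaml.
def digest_matches?(path, expected_hex, algorithm: :sha256)
  klass = algorithm == :sha512 ? Digest::SHA512 : Digest::SHA256
  klass.file(path).hexdigest == expected_hex.downcase
end

# Usage, assuming the 0.6.0 gem was extracted into the current directory:
# digest_matches?("data.tar.gz",
#   "845cefeb5efd5d01baa45759cb05ff7ae5e9a457c1f148b340bb24c038bd259e")
```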
data/CHANGELOG.md CHANGED
@@ -1,37 +1,74 @@
1
+ ## 0.6.0 - 2020-08-20
2
+
3
+ #### Breaking Changes
4
+
5
+ - `ss.split(str, " ")` is no longer treated the same as `ss.split(str)`, i.e.
6
+ unlike Ruby's `String#split` (but like Crystal's), the former no longer
7
+ strips the string before splitting
8
+ - rename the `remove_empty` option `remove_empty_fields`
9
+ - rename the `exclude` option `except` (alias for `reject`)
10
+
11
+ #### Fixes
12
+
13
+ - correctly handle backreferences in delimiter patterns
14
+
15
+ #### Features
16
+
17
+ - add support for descending, negative, and infinite ranges,
18
+ e.g. `ss.split(str, ":", at: [..4, 4..., 3..1, -1..-3])`
19
+
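For comparison, here is the core-Ruby behavior that the breaking change above stops emulating. `String#split` treats a single-space string delimiter as a request for awk-style splitting: leading and trailing whitespace is ignored, and any run of whitespace counts as one delimiter. A regex that matches a single space gets no such special treatment. The snippet uses only core Ruby, not StringSplitter.

```ruby
str = " foo  bar\tbaz "

str.split(" ")     # awk-style: => ["foo", "bar", "baz"]
str.split(/ /)     # single spaces only; trailing empty fields dropped
                   # => ["", "foo", "", "bar\tbaz"]
str.split(/ /, -1) # a negative limit keeps trailing empty fields
                   # => ["", "foo", "", "bar\tbaz", ""]
```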
1
20
  ## 0.5.1 - 2018-07-01
2
21
 
22
+ #### Changes
23
+
3
24
  - set StringSplitter::VERSION when `string_splitter.rb` is loaded
4
- - doc tweaks
5
25
 
6
26
  ## 0.5.0 - 2018-06-26
7
27
 
28
+ #### Fixes
29
+
8
30
  - don't treat string delimiters as patterns
31
+
32
+ #### Features
33
+
9
34
  - add a `reject`/`exclude` option which rejects splits at the specified positions
10
35
  - add a `select` alias for `at`
11
36
 
12
37
  ## 0.4.0 - 2018-06-24
13
38
 
14
- - **breaking change**: remove the `offset` alias for `split.index`
39
+ #### Breaking Changes
40
+
41
+ - remove the `offset` alias for `split.index`
15
42
 
16
43
  ## 0.3.1 - 2018-06-24
17
44
 
18
- - remove trailing empty field when the separator is empty ([#1](https://github.com/chocolateboy/string_splitter/issues/1))
45
+ #### Fixes
46
+
47
+ - remove trailing empty field when the separator is empty
48
+ ([#1](https://github.com/chocolateboy/string_splitter/issues/1))
19
49
 
20
50
  ## 0.3.0 - 2018-06-23
21
51
 
22
- - **breaking change**: rename the `default_separator` option to `default_delimiter`
23
- - to avoid ambiguity in the code, refer to the input pattern/string as the
24
- "delimiter" and the matched string as the "separator"
52
+ #### Breaking Changes
53
+
54
+ - rename the `default_separator` option `default_delimiter`
25
55
 
26
56
  ## 0.2.0 - 2018-06-22
27
57
 
28
- - **breaking change**: make `index` (AKA `offset`) 0-based and add `position`
29
- (AKA `pos`) as the 1-based accessor
58
+ #### Breaking Changes
59
+
60
+ - make `index` (AKA `offset`) 0-based and add `position` (AKA `pos`) as the
61
+ 1-based accessor
30
62
 
31
63
  ## 0.1.0 - 2018-06-22
32
64
 
33
- - **breaking change**: the block now takes a single `split` object with an
34
- `index` accessor, rather than seperate `index` and `split` arguments
65
+ #### Breaking Changes
66
+
67
+ - the block now takes a single `split` object with an `index` accessor, rather
68
+ than separate `index` and `split` arguments
69
+
70
+ #### Features
71
+
35
72
  - add support for negative indices in the value supplied to the `at` option
36
73
  - add a `count` field to the split object containing the total number of splits
37
74
 
data/README.md CHANGED
@@ -3,14 +3,15 @@
3
3
  [![Build Status](https://travis-ci.org/chocolateboy/string_splitter.svg)](https://travis-ci.org/chocolateboy/string_splitter)
4
4
  [![Gem Version](https://img.shields.io/gem/v/string_splitter.svg)](https://rubygems.org/gems/string_splitter)
5
5
 
6
- <!-- START doctoc generated TOC please keep comment here to allow auto update -->
7
- <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
6
+ <!-- toc -->
8
7
 
9
8
  - [NAME](#name)
10
9
  - [INSTALLATION](#installation)
11
10
  - [SYNOPSIS](#synopsis)
12
11
  - [DESCRIPTION](#description)
13
12
  - [WHY?](#why)
13
+ - [CAVEATS](#caveats)
14
+ - [Differences from String#split](#differences-from-string%23split)
14
15
  - [COMPATIBILITY](#compatibility)
15
16
  - [VERSION](#version)
16
17
  - [SEE ALSO](#see-also)
@@ -19,7 +20,7 @@
19
20
  - [AUTHOR](#author)
20
21
  - [COPYRIGHT AND LICENSE](#copyright-and-license)
21
22
 
22
- <!-- END doctoc generated TOC please keep comment here to allow auto update -->
23
+ <!-- tocstop -->
23
24
 
24
25
  # NAME
25
26
 
@@ -42,16 +43,28 @@ ss = StringSplitter.new
42
43
  **Same as `String#split`**
43
44
 
44
45
  ```ruby
45
- ss.split("foo bar baz quux")
46
- ss.split("foo bar baz quux", " ")
47
- ss.split("foo bar baz quux", /\s+/)
48
- # => ["foo", "bar", "baz", "quux"]
46
+ ss.split("foo bar baz")
47
+ ss.split(" foo bar baz ")
48
+ # => ["foo", "bar", "baz"]
49
+ ```
50
+
51
+ ```ruby
52
+ ss.split("foo", "")
53
+ ss.split("foo", //)
54
+ # => ["f", "o", "o"]
55
+ ```
56
+
57
+ ```ruby
58
+ ss.split("", "...")
59
+ ss.split("", /.../)
60
+ # => []
49
61
  ```
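The added examples match core Ruby's own results, which you can check without StringSplitter installed:

```ruby
# an empty delimiter (string or regex) splits between characters
"foo".split("") # => ["f", "o", "o"]
"foo".split(//) # => ["f", "o", "o"]

# an empty string splits into no fields at all
"".split("...") # => []
"".split(/.../) # => []

# with no delimiter, leading and trailing whitespace is ignored
" foo bar baz ".split # => ["foo", "bar", "baz"]
```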
50
62
 
51
63
  **Split at the first delimiter**
52
64
 
53
65
  ```ruby
54
66
  ss.split("foo:bar:baz:quux", ":", at: 1)
67
+ ss.split("foo:bar:baz:quux", ":", select: 1)
55
68
  # => ["foo", "bar:baz:quux"]
56
69
  ```
57
70
 
@@ -65,8 +78,16 @@ ss.split("foo:bar:baz:quux", ":", at: -1)
65
78
  **Split at multiple delimiter positions**
66
79
 
67
80
  ```ruby
68
- ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -2])
69
- # => ["1", "2", "3", "4:5:6:7", "8:9"]
81
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -1])
82
+ # => ["1", "2", "3", "4:5:6:7:8", "9"]
83
+ ```
84
+
85
+ **Split at all but the first and last delimiters**
86
+
87
+ ```ruby
88
+ ss.split("1:2:3:4:5:6", ":", except: [1, -1])
89
+ ss.split("1:2:3:4:5:6", ":", reject: [1, -1])
90
+ # => ["1:2", "3", "4", "5:6"]
70
91
  ```
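The `at`/`select` and `except`/`reject` semantics can be sketched in plain Ruby for literal string delimiters. The approach is to split with a negative limit so that every field is kept, then glue neighboring fields back together wherever the delimiter at that 1-based position is not accepted. `split_at_positions` is a hypothetical helper for illustration only, not part of StringSplitter's API. It handles integers (negative ones count from the end) and ascending non-negative ranges.

```ruby
# Hypothetical helper (not StringSplitter's API). Splits `string` on a literal,
# non-empty `delimiter` (other than " ", which would trigger String#split's
# awk-style mode) and keeps only the delimiters at accepted 1-based positions.
def split_at_positions(string, delimiter, select: nil, reject: nil)
  fields = string.split(delimiter, -1) # -1 keeps trailing empty fields
  return [] if fields.empty?

  nsplits = fields.length - 1
  resolve = ->(pos) { pos.is_a?(Integer) && pos.negative? ? nsplits + 1 + pos : pos }
  listed = ->(positions, pos) { positions.map(&resolve).any? { |p| p === pos } }

  accept =
    if reject
      ->(pos) { !listed.(reject, pos) }
    elsif select
      ->(pos) { listed.(select, pos) }
    else
      ->(_pos) { true }
    end

  result = [fields.first]
  fields.drop(1).each.with_index(1) do |field, pos|
    if accept.(pos)
      result << field # accepted: start a new field
    else
      result[-1] += delimiter + field # rejected: glue onto the previous field
    end
  end
  result
end

split_at_positions("1:2:3:4:5:6:7:8:9", ":", select: [1..3, -1])
# => ["1", "2", "3", "4:5:6:7:8", "9"]
split_at_positions("1:2:3:4:5:6", ":", reject: [1, -1])
# => ["1:2", "3", "4", "5:6"]
```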
71
92
 
72
93
  **Split from the right**
@@ -75,44 +96,79 @@ ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -2])
75
96
  ss.rsplit("1:2:3:4:5:6:7:8:9", ":", at: [1..3, 5])
76
97
  # => ["1:2:3:4", "5:6", "7", "8", "9"]
77
98
  ```
99
+
100
+ **Split with negative, descending, and infinite ranges**
101
+
102
+ ```ruby
103
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: 4...)
104
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: [4...])
105
+ # => ["1:2:3:4", "5", "6", "7", "8:9"]
106
+ ```
107
+
108
+ ```ruby
109
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: ..-3)
110
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: [..-3])
111
+ # => ["1", "2", "3", "4", "5", "6", "7:8:9"]
112
+ ```
113
+
114
+ ```ruby
115
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1, 5..3, -2..])
116
+ # => ["1", "2:3", "4", "5", "6:7", "8", "9"]
117
+ ```
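These examples use Ruby's endless ranges (`4..`, available since Ruby 2.6) and beginless ranges (`..-3`, available since Ruby 2.7). A few core-Ruby facts show why such positions must be normalized before they can be matched. An open end is stored as `nil`, and a descending range on its own covers nothing:

```ruby
(4..).begin         # => 4
(4..).end           # => nil (an endless range)
(..-3).begin        # => nil (a beginless range)
(4...).exclude_end? # => true

# a descending range is empty and matches nothing...
(5..3).to_a         # => []
(5..3) === 4        # => false
# ...so it has to be flipped before it is usable in a case/when
(3..5) === 4        # => true
```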
118
+
78
119
  **Full control via a block**
79
120
 
80
121
  ```ruby
81
- result = ss.split('a:a:a:b:c:c:e:a:a:d:c', ":") do |split|
82
- split.index > 0 && split.lhs == split.rhs
122
+ result = ss.split("1:2:3:4:5:6:7:8", ":") do |split|
123
+ split.pos % 2 == 0
83
124
  end
84
- # => ["a:a", "a:b:c", "c:e:a", "a:d:c"]
125
+ # => ["1:2", "3:4", "5:6", "7:8"]
126
+ ```
127
+
128
+ ```ruby
129
+ string = "banana".chars.sort.join # "aaabnn"
130
+
131
+ ss.split(string, "") do |split|
132
+ split.rhs != split.lhs
133
+ end
134
+ # => ["aaa", "b", "nn"]
85
135
  ```
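The block form can be approximated in plain Ruby for literal delimiters. In the sketch below, `split_with` and `SplitInfo` are hypothetical. They are loosely modeled on the `lhs`, `rhs` and `position` fields of StringSplitter's split object. Each candidate split is passed to the block and kept only if the block returns a truthy value.

```ruby
# Hypothetical stand-in for StringSplitter's split object (lhs, rhs, position)
SplitInfo = Struct.new(:lhs, :rhs, :position)

# Splits on a literal delimiter ("" means between every character; " " would
# hit String#split's awk-style mode), passing each candidate split to the block:
# truthy accepts it, falsy glues both sides back together. lhs/rhs are the raw
# fields on either side of that delimiter.
def split_with(string, delimiter)
  fields = delimiter.empty? ? string.chars : string.split(delimiter, -1)
  return [] if fields.empty?

  result = [fields.first]
  fields.each_cons(2).with_index(1) do |(lhs, rhs), position|
    if yield(SplitInfo.new(lhs, rhs, position))
      result << rhs
    else
      result[-1] += delimiter + rhs
    end
  end
  result
end

split_with("aaabnn", "") { |split| split.lhs != split.rhs }
# => ["aaa", "b", "nn"]
```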
86
136
 
87
137
  # DESCRIPTION
88
138
 
89
- Many languages have built-in `split` functions/methods for strings. They behave similarly
90
- (notwithstanding the occasional [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)),
91
- and handle a few common cases e.g.:
139
+ Many languages have built-in `split` functions/methods for strings. They behave
140
+ similarly (notwithstanding the occasional
141
+ [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)), and
142
+ handle a few common cases e.g.:
92
143
 
93
144
  * limiting the number of splits
94
145
  * including the separator(s) in the results
95
146
  * removing (some) empty fields
96
147
 
97
- But, because the API is squeezed into two overloaded parameters (the delimiter and the limit),
98
- achieving the desired results can be tricky. For instance, while `String#split` removes empty
99
- trailing fields (by default), it provides no way to remove *all* empty fields. Likewise, the
100
- cramped API means there's no way to e.g. combine a limit (positive integer) with the option
101
- to preserve empty fields (negative integer), or use backreferences in a delimiter pattern
148
+ But, because the API is squeezed into two overloaded parameters (the delimiter
149
+ and the limit), achieving the desired results can be tricky. For instance,
150
+ while `String#split` removes empty trailing fields (by default), it provides no
151
+ way to remove *all* empty fields. Likewise, the cramped API means there's no
152
+ way to e.g. combine a limit (positive integer) with the option to preserve
153
+ empty fields (negative integer), or use backreferences in a delimiter pattern
102
154
  without including its captured subexpressions in the result.
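These limitations are easy to see in core Ruby. Trailing empty fields are dropped by default, a negative limit keeps them, and a positive limit caps the number of fields. Removing *all* empty fields takes a separate pass. Any group captured by the delimiter is spliced into the result, including one that exists only so a backreference can refer to it:

```ruby
":foo:bar:baz:".split(":")     # => ["", "foo", "bar", "baz"] (trailing "" dropped)
":foo:bar:baz:".split(":", -1) # => ["", "foo", "bar", "baz", ""]
"foo:bar:baz".split(":", 2)    # => ["foo", "bar:baz"]

# removing *all* empty fields needs a second pass
":foo:bar:baz:".split(":").reject(&:empty?) # => ["foo", "bar", "baz"]

# split on any doubled character: the group that \1 refers to is a capture,
# so its contents end up in the result
"xaaybbz".split(/(.)\1/) # => ["x", "a", "y", "b", "z"]
```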
103
155
 
104
- If `split` was being written from scratch, without the baggage of its legacy API,
105
- it's possible that some of these options would be made explicit rather than overloading
106
- the parameters. And, indeed, this is possible in some implementations,
107
- e.g. in Crystal:
156
+ If `split` were being written from scratch, without the baggage of its legacy
157
+ API, it's possible that some of these options would be made explicit rather
158
+ than overloading the parameters. And, indeed, this is possible in some
159
+ implementations, e.g. in Crystal:
108
160
 
109
161
  ```ruby
110
- ":foo:bar:baz:".split(":", remove_empty: false) # => ["", "foo", "bar", "baz", ""]
111
- ":foo:bar:baz:".split(":", remove_empty: true) # => ["foo", "bar", "baz"]
162
+ ":foo:bar:baz:".split(":", remove_empty: false)
163
+ # => ["", "foo", "bar", "baz", ""]
164
+
165
+ ":foo:bar:baz:".split(":", remove_empty: true)
166
+ # => ["foo", "bar", "baz"]
112
167
  ```
113
168
 
114
- StringSplitter takes this one step further by moving the configuration out of the method altogether
115
- and delegating the strategy — i.e. which splits should be accepted or rejected — to a block:
169
+ StringSplitter takes this one step further by moving the configuration out of
170
+ the method altogether and delegating the strategy — i.e. which splits should be
171
+ accepted or rejected — to a block:
116
172
 
117
173
  ```ruby
118
174
  ss = StringSplitter.new
@@ -120,22 +176,28 @@ ss = StringSplitter.new
120
176
  ss.split("foo:bar:baz", ":") { |split| split.index == 0 }
121
177
  # => ["foo", "bar:baz"]
122
178
 
123
- ss.split("foo:bar:baz", ":") { |split| split.position == split.count }
124
- # => ["foo:bar", "baz"]
179
+ ss.split("foo:bar:baz:quux", ":") do |split|
180
+ split.position == 1 || split.position == 3
181
+ end
182
+ # => ["foo", "bar:baz", "quux"]
125
183
  ```
126
184
 
127
- As a shortcut, the common case of splitting on delimiters at one or more positions is supported by an option:
185
+ As a shortcut, the common case of splitting on delimiters at one or more
186
+ positions is supported by an option:
128
187
 
129
188
  ```ruby
130
- ss.split('foo:bar:baz:quux', ':', at: [1, -1]) # => ["foo", "bar:baz", "quux"]
189
+ ss.split("foo:bar:baz:quux", ":", at: [1, -1])
190
+ # => ["foo", "bar:baz", "quux"]
131
191
  ```
132
192
 
133
193
  # WHY?
134
194
 
135
- I wanted to split semi-structured output into fields without having to resort to a regex or a full-blown parser.
195
+ I wanted to split semi-structured output into fields without having to resort
196
+ to a regex or a full-blown parser.
136
197
 
137
- As an example, the nominally unstructured output of many Unix commands is often formatted in a way
138
- that's tantalizingly close to being [machine-readable](https://en.wikipedia.org/wiki/Delimiter-separated_values),
198
+ As an example, the nominally unstructured output of many Unix commands is often
199
+ formatted in a way that's tantalizingly close to being
200
+ [machine-readable](https://en.wikipedia.org/wiki/Delimiter-separated_values),
139
201
  apart from a few pesky exceptions e.g.:
140
202
 
141
203
  ```bash
@@ -148,8 +210,8 @@ drwxr-xr-x 3 user users 4096 Jun 19 22:56 lib
148
210
  -rw-r--r-- 1 user users 3134 Jun 19 22:59 README.md
149
211
  ```
150
212
 
151
- These lines can *almost* be parsed into an array of fields by splitting them on whitespace. The exception is the
152
- date (columns 6-8) i.e.:
213
+ These lines can *almost* be parsed into an array of fields by splitting them on
214
+ whitespace. The exception is the date (columns 6-8) i.e.:
153
215
 
154
216
  ```ruby
155
217
  line = "-rw-r--r-- 1 user users 87 Jun 18 18:16 CHANGELOG.md"
@@ -174,13 +236,14 @@ One way to work around this is to parse the whole line e.g.:
174
236
  line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+ \S+) \s+ (.+)$/x)
175
237
  ```
176
238
 
177
- But that requires us to specify *everything*. What we really want is a version of `split`
178
- which allows us to veto splitting for the 6th and 7th delimiters i.e. control over which
179
- splits are accepted, rather than being restricted to the single, baked-in strategy provided
180
- by the `limit` parameter.
239
+ But that requires us to specify *everything*. What we really want is a version
240
+ of `split` which allows us to veto splitting for the 6th and 7th delimiters
241
+ (and to stop after the 8th delimiter) i.e. control over which splits are
242
+ accepted, rather than being restricted to the single, baked-in strategy
243
+ provided by the `limit` parameter.
181
244
 
182
- By providing a simple way to accept or reject each split, StringSplitter makes cases like
183
- this easy to handle, either via a block:
245
+ By providing a simple way to accept or reject each split, StringSplitter makes
246
+ cases like this easy to handle, either via a block:
184
247
 
185
248
  ```ruby
186
249
  ss.split(line) do |split|
@@ -196,14 +259,42 @@ ss.split(line, at: [1..5, 8])
196
259
  # => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
197
260
  ```
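For comparison, a plain `String#split` version of the same parse needs a limit so that the file name stays in one piece. It then has to rejoin the date columns by hand and choose a separator to put back between them. This sketch uses only core Ruby:

```ruby
line = "-rw-r--r-- 1 user users 87 Jun 18 18:16 CHANGELOG.md"

# at most 9 fields, so a file name containing spaces would stay whole
fields = line.split(" ", 9)

# rejoin columns 6-8 (the date); a single space is a guess, since `ls` may pad
# columns (e.g. "Jun  8") and that original spacing is lost
date = fields[5..7].join(" ")
[*fields[0..4], date, fields[8]]
# => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
```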
198
261
 
262
+ # CAVEATS
263
+
264
+ ## Differences from String#split
265
+
266
+ StringSplitter shares `String#split`'s behavior of trimming the string before
267
+ splitting if the delimiter is omitted, e.g.:
268
+
269
+ ```ruby
270
+ " foo bar baz ".split # => ["foo", "bar", "baz"]
271
+ ss.split(" foo bar baz ") # => ["foo", "bar", "baz"]
272
+ ```
273
+
274
+ However, unlike `String#split`, this doesn't also apply if a delimiter of `" "`
275
+ is supplied, e.g.:
276
+
277
+ ```ruby
278
+ " foo bar baz ".split(" ") # => ["foo", "bar", "baz"]
279
+ ss.split(" foo bar baz ", " ") # => ["", "foo", "bar", "baz", ""]
280
+ ```
281
+
282
+ It also doesn't apply if a custom default delimiter is defined:
283
+
284
+ ```ruby
285
+ ss = StringSplitter.new(default_delimiter: /\s+/)
286
+ ss.split(" foo bar baz ") # => ["", "foo", "bar", "baz", ""]
287
+ ```
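The last case follows from how trimming is triggered. The `init` method in the library source later in this diff strips the string only when the delimiter is the very same object as `DEFAULT_DELIMITER`, using an `equal?` identity check. An equivalent regex created separately therefore does not qualify. In core Ruby terms:

```ruby
default_delimiter = /\s+/.freeze               # stands in for StringSplitter::DEFAULT_DELIMITER
custom = Regexp.new(default_delimiter.source)  # an equivalent regex built separately

custom == default_delimiter                    # => true  (same source and options)
custom.equal?(default_delimiter)               # => false (different objects)
default_delimiter.equal?(default_delimiter)    # => true
```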
288
+
199
289
  # COMPATIBILITY
200
290
 
201
- StringSplitter is tested and supported on all versions of Ruby [supported by the ruby-core team](https://www.ruby-lang.org/en/downloads/branches/),
202
- i.e., currently, Ruby 2.3 and above.
291
+ StringSplitter is tested and supported on all versions of Ruby [supported by
292
+ the ruby-core team](https://www.ruby-lang.org/en/downloads/branches/), i.e.,
293
+ currently, Ruby 2.5 and above.
203
294
 
204
295
  # VERSION
205
296
 
206
- 0.5.1
297
+ 0.6.0
207
298
 
208
299
  # SEE ALSO
209
300
 
@@ -221,8 +312,7 @@ i.e., currently, Ruby 2.3 and above.
221
312
 
222
313
  # COPYRIGHT AND LICENSE
223
314
 
224
- Copyright © 2018 by chocolateboy.
315
+ Copyright © 2018-2020 by chocolateboy.
225
316
 
226
317
  This is free software; you can redistribute it and/or modify it under the
227
- terms of the [Artistic License 2.0](http://www.opensource.org/licenses/artistic-license-2.0.php).
228
-
318
+ terms of the [Artistic License 2.0](https://www.opensource.org/licenses/artistic-license-2.0.php).
data/lib/string_splitter.rb CHANGED
@@ -1,21 +1,45 @@
1
1
  # frozen_string_literal: true
2
2
 
3
+ require 'set'
3
4
  require 'values'
4
5
  require_relative 'string_splitter/version'
5
6
 
6
7
  # This class extends the functionality of +String#split+ by:
7
8
  #
8
9
  # - providing full control over which splits are accepted or rejected
10
+ #
9
11
  # - adding support for splitting from right-to-left
12
+ #
10
13
  # - encapsulating splitting options/preferences in the splitter rather
11
14
  # than trying to cram them into overloaded method parameters
12
15
  #
13
16
  # These enhancements allow splits to handle many cases that otherwise require bigger
14
- # guns e.g. regex matching or parsing.
17
+ # guns, e.g. regex matching or parsing.
18
+ #
19
+ # Implementation-wise, we effectively use the built-in +String#scan+ method as a
20
+ # tokenizer, and parse the resulting tokens into an array of Split objects with the
21
+ # following fields:
22
+ #
23
+ # - captures: separator substrings captured by parentheses in the delimiter pattern
24
+ # - count: the number of splits
25
+ # - index: the 0-based index of the split in the array
26
+ # - lhs: the string to the left of the separator (back to the previous split candidate)
27
+ # - position: the 1-based index of the split in the array (alias: pos)
28
+ # - rhs: the string to the right of the separator (up to the next split candidate)
29
+ # - rindex: the 0-based index of the split relative to the end of the array
30
+ # - rposition: the 1-based index of the split relative to the end of the array (alias: rpos)
31
+ # - separator: the string matched by the delimiter pattern/string
32
+ #
15
33
  class StringSplitter
34
+ # terminology: the delimiter is what we provide and the separators are what we get
35
+ # back (if we capture them). e.g. for:
36
+ #
37
+ # ss.split("foo:bar::baz", /(\W+)/)
38
+ #
39
+ # the delimiter is /(\W+)/ and the separators are ":" and "::"
40
+
16
41
  ACCEPT_ALL = ->(_split) { true }
17
- DEFAULT_DELIMITER = /\s+/
18
- NO_SPLITS = []
42
+ DEFAULT_DELIMITER = /\s+/.freeze
19
43
 
20
44
  Split = Value.new(:captures, :count, :index, :lhs, :rhs, :separator) do
21
45
  def position
@@ -23,32 +47,78 @@ class StringSplitter
23
47
  end
24
48
 
25
49
  alias_method :pos, :position
50
+
51
+ # 0-based index relative to the end of the array, e.g. for 5 items:
52
+ #
53
+ # index | rindex
54
+ # ------|-------
55
+ # 0 | 4
56
+ # 1 | 3
57
+ # 2 | 2
58
+ # 3 | 1
59
+ # 4 | 0
60
+ def rindex
61
+ count - position
62
+ end
63
+
64
+ # 1-based position relative to the end of the array, e.g. for 5 items:
65
+ #
66
+ # position | rposition
67
+ # ----------|----------
68
+ # 1 | 5
69
+ # 2 | 4
70
+ # 3 | 3
71
+ # 4 | 2
72
+ # 5 | 1
73
+ def rposition
74
+ count + 1 - position
75
+ end
76
+
77
+ alias_method :rpos, :rposition
78
+ end
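The `rindex`/`rposition` arithmetic can be checked in isolation. This sketch replaces the `Value` class from the `values` gem with a plain `Struct`. It also assumes `position` is `index + 1`: that method's body is elided in this diff, but the assumption matches the field list at the top of the file. The sketch reproduces both comment tables for 5 splits:

```ruby
# Plain-Struct stand-in for the index-related fields of StringSplitter::Split
SplitIndex = Struct.new(:count, :index) do
  def position
    index + 1 # assumed: 1-based position from the left
  end

  def rindex
    count - position # 0-based index from the right
  end

  def rposition
    count + 1 - position # 1-based position from the right
  end
end

(0...5).map do |i|
  split = SplitIndex.new(5, i)
  [split.index, split.rindex, split.position, split.rposition]
end
# => [[0, 4, 1, 5], [1, 3, 2, 4], [2, 2, 3, 3], [3, 1, 4, 2], [4, 0, 5, 1]]
```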
79
+
80
+ # simulate an enum. the value is returned by the case statement
81
+ # in the generated block if the positions match
82
+ module Action
83
+ SELECT = true
84
+ REJECT = false
26
85
  end
27
86
 
87
+ private_constant :Action
88
+
28
89
  def initialize(
29
90
  default_delimiter: DEFAULT_DELIMITER,
30
91
  include_captures: true,
31
- remove_empty: false,
92
+ remove_empty: false, # TODO remove this
93
+ remove_empty_fields: remove_empty,
32
94
  spread_captures: true
33
95
  )
34
96
  @default_delimiter = default_delimiter
35
97
  @include_captures = include_captures
36
- @remove_empty = remove_empty
98
+ @remove_empty_fields = remove_empty_fields
37
99
  @spread_captures = spread_captures
38
100
  end
39
101
 
40
- attr_reader :default_delimiter, :include_captures, :remove_empty, :spread_captures
102
+ attr_reader(
103
+ :default_delimiter,
104
+ :include_captures,
105
+ :remove_empty_fields,
106
+ :spread_captures
107
+ )
108
+
109
+ # TODO remove this
110
+ alias remove_empty remove_empty_fields
41
111
 
42
112
  def split(
43
113
  string,
44
114
  delimiter = @default_delimiter,
45
- at: nil,
115
+ at: nil, # alias for select
116
+ except: nil, # alias for reject
46
117
  select: at,
47
- exclude: nil,
48
- reject: exclude,
118
+ reject: except,
49
119
  &block
50
120
  )
51
- result, splits, block = split_init(
121
+ result, splits, count, accept = init(
52
122
  string: string,
53
123
  delimiter: delimiter,
54
124
  select: select,
@@ -56,29 +126,21 @@ class StringSplitter
56
126
  block: block
57
127
  )
58
128
 
59
- count = splits.length
129
+ return result unless splits
60
130
 
61
- splits.each_with_index do |split, index|
62
- split = Split.with(split.merge({ index: index, count: count }))
131
+ splits.each_with_index do |hash, index|
132
+ split = Split.with(hash.merge({ count: count, index: index }))
63
133
  result << split.lhs if result.empty?
64
134
 
65
- if block.call(split)
66
- if @include_captures
67
- if @spread_captures
68
- result += split.captures
69
- else
70
- result << split.captures
71
- end
72
- end
73
-
74
- result << split.rhs
135
+ if accept.call(split)
136
+ result << split.captures << split.rhs
75
137
  else
76
- # concatenate the rhs
138
+ # append the rhs
77
139
  result[-1] = result[-1] + split.separator + split.rhs
78
140
  end
79
141
  end
80
142
 
81
- result
143
+ render(result)
82
144
  end
83
145
 
84
146
  alias lsplit split
@@ -86,13 +148,13 @@ class StringSplitter
86
148
  def rsplit(
87
149
  string,
88
150
  delimiter = @default_delimiter,
89
- at: nil,
151
+ at: nil, # alias for select
152
+ except: nil, # alias for reject
90
153
  select: at,
91
- exclude: nil,
92
- reject: exclude,
154
+ reject: except,
93
155
  &block
94
156
  )
95
- result, splits, block = split_init(
157
+ result, splits, count, accept = init(
96
158
  string: string,
97
159
  delimiter: delimiter,
98
160
  select: select,
@@ -100,203 +162,193 @@ class StringSplitter
100
162
  block: block
101
163
  )
102
164
 
103
- count = splits.length
165
+ return result unless splits
104
166
 
105
- splits.reverse!.each_with_index do |split, index|
106
- split = Split.with(split.merge({ index: index, count: count }))
167
+ splits.reverse_each.with_index do |hash, index|
168
+ split = Split.with(hash.merge({ count: count, index: index }))
107
169
  result.unshift(split.rhs) if result.empty?
108
170
 
109
- if block.call(split)
110
- if @include_captures
111
- if @spread_captures
112
- result = split.captures + result
113
- else
114
- result.unshift(split.captures)
115
- end
116
- end
117
-
118
- result.unshift(split.lhs)
171
+ if accept.call(split)
172
+ # [lhs + captures] + result
173
+ result.unshift(split.lhs, split.captures)
119
174
  else
120
175
  # prepend the lhs
121
176
  result[0] = split.lhs + split.separator + result[0]
122
177
  end
123
178
  end
124
179
 
125
- result
180
+ render(result)
126
181
  end
127
182
 
128
183
  private
129
184
 
130
- def splits_for(parts, ncaptures)
131
- result = []
132
- splits = []
133
-
134
- until parts.empty?
135
- lhs = parts.shift
136
- separator = parts.shift
137
- captures = parts.shift(ncaptures)
138
- rhs = parts.length == 1 ? parts.shift : parts.first
139
-
140
- if @remove_empty && (lhs.empty? || rhs.empty?)
141
- if lhs.empty? && rhs.empty?
142
- # do nothing
143
- elsif parts.empty? # last split
144
- result << (!lhs.empty? ? lhs : rhs) if splits.empty?
145
- elsif rhs.empty?
146
- # replace the empty rhs with the non-empty lhs
147
- parts[0] = lhs
148
- end
149
-
150
- next
151
- end
185
+ # initialisation common to +split+ and +rsplit+
186
+ #
187
+ # takes a hash of options passed to +split+ or +rsplit+ and returns an array with
188
+ # the following fields:
189
+ #
190
+ # - result: the array of separated strings to return from +split+ or +rsplit+.
191
+ # if there are no splits, this is the only field returned, and the caller
192
+ # returns it immediately without any further processing
193
+ #
194
+ # - splits: an array of hashes containing the lhs, rhs, separator and captured
195
+ # separator substrings for each split
196
+ #
197
+ # - count: the number of splits
198
+ #
199
+ # - accept: a proc whose return value determines whether each split should be
200
+ # accepted (true) or rejected (false)
201
+ #
202
+ def init(string:, delimiter:, select:, reject:, block:)
203
+ if delimiter.equal?(DEFAULT_DELIMITER)
204
+ string = string.strip
205
+ end
152
206
 
153
- splits << {
154
- lhs: lhs,
155
- rhs: rhs,
156
- separator: separator,
157
- captures: captures,
158
- }
207
+ if reject
208
+ positions = reject
209
+ action = Action::REJECT
210
+ elsif select
211
+ positions = select
212
+ action = Action::SELECT
159
213
  end
160
214
 
161
- [result, splits]
162
- end
215
+ splits = parse(string, delimiter)
163
216
 
164
- # takes a hash of options passed to +split+ or +rsplit+ and returns a:
165
- #
166
- # [result, splits, block]
167
- #
168
- # triple, where `result` is the return value of the method, `splits` is an array
169
- # of hashes containing the lhs/rhs, separator and captures of each split, and
170
- # `block` is a proc which specifies whether each split should be accepted or
171
- # rejected
172
- def split_init(string:, delimiter:, select:, reject:, block:)
173
- unless (match = string.match(delimiter))
174
- result = (@remove_empty && string.empty?) ? [] : [string]
175
- return [result, NO_SPLITS, block]
217
+ if splits.empty?
218
+ result = string.empty? ? [] : [string]
219
+ return [result]
176
220
  end
177
221
 
178
- select = Array(select)
179
- reject = Array(reject)
222
+ block ||= positions ? compile(positions, action, splits.length) : ACCEPT_ALL
223
+ [[], splits, splits.length, block]
224
+ end
180
225
 
181
- if !reject.empty?
182
- positions = reject
183
- action = :reject
184
- elsif !select.empty?
185
- positions = select
186
- action = :select
226
+ def render(result)
227
+ if @remove_empty_fields
228
+ result.reject! { |it| it.is_a?(String) && it.empty? }
187
229
  end
188
230
 
189
- ncaptures = match.captures.length
190
- delimiter = Regexp.quote(delimiter) if delimiter.is_a?(String)
191
- delimiter = increment_backrefs(delimiter, ncaptures)
192
- parts = string.split(/(#{delimiter})/, -1)
193
- remove_trailing_empty_field!(parts, ncaptures)
194
- result, splits = splits_for(parts, ncaptures)
195
- block ||= positions ? match_positions(positions, action, splits.length) : ACCEPT_ALL
231
+ unless @include_captures
232
+ return result.reject { |it| it.is_a?(Array) }
233
+ end
196
234
 
197
- [result, splits, block]
235
+ result.flat_map do |value|
236
+ next [value] unless value.is_a?(Array) && @spread_captures
237
+ @spread_captures == :compact ? value.compact : value
238
+ end
198
239
  end
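One Ruby detail is relevant to `render`. Bang filters such as `Array#reject!` return `nil`, not the array, when they remove nothing, so their return value is only safe to use when something is guaranteed to be removed. For instance, if every split is rejected, `result` may hold no capture arrays at all. The non-bang `reject` always returns an array:

```ruby
fields = ["foo:bar"] # e.g. every split rejected: no capture arrays present

fields.dup.reject! { |it| it.is_a?(Array) } # => nil (nothing was removed)
fields.reject { |it| it.is_a?(Array) }      # => ["foo:bar"]

["foo", [], "bar"].reject! { |it| it.is_a?(Array) } # => ["foo", "bar"]
```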
199
240
 
200
- # increment back-references so they remain valid when the outer capture
201
- # is added.
202
- #
203
- # e.g. to split on:
241
+ # takes a string and a delimiter pattern (regex or string) and splits it along
242
+ # the delimiter, returning an array of objects (hashes) representing each split.
243
+ # e.g. for:
204
244
  #
205
- # - <foo-comment> ... </foo-comment>
206
- # - <bar-comment> ... </bar-comment>
245
+ # parse("foo:bar:baz:quux", ":")
207
246
  #
208
- # etc.
247
+ # we return:
209
248
  #
210
- # before:
249
+ # [
250
+ # { lhs: "foo", rhs: "bar", separator: ":", captures: [] },
251
+ # { lhs: "bar", rhs: "baz", separator: ":", captures: [] },
252
+ # { lhs: "baz", rhs: "quux", separator: ":", captures: [] },
253
+ # ]
211
254
  #
212
- # %r| <(\w+-comment)> [^<]* </\1-comment> |x
213
- #
214
- # after:
215
- #
216
- # %r| ( <(\w+-comment)> [^<]* </\2-comment> ) |x
255
+ def parse(string, pattern)
256
+ result = []
257
+ start = 0
217
258
 
218
- def increment_backrefs(delimiter, ncaptures)
219
- if delimiter.is_a?(Regexp) && ncaptures > 0
220
- delimiter = delimiter.to_s.gsub(/\\(?:(\d+)|.)/) do
221
- match = Regexp.last_match
222
- match[1] ? '\\' + match[1].to_i.next.to_s : match[0]
223
- end
259
+ # we don't use the argument passed to the +scan+ block here because it's a
260
+ # string (the separator) if there are no captures, rather than an empty
261
+ # array. we use match.captures instead to get the array
262
+ string.scan(pattern) do
263
+ match = Regexp.last_match
264
+ index, after = match.offset(0)
265
+ separator = match[0]
266
+
267
+ # ignore empty separators at the beginning and/or end of the string
268
+ next if separator.empty? && (index.zero? || after == string.length)
269
+
270
+ lhs = string.slice(start, index - start)
271
+ result.last[:rhs] = lhs unless result.empty?
272
+
273
+ # this is correct for the last/only match, but gets updated to the next
274
+ # match's lhs for other matches
275
+ rhs = match.post_match
276
+
277
+ result << {
278
+ captures: match.captures,
279
+ lhs: lhs,
280
+ rhs: rhs,
281
+ separator: separator,
282
+ }
283
+
284
+ # move the start index (the start of the lhs) to the index after the last
285
+ # character of the separator
286
+ start = after
224
287
  end
225
288
 
226
- delimiter
289
+ result
227
290
  end
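For experimenting outside the class, the `String#scan`-based tokenizer can be lifted into a standalone method. `parse_splits` below is a hypothetical name, and its body follows `parse` above with indentation restored. The delimiter pattern is passed to `scan` unmodified, so a backreference such as `\1` refers to the delimiter's own group. This appears to be how 0.6.0 fixes backreference handling:

```ruby
# Tokenizes `string` on `pattern` (a Regexp, or a String matched literally),
# returning one hash per separator with the text on either side of it and the
# groups captured by the pattern.
def parse_splits(string, pattern)
  result = []
  start = 0 # where the current lhs begins

  string.scan(pattern) do
    match = Regexp.last_match # the block argument is a String when there are no groups
    index, after = match.offset(0)
    separator = match[0]

    # an empty match at either end of the string isn't a real split
    next if separator.empty? && (index.zero? || after == string.length)

    lhs = string[start, index - start]
    result.last[:rhs] = lhs unless result.empty? # previous split's rhs is this lhs

    # post_match is only the final rhs for the last split; later matches overwrite it
    result << { captures: match.captures, lhs: lhs, rhs: match.post_match, separator: separator }
    start = after
  end

  result
end

parse_splits("xaaybbz", /(.)\1/)
# => [{ captures: ["a"], lhs: "x", rhs: "y", separator: "aa" },
#     { captures: ["b"], lhs: "y", rhs: "z", separator: "bb" }]
```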
228
291
 
229
- # work around Ruby's (and Perl's and Groovy's) unhelpful behavior when splitting
230
- # on an empty string/pattern without removing trailing empty fields e.g.:
292
+ # returns a lambda which splits at (i.e. accepts or rejects splits at, depending
293
+ # on the action) the supplied positions
231
294
  #
232
- # "foobar".split("", -1)
233
- # "foobar".split(//, -1)
234
- # # => ["f", "o", "o", "b", "a", "r", ""]
295
+ # positions are preprocessed to support an additional feature: negative indices
296
+ # are translated to 1-based non-negative indices, e.g.:
235
297
  #
236
- # "foobar".split(/()/, -1)
237
- # # => ["f", "", "o", "", "o", "", "b", "", "a", "", "r", "", ""]
298
+ # ss.split("foo:bar:baz:quux", ":", at: -1)
238
299
  #
239
- # "foobar".split(/(())/, -1)
240
- # # => ["f", "", "", "o", "", "", "o", "", "", "b", "", "", "a", "", "", "r", "", "", ""]
300
+ # translates to:
241
301
  #
242
- # *there is no such thing as an empty field whose separator is empty*, so
243
- # if String#split's result ends with an empty separator, 0 or more (empty)
244
- # captures and an empty field, we can safely remove them.
245
-
246
- def remove_trailing_empty_field!(parts, ncaptures)
247
- # the trailing field is at index -1. if there are 0 captures, the separator
248
- # is at -2:
249
- #
250
- # [empty_separator, empty_field]
251
- #
252
- # if there is 1 capture, the separator is at -3:
253
- #
254
- # [empty_separator, capture, empty_field]
302
+ # ss.split("foo:bar:baz:quux", ":", at: 3)
303
+ #
304
+ # and
305
+ #
306
+ # ss.split("1:2:3:4:5:6:7:8:9", ":", at: -3..)
307
+ # ss.split("1:2:3:4:5:6:7:8:9", ":", at: [-3..])
308
+ #
309
+ # translate to:
310
+ #
311
+ # ss.split("1:2:3:4:5:6:7:8:9", ":", at: 6..8)
312
+ #
313
+ def compile(positions, action, nsplits)
314
+ # XXX note: we don't use modulo, because we don't want
315
+ # out-of-bounds indices to silently work, e.g. we don't want:
255
316
  #
256
- # etc. therefore we find the separator by walking back
317
+ # ss.split("foo:bar:baz:quux", ":", at: -42)
257
318
  #
258
- # 1 (empty field)
259
- # + ncaptures
260
- # + 1 (separator)
319
+ # to mysteriously match when the index/position is 0/1
261
320
  #
262
- # steps from the end of the array i.e. ncaptures + 2
263
- count = ncaptures + 2
264
- separator_index = count * -1
265
-
266
- return unless parts[-1].empty? && parts[separator_index].empty?
267
-
268
- # drop the empty separator, the (empty) captures, and the trailing empty field
269
- parts.pop(count)
270
- end
271
-
272
- def match_positions(positions, action, nsplits)
273
- positions = Array(positions).map do |position|
274
- if position.is_a?(Integer) && position.negative?
275
- # translate negative indices to 1-based non-negative indices e.g:
276
- #
277
- # ss.split("foo:bar:baz:quux", ":", at: -1)
278
- #
279
- # translates to:
280
- #
281
- # ss.split("foo:bar:baz:quux", ":", at: 3)
282
- #
283
- # XXX note: we don't use modulo, because we don't want
284
- # out-of-bounds indices to silently work e.g. we don't want:
285
- #
286
- # ss.split("foo:bar:baz:quux", ":", -42)
287
- #
288
- # to mysteriously match when the position is 2
289
-
290
- nsplits + 1 + position
321
+ resolve = ->(int) { int.negative? ? nsplits + 1 + int : int }
322
+
323
+ # don't use Array(...) to wrap these as we don't want to convert ranges
324
+ positions = positions.is_a?(Array) ? positions : [positions]
325
+
326
+ positions = positions.map do |position|
327
+ if position.is_a?(Integer)
328
+ resolve[position]
329
+ elsif position.is_a?(Range)
330
+ rbegin = position.begin
331
+ rend = position.end
332
+ rexc = position.exclude_end?
333
+
334
+ if rbegin.nil?
335
+ Range.new(1, resolve[rend], rexc)
336
+ elsif rend.nil?
337
+ Range.new(resolve[rbegin], nsplits, rexc)
338
+ elsif rbegin.negative? || rend.negative? || (rend - rbegin).negative?
339
+ from = resolve[rbegin]
340
+ to = resolve[rend]
341
+ to < from ? Range.new(to, from, rexc) : Range.new(from, to, rexc)
342
+ else
343
+ position
344
+ end
345
+ elsif position.is_a?(Set)
346
+ position.map { |it| resolve[it] }.to_set
291
347
  else
292
348
  position
293
349
  end
294
350
  end
295
351
 
296
- match = action == :select
297
-
298
- lambda do |split|
299
- case split.position when *positions then match else !match end
300
- end
352
+ ->(split) { case split.position when *positions then action else !action end }
301
353
  end
302
354
  end
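To make the new `compile` logic concrete, here is a standalone Ruby sketch that mirrors it outside the class. The name `compile_positions` is hypothetical, and its `select` flag stands in for `action`, assumed to be `true` when positions are being selected and `false` when they are being rejected (the caller is not part of this hunk). Negative positions are resolved against `nsplits` the same way (`nsplits + 1 + int`, no modulo), and the returned lambda uses the same splatted `case ... when *positions`, so integers, ranges and sets all match via `===`.

```ruby
require 'set'

# Hypothetical standalone version of the compile step: resolve negative
# positions against nsplits, then return a predicate over 1-based positions.
# `select` is assumed to be true to keep the listed positions, false to reject them.
def compile_positions(positions, select, nsplits)
  # -1 is the last split, -2 the one before it, etc.; no modulo, so
  # out-of-range negatives resolve to 0 or below and never match
  resolve = ->(int) { int.negative? ? nsplits + 1 + int : int }

  # wrap by hand: Array(range) would expand the range into its elements
  positions = positions.is_a?(Array) ? positions : [positions]

  positions = positions.map do |position|
    case position
    when Integer
      resolve[position]
    when Range
      rbegin, rend, rexc = position.begin, position.end, position.exclude_end?

      if rbegin.nil? # beginless, e.g. ..4
        Range.new(1, resolve[rend], rexc)
      elsif rend.nil? # endless, e.g. -3.. (kept inclusive, see the note below)
        Range.new(resolve[rbegin], nsplits)
      else # negative and/or descending ranges are normalized to ascending
        from, to = resolve[rbegin], resolve[rend]
        to < from ? Range.new(to, from, rexc) : Range.new(from, to, rexc)
      end
    when Set
      position.map { |n| resolve[n] }.to_set
    else
      position
    end
  end

  # Integer#=== tests equality; Range#=== and Set#=== test membership
  ->(pos) { case pos when *positions then select else !select end }
end

# nine fields => eight splits, as in the comment's "1:2:...:9" example
tail = compile_positions((-3..), true, 8)
tail_hits = (1..8).select { |pos| tail.call(pos) }
p tail_hits # => [6, 7, 8]
```

One deliberate deviation: for an endless range the diff passes `exclude_end?` through, and since `(4...).exclude_end?` is `true`, `4...` would become `4...nsplits` and skip the last split; this sketch keeps endless ranges inclusive. `Set#===` (an alias of `include?` since Ruby 2.5) is what lets a `Set` work inside the splatted `when`, and the beginless `(..4)` literal needs Ruby 2.7 or later.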
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  class StringSplitter
4
- VERSION = '0.5.1'
4
+ VERSION = '0.6.0'
5
5
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: string_splitter
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.5.1
4
+ version: 0.6.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - chocolateboy
8
- autorequire:
8
+ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-07-01 00:00:00.000000000 Z
11
+ date: 2020-08-20 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: values
@@ -30,42 +30,42 @@ dependencies:
30
30
  requirements:
31
31
  - - "~>"
32
32
  - !ruby/object:Gem::Version
33
- version: '1.16'
33
+ version: '2.1'
34
34
  type: :development
35
35
  prerelease: false
36
36
  version_requirements: !ruby/object:Gem::Requirement
37
37
  requirements:
38
38
  - - "~>"
39
39
  - !ruby/object:Gem::Version
40
- version: '1.16'
40
+ version: '2.1'
41
41
  - !ruby/object:Gem::Dependency
42
42
  name: minitest
43
43
  requirement: !ruby/object:Gem::Requirement
44
44
  requirements:
45
45
  - - "~>"
46
46
  - !ruby/object:Gem::Version
47
- version: '5.11'
47
+ version: '5.0'
48
48
  type: :development
49
49
  prerelease: false
50
50
  version_requirements: !ruby/object:Gem::Requirement
51
51
  requirements:
52
52
  - - "~>"
53
53
  - !ruby/object:Gem::Version
54
- version: '5.11'
54
+ version: '5.0'
55
55
  - !ruby/object:Gem::Dependency
56
56
  name: minitest-power_assert
57
57
  requirement: !ruby/object:Gem::Requirement
58
58
  requirements:
59
59
  - - "~>"
60
60
  - !ruby/object:Gem::Version
61
- version: 0.3.0
61
+ version: '0.3'
62
62
  type: :development
63
63
  prerelease: false
64
64
  version_requirements: !ruby/object:Gem::Requirement
65
65
  requirements:
66
66
  - - "~>"
67
67
  - !ruby/object:Gem::Version
68
- version: 0.3.0
68
+ version: '0.3'
69
69
  - !ruby/object:Gem::Dependency
70
70
  name: minitest-reporters
71
71
  requirement: !ruby/object:Gem::Requirement
@@ -86,29 +86,15 @@ dependencies:
86
86
  requirements:
87
87
  - - "~>"
88
88
  - !ruby/object:Gem::Version
89
- version: '10.0'
89
+ version: '13.0'
90
90
  type: :development
91
91
  prerelease: false
92
92
  version_requirements: !ruby/object:Gem::Requirement
93
93
  requirements:
94
94
  - - "~>"
95
95
  - !ruby/object:Gem::Version
96
- version: '10.0'
97
- - !ruby/object:Gem::Dependency
98
- name: rubocop
99
- requirement: !ruby/object:Gem::Requirement
100
- requirements:
101
- - - "~>"
102
- - !ruby/object:Gem::Version
103
- version: 0.54.0
104
- type: :development
105
- prerelease: false
106
- version_requirements: !ruby/object:Gem::Requirement
107
- requirements:
108
- - - "~>"
109
- - !ruby/object:Gem::Version
110
- version: 0.54.0
111
- description:
96
+ version: '13.0'
97
+ description:
112
98
  email: chocolate@cpan.org
113
99
  executables: []
114
100
  extensions: []
@@ -127,7 +113,7 @@ metadata:
127
113
  bug_tracker_uri: https://github.com/chocolateboy/string_splitter/issues
128
114
  changelog_uri: https://github.com/chocolateboy/string_splitter/blob/master/CHANGELOG.md
129
115
  source_code_uri: https://github.com/chocolateboy/string_splitter
130
- post_install_message:
116
+ post_install_message:
131
117
  rdoc_options: []
132
118
  require_paths:
133
119
  - lib
@@ -135,16 +121,15 @@ required_ruby_version: !ruby/object:Gem::Requirement
135
121
  requirements:
136
122
  - - ">="
137
123
  - !ruby/object:Gem::Version
138
- version: '0'
124
+ version: '2.3'
139
125
  required_rubygems_version: !ruby/object:Gem::Requirement
140
126
  requirements:
141
127
  - - ">="
142
128
  - !ruby/object:Gem::Version
143
129
  version: '0'
144
130
  requirements: []
145
- rubyforge_project:
146
- rubygems_version: 2.7.7
147
- signing_key:
131
+ rubygems_version: 3.1.4
132
+ signing_key:
148
133
  specification_version: 4
149
134
  summary: String#split on steroids
150
135
  test_files: []
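The gemspec hunks above loosen several development-dependency constraints, e.g. minitest-power_assert goes from `~> 0.3.0` to `~> 0.3`, and minitest from `~> 5.11` to `~> 5.0`. With RubyGems' pessimistic operator, `~> 0.3.0` allows only 0.3.x (`>= 0.3.0, < 0.4`), while `~> 0.3` allows any 0.x at or above 0.3 (`>= 0.3, < 1.0`); the minitest change keeps the `< 6.0` ceiling and only lowers the floor. A quick check with `Gem::Requirement`, where the candidate versions are arbitrary and chosen only for illustration:

```ruby
require 'rubygems'

# Old and new constraints for minitest-power_assert, taken from the diff.
old_req = Gem::Requirement.new('~> 0.3.0') # >= 0.3.0, < 0.4
new_req = Gem::Requirement.new('~> 0.3')   # >= 0.3,   < 1.0

# An arbitrary later release, used only to illustrate the difference.
candidate = Gem::Version.new('0.4.0')

puts old_req.satisfied_by?(candidate) # prints false
puts new_req.satisfied_by?(candidate) # prints true

# minitest: same < 6.0 ceiling, lower floor (versions compare segment by segment)
minitest_old = Gem::Requirement.new('~> 5.11') # >= 5.11, < 6.0
minitest_new = Gem::Requirement.new('~> 5.0')  # >= 5.0,  < 6.0
puts minitest_old.satisfied_by?(Gem::Version.new('5.5')) # prints false
puts minitest_new.satisfied_by?(Gem::Version.new('5.5')) # prints true
```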