string_splitter 0.4.0 → 0.7.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 6e73d7818b793f4c8dcbba35afb97045428085f17c07df328875383b2d162818
4
- data.tar.gz: 6a15e18b265779aeaf5d8b18c5e13d70299ed7abbc52da815e2cee46419ce9a1
3
+ metadata.gz: 799ba605477bc50679baaa0ae5d12ac8077fc3a57611f69beddb3396a45e3a13
4
+ data.tar.gz: 0fbdf7225b69ea52b615ac7523bd15266dc9b0dbbe541e7b3802027a0a8c6c36
5
5
  SHA512:
6
- metadata.gz: dee0e70311f3718d0f7b68f77b56c657e137c8a658905ddc58f4fbef536b80071dced6a9836d0ad531b93e15c16c5f25ec42965f02a0a1e016de179b133bf746
7
- data.tar.gz: 72a237dd80e3e06aad4a5bcc4545e6ea4d3d8167beb9afa6b502d964d510857947fe6071a88ded9f83df77a54da059c97abc1b3e70d8c14a03c8b77b0a01a675
6
+ metadata.gz: c8fc9cf7bbd351013091918f5398c27efcda0b9b8c1f66294af76f1864e911d2fc0520b653fe1bdf3d11fb912dd0615b0954e38176f87fbf2a6cc931d0bdf6be
7
+ data.tar.gz: 98bd2cdeae3a27f9f54bb982b75033c9180e688419c0f5209682462a27e1792d6c8ec6d16ec6340c359c22373cdcad07c05a8ced5b03811060cf492d09a1c13b
@@ -1,26 +1,90 @@
1
+ ## 0.7.1 - 2020-08-22
2
+
3
+ #### Changes
4
+
5
+ - performance improvements
6
+ - delegate to `String#split` where possible
7
+ - use a regular class for Split rather than values.rb
8
+ - create Split objects directly rather than allocating intermediate hashes
9
+
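As a rough illustration of the `String#split` delegation noted for 0.7.1: adjacent fields returned by `String#split` can be paired into left-hand/right-hand split candidates with `each_cons(2)`. This is a sketch using only the Ruby standard library, not the gem's code, and the variable names are illustrative.

```ruby
# Standard library only; `fields`, `candidates` and `chars` are illustrative names.
fields = "foo:bar:baz:quux".split(":", -1) # a limit of -1 keeps trailing empty fields
candidates = fields.each_cons(2).to_a      # one [lhs, rhs] pair per delimiter

# A limit of 0 drops trailing empty fields, e.g. when the delimiter is empty
chars = "foo".split("", 0)
```

Each pair corresponds to one place where the string could be split, which is what lets a caller accept or reject the splits individually.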
10
+ ## 0.7.0 - 2020-08-21
11
+
12
+ #### Breaking Changes
13
+
14
+ - `String#split` incompatibility: we no longer trim the string (with
15
+ `String#strip`) before splitting if the delimiter is omitted
16
+
17
+ ## 0.6.0 - 2020-08-20
18
+
19
+ #### Breaking Changes
20
+
21
+ - `ss.split(str, " ")` is no longer treated the same as `ss.split(str)` i.e.
22
+ unlike Ruby's `String#split`, the former no longer strips the string before
23
+ splitting
24
+ - rename the `remove_empty` option `remove_empty_fields`
25
+ - rename the `exclude` option `except` (alias for `reject`)
26
+
27
+ #### Features
28
+
29
+ - add support for descending, negative, and infinite ranges,
30
+ e.g. `ss.split(str, ":", at: [..4, 4..., 3..1, -1..-3])` etc.
31
+
32
+ #### Fixes
33
+
34
+ - correctly handle backreferences in delimiter patterns
35
+
36
+ ## 0.5.1 - 2018-07-01
37
+
38
+ #### Changes
39
+
40
+ - set StringSplitter::VERSION when `string_splitter.rb` is loaded
41
+
42
+ ## 0.5.0 - 2018-06-26
43
+
44
+ #### Features
45
+
46
+ - add a `reject`/`exclude` option which rejects splits at the specified positions
47
+ - add a `select` alias for `at`
48
+
49
+ #### Fixes
50
+
51
+ - don't treat string delimiters as patterns
52
+
1
53
  ## 0.4.0 - 2018-06-24
2
54
 
3
- - **breaking change**: remove the `offset` alias for `split.index`
55
+ #### Breaking Changes
56
+
57
+ - remove the `offset` alias for `split.index`
4
58
 
5
59
  ## 0.3.1 - 2018-06-24
6
60
 
7
- - remove trailing empty field when the separator is empty (#1)
61
+ #### Fixes
62
+
63
+ - remove trailing empty field when the separator is empty
64
+ ([#1](https://github.com/chocolateboy/string_splitter/issues/1))
8
65
 
9
66
  ## 0.3.0 - 2018-06-23
10
67
 
11
- - **breaking change**: rename the `default_separator` option to `default_delimiter`
12
- - to avoid ambiguity in the code, refer to the input pattern/string as the
13
- "delimiter" and the matched string as the "separator"
68
+ #### Breaking Changes
69
+
70
+ - rename the `default_separator` option `default_delimiter`
14
71
 
15
72
  ## 0.2.0 - 2018-06-22
16
73
 
17
- - **breaking change**: make `index` (AKA `offset`) 0-based and add `position`
18
- (AKA `pos`) as the 1-based accessor
74
+ #### Breaking Changes
75
+
76
+ - make `index` (AKA `offset`) 0-based and add `position` (AKA `pos`) as the
77
+ 1-based accessor
19
78
 
20
79
  ## 0.1.0 - 2018-06-22
21
80
 
22
- - **breaking change**: the block now takes a single `split` object with an
23
- `index` accessor, rather than seperate `index` and `split` arguments
81
+ #### Breaking Changes
82
+
83
+ - the block now takes a single `split` object with an `index` accessor, rather
84
+ than separate `index` and `split` arguments
85
+
86
+ #### Features
87
+
24
88
  - add support for negative indices in the value supplied to the `at` option
25
89
  - add a `count` field to the split object containing the total number of splits
26
90
 
data/README.md CHANGED
@@ -3,14 +3,16 @@
3
3
  [![Build Status](https://travis-ci.org/chocolateboy/string_splitter.svg)](https://travis-ci.org/chocolateboy/string_splitter)
4
4
  [![Gem Version](https://img.shields.io/gem/v/string_splitter.svg)](https://rubygems.org/gems/string_splitter)
5
5
 
6
- <!-- START doctoc generated TOC please keep comment here to allow auto update -->
7
- <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
6
+ <!-- toc -->
8
7
 
9
8
  - [NAME](#name)
10
9
  - [INSTALLATION](#installation)
11
10
  - [SYNOPSIS](#synopsis)
12
11
  - [DESCRIPTION](#description)
13
12
  - [WHY?](#why)
13
+ - [CAVEATS](#caveats)
14
+ - [Differences from String#split](#differences-from-stringsplit)
15
+ - [COMPATIBILITY](#compatibility)
14
16
  - [VERSION](#version)
15
17
  - [SEE ALSO](#see-also)
16
18
  - [Gems](#gems)
@@ -18,7 +20,7 @@
18
20
  - [AUTHOR](#author)
19
21
  - [COPYRIGHT AND LICENSE](#copyright-and-license)
20
22
 
21
- <!-- END doctoc generated TOC please keep comment here to allow auto update -->
23
+ <!-- tocstop -->
22
24
 
23
25
  # NAME
24
26
 
@@ -36,65 +38,128 @@ gem "string_splitter"
36
38
  require "string_splitter"
37
39
 
38
40
  ss = StringSplitter.new
41
+ ```
42
+
43
+ **Same as `String#split`**
39
44
 
40
- # same as String#split
41
- ss.split("foo bar baz quux")
42
- ss.split("foo bar baz quux", " ")
43
- ss.split("foo bar baz quux", /\s+/)
44
- # => ["foo", "bar", "baz", "quux"]
45
+ ```ruby
46
+ ss.split("foo bar baz")
47
+ ss.split("foo bar baz", " ")
48
+ ss.split("foo bar baz", /\s+/)
49
+ # => ["foo", "bar", "baz"]
50
+
51
+ ss.split("foo", "")
52
+ ss.split("foo", //)
53
+ # => ["f", "o", "o"]
54
+
55
+ ss.split("", "...")
56
+ ss.split("", /.../)
57
+ # => []
58
+ ```
45
59
 
46
- # split at the first delimiter
60
+ **Split at the first delimiter**
61
+
62
+ ```ruby
47
63
  ss.split("foo:bar:baz:quux", ":", at: 1)
64
+ ss.split("foo:bar:baz:quux", ":", select: 1)
48
65
  # => ["foo", "bar:baz:quux"]
66
+ ```
49
67
 
50
- # split at the last delimiter
68
+ **Split at the last delimiter**
69
+
70
+ ```ruby
51
71
  ss.split("foo:bar:baz:quux", ":", at: -1)
52
72
  # => ["foo:bar:baz", "quux"]
73
+ ```
74
+
75
+ **Split at multiple delimiter positions**
76
+
77
+ ```ruby
78
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -1])
79
+ # => ["1", "2", "3", "4:5:6:7:8", "9"]
80
+ ```
53
81
 
54
- # split at multiple delimiter positions
55
- ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -2])
56
- # => ["1", "2", "3", "4:5:6:7", "8:9"]
82
+ **Split at all but the first and last delimiters**
57
83
 
58
- # split from the right
84
+ ```ruby
85
+ ss.split("1:2:3:4:5:6", ":", except: [1, -1])
86
+ ss.split("1:2:3:4:5:6", ":", reject: [1, -1])
87
+ # => ["1:2", "3", "4", "5:6"]
88
+ ```
89
+
90
+ **Split from the right**
91
+
92
+ ```ruby
59
93
  ss.rsplit("1:2:3:4:5:6:7:8:9", ":", at: [1..3, 5])
60
94
  # => ["1:2:3:4", "5:6", "7", "8", "9"]
95
+ ```
96
+
97
+ **Split with negative, descending, and infinite ranges**
98
+
99
+ ```ruby
100
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: ..-3)
101
+ # => ["1", "2", "3", "4", "5", "6", "7:8:9"]
102
+
103
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: 4...)
104
+ # => ["1:2:3:4", "5", "6", "7", "8:9"]
105
+
106
+ ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1, 5..3, -2..])
107
+ # => ["1", "2:3", "4", "5", "6:7", "8", "9"]
108
+ ```
109
+
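One way to read these range examples is that each position is resolved against the total number of splits before matching. The sketch below is an assumption about that reading rather than the gem's implementation: `expand_positions` and `split_at` are hypothetical helpers, only string delimiters are handled, and descending ranges are simply flipped.

```ruby
# Hypothetical helpers for illustration; not part of string_splitter.

# Expand an `at:`-style value (integer, range, or array of either) into
# 1-based split positions, given the total number of splits (`count`).
def expand_positions(positions, count)
  resolve = ->(n) { n.negative? ? count + 1 + n : n }
  list = positions.is_a?(Array) ? positions : [positions]

  list.flat_map do |position|
    next [resolve.call(position)] unless position.is_a?(Range)

    first = position.begin ? resolve.call(position.begin) : 1
    last = position.end ? resolve.call(position.end) : count
    low, high = [first, last].minmax # flip descending ranges (assumption)
    (position.exclude_end? ? (low...high) : (low..high)).to_a
  end
end

# Split on a string delimiter, but only at the selected positions;
# fields at unselected positions are glued back onto the previous field.
def split_at(string, delimiter, at:)
  fields = string.split(delimiter, -1)
  return [] if fields.empty?

  wanted = expand_positions(at, fields.length - 1)
  result = [fields.first]

  fields.drop(1).each.with_index(1) do |field, position|
    if wanted.include?(position)
      result << field
    else
      result[-1] = result[-1] + delimiter + field
    end
  end

  result
end

digits = "1:2:3:4:5:6:7:8:9"
beginless = split_at(digits, ":", at: (..-3))
endless = split_at(digits, ":", at: (4...))
mixed = split_at(digits, ":", at: [1, 5..3, (-2..)])
```

With eight delimiters in the string, `..-3` resolves to positions 1 to 6, `4...` to 4 to 7, and `[1, 5..3, -2..]` to 1, 3 to 5, 7 and 8, which reproduces the three results shown above.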
110
+ **Full control via a block**
61
111
 
62
- # full control via a block
63
- result = ss.split('a:a:a:b:c:c:e:a:a:d:c', ":") do |split|
64
- split.index > 0 && split.lhs == split.rhs
112
+ ```ruby
113
+ result = ss.split("1:2:3:4:5:6:7:8", ":") do |split|
114
+ split.pos % 2 == 0
65
115
  end
66
- # => ["a:a", "a:b:c", "c:e:a", "a:d:c"]
116
+ # => ["1:2", "3:4", "5:6", "7:8"]
117
+ ```
118
+
119
+ ```ruby
120
+ string = "banana".chars.sort.join # "aaabnn"
121
+
122
+ ss.split(string, "") do |split|
123
+ split.rhs != split.lhs
124
+ end
125
+ # => ["aaa", "b", "nn"]
67
126
  ```
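For comparison, the two block examples above can be approximated with the standard library alone. This is only a sketch to show what the blocks decide (it does not use StringSplitter), and the variable names are illustrative.

```ruby
# Accepting only even positions amounts to splitting at every second delimiter
pairs = "1:2:3:4:5:6:7:8".split(":").each_slice(2).map { |pair| pair.join(":") }

# Splitting wherever the neighbouring characters differ groups runs of
# identical characters
runs = "banana".chars.sort.slice_when { |lhs, rhs| lhs != rhs }.map(&:join)
```

The block form generalises both cases: the same `split` call covers them by changing only the predicate.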
68
127
 
69
128
  # DESCRIPTION
70
129
 
71
- Many languages have built-in string `split` functions/methods. They behave similarly
72
- (notwithstanding the occasional [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)),
73
- and handle a few common cases e.g.:
130
+ Many languages have built-in `split` functions/methods for strings. They behave
131
+ similarly (notwithstanding the occasional
132
+ [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)), and
133
+ handle a few common cases, e.g.:
74
134
 
75
135
  * limiting the number of splits
76
- * including the separators in the results
136
+ * including the separator(s) in the results
77
137
  * removing (some) empty fields
78
138
 
79
- But, because the API is squeezed into two overloaded parameters (the delimiter and the limit),
80
- achieving the desired effects can be tricky. For instance, while `String#split` removes empty
81
- trailing fields (by default), it provides no way to remove *all* empty fields. Likewise, the
82
- cramped API means there's no way to e.g. combine a limit (positive integer) with the option
83
- to preserve empty fields (negative integer), or use backreferences in a delimiter pattern
139
+ But, because the API is squeezed into two overloaded parameters (the delimiter
140
+ and the limit), achieving the desired results can be tricky. For instance,
141
+ while `String#split` removes empty trailing fields (by default), it provides no
142
+ way to remove *all* empty fields. Likewise, the cramped API means there's no
143
+ way to, e.g., combine a limit (positive integer) with the option to preserve
144
+ empty fields (negative integer), or use backreferences in a delimiter pattern
84
145
  without including its captured subexpressions in the result.
85
146
 
86
- If `split` was being written from scratch, without the baggage of its legacy API,
87
- it's possible that some of these options would be made explicit rather than overloading
88
- the parameters. And, indeed, this is possible in some implementations,
89
- e.g. in Crystal:
147
+ If `split` was being written from scratch, without the baggage of its legacy
148
+ API, it's possible that some of these options would be made explicit rather
149
+ than overloading the parameters. And, indeed, this is possible in some
150
+ implementations, e.g. in Crystal:
90
151
 
91
152
  ```ruby
92
- ":foo:bar:baz:".split(":", remove_empty: false) # => ["", "foo", "bar", "baz", ""]
93
- ":foo:bar:baz:".split(":", remove_empty: true) # => ["foo", "bar", "baz"]
153
+ ":foo:bar:baz:".split(":", remove_empty: false)
154
+ # => ["", "foo", "bar", "baz", ""]
155
+
156
+ ":foo:bar:baz:".split(":", remove_empty: true)
157
+ # => ["foo", "bar", "baz"]
94
158
  ````
95
159
 
96
- StringSplitter takes this one step further by moving the configuration out of the method altogether
97
- and delegating the strategy — i.e. which splits should be accepted or rejected — to a block:
160
+ StringSplitter takes this one step further by moving the configuration out of
161
+ the method altogether and delegating the strategy — i.e. which splits should be
162
+ accepted or rejected — to a block:
98
163
 
99
164
  ```ruby
100
165
  ss = StringSplitter.new
@@ -102,22 +167,32 @@ ss = StringSplitter.new
102
167
  ss.split("foo:bar:baz", ":") { |split| split.index == 0 }
103
168
  # => ["foo", "bar:baz"]
104
169
 
105
- ss.split("foo:bar:baz", ":") { |split| split.position == split.count }
106
- # => ["foo:bar", "baz"]
170
+ ss.split("foo:bar:baz:quux", ":") do |split|
171
+ split.position == 1 || split.position == 3
172
+ end
173
+ # => ["foo", "bar:baz", "quux"]
107
174
  ```
108
175
 
109
- As a shortcut, the common case of splitting on delimiters at one or more positions is supported by an option:
176
+ As a shortcut, the common case of splitting (or not splitting) at one or more
177
+ positions is supported by dedicated options:
110
178
 
111
179
  ```ruby
112
- ss.split('foo:bar:baz:quux', ':', at: [1, -1]) # => ["foo", "bar:baz", "quux"]
180
+ ss.split("foo:bar:baz:quux", ":", select: [1, -1])
181
+ # => ["foo", "bar:baz", "quux"]
182
+
183
+ ss.split("foo:bar:baz:quux", ":", reject: [1, -1])
184
+ # => ["foo:bar", "baz:quux"]
113
185
  ```
114
186
 
115
187
  # WHY?
116
188
 
117
- I wanted to split semi-structured output into fields without having to resort to a regex or a full-blown parser.
189
+ I wanted to split semi-structured output into fields without having to resort
190
+ to a regex or a full-blown parser.
118
191
 
119
- As an example, the nominally unstructured output of many Unix commands is often formatted in a way
120
- that's tantalizingly close to being machine-readable, apart from a few pesky exceptions e.g.:
192
+ As an example, the nominally unstructured output of many Unix commands is often
193
+ formatted in a way that's tantalizingly close to being
194
+ [machine-readable](https://en.wikipedia.org/wiki/Delimiter-separated_values),
195
+ apart from a few pesky exceptions, e.g.:
121
196
 
122
197
  ```bash
123
198
  $ ls -l
@@ -129,8 +204,8 @@ drwxr-xr-x 3 user users 4096 Jun 19 22:56 lib
129
204
  -rw-r--r-- 1 user users 3134 Jun 19 22:59 README.md
130
205
  ```
131
206
 
132
- These lines can *almost* be parsed into an array of fields by splitting them on whitespace. The exception is the
133
- date (columns 6-8) i.e.:
207
+ These lines can *almost* be parsed into an array of fields by splitting them on
208
+ whitespace. The exception is the date (columns 6-8), i.e.:
134
209
 
135
210
  ```ruby
136
211
  line = "-rw-r--r-- 1 user users 87 Jun 18 18:16 CHANGELOG.md"
@@ -149,19 +224,20 @@ instead of:
149
224
  ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
150
225
  ```
151
226
 
152
- One way to work around this is to parse the whole line e.g.:
227
+ One way to work around this is to parse the whole line, e.g.:
153
228
 
154
229
  ```ruby
155
230
  line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+ \S+) \s+ (.+)$/x)
156
231
  ```
157
232
 
158
- But that requires us to specify *everything*. What we really want is a version of `split`
159
- which allows us to veto splitting for the 6th and 7th delimiters i.e. control over which
160
- splits are accepted, rather than being restricted to the single, baked-in strategy provided
161
- by the `limit` parameter.
233
+ But that requires us to specify *everything*. What we really want is a version
234
+ of `split` which allows us to veto splitting for the 6th and 7th delimiters
235
+ (and to stop after the 8th delimiter), i.e. control over which splits are
236
+ accepted, rather than being restricted to the single, baked-in strategy
237
+ provided by the `limit` parameter.
162
238
 
163
- By providing a simple way to accept or reject each split, StringSplitter makes cases like
164
- this easy to handle, either via a block:
239
+ By providing a simple way to accept or reject each split, StringSplitter makes
240
+ cases like this easy to handle, either via a block:
165
241
 
166
242
  ```ruby
167
243
  ss.split(line) do |split|
@@ -177,9 +253,51 @@ ss.split(line, at: [1..5, 8])
177
253
  # => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
178
254
  ```
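For comparison, the same fields can be pieced together with plain `String#split` by splitting first and rejoining afterwards, which is the kind of post-processing the `at` option avoids. This sketch uses only the standard library; rejoining with a single space is an assumption that discards the original spacing inside the date.

```ruby
line = "-rw-r--r-- 1 user users 87 Jun 18 18:16 CHANGELOG.md"

# A limit of 9 stops after the 8th delimiter, so the last field keeps any spaces
fields = line.split(/\s+/, 9)

# Rejoin the three date fields (indices 5 to 7) that the split pulled apart
merged = fields[0, 5] + [fields[5, 3].join(" ")] + fields[8..]
```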
179
255
 
256
+ # CAVEATS
257
+
258
+ ## Differences from String#split
259
+
260
+ Unlike `String#split`, StringSplitter doesn't trim the string before splitting
261
+ if the delimiter is omitted or a single space, e.g.:
262
+
263
+ ```ruby
264
+ " foo bar baz ".split # => ["foo", "bar", "baz"]
265
+ " foo bar baz ".split(" ") # => ["foo", "bar", "baz"]
266
+
267
+ ss.split(" foo bar baz ") # => ["", "foo", "bar", "baz", ""]
268
+ ss.split(" foo bar baz ", " ") # => ["", "foo", "bar", "baz", ""]
269
+ ```
270
+
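The untrimmed results can be reproduced with `String#split` itself by passing a regex delimiter and a negative limit, which keeps both the leading and the trailing empty fields. A small standard-library sketch (the variable names are illustrative):

```ruby
padded = " foo bar baz "

trimmed = padded.split            # whitespace mode: trims before splitting
also_trimmed = padded.split(" ")  # a single space behaves the same way

# A regex delimiter disables the trimming; -1 keeps trailing empty fields
untrimmed = padded.split(/ /, -1)
default_like = padded.split(/\s+/, -1)
```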
271
+ `String#split` omits the `nil` values of unmatched optional captures:
272
+
273
+ ```ruby
274
+ "foo:bar:baz".scan(/(:)|(-)/) # => [[":", nil], [":", nil]]
275
+ "foo:bar:baz".split(/(:)|(-)/) # => ["foo", ":", "bar", ":", "baz"]
276
+ ```
277
+
278
+ StringSplitter preserves them (if `include_captures` is true, as it
279
+ is by default), though they can be omitted from spread captures by passing
280
+ `:compact` as the value of the `spread_captures` option:
281
+
282
+ ```ruby
283
+ s1 = StringSplitter.new(spread_captures: true)
284
+ s2 = StringSplitter.new(spread_captures: false)
285
+ s3 = StringSplitter.new(spread_captures: :compact)
286
+
287
+ s1.split("foo:bar:baz", /(:)|(-)/) # => ["foo", ":", nil, "bar", ":", nil, "baz"]
288
+ s2.split("foo:bar:baz", /(:)|(-)/) # => ["foo", [":", nil], "bar", [":", nil], "baz"]
289
+ s3.split("foo:bar:baz", /(:)|(-)/) # => ["foo", ":", "bar", ":", "baz"]
290
+ ```
291
+
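The three shapes above can be sketched with `String#scan` and `Regexp.last_match`, collecting each field followed by the array of captures for the separator after it. `fields_and_captures` is a hypothetical helper written for this illustration, not the gem's API.

```ruby
# Hypothetical helper: returns fields interleaved with capture arrays
def fields_and_captures(string, pattern)
  result = []
  start = 0

  string.scan(pattern) do
    match = Regexp.last_match
    from, to = match.offset(0)          # start/end offsets of the separator
    result << string[start...from] << match.captures
    start = to
  end

  result << string[start..]
end

tokens = fields_and_captures("foo:bar:baz", /(:)|(-)/)

# Nested captures, spread captures, and spread captures without nils
nested = tokens
spread = tokens.flat_map { |token| token.is_a?(Array) ? token : [token] }
compact = tokens.flat_map { |token| token.is_a?(Array) ? token.compact : [token] }
```

Here `nested` corresponds to `spread_captures: false`, `spread` to `spread_captures: true`, and `compact` to `spread_captures: :compact`.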
292
+ # COMPATIBILITY
293
+
294
+ StringSplitter is tested and supported on all versions of Ruby [supported by
295
+ the ruby-core team](https://www.ruby-lang.org/en/downloads/branches/), i.e.,
296
+ currently, Ruby 2.5 and above.
297
+
180
298
  # VERSION
181
299
 
182
- 0.4.0
300
+ 0.7.1
183
301
 
184
302
  # SEE ALSO
185
303
 
@@ -197,7 +315,7 @@ ss.split(line, at: [1..5, 8])
197
315
 
198
316
  # COPYRIGHT AND LICENSE
199
317
 
200
- Copyright © 2018 by chocolateboy.
318
+ Copyright © 2018-2020 by chocolateboy.
201
319
 
202
320
  This is free software; you can redistribute it and/or modify it under the
203
- terms of the [Artistic License 2.0](http://www.opensource.org/licenses/artistic-license-2.0.php).
321
+ terms of the [Artistic License 2.0](https://www.opensource.org/licenses/artistic-license-2.0.php).
@@ -1,249 +1,382 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'values'
3
+ require 'set'
4
+
5
+ require_relative 'string_splitter/split'
6
+ require_relative 'string_splitter/version'
4
7
 
5
8
  # This class extends the functionality of +String#split+ by:
6
9
  #
7
10
  # - providing full control over which splits are accepted or rejected
11
+ #
8
12
  # - adding support for splitting from right-to-left
9
- # - encapsulating splitting options/preferences in instances rather than trying to
10
- # cram them into overloaded method parameters
13
+ #
14
+ # - encapsulating splitting options/preferences in the splitter rather
15
+ # than trying to cram them into overloaded method parameters
11
16
  #
12
17
  # These enhancements allow splits to handle many cases that otherwise require bigger
13
- # guns e.g. regex matching or parsing.
18
+ # guns, e.g. regex matching or parsing.
19
+ #
20
+ # Implementation-wise, we split the string either with String#split, or with a custom
21
+ # scanner if the delimiter may contain captures (since String#split doesn't handle
22
+ # them correctly) and parse the resulting tokens into an array of Split objects with
23
+ # the following attributes:
24
+ #
25
+ # - captures: separator substrings captured by parentheses in the delimiter pattern
26
+ # - count: the number of splits
27
+ # - index: the 0-based index of the split in the array
28
+ # - lhs: the string to the left of the separator (back to the previous split candidate)
29
+ # - position: the 1-based index of the split in the array (alias: pos)
30
+ # - rhs: the string to the right of the separator (up to the next split candidate)
31
+ # - rindex: the 0-based index of the split relative to the end of the array
32
+ # - rposition: the 1-based index of the split relative to the end of the array (alias: rpos)
33
+ # - separator: the string matched by the delimiter pattern/string
34
+ #
14
35
  class StringSplitter
15
- ACCEPT = ->(_split) { true }
16
- DEFAULT_DELIMITER = /\s+/
17
- NO_SPLITS = []
36
+ # terminology: the delimiter is what we provide and the separators are what we get
37
+ # back (if we capture them). e.g. for:
38
+ #
39
+ # ss.split("foo:bar::baz", /(\W+)/)
40
+ #
41
+ # the delimiter is /(\W+)/ and the separators are ":" and "::"
18
42
 
19
- Split = Value.new(:captures, :count, :index, :lhs, :rhs, :separator) do
20
- def position
21
- index + 1
22
- end
43
+ ACCEPT_ALL = ->(_split) { true }
44
+ DEFAULT_DELIMITER = /\s+/.freeze
45
+ REMOVE = [].freeze
23
46
 
24
- alias_method :pos, :position
47
+ # simulate an enum. the value is returned by the case statement
48
+ # in the generated block if the positions match
49
+ module Action
50
+ SELECT = true
51
+ REJECT = false
25
52
  end
26
53
 
54
+ private_constant :Action
55
+
27
56
  def initialize(
28
57
  default_delimiter: DEFAULT_DELIMITER,
29
58
  include_captures: true,
30
- remove_empty: false,
59
+ remove_empty: false, # TODO remove this
60
+ remove_empty_fields: remove_empty,
31
61
  spread_captures: true
32
62
  )
33
63
  @default_delimiter = default_delimiter
34
64
  @include_captures = include_captures
35
- @remove_empty = remove_empty
65
+ @remove_empty_fields = remove_empty_fields
36
66
  @spread_captures = spread_captures
37
67
  end
38
68
 
39
- attr_reader :default_delimiter, :include_captures, :remove_empty, :spread_captures
69
+ attr_reader(
70
+ :default_delimiter,
71
+ :include_captures,
72
+ :remove_empty_fields,
73
+ :spread_captures
74
+ )
40
75
 
41
- def split(string, delimiter = @default_delimiter, at: nil, &block)
42
- result, block, splits, count, index = split_common(string, delimiter, at, block)
76
+ # TODO remove this
77
+ alias remove_empty remove_empty_fields
78
+
79
+ def split(
80
+ string,
81
+ delimiter = @default_delimiter,
82
+ at: nil, # alias for select
83
+ except: nil, # alias for reject
84
+ select: at,
85
+ reject: except,
86
+ &block
87
+ )
88
+ result, splits, count, accept = init(
89
+ string: string,
90
+ delimiter: delimiter,
91
+ select: select,
92
+ reject: reject,
93
+ block: block
94
+ )
43
95
 
44
- splits.each do |split|
45
- split = Split.with(split.merge({ index: (index += 1), count: count }))
46
- result << split.lhs if result.empty?
96
+ return result unless splits
47
97
 
48
- if block.call(split)
49
- if @include_captures
50
- if @spread_captures
51
- result += split.captures
52
- else
53
- result << split.captures
54
- end
55
- end
98
+ result << splits.first.lhs
99
+
100
+ splits.each_with_index do |split, index|
101
+ split.update!(count: count, index: index)
56
102
 
57
- result << split.rhs
103
+ if accept.call(split)
104
+ result << split.captures << split.rhs
58
105
  else
59
106
  # append the rhs
60
107
  result[-1] = result[-1] + split.separator + split.rhs
61
108
  end
62
109
  end
63
110
 
64
- result
111
+ render(result)
65
112
  end
66
113
 
67
114
  alias lsplit split
68
115
 
69
- def rsplit(string, delimiter = @default_delimiter, at: nil, &block)
70
- result, block, splits, count, index = split_common(string, delimiter, at, block)
116
+ def rsplit(
117
+ string,
118
+ delimiter = @default_delimiter,
119
+ at: nil, # alias for select
120
+ except: nil, # alias for reject
121
+ select: at,
122
+ reject: except,
123
+ &block
124
+ )
125
+ result, splits, count, accept = init(
126
+ string: string,
127
+ delimiter: delimiter,
128
+ select: select,
129
+ reject: reject,
130
+ block: block
131
+ )
132
+
133
+ return result unless splits
71
134
 
72
- splits.reverse!.each do |split|
73
- split = Split.with(split.merge({ index: (index += 1), count: count }))
74
- result.unshift(split.rhs) if result.empty?
135
+ result.unshift(splits.last.rhs)
75
136
 
76
- if block.call(split)
77
- if @include_captures
78
- if @spread_captures
79
- result = split.captures + result
80
- else
81
- result.unshift(split.captures)
82
- end
83
- end
137
+ splits.reverse_each.with_index do |split, index|
138
+ split.update!(count: count, index: index)
84
139
 
85
- result.unshift(split.lhs)
140
+ if accept.call(split)
141
+ # [lhs + captures] + result
142
+ result.unshift(split.lhs, split.captures)
86
143
  else
87
144
  # prepend the lhs
88
145
  result[0] = split.lhs + split.separator + result[0]
89
146
  end
90
147
  end
91
148
 
92
- result
149
+ render(result)
93
150
  end
94
151
 
95
152
  private
96
153
 
97
- def splits_for(parts, ncaptures)
98
- result = []
99
- splits = []
100
-
101
- until parts.empty?
102
- lhs = parts.shift
103
- separator = parts.shift
104
- captures = parts.shift(ncaptures)
105
- rhs = parts.length == 1 ? parts.shift : parts.first
106
-
107
- if @remove_empty && (lhs.empty? || rhs.empty?)
108
- if lhs.empty? && rhs.empty?
109
- # do nothing
110
- elsif parts.empty? # last split
111
- result << (!lhs.empty? ? lhs : rhs) if splits.empty?
112
- elsif rhs.empty?
113
- # replace the empty rhs with the non-empty lhs
114
- parts[0] = lhs
115
- end
154
+ # initialisation common to +split+ and +rsplit+
155
+ #
156
+ # takes a hash of options passed to +split+ or +rsplit+ and returns a tuple with
157
+ # the following fields:
158
+ #
159
+ # - result: the array of separated strings to return from +split+ or +rsplit+.
160
+ # if there are no splits, the caller returns this array immediately
161
+ # without any further processing
162
+ #
163
+ # - splits: an array of Split objects containing the lhs, rhs, separator and captured
164
+ # separator substrings for each split
165
+ #
166
+ # - count: the number of splits
167
+ #
168
+ # - accept: a proc whose return value determines whether each split should be
169
+ # accepted (true) or rejected (false)
170
+ #
171
+ def init(string:, delimiter:, select:, reject:, block:)
172
+ return [[]] if string.empty?
173
+
174
+ unless block
175
+ if reject
176
+ positions = reject
177
+ action = Action::REJECT
178
+ elsif select
179
+ positions = select
180
+ action = Action::SELECT
181
+ else
182
+ block = ACCEPT_ALL
183
+ end
184
+ end
185
+
186
+ # use String#split if we can
187
+ #
188
+ # NOTE +reject!+ is no faster than +reject+ on MRI and significantly slower
189
+ # on TruffleRuby
190
+
191
+ if delimiter.is_a?(String)
192
+ limit = -1
116
193
 
117
- next
194
+ if delimiter == ' '
195
+ delimiter = / / # don't trim
196
+ elsif delimiter.empty?
197
+ limit = 0 # remove the trailing empty string
118
198
  end
119
199
 
120
- splits << {
121
- lhs: lhs,
122
- rhs: rhs,
123
- separator: separator,
124
- captures: captures,
125
- }
126
- end
200
+ result = string.split(delimiter, limit)
127
201
 
128
- [result, splits]
129
- end
202
+ return [result] if result.length == 1 # delimiter not found: no splits
203
+
204
+ if block == ACCEPT_ALL # return the (2 or more) fields
205
+ result = result.reject(&:empty?) if @remove_empty_fields
206
+ return [result]
207
+ end
208
+
209
+ splits = []
210
+
211
+ result.each_cons(2) do |lhs, rhs| # 2 or more fields
212
+ splits << Split.new(
213
+ captures: [],
214
+ lhs: lhs,
215
+ rhs: rhs,
216
+ separator: delimiter
217
+ )
218
+ end
219
+ elsif delimiter == DEFAULT_DELIMITER && block == ACCEPT_ALL
220
+ # non-empty separators so -1 is safe
221
+
222
+ if @remove_empty_fields
223
+ result = []
224
+ string.split(delimiter, -1) do |field|
225
+ result << field unless field.empty?
226
+ end
227
+ else
228
+ result = string.split(delimiter, -1)
229
+ end
130
230
 
131
- # setup common to both split methods
132
- def split_common(string, delimiter, at, block)
133
- unless (match = string.match(delimiter))
134
- result = (@remove_empty && string.empty?) ? [] : [string]
135
- return [result, block, NO_SPLITS, 0, -1]
231
+ return [result]
232
+ else
233
+ splits = parse(string, delimiter)
136
234
  end
137
235
 
138
- ncaptures = match.captures.length
139
- delimiter = increment_backrefs(delimiter, ncaptures)
140
- parts = string.split(/(#{delimiter})/, -1)
141
- remove_trailing_empty_field!(parts, ncaptures)
142
- result, splits = splits_for(parts, ncaptures)
143
236
  count = splits.length
144
- block ||= at ? match_positions(at, count) : ACCEPT
145
237
 
146
- [result, block, splits, count, -1]
238
+ return [[string]] if count.zero?
239
+
240
+ block ||= compile(positions, action, count)
241
+ [[], splits, count, block]
147
242
  end
148
243
 
149
- # increment back-references so they remain valid when the outer capture
150
- # is added.
151
- #
152
- # e.g. to split on:
153
- #
154
- # - <foo-comment> ... </foo-comment>
155
- # - <bar-comment> ... </bar-comment>
156
- #
157
- # etc.
244
+ def render(values)
245
+ values.flat_map do |value|
246
+ if value.is_a?(String)
247
+ value.empty? && @remove_empty_fields ? REMOVE : [value]
248
+ elsif @include_captures
249
+ if @spread_captures
250
+ # TODO make sure compact can return a Capture
251
+ @spread_captures == :compact ? value.compact : value
252
+ elsif value.empty?
253
+ # we expose non-captures (string delimiters or regexps with no
254
+ # captures) as empty arrays inside the block, so the type is
255
+ # consistent, but it doesn't make sense to keep them in the
256
+ # result
257
+ REMOVE
258
+ else
259
+ [value]
260
+ end
261
+ else
262
+ REMOVE
263
+ end
264
+ end
265
+ end
266
+
267
+ # takes a string and a delimiter pattern (regex or string) and splits it along
268
+ # the delimiter, returning an array of objects (hashes) representing each split.
269
+ # e.g. for:
158
270
  #
159
- # before:
271
+ # parse("foo:bar:baz:quux", ":")
160
272
  #
161
- # %r| <(\w+-comment)> [^<]* </\1-comment> |x
273
+ # we return:
162
274
  #
163
- # after:
275
+ # [
276
+ # { lhs: "foo", rhs: "bar", separator: ":", captures: [] },
277
+ # { lhs: "bar", rhs: "baz", separator: ":", captures: [] },
278
+ # { lhs: "baz", rhs: "quux", separator: ":", captures: [] },
279
+ # ]
164
280
  #
165
- # %r| ( <(\w+-comment)> [^<]* </\2-comment> ) |x
281
+ def parse(string, delimiter)
282
+ # has_names = delimiter.is_a?(Regexp) && !delimiter.names.empty?
283
+ result = []
284
+ start = 0
166
285
 
167
- def increment_backrefs(delimiter, ncaptures)
168
- if delimiter.is_a?(Regexp) && ncaptures > 0
169
- delimiter = delimiter.to_s.gsub(/\\(?:(\d+)|.)/) do
170
- match = Regexp.last_match
171
- match[1] ? '\\' + match[1].to_i.next.to_s : match[0]
172
- end
286
+ # we don't use the argument passed to the +scan+ block here because it's a
287
+ # string (the separator) if there are no captures, rather than an empty
288
+ # array. we use match.captures instead to get the array
289
+ string.scan(delimiter) do
290
+ match = Regexp.last_match
291
+ index, after = match.offset(0)
292
+ separator = match[0]
293
+
294
+ # ignore empty separators at the beginning and/or end of the string
295
+ next if separator.empty? && (index.zero? || after == string.length)
296
+
297
+ lhs = string.slice(start, index - start)
298
+ result.last.rhs = lhs unless result.empty?
299
+
300
+ # this is correct for the last/only match, but gets updated to the next
301
+ # match's lhs for other matches
302
+ rhs = match.post_match
303
+
304
+ # captures = (has_names ? Captures.new(match) : match.captures)
305
+
306
+ result << Split.new(
307
+ captures: match.captures,
308
+ lhs: lhs,
309
+ rhs: rhs,
310
+ separator: separator
311
+ )
312
+
313
+ # advance the start index (the start of the next lhs) to the position
314
+ # after the last character of the separator
315
+ start = after
173
316
  end
174
317
 
175
- delimiter
318
+ result
176
319
  end
177
320
 
178
- # work around Ruby's (and Perl's and Groovy's) unhelpful behavior when splitting
179
- # on an empty string/pattern without removing trailing empty fields e.g.:
321
+ # returns a lambda which splits at (i.e. accepts or rejects splits at, depending
322
+ # on the action) the supplied positions
180
323
  #
181
- # "foobar".split("", -1)
182
- # "foobar".split(//, -1)
183
- # # => ["f", "o", "o", "b", "a", "r", ""]
324
+ # positions are preprocessed to support negative indices, infinite ranges, and
325
+ # descending ranges, e.g.:
184
326
  #
185
- # "foobar".split(/()/, -1)
186
- # # => ["f", "", "o", "", "o", "", "b", "", "a", "", "r", "", ""]
327
+ # ss.split("foo:bar:baz:quux", ":", at: -1)
187
328
  #
188
- # "foobar".split(/(())/, -1)
189
- # # => ["f", "", "", "o", "", "", "o", "", "", "b", "", "", "a", "", "", "r", "", "", ""]
329
+ # translates to:
190
330
  #
191
- # *there is no such thing as an empty field whose separator is empty*, so
192
- # if String#split's result ends with an empty separator, 0 or more (empty)
193
- # captures and an empty field, we can safely remove them.
194
-
195
- def remove_trailing_empty_field!(parts, ncaptures)
196
- # the trailing field is at index -1. if there are 0 captures, the separator
197
- # is at -2:
198
- #
199
- # [empty_separator, empty_field]
200
- #
201
- # if there is 1 capture, the separator is at -3:
202
- #
203
- # [empty_separator, capture, empty_field]
331
+ # ss.split("foo:bar:baz:quux", ":", at: 3)
332
+ #
333
+ # and
334
+ #
335
+ # ss.split("1:2:3:4:5:6:7:8:9", ":", at: -3..)
336
+ #
337
+ # translates to:
338
+ #
339
+ # ss.split("1:2:3:4:5:6:7:8:9", ":", at: 6..8)
340
+ #
341
+ def compile(positions, action, count)
342
+ # XXX note: we don't use modulo, because we don't want
343
+ # out-of-bounds indices to silently work, e.g. we don't want:
204
344
  #
205
- # etc. therefore we find the separator by walking back
345
+ # ss.split("foo:bar:baz:quux", ":", at: -42)
206
346
  #
207
- # 1 (empty field)
208
- # + ncaptures
209
- # + 1 (separator)
347
+ # to mysteriously match when the index/position is 0/1
210
348
  #
211
- # steps from the end of the array i.e. ncaptures + 2
212
- count = ncaptures + 2
213
- separator_index = count * -1
214
-
215
- return unless parts[-1].empty? && parts[separator_index].empty?
216
-
217
- # drop the empty separator, the (empty) captures, and the trailing empty field
218
- parts.pop(count)
219
- end
220
-
221
- def match_positions(positions, nsplits)
222
- positions = Array(positions).map do |position|
223
- if position.is_a?(Integer) && position.negative?
224
- # translate negative indices to 1-based non-negative indices e.g:
225
- #
226
- # ss.split("foo:bar:baz:quux", ":", at: -1)
227
- #
228
- # translates to:
229
- #
230
- # ss.split("foo:bar:baz:quux", ":", at: 3)
231
- #
232
- # XXX note: we don't use modulo, because we don't want
233
- # out-of-bounds indices to silently work e.g. we don't want:
234
- #
235
- # ss.split("foo:bar:baz:quux", ":", -42)
236
- #
237
- # to mysteriously match when the position is 2
238
-
239
- nsplits + 1 + position
349
+ resolve = ->(int) { int.negative? ? count + 1 + int : int }
350
+
351
+ # don't use Array(...) to wrap these as we don't want to convert ranges
352
+ positions = positions.is_a?(Array) ? positions : [positions]
353
+
354
+ positions = positions.map do |position|
355
+ if position.is_a?(Integer)
356
+ resolve[position]
357
+ elsif position.is_a?(Range)
358
+ rbegin = position.begin
359
+ rend = position.end
360
+ rexc = position.exclude_end?
361
+
362
+ if rbegin.nil?
363
+ Range.new(1, resolve[rend], rexc)
364
+ elsif rend.nil?
365
+ Range.new(resolve[rbegin], count, rexc)
366
+ elsif rbegin.negative? || rend.negative? || (rend - rbegin).negative?
367
+ from = resolve[rbegin]
368
+ to = resolve[rend]
369
+ to < from ? Range.new(to, from, rexc) : Range.new(from, to, rexc)
370
+ else
371
+ position
372
+ end
373
+ elsif position.is_a?(Set)
374
+ position.map { |it| resolve[it] }.to_set
240
375
  else
241
376
  position
242
377
  end
243
378
  end
244
379
 
245
- lambda do |split|
246
- case split.position when *positions then true else false end
247
- end
380
+ ->(split) { case split.position when *positions then action else !action end }
248
381
  end
249
382
  end
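The position preprocessing described in the `compile` comments can be checked with a standalone sketch. The helper names `normalize_positions` and `position_matcher` are hypothetical and not part of the gem; `count` is assumed to be the total number of splits, and the range branch is simplified to always resolve and order both ends, which should give the same result as the three-way check in the diff.

```ruby
require 'set'

# Hypothetical helper (not part of the gem): normalizes 1-based split
# positions. +count+ is the total number of splits found in the string.
def normalize_positions(positions, count)
  # no modulo here, so out-of-bounds negative indices stay out of bounds
  resolve = ->(int) { int.negative? ? count + 1 + int : int }

  # wrap by hand: Array(1..3) would expand the range into [1, 2, 3]
  positions = positions.is_a?(Array) ? positions : [positions]

  positions.map do |position|
    case position
    when Integer
      resolve[position]
    when Range
      rbegin = position.begin
      rend = position.end
      rexc = position.exclude_end?

      if rbegin.nil? # beginless range
        Range.new(1, resolve[rend], rexc)
      elsif rend.nil? # endless range
        Range.new(resolve[rbegin], count, rexc)
      else # resolve negatives and flip descending ranges
        from, to = [resolve[rbegin], resolve[rend]].minmax
        Range.new(from, to, rexc)
      end
    when Set
      position.map { |pos| resolve[pos] }.to_set
    else
      position
    end
  end
end

# Hypothetical counterpart of the returned lambda: yields +action+ when the
# 1-based position is selected, and its negation otherwise.
def position_matcher(positions, count, action = true)
  normalized = normalize_positions(positions, count)
  ->(position) { normalized.any? { |pos| pos === position } ? action : !action }
end

normalize_positions(-1, 3)                 # => [3]
normalize_positions(Range.new(-3, nil), 8) # => [6..8]
normalize_positions(-42, 3)                # => [-38]
```

Because `-42` resolves to `-38` rather than wrapping around, it never equals a real position, which is the behaviour the comment about avoiding modulo asks for.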
@@ -0,0 +1,51 @@
1
+ # frozen_string_literal: true
2
+
3
+ class StringSplitter
4
+ class Split
5
+ attr_reader :captures, :count, :index, :lhs, :position, :rhs, :separator
6
+ attr_writer :rhs
7
+ alias pos position
8
+
9
+ def initialize(captures:, lhs:, rhs:, separator:)
10
+ @captures = captures
11
+ @lhs = lhs
12
+ @rhs = rhs
13
+ @separator = separator
14
+ end
15
+
16
+ # 0-based index relative to the end of the array, e.g. for 5 items:
17
+ #
18
+ # index | rindex
19
+ # ------|-------
20
+ # 0 | 4
21
+ # 1 | 3
22
+ # 2 | 2
23
+ # 3 | 1
24
+ # 4 | 0
25
+ def rindex
26
+ @count - @position
27
+ end
28
+
29
+ # 1-based position relative to the end of the array, e.g. for 5 items:
30
+ #
31
+ # position | rposition
32
+ # ----------|----------
33
+ # 1 | 5
34
+ # 2 | 4
35
+ # 3 | 3
36
+ # 4 | 2
37
+ # 5 | 1
38
+ def rposition
39
+ @count + 1 - @position
40
+ end
41
+
42
+ alias rpos rposition
43
+
44
+ def update!(count:, index:)
45
+ @count = count
46
+ @index = index
47
+ @position = index + 1
48
+ freeze
49
+ end
50
+ end
51
+ end
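The two tables in `split.rb` follow directly from the formulas in `rindex` and `rposition`. As a quick check, here is a small sketch that uses a hypothetical `Counter` struct in place of `StringSplitter::Split`, keeping only the `count` and `index` fields and the same arithmetic.

```ruby
# Hypothetical stand-in for StringSplitter::Split: only the counters are
# modelled, using the same arithmetic as the diff above.
Counter = Struct.new(:count, :index) do
  # 1-based position from the start
  def position
    index + 1
  end

  # 0-based index from the end
  def rindex
    count - position
  end

  # 1-based position from the end
  def rposition
    count + 1 - position
  end
end

rows = (0...5).map do |index|
  counter = Counter.new(5, index)
  [counter.index, counter.rindex, counter.position, counter.rposition]
end

rows.first # => [0, 4, 1, 5]
rows.last  # => [4, 0, 5, 1]
```

For every row, `index + rindex` equals `count - 1` and `position + rposition` equals `count + 1`, which matches the columns in the comments.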
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  class StringSplitter
4
- VERSION = '0.4.0'
4
+ VERSION = '0.7.1'
5
5
  end
metadata CHANGED
@@ -1,71 +1,57 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: string_splitter
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.0
4
+ version: 0.7.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - chocolateboy
8
- autorequire:
8
+ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-06-24 00:00:00.000000000 Z
11
+ date: 2020-08-22 00:00:00.000000000 Z
12
12
  dependencies:
13
- - !ruby/object:Gem::Dependency
14
- name: values
15
- requirement: !ruby/object:Gem::Requirement
16
- requirements:
17
- - - "~>"
18
- - !ruby/object:Gem::Version
19
- version: '1.8'
20
- type: :runtime
21
- prerelease: false
22
- version_requirements: !ruby/object:Gem::Requirement
23
- requirements:
24
- - - "~>"
25
- - !ruby/object:Gem::Version
26
- version: '1.8'
27
13
  - !ruby/object:Gem::Dependency
28
14
  name: bundler
29
15
  requirement: !ruby/object:Gem::Requirement
30
16
  requirements:
31
17
  - - "~>"
32
18
  - !ruby/object:Gem::Version
33
- version: '1.16'
19
+ version: '2.1'
34
20
  type: :development
35
21
  prerelease: false
36
22
  version_requirements: !ruby/object:Gem::Requirement
37
23
  requirements:
38
24
  - - "~>"
39
25
  - !ruby/object:Gem::Version
40
- version: '1.16'
26
+ version: '2.1'
41
27
  - !ruby/object:Gem::Dependency
42
28
  name: minitest
43
29
  requirement: !ruby/object:Gem::Requirement
44
30
  requirements:
45
31
  - - "~>"
46
32
  - !ruby/object:Gem::Version
47
- version: '5.11'
33
+ version: '5.0'
48
34
  type: :development
49
35
  prerelease: false
50
36
  version_requirements: !ruby/object:Gem::Requirement
51
37
  requirements:
52
38
  - - "~>"
53
39
  - !ruby/object:Gem::Version
54
- version: '5.11'
40
+ version: '5.0'
55
41
  - !ruby/object:Gem::Dependency
56
42
  name: minitest-power_assert
57
43
  requirement: !ruby/object:Gem::Requirement
58
44
  requirements:
59
45
  - - "~>"
60
46
  - !ruby/object:Gem::Version
61
- version: 0.3.0
47
+ version: '0.3'
62
48
  type: :development
63
49
  prerelease: false
64
50
  version_requirements: !ruby/object:Gem::Requirement
65
51
  requirements:
66
52
  - - "~>"
67
53
  - !ruby/object:Gem::Version
68
- version: 0.3.0
54
+ version: '0.3'
69
55
  - !ruby/object:Gem::Dependency
70
56
  name: minitest-reporters
71
57
  requirement: !ruby/object:Gem::Requirement
@@ -86,29 +72,15 @@ dependencies:
86
72
  requirements:
87
73
  - - "~>"
88
74
  - !ruby/object:Gem::Version
89
- version: '10.0'
90
- type: :development
91
- prerelease: false
92
- version_requirements: !ruby/object:Gem::Requirement
93
- requirements:
94
- - - "~>"
95
- - !ruby/object:Gem::Version
96
- version: '10.0'
97
- - !ruby/object:Gem::Dependency
98
- name: rubocop
99
- requirement: !ruby/object:Gem::Requirement
100
- requirements:
101
- - - "~>"
102
- - !ruby/object:Gem::Version
103
- version: 0.54.0
75
+ version: '13.0'
104
76
  type: :development
105
77
  prerelease: false
106
78
  version_requirements: !ruby/object:Gem::Requirement
107
79
  requirements:
108
80
  - - "~>"
109
81
  - !ruby/object:Gem::Version
110
- version: 0.54.0
111
- description:
82
+ version: '13.0'
83
+ description:
112
84
  email: chocolate@cpan.org
113
85
  executables: []
114
86
  extensions: []
@@ -118,6 +90,7 @@ files:
118
90
  - LICENSE.md
119
91
  - README.md
120
92
  - lib/string_splitter.rb
93
+ - lib/string_splitter/split.rb
121
94
  - lib/string_splitter/version.rb
122
95
  homepage: https://github.com/chocolateboy/string_splitter
123
96
  licenses:
@@ -127,7 +100,7 @@ metadata:
127
100
  bug_tracker_uri: https://github.com/chocolateboy/string_splitter/issues
128
101
  changelog_uri: https://github.com/chocolateboy/string_splitter/blob/master/CHANGELOG.md
129
102
  source_code_uri: https://github.com/chocolateboy/string_splitter
130
- post_install_message:
103
+ post_install_message:
131
104
  rdoc_options: []
132
105
  require_paths:
133
106
  - lib
@@ -135,16 +108,15 @@ required_ruby_version: !ruby/object:Gem::Requirement
135
108
  requirements:
136
109
  - - ">="
137
110
  - !ruby/object:Gem::Version
138
- version: '0'
111
+ version: '2.3'
139
112
  required_rubygems_version: !ruby/object:Gem::Requirement
140
113
  requirements:
141
114
  - - ">="
142
115
  - !ruby/object:Gem::Version
143
116
  version: '0'
144
117
  requirements: []
145
- rubyforge_project:
146
- rubygems_version: 2.7.7
147
- signing_key:
118
+ rubygems_version: 3.1.4
119
+ signing_key:
148
120
  specification_version: 4
149
121
  summary: String#split on steroids
150
122
  test_files: []