string_splitter 0.5.1 → 0.6.0 (RubyGems)

Files changed (6)

checksums.yaml +4 -4
data/CHANGELOG.md +47 -10
data/README.md +139 -49
data/lib/string_splitter.rb +233 -181
data/lib/string_splitter/version.rb +1 -1
metadata +16 -31

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 582dd9d8bae0421a49348bf0ccade081a4cc448e8e27943dcb67004b1b684f6d
-  data.tar.gz: 10990476dec6bf7edc909cd8558d0404fd9295820238ac527ebf3294454815a2
+  metadata.gz: 9d97ccb956fe51694359cdb0d3a997d6574de088bac6ed5a8e572f92bb5ed54a
+  data.tar.gz: 845cefeb5efd5d01baa45759cb05ff7ae5e9a457c1f148b340bb24c038bd259e
 SHA512:
-  metadata.gz: 666914aa76ca9f425dc7ef60b0110dbb1239fad3ae44ac49ba0ee59531b93d800cb2ca475c524ee359dbde4b21a0b97a89fa3f6910bb78d1b6737729ffddc1a9
-  data.tar.gz: 4c9522bcc4e858a98e4b9c79abe2ecf845b0a8209479b802637936215c0a5c02e9c0853f103779618636774ec5ce55a7157ea8144eaadaa97f918a94e062d4e9
+  metadata.gz: 7a935a6e0f3434801dcae6a32575779e1d2eb706f8f208087a208e7fdba39ac5b49928f8b7617aec60493a8db5988a013028650f8b2ced01fadb620bfd4c77e5
+  data.tar.gz: d76c18a283c1e113c8bffb73b813eb6074481faa7ea339811dc9a7424a5e24fdc3efbe9afa941459e566cde8271c3cd19a97e3a37a8cf90d36a65a7bf8fd6dcf
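
The digests in checksums.yaml cover the two archives packed inside the .gem file. As a minimal sketch, assuming the archive bytes have already been read into memory, a value like these could be checked with Ruby's standard Digest library. The helper name `checksum_matches?` and the sample payload are illustrative only and are not part of the gem or of RubyGems.

```ruby
require 'digest'

# Hypothetical helper: compares the SHA256 hex digest of some bytes with an
# expected value such as the ones listed in checksums.yaml.
def checksum_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.strip.downcase
end

# Stand-in payload. With a real gem this would be the bytes of data.tar.gz
# (for example via File.binread) after unpacking the .gem archive.
payload = "example payload"
expected = Digest::SHA256.hexdigest(payload)

puts checksum_matches?(payload, expected)        # digest matches
puts checksum_matches?(payload + "x", expected)  # altered payload, no match
```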

data/CHANGELOG.md CHANGED

@@ -1,37 +1,74 @@
+## 0.6.0 - 2020-08-20
+#### Breaking Changes
+- `ss.split(str, " ")` is no longer treated the same as `ss.split(str)` i.e.
+  unlike Ruby's `String#split` (but like Crystal's), the former no longer
+  strips the string before splitting
+- rename the `remove_empty` option `remove_empty_fields`
+- rename the `exclude` option `except` (alias for `reject`)
+#### Fixes
+- correctly handle backreferences in delimiter patterns
+#### Features
+- add support for descending, negative, and infinite ranges,
+  e.g. `ss.split(str, ":", at: [..4, 4..., 3..1, -1..-3])` etc.
 ## 0.5.1 - 2018-07-01
+#### Changes
 - set StringSplitter::VERSION when `string_splitter.rb` is loaded
-- doc tweaks
 ## 0.5.0 - 2018-06-26
+#### Fixes
 - don't treat string delimiters as patterns
+#### Features
 - add a `reject`/`exclude` option which rejects splits at the specified positions
 - add a `select` alias for `at`
 ## 0.4.0 - 2018-06-24
-- **breaking change**: remove the `offset` alias for `split.index`
+#### Breaking Changes
+- remove the `offset` alias for `split.index`
 ## 0.3.1 - 2018-06-24
-- remove trailing empty field when the separator is empty ([#1](https://github.com/chocolateboy/string_splitter/issues/1))
+#### Fixes
+- remove trailing empty field when the separator is empty
+  ([#1](https://github.com/chocolateboy/string_splitter/issues/1))
 ## 0.3.0 - 2018-06-23
-- **breaking change**: rename the `default_separator` option to `default_delimiter`
-  - to avoid ambiguity in the code, refer to the input pattern/string as the
-    "delimiter" and the matched string as the "separator"
+#### Breaking Changes
+- rename the `default_separator` option `default_delimiter`
 ## 0.2.0 - 2018-06-22
-- **breaking change**: make `index` (AKA `offset`) 0-based and add `position`
-  (AKA `pos`) as the 1-based accessor
+#### Breaking Changes
+- make `index` (AKA `offset`) 0-based and add `position` (AKA `pos`) as the
+  1-based accessor
 ## 0.1.0 - 2018-06-22
-- **breaking change**: the block now takes a single `split` object with an
-  `index` accessor, rather than seperate `index` and `split` arguments
+#### Breaking Changes
+- the block now takes a single `split` object with an `index` accessor, rather
+  than separate `index` and `split` arguments
+#### Features
 - add support for negative indices in the value supplied to the `at` option
 - add a `count` field to the split object containing the total number of splits

data/README.md CHANGED

@@ -3,14 +3,15 @@
 [![Build Status](https://travis-ci.org/chocolateboy/string_splitter.svg)](https://travis-ci.org/chocolateboy/string_splitter)
 [![Gem Version](https://img.shields.io/gem/v/string_splitter.svg)](https://rubygems.org/gems/string_splitter)
-<!-- START doctoc generated TOC please keep comment here to allow auto update -->
-<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+<!-- toc -->
 - [NAME](#name)
 - [INSTALLATION](#installation)
 - [SYNOPSIS](#synopsis)
 - [DESCRIPTION](#description)
 - [WHY?](#why)
+- [CAVEATS](#caveats)
+  - [Differences from String#split](#differences-from-string%23split)
 - [COMPATIBILITY](#compatibility)
 - [VERSION](#version)
 - [SEE ALSO](#see-also)
@@ -19,7 +20,7 @@
 - [AUTHOR](#author)
 - [COPYRIGHT AND LICENSE](#copyright-and-license)
-<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+<!-- tocstop -->
 # NAME
@@ -42,16 +43,28 @@ ss = StringSplitter.new
 **Same as `String#split`**
 ```ruby
-ss.split("foo bar baz quux")
-ss.split("foo bar baz quux", " ")
-ss.split("foo bar baz quux", /\s+/)
-# => ["foo", "bar", "baz", "quux"]
+ss.split("foo bar baz")
+ss.split("  foo bar baz  ")
+# => ["foo", "bar", "baz"]
+```
+```ruby
+ss.split("foo", "")
+ss.split("foo", //)
+# => ["f", "o", "o"]
+```
+```ruby
+ss.split("", "...")
+ss.split("", /.../)
+# => []
 ```
 **Split at the first delimiter**
 ```ruby
 ss.split("foo:bar:baz:quux", ":", at: 1)
+ss.split("foo:bar:baz:quux", ":", select: 1)
 # => ["foo", "bar:baz:quux"]
 ```
@@ -65,8 +78,16 @@ ss.split("foo:bar:baz:quux", ":", at: -1)
 **Split at multiple delimiter positions**
 ```ruby
-ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -2])
-# => ["1", "2", "3", "4:5:6:7", "8:9"]
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -1])
+# => ["1", "2", "3", "4:5:6:7:8", "9"]
+```
+**Split at all but the first and last delimiters**
+```ruby
+ss.split("1:2:3:4:5:6", ":", except: [1, -1])
+ss.split("1:2:3:4:5:6", ":", reject: [1, -1])
+# => ["1:2", "3", "4", "5:6"]
 ```
 **Split from the right**
@@ -75,44 +96,79 @@ ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -2])
 ss.rsplit("1:2:3:4:5:6:7:8:9", ":", at: [1..3, 5])
 # => ["1:2:3:4", "5:6", "7", "8", "9"]
 ```
+**Split with negative, descending, and infinite ranges**
+```ruby
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: 4...)
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: [4...])
+# => ["1:2:3:4", "5", "6", "7", "8:9"]
+```
+```ruby
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: ..-3)
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: [..-3])
+# => ["1", "2", "3", "4", "5", "6", "7:8:9"]
+```
+```ruby
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1, 5..3, -2..])
+# => ["1", "2:3", "4", "5", "6:7", "8", "9"]
+```
 **Full control via a block**
 ```ruby
-result = ss.split('a:a:a:b:c:c:e:a:a:d:c', ":") do |split|
-  split.index > 0 && split.lhs == split.rhs
+result = ss.split("1:2:3:4:5:6:7:8", ":") do |split|
+  split.pos % 2 == 0
 end
-# => ["a:a", "a:b:c", "c:e:a", "a:d:c"]
+# => ["1:2", "3:4", "5:6", "7:8"]
+```
+```ruby
+string = "banana".chars.sort.join # "aaabnn"
+ss.split(string, "") do |split|
+    split.rhs != split.lhs
+end
+# => ["aaa", "b", "nn"]
 ```
 # DESCRIPTION
-Many languages have built-in `split` functions/methods for strings. They behave similarly
-(notwithstanding the occasional [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)),
-and handle a few common cases e.g.:
+Many languages have built-in `split` functions/methods for strings. They behave
+similarly (notwithstanding the occasional
+[surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)), and
+handle a few common cases e.g.:
 * limiting the number of splits
 * including the separator(s) in the results
 * removing (some) empty fields
-But, because the API is squeezed into two overloaded parameters (the delimiter and the limit),
-achieving the desired results can be tricky. For instance, while `String#split` removes empty
-trailing fields (by default), it provides no way to remove *all* empty fields. Likewise, the
-cramped API means there's no way to e.g. combine a limit (positive integer) with the option
-to preserve empty fields (negative integer), or use backreferences in a delimiter pattern
+But, because the API is squeezed into two overloaded parameters (the delimiter
+and the limit), achieving the desired results can be tricky. For instance,
+while `String#split` removes empty trailing fields (by default), it provides no
+way to remove *all* empty fields. Likewise, the cramped API means there's no
+way to e.g. combine a limit (positive integer) with the option to preserve
+empty fields (negative integer), or use backreferences in a delimiter pattern
 without including its captured subexpressions in the result.
-If `split` was being written from scratch, without the baggage of its legacy API,
-it's possible that some of these options would be made explicit rather than overloading
-the parameters. And, indeed, this is possible in some implementations,
-e.g. in Crystal:
+If `split` was being written from scratch, without the baggage of its legacy
+API, it's possible that some of these options would be made explicit rather
+than overloading the parameters. And, indeed, this is possible in some
+implementations, e.g. in Crystal:
 ```ruby
-":foo:bar:baz:".split(":", remove_empty: false) # => ["", "foo", "bar", "baz", ""]
-":foo:bar:baz:".split(":", remove_empty: true)  # => ["foo", "bar", "baz"]
+":foo:bar:baz:".split(":", remove_empty: false)
+# => ["", "foo", "bar", "baz", ""]
+":foo:bar:baz:".split(":", remove_empty: true)
+# => ["foo", "bar", "baz"]
 ````
-StringSplitter takes this one step further by moving the configuration out of the method altogether
-and delegating the strategy — i.e. which splits should be accepted or rejected — to a block:
+StringSplitter takes this one step further by moving the configuration out of
+the method altogether and delegating the strategy — i.e. which splits should be
+accepted or rejected — to a block:
 ```ruby
 ss = StringSplitter.new
@@ -120,22 +176,28 @@ ss = StringSplitter.new
 ss.split("foo:bar:baz", ":") { |split| split.index == 0 }
 # => ["foo", "bar:baz"]
-ss.split("foo:bar:baz", ":") { |split| split.position == split.count }
-# => ["foo:bar", "baz"]
+ss.split("foo:bar:baz:quux", ":") do |split|
+  split.position == 1 || split.position == 3
+end
+# => ["foo", "bar:baz", "quux"]
 ```
-As a shortcut, the common case of splitting on delimiters at one or more positions is supported by an option:
+As a shortcut, the common case of splitting on delimiters at one or more
+positions is supported by an option:
 ```ruby
-ss.split('foo:bar:baz:quux', ':', at: [1, -1]) # => ["foo", "bar:baz", "quux"]
+ss.split("foo:bar:baz:quux", ":", at: [1, -1])
+# => ["foo", "bar:baz", "quux"]
 ```
 # WHY?
-I wanted to split semi-structured output into fields without having to resort to a regex or a full-blown parser.
+I wanted to split semi-structured output into fields without having to resort
+to a regex or a full-blown parser.
-As an example, the nominally unstructured output of many Unix commands is often formatted in a way
-that's tantalizingly close to being [machine-readable](https://en.wikipedia.org/wiki/Delimiter-separated_values),
+As an example, the nominally unstructured output of many Unix commands is often
+formatted in a way that's tantalizingly close to being
+[machine-readable](https://en.wikipedia.org/wiki/Delimiter-separated_values),
 apart from a few pesky exceptions e.g.:
 ```bash
@@ -148,8 +210,8 @@ drwxr-xr-x 3 user users 4096 Jun 19 22:56 lib
 -rw-r--r-- 1 user users 3134 Jun 19 22:59 README.md
 ```
-These lines can *almost* be parsed into an array of fields by splitting them on whitespace. The exception is the
-date (columns 6-8) i.e.:
+These lines can *almost* be parsed into an array of fields by splitting them on
+whitespace. The exception is the date (columns 6-8) i.e.:
 ```ruby
 line = "-rw-r--r-- 1 user users   87 Jun 18 18:16 CHANGELOG.md"
@@ -174,13 +236,14 @@ One way to work around this is to parse the whole line e.g.:
 line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+ \S+) \s+ (.+)$/x)
 ```
-But that requires us to specify *everything*. What we really want is a version of `split`
-which allows us to veto splitting for the 6th and 7th delimiters i.e. control over which
-splits are accepted, rather than being restricted to the single, baked-in strategy provided
-by the `limit` parameter.
+But that requires us to specify *everything*. What we really want is a version
+of `split` which allows us to veto splitting for the 6th and 7th delimiters
+(and to stop after the 8th delimiter) i.e. control over which splits are
+accepted, rather than being restricted to the single, baked-in strategy
+provided by the `limit` parameter.
-By providing a simple way to accept or reject each split, StringSplitter makes cases like
-this easy to handle, either via a block:
+By providing a simple way to accept or reject each split, StringSplitter makes
+cases like this easy to handle, either via a block:
 ```ruby
 ss.split(line) do |split|
@@ -196,14 +259,42 @@ ss.split(line, at: [1..5, 8])
 # => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
 ```
+# CAVEATS
+## Differences from String#split
+StringSplitter shares `String#split`'s behavior of trimming the string before
+splitting if the delimiter is omitted, e.g.:
+```ruby
+" foo bar baz ".split      # => ["foo", "bar", "baz"]
+ss.split(" foo bar baz ")  # => ["foo", "bar", "baz"]
+```
+However, unlike `String#split`, this doesn't also apply if a delimiter of `" "`
+is supplied, e.g.:
+```ruby
+" foo bar baz ".split(" ")     # => ["foo", "bar", "baz"]
+ss.split(" foo bar baz ", " ") # => ["", "foo", "bar", "baz", ""]
+```
+It also doesn't apply if a custom default-delimiter is defined:
+```ruby
+ss = StringSplitter.new(default_delimiter: /\s+/)
+ss.split(" foo bar baz ") # => ["", "foo", "bar", "baz", ""]
+```
 # COMPATIBILITY
-StringSplitter is tested and supported on all versions of Ruby [supported by the ruby-core team](https://www.ruby-lang.org/en/downloads/branches/),
-i.e., currently, Ruby 2.3 and above.
+StringSplitter is tested and supported on all versions of Ruby [supported by
+the ruby-core team](https://www.ruby-lang.org/en/downloads/branches/), i.e.,
+currently, Ruby 2.5 and above.
 # VERSION
-0.5.1
+0.6.0
 # SEE ALSO
@@ -221,8 +312,7 @@ i.e., currently, Ruby 2.3 and above.
 # COPYRIGHT AND LICENSE
-Copyright © 2018 by chocolateboy.
+Copyright © 2018-2020 by chocolateboy.
 This is free software; you can redistribute it and/or modify it under the
-terms of the [Artistic License 2.0](http://www.opensource.org/licenses/artistic-license-2.0.php).
+terms of the [Artistic License 2.0](https://www.opensource.org/licenses/artistic-license-2.0.php).

data/lib/string_splitter.rb CHANGED

@@ -1,21 +1,45 @@
 # frozen_string_literal: true
+require 'set'
 require 'values'
 require_relative 'string_splitter/version'
 # This class extends the functionality of +String#split+ by:
 #
 #   - providing full control over which splits are accepted or rejected
+#
 #   - adding support for splitting from right-to-left
+#
 #   - encapsulating splitting options/preferences in the splitter rather
 #     than trying to cram them into overloaded method parameters
 #
 # These enhancements allow splits to handle many cases that otherwise require bigger
-# guns e.g. regex matching or parsing.
+# guns, e.g. regex matching or parsing.
+#
+# Implementation-wise, we effectively use the built-in +String#split+ method as a
+# tokenizer, and parse the resulting tokens into an array of Split objects with the
+# following fields:
+#
+#   - captures:  separator substrings captured by parentheses in the delimiter pattern
+#   - count:     the number of splits
+#   - index:     the 0-based index of the split in the array
+#   - lhs:       the string to the left of the separator (back to the previous split candidate)
+#   - position:  the 1-based index of the split in the array (alias: pos)
+#   - rhs:       the string to the right of the separator (up to the next split candidate)
+#   - rindex:    the 0-based index of the split relative to the end of the array
+#   - rposition: the 1-based index of the split relative to the end of the array (alias: rpos)
+#   - separator: the string matched by the delimiter pattern/string
+#
 class StringSplitter
+  # terminology: the delimiter is what we provide and the separators are what we get
+  # back (if we capture them). e.g. for:
+  #
+  #   ss.split("foo:bar::baz", /(\W+)/)
+  #
+  # the delimiter is /(\W+)/ and the separators are ":" and "::"
   ACCEPT_ALL = ->(_split) { true }
-  DEFAULT_DELIMITER = /\s+/
-  NO_SPLITS = []
+  DEFAULT_DELIMITER = /\s+/.freeze
   Split = Value.new(:captures, :count, :index, :lhs, :rhs, :separator) do
     def position
@@ -23,32 +47,78 @@ class StringSplitter
     end
     alias_method :pos, :position
+    # 0-based index relative to the end of the array, e.g. for 5 items:
+    #
+    #  index | rindex
+    #  ------|-------
+    #    0   |   4
+    #    1   |   3
+    #    2   |   2
+    #    3   |   1
+    #    4   |   0
+    def rindex
+      count - position
+    end
+    # 1-based position relative to the end of the array, e.g. for 5 items:
+    #
+    #   position | rposition
+    #  ----------|----------
+    #      1     |    5
+    #      2     |    4
+    #      3     |    3
+    #      4     |    2
+    #      5     |    1
+    def rposition
+      count + 1 - position
+    end
+    alias_method :rpos, :rposition
+  end
+  # simulate an enum. the value is returned by the case statement
+  # in the generated block if the positions match
+  module Action
+    SELECT = true
+    REJECT = false
   end
+  private_constant :Action
   def initialize(
     default_delimiter: DEFAULT_DELIMITER,
     include_captures: true,
-    remove_empty: false,
+    remove_empty: false, # TODO remove this
+    remove_empty_fields: remove_empty,
     spread_captures: true
   )
     @default_delimiter = default_delimiter
     @include_captures = include_captures
-    @remove_empty = remove_empty
+    @remove_empty_fields = remove_empty_fields
     @spread_captures = spread_captures
   end
-  attr_reader :default_delimiter, :include_captures, :remove_empty, :spread_captures
+  attr_reader(
+    :default_delimiter,
+    :include_captures,
+    :remove_empty_fields,
+    :spread_captures
+  )
+  # TODO remove this
+  alias remove_empty remove_empty_fields
   def split(
     string,
     delimiter = @default_delimiter,
-    at: nil,
+    at: nil, # alias for select
+    except: nil, # alias for reject
     select: at,
-    exclude: nil,
-    reject: exclude,
+    reject: except,
     &block
   )
-    result, splits, block = split_init(
+    result, splits, count, accept = init(
       string: string,
       delimiter: delimiter,
       select: select,
@@ -56,29 +126,21 @@ class StringSplitter
       block: block
     )
-    count = splits.length
+    return result unless splits
-    splits.each_with_index do |split, index|
-      split = Split.with(split.merge({ index: index, count: count }))
+    splits.each_with_index do |hash, index|
+      split = Split.with(hash.merge({ count: count, index: index }))
       result << split.lhs if result.empty?
-      if block.call(split)
-        if @include_captures
-          if @spread_captures
-            result += split.captures
-          else
-            result << split.captures
-          end
-        end
-        result << split.rhs
+      if accept.call(split)
+        result << split.captures << split.rhs
       else
-        # concatenate the rhs
+        # append the rhs
         result[-1] = result[-1] + split.separator + split.rhs
       end
     end
-    result
+    render(result)
   end
   alias lsplit split
@@ -86,13 +148,13 @@ class StringSplitter
   def rsplit(
     string,
     delimiter = @default_delimiter,
-    at: nil,
+    at: nil, # alias for select
+    except: nil, # alias for reject
     select: at,
-    exclude: nil,
-    reject: exclude,
+    reject: except,
     &block
   )
-    result, splits, block = split_init(
+    result, splits, count, accept = init(
       string: string,
       delimiter: delimiter,
       select: select,
@@ -100,203 +162,193 @@ class StringSplitter
       block: block
     )
-    count = splits.length
+    return result unless splits
-    splits.reverse!.each_with_index do |split, index|
-      split = Split.with(split.merge({ index: index, count: count }))
+    splits.reverse_each.with_index do |hash, index|
+      split = Split.with(hash.merge({ count: count, index: index }))
       result.unshift(split.rhs) if result.empty?
-      if block.call(split)
-        if @include_captures
-          if @spread_captures
-            result = split.captures + result
-          else
-            result.unshift(split.captures)
-          end
-        end
-        result.unshift(split.lhs)
+      if accept.call(split)
+        # [lhs + captures] + result
+        result.unshift(split.lhs, split.captures)
       else
         # prepend the lhs
         result[0] = split.lhs + split.separator + result[0]
       end
     end
-    result
+    render(result)
   end
   private
-  def splits_for(parts, ncaptures)
-    result = []
-    splits = []
-    until parts.empty?
-      lhs = parts.shift
-      separator = parts.shift
-      captures = parts.shift(ncaptures)
-      rhs = parts.length == 1 ? parts.shift : parts.first
-      if @remove_empty && (lhs.empty? || rhs.empty?)
-        if lhs.empty? && rhs.empty?
-          # do nothing
-        elsif parts.empty? # last split
-          result << (!lhs.empty? ? lhs : rhs) if splits.empty?
-        elsif rhs.empty?
-          # replace the empty rhs with the non-empty lhs
-          parts[0] = lhs
-        end
-        next
-      end
+  # initialisation common to +split+ and +rsplit+
+  #
+  # takes a hash of options passed to +split+ or +rsplit+ and returns an array with
+  # the following fields:
+  #
+  #   - result: the array of separated strings to return from +split+ or +rsplit+.
+  #     if the splits array is empty, the caller returns this array immediately
+  #     without any further processing
+  #
+  #   - splits: an array of hashes containing the lhs, rhs, separator and captured
+  #     separator substrings for each split
+  #
+  #   - count: the number of splits
+  #
+  #   - accept: a proc whose return value determines whether each split should be
+  #     accepted (true) or rejected (false)
+  #
+  def init(string:, delimiter:, select:, reject:, block:)
+    if delimiter.equal?(DEFAULT_DELIMITER)
+      string = string.strip
+    end
-      splits << {
-        lhs: lhs,
-        rhs: rhs,
-        separator: separator,
-        captures: captures,
-      }
+    if reject
+      positions = reject
+      action = Action::REJECT
+    elsif select
+      positions = select
+      action = Action::SELECT
     end
-    [result, splits]
-  end
+    splits = parse(string, delimiter)
-  # takes a hash of options passed to +split+ or +rsplit+ and returns a:
-  #
-  #   [result, splits, block]
-  #
-  # triple, where `result` is the return value of the method, `splits` is an array
-  # of hashes containing the lhs/rhs, separator and captures of each split, and
-  # `block` is a proc which specifies whether each split should be accepted or
-  # rejected
-  def split_init(string:, delimiter:, select:, reject:, block:)
-    unless (match = string.match(delimiter))
-      result = (@remove_empty && string.empty?) ? [] : [string]
-      return [result, NO_SPLITS, block]
+    if splits.empty?
+      result = string.empty? ? [] : [string]
+      return [result]
     end
-    select = Array(select)
-    reject = Array(reject)
+    block ||= positions ? compile(positions, action, splits.length) : ACCEPT_ALL
+    [[], splits, splits.length, block]
+  end
-    if !reject.empty?
-      positions = reject
-      action = :reject
-    elsif !select.empty?
-      positions = select
-      action = :select
+  def render(result)
+    if @remove_empty_fields
+      result.reject! { |it| it.is_a?(String) && it.empty? }
     end
-    ncaptures = match.captures.length
-    delimiter = Regexp.quote(delimiter) if delimiter.is_a?(String)
-    delimiter = increment_backrefs(delimiter, ncaptures)
-    parts = string.split(/(#{delimiter})/, -1)
-    remove_trailing_empty_field!(parts, ncaptures)
-    result, splits = splits_for(parts, ncaptures)
-    block ||= positions ? match_positions(positions, action, splits.length) : ACCEPT_ALL
+    unless @include_captures
+      return result.reject! { |it| it.is_a?(Array) }
+    end
-    [result, splits, block]
+    result.flat_map do |value|
+      next [value] unless value.is_a?(Array) && @spread_captures
+      @spread_captures == :compact ? value.compact : value
+    end
   end
-  # increment back-references so they remain valid when the outer capture
-  # is added.
-  #
-  # e.g. to split on:
+  # takes a string and a delimiter pattern (regex or string) and splits it along
+  # the delimiter, returning an array of objects (hashes) representing each split.
+  # e.g. for:
   #
-  #   - <foo-comment> ... </foo-comment>
-  #   - <bar-comment> ... </bar-comment>
+  #   parse.split("foo:bar:baz:quux", ":")
   #
-  # etc.
+  # we return:
   #
-  # before:
+  #   [
+  #       { lhs: "foo", rhs: "bar", separator: ":", captures: [] },
+  #       { lhs: "bar", rhs: "baz", separator: ":", captures: [] },
+  #       { lhs: "baz", rhs: "quux", separator: ":", captures: [] },
+  #   ]
   #
-  #   %r|   <(\w+-comment)> [^<]* </\1-comment>   |x
-  #
-  # after:
-  #
-  #   %r| ( <(\w+-comment)> [^<]* </\2-comment> ) |x
+  def parse(string, pattern)
+    result = []
+    start = 0
-  def increment_backrefs(delimiter, ncaptures)
-    if delimiter.is_a?(Regexp) && ncaptures > 0
-      delimiter = delimiter.to_s.gsub(/\\(?:(\d+)|.)/) do
-        match = Regexp.last_match
-        match[1] ? '\\' + match[1].to_i.next.to_s : match[0]
-      end
+    # we don't use the argument passed to the +scan+ block here because it's a
+    # string (the separator) if there are no captures, rather than an empty
+    # array. we use match.captures instead to get the array
+    string.scan(pattern) do
+      match = Regexp.last_match
+      index, after = match.offset(0)
+      separator = match[0]
+      # ignore empty separators at the beginning and/or end of the string
+      next if separator.empty? && (index.zero? || after == string.length)
+      lhs = string.slice(start, index - start)
+      result.last[:rhs] = lhs unless result.empty?
+      # this is correct for the last/only match, but gets updated to the next
+      # match's lhs for other matches
+      rhs = match.post_match
+      result << {
+        captures: match.captures,
+        lhs: lhs,
+        rhs: rhs,
+        separator: separator,
+      }
+      # move the start index (the start of the lhs) to the index after the last
+      # character of the separator
+      start = after
     end
-    delimiter
+    result
   end
-  # work around Ruby's (and Perl's and Groovy's) unhelpful behavior when splitting
-  # on an empty string/pattern without removing trailing empty fields e.g.:
+  # returns a lambda which splits at (i.e. accepts or rejects splits at, depending
+  # on the action) the supplied positions
   #
-  #   "foobar".split("", -1)
-  #   "foobar".split(//, -1)
-  #   # => ["f", "o", "o", "b", "a", "r", ""]
+  # positions are preprocessed to support an additional feature: negative indices
+  # are translated to 1-based non-negative indices, e.g:
   #
-  #   "foobar".split(/()/, -1)
-  #   # => ["f", "", "o", "", "o", "", "b", "", "a", "", "r", "", ""]
+  #   ss.split("foo:bar:baz:quux", ":", at: -1)
   #
-  #   "foobar".split(/(())/, -1)
-  #   # => ["f", "", "", "o", "", "", "o", "", "", "b", "", "", "a", "", "", "r", "", "", ""]
+  # translates to:
   #
-  # *there is no such thing as an empty field whose separator is empty*, so
-  # if String#split's result ends with an empty separator, 0 or more (empty)
-  # captures and an empty field, we can safely remove them.
-  def remove_trailing_empty_field!(parts, ncaptures)
-    # the trailing field is at index -1. if there are 0 captures, the separator
-    # is at -2:
-    #
-    #   [empty_separator, empty_field]
-    #
-    # if there is 1 capture, the separator is at -3:
-    #
-    #   [empty_separator, capture, empty_field]
+  #   ss.split("foo:bar:baz:quux", ":", at: 3)
+  #
+  # and
+  #
+  #   ss.split("1:2:3:4:5:6:7:8:9", ":", at: -3..)
+  #
+  # translates to:
+  #
+  #   ss.split("1:2:3:4:5:6:7:8:9", ":", at: 6..8)
+  #
+  def compile(positions, action, nsplits)
+    # XXX note: we don't use modulo, because we don't want
+    # out-of-bounds indices to silently work, e.g. we don't want:
     #
-    # etc. therefore we find the separator by walking back
+    #   ss.split("foo:bar:baz:quux", ":", at: -42)
     #
-    #  1 (empty field)
-    #  + ncaptures
-    #  + 1 (separator)
+    # to mysteriously match when the index/position is 0/1
     #
-    # steps from the end of the array i.e. ncaptures + 2
-    count = ncaptures + 2
-    separator_index = count * -1
-    return unless parts[-1].empty? && parts[separator_index].empty?
-    # drop the empty separator, the (empty) captures, and the trailing empty field
-    parts.pop(count)
-  end
-  def match_positions(positions, action, nsplits)
-    positions = Array(positions).map do |position|
-      if position.is_a?(Integer) && position.negative?
-        # translate negative indices to 1-based non-negative indices e.g:
-        #
-        #   ss.split("foo:bar:baz:quux", ":", at: -1)
-        #
-        # translates to:
-        #
-        #   ss.split("foo:bar:baz:quux", ":", at: 3)
-        #
-        # XXX note: we don't use modulo, because we don't want
-        # out-of-bounds indices to silently work e.g. we don't want:
-        #
-        #   ss.split("foo:bar:baz:quux", ":", -42)
-        #
-        # to mysteriously match when the position is 2
-        nsplits + 1 + position
+    resolve = ->(int) { int.negative? ? nsplits + 1 + int : int }
+    # don't use Array(...) to wrap these as we don't want to convert ranges
+    positions = positions.is_a?(Array) ? positions : [positions]
+    positions = positions.map do |position|
+      if position.is_a?(Integer)
+        resolve[position]
+      elsif position.is_a?(Range)
+        rbegin = position.begin
+        rend = position.end
+        rexc = position.exclude_end?
+        if rbegin.nil?
+          Range.new(1, resolve[rend], rexc)
+        elsif rend.nil?
+          Range.new(resolve[rbegin], nsplits, rexc)
+        elsif rbegin.negative? || rend.negative? || (rend - rbegin).negative?
+          from = resolve[rbegin]
+          to = resolve[rend]
+          to < from ? Range.new(to, from, rexc) : Range.new(from, to, rexc)
+        else
+          position
+        end
+      elsif position.is_a?(Set)
+        position.map { |it| resolve[it] }.to_set
       else
         position
       end
     end
-    match = action == :select
-    lambda do |split|
-      case split.position when *positions then match else !match end
-    end
+    ->(split) { case split.position when *positions then action else !action end }
   end
 end
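
The position handling in `compile` can be restated as a small stand-alone function. This is a simplified sketch, assuming Ruby 2.7 or later for the beginless range literal. The name `normalize_positions` is hypothetical, and unlike the method above it rebuilds every range and omits the Set branch.

```ruby
# Hypothetical stand-alone version of the normalisation step: negative
# positions count back from the last split, open ends are clamped to
# 1..nsplits, and descending ranges are flipped.
def normalize_positions(positions, nsplits)
  resolve = ->(int) { int.negative? ? nsplits + 1 + int : int }
  # wrap manually: Array(range) would expand the range into its elements
  positions = [positions] unless positions.is_a?(Array)

  positions.map do |position|
    next resolve[position] if position.is_a?(Integer)
    next position unless position.is_a?(Range)

    first = position.begin ? resolve[position.begin] : 1
    last = position.end ? resolve[position.end] : nsplits
    first, last = last, first if last < first
    Range.new(first, last, position.exclude_end?)
  end
end

# "1:2:3:4:5:6:7:8:9" split on ":" has 8 candidate splits
p normalize_positions(-1, 8)               # => [8]
p normalize_positions((4...), 8)           # => [4...8]
p normalize_positions((..-3), 8)           # => [1..6]
p normalize_positions([1, 5..3, -2..], 8)  # => [1, 3..5, 7..8]
```

The resulting integers and ranges can be matched against a 1-based split position with `===`, which is what the `case ... when *positions` expression in the lambda relies on.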

data/lib/string_splitter/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 class StringSplitter
-  VERSION = '0.5.1'
+  VERSION = '0.6.0'
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: string_splitter
 version: !ruby/object:Gem::Version
-  version: 0.5.1
+  version: 0.6.0
 platform: ruby
 authors:
 - chocolateboy
-autorequire:
+autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-07-01 00:00:00.000000000 Z
+date: 2020-08-20 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: values
@@ -30,42 +30,42 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.16'
+        version: '2.1'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.16'
+        version: '2.1'
 - !ruby/object:Gem::Dependency
   name: minitest
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '5.11'
+        version: '5.0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '5.11'
+        version: '5.0'
 - !ruby/object:Gem::Dependency
   name: minitest-power_assert
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.3.0
+        version: '0.3'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.3.0
+        version: '0.3'
 - !ruby/object:Gem::Dependency
   name: minitest-reporters
   requirement: !ruby/object:Gem::Requirement
@@ -86,29 +86,15 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '10.0'
+        version: '13.0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '10.0'
-- !ruby/object:Gem::Dependency
-  name: rubocop
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: 0.54.0
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: 0.54.0
-description:
+        version: '13.0'
+description:
 email: chocolate@cpan.org
 executables: []
 extensions: []
@@ -127,7 +113,7 @@ metadata:
   bug_tracker_uri: https://github.com/chocolateboy/string_splitter/issues
   changelog_uri: https://github.com/chocolateboy/string_splitter/blob/master/CHANGELOG.md
   source_code_uri: https://github.com/chocolateboy/string_splitter
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -135,16 +121,15 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: '0'
+      version: '2.3'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubyforge_project:
-rubygems_version: 2.7.7
-signing_key:
+rubygems_version: 3.1.4
+signing_key:
 specification_version: 4
 summary: String#split on steroids
 test_files: []
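
Several of the metadata changes only alter how many version segments follow the `~>` operator, which changes the range of versions allowed. A small sketch using RubyGems' own `Gem::Requirement` and `Gem::Version` classes shows the effect; the helper name `allowed?` and the sample version numbers are illustrative.

```ruby
require 'rubygems'

# Hypothetical helper: does +version+ satisfy the +requirement+ string?
def allowed?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

# "~> 0.3.0" allows >= 0.3.0 and < 0.4, while "~> 0.3" allows >= 0.3 and < 1.0
puts allowed?('~> 0.3.0', '0.4.0') # false
puts allowed?('~> 0.3', '0.4.0')   # true

# "~> 5.11" excludes earlier 5.x releases that "~> 5.0" accepts
puts allowed?('~> 5.11', '5.4.0')  # false
puts allowed?('~> 5.0', '5.4.0')   # true

# the new required_ruby_version of ">= 2.3" rejects older interpreters
puts allowed?('>= 2.3', '2.2.10')  # false
```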