string_splitter 0.5.1 → 0.7.3

Files changed (7)

checksums.yaml +4 -4
data/CHANGELOG.md +76 -11
data/README.md +146 -53
data/lib/string_splitter.rb +280 -188
data/lib/string_splitter/split.rb +61 -0
data/lib/string_splitter/version.rb +1 -1
metadata +17 -45

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 582dd9d8bae0421a49348bf0ccade081a4cc448e8e27943dcb67004b1b684f6d
-  data.tar.gz: 10990476dec6bf7edc909cd8558d0404fd9295820238ac527ebf3294454815a2
+  metadata.gz: d922735ed5c3b8acdc9f0fa0d0c439f0293e1b72739dd1f6b9ef9018a332f6c9
+  data.tar.gz: b61f3b6e827675abd5fe1457a000735c4ae4a4a11dc858fc705b783820230fce
 SHA512:
-  metadata.gz: 666914aa76ca9f425dc7ef60b0110dbb1239fad3ae44ac49ba0ee59531b93d800cb2ca475c524ee359dbde4b21a0b97a89fa3f6910bb78d1b6737729ffddc1a9
-  data.tar.gz: 4c9522bcc4e858a98e4b9c79abe2ecf845b0a8209479b802637936215c0a5c02e9c0853f103779618636774ec5ce55a7157ea8144eaadaa97f918a94e062d4e9
+  metadata.gz: f226e28ffb81f405ac986c01bf0cdb4270512d0a595d89d7d77ad1b53ed25dd122dac084887de26e26293d214906709724f018b83353a9928209257f6c80bc9e
+  data.tar.gz: f56357c60d8d52ff577a1ee5c4d16bbd4082bf29853dc3593fad3eefd6670a345ed245534944f5972fca1909e7e16526351eab898e4832d7cb2054ec86be4851

data/CHANGELOG.md CHANGED

@@ -1,37 +1,102 @@
+## 0.7.3 - 2020-08-24
+#### Changes
+- avoid exposing an internal Split method inside blocks
+## 0.7.2 - 2020-08-22
+#### Fixes
+- fix/test default delimiter + `remove_empty_fields`
+## 0.7.1 - 2020-08-22
+#### Changes
+- performance improvements
+  - delegate to `String#split` where possible
+  - use a regular class for Split rather than values.rb
+  - create Split objects directly rather than allocating intermediate hashes
+## 0.7.0 - 2020-08-21
+#### Breaking Changes
+- `String#split` incompatibility: we no longer trim the string (with
+  `String#strip`) before splitting if the delimiter is omitted
+## 0.6.0 - 2020-08-20
+#### Breaking Changes
+- `ss.split(str, " ")` is no longer treated the same as `ss.split(str)` i.e.
+  unlike Ruby's `String#split`, the former no longer strips the string before
+  splitting
+- rename the `remove_empty` option `remove_empty_fields`
+- rename the `exclude` option `except` (alias for `reject`)
+#### Features
+- add support for descending, negative, and infinite ranges,
+  e.g. `ss.split(str, ":", at: [..4, 4..., 3..1, -1..-3])` etc.
+#### Fixes
+- correctly handle backreferences in delimiter patterns
 ## 0.5.1 - 2018-07-01
+#### Changes
 - set StringSplitter::VERSION when `string_splitter.rb` is loaded
-- doc tweaks
 ## 0.5.0 - 2018-06-26
-- don't treat string delimiters as patterns
+#### Features
 - add a `reject`/`exclude` option which rejects splits at the specified positions
 - add a `select` alias for `at`
+#### Fixes
+- don't treat string delimiters as patterns
 ## 0.4.0 - 2018-06-24
-- **breaking change**: remove the `offset` alias for `split.index`
+#### Breaking Changes
+- remove the `offset` alias for `split.index`
 ## 0.3.1 - 2018-06-24
-- remove trailing empty field when the separator is empty ([#1](https://github.com/chocolateboy/string_splitter/issues/1))
+#### Fixes
+- remove trailing empty field when the separator is empty
+  ([#1](https://github.com/chocolateboy/string_splitter/issues/1))
 ## 0.3.0 - 2018-06-23
-- **breaking change**: rename the `default_separator` option to `default_delimiter`
-  - to avoid ambiguity in the code, refer to the input pattern/string as the
-    "delimiter" and the matched string as the "separator"
+#### Breaking Changes
+- rename the `default_separator` option `default_delimiter`
 ## 0.2.0 - 2018-06-22
-- **breaking change**: make `index` (AKA `offset`) 0-based and add `position`
-  (AKA `pos`) as the 1-based accessor
+#### Breaking Changes
+- make `index` (AKA `offset`) 0-based and add `position` (AKA `pos`) as the
+  1-based accessor
 ## 0.1.0 - 2018-06-22
-- **breaking change**: the block now takes a single `split` object with an
-  `index` accessor, rather than seperate `index` and `split` arguments
+#### Breaking Changes
+- the block now takes a single `split` object with an `index` accessor, rather
+  than separate `index` and `split` arguments
+#### Features
 - add support for negative indices in the value supplied to the `at` option
 - add a `count` field to the split object containing the total number of splits
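
The 0.6.0 and 0.7.0 entries above both concern trimming. As a rough illustration using only core `String#split` (not StringSplitter itself), the sketch below contrasts the trimming Ruby applies for an omitted or single-space delimiter with the untrimmed result of an explicit pattern and a negative limit. The variable names are illustrative only.

```ruby
padded = " foo bar baz "

# Core Ruby: an omitted or single-space delimiter ignores leading and
# trailing whitespace.
trimmed_default = padded.split
trimmed_space = padded.split(" ")

# An explicit regex plus a negative limit keeps the empty edge fields.
untrimmed = padded.split(/ /, -1)

p trimmed_default # => ["foo", "bar", "baz"]
p trimmed_space   # => ["foo", "bar", "baz"]
p untrimmed       # => ["", "foo", "bar", "baz", ""]
```

The last form is the behaviour the changelog describes for `ss.split(str)` and `ss.split(str, " ")` from 0.7.0 and 0.6.0 onwards, respectively.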

data/README.md CHANGED

@@ -3,14 +3,15 @@
 [![Build Status](https://travis-ci.org/chocolateboy/string_splitter.svg)](https://travis-ci.org/chocolateboy/string_splitter)
 [![Gem Version](https://img.shields.io/gem/v/string_splitter.svg)](https://rubygems.org/gems/string_splitter)
-<!-- START doctoc generated TOC please keep comment here to allow auto update -->
-<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+<!-- toc -->
 - [NAME](#name)
 - [INSTALLATION](#installation)
 - [SYNOPSIS](#synopsis)
 - [DESCRIPTION](#description)
 - [WHY?](#why)
+- [CAVEATS](#caveats)
+  - [Differences from String#split](#differences-from-stringsplit)
 - [COMPATIBILITY](#compatibility)
 - [VERSION](#version)
 - [SEE ALSO](#see-also)
@@ -19,7 +20,7 @@
 - [AUTHOR](#author)
 - [COPYRIGHT AND LICENSE](#copyright-and-license)
-<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+<!-- tocstop -->
 # NAME
@@ -42,16 +43,25 @@ ss = StringSplitter.new
 **Same as `String#split`**
 ```ruby
-ss.split("foo bar baz quux")
-ss.split("foo bar baz quux", " ")
-ss.split("foo bar baz quux", /\s+/)
-# => ["foo", "bar", "baz", "quux"]
+ss.split("foo bar baz")
+ss.split("foo bar baz", " ")
+ss.split("foo bar baz", /\s+/)
+# => ["foo", "bar", "baz"]
+ss.split("foo", "")
+ss.split("foo", //)
+# => ["f", "o", "o"]
+ss.split("", "...")
+ss.split("", /.../)
+# => []
 ```
 **Split at the first delimiter**
 ```ruby
 ss.split("foo:bar:baz:quux", ":", at: 1)
+ss.split("foo:bar:baz:quux", ":", select: 1)
 # => ["foo", "bar:baz:quux"]
 ```
@@ -65,54 +75,91 @@ ss.split("foo:bar:baz:quux", ":", at: -1)
 **Split at multiple delimiter positions**
 ```ruby
-ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -2])
-# => ["1", "2", "3", "4:5:6:7", "8:9"]
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -1])
+# => ["1", "2", "3", "4:5:6:7:8", "9"]
+```
+**Split at all but the first and last delimiters**
+```ruby
+ss.split("1:2:3:4:5:6", ":", except: [1, -1])
+ss.split("1:2:3:4:5:6", ":", reject: [1, -1])
+# => ["1:2", "3", "4", "5:6"]
 ```
 **Split from the right**
 ```ruby
-ss.rsplit("1:2:3:4:5:6:7:8:9", ":", at: [1..3, 5])
-# => ["1:2:3:4", "5:6", "7", "8", "9"]
+ss.rsplit("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -1])
+# => ["1", "2:3:4:5:6", "7", "8", "9"]
+```
+**Split with negative, descending, and infinite ranges**
+```ruby
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: ..-3)
+# => ["1", "2", "3", "4", "5", "6", "7:8:9"]
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: 4...)
+# => ["1:2:3:4", "5", "6", "7", "8:9"]
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1, 5..3, -2..])
+# => ["1", "2:3", "4", "5", "6:7", "8", "9"]
 ```
 **Full control via a block**
 ```ruby
-result = ss.split('a:a:a:b:c:c:e:a:a:d:c', ":") do |split|
-  split.index > 0 && split.lhs == split.rhs
+result = ss.split("1:2:3:4:5:6:7:8", ":") do |split|
+  split.pos % 2 == 0
 end
-# => ["a:a", "a:b:c", "c:e:a", "a:d:c"]
+# => ["1:2", "3:4", "5:6", "7:8"]
+```
+```ruby
+string = "banana".chars.sort.join # "aaabnn"
+ss.split(string, "") do |split|
+    split.rhs != split.lhs
+end
+# => ["aaa", "b", "nn"]
 ```
 # DESCRIPTION
-Many languages have built-in `split` functions/methods for strings. They behave similarly
-(notwithstanding the occasional [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)),
-and handle a few common cases e.g.:
+Many languages have built-in `split` functions/methods for strings. They behave
+similarly (notwithstanding the occasional
+[surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)), and
+handle a few common cases, e.g.:
 * limiting the number of splits
 * including the separator(s) in the results
 * removing (some) empty fields
-But, because the API is squeezed into two overloaded parameters (the delimiter and the limit),
-achieving the desired results can be tricky. For instance, while `String#split` removes empty
-trailing fields (by default), it provides no way to remove *all* empty fields. Likewise, the
-cramped API means there's no way to e.g. combine a limit (positive integer) with the option
-to preserve empty fields (negative integer), or use backreferences in a delimiter pattern
+But, because the API is squeezed into two overloaded parameters (the delimiter
+and the limit), achieving the desired results can be tricky. For instance,
+while `String#split` removes empty trailing fields (by default), it provides no
+way to remove *all* empty fields. Likewise, the cramped API means there's no
+way to, e.g., combine a limit (positive integer) with the option to preserve
+empty fields (negative integer), or use backreferences in a delimiter pattern
 without including its captured subexpressions in the result.
-If `split` was being written from scratch, without the baggage of its legacy API,
-it's possible that some of these options would be made explicit rather than overloading
-the parameters. And, indeed, this is possible in some implementations,
-e.g. in Crystal:
+If `split` was being written from scratch, without the baggage of its legacy
+API, it's possible that some of these options would be made explicit rather
+than overloading the parameters. And, indeed, this is possible in some
+implementations, e.g. in Crystal:
 ```ruby
-":foo:bar:baz:".split(":", remove_empty: false) # => ["", "foo", "bar", "baz", ""]
-":foo:bar:baz:".split(":", remove_empty: true)  # => ["foo", "bar", "baz"]
+":foo:bar:baz:".split(":", remove_empty: false)
+# => ["", "foo", "bar", "baz", ""]
+":foo:bar:baz:".split(":", remove_empty: true)
+# => ["foo", "bar", "baz"]
 ````
-StringSplitter takes this one step further by moving the configuration out of the method altogether
-and delegating the strategy — i.e. which splits should be accepted or rejected — to a block:
+StringSplitter takes this one step further by moving the configuration out of
+the method altogether and delegating the strategy — i.e. which splits should be
+accepted or rejected — to a block:
 ```ruby
 ss = StringSplitter.new
@@ -120,23 +167,32 @@ ss = StringSplitter.new
 ss.split("foo:bar:baz", ":") { |split| split.index == 0 }
 # => ["foo", "bar:baz"]
-ss.split("foo:bar:baz", ":") { |split| split.position == split.count }
-# => ["foo:bar", "baz"]
+ss.split("foo:bar:baz:quux", ":") do |split|
+  split.position == 1 || split.position == 3
+end
+# => ["foo", "bar:baz", "quux"]
 ```
-As a shortcut, the common case of splitting on delimiters at one or more positions is supported by an option:
+As a shortcut, the common case of splitting (or not splitting) at one or more
+positions is supported by dedicated options:
 ```ruby
-ss.split('foo:bar:baz:quux', ':', at: [1, -1]) # => ["foo", "bar:baz", "quux"]
+ss.split("foo:bar:baz:quux", ":", select: [1, -1])
+# => ["foo", "bar:baz", "quux"]
+ss.split("foo:bar:baz:quux", ":", reject: [1, -1])
+# => ["foo:bar", "baz:quux"]
 ```
 # WHY?
-I wanted to split semi-structured output into fields without having to resort to a regex or a full-blown parser.
+I wanted to split semi-structured output into fields without having to resort
+to a regex or a full-blown parser.
-As an example, the nominally unstructured output of many Unix commands is often formatted in a way
-that's tantalizingly close to being [machine-readable](https://en.wikipedia.org/wiki/Delimiter-separated_values),
-apart from a few pesky exceptions e.g.:
+As an example, the nominally unstructured output of many Unix commands is often
+formatted in a way that's tantalizingly close to being
+[machine-readable](https://en.wikipedia.org/wiki/Delimiter-separated_values),
+apart from a few pesky exceptions, e.g.:
 ```bash
 $ ls -l
@@ -148,8 +204,8 @@ drwxr-xr-x 3 user users 4096 Jun 19 22:56 lib
 -rw-r--r-- 1 user users 3134 Jun 19 22:59 README.md
 ```
-These lines can *almost* be parsed into an array of fields by splitting them on whitespace. The exception is the
-date (columns 6-8) i.e.:
+These lines can *almost* be parsed into an array of fields by splitting them on
+whitespace. The exception is the date (columns 6-8), i.e.:
 ```ruby
 line = "-rw-r--r-- 1 user users   87 Jun 18 18:16 CHANGELOG.md"
@@ -168,19 +224,20 @@ instead of:
 ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
 ```
-One way to work around this is to parse the whole line e.g.:
+One way to work around this is to parse the whole line, e.g.:
 ```ruby
 line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+ \S+) \s+ (.+)$/x)
 ```
-But that requires us to specify *everything*. What we really want is a version of `split`
-which allows us to veto splitting for the 6th and 7th delimiters i.e. control over which
-splits are accepted, rather than being restricted to the single, baked-in strategy provided
-by the `limit` parameter.
+But that requires us to specify *everything*. What we really want is a version
+of `split` which allows us to veto splitting for the 6th and 7th delimiters
+(and to stop after the 8th delimiter), i.e. control over which splits are
+accepted, rather than being restricted to the single, baked-in strategy
+provided by the `limit` parameter.
-By providing a simple way to accept or reject each split, StringSplitter makes cases like
-this easy to handle, either via a block:
+By providing a simple way to accept or reject each split, StringSplitter makes
+cases like this easy to handle, either via a block:
 ```ruby
 ss.split(line) do |split|
@@ -196,14 +253,51 @@ ss.split(line, at: [1..5, 8])
 # => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
 ```
+# CAVEATS
+## Differences from String#split
+Unlike `String#split`, StringSplitter doesn't trim the string before splitting
+if the delimiter is omitted or a single space, e.g.:
+```ruby
+" foo bar baz ".split          # => ["foo", "bar", "baz"]
+" foo bar baz ".split(" ")     # => ["foo", "bar", "baz"]
+ss.split(" foo bar baz ")      # => ["", "foo", "bar", "baz", ""]
+ss.split(" foo bar baz ", " ") # => ["", "foo", "bar", "baz", ""]
+```
+`String#split` omits the `nil` values of unmatched optional captures:
+```ruby
+"foo:bar:baz".scan(/(:)|(-)/)  # => [[":", nil], [":", nil]]
+"foo:bar:baz".split(/(:)|(-)/) # => ["foo", ":", "bar", ":", "baz"]
+```
+StringSplitter preserves them by default (if `include_captures` is true, as it
+is by default), though they can be omitted from spread captures by passing
+`:compact` as the value of the `spread_captures` option:
+```ruby
+s1 = StringSplitter.new(spread_captures: true)
+s2 = StringSplitter.new(spread_captures: false)
+s3 = StringSplitter.new(spread_captures: :compact)
+s1.split("foo:bar:baz", /(:)|(-)/) # => ["foo", ":", nil, "bar", ":", nil, "baz"]
+s2.split("foo:bar:baz", /(:)|(-)/) # => ["foo", [":", nil], "bar", [":", nil], "baz"]
+s3.split("foo:bar:baz", /(:)|(-)/) # => ["foo", ":", "bar", ":", "baz"]
+```
 # COMPATIBILITY
-StringSplitter is tested and supported on all versions of Ruby [supported by the ruby-core team](https://www.ruby-lang.org/en/downloads/branches/),
-i.e., currently, Ruby 2.3 and above.
+StringSplitter is tested and supported on all versions of Ruby [supported by
+the ruby-core team](https://www.ruby-lang.org/en/downloads/branches/), i.e.,
+currently, Ruby 2.5 and above.
 # VERSION
-0.5.1
+0.7.3
 # SEE ALSO
@@ -221,8 +315,7 @@ i.e., currently, Ruby 2.3 and above.
 # COPYRIGHT AND LICENSE
-Copyright © 2018 by chocolateboy.
+Copyright © 2018-2020 by chocolateboy.
 This is free software; you can redistribute it and/or modify it under the
-terms of the [Artistic License 2.0](http://www.opensource.org/licenses/artistic-license-2.0.php).
+terms of the [Artistic License 2.0](https://www.opensource.org/licenses/artistic-license-2.0.php).
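
The accept/reject behaviour described in the README can be approximated with core Ruby alone. The sketch below is a simplified illustration, not the gem's implementation: `split_at` is a hypothetical helper, it assumes a non-empty string delimiter, and it supports only integer positions (no ranges, captures, or blocks). Negative positions are resolved as `count + 1 + position`, the same arithmetic the library source uses.

```ruby
# Hypothetical helper, not part of the string_splitter gem: split only at
# the given 1-based delimiter positions (negative values count from the end).
def split_at(string, delimiter, positions)
  return [] if string.empty?

  fields = string.split(delimiter, -1)
  count = fields.length - 1 # number of delimiters found

  # Map negative positions to 1-based positions, e.g. -1 => count.
  wanted = positions.map { |pos| pos.negative? ? count + 1 + pos : pos }

  result = [fields.first]

  fields.each_cons(2).with_index(1) do |(_lhs, rhs), position|
    if wanted.include?(position)
      result << rhs # accept: start a new field
    else
      result[-1] = result[-1] + delimiter + rhs # reject: rejoin with the delimiter
    end
  end

  result
end

p split_at("foo:bar:baz:quux", ":", [1])     # => ["foo", "bar:baz:quux"]
p split_at("foo:bar:baz:quux", ":", [1, -1]) # => ["foo", "bar:baz", "quux"]
```

Both outputs agree with the README's `at: 1` and `select: [1, -1]` examples for the same input.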

data/lib/string_splitter.rb CHANGED

@@ -1,54 +1,94 @@
 # frozen_string_literal: true
-require 'values'
+require 'set'
+require_relative 'string_splitter/split'
 require_relative 'string_splitter/version'
 # This class extends the functionality of +String#split+ by:
 #
 #   - providing full control over which splits are accepted or rejected
+#
 #   - adding support for splitting from right-to-left
+#
 #   - encapsulating splitting options/preferences in the splitter rather
 #     than trying to cram them into overloaded method parameters
 #
 # These enhancements allow splits to handle many cases that otherwise require bigger
-# guns e.g. regex matching or parsing.
+# guns, e.g. regex matching or parsing.
+#
+# Implementation-wise, we split the string either with String#split, or with a custom
+# scanner if the delimiter may contain captures (since String#split doesn't handle
+# them correctly), and parse the resulting tokens into an array of Split objects with
+# the following attributes:
+#
+#   - captures:  separator substrings captured by parentheses in the delimiter pattern
+#   - count:     the number of splits
+#   - index:     the 0-based index of the split in the array
+#   - lhs:       the string to the left of the separator (back to the previous split candidate)
+#   - position:  the 1-based index of the split in the array (alias: pos)
+#   - rhs:       the string to the right of the separator (up to the next split candidate)
+#   - rindex:    the 0-based index of the split relative to the end of the array
+#   - rposition: the 1-based index of the split relative to the end of the array (alias: rpos)
+#   - separator: the string matched by the delimiter pattern/string
+#
 class StringSplitter
-  ACCEPT_ALL = ->(_split) { true }
-  DEFAULT_DELIMITER = /\s+/
-  NO_SPLITS = []
+  # terminology: the delimiter is what we provide and the separators are what we get
+  # back (if we capture them). e.g. for:
+  #
+  #   ss.split("foo:bar::baz", /(\W+)/)
+  #
+  # the delimiter is /(\W+)/ and the separators are ":" and "::"
-  Split = Value.new(:captures, :count, :index, :lhs, :rhs, :separator) do
-    def position
-      index + 1
-    end
+  # pull in the StringSplitter::Split#update! method
+  using Split::Refinements
-    alias_method :pos, :position
+  ACCEPT_ALL = ->(_split) { true }
+  DEFAULT_DELIMITER = /\s+/.freeze
+  REMOVE = [].freeze
+  # simulate an enum. the value is returned by the case statement
+  # in the generated block if the positions match
+  module Action
+    SELECT = true
+    REJECT = false
   end
+  private_constant :Action
   def initialize(
     default_delimiter: DEFAULT_DELIMITER,
     include_captures: true,
-    remove_empty: false,
+    remove_empty: false, # TODO remove this
+    remove_empty_fields: remove_empty,
     spread_captures: true
   )
     @default_delimiter = default_delimiter
     @include_captures = include_captures
-    @remove_empty = remove_empty
+    @remove_empty_fields = remove_empty_fields
     @spread_captures = spread_captures
   end
-  attr_reader :default_delimiter, :include_captures, :remove_empty, :spread_captures
+  attr_reader(
+    :default_delimiter,
+    :include_captures,
+    :remove_empty_fields,
+    :spread_captures
+  )
+  # TODO remove this
+  alias remove_empty remove_empty_fields
   def split(
     string,
     delimiter = @default_delimiter,
-    at: nil,
+    at: nil, # alias for select
+    except: nil, # alias for reject
     select: at,
-    exclude: nil,
-    reject: exclude,
+    reject: except,
     &block
   )
-    result, splits, block = split_init(
+    result, splits, count, accept = init(
       string: string,
       delimiter: delimiter,
       select: select,
@@ -56,29 +96,22 @@ class StringSplitter
       block: block
     )
-    count = splits.length
+    return result unless splits
+    result << splits.first.lhs
     splits.each_with_index do |split, index|
-      split = Split.with(split.merge({ index: index, count: count }))
-      result << split.lhs if result.empty?
-      if block.call(split)
-        if @include_captures
-          if @spread_captures
-            result += split.captures
-          else
-            result << split.captures
-          end
-        end
+      split.update!(count: count, index: index)
-        result << split.rhs
+      if accept.call(split)
+        result << split.captures << split.rhs
       else
-        # concatenate the rhs
+        # append the rhs
         result[-1] = result[-1] + split.separator + split.rhs
       end
     end
-    result
+    render(result)
   end
   alias lsplit split
@@ -86,13 +119,13 @@ class StringSplitter
   def rsplit(
     string,
     delimiter = @default_delimiter,
-    at: nil,
+    at: nil, # alias for select
+    except: nil, # alias for reject
     select: at,
-    exclude: nil,
-    reject: exclude,
+    reject: except,
     &block
   )
-    result, splits, block = split_init(
+    result, splits, count, accept = init(
       string: string,
       delimiter: delimiter,
       select: select,
@@ -100,203 +133,262 @@ class StringSplitter
       block: block
     )
-    count = splits.length
+    return result unless splits
-    splits.reverse!.each_with_index do |split, index|
-      split = Split.with(split.merge({ index: index, count: count }))
-      result.unshift(split.rhs) if result.empty?
-      if block.call(split)
-        if @include_captures
-          if @spread_captures
-            result = split.captures + result
-          else
-            result.unshift(split.captures)
-          end
-        end
+    result.unshift(splits.last.rhs)
+    splits.reverse_each.with_index do |split, index|
+      split.update!(count: count, index: index)
-        result.unshift(split.lhs)
+      if accept.call(split)
+        # [lhs + captures] + result
+        result.unshift(split.lhs, split.captures)
       else
         # prepend the lhs
         result[0] = split.lhs + split.separator + result[0]
       end
     end
-    result
+    render(result)
   end
   private
-  def splits_for(parts, ncaptures)
-    result = []
-    splits = []
+  # initialisation common to +split+ and +rsplit+
+  #
+  # takes a hash of options passed to +split+ or +rsplit+ and returns a tuple with
+  # the following fields:
+  #
+  #   - result: the array of separated strings to return from +split+ or +rsplit+.
+  #     if the splits array is empty, the caller returns this array immediately
+  #     without any further processing
+  #
+  #   - splits: an array of Split objects exposing the lhs, rhs, separator and
+  #     captured separator substrings for each split
+  #
+  #   - count: the number of splits
+  #
+  #   - accept: a proc whose return value determines whether each split should be
+  #     accepted (true) or rejected (false)
+  #
+  def init(string:, delimiter:, select:, reject:, block:)
+    return [[]] if string.empty?
+    unless block
+      if reject
+        positions = reject
+        action = Action::REJECT
+      elsif select
+        positions = select
+        action = Action::SELECT
+      else
+        block = ACCEPT_ALL
+      end
+    end
-    until parts.empty?
-      lhs = parts.shift
-      separator = parts.shift
-      captures = parts.shift(ncaptures)
-      rhs = parts.length == 1 ? parts.shift : parts.first
-      if @remove_empty && (lhs.empty? || rhs.empty?)
-        if lhs.empty? && rhs.empty?
-          # do nothing
-        elsif parts.empty? # last split
-          result << (!lhs.empty? ? lhs : rhs) if splits.empty?
-        elsif rhs.empty?
-          # replace the empty rhs with the non-empty lhs
-          parts[0] = lhs
-        end
+    # use String#split if we can
+    #
+    # NOTE +reject!+ is no faster than +reject+ on MRI and significantly slower
+    # on TruffleRuby
+    if delimiter.is_a?(String)
+      limit = -1
-        next
+      if delimiter == ' '
+        delimiter = / / # don't trim
+      elsif delimiter.empty?
+        limit = 0 # remove the trailing empty string
       end
-      splits << {
-        lhs: lhs,
-        rhs: rhs,
-        separator: separator,
-        captures: captures,
-      }
-    end
+      result = string.split(delimiter, limit)
-    [result, splits]
-  end
+      return [result] if result.length == 1 # delimiter not found: no splits
-  # takes a hash of options passed to +split+ or +rsplit+ and returns a:
-  #
-  #   [result, splits, block]
-  #
-  # triple, where `result` is the return value of the method, `splits` is an array
-  # of hashes containing the lhs/rhs, separator and captures of each split, and
-  # `block` is a proc which specifies whether each split should be accepted or
-  # rejected
-  def split_init(string:, delimiter:, select:, reject:, block:)
-    unless (match = string.match(delimiter))
-      result = (@remove_empty && string.empty?) ? [] : [string]
-      return [result, NO_SPLITS, block]
-    end
+      if block == ACCEPT_ALL # return the (2 or more) fields
+        result = result.reject(&:empty?) if @remove_empty_fields
+        return [result]
+      end
-    select = Array(select)
-    reject = Array(reject)
+      splits = []
-    if !reject.empty?
-      positions = reject
-      action = :reject
-    elsif !select.empty?
-      positions = select
-      action = :select
+      result.each_cons(2) do |lhs, rhs| # 2 or more fields
+        splits << Split.new(
+          captures: [],
+          lhs: lhs,
+          rhs: rhs,
+          separator: delimiter
+        )
+      end
+    elsif delimiter == DEFAULT_DELIMITER && block == ACCEPT_ALL
+      # non-empty separators so -1 is safe
+      # XXX String#split with block was introduced in Ruby 2.6:
+      #
+      # - https://rubyreferences.github.io/rubychanges/2.6.html#stringsplit-with-block
+      #
+      # rather than sniffing, we'll just use the compatible version for now
+      #
+      # if @remove_empty_fields
+      #     result = []
+      #
+      #     string.split(delimiter, -1) do |field|
+      #         result << field unless field.empty?
+      #     end
+      # else
+      #     result = string.split(delimiter, -1)
+      # end
+      result = string.split(delimiter, -1)
+      result = result.reject(&:empty?) if @remove_empty_fields
+      return [result]
+    else
+      splits = parse(string, delimiter)
     end
-    ncaptures = match.captures.length
-    delimiter = Regexp.quote(delimiter) if delimiter.is_a?(String)
-    delimiter = increment_backrefs(delimiter, ncaptures)
-    parts = string.split(/(#{delimiter})/, -1)
-    remove_trailing_empty_field!(parts, ncaptures)
-    result, splits = splits_for(parts, ncaptures)
-    block ||= positions ? match_positions(positions, action, splits.length) : ACCEPT_ALL
+    count = splits.length
-    [result, splits, block]
+    return [[string]] if count.zero?
+    block ||= compile(positions, action, count)
+    [[], splits, count, block]
   end
-  # increment back-references so they remain valid when the outer capture
-  # is added.
-  #
-  # e.g. to split on:
-  #
-  #   - <foo-comment> ... </foo-comment>
-  #   - <bar-comment> ... </bar-comment>
-  #
-  # etc.
+  def render(values)
+    values.flat_map do |value|
+      if value.is_a?(String)
+        value.empty? && @remove_empty_fields ? REMOVE : [value]
+      elsif @include_captures
+        if @spread_captures
+          # TODO make sure compact can return a Capture
+          @spread_captures == :compact ? value.compact : value
+        elsif value.empty?
+          # we expose non-captures (string delimiters or regexps with no
+          # captures) as empty arrays inside the block, so the type is
+          # consistent, but it doesn't make sense to keep them in the
+          # result
+          REMOVE
+        else
+          [value]
+        end
+      else
+        REMOVE
+      end
+    end
+  end
+  # takes a string and a delimiter pattern (regex or string) and splits it along
+  # the delimiter, returning an array of objects representing each split.
+  # e.g. for:
   #
-  # before:
+  #   parse("foo:bar:baz:quux", ":")
   #
-  #   %r|   <(\w+-comment)> [^<]* </\1-comment>   |x
+  # we return:
   #
-  # after:
+  #   [
+  #       Split.new(lhs: "foo", rhs: "bar",  separator: ":", captures: []),
+  #       Split.new(lhs: "bar", rhs: "baz",  separator: ":", captures: []),
+  #       Split.new(lhs: "baz", rhs: "quux", separator: ":", captures: []),
+  #   ]
   #
-  #   %r| ( <(\w+-comment)> [^<]* </\2-comment> ) |x
+  def parse(string, delimiter)
+    # has_names = delimiter.is_a?(Regexp) && !delimiter.names.empty?
+    splits = []
+    start = 0
-  def increment_backrefs(delimiter, ncaptures)
-    if delimiter.is_a?(Regexp) && ncaptures > 0
-      delimiter = delimiter.to_s.gsub(/\\(?:(\d+)|.)/) do
-        match = Regexp.last_match
-        match[1] ? '\\' + match[1].to_i.next.to_s : match[0]
-      end
+    # we don't use the argument passed to the +scan+ block here because it's a
+    # string (the separator) if there are no captures, rather than an empty
+    # array. we use match.captures instead to get the array
+    string.scan(delimiter) do
+      match = Regexp.last_match
+      index, after = match.offset(0)
+      separator = match[0]
+      # ignore empty separators at the beginning and/or end of the string
+      next if separator.empty? && (index.zero? || after == string.length)
+      lhs = string.slice(start, index - start)
+      splits.last.rhs = lhs unless splits.empty?
+      # this is correct for the last/only match, but gets updated to the next
+      # match's lhs for other matches
+      rhs = match.post_match
+      # captures = has_names ? Captures.new(match) : match.captures
+      splits << Split.new(
+        captures: match.captures,
+        lhs: lhs,
+        rhs: rhs,
+        separator: separator
+      )
+      # advance the start index (the start of the next lhs) to the position
+      # after the last character of the separator
+      start = after
     end
-    delimiter
+    splits
   end
-  # work around Ruby's (and Perl's and Groovy's) unhelpful behavior when splitting
-  # on an empty string/pattern without removing trailing empty fields e.g.:
+  # returns a lambda which splits at (i.e. accepts or rejects splits at, depending
+  # on the action) the supplied positions
   #
-  #   "foobar".split("", -1)
-  #   "foobar".split(//, -1)
-  #   # => ["f", "o", "o", "b", "a", "r", ""]
+  # positions are preprocessed to support negative indices, infinite ranges, and
+  # descending ranges, e.g.:
   #
-  #   "foobar".split(/()/, -1)
-  #   # => ["f", "", "o", "", "o", "", "b", "", "a", "", "r", "", ""]
+  #   ss.split("foo:bar:baz:quux", ":", at: -1)
   #
-  #   "foobar".split(/(())/, -1)
-  #   # => ["f", "", "", "o", "", "", "o", "", "", "b", "", "", "a", "", "", "r", "", "", ""]
+  # translates to:
   #
-  # *there is no such thing as an empty field whose separator is empty*, so
-  # if String#split's result ends with an empty separator, 0 or more (empty)
-  # captures and an empty field, we can safely remove them.
-  def remove_trailing_empty_field!(parts, ncaptures)
-    # the trailing field is at index -1. if there are 0 captures, the separator
-    # is at -2:
-    #
-    #   [empty_separator, empty_field]
-    #
-    # if there is 1 capture, the separator is at -3:
-    #
-    #   [empty_separator, capture, empty_field]
+  #   ss.split("foo:bar:baz:quux", ":", at: 3)
+  #
+  # and
+  #
+  #   ss.split("1:2:3:4:5:6:7:8:9", ":", at: -3..)
+  #
+  # translates to:
+  #
+  #   ss.split("1:2:3:4:5:6:7:8:9", ":", at: 6..8)
+  #
+  def compile(positions, action, count)
+    # XXX note: we don't use modulo, because we don't want
+    # out-of-bounds indices to silently work, e.g. we don't want:
     #
-    # etc. therefore we find the separator by walking back
+    #   ss.split("foo:bar:baz:quux", ":", at: -42)
     #
-    #  1 (empty field)
-    #  + ncaptures
-    #  + 1 (separator)
+    # to mysteriously match when the index/position is 0/1
     #
-    # steps from the end of the array i.e. ncaptures + 2
-    count = ncaptures + 2
-    separator_index = count * -1
-    return unless parts[-1].empty? && parts[separator_index].empty?
-    # drop the empty separator, the (empty) captures, and the trailing empty field
-    parts.pop(count)
-  end
-  def match_positions(positions, action, nsplits)
-    positions = Array(positions).map do |position|
-      if position.is_a?(Integer) && position.negative?
-        # translate negative indices to 1-based non-negative indices e.g:
-        #
-        #   ss.split("foo:bar:baz:quux", ":", at: -1)
-        #
-        # translates to:
-        #
-        #   ss.split("foo:bar:baz:quux", ":", at: 3)
-        #
-        # XXX note: we don't use modulo, because we don't want
-        # out-of-bounds indices to silently work e.g. we don't want:
-        #
-        #   ss.split("foo:bar:baz:quux", ":", -42)
-        #
-        # to mysteriously match when the position is 2
-        nsplits + 1 + position
+    resolve = ->(int) { int.negative? ? count + 1 + int : int }
+    # don't use Array(...) to wrap these as we don't want to convert ranges
+    positions = positions.is_a?(Array) ? positions : [positions]
+    positions = positions.map do |position|
+      if position.is_a?(Integer)
+        resolve[position]
+      elsif position.is_a?(Range)
+        rbegin = position.begin
+        rend = position.end
+        rexc = position.exclude_end?
+        if rbegin.nil?
+          Range.new(1, resolve[rend], rexc)
+        elsif rend.nil?
+          Range.new(resolve[rbegin], count, rexc)
+        elsif rbegin.negative? || rend.negative? || (rend - rbegin).negative?
+          from = resolve[rbegin]
+          to = resolve[rend]
+          to < from ? Range.new(to, from, rexc) : Range.new(from, to, rexc)
+        else
+          position
+        end
+      elsif position.is_a?(Set)
+        position.map { |it| resolve[it] }.to_set
       else
         position
       end
     end
-    match = action == :select
-    lambda do |split|
-      case split.position when *positions then match else !match end
-    end
+    ->(split) { case split.position when *positions then action else !action end }
   end
 end
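
The position handling added to `compile` above can be sketched in isolation. This is a minimal illustration, not the gem's code: `resolve_position` and `matcher` are hypothetical names, and the always-normalize `minmax` step is a simplification of the branch in the diff that only rebuilds ranges which are negative or descending. The endless range literal assumes Ruby 2.6 or later.

```ruby
# Hypothetical helper (not part of string_splitter): resolves one position
# spec against `count` splits, following the rules shown in the diff.
def resolve_position(position, count)
  # Negative integers count back from the last split. No modulo is used, so
  # an out-of-bounds index stays out of bounds instead of wrapping around.
  resolve = ->(int) { int.negative? ? count + 1 + int : int }

  case position
  when Integer
    resolve[position]
  when Range
    rbegin = position.begin
    rend = position.end
    rexc = position.exclude_end?

    if rbegin.nil?
      # beginless range: start from the first position
      Range.new(1, resolve[rend], rexc)
    elsif rend.nil?
      # endless range: run to the last position
      Range.new(resolve[rbegin], count, rexc)
    else
      # resolve both ends, then flip descending ranges into ascending ones
      from, to = [resolve[rbegin], resolve[rend]].minmax
      Range.new(from, to, rexc)
    end
  else
    position
  end
end

# Hypothetical helper: builds a predicate over 1-based positions. Like the
# `case ... when *positions` in the diff, it relies on `===`, so integers
# and ranges can be mixed in one list.
def matcher(positions, count)
  resolved = positions.map { |position| resolve_position(position, count) }
  ->(pos) { resolved.any? { |spec| spec === pos } }
end

# "foo:bar:baz:quux" has 3 splits; "1:2:3:4:5:6:7:8:9" has 8
p resolve_position(-1, 3)     # 3
p resolve_position((-3..), 8) # 6..8
p resolve_position(-42, 3)    # -38, which never equals a real position
```

Because `-42` resolves to `-38` rather than wrapping, a matcher built from it rejects every split, which is the behavior the comment in `compile` asks for.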

data/lib/string_splitter/split.rb ADDED

@@ -0,0 +1,61 @@
+# frozen_string_literal: true
+class StringSplitter
+  class Split
+    # expose the +update!+ method as a refinement to StringSplitter but don't
+    # expose it to blocks
+    #
+    # idea based on a suggestion here (as an alternative to a `friend` modifier):
+    # https://bugs.ruby-lang.org/issues/12962#note-5
+    module Refinements
+      refine Split do
+        def update!(count:, index:)
+          @count = count
+          @index = index
+          @position = index + 1
+          freeze
+        end
+      end
+    end
+    attr_reader :captures, :count, :index, :lhs, :position, :rhs, :separator
+    attr_writer :rhs
+    alias pos position
+    def initialize(captures:, lhs:, rhs:, separator:)
+      @captures = captures
+      @lhs = lhs
+      @rhs = rhs
+      @separator = separator
+    end
+    # 0-based index relative to the end of the array, e.g. for 5 items:
+    #
+    #  index | rindex
+    #  ------|-------
+    #    0   |   4
+    #    1   |   3
+    #    2   |   2
+    #    3   |   1
+    #    4   |   0
+    def rindex
+      @count - @position
+    end
+    # 1-based position relative to the end of the array, e.g. for 5 items:
+    #
+    #   position | rposition
+    #  ----------|----------
+    #      1     |    5
+    #      2     |    4
+    #      3     |    3
+    #      4     |    2
+    #      5     |    1
+    def rposition
+      @count + 1 - @position
+    end
+    alias rpos rposition
+  end
+end
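
The refinement used above to keep `update!` out of reach of blocks can be tried on its own. The sketch below is an assumption-laden illustration rather than the gem's code: `Ticket`, `Issuer`, and `stamp!` are hypothetical names standing in for `Split`, `StringSplitter`, and `update!`.

```ruby
# Hypothetical stand-in for Split: the mutator lives in a refinement, so it
# is only callable from code that has activated the refinement with `using`.
class Ticket
  module Refinements
    refine Ticket do
      def stamp!(number)
        @number = number
        freeze # returns self, so callers get the finished object back
      end
    end
  end

  attr_reader :number
end

# Hypothetical stand-in for StringSplitter: the refinement is active only
# within this module body, after the `using` line.
module Issuer
  using Ticket::Refinements

  def self.issue(number)
    Ticket.new.stamp!(number)
  end
end

ticket = Issuer.issue(7)
p ticket.number               # 7
p ticket.frozen?              # true
p ticket.respond_to?(:stamp!) # false: the method is invisible out here
```

Refinements are lexically scoped, so a block passed in by a caller is defined outside the `using` scope and cannot call the refined method, even though the object it receives was built with it.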

data/lib/string_splitter/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 class StringSplitter
-  VERSION = '0.5.1'
+  VERSION = '0.7.3'
 end

metadata CHANGED

@@ -1,71 +1,57 @@
 --- !ruby/object:Gem::Specification
 name: string_splitter
 version: !ruby/object:Gem::Version
-  version: 0.5.1
+  version: 0.7.3
 platform: ruby
 authors:
 - chocolateboy
-autorequire:
+autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-07-01 00:00:00.000000000 Z
+date: 2020-08-24 00:00:00.000000000 Z
 dependencies:
-- !ruby/object:Gem::Dependency
-  name: values
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '1.8'
-  type: :runtime
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '1.8'
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.16'
+        version: '2.1'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.16'
+        version: '2.1'
 - !ruby/object:Gem::Dependency
   name: minitest
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '5.11'
+        version: '5.0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '5.11'
+        version: '5.0'
 - !ruby/object:Gem::Dependency
   name: minitest-power_assert
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.3.0
+        version: '0.3'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.3.0
+        version: '0.3'
 - !ruby/object:Gem::Dependency
   name: minitest-reporters
   requirement: !ruby/object:Gem::Requirement
@@ -86,29 +72,15 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '10.0'
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '10.0'
-- !ruby/object:Gem::Dependency
-  name: rubocop
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: 0.54.0
+        version: '13.0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.54.0
-description:
+        version: '13.0'
+description:
 email: chocolate@cpan.org
 executables: []
 extensions: []
@@ -118,6 +90,7 @@ files:
 - LICENSE.md
 - README.md
 - lib/string_splitter.rb
+- lib/string_splitter/split.rb
 - lib/string_splitter/version.rb
 homepage: https://github.com/chocolateboy/string_splitter
 licenses:
@@ -127,7 +100,7 @@ metadata:
   bug_tracker_uri: https://github.com/chocolateboy/string_splitter/issues
   changelog_uri: https://github.com/chocolateboy/string_splitter/blob/master/CHANGELOG.md
   source_code_uri: https://github.com/chocolateboy/string_splitter
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -135,16 +108,15 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: '0'
+      version: '2.3'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubyforge_project:
-rubygems_version: 2.7.7
-signing_key:
+rubygems_version: 3.1.4
+signing_key:
 specification_version: 4
 summary: String#split on steroids
 test_files: []