string_splitter 0.3.1 → 0.7.0

Files changed (6)

checksums.yaml +4 -4
data/CHANGELOG.md +67 -8
data/README.md +171 -53
data/lib/string_splitter.rb +272 -163
data/lib/string_splitter/version.rb +1 -1
metadata +16 -31

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 67fd08fc0c1d5928d849206b28130eadedbd7c38755f1c123f4c3d46cbbc5619
-  data.tar.gz: b102be89d4c59f9a2d3dd4661277a4fd3a31816f7dbae1630f2f6954bedad62a
+  metadata.gz: 400534de6c3143ef81b2ad46a3a6432b7d83ef0900024ebdde3f06a4e1714890
+  data.tar.gz: 643f5af7b9e13321dfa97b045b124d0c5ea576868b13141c264122bc96baea5e
 SHA512:
-  metadata.gz: 87d567793e20367c52625d5fa9dd6cea5470221b3d53bc54d0bd59f0f8835635d81a67e1fcecba5fcce5a116c6ba6c346c4b74fa563ac21bee5ff0d06d07ad8b
-  data.tar.gz: eab3f78e4c61e77c7bb283eb50e871d665fd4a323913e9fbf525ab2a6bfa05f0ebbabf490284371c00be6690f937f485823a8bc9dc3f59aafc9bff71c8cbe893
+  metadata.gz: 35bed8fe69b33314813fbd68a8da0e8f4799b7891275ac601b157caeb0e0a3780f37ec7e7876d808b8dfcbfdf7527f45c3af0dc0d679e133865e96949a1d9ce3
+  data.tar.gz: 8186e40d57654daf1a481ab74c128910f7aa346bc343a0a9933dc39b7cceeb204c1a55ac39b39321df46f7d02420fd87f93dd4a708be0a985d94833df018da87

data/CHANGELOG.md CHANGED

@@ -1,22 +1,81 @@
+## 0.7.0 - 2020-08-21
+#### Breaking Changes
+- `String#split` incompatibility: we no longer trim the string (with
+  `String#strip`) before splitting if the delimiter is omitted
+## 0.6.0 - 2020-08-20
+#### Breaking Changes
+- `ss.split(str, " ")` is no longer treated the same as `ss.split(str)` i.e.
+  unlike Ruby's `String#split`, the former no longer strips the string before
+  splitting
+- rename the `remove_empty` option `remove_empty_fields`
+- rename the `exclude` option `except` (alias for `reject`)
+#### Features
+- add support for descending, negative, and infinite ranges,
+  e.g. `ss.split(str, ":", at: [..4, 4..., 3..1, -1..-3])` etc.
+#### Fixes
+- correctly handle backreferences in delimiter patterns
+## 0.5.1 - 2018-07-01
+#### Changes
+- set StringSplitter::VERSION when `string_splitter.rb` is loaded
+## 0.5.0 - 2018-06-26
+#### Features
+- add a `reject`/`exclude` option which rejects splits at the specified positions
+- add a `select` alias for `at`
+#### Fixes
+- don't treat string delimiters as patterns
+## 0.4.0 - 2018-06-24
+#### Breaking Changes
+- remove the `offset` alias for `split.index`
 ## 0.3.1 - 2018-06-24
-- remove trailing empty field when the separator is empty (#1)
+#### Fixes
+- remove trailing empty field when the separator is empty
+  ([#1](https://github.com/chocolateboy/string_splitter/issues/1))
 ## 0.3.0 - 2018-06-23
-- **breaking change**: rename the `default_separator` option to `default_delimiter`
-  - to avoid ambiguity in the code, refer to the input pattern/string as the
-    "delimiter" and the matched string as the "separator"
+#### Breaking Changes
+- rename the `default_separator` option `default_delimiter`
 ## 0.2.0 - 2018-06-22
-- **breaking change**: make `index` (AKA `offset`) 0-based and add `position`
-  (AKA `pos`) as the 1-based accessor
+#### Breaking Changes
+- make `index` (AKA `offset`) 0-based and add `position` (AKA `pos`) as the
+  1-based accessor
 ## 0.1.0 - 2018-06-22
-- **breaking change**: the block now takes a single `split` object with an
-  `index` accessor, rather than seperate `index` and `split` arguments
+#### Breaking Changes
+- the block now takes a single `split` object with an `index` accessor, rather
+  than separate `index` and `split` arguments
+#### Features
 - add support for negative indices in the value supplied to the `at` option
 - add a `count` field to the split object containing the total number of splits
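
Note on the 0.6.0 and 0.7.0 breaking changes: the stripping difference can be seen in plain Ruby, without loading the gem. The sketch below is only an approximation, assuming the default `/\s+/` delimiter; it uses `String#split` with a regex and a `-1` limit to stand in for the new unstripped behavior.

```ruby
line = " foo bar baz "

# String#split with no arguments ignores leading and trailing whitespace.
stripped = line.split

# A regex delimiter with a negative limit keeps the empty edge fields,
# which approximates what the changelog describes for 0.7.0.
unstripped = line.split(/\s+/, -1)

# Callers who relied on the old behavior can strip explicitly first.
restored = line.strip.split(/\s+/, -1)

p stripped, unstripped, restored
```

Stripping at the call site keeps the choice visible to the caller rather than hidden in the splitter.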

data/README.md CHANGED

@@ -3,14 +3,16 @@
 [![Build Status](https://travis-ci.org/chocolateboy/string_splitter.svg)](https://travis-ci.org/chocolateboy/string_splitter)
 [![Gem Version](https://img.shields.io/gem/v/string_splitter.svg)](https://rubygems.org/gems/string_splitter)
-<!-- START doctoc generated TOC please keep comment here to allow auto update -->
-<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+<!-- toc -->
 - [NAME](#name)
 - [INSTALLATION](#installation)
 - [SYNOPSIS](#synopsis)
 - [DESCRIPTION](#description)
 - [WHY?](#why)
+- [CAVEATS](#caveats)
+  - [Differences from String#split](#differences-from-string%23split)
+- [COMPATIBILITY](#compatibility)
 - [VERSION](#version)
 - [SEE ALSO](#see-also)
   - [Gems](#gems)
@@ -18,7 +20,7 @@
 - [AUTHOR](#author)
 - [COPYRIGHT AND LICENSE](#copyright-and-license)
-<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+<!-- tocstop -->
 # NAME
@@ -36,65 +38,128 @@ gem "string_splitter"
 require "string_splitter"
 ss = StringSplitter.new
+```
+**Same as `String#split`**
-# same as String#split
-ss.split("foo bar baz quux")
-ss.split("foo bar baz quux", " ")
-ss.split("foo bar baz quux", /\s+/)
-# => ["foo", "bar", "baz", "quux"]
+```ruby
+ss.split("foo bar baz")
+ss.split("foo bar baz", " ")
+ss.split("foo bar baz", /\s+/)
+# => ["foo", "bar", "baz"]
+ss.split("foo", "")
+ss.split("foo", //)
+# => ["f", "o", "o"]
+ss.split("", "...")
+ss.split("", /.../)
+# => []
+```
-# split at the first delimiter
+**Split at the first delimiter**
+```ruby
 ss.split("foo:bar:baz:quux", ":", at: 1)
+ss.split("foo:bar:baz:quux", ":", select: 1)
 # => ["foo", "bar:baz:quux"]
+```
-# split at the last delimiter
+**Split at the last delimiter**
+```ruby
 ss.split("foo:bar:baz:quux", ":", at: -1)
 # => ["foo:bar:baz", "quux"]
+```
+**Split at multiple delimiter positions**
+```ruby
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -1])
+# => ["1", "2", "3", "4:5:6:7:8", "9"]
+```
-# split at multiple delimiter positions
-ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -2])
-# => ["1", "2", "3", "4:5:6:7", "8:9"]
+**Split at all but the first and last delimiters**
-# split from the right
+```ruby
+ss.split("1:2:3:4:5:6", ":", except: [1, -1])
+ss.split("1:2:3:4:5:6", ":", reject: [1, -1])
+# => ["1:2", "3", "4", "5:6"]
+```
+**Split from the right**
+```ruby
 ss.rsplit("1:2:3:4:5:6:7:8:9", ":", at: [1..3, 5])
 # => ["1:2:3:4", "5:6", "7", "8", "9"]
+```
+**Split with negative, descending, and infinite ranges**
+```ruby
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: ..-3)
+# => ["1", "2", "3", "4", "5", "6", "7:8:9"]
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: 4...)
+# => ["1:2:3:4", "5", "6", "7", "8:9"]
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1, 5..3, -2..])
+# => ["1", "2:3", "4", "5", "6:7", "8", "9"]
+```
+**Full control via a block**
-# full control via a block
-result = ss.split('a:a:a:b:c:c:e:a:a:d:c', ":") do |split|
-  split.index > 0 && split.lhs == split.rhs
+```ruby
+result = ss.split("1:2:3:4:5:6:7:8", ":") do |split|
+  split.pos % 2 == 0
 end
-# => ["a:a", "a:b:c", "c:e:a", "a:d:c"]
+# => ["1:2", "3:4", "5:6", "7:8"]
+```
+```ruby
+string = "banana".chars.sort.join # "aaabnn"
+ss.split(string, "") do |split|
+    split.rhs != split.lhs
+end
+# => ["aaa", "b", "nn"]
 ```
 # DESCRIPTION
-Many languages have built-in string `split` functions/methods. They behave similarly
-(notwithstanding the occasional [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)),
-and handle a few common cases e.g.:
+Many languages have built-in `split` functions/methods for strings. They behave
+similarly (notwithstanding the occasional
+[surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)), and
+handle a few common cases e.g.:
 * limiting the number of splits
-* including the separators in the results
+* including the separator(s) in the results
 * removing (some) empty fields
-But, because the API is squeezed into two overloaded parameters (the delimiter and the limit),
-achieving the desired effects can be tricky. For instance, while `String#split` removes empty
-trailing fields (by default), it provides no way to remove *all* empty fields. Likewise, the
-cramped API means there's no way to e.g. combine a limit (positive integer) with the option
-to preserve empty fields (negative integer), or use backreferences in a delimiter pattern
+But, because the API is squeezed into two overloaded parameters (the delimiter
+and the limit), achieving the desired results can be tricky. For instance,
+while `String#split` removes empty trailing fields (by default), it provides no
+way to remove *all* empty fields. Likewise, the cramped API means there's no
+way to e.g. combine a limit (positive integer) with the option to preserve
+empty fields (negative integer), or use backreferences in a delimiter pattern
 without including its captured subexpressions in the result.
-If `split` was being written from scratch, without the baggage of its legacy API,
-it's possible that some of these options would be made explicit rather than overloading
-the parameters. And, indeed, this is possible in some implementations,
-e.g. in Crystal:
+If `split` was being written from scratch, without the baggage of its legacy
+API, it's possible that some of these options would be made explicit rather
+than overloading the parameters. And, indeed, this is possible in some
+implementations, e.g. in Crystal:
 ```ruby
-":foo:bar:baz:".split(":", remove_empty: false) # => ["", "foo", "bar", "baz", ""]
-":foo:bar:baz:".split(":", remove_empty: true)  # => ["foo", "bar", "baz"]
+":foo:bar:baz:".split(":", remove_empty: false)
+# => ["", "foo", "bar", "baz", ""]
+":foo:bar:baz:".split(":", remove_empty: true)
+# => ["foo", "bar", "baz"]
 ````
-StringSplitter takes this one step further by moving the configuration out of the method altogether
-and delegating the strategy — i.e. which splits should be accepted or rejected — to a block:
+StringSplitter takes this one step further by moving the configuration out of
+the method altogether and delegating the strategy — i.e. which splits should be
+accepted or rejected — to a block:
 ```ruby
 ss = StringSplitter.new
@@ -102,22 +167,32 @@ ss = StringSplitter.new
 ss.split("foo:bar:baz", ":") { |split| split.index == 0 }
 # => ["foo", "bar:baz"]
-ss.split("foo:bar:baz", ":") { |split| split.position == split.count }
-# => ["foo:bar", "baz"]
+ss.split("foo:bar:baz:quux", ":") do |split|
+  split.position == 1 || split.position == 3
+end
+# => ["foo", "bar:baz", "quux"]
 ```
-As a shortcut, the common case of splitting on delimiters at one or more positions is supported by an option:
+As a shortcut, the common case of splitting (or not splitting) at one or more
+positions is supported by dedicated options:
 ```ruby
-ss.split('foo:bar:baz:quux', ':', at: [1, -1]) # => ["foo", "bar:baz", "quux"]
+ss.split("foo:bar:baz:quux", ":", select: [1, -1])
+# => ["foo", "bar:baz", "quux"]
+ss.split("foo:bar:baz:quux", ":", reject: [1, -1])
+# => ["foo:bar", "baz:quux"]
 ```
 # WHY?
-I wanted to split semi-structured output into fields without having to resort to a regex or a full-blown parser.
+I wanted to split semi-structured output into fields without having to resort
+to a regex or a full-blown parser.
-As an example, the nominally unstructured output of many Unix commands is often formatted in a way
-that's tantalizingly close to being machine-readable, apart from a few pesky exceptions e.g.:
+As an example, the nominally unstructured output of many Unix commands is often
+formatted in a way that's tantalizingly close to being
+[machine-readable](https://en.wikipedia.org/wiki/Delimiter-separated_values),
+apart from a few pesky exceptions e.g.:
 ```bash
 $ ls -l
@@ -129,8 +204,8 @@ drwxr-xr-x 3 user users 4096 Jun 19 22:56 lib
 -rw-r--r-- 1 user users 3134 Jun 19 22:59 README.md
 ```
-These lines can *almost* be parsed into an array of fields by splitting them on whitespace. The exception is the
-date (columns 6-8) i.e.:
+These lines can *almost* be parsed into an array of fields by splitting them on
+whitespace. The exception is the date (columns 6-8) i.e.:
 ```ruby
 line = "-rw-r--r-- 1 user users   87 Jun 18 18:16 CHANGELOG.md"
@@ -155,13 +230,14 @@ One way to work around this is to parse the whole line e.g.:
 line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+ \S+) \s+ (.+)$/x)
 ```
-But that requires us to specify *everything*. What we really want is a version of `split`
-which allows us to veto splitting for the 6th and 7th delimiters i.e. control over which
-splits are accepted, rather than being restricted to the single, baked-in strategy provided
-by the `limit` parameter.
+But that requires us to specify *everything*. What we really want is a version
+of `split` which allows us to veto splitting for the 6th and 7th delimiters
+(and to stop after the 8th delimiter) i.e. control over which splits are
+accepted, rather than being restricted to the single, baked-in strategy
+provided by the `limit` parameter.
-By providing a simple way to accept or reject each split, StringSplitter makes cases like
-this easy to handle, either via a block:
+By providing a simple way to accept or reject each split, StringSplitter makes
+cases like this easy to handle, either via a block:
 ```ruby
 ss.split(line) do |split|
@@ -177,9 +253,51 @@ ss.split(line, at: [1..5, 8])
 # => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
 ```
+# CAVEATS
+## Differences from String#split
+Unlike `String#split`, StringSplitter doesn't trim the string before splitting
+(with `String#strip`) if the delimiter is omitted or a single space, e.g.:
+```ruby
+" foo bar baz ".split          # => ["foo", "bar", "baz"]
+" foo bar baz ".split(" ")     # => ["foo", "bar", "baz"]
+ss.split(" foo bar baz ")      # => ["", "foo", "bar", "baz", ""]
+ss.split(" foo bar baz ", " ") # => ["", "foo", "bar", "baz", ""]
+```
+`String#split` omits the `nil` values of unmatched optional captures:
+```ruby
+"foo:bar:baz".scan(/(:)|(-)/)  # => [[":", nil], [":", nil]]
+"foo:bar:baz".split(/(:)|(-)/) # => ["foo", ":", "bar", ":", "baz"]
+```
+StringSplitter preserves them by default (when `include_captures` is true, its
+default value), though they can be omitted from spread captures by passing
+`:compact` as the value of the `spread_captures` option:
+```ruby
+s1 = StringSplitter.new(spread_captures: true)
+s2 = StringSplitter.new(spread_captures: false)
+s3 = StringSplitter.new(spread_captures: :compact)
+s1.split("foo:bar:baz", /(:)|(-)/) # => ["foo", ":", nil, "bar", ":", nil, "baz"]
+s2.split("foo:bar:baz", /(:)|(-)/) # => ["foo", [":", nil], "bar", [":", nil], "baz"]
+s3.split("foo:bar:baz", /(:)|(-)/) # => ["foo", ":", "bar", ":", "baz"]
+```
+# COMPATIBILITY
+StringSplitter is tested and supported on all versions of Ruby [supported by
+the ruby-core team](https://www.ruby-lang.org/en/downloads/branches/), i.e.,
+currently, Ruby 2.5 and above.
 # VERSION
-0.3.1
+0.7.0
 # SEE ALSO
@@ -197,7 +315,7 @@ ss.split(line, at: [1..5, 8])
 # COPYRIGHT AND LICENSE
-Copyright © 2018 by chocolateboy.
+Copyright © 2018-2020 by chocolateboy.
 This is free software; you can redistribute it and/or modify it under the
-terms of the [Artistic License 2.0](http://www.opensource.org/licenses/artistic-license-2.0.php).
+terms of the [Artistic License 2.0](https://www.opensource.org/licenses/artistic-license-2.0.php).
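
Note on the README's `ls -l` example: the idea of vetoing some splits and merging the rejected separators back into the field can be sketched without the gem. `split_at` below is a hypothetical helper, not part of string_splitter's API, and it only handles 1-based positive positions and ranges. It is a minimal illustration of the technique, not the library's implementation.

```ruby
# Hypothetical helper (not part of string_splitter): split +string+ on the
# regexp +delimiter+, but only at the 1-based delimiter positions listed in
# +positions+ (integers or ranges). Rejected separators stay in the field.
def split_at(string, delimiter, positions)
  fields = [+""]
  position = 0
  start = 0

  string.scan(delimiter) do
    match = Regexp.last_match
    from, to = match.offset(0)
    position += 1

    # text between the previous separator and this one
    fields.last << string[start...from]

    if positions.any? { |wanted| wanted === position }
      fields << +""             # accepted: begin a new field
    else
      fields.last << match[0]   # rejected: keep the separator text
    end

    start = to
  end

  fields.last << string[start..-1]
  fields
end

line = "-rw-r--r-- 1 user users   87 Jun 18 18:16 CHANGELOG.md"
fields = split_at(line, /\s+/, [1..5, 8])
p fields
```

The 6th and 7th delimiters are skipped, so the date survives as the single field `"Jun 18 18:16"`.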

data/lib/string_splitter.rb CHANGED

@@ -1,250 +1,359 @@
 # frozen_string_literal: true
+require 'set'
 require 'values'
+require_relative 'string_splitter/version'
 # This class extends the functionality of +String#split+ by:
 #
 #   - providing full control over which splits are accepted or rejected
+#
 #   - adding support for splitting from right-to-left
-#   - encapsulating splitting options/preferences in instances rather than trying to
-#     cram them into overloaded method parameters
+#
+#   - encapsulating splitting options/preferences in the splitter rather
+#     than trying to cram them into overloaded method parameters
 #
 # These enhancements allow splits to handle many cases that otherwise require bigger
-# guns e.g. regex matching or parsing.
+# guns, e.g. regex matching or parsing.
+#
+# Implementation-wise, we split the string with a scanner which works in a similar
+# way to +String#split+ and parse the resulting tokens into an array of Split objects
+# with the following fields:
+#
+#   - captures:  separator substrings captured by parentheses in the delimiter pattern
+#   - count:     the number of splits
+#   - index:     the 0-based index of the split in the array
+#   - lhs:       the string to the left of the separator (back to the previous split candidate)
+#   - position:  the 1-based index of the split in the array (alias: pos)
+#   - rhs:       the string to the right of the separator (up to the next split candidate)
+#   - rindex:    the 0-based index of the split relative to the end of the array
+#   - rposition: the 1-based index of the split relative to the end of the array (alias: rpos)
+#   - separator: the string matched by the delimiter pattern/string
+#
 class StringSplitter
-  ACCEPT = ->(_split) { true }
-  DEFAULT_DELIMITER = /\s+/
-  NO_SPLITS = []
+  # terminology: the delimiter is what we provide and the separators are what we get
+  # back (if we capture them). e.g. for:
+  #
+  #   ss.split("foo:bar::baz", /(\W+)/)
+  #
+  # the delimiter is /(\W+)/ and the separators are ":" and "::"
+  ACCEPT_ALL = ->(_split) { true }
+  DEFAULT_DELIMITER = /\s+/.freeze
+  REMOVE = [].freeze
   Split = Value.new(:captures, :count, :index, :lhs, :rhs, :separator) do
     def position
       index + 1
     end
-    alias_method :offset, :index
     alias_method :pos, :position
+    # 0-based index relative to the end of the array, e.g. for 5 items:
+    #
+    #  index | rindex
+    #  ------|-------
+    #    0   |   4
+    #    1   |   3
+    #    2   |   2
+    #    3   |   1
+    #    4   |   0
+    def rindex
+      count - position
+    end
+    # 1-based position relative to the end of the array, e.g. for 5 items:
+    #
+    #   position | rposition
+    #  ----------|----------
+    #      1     |    5
+    #      2     |    4
+    #      3     |    3
+    #      4     |    2
+    #      5     |    1
+    def rposition
+      count + 1 - position
+    end
+    alias_method :rpos, :rposition
+  end
+  # simulate an enum. the value is returned by the case statement
+  # in the generated block if the positions match
+  module Action
+    SELECT = true
+    REJECT = false
   end
+  private_constant :Action
   def initialize(
     default_delimiter: DEFAULT_DELIMITER,
     include_captures: true,
-    remove_empty: false,
+    remove_empty: false, # TODO remove this
+    remove_empty_fields: remove_empty,
     spread_captures: true
   )
     @default_delimiter = default_delimiter
     @include_captures = include_captures
-    @remove_empty = remove_empty
+    @remove_empty_fields = remove_empty_fields
     @spread_captures = spread_captures
   end
-  attr_reader :default_delimiter, :include_captures, :remove_empty, :spread_captures
-  def split(string, delimiter = @default_delimiter, at: nil, &block)
-    result, block, splits, count, index = split_common(string, delimiter, at, block)
+  attr_reader(
+    :default_delimiter,
+    :include_captures,
+    :remove_empty_fields,
+    :spread_captures
+  )
-    splits.each do |split|
-      split = Split.with(split.merge({ index: (index += 1), count: count }))
+  # TODO remove this
+  alias remove_empty remove_empty_fields
+  def split(
+    string,
+    delimiter = @default_delimiter,
+    at: nil, # alias for select
+    except: nil, # alias for reject
+    select: at,
+    reject: except,
+    &block
+  )
+    result, splits, count, accept = init(
+      string: string,
+      delimiter: delimiter,
+      select: select,
+      reject: reject,
+      block: block
+    )
+    return result unless splits
+    splits.each_with_index do |hash, index|
+      split = Split.with(hash.merge({ count: count, index: index }))
       result << split.lhs if result.empty?
-      if block.call(split)
-        if @include_captures
-          if @spread_captures
-            result += split.captures
-          else
-            result << split.captures
-          end
-        end
-        result << split.rhs
+      if accept.call(split)
+        result << split.captures << split.rhs
       else
         # append the rhs
         result[-1] = result[-1] + split.separator + split.rhs
       end
     end
-    result
+    render(result)
   end
   alias lsplit split
-  def rsplit(string, delimiter = @default_delimiter, at: nil, &block)
-    result, block, splits, count, index = split_common(string, delimiter, at, block)
-    splits.reverse!.each do |split|
-      split = Split.with(split.merge({ index: (index += 1), count: count }))
+  def rsplit(
+    string,
+    delimiter = @default_delimiter,
+    at: nil, # alias for select
+    except: nil, # alias for reject
+    select: at,
+    reject: except,
+    &block
+  )
+    result, splits, count, accept = init(
+      string: string,
+      delimiter: delimiter,
+      select: select,
+      reject: reject,
+      block: block
+    )
+    return result unless splits
+    splits.reverse_each.with_index do |hash, index|
+      split = Split.with(hash.merge({ count: count, index: index }))
       result.unshift(split.rhs) if result.empty?
-      if block.call(split)
-        if @include_captures
-          if @spread_captures
-            result = split.captures + result
-          else
-            result.unshift(split.captures)
-          end
-        end
-        result.unshift(split.lhs)
+      if accept.call(split)
+        # [lhs + captures] + result
+        result.unshift(split.lhs, split.captures)
       else
         # prepend the lhs
         result[0] = split.lhs + split.separator + result[0]
       end
     end
-    result
+    render(result)
   end
   private
-  def splits_for(parts, ncaptures)
-    result = []
-    splits = []
-    until parts.empty?
-      lhs = parts.shift
-      separator = parts.shift
-      captures = parts.shift(ncaptures)
-      rhs = parts.length == 1 ? parts.shift : parts.first
-      if @remove_empty && (lhs.empty? || rhs.empty?)
-        if lhs.empty? && rhs.empty?
-          # do nothing
-        elsif parts.empty? # last split
-          result << (!lhs.empty? ? lhs : rhs) if splits.empty?
-        elsif rhs.empty?
-          # replace the empty rhs with the non-empty lhs
-          parts[0] = lhs
-        end
+  # initialisation common to +split+ and +rsplit+
+  #
+  # takes a hash of options passed to +split+ or +rsplit+ and returns a tuple with
+  # the following fields:
+  #
+  #   - result: the array of separated strings to return from +split+ or +rsplit+.
+  #     if the splits array is empty, the caller returns this array immediately
+  #     without any further processing
+  #
+  #   - splits: an array of hashes containing the lhs, rhs, separator and captured
+  #     separator substrings for each split
+  #
+  #   - count: the number of splits
+  #
+  #   - accept: a proc whose return value determines whether each split should be
+  #     accepted (true) or rejected (false)
+  #
+  def init(string:, delimiter:, select:, reject:, block:)
+    if reject
+      positions = reject
+      action = Action::REJECT
+    elsif select
+      positions = select
+      action = Action::SELECT
+    end
-        next
-      end
+    splits = parse(string, delimiter)
-      splits << {
-        lhs: lhs,
-        rhs: rhs,
-        separator: separator,
-        captures: captures,
-      }
+    if splits.empty?
+      result = string.empty? ? [] : [string]
+      return [result]
     end
-    [result, splits]
+    block ||= positions ? compile(positions, action, splits.length) : ACCEPT_ALL
+    [[], splits, splits.length, block]
   end
-  # setup common to both split methods
-  def split_common(string, delimiter, at, block)
-    unless (match = string.match(delimiter))
-      result = (@remove_empty && string.empty?) ? [] : [string]
-      return [result, block, NO_SPLITS, 0, -1]
+  def render(values)
+    values.flat_map do |value|
+      if value.is_a?(String)
+        value.empty? && @remove_empty_fields ? REMOVE : [value]
+      elsif @include_captures
+        if @spread_captures
+          @spread_captures == :compact ? value.compact : value
+        elsif value.empty?
+          # we expose non-captures (string delimiters or regexps with no
+          # captures) as empty arrays inside the block, so the type is
+          # consistent, but it doesn't make sense to keep them in the
+          # result
+          REMOVE
+        else
+          [value]
+        end
+      else
+        REMOVE
+      end
     end
-    ncaptures = match.captures.length
-    delimiter = increment_backrefs(delimiter, ncaptures)
-    parts = string.split(/(#{delimiter})/, -1)
-    remove_trailing_empty_field!(parts, ncaptures)
-    result, splits = splits_for(parts, ncaptures)
-    count = splits.length
-    block ||= at ? match_positions(at, count) : ACCEPT
-    [result, block, splits, count, -1]
   end
-  # increment back-references so they remain valid when the outer capture
-  # is added.
-  #
-  # e.g. to split on:
+  # takes a string and a delimiter pattern (regex or string) and splits it along
+  # the delimiter, returning an array of objects (hashes) representing each split.
+  # e.g. for:
   #
-  #   - <foo-comment> ... </foo-comment>
-  #   - <bar-comment> ... </bar-comment>
+  #   parse("foo:bar:baz:quux", ":")
   #
-  # etc.
+  # we return:
   #
-  # before:
+  #   [
+  #       { lhs: "foo", rhs: "bar", separator: ":", captures: [] },
+  #       { lhs: "bar", rhs: "baz", separator: ":", captures: [] },
+  #       { lhs: "baz", rhs: "quux", separator: ":", captures: [] },
+  #   ]
   #
-  #   %r|   <(\w+-comment)> [^<]* </\1>   |x
-  #
-  # after:
-  #
-  #   %r| ( <(\w+-comment)> [^<]* </\2> ) |x
+  def parse(string, delimiter)
+    result = []
+    start = 0
-  def increment_backrefs(delimiter, ncaptures)
-    if delimiter.is_a?(Regexp) && ncaptures > 0
-      delimiter = delimiter.to_s.gsub(/\\(?:(\d+)|.)/) do
-        match = Regexp.last_match
-        match[1] ? '\\' + match[1].to_i.next.to_s : match[0]
-      end
+    # we don't use the argument passed to the +scan+ block here because it's a
+    # string (the separator) if there are no captures, rather than an empty
+    # array. we use match.captures instead to get the array
+    string.scan(delimiter) do
+      match = Regexp.last_match
+      index, after = match.offset(0)
+      separator = match[0]
+      # ignore empty separators at the beginning and/or end of the string
+      next if separator.empty? && (index.zero? || after == string.length)
+      lhs = string.slice(start, index - start)
+      result.last[:rhs] = lhs unless result.empty?
+      # this is correct for the last/only match, but gets updated to the next
+      # match's lhs for other matches
+      rhs = match.post_match
+      result << {
+        captures: match.captures,
+        lhs: lhs,
+        rhs: rhs,
+        separator: separator,
+      }
+      # move the start index (the start of the next lhs) to the index after the
+      # last character of the separator
+      start = after
     end
-    delimiter
+    result
   end
-  # work around Ruby's (and Perl's and Groovy's) unhelpful behavior when splitting
-  # on an empty string/pattern without removing trailing empty fields e.g.:
+  # returns a lambda which splits at (i.e. accepts or rejects splits at, depending
+  # on the action) the supplied positions
   #
-  #   "foobar".split("", -1)
-  #   "foobar".split(//, -1)
-  #   # => ["f", "o", "o", "b", "a", "r", ""]
+  # positions are preprocessed to support additional features: negative
+  # ranges, infinite ranges, and descending ranges, e.g.:
   #
-  #   "foobar".split(/()/, -1)
-  #   # => ["f", "", "o", "", "o", "", "b", "", "a", "", "r", "", ""]
+  #   ss.split("foo:bar:baz:quux", ":", at: -1)
   #
-  #   "foobar".split(/(())/, -1)
-  #   # => ["f", "", "", "o", "", "", "o", "", "", "b", "", "", "a", "", "", "r", "", "", ""]
+  # translates to:
   #
-  # *there is no such thing as an empty field whose separator is empty*, so
-  # if String#split's result ends with an empty separator, 0 or more (empty)
-  # captures and an empty field, we can safely remove them.
-  def remove_trailing_empty_field!(parts, ncaptures)
-    # the trailing field is at index -1. if there are 0 captures, the separator
-    # is at -2:
-    #
-    #   [empty_separator, empty_field]
-    #
-    # if there is 1 capture, the separator is at -3:
-    #
-    #   [empty_separator, capture, empty_field]
+  #   ss.split("foo:bar:baz:quux", ":", at: 3)
+  #
+  # and
+  #
+  #   ss.split("1:2:3:4:5:6:7:8:9", ":", at: -3..)
+  #
+  # translates to:
+  #
+  #   ss.split("1:2:3:4:5:6:7:8:9", ":", at: 6..8)
+  #
+  def compile(positions, action, count)
+    # XXX note: we don't use modulo, because we don't want
+    # out-of-bounds indices to silently work, e.g. we don't want:
     #
-    # etc. therefore we find the separator by walking back
+    #   ss.split("foo:bar:baz:quux", ":", at: -42)
     #
-    #  1 (empty field)
-    #  + ncaptures
-    #  + 1 (separator)
+    # to mysteriously match when the index/position is 0/1
     #
-    # steps from the end of the array i.e. ncaptures + 2
-    count = ncaptures + 2
-    separator_index = count * -1
-    return unless parts[-1].empty? && parts[separator_index].empty?
-    # drop the empty separator, the (empty) captures, and the trailing empty field
-    parts.pop(count)
-  end
-  def match_positions(positions, nsplits)
-    positions = Array(positions).map do |position|
-      if position.is_a?(Integer) && position.negative?
-        # translate negative indices to 1-based non-negative indices e.g:
-        #
-        #   ss.split("foo:bar:baz:quux", ":", at: -1)
-        #
-        # translates to:
-        #
-        #   ss.split("foo:bar:baz:quux", ":", at: 3)
-        #
-        # XXX note: we don't use modulo, because we don't want
-        # out-of-bounds indices to silently work e.g. we don't want:
-        #
-        #   ss.split("foo:bar:baz:quux", ":", -42)
-        #
-        # to mysteriously match when the position is 2
-        nsplits + 1 + position
+    resolve = ->(int) { int.negative? ? count + 1 + int : int }
+    # don't use Array(...) to wrap these as we don't want to convert ranges
+    positions = positions.is_a?(Array) ? positions : [positions]
+    positions = positions.map do |position|
+      if position.is_a?(Integer)
+        resolve[position]
+      elsif position.is_a?(Range)
+        rbegin = position.begin
+        rend = position.end
+        rexc = position.exclude_end?
+        if rbegin.nil?
+          Range.new(1, resolve[rend], rexc)
+        elsif rend.nil?
+          Range.new(resolve[rbegin], count, rexc)
+        elsif rbegin.negative? || rend.negative? || (rend - rbegin).negative?
+          from = resolve[rbegin]
+          to = resolve[rend]
+          to < from ? Range.new(to, from, rexc) : Range.new(from, to, rexc)
+        else
+          position
+        end
+      elsif position.is_a?(Set)
+        position.map { |it| resolve[it] }.to_set
       else
         position
       end
     end
-    lambda do |split|
-      case split.position when *positions then true else false end
-    end
+    ->(split) { case split.position when *positions then action else !action end }
   end
 end
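
Note on `compile`: the position preprocessing can be exercised on its own. The sketch below restates that logic as a standalone method; `normalize` is a hypothetical name that does not exist in the gem, and the beginless and endless range literals assume Ruby 2.7 or later.

```ruby
# Standalone restatement of the position normalization (hypothetical name).
# +count+ is the number of split candidates; positions are 1-based.
def normalize(position, count)
  # negative positions count back from the last split, without wrapping
  resolve = ->(int) { int.negative? ? count + 1 + int : int }

  case position
  when Integer
    resolve[position]
  when Range
    first = position.begin
    last = position.end
    exclusive = position.exclude_end?

    if first.nil?       # beginless range, e.g. ..-3
      Range.new(1, resolve[last], exclusive)
    elsif last.nil?     # endless range, e.g. -3..
      Range.new(resolve[first], count, exclusive)
    else                # resolve both ends, then flip descending ranges
      from = resolve[first]
      to = resolve[last]
      to < from ? Range.new(to, from, exclusive) : Range.new(from, to, exclusive)
    end
  else
    position
  end
end

p normalize(-1, 3)      # last of 3 splits
p normalize((-3..), 8)  # last three of 8 splits
p normalize((..-3), 8)  # everything up to the third-last split
p normalize(5..3, 8)    # descending range
```

Out-of-bounds negatives such as `-42` resolve to a value below 1 and so never match a position, which is the behavior the comment about avoiding modulo describes.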

data/lib/string_splitter/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 class StringSplitter
-  VERSION = '0.3.1'
+  VERSION = '0.7.0'
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: string_splitter
 version: !ruby/object:Gem::Version
-  version: 0.3.1
+  version: 0.7.0
 platform: ruby
 authors:
 - chocolateboy
-autorequire:
+autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-06-24 00:00:00.000000000 Z
+date: 2020-08-21 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: values
@@ -30,42 +30,42 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.16'
+        version: '2.1'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.16'
+        version: '2.1'
 - !ruby/object:Gem::Dependency
   name: minitest
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '5.11'
+        version: '5.0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '5.11'
+        version: '5.0'
 - !ruby/object:Gem::Dependency
   name: minitest-power_assert
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.3.0
+        version: '0.3'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.3.0
+        version: '0.3'
 - !ruby/object:Gem::Dependency
   name: minitest-reporters
   requirement: !ruby/object:Gem::Requirement
@@ -86,29 +86,15 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '10.0'
+        version: '13.0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '10.0'
-- !ruby/object:Gem::Dependency
-  name: rubocop
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: 0.54.0
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: 0.54.0
-description:
+        version: '13.0'
+description:
 email: chocolate@cpan.org
 executables: []
 extensions: []
@@ -127,7 +113,7 @@ metadata:
   bug_tracker_uri: https://github.com/chocolateboy/string_splitter/issues
   changelog_uri: https://github.com/chocolateboy/string_splitter/blob/master/CHANGELOG.md
   source_code_uri: https://github.com/chocolateboy/string_splitter
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -135,16 +121,15 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: '0'
+      version: '2.3'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubyforge_project:
-rubygems_version: 2.7.7
-signing_key:
+rubygems_version: 3.1.4
+signing_key:
 specification_version: 4
 summary: String#split on steroids
 test_files: []