string_splitter 0.4.0 → 0.7.1

Files changed (7)

checksums.yaml +4 -4
data/CHANGELOG.md +73 -9
data/README.md +172 -54
data/lib/string_splitter.rb +297 -164
data/lib/string_splitter/split.rb +51 -0
data/lib/string_splitter/version.rb +1 -1
metadata +17 -45

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 6e73d7818b793f4c8dcbba35afb97045428085f17c07df328875383b2d162818
-  data.tar.gz: 6a15e18b265779aeaf5d8b18c5e13d70299ed7abbc52da815e2cee46419ce9a1
+  metadata.gz: 799ba605477bc50679baaa0ae5d12ac8077fc3a57611f69beddb3396a45e3a13
+  data.tar.gz: 0fbdf7225b69ea52b615ac7523bd15266dc9b0dbbe541e7b3802027a0a8c6c36
 SHA512:
-  metadata.gz: dee0e70311f3718d0f7b68f77b56c657e137c8a658905ddc58f4fbef536b80071dced6a9836d0ad531b93e15c16c5f25ec42965f02a0a1e016de179b133bf746
-  data.tar.gz: 72a237dd80e3e06aad4a5bcc4545e6ea4d3d8167beb9afa6b502d964d510857947fe6071a88ded9f83df77a54da059c97abc1b3e70d8c14a03c8b77b0a01a675
+  metadata.gz: c8fc9cf7bbd351013091918f5398c27efcda0b9b8c1f66294af76f1864e911d2fc0520b653fe1bdf3d11fb912dd0615b0954e38176f87fbf2a6cc931d0bdf6be
+  data.tar.gz: 98bd2cdeae3a27f9f54bb982b75033c9180e688419c0f5209682462a27e1792d6c8ec6d16ec6340c359c22373cdcad07c05a8ced5b03811060cf492d09a1c13b

data/CHANGELOG.md CHANGED

@@ -1,26 +1,90 @@
+## 0.7.1 - 2020-08-22
+#### Changes
+- performance improvements
+  - delegate to `String#split` where possible
+  - use a regular class for Split rather than values.rb
+  - create Split objects directly rather than allocating intermediate hashes
+## 0.7.0 - 2020-08-21
+#### Breaking Changes
+- `String#split` incompatibility: we no longer trim the string (with
+  `String#strip`) before splitting if the delimiter is omitted
+## 0.6.0 - 2020-08-20
+#### Breaking Changes
+- `ss.split(str, " ")` is no longer treated the same as `ss.split(str)` i.e.
+  unlike Ruby's `String#split`, the former no longer strips the string before
+  splitting
+- rename the `remove_empty` option `remove_empty_fields`
+- rename the `exclude` option `except` (alias for `reject`)
+#### Features
+- add support for descending, negative, and infinite ranges,
+  e.g. `ss.split(str, ":", at: [..4, 4..., 3..1, -1..-3])` etc.
+#### Fixes
+- correctly handle backreferences in delimiter patterns
+## 0.5.1 - 2018-07-01
+#### Changes
+- set StringSplitter::VERSION when `string_splitter.rb` is loaded
+## 0.5.0 - 2018-06-26
+#### Features
+- add a `reject`/`exclude` option which rejects splits at the specified positions
+- add a `select` alias for `at`
+#### Fixes
+- don't treat string delimiters as patterns
 ## 0.4.0 - 2018-06-24
-- **breaking change**: remove the `offset` alias for `split.index`
+#### Breaking Changes
+- remove the `offset` alias for `split.index`
 ## 0.3.1 - 2018-06-24
-- remove trailing empty field when the separator is empty (#1)
+#### Fixes
+- remove trailing empty field when the separator is empty
+  ([#1](https://github.com/chocolateboy/string_splitter/issues/1))
 ## 0.3.0 - 2018-06-23
-- **breaking change**: rename the `default_separator` option to `default_delimiter`
-  - to avoid ambiguity in the code, refer to the input pattern/string as the
-    "delimiter" and the matched string as the "separator"
+#### Breaking Changes
+- rename the `default_separator` option `default_delimiter`
 ## 0.2.0 - 2018-06-22
-- **breaking change**: make `index` (AKA `offset`) 0-based and add `position`
-  (AKA `pos`) as the 1-based accessor
+#### Breaking Changes
+- make `index` (AKA `offset`) 0-based and add `position` (AKA `pos`) as the
+  1-based accessor
 ## 0.1.0 - 2018-06-22
-- **breaking change**: the block now takes a single `split` object with an
-  `index` accessor, rather than seperate `index` and `split` arguments
+#### Breaking Changes
+- the block now takes a single `split` object with an `index` accessor, rather
+  than separate `index` and `split` arguments
+#### Features
 - add support for negative indices in the value supplied to the `at` option
 - add a `count` field to the split object containing the total number of splits
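
The 0.6.0 and 0.7.0 breaking changes both concern the trimming that Ruby's own `String#split` performs. A minimal sketch of that behaviour, using only core Ruby and not the gem itself; the regexp-with-negative-limit call is an assumed stand-in for the untrimmed result the README's CAVEATS section attributes to StringSplitter:

```ruby
padded = " foo bar baz "

# With no delimiter, or a single space, String#split ignores leading and
# trailing whitespace.
trimmed = padded.split
trimmed_space = padded.split(" ")

# A regexp delimiter with a negative limit keeps the empty leading and
# trailing fields (assumed here to mirror the untrimmed behaviour).
untrimmed = padded.split(/\s+/, -1)

p trimmed       # => ["foo", "bar", "baz"]
p trimmed_space # => ["foo", "bar", "baz"]
p untrimmed     # => ["", "foo", "bar", "baz", ""]
```

Code that relied on the old trimming can presumably call `String#strip` on the input before splitting.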

data/README.md CHANGED

@@ -3,14 +3,16 @@
 [![Build Status](https://travis-ci.org/chocolateboy/string_splitter.svg)](https://travis-ci.org/chocolateboy/string_splitter)
 [![Gem Version](https://img.shields.io/gem/v/string_splitter.svg)](https://rubygems.org/gems/string_splitter)
-<!-- START doctoc generated TOC please keep comment here to allow auto update -->
-<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+<!-- toc -->
 - [NAME](#name)
 - [INSTALLATION](#installation)
 - [SYNOPSIS](#synopsis)
 - [DESCRIPTION](#description)
 - [WHY?](#why)
+- [CAVEATS](#caveats)
+  - [Differences from String#split](#differences-from-stringsplit)
+- [COMPATIBILITY](#compatibility)
 - [VERSION](#version)
 - [SEE ALSO](#see-also)
   - [Gems](#gems)
@@ -18,7 +20,7 @@
 - [AUTHOR](#author)
 - [COPYRIGHT AND LICENSE](#copyright-and-license)
-<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+<!-- tocstop -->
 # NAME
@@ -36,65 +38,128 @@ gem "string_splitter"
 require "string_splitter"
 ss = StringSplitter.new
+```
+**Same as `String#split`**
-# same as String#split
-ss.split("foo bar baz quux")
-ss.split("foo bar baz quux", " ")
-ss.split("foo bar baz quux", /\s+/)
-# => ["foo", "bar", "baz", "quux"]
+```ruby
+ss.split("foo bar baz")
+ss.split("foo bar baz", " ")
+ss.split("foo bar baz", /\s+/)
+# => ["foo", "bar", "baz"]
+ss.split("foo", "")
+ss.split("foo", //)
+# => ["f", "o", "o"]
+ss.split("", "...")
+ss.split("", /.../)
+# => []
+```
-# split at the first delimiter
+**Split at the first delimiter**
+```ruby
 ss.split("foo:bar:baz:quux", ":", at: 1)
+ss.split("foo:bar:baz:quux", ":", select: 1)
 # => ["foo", "bar:baz:quux"]
+```
-# split at the last delimiter
+**Split at the last delimiter**
+```ruby
 ss.split("foo:bar:baz:quux", ":", at: -1)
 # => ["foo:bar:baz", "quux"]
+```
+**Split at multiple delimiter positions**
+```ruby
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -1])
+# => ["1", "2", "3", "4:5:6:7:8", "9"]
+```
-# split at multiple delimiter positions
-ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -2])
-# => ["1", "2", "3", "4:5:6:7", "8:9"]
+**Split at all but the first and last delimiters**
-# split from the right
+```ruby
+ss.split("1:2:3:4:5:6", ":", except: [1, -1])
+ss.split("1:2:3:4:5:6", ":", reject: [1, -1])
+# => ["1:2", "3", "4", "5:6"]
+```
+**Split from the right**
+```ruby
 ss.rsplit("1:2:3:4:5:6:7:8:9", ":", at: [1..3, 5])
 # => ["1:2:3:4", "5:6", "7", "8", "9"]
+```
+**Split with negative, descending, and infinite ranges**
+```ruby
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: ..-3)
+# => ["1", "2", "3", "4", "5", "6", "7:8:9"]
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: 4...)
+# => ["1:2:3:4", "5", "6", "7", "8:9"]
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1, 5..3, -2..])
+# => ["1", "2:3", "4", "5", "6:7", "8", "9"]
+```
+**Full control via a block**
-# full control via a block
-result = ss.split('a:a:a:b:c:c:e:a:a:d:c', ":") do |split|
-  split.index > 0 && split.lhs == split.rhs
+```ruby
+result = ss.split("1:2:3:4:5:6:7:8", ":") do |split|
+  split.pos % 2 == 0
 end
-# => ["a:a", "a:b:c", "c:e:a", "a:d:c"]
+# => ["1:2", "3:4", "5:6", "7:8"]
+```
+```ruby
+string = "banana".chars.sort.join # "aaabnn"
+ss.split(string, "") do |split|
+    split.rhs != split.lhs
+end
+# => ["aaa", "b", "nn"]
 ```
 # DESCRIPTION
-Many languages have built-in string `split` functions/methods. They behave similarly
-(notwithstanding the occasional [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)),
-and handle a few common cases e.g.:
+Many languages have built-in `split` functions/methods for strings. They behave
+similarly (notwithstanding the occasional
+[surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)), and
+handle a few common cases, e.g.:
 * limiting the number of splits
-* including the separators in the results
+* including the separator(s) in the results
 * removing (some) empty fields
-But, because the API is squeezed into two overloaded parameters (the delimiter and the limit),
-achieving the desired effects can be tricky. For instance, while `String#split` removes empty
-trailing fields (by default), it provides no way to remove *all* empty fields. Likewise, the
-cramped API means there's no way to e.g. combine a limit (positive integer) with the option
-to preserve empty fields (negative integer), or use backreferences in a delimiter pattern
+But, because the API is squeezed into two overloaded parameters (the delimiter
+and the limit), achieving the desired results can be tricky. For instance,
+while `String#split` removes empty trailing fields (by default), it provides no
+way to remove *all* empty fields. Likewise, the cramped API means there's no
+way to, e.g., combine a limit (positive integer) with the option to preserve
+empty fields (negative integer), or use backreferences in a delimiter pattern
 without including its captured subexpressions in the result.
-If `split` was being written from scratch, without the baggage of its legacy API,
-it's possible that some of these options would be made explicit rather than overloading
-the parameters. And, indeed, this is possible in some implementations,
-e.g. in Crystal:
+If `split` was being written from scratch, without the baggage of its legacy
+API, it's possible that some of these options would be made explicit rather
+than overloading the parameters. And, indeed, this is possible in some
+implementations, e.g. in Crystal:
 ```ruby
-":foo:bar:baz:".split(":", remove_empty: false) # => ["", "foo", "bar", "baz", ""]
-":foo:bar:baz:".split(":", remove_empty: true)  # => ["foo", "bar", "baz"]
+":foo:bar:baz:".split(":", remove_empty: false)
+# => ["", "foo", "bar", "baz", ""]
+":foo:bar:baz:".split(":", remove_empty: true)
+# => ["foo", "bar", "baz"]
 ````
-StringSplitter takes this one step further by moving the configuration out of the method altogether
-and delegating the strategy — i.e. which splits should be accepted or rejected — to a block:
+StringSplitter takes this one step further by moving the configuration out of
+the method altogether and delegating the strategy — i.e. which splits should be
+accepted or rejected — to a block:
 ```ruby
 ss = StringSplitter.new
@@ -102,22 +167,32 @@ ss = StringSplitter.new
 ss.split("foo:bar:baz", ":") { |split| split.index == 0 }
 # => ["foo", "bar:baz"]
-ss.split("foo:bar:baz", ":") { |split| split.position == split.count }
-# => ["foo:bar", "baz"]
+ss.split("foo:bar:baz:quux", ":") do |split|
+  split.position == 1 || split.position == 3
+end
+# => ["foo", "bar:baz", "quux"]
 ```
-As a shortcut, the common case of splitting on delimiters at one or more positions is supported by an option:
+As a shortcut, the common case of splitting (or not splitting) at one or more
+positions is supported by dedicated options:
 ```ruby
-ss.split('foo:bar:baz:quux', ':', at: [1, -1]) # => ["foo", "bar:baz", "quux"]
+ss.split("foo:bar:baz:quux", ":", select: [1, -1])
+# => ["foo", "bar:baz", "quux"]
+ss.split("foo:bar:baz:quux", ":", reject: [1, -1])
+# => ["foo:bar", "baz:quux"]
 ```
 # WHY?
-I wanted to split semi-structured output into fields without having to resort to a regex or a full-blown parser.
+I wanted to split semi-structured output into fields without having to resort
+to a regex or a full-blown parser.
-As an example, the nominally unstructured output of many Unix commands is often formatted in a way
-that's tantalizingly close to being machine-readable, apart from a few pesky exceptions e.g.:
+As an example, the nominally unstructured output of many Unix commands is often
+formatted in a way that's tantalizingly close to being
+[machine-readable](https://en.wikipedia.org/wiki/Delimiter-separated_values),
+apart from a few pesky exceptions, e.g.:
 ```bash
 $ ls -l
@@ -129,8 +204,8 @@ drwxr-xr-x 3 user users 4096 Jun 19 22:56 lib
 -rw-r--r-- 1 user users 3134 Jun 19 22:59 README.md
 ```
-These lines can *almost* be parsed into an array of fields by splitting them on whitespace. The exception is the
-date (columns 6-8) i.e.:
+These lines can *almost* be parsed into an array of fields by splitting them on
+whitespace. The exception is the date (columns 6-8), i.e.:
 ```ruby
 line = "-rw-r--r-- 1 user users   87 Jun 18 18:16 CHANGELOG.md"
@@ -149,19 +224,20 @@ instead of:
 ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
 ```
-One way to work around this is to parse the whole line e.g.:
+One way to work around this is to parse the whole line, e.g.:
 ```ruby
 line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+ \S+) \s+ (.+)$/x)
 ```
-But that requires us to specify *everything*. What we really want is a version of `split`
-which allows us to veto splitting for the 6th and 7th delimiters i.e. control over which
-splits are accepted, rather than being restricted to the single, baked-in strategy provided
-by the `limit` parameter.
+But that requires us to specify *everything*. What we really want is a version
+of `split` which allows us to veto splitting for the 6th and 7th delimiters
+(and to stop after the 8th delimiter), i.e. control over which splits are
+accepted, rather than being restricted to the single, baked-in strategy
+provided by the `limit` parameter.
-By providing a simple way to accept or reject each split, StringSplitter makes cases like
-this easy to handle, either via a block:
+By providing a simple way to accept or reject each split, StringSplitter makes
+cases like this easy to handle, either via a block:
 ```ruby
 ss.split(line) do |split|
@@ -177,9 +253,51 @@ ss.split(line, at: [1..5, 8])
 # => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
 ```
+# CAVEATS
+## Differences from String#split
+Unlike `String#split`, StringSplitter doesn't trim the string before splitting
+if the delimiter is omitted or a single space, e.g.:
+```ruby
+" foo bar baz ".split          # => ["foo", "bar", "baz"]
+" foo bar baz ".split(" ")     # => ["foo", "bar", "baz"]
+ss.split(" foo bar baz ")      # => ["", "foo", "bar", "baz", ""]
+ss.split(" foo bar baz ", " ") # => ["", "foo", "bar", "baz", ""]
+```
+`String#split` omits the `nil` values of unmatched optional captures:
+```ruby
+"foo:bar:baz".scan(/(:)|(-)/)  # => [[":", nil], [":", nil]]
+"foo:bar:baz".split(/(:)|(-)/) # => ["foo", ":", "bar", ":", "baz"]
+```
+StringSplitter preserves them (if `include_captures` is true, as it is by
+default), though they can be omitted from spread captures by passing
+`:compact` as the value of the `spread_captures` option:
+```ruby
+s1 = StringSplitter.new(spread_captures: true)
+s2 = StringSplitter.new(spread_captures: false)
+s3 = StringSplitter.new(spread_captures: :compact)
+s1.split("foo:bar:baz", /(:)|(-)/) # => ["foo", ":", nil, "bar", ":", nil, "baz"]
+s2.split("foo:bar:baz", /(:)|(-)/) # => ["foo", [":", nil], "bar", [":", nil], "baz"]
+s3.split("foo:bar:baz", /(:)|(-)/) # => ["foo", ":", "bar", ":", "baz"]
+```
+# COMPATIBILITY
+StringSplitter is tested and supported on all versions of Ruby [supported by
+the ruby-core team](https://www.ruby-lang.org/en/downloads/branches/), i.e.,
+currently, Ruby 2.5 and above.
 # VERSION
-0.4.0
+0.7.1
 # SEE ALSO
@@ -197,7 +315,7 @@ ss.split(line, at: [1..5, 8])
 # COPYRIGHT AND LICENSE
-Copyright © 2018 by chocolateboy.
+Copyright © 2018-2020 by chocolateboy.
 This is free software; you can redistribute it and/or modify it under the
-terms of the [Artistic License 2.0](http://www.opensource.org/licenses/artistic-license-2.0.php).
+terms of the [Artistic License 2.0](https://www.opensource.org/licenses/artistic-license-2.0.php).
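
The `ls -l` example in the README diff above splits only at chosen delimiter positions. A rough standalone sketch of that idea in core Ruby follows. The helper `split_at_positions` is hypothetical and is not the gem's implementation; it handles only an array of positive 1-based positions, not ranges or negative indices:

```ruby
# Hypothetical helper (not part of string_splitter): split on a regexp
# delimiter, but only at the given 1-based delimiter positions. Separators
# at other positions are kept inside the current field.
def split_at_positions(string, delimiter, positions)
  fields = [+""]
  position = 0
  start = 0

  string.scan(delimiter) do
    match = Regexp.last_match
    from, to = match.offset(0)
    position += 1

    # append the text between the previous separator and this one
    fields.last << string[start...from]

    if positions.include?(position)
      fields << +""           # accepted split: start a new field
    else
      fields.last << match[0] # rejected split: keep the separator
    end

    start = to
  end

  fields.last << string[start..-1]
  fields
end

line = "-rw-r--r-- 1 user users   87 Jun 18 18:16 CHANGELOG.md"
fields = split_at_positions(line, /\s+/, [1, 2, 3, 4, 5, 8])
p fields
# => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
```

Positions 6 and 7 are left out of the list, so the date stays in one field.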

data/lib/string_splitter.rb CHANGED

@@ -1,249 +1,382 @@
 # frozen_string_literal: true
-require 'values'
+require 'set'
+require_relative 'string_splitter/split'
+require_relative 'string_splitter/version'
 # This class extends the functionality of +String#split+ by:
 #
 #   - providing full control over which splits are accepted or rejected
+#
 #   - adding support for splitting from right-to-left
-#   - encapsulating splitting options/preferences in instances rather than trying to
-#     cram them into overloaded method parameters
+#
+#   - encapsulating splitting options/preferences in the splitter rather
+#     than trying to cram them into overloaded method parameters
 #
 # These enhancements allow splits to handle many cases that otherwise require bigger
-# guns e.g. regex matching or parsing.
+# guns, e.g. regex matching or parsing.
+#
+# Implementation-wise, we split the string either with String#split, or with a custom
+# scanner if the delimiter may contain captures (since String#split doesn't handle
+# them correctly) and parse the resulting tokens into an array of Split objects with
+# the following attributes:
+#
+#   - captures:  separator substrings captured by parentheses in the delimiter pattern
+#   - count:     the number of splits
+#   - index:     the 0-based index of the split in the array
+#   - lhs:       the string to the left of the separator (back to the previous split candidate)
+#   - position:  the 1-based index of the split in the array (alias: pos)
+#   - rhs:       the string to the right of the separator (up to the next split candidate)
+#   - rindex:    the 0-based index of the split relative to the end of the array
+#   - rposition: the 1-based index of the split relative to the end of the array (alias: rpos)
+#   - separator: the string matched by the delimiter pattern/string
+#
 class StringSplitter
-  ACCEPT = ->(_split) { true }
-  DEFAULT_DELIMITER = /\s+/
-  NO_SPLITS = []
+  # terminology: the delimiter is what we provide and the separators are what we get
+  # back (if we capture them). e.g. for:
+  #
+  #   ss.split("foo:bar::baz", /(\W+)/)
+  #
+  # the delimiter is /(\W+)/ and the separators are ":" and "::"
-  Split = Value.new(:captures, :count, :index, :lhs, :rhs, :separator) do
-    def position
-      index + 1
-    end
+  ACCEPT_ALL = ->(_split) { true }
+  DEFAULT_DELIMITER = /\s+/.freeze
+  REMOVE = [].freeze
-    alias_method :pos, :position
+  # simulate an enum. the value is returned by the case statement
+  # in the generated block if the positions match
+  module Action
+    SELECT = true
+    REJECT = false
   end
+  private_constant :Action
   def initialize(
     default_delimiter: DEFAULT_DELIMITER,
     include_captures: true,
-    remove_empty: false,
+    remove_empty: false, # TODO remove this
+    remove_empty_fields: remove_empty,
     spread_captures: true
   )
     @default_delimiter = default_delimiter
     @include_captures = include_captures
-    @remove_empty = remove_empty
+    @remove_empty_fields = remove_empty_fields
     @spread_captures = spread_captures
   end
-  attr_reader :default_delimiter, :include_captures, :remove_empty, :spread_captures
+  attr_reader(
+    :default_delimiter,
+    :include_captures,
+    :remove_empty_fields,
+    :spread_captures
+  )
-  def split(string, delimiter = @default_delimiter, at: nil, &block)
-    result, block, splits, count, index = split_common(string, delimiter, at, block)
+  # TODO remove this
+  alias remove_empty remove_empty_fields
+  def split(
+    string,
+    delimiter = @default_delimiter,
+    at: nil, # alias for select
+    except: nil, # alias for reject
+    select: at,
+    reject: except,
+    &block
+  )
+    result, splits, count, accept = init(
+      string: string,
+      delimiter: delimiter,
+      select: select,
+      reject: reject,
+      block: block
+    )
-    splits.each do |split|
-      split = Split.with(split.merge({ index: (index += 1), count: count }))
-      result << split.lhs if result.empty?
+    return result unless splits
-      if block.call(split)
-        if @include_captures
-          if @spread_captures
-            result += split.captures
-          else
-            result << split.captures
-          end
-        end
+    result << splits.first.lhs
+    splits.each_with_index do |split, index|
+      split.update!(count: count, index: index)
-        result << split.rhs
+      if accept.call(split)
+        result << split.captures << split.rhs
       else
         # append the rhs
         result[-1] = result[-1] + split.separator + split.rhs
       end
     end
-    result
+    render(result)
   end
   alias lsplit split
-  def rsplit(string, delimiter = @default_delimiter, at: nil, &block)
-    result, block, splits, count, index = split_common(string, delimiter, at, block)
+  def rsplit(
+    string,
+    delimiter = @default_delimiter,
+    at: nil, # alias for select
+    except: nil, # alias for reject
+    select: at,
+    reject: except,
+    &block
+  )
+    result, splits, count, accept = init(
+      string: string,
+      delimiter: delimiter,
+      select: select,
+      reject: reject,
+      block: block
+    )
+    return result unless splits
-    splits.reverse!.each do |split|
-      split = Split.with(split.merge({ index: (index += 1), count: count }))
-      result.unshift(split.rhs) if result.empty?
+    result.unshift(splits.last.rhs)
-      if block.call(split)
-        if @include_captures
-          if @spread_captures
-            result = split.captures + result
-          else
-            result.unshift(split.captures)
-          end
-        end
+    splits.reverse_each.with_index do |split, index|
+      split.update!(count: count, index: index)
-        result.unshift(split.lhs)
+      if accept.call(split)
+        # [lhs + captures] + result
+        result.unshift(split.lhs, split.captures)
       else
         # prepend the lhs
         result[0] = split.lhs + split.separator + result[0]
       end
     end
-    result
+    render(result)
   end
   private
-  def splits_for(parts, ncaptures)
-    result = []
-    splits = []
-    until parts.empty?
-      lhs = parts.shift
-      separator = parts.shift
-      captures = parts.shift(ncaptures)
-      rhs = parts.length == 1 ? parts.shift : parts.first
-      if @remove_empty && (lhs.empty? || rhs.empty?)
-        if lhs.empty? && rhs.empty?
-          # do nothing
-        elsif parts.empty? # last split
-          result << (!lhs.empty? ? lhs : rhs) if splits.empty?
-        elsif rhs.empty?
-          # replace the empty rhs with the non-empty lhs
-          parts[0] = lhs
-        end
+  # initialisation common to +split+ and +rsplit+
+  #
+  # takes a hash of options passed to +split+ or +rsplit+ and returns a tuple with
+  # the following fields:
+  #
+  #   - result: the array of separated strings to return from +split+ or +rsplit+.
+  #     if the splits array is empty, the caller returns this array immediately
+  #     without any further processing
+  #
+  #   - splits: an array of Split objects containing the lhs, rhs, separator and
+  #     captured separator substrings for each split
+  #
+  #   - count: the number of splits
+  #
+  #   - accept: a proc whose return value determines whether each split should be
+  #     accepted (true) or rejected (false)
+  #
+  def init(string:, delimiter:, select:, reject:, block:)
+    return [[]] if string.empty?
+    unless block
+      if reject
+        positions = reject
+        action = Action::REJECT
+      elsif select
+        positions = select
+        action = Action::SELECT
+      else
+        block = ACCEPT_ALL
+      end
+    end
+    # use String#split if we can
+    #
+    # NOTE +reject!+ is no faster than +reject+ on MRI and significantly slower
+    # on TruffleRuby
+    if delimiter.is_a?(String)
+      limit = -1
-        next
+      if delimiter == ' '
+        delimiter = / / # don't trim
+      elsif delimiter.empty?
+        limit = 0 # remove the trailing empty string
       end
-      splits << {
-        lhs: lhs,
-        rhs: rhs,
-        separator: separator,
-        captures: captures,
-      }
-    end
+      result = string.split(delimiter, limit)
-    [result, splits]
-  end
+      return [result] if result.length == 1 # delimiter not found: no splits
+      if block == ACCEPT_ALL # return the (2 or more) fields
+        result = result.reject(&:empty?) if @remove_empty_fields
+        return [result]
+      end
+      splits = []
+      result.each_cons(2) do |lhs, rhs| # 2 or more fields
+        splits << Split.new(
+          captures: [],
+          lhs: lhs,
+          rhs: rhs,
+          separator: delimiter
+        )
+      end
+    elsif delimiter == DEFAULT_DELIMITER && block == ACCEPT_ALL
+      # non-empty separators so -1 is safe
+      if @remove_empty_fields
+        result = []
+        string.split(delimiter, -1) do |field|
+          result << field unless field.empty?
+        end
+      else
+        result = string.split(delimiter, -1)
+      end
-  # setup common to both split methods
-  def split_common(string, delimiter, at, block)
-    unless (match = string.match(delimiter))
-      result = (@remove_empty && string.empty?) ? [] : [string]
-      return [result, block, NO_SPLITS, 0, -1]
+      return [result]
+    else
+      splits = parse(string, delimiter)
     end
-    ncaptures = match.captures.length
-    delimiter = increment_backrefs(delimiter, ncaptures)
-    parts = string.split(/(#{delimiter})/, -1)
-    remove_trailing_empty_field!(parts, ncaptures)
-    result, splits = splits_for(parts, ncaptures)
     count = splits.length
-    block ||= at ? match_positions(at, count) : ACCEPT
-    [result, block, splits, count, -1]
+    return [[string]] if count.zero?
+    block ||= compile(positions, action, count)
+    [[], splits, count, block]
   end
-  # increment back-references so they remain valid when the outer capture
-  # is added.
-  #
-  # e.g. to split on:
-  #
-  #   - <foo-comment> ... </foo-comment>
-  #   - <bar-comment> ... </bar-comment>
-  #
-  # etc.
+  def render(values)
+    values.flat_map do |value|
+      if value.is_a?(String)
+        value.empty? && @remove_empty_fields ? REMOVE : [value]
+      elsif @include_captures
+        if @spread_captures
+          # TODO make sure compact can return a Capture
+          @spread_captures == :compact ? value.compact : value
+        elsif value.empty?
+          # we expose non-captures (string delimiters or regexps with no
+          # captures) as empty arrays inside the block, so the type is
+          # consistent, but it doesn't make sense to keep them in the
+          # result
+          REMOVE
+        else
+          [value]
+        end
+      else
+        REMOVE
+      end
+    end
+  end
+  # takes a string and a delimiter pattern (regex or string) and splits it along
+  # the delimiter, returning an array of Split objects representing each split
+  # (shown below as hashes for brevity). e.g. for:
   #
-  # before:
+  #   parse("foo:bar:baz:quux", ":")
   #
-  #   %r|   <(\w+-comment)> [^<]* </\1-comment>   |x
+  # we return:
   #
-  # after:
+  #   [
+  #       { lhs: "foo", rhs: "bar", separator: ":", captures: [] },
+  #       { lhs: "bar", rhs: "baz", separator: ":", captures: [] },
+  #       { lhs: "baz", rhs: "quux", separator: ":", captures: [] },
+  #   ]
   #
-  #   %r| ( <(\w+-comment)> [^<]* </\2-comment> ) |x
+  def parse(string, delimiter)
+    # has_names = delimiter.is_a?(Regexp) && !delimiter.names.empty?
+    result = []
+    start = 0
-  def increment_backrefs(delimiter, ncaptures)
-    if delimiter.is_a?(Regexp) && ncaptures > 0
-      delimiter = delimiter.to_s.gsub(/\\(?:(\d+)|.)/) do
-        match = Regexp.last_match
-        match[1] ? '\\' + match[1].to_i.next.to_s : match[0]
-      end
+    # we don't use the argument passed to the +scan+ block here because it's a
+    # string (the separator) if there are no captures, rather than an empty
+    # array. we use match.captures instead to get the array
+    string.scan(delimiter) do
+      match = Regexp.last_match
+      index, after = match.offset(0)
+      separator = match[0]
+      # ignore empty separators at the beginning and/or end of the string
+      next if separator.empty? && (index.zero? || after == string.length)
+      lhs = string.slice(start, index - start)
+      result.last.rhs = lhs unless result.empty?
+      # this is correct for the last/only match, but gets updated to the next
+      # match's lhs for other matches
+      rhs = match.post_match
+      # captures = (has_names ? Captures.new(match) : match.captures)
+      result << Split.new(
+        captures: match.captures,
+        lhs: lhs,
+        rhs: rhs,
+        separator: separator
+      )
+      # advance the start index (the start of the next lhs) to the position
+      # after the last character of the separator
+      start = after
     end
-    delimiter
+    result
   end
-  # work around Ruby's (and Perl's and Groovy's) unhelpful behavior when splitting
-  # on an empty string/pattern without removing trailing empty fields e.g.:
+  # returns a lambda which splits at (i.e. accepts or rejects splits at, depending
+  # on the action) the supplied positions
   #
-  #   "foobar".split("", -1)
-  #   "foobar".split(//, -1)
-  #   # => ["f", "o", "o", "b", "a", "r", ""]
+  # positions are preprocessed to support negative indices, infinite ranges, and
+  # descending ranges, e.g.:
   #
-  #   "foobar".split(/()/, -1)
-  #   # => ["f", "", "o", "", "o", "", "b", "", "a", "", "r", "", ""]
+  #   ss.split("foo:bar:baz:quux", ":", at: -1)
   #
-  #   "foobar".split(/(())/, -1)
-  #   # => ["f", "", "", "o", "", "", "o", "", "", "b", "", "", "a", "", "", "r", "", "", ""]
+  # translates to:
   #
-  # *there is no such thing as an empty field whose separator is empty*, so
-  # if String#split's result ends with an empty separator, 0 or more (empty)
-  # captures and an empty field, we can safely remove them.
-  def remove_trailing_empty_field!(parts, ncaptures)
-    # the trailing field is at index -1. if there are 0 captures, the separator
-    # is at -2:
-    #
-    #   [empty_separator, empty_field]
-    #
-    # if there is 1 capture, the separator is at -3:
-    #
-    #   [empty_separator, capture, empty_field]
+  #   ss.split("foo:bar:baz:quux", ":", at: 3)
+  #
+  # and
+  #
+  #   ss.split("1:2:3:4:5:6:7:8:9", ":", at: -3..)
+  #
+  # translates to:
+  #
+  #   ss.split("1:2:3:4:5:6:7:8:9", ":", at: 6..8)
+  #
+  def compile(positions, action, count)
+    # XXX note: we don't use modulo, because we don't want
+    # out-of-bounds indices to silently work, e.g. we don't want:
     #
-    # etc. therefore we find the separator by walking back
+    #   ss.split("foo:bar:baz:quux", ":", at: -42)
     #
-    #  1 (empty field)
-    #  + ncaptures
-    #  + 1 (separator)
+    # to mysteriously match when the index/position is 0/1
     #
-    # steps from the end of the array i.e. ncaptures + 2
-    count = ncaptures + 2
-    separator_index = count * -1
-    return unless parts[-1].empty? && parts[separator_index].empty?
-    # drop the empty separator, the (empty) captures, and the trailing empty field
-    parts.pop(count)
-  end
-  def match_positions(positions, nsplits)
-    positions = Array(positions).map do |position|
-      if position.is_a?(Integer) && position.negative?
-        # translate negative indices to 1-based non-negative indices e.g:
-        #
-        #   ss.split("foo:bar:baz:quux", ":", at: -1)
-        #
-        # translates to:
-        #
-        #   ss.split("foo:bar:baz:quux", ":", at: 3)
-        #
-        # XXX note: we don't use modulo, because we don't want
-        # out-of-bounds indices to silently work e.g. we don't want:
-        #
-        #   ss.split("foo:bar:baz:quux", ":", -42)
-        #
-        # to mysteriously match when the position is 2
-        nsplits + 1 + position
+    resolve = ->(int) { int.negative? ? count + 1 + int : int }
+    # don't use Array(...) to wrap these as we don't want to convert ranges
+    positions = positions.is_a?(Array) ? positions : [positions]
+    positions = positions.map do |position|
+      if position.is_a?(Integer)
+        resolve[position]
+      elsif position.is_a?(Range)
+        rbegin = position.begin
+        rend = position.end
+        rexc = position.exclude_end?
+        if rbegin.nil?
+          Range.new(1, resolve[rend], rexc)
+        elsif rend.nil?
+          Range.new(resolve[rbegin], count, rexc)
+        elsif rbegin.negative? || rend.negative? || (rend - rbegin).negative?
+          from = resolve[rbegin]
+          to = resolve[rend]
+          to < from ? Range.new(to, from, rexc) : Range.new(from, to, rexc)
+        else
+          position
+        end
+      elsif position.is_a?(Set)
+        position.map { |it| resolve[it] }.to_set
       else
         position
       end
     end
-    lambda do |split|
-      case split.position when *positions then true else false end
-    end
+    ->(split) { case split.position when *positions then action else !action end }
   end
 end
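
The position translation described in the comments of `compile` can be tried in isolation. The following is a minimal sketch, not the gem's code: `resolve_position` and `resolve_endless_range` are hypothetical helper names that only mirror the `resolve` lambda and the endless-range branch shown in the diff above.

```ruby
# Hypothetical helper mirroring the `resolve` lambda in compile: negative
# positions count back from the last split, everything else passes through.
def resolve_position(position, count)
  position.negative? ? count + 1 + position : position
end

# Hypothetical helper covering only the endless-range branch of compile.
def resolve_endless_range(range, count)
  Range.new(resolve_position(range.begin, count), count, range.exclude_end?)
end

count = 3 # "foo:bar:baz:quux" has 3 ":" separators
p resolve_position(-1, count)   # 3: the last separator
p resolve_position(-42, count)  # -38: still out of bounds, so it never matches
p(-42 % count)                  # 0: a modulo-based translation would wrap around

p resolve_endless_range((-3..), 8) # 6..8 for "1:2:3:4:5:6:7:8:9" (8 separators)
```

Because an out-of-range negative position resolves to another negative number, it cannot equal any 1-based position, which is the behaviour the "we don't use modulo" note asks for.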

data/lib/string_splitter/split.rb ADDED

@@ -0,0 +1,51 @@
+# frozen_string_literal: true
+class StringSplitter
+  class Split
+    attr_reader :captures, :count, :index, :lhs, :position, :rhs, :separator
+    attr_writer :rhs
+    alias pos position
+    def initialize(captures:, lhs:, rhs:, separator:)
+      @captures = captures
+      @lhs = lhs
+      @rhs = rhs
+      @separator = separator
+    end
+    # 0-based index relative to the end of the array, e.g. for 5 items:
+    #
+    #  index | rindex
+    #  ------|-------
+    #    0   |   4
+    #    1   |   3
+    #    2   |   2
+    #    3   |   1
+    #    4   |   0
+    def rindex
+      @count - @position
+    end
+    # 1-based position relative to the end of the array, e.g. for 5 items:
+    #
+    #   position | rposition
+    #  ----------|----------
+    #      1     |    5
+    #      2     |    4
+    #      3     |    3
+    #      4     |    2
+    #      5     |    1
+    def rposition
+      @count + 1 - @position
+    end
+    alias rpos rposition
+    def update!(count:, index:)
+      @count = count
+      @index = index
+      @position = index + 1
+      freeze
+    end
+  end
+end
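
The two tables in the comments of `Split` follow directly from the arithmetic in `rindex` and `rposition`. As a rough check, the sketch below reproduces them with a hypothetical `ReverseDemo` struct (not part of the gem) that keeps only `index` and `count` and derives the rest the same way `update!` and the two methods do.

```ruby
# Hypothetical stand-in for StringSplitter::Split, reduced to the fields
# needed for the reverse index/position arithmetic.
ReverseDemo = Struct.new(:index, :count) do
  # 1-based position, as assigned in update!
  def position
    index + 1
  end

  # 0-based index counted from the end
  def rindex
    count - position
  end

  # 1-based position counted from the end
  def rposition
    count + 1 - position
  end
end

rows = (0...5).map do |i|
  demo = ReverseDemo.new(i, 5)
  [demo.index, demo.rindex, demo.position, demo.rposition]
end

rows.each { |row| puts row.join(' | ') }
```

Each printed row is `index | rindex | position | rposition` for 5 items, so the first row reads `0 | 4 | 1 | 5` and the last `4 | 0 | 5 | 1`.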

data/lib/string_splitter/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 class StringSplitter
-  VERSION = '0.4.0'
+  VERSION = '0.7.1'
 end

metadata CHANGED

@@ -1,71 +1,57 @@
 --- !ruby/object:Gem::Specification
 name: string_splitter
 version: !ruby/object:Gem::Version
-  version: 0.4.0
+  version: 0.7.1
 platform: ruby
 authors:
 - chocolateboy
-autorequire:
+autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-06-24 00:00:00.000000000 Z
+date: 2020-08-22 00:00:00.000000000 Z
 dependencies:
-- !ruby/object:Gem::Dependency
-  name: values
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '1.8'
-  type: :runtime
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '1.8'
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.16'
+        version: '2.1'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.16'
+        version: '2.1'
 - !ruby/object:Gem::Dependency
   name: minitest
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '5.11'
+        version: '5.0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '5.11'
+        version: '5.0'
 - !ruby/object:Gem::Dependency
   name: minitest-power_assert
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.3.0
+        version: '0.3'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.3.0
+        version: '0.3'
 - !ruby/object:Gem::Dependency
   name: minitest-reporters
   requirement: !ruby/object:Gem::Requirement
@@ -86,29 +72,15 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '10.0'
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '10.0'
-- !ruby/object:Gem::Dependency
-  name: rubocop
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: 0.54.0
+        version: '13.0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.54.0
-description:
+        version: '13.0'
+description:
 email: chocolate@cpan.org
 executables: []
 extensions: []
@@ -118,6 +90,7 @@ files:
 - LICENSE.md
 - README.md
 - lib/string_splitter.rb
+- lib/string_splitter/split.rb
 - lib/string_splitter/version.rb
 homepage: https://github.com/chocolateboy/string_splitter
 licenses:
@@ -127,7 +100,7 @@ metadata:
   bug_tracker_uri: https://github.com/chocolateboy/string_splitter/issues
   changelog_uri: https://github.com/chocolateboy/string_splitter/blob/master/CHANGELOG.md
   source_code_uri: https://github.com/chocolateboy/string_splitter
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -135,16 +108,15 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: '0'
+      version: '2.3'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubyforge_project:
-rubygems_version: 2.7.7
-signing_key:
+rubygems_version: 3.1.4
+signing_key:
 specification_version: 4
 summary: String#split on steroids
 test_files: []
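
Several development dependencies above change only in the precision of their `~>` constraint (for example `0.3.0` to `'0.3'`), which widens the range of accepted versions. A small sketch using RubyGems' own `Gem::Requirement` and `Gem::Version` classes shows the difference; the candidate version `0.4.0` is chosen purely for illustration.

```ruby
require 'rubygems' # normally loaded already; shown for completeness

patch_level = Gem::Requirement.new('~> 0.3.0') # >= 0.3.0, < 0.4
minor_level = Gem::Requirement.new('~> 0.3')   # >= 0.3, < 1.0

candidate = Gem::Version.new('0.4.0') # illustrative version only

p patch_level.satisfied_by?(candidate) # false
p minor_level.satisfied_by?(candidate) # true
```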