RubyGems - string_splitter - Versions diffs - 0.3.0 → 0.6.0 - Mend

string_splitter 0.3.0 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

checksums.yaml +4 -4
data/CHANGELOG.md +63 -7
data/README.md +168 -53
data/lib/string_splitter.rb +281 -131
data/lib/string_splitter/version.rb +1 -1
metadata +16 -31

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 128e1b2cc29cb122f3d5040f7c9e115688532c7e68c59f0a5373291e995642f9
-  data.tar.gz: 6b8729b7fb59aa984c1940ff0f9a1a308dded8b77c7db2b3c2a2ad4cdbd8bd52
+  metadata.gz: 9d97ccb956fe51694359cdb0d3a997d6574de088bac6ed5a8e572f92bb5ed54a
+  data.tar.gz: 845cefeb5efd5d01baa45759cb05ff7ae5e9a457c1f148b340bb24c038bd259e
 SHA512:
-  metadata.gz: 3aa949fb5ac46369af379e2fd28bc18f7c93746515aea76005d606a4cb9f20426dec353bef8406264078cdccf89cdaca99902d961687f12fb72be08d9f2b0072
-  data.tar.gz: acd982d39a003be78b4548992cf108141e89e51a58f918e10515fb01dd1fd562db319e5c0dc5475d3e74739726e1fd752bbee4850b823705fb9520ff6b05e99f
+  metadata.gz: 7a935a6e0f3434801dcae6a32575779e1d2eb706f8f208087a208e7fdba39ac5b49928f8b7617aec60493a8db5988a013028650f8b2ced01fadb620bfd4c77e5
+  data.tar.gz: d76c18a283c1e113c8bffb73b813eb6074481faa7ea339811dc9a7424a5e24fdc3efbe9afa941459e566cde8271c3cd19a97e3a37a8cf90d36a65a7bf8fd6dcf

data/CHANGELOG.md CHANGED

@@ -1,18 +1,74 @@
+## 0.6.0 - 2020-08-20
+#### Breaking Changes
+- `ss.split(str, " ")` is no longer treated the same as `ss.split(str)` i.e.
+  unlike Ruby's `String#split` (but like Crystal's), the former no longer
+  strips the string before splitting
+- rename the `remove_empty` option `remove_empty_fields`
+- rename the `exclude` option `except` (alias for `reject`)
+#### Fixes
+- correctly handle backreferences in delimiter patterns
+#### Features
+- add support for descending, negative, and infinite ranges,
+  e.g. `ss.split(str, ":", at: [..4, 4..., 3..1, -1..-3])` etc.
+## 0.5.1 - 2018-07-01
+#### Changes
+- set StringSplitter::VERSION when `string_splitter.rb` is loaded
+## 0.5.0 - 2018-06-26
+#### Fixes
+- don't treat string delimiters as patterns
+#### Features
+- add a `reject`/`exclude` option which rejects splits at the specified positions
+- add a `select` alias for `at`
+## 0.4.0 - 2018-06-24
+#### Breaking Changes
+- remove the `offset` alias for `split.index`
+## 0.3.1 - 2018-06-24
+#### Fixes
+- remove trailing empty field when the separator is empty
+  ([#1](https://github.com/chocolateboy/string_splitter/issues/1))
 ## 0.3.0 - 2018-06-23
-- **breaking change**: rename the `default_separator` option to `default_delimiter`
-  - to avoid ambiguity in the code, refer to the input pattern/string as the
-    "delimiter" and the matched string as the "separator"
+#### Breaking Changes
+- rename the `default_separator` option `default_delimiter`
 ## 0.2.0 - 2018-06-22
-- **breaking change**: make `index` (AKA `offset`) 0-based and add `position`
-  (AKA `pos`) as the 1-based accessor
+#### Breaking Changes
+- make `index` (AKA `offset`) 0-based and add `position` (AKA `pos`) as the
+  1-based accessor
 ## 0.1.0 - 2018-06-22
-- **breaking change**: the block now takes a single `split` object with an
-  `index` accessor, rather than separate `index` and `split` arguments
+#### Breaking Changes
+- the block now takes a single `split` object with an `index` accessor, rather
+  than separate `index` and `split` arguments
+#### Features
 - add support for negative indices in the value supplied to the `at` option
 - add a `count` field to the split object containing the total number of splits
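
The 0.6.0 breaking change listed in this changelog can be previewed with core Ruby alone. This is a minimal sketch, not the gem's implementation: the helper names `awk_style_split` and `literal_space_split` are hypothetical, and the regex call is only a stand-in that produces the shape the 0.6.0 README documents for `ss.split(str, " ")`.

```ruby
# Plain Ruby only; the gem is not required for this illustration.

# String#split special-cases a single-space delimiter: leading and trailing
# whitespace is ignored and the string is split on runs of whitespace.
def awk_style_split(string)
  string.split(" ")
end

# Hypothetical stand-in for the 0.6.0 behaviour: a regex delimiter with a
# negative limit treats every space literally and keeps empty fields.
def literal_space_split(string)
  string.split(/ /, -1)
end

padded = " foo bar baz "
p awk_style_split(padded)     # ["foo", "bar", "baz"]
p literal_space_split(padded) # ["", "foo", "bar", "baz", ""]
```

Callers that relied on the old stripping behaviour with an explicit `" "` delimiter may need to omit the delimiter or strip the input themselves after upgrading.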

data/README.md CHANGED

@@ -3,14 +3,16 @@
 [![Build Status](https://travis-ci.org/chocolateboy/string_splitter.svg)](https://travis-ci.org/chocolateboy/string_splitter)
 [![Gem Version](https://img.shields.io/gem/v/string_splitter.svg)](https://rubygems.org/gems/string_splitter)
-<!-- START doctoc generated TOC please keep comment here to allow auto update -->
-<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+<!-- toc -->
 - [NAME](#name)
 - [INSTALLATION](#installation)
 - [SYNOPSIS](#synopsis)
 - [DESCRIPTION](#description)
 - [WHY?](#why)
+- [CAVEATS](#caveats)
+  - [Differences from String#split](#differences-from-string%23split)
+- [COMPATIBILITY](#compatibility)
 - [VERSION](#version)
 - [SEE ALSO](#see-also)
   - [Gems](#gems)
@@ -18,7 +20,7 @@
 - [AUTHOR](#author)
 - [COPYRIGHT AND LICENSE](#copyright-and-license)
-<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+<!-- tocstop -->
 # NAME
@@ -36,65 +38,137 @@ gem "string_splitter"
 require "string_splitter"
 ss = StringSplitter.new
+```
+**Same as `String#split`**
+```ruby
+ss.split("foo bar baz")
+ss.split("  foo bar baz  ")
+# => ["foo", "bar", "baz"]
+```
+```ruby
+ss.split("foo", "")
+ss.split("foo", //)
+# => ["f", "o", "o"]
+```
-# same as String#split
-ss.split("foo bar baz quux")
-ss.split("foo bar baz quux", " ")
-ss.split("foo bar baz quux", /\s+/)
-# => ["foo", "bar", "baz", "quux"]
+```ruby
+ss.split("", "...")
+ss.split("", /.../)
+# => []
+```
-# split at the first delimiter
+**Split at the first delimiter**
+```ruby
 ss.split("foo:bar:baz:quux", ":", at: 1)
+ss.split("foo:bar:baz:quux", ":", select: 1)
 # => ["foo", "bar:baz:quux"]
+```
+**Split at the last delimiter**
-# split at the last delimiter
+```ruby
 ss.split("foo:bar:baz:quux", ":", at: -1)
 # => ["foo:bar:baz", "quux"]
+```
-# split at multiple delimiter positions
-ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -2])
-# => ["1", "2", "3", "4:5:6:7", "8:9"]
+**Split at multiple delimiter positions**
-# split from the right
+```ruby
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -1])
+# => ["1", "2", "3", "4:5:6:7:8", "9"]
+```
+**Split at all but the first and last delimiters**
+```ruby
+ss.split("1:2:3:4:5:6", ":", except: [1, -1])
+ss.split("1:2:3:4:5:6", ":", reject: [1, -1])
+# => ["1:2", "3", "4", "5:6"]
+```
+**Split from the right**
+```ruby
 ss.rsplit("1:2:3:4:5:6:7:8:9", ":", at: [1..3, 5])
 # => ["1:2:3:4", "5:6", "7", "8", "9"]
+```
-# full control via a block
-result = ss.split('a:a:a:b:c:c:e:a:a:d:c', ":") do |split|
-  split.index > 0 && split.lhs == split.rhs
+**Split with negative, descending, and infinite ranges**
+```ruby
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: 4...)
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: [4...])
+# => ["1:2:3:4", "5", "6", "7", "8:9"]
+```
+```ruby
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: ..-3)
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: [..-3])
+# => ["1", "2", "3", "4", "5", "6", "7:8:9"]
+```
+```ruby
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1, 5..3, -2..])
+# => ["1", "2:3", "4", "5", "6:7", "8", "9"]
+```
+**Full control via a block**
+```ruby
+result = ss.split("1:2:3:4:5:6:7:8", ":") do |split|
+  split.pos % 2 == 0
 end
-# => ["a:a", "a:b:c", "c:e:a", "a:d:c"]
+# => ["1:2", "3:4", "5:6", "7:8"]
+```
+```ruby
+string = "banana".chars.sort.join # "aaabnn"
+ss.split(string, "") do |split|
+    split.rhs != split.lhs
+end
+# => ["aaa", "b", "nn"]
 ```
 # DESCRIPTION
-Many languages have built-in string `split` functions/methods. They behave similarly
-(notwithstanding the occasional [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)),
-and handle a few common cases e.g.:
+Many languages have built-in `split` functions/methods for strings. They behave
+similarly (notwithstanding the occasional
+[surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)), and
+handle a few common cases e.g.:
 * limiting the number of splits
-* including the separators in the results
+* including the separator(s) in the results
 * removing (some) empty fields
-But, because the API is squeezed into two overloaded parameters (the delimiter and the limit),
-achieving the desired effects can be tricky. For instance, while `String#split` removes empty
-trailing fields (by default), it provides no way to remove *all* empty fields. Likewise, the
-cramped API means there's no way to e.g. combine a limit (positive integer) with the option
-to preserve empty fields (negative integer), or use backreferences in a delimiter pattern
+But, because the API is squeezed into two overloaded parameters (the delimiter
+and the limit), achieving the desired results can be tricky. For instance,
+while `String#split` removes empty trailing fields (by default), it provides no
+way to remove *all* empty fields. Likewise, the cramped API means there's no
+way to e.g. combine a limit (positive integer) with the option to preserve
+empty fields (negative integer), or use backreferences in a delimiter pattern
 without including its captured subexpressions in the result.
-If `split` was being written from scratch, without the baggage of its legacy API,
-it's possible that some of these options would be made explicit rather than overloading
-the parameters. And, indeed, this is possible in some implementations,
-e.g. in Crystal:
+If `split` was being written from scratch, without the baggage of its legacy
+API, it's possible that some of these options would be made explicit rather
+than overloading the parameters. And, indeed, this is possible in some
+implementations, e.g. in Crystal:
 ```ruby
-":foo:bar:baz:".split(":", remove_empty: false) # => ["", "foo", "bar", "baz", ""]
-":foo:bar:baz:".split(":", remove_empty: true)  # => ["foo", "bar", "baz"]
+":foo:bar:baz:".split(":", remove_empty: false)
+# => ["", "foo", "bar", "baz", ""]
+":foo:bar:baz:".split(":", remove_empty: true)
+# => ["foo", "bar", "baz"]
 ```
-StringSplitter takes this one step further by moving the configuration out of the method altogether
-and delegating the strategy — i.e. which splits should be accepted or rejected — to a block:
+StringSplitter takes this one step further by moving the configuration out of
+the method altogether and delegating the strategy — i.e. which splits should be
+accepted or rejected — to a block:
 ```ruby
 ss = StringSplitter.new
@@ -102,22 +176,29 @@ ss = StringSplitter.new
 ss.split("foo:bar:baz", ":") { |split| split.index == 0 }
 # => ["foo", "bar:baz"]
-ss.split("foo:bar:baz", ":") { |split| split.position == split.count }
-# => ["foo:bar", "baz"]
+ss.split("foo:bar:baz:quux", ":") do |split|
+  split.position == 1 || split.position == 3
+end
+# => ["foo", "bar:baz", "quux"]
 ```
-As a shortcut, the common case of splitting on delimiters at one or more positions is supported by an option:
+As a shortcut, the common case of splitting on delimiters at one or more
+positions is supported by an option:
 ```ruby
-ss.split('foo:bar:baz:quux', ':', at: [1, -1]) # => ["foo", "bar:baz", "quux"]
+ss.split("foo:bar:baz:quux", ":", at: [1, -1])
+# => ["foo", "bar:baz", "quux"]
 ```
 # WHY?
-I wanted to split semi-structured output into fields without having to resort to a regex or a full-blown parser.
+I wanted to split semi-structured output into fields without having to resort
+to a regex or a full-blown parser.
-As an example, the nominally unstructured output of many Unix commands is often formatted in a way
-that's tantalizingly close to being machine-readable, apart from a few pesky exceptions e.g.:
+As an example, the nominally unstructured output of many Unix commands is often
+formatted in a way that's tantalizingly close to being
+[machine-readable](https://en.wikipedia.org/wiki/Delimiter-separated_values),
+apart from a few pesky exceptions e.g.:
 ```bash
 $ ls -l
@@ -129,8 +210,8 @@ drwxr-xr-x 3 user users 4096 Jun 19 22:56 lib
 -rw-r--r-- 1 user users 3134 Jun 19 22:59 README.md
 ```
-These lines can *almost* be parsed into an array of fields by splitting them on whitespace. The exception is the
-date (columns 6-8) i.e.:
+These lines can *almost* be parsed into an array of fields by splitting them on
+whitespace. The exception is the date (columns 6-8) i.e.:
 ```ruby
 line = "-rw-r--r-- 1 user users   87 Jun 18 18:16 CHANGELOG.md"
@@ -155,13 +236,14 @@ One way to work around this is to parse the whole line e.g.:
 line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+ \S+) \s+ (.+)$/x)
 ```
-But that requires us to specify *everything*. What we really want is a version of `split`
-which allows us to veto splitting for the 6th and 7th delimiters i.e. control over which
-splits are accepted, rather than being restricted to the single, baked-in strategy provided
-by the `limit` parameter.
+But that requires us to specify *everything*. What we really want is a version
+of `split` which allows us to veto splitting for the 6th and 7th delimiters
+(and to stop after the 8th delimiter) i.e. control over which splits are
+accepted, rather than being restricted to the single, baked-in strategy
+provided by the `limit` parameter.
-By providing a simple way to accept or reject each split, StringSplitter makes cases like
-this easy to handle, either via a block:
+By providing a simple way to accept or reject each split, StringSplitter makes
+cases like this easy to handle, either via a block:
 ```ruby
 ss.split(line) do |split|
@@ -177,9 +259,42 @@ ss.split(line, at: [1..5, 8])
 # => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
 ```
+# CAVEATS
+## Differences from String#split
+StringSplitter shares `String#split`'s behavior of trimming the string before
+splitting if the delimiter is omitted, e.g.:
+```ruby
+" foo bar baz ".split      # => ["foo", "bar", "baz"]
+ss.split(" foo bar baz ")  # => ["foo", "bar", "baz"]
+```
+However, unlike `String#split`, this doesn't also apply if a delimiter of `" "`
+is supplied, e.g.:
+```ruby
+" foo bar baz ".split(" ")     # => ["foo", "bar", "baz"]
+ss.split(" foo bar baz ", " ") # => ["", "foo", "bar", "baz", ""]
+```
+It also doesn't apply if a custom default-delimiter is defined:
+```ruby
+ss = StringSplitter.new(default_delimiter: /\s+/)
+ss.split(" foo bar baz ") # => ["", "foo", "bar", "baz", ""]
+```
+# COMPATIBILITY
+StringSplitter is tested and supported on all versions of Ruby [supported by
+the ruby-core team](https://www.ruby-lang.org/en/downloads/branches/), i.e.,
+currently, Ruby 2.5 and above.
 # VERSION
-0.3.0
+0.6.0
 # SEE ALSO
@@ -197,7 +312,7 @@ ss.split(line, at: [1..5, 8])
 # COPYRIGHT AND LICENSE
-Copyright © 2018 by chocolateboy.
+Copyright © 2018-2020 by chocolateboy.
 This is free software; you can redistribute it and/or modify it under the
-terms of the [Artistic License 2.0](http://www.opensource.org/licenses/artistic-license-2.0.php).
+terms of the [Artistic License 2.0](https://www.opensource.org/licenses/artistic-license-2.0.php).
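
The position semantics shown in the README examples (negative positions, descending ranges, open-ended ranges) can be approximated in plain Ruby. The sketch below is an assumption-laden illustration rather than the gem's code: `position_matcher` and `split_at` are hypothetical helpers, and only literal string delimiters are handled.

```ruby
# Hypothetical helper (not part of the gem): build a predicate over 1-based
# delimiter positions. nsplits is the number of delimiters in the string.
def position_matcher(spec, nsplits)
  # Negative positions count back from the last delimiter: -1 => nsplits.
  resolve = ->(int) { int.negative? ? nsplits + 1 + int : int }

  matchers = (spec.is_a?(Array) ? spec : [spec]).map do |item|
    case item
    when Integer
      resolve[item]
    when Range
      first = item.begin.nil? ? 1 : resolve[item.begin]
      last = item.end.nil? ? nsplits : resolve[item.end]
      # Normalise descending ranges such as 5..3 to 3..5.
      first, last = last, first if last < first
      Range.new(first, last, item.exclude_end?)
    else
      item
    end
  end

  # === is equality for integers and membership for ranges.
  ->(position) { matchers.any? { |matcher| matcher === position } }
end

# Hypothetical helper: split on a literal string delimiter, honouring only
# the delimiters whose 1-based position satisfies the matcher.
def split_at(string, delimiter, at:)
  fields = string.split(delimiter, -1)
  return [] if fields.empty?

  accept = position_matcher(at, fields.length - 1)
  result = [fields.first]

  fields.drop(1).each_with_index do |field, index|
    if accept.call(index + 1)
      result << field
    else
      # Rejected split: glue the field back onto the previous one.
      result[-1] = result[-1] + delimiter + field
    end
  end

  result
end

p split_at("foo:bar:baz:quux", ":", at: [1, -1])
p split_at("1:2:3:4:5:6:7:8:9", ":", at: [1, 5..3, (-2..)])
```

Resolving negative positions by addition rather than modulo means an out-of-range value such as `-42` simply never matches, which mirrors the rationale given in the library's own comments.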

data/lib/string_splitter.rb CHANGED

@@ -1,204 +1,354 @@
 # frozen_string_literal: true
+require 'set'
 require 'values'
+require_relative 'string_splitter/version'
 # This class extends the functionality of +String#split+ by:
 #
 #   - providing full control over which splits are accepted or rejected
+#
 #   - adding support for splitting from right-to-left
-#   - encapsulating splitting options/preferences in instances rather than trying to
-#     cram them into overloaded method parameters
+#
+#   - encapsulating splitting options/preferences in the splitter rather
+#     than trying to cram them into overloaded method parameters
 #
 # These enhancements allow splits to handle many cases that otherwise require bigger
-# guns e.g. regex matching or parsing.
+# guns, e.g. regex matching or parsing.
+#
+# Implementation-wise, we effectively use the built-in +String#split+ method as a
+# tokenizer, and parse the resulting tokens into an array of Split objects with the
+# following fields:
+#
+#   - captures:  separator substrings captured by parentheses in the delimiter pattern
+#   - count:     the number of splits
+#   - index:     the 0-based index of the split in the array
+#   - lhs:       the string to the left of the separator (back to the previous split candidate)
+#   - position:  the 1-based index of the split in the array (alias: pos)
+#   - rhs:       the string to the right of the separator (up to the next split candidate)
+#   - rindex:    the 0-based index of the split relative to the end of the array
+#   - rposition: the 1-based index of the split relative to the end of the array (alias: rpos)
+#   - separator: the string matched by the delimiter pattern/string
+#
 class StringSplitter
-  ACCEPT = ->(_split) { true }
-  DEFAULT_DELIMITER = /\s+/
-  NO_SPLITS = []
+  # terminology: the delimiter is what we provide and the separators are what we get
+  # back (if we capture them). e.g. for:
+  #
+  #   ss.split("foo:bar::baz", /(\W+)/)
+  #
+  # the delimiter is /(\W+)/ and the separators are ":" and "::"
+  ACCEPT_ALL = ->(_split) { true }
+  DEFAULT_DELIMITER = /\s+/.freeze
   Split = Value.new(:captures, :count, :index, :lhs, :rhs, :separator) do
     def position
       index + 1
     end
-    alias_method :offset, :index
     alias_method :pos, :position
+    # 0-based index relative to the end of the array, e.g. for 5 items:
+    #
+    #  index | rindex
+    #  ------|-------
+    #    0   |   4
+    #    1   |   3
+    #    2   |   2
+    #    3   |   1
+    #    4   |   0
+    def rindex
+      count - position
+    end
+    # 1-based position relative to the end of the array, e.g. for 5 items:
+    #
+    #   position | rposition
+    #  ----------|----------
+    #      1     |    5
+    #      2     |    4
+    #      3     |    3
+    #      4     |    2
+    #      5     |    1
+    def rposition
+      count + 1 - position
+    end
+    alias_method :rpos, :rposition
   end
+  # simulate an enum. the value is returned by the case statement
+  # in the generated block if the positions match
+  module Action
+    SELECT = true
+    REJECT = false
+  end
+  private_constant :Action
   def initialize(
     default_delimiter: DEFAULT_DELIMITER,
     include_captures: true,
-    remove_empty: false,
+    remove_empty: false, # TODO remove this
+    remove_empty_fields: remove_empty,
     spread_captures: true
   )
     @default_delimiter = default_delimiter
     @include_captures = include_captures
-    @remove_empty = remove_empty
+    @remove_empty_fields = remove_empty_fields
     @spread_captures = spread_captures
   end
-  attr_reader :default_delimiter, :include_captures, :remove_empty, :spread_captures
-  def split(string, delimiter = @default_delimiter, at: nil, &block)
-    result, block, splits, count, index = split_common(string, delimiter, at, block)
+  attr_reader(
+    :default_delimiter,
+    :include_captures,
+    :remove_empty_fields,
+    :spread_captures
+  )
-    splits.each do |split|
-      split = Split.with(split.merge({ index: (index += 1), count: count }))
+  # TODO remove this
+  alias remove_empty remove_empty_fields
+  def split(
+    string,
+    delimiter = @default_delimiter,
+    at: nil, # alias for select
+    except: nil, # alias for reject
+    select: at,
+    reject: except,
+    &block
+  )
+    result, splits, count, accept = init(
+      string: string,
+      delimiter: delimiter,
+      select: select,
+      reject: reject,
+      block: block
+    )
+    return result unless splits
+    splits.each_with_index do |hash, index|
+      split = Split.with(hash.merge({ count: count, index: index }))
       result << split.lhs if result.empty?
-      if block.call(split)
-        if @include_captures
-          if @spread_captures
-            result += split.captures
-          else
-            result << split.captures
-          end
-        end
-        result << split.rhs
+      if accept.call(split)
+        result << split.captures << split.rhs
       else
         # append the rhs
         result[-1] = result[-1] + split.separator + split.rhs
       end
     end
-    result
+    render(result)
   end
   alias lsplit split
-  def rsplit(string, delimiter = @default_delimiter, at: nil, &block)
-    result, block, splits, count, index = split_common(string, delimiter, at, block)
-    splits.reverse!.each do |split|
-      split = Split.with(split.merge({ index: (index += 1), count: count }))
+  def rsplit(
+    string,
+    delimiter = @default_delimiter,
+    at: nil, # alias for select
+    except: nil, # alias for reject
+    select: at,
+    reject: except,
+    &block
+  )
+    result, splits, count, accept = init(
+      string: string,
+      delimiter: delimiter,
+      select: select,
+      reject: reject,
+      block: block
+    )
+    return result unless splits
+    splits.reverse_each.with_index do |hash, index|
+      split = Split.with(hash.merge({ count: count, index: index }))
       result.unshift(split.rhs) if result.empty?
-      if block.call(split)
-        if @include_captures
-          if @spread_captures
-            result = split.captures + result
-          else
-            result.unshift(split.captures)
-          end
-        end
-        result.unshift(split.lhs)
+      if accept.call(split)
+        # [lhs + captures] + result
+        result.unshift(split.lhs, split.captures)
       else
         # prepend the lhs
         result[0] = split.lhs + split.separator + result[0]
       end
     end
-    result
+    render(result)
   end
   private
-  def splits_for(parts, ncaptures)
-    result = []
-    splits = []
-    until parts.empty?
-      lhs = parts.shift
-      separator = parts.shift
-      captures = parts.shift(ncaptures)
-      rhs = parts.length == 1 ? parts.shift : parts.first
-      if @remove_empty && (lhs.empty? || rhs.empty?)
-        if lhs.empty? && rhs.empty?
-          # do nothing
-        elsif parts.empty? # last split
-          result << (!lhs.empty? ? lhs : rhs) if splits.empty?
-        elsif rhs.empty?
-          # replace the empty rhs with the non-empty lhs
-          parts[0] = lhs
-        end
+  # initialisation common to +split+ and +rsplit+
+  #
+  # takes a hash of options passed to +split+ or +rsplit+ and returns an array
+  # with the following fields:
+  #
+  #   - result: the array of separated strings to return from +split+ or +rsplit+.
+  #     if the splits array is empty, the caller returns this array immediately
+  #     without any further processing
+  #
+  #   - splits: an array of hashes containing the lhs, rhs, separator and captured
+  #     separator substrings for each split
+  #
+  #   - count: the number of splits
+  #
+  #   - accept: a proc whose return value determines whether each split should be
+  #     accepted (true) or rejected (false)
+  #
+  def init(string:, delimiter:, select:, reject:, block:)
+    if delimiter.equal?(DEFAULT_DELIMITER)
+      string = string.strip
+    end
-        next
-      end
+    if reject
+      positions = reject
+      action = Action::REJECT
+    elsif select
+      positions = select
+      action = Action::SELECT
+    end
-      splits << {
-        lhs: lhs,
-        rhs: rhs,
-        separator: separator,
-        captures: captures,
-      }
+    splits = parse(string, delimiter)
+    if splits.empty?
+      result = string.empty? ? [] : [string]
+      return [result]
     end
-    [result, splits]
+    block ||= positions ? compile(positions, action, splits.length) : ACCEPT_ALL
+    [[], splits, splits.length, block]
   end
-  # setup common to both split methods
-  def split_common(string, delimiter, at, block)
-    unless (match = string.match(delimiter))
-      result = (@remove_empty && string.empty?) ? [] : [string]
-      return [result, block, NO_SPLITS, 0, -1]
+  def render(result)
+    if @remove_empty_fields
+      result.reject! { |it| it.is_a?(String) && it.empty? }
     end
-    ncaptures = match.captures.length
-    if delimiter.is_a?(Regexp) && ncaptures > 0
-      # increment back-references so they remain valid when the outer capture
-      # is added e.g. to split on:
-      #
-      #   - <foo-comment> ... </foo-comment>
-      #   - <bar-comment> ... </bar-comment>
-      #
-      # etc.
-      #
-      # before:
-      #
-      #   %r|   <(\w+-comment)> [^<]* </\1>   |x
-      #
-      # after:
-      #
-      #   %r| ( <(\w+-comment)> [^<]* </\2> ) |x
-      delimiter = delimiter.to_s.gsub(/\\(?:(\d+)|.)/) do
-        match = Regexp.last_match
-        match[1] ? '\\' + match[1].to_i.next.to_s : match[0]
-      end
+    unless @include_captures
+      return result.reject! { |it| it.is_a?(Array) }
     end
-    parts = string.split(/(#{delimiter})/, -1)
-    result, splits = splits_for(parts, ncaptures)
-    count = splits.length
-    unless block
-      if at
-        at = Array(at).map do |index|
-          if index.is_a?(Integer) && index.negative?
-            # translate 1-based negative indices to 1-based positive
-            # indices e.g:
-            #
-            #   ss.split("foo:bar:baz:quux", ":", at: -1)
-            #
-            # translates to:
-            #
-            #   ss.split("foo:bar:baz:quux", ":", at: 3)
-            #
-            # XXX note: we don't use modulo, because we don't want
-            # out-of-bounds indices to silently work e.g. we don't want:
-            #
-            #   ss.split("foo:bar:baz:quux", ":", -42)
-            #
-            # to mysteriously match when the index is 2
-            count + 1 + index
-          else
-            index
-          end
-        end
+    result.flat_map do |value|
+      next [value] unless value.is_a?(Array) && @spread_captures
+      @spread_captures == :compact ? value.compact : value
+    end
+  end
+  # takes a string and a delimiter pattern (regex or string) and splits it along
+  # the delimiter, returning an array of objects (hashes) representing each split.
+  # e.g. for:
+  #
+  #   parse("foo:bar:baz:quux", ":")
+  #
+  # we return:
+  #
+  #   [
+  #       { lhs: "foo", rhs: "bar", separator: ":", captures: [] },
+  #       { lhs: "bar", rhs: "baz", separator: ":", captures: [] },
+  #       { lhs: "baz", rhs: "quux", separator: ":", captures: [] },
+  #   ]
+  #
+  def parse(string, pattern)
+    result = []
+    start = 0
+    # we don't use the argument passed to the +scan+ block here because it's a
+    # string (the separator) if there are no captures, rather than an empty
+    # array. we use match.captures instead to get the array
+    string.scan(pattern) do
+      match = Regexp.last_match
+      index, after = match.offset(0)
+      separator = match[0]
+      # ignore empty separators at the beginning and/or end of the string
+      next if separator.empty? && (index.zero? || after == string.length)
+      lhs = string.slice(start, index - start)
+      result.last[:rhs] = lhs unless result.empty?
+      # this is correct for the last/only match, but gets updated to the next
+      # match's lhs for other matches
+      rhs = match.post_match
+      result << {
+        captures: match.captures,
+        lhs: lhs,
+        rhs: rhs,
+        separator: separator,
+      }
+      # move the start index (the start of the lhs) to the index after the last
+      # character of the separator
+      start = after
+    end
+    result
+  end
-        block = lambda do |split|
-          case split.position when *at then true else false end
+  # returns a lambda which splits at (i.e. accepts or rejects splits at, depending
+  # on the action) the supplied positions
+  #
+  # positions are preprocessed to support an additional feature: negative indices
+  # are translated to 1-based non-negative indices, e.g:
+  #
+  #   ss.split("foo:bar:baz:quux", ":", at: -1)
+  #
+  # translates to:
+  #
+  #   ss.split("foo:bar:baz:quux", ":", at: 3)
+  #
+  # and
+  #
+  #   ss.split("1:2:3:4:5:6:7:8:9", ":", at: -3..)
+  #   ss.split("1:2:3:4:5:6:7:8:9", ":", at: [-3..])
+  #
+  # translate to:
+  #
+  #   ss.split("1:2:3:4:5:6:7:8:9", ":", at: 6..8)
+  #
+  def compile(positions, action, nsplits)
+    # XXX note: we don't use modulo, because we don't want
+    # out-of-bounds indices to silently work, e.g. we don't want:
+    #
+    #   ss.split("foo:bar:baz:quux", ":", at: -42)
+    #
+    # to mysteriously match when the index/position is 0/1
+    #
+    resolve = ->(int) { int.negative? ? nsplits + 1 + int : int }
+    # don't use Array(...) to wrap these as we don't want to convert ranges
+    positions = positions.is_a?(Array) ? positions : [positions]
+    positions = positions.map do |position|
+      if position.is_a?(Integer)
+        resolve[position]
+      elsif position.is_a?(Range)
+        rbegin = position.begin
+        rend = position.end
+        rexc = position.exclude_end?
+        if rbegin.nil?
+          Range.new(1, resolve[rend], rexc)
+        elsif rend.nil?
+          Range.new(resolve[rbegin], nsplits, rexc)
+        elsif rbegin.negative? || rend.negative? || (rend - rbegin).negative?
+          from = resolve[rbegin]
+          to = resolve[rend]
+          to < from ? Range.new(to, from, rexc) : Range.new(from, to, rexc)
+        else
+          position
         end
+      elsif position.is_a?(Set)
+        position.map { |it| resolve[it] }.to_set
       else
-        block = ACCEPT
+        position
       end
     end
-    [result, block, splits, count, -1]
+    ->(split) { case split.position when *positions then action else !action end }
   end
 end
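
One detail in the new `render` method may be worth checking when upgrading: with `include_captures` disabled it returns the value of `Array#reject!`, and core Ruby documents that `reject!` returns `nil` when no element is removed. Whether that path is reachable in practice (for example when every split is rejected, so no capture arrays are present) has not been verified against the gem here. The helper names below are hypothetical and only demonstrate the core-library behaviour.

```ruby
# Plain Ruby only. Array#reject! mutates in place and returns nil when the
# block never matches; Array#reject always returns a (new) array.

# Hypothetical helper mirroring the shape of the early return in render.
def strip_captures_bang(result)
  result.dup.reject! { |item| item.is_a?(Array) }
end

# Hypothetical non-destructive alternative.
def strip_captures(result)
  result.reject { |item| item.is_a?(Array) }
end

with_captures = ["foo", [":"], "bar"]
without_captures = ["foo:bar"]

p strip_captures_bang(with_captures)    # ["foo", "bar"]
p strip_captures_bang(without_captures) # nil
p strip_captures(without_captures)      # ["foo:bar"]
```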

data/lib/string_splitter/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 class StringSplitter
-  VERSION = '0.3.0'
+  VERSION = '0.6.0'
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: string_splitter
 version: !ruby/object:Gem::Version
-  version: 0.3.0
+  version: 0.6.0
 platform: ruby
 authors:
 - chocolateboy
-autorequire:
+autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-06-23 00:00:00.000000000 Z
+date: 2020-08-20 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: values
@@ -30,42 +30,42 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.16'
+        version: '2.1'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.16'
+        version: '2.1'
 - !ruby/object:Gem::Dependency
   name: minitest
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '5.11'
+        version: '5.0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '5.11'
+        version: '5.0'
 - !ruby/object:Gem::Dependency
   name: minitest-power_assert
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.3.0
+        version: '0.3'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.3.0
+        version: '0.3'
 - !ruby/object:Gem::Dependency
   name: minitest-reporters
   requirement: !ruby/object:Gem::Requirement
@@ -86,29 +86,15 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '10.0'
+        version: '13.0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '10.0'
-- !ruby/object:Gem::Dependency
-  name: rubocop
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: 0.54.0
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: 0.54.0
-description:
+        version: '13.0'
+description:
 email: chocolate@cpan.org
 executables: []
 extensions: []
@@ -127,7 +113,7 @@ metadata:
   bug_tracker_uri: https://github.com/chocolateboy/string_splitter/issues
   changelog_uri: https://github.com/chocolateboy/string_splitter/blob/master/CHANGELOG.md
   source_code_uri: https://github.com/chocolateboy/string_splitter
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -135,16 +121,15 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: '0'
+      version: '2.3'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubyforge_project:
-rubygems_version: 2.7.7
-signing_key:
+rubygems_version: 3.1.4
+signing_key:
 specification_version: 4
 summary: String#split on steroids
 test_files: []
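
The dependency changes in this metadata loosen some pessimistic constraints (for example `0.3.0` becoming `'0.3'` for `minitest-power_assert`) and introduce a minimum Ruby version. A short sketch using RubyGems' own `Gem::Requirement` and `Gem::Version` classes shows what those constraint strings accept; the `allows?` helper and the sample version numbers are hypothetical and chosen only for illustration.

```ruby
require 'rubygems'

# Hypothetical helper: does the constraint string accept the given version?
def allows?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# "~> 0.3.0" permits patch releases only: >= 0.3.0 and < 0.4.0.
p allows?('~> 0.3.0', '0.3.9') # true
p allows?('~> 0.3.0', '0.4.0') # false

# "~> 0.3" also permits minor releases: >= 0.3 and < 1.0.
p allows?('~> 0.3', '0.4.0')   # true
p allows?('~> 0.3', '1.0.0')   # false

# required_ruby_version ">= 2.3" rejects anything older.
p allows?('>= 2.3', '2.2.10')  # false
p allows?('>= 2.3', '2.7.1')   # true
```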