string_splitter 0.6.0 → 0.7.0

Files changed (6)

checksums.yaml +4 -4
data/CHANGELOG.md +18 -11
data/README.md +32 -29
data/lib/string_splitter.rb +34 -29
data/lib/string_splitter/version.rb +1 -1
metadata +2 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 9d97ccb956fe51694359cdb0d3a997d6574de088bac6ed5a8e572f92bb5ed54a
-  data.tar.gz: 845cefeb5efd5d01baa45759cb05ff7ae5e9a457c1f148b340bb24c038bd259e
+  metadata.gz: 400534de6c3143ef81b2ad46a3a6432b7d83ef0900024ebdde3f06a4e1714890
+  data.tar.gz: 643f5af7b9e13321dfa97b045b124d0c5ea576868b13141c264122bc96baea5e
 SHA512:
-  metadata.gz: 7a935a6e0f3434801dcae6a32575779e1d2eb706f8f208087a208e7fdba39ac5b49928f8b7617aec60493a8db5988a013028650f8b2ced01fadb620bfd4c77e5
-  data.tar.gz: d76c18a283c1e113c8bffb73b813eb6074481faa7ea339811dc9a7424a5e24fdc3efbe9afa941459e566cde8271c3cd19a97e3a37a8cf90d36a65a7bf8fd6dcf
+  metadata.gz: 35bed8fe69b33314813fbd68a8da0e8f4799b7891275ac601b157caeb0e0a3780f37ec7e7876d808b8dfcbfdf7527f45c3af0dc0d679e133865e96949a1d9ce3
+  data.tar.gz: 8186e40d57654daf1a481ab74c128910f7aa346bc343a0a9933dc39b7cceeb204c1a55ac39b39321df46f7d02420fd87f93dd4a708be0a985d94833df018da87

data/CHANGELOG.md CHANGED

@@ -1,22 +1,29 @@
+## 0.7.0 - 2020-08-21
+#### Breaking Changes
+- `String#split` incompatibility: we no longer trim the string (with
+  `String#strip`) before splitting if the delimiter is omitted
 ## 0.6.0 - 2020-08-20
 #### Breaking Changes
 - `ss.split(str, " ")` is no longer treated the same as `ss.split(str)` i.e.
-  unlike Ruby's `String#split` (but like Crystal's), the former no longer
-  strips the string before splitting
+  unlike Ruby's `String#split`, the former no longer strips the string before
+  splitting
 - rename the `remove_empty` option `remove_empty_fields`
 - rename the `exclude` option `except` (alias for `reject`)
-#### Fixes
-- correctly handle backreferences in delimiter patterns
 #### Features
 - add support for descending, negative, and infinite ranges,
   e.g. `ss.split(str, ":", at: [..4, 4..., 3..1, -1..-3])` etc.
+#### Fixes
+- correctly handle backreferences in delimiter patterns
 ## 0.5.1 - 2018-07-01
 #### Changes
@@ -25,15 +32,15 @@
 ## 0.5.0 - 2018-06-26
-#### Fixes
-- don't treat string delimiters as patterns
 #### Features
 - add a `reject`/`exclude` option which rejects splits at the specified positions
 - add a `select` alias for `at`
+#### Fixes
+- don't treat string delimiters as patterns
 ## 0.4.0 - 2018-06-24
 #### Breaking Changes
@@ -65,7 +72,7 @@
 #### Breaking Changes
 - the block now takes a single `split` object with an `index` accessor, rather
-  than seperate `index` and `split` arguments
+  than separate `index` and `split` arguments
 #### Features
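
The 0.7.0 breaking change in the changelog above concerns a quirk of Ruby's built-in `String#split`, which can be reproduced without the gem. The following is a minimal sketch in plain Ruby; the variable names are illustrative only, and the last result only mirrors what the README in this diff reports for `ss.split(" foo bar baz ")`. It does not call StringSplitter itself.

```ruby
# Plain Ruby only; the string_splitter gem is not loaded here.
padded = " foo bar baz "

# With no delimiter, or a single space, String#split ignores leading and
# trailing whitespace.
stripped = padded.split
single_space = padded.split(" ")

# A regexp delimiter does not strip: the leading empty field is kept, but
# trailing empty fields are still dropped when no limit is given.
trailing_dropped = padded.split(/\s+/)

# A negative limit keeps the trailing empty field as well. This is the shape
# the README reports for StringSplitter 0.7.0 when the delimiter is omitted.
unstripped = padded.split(/\s+/, -1)

p stripped         # ["foo", "bar", "baz"]
p single_space     # ["foo", "bar", "baz"]
p trailing_dropped # ["", "foo", "bar", "baz"]
p unstripped       # ["", "foo", "bar", "baz", ""]
```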

data/README.md CHANGED

@@ -44,17 +44,14 @@ ss = StringSplitter.new
 ```ruby
 ss.split("foo bar baz")
-ss.split("  foo bar baz  ")
+ss.split("foo bar baz", " ")
+ss.split("foo bar baz", /\s+/)
 # => ["foo", "bar", "baz"]
-```
-```ruby
 ss.split("foo", "")
 ss.split("foo", //)
 # => ["f", "o", "o"]
-```
-```ruby
 ss.split("", "...")
 ss.split("", /.../)
 # => []
@@ -99,19 +96,13 @@ ss.rsplit("1:2:3:4:5:6:7:8:9", ":", at: [1..3, 5])
 **Split with negative, descending, and infinite ranges**
-```ruby
-ss.split("1:2:3:4:5:6:7:8:9", ":", at: 4...)
-ss.split("1:2:3:4:5:6:7:8:9", ":", at: [4...])
-# => ["1:2:3:4", "5", "6", "7", "8:9"]
-```
 ```ruby
 ss.split("1:2:3:4:5:6:7:8:9", ":", at: ..-3)
-ss.split("1:2:3:4:5:6:7:8:9", ":", at: [..-3])
 # => ["1", "2", "3", "4", "5", "6", "7:8:9"]
-```
-```ruby
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: 4...)
+# => ["1:2:3:4", "5", "6", "7", "8:9"]
 ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1, 5..3, -2..])
 # => ["1", "2:3", "4", "5", "6:7", "8", "9"]
 ```
@@ -182,12 +173,15 @@ end
 # => ["foo", "bar:baz", "quux"]
 ```
-As a shortcut, the common case of splitting on delimiters at one or more
-positions is supported by an option:
+As a shortcut, the common case of splitting (or not splitting) at one or more
+positions is supported by dedicated options:
 ```ruby
-ss.split("foo:bar:baz:quux", ":", at: [1, -1])
+ss.split("foo:bar:baz:quux", ":", select: [1, -1])
 # => ["foo", "bar:baz", "quux"]
+ss.split("foo:bar:baz:quux", ":", reject: [1, -1])
+# => ["foo:bar", "baz:quux"]
 ```
 # WHY?
@@ -263,27 +257,36 @@ ss.split(line, at: [1..5, 8])
 ## Differences from String#split
-StringSplitter shares `String#split`'s behavior of trimming the string before
-splitting if the delimiter is omitted, e.g.:
+Unlike `String#split`, StringSplitter doesn't trim the string before splitting
+(with `String#strip`) if the delimiter is omitted or a single space, e.g.:
 ```ruby
-" foo bar baz ".split      # => ["foo", "bar", "baz"]
-ss.split(" foo bar baz ")  # => ["foo", "bar", "baz"]
+" foo bar baz ".split          # => ["foo", "bar", "baz"]
+" foo bar baz ".split(" ")     # => ["foo", "bar", "baz"]
+ss.split(" foo bar baz ")      # => ["", "foo", "bar", "baz", ""]
+ss.split(" foo bar baz ", " ") # => ["", "foo", "bar", "baz", ""]
 ```
-However, unlike `String#split`, this doesn't also apply if a delimiter of `" "`
-is supplied, e.g.:
+`String#split` omits the `nil` values of unmatched optional captures:
 ```ruby
-" foo bar baz ".split(" ")     # => ["foo", "bar", "baz"]
-ss.split(" foo bar baz ", " ") # => ["", "foo", "bar", "baz", ""]
+"foo:bar:baz".scan(/(:)|(-)/)  # => [[":", nil], [":", nil]]
+"foo:bar:baz".split(/(:)|(-)/) # => ["foo", ":", "bar", ":", "baz"]
 ```
-It also doesn't apply if a custom default-delimiter is defined:
+StringSplitter preserves them by default (if `include_captures` is true, as it
+is by default), though they can be omitted from spread captures by passing
+`:compact` as the value of the `spread_captures` option:
 ```ruby
-ss = StringSplitter.new(default_delimiter: /\s+/)
-ss.split(" foo bar baz ") # => ["", "foo", "bar", "baz", ""]
+s1 = StringSplitter.new(spread_captures: true)
+s2 = StringSplitter.new(spread_captures: false)
+s3 = StringSplitter.new(spread_captures: :compact)
+s1.split("foo:bar:baz", /(:)|(-)/) # => ["foo", ":", nil, "bar", ":", nil, "baz"]
+s2.split("foo:bar:baz", /(:)|(-)/) # => ["foo", [":", nil], "bar", [":", nil], "baz"]
+s3.split("foo:bar:baz", /(:)|(-)/) # => ["foo", ":", "bar", ":", "baz"]
 ```
 # COMPATIBILITY
@@ -294,7 +297,7 @@ currently, Ruby 2.5 and above.
 # VERSION
-0.6.0
+0.7.0
 # SEE ALSO
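
The README hunk above contrasts how `String#split` and StringSplitter treat the `nil` values of unmatched optional captures. The sketch below reproduces the three result shapes using only the standard library. `fields_and_captures` is a hypothetical helper written for this illustration and is not part of the gem's API; it loosely follows the `scan`-based approach visible in `lib/string_splitter.rb`.

```ruby
# Hypothetical helper (not the gem's API): walk the delimiter matches and
# return fields interleaved with each match's capture array.
def fields_and_captures(string, pattern)
  result = []
  start = 0

  string.scan(pattern) do
    match = Regexp.last_match
    index, after = match.offset(0)
    result << string[start...index] << match.captures
    start = after
  end

  result << string[start..-1]
end

pattern = /(:)|(-)/

# Built-in behaviour: split silently drops the unmatched (nil) capture.
builtin = "foo:bar:baz".split(pattern)

# Captures kept as nested arrays (compare spread_captures: false).
nested = fields_and_captures("foo:bar:baz", pattern)

# Captures flattened into the result (compare spread_captures: true).
spread = nested.flat_map { |value| value.is_a?(Array) ? value : [value] }

# Captures flattened with nils removed (compare spread_captures: :compact).
compacted = nested.flat_map { |value| value.is_a?(Array) ? value.compact : [value] }

p builtin   # ["foo", ":", "bar", ":", "baz"]
p nested    # ["foo", [":", nil], "bar", [":", nil], "baz"]
p spread    # ["foo", ":", nil, "bar", ":", nil, "baz"]
p compacted # ["foo", ":", "bar", ":", "baz"]
```

In this sketch the compacted form matches what `String#split` returns for the same input, which is consistent with the README's description of the `:compact` option.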

data/lib/string_splitter.rb CHANGED

@@ -2,6 +2,7 @@
 require 'set'
 require 'values'
 require_relative 'string_splitter/version'
 # This class extends the functionality of +String#split+ by:
@@ -16,9 +17,9 @@ require_relative 'string_splitter/version'
 # These enhancements allow splits to handle many cases that otherwise require bigger
 # guns, e.g. regex matching or parsing.
 #
-# Implementation-wise, we effectively use the built-in +String#split+ method as a
-# tokenizer, and parse the resulting tokens into an array of Split objects with the
-# following fields:
+# Implementation-wise, we split the string with a scanner which works in a similar
+# way to +String#split+ and parse the resulting tokens into an array of Split objects
+# with the following fields:
 #
 #   - captures:  separator substrings captured by parentheses in the delimiter pattern
 #   - count:     the number of splits
@@ -40,6 +41,7 @@ class StringSplitter
   ACCEPT_ALL = ->(_split) { true }
   DEFAULT_DELIMITER = /\s+/.freeze
+  REMOVE = [].freeze
   Split = Value.new(:captures, :count, :index, :lhs, :rhs, :separator) do
     def position
@@ -184,7 +186,7 @@ class StringSplitter
   # initialisation common to +split+ and +rsplit+
   #
-  # takes a hash of options passed to +split+ or +rsplit+ and returns a triple with
+  # takes a hash of options passed to +split+ or +rsplit+ and returns a tuple with
   # the following fields:
   #
   #   - result: the array of separated strings to return from +split+ or +rsplit+.
@@ -200,10 +202,6 @@ class StringSplitter
   #     accepted (true) or rejected (false)
   #
   def init(string:, delimiter:, select:, reject:, block:)
-    if delimiter.equal?(DEFAULT_DELIMITER)
-      string = string.strip
-    end
     if reject
       positions = reject
       action = Action::REJECT
@@ -223,18 +221,25 @@ class StringSplitter
     [[], splits, splits.length, block]
   end
-  def render(result)
-    if @remove_empty_fields
-      result.reject! { |it| it.is_a?(String) && it.empty? }
-    end
-    unless @include_captures
-      return result.reject! { |it| it.is_a?(Array) }
-    end
-    result.flat_map do |value|
-      next [value] unless value.is_a?(Array) && @spread_captures
-      @spread_captures == :compact ? value.compact : value
+  def render(values)
+    values.flat_map do |value|
+      if value.is_a?(String)
+        value.empty? && @remove_empty_fields ? REMOVE : [value]
+      elsif @include_captures
+        if @spread_captures
+          @spread_captures == :compact ? value.compact : value
+        elsif value.empty?
+          # we expose non-captures (string delimiters or regexps with no
+          # captures) as empty arrays inside the block, so the type is
+          # consistent, but it doesn't make sense to keep them in the
+          # result
+          REMOVE
+        else
+          [value]
+        end
+      else
+        REMOVE
+      end
     end
   end
@@ -252,14 +257,14 @@ class StringSplitter
   #       { lhs: "baz", rhs: "quux", separator: ":", captures: [] },
   #   ]
   #
-  def parse(string, pattern)
+  def parse(string, delimiter)
     result = []
     start = 0
     # we don't use the argument passed to the +scan+ block here because it's a
     # string (the separator) if there are no captures, rather than an empty
     # array. we use match.captures instead to get the array
-    string.scan(pattern) do
+    string.scan(delimiter) do
       match = Regexp.last_match
       index, after = match.offset(0)
       separator = match[0]
@@ -281,8 +286,8 @@ class StringSplitter
         separator: separator,
       }
-      # move the start index (the start of the lhs) to the index after the last
-      # character of the separator
+      # move the start index (the start of the next lhs) to the index after the
+      # last character of the separator
       start = after
     end
@@ -292,8 +297,8 @@ class StringSplitter
   # returns a lambda which splits at (i.e. accepts or rejects splits at, depending
   # on the action) the supplied positions
   #
-  # positions are preprocessed to support an additional feature: negative indices
-  # are translated to 1-based non-negative indices, e.g:
+  # positions are preprocessed to support additional features: negative
+  # ranges, infinite ranges, and descending ranges, e.g.:
   #
   #   ss.split("foo:bar:baz:quux", ":", at: -1)
   #
@@ -310,7 +315,7 @@ class StringSplitter
   #
   #   ss.split("foo:bar:baz:quux", ":", at: 6..8)
   #
-  def compile(positions, action, nsplits)
+  def compile(positions, action, count)
     # XXX note: we don't use modulo, because we don't want
     # out-of-bounds indices to silently work, e.g. we don't want:
     #
@@ -318,7 +323,7 @@ class StringSplitter
     #
     # to mysteriously match when the index/position is 0/1
     #
-    resolve = ->(int) { int.negative? ? nsplits + 1 + int : int }
+    resolve = ->(int) { int.negative? ? count + 1 + int : int }
     # don't use Array(...) to wrap these as we don't want to convert ranges
     positions = positions.is_a?(Array) ? positions : [positions]
@@ -334,7 +339,7 @@ class StringSplitter
         if rbegin.nil?
           Range.new(1, resolve[rend], rexc)
         elsif rend.nil?
-          Range.new(resolve[rbegin], nsplits, rexc)
+          Range.new(resolve[rbegin], count, rexc)
         elsif rbegin.negative? || rend.negative? || (rend - rbegin).negative?
           from = resolve[rbegin]
           to = resolve[rend]
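
The `compile` hunks above show how negative positions and open-ended ranges are resolved against the number of splits. The following standalone sketch covers only the two open-ended branches that are visible in the diff; the branch for negative and descending ranges is truncated in the hunk and is deliberately left out. `normalize_open_range` is a hypothetical name, and `count = 8` is assumed because `"1:2:3:4:5:6:7:8:9"` contains eight `:` separators. Beginless range literals need Ruby 2.7 or later.

```ruby
# Assumed number of splits for "1:2:3:4:5:6:7:8:9" with a ":" delimiter.
count = 8

# Same expression as in the diff: map a negative position to its 1-based
# equivalent, leaving non-negative positions unchanged.
resolve = ->(int) { int.negative? ? count + 1 + int : int }

# Hypothetical helper: handles only beginless and endless ranges, as in the
# two branches shown in the hunk. Other ranges are returned unchanged.
normalize_open_range = lambda do |range|
  rbegin = range.begin
  rend = range.end
  rexc = range.exclude_end?

  if rbegin.nil?
    Range.new(1, resolve[rend], rexc)
  elsif rend.nil?
    Range.new(resolve[rbegin], count, rexc)
  else
    range
  end
end

p resolve[-1]                            # 8
p normalize_open_range.call((..-3)).to_a # [1, 2, 3, 4, 5, 6]
p normalize_open_range.call((4...)).to_a # [4, 5, 6, 7]
p normalize_open_range.call((-2..)).to_a # [7, 8]
```

These positions line up with the README examples in this diff: `at: ..-3` splits at the first six separators, and `at: 4...` stops short of the last one, leaving `"8:9"` intact.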

data/lib/string_splitter/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 class StringSplitter
-  VERSION = '0.6.0'
+  VERSION = '0.7.0'
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: string_splitter
 version: !ruby/object:Gem::Version
-  version: 0.6.0
+  version: 0.7.0
 platform: ruby
 authors:
 - chocolateboy
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2020-08-20 00:00:00.000000000 Z
+date: 2020-08-21 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: values