RubyGems - string_splitter - Versions diffs - 0.7.0 → 0.7.1 - Mend

string_splitter 0.7.0 → 0.7.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

checksums.yaml +4 -4
data/CHANGELOG.md +9 -0
data/README.md +9 -9
data/lib/string_splitter.rb +92 -69
data/lib/string_splitter/split.rb +51 -0
data/lib/string_splitter/version.rb +1 -1
metadata +3 -16

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 400534de6c3143ef81b2ad46a3a6432b7d83ef0900024ebdde3f06a4e1714890
-  data.tar.gz: 643f5af7b9e13321dfa97b045b124d0c5ea576868b13141c264122bc96baea5e
+  metadata.gz: 799ba605477bc50679baaa0ae5d12ac8077fc3a57611f69beddb3396a45e3a13
+  data.tar.gz: 0fbdf7225b69ea52b615ac7523bd15266dc9b0dbbe541e7b3802027a0a8c6c36
 SHA512:
-  metadata.gz: 35bed8fe69b33314813fbd68a8da0e8f4799b7891275ac601b157caeb0e0a3780f37ec7e7876d808b8dfcbfdf7527f45c3af0dc0d679e133865e96949a1d9ce3
-  data.tar.gz: 8186e40d57654daf1a481ab74c128910f7aa346bc343a0a9933dc39b7cceeb204c1a55ac39b39321df46f7d02420fd87f93dd4a708be0a985d94833df018da87
+  metadata.gz: c8fc9cf7bbd351013091918f5398c27efcda0b9b8c1f66294af76f1864e911d2fc0520b653fe1bdf3d11fb912dd0615b0954e38176f87fbf2a6cc931d0bdf6be
+  data.tar.gz: 98bd2cdeae3a27f9f54bb982b75033c9180e688419c0f5209682462a27e1792d6c8ec6d16ec6340c359c22373cdcad07c05a8ced5b03811060cf492d09a1c13b
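
The SHA256 entries above can be compared against a locally unpacked copy of the gem. A minimal sketch, assuming the archive's `data.tar.gz` member has already been extracted; the helper name `sha256_matches?` and the example path are hypothetical and not part of RubyGems:

```ruby
require 'digest'

# Hypothetical helper: true if the file's SHA256 hex digest equals the
# expected value (e.g. one copied from checksums.yaml).
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex
end

# Expected value copied from the 0.7.1 side of the diff above.
expected_data_sha256 =
  '0fbdf7225b69ea52b615ac7523bd15266dc9b0dbbe541e7b3802027a0a8c6c36'

# Usage (the path is an assumption; adjust it to the unpacked location):
#   sha256_matches?('string_splitter-0.7.1/data.tar.gz', expected_data_sha256)
```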

data/CHANGELOG.md CHANGED

@@ -1,3 +1,12 @@
+## 0.7.1 - 2020-08-22
+#### Changes
+- performance improvements
+  - delegate to `String#split` where possible
+  - use a regular class for Split rather than values.rb
+  - create Split objects directly rather than allocating intermediate hashes
 ## 0.7.0 - 2020-08-21
 #### Breaking Changes

data/README.md CHANGED

@@ -11,7 +11,7 @@
 - [DESCRIPTION](#description)
 - [WHY?](#why)
 - [CAVEATS](#caveats)
-  - [Differences from String#split](#differences-from-string%23split)
+  - [Differences from String#split](#differences-from-stringsplit)
 - [COMPATIBILITY](#compatibility)
 - [VERSION](#version)
 - [SEE ALSO](#see-also)
@@ -130,7 +130,7 @@ end
 Many languages have built-in `split` functions/methods for strings. They behave
 similarly (notwithstanding the occasional
 [surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)), and
-handle a few common cases e.g.:
+handle a few common cases, e.g.:
 * limiting the number of splits
 * including the separator(s) in the results
@@ -140,7 +140,7 @@ But, because the API is squeezed into two overloaded parameters (the delimiter
 and the limit), achieving the desired results can be tricky. For instance,
 while `String#split` removes empty trailing fields (by default), it provides no
 way to remove *all* empty fields. Likewise, the cramped API means there's no
-way to e.g. combine a limit (positive integer) with the option to preserve
+way to, e.g., combine a limit (positive integer) with the option to preserve
 empty fields (negative integer), or use backreferences in a delimiter pattern
 without including its captured subexpressions in the result.
@@ -192,7 +192,7 @@ to a regex or a full-blown parser.
 As an example, the nominally unstructured output of many Unix commands is often
 formatted in a way that's tantalizingly close to being
 [machine-readable](https://en.wikipedia.org/wiki/Delimiter-separated_values),
-apart from a few pesky exceptions e.g.:
+apart from a few pesky exceptions, e.g.:
 ```bash
 $ ls -l
@@ -205,7 +205,7 @@ drwxr-xr-x 3 user users 4096 Jun 19 22:56 lib
 ```
 These lines can *almost* be parsed into an array of fields by splitting them on
-whitespace. The exception is the date (columns 6-8) i.e.:
+whitespace. The exception is the date (columns 6-8), i.e.:
 ```ruby
 line = "-rw-r--r-- 1 user users   87 Jun 18 18:16 CHANGELOG.md"
@@ -224,7 +224,7 @@ instead of:
 ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
 ```
-One way to work around this is to parse the whole line e.g.:
+One way to work around this is to parse the whole line, e.g.:
 ```ruby
 line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+ \S+) \s+ (.+)$/x)
@@ -232,7 +232,7 @@ line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+
 But that requires us to specify *everything*. What we really want is a version
 of `split` which allows us to veto splitting for the 6th and 7th delimiters
-(and to stop after the 8th delimiter) i.e. control over which splits are
+(and to stop after the 8th delimiter), i.e. control over which splits are
 accepted, rather than being restricted to the single, baked-in strategy
 provided by the `limit` parameter.
@@ -258,7 +258,7 @@ ss.split(line, at: [1..5, 8])
 ## Differences from String#split
 Unlike `String#split`, StringSplitter doesn't trim the string before splitting
-(with `String#strip`) if the delimiter is omitted or a single space, e.g.:
+if the delimiter is omitted or a single space, e.g.:
 ```ruby
 " foo bar baz ".split          # => ["foo", "bar", "baz"]
@@ -297,7 +297,7 @@ currently, Ruby 2.5 and above.
 # VERSION
-0.7.0
+0.7.1
 # SEE ALSO
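
The README passages touched above describe two behaviors of the core `String#split`: trimming when the delimiter is omitted or a single space, and the need to rejoin the date columns of `ls -l` output by hand. A minimal sketch using only core Ruby (it does not call the StringSplitter API; the variable names are illustrative):

```ruby
padded = ' foo bar baz '

# With no delimiter, String#split ignores leading and trailing whitespace.
trimmed = padded.split
# A regexp delimiter with a negative limit keeps the empty edge fields.
untrimmed = padded.split(/ /, -1)

p trimmed   # ["foo", "bar", "baz"]
p untrimmed # ["", "foo", "bar", "baz", ""]

# Manual workaround for the `ls -l` line: split on whitespace, then glue
# columns 6-8 (the date) back together.
line = '-rw-r--r-- 1 user users   87 Jun 18 18:16 CHANGELOG.md'
fields = line.split
merged = fields[0..4] + [fields[5..7].join(' ')] + fields[8..-1]

p merged
```

The rejoin step is what the `at:` option in the README is meant to replace, since it has to hard-code the column positions after the fact.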

data/lib/string_splitter.rb CHANGED

@@ -1,8 +1,8 @@
 # frozen_string_literal: true
 require 'set'
-require 'values'
+require_relative 'string_splitter/split'
 require_relative 'string_splitter/version'
 # This class extends the functionality of +String#split+ by:
@@ -17,9 +17,10 @@ require_relative 'string_splitter/version'
 # These enhancements allow splits to handle many cases that otherwise require bigger
 # guns, e.g. regex matching or parsing.
 #
-# Implementation-wise, we split the string with a scanner which works in a similar
-# way to +String#split+ and parse the resulting tokens into an array of Split objects
-# with the following fields:
+# Implementation-wise, we split the string either with String#split, or with a custom
+# scanner if the delimiter may contain captures (since String#split doesn't handle
+# them correctly) and parse the resulting tokens into an array of Split objects with
+# the following attributes:
 #
 #   - captures:  separator substrings captured by parentheses in the delimiter pattern
 #   - count:     the number of splits
@@ -43,42 +44,6 @@ class StringSplitter
   DEFAULT_DELIMITER = /\s+/.freeze
   REMOVE = [].freeze
-  Split = Value.new(:captures, :count, :index, :lhs, :rhs, :separator) do
-    def position
-      index + 1
-    end
-    alias_method :pos, :position
-    # 0-based index relative to the end of the array, e.g. for 5 items:
-    #
-    #  index | rindex
-    #  ------|-------
-    #    0   |   4
-    #    1   |   3
-    #    2   |   2
-    #    3   |   1
-    #    4   |   0
-    def rindex
-      count - position
-    end
-    # 1-based position relative to the end of the array, e.g. for 5 items:
-    #
-    #   position | rposition
-    #  ----------|----------
-    #      1     |    5
-    #      2     |    4
-    #      3     |    3
-    #      4     |    2
-    #      5     |    1
-    def rposition
-      count + 1 - position
-    end
-    alias_method :rpos, :rposition
-  end
   # simulate an enum. the value is returned by the case statement
   # in the generated block if the positions match
   module Action
@@ -130,9 +95,10 @@ class StringSplitter
     return result unless splits
-    splits.each_with_index do |hash, index|
-      split = Split.with(hash.merge({ count: count, index: index }))
-      result << split.lhs if result.empty?
+    result << splits.first.lhs
+    splits.each_with_index do |split, index|
+      split.update!(count: count, index: index)
       if accept.call(split)
         result << split.captures << split.rhs
@@ -166,9 +132,10 @@ class StringSplitter
     return result unless splits
-    splits.reverse_each.with_index do |hash, index|
-      split = Split.with(hash.merge({ count: count, index: index }))
-      result.unshift(split.rhs) if result.empty?
+    result.unshift(splits.last.rhs)
+    splits.reverse_each.with_index do |split, index|
+      split.update!(count: count, index: index)
       if accept.call(split)
         # [lhs + captures] + result
@@ -190,7 +157,7 @@ class StringSplitter
   # the following fields:
   #
   #   - result: the array of separated strings to return from +split+ or +rsplit+.
-  #     if the splits arry is empty, the caller returns this array immediately
+  #     if the splits array is empty, the caller returns this array immediately
   #     without any further processing
   #
   #   - splits: an array of hashes containing the lhs, rhs, separator and captured
@@ -202,23 +169,76 @@ class StringSplitter
   #     accepted (true) or rejected (false)
   #
   def init(string:, delimiter:, select:, reject:, block:)
-    if reject
-      positions = reject
-      action = Action::REJECT
-    elsif select
-      positions = select
-      action = Action::SELECT
+    return [[]] if string.empty?
+    unless block
+      if reject
+        positions = reject
+        action = Action::REJECT
+      elsif select
+        positions = select
+        action = Action::SELECT
+      else
+        block = ACCEPT_ALL
+      end
     end
-    splits = parse(string, delimiter)
+    # use String#split if we can
+    #
+    # NOTE +reject!+ is no faster than +reject+ on MRI and significantly slower
+    # on TruffleRuby
+    if delimiter.is_a?(String)
+      limit = -1
+      if delimiter == ' '
+        delimiter = / / # don't trim
+      elsif delimiter.empty?
+        limit = 0 # remove the trailing empty string
+      end
+      result = string.split(delimiter, limit)
+      return [result] if result.length == 1 # delimiter not found: no splits
+      if block == ACCEPT_ALL # return the (2 or more) fields
+        result = result.reject(&:empty?) if @remove_empty_fields
+        return [result]
+      end
+      splits = []
+      result.each_cons(2) do |lhs, rhs| # 2 or more fields
+        splits << Split.new(
+          captures: [],
+          lhs: lhs,
+          rhs: rhs,
+          separator: delimiter
+        )
+      end
+    elsif delimiter == DEFAULT_DELIMITER && block == ACCEPT_ALL
+      # non-empty separators so -1 is safe
+      if @remove_empty_fields
+        result = []
+        string.split(delimiter, -1) do |field|
+          result << field unless field.empty?
+        end
+      else
+        result = string.split(delimiter, -1)
+      end
-    if splits.empty?
-      result = string.empty? ? [] : [string]
       return [result]
+    else
+      splits = parse(string, delimiter)
     end
-    block ||= positions ? compile(positions, action, splits.length) : ACCEPT_ALL
-    [[], splits, splits.length, block]
+    count = splits.length
+    return [[string]] if count.zero?
+    block ||= compile(positions, action, count)
+    [[], splits, count, block]
   end
   def render(values)
@@ -227,6 +247,7 @@ class StringSplitter
         value.empty? && @remove_empty_fields ? REMOVE : [value]
       elsif @include_captures
         if @spread_captures
+          # TODO make sure compact can return a Capture
           @spread_captures == :compact ? value.compact : value
         elsif value.empty?
           # we expose non-captures (string delimiters or regexps with no
@@ -247,7 +268,7 @@ class StringSplitter
   # the delimiter, returning an array of objects (hashes) representing each split.
   # e.g. for:
   #
-  #   parse.split("foo:bar:baz:quux", ":")
+  #   parse("foo:bar:baz:quux", ":")
   #
   # we return:
   #
@@ -258,6 +279,7 @@ class StringSplitter
   #   ]
   #
   def parse(string, delimiter)
+    # has_names = delimiter.is_a?(Regexp) && !delimiter.names.empty?
     result = []
     start = 0
@@ -273,21 +295,23 @@ class StringSplitter
       next if separator.empty? && (index.zero? || after == string.length)
       lhs = string.slice(start, index - start)
-      result.last[:rhs] = lhs unless result.empty?
+      result.last.rhs = lhs unless result.empty?
       # this is correct for the last/only match, but gets updated to the next
       # match's lhs for other matches
       rhs = match.post_match
-      result << {
+      # captures = (has_names ? Captures.new(match) : match.captures)
+      result << Split.new(
         captures: match.captures,
         lhs: lhs,
         rhs: rhs,
-        separator: separator,
-      }
+        separator: separator
+      )
-      # move the start index (the start of the next lhs) to the index after the
-      # last character of the separator
+      # advance the start index (the start of the next lhs) to the position
+      # after the last character of the separator
       start = after
     end
@@ -297,8 +321,8 @@ class StringSplitter
   # returns a lambda which splits at (i.e. accepts or rejects splits at, depending
   # on the action) the supplied positions
   #
-  # positions are preprocessed to support additional features: negative
-  # ranges, infinite ranges, and descending ranges, e.g.:
+  # positions are preprocessed to support negative indices, infinite ranges, and
+  # descending ranges, e.g.:
   #
   #   ss.split("foo:bar:baz:quux", ":", at: -1)
   #
@@ -309,9 +333,8 @@ class StringSplitter
   # and
   #
   #   ss.split("1:2:3:4:5:6:7:8:9", ":", -3..)
-  #   ss.split("1:2:3:4:5:6:7:8:9", ":", -3..)
   #
-  # translate to:
+  # translates to:
   #
   #   ss.split("foo:bar:baz:quux", ":", at: 6..8)
   #
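
The string-delimiter fast path in `init` above builds its splits by pairing adjacent fields from `String#split`. A simplified sketch of that idea, assuming a `Pair` struct as a stand-in for `StringSplitter::Split` and a hypothetical helper `pairs_for`; it omits the single-space and early-return handling shown in the diff:

```ruby
# Stand-in for StringSplitter::Split, for illustration only.
Pair = Struct.new(:lhs, :rhs, :separator, keyword_init: true)

# Hypothetical helper: one Pair per separator found in the string.
def pairs_for(string, delimiter)
  # -1 keeps trailing empty fields; with an empty delimiter, -1 would append
  # a trailing empty string, so 0 is used instead.
  limit = delimiter.empty? ? 0 : -1
  fields = string.split(delimiter, limit)

  # Each adjacent pair of fields is the lhs and rhs of one split.
  fields.each_cons(2).map do |lhs, rhs|
    Pair.new(lhs: lhs, rhs: rhs, separator: delimiter)
  end
end

pairs = pairs_for('foo:bar:baz', ':')
p pairs.map { |pair| [pair.lhs, pair.rhs] } # [["foo", "bar"], ["bar", "baz"]]
```

A string with no separator yields a single field, so `each_cons(2)` produces no pairs, which corresponds to the "delimiter not found: no splits" case.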

data/lib/string_splitter/split.rb ADDED

@@ -0,0 +1,51 @@
+# frozen_string_literal: true
+class StringSplitter
+  class Split
+    attr_reader :captures, :count, :index, :lhs, :position, :rhs, :separator
+    attr_writer :rhs
+    alias pos position
+    def initialize(captures:, lhs:, rhs:, separator:)
+      @captures = captures
+      @lhs = lhs
+      @rhs = rhs
+      @separator = separator
+    end
+    # 0-based index relative to the end of the array, e.g. for 5 items:
+    #
+    #  index | rindex
+    #  ------|-------
+    #    0   |   4
+    #    1   |   3
+    #    2   |   2
+    #    3   |   1
+    #    4   |   0
+    def rindex
+      @count - @position
+    end
+    # 1-based position relative to the end of the array, e.g. for 5 items:
+    #
+    #   position | rposition
+    #  ----------|----------
+    #      1     |    5
+    #      2     |    4
+    #      3     |    3
+    #      4     |    2
+    #      5     |    1
+    def rposition
+      @count + 1 - @position
+    end
+    alias rpos rposition
+    def update!(count:, index:)
+      @count = count
+      @index = index
+      @position = index + 1
+      freeze
+    end
+  end
+end
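
The two comment tables in `Split` follow directly from the arithmetic in `rindex`, `rposition` and `update!`. A small sketch that regenerates them for 5 items, using plain hashes rather than the `Split` class itself:

```ruby
count = 5 # number of splits, as in the comment tables above

rows = (0...count).map do |index|
  position = index + 1 # 1-based, as set in update!
  {
    index: index,
    position: position,
    rindex: count - position,       # 0-based from the end
    rposition: count + 1 - position # 1-based from the end
  }
end

rows.each { |row| puts row.values.join(' | ') }
```

Each printed row lists index, position, rindex and rposition in that order.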

data/lib/string_splitter/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 class StringSplitter
-  VERSION = '0.7.0'
+  VERSION = '0.7.1'
 end

metadata CHANGED

@@ -1,29 +1,15 @@
 --- !ruby/object:Gem::Specification
 name: string_splitter
 version: !ruby/object:Gem::Version
-  version: 0.7.0
+  version: 0.7.1
 platform: ruby
 authors:
 - chocolateboy
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2020-08-21 00:00:00.000000000 Z
+date: 2020-08-22 00:00:00.000000000 Z
 dependencies:
-- !ruby/object:Gem::Dependency
-  name: values
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '1.8'
-  type: :runtime
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '1.8'
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
@@ -104,6 +90,7 @@ files:
 - LICENSE.md
 - README.md
 - lib/string_splitter.rb
+- lib/string_splitter/split.rb
 - lib/string_splitter/version.rb
 homepage: https://github.com/chocolateboy/string_splitter
 licenses: