RubyGems - string_splitter - Versions diffs - 0.0.1 → 0.1.0 - Mend

string_splitter 0.0.1 → 0.1.0

Files changed (6)

checksums.yaml +4 -4
data/CHANGELOG.md +8 -1
data/README.md +43 -28
data/lib/string_splitter.rb +96 -83
data/lib/string_splitter/version.rb +1 -1
metadata +2 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 22890318b3693dc7d3489ff580109bf377a8798b820e8589c9ff52a9880ca4a1
-  data.tar.gz: bb9b1894513e3206bd50ccc2908b756d0fac5bfec13d96d76617ba006306b4da
+  metadata.gz: 3e22243b9c975e4ac2ffa8f03871f5c41fd406cb3b4b780dd303e89bcf024c45
+  data.tar.gz: fced8a0defba0a46d1dde5ffef3c7ff9a93c4f43afa8d0b19c98d93524c466f4
 SHA512:
-  metadata.gz: 51777239fa93949f6fef2690a12d91ece9b024cbffc60d5942073366a159df90341e497dcb19053de11dadc2ac318c7fb8d19f52f92b405445ec994fe847dd23
-  data.tar.gz: a65a1db7358c8e0f46a28ee2328e47b8ffa326d810686b35b56a8ce09358932b62337f885e58bb8ac97397c046046d4d02a744ca985383a4db4bd3e97e979737
+  metadata.gz: 2d7991eb02cea9c35a26e9248c33ff923b71637b476110f58d4b35abae398efa5603eca6d0ed178d797a69d79f6b1caa1468ec4753e185d60280ccfe4049bc1a
+  data.tar.gz: a944e39f2105d61585ca703073dcedd3e4b4e2be9dd88db01bfef07ab67bb265b15741d3b8a558c505525e06f800385f857a81945f527256102fdcef3084ee89

data/CHANGELOG.md CHANGED

@@ -1,3 +1,10 @@
+## 0.1.0 - 2018-06-22
+- **breaking change**: the block now takes a single `split` object with an `index`
+  field, rather than separate `index` and `split` arguments
+- add support for negative indices in the value supplied to the `at` option
+- add a `count` field to the split object containing the total number of splits
 ## 0.0.1 - 2018-06-21
-* initial release
+- initial release
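The breaking change only affects the block's arity. In 0.1.0 the block receives a single split object, and the index is available as `split.index`. Below is a minimal sketch of an adapter for 0.0.1-style `|index, split|` blocks. `adapt_legacy_block` is a hypothetical helper and is not part of the gem. `Split` here is a plain `Struct`, assumed to mirror the field names in the diff's `Value.new(:captures, :count, :index, :lhs, :rhs, :separator)`. A `Struct` is used so the sketch runs without the `values` dependency.

```ruby
# Hypothetical stand-in for StringSplitter::Split (the gem builds it with the
# `values` gem; a Struct keeps this sketch dependency-free).
Split = Struct.new(:captures, :count, :index, :lhs, :rhs, :separator, keyword_init: true)

# Wrap a 0.0.1-style block, which took (index, split), in a lambda that takes
# the single split object 0.1.0 passes, reading the index from split.index.
def adapt_legacy_block(&legacy)
  ->(split) { legacy.call(split.index, split) }
end

legacy = adapt_legacy_block { |i, _split| i == 1 }
first  = Split.new(captures: [], count: 2, index: 1, lhs: "foo", rhs: "bar", separator: ":")
second = Split.new(captures: [], count: 2, index: 2, lhs: "bar", rhs: "baz", separator: ":")
```

If the gem behaves as its README describes, such a lambda could be passed with `&`, for example `ss.split("foo:bar:baz", ":", &adapt_legacy_block { |i, _| i == 1 })`. This usage has not been checked against the released gem.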

data/README.md CHANGED

@@ -39,6 +39,8 @@ ss = StringSplitter.new
 # same as String#split
 ss.split("foo bar baz quux")
+ss.split("foo bar baz quux", " ")
+ss.split("foo bar baz quux", /\s+/)
 # => ["foo", "bar", "baz", "quux"]
 # split on the first separator
@@ -46,20 +48,22 @@ ss.split("foo:bar:baz:quux", ":", at: 1)
 # => ["foo", "bar:baz:quux"]
 # split on the last separator
-ss.rsplit("foo:bar:baz:quux", ":", at: 1)
+ss.split("foo:bar:baz:quux", ":", at: -1)
 # => ["foo:bar:baz", "quux"]
-# split on a multiple indices
-line = "-rw-r--r-- 1 user users   87 Jun 18 18:16 CHANGELOG.md"
-ss.split(line, at: [1..5, 8])
-# => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
+# split on multiple separator indices
+ss.split("1:2:3:4:5:6:7:8:9", ":", at: [1..3, -1])
+# => ["1", "2", "3", "4:5:6:7:8", "9"]
-# fine-grained control via a block
-ss.split("foo:bar:baz-baz", /[:-]/) do |i, split|
-  split.rhs == "baz" && strip.separator == "-"
-end
-# => ["foo:bar:baz", "baz"]
+# split from the right
+ss.rsplit("1:2:3:4:5:6:7:8:9", ":", at: [1..3, 5])
+# => ["1:2:3:4", "5:6", "7", "8", "9"]
+# full control via a block
+result = ss.split('a:a:a:b:c:c:e:a:a:d:c', ":") do |split|
+  split.index > 1 && split.lhs == split.rhs
+end
+# => ["a:a", "a:b:c", "c:e:a", "a:d:c"]
 ```
 # DESCRIPTION
@@ -70,17 +74,18 @@ and handle a few common cases e.g.:
 * limiting the number of splits
 * including the separators in the results
-* removing (some) empty tokens
+* removing (some) empty fields
 But, because the API is squeezed into two overloaded parameters (the separator and the limit),
 achieving the desired effects can be tricky. For instance, while `String#split` removes empty
-trailing tokens (by default), it provides no way to remove *all* empty tokens. Likewise, the
+trailing fields (by default), it provides no way to remove *all* empty fields. Likewise, the
 cramped API means there's no way to combine e.g. a limit (positive integer) with the option
-to preserve empty tokens (negative integer).
+to preserve empty fields (negative integer).
 If `split` was being written from scratch, without the baggage of its legacy API,
 it's possible that some of these options would be made explicit rather than overloading
-the `limit` parameter. And, indeed, this is possible in some implementations, e.g. in Crystal:
+the parameters. And, indeed, this is possible in some implementations,
+e.g. in Crystal:
 ```ruby
 ":foo:bar:baz:".split(":", remove_empty: false) # => ["", "foo", "bar", "baz", ""]
@@ -93,23 +98,25 @@ and delegating the strategy — i.e. which splits should be accepted or rejected
 ```ruby
 ss = StringSplitter.new
-ss.split("foo:bar:baz", ":")  { |i| i == 1 } # => ["foo", "bar:baz"]
-ss.rsplit("foo:bar:baz", ":") { |i| i == 1 } # => ["foo:bar", "baz"]
+ss.split("foo:bar:baz", ":") { |split| split.index == 1 }
+# => ["foo", "bar:baz"]
+ss.split("foo:bar:baz", ":") { |split| split.index == split.count }
+# => ["foo:bar", "baz"]
 ```
-As a shortcut, the common case of splitting at one or more indices can be specified via an option:
+As a shortcut, the common case of splitting on separators at one or more indices is supported by an option:
 ```ruby
-ss.split('foo:bar:baz:quux', ':', at: [1, 3]) # => ["foo", "bar:baz", "quux"]
+ss.split('foo:bar:baz:quux', ':', at: [1, -1]) # => ["foo", "bar:baz", "quux"]
 ```
 # WHY?
 I wanted to split semi-structured output into fields without having to resort to a regex or a full-blown parser.
-As an example, the nominally unstructured/human-friendly output of many Unix commands is, in practice,
-*almost* structured. It's often tantalizingly close to being space-separated, apart from a few pesky
-exceptions e.g.:
+As an example, the nominally unstructured output of many Unix commands is often, in practice, formatted in a way
+that's tantalizingly close to being machine-readable, apart from a few pesky exceptions e.g.:
 ```bash
 $ ls -la
@@ -148,22 +155,30 @@ line.match(/^(\S+) \s+ (\d+) \s+ (\S+) \s+ (\S+) \s+ (\d+) \s+ (\S+ \s+ \d+ \s+
 ```
 But that requires us to specify *everything*. What we really want is a version of `split`
-that we can disable for the 6th and 7th columns i.e. manual control over which splits
-are accepted, rather than being restricted to the single, baked-in strategy supported by
-the `limit` parameter.
+which allows us to veto splitting for the 6th and 7th separators i.e. control over which
+splits are accepted, rather than being restricted to the single, baked-in strategy provided
+by the `limit` parameter.
-StringSplitter makes it easy to create your own splitting strategies to both emulate and
-enhance existing behaviors and create new ones e.g., in this case:
+By providing a simple way to accept or reject each split, StringSplitter makes cases like
+this easy to handle, either via a block:
 ```ruby
-ss.split(line, at: [1..5, 8])
+ss.split(line) do |split|
+  case split.index when 1..5, 8 then true end
+end
+# => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
+```
+Or via its option shortcut:
+```ruby
+ss.split(line, at: [1..5, 8])
 # => ["-rw-r--r--", "1", "user", "users", "87", "Jun 18 18:16", "CHANGELOG.md"]
 ```
 # VERSION
-0.0.1
+0.1.0
 # SEE ALSO
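The README examples share one model. Every separator is a candidate split, and the block sees it as an object with `index`, `count`, `lhs`, `rhs` and `separator`. Any split the block rejects is joined back together. The sketch below is a dependency-free approximation of that model, useful for checking the documented outputs by hand. It makes several assumptions: `mini_split` is a hypothetical name, the separator pattern is assumed to have no capture groups, and the gem's `remove_empty` and capture options are ignored. As in the diff's `splits_for`, `lhs` and `rhs` are the fields immediately on either side of a separator, not the accumulated text. Splitting right-to-left only changes the order in which indices are assigned.

```ruby
# Hypothetical stand-in for the gem's Split value (captures omitted).
Split = Struct.new(:index, :count, :lhs, :rhs, :separator, keyword_init: true)

# Approximates StringSplitter#split / #rsplit without captures or remove_empty.
# Splits are numbered from 1, left to right (or right to left if reverse: true).
def mini_split(string, sep = /\s+/, reverse: false, &accept)
  accept ||= ->(_split) { true } # no block: accept every split

  # Wrap the separator in a group so String#split keeps it in the output:
  # [field, sep, field, sep, ..., field]. Assumes `sep` has no groups of its own.
  parts = string.split(/(#{Regexp.union(sep)})/, -1)
  return [string] if parts.length < 3 # no separator found

  fields = parts.each_slice(2).map(&:first)
  separators = parts.each_slice(2).map { |pair| pair[1] }.compact
  count = separators.length

  # Visit separators in the chosen direction; lhs/rhs are the adjacent fields.
  order = (0...count).to_a
  order.reverse! if reverse
  accepted = Array.new(count, false)
  order.each_with_index do |i, n|
    split = Split.new(index: n + 1, count: count, lhs: fields[i],
                      rhs: fields[i + 1], separator: separators[i])
    accepted[i] = accept.call(split) ? true : false
  end

  # Rebuild the result, re-joining fields across rejected separators.
  result = [fields[0].dup]
  count.times do |i|
    if accepted[i]
      result << fields[i + 1].dup
    else
      result[-1] << separators[i] << fields[i + 1]
    end
  end
  result
end
```

Here is one example. `mini_split("1:2:3:4:5:6:7:8:9", ":", reverse: true) { |s| case s.index when 1..3, 5 then true end }` returns `["1:2:3:4", "5:6", "7", "8", "9"]`. This is the same result as the README's `rsplit` example with `at: [1..3, 5]`.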

data/lib/string_splitter.rb CHANGED

@@ -7,38 +7,39 @@ require 'values'
 #   - providing full control over which splits are accepted or rejected
 #   - adding support for splitting from right-to-left
 #   - encapsulating splitting options/preferences in instances rather than trying to
-#     cram them in to overloaded method parameters
+#     cram them into overloaded method parameters
 #
 # These enhancements allow splits to handle many cases that otherwise require bigger
 # guns e.g. regex matching or parsing.
 class StringSplitter
-  ACCEPT = ->(_index, _split) { true }
-  Split = Value.new(:captures, :lhs, :rhs, :separator)
-  # TODO: add default_separator
-  def initialize(include_captures: true, remove_empty: false, spread_captures: true)
+  ACCEPT = ->(_split) { true }
+  DEFAULT_SEPARATOR = /\s+/
+  NO_SPLITS = []
+  Split = Value.new(:captures, :count, :index, :lhs, :rhs, :separator)
+  def initialize(
+    default_separator: DEFAULT_SEPARATOR,
+    include_captures: true,
+    remove_empty: false,
+    spread_captures: true
+  )
+    @default_separator = default_separator
     @include_captures = include_captures
     @remove_empty = remove_empty
     @spread_captures = spread_captures
   end
-  def split(string, delimiter = /\s+/, at: nil, &block)
-    result, block, iterator, index = split_common(string, delimiter, at, block, :forward)
-    return result unless iterator
+  attr_reader :default_separator, :include_captures, :remove_empty, :spread_captures
-    iterator.each do |split|
-      next if @remove_empty && split.rhs.empty?
-      if result.empty?
-        next if @remove_empty && split.lhs.empty?
-        result << split.lhs
-      end
+  def split(string, delimiter = @default_separator, at: nil, &block)
+    result, block, splits, count, index = split_common(string, delimiter, at, block)
-      index += 1
+    splits.each do |split|
+      split = Split.with(split.merge({ index: (index += 1), count: count }))
+      result << split.lhs if result.empty?
-      if block.call(index, split)
+      if block.call(split)
         if @include_captures
           if @spread_captures
             result += split.captures
@@ -59,22 +60,14 @@ class StringSplitter
   alias lsplit split
-  def rsplit(string, delimiter = /\s+/, at: nil, &block)
-    result, block, iterator, index = split_common(string, delimiter, at, block, :reverse)
-    return result unless iterator
-    iterator.each do |split|
-      next if @remove_empty && split.lhs.empty?
-      if result.empty?
-        next if @remove_empty && split.rhs.empty?
-        result.unshift(split.rhs)
-      end
+  def rsplit(string, delimiter = @default_separator, at: nil, &block)
+    result, block, splits, count, index = split_common(string, delimiter, at, block)
-      index += 1
+    splits.reverse!.each do |split|
+      split = Split.with(split.merge({ index: (index += 1), count: count }))
+      result.unshift(split.rhs) if result.empty?
-      if block.call(index, split)
+      if block.call(split)
         if @include_captures
           if @spread_captures
             result = split.captures + result
@@ -95,61 +88,45 @@ class StringSplitter
   private
-  def forward_iterator(parts, ncaptures)
-    parts = parts.dup
-    Enumerator.new do |yielder|
-      until parts.empty?
-        lhs = parts.shift
-        separator = parts.shift
-        captures = parts.shift(ncaptures)
-        rhs = parts.length == 1 ? parts.shift : parts.first
-        yielder << Split.with({
-          lhs: lhs,
-          rhs: rhs,
-          separator: separator,
-          captures: captures,
-        })
-      end
-    end
-  end
+  def splits_for(parts, ncaptures)
+    result = []
+    splits = []
+    until parts.empty?
+      lhs = parts.shift
+      separator = parts.shift
+      captures = parts.shift(ncaptures)
+      rhs = parts.length == 1 ? parts.shift : parts.first
+      if @remove_empty && (lhs.empty? || rhs.empty?)
+        if lhs.empty? && rhs.empty?
+          # do nothing
+        elsif parts.empty? # last split
+          result << (!lhs.empty? ? lhs : rhs) if splits.empty?
+        elsif !lhs.empty?
+          # replace the empty rhs with the non-empty lhs
+          parts[0] = lhs
+        end
-  def reverse_iterator(parts, ncaptures)
-    parts = parts.dup
-    Enumerator.new do |yielder|
-      until parts.empty?
-        rhs = parts.pop
-        captures = parts.pop(ncaptures)
-        separator = parts.pop
-        lhs = parts.length == 1 ? parts.pop : parts.last
-        yielder << Split.with({
-          lhs: lhs,
-          rhs: rhs,
-          separator: separator,
-          captures: captures,
-        })
+        next
       end
+      splits << {
+        lhs: lhs,
+        rhs: rhs,
+        separator: separator,
+        captures: captures,
+      }
     end
+    [result, splits]
   end
   # setup common to both split methods
-  def split_common(string, delimiter, at, block, type)
+  def split_common(string, delimiter, at, block)
     unless (match = string.match(delimiter))
       result = (@remove_empty && string.empty?) ? [] : [string]
-      return [result]
-    end
-    unless block
-      if at
-        block = lambda do |index, _split|
-          case index when *at then true else false end
-        end
-      else
-        block = ACCEPT
-      end
+      return [result, block, NO_SPLITS, 0, 0]
     end
     ncaptures = match.captures.length
@@ -178,7 +155,43 @@ class StringSplitter
     end
     parts = string.split(/(#{delimiter})/, -1)
-    iterator = method("#{type}_iterator".to_sym).call(parts, ncaptures)
-    [[], block, iterator, 0]
+    result, splits = splits_for(parts, ncaptures)
+    count = splits.length
+    unless block
+      if at
+        at = Array(at).map do |index|
+          if index.is_a?(Integer) && index.negative?
+            # translate 1-based negative indices to 1-based positive
+            # indices e.g:
+            #
+            #   ss.split("foo:bar:baz:quux", ":", at: -1)
+            #
+            # translates to:
+            #
+            #   ss.split("foo:bar:baz:quux", ":", at: 3)
+            #
+            # XXX note: we don't use modulo, because we don't want
+            # out-of-bounds indices to silently work e.g. we don't want:
+            #
+            #   ss.split("foo:bar:baz:quux", ":", at: -42)
+            #
+            # to mysteriously match when the index is 2
+            count + 1 + index
+          else
+            index
+          end
+        end
+        block = lambda do |split|
+          case split.index when *at then true else false end
+        end
+      else
+        block = ACCEPT
+      end
+    end
+    [result, block, splits, count, 0]
   end
 end
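The `at` handling added to `split_common` has two steps. First, it turns negative 1-based indices into positive ones with `count + 1 + index`. Then it matches with `case ... when *at`. That form tests each entry with `===`, so Integers and Ranges can be mixed. The sketch below isolates this logic. `normalize_at` and `at_predicate` are hypothetical helper names, and to keep the example short the predicate takes a bare index rather than a split object.

```ruby
# Translate negative 1-based indices to positive ones, as split_common does:
# with 3 splits, -1 => 3, -2 => 2, -3 => 1. Out-of-range values such as -42
# stay out of range (no modulo), so they never match a real split.
def normalize_at(at, count)
  Array(at).map do |index|
    index.is_a?(Integer) && index.negative? ? count + 1 + index : index
  end
end

# Build an accept predicate; `when *indices` tests each entry with ===,
# so Integers match exactly and Ranges match by inclusion.
def at_predicate(at, count)
  indices = normalize_at(at, count)
  ->(index) { case index when *indices then true else false end }
end
```

Take the README's `at: [1..3, -1]` example, which has 8 splits. `normalize_at([1..3, -1], 8)` gives `[1..3, 8]`, so the predicate accepts splits 1, 2, 3 and 8. As written in the diff, only Integer entries are translated. A negative Range nested in an array, such as `[-3..-1]`, passes through unchanged.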

data/lib/string_splitter/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 class StringSplitter
-  VERSION = '0.0.1'
+  VERSION = '0.1.0'
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: string_splitter
 version: !ruby/object:Gem::Version
-  version: 0.0.1
+  version: 0.1.0
 platform: ruby
 authors:
 - chocolateboy
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-06-20 00:00:00.000000000 Z
+date: 2018-06-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: values