
buftok 0.1 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

data/CONTRIBUTING.md ADDED

@@ -0,0 +1,49 @@
+## Contributing
+In the spirit of [free software][free-sw], **everyone** is encouraged to help
+improve this project. Here are some ways *you* can contribute:
+[free-sw]: http://www.fsf.org/licensing/essays/free-sw.html
+* Use alpha, beta, and pre-release versions.
+* Report bugs.
+* Suggest new features.
+* Write or edit documentation.
+* Write specifications.
+* Write code (**no patch is too small**: fix typos, add comments, clean up
+  inconsistent whitespace).
+* Refactor code.
+* Fix [issues][].
+* Review patches.
+[issues]: https://github.com/sferik/buftok/issues
+## Submitting an Issue
+We use the [GitHub issue tracker][issues] to track bugs and features. Before
+submitting a bug report or feature request, check to make sure it hasn't
+already been submitted. When submitting a bug report, please include a [Gist][]
+that includes a stack trace and any details that may be necessary to reproduce
+the bug, including your gem version, Ruby version, and operating system.
+Ideally, a bug report should include a pull request with failing specs.
+[gist]: https://gist.github.com/
+## Submitting a Pull Request
+1. [Fork the repository.][fork]
+2. [Create a topic branch.][branch]
+3. Add specs for your unimplemented feature or bug fix.
+4. Run `bundle exec rake spec`. If your specs pass, return to step 3.
+5. Implement your feature or bug fix.
+6. Run `bundle exec rake spec`. If your specs fail, return to step 5.
+7. Run `open coverage/index.html`. If your changes are not completely covered
+   by your tests, return to step 3.
+8. Run `RUBYOPT=W2 bundle exec rake spec 2>&1 | grep buftok`. If your changes
+   produce any warnings, return to step 5.
+9. Add documentation for your feature or bug fix.
+10. Run `bundle exec rake yard`. If your changes are not 100% documented, go
+    back to step 9.
+11. Commit and push your changes.
+12. [Submit a pull request.][pr]
+[fork]: http://help.github.com/fork-a-repo/
+[branch]: http://learn.github.com/p/branching.html
+[pr]: http://help.github.com/send-pull-requests/

data/Gemfile ADDED

@@ -0,0 +1,6 @@
+source 'https://rubygems.org'
+gem 'rake'
+gem 'rdoc'
+gemspec

data/LICENSE.md ADDED

@@ -0,0 +1,56 @@
+Ruby is copyrighted free software by Yukihiro Matsumoto <matz@netlab.jp>.
+You can redistribute it and/or modify it under either the terms of the
+2-clause BSDL (see the file BSDL), or the conditions below:
+  1. You may make and give away verbatim copies of the source form of the
+     software without restriction, provided that you duplicate all of the
+     original copyright notices and associated disclaimers.
+  2. You may modify your copy of the software in any way, provided that
+     you do at least ONE of the following:
+       a) place your modifications in the Public Domain or otherwise
+          make them Freely Available, such as by posting said
+	  modifications to Usenet or an equivalent medium, or by allowing
+	  the author to include your modifications in the software.
+       b) use the modified software only within your corporation or
+          organization.
+       c) give non-standard binaries non-standard names, with
+          instructions on where to get the original software distribution.
+       d) make other distribution arrangements with the author.
+  3. You may distribute the software in object code or binary form,
+     provided that you do at least ONE of the following:
+       a) distribute the binaries and library files of the software,
+	  together with instructions (in the manual page or equivalent)
+	  on where to get the original distribution.
+       b) accompany the distribution with the machine-readable source of
+	  the software.
+       c) give non-standard binaries non-standard names, with
+          instructions on where to get the original software distribution.
+       d) make other distribution arrangements with the author.
+  4. You may modify and include the part of the software into any other
+     software (possibly commercial).  But some files in the distribution
+     are not written by the author, so that they are not under these terms.
+     For the list of those files and their copying conditions, see the
+     file LEGAL.
+  5. The scripts and library files supplied as input to or produced as
+     output from the software do not automatically fall under the
+     copyright of the software, but belong to whomever generated them,
+     and may be sold commercially, and may be aggregated with this
+     software.
+  6. THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR
+     IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
+     WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+     PURPOSE.

data/README.md ADDED

@@ -0,0 +1,48 @@
+# BufferedTokenizer
+[![Gem Version](https://badge.fury.io/rb/buftok.png)][gem]
+[![Build Status](https://travis-ci.org/sferik/buftok.png?branch=master)][travis]
+[![Dependency Status](https://gemnasium.com/sferik/buftok.png?travis)][gemnasium]
+[![Code Climate](https://codeclimate.com/github/sferik/buftok.png)][codeclimate]
+[gem]: https://rubygems.org/gems/buftok
+[travis]: https://travis-ci.org/sferik/buftok
+[gemnasium]: https://gemnasium.com/sferik/buftok
+[codeclimate]: https://codeclimate.com/github/sferik/buftok
+###### Statefully split input data by a specifiable token
+BufferedTokenizer takes a delimiter upon instantiation, or acts line-based by
+default.  It allows input to be spoon-fed from some outside source which
+receives arbitrary length datagrams which may-or-may-not contain the token by
+which entities are delimited.  In this respect it's ideally paired with
+something like [EventMachine][].
+[EventMachine]: http://rubyeventmachine.com/
+## Supported Ruby Versions
+This library aims to support and is [tested against][travis] the following Ruby
+implementations:
+* Ruby 1.8.7
+* Ruby 1.9.2
+* Ruby 1.9.3
+* Ruby 2.0.0
+If something doesn't work on one of these interpreters, it's a bug.
+This library may inadvertently work (or seem to work) on other Ruby
+implementations, however support will only be provided for the versions listed
+above.
+If you would like this library to support another Ruby version, you may
+volunteer to be a maintainer. Being a maintainer entails making sure all tests
+run and pass on that implementation. When something breaks on your
+implementation, you will be responsible for providing patches in a timely
+fashion. If critical issues for a particular implementation exist at the time
+of a major release, support for that Ruby version may be dropped.
+## Copyright
+Copyright (c) 2006-2013 Tony Arcieri, Martin Emde, Erik Michaels-Ober.
+Distributed under the [Ruby license][license].
+[license]: http://www.ruby-lang.org/en/LICENSE.txt
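
Note on the README description: the point of splitting "statefully" is that a token may be cut in half by a chunk boundary. The sketch below illustrates the idea with plain `String#split`; it is not the gem's implementation, and the method names `naive_split` and `split_chunks_statefully` are hypothetical helpers introduced only for this illustration.

```ruby
# Splitting each chunk on its own breaks tokens at chunk boundaries.
def naive_split(chunks, delimiter)
  chunks.flat_map { |chunk| chunk.split(delimiter) }
end

# A deliberately simple stateful variant (an illustration, not buftok's code):
# keep the unfinished remainder and prepend it to the next chunk.
def split_chunks_statefully(chunks, delimiter)
  buffer = String.new
  complete = []
  chunks.each do |chunk|
    buffer << chunk
    # A limit of -1 keeps a trailing "" when the buffer ends with the
    # delimiter, so the last element is always the unfinished remainder.
    *done, rest = buffer.split(delimiter, -1)
    complete.concat(done)
    buffer = rest || String.new
  end
  [complete, buffer]
end

chunks = ["foo\nba", "r\nbaz"]
naive_split(chunks, "\n")              # => ["foo", "ba", "r", "baz"]
split_chunks_statefully(chunks, "\n")  # => [["foo", "bar"], "baz"]
```

This version re-splits the whole remainder on every call, which is simple but wasteful; the buffering in lib/buftok.rb avoids that by joining segments only when a delimiter is found.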

data/Rakefile CHANGED

@@ -1,31 +1,66 @@
-require 'rake'
-require 'rake/rdoctask'
-require 'rake/gempackagetask'
-require 'spec/rake/spectask'
+require 'bundler'
+require 'rdoc/task'
+require 'rake/testtask'
-Spec::Rake::SpecTask.new(:spec) do |task|
-    task.spec_files = FileList['**/*_spec.rb']
-end
+task :default => :test
+Bundler::GemHelper.install_tasks
-Rake::RDocTask.new(:rdoc) do |task|
-    task.rdoc_dir = 'doc'
-    task.title    = 'BufferedTokenizer'
-    task.rdoc_files.include('lib/**/*.rb')
+RDoc::Task.new do |task|
+  task.rdoc_dir = 'doc'
+  task.title    = 'BufferedTokenizer'
+  task.rdoc_files.include('lib/**/*.rb')
 end
-spec = Gem::Specification.new do |s|
-  s.name = %q{buftok}
-  s.version = "0.1"
-  s.date = %q{2006-12-18}
-  s.summary = %q{BufferedTokenizer extracts token delimited entities from a sequence of arbitrary inputs}
-  s.email = %q{tony@clickcaster.com}
-  s.homepage = %q{http://buftok.rubyforge.org}
-  s.rubyforge_project = %q{buftok}
-  s.has_rdoc = true
-  s.authors = ["Tony Arcieri","Martin Emde"]
-  s.files = ["Rakefile", "lib", "lib/buftok.rb"]
+Rake::TestTask.new :test do |t|
+  t.libs << 'lib'
+  t.test_files = FileList['test/**/*.rb']
 end
-Rake::GemPackageTask.new(spec) do |pkg|
-  pkg.need_tar = true
+desc "Benchmark the current implementation"
+task :bench do
+  require 'benchmark'
+  require File.expand_path('lib/buftok', File.dirname(__FILE__))
+  n = 50000
+  delimiter = "\n\n"
+  frequency1 = 1000
+  puts "generating #{n} strings, with #{delimiter.inspect} every #{frequency1} strings..."
+  data1 = (0...n).map do |i|
+    (((i % frequency1 == 1) ? "\n" : "") +
+      ("s" * i) +
+      ((i % frequency1 == 0) ? "\n" : "")).freeze
+  end
+  frequency2 = 10
+  puts "generating #{n} strings, with #{delimiter.inspect} every #{frequency2} strings..."
+  data2 = (0...n).map do |i|
+    (((i % frequency2 == 1) ? "\n" : "") +
+      ("s" * i) +
+      ((i % frequency2 == 0) ? "\n" : "")).freeze
+  end
+  Benchmark.bmbm do |x|
+    x.report("1 char, freq: #{frequency1}") do
+      bt1 = BufferedTokenizer.new
+      n.times { |i| bt1.extract(data1[i]) }
+    end
+    x.report("2 char, freq: #{frequency1}") do
+      bt2 = BufferedTokenizer.new(delimiter)
+      n.times { |i| bt2.extract(data1[i]) }
+    end
+    x.report("1 char, freq: #{frequency2}") do
+      bt3 = BufferedTokenizer.new
+      n.times { |i| bt3.extract(data2[i]) }
+    end
+    x.report("2 char, freq: #{frequency2}") do
+      bt4 = BufferedTokenizer.new(delimiter)
+      n.times { |i| bt4.extract(data2[i]) }
+    end
+  end
 end
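
Note on the `bench` task: the data generator appends "\n" to every chunk whose index is a multiple of the frequency and prepends "\n" to the chunk that follows it, so each "\n\n" delimiter straddles two consecutive chunks. The sketch below reproduces that expression at a small size to make the layout visible; `bench_chunks` is a hypothetical helper name, not something defined in the Rakefile.

```ruby
# Same chunk-building expression as the Rakefile's bench task, wrapped in a
# helper (hypothetical name) so it can be inspected at a small size.
def bench_chunks(n, frequency)
  (0...n).map do |i|
    (((i % frequency == 1) ? "\n" : "") +
      ("s" * i) +
      ((i % frequency == 0) ? "\n" : "")).freeze
  end
end

chunks = bench_chunks(5, 3)
# => ["\n", "\ns", "ss", "sss\n", "\nssss"]

# No single chunk contains the two-character delimiter...
chunks.none? { |c| c.include?("\n\n") }   # => true
# ...but the concatenated stream contains it twice.
chunks.join.scan("\n\n").size             # => 2
```

If this reading is right, the "2 char" reports exercise exactly the case where the delimiter is split across two `extract` calls.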

data/buftok.gemspec ADDED

@@ -0,0 +1,17 @@
+Gem::Specification.new do |spec|
+  spec.add_development_dependency 'bundler', '~> 1.0'
+  spec.authors       = ["Tony Arcieri", "Martin Emde", "Erik Michaels-Ober"]
+  spec.description   = %q{BufferedTokenizer extracts token delimited entities from a sequence of arbitrary inputs}
+  spec.email         = "sferik@gmail.com"
+  spec.files         = %w(CONTRIBUTING.md Gemfile LICENSE.md README.md Rakefile buftok.gemspec)
+  spec.files        += Dir.glob("lib/**/*.rb")
+  spec.files        += Dir.glob("test/**/*.rb")
+  spec.test_files    = spec.files.grep(%r{^test/})
+  spec.homepage      = "https://github.com/sferik/buftok"
+  spec.licenses      = ['MIT']
+  spec.name          = "buftok"
+  spec.require_paths = ["lib"]
+  spec.required_rubygems_version = '>= 1.3.5'
+  spec.summary       = spec.description
+  spec.version       = "0.2.0"
+end

data/lib/buftok.rb CHANGED

@@ -1,26 +1,22 @@
-# BufferedTokenizer - Statefully split input data by a specifiable token
-# (C)2006 Tony Arcieri, Martin Emde
-# Distributed under the Ruby license (http://www.ruby-lang.org/en/LICENSE.txt)
 # BufferedTokenizer takes a delimiter upon instantiation, or acts line-based
 # by default.  It allows input to be spoon-fed from some outside source which
 # receives arbitrary length datagrams which may-or-may-not contain the token
 # by which entities are delimited.  In this respect it's ideally paired with
-# something like EventMachine (http://rubyforge.org/projects/eventmachine)
+# something like EventMachine (http://rubyeventmachine.com/).
 class BufferedTokenizer
-  # New BufferedTokenizers will operate on lines delimited by "\n" by default
-  # or allow you to specify any delimiter token you so choose, which will then
-  # be used by String#split to tokenize the input data
-  def initialize(delimiter = "\n")
-    # Store the specified delimiter
+  # New BufferedTokenizers will operate on lines delimited by a delimiter,
+  # which is by default the global input delimiter $/ ("\n").
+  #
+  # The input buffer is stored as an array.  This is by far the most efficient
+  # approach given language constraints (in C a linked list would be a more
+  # appropriate data structure).  Segments of input data are stored in a list
+  # which is only joined when a token is reached, substantially reducing the
+  # number of objects required for the operation.
+  def initialize(delimiter = $/)
     @delimiter = delimiter
-    # The input buffer is stored as an array.  This is by far the most efficient
-    # approach given language constraints (in C a linked list would be a more
-    # appropriate data structure).  Segments of input data are stored in a list
-    # which is only joined when a token is reached, substantially reducing the
-    # number of objects required for the operation.
     @input = []
+    @tail = ''
+    @trim = @delimiter.length - 1
   end
   # Extract takes an arbitrary string of input data and returns an array of
@@ -28,49 +24,36 @@ class BufferedTokenizer
   # makes for easy processing of datagrams using a pattern like:
   #
   #   tokenizer.extract(data).map { |entity| Decode(entity) }.each do ...
+  #
+  # Using -1 makes split to return "" if the token is at the end of
+  # the string, meaning the last element is the start of the next chunk.
   def extract(data)
-    # Extract token-delimited entities from the input string with the split command.
-    # There's a bit of craftiness here with the -1 parameter.  Normally split would
-    # behave no differently regardless of if the token lies at the very end of the
-    # input buffer or not (i.e. a literal edge case)  Specifying -1 forces split to
-    # return "" in this case, meaning that the last entry in the list represents a
-    # new segment of data where the token has not been encountered
-    entities = data.split @delimiter, -1
+    if @trim > 0
+      tail_end = @tail.slice!(-@trim, @trim) # returns nil if string is too short
+      data = tail_end + data if tail_end
+    end
-    # Move the first entry in the resulting array into the input buffer.  It represents
-    # the last segment of a token-delimited entity unless it's the only entry in the list.
-    @input << entities.shift
+    @input << @tail
+    entities = data.split(@delimiter, -1)
+    @tail = entities.shift
-    # If the resulting array from the split is empty, the token was not encountered
-    # (not even at the end of the buffer).  Since we've encountered no token-delimited
-    # entities this go-around, return an empty array.
-    return [] if entities.empty?
+    unless entities.empty?
+      @input << @tail
+      entities.unshift @input.join
+      @input.clear
+      @tail = entities.pop
+    end
-    # At this point, we've hit a token, or potentially multiple tokens.  Now we can bring
-    # together all the data we've buffered from earlier calls without hitting a token,
-    # and add it to our list of discovered entities.
-    entities.unshift @input.join
-    # Now that we've hit a token, joined the input buffer and added it to the entities
-    # list, we can go ahead and clear the input buffer.  All of the segments that were
-    # stored before the join can now be garbage collected.
-    @input.clear
-    # The last entity in the list is not token delimited, however, thanks to the -1
-    # passed to split.  It represents the beginning of a new list of as-yet-untokenized
-    # data, so we add it to the start of the list.
-    @input << entities.pop
-    # Now we're left with the list of extracted token-delimited entities we wanted
-    # in the first place.  Hooray!
     entities
   end
   # Flush the contents of the input buffer, i.e. return the input buffer even though
   # a token has not yet been encountered
   def flush
+    @input << @tail
     buffer = @input.join
     @input.clear
+    @tail = "" # @tail.clear is slightly faster, but not supported on 1.8.7
     buffer
   end
 end
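
Note on the new `extract`: the added `@tail` and `@trim` state lets a multi-character delimiter be recognised even when it is split across two calls. The sketch below condenses the 0.2.0 side of the hunk into a standalone class so the flow can be followed in one place. `SketchTokenizer` is a stand-in name, the default delimiter is written as "\n" where the diff uses `$/`, and the `|| String.new` guard for empty input is an addition that does not appear in the diff.

```ruby
# Condensed from the 0.2.0 hunk above; SketchTokenizer is a stand-in name.
class SketchTokenizer
  def initialize(delimiter = "\n") # the diff defaults to $/
    @delimiter = delimiter
    @input = []                    # segments buffered since the last delimiter
    @tail = String.new             # most recent, possibly incomplete, segment
    @trim = @delimiter.length - 1  # characters that could start a split delimiter
  end

  def extract(data)
    if @trim > 0
      # Move the last @trim characters of the tail onto the new data so a
      # delimiter cut in half by a chunk boundary becomes visible to split.
      tail_end = @tail.slice!(-@trim, @trim) # nil when the tail is too short
      data = tail_end + data if tail_end
    end

    @input << @tail
    entities = data.split(@delimiter, -1)
    @tail = entities.shift || String.new # nil guard is an addition, not in the diff

    unless entities.empty?
      # A delimiter was found: join everything buffered so far into the first
      # entity and keep the trailing fragment as the new tail.
      @input << @tail
      entities.unshift(@input.join)
      @input.clear
      @tail = entities.pop
    end

    entities
  end

  def flush
    @input << @tail
    buffer = @input.join
    @input.clear
    @tail = String.new
    buffer
  end
end

tokenizer = SketchTokenizer.new('<>')
tokenizer.extract('foo<')        # => []
tokenizer.extract('>bar<')       # => ["foo"]
tokenizer.extract('baz<>qux<>')  # => ["bar<baz", "qux"]
tokenizer.flush                  # => ""
```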

data/test/test_buftok.rb ADDED

@@ -0,0 +1,27 @@
+require 'test/unit'
+require 'buftok'
+class TestBuftok < Test::Unit::TestCase
+  def test_buftok
+    tokenizer = BufferedTokenizer.new
+    assert_equal %w[foo], tokenizer.extract("foo\nbar".freeze)
+    assert_equal %w[barbaz qux], tokenizer.extract("baz\nqux\nquu".freeze)
+    assert_equal 'quu', tokenizer.flush
+    assert_equal '', tokenizer.flush
+  end
+  def test_delimiter
+    tokenizer = BufferedTokenizer.new('<>')
+    assert_equal ['', "foo\n"], tokenizer.extract("<>foo\n<>".freeze)
+    assert_equal %w[bar], tokenizer.extract('bar<>baz'.freeze)
+    assert_equal 'baz', tokenizer.flush
+  end
+  def test_split_delimiter
+    tokenizer = BufferedTokenizer.new('<>'.freeze)
+    assert_equal [], tokenizer.extract('foo<'.freeze)
+    assert_equal %w[foo], tokenizer.extract('>bar<'.freeze)
+    assert_equal %w[bar<baz qux], tokenizer.extract('baz<>qux<>'.freeze)
+    assert_equal '', tokenizer.flush
+  end
+end
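
Note on `test_split_delimiter`: this case appears to target the behaviour that changed between the two versions. The sketch below rebuilds the 0.1 `extract` from the removed lines of lib/buftok.rb (under the stand-in name `LegacyTokenizer`) and feeds it the same input as the test; reading it as the motivation for the new test is an inference from the diff, not something the package states.

```ruby
# Rebuilt from the removed (0.1) lines of lib/buftok.rb; LegacyTokenizer is a
# stand-in name used only for this comparison.
class LegacyTokenizer
  def initialize(delimiter = "\n")
    @delimiter = delimiter
    @input = []
  end

  def extract(data)
    # Each chunk is split in isolation, so a delimiter that straddles two
    # chunks is never seen as a delimiter.
    entities = data.split(@delimiter, -1)
    @input << entities.shift
    return [] if entities.empty?

    entities.unshift(@input.join)
    @input.clear
    @input << entities.pop
    entities
  end

  def flush
    buffer = @input.join
    @input.clear
    buffer
  end
end

legacy = LegacyTokenizer.new('<>')
legacy.extract('foo<')        # => []
legacy.extract('>bar<')       # => []   (0.2.0 test expects ["foo"])
legacy.extract('baz<>qux<>')  # => ["foo<>bar<baz", "qux"]
legacy.flush                  # => ""
```

The first entity still contains a literal `<>`, which is the symptom the `@tail`/`@trim` handling in 0.2.0 removes.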

metadata CHANGED

@@ -1,49 +1,75 @@
---- !ruby/object:Gem::Specification
-rubygems_version: 0.9.0
-specification_version: 1
+--- !ruby/object:Gem::Specification
 name: buftok
-version: !ruby/object:Gem::Version
-  version: "0.1"
-date: 2006-12-18 00:00:00 -07:00
-summary: BufferedTokenizer extracts token delimited entities from a sequence of arbitrary inputs
-require_paths:
-- lib
-email: tony@clickcaster.com
-homepage: http://buftok.rubyforge.org
-rubyforge_project: buftok
-description:
-autorequire:
-default_executable:
-bindir: bin
-has_rdoc: true
-required_ruby_version: !ruby/object:Gem::Version::Requirement
-  requirements:
-  - - ">"
-    - !ruby/object:Gem::Version
-      version: 0.0.0
-  version:
+version: !ruby/object:Gem::Version
+  version: 0.2.0
+  prerelease:
 platform: ruby
-signing_key:
-cert_chain:
-post_install_message:
-authors:
+authors:
 - Tony Arcieri
 - Martin Emde
-files:
+- Erik Michaels-Ober
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2013-11-22 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: bundler
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: '1.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: '1.0'
+description: BufferedTokenizer extracts token delimited entities from a sequence of
+  arbitrary inputs
+email: sferik@gmail.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- CONTRIBUTING.md
+- Gemfile
+- LICENSE.md
+- README.md
 - Rakefile
-- lib
+- buftok.gemspec
 - lib/buftok.rb
-test_files: []
+- test/test_buftok.rb
+homepage: https://github.com/sferik/buftok
+licenses:
+- MIT
+post_install_message:
 rdoc_options: []
-extra_rdoc_files: []
-executables: []
-extensions: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ! '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ! '>='
+    - !ruby/object:Gem::Version
+      version: 1.3.5
 requirements: []
-dependencies: []
+rubyforge_project:
+rubygems_version: 1.8.23
+signing_key:
+specification_version: 3
+summary: BufferedTokenizer extracts token delimited entities from a sequence of arbitrary
+  inputs
+test_files:
+- test/test_buftok.rb
+has_rdoc: