buftok 0.2.0 → 1.0.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: eb2a12017fae47f920bd9a3694845079e8a4a76c5fcffad9abcc18716e2c50a7
+   data.tar.gz: af65ae0012ce581da376624888e208338face85e18bc27f13c3d246400a6d464
+ SHA512:
+   metadata.gz: 2a0f62b94c72c92b2fa31a22b2fd52952c25fea5e9b4ab2f4770bba3e87d1e9372539d95565c94e13cd4f9358a7e006787b0c73ce5318dd1d0a8da8076ed8b2b
+   data.tar.gz: '08266806dca8fb19015c54447c9156c008b5cb979fbb8eb097dbf161759d264f4ec07e659d7e123dcebf7243738f87dc6fac8b7278b025892eb77fd2a8839606'
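As context for the new `checksums.yaml`: it records SHA256 and SHA512 digests of the two archives packed inside the `.gem` file, which RubyGems compares against freshly computed digests at install time. A minimal sketch of how such digests are produced, using only Ruby's stdlib `digest` (the string is a stand-in, not the real archive contents):

```ruby
require "digest"

# Stand-in for the bytes of metadata.gz or data.tar.gz inside the .gem.
contents = "stand-in archive contents"

sha256 = Digest::SHA256.hexdigest(contents) # 64 hex characters
sha512 = Digest::SHA512.hexdigest(contents) # 128 hex characters

puts "SHA256: #{sha256}"
puts "SHA512: #{sha512}"
```

Comparing a recomputed digest against the recorded value detects any alteration of the archive after release.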
data/CHANGELOG.md ADDED
@@ -0,0 +1,86 @@
+ # Changelog
+
+ All notable changes to this project will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## [Unreleased]
+
+ ## [1.0.0] - 2026-03-20
+
+ ### Added
+
+ - RBS type signatures in `sig/buftok.rbs` with Steep for strict type checking
+ - RuboCop, Standard, rubocop-minitest, rubocop-performance, and rubocop-rake for linting
+ - Mutant for mutation testing with 100% coverage
+ - GitHub Actions workflows for linting, type checking, and mutation testing
+ - `.github/FUNDING.yml` for GitHub Sponsors
+ - Gemspec metadata (`allowed_push_host`, `changelog_uri`, `documentation_uri`,
+   `funding_uri`, `homepage_uri`, `rubygems_mfa_required`, `source_code_uri`,
+   `bug_tracker_uri`)
+ - `CHANGELOG.md`
+
+ ### Changed
+
+ - Require Ruby >= 3.2
+ - Require RubyGems >= 3.0
+ - Test against Ruby 3.2, 3.3, 3.4, and 4.0 (drop EOL 2.6, 2.7, 3.0)
+ - Update `actions/checkout` to v6 and `ruby/setup-ruby` to v1
+ - Replace test-unit with Minitest 6
+ - Replace `inject` with `sum` in `size` method
+ - Use `@tail.clear` instead of `String.new` in `flush` (drop Ruby 1.8.7 workaround)
+ - Move development dependencies from gemspec to Gemfile
+ - Bump rake from `~> 10.0` to `>= 13`
+ - Extract `rejoin_split_delimiter` and `consolidate_input` private methods
+ - Update copyright years to 2006-2026
+ - Rename Erik Michaels-Ober to Erik Berlin
+
+ ### Fixed
+
+ - Typo in test comment ("Desipte" -> "Despite")
+
+ ## [0.3.0] - 2021-03-25
+
+ ### Added
+
+ - `Buftok` constant as an alias for `BufferedTokenizer`
+ - `BufferedTokenizer#size` method to determine internal buffer size
+ - GitHub Actions CI workflow
+ - Support for `frozen_string_literal`
+
+ ### Changed
+
+ - Replace Ruby license with MIT license
+ - Modernize gemspec
+ - Remove Travis CI in favor of GitHub Actions
+ - Update supported Ruby versions to 2.6, 2.7, 3.0
+
+ ## [0.2.0] - 2013-11-22
+
+ ### Added
+
+ - Tests
+ - Benchmark rake task
+ - Support for multi-character delimiters split across chunks
+ - Section on supported Ruby versions in README
+
+ ### Changed
+
+ - Use global input delimiter `$/` as default instead of hard-coded `"\n"`
+ - Unified handling of single/multi-character delimiters
+
+ ## [0.1.0] - 2013-11-20
+
+ ### Added
+
+ - Initial release of BufferedTokenizer
+ - Line-based tokenization with configurable delimiter
+ - `extract` method for incremental tokenization
+ - `flush` method to retrieve remaining buffer contents
+
+ [Unreleased]: https://github.com/sferik/buftok/compare/v1.0.0...HEAD
+ [1.0.0]: https://github.com/sferik/buftok/compare/v0.3.0...v1.0.0
+ [0.3.0]: https://github.com/sferik/buftok/compare/v0.2.0...v0.3.0
+ [0.2.0]: https://github.com/sferik/buftok/compare/v0.1...v0.2.0
+ [0.1.0]: https://github.com/sferik/buftok/releases/tag/v0.1
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2006-2026 Tony Arcieri, Martin Emde, Erik Berlin
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md CHANGED
@@ -1,39 +1,116 @@
  # BufferedTokenizer

- [![Gem Version](https://badge.fury.io/rb/buftok.png)][gem]
- [![Build Status](https://travis-ci.org/sferik/buftok.png?branch=master)][travis]
- [![Dependency Status](https://gemnasium.com/sferik/buftok.png?travis)][gemnasium]
- [![Code Climate](https://codeclimate.com/github/sferik/buftok.png)][codeclimate]
+ [![Gem Version](http://img.shields.io/gem/v/buftok.svg)][gem]
+ [![Test](https://github.com/sferik/buftok/actions/workflows/test.yml/badge.svg)][test]
+ [![Lint](https://github.com/sferik/buftok/actions/workflows/lint.yml/badge.svg)][lint]
+ [![Type Check](https://github.com/sferik/buftok/actions/workflows/typecheck.yml/badge.svg)][typecheck]
+ [![Mutation Testing](https://github.com/sferik/buftok/actions/workflows/mutant.yml/badge.svg)][mutant]
+ [![Documentation Coverage](https://github.com/sferik/buftok/actions/workflows/yardstick.yml/badge.svg)][yardstick]

  [gem]: https://rubygems.org/gems/buftok
- [travis]: https://travis-ci.org/sferik/buftok
- [gemnasium]: https://gemnasium.com/sferik/buftok
- [codeclimate]: https://codeclimate.com/github/sferik/buftok
+ [test]: https://github.com/sferik/buftok/actions/workflows/test.yml
+ [lint]: https://github.com/sferik/buftok/actions/workflows/lint.yml
+ [typecheck]: https://github.com/sferik/buftok/actions/workflows/typecheck.yml
+ [mutant]: https://github.com/sferik/buftok/actions/workflows/mutant.yml
+ [yardstick]: https://github.com/sferik/buftok/actions/workflows/yardstick.yml

  ###### Statefully split input data by a specifiable token

  BufferedTokenizer takes a delimiter upon instantiation, or acts line-based by
- default. It allows input to be spoon-fed from some outside source which
+ default. It allows input to be spoon-fed from some outside source which
  receives arbitrary length datagrams which may-or-may-not contain the token by
- which entities are delimited. In this respect it's ideally paired with
- something like [EventMachine][].
+ which entities are delimited. It's useful any time you need to extract
+ delimited messages from a stream of chunked data.

- [EventMachine]: http://rubyeventmachine.com/
+ ## Examples
+
+ ### TCP Server
+
+ Process newline-delimited commands from a TCP client:
+
+ ```ruby
+ require "socket"
+ require "buftok"
+
+ server = TCPServer.new(4000)
+
+ loop do
+   client = server.accept
+   tokenizer = BufferedTokenizer.new("\n")
+
+   while (data = client.readpartial(4096))
+     tokenizer.extract(data).each do |line|
+       puts "Received: #{line}"
+     end
+   end
+ rescue EOFError
+   client.close
+ end
+ ```
+
+ ### Streaming IO
+
+ Read a large file in chunks without loading it all into memory:
+
+ ```ruby
+ require "buftok"
+
+ tokenizer = BufferedTokenizer.new("\n")
+
+ File.open("large_log_file.txt") do |file|
+   while (chunk = file.read(8192))
+     tokenizer.extract(chunk).each do |line|
+       process_log_line(line)
+     end
+   end
+ end
+
+ # Don't forget to flush any remaining data
+ remaining = tokenizer.flush
+ process_log_line(remaining) unless remaining.empty?
+ ```
+
+ > [!IMPORTANT]
+ > Always call `flush` when you're done reading from the stream to process any
+ > remaining data that didn't end with a delimiter.
+
+ ### Custom Delimiters
+
+ Parse a stream using a multi-character delimiter:
+
+ ```ruby
+ require "buftok"
+
+ tokenizer = BufferedTokenizer.new("\r\n\r\n")
+
+ chunks = ["HTTP/1.1 200 OK\r\n", "Content-Type: text/plain\r\n\r\n", "Hello"]
+
+ chunks.each do |chunk|
+   tokenizer.extract(chunk).each do |headers|
+     puts "Headers: #{headers}"
+   end
+ end
+
+ puts "Body so far: #{tokenizer.flush}"
+ ```
+
+ > [!TIP]
+ > Multi-character delimiters that get split across chunks are handled
+ > automatically — no special handling is needed on your end.

  ## Supported Ruby Versions
- This library aims to support and is [tested against][travis] the following Ruby
+ This library aims to support and is [tested against][test] the following Ruby
  implementations:

- * Ruby 1.8.7
- * Ruby 1.9.2
- * Ruby 1.9.3
- * Ruby 2.0.0
+ * Ruby 3.2
+ * Ruby 3.3
+ * Ruby 3.4
+ * Ruby 4.0

  If something doesn't work on one of these interpreters, it's a bug.

- This library may inadvertently work (or seem to work) on other Ruby
- implementations, however support will only be provided for the versions listed
- above.
+ This code will likely still work on older Ruby versions but support will not be
+ provided for end-of-life versions.

  If you would like this library to support another Ruby version, you may
  volunteer to be a maintainer. Being a maintainer entails making sure all tests
@@ -43,6 +120,7 @@ fashion. If critical issues for a particular implementation exist at the time
  of a major release, support for that Ruby version may be dropped.

  ## Copyright
- Copyright (c) 2006-2013 Tony Arcieri, Martin Emde, Erik Michaels-Ober.
- Distributed under the [Ruby license][license].
- [license]: http://www.ruby-lang.org/en/LICENSE.txt
+ Copyright (c) 2006-2026 Tony Arcieri, Martin Emde, Erik Berlin.
+ Distributed under the [MIT license][license].
+
+ [license]: https://opensource.org/licenses/MIT
data/buftok.gemspec CHANGED
@@ -1,17 +1,28 @@
+ # frozen_string_literal: true
+
  Gem::Specification.new do |spec|
-   spec.add_development_dependency 'bundler', '~> 1.0'
-   spec.authors = ["Tony Arcieri", "Martin Emde", "Erik Michaels-Ober"]
-   spec.description = %q{BufferedTokenizer extracts token delimited entities from a sequence of arbitrary inputs}
-   spec.email = "sferik@gmail.com"
-   spec.files = %w(CONTRIBUTING.md Gemfile LICENSE.md README.md Rakefile buftok.gemspec)
-   spec.files += Dir.glob("lib/**/*.rb")
-   spec.files += Dir.glob("test/**/*.rb")
-   spec.test_files = spec.files.grep(%r{^test/})
-   spec.homepage = "https://github.com/sferik/buftok"
-   spec.licenses = ['MIT']
-   spec.name = "buftok"
+   spec.version = "1.0.0"
+
+   spec.authors = ["Tony Arcieri", "Martin Emde", "Erik Berlin"]
+   spec.summary = "BufferedTokenizer extracts token delimited entities from a sequence of string inputs"
+   spec.description = spec.summary
+   spec.email = ["sferik@gmail.com", "martin.emde@gmail.com"]
+   spec.files = %w[CHANGELOG.md CONTRIBUTING.md LICENSE.txt README.md buftok.gemspec] + Dir["lib/**/*.rb"]
+   spec.homepage = "https://github.com/sferik/buftok"
+   spec.licenses = ["MIT"]
+   spec.name = "buftok"
    spec.require_paths = ["lib"]
-   spec.required_rubygems_version = '>= 1.3.5'
-   spec.summary = spec.description
-   spec.version = "0.2.0"
+   spec.required_ruby_version = ">= 3.2"
+   spec.required_rubygems_version = ">= 3.0"
+
+   spec.metadata = {
+     "allowed_push_host" => "https://rubygems.org",
+     "bug_tracker_uri" => "#{spec.homepage}/issues",
+     "changelog_uri" => "#{spec.homepage}/blob/master/CHANGELOG.md",
+     "documentation_uri" => "https://rubydoc.info/gems/buftok/",
+     "funding_uri" => "https://github.com/sponsors/sferik/",
+     "homepage_uri" => spec.homepage,
+     "rubygems_mfa_required" => "true",
+     "source_code_uri" => spec.homepage
+   }
  end
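The `spec.metadata` hash added in the new gemspec is plain String-to-String data that RubyGems stores in the package and exposes at runtime. A small sketch using only the stdlib `Gem::Specification` class (the values mirror the gemspec above; the in-memory spec is illustrative, not the released one):

```ruby
require "rubygems"

# Build an in-memory spec carrying the same metadata keys as buftok 1.0.0.
spec = Gem::Specification.new do |s|
  s.name = "buftok"
  s.version = "1.0.0"
  s.summary = "BufferedTokenizer extracts token delimited entities"
  s.metadata = {
    "rubygems_mfa_required" => "true",
    "source_code_uri" => "https://github.com/sferik/buftok"
  }
end

# Tools (and rubygems.org itself) read these keys back like any hash.
puts spec.metadata["source_code_uri"]
```

Note that metadata keys and values must both be Strings, which is why `rubygems_mfa_required` is the string `"true"` rather than a boolean.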
data/lib/buftok.rb CHANGED
@@ -1,59 +1,169 @@
+ # frozen_string_literal: true
+
+ # Statefully split input data by a specifiable token
+ #
  # BufferedTokenizer takes a delimiter upon instantiation, or acts line-based
- # by default. It allows input to be spoon-fed from some outside source which
+ # by default. It allows input to be spoon-fed from some outside source which
  # receives arbitrary length datagrams which may-or-may-not contain the token
- # by which entities are delimited. In this respect it's ideally paired with
- # something like EventMachine (http://rubyeventmachine.com/).
+ # by which entities are delimited.
+ #
+ # @example
+ #   tokenizer = BufferedTokenizer.new("\n")
+ #   tokenizer.extract("foo\nbar") #=> ["foo"]
+ #   tokenizer.extract("baz\n") #=> ["barbaz"]
+ #   tokenizer.flush #=> ""
  class BufferedTokenizer
-   # New BufferedTokenizers will operate on lines delimited by a delimiter,
-   # which is by default the global input delimiter $/ ("\n").
+   # Limit passed to String#split to preserve trailing empty fields
+   SPLIT_LIMIT = -1
+
+   # Return the delimiter overlap length
+   #
+   # The number of characters at the end of a chunk that may contain a
+   # partial delimiter, equal to delimiter.length - 1.
+   #
+   # @example
+   #   BufferedTokenizer.new("<>").overlap #=> 1
+   #
+   # @return [Integer] delimiter.length - 1
+   #
+   # @api public
+   attr_reader :overlap
+
+   # Create a new BufferedTokenizer
+   #
+   # Operates on lines delimited by a delimiter, which is by default "\n".
    #
-   # The input buffer is stored as an array. This is by far the most efficient
+   # The input buffer is stored as an array. This is by far the most efficient
    # approach given language constraints (in C a linked list would be a more
-   # appropriate data structure). Segments of input data are stored in a list
+   # appropriate data structure). Segments of input data are stored in a list
    # which is only joined when a token is reached, substantially reducing the
    # number of objects required for the operation.
-   def initialize(delimiter = $/)
+   #
+   # @example
+   #   tokenizer = BufferedTokenizer.new("<>")
+   #
+   # @param delimiter [String] the token delimiter (default: "\n")
+   #
+   # @return [BufferedTokenizer]
+   #
+   # @api public
+   def initialize(delimiter = "\n")
      @delimiter = delimiter
      @input = []
-     @tail = ''
-     @trim = @delimiter.length - 1
+     @tail = +""
+     @overlap = @delimiter.length - 1
    end

+   # Return the byte size of the internal buffer
+   #
+   # Size is not cached and is determined every time this method is called
+   # in order to optimize throughput for extract.
+   #
+   # @example
+   #   tokenizer = BufferedTokenizer.new
+   #   tokenizer.extract("foo")
+   #   tokenizer.size #=> 3
+   #
+   # @return [Integer]
+   #
+   # @api public
+   def size
+     @tail.length + @input.sum(&:length)
+   end
+
+   # Extract tokenized entities from the input data
+   #
    # Extract takes an arbitrary string of input data and returns an array of
-   # tokenized entities, provided there were any available to extract. This
+   # tokenized entities, provided there were any available to extract. This
    # makes for easy processing of datagrams using a pattern like:
    #
-   #   tokenizer.extract(data).map { |entity| Decode(entity) }.each do ...
+   #   tokenizer.extract(data).map { |entity| Decode(entity) }.each { ... }
    #
-   # Using -1 makes split to return "" if the token is at the end of
+   # Using -1 makes split return "" if the token is at the end of
    # the string, meaning the last element is the start of the next chunk.
+   #
+   # @example
+   #   tokenizer = BufferedTokenizer.new
+   #   tokenizer.extract("foo\nbar") #=> ["foo"]
+   #
+   # @param data [String] a chunk of input data
+   #
+   # @return [Array<String>] complete tokens extracted from the input
+   #
+   # @api public
    def extract(data)
-     if @trim > 0
-       tail_end = @tail.slice!(-@trim, @trim) # returns nil if string is too short
-       data = tail_end + data if tail_end
-     end
+     data = rejoin_split_delimiter(data)

      @input << @tail
-     entities = data.split(@delimiter, -1)
-     @tail = entities.shift
-
-     unless entities.empty?
-       @input << @tail
-       entities.unshift @input.join
-       @input.clear
-       @tail = entities.pop
-     end
+     entities = data.split(@delimiter, SPLIT_LIMIT)
+     @tail = entities.shift # : String
+
+     consolidate_input(entities) if entities.length.positive?

      entities
    end

-   # Flush the contents of the input buffer, i.e. return the input buffer even though
-   # a token has not yet been encountered
+   # Flush the contents of the input buffer
+   #
+   # Return the contents of the input buffer even though a token has not
+   # yet been encountered, then reset the buffer.
+   #
+   # @example
+   #   tokenizer = BufferedTokenizer.new
+   #   tokenizer.extract("foo\nbar")
+   #   tokenizer.flush #=> "bar"
+   #
+   # @return [String] the buffered input
+   #
+   # @api public
    def flush
      @input << @tail
      buffer = @input.join
      @input.clear
-     @tail = "" # @tail.clear is slightly faster, but not supported on 1.8.7
+     @tail = +""
      buffer
    end
+
+   private
+
+   # Rejoin a delimiter that was split across two chunks
+   #
+   # When the delimiter is longer than one character, it may be split across
+   # two successive chunks. Transfer the trailing overlap from @tail back onto
+   # the front of the incoming data so that split can find the full delimiter.
+   #
+   # @param data [String] incoming data
+   #
+   # @return [String] data with any split delimiter prefix restored
+   #
+   # @api private
+   def rejoin_split_delimiter(data)
+     if @overlap.positive?
+       tail_end = @tail[-@overlap..]
+       @tail.slice!(-@overlap, @overlap)
+       tail_end ? tail_end + data : data
+     else
+       data
+     end
+   end
+
+   # Consolidate the input buffer into the first entity
+   #
+   # Once at least one delimiter has been found, join the accumulated input
+   # buffer with the first entity and move the trailing partial into @tail.
+   #
+   # @param entities [Array<String>] split entities
+   #
+   # @return [void]
+   #
+   # @api private
+   def consolidate_input(entities)
+     @input << @tail
+     entities.unshift @input.join
+     @input.clear
+     @tail = entities.pop # : String
+   end
  end
+
+ # Alias for {BufferedTokenizer}, matching the gem name
+ Buftok = BufferedTokenizer
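To see the refactored 1.0.0 implementation run end-to-end, here is a condensed, self-contained copy of the class as it appears in the new `lib/buftok.rb` above (YARD comments trimmed), followed by the documented examples:

```ruby
# frozen_string_literal: true

# Condensed from the new lib/buftok.rb in this diff (doc comments trimmed).
class BufferedTokenizer
  SPLIT_LIMIT = -1 # preserve trailing empty fields in String#split

  attr_reader :overlap

  def initialize(delimiter = "\n")
    @delimiter = delimiter
    @input = []
    @tail = +""
    @overlap = @delimiter.length - 1
  end

  def size
    @tail.length + @input.sum(&:length)
  end

  def extract(data)
    data = rejoin_split_delimiter(data)
    @input << @tail
    entities = data.split(@delimiter, SPLIT_LIMIT)
    @tail = entities.shift
    consolidate_input(entities) if entities.length.positive?
    entities
  end

  def flush
    @input << @tail
    buffer = @input.join
    @input.clear
    @tail = +""
    buffer
  end

  private

  # Re-attach a possible partial delimiter left over from the previous chunk.
  def rejoin_split_delimiter(data)
    return data unless @overlap.positive?

    tail_end = @tail[-@overlap..]
    @tail.slice!(-@overlap, @overlap)
    tail_end ? tail_end + data : data
  end

  # Join the buffered segments into the first complete entity.
  def consolidate_input(entities)
    @input << @tail
    entities.unshift @input.join
    @input.clear
    @tail = entities.pop
  end
end

Buftok = BufferedTokenizer

tokenizer = Buftok.new
p tokenizer.extract("foo\nbar") #=> ["foo"]
p tokenizer.size                #=> 3
p tokenizer.extract("baz\n")    #=> ["barbaz"]
p tokenizer.flush               #=> ""
```

The `rejoin_split_delimiter` call is what makes multi-character delimiters work across chunk boundaries: the last `overlap` characters of the buffered tail are moved back onto the front of the incoming data before splitting.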
metadata CHANGED
@@ -1,75 +1,59 @@
  --- !ruby/object:Gem::Specification
  name: buftok
  version: !ruby/object:Gem::Version
-   version: 0.2.0
- prerelease:
+   version: 1.0.0
  platform: ruby
  authors:
  - Tony Arcieri
  - Martin Emde
- - Erik Michaels-Ober
- autorequire:
+ - Erik Berlin
  bindir: bin
  cert_chain: []
- date: 2013-11-22 00:00:00.000000000 Z
- dependencies:
- - !ruby/object:Gem::Dependency
-   name: bundler
-   requirement: !ruby/object:Gem::Requirement
-     none: false
-     requirements:
-     - - ~>
-       - !ruby/object:Gem::Version
-         version: '1.0'
-   type: :development
-   prerelease: false
-   version_requirements: !ruby/object:Gem::Requirement
-     none: false
-     requirements:
-     - - ~>
-       - !ruby/object:Gem::Version
-         version: '1.0'
+ date: 1980-01-02 00:00:00.000000000 Z
+ dependencies: []
  description: BufferedTokenizer extracts token delimited entities from a sequence of
-   arbitrary inputs
- email: sferik@gmail.com
+   string inputs
+ email:
+ - sferik@gmail.com
+ - martin.emde@gmail.com
  executables: []
  extensions: []
  extra_rdoc_files: []
  files:
+ - CHANGELOG.md
  - CONTRIBUTING.md
- - Gemfile
- - LICENSE.md
+ - LICENSE.txt
  - README.md
- - Rakefile
  - buftok.gemspec
  - lib/buftok.rb
- - test/test_buftok.rb
  homepage: https://github.com/sferik/buftok
  licenses:
  - MIT
- post_install_message:
+ metadata:
+   allowed_push_host: https://rubygems.org
+   bug_tracker_uri: https://github.com/sferik/buftok/issues
+   changelog_uri: https://github.com/sferik/buftok/blob/master/CHANGELOG.md
+   documentation_uri: https://rubydoc.info/gems/buftok/
+   funding_uri: https://github.com/sponsors/sferik/
+   homepage_uri: https://github.com/sferik/buftok
+   rubygems_mfa_required: 'true'
+   source_code_uri: https://github.com/sferik/buftok
  rdoc_options: []
  require_paths:
  - lib
  required_ruby_version: !ruby/object:Gem::Requirement
-   none: false
    requirements:
-   - - ! '>='
+   - - ">="
      - !ruby/object:Gem::Version
-       version: '0'
+       version: '3.2'
  required_rubygems_version: !ruby/object:Gem::Requirement
-   none: false
    requirements:
-   - - ! '>='
+   - - ">="
      - !ruby/object:Gem::Version
-       version: 1.3.5
+       version: '3.0'
  requirements: []
- rubyforge_project:
- rubygems_version: 1.8.23
- signing_key:
- specification_version: 3
- summary: BufferedTokenizer extracts token delimited entities from a sequence of arbitrary
+ rubygems_version: 4.0.6
+ specification_version: 4
+ summary: BufferedTokenizer extracts token delimited entities from a sequence of string
    inputs
- test_files:
- - test/test_buftok.rb
- has_rdoc:
+ test_files: []
data/Gemfile DELETED
@@ -1,6 +0,0 @@
- source 'https://rubygems.org'
-
- gem 'rake'
- gem 'rdoc'
-
- gemspec
data/LICENSE.md DELETED
@@ -1,56 +0,0 @@
- Ruby is copyrighted free software by Yukihiro Matsumoto <matz@netlab.jp>.
- You can redistribute it and/or modify it under either the terms of the
- 2-clause BSDL (see the file BSDL), or the conditions below:
-
- 1. You may make and give away verbatim copies of the source form of the
-    software without restriction, provided that you duplicate all of the
-    original copyright notices and associated disclaimers.
-
- 2. You may modify your copy of the software in any way, provided that
-    you do at least ONE of the following:
-
-    a) place your modifications in the Public Domain or otherwise
-       make them Freely Available, such as by posting said
-       modifications to Usenet or an equivalent medium, or by allowing
-       the author to include your modifications in the software.
-
-    b) use the modified software only within your corporation or
-       organization.
-
-    c) give non-standard binaries non-standard names, with
-       instructions on where to get the original software distribution.
-
-    d) make other distribution arrangements with the author.
-
- 3. You may distribute the software in object code or binary form,
-    provided that you do at least ONE of the following:
-
-    a) distribute the binaries and library files of the software,
-       together with instructions (in the manual page or equivalent)
-       on where to get the original distribution.
-
-    b) accompany the distribution with the machine-readable source of
-       the software.
-
-    c) give non-standard binaries non-standard names, with
-       instructions on where to get the original software distribution.
-
-    d) make other distribution arrangements with the author.
-
- 4. You may modify and include the part of the software into any other
-    software (possibly commercial). But some files in the distribution
-    are not written by the author, so that they are not under these terms.
-
-    For the list of those files and their copying conditions, see the
-    file LEGAL.
-
- 5. The scripts and library files supplied as input to or produced as
-    output from the software do not automatically fall under the
-    copyright of the software, but belong to whomever generated them,
-    and may be sold commercially, and may be aggregated with this
-    software.
-
- 6. THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR
-    IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
-    WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-    PURPOSE.
data/Rakefile DELETED
@@ -1,66 +0,0 @@
- require 'bundler'
- require 'rdoc/task'
- require 'rake/testtask'
-
- task :default => :test
-
- Bundler::GemHelper.install_tasks
-
- RDoc::Task.new do |task|
-   task.rdoc_dir = 'doc'
-   task.title = 'BufferedTokenizer'
-   task.rdoc_files.include('lib/**/*.rb')
- end
-
- Rake::TestTask.new :test do |t|
-   t.libs << 'lib'
-   t.test_files = FileList['test/**/*.rb']
- end
-
- desc "Benchmark the current implementation"
- task :bench do
-   require 'benchmark'
-   require File.expand_path('lib/buftok', File.dirname(__FILE__))
-
-   n = 50000
-   delimiter = "\n\n"
-
-   frequency1 = 1000
-   puts "generating #{n} strings, with #{delimiter.inspect} every #{frequency1} strings..."
-   data1 = (0...n).map do |i|
-     (((i % frequency1 == 1) ? "\n" : "") +
-      ("s" * i) +
-      ((i % frequency1 == 0) ? "\n" : "")).freeze
-   end
-
-   frequency2 = 10
-   puts "generating #{n} strings, with #{delimiter.inspect} every #{frequency2} strings..."
-   data2 = (0...n).map do |i|
-     (((i % frequency2 == 1) ? "\n" : "") +
-      ("s" * i) +
-      ((i % frequency2 == 0) ? "\n" : "")).freeze
-   end
-
-   Benchmark.bmbm do |x|
-     x.report("1 char, freq: #{frequency1}") do
-       bt1 = BufferedTokenizer.new
-       n.times { |i| bt1.extract(data1[i]) }
-     end
-
-     x.report("2 char, freq: #{frequency1}") do
-       bt2 = BufferedTokenizer.new(delimiter)
-       n.times { |i| bt2.extract(data1[i]) }
-     end
-
-     x.report("1 char, freq: #{frequency2}") do
-       bt3 = BufferedTokenizer.new
-       n.times { |i| bt3.extract(data2[i]) }
-     end
-
-     x.report("2 char, freq: #{frequency2}") do
-       bt4 = BufferedTokenizer.new(delimiter)
-       n.times { |i| bt4.extract(data2[i]) }
-     end
-   end
- end
data/test/test_buftok.rb DELETED
@@ -1,27 +0,0 @@
- require 'test/unit'
- require 'buftok'
-
- class TestBuftok < Test::Unit::TestCase
-   def test_buftok
-     tokenizer = BufferedTokenizer.new
-     assert_equal %w[foo], tokenizer.extract("foo\nbar".freeze)
-     assert_equal %w[barbaz qux], tokenizer.extract("baz\nqux\nquu".freeze)
-     assert_equal 'quu', tokenizer.flush
-     assert_equal '', tokenizer.flush
-   end
-
-   def test_delimiter
-     tokenizer = BufferedTokenizer.new('<>')
-     assert_equal ['', "foo\n"], tokenizer.extract("<>foo\n<>".freeze)
-     assert_equal %w[bar], tokenizer.extract('bar<>baz'.freeze)
-     assert_equal 'baz', tokenizer.flush
-   end
-
-   def test_split_delimiter
-     tokenizer = BufferedTokenizer.new('<>'.freeze)
-     assert_equal [], tokenizer.extract('foo<'.freeze)
-     assert_equal %w[foo], tokenizer.extract('>bar<'.freeze)
-     assert_equal %w[bar<baz qux], tokenizer.extract('baz<>qux<>'.freeze)
-     assert_equal '', tokenizer.flush
-   end
- end