RubyGems - buftok - Versions diffs - 0.3.0 → 1.0.1 - Mend

buftok 0.3.0 → 1.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 67c39aeda72dd14dc738490d14c121acd8591050057f488517c0a92039187179
-  data.tar.gz: dd6f4e0460ac0c2c076d9d4d05e91bdd29a2667167872f2850127b3ed7f72118
+  metadata.gz: ccb8527aed71cad86076d553ad95bb5034ff41e0cce2138e58c70c92525a0f94
+  data.tar.gz: 8cb51801dc3975bf6719278edb7237482483ff75d8d6a51a2726e5306453595f
 SHA512:
-  metadata.gz: 9a2db2dffe2660fcb5ec89e813b8a953cdbdebd530184be5bf88f35dbf7bfd06dd15e5441b6e4316b6a23a83458d43df90212f7a30f9a2d25de5b49609ff6857
-  data.tar.gz: c8312db37a322e718163142e1246f909eca8e50f07bde78b706d46b327c035bf2c4d3e7ebb59d10b7cdeb5dbac29dcf7b9b8d1cad2ab5e7702f65ebfde315d79
+  metadata.gz: 9813d806b60d57ad1ba3729acaa48e8f25a94a84d96b08217911638d3866b4deb14f0fdca0fc0b5ea5563557367bc3165d77f3e707a26a858549a360f0fe17a2
+  data.tar.gz: 74df77109d7b1733a643d70ab310f086983e8913cd926c15ec79d0221263bace433ec3863f826bef41022b729a2b4b52bbcef694aacaf86ab6333a5a163b46df
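
Each algorithm lists one digest for `metadata.gz` and one for `data.tar.gz`, the two members of the `.gem` archive. A minimal sketch of how such a value could be checked locally, assuming the member's bytes have already been read into a string. The `sha256_matches?` helper and the empty-input sample are hypothetical and not part of the package:

```ruby
require "digest"

# Hypothetical helper: true when the SHA256 of `bytes` equals `expected_hex`.
def sha256_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# Stand-in data: the SHA256 of empty input is a well-known constant.
# For a real check, pass File.binread("data.tar.gz") and the value
# listed under SHA256 in checksums.yaml.
EMPTY_INPUT_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

puts sha256_matches?("", EMPTY_INPUT_SHA256)
```

The same pattern applies to the SHA512 entries by swapping in `Digest::SHA512`.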

data/CHANGELOG.md ADDED

@@ -0,0 +1,100 @@
+# Changelog
+All notable changes to this project will be documented in this file.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [Unreleased]
+## [1.0.1] - 2026-03-20
+### Changed
+- Improve gem push workflow security and reliability
+  - Add top-level `permissions: contents: read` and scope `contents: write` to the job
+  - Restrict workflow to `sferik/buftok` repository
+  - Add `rubygems.org` deployment environment
+  - Pin `rubygems/configure-rubygems-credentials` to v1.0.0
+  - Sign gem with Sigstore before pushing
+  - Push gem with `--attestation` flag
+  - Simplify release steps and remove manual git config
+## [1.0.0] - 2026-03-20
+### Added
+- RBS type signatures in `sig/buftok.rbs` with Steep for strict type checking
+- RuboCop, Standard, rubocop-minitest, rubocop-performance, and rubocop-rake for linting
+- Mutant for mutation testing with 100% coverage
+- GitHub Actions workflows for linting, type checking, and mutation testing
+- `.github/FUNDING.yml` for GitHub Sponsors
+- Gemspec metadata (`allowed_push_host`, `changelog_uri`, `documentation_uri`,
+  `funding_uri`, `homepage_uri`, `rubygems_mfa_required`, `source_code_uri`,
+  `bug_tracker_uri`)
+- `CHANGELOG.md`
+### Changed
+- Require Ruby >= 3.2
+- Require RubyGems >= 3.0
+- Test against Ruby 3.2, 3.3, 3.4, and 4.0 (drop EOL 2.6, 2.7, 3.0)
+- Update `actions/checkout` to v6 and `ruby/setup-ruby` to v1
+- Replace test-unit with Minitest 6
+- Replace `inject` with `sum` in `size` method
+- Use `@tail.clear` instead of `String.new` in `flush` (drop Ruby 1.8.7 workaround)
+- Move development dependencies from gemspec to Gemfile
+- Bump rake from `~> 10.0` to `>= 13`
+- Extract `rejoin_split_delimiter` and `consolidate_input` private methods
+- Update copyright years to 2006-2026
+- Rename Erik Michaels-Ober to Erik Berlin
+### Fixed
+- Typo in test comment ("Desipte" -> "Despite")
+## [0.3.0] - 2021-03-25
+### Added
+- `Buftok` constant as an alias for `BufferedTokenizer`
+- `BufferedTokenizer#size` method to determine internal buffer size
+- GitHub Actions CI workflow
+- Support for `frozen_string_literal`
+### Changed
+- Replace Ruby license with MIT license
+- Modernize gemspec
+- Remove Travis CI in favor of GitHub Actions
+- Update supported Ruby versions to 2.6, 2.7, 3.0
+## [0.2.0] - 2013-11-22
+### Added
+- Tests
+- Benchmark rake task
+- Support for multi-character delimiters split across chunks
+- Section on supported Ruby versions in README
+### Changed
+- Use global input delimiter `$/` as default instead of hard-coded `"\n"`
+- Unified handling of single/multi-character delimiters
+## [0.1.0] - 2013-11-20
+### Added
+- Initial release of BufferedTokenizer
+- Line-based tokenization with configurable delimiter
+- `extract` method for incremental tokenization
+- `flush` method to retrieve remaining buffer contents
+[Unreleased]: https://github.com/sferik/buftok/compare/v1.0.1...HEAD
+[1.0.1]: https://github.com/sferik/buftok/compare/v1.0.0...v1.0.1
+[1.0.0]: https://github.com/sferik/buftok/compare/v0.3.0...v1.0.0
+[0.3.0]: https://github.com/sferik/buftok/compare/v0.2.0...v0.3.0
+[0.2.0]: https://github.com/sferik/buftok/compare/v0.1...v0.2.0
+[0.1.0]: https://github.com/sferik/buftok/releases/tag/v0.1
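
The 1.0.0 entry "Replace `inject` with `sum` in `size` method" is a behavior-preserving rewrite. A small sketch of the equivalence, using hypothetical method names and sample buffer contents that are not taken from the gem:

```ruby
# Old style: fold the segment lengths with inject.
def size_via_inject(tail, input)
  tail.length + input.inject(0) { |total, segment| total + segment.length }
end

# New style: let Enumerable#sum add up the lengths.
def size_via_sum(tail, input)
  tail.length + input.sum(&:length)
end

# Hypothetical buffer state: a tail plus previously buffered segments.
tail = "ba"
input = ["foo", "", "r"]

puts size_via_inject(tail, input) == size_via_sum(tail, input)
```

Both forms return 0 for the segments of an empty buffer, because `inject(0)` starts from 0 and `sum` defaults to 0.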

data/LICENSE.txt CHANGED

@@ -1,6 +1,6 @@
 The MIT License (MIT)
-Copyright (c) 2021 Tony Arcieri, Martin Emde, Erik Michaels-Ober
+Copyright (c) 2006-2026 Tony Arcieri, Martin Emde, Erik Berlin
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

data/README.md CHANGED

@@ -1,34 +1,116 @@
 # BufferedTokenizer
 [![Gem Version](http://img.shields.io/gem/v/buftok.svg)][gem]
-[![Build Status](https://github.com/sferik/buftok/actions/workflows/ruby.yml/badge.svg)][build]
+[![Test](https://github.com/sferik/buftok/actions/workflows/test.yml/badge.svg)][test]
+[![Lint](https://github.com/sferik/buftok/actions/workflows/lint.yml/badge.svg)][lint]
+[![Type Check](https://github.com/sferik/buftok/actions/workflows/typecheck.yml/badge.svg)][typecheck]
+[![Mutation Testing](https://github.com/sferik/buftok/actions/workflows/mutant.yml/badge.svg)][mutant]
+[![Documentation Coverage](https://github.com/sferik/buftok/actions/workflows/yardstick.yml/badge.svg)][yardstick]
 [gem]: https://rubygems.org/gems/buftok
-[build]: https://github.com/sferik/buftok/actions
+[test]: https://github.com/sferik/buftok/actions/workflows/test.yml
+[lint]: https://github.com/sferik/buftok/actions/workflows/lint.yml
+[typecheck]: https://github.com/sferik/buftok/actions/workflows/typecheck.yml
+[mutant]: https://github.com/sferik/buftok/actions/workflows/mutant.yml
+[yardstick]: https://github.com/sferik/buftok/actions/workflows/yardstick.yml
 ###### Statefully split input data by a specifiable token
 BufferedTokenizer takes a delimiter upon instantiation, or acts line-based by
-default.  It allows input to be spoon-fed from some outside source which
+default. It allows input to be spoon-fed from some outside source which
 receives arbitrary length datagrams which may-or-may-not contain the token by
-which entities are delimited.  In this respect it's ideally paired with
-something like [EventMachine][].
+which entities are delimited. It's useful any time you need to extract
+delimited messages from a stream of chunked data.
-[EventMachine]: http://rubyeventmachine.com/
+## Examples
+### TCP Server
+Process newline-delimited commands from a TCP client:
+```ruby
+require "socket"
+require "buftok"
+server = TCPServer.new(4000)
+loop do
+  client = server.accept
+  tokenizer = BufferedTokenizer.new("\n")
+  while (data = client.readpartial(4096))
+    tokenizer.extract(data).each do |line|
+      puts "Received: #{line}"
+    end
+  end
+rescue EOFError
+  client.close
+end
+```
+### Streaming IO
+Read a large file in chunks without loading it all into memory:
+```ruby
+require "buftok"
+tokenizer = BufferedTokenizer.new("\n")
+File.open("large_log_file.txt") do |file|
+  while (chunk = file.read(8192))
+    tokenizer.extract(chunk).each do |line|
+      process_log_line(line)
+    end
+  end
+end
+# Don't forget to flush any remaining data
+remaining = tokenizer.flush
+process_log_line(remaining) unless remaining.empty?
+```
+> [!IMPORTANT]
+> Always call `flush` when you're done reading from the stream to process any
+> remaining data that didn't end with a delimiter.
+### Custom Delimiters
+Parse a stream using a multi-character delimiter:
+```ruby
+require "buftok"
+tokenizer = BufferedTokenizer.new("\r\n\r\n")
+chunks = ["HTTP/1.1 200 OK\r\n", "Content-Type: text/plain\r\n\r\n", "Hello"]
+chunks.each do |chunk|
+  tokenizer.extract(chunk).each do |headers|
+    puts "Headers: #{headers}"
+  end
+end
+puts "Body so far: #{tokenizer.flush}"
+```
+> [!TIP]
+> Multi-character delimiters that get split across chunks are handled
+> automatically — no special handling is needed on your end.
 ## Supported Ruby Versions
-This library aims to support and is [tested against][build] the following Ruby
+This library aims to support and is [tested against][test] the following Ruby
 implementations:
-* Ruby 2.6
-* Ruby 2.7
-* Ruby 3.0
+* Ruby 3.2
+* Ruby 3.3
+* Ruby 3.4
+* Ruby 4.0
 If something doesn't work on one of these interpreters, it's a bug.
-This code will likely still work on older versions since it has not undergone
-many changes since release. However, support will not be provided for
-end-of-life ruby versions.
+This code will likely still work on older Ruby versions but support will not be
+provided for end-of-life versions.
 If you would like this library to support another Ruby version, you may
 volunteer to be a maintainer. Being a maintainer entails making sure all tests
@@ -38,7 +120,7 @@ fashion. If critical issues for a particular implementation exist at the time
 of a major release, support for that Ruby version may be dropped.
 ## Copyright
-Copyright (c) 2006-2021 Tony Arcieri, Martin Emde, Erik Michaels-Ober.
+Copyright (c) 2006-2026 Tony Arcieri, Martin Emde, Erik Berlin.
 Distributed under the [MIT license][license].
 [license]: https://opensource.org/licenses/MIT

data/buftok.gemspec CHANGED

@@ -1,19 +1,28 @@
+# frozen_string_literal: true
 Gem::Specification.new do |spec|
-  spec.version       = "0.3.0"
+  spec.version = "1.0.1"
-  spec.authors       = ["Tony Arcieri", "Martin Emde", "Erik Michaels-Ober"]
-  spec.summary       = %q{BufferedTokenizer extracts token delimited entities from a sequence of string inputs}
-  spec.description   = spec.summary
-  spec.email         = ["sferik@gmail.com", "martin.emde@gmail.com"]
-  spec.files         = %w(CONTRIBUTING.md LICENSE.txt README.md buftok.gemspec) + Dir["lib/**/*.rb"]
-  spec.homepage      = "https://github.com/sferik/buftok"
-  spec.licenses      = ["MIT"]
-  spec.name          = "buftok"
+  spec.authors = ["Tony Arcieri", "Martin Emde", "Erik Berlin"]
+  spec.summary = "BufferedTokenizer extracts token delimited entities from a sequence of string inputs"
+  spec.description = spec.summary
+  spec.email = ["sferik@gmail.com", "martin.emde@gmail.com"]
+  spec.files = %w[CHANGELOG.md CONTRIBUTING.md LICENSE.txt README.md buftok.gemspec] + Dir["lib/**/*.rb"]
+  spec.homepage = "https://github.com/sferik/buftok"
+  spec.licenses = ["MIT"]
+  spec.name = "buftok"
   spec.require_paths = ["lib"]
-  spec.required_rubygems_version = ">= 1.3.5"
+  spec.required_ruby_version = ">= 3.2"
+  spec.required_rubygems_version = ">= 3.0"
-  spec.add_development_dependency "bundler", ">= 1.17"
-  spec.add_development_dependency "rake", "~> 10.0"
-  spec.add_development_dependency "rdoc"
-  spec.add_development_dependency "test-unit"
+  spec.metadata = {
+    "allowed_push_host" => "https://rubygems.org",
+    "bug_tracker_uri" => "#{spec.homepage}/issues",
+    "changelog_uri" => "#{spec.homepage}/blob/master/CHANGELOG.md",
+    "documentation_uri" => "https://rubydoc.info/gems/buftok/",
+    "funding_uri" => "https://github.com/sponsors/sferik/",
+    "homepage_uri" => spec.homepage,
+    "rubygems_mfa_required" => "true",
+    "source_code_uri" => spec.homepage
+  }
 end
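
The version constraints in this hunk (`required_ruby_version = ">= 3.2"`, and the old `rake` pin `~> 10.0` that the changelog says became `>= 13` in the Gemfile) can be explored with RubyGems' own requirement classes. A short sketch; the `satisfies?` helper and the sample version numbers are illustrative only:

```ruby
require "rubygems"

# Hypothetical helper: does `version` satisfy the requirement string?
def satisfies?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

# Ruby versions against the new minimum.
puts satisfies?(">= 3.2", "3.1.6") # false: below the new floor
puts satisfies?(">= 3.2", "3.4.0") # true

# The pessimistic operator "~> 10.0" means >= 10.0 and < 11.0,
# so it excludes current rake releases; ">= 13" admits them.
puts satisfies?("~> 10.0", "13.0") # false
puts satisfies?(">= 13", "13.2.1") # true
```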

data/lib/buftok.rb CHANGED

@@ -1,72 +1,169 @@
 # frozen_string_literal: true
+# Statefully split input data by a specifiable token
 #
 # BufferedTokenizer takes a delimiter upon instantiation, or acts line-based
-# by default.  It allows input to be spoon-fed from some outside source which
+# by default. It allows input to be spoon-fed from some outside source which
 # receives arbitrary length datagrams which may-or-may-not contain the token
-# by which entities are delimited.  In this respect it's ideally paired with
-# something like EventMachine (http://rubyeventmachine.com/).
+# by which entities are delimited.
+#
+# @example
+#   tokenizer = BufferedTokenizer.new("\n")
+#   tokenizer.extract("foo\nbar")  #=> ["foo"]
+#   tokenizer.extract("baz\n")     #=> ["barbaz"]
+#   tokenizer.flush                 #=> ""
 class BufferedTokenizer
-  # New BufferedTokenizers will operate on lines delimited by a delimiter,
-  # which is by default the global input delimiter $/ ("\n").
+  # Limit passed to String#split to preserve trailing empty fields
+  SPLIT_LIMIT = -1
+  # Return the delimiter overlap length
+  #
+  # The number of characters at the end of a chunk that may contain a
+  # partial delimiter, equal to delimiter.length - 1.
+  #
+  # @example
+  #   BufferedTokenizer.new("<>").overlap  #=> 1
+  #
+  # @return [Integer] delimiter.length - 1
   #
-  # The input buffer is stored as an array.  This is by far the most efficient
+  # @api public
+  attr_reader :overlap
+  # Create a new BufferedTokenizer
+  #
+  # Operates on lines delimited by a delimiter, which is by default "\n".
+  #
+  # The input buffer is stored as an array. This is by far the most efficient
   # approach given language constraints (in C a linked list would be a more
-  # appropriate data structure).  Segments of input data are stored in a list
+  # appropriate data structure). Segments of input data are stored in a list
   # which is only joined when a token is reached, substantially reducing the
   # number of objects required for the operation.
-  def initialize(delimiter = $/)
+  #
+  # @example
+  #   tokenizer = BufferedTokenizer.new("<>")
+  #
+  # @param delimiter [String] the token delimiter (default: "\n")
+  #
+  # @return [BufferedTokenizer]
+  #
+  # @api public
+  def initialize(delimiter = "\n")
     @delimiter = delimiter
     @input = []
-    @tail = String.new
-    @trim = @delimiter.length - 1
+    @tail = +""
+    @overlap = @delimiter.length - 1
   end
-  # Determine the size of the internal buffer.
+  # Return the byte size of the internal buffer
   #
   # Size is not cached and is determined every time this method is called
   # in order to optimize throughput for extract.
+  #
+  # @example
+  #   tokenizer = BufferedTokenizer.new
+  #   tokenizer.extract("foo")
+  #   tokenizer.size  #=> 3
+  #
+  # @return [Integer]
+  #
+  # @api public
   def size
-    @tail.length + @input.inject(0) { |total, input| total + input.length }
+    @tail.length + @input.sum(&:length)
   end
+  # Extract tokenized entities from the input data
+  #
   # Extract takes an arbitrary string of input data and returns an array of
-  # tokenized entities, provided there were any available to extract.  This
+  # tokenized entities, provided there were any available to extract. This
   # makes for easy processing of datagrams using a pattern like:
   #
-  #   tokenizer.extract(data).map { |entity| Decode(entity) }.each do ...
+  #   tokenizer.extract(data).map { |entity| Decode(entity) }.each { ... }
   #
-  # Using -1 makes split to return "" if the token is at the end of
+  # Using -1 makes split return "" if the token is at the end of
   # the string, meaning the last element is the start of the next chunk.
+  #
+  # @example
+  #   tokenizer = BufferedTokenizer.new
+  #   tokenizer.extract("foo\nbar")  #=> ["foo"]
+  #
+  # @param data [String] a chunk of input data
+  #
+  # @return [Array<String>] complete tokens extracted from the input
+  #
+  # @api public
   def extract(data)
-    if @trim > 0
-      tail_end = @tail.slice!(-@trim, @trim) # returns nil if string is too short
-      data = tail_end + data if tail_end
-    end
+    data = rejoin_split_delimiter(data)
     @input << @tail
-    entities = data.split(@delimiter, -1)
-    @tail = entities.shift
+    entities = data.split(@delimiter, SPLIT_LIMIT)
+    @tail = entities.shift # : String
-    unless entities.empty?
-      @input << @tail
-      entities.unshift @input.join
-      @input.clear
-      @tail = entities.pop
-    end
+    consolidate_input(entities) if entities.length.positive?
     entities
   end
-  # Flush the contents of the input buffer, i.e. return the input buffer even though
-  # a token has not yet been encountered
+  # Flush the contents of the input buffer
+  #
+  # Return the contents of the input buffer even though a token has not
+  # yet been encountered, then reset the buffer.
+  #
+  # @example
+  #   tokenizer = BufferedTokenizer.new
+  #   tokenizer.extract("foo\nbar")
+  #   tokenizer.flush  #=> "bar"
+  #
+  # @return [String] the buffered input
+  #
+  # @api public
   def flush
     @input << @tail
     buffer = @input.join
     @input.clear
-    @tail = String.new # @tail.clear is slightly faster, but not supported on 1.8.7
+    @tail = +""
     buffer
   end
+  private
+  # Rejoin a delimiter that was split across two chunks
+  #
+  # When the delimiter is longer than one character, it may be split across
+  # two successive chunks. Transfer the trailing overlap from @tail back onto
+  # the front of the incoming data so that split can find the full delimiter.
+  #
+  # @param data [String] incoming data
+  #
+  # @return [String] data with any split delimiter prefix restored
+  #
+  # @api private
+  def rejoin_split_delimiter(data)
+    if @overlap.positive?
+      tail_end = @tail[-@overlap..]
+      @tail.slice!(-@overlap, @overlap)
+      tail_end ? tail_end + data : data
+    else
+      data
+    end
+  end
+  # Consolidate the input buffer into the first entity
+  #
+  # Once at least one delimiter has been found, join the accumulated input
+  # buffer with the first entity and move the trailing partial into @tail.
+  #
+  # @param entities [Array<String>] split entities
+  #
+  # @return [void]
+  #
+  # @api private
+  def consolidate_input(entities)
+    @input << @tail
+    entities.unshift @input.join
+    @input.clear
+    @tail = entities.pop # : String
+  end
 end
-# The expected constant for a gem named buftok
+# Alias for {BufferedTokenizer}, matching the gem name
 Buftok = BufferedTokenizer
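
To see how the extracted `rejoin_split_delimiter` and `consolidate_input` helpers cooperate, here is a standalone sketch that copies the 1.0.1 side of this hunk into a class with the hypothetical name `SketchTokenizer`, chosen so it cannot clash with the real gem. Doc comments, `size`, and the `overlap` reader are omitted, and the early `return` in `rejoin_split_delimiter` is a stylistic variation, not the gem's exact text:

```ruby
class SketchTokenizer
  # -1 keeps trailing empty fields, so "a\n".split("\n", -1) is ["a", ""].
  SPLIT_LIMIT = -1

  def initialize(delimiter = "\n")
    @delimiter = delimiter
    @input = []
    @tail = +""
    @overlap = @delimiter.length - 1
  end

  # Returns the complete tokens found so far; partial data stays buffered.
  def extract(data)
    data = rejoin_split_delimiter(data)
    @input << @tail
    entities = data.split(@delimiter, SPLIT_LIMIT)
    @tail = entities.shift
    consolidate_input(entities) if entities.length.positive?
    entities
  end

  # Returns whatever is buffered and resets the buffer.
  def flush
    @input << @tail
    buffer = @input.join
    @input.clear
    @tail = +""
    buffer
  end

  private

  # Move the last `@overlap` characters of the tail onto the new data so a
  # delimiter split across two chunks becomes visible to String#split.
  def rejoin_split_delimiter(data)
    return data unless @overlap.positive?

    tail_end = @tail[-@overlap..]
    @tail.slice!(-@overlap, @overlap)
    tail_end ? tail_end + data : data
  end

  # Join the buffered segments into the first token and keep the trailing
  # partial token as the new tail.
  def consolidate_input(entities)
    @input << @tail
    entities.unshift(@input.join)
    @input.clear
    @tail = entities.pop
  end
end

tokenizer = SketchTokenizer.new("<>")
p tokenizer.extract("foo<") # delimiter is incomplete, nothing to return yet
p tokenizer.extract(">bar") # the rejoined "<>" completes the token "foo"
p tokenizer.flush           # the undelimited remainder "bar"
```

The non-empty chunks are deliberate: this sketch makes no claim about how the code handles empty input strings.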

metadata CHANGED

@@ -1,73 +1,16 @@
 --- !ruby/object:Gem::Specification
 name: buftok
 version: !ruby/object:Gem::Version
-  version: 0.3.0
+  version: 1.0.1
 platform: ruby
 authors:
 - Tony Arcieri
 - Martin Emde
-- Erik Michaels-Ober
-autorequire:
+- Erik Berlin
 bindir: bin
 cert_chain: []
-date: 2021-03-25 00:00:00.000000000 Z
-dependencies:
-- !ruby/object:Gem::Dependency
-  name: bundler
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '1.17'
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '1.17'
-- !ruby/object:Gem::Dependency
-  name: rake
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '10.0'
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '10.0'
-- !ruby/object:Gem::Dependency
-  name: rdoc
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-- !ruby/object:Gem::Dependency
-  name: test-unit
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
+date: 1980-01-02 00:00:00.000000000 Z
+dependencies: []
 description: BufferedTokenizer extracts token delimited entities from a sequence of
   string inputs
 email:
@@ -77,6 +20,7 @@ executables: []
 extensions: []
 extra_rdoc_files: []
 files:
+- CHANGELOG.md
 - CONTRIBUTING.md
 - LICENSE.txt
 - README.md
@@ -85,8 +29,15 @@ files:
 homepage: https://github.com/sferik/buftok
 licenses:
 - MIT
-metadata: {}
-post_install_message:
+metadata:
+  allowed_push_host: https://rubygems.org
+  bug_tracker_uri: https://github.com/sferik/buftok/issues
+  changelog_uri: https://github.com/sferik/buftok/blob/master/CHANGELOG.md
+  documentation_uri: https://rubydoc.info/gems/buftok/
+  funding_uri: https://github.com/sponsors/sferik/
+  homepage_uri: https://github.com/sferik/buftok
+  rubygems_mfa_required: 'true'
+  source_code_uri: https://github.com/sferik/buftok
 rdoc_options: []
 require_paths:
 - lib
@@ -94,15 +45,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: '0'
+      version: '3.2'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: 1.3.5
+      version: '3.0'
 requirements: []
-rubygems_version: 3.2.3
-signing_key:
+rubygems_version: 4.0.8
 specification_version: 4
 summary: BufferedTokenizer extracts token delimited entities from a sequence of string
   inputs
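
The `date` field moves from the 0.3.0 release date to `1980-01-02`. The diff does not say why. One plausible reading, stated here as an assumption, is that the gem was built with a RubyGems version that substitutes a fixed timestamp for reproducible builds when `SOURCE_DATE_EPOCH` is unset. The sketch below only shows that the epoch value 315619200, assumed to be that default, corresponds to the date in the metadata; `epoch_to_date` is a hypothetical helper:

```ruby
# Assumption, not stated in the diff: this is the fixed build timestamp.
ASSUMED_DEFAULT_EPOCH = 315_619_200

# Hypothetical helper: format epoch seconds the way the metadata shows dates.
def epoch_to_date(epoch)
  Time.at(Integer(epoch)).utc.strftime("%Y-%m-%d %H:%M:%S UTC")
end

puts epoch_to_date(ASSUMED_DEFAULT_EPOCH)
```

If that reading is right, the field no longer indicates when 1.0.1 was published; the changelog entry dated 2026-03-20 is the better source for that.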