RubyGems - token_estimator - Versions diffs - 0.1.0 → 0.1.2 - Mend

token_estimator 0.1.0 → 0.1.2

This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.

Files changed (8)

checksums.yaml +4 -4
data/CHANGELOG.md +4 -0
data/README.md +73 -0
data/lib/token_estimator/version.rb +1 -1
data/lib/token_estimator.rb +2 -0
data/token_estimator-0.1.0.gem +0 -0
data/token_estimator-0.1.1.gem +0 -0
metadata +4 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: c4f8fd7ec54ffb7145d793b94eada2a34aa331a562cbb7b52eee0f74fcd6c181
-  data.tar.gz: b61a2e7c7ce96b60267bcde6d67fedfe5330ad9f4d021b2d9a4b30fbccbaf6e9
+  metadata.gz: e76555160bf7038e96625963f99f434979da61bab11f70c0e7293d83db8c1588
+  data.tar.gz: 46d04371c15c88c39d96f87223c7a0713f9ceae367c52a48b50ca59b57d417e5
 SHA512:
-  metadata.gz: 19333d9dce63923d2490c525b723a1585de5107eb16d5021c05ea1f461d0ad7659435b0d07df92d8e4b209f79501b41386132b604158dc20c60b2db2f571680d
-  data.tar.gz: 2790194ec3c05d191dbbdc0037e01567a9e80236dc872eaf335c93738d010b5ec414bb04872d0fa22cc2f1df11cfa2c49888d002d0692460c766e8d34c48abaa
+  metadata.gz: 110fb3caff83e609a5b4fa9e28ff92c1b0a866d9f30b669ea72b071e36f80e96377c451581dd94fa1155280311c9fd2fbb9ab5e9516c66c6c6b507aaba4972c7
+  data.tar.gz: 339ca2a1c63812c424cc2d1771d4dd665151ca069965511e469dd17587764a31957f463a2c5f5222e677413c9b1117f5331c1bd2762bd4d7c96260496e586f94
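The digests in this hunk cover the `metadata.gz` and `data.tar.gz` members of the gem archive. As a minimal sketch of how such a value could be checked locally, the helper below compares a file's SHA256 hex digest with an expected string. The helper name `sha256_matches?` and the temporary demo file are illustrative assumptions and are not part of the gem or of this diff.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: true when the file's SHA256 hex digest equals expected_hex.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# Demo with a throwaway file; a real check would point at an extracted
# data.tar.gz and use the digest listed in checksums.yaml.
Tempfile.create("payload") do |file|
  file.write("example payload")
  file.flush
  expected = Digest::SHA256.hexdigest("example payload")
  puts sha256_matches?(file.path, expected)
end
```

The same pattern should work for the SHA512 entries by swapping in `Digest::SHA512`.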

data/CHANGELOG.md CHANGED

@@ -1,5 +1,9 @@
 ## [Unreleased]
+## [0.1.1] - 2024-07-15
+- Added `TokenEstimator::Estimator::SUPPORTED_FILE_TYPES` method to specify supported file types.
 ## [0.1.0] - 2024-07-11
 - Initial release

data/README.md CHANGED

@@ -13,6 +13,68 @@ And then execute:
 bundle install
 ```
+## Methods
+#### `count_tokens_from_text`
+Count tokens from a given text.
+```rb
+    require "token_estimator"
+    tokenizer_name = "gpt2"
+    estimator = TokenEstimator::Estimator.new(tokenizer_name)
+    text = "Your sample text here."
+    token_estimation = estimator.count_tokens_from_text(text)
+    puts "Token estimation: #{token_estimation}"
+```
+#### `count_tokens_from_file`
+Count tokens from a file. The file type is determined by the file extension.
+```rb
+    require "token_estimator"
+    file_path = "spec/fixtures/files/lorem.pdf"
+    tokenizer_name = "gpt2"
+    estimator = TokenEstimator::Estimator.new(tokenizer_name)
+    token_estimation = estimator.count_tokens_from_file(file_path)
+    puts "Token estimation: #{token_estimation}"
+```
+#### `count_tokens_from_excel_file`
+Counts tokens from an Excel (.xlsx) file.
+#### `count_tokens_from_csv_file`
+Counts tokens from a CSV file.
+#### `count_tokens_from_pdf_file`
+Counts tokens from a PDF file.
+#### `count_tokens_from_txt_file`
+Counts tokens from a plain text (.txt) file.
+#### `count_tokens_from_markdown_file`
+Counts tokens from a Markdown (.md) file.
+#### `count_tokens_from_json_file`
+Counts tokens from a JSON file.
+#### `count_tokens_from_html_file`
+Counts tokens from an HTML file.
+#### `count_tokens_from_json`
+Counts tokens from a JSON object.
+#### `count_tokens_from_html`
+Counts tokens from an HTML string.
+#### `TokenEstimator::Estimator::SUPPORTED_FILE_TYPES`
+Return the supported file types.
 ## Roadmap
 Here is a checklist of the formats we currently support for token counting and those we plan to support in the future:
@@ -29,6 +91,17 @@ Here is a checklist of the formats we currently support for token counting and t
 - [ ] PNG
 - [ ] JPG
+## Error Handling
+If you try to count tokens from an unsupported file type, the gem will raise an `UnsupportedFileTypeError`
+```rb
+begin
+  token_count = estimator.count_tokens_from_file("path/to/your/file.unsupported")
+rescue TokenEstimator::UnsupportedFileTypeError => e
+  puts e.message
+end
+```
 ## Contributing
 Contribution directions go here. You can fork the repository, create a new branch, and submit a pull request for review. Please make sure to write tests for your contributions and follow the coding standards set in the project.
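The README hunk above says `count_tokens_from_file` picks a format from the file extension and raises `UnsupportedFileTypeError` otherwise, but the diff does not show how the gem does this. The following is only a sketch of one way such a lookup could work, using the per-format method names listed in the README. The table `METHOD_FOR_EXTENSION`, the helper `method_for`, and the local error class are hypothetical stand-ins.

```ruby
# Hypothetical stand-in for TokenEstimator::UnsupportedFileTypeError.
class UnsupportedFileTypeError < StandardError; end

# Assumed mapping from extension to the README's per-format method names.
# The gem's actual dispatch logic is not visible in this diff.
METHOD_FOR_EXTENSION = {
  ".txt"  => :count_tokens_from_txt_file,
  ".csv"  => :count_tokens_from_csv_file,
  ".pdf"  => :count_tokens_from_pdf_file,
  ".json" => :count_tokens_from_json_file,
  ".md"   => :count_tokens_from_markdown_file,
  ".html" => :count_tokens_from_html_file,
  ".xlsx" => :count_tokens_from_excel_file
}.freeze

# Returns the method name for a path, or raises for an unknown extension.
def method_for(path)
  ext = File.extname(path).downcase
  METHOD_FOR_EXTENSION.fetch(ext) do
    raise UnsupportedFileTypeError, "Unsupported file type: #{ext}"
  end
end

puts method_for("spec/fixtures/files/lorem.pdf")
```

With a real estimator instance, the returned symbol could be passed to `public_send` together with the path.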

data/lib/token_estimator/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module TokenEstimator
-  VERSION = "0.1.0"
+  VERSION = "0.1.2"
 end

data/lib/token_estimator.rb CHANGED

@@ -13,6 +13,8 @@ module TokenEstimator
   class UnsupportedFileTypeError < StandardError; end
   class Estimator
+    SUPPORTED_FILE_TYPES = [".txt", ".csv", ".pdf", ".json", ".md", ".html", ".xlsx"]
     def initialize(tokenizer_name)
       @tokenizer = Tokenizers.from_pretrained(tokenizer_name)
     end
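The new constant makes it possible to check a path before calling the estimator instead of rescuing the error afterwards. The sketch below copies the array from the hunk above into a local constant so it runs without the gem installed. The helper `supported_file?` is a hypothetical name, and the case-insensitive comparison is an assumption rather than documented gem behavior.

```ruby
# Local copy of the array added in this hunk; with the gem loaded this would be
# TokenEstimator::Estimator::SUPPORTED_FILE_TYPES.
SUPPORTED_FILE_TYPES = [".txt", ".csv", ".pdf", ".json", ".md", ".html", ".xlsx"].freeze

# Hypothetical helper: true when the path's extension is in the supported list.
def supported_file?(path)
  SUPPORTED_FILE_TYPES.include?(File.extname(path).downcase)
end

paths = ["notes.md", "photo.png", "data.csv"]
supported, skipped = paths.partition { |path| supported_file?(path) }

puts "Supported: #{supported.join(', ')}"
puts "Skipped: #{skipped.join(', ')}"
```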

data/token_estimator-0.1.0.gem ADDED

Binary file

data/token_estimator-0.1.1.gem ADDED

Binary file
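The two entries above are built `.gem` archives that now ship inside the package itself. The diff does not say whether that was intentional. If it was not, a packaging step could filter such files out of the file list, roughly as sketched below. The helper `without_built_gems` and the sample list are illustrative assumptions and are not taken from the project's gemspec.

```ruby
# Hypothetical helper: drop built gem archives from a candidate file list.
def without_built_gems(paths)
  paths.reject { |path| File.extname(path) == ".gem" }
end

# Sample list using file names that appear in this diff.
files = [
  "lib/token_estimator.rb",
  "lib/token_estimator/version.rb",
  "sig/token_estimator.rbs",
  "token_estimator-0.1.0.gem",
  "token_estimator-0.1.1.gem"
]

puts without_built_gems(files)
```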

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: token_estimator
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.1.2
 platform: ruby
 authors:
 - aemabit
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2024-07-11 00:00:00.000000000 Z
+date: 2024-07-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rails
@@ -138,6 +138,8 @@ files:
 - lib/token_estimator.rb
 - lib/token_estimator/version.rb
 - sig/token_estimator.rbs
+- token_estimator-0.1.0.gem
+- token_estimator-0.1.1.gem
 homepage: https://github.com/aemabit/token_estimator
 licenses:
 - MIT