RubyGems - token_estimator - Versions diffs - 0.1.0 → 0.1.1 - Mend

token_estimator 0.1.0 → 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

checksums.yaml +4 -4
data/README.md +70 -0
data/lib/token_estimator/version.rb +1 -1
data/lib/token_estimator.rb +4 -0
data/token_estimator-0.1.0.gem +0 -0
metadata +3 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: c4f8fd7ec54ffb7145d793b94eada2a34aa331a562cbb7b52eee0f74fcd6c181
-  data.tar.gz: b61a2e7c7ce96b60267bcde6d67fedfe5330ad9f4d021b2d9a4b30fbccbaf6e9
+  metadata.gz: 4eb2770739cb655fed20189d0f63414c489d8d943066870b3cb6e18260d7523f
+  data.tar.gz: 367201d551861d9fb49a398baddc3537c87fa4c929ee3a5d82ee3f7c501a3dc1
 SHA512:
-  metadata.gz: 19333d9dce63923d2490c525b723a1585de5107eb16d5021c05ea1f461d0ad7659435b0d07df92d8e4b209f79501b41386132b604158dc20c60b2db2f571680d
-  data.tar.gz: 2790194ec3c05d191dbbdc0037e01567a9e80236dc872eaf335c93738d010b5ec414bb04872d0fa22cc2f1df11cfa2c49888d002d0692460c766e8d34c48abaa
+  metadata.gz: 00a175350809995a880b31b79db68fb2bfc184f4da9e180b89438959a4b39d17e01e169ca1896f97b984a36284712260d7e7f434d2bd12e3c4ee60599cb38280
+  data.tar.gz: dae8df4734be69cea8760a642c3e764d25340cf7b27058e831c3e72fc999b3a24cc93ea3c0706cf7fe0b937f6602c63d5649a5ff3cad4bfa06e6df860b31703a
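
The digests above cover the `metadata.gz` and `data.tar.gz` archives packed inside the `.gem` file. As a rough sketch of how such entries could be checked locally, the snippet below compares a file's SHA256 and SHA512 hex digests against expected values using Ruby's standard `digest` library. The helper name `digests_match?` is hypothetical, and the demo runs on a throwaway temp file because the real archives are not part of this page.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: true when both hex digests of the file at `path`
# equal the expected values (for example, entries copied from checksums.yaml).
def digests_match?(path, sha256:, sha512:)
  Digest::SHA256.file(path).hexdigest == sha256 &&
    Digest::SHA512.file(path).hexdigest == sha512
end

# Demo on a temporary file; in practice `path` would point at an extracted
# metadata.gz or data.tar.gz.
Tempfile.create("data.tar.gz") do |f|
  f.write("example payload")
  f.flush
  expected256 = Digest::SHA256.hexdigest("example payload")
  expected512 = Digest::SHA512.hexdigest("example payload")
  puts digests_match?(f.path, sha256: expected256, sha512: expected512)
end
```

Checking both algorithms mirrors the two sections of `checksums.yaml`; a mismatch in either one is treated as a failure.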

data/README.md CHANGED

@@ -13,6 +13,65 @@ And then execute:
 bundle install
 ```
+## Methods
+#### `count_tokens_from_text`
+Count tokens from a given text.
+```rb
+    require "token_estimator"
+    tokenizer_name = "gpt2"
+    estimator = TokenEstimator::Estimator.new(tokenizer_name)
+    text = "Your sample text here."
+    token_estimation = estimator.count_tokens_from_text(text)
+    puts "Token estimation: #{token_estimation}"
+```
+#### `count_tokens_from_file`
+Count tokens from a file. The file type is determined by the file extension.
+```rb
+    require "token_estimator"
+    file_path = "spec/fixtures/files/lorem.pdf"
+    tokenizer_name = "gpt2"
+    estimator = TokenEstimator::Estimator.new(tokenizer_name)
+    token_estimation = estimator.count_tokens_from_file(file_path)
+    puts "Token estimation: #{token_estimation}"
+```
+#### `count_tokens_from_excel_file`
+Counts tokens from an Excel (.xlsx) file.
+#### `count_tokens_from_csv_file`
+Counts tokens from a CSV file.
+#### `count_tokens_from_pdf_file`
+Counts tokens from a PDF file.
+#### `count_tokens_from_txt_file`
+Counts tokens from a plain text (.txt) file.
+#### `count_tokens_from_markdown_file`
+Counts tokens from a Markdown (.md) file.
+#### `count_tokens_from_json_file`
+Counts tokens from a JSON file.
+#### `count_tokens_from_html_file`
+Counts tokens from an HTML file.
+#### `count_tokens_from_json`
+Counts tokens from a JSON object.
+#### `count_tokens_from_html`
+Counts tokens from an HTML string.
 ## Roadmap
 Here is a checklist of the formats we currently support for token counting and those we plan to support in the future:
@@ -29,6 +88,17 @@ Here is a checklist of the formats we currently support for token counting and t
 - [ ] PNG
 - [ ] JPG
+## Error Handling
+If you try to count tokens from an unsupported file type, the gem will raise an `UnsupportedFileTypeError`
+```rb
+begin
+  token_count = estimator.count_tokens_from_file("path/to/your/file.unsupported")
+rescue TokenEstimator::UnsupportedFileTypeError => e
+  puts e.message
+end
+```
 ## Contributing
 Contribution directions go here. You can fork the repository, create a new branch, and submit a pull request for review. Please make sure to write tests for your contributions and follow the coding standards set in the project.
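
The README additions state that `count_tokens_from_file` picks the file type from the extension and that unsupported types raise `UnsupportedFileTypeError`. The gem's actual dispatch code is not shown in this diff, so the following is only a minimal sketch of how extension-based dispatch of that kind might look. The `HANDLERS` table, the `handler_for` helper, the case-insensitive matching, and the error message text are all assumptions; the error class is redefined locally so the sketch runs without the gem installed.

```ruby
# Local stand-in for TokenEstimator::UnsupportedFileTypeError (assumption:
# defined here only so the sketch is self-contained).
class UnsupportedFileTypeError < StandardError; end

# Hypothetical mapping from file extension to the per-format method names
# listed in the README.
HANDLERS = {
  ".xlsx" => :count_tokens_from_excel_file,
  ".csv"  => :count_tokens_from_csv_file,
  ".pdf"  => :count_tokens_from_pdf_file,
  ".txt"  => :count_tokens_from_txt_file,
  ".md"   => :count_tokens_from_markdown_file,
  ".json" => :count_tokens_from_json_file,
  ".html" => :count_tokens_from_html_file
}.freeze

# Hypothetical helper: returns the handler name for a path, or raises when
# the extension has no entry. Lowercasing the extension is an assumption.
def handler_for(file_path)
  ext = File.extname(file_path).downcase
  HANDLERS.fetch(ext) do
    raise UnsupportedFileTypeError, "Unsupported file type: #{ext}"
  end
end

puts handler_for("spec/fixtures/files/lorem.pdf")

begin
  handler_for("path/to/your/file.unsupported")
rescue UnsupportedFileTypeError => e
  puts e.message
end
```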

data/lib/token_estimator/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module TokenEstimator
-  VERSION = "0.1.0"
+  VERSION = "0.1.1"
 end

data/lib/token_estimator.rb CHANGED

@@ -95,6 +95,10 @@ module TokenEstimator
       tokens.count
     end
+    def supported_file_types
+      [".txt", ".csv", ".pdf", ".json", ".md", ".html", ".xlsx"]
+    end
     private
     def extract_text_from_excel(xlsx)
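
The new `supported_file_types` method returns a plain list of extensions, which suggests callers could check a path before attempting a count. The hunk does not show which class the method belongs to or how the gem uses it, so the sketch below copies the list into a local constant instead of calling the gem. The `supported?` helper and the case-insensitive comparison are assumptions.

```ruby
# Extension list copied from supported_file_types in the hunk above.
SUPPORTED_FILE_TYPES = [".txt", ".csv", ".pdf", ".json", ".md", ".html", ".xlsx"].freeze

# Hypothetical helper: true when the path's extension is in the list.
# Lowercasing the extension first is an assumption, not gem behavior.
def supported?(path)
  SUPPORTED_FILE_TYPES.include?(File.extname(path).downcase)
end

# Split a batch of paths into those worth passing on and those to skip.
paths = ["notes.md", "report.PDF", "photo.png", "README"]
supported, skipped = paths.partition { |p| supported?(p) }

puts "supported: #{supported.inspect}"
puts "skipped:   #{skipped.inspect}"
```

Filtering up front avoids relying on exceptions for control flow when processing many files at once.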

data/token_estimator-0.1.0.gem ADDED

Binary file

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: token_estimator
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.1.1
 platform: ruby
 authors:
 - aemabit
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2024-07-11 00:00:00.000000000 Z
+date: 2024-07-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rails
@@ -138,6 +138,7 @@ files:
 - lib/token_estimator.rb
 - lib/token_estimator/version.rb
 - sig/token_estimator.rbs
+- token_estimator-0.1.0.gem
 homepage: https://github.com/aemabit/token_estimator
 licenses:
 - MIT