token_estimator 0.1.0 → 0.1.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: c4f8fd7ec54ffb7145d793b94eada2a34aa331a562cbb7b52eee0f74fcd6c181
- data.tar.gz: b61a2e7c7ce96b60267bcde6d67fedfe5330ad9f4d021b2d9a4b30fbccbaf6e9
+ metadata.gz: 4eb2770739cb655fed20189d0f63414c489d8d943066870b3cb6e18260d7523f
+ data.tar.gz: 367201d551861d9fb49a398baddc3537c87fa4c929ee3a5d82ee3f7c501a3dc1
  SHA512:
- metadata.gz: 19333d9dce63923d2490c525b723a1585de5107eb16d5021c05ea1f461d0ad7659435b0d07df92d8e4b209f79501b41386132b604158dc20c60b2db2f571680d
- data.tar.gz: 2790194ec3c05d191dbbdc0037e01567a9e80236dc872eaf335c93738d010b5ec414bb04872d0fa22cc2f1df11cfa2c49888d002d0692460c766e8d34c48abaa
+ metadata.gz: 00a175350809995a880b31b79db68fb2bfc184f4da9e180b89438959a4b39d17e01e169ca1896f97b984a36284712260d7e7f434d2bd12e3c4ee60599cb38280
+ data.tar.gz: dae8df4734be69cea8760a642c3e764d25340cf7b27058e831c3e72fc999b3a24cc93ea3c0706cf7fe0b937f6602c63d5649a5ff3cad4bfa06e6df860b31703a
data/README.md CHANGED
@@ -13,6 +13,65 @@ And then execute:
  bundle install
  ```
 
+ ## Methods
+
+ #### `count_tokens_from_text`
+ Count tokens from a given text.
+
+ ```rb
+ require "token_estimator"
+
+ tokenizer_name = "gpt2"
+ estimator = TokenEstimator::Estimator.new(tokenizer_name)
+
+ text = "Your sample text here."
+ token_estimation = estimator.count_tokens_from_text(text)
+
+ puts "Token estimation: #{token_estimation}"
+ ```
+
+ #### `count_tokens_from_file`
+ Count tokens from a file. The file type is determined by the file extension.
+
+ ```rb
+ require "token_estimator"
+
+ file_path = "spec/fixtures/files/lorem.pdf"
+ tokenizer_name = "gpt2"
+ estimator = TokenEstimator::Estimator.new(tokenizer_name)
+
+ token_estimation = estimator.count_tokens_from_file(file_path)
+
+ puts "Token estimation: #{token_estimation}"
+ ```
+
+ #### `count_tokens_from_excel_file`
+ Counts tokens from an Excel (.xlsx) file.
+
+ #### `count_tokens_from_csv_file`
+ Counts tokens from a CSV file.
+
+ #### `count_tokens_from_pdf_file`
+ Counts tokens from a PDF file.
+
+ #### `count_tokens_from_txt_file`
+ Counts tokens from a plain text (.txt) file.
+
+ #### `count_tokens_from_markdown_file`
+ Counts tokens from a Markdown (.md) file.
+
+ #### `count_tokens_from_json_file`
+ Counts tokens from a JSON file.
+
+ #### `count_tokens_from_html_file`
+ Counts tokens from an HTML file.
+
+ #### `count_tokens_from_json`
+ Counts tokens from a JSON object.
+
+ #### `count_tokens_from_html`
+ Counts tokens from an HTML string.
+
  ## Roadmap
  Here is a checklist of the formats we currently support for token counting and those we plan to support in the future:
 
@@ -29,6 +88,17 @@ Here is a checklist of the formats we currently support for token counting and t
  - [ ] PNG
  - [ ] JPG
 
+ ## Error Handling
+ If you try to count tokens from an unsupported file type, the gem will raise an `UnsupportedFileTypeError`
+
+ ```rb
+ begin
+ token_count = estimator.count_tokens_from_file("path/to/your/file.unsupported")
+ rescue TokenEstimator::UnsupportedFileTypeError => e
+ puts e.message
+ end
+ ```
+
  ## Contributing
  Contribution directions go here. You can fork the repository, create a new branch, and submit a pull request for review. Please make sure to write tests for your contributions and follow the coding standards set in the project.
 
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module TokenEstimator
- VERSION = "0.1.0"
+ VERSION = "0.1.1"
  end
@@ -95,6 +95,10 @@ module TokenEstimator
  tokens.count
  end
 
+ def supported_file_types
+ [".txt", ".csv", ".pdf", ".json", ".md", ".html", ".xlsx"]
+ end
+
  private
 
  def extract_text_from_excel(xlsx)
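
Note on the hunk above: it adds a public `supported_file_types` method that returns a list of extensions. One plausible use is checking a path before calling `count_tokens_from_file`, so that unsupported files are skipped rather than rescued. The following is only a minimal sketch under stated assumptions: the constant `SUPPORTED_FILE_TYPES` and the helpers `supported_file?` and `partition_by_support` are hypothetical names, not part of the gem, and the extension list is copied from the hunk so the snippet runs without the gem installed. Whether the gem itself compares extensions case-insensitively is not shown in this diff.

```ruby
# Hypothetical stand-in for the array returned by `supported_file_types`
# in the hunk above; copied here so the sketch runs without the gem.
SUPPORTED_FILE_TYPES = [".txt", ".csv", ".pdf", ".json", ".md", ".html", ".xlsx"].freeze

# Hypothetical helper (not part of the gem): true when the path's extension
# is in the list. Downcasing is an assumption made for this sketch only.
def supported_file?(path)
  SUPPORTED_FILE_TYPES.include?(File.extname(path).downcase)
end

# Hypothetical helper: splits paths into [supported, unsupported].
def partition_by_support(paths)
  paths.partition { |path| supported_file?(path) }
end

supported, skipped = partition_by_support(["notes.md", "report.PDF", "photo.png"])
puts "Supported: #{supported.inspect}"
puts "Skipped: #{skipped.inspect}"
```

With the gem loaded, the same check could presumably be driven by the estimator's own `supported_file_types` instead of a copied constant.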
Binary file
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: token_estimator
  version: !ruby/object:Gem::Version
- version: 0.1.0
+ version: 0.1.1
  platform: ruby
  authors:
  - aemabit
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2024-07-11 00:00:00.000000000 Z
+ date: 2024-07-15 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rails
@@ -138,6 +138,7 @@ files:
  - lib/token_estimator.rb
  - lib/token_estimator/version.rb
  - sig/token_estimator.rbs
+ - token_estimator-0.1.0.gem
  homepage: https://github.com/aemabit/token_estimator
  licenses:
  - MIT