token_estimator 0.1.0 → 0.1.1
- checksums.yaml +4 -4
- data/README.md +70 -0
- data/lib/token_estimator/version.rb +1 -1
- data/lib/token_estimator.rb +4 -0
- data/token_estimator-0.1.0.gem +0 -0
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 4eb2770739cb655fed20189d0f63414c489d8d943066870b3cb6e18260d7523f
+  data.tar.gz: 367201d551861d9fb49a398baddc3537c87fa4c929ee3a5d82ee3f7c501a3dc1
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 00a175350809995a880b31b79db68fb2bfc184f4da9e180b89438959a4b39d17e01e169ca1896f97b984a36284712260d7e7f434d2bd12e3c4ee60599cb38280
+  data.tar.gz: dae8df4734be69cea8760a642c3e764d25340cf7b27058e831c3e72fc999b3a24cc93ea3c0706cf7fe0b937f6602c63d5649a5ff3cad4bfa06e6df860b31703a
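The SHA256 and SHA512 values recorded in checksums.yaml can be compared against the archives packed inside the gem. The following is a minimal sketch using Ruby's standard `digest` library. The helper name `checksum_matches?` and the file paths are hypothetical, and it assumes `metadata.gz` and `data.tar.gz` have already been extracted from the `.gem` file (which is a tar archive).

```ruby
require "digest"

# Hypothetical helper: compares a file's digest with an expected hex string.
# `algorithm` is :sha256 or :sha512, matching the two sections of checksums.yaml.
def checksum_matches?(path, expected_hex, algorithm: :sha256)
  digest_class = { sha256: Digest::SHA256, sha512: Digest::SHA512 }.fetch(algorithm)
  digest_class.file(path).hexdigest == expected_hex.strip.downcase
end

# Example usage (the path is an assumption; extract the archive from the gem first):
#   checksum_matches?(
#     "data.tar.gz",
#     "367201d551861d9fb49a398baddc3537c87fa4c929ee3a5d82ee3f7c501a3dc1"
#   )
```

Passing `algorithm: :sha512` checks the longer digests from the SHA512 section in the same way.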
data/README.md
CHANGED
@@ -13,6 +13,65 @@ And then execute:
 bundle install
 ```
 
+## Methods
+
+#### `count_tokens_from_text`
+Count tokens from a given text.
+
+```rb
+require "token_estimator"
+
+tokenizer_name = "gpt2"
+estimator = TokenEstimator::Estimator.new(tokenizer_name)
+
+text = "Your sample text here."
+token_estimation = estimator.count_tokens_from_text(text)
+
+puts "Token estimation: #{token_estimation}"
+```
+
+#### `count_tokens_from_file`
+Count tokens from a file. The file type is determined by the file extension.
+
+```rb
+require "token_estimator"
+
+file_path = "spec/fixtures/files/lorem.pdf"
+tokenizer_name = "gpt2"
+estimator = TokenEstimator::Estimator.new(tokenizer_name)
+
+token_estimation = estimator.count_tokens_from_file(file_path)
+
+puts "Token estimation: #{token_estimation}"
+```
+
+#### `count_tokens_from_excel_file`
+Counts tokens from an Excel (.xlsx) file.
+
+#### `count_tokens_from_csv_file`
+Counts tokens from a CSV file.
+
+#### `count_tokens_from_pdf_file`
+Counts tokens from a PDF file.
+
+#### `count_tokens_from_txt_file`
+Counts tokens from a plain text (.txt) file.
+
+#### `count_tokens_from_markdown_file`
+Counts tokens from a Markdown (.md) file.
+
+#### `count_tokens_from_json_file`
+Counts tokens from a JSON file.
+
+#### `count_tokens_from_html_file`
+Counts tokens from an HTML file.
+
+#### `count_tokens_from_json`
+Counts tokens from a JSON object.
+
+#### `count_tokens_from_html`
+Counts tokens from an HTML string.
+
 ## Roadmap
 Here is a checklist of the formats we currently support for token counting and those we plan to support in the future:
 
@@ -29,6 +88,17 @@ Here is a checklist of the formats we currently support for token counting and t
 - [ ] PNG
 - [ ] JPG
 
+## Error Handling
+If you try to count tokens from an unsupported file type, the gem will raise an `UnsupportedFileTypeError`
+
+```rb
+begin
+  token_count = estimator.count_tokens_from_file("path/to/your/file.unsupported")
+rescue TokenEstimator::UnsupportedFileTypeError => e
+  puts e.message
+end
+```
+
 ## Contributing
 Contribution directions go here. You can fork the repository, create a new branch, and submit a pull request for review. Please make sure to write tests for your contributions and follow the coding standards set in the project.
 
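The README text added in this version says `count_tokens_from_file` picks a format from the file extension and raises `UnsupportedFileTypeError` for anything else. The diff does not show how the gem implements this (lib/token_estimator.rb appears only as a binary change), so the following is just a sketch of one way such dispatch could work. The module `DispatchSketch`, the `HANDLERS` table, the `handler_for` method, and the extension list are assumptions for illustration. Only the per-format method names come from the README.

```ruby
# Illustrative only: this is not the gem's actual implementation.
module DispatchSketch
  # Stand-in for TokenEstimator::UnsupportedFileTypeError.
  class UnsupportedFileTypeError < StandardError; end

  # Assumed mapping from file extension to the README's per-format method names.
  HANDLERS = {
    ".xlsx" => :count_tokens_from_excel_file,
    ".csv"  => :count_tokens_from_csv_file,
    ".pdf"  => :count_tokens_from_pdf_file,
    ".txt"  => :count_tokens_from_txt_file,
    ".md"   => :count_tokens_from_markdown_file,
    ".json" => :count_tokens_from_json_file,
    ".html" => :count_tokens_from_html_file
  }.freeze

  # Returns the name of the method that would handle `file_path`,
  # or raises if the extension is not in the table.
  def self.handler_for(file_path)
    ext = File.extname(file_path).downcase
    HANDLERS.fetch(ext) do
      raise UnsupportedFileTypeError, "Unsupported file type: #{ext.inspect}"
    end
  end
end
```

With this table, `DispatchSketch.handler_for("spec/fixtures/files/lorem.pdf")` returns `:count_tokens_from_pdf_file`, while a path such as `"path/to/your/file.unsupported"` raises the stand-in error, mirroring the `begin`/`rescue` pattern shown in the README.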
data/lib/token_estimator.rb
CHANGED
Binary file
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: token_estimator
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.1
 platform: ruby
 authors:
 - aemabit
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2024-07-
+date: 2024-07-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rails
@@ -138,6 +138,7 @@ files:
 - lib/token_estimator.rb
 - lib/token_estimator/version.rb
 - sig/token_estimator.rbs
+- token_estimator-0.1.0.gem
 homepage: https://github.com/aemabit/token_estimator
 licenses:
 - MIT