roseflow-tiktoken 0.1.0 → 0.2.0

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: f3b272bb8e3b5a4805fe59670894340347a0310ada891f29abb03235b4aca243
4
- data.tar.gz: 3135181d05c8ee397d57d88f99f100f8db45dc5cd7295dff27db3641de7bcc0f
3
+ metadata.gz: a7f3ff6fdcfd71f07e2319202fb56ac40e462c98e4a897b239bb4e1150cf1e13
4
+ data.tar.gz: 0cca81f6bd18edc889124e21598109a546ac0dfdac49c84b38f615eeae3f9b9c
5
5
  SHA512:
6
- metadata.gz: a21a9d867e21ea71d36f1605e894fd68933088c25df3077381bdecd1d20b7d12e13826577fe649832dc9294d6aa9e14ade0591cdf2327fe1883d60233005d342
7
- data.tar.gz: 6f94d37d4f661d5d672e8a5c8a9952544458d815f7997ecd01db82f6e2b17cbb8d3a9b7838993b5567e883d744abb08cf3f51a38030437b5823a79bf7ee0c8ff
6
+ metadata.gz: 8a737814488c6fcd78d66fd66f620da0eb2808203be94e3bfecacb0ab71ce432a071140cea65cce8b49db8227096e1930fd8c47437fe051e7fe144b8d4272f2c
7
+ data.tar.gz: bdc94e15b3d375e100e682a46b978abb81eb561aed3bf52bd206123af85f2dad279f9df91f2ada274e7ca1fca04e250cec3e0bbd35ad368f6634c5c631804541
data/CHANGELOG.md CHANGED
@@ -1,4 +1,6 @@
1
- ## [Unreleased]
1
+ ## [0.2.0] - 2023-07-19
2
+
3
+ - Replaces PyCall with tiktoken_ruby
2
4
 
3
5
  ## [0.1.0] - 2023-05-02
4
6
 
data/README.md CHANGED
@@ -2,7 +2,7 @@
2
2
 
3
3
  [tiktoken](https://github.com/openai/tiktoken) is a fast BPE tokenizer for use with OpenAI's models. `roseflow-tiktoken` gem helps you use the tokenizer in Ruby, especially with (Roseflow)[https://github.com/ljuti/roseflow].
4
4
 
5
- Currently, this gem wraps the (`tiktoken` Python module)[https://github.com/openai/tiktoken] for convenient use in Roseflow.
5
+ This gem wraps the (`tiktoken_ruby` gem)[https://github.com/IAPark/tiktoken_ruby] for convenient use in Roseflow.
6
6
 
7
7
  ## Installation
8
8
 
@@ -45,6 +45,8 @@ tokenizer.decode([19952, 420, 925, 1139, 11460, 13]) # => "Turn this string into
45
45
  | `p50k_edit` | Use for edit models like `text-davinci-edit-001`, `code-davinci-edit-001` |
46
46
  | `r50k_base` (or `gpt2`) | GPT-3 models like `davinci` |
47
47
 
48
+ If a model is not provided or is unknown to the library, it will default to `cl100k_base` encoding.
49
+
48
50
  ## Development
49
51
 
50
52
  After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
@@ -1,12 +1,11 @@
1
- require "pycall"
1
+ require "tiktoken_ruby"
2
2
 
3
3
  module Roseflow
4
4
  module Tiktoken
5
5
  class Tokenizer
6
6
  def initialize(model: nil)
7
- @tokenizer = PyCall.import_module("tiktoken")
8
7
  @model = model
9
- @encoding = @tokenizer.encoding_for_model(@model) if @model
8
+ @encoding = determine_encoding(model)
10
9
  end
11
10
 
12
11
  def encode(input)
@@ -41,6 +40,11 @@ module Roseflow
41
40
 
42
41
  private
43
42
 
43
+ def determine_encoding(model)
44
+ encoding = model ? ::Tiktoken.encoding_for_model(model) : ::Tiktoken.get_encoding("cl100k_base")
45
+ encoding.is_a?(::Tiktoken::Encoding) ? encoding : ::Tiktoken.get_encoding("cl100k_base")
46
+ end
47
+
44
48
  def tokens_per_message_for_model(model)
45
49
  case model
46
50
  when "gpt-4"
@@ -8,7 +8,7 @@ module Roseflow
8
8
 
9
9
  module VERSION
10
10
  MAJOR = 0
11
- MINOR = 1
11
+ MINOR = 2
12
12
  PATCH = 0
13
13
  PRE = nil
14
14
 
@@ -6,7 +6,7 @@ Gem::Specification.new do |spec|
6
6
  spec.name = "roseflow-tiktoken"
7
7
  spec.version = Roseflow::Tiktoken.gem_version
8
8
  spec.authors = ["Lauri Jutila"]
9
- spec.email = ["git@laurijutila.com"]
9
+ spec.email = ["ljuti@users.noreply.github.com"]
10
10
 
11
11
  spec.summary = "Tiktoken tokenizer for Roseflow."
12
12
  spec.description = "Tiktoken tokenizer for Roseflow."
@@ -29,5 +29,5 @@ Gem::Specification.new do |spec|
29
29
  spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
30
30
  spec.require_paths = ["lib"]
31
31
 
32
- spec.add_dependency "pycall", "~> 1.4"
32
+ spec.add_dependency "tiktoken_ruby"
33
33
  end
metadata CHANGED
@@ -1,32 +1,32 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: roseflow-tiktoken
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Lauri Jutila
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2023-05-10 00:00:00.000000000 Z
11
+ date: 2023-07-19 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
- name: pycall
14
+ name: tiktoken_ruby
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
- - - "~>"
17
+ - - ">="
18
18
  - !ruby/object:Gem::Version
19
- version: '1.4'
19
+ version: '0'
20
20
  type: :runtime
21
21
  prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
- - - "~>"
24
+ - - ">="
25
25
  - !ruby/object:Gem::Version
26
- version: '1.4'
26
+ version: '0'
27
27
  description: Tiktoken tokenizer for Roseflow.
28
28
  email:
29
- - git@laurijutila.com
29
+ - ljuti@users.noreply.github.com
30
30
  executables: []
31
31
  extensions: []
32
32
  extra_rdoc_files: []