roseflow-tiktoken 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: f3b272bb8e3b5a4805fe59670894340347a0310ada891f29abb03235b4aca243
4
+ data.tar.gz: 3135181d05c8ee397d57d88f99f100f8db45dc5cd7295dff27db3641de7bcc0f
5
+ SHA512:
6
+ metadata.gz: a21a9d867e21ea71d36f1605e894fd68933088c25df3077381bdecd1d20b7d12e13826577fe649832dc9294d6aa9e14ade0591cdf2327fe1883d60233005d342
7
+ data.tar.gz: 6f94d37d4f661d5d672e8a5c8a9952544458d815f7997ecd01db82f6e2b17cbb8d3a9b7838993b5567e883d744abb08cf3f51a38030437b5823a79bf7ee0c8ff
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --format documentation
2
+ --color
3
+ --require spec_helper
data/.standard.yml ADDED
@@ -0,0 +1,3 @@
1
+ # For available configuration options, see:
2
+ # https://github.com/testdouble/standard
3
+ ruby_version: 3.2
data/CHANGELOG.md ADDED
@@ -0,0 +1,5 @@
1
+ ## [Unreleased]
2
+
3
+ ## [0.1.0] - 2023-05-02
4
+
5
+ - Initial release
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,7 @@
1
+ # Contributor Code of Conduct
2
+
3
+ The Roseflow team is committed to fostering a welcoming community.
4
+
5
+ **Our Code of Conduct can be found here**:
6
+
7
+ https://roseflow.ai/conduct
data/Gemfile ADDED
@@ -0,0 +1,12 @@
1
+ # frozen_string_literal: true
2
+
3
+ source "https://rubygems.org"
4
+
5
+ # Specify your gem's dependencies in roseflow-tiktoken.gemspec
6
+ gemspec
7
+
8
+ gem "rake", "~> 13.0"
9
+
10
+ gem "rspec", "~> 3.0"
11
+
12
+ gem "standard", "~> 1.3"
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2023 Lauri Jutila
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,78 @@
1
+ # Tiktoken tokenizer for Roseflow
2
+
3
+ [tiktoken](https://github.com/openai/tiktoken) is a fast BPE tokenizer for use with OpenAI's models. The `roseflow-tiktoken` gem helps you use the tokenizer in Ruby, especially with [Roseflow](https://github.com/ljuti/roseflow).
4
+
5
+ Currently, this gem wraps the [`tiktoken` Python module](https://github.com/openai/tiktoken) for convenient use in Roseflow.
6
+
7
+ ## Installation
8
+
9
+ Install the gem and add it to the application's Gemfile by executing:
10
+
11
+ ```bash
12
+ $ bundle add roseflow-tiktoken
13
+ ```
14
+
15
+ If bundler is not being used to manage dependencies, install the gem by executing:
16
+
17
+ ```bash
18
+ $ gem install roseflow-tiktoken
19
+ ```
20
+
21
+ ## Usage
22
+
23
+ Encode a string into tokens with the tokenizer:
24
+
25
+ ```ruby
26
+ tokenizer = Roseflow::Tiktoken::Tokenizer.new(model: "gpt-3.5-turbo")
27
+ tokenizer.encode("Turn this string into tokens.") # => [...] array of tokens
28
+ ```
29
+
30
+ Decode tokens back into a string with the tokenizer:
31
+
32
+ ```ruby
33
+ tokenizer = Roseflow::Tiktoken::Tokenizer.new(model: "gpt-3.5-turbo")
34
+ tokenizer.decode([19952, 420, 925, 1139, 11460, 13]) # => "Turn this string into tokens."
35
+ ```
36
+
37
+ ### Encodings
38
+
39
+ `roseflow-tiktoken` supports these encodings used by OpenAI models:
40
+
41
+ | Encoding name | OpenAI models |
42
+ | ----------------------- | ------------------------------------------------------------------------- |
43
+ | `cl100k_base` | ChatGPT models, `text-embedding-ada-002` |
44
+ | `p50k_base` | Code models, `text-davinci-002`, `text-davinci-003` |
45
+ | `p50k_edit`             | Edit models like `text-davinci-edit-001`, `code-davinci-edit-001`         |
46
+ | `r50k_base` (or `gpt2`) | GPT-3 models like `davinci` |
47
+
48
+ ## Development
49
+
50
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
51
+
52
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
53
+
54
+ ## Contributing
55
+
56
+ Bug reports and pull requests are welcome on GitHub at https://github.com/roseflow-ai/roseflow-tiktoken.
57
+
58
+ ## Community
59
+
60
+ ### Discord
61
+
62
+ Join us in our [Discord](https://discord.gg/roseflow).
63
+
64
+ ### Twitter
65
+
66
+ Connect with the core team on Twitter.
67
+
68
+ <a href="https://twitter.com/ljuti" target="_blank">
69
+ <img alt="Twitter Follow" src="https://img.shields.io/twitter/follow/ljuti?logo=twitter&style=social">
70
+ </a>
71
+
72
+ ## License
73
+
74
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
75
+
76
+ ## Code of Conduct
77
+
78
+ Everyone interacting in the `roseflow-tiktoken` project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/roseflow-ai/roseflow-tiktoken/blob/main/CODE_OF_CONDUCT.md).
data/Rakefile ADDED
@@ -0,0 +1,10 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "rspec/core/rake_task"
5
+
6
+ RSpec::Core::RakeTask.new(:spec)
7
+
8
+ require "standard/rake"
9
+
10
+ task default: %i[spec standard]
data/lib/roseflow/tiktoken/tokenizer.rb ADDED
@@ -0,0 +1,75 @@
1
+ require "pycall"
2
+
3
+ module Roseflow
4
+ module Tiktoken
5
+ class Tokenizer
6
+ def initialize(model: nil)
7
+ @tokenizer = PyCall.import_module("tiktoken")
8
+ @model = model
9
+ @encoding = @tokenizer.encoding_for_model(@model) if @model
10
+ end
11
+
12
+ def encode(input)
13
+ @encoding.encode(input)
14
+ rescue
15
+ raise ::Roseflow::Tiktoken::NoEncodingError, "No encoding found for model #{@model}"
16
+ end
17
+
18
+ def decode(input)
19
+ @encoding.decode(input)
20
+ rescue
21
+ raise ::Roseflow::Tiktoken::NoEncodingError, "No encoding found for model #{@model}"
22
+ end
23
+
24
+ def count_tokens(messages)
25
+ token_count = 0
26
+
27
+ messages.each do |message|
28
+ token_count += tokens_per_message_for_model(@model)
29
+
30
+ message.each do |key, value|
31
+ token_count += encode(value).count
32
+ if key == "name"
33
+           token_count += tokens_per_name_for_model(@model)
34
+ end
35
+ end
36
+ end
37
+
38
+ token_count += 3 # Every reply is primed with assistant
39
+       token_count
40
+ end
41
+
42
+ private
43
+
44
+ def tokens_per_message_for_model(model)
45
+ case model
46
+ when "gpt-4"
47
+ tokens_per_message_for_model("gpt-4-0314")
48
+ when "gpt-3.5-turbo"
49
+ tokens_per_message_for_model("gpt-3.5-turbo-0301")
50
+ when "gpt-4-0314"
51
+ 3
52
+ when "gpt-3.5-turbo-0301"
53
+ 4
54
+ else
55
+ raise NotImplementedError, "Model #{model} is not supported."
56
+ end
57
+ end
58
+
59
+ def tokens_per_name_for_model(model)
60
+ case model
61
+ when "gpt-4"
62
+           tokens_per_name_for_model("gpt-4-0314")
63
+ when "gpt-3.5-turbo"
64
+           tokens_per_name_for_model("gpt-3.5-turbo-0301")
65
+ when "gpt-4-0314"
66
+ 1
67
+ when "gpt-3.5-turbo-0301"
68
+ -1
69
+ else
70
+ raise NotImplementedError, "Model #{model} is not supported."
71
+ end
72
+ end
73
+ end
74
+ end
75
+ end
data/lib/roseflow/tiktoken/version.rb ADDED
@@ -0,0 +1,18 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Roseflow
4
+ module Tiktoken
5
+ def self.gem_version
6
+ Gem::Version.new VERSION::STRING
7
+ end
8
+
9
+ module VERSION
10
+ MAJOR = 0
11
+ MINOR = 1
12
+ PATCH = 0
13
+ PRE = nil
14
+
15
+ STRING = [MAJOR, MINOR, PATCH, PRE].compact.join(".")
16
+ end
17
+ end
18
+ end
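`VERSION::STRING` relies on `compact` dropping the `nil` pre-release part before `join`. A minimal sketch of that behaviour, using a hypothetical `version_string` helper that is not part of the gem:

```ruby
require "rubygems"

# Hypothetical helper mirroring VERSION::STRING: nil parts are dropped
# before the remaining parts are joined with dots.
def version_string(major, minor, patch, pre = nil)
  [major, minor, patch, pre].compact.join(".")
end

release = version_string(0, 1, 0)
prerelease = version_string(0, 2, 0, "rc1")

puts release     # prints "0.1.0"
puts prerelease  # prints "0.2.0.rc1"

# Gem::Version treats a version containing letters as a pre-release.
puts Gem::Version.new(prerelease).prerelease? # prints "true"
```

Setting `PRE` to a string such as `"rc1"` would therefore be enough to publish a pre-release build without touching the join logic.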
data/lib/roseflow/tiktoken.rb ADDED
@@ -0,0 +1,11 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "tiktoken/version"
4
+ require "roseflow/tiktoken/tokenizer"
5
+
6
+ module Roseflow
7
+ module Tiktoken
8
+ class Error < StandardError; end
9
+ class NoEncodingError < StandardError; end
10
+ end
11
+ end
data/roseflow-tiktoken.gemspec ADDED
@@ -0,0 +1,33 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "lib/roseflow/tiktoken/version"
4
+
5
+ Gem::Specification.new do |spec|
6
+ spec.name = "roseflow-tiktoken"
7
+ spec.version = Roseflow::Tiktoken.gem_version
8
+ spec.authors = ["Lauri Jutila"]
9
+ spec.email = ["git@laurijutila.com"]
10
+
11
+ spec.summary = "Tiktoken tokenizer for Roseflow."
12
+ spec.description = "Tiktoken tokenizer for Roseflow."
13
+ spec.homepage = "https://github.com/roseflow-ai/roseflow-tiktoken"
14
+ spec.license = "MIT"
15
+ spec.required_ruby_version = ">= 3.2.0"
16
+
17
+ spec.metadata["homepage_uri"] = spec.homepage
18
+ spec.metadata["source_code_uri"] = "https://github.com/roseflow-ai/roseflow-tiktoken"
19
+ spec.metadata["changelog_uri"] = "https://github.com/roseflow-ai/roseflow-tiktoken/blob/master/CHANGELOG.md"
20
+
21
+ # Specify which files should be added to the gem when it is released.
22
+ # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
23
+ spec.files = Dir.chdir(__dir__) do
24
+ `git ls-files -z`.split("\x0").reject do |f|
25
+ (File.expand_path(f) == __FILE__) || f.start_with?(*%w[bin/ test/ spec/ features/ .git .circleci appveyor])
26
+ end
27
+ end
28
+ spec.bindir = "exe"
29
+ spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
30
+ spec.require_paths = ["lib"]
31
+
32
+ spec.add_dependency "pycall", "~> 1.4"
33
+ end
data/sig/roseflow/tiktoken.rbs ADDED
@@ -0,0 +1,6 @@
1
+ module Roseflow
2
+ module Tiktoken
3
+ VERSION: String
4
+ # See the writing guide of rbs: https://github.com/ruby/rbs#guides
5
+ end
6
+ end
metadata ADDED
@@ -0,0 +1,73 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: roseflow-tiktoken
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Lauri Jutila
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2023-05-10 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: pycall
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.4'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.4'
27
+ description: Tiktoken tokenizer for Roseflow.
28
+ email:
29
+ - git@laurijutila.com
30
+ executables: []
31
+ extensions: []
32
+ extra_rdoc_files: []
33
+ files:
34
+ - ".rspec"
35
+ - ".standard.yml"
36
+ - CHANGELOG.md
37
+ - CODE_OF_CONDUCT.md
38
+ - Gemfile
39
+ - LICENSE.txt
40
+ - README.md
41
+ - Rakefile
42
+ - lib/roseflow/tiktoken.rb
43
+ - lib/roseflow/tiktoken/tokenizer.rb
44
+ - lib/roseflow/tiktoken/version.rb
45
+ - roseflow-tiktoken.gemspec
46
+ - sig/roseflow/tiktoken.rbs
47
+ homepage: https://github.com/roseflow-ai/roseflow-tiktoken
48
+ licenses:
49
+ - MIT
50
+ metadata:
51
+ homepage_uri: https://github.com/roseflow-ai/roseflow-tiktoken
52
+ source_code_uri: https://github.com/roseflow-ai/roseflow-tiktoken
53
+ changelog_uri: https://github.com/roseflow-ai/roseflow-tiktoken/blob/master/CHANGELOG.md
54
+ post_install_message:
55
+ rdoc_options: []
56
+ require_paths:
57
+ - lib
58
+ required_ruby_version: !ruby/object:Gem::Requirement
59
+ requirements:
60
+ - - ">="
61
+ - !ruby/object:Gem::Version
62
+ version: 3.2.0
63
+ required_rubygems_version: !ruby/object:Gem::Requirement
64
+ requirements:
65
+ - - ">="
66
+ - !ruby/object:Gem::Version
67
+ version: '0'
68
+ requirements: []
69
+ rubygems_version: 3.4.1
70
+ signing_key:
71
+ specification_version: 4
72
+ summary: Tiktoken tokenizer for Roseflow.
73
+ test_files: []