roseflow-tiktoken 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: f3b272bb8e3b5a4805fe59670894340347a0310ada891f29abb03235b4aca243
+   data.tar.gz: 3135181d05c8ee397d57d88f99f100f8db45dc5cd7295dff27db3641de7bcc0f
+ SHA512:
+   metadata.gz: a21a9d867e21ea71d36f1605e894fd68933088c25df3077381bdecd1d20b7d12e13826577fe649832dc9294d6aa9e14ade0591cdf2327fe1883d60233005d342
+   data.tar.gz: 6f94d37d4f661d5d672e8a5c8a9952544458d815f7997ecd01db82f6e2b17cbb8d3a9b7838993b5567e883d744abb08cf3f51a38030437b5823a79bf7ee0c8ff
data/.rspec ADDED
@@ -0,0 +1,3 @@
+ --format documentation
+ --color
+ --require spec_helper
data/.standard.yml ADDED
@@ -0,0 +1,3 @@
+ # For available configuration options, see:
+ #   https://github.com/testdouble/standard
+ ruby_version: 2.6
data/CHANGELOG.md ADDED
@@ -0,0 +1,5 @@
+ ## [Unreleased]
+
+ ## [0.1.0] - 2023-05-02
+
+ - Initial release
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,7 @@
+ # Contributor Code of Conduct
+
+ The Roseflow team is committed to fostering a welcoming community.
+
+ **Our Code of Conduct can be found here**:
+
+ https://roseflow.ai/conduct
data/Gemfile ADDED
@@ -0,0 +1,12 @@
+ # frozen_string_literal: true
+
+ source "https://rubygems.org"
+
+ # Specify your gem's dependencies in roseflow-tiktoken.gemspec
+ gemspec
+
+ gem "rake", "~> 13.0"
+
+ gem "rspec", "~> 3.0"
+
+ gem "standard", "~> 1.3"
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2023 Lauri Jutila
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,78 @@
+ # Tiktoken tokenizer for Roseflow
+
+ [tiktoken](https://github.com/openai/tiktoken) is a fast BPE tokenizer for use with OpenAI's models. The `roseflow-tiktoken` gem helps you use the tokenizer in Ruby, especially with [Roseflow](https://github.com/ljuti/roseflow).
+
+ Currently, this gem wraps the [`tiktoken` Python module](https://github.com/openai/tiktoken) for convenient use in Roseflow.
+
+ ## Installation
+
+ Install the gem and add to the application's Gemfile by executing:
+
+ ```bash
+ $ bundle add roseflow-tiktoken
+ ```
+
+ If bundler is not being used to manage dependencies, install the gem by executing:
+
+ ```bash
+ $ gem install roseflow-tiktoken
+ ```
+
+ ## Usage
+
+ Encode with tokenizer.
+
+ ```ruby
+ tokenizer = Roseflow::Tiktoken::Tokenizer.new(model: "gpt-3.5-turbo")
+ tokenizer.encode("Turn this string into tokens.") # => [...] array of tokens
+ ```
+
+ Decode with tokenizer.
+
+ ```ruby
+ tokenizer = Roseflow::Tiktoken::Tokenizer.new(model: "gpt-3.5-turbo")
+ tokenizer.decode([19952, 420, 925, 1139, 11460, 13]) # => "Turn this string into tokens."
+ ```
+
+ ### Encodings
+
+ `roseflow-tiktoken` supports these encodings used by OpenAI models:
+
+ | Encoding name           | OpenAI models                                                             |
+ | ----------------------- | ------------------------------------------------------------------------- |
+ | `cl100k_base`           | ChatGPT models, `text-embedding-ada-002`                                  |
+ | `p50k_base`             | Code models, `text-davinci-002`, `text-davinci-003`                       |
+ | `p50k_edit`             | Use for edit models like `text-davinci-edit-001`, `code-davinci-edit-001` |
+ | `r50k_base` (or `gpt2`) | GPT-3 models like `davinci`                                               |
+
+ ## Development
+
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+
+ ## Contributing
+
+ Bug reports and pull requests are welcome on GitHub at https://github.com/roseflow-ai/roseflow-tiktoken.
+
+ ## Community
+
+ ### Discord
+
+ Join us in our [Discord](https://discord.gg/roseflow).
+
+ ### Twitter
+
+ Connect with the core team on Twitter.
+
+ <a href="https://twitter.com/ljuti" target="_blank">
+   <img alt="Twitter Follow" src="https://img.shields.io/twitter/follow/ljuti?logo=twitter&style=social">
+ </a>
+
+ ## License
+
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
+
+ ## Code of Conduct
+
+ Everyone interacting in the `roseflow-tiktoken` project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/roseflow-ai/roseflow-tiktoken/blob/main/CODE_OF_CONDUCT.md).
data/Rakefile ADDED
@@ -0,0 +1,10 @@
+ # frozen_string_literal: true
+
+ require "bundler/gem_tasks"
+ require "rspec/core/rake_task"
+
+ RSpec::Core::RakeTask.new(:spec)
+
+ require "standard/rake"
+
+ task default: %i[spec standard]
data/lib/roseflow/tiktoken/tokenizer.rb ADDED
@@ -0,0 +1,75 @@
+ require "pycall"
+
+ module Roseflow
+   module Tiktoken
+     class Tokenizer
+       def initialize(model: nil)
+         @tokenizer = PyCall.import_module("tiktoken")
+         @model = model
+         @encoding = @tokenizer.encoding_for_model(@model) if @model
+       end
+
+       def encode(input)
+         @encoding.encode(input)
+       rescue
+         raise ::Roseflow::Tiktoken::NoEncodingError, "No encoding found for model #{@model}"
+       end
+
+       def decode(input)
+         @encoding.decode(input)
+       rescue
+         raise ::Roseflow::Tiktoken::NoEncodingError, "No encoding found for model #{@model}"
+       end
+
+       def count_tokens(messages)
+         token_count = 0
+
+         messages.each do |message|
+           token_count += tokens_per_message_for_model(@model)
+
+           message.each do |key, value|
+             token_count += encode(value).count
+             if key == "name"
+               token_count += tokens_per_name_for_model(@model)
+             end
+           end
+         end
+
+         token_count += 3 # Every reply is primed with assistant
+         return token_count
+       end
+
+       private
+
+       def tokens_per_message_for_model(model)
+         case model
+         when "gpt-4"
+           tokens_per_message_for_model("gpt-4-0314")
+         when "gpt-3.5-turbo"
+           tokens_per_message_for_model("gpt-3.5-turbo-0301")
+         when "gpt-4-0314"
+           3
+         when "gpt-3.5-turbo-0301"
+           4
+         else
+           raise NotImplementedError, "Model #{model} is not supported."
+         end
+       end
+
+       def tokens_per_name_for_model(model)
+         case model
+         when "gpt-4"
+           tokens_per_name_for_model("gpt-4-0314")
+         when "gpt-3.5-turbo"
+           tokens_per_name_for_model("gpt-3.5-turbo-0301")
+         when "gpt-4-0314"
+           1
+         when "gpt-3.5-turbo-0301"
+           -1
+         else
+           raise NotImplementedError, "Model #{model} is not supported."
+         end
+       end
+     end
+   end
+ end
data/lib/roseflow/tiktoken/version.rb ADDED
@@ -0,0 +1,18 @@
+ # frozen_string_literal: true
+
+ module Roseflow
+   module Tiktoken
+     def self.gem_version
+       Gem::Version.new VERSION::STRING
+     end
+
+     module VERSION
+       MAJOR = 0
+       MINOR = 1
+       PATCH = 0
+       PRE = nil
+
+       STRING = [MAJOR, MINOR, PATCH, PRE].compact.join(".")
+     end
+   end
+ end
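The `STRING` constant above relies on `compact` to drop a `nil` pre-release segment before joining. A small sketch of that behaviour follows; `version_string` is a hypothetical helper introduced here for illustration, and `"beta1"` is an invented pre-release label.

```ruby
require "rubygems"

# Hypothetical helper mirroring the STRING expression above:
# nil segments are removed, the rest are joined with dots.
def version_string(major, minor, patch, pre = nil)
  [major, minor, patch, pre].compact.join(".")
end

release = version_string(0, 1, 0)
prerelease = version_string(0, 2, 0, "beta1")

puts release
puts prerelease

# Gem::Version treats a version containing letters as a pre-release.
puts Gem::Version.new(release).prerelease?
puts Gem::Version.new(prerelease).prerelease?
```

Keeping `PRE` as a separate constant means a pre-release can be cut by changing one value, without touching the join logic.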
data/lib/roseflow/tiktoken.rb ADDED
@@ -0,0 +1,11 @@
+ # frozen_string_literal: true
+
+ require_relative "tiktoken/version"
+ require "roseflow/tiktoken/tokenizer"
+
+ module Roseflow
+   module Tiktoken
+     class Error < StandardError; end
+     class NoEncodingError < StandardError; end
+   end
+ end
data/roseflow-tiktoken.gemspec ADDED
@@ -0,0 +1,33 @@
+ # frozen_string_literal: true
+
+ require_relative "lib/roseflow/tiktoken/version"
+
+ Gem::Specification.new do |spec|
+   spec.name = "roseflow-tiktoken"
+   spec.version = Roseflow::Tiktoken.gem_version
+   spec.authors = ["Lauri Jutila"]
+   spec.email = ["git@laurijutila.com"]
+
+   spec.summary = "Tiktoken tokenizer for Roseflow."
+   spec.description = "Tiktoken tokenizer for Roseflow."
+   spec.homepage = "https://github.com/roseflow-ai/roseflow-tiktoken"
+   spec.license = "MIT"
+   spec.required_ruby_version = ">= 3.2.0"
+
+   spec.metadata["homepage_uri"] = spec.homepage
+   spec.metadata["source_code_uri"] = "https://github.com/roseflow-ai/roseflow-tiktoken"
+   spec.metadata["changelog_uri"] = "https://github.com/roseflow-ai/roseflow-tiktoken/blob/master/CHANGELOG.md"
+
+   # Specify which files should be added to the gem when it is released.
+   # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
+   spec.files = Dir.chdir(__dir__) do
+     `git ls-files -z`.split("\x0").reject do |f|
+       (File.expand_path(f) == __FILE__) || f.start_with?(*%w[bin/ test/ spec/ features/ .git .circleci appveyor])
+     end
+   end
+   spec.bindir = "exe"
+   spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
+   spec.require_paths = ["lib"]
+
+   spec.add_dependency "pycall", "~> 1.4"
+ end
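The prefix filter in `spec.files` above decides which tracked paths end up in the packaged gem. The sketch below isolates only that `start_with?` check and leaves out the gemspec self-exclusion; `packaged_files` is a hypothetical helper, and the list of tracked paths is invented for illustration rather than taken from the repository.

```ruby
# Prefixes copied from the gemspec above.
EXCLUDED_PREFIXES = %w[bin/ test/ spec/ features/ .git .circleci appveyor].freeze

# Returns the paths that survive the prefix filter.
def packaged_files(tracked_paths)
  tracked_paths.reject { |path| path.start_with?(*EXCLUDED_PREFIXES) }
end

# Hypothetical `git ls-files` output, for illustration only.
tracked = %w[.gitignore .rspec bin/setup lib/roseflow/tiktoken.rb spec/spec_helper.rb]

puts packaged_files(tracked)
```

Because `.git` is matched as a plain string prefix, a path such as `.gitignore` is dropped while `.rspec` is kept, which is consistent with `.rspec` and `.standard.yml` appearing in the packaged file list.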
data/sig/roseflow/tiktoken.rbs ADDED
@@ -0,0 +1,6 @@
+ module Roseflow
+   module Tiktoken
+     VERSION: String
+     # See the writing guide of rbs: https://github.com/ruby/rbs#guides
+   end
+ end
metadata ADDED
@@ -0,0 +1,73 @@
+ --- !ruby/object:Gem::Specification
+ name: roseflow-tiktoken
+ version: !ruby/object:Gem::Version
+   version: 0.1.0
+ platform: ruby
+ authors:
+ - Lauri Jutila
+ autorequire:
+ bindir: exe
+ cert_chain: []
+ date: 2023-05-10 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: pycall
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.4'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.4'
+ description: Tiktoken tokenizer for Roseflow.
+ email:
+ - git@laurijutila.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - ".rspec"
+ - ".standard.yml"
+ - CHANGELOG.md
+ - CODE_OF_CONDUCT.md
+ - Gemfile
+ - LICENSE.txt
+ - README.md
+ - Rakefile
+ - lib/roseflow/tiktoken.rb
+ - lib/roseflow/tiktoken/tokenizer.rb
+ - lib/roseflow/tiktoken/version.rb
+ - roseflow-tiktoken.gemspec
+ - sig/roseflow/tiktoken.rbs
+ homepage: https://github.com/roseflow-ai/roseflow-tiktoken
+ licenses:
+ - MIT
+ metadata:
+   homepage_uri: https://github.com/roseflow-ai/roseflow-tiktoken
+   source_code_uri: https://github.com/roseflow-ai/roseflow-tiktoken
+   changelog_uri: https://github.com/roseflow-ai/roseflow-tiktoken/blob/master/CHANGELOG.md
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: 3.2.0
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubygems_version: 3.4.1
+ signing_key:
+ specification_version: 4
+ summary: Tiktoken tokenizer for Roseflow.
+ test_files: []