lanterndb 0.0.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 33b108e60126444cecd60d7b8bb093edd3a260993842974d138e70628622dd89
4
+ data.tar.gz: 554d9f355237b15b6d32ba694753369891cec7e721cd99d330330dd1d1e3e176
5
+ SHA512:
6
+ metadata.gz: f58f15546955bfd16c01a665c9da82106bd417accca3117e8eba2a2e318a713297d07c28a5a1a935a71c2b1c6c3d9508cb49ef1a64fba89f1dc302b71abe23f0
7
+ data.tar.gz: 9224859157287a5180bed90f7bc9bca8216b021d2129f7fe93c3b8465b48554222d73dfa060449296a4a6b62a64b79db755197d37c62c39c0ab764754172218a
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2024 Lantern Systems, Inc.
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,150 @@
1
+ # Lantern Ruby Client
2
+
3
+ [![codecov](https://codecov.io/gh/lanterndata/lantern-ruby/branch/main/graph/badge.svg)](https://codecov.io/gh/lanterndata/lantern-ruby)
4
+
5
+ No Ruby client is required for `pg` or `Sequel`. For `ActiveRecord` and `Rails`, you can use the `lanterndb` gem, which is loaded with `require 'lantern'`.
6
+
7
+ ## Features
8
+
9
+ - Perform nearest-neighbor queries over stored vectors, using either a query vector or text
10
+ - Create text embeddings using OpenAI, Cohere, and open-source models
11
+
12
+ ## Installation
13
+
14
+ Add this line to your application's Gemfile:
15
+
16
+ ```ruby
17
+ gem 'lanterndb'
18
+ ```
19
+
20
+ And then execute:
21
+
22
+ ```bash
23
+ bundle install
24
+ ```
25
+
26
+ Or install it yourself as:
27
+
28
+ ```bash
29
+ gem install lanterndb
30
+ ```
31
+
32
+ ## ActiveRecord
33
+
34
+ ### Connect to the database
35
+
36
+ ```ruby
37
+ require 'active_record'
38
+ require 'lantern'
39
+ ActiveRecord::Base.establish_connection(ENV.fetch("DATABASE_URL"))
40
+ ActiveRecord::Base.connection.enable_extension("lantern")
41
+ conn = ActiveRecord::Base.connection
42
+ ```
43
+
44
+ ### Create a model
45
+
46
+ ```ruby
47
+ ActiveRecord::Migration.create_table :movies do |t|
48
+ t.column :movie_embedding, :real, array: true
49
+ end
50
+ conn.execute("INSERT INTO movies (movie_embedding) VALUES ('{0,1,0}'), ('{3,2,4}')")
51
+ ```
52
+
53
+ ### Embedding generation
54
+
55
+ ```ruby
56
+ embedding1 = Lantern.text_embedding('BAAI/bge-base-en', 'Your text here')
57
+
58
+ Lantern.set_api_token(openai_token: 'your_openai_token')
59
+ embedding2 = Lantern.openai_embedding('text-embedding-3-small', 'Hello')
60
+
61
+ Lantern.set_api_token(cohere_token: 'your_cohere_token')
62
+ embedding3 = Lantern.cohere_embedding('embed-english-v3.0', 'Hello')
63
+ ```
64
+
65
+ A full list of supported models can be found [here](https://lantern.dev/docs/develop/generate).
66
+
67
+ ### Vector search
68
+
69
+ This gem provides several ways to perform vector search. We support the following distance metrics:
70
+
71
+ - `l2` (Euclidean distance)
72
+ - `cosine` (Cosine distance)
73
+
74
+ Using pre-computed vectors:
75
+
76
+ ```ruby
77
+ class Document < ApplicationRecord
78
+ has_neighbors :embedding
79
+ end
80
+
81
+ # Find 5 nearest neighbors using L2 distance
82
+ Document.nearest_neighbors(:embedding, [0.1, 0.2, 0.3], distance: 'l2').limit(5)
83
+
84
+ # Given a document, find 5 nearest neighbors using cosine distance
85
+ document = Document.first
86
+ document.nearest_neighbors(:embedding, distance: 'cosine').limit(5)
87
+ ```
88
+
89
+ Using text:
90
+
91
+ ```ruby
92
+ class Book < ApplicationRecord
93
+ has_neighbors :embedding
94
+ end
95
+
96
+ # Find 5 nearest neighbors using open-source model
97
+ Book.nearest_neighbors(:embedding, 'The quick brown fox', model: 'BAAI/bge-small-en', distance: 'l2').limit(5)
98
+
99
+ # Find 5 nearest neighbors using OpenAI
100
+ Lantern.set_api_token(openai_token: 'your_openai_token')
101
+ Book.nearest_neighbors(:embedding, 'The quick brown fox', model: 'openai/text-embedding-3-small', distance: 'cosine').limit(5)
102
+ ```
103
+
104
+ ### Vector index
105
+
106
+ To speed up vector search queries, you can add an HNSW index to your model:
107
+
108
+ ```ruby
109
+ class CreateVectorIndex < ActiveRecord::Migration[7.0]
110
+ def up
111
+ add_index :books, :embedding, using: :lantern_hnsw, opclass: :dist_l2sq_ops, name: 'book_embedding_index'
112
+ end
113
+ def down
114
+ remove_index :books, name: 'book_embedding_index'
115
+ end
116
+ end
117
+ ```
118
+
119
+ Note: This does not support `WITH` parameters (e.g., `ef_construction`, `ef`, `m`, `dim`).
120
+
121
+ To specify `WITH` parameters, create the index with raw SQL instead:
122
+
123
+ ```ruby
124
+ class CreateHnswIndex < ActiveRecord::Migration[7.0]
125
+ def up
126
+ execute <<-SQL
127
+ CREATE INDEX movie_embedding_hnsw_idx
128
+ ON movies
129
+ USING lantern_hnsw (movie_embedding dist_l2sq_ops)
130
+ WITH (
131
+ ef = 15,
132
+ m = 16,
133
+ ef_construction = 64
134
+ )
135
+ SQL
136
+ end
137
+ def down
138
+ execute "DROP INDEX movie_embedding_hnsw_idx"
139
+ end
140
+ end
141
+ ```
142
+
143
+ ## Rails
144
+
145
+ For Rails, enable the Lantern extension using the provided generator:
146
+
147
+ ```bash
148
+ rails generate lantern
149
+ rails db:migrate
150
+ ```
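The two distance metrics named in the README's vector search section can be illustrated in plain Ruby. This is a minimal sketch that does not use the gem or a database: `l2_distance` and `cosine_distance` are hypothetical helper names, and the inputs are the two sample rows from the README's `INSERT` statement. Lantern evaluates distances inside Postgres, and the `dist_l2sq_ops` operator class shown in the README suggests the extension may work with squared L2 internally, so the values it returns may differ from these textbook definitions.

```ruby
# Euclidean (L2) distance between two equal-length numeric vectors.
def l2_distance(a, b)
  raise ArgumentError, 'vectors must have the same length' unless a.length == b.length

  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Cosine distance, defined here as 1 minus cosine similarity.
# A zero-length vector yields NaN or Infinity; this sketch does not guard against it.
def cosine_distance(a, b)
  raise ArgumentError, 'vectors must have the same length' unless a.length == b.length

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |y| y * y })
  1.0 - dot / (norm_a * norm_b)
end

puts l2_distance([0, 1, 0], [3, 2, 4]).round(3)     # 5.099
puts cosine_distance([0, 1, 0], [3, 2, 4]).round(3) # 0.629
```

A smaller value means a closer neighbor under both metrics, which is why the README describes ordering by distance and taking the first rows.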
data/lib/generators/lantern/lantern_generator.rb ADDED
@@ -0,0 +1,21 @@
1
+ require 'rails/generators'
2
+ require 'rails/generators/active_record'
3
+
4
+ module Lantern
5
+ module Generators
6
+ class LanternGenerator < Rails::Generators::Base
7
+ include ActiveRecord::Generators::Migration
8
+ source_root File.join(__dir__, 'templates')
9
+
10
+ def copy_migration
11
+ migration_template 'lantern.rb.tt', 'db/migrate/install_lantern.rb', migration_version: migration_version
12
+ end
13
+
14
+ private
15
+
16
+ def migration_version
17
+ "[#{ActiveRecord::VERSION::MAJOR}.#{ActiveRecord::VERSION::MINOR}]"
18
+ end
19
+ end
20
+ end
21
+ end
data/lib/generators/lantern/templates/lantern.rb.tt ADDED
@@ -0,0 +1,5 @@
1
+ class <%= migration_class_name %> < ActiveRecord::Migration<%= migration_version %>
2
+ def change
3
+ enable_extension "lantern"
4
+ end
5
+ end
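How the generator's `migration_version` string and the template above combine can be sketched with the standard library's ERB. This is an illustration only: `render_install_migration` is a hypothetical helper, the major and minor numbers stand in for `ActiveRecord::VERSION::MAJOR` and `ActiveRecord::VERSION::MINOR`, and the class name `InstallLantern` is an assumption based on the `install_lantern.rb` destination file name.

```ruby
require 'erb'

# Template text copied from lantern.rb.tt in this diff.
MIGRATION_TEMPLATE_TEXT = <<~TT.freeze
  class <%= migration_class_name %> < ActiveRecord::Migration<%= migration_version %>
    def change
      enable_extension "lantern"
    end
  end
TT

# Hypothetical helper: renders the template for a given ActiveRecord version.
def render_install_migration(major, minor, class_name: 'InstallLantern')
  # Same format the generator builds: "[MAJOR.MINOR]".
  migration_version = "[#{major}.#{minor}]"
  ERB.new(MIGRATION_TEMPLATE_TEXT).result_with_hash(
    migration_class_name: class_name,
    migration_version: migration_version
  )
end

puts render_install_migration(7, 1)
```

With these inputs the first rendered line reads `class InstallLantern < ActiveRecord::Migration[7.1]`, so the migration is pinned to the ActiveRecord version that generated it.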
data/lib/lantern/embeddings.rb ADDED
@@ -0,0 +1,42 @@
1
+ require 'active_support'
2
+
3
+ module Lantern
4
+ module Embeddings
5
+ extend ActiveSupport::Concern
6
+
7
+ class_methods do
8
+ # Generates a text embedding using the specified model
9
+ #
10
+ # @param model [String] The embedding model to use (e.g., 'BAAI/bge-base-en')
11
+ # @param text [String] The text input to embed
12
+ # @return [Array<Float>] The generated embedding vector
13
+ def text_embedding(model, text)
14
+ generate_embedding(model, text, 'text_embedding')
15
+ end
16
+ def openai_embedding(model, text, dim = nil)
17
+ Lantern.ensure_token!(:openai)
18
+ generate_embedding(model, text, 'openai_embedding', dim)
19
+ end
20
+ def cohere_embedding(model, text, input_type = nil)
21
+ Lantern.ensure_token!(:cohere)
22
+ generate_embedding(model, text, 'cohere_embedding', input_type)
23
+ end
24
+
25
+ private
26
+
27
+ def generate_embedding(model, text, embedding_function, other = nil)
28
+ sanitized_model = connection.quote(model)
29
+ sanitized_text = connection.quote(text)
30
+ if other
31
+ query = "SELECT #{embedding_function}(#{sanitized_model}, #{sanitized_text}, #{connection.quote(other)}) AS embedding"
32
+ else
33
+ query = "SELECT #{embedding_function}(#{sanitized_model}, #{sanitized_text}) AS embedding"
34
+ end
35
+ result = connection.select_one(query)
36
+ embedding = result['embedding'].tr('{}', '').split(',').map(&:to_f)
37
+ embedding
38
+ end
39
+ end
40
+ end
41
+ end
42
+
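The last step of `generate_embedding` turns the Postgres array literal returned by the query into Ruby floats. The same string handling can be tried on its own, without a database. `parse_embedding` is a hypothetical name for that one line, and the input strings are made-up examples rather than real model output.

```ruby
# Convert a one-dimensional Postgres array literal such as "{0.25,-1.5,3}"
# into an Array of Floats, using the same calls as the diff:
# strip the braces, split on commas, convert each piece with to_f.
def parse_embedding(literal)
  literal.tr('{}', '').split(',').map(&:to_f)
end

p parse_embedding('{0.25,-1.5,3}') # [0.25, -1.5, 3.0]
p parse_embedding('{}')            # []
```

Note that this approach assumes a non-NULL, one-dimensional result: a `nil` value would raise `NoMethodError`, and `to_f` silently maps any non-numeric element to `0.0`.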
data/lib/lantern/model.rb ADDED
@@ -0,0 +1,60 @@
1
+ module Lantern
2
+ module Model
3
+ def has_neighbors(*attribute_names)
4
+ attribute_names.map!(&:to_sym)
5
+
6
+ scope :nearest_neighbors, ->(attribute_name, vector_or_text, model: nil, openai_dim: nil, cohere_input_type: nil, distance: 'l2') {
7
+ attribute_name = attribute_name.to_sym
8
+
9
+ # Distance operator
10
+ distance_operators = {
11
+ 'l2' => '<->',
12
+ 'cosine' => '<=>'
13
+ }
14
+ operator = distance_operators[distance]
15
+ unless operator
16
+ raise ArgumentError, "Invalid distance metric. Allowed metrics are #{distance_operators.keys.join(', ')}"
17
+ end
18
+
19
+ # Vector order by expression
20
+ order_expression = "#{quoted_table_name}.#{connection.quote_column_name(attribute_name)} #{operator}" + if model
21
+ # Generate vector from text
22
+ text = vector_or_text
23
+ embedding_function = case model
24
+ when /^openai/ then 'openai_embedding'
25
+ when /^cohere/ then 'cohere_embedding'
26
+ else 'text_embedding'
27
+ end
28
+ other = model.start_with?('openai') ? openai_dim : model.start_with?('cohere') ? cohere_input_type : nil
29
+ sanitized_model = connection.quote(model)
30
+ sanitized_text = connection.quote(text)
31
+ if other
32
+ "#{embedding_function}(#{sanitized_model}, #{sanitized_text}, #{connection.quote(other)})"
33
+ else
34
+ "#{embedding_function}(#{sanitized_model}, #{sanitized_text})"
35
+ end
36
+ else
37
+ # Vector-based search
38
+ vector = vector_or_text
39
+ unless vector.all? { |v| v.is_a?(Integer) }
40
+ vector = vector.map(&:to_f)
41
+ end
42
+ vector_literal = vector.join(',')
43
+ "ARRAY[#{vector_literal}]"
44
+ end
45
+
46
+ select_columns = select_values.any? ? [] : column_names
47
+ select(select_columns, "#{order_expression} AS distance")
48
+ .where.not(attribute_name => nil)
49
+ .order(Arel.sql(order_expression))
50
+ }
51
+
52
+ define_method :nearest_neighbors do |attribute_name, distance: 'l2'|
53
+ attribute_name = attribute_name.to_sym
54
+ self.class
55
+ .where.not(id: id)
56
+ .nearest_neighbors(attribute_name, self[attribute_name], distance: distance)
57
+ end
58
+ end
59
+ end
60
+ end
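The string building inside the `nearest_neighbors` scope can be followed more easily outside ActiveRecord. The sketch below reproduces two pieces of it in plain Ruby: choosing the SQL embedding function from the model name prefix, and building the `ORDER BY` expression for a pre-computed vector. `embedding_function_for` and `vector_order_expression` are hypothetical helper names, and the column is passed in already quoted because `connection.quote_column_name` is not available here.

```ruby
# Pick the SQL embedding function from the model name prefix, as the scope does.
def embedding_function_for(model)
  case model
  when /^openai/ then 'openai_embedding'
  when /^cohere/ then 'cohere_embedding'
  else 'text_embedding'
  end
end

# Build "<column> <operator> ARRAY[...]" for a pre-computed query vector.
def vector_order_expression(quoted_column, vector, distance: 'l2')
  operators = { 'l2' => '<->', 'cosine' => '<=>' }
  operator = operators[distance]
  unless operator
    raise ArgumentError, "Invalid distance metric. Allowed metrics are #{operators.keys.join(', ')}"
  end

  # Mirror the diff: all-integer vectors are kept as-is, anything else is coerced to Float.
  vector = vector.map(&:to_f) unless vector.all? { |v| v.is_a?(Integer) }
  "#{quoted_column} #{operator} ARRAY[#{vector.join(',')}]"
end

puts vector_order_expression('"books"."embedding"', [0.1, 0.2, 0.3])
# "books"."embedding" <-> ARRAY[0.1,0.2,0.3]
puts vector_order_expression('"books"."embedding"', [1, 2.5], distance: 'cosine')
# "books"."embedding" <=> ARRAY[1.0,2.5]
puts embedding_function_for('openai/text-embedding-3-small')
# openai_embedding
```

Because the scope both selects this expression as `distance` and orders by it, each returned record exposes a `distance` attribute alongside its regular columns.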
data/lib/lantern/railtie.rb ADDED
@@ -0,0 +1,11 @@
1
+ require 'rails/railtie'
2
+
3
+ module Lantern
4
+ class Railtie < Rails::Railtie
5
+ railtie_name :lantern
6
+
7
+ generators do
8
+ require 'generators/lantern/lantern_generator'
9
+ end
10
+ end
11
+ end
data/lib/lantern/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module Lantern
2
+ VERSION = '0.0.0'
3
+ end
data/lib/lantern.rb ADDED
@@ -0,0 +1,54 @@
1
+ require_relative 'lantern/version'
2
+ require_relative 'lantern/model'
3
+ require_relative 'lantern/embeddings'
4
+ require 'active_support'
5
+
6
+ module Lantern
7
+ class Error < StandardError; end
8
+ class MissingTokenError < StandardError; end
9
+
10
+ extend Lantern::Embeddings::ClassMethods
11
+
12
+ class << self
13
+ def connection
14
+ ActiveRecord::Base.connection
15
+ end
16
+
17
+ def current_user
18
+ connection.select_value("SELECT CURRENT_USER")
19
+ end
20
+
21
+ def set_api_token(openai_token: nil, cohere_token: nil)
22
+ if openai_token
23
+ connection.execute("ALTER ROLE #{current_user} SET lantern_extras.openai_token = #{connection.quote(openai_token)};")
24
+ end
25
+ if cohere_token
26
+ connection.execute("ALTER ROLE #{current_user} SET lantern_extras.cohere_token = #{connection.quote(cohere_token)};")
27
+ end
28
+ connection.execute("SELECT pg_reload_conf();")
29
+ end
30
+
31
+ def openai_token
32
+ connection.select_value("SHOW lantern_extras.openai_token")
33
+ end
34
+
35
+ def cohere_token
36
+ connection.select_value("SHOW lantern_extras.cohere_token")
37
+ end
38
+
39
+ def ensure_token!(provider)
40
+ case provider
41
+ when :openai
42
+ raise MissingTokenError, "OpenAI token is required for OpenAI embedding generation" if openai_token.blank?
43
+ when :cohere
44
+ raise MissingTokenError, "Cohere token is required for Cohere embedding generation" if cohere_token.blank?
45
+ end
46
+ end
47
+ end
48
+ end
49
+
50
+ ActiveSupport.on_load(:active_record) do
51
+ extend Lantern::Model
52
+ end
53
+
54
+ require_relative 'lantern/railtie' if defined?(Rails::Railtie)
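`set_api_token` stores each token as a per-role setting by building an `ALTER ROLE ... SET` statement. The shape of that statement can be sketched without a database connection. Everything below is an assumption for illustration: `quote_identifier` and `quote_literal` are simplified stand-ins for the adapter's quoting helpers, and `set_token_sql` is a hypothetical name. Unlike the code in this diff, which interpolates the role name unquoted, the sketch quotes the role as an identifier.

```ruby
# Simplified stand-in for identifier quoting: wrap in double quotes, double any embedded ones.
def quote_identifier(name)
  %("#{name.gsub('"', '""')}")
end

# Simplified stand-in for string-literal quoting: wrap in single quotes, double any embedded ones.
def quote_literal(value)
  "'#{value.gsub("'", "''")}'"
end

# Build the statement that stores a token as a role-level setting.
def set_token_sql(role, setting, token)
  "ALTER ROLE #{quote_identifier(role)} SET #{setting} = #{quote_literal(token)};"
end

puts set_token_sql('app_user', 'lantern_extras.openai_token', "sk-it's")
# ALTER ROLE "app_user" SET lantern_extras.openai_token = 'sk-it''s';
```

In real code the adapter's own quoting methods should be preferred over hand-written helpers like these.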
metadata ADDED
@@ -0,0 +1,196 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: lanterndb
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.0
5
+ platform: ruby
6
+ authors:
7
+ - Di Qi
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2024-11-10 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: activerecord
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '7.0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '7.0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: activesupport
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: '7.0'
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '7.0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: pg
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '1.2'
48
+ type: :runtime
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '1.2'
55
+ - !ruby/object:Gem::Dependency
56
+ name: dotenv
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '2.7'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: '2.7'
69
+ - !ruby/object:Gem::Dependency
70
+ name: minitest
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - ">="
74
+ - !ruby/object:Gem::Version
75
+ version: '5.14'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - ">="
81
+ - !ruby/object:Gem::Version
82
+ version: '5.14'
83
+ - !ruby/object:Gem::Dependency
84
+ name: rake
85
+ requirement: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - ">="
88
+ - !ruby/object:Gem::Version
89
+ version: '13.0'
90
+ type: :development
91
+ prerelease: false
92
+ version_requirements: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - ">="
95
+ - !ruby/object:Gem::Version
96
+ version: '13.0'
97
+ - !ruby/object:Gem::Dependency
98
+ name: rubocop
99
+ requirement: !ruby/object:Gem::Requirement
100
+ requirements:
101
+ - - ">="
102
+ - !ruby/object:Gem::Version
103
+ version: '1.0'
104
+ type: :development
105
+ prerelease: false
106
+ version_requirements: !ruby/object:Gem::Requirement
107
+ requirements:
108
+ - - ">="
109
+ - !ruby/object:Gem::Version
110
+ version: '1.0'
111
+ - !ruby/object:Gem::Dependency
112
+ name: railties
113
+ requirement: !ruby/object:Gem::Requirement
114
+ requirements:
115
+ - - ">="
116
+ - !ruby/object:Gem::Version
117
+ version: '7.0'
118
+ type: :development
119
+ prerelease: false
120
+ version_requirements: !ruby/object:Gem::Requirement
121
+ requirements:
122
+ - - ">="
123
+ - !ruby/object:Gem::Version
124
+ version: '7.0'
125
+ - !ruby/object:Gem::Dependency
126
+ name: simplecov
127
+ requirement: !ruby/object:Gem::Requirement
128
+ requirements:
129
+ - - ">="
130
+ - !ruby/object:Gem::Version
131
+ version: '0'
132
+ type: :development
133
+ prerelease: false
134
+ version_requirements: !ruby/object:Gem::Requirement
135
+ requirements:
136
+ - - ">="
137
+ - !ruby/object:Gem::Version
138
+ version: '0'
139
+ - !ruby/object:Gem::Dependency
140
+ name: simplecov-cobertura
141
+ requirement: !ruby/object:Gem::Requirement
142
+ requirements:
143
+ - - ">="
144
+ - !ruby/object:Gem::Version
145
+ version: '0'
146
+ type: :development
147
+ prerelease: false
148
+ version_requirements: !ruby/object:Gem::Requirement
149
+ requirements:
150
+ - - ">="
151
+ - !ruby/object:Gem::Version
152
+ version: '0'
153
+ description:
154
+ email: support@lantern.dev
155
+ executables: []
156
+ extensions: []
157
+ extra_rdoc_files: []
158
+ files:
159
+ - LICENSE.txt
160
+ - README.md
161
+ - lib/generators/lantern/lantern_generator.rb
162
+ - lib/generators/lantern/templates/lantern.rb.tt
163
+ - lib/lantern.rb
164
+ - lib/lantern/embeddings.rb
165
+ - lib/lantern/model.rb
166
+ - lib/lantern/railtie.rb
167
+ - lib/lantern/version.rb
168
+ homepage: https://lantern.dev
169
+ licenses:
170
+ - MIT
171
+ metadata:
172
+ homepage_uri: https://lantern.dev
173
+ source_code_uri: https://github.com/lanterndata/lantern-ruby
174
+ bug_tracker_uri: https://github.com/lanterndata/lantern-ruby/issues
175
+ documentation_uri: https://lantern.dev/docs
176
+ changelog_uri: https://github.com/lanterndata/lantern-ruby/blob/main/CHANGELOG.md
177
+ post_install_message:
178
+ rdoc_options: []
179
+ require_paths:
180
+ - lib
181
+ required_ruby_version: !ruby/object:Gem::Requirement
182
+ requirements:
183
+ - - ">="
184
+ - !ruby/object:Gem::Version
185
+ version: '3.1'
186
+ required_rubygems_version: !ruby/object:Gem::Requirement
187
+ requirements:
188
+ - - ">="
189
+ - !ruby/object:Gem::Version
190
+ version: '0'
191
+ requirements: []
192
+ rubygems_version: 3.5.23
193
+ signing_key:
194
+ specification_version: 4
195
+ summary: Lantern Rails Client
196
+ test_files: []