rails_ai_kit 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: d497000abdc4e0a8f02f6167570d5d00888979e31ed5e484fc707647ebdface5
4
+ data.tar.gz: b721aeaeddabe1c8dce81f6aca4ff94188623c6cdb8fd5b22dd5199a7d1fb3bf
5
+ SHA512:
6
+ metadata.gz: 0f6a1a9ec8f3f9a4d0404f2ef5b7b8dc61ad41bee7a7557d3701a7e958786114430ca4b5c3db4e1581b98bda4b280856eca1a1cd7152987c6d5db54ebc4b1b63
7
+ data.tar.gz: 43316a2c58575ae72429deb036cd3499b42bd19ec5c7fadf15404b46a864fdcffa3b41a33b24815fef16dd426ace74bac38fe8460e6e722c9639785ee5c4d7ef
data/CHANGELOG.md ADDED
@@ -0,0 +1,16 @@
1
+ # Changelog
2
+
3
+ ## [Unreleased]
4
+
5
+ ## [0.1.0] - 2025-03-07
6
+
7
+ ### Added
8
+
9
+ - Vector-based classification layer on top of pgvector
10
+ - Configuration for embedding provider (OpenAI, Cohere) and API keys
11
+ - `RailsAiKit::Classifier` with `train`, `classify`, `classify_by_embedding`, `batch_classify`, `labels`
12
+ - `vector_classify` macro for ActiveRecord models (auto embed + classify on save)
13
+ - Similarity search: `Model.similar_to("query text", limit: 5)`
14
+ - Generators: `rails_ai_kit:install`, `rails_ai_kit:vector_columns`
15
+ - Label storage in `rails_ai_kit_labels` (classifier_name, label_name, embedding)
16
+ - Embedding providers: OpenAI (text-embedding-3-small), Cohere (embed-english-v3.0)
data/README.md ADDED
@@ -0,0 +1,271 @@
1
+ # Rails AI Kit
2
+
3
+ A Rails gem that adds a **classification layer on top of [pgvector](https://github.com/pgvector/pgvector)**. Instead of building custom ML or calling LLMs every time, use vector similarity to classify data: support tickets, content moderation, ecommerce categories, document routing, and more.
4
+
5
+ - **No ML training** – train labels with example texts and compare embeddings
6
+ - **No LLM cost** – content is embedded through the provider API when a record is saved; classification itself is a nearest-neighbor query in PostgreSQL
7
+ - **Rails-friendly** – `vector_classify`, `Classifier.train`, `Classifier.classify`, `Article.similar_to("query")`
8
+ - **Multi-provider embeddings** – pass API keys for OpenAI, Cohere, or your own provider
9
+
10
+ ## Architecture
11
+
12
+ ```
13
+ Your Rails App
14
+
15
+
16
+ Rails AI Kit (classification + indexing + similarity)
17
+
18
+
19
+ pgvector (PostgreSQL)
20
+
21
+
22
+ vector similarity search → predicted label
23
+ ```
24
+
25
+ The gem does **not** store your application data. You store data in your own PostgreSQL tables with pgvector. The gem provides:
26
+
27
+ - Classification logic (label training, classify by similarity)
28
+ - Indexing helpers (migrations for vector columns and label table)
29
+ - Similarity search (`similar_to`)
30
+ - Filtering (standard `where(label: "sports")`)
31
+
32
+ ## Requirements
33
+
34
+ - Rails 6+
35
+ - PostgreSQL with [pgvector](https://github.com/pgvector/pgvector) extension
36
+ - An embedding API (OpenAI or Cohere) and API key
37
+
38
+ ## Installation
39
+
40
+ Add to your Gemfile:
41
+
42
+ ```ruby
43
+ gem "rails_ai_kit"
44
+ ```
45
+
46
+ Then:
47
+
48
+ ```bash
49
+ bundle install
50
+ ```
51
+
52
+ ### 1. Enable pgvector and create the labels table
53
+
54
+ The gem needs one internal table to store **label embeddings** (one vector per label per classifier):
55
+
56
+ ```bash
57
+ rails g rails_ai_kit:install
58
+ rails db:migrate
59
+ ```
60
+
61
+ ### 2. Configure embedding provider and API keys
62
+
63
+ In `config/initializers/rails_ai_kit.rb` (copied by `rails g rails_ai_kit:install`; create the file if it is missing):
64
+
65
+ ```ruby
66
+ RailsAiKit.configure do |config|
67
+ config.embedding_provider = :openai # or :cohere
68
+ config.embedding_dimensions = 1536 # 1536 for OpenAI text-embedding-3-small; 1024 for Cohere embed-english-v3.0
69
+
70
+ # Pass API keys for the provider(s) you use
71
+ config.api_keys = {
72
+ openai: ENV["OPENAI_API_KEY"],
73
+ cohere: ENV["COHERE_API_KEY"]
74
+ }
75
+
76
+ config.default_classifier_name = "default"
77
+ end
78
+ ```
79
+
80
+ Use environment variables or Rails credentials; do not commit raw API keys.
81
+
82
+ ### 3. Add vector columns to your model
83
+
84
+ For an existing table (e.g. `articles` with a `content` column):
85
+
86
+ ```bash
87
+ rails g rails_ai_kit:vector_columns Article content
88
+ rails db:migrate
89
+ ```
90
+
91
+ This adds `embedding` (vector), `label` (string), and `confidence_score` (float).
92
+
93
+ ## Usage
94
+
95
+ ### Declare vector classification on a model
96
+
97
+ ```ruby
98
+ class Article < ApplicationRecord
99
+ vector_classify :content,
100
+ labels: ["sports", "politics", "technology"]
101
+ end
102
+ ```
103
+
104
+ On save, the gem will:
105
+
106
+ 1. Generate an embedding for `content`
107
+ 2. Store it in `embedding`
108
+ 3. Run classification (nearest label vector) and set `label` and `confidence_score`
109
+
110
+ Example:
111
+
112
+ ```ruby
113
+ article = Article.create!(content: "Apple released a new iPhone")
114
+ article.label # => "technology"
115
+ article.confidence_score # => 0.91
116
+ ```
117
+
118
+ ### Train labels with examples
119
+
120
+ Before classifying, train each label with example texts so the gem can build a label vector (average of example embeddings):
121
+
122
+ ```ruby
123
+ classifier = RailsAiKit.classifier("Article") # or use default
124
+
125
+ classifier.train("sports", examples: [
126
+ "football match",
127
+ "cricket tournament",
128
+ "Olympic gold medal"
129
+ ])
130
+
131
+ classifier.train("politics", examples: [
132
+ "election results",
133
+ "parliament debate"
134
+ ])
135
+
136
+ classifier.train("technology", examples: [
137
+ "new iPhone launch",
138
+ "AI software update"
139
+ ])
140
+ ```
141
+
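The averaging step behind a label vector can be sketched in plain Ruby. This is a minimal illustration rather than the gem's own code path: `average_vectors` is a hypothetical helper, and the toy 3-dimensional vectors stand in for real embeddings returned by a provider.

```ruby
# Element-wise mean of equally sized vectors (a centroid).
# `average_vectors` is a hypothetical helper used only for illustration.
def average_vectors(vectors)
  raise ArgumentError, "At least one vector required" if vectors.empty?

  dim = vectors.first.size
  unless vectors.all? { |vec| vec.size == dim }
    raise ArgumentError, "All vectors must have the same dimension"
  end

  # Sum each dimension across all vectors, then divide by the vector count.
  sums = Array.new(dim, 0.0)
  vectors.each do |vec|
    vec.each_with_index { |value, i| sums[i] += value }
  end
  sums.map { |total| total / vectors.size }
end

# Toy "embeddings"; real ones have many more dimensions (e.g. 1536).
examples = [
  [1.0, 0.0, 2.0],
  [3.0, 2.0, 0.0]
]
label_vector = average_vectors(examples)
# label_vector == [2.0, 1.0, 1.0]
```

Because the label vector is a plain mean, every example carries equal weight, so one unrepresentative example can pull the label away from the others.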
142
+ You can use the default classifier name or pass a custom one:
143
+
144
+ ```ruby
145
+ RailsAiKit.classifier("Article").train("sports", examples: [...])
146
+ RailsAiKit.classifier.train("sports", examples: [...]) # default classifier
147
+ ```
148
+
149
+ ### Classify text without saving
150
+
151
+ ```ruby
152
+ result = RailsAiKit.classifier("Article").classify("India won the cricket match")
153
+ # => { label: "sports", confidence: 0.91, distance: 0.09 }
154
+ ```
155
+
156
+ ### Batch classification
157
+
158
+ ```ruby
159
+ records = Article.where(label: nil).limit(100)
160
+ RailsAiKit.classifier("Article").batch_classify(records,
161
+ text_attribute: :content,
162
+ label_attribute: :label,
163
+ confidence_attribute: :confidence_score
164
+ )
165
+ # Optionally save: records.each(&:save!)
166
+ ```
167
+
168
+ ### Filtering
169
+
170
+ Use normal ActiveRecord scopes:
171
+
172
+ ```ruby
173
+ Article.where(label: "sports")
174
+ Article.where("confidence_score >= ?", 0.8)
175
+ ```
176
+
177
+ ### Similarity search
178
+
179
+ Find records similar to a piece of text (embeds the query, then nearest-neighbor search):
180
+
181
+ ```ruby
182
+ Article.similar_to("new iPhone launch", limit: 5)
183
+ ```
184
+
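Conceptually, `similar_to` orders records by cosine distance to the query vector and keeps the closest few. The in-memory sketch below illustrates that ordering only; it assumes embeddings are already available as arrays, and `cosine_distance` and `nearest_records` are hypothetical helpers. In the gem the equivalent ordering is delegated to pgvector through Neighbor.

```ruby
# Cosine distance = 1 - cosine similarity. Assumes non-zero vectors.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / (norm_a * norm_b)
end

# Hypothetical in-memory stand-in for a nearest-neighbor query:
# sort by distance to the query vector and keep the first `limit` records.
def nearest_records(query_vector, records, limit: 5)
  records
    .sort_by { |record| cosine_distance(query_vector, record[:embedding]) }
    .first(limit)
end

# Toy 2-dimensional embeddings, made up for illustration.
articles = [
  { title: "Phone launch",   embedding: [0.9, 0.1] },
  { title: "Election night", embedding: [0.1, 0.9] },
  { title: "Chip review",    embedding: [0.8, 0.3] }
]

nearest = nearest_records([1.0, 0.0], articles, limit: 2)
titles = nearest.map { |article| article[:title] }
# titles == ["Phone launch", "Chip review"]
```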
185
+ ## Example: Support ticket routing
186
+
187
+ ```ruby
188
+ # app/models/support_ticket.rb
189
+ class SupportTicket < ApplicationRecord
190
+ vector_classify :message,
191
+ labels: ["billing", "technical", "account"],
192
+ classifier_name: "SupportTicket"
193
+ end
194
+
195
+ # Train once
196
+ classifier = RailsAiKit.classifier("SupportTicket")
197
+ classifier.train("billing", examples: ["My payment failed", "Refund request", "Invoice issue"])
198
+ classifier.train("technical", examples: ["App crashed", "Login not working", "Error message"])
199
+ classifier.train("account", examples: ["Change email", "Close my account", "Password reset"])
200
+
201
+ # Incoming ticket
202
+ ticket = SupportTicket.create!(message: "My payment failed last night")
203
+ ticket.label # => "billing" → route to billing queue
204
+ ```
205
+
206
+ ## Configuration reference
207
+
208
+ | Option | Description | Default |
209
+ |--------|-------------|---------|
210
+ | `embedding_provider` | `:openai` or `:cohere` | `:openai` |
211
+ | `embedding_dimensions` | Vector size (must match provider) | `1536` |
212
+ | `api_keys` | Hash of provider => API key | `{}` |
213
+ | `default_classifier_name` | Name when no classifier given | `"default"` |
214
+
215
+ ## Generators
216
+
217
+ | Generator | Purpose |
218
+ |-----------|---------|
219
+ | `rails g rails_ai_kit:install` | Migration: enable pgvector + create `rails_ai_kit_labels`; also copies the initializer to `config/initializers/rails_ai_kit.rb` |
220
+ | `rails g rails_ai_kit:vector_columns ModelName content_column` | Migration: add `embedding`, `label`, `confidence_score` to a table |
221
+
222
+ ## How it works
223
+
224
+ 1. **Label vectors** – Each label is represented by a vector (average of example embeddings). Stored in `rails_ai_kit_labels`.
225
+ 2. **Classification** – New content is embedded and compared to all label vectors of the classifier using cosine distance. The nearest label wins; confidence is `1 - distance`, clamped so it never drops below 0.
226
+ 3. **Storage** – Your table holds the content, its embedding, the predicted label, and confidence. The gem only adds one table for label vectors.
227
+
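The classification rule above can be sketched without a database. This is an illustration under stated assumptions, not the gem's implementation: label vectors are passed in as a plain hash, and `cosine_distance` and `classify_by_vectors` are hypothetical helpers. The gem itself runs the nearest-neighbor query in PostgreSQL.

```ruby
# Cosine distance = 1 - cosine similarity; ranges from 0 (same direction)
# to 2 (opposite direction). Assumes non-zero vectors.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / (norm_a * norm_b)
end

# Hypothetical helper: pick the label whose vector is nearest to the query
# and derive confidence as 1 - distance, clamped at 0.
def classify_by_vectors(query, label_vectors)
  return { label: nil, confidence: 0.0, distance: nil } if label_vectors.empty?

  label, vector = label_vectors.min_by { |_name, vec| cosine_distance(query, vec) }
  distance = cosine_distance(query, vector)
  { label: label, confidence: [0.0, 1.0 - distance].max, distance: distance }
end

# Toy 2-dimensional label vectors, made up for illustration.
label_vectors = {
  "sports"   => [1.0, 0.0],
  "politics" => [0.0, 1.0]
}

result = classify_by_vectors([2.0, 0.0], label_vectors)
# result[:label] == "sports", with distance 0.0 and confidence 1.0
```

The clamp matters because cosine distance can exceed 1 when vectors point in opposing directions, which would otherwise produce a negative confidence.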
228
+ ## Future ideas
229
+
230
+ - Hierarchical labels (e.g. technology → mobile, laptops)
231
+ - Confidence threshold (e.g. mark as "unknown" if < 0.7)
232
+ - Hybrid search (vector + keyword)
233
+ - Incremental learning (add examples to improve labels over time)
234
+
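A confidence threshold like the one listed above could be approximated in application code on top of the result hash that `classify` returns. The sketch below is an assumption-laden illustration: `apply_confidence_threshold`, the `"unknown"` fallback label, and the `0.7` default are hypothetical and not part of the gem.

```ruby
# Hypothetical fallback label for low-confidence predictions.
UNKNOWN_LABEL = "unknown"

# Takes a result hash shaped like { label:, confidence:, distance: } and
# replaces the label when confidence falls below the threshold.
def apply_confidence_threshold(result, threshold: 0.7)
  return result if result[:label] && result[:confidence] >= threshold

  result.merge(label: UNKNOWN_LABEL)
end

confident = apply_confidence_threshold({ label: "sports", confidence: 0.91, distance: 0.09 })
uncertain = apply_confidence_threshold({ label: "politics", confidence: 0.55, distance: 0.45 })
# confident[:label] == "sports"
# uncertain[:label] == "unknown"
```

Keeping the original confidence and distance in the returned hash leaves room for later review of borderline predictions.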
235
+ ## Development
236
+
237
+ ```bash
238
+ bundle install
239
+ bundle exec rake install
240
+ ```
241
+
242
+ Run tests (when added) with `bundle exec rspec` or `bundle exec rake test`.
243
+
244
+ ## License
245
+
246
+ MIT.
247
+
257
+ ## How it’s built
258
+
259
+ - **Configuration** (`lib/rails_ai_kit/configuration.rb`) – Embedding provider, dimensions, and API keys (e.g. `api_keys[:openai]`).
260
+ - **Embedding providers** (`lib/rails_ai_kit/embedding_providers/`) – Base class plus OpenAI and Cohere. Each implements `embed(text)` and `embed_batch(texts)` using the provider API.
261
+ - **EmbeddingService** – Wraps the configured provider and API key so `RailsAiKit.embedding.embed(text)` works without passing keys every time.
262
+ - **LabelRecord** – ActiveRecord model for `rails_ai_kit_labels` (classifier_name, label_name, embedding). Uses Neighbor’s `has_neighbors :embedding` for similarity.
263
+ - **Classifier** – Trains labels by averaging example embeddings and storing them; classifies by nearest-neighbor (cosine) against those label vectors. Supports `classify(text)`, `classify_by_embedding(vector)`, and `batch_classify(records)`.
264
+ - **VectorClassify** – Concern that adds the `vector_classify` macro: `has_neighbors` on the embedding column, a before_save that embeds the source column and runs `classify_by_embedding`, and a `similar_to(query_text)` scope that embeds the query and runs nearest-neighbor search.
265
+ - **Generators** – `rails_ai_kit:install` creates the labels table migration; `rails_ai_kit:vector_columns` adds embedding/label/confidence_score to a given table.
266
+
267
+ ## Related
268
+
269
+ - [pgvector](https://github.com/pgvector/pgvector) – Open-source vector similarity search for Postgres
270
+ - [Neighbor](https://github.com/ankane/neighbor) – Nearest neighbor search for Rails (used by this gem)
@@ -0,0 +1,16 @@
1
+ # frozen_string_literal: true
2
+
3
+ class CreateRailsAiKitLabels < ActiveRecord::Migration[<%= Rails::VERSION::MAJOR %>.<%= Rails::VERSION::MINOR %>]
4
+ def change
5
+ enable_extension "vector" unless extension_enabled?("vector")
6
+
7
+ create_table :rails_ai_kit_labels do |t|
8
+ t.string :classifier_name, null: false
9
+ t.string :label_name, null: false
10
+ t.vector :embedding, limit: <%= (defined?(RailsAiKit) && RailsAiKit.configuration.embedding_dimensions) || 1536 %>, null: false
11
+ t.timestamps
12
+ end
13
+
14
+ add_index :rails_ai_kit_labels, [:classifier_name, :label_name], unique: true
15
+ end
16
+ end
@@ -0,0 +1,15 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Configure Rails AI Kit (vector classification with pgvector).
4
+ # Set your embedding provider and API keys via ENV or Rails credentials.
5
+ RailsAiKit.configure do |config|
6
+ config.embedding_provider = :openai
7
+ config.embedding_dimensions = 1536
8
+
9
+ config.api_keys = {
10
+ openai: ENV["OPENAI_API_KEY"],
11
+ cohere: ENV["COHERE_API_KEY"]
12
+ }
13
+
14
+ config.default_classifier_name = "default"
15
+ end
@@ -0,0 +1,28 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "rails/generators"
4
+
5
+ module RailsAiKit
6
+ module Generators
7
+ class InstallGenerator < Rails::Generators::Base
8
+ source_root File.expand_path("templates", __dir__)
9
+
10
+ desc "Creates a migration to enable pgvector and add rails_ai_kit_labels table for label embeddings"
11
+
12
+ def create_migration
13
+ migration_template "create_rails_ai_kit_labels.rb",
14
+ "db/migrate/#{migration_timestamp}_create_rails_ai_kit_labels.rb"
15
+ end
16
+
17
+ def create_initializer
18
+ copy_file "rails_ai_kit.rb", "config/initializers/rails_ai_kit.rb"
19
+ end
20
+
21
+ private
22
+
23
+ def migration_timestamp
24
+ Time.now.utc.strftime("%Y%m%d%H%M%S")
25
+ end
26
+ end
27
+ end
28
+ end
@@ -0,0 +1,11 @@
1
+ # frozen_string_literal: true
2
+
3
+ class AddRailsAiKitVectorColumnsTo<%= table_name.camelize %> < ActiveRecord::Migration[<%= Rails::VERSION::MAJOR %>.<%= Rails::VERSION::MINOR %>]
4
+ def change
5
+ enable_extension "vector" unless extension_enabled?("vector")
6
+
7
+ add_column :<%= table_name %>, :embedding, :vector, limit: <%= dimensions %>
8
+ add_column :<%= table_name %>, :label, :string
9
+ add_column :<%= table_name %>, :confidence_score, :float
10
+ end
11
+ end
@@ -0,0 +1,39 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "rails/generators"
4
+ require "rails/generators/active_record/migration/migration_generator"
5
+
6
+ module RailsAiKit
7
+ module Generators
8
+ class VectorColumnsGenerator < Rails::Generators::NamedBase
9
+ source_root File.expand_path("templates", __dir__)
10
+
11
+ desc "Adds embedding, label, and confidence_score columns to an existing table for vector_classify"
12
+
13
+ argument :content_column, type: :string, default: "content",
14
+ desc: "Name of the text column to classify (e.g. content, body)"
15
+
16
+ class_option :embedding_dimensions, type: :numeric, default: 1536,
17
+ desc: "Vector dimensions (must match your embedding provider)"
18
+
19
+ def create_migration
20
+ migration_template "add_vector_columns.rb",
21
+ "db/migrate/#{migration_timestamp}_add_rails_ai_kit_vector_columns_to_#{table_name}.rb"
22
+ end
23
+
24
+ private
25
+
26
+ def table_name
27
+ name.underscore.pluralize
28
+ end
29
+
30
+ def migration_timestamp
31
+ Time.now.utc.strftime("%Y%m%d%H%M%S")
32
+ end
33
+
34
+ def dimensions
35
+ options[:embedding_dimensions] || (defined?(RailsAiKit) && RailsAiKit.configuration.embedding_dimensions) || 1536
36
+ end
37
+ end
38
+ end
39
+ end
@@ -0,0 +1,115 @@
1
+ # frozen_string_literal: true
2
+
3
+ module RailsAiKit
4
+ # Vector-based classifier using label embeddings and nearest-neighbor search.
5
+ # Train with examples per label, then classify text by comparing its embedding to label vectors.
6
+ class Classifier
7
+ attr_reader :classifier_name
8
+
9
+ def initialize(classifier_name: nil)
10
+ @classifier_name = classifier_name || RailsAiKit.configuration.default_classifier_name
11
+ end
12
+
13
+ # Train a label with example texts. Embeds each example and stores the element-wise average as the label vector.
14
+ # @param label_name [String]
15
+ # @param examples [Array<String>] Example texts for this label
16
+ # @return [RailsAiKit::LabelRecord]
17
+ def train(label_name, examples:)
18
+ raise ArgumentError, "At least one example required" if examples.to_a.empty?
19
+
20
+ embeddings = RailsAiKit.embedding.embed_batch(examples.to_a)
21
+ # Average the vectors to get a single label embedding (common approach for few-shot)
22
+ dim = embeddings.first.size
23
+ avg = Array.new(dim, 0.0)
24
+ embeddings.each do |vec|
25
+ vec.each_with_index { |v, i| avg[i] += v }
26
+ end
27
+ n = embeddings.size.to_f
28
+ avg.map! { |x| x / n }
29
+
30
+ record = LabelRecord.find_or_initialize_by(classifier_name: classifier_name, label_name: label_name)
31
+ record.embedding = avg
32
+ record.save!
33
+ record
34
+ end
35
+
36
+ # Classify using an existing embedding vector (e.g. after storing). Avoids re-embedding.
37
+ # @param embedding [Array<Float>]
38
+ # @return [Hash] { label:, confidence:, distance: }
39
+ def classify_by_embedding(embedding)
40
+ return { label: nil, confidence: 0.0, distance: nil } if embedding.blank?
41
+
42
+ nearest = LabelRecord.for_classifier(classifier_name)
43
+ .nearest_neighbors(:embedding, embedding, distance: "cosine")
44
+ .limit(1)
45
+ .first
46
+
47
+ return { label: nil, confidence: 0.0, distance: nil } if nearest.blank?
48
+
49
+ distance = nearest.neighbor_distance
50
+ confidence = [0, 1 - distance].max
51
+ { label: nearest.label_name, confidence: confidence, distance: distance }
52
+ end
53
+
54
+ # Classify a single text. Returns predicted label, confidence (1 - cosine distance, clamped at 0), and distance.
55
+ # @param text [String]
56
+ # @return [Hash] { label:, confidence:, distance: }
57
+ def classify(text)
58
+ query_embedding = RailsAiKit.embedding.embed(text)
59
+ nearest = LabelRecord.for_classifier(classifier_name)
60
+ .nearest_neighbors(:embedding, query_embedding, distance: "cosine")
61
+ .limit(1)
62
+ .first
63
+
64
+ return { label: nil, confidence: 0.0, distance: nil } if nearest.blank?
65
+
66
+ # Cosine distance in pgvector is 1 - cosine_similarity; we want confidence in [0,1]
67
+ distance = nearest.neighbor_distance
68
+ confidence = [0, 1 - distance].max
69
+
70
+ { label: nearest.label_name, confidence: confidence, distance: distance }
71
+ end
72
+
73
+ # Classify multiple records in batch. Expects each record to respond to the text getter and optional setters.
74
+ # @param records [Enumerable]
75
+ # @param text_attribute [Symbol, String] Method to get text from each record (e.g. :content)
76
+ # @param label_attribute [Symbol, String] Attribute to set for predicted label (default :label)
77
+ # @param confidence_attribute [Symbol, String] Attribute to set for confidence (default :confidence_score)
78
+ # @return [Array<Hash>] Array of { label:, confidence:, distance: } per record
79
+ def batch_classify(records, text_attribute: :content, label_attribute: :label, confidence_attribute: :confidence_score)
80
+ recs = records.to_a
81
+ return [] if recs.empty?
82
+
83
+ texts = recs.map { |r| r.public_send(text_attribute) }
84
+ embeddings = RailsAiKit.embedding.embed_batch(texts)
85
+
86
+ results = embeddings.map do |query_embedding|
87
+ nearest = LabelRecord.for_classifier(classifier_name)
88
+ .nearest_neighbors(:embedding, query_embedding, distance: "cosine")
89
+ .limit(1)
90
+ .first
91
+
92
+ if nearest.blank?
93
+ { label: nil, confidence: 0.0, distance: nil }
94
+ else
95
+ dist = nearest.neighbor_distance
96
+ conf = [0, 1 - dist].max
97
+ { label: nearest.label_name, confidence: conf, distance: dist }
98
+ end
99
+ end
100
+
101
+ recs.each_with_index do |record, i|
102
+ r = results[i]
103
+ record.public_send(:"#{label_attribute}=", r[:label]) if record.respond_to?(:"#{label_attribute}=")
104
+ record.public_send(:"#{confidence_attribute}=", r[:confidence]) if record.respond_to?(:"#{confidence_attribute}=")
105
+ end
106
+
107
+ results
108
+ end
109
+
110
+ # List all trained labels for this classifier.
111
+ def labels
112
+ LabelRecord.for_classifier(classifier_name).pluck(:label_name)
113
+ end
114
+ end
115
+ end
@@ -0,0 +1,30 @@
1
+ # frozen_string_literal: true
2
+
3
+ module RailsAiKit
4
+ class Configuration
5
+ attr_accessor :embedding_provider,
6
+ :embedding_dimensions,
7
+ :api_keys,
8
+ :default_classifier_name
9
+
10
+ def initialize
11
+ @embedding_provider = :openai
12
+ @embedding_dimensions = 1536
13
+ @api_keys = {}
14
+ @default_classifier_name = "default"
15
+ end
16
+
17
+ def api_key(provider = nil)
18
+ provider ||= embedding_provider
19
+ api_keys[provider.to_sym] || api_keys[provider.to_s]
20
+ end
21
+ end
22
+
23
+ def self.configuration
24
+ @configuration ||= Configuration.new
25
+ end
26
+
27
+ def self.configure
28
+ yield configuration
29
+ end
30
+ end
@@ -0,0 +1,30 @@
1
+ # frozen_string_literal: true
2
+
3
+ module RailsAiKit
4
+ module EmbeddingProviders
5
+ class Base
6
+ class NotImplementedError < StandardError; end # RailsAiKit::Error is defined only after this file is required, so it cannot be the superclass here
7
+
8
+ # @param api_key [String] Provider API key
9
+ # @param dimensions [Integer] Optional embedding dimensions (provider-dependent)
10
+ def initialize(api_key:, dimensions: nil)
11
+ @api_key = api_key
12
+ @dimensions = dimensions
13
+ end
14
+
15
+ # Generate embedding for a single text.
16
+ # @param text [String]
17
+ # @return [Array<Float>]
18
+ def embed(text)
19
+ raise NotImplementedError, "#{self.class}#embed must be implemented"
20
+ end
21
+
22
+ # Generate embeddings for multiple texts (optional batching).
23
+ # @param texts [Array<String>]
24
+ # @return [Array<Array<Float>>]
25
+ def embed_batch(texts)
26
+ texts.map { |text| embed(text) }
27
+ end
28
+ end
29
+ end
30
+ end
@@ -0,0 +1,68 @@
1
+ # frozen_string_literal: true
2
+
3
+ module RailsAiKit
4
+ module EmbeddingProviders
5
+ # Cohere embed API: https://docs.cohere.com/reference/embed
6
+ class Cohere < Base
7
+ DEFAULT_MODEL = "embed-english-v3.0"
8
+ DEFAULT_DIMENSIONS = 1024
9
+
10
+ def initialize(api_key:, dimensions: nil, model: nil)
11
+ super(api_key: api_key, dimensions: dimensions)
12
+ @model = model || DEFAULT_MODEL
13
+ @dimensions ||= DEFAULT_DIMENSIONS
14
+ end
15
+
16
+ def embed(text)
17
+ response = client.post("/v1/embed") do |req|
18
+ req.body = {
19
+ texts: [text.to_s],
20
+ model: @model,
21
+ input_type: "search_document",
22
+ embedding_types: ["float"]
23
+ }.to_json
24
+ req.headers["Content-Type"] = "application/json"
25
+ req.headers["Authorization"] = "Bearer #{@api_key}"
26
+ end
27
+ parse_embedding_response(response)
28
+ end
29
+
30
+ def embed_batch(texts)
31
+ return [] if texts.empty?
32
+ response = client.post("/v1/embed") do |req|
33
+ req.body = {
34
+ texts: texts.map(&:to_s),
35
+ model: @model,
36
+ input_type: "search_document",
37
+ embedding_types: ["float"]
38
+ }.to_json
39
+ req.headers["Content-Type"] = "application/json"
40
+ req.headers["Authorization"] = "Bearer #{@api_key}"
41
+ end
42
+ parse_batch_embedding_response(response)
43
+ end
44
+
45
+ private
46
+
47
+ def client
48
+ @client ||= Faraday.new(url: "https://api.cohere.ai") do |f|
49
+ f.request :json
50
+ f.response :json
51
+ f.adapter Faraday.default_adapter
52
+ end
53
+ end
54
+
55
+ def parse_embedding_response(response)
56
+ raise RailsAiKit::Error, "Cohere API error: #{response.body}" unless response.success?
57
+ data = response.body
58
+ data.dig("embeddings", "float")&.first || raise(RailsAiKit::Error, "No embedding in response")
59
+ end
60
+
61
+ def parse_batch_embedding_response(response)
62
+ raise RailsAiKit::Error, "Cohere API error: #{response.body}" unless response.success?
63
+ data = response.body
64
+ data.dig("embeddings", "float") || []
65
+ end
66
+ end
67
+ end
68
+ end
@@ -0,0 +1,69 @@
1
+ # frozen_string_literal: true
2
+
3
+ module RailsAiKit
4
+ module EmbeddingProviders
5
+ class Openai < Base
6
+ DEFAULT_MODEL = "text-embedding-3-small"
7
+ DEFAULT_DIMENSIONS = 1536
8
+
9
+ def initialize(api_key:, dimensions: nil, model: nil)
10
+ super(api_key: api_key, dimensions: dimensions)
11
+ @model = model || DEFAULT_MODEL
12
+ end
13
+
14
+ def embed(text)
15
+ response = client.post("/v1/embeddings") do |req|
16
+ req.body = {
17
+ input: text.to_s,
18
+ model: @model,
19
+ dimensions: (@dimensions || default_dimensions)
20
+ }.compact.to_json
21
+ req.headers["Content-Type"] = "application/json"
22
+ req.headers["Authorization"] = "Bearer #{@api_key}"
23
+ end
24
+ parse_embedding_response(response)
25
+ end
26
+
27
+ def embed_batch(texts)
28
+ return [] if texts.empty?
29
+ response = client.post("/v1/embeddings") do |req|
30
+ req.body = {
31
+ input: texts.map(&:to_s),
32
+ model: @model,
33
+ dimensions: (@dimensions || default_dimensions)
34
+ }.compact.to_json
35
+ req.headers["Content-Type"] = "application/json"
36
+ req.headers["Authorization"] = "Bearer #{@api_key}"
37
+ end
38
+ parse_batch_embedding_response(response)
39
+ end
40
+
41
+ private
42
+
43
+ def client
44
+ @client ||= Faraday.new(url: "https://api.openai.com") do |f|
45
+ f.request :json
46
+ f.response :json
47
+ f.adapter Faraday.default_adapter
48
+ end
49
+ end
50
+
51
+ def default_dimensions
52
+ @model.to_s.include?("large") ? 3072 : DEFAULT_DIMENSIONS
53
+ end
54
+
55
+ def parse_embedding_response(response)
56
+ raise RailsAiKit::Error, "OpenAI API error: #{response.body}" unless response.success?
57
+ data = response.body
58
+ data["data"]&.first&.dig("embedding") || raise(RailsAiKit::Error, "No embedding in response")
59
+ end
60
+
61
+ def parse_batch_embedding_response(response)
62
+ raise RailsAiKit::Error, "OpenAI API error: #{response.body}" unless response.success?
63
+ data = response.body
64
+ items = data["data"] || []
65
+ items.sort_by { |e| e["index"] }.map { |e| e["embedding"] }
66
+ end
67
+ end
68
+ end
69
+ end
@@ -0,0 +1,44 @@
1
+ # frozen_string_literal: true
2
+
3
+ module RailsAiKit
4
+ # Wraps the configured embedding provider and API keys.
5
+ # Use RailsAiKit.embedding.embed("hello") or .embed_batch([...])
6
+ class EmbeddingService
7
+ PROVIDERS = {
8
+ openai: EmbeddingProviders::Openai,
9
+ cohere: EmbeddingProviders::Cohere
10
+ }.freeze
11
+
12
+ def initialize(config: RailsAiKit.configuration)
13
+ @config = config
14
+ end
15
+
16
+ def embed(text)
17
+ provider.embed(text)
18
+ end
19
+
20
+ def embed_batch(texts)
21
+ provider.embed_batch(texts)
22
+ end
23
+
24
+ private
25
+
26
+ def provider
27
+ @provider ||= build_provider
28
+ end
29
+
30
+ def build_provider
31
+ name = @config.embedding_provider.to_sym
32
+ klass = PROVIDERS[name] || raise(ArgumentError, "Unknown embedding provider: #{name}")
33
+ api_key = @config.api_key(name) || raise(RailsAiKit::Error, "Missing API key for provider: #{name}. Set RailsAiKit.configuration.api_keys[:#{name}]")
34
+ klass.new(
35
+ api_key: api_key,
36
+ dimensions: @config.embedding_dimensions
37
+ )
38
+ end
39
+ end
40
+
41
+ def self.embedding
42
+ @embedding_service ||= EmbeddingService.new
43
+ end
44
+ end
@@ -0,0 +1,19 @@
1
+ # frozen_string_literal: true
2
+
3
+ module RailsAiKit
4
+ # Internal model for storing label embeddings per classifier.
5
+ # Table: rails_ai_kit_labels (via generator).
6
+ # The table name is set explicitly because it differs from the name Active Record would infer from the class name.
7
+ class LabelRecord < ActiveRecord::Base
8
+ self.table_name = "rails_ai_kit_labels"
9
+
10
+ has_neighbors :embedding, dimensions: -> { RailsAiKit.configuration.embedding_dimensions }
11
+
12
+ validates :classifier_name, presence: true
13
+ validates :label_name, presence: true
14
+ validates :embedding, presence: true
15
+ validates :label_name, uniqueness: { scope: :classifier_name }
16
+
17
+ scope :for_classifier, ->(name) { where(classifier_name: name) }
18
+ end
19
+ end
@@ -0,0 +1,71 @@
1
+ # frozen_string_literal: true
2
+
3
+ module RailsAiKit
4
+ # ActiveRecord integration: vector_classify macro and similarity search.
5
+ # Expects the model to have columns: embedding (vector), label (string), confidence_score (float).
6
+ module VectorClassify
7
+ extend ActiveSupport::Concern
8
+
9
+ class_methods do
10
+ # Declare vector-based classification on this model.
11
+ #
12
+ # @param source_column [Symbol] Attribute holding the text to embed and classify (e.g. :content)
13
+ # @param labels [Array<String>] Label names (for documentation / validation; training is via Classifier.train)
14
+ # @param embedding_column [Symbol] Column storing the vector (default :embedding)
15
+ # @param label_column [Symbol] Column to store predicted label (default :label)
16
+ # @param confidence_column [Symbol] Column to store confidence score (default :confidence_score)
17
+ # @param classifier_name [String, nil] Classifier namespace (default: model name)
18
+ # @param auto_classify [Boolean] Run classification on save (default true)
19
+ def vector_classify(
20
+ source_column,
21
+ labels: [],
22
+ embedding_column: :embedding,
23
+ label_column: :label,
24
+ confidence_column: :confidence_score,
25
+ classifier_name: nil,
26
+ auto_classify: true
27
+ )
28
+ include RailsAiKit::VectorClassify::InstanceMethods
29
+
30
+ cattr_accessor :rails_ai_kit_source_column, :rails_ai_kit_embedding_column,
31
+ :rails_ai_kit_label_column, :rails_ai_kit_confidence_column,
32
+ :rails_ai_kit_classifier_name, :rails_ai_kit_labels
33
+ self.rails_ai_kit_source_column = source_column.to_sym
34
+ self.rails_ai_kit_embedding_column = embedding_column.to_sym
35
+ self.rails_ai_kit_label_column = label_column.to_sym
36
+ self.rails_ai_kit_confidence_column = confidence_column.to_sym
37
+ self.rails_ai_kit_classifier_name = classifier_name || name
38
+ self.rails_ai_kit_labels = labels
39
+
40
+ has_neighbors embedding_column, dimensions: RailsAiKit.configuration.embedding_dimensions
41
+
42
+ if auto_classify
43
+ before_save :rails_ai_kit_compute_embedding_and_classify
44
+ end
45
+
46
+ # Similarity search by text: embeds the query and returns nearest records.
47
+ define_singleton_method(:similar_to) do |query_text, limit: 5|
48
+ vector = RailsAiKit.embedding.embed(query_text)
49
+ nearest_neighbors(embedding_column, vector, distance: "cosine").limit(limit)
50
+ end
51
+ end
52
+ end
53
+
54
+ module InstanceMethods
55
+ private
56
+
57
+ def rails_ai_kit_compute_embedding_and_classify
58
+ source = send(self.class.rails_ai_kit_source_column)
59
+ return if source.blank?
60
+
61
+ embedding = RailsAiKit.embedding.embed(source)
62
+ send(:"#{self.class.rails_ai_kit_embedding_column}=", embedding)
63
+
64
+ classifier = RailsAiKit::Classifier.new(classifier_name: self.class.rails_ai_kit_classifier_name)
65
+ result = classifier.classify_by_embedding(embedding)
66
+ send(:"#{self.class.rails_ai_kit_label_column}=", result[:label])
67
+ send(:"#{self.class.rails_ai_kit_confidence_column}=", result[:confidence])
68
+ end
69
+ end
70
+ end
71
+ end
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module RailsAiKit
4
+ VERSION = "0.1.0"
5
+ end
@@ -0,0 +1,25 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "rails_ai_kit/version"
4
+ require_relative "rails_ai_kit/configuration"
5
+ require_relative "rails_ai_kit/embedding_providers/base"
6
+ require_relative "rails_ai_kit/embedding_providers/openai"
7
+ require_relative "rails_ai_kit/embedding_providers/cohere"
8
+ require_relative "rails_ai_kit/embedding_service"
9
+ require_relative "rails_ai_kit/label_record"
10
+ require_relative "rails_ai_kit/classifier"
11
+ require_relative "rails_ai_kit/vector_classify"
12
+
13
+ module RailsAiKit
14
+ class Error < StandardError; end
15
+
16
+ # Top-level classifier. With no argument it uses the default classifier name; for a custom name, pass it here or use Classifier.new(classifier_name: "MyClassifier").
17
+ def self.classifier(classifier_name = nil)
18
+ Classifier.new(classifier_name: classifier_name)
19
+ end
20
+ end
21
+
22
+ # Optional Rails integration: extend ActiveRecord so vector_classify is available.
23
+ if defined?(ActiveRecord::Base)
24
+ ActiveRecord::Base.include RailsAiKit::VectorClassify
25
+ end
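Note on the guarded include above: a small plain-Ruby sketch of how including a module into a base class can expose a class-level macro that in turn defines a per-class finder, which is roughly the shape `vector_classify` and `similar_to` take in this diff. `ToyVectorClassify`, `toy_vector_classify`, `toy_settings`, and `FakeBase` are hypothetical names. The gem may wire this up differently (for example with `ActiveSupport::Concern`), so this is an illustration of the pattern and not the gem's implementation.

```ruby
# Hypothetical module; names are illustrative and not the gem's source.
module ToyVectorClassify
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Macro: records per-class settings and defines a class-level finder.
    def toy_vector_classify(source_column, limit_default: 5)
      settings = { source_column: source_column.to_sym }
      define_singleton_method(:toy_settings) { settings }
      define_singleton_method(:similar_to) do |query_text, limit: limit_default|
        # A real implementation would embed query_text and query the database.
        "nearest #{limit} rows to #{query_text.inspect} by #{settings[:source_column]}"
      end
    end
  end
end

class FakeBase; end

# Mirrors the `if defined?(ActiveRecord::Base)` guard: include only when the host class exists.
FakeBase.include(ToyVectorClassify) if defined?(FakeBase)

class Article < FakeBase
  toy_vector_classify "body"
end

class Comment < FakeBase
  toy_vector_classify :text, limit_default: 3
end

puts Article.similar_to("refund policy")
puts Comment.similar_to("spam", limit: 1)
```

Because the settings live in a closure created per macro call, `Article` and `Comment` keep independent configuration, and `FakeBase` itself gains no `similar_to` until a subclass opts in.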
metadata ADDED
@@ -0,0 +1,117 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: rails_ai_kit
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Rails AI Kit Contributors
8
+ bindir: bin
9
+ cert_chain: []
10
+ date: 1980-01-02 00:00:00.000000000 Z
11
+ dependencies:
12
+ - !ruby/object:Gem::Dependency
13
+ name: activerecord
14
+ requirement: !ruby/object:Gem::Requirement
15
+ requirements:
16
+ - - ">="
17
+ - !ruby/object:Gem::Version
18
+ version: '6.0'
19
+ type: :runtime
20
+ prerelease: false
21
+ version_requirements: !ruby/object:Gem::Requirement
22
+ requirements:
23
+ - - ">="
24
+ - !ruby/object:Gem::Version
25
+ version: '6.0'
26
+ - !ruby/object:Gem::Dependency
27
+ name: activesupport
28
+ requirement: !ruby/object:Gem::Requirement
29
+ requirements:
30
+ - - ">="
31
+ - !ruby/object:Gem::Version
32
+ version: '6.0'
33
+ type: :runtime
34
+ prerelease: false
35
+ version_requirements: !ruby/object:Gem::Requirement
36
+ requirements:
37
+ - - ">="
38
+ - !ruby/object:Gem::Version
39
+ version: '6.0'
40
+ - !ruby/object:Gem::Dependency
41
+ name: faraday
42
+ requirement: !ruby/object:Gem::Requirement
43
+ requirements:
44
+ - - ">="
45
+ - !ruby/object:Gem::Version
46
+ version: '1.0'
47
+ type: :runtime
48
+ prerelease: false
49
+ version_requirements: !ruby/object:Gem::Requirement
50
+ requirements:
51
+ - - ">="
52
+ - !ruby/object:Gem::Version
53
+ version: '1.0'
54
+ - !ruby/object:Gem::Dependency
55
+ name: neighbor
56
+ requirement: !ruby/object:Gem::Requirement
57
+ requirements:
58
+ - - ">="
59
+ - !ruby/object:Gem::Version
60
+ version: '0.2'
61
+ type: :runtime
62
+ prerelease: false
63
+ version_requirements: !ruby/object:Gem::Requirement
64
+ requirements:
65
+ - - ">="
66
+ - !ruby/object:Gem::Version
67
+ version: '0.2'
68
+ description: 'Rails AI Kit provides ready-made tools to classify data using vector
69
+ similarity: label training, auto/batch classification, similarity search, and filtering—without
70
+ ML training or LLM calls.'
71
+ email:
72
+ - rohit.kushwaha@w3villa.com
73
+ executables: []
74
+ extensions: []
75
+ extra_rdoc_files: []
76
+ files:
77
+ - CHANGELOG.md
78
+ - README.md
79
+ - lib/generators/rails_ai_kit/install/templates/create_rails_ai_kit_labels.rb
80
+ - lib/generators/rails_ai_kit/install/templates/rails_ai_kit.rb
81
+ - lib/generators/rails_ai_kit/install_generator.rb
82
+ - lib/generators/rails_ai_kit/vector_columns/templates/add_vector_columns.rb
83
+ - lib/generators/rails_ai_kit/vector_columns_generator.rb
84
+ - lib/rails_ai_kit.rb
85
+ - lib/rails_ai_kit/classifier.rb
86
+ - lib/rails_ai_kit/configuration.rb
87
+ - lib/rails_ai_kit/embedding_providers/base.rb
88
+ - lib/rails_ai_kit/embedding_providers/cohere.rb
89
+ - lib/rails_ai_kit/embedding_providers/openai.rb
90
+ - lib/rails_ai_kit/embedding_service.rb
91
+ - lib/rails_ai_kit/label_record.rb
92
+ - lib/rails_ai_kit/vector_classify.rb
93
+ - lib/rails_ai_kit/version.rb
94
+ homepage: https://github.com/your-org/rails_ai_kit
95
+ licenses: []
96
+ metadata:
97
+ homepage_uri: https://github.com/your-org/rails_ai_kit
98
+ source_code_uri: https://github.com/your-org/rails_ai_kit
99
+ changelog_uri: https://github.com/your-org/rails_ai_kit/blob/main/CHANGELOG.md
100
+ rdoc_options: []
101
+ require_paths:
102
+ - lib
103
+ required_ruby_version: !ruby/object:Gem::Requirement
104
+ requirements:
105
+ - - ">="
106
+ - !ruby/object:Gem::Version
107
+ version: 3.0.0
108
+ required_rubygems_version: !ruby/object:Gem::Requirement
109
+ requirements:
110
+ - - ">="
111
+ - !ruby/object:Gem::Version
112
+ version: '0'
113
+ requirements: []
114
+ rubygems_version: 4.0.7
115
+ specification_version: 4
116
+ summary: Vector-based classification layer on top of pgvector for Rails
117
+ test_files: []
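Note on the dependency constraints above: all four runtime dependencies use an open-ended `>=` requirement with no upper bound. The sketch below checks a few candidate versions against those constraints using RubyGems' own `Gem::Requirement` and `Gem::Version`. The candidate version numbers are hypothetical and chosen only for illustration.

```ruby
require "rubygems"

# Constraints copied from the gemspec metadata above.
constraints = {
  "activerecord"  => ">= 6.0",
  "activesupport" => ">= 6.0",
  "faraday"       => ">= 1.0",
  "neighbor"      => ">= 0.2"
}

# Hypothetical installed versions, chosen only for illustration.
candidates = {
  "activerecord"  => "7.1.3",
  "activesupport" => "5.2.8",
  "faraday"       => "2.9.0",
  "neighbor"      => "0.1.2"
}

# Map each dependency name to true/false depending on whether the candidate satisfies it.
results = constraints.to_h do |name, requirement|
  ok = Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(candidates.fetch(name)))
  [name, ok]
end

results.each do |name, ok|
  puts format("%-14s %-8s %s", name, candidates[name], ok ? "ok" : "too old")
end
```

Since none of the constraints carries an upper bound, any future major version of these libraries also satisfies them, so applications that need stability may want to pin tighter ranges in their own Gemfile.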