rails_ai_kit 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/CHANGELOG.md +16 -0
- data/README.md +271 -0
- data/lib/generators/rails_ai_kit/install/templates/create_rails_ai_kit_labels.rb +16 -0
- data/lib/generators/rails_ai_kit/install/templates/rails_ai_kit.rb +15 -0
- data/lib/generators/rails_ai_kit/install_generator.rb +28 -0
- data/lib/generators/rails_ai_kit/vector_columns/templates/add_vector_columns.rb +11 -0
- data/lib/generators/rails_ai_kit/vector_columns_generator.rb +39 -0
- data/lib/rails_ai_kit/classifier.rb +115 -0
- data/lib/rails_ai_kit/configuration.rb +30 -0
- data/lib/rails_ai_kit/embedding_providers/base.rb +30 -0
- data/lib/rails_ai_kit/embedding_providers/cohere.rb +68 -0
- data/lib/rails_ai_kit/embedding_providers/openai.rb +69 -0
- data/lib/rails_ai_kit/embedding_service.rb +44 -0
- data/lib/rails_ai_kit/label_record.rb +19 -0
- data/lib/rails_ai_kit/vector_classify.rb +71 -0
- data/lib/rails_ai_kit/version.rb +5 -0
- data/lib/rails_ai_kit.rb +25 -0
- metadata +117 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
```yaml
---
SHA256:
  metadata.gz: d497000abdc4e0a8f02f6167570d5d00888979e31ed5e484fc707647ebdface5
  data.tar.gz: b721aeaeddabe1c8dce81f6aca4ff94188623c6cdb8fd5b22dd5199a7d1fb3bf
SHA512:
  metadata.gz: 0f6a1a9ec8f3f9a4d0404f2ef5b7b8dc61ad41bee7a7557d3701a7e958786114430ca4b5c3db4e1581b98bda4b280856eca1a1cd7152987c6d5db54ebc4b1b63
  data.tar.gz: 43316a2c58575ae72429deb036cd3499b42bd19ec5c7fadf15404b46a864fdcffa3b41a33b24815fef16dd426ace74bac38fe8460e6e722c9639785ee5c4d7ef
```
data/CHANGELOG.md
ADDED
@@ -0,0 +1,16 @@

# Changelog

## [Unreleased]

## [0.1.0] - 2025-03-07

### Added

- Vector-based classification layer on top of pgvector
- Configuration for embedding provider (OpenAI, Cohere) and API keys
- `RailsAiKit::Classifier` with `train`, `classify`, `classify_by_embedding`, `batch_classify`, `labels`
- `vector_classify` macro for ActiveRecord models (auto embed + classify on save)
- Similarity search: `Model.similar_to("query text", limit: 5)`
- Generators: `rails_ai_kit:install`, `rails_ai_kit:vector_columns`
- Label storage in `rails_ai_kit_labels` (classifier_name, label_name, embedding)
- Embedding providers: OpenAI (text-embedding-3-small), Cohere (embed-english-v3.0)

data/README.md
ADDED
@@ -0,0 +1,271 @@

# Rails AI Kit

A Rails gem that adds a **classification layer on top of [pgvector](https://github.com/pgvector/pgvector)**. You do not need to train a custom ML model or call an LLM for every decision. The gem classifies data by vector similarity, which suits support tickets, content moderation, e-commerce categories, document routing, and similar tasks.

- **No model training** – each label is defined by a few example texts whose embeddings are averaged into one label vector
- **No per-request LLM calls** – each piece of content is embedded once; classification is a nearest-neighbor lookup in PostgreSQL
- **Rails-friendly** – `vector_classify`, `Classifier#train`, `Classifier#classify`, `Article.similar_to("query")`
- **Multiple embedding providers** – OpenAI or Cohere, with API keys passed through configuration

## Architecture

```
Your Rails App
      │
      ▼
Rails AI Kit (classification + indexing + similarity)
      │
      ▼
pgvector (PostgreSQL)
      │
      ▼
vector similarity search → predicted label
```

The gem does **not** store your application data. Your data stays in your own PostgreSQL tables with pgvector columns. The gem provides:

- Classification logic (label training, classification by similarity)
- Indexing helpers (migrations for vector columns and the label table)
- Similarity search (`similar_to`)
- Filtering (standard scopes such as `where(label: "sports")`)

## Requirements

- Rails 6+
- PostgreSQL with the [pgvector](https://github.com/pgvector/pgvector) extension
- An embedding API (OpenAI or Cohere) and an API key

## Installation

Add to your Gemfile:

```ruby
gem "rails_ai_kit"
```

Then:

```bash
bundle install
```

### 1. Enable pgvector and create the labels table

The gem needs one internal table to store **label embeddings**, with one vector per label per classifier. The install generator creates the migration for that table and also creates `config/initializers/rails_ai_kit.rb`:

```bash
rails g rails_ai_kit:install
rails db:migrate
```

The generated migration sizes the `embedding` column from `RailsAiKit.configuration.embedding_dimensions`, which is 1536 unless it has already been configured. If you use Cohere, change `limit: 1536` to `limit: 1024` in the migration before running `rails db:migrate`.

### 2. Configure embedding provider and API keys

Edit the generated `config/initializers/rails_ai_kit.rb`:

```ruby
RailsAiKit.configure do |config|
  config.embedding_provider = :openai # or :cohere
  config.embedding_dimensions = 1536  # 1536 for OpenAI text-embedding-3-small, 1024 for Cohere embed-english-v3.0

  # Pass API keys for the provider(s) you use
  config.api_keys = {
    openai: ENV["OPENAI_API_KEY"],
    cohere: ENV["COHERE_API_KEY"]
  }

  config.default_classifier_name = "default"
end
```

Store API keys in environment variables or Rails credentials. Do not commit raw keys. The configured dimension must match both the provider's output and the `limit:` of your vector columns.

### 3. Add vector columns to your model

For an existing table (e.g. `articles` with a `content` column):

```bash
rails g rails_ai_kit:vector_columns Article content
rails db:migrate
```

This adds `embedding` (vector), `label` (string), and `confidence_score` (float). Pass `--embedding-dimensions=N` to set the vector size explicitly, for example `1024` for Cohere.

## Usage

### Declare vector classification on a model

```ruby
class Article < ApplicationRecord
  vector_classify :content,
                  labels: ["sports", "politics", "technology"]
end
```

On save, the gem will:

1. Generate an embedding for `content`
2. Store it in `embedding`
3. Run classification (nearest label vector) and set `label` and `confidence_score`

Example:

```ruby
article = Article.create!(content: "Apple released a new iPhone")
article.label            # => "technology"
article.confidence_score # => 0.91
```

### Train labels with examples

Before classifying, train each label with example texts. The gem averages the example embeddings to build the label vector. Until at least one label is trained, `classify` returns `label: nil`. Calling `train` again for the same label replaces its vector with the average of the new examples. It does not add the new examples to the old ones.

```ruby
classifier = RailsAiKit.classifier("Article") # or RailsAiKit.classifier for the default name

classifier.train("sports", examples: [
  "football match",
  "cricket tournament",
  "Olympic gold medal"
])

classifier.train("politics", examples: [
  "election results",
  "parliament debate"
])

classifier.train("technology", examples: [
  "new iPhone launch",
  "AI software update"
])
```
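The label vector that `train` stores is the element-wise mean of the example embeddings, computed by the averaging loop in `Classifier#train`. The block below is a minimal pure-Ruby sketch of that step. It uses hypothetical toy 3-dimensional vectors in place of real provider embeddings, and `average_vectors` is an illustrative name that is not part of the gem's API.

```ruby
# Element-wise mean of equally sized vectors, mirroring the averaging loop
# in RailsAiKit::Classifier#train. `average_vectors` is a hypothetical helper.
def average_vectors(vectors)
  raise ArgumentError, "At least one vector required" if vectors.empty?

  dim = vectors.first.size
  unless vectors.all? { |vec| vec.size == dim }
    raise ArgumentError, "All vectors must have #{dim} dimensions"
  end

  sums = Array.new(dim, 0.0)
  vectors.each do |vec|
    vec.each_with_index { |value, i| sums[i] += value }
  end
  sums.map { |total| total / vectors.size }
end

# Toy "embeddings" standing in for three sports examples.
sports_examples = [
  [1.0, 0.0, 0.0],
  [0.8, 0.2, 0.0],
  [0.6, 0.1, 0.2]
]
label_vector = average_vectors(sports_examples)
# label_vector is approximately [0.8, 0.1, 0.0667]
```

Cosine distance depends only on a vector's direction. For that reason the mean does not need to be re-normalized before it is compared with new content.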

You can use the default classifier name or pass a custom one:

```ruby
RailsAiKit.classifier("Article").train("sports", examples: [...])
RailsAiKit.classifier.train("sports", examples: [...]) # default classifier
```

### Classify text without saving

```ruby
result = RailsAiKit.classifier("Article").classify("India won the cricket match")
# => { label: "sports", confidence: 0.91, distance: 0.09 }
```
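The `distance` in that result is pgvector's cosine distance (the `<=>` operator). It equals `1 - cosine_similarity` and ranges from 0 to 2. `confidence` is `1 - distance` clamped at 0. The block below sketches that arithmetic in pure Ruby. The helper names are hypothetical, and in the gem PostgreSQL computes the distance through Neighbor.

```ruby
# Cosine distance as pgvector's `<=>` operator defines it: 1 - cosine similarity.
# Ranges from 0 (same direction) to 2 (opposite directions).
def cosine_distance(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  # This sketch rejects zero vectors instead of mimicking pgvector's handling.
  raise ArgumentError, "zero-length vector" if norm_a.zero? || norm_b.zero?

  1.0 - dot / (norm_a * norm_b)
end

# Same formula as Classifier#classify: confidence = max(0, 1 - distance).
def confidence_from_distance(distance)
  [0.0, 1.0 - distance].max
end

distance = cosine_distance([1.0, 0.0], [1.0, 1.0])
confidence = confidence_from_distance(distance)
# distance ≈ 0.293, confidence ≈ 0.707
```

The clamp only matters when the distance exceeds 1, that is, when the content points away from the nearest label vector.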

### Batch classification

`batch_classify` assigns the label and confidence attributes in memory and returns one result hash per record. It does not save the records.

```ruby
records = Article.where(label: nil).limit(100)
RailsAiKit.classifier("Article").batch_classify(records,
  text_attribute: :content,
  label_attribute: :label,
  confidence_attribute: :confidence_score
)
# Optionally save: records.each(&:save!)
```
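`batch_classify` sends all texts to the provider in one `embed_batch` call. It then matches results to records by position and only assigns attributes the record responds to. The block below is a minimal in-memory sketch of that flow. It assumes a hypothetical `FakeEmbedder`, toy label vectors, and plain `Struct` records in place of ActiveRecord and pgvector.

```ruby
# Hypothetical stand-in for RailsAiKit.embedding: returns toy 2-d vectors
# and counts how many "API calls" were made.
class FakeEmbedder
  TOY_VECTORS = {
    "football match" => [1.0, 0.1],
    "election results" => [0.1, 1.0]
  }.freeze

  attr_reader :calls

  def initialize
    @calls = 0
  end

  def embed_batch(texts)
    @calls += 1
    texts.map { |text| TOY_VECTORS.fetch(text) }
  end
end

# Toy label vectors standing in for rows of rails_ai_kit_labels.
LABEL_VECTORS = { "sports" => [1.0, 0.0], "politics" => [0.0, 1.0] }.freeze

def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  1.0 - dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

# Same shape as Classifier#batch_classify: one embedding call for all texts,
# one nearest-label lookup per vector, results written back by index.
def batch_classify(records, embedder, text_attribute: :content,
                   label_attribute: :label, confidence_attribute: :confidence_score)
  texts = records.map { |record| record.public_send(text_attribute) }
  embeddings = embedder.embed_batch(texts)

  results = embeddings.map do |vec|
    scored = LABEL_VECTORS.map { |name, label_vec| [name, cosine_distance(vec, label_vec)] }
    label, distance = scored.min_by { |_name, dist| dist }
    { label: label, confidence: [0.0, 1.0 - distance].max, distance: distance }
  end

  records.each_with_index do |record, i|
    label_setter = :"#{label_attribute}="
    confidence_setter = :"#{confidence_attribute}="
    # Only assign attributes the record actually exposes, as the gem does.
    record.public_send(label_setter, results[i][:label]) if record.respond_to?(label_setter)
    record.public_send(confidence_setter, results[i][:confidence]) if record.respond_to?(confidence_setter)
  end

  results
end

Doc = Struct.new(:content, :label, :confidence_score)
docs = [Doc.new("football match"), Doc.new("election results")]
embedder = FakeEmbedder.new
results = batch_classify(docs, embedder)
```

Only the embedding request is batched. The gem still runs one nearest-neighbor query per record against `rails_ai_kit_labels`.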

### Filtering

Use normal ActiveRecord scopes:

```ruby
Article.where(label: "sports")
Article.where("confidence_score >= ?", 0.8)
```

### Similarity search

Find records similar to a piece of text. The gem embeds the query, then runs a nearest-neighbor search:

```ruby
Article.similar_to("new iPhone launch", limit: 5)
```
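`similar_to` orders rows by their distance to the query embedding, and the database performs that ordering. The block below shows the same idea as an in-memory top-k search. It assumes cosine distance (the metric the classifier uses) and hypothetical precomputed toy embeddings. In the gem, this ordering happens in SQL through Neighbor, not in Ruby.

```ruby
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  1.0 - dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

# In-memory top-k: score every item, sort ascending by distance, keep `limit`.
# `items` is an array of [id, vector] pairs (hypothetical precomputed embeddings).
def similar_to(items, query_vector, limit: 5)
  items
    .map { |id, vector| [id, cosine_distance(query_vector, vector)] }
    .sort_by { |_id, distance| distance }
    .first(limit)
end

items = [
  [1, [1.0, 0.0]],
  [2, [0.0, 1.0]],
  [3, [0.9, 0.1]]
]
# Pretend [1.0, 0.05] is the embedding of the query text.
nearest = similar_to(items, [1.0, 0.05], limit: 2)
# nearest.map(&:first) is [1, 3]
```

The sketch scores every item. PostgreSQL does the same full scan unless you add a pgvector index (HNSW or IVFFlat), and the migrations generated in 0.1.0 do not create one.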

## Example: Support ticket routing

```ruby
# app/models/support_ticket.rb
class SupportTicket < ApplicationRecord
  vector_classify :message,
                  labels: ["billing", "technical", "account"],
                  classifier_name: "SupportTicket"
end

# Train once
classifier = RailsAiKit.classifier("SupportTicket")
classifier.train("billing", examples: ["My payment failed", "Refund request", "Invoice issue"])
classifier.train("technical", examples: ["App crashed", "Login not working", "Error message"])
classifier.train("account", examples: ["Change email", "Close my account", "Password reset"])

# Incoming ticket
ticket = SupportTicket.create!(message: "My payment failed last night")
ticket.label # => "billing" → route to billing queue
```

## Configuration reference

| Option | Description | Default |
|--------|-------------|---------|
| `embedding_provider` | `:openai` or `:cohere` | `:openai` |
| `embedding_dimensions` | Vector size (must match the provider's output and your vector columns) | `1536` |
| `api_keys` | Hash of provider => API key | `{}` |
| `default_classifier_name` | Classifier name used when none is given | `"default"` |

## Generators

| Generator | Purpose |
|-----------|---------|
| `rails g rails_ai_kit:install` | Migration: enable pgvector + create `rails_ai_kit_labels`; also creates `config/initializers/rails_ai_kit.rb` |
| `rails g rails_ai_kit:vector_columns ModelName content_column` | Migration: add `embedding`, `label`, `confidence_score` to a table |

## How it works

1. **Label vectors** – Each label is represented by one vector, the average of its example embeddings, stored in `rails_ai_kit_labels`.
2. **Classification** – New content is embedded and compared by cosine distance to all label vectors of the classifier. The nearest label wins. Confidence is `1 - distance`, clamped at 0.
3. **Storage** – Your table holds the content, its embedding, the predicted label, and the confidence. The gem adds only one table, for label vectors.
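Steps 1 and 2 can be sketched end to end in plain Ruby. This is an in-memory illustration under stated assumptions: toy 2-dimensional vectors, and a hypothetical `ToyClassifier` that keeps label vectors in a Hash. The gem itself stores them in PostgreSQL and lets pgvector find the nearest one. The result hash has the same shape `Classifier#classify` returns.

```ruby
# In-memory illustration of steps 1 and 2; ToyClassifier is hypothetical and
# keeps label vectors in a Hash instead of the rails_ai_kit_labels table.
class ToyClassifier
  NO_MATCH = { label: nil, confidence: 0.0, distance: nil }.freeze

  def initialize
    @label_vectors = {} # label name => mean of example vectors
  end

  # Step 1: the label vector is the element-wise mean of its examples.
  # Re-training a label overwrites its previous vector.
  def train(label, example_vectors)
    raise ArgumentError, "At least one example required" if example_vectors.empty?

    dim = example_vectors.first.size
    @label_vectors[label] = (0...dim).map do |i|
      example_vectors.sum { |vec| vec[i] } / example_vectors.size.to_f
    end
  end

  # Step 2: nearest label by cosine distance; confidence = max(0, 1 - distance).
  def classify(vector)
    return NO_MATCH.dup if @label_vectors.empty?

    scored = @label_vectors.map { |name, label_vec| [name, cosine_distance(vector, label_vec)] }
    label, distance = scored.min_by { |_name, dist| dist }
    { label: label, confidence: [0.0, 1.0 - distance].max, distance: distance }
  end

  private

  def cosine_distance(a, b)
    dot = a.zip(b).sum { |x, y| x * y }
    1.0 - dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
  end
end

clf = ToyClassifier.new
clf.train("sports", [[1.0, 0.0], [0.9, 0.1]])
clf.train("politics", [[0.0, 1.0], [0.1, 0.9]])
result = clf.classify([0.8, 0.2])
```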

## Future ideas

- Hierarchical labels (e.g. technology → mobile, laptops)
- Confidence threshold (e.g. mark as "unknown" if < 0.7)
- Hybrid search (vector + keyword)
- Incremental learning (add examples to improve labels over time)
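Until a built-in confidence threshold exists, application code can approximate it. The `{ label:, confidence:, distance: }` hash that `classify` returns already has everything needed. The helper below is hypothetical and not part of rails_ai_kit 0.1.0. Its default threshold of 0.7 is taken from the idea listed in Future ideas.

```ruby
# Hypothetical post-processing step, not part of rails_ai_kit 0.1.0:
# fall back to "unknown" when the nearest label is not close enough.
def label_with_threshold(result, threshold: 0.7, fallback: "unknown")
  return fallback if result[:label].nil?

  result[:confidence] >= threshold ? result[:label] : fallback
end

# In an app this would wrap a real call, e.g.
#   label_with_threshold(RailsAiKit.classifier("Article").classify(text))
confident = label_with_threshold({ label: "sports", confidence: 0.91, distance: 0.09 })
uncertain = label_with_threshold({ label: "sports", confidence: 0.5, distance: 0.5 })
```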

## Development

```bash
bundle install
bundle exec rake install
```

Run tests (when added) with `bundle exec rspec` or `bundle exec rake test`.

## License

MIT.

## How it’s built

- **Configuration** (`lib/rails_ai_kit/configuration.rb`) – Embedding provider, dimensions, and API keys (e.g. `api_keys[:openai]`).
- **Embedding providers** (`lib/rails_ai_kit/embedding_providers/`) – Base class plus OpenAI and Cohere. Each implements `embed(text)` and `embed_batch(texts)` using the provider API.
- **EmbeddingService** – Wraps the configured provider and API key so `RailsAiKit.embedding.embed(text)` works without passing keys every time.
- **LabelRecord** – ActiveRecord model for `rails_ai_kit_labels` (classifier_name, label_name, embedding). Uses Neighbor’s `has_neighbors :embedding` for similarity.
- **Classifier** – Trains labels by averaging example embeddings and storing them; classifies by nearest-neighbor (cosine) against those label vectors. Supports `classify(text)`, `classify_by_embedding(vector)`, and `batch_classify(records)`.
- **VectorClassify** – Concern that adds the `vector_classify` macro. The macro sets up `has_neighbors` on the embedding column, a `before_save` callback that embeds the source column and runs `classify_by_embedding`, and a `similar_to(query_text)` scope that embeds the query and runs a nearest-neighbor search.
- **Generators** – `rails_ai_kit:install` creates the labels table migration and the initializer; `rails_ai_kit:vector_columns` adds `embedding`, `label`, and `confidence_score` to a given table.

## Related

- [pgvector](https://github.com/pgvector/pgvector) – Open-source vector similarity search for Postgres
- [Neighbor](https://github.com/ankane/neighbor) – Nearest neighbor search for Rails (used by this gem)

data/lib/generators/rails_ai_kit/install/templates/create_rails_ai_kit_labels.rb
ADDED
@@ -0,0 +1,16 @@
```ruby
# frozen_string_literal: true

class CreateRailsAiKitLabels < ActiveRecord::Migration[<%= Rails::VERSION::MAJOR %>.<%= Rails::VERSION::MINOR %>]
  def change
    enable_extension "vector" unless extension_enabled?("vector")

    create_table :rails_ai_kit_labels do |t|
      t.string :classifier_name, null: false
      t.string :label_name, null: false
      t.vector :embedding, limit: <%= (defined?(RailsAiKit) && RailsAiKit.configuration.embedding_dimensions) || 1536 %>, null: false
      t.timestamps
    end

    add_index :rails_ai_kit_labels, [:classifier_name, :label_name], unique: true
  end
end
```

data/lib/generators/rails_ai_kit/install/templates/rails_ai_kit.rb
ADDED
@@ -0,0 +1,15 @@
```ruby
# frozen_string_literal: true

# Configure Rails AI Kit (vector classification with pgvector).
# Set your embedding provider and API keys via ENV or Rails credentials.
RailsAiKit.configure do |config|
  config.embedding_provider = :openai
  config.embedding_dimensions = 1536 # use 1024 for Cohere embed-english-v3.0

  config.api_keys = {
    openai: ENV["OPENAI_API_KEY"],
    cohere: ENV["COHERE_API_KEY"]
  }

  config.default_classifier_name = "default"
end
```

data/lib/generators/rails_ai_kit/install_generator.rb
ADDED
@@ -0,0 +1,28 @@
```ruby
# frozen_string_literal: true

require "rails/generators"
require "rails/generators/active_record/migration"

module RailsAiKit
  module Generators
    class InstallGenerator < Rails::Generators::Base
      # Provides migration_template and next_migration_number.
      include ActiveRecord::Generators::Migration

      # Templates live in install/templates next to this file.
      source_root File.expand_path("install/templates", __dir__)

      desc "Creates a migration to enable pgvector and add rails_ai_kit_labels table for label embeddings"

      # Not named create_migration: that would shadow the helper of the same
      # name that migration_template calls internally.
      def copy_migration
        # migration_template prefixes the file with the next migration number,
        # so the destination must not contain its own timestamp.
        migration_template "create_rails_ai_kit_labels.rb",
                           "db/migrate/create_rails_ai_kit_labels.rb"
      end

      def create_initializer
        copy_file "rails_ai_kit.rb", "config/initializers/rails_ai_kit.rb"
      end
    end
  end
end
```

data/lib/generators/rails_ai_kit/vector_columns/templates/add_vector_columns.rb
ADDED
@@ -0,0 +1,11 @@
```ruby
# frozen_string_literal: true

class AddRailsAiKitVectorColumnsTo<%= table_name.camelize %> < ActiveRecord::Migration[<%= Rails::VERSION::MAJOR %>.<%= Rails::VERSION::MINOR %>]
  def change
    enable_extension "vector" unless extension_enabled?("vector")

    add_column :<%= table_name %>, :embedding, :vector, limit: <%= dimensions %>
    add_column :<%= table_name %>, :label, :string
    add_column :<%= table_name %>, :confidence_score, :float
  end
end
```

data/lib/generators/rails_ai_kit/vector_columns_generator.rb
ADDED
@@ -0,0 +1,39 @@
```ruby
# frozen_string_literal: true

require "rails/generators"
require "rails/generators/active_record/migration"

module RailsAiKit
  module Generators
    class VectorColumnsGenerator < Rails::Generators::NamedBase
      # Provides migration_template and next_migration_number.
      include ActiveRecord::Generators::Migration

      # Templates live in vector_columns/templates next to this file.
      source_root File.expand_path("vector_columns/templates", __dir__)

      desc "Adds embedding, label, and confidence_score columns to an existing table for vector_classify"

      argument :content_column, type: :string, default: "content",
               desc: "Name of the text column to classify (e.g. content, body)"

      # No default here, so `dimensions` can fall back to RailsAiKit.configuration.
      class_option :embedding_dimensions, type: :numeric,
                   desc: "Vector dimensions (must match your embedding provider)"

      # Not named create_migration, to avoid shadowing the helper migration_template calls.
      def copy_migration
        # migration_template adds the migration number prefix itself.
        migration_template "add_vector_columns.rb",
                           "db/migrate/add_rails_ai_kit_vector_columns_to_#{table_name}.rb"
      end

      private

      def table_name
        name.underscore.pluralize
      end

      def dimensions
        options[:embedding_dimensions] || (defined?(RailsAiKit) && RailsAiKit.configuration.embedding_dimensions) || 1536
      end
    end
  end
end
```

data/lib/rails_ai_kit/classifier.rb
ADDED
@@ -0,0 +1,115 @@
```ruby
# frozen_string_literal: true

module RailsAiKit
  # Vector-based classifier using label embeddings and nearest-neighbor search.
  # Train with examples per label, then classify text by comparing its embedding to label vectors.
  class Classifier
    attr_reader :classifier_name

    def initialize(classifier_name: nil)
      @classifier_name = classifier_name || RailsAiKit.configuration.default_classifier_name
    end

    # Train a label with example texts. Embeds each example and stores the averaged
    # label vector, replacing any vector previously stored for this label.
    # @param label_name [String]
    # @param examples [Array<String>] Example texts for this label
    # @return [RailsAiKit::LabelRecord]
    def train(label_name, examples:)
      raise ArgumentError, "At least one example required" if examples.to_a.empty?

      embeddings = RailsAiKit.embedding.embed_batch(examples.to_a)
      # Average the vectors to get a single label embedding (common approach for few-shot)
      dim = embeddings.first.size
      avg = Array.new(dim, 0.0)
      embeddings.each do |vec|
        vec.each_with_index { |v, i| avg[i] += v }
      end
      n = embeddings.size.to_f
      avg.map! { |x| x / n }

      record = LabelRecord.find_or_initialize_by(classifier_name: classifier_name, label_name: label_name)
      record.embedding = avg
      record.save!
      record
    end

    # Classify using an existing embedding vector (e.g. after storing). Avoids re-embedding.
    # @param embedding [Array<Float>]
    # @return [Hash] { label:, confidence:, distance: }
    def classify_by_embedding(embedding)
      return { label: nil, confidence: 0.0, distance: nil } if embedding.blank?

      nearest = LabelRecord.for_classifier(classifier_name)
        .nearest_neighbors(:embedding, embedding, distance: "cosine")
        .limit(1)
        .first

      return { label: nil, confidence: 0.0, distance: nil } if nearest.blank?

      distance = nearest.neighbor_distance
      confidence = [0.0, 1 - distance].max
      { label: nearest.label_name, confidence: confidence, distance: distance }
    end

    # Classify a single text. Returns the predicted label, confidence (1 - cosine distance,
    # clamped at 0), and distance.
    # @param text [String]
    # @return [Hash] { label:, confidence:, distance: }
    def classify(text)
      query_embedding = RailsAiKit.embedding.embed(text)
      nearest = LabelRecord.for_classifier(classifier_name)
        .nearest_neighbors(:embedding, query_embedding, distance: "cosine")
        .limit(1)
        .first

      return { label: nil, confidence: 0.0, distance: nil } if nearest.blank?

      # Cosine distance in pgvector is 1 - cosine_similarity; we want confidence in [0,1]
      distance = nearest.neighbor_distance
      confidence = [0.0, 1 - distance].max

      { label: nearest.label_name, confidence: confidence, distance: distance }
    end

    # Classify multiple records in batch. Expects each record to respond to the text getter and optional setters.
    # @param records [Enumerable]
    # @param text_attribute [Symbol, String] Method to get text from each record (e.g. :content)
    # @param label_attribute [Symbol, String] Attribute to set for predicted label (default :label)
    # @param confidence_attribute [Symbol, String] Attribute to set for confidence (default :confidence_score)
    # @return [Array<Hash>] Array of { label:, confidence:, distance: } per record
    def batch_classify(records, text_attribute: :content, label_attribute: :label, confidence_attribute: :confidence_score)
      recs = records.to_a
      return [] if recs.empty?

      texts = recs.map { |r| r.public_send(text_attribute) }
      embeddings = RailsAiKit.embedding.embed_batch(texts)

      results = embeddings.map do |query_embedding|
        nearest = LabelRecord.for_classifier(classifier_name)
          .nearest_neighbors(:embedding, query_embedding, distance: "cosine")
          .limit(1)
          .first

        if nearest.blank?
          { label: nil, confidence: 0.0, distance: nil }
        else
          dist = nearest.neighbor_distance
          conf = [0.0, 1 - dist].max
          { label: nearest.label_name, confidence: conf, distance: dist }
        end
      end

      recs.each_with_index do |record, i|
        r = results[i]
        record.public_send(:"#{label_attribute}=", r[:label]) if record.respond_to?(:"#{label_attribute}=")
        record.public_send(:"#{confidence_attribute}=", r[:confidence]) if record.respond_to?(:"#{confidence_attribute}=")
      end

      results
    end

    # List all trained labels for this classifier.
    def labels
      LabelRecord.for_classifier(classifier_name).pluck(:label_name)
    end
  end
end
```

data/lib/rails_ai_kit/configuration.rb
ADDED
@@ -0,0 +1,30 @@
```ruby
# frozen_string_literal: true

module RailsAiKit
  class Configuration
    attr_accessor :embedding_provider,
                  :embedding_dimensions,
                  :api_keys,
                  :default_classifier_name

    def initialize
      @embedding_provider = :openai
      @embedding_dimensions = 1536
      @api_keys = {}
      @default_classifier_name = "default"
    end

    # Looks up the key for a provider (defaults to the configured one),
    # accepting either symbol or string hash keys.
    def api_key(provider = nil)
      provider ||= embedding_provider
      api_keys[provider.to_sym] || api_keys[provider.to_s]
    end
  end

  def self.configuration
    @configuration ||= Configuration.new
  end

  def self.configure
    yield configuration
  end
end
```

data/lib/rails_ai_kit/embedding_providers/base.rb
ADDED
@@ -0,0 +1,30 @@
```ruby
# frozen_string_literal: true

module RailsAiKit
  module EmbeddingProviders
    class Base
      class NotImplementedError < RailsAiKit::Error; end

      # @param api_key [String] Provider API key
      # @param dimensions [Integer] Optional embedding dimensions (provider-dependent)
      def initialize(api_key:, dimensions: nil)
        @api_key = api_key
        @dimensions = dimensions
      end

      # Generate embedding for a single text.
      # @param text [String]
      # @return [Array<Float>]
      def embed(text)
        raise NotImplementedError, "#{self.class}#embed must be implemented"
      end

      # Generate embeddings for multiple texts. The default makes one embed call
      # per text; subclasses can override it to batch API requests.
      # @param texts [Array<String>]
      # @return [Array<Array<Float>>]
      def embed_batch(texts)
        texts.map { |text| embed(text) }
      end
    end
  end
end
```
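`Base#embed_batch` falls back to one `embed` call per text, so a provider only has to implement `embed`. The block below sketches a deterministic, offline provider for tests, assuming you want specs that make no API calls. `FakeProvider` is hypothetical. The block defines a small `BaseProvider` stand-in so it runs without the gem; in an app you would subclass `RailsAiKit::EmbeddingProviders::Base` directly. `EmbeddingService::PROVIDERS` in 0.1.0 only maps `:openai` and `:cohere`. For that reason, a class like this cannot be selected through `config.embedding_provider` without further changes.

```ruby
# Minimal stand-in for RailsAiKit::EmbeddingProviders::Base so this runs alone.
class BaseProvider
  def initialize(api_key:, dimensions: nil)
    @api_key = api_key
    @dimensions = dimensions
  end

  # The gem raises its own Base::NotImplementedError (a RailsAiKit::Error);
  # this stand-in uses Ruby's built-in NotImplementedError.
  def embed(_text)
    raise NotImplementedError, "#{self.class}#embed must be implemented"
  end

  # Same fallback as Base#embed_batch: one embed call per text.
  def embed_batch(texts)
    texts.map { |text| embed(text) }
  end
end

# Hypothetical offline provider: counts characters into a fixed number of
# buckets. Deterministic, so specs get stable vectors without network calls.
class FakeProvider < BaseProvider
  def initialize(api_key: "unused", dimensions: 8)
    super
  end

  def embed(text)
    vector = Array.new(@dimensions, 0.0)
    text.to_s.downcase.each_char { |char| vector[char.ord % @dimensions] += 1.0 }
    vector
  end
end

provider = FakeProvider.new
provider.embed("abc")
```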

data/lib/rails_ai_kit/embedding_providers/cohere.rb
ADDED
@@ -0,0 +1,68 @@
```ruby
# frozen_string_literal: true

module RailsAiKit
  module EmbeddingProviders
    # Cohere embed API: https://docs.cohere.com/reference/embed
    class Cohere < Base
      DEFAULT_MODEL = "embed-english-v3.0"
      DEFAULT_DIMENSIONS = 1024

      def initialize(api_key:, dimensions: nil, model: nil)
        super(api_key: api_key, dimensions: dimensions)
        @model = model || DEFAULT_MODEL
        # @dimensions is not sent to Cohere: embed-english-v3.0 returns
        # 1024-dimensional vectors, so set config.embedding_dimensions = 1024.
        @dimensions ||= DEFAULT_DIMENSIONS
      end

      def embed(text)
        response = client.post("/v1/embed") do |req|
          req.body = {
            texts: [text.to_s],
            model: @model,
            input_type: "search_document",
            embedding_types: ["float"]
          }.to_json
          req.headers["Content-Type"] = "application/json"
          req.headers["Authorization"] = "Bearer #{@api_key}"
        end
        parse_embedding_response(response)
      end

      def embed_batch(texts)
        return [] if texts.empty?
        response = client.post("/v1/embed") do |req|
          req.body = {
            texts: texts.map(&:to_s),
            model: @model,
            input_type: "search_document",
            embedding_types: ["float"]
          }.to_json
          req.headers["Content-Type"] = "application/json"
          req.headers["Authorization"] = "Bearer #{@api_key}"
        end
        parse_batch_embedding_response(response)
      end

      private

      def client
        @client ||= Faraday.new(url: "https://api.cohere.ai") do |f|
          f.request :json
          f.response :json
          f.adapter Faraday.default_adapter
        end
      end

      def parse_embedding_response(response)
        raise RailsAiKit::Error, "Cohere API error: #{response.body}" unless response.success?
        data = response.body
        data.dig("embeddings", "float")&.first || raise(RailsAiKit::Error, "No embedding in response")
      end

      def parse_batch_embedding_response(response)
        raise RailsAiKit::Error, "Cohere API error: #{response.body}" unless response.success?
        data = response.body
        data.dig("embeddings", "float") || []
      end
    end
  end
end
```

data/lib/rails_ai_kit/embedding_providers/openai.rb
ADDED
@@ -0,0 +1,69 @@
```ruby
# frozen_string_literal: true

module RailsAiKit
  module EmbeddingProviders
    class Openai < Base
      DEFAULT_MODEL = "text-embedding-3-small"
      DEFAULT_DIMENSIONS = 1536

      def initialize(api_key:, dimensions: nil, model: nil)
        super(api_key: api_key, dimensions: dimensions)
        @model = model || DEFAULT_MODEL
      end

      def embed(text)
        response = client.post("/v1/embeddings") do |req|
          req.body = {
            input: text.to_s,
            model: @model,
            dimensions: (@dimensions || default_dimensions)
          }.compact.to_json
          req.headers["Content-Type"] = "application/json"
          req.headers["Authorization"] = "Bearer #{@api_key}"
        end
        parse_embedding_response(response)
      end

      def embed_batch(texts)
        return [] if texts.empty?
        response = client.post("/v1/embeddings") do |req|
          req.body = {
            input: texts.map(&:to_s),
            model: @model,
            dimensions: (@dimensions || default_dimensions)
          }.compact.to_json
          req.headers["Content-Type"] = "application/json"
          req.headers["Authorization"] = "Bearer #{@api_key}"
        end
        parse_batch_embedding_response(response)
      end

      private

      def client
        @client ||= Faraday.new(url: "https://api.openai.com") do |f|
          f.request :json
          f.response :json
          f.adapter Faraday.default_adapter
        end
      end

      def default_dimensions
        @model.to_s.include?("large") ? 3072 : DEFAULT_DIMENSIONS
      end

      def parse_embedding_response(response)
        raise RailsAiKit::Error, "OpenAI API error: #{response.body}" unless response.success?
        data = response.body
        data["data"]&.first&.dig("embedding") || raise(RailsAiKit::Error, "No embedding in response")
      end

      def parse_batch_embedding_response(response)
        raise RailsAiKit::Error, "OpenAI API error: #{response.body}" unless response.success?
        data = response.body
        items = data["data"] || []
        items.sort_by { |e| e["index"] }.map { |e| e["embedding"] }
      end
    end
  end
end
```
|
|
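`embed` and `embed_batch` send the same request shape and differ only in `input` (one string versus an array of strings). On the way back, `parse_batch_embedding_response` sorts the returned `data` items by their `index` field before extracting vectors, so the results line up with the input texts. The sketch below reproduces only that parsing step in plain Ruby so it runs without Faraday or an API key; `StubResponse` and `parse_batch` are hypothetical names, and a bare `RuntimeError` stands in for `RailsAiKit::Error`.

```ruby
# Hypothetical stand-in for Faraday::Response: only #success? and #body are used.
StubResponse = Struct.new(:status, :body) do
  def success?
    (200..299).cover?(status)
  end
end

# Mirrors parse_batch_embedding_response: raise on HTTP failure, then restore
# input order using each item's "index" field.
def parse_batch(response)
  raise "OpenAI API error: #{response.body}" unless response.success?

  items = response.body["data"] || []
  items.sort_by { |e| e["index"] }.map { |e| e["embedding"] }
end

# Items deliberately returned out of order.
response = StubResponse.new(200, {
  "data" => [
    { "index" => 1, "embedding" => [0.3, 0.4] },
    { "index" => 0, "embedding" => [0.1, 0.2] }
  ]
})

p parse_batch(response) # => [[0.1, 0.2], [0.3, 0.4]]
```

The `|| []` fallback means a successful response without a `data` key yields an empty array, whereas the single-text parser raises `No embedding in response` in the same situation.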
data/lib/rails_ai_kit/embedding_service.rb
ADDED
@@ -0,0 +1,44 @@
+# frozen_string_literal: true
+
+module RailsAiKit
+  # Wraps the configured embedding provider and API keys.
+  # Use RailsAiKit.embedding.embed("hello") or .embed_batch([...])
+  class EmbeddingService
+    PROVIDERS = {
+      openai: EmbeddingProviders::Openai,
+      cohere: EmbeddingProviders::Cohere
+    }.freeze
+
+    def initialize(config: RailsAiKit.configuration)
+      @config = config
+    end
+
+    def embed(text)
+      provider.embed(text)
+    end
+
+    def embed_batch(texts)
+      provider.embed_batch(texts)
+    end
+
+    private
+
+    def provider
+      @provider ||= build_provider
+    end
+
+    def build_provider
+      name = @config.embedding_provider.to_sym
+      klass = PROVIDERS[name] || raise(ArgumentError, "Unknown embedding provider: #{name}")
+      api_key = @config.api_key(name) || raise(RailsAiKit::Error, "Missing API key for provider: #{name}. Set RailsAiKit.configuration.api_keys[:#{name}]")
+      klass.new(
+        api_key: api_key,
+        dimensions: @config.embedding_dimensions
+      )
+    end
+  end
+
+  def self.embedding
+    @embedding_service ||= EmbeddingService.new
+  end
+end
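`EmbeddingService#build_provider` resolves the provider class from `PROVIDERS`, then looks up the API key, and only then instantiates the provider with the configured dimensions, so an unknown provider name fails before a missing key does. A minimal runnable sketch of that lookup order, assuming a hypothetical `FakeProvider` and a `Config` struct standing in for `RailsAiKit.configuration` (only `embedding_provider`, `api_key(name)`, and `embedding_dimensions` are modeled); `KeyError` replaces `RailsAiKit::Error` so the example needs no gem code.

```ruby
# Hypothetical provider: returns a deterministic fake vector instead of calling an API.
class FakeProvider
  def initialize(api_key:, dimensions:)
    @api_key = api_key
    @dimensions = dimensions
  end

  def embed(text)
    Array.new(@dimensions) { |i| (text.to_s.length + i).to_f }
  end
end

# Hypothetical config object; only the readers used by build_provider are modeled.
Config = Struct.new(:embedding_provider, :api_keys, :embedding_dimensions) do
  def api_key(name)
    api_keys[name]
  end
end

PROVIDERS = { fake: FakeProvider }.freeze

# Same order as EmbeddingService#build_provider: unknown provider first,
# then missing key, then construct with the configured dimensions.
def build_provider(config)
  name = config.embedding_provider.to_sym
  klass = PROVIDERS[name] || raise(ArgumentError, "Unknown embedding provider: #{name}")
  api_key = config.api_key(name) || raise(KeyError, "Missing API key for provider: #{name}")
  klass.new(api_key: api_key, dimensions: config.embedding_dimensions)
end

provider = build_provider(Config.new("fake", { fake: "sk-test" }, 3))
p provider.embed("hi") # => [2.0, 3.0, 4.0]
```

Because `provider` is memoized with `@provider ||=` and `RailsAiKit.embedding` memoizes the service itself, the provider is built once, on the first `embed` or `embed_batch` call; later changes to the provider name, API key, or dimensions are not picked up by that cached instance.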
data/lib/rails_ai_kit/label_record.rb
ADDED
@@ -0,0 +1,19 @@
+# frozen_string_literal: true
+
+module RailsAiKit
+  # Internal model for storing label embeddings per classifier.
+  # Table: rails_ai_kit_labels (via generator).
+  # We use a constant so the user's app can reference the same table.
+  class LabelRecord < ActiveRecord::Base
+    self.table_name = "rails_ai_kit_labels"
+
+    has_neighbors :embedding, dimensions: -> { RailsAiKit.configuration.embedding_dimensions }
+
+    validates :classifier_name, presence: true
+    validates :label_name, presence: true
+    validates :embedding, presence: true
+    validates :label_name, uniqueness: { scope: :classifier_name }
+
+    scope :for_classifier, ->(name) { where(classifier_name: name) }
+  end
+end
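`LabelRecord` keeps one embedding per (`classifier_name`, `label_name`) pair, enforced by the scoped uniqueness validation, and `for_classifier` narrows rows to a single classifier. `classifier.rb` is not part of this excerpt, so the following is only an assumption about how such rows can drive classification: choose the label whose stored vector has the highest cosine similarity to the input and report that similarity as the confidence. `LABELS`, `nearest_label`, and `cosine_similarity` are hypothetical, and plain arrays replace the database.

```ruby
# Cosine similarity between two equal-length numeric vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# Hypothetical in-memory stand-in for rails_ai_kit_labels rows:
# one embedding per (classifier_name, label_name) pair.
LABELS = [
  { classifier_name: "Ticket", label_name: "billing",  embedding: [1.0, 0.0] },
  { classifier_name: "Ticket", label_name: "bug",      embedding: [0.0, 1.0] },
  { classifier_name: "Review", label_name: "positive", embedding: [1.0, 1.0] }
].freeze

# Assumed approach (not taken from classifier.rb): filter rows like
# for_classifier, pick the most similar label, use the similarity as confidence.
def nearest_label(classifier_name, vector)
  candidates = LABELS.select { |row| row[:classifier_name] == classifier_name }
  return { label: nil, confidence: nil } if candidates.empty?

  best = candidates.max_by { |row| cosine_similarity(row[:embedding], vector) }
  { label: best[:label_name], confidence: cosine_similarity(best[:embedding], vector) }
end

p nearest_label("Ticket", [0.9, 0.1]) # label "billing", confidence about 0.994
```

Picking the highest cosine similarity is equivalent to picking the lowest cosine distance (1 minus the similarity), which is the metric `similar_to` requests from neighbor with `distance: "cosine"`.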
data/lib/rails_ai_kit/vector_classify.rb
ADDED
@@ -0,0 +1,71 @@
+# frozen_string_literal: true
+
+module RailsAiKit
+  # ActiveRecord integration: vector_classify macro and similarity search.
+  # Expects the model to have columns: embedding (vector), label (string), confidence_score (float).
+  module VectorClassify
+    extend ActiveSupport::Concern
+
+    class_methods do
+      # Declare vector-based classification on this model.
+      #
+      # @param source_column [Symbol] Attribute holding the text to embed and classify (e.g. :content)
+      # @param labels [Array<String>] Label names (for documentation / validation; training is via Classifier.train)
+      # @param embedding_column [Symbol] Column storing the vector (default :embedding)
+      # @param label_column [Symbol] Column to store predicted label (default :label)
+      # @param confidence_column [Symbol] Column to store confidence score (default :confidence_score)
+      # @param classifier_name [String, nil] Classifier namespace (default: model name)
+      # @param auto_classify [Boolean] Run classification on save (default true)
+      def vector_classify(
+        source_column,
+        labels: [],
+        embedding_column: :embedding,
+        label_column: :label,
+        confidence_column: :confidence_score,
+        classifier_name: nil,
+        auto_classify: true
+      )
+        include RailsAiKit::VectorClassify::InstanceMethods
+
+        cattr_accessor :rails_ai_kit_source_column, :rails_ai_kit_embedding_column,
+                       :rails_ai_kit_label_column, :rails_ai_kit_confidence_column,
+                       :rails_ai_kit_classifier_name, :rails_ai_kit_labels
+        self.rails_ai_kit_source_column = source_column.to_sym
+        self.rails_ai_kit_embedding_column = embedding_column.to_sym
+        self.rails_ai_kit_label_column = label_column.to_sym
+        self.rails_ai_kit_confidence_column = confidence_column.to_sym
+        self.rails_ai_kit_classifier_name = classifier_name || name
+        self.rails_ai_kit_labels = labels
+
+        has_neighbors embedding_column, dimensions: RailsAiKit.configuration.embedding_dimensions
+
+        if auto_classify
+          before_save :rails_ai_kit_compute_embedding_and_classify
+        end
+
+        # Similarity search by text: embeds the query and returns nearest records.
+        define_singleton_method(:similar_to) do |query_text, limit: 5|
+          vector = RailsAiKit.embedding.embed(query_text)
+          nearest_neighbors(embedding_column, vector, distance: "cosine").limit(limit)
+        end
+      end
+    end
+
+    module InstanceMethods
+      private
+
+      def rails_ai_kit_compute_embedding_and_classify
+        source = send(self.class.rails_ai_kit_source_column)
+        return if source.blank?
+
+        embedding = RailsAiKit.embedding.embed(source)
+        send(:"#{self.class.rails_ai_kit_embedding_column}=", embedding)
+
+        classifier = RailsAiKit::Classifier.new(classifier_name: self.class.rails_ai_kit_classifier_name)
+        result = classifier.classify_by_embedding(embedding)
+        send(:"#{self.class.rails_ai_kit_label_column}=", result[:label])
+        send(:"#{self.class.rails_ai_kit_confidence_column}=", result[:confidence])
+      end
+    end
+  end
+end
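The `before_save` hook reads the text through the configured source attribute and writes the embedding, label, and confidence through setters whose names come from the class-level settings, which is how `vector_classify` supports custom column names. The sketch below reproduces that flow in plain Ruby with no database; `Note`, `STUB_EMBED`, and `STUB_CLASSIFY` are hypothetical stubs, singleton `attr_accessor`s stand in for `cattr_accessor`, and a nil-or-whitespace check replaces ActiveSupport's `blank?`.

```ruby
# Hypothetical stand-ins for RailsAiKit.embedding.embed and
# Classifier#classify_by_embedding; no API or database is involved.
STUB_EMBED = ->(text) { [text.length.to_f, 1.0] }
STUB_CLASSIFY = ->(vector) { { label: vector.first > 5 ? "long" : "short", confidence: 0.5 } }

class Note
  # Plain-Ruby substitute for the cattr_accessor settings written by vector_classify.
  class << self
    attr_accessor :source_column, :embedding_column, :label_column, :confidence_column
  end
  self.source_column = :content
  self.embedding_column = :embedding
  self.label_column = :category # a custom label column name
  self.confidence_column = :confidence_score

  attr_accessor :content, :embedding, :category, :confidence_score

  def initialize(content)
    @content = content
  end

  # Same flow as rails_ai_kit_compute_embedding_and_classify: read the source
  # attribute, skip blank input, then write the configured columns via send.
  def compute_embedding_and_classify
    source = send(self.class.source_column)
    return if source.nil? || source.strip.empty?

    vector = STUB_EMBED.call(source)
    send(:"#{self.class.embedding_column}=", vector)
    result = STUB_CLASSIFY.call(vector)
    send(:"#{self.class.label_column}=", result[:label])
    send(:"#{self.class.confidence_column}=", result[:confidence])
  end
end

note = Note.new("hello world")
note.compute_embedding_and_classify
p [note.embedding, note.category] # => [[11.0, 1.0], "long"]
```

Since the callback has no dirty check on the source attribute, every save with non-blank source text triggers an embedding request, even when the text has not changed.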
data/lib/rails_ai_kit.rb
ADDED
@@ -0,0 +1,25 @@
+# frozen_string_literal: true
+
+require_relative "rails_ai_kit/version"
+require_relative "rails_ai_kit/configuration"
+require_relative "rails_ai_kit/embedding_providers/base"
+require_relative "rails_ai_kit/embedding_providers/openai"
+require_relative "rails_ai_kit/embedding_providers/cohere"
+require_relative "rails_ai_kit/embedding_service"
+require_relative "rails_ai_kit/label_record"
+require_relative "rails_ai_kit/classifier"
+require_relative "rails_ai_kit/vector_classify"
+
+module RailsAiKit
+  class Error < StandardError; end
+
+  # Top-level classifier using default classifier name. For custom name use Classifier.new(classifier_name: "MyClassifier").
+  def self.classifier(classifier_name = nil)
+    Classifier.new(classifier_name: classifier_name)
+  end
+end
+
+# Optional Rails integration: extend ActiveRecord so vector_classify is available.
+if defined?(ActiveRecord::Base)
+  ActiveRecord::Base.include RailsAiKit::VectorClassify
+end
metadata
ADDED
@@ -0,0 +1,117 @@
+--- !ruby/object:Gem::Specification
+name: rails_ai_kit
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: ruby
+authors:
+- Rails AI Kit Contributors
+bindir: bin
+cert_chain: []
+date: 1980-01-02 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: activerecord
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '6.0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '6.0'
+- !ruby/object:Gem::Dependency
+  name: activesupport
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '6.0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '6.0'
+- !ruby/object:Gem::Dependency
+  name: faraday
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '1.0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '1.0'
+- !ruby/object:Gem::Dependency
+  name: neighbor
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0.2'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0.2'
+description: 'Rails AI Kit provides ready-made tools to classify data using vector
+  similarity: label training, auto/batch classification, similarity search, and filtering—without
+  ML training or LLM calls.'
+email:
+- rohit.kushwaha@w3villa.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- CHANGELOG.md
+- README.md
+- lib/generators/rails_ai_kit/install/templates/create_rails_ai_kit_labels.rb
+- lib/generators/rails_ai_kit/install/templates/rails_ai_kit.rb
+- lib/generators/rails_ai_kit/install_generator.rb
+- lib/generators/rails_ai_kit/vector_columns/templates/add_vector_columns.rb
+- lib/generators/rails_ai_kit/vector_columns_generator.rb
+- lib/rails_ai_kit.rb
+- lib/rails_ai_kit/classifier.rb
+- lib/rails_ai_kit/configuration.rb
+- lib/rails_ai_kit/embedding_providers/base.rb
+- lib/rails_ai_kit/embedding_providers/cohere.rb
+- lib/rails_ai_kit/embedding_providers/openai.rb
+- lib/rails_ai_kit/embedding_service.rb
+- lib/rails_ai_kit/label_record.rb
+- lib/rails_ai_kit/vector_classify.rb
+- lib/rails_ai_kit/version.rb
+homepage: https://github.com/your-org/rails_ai_kit
+licenses: []
+metadata:
+  homepage_uri: https://github.com/your-org/rails_ai_kit
+  source_code_uri: https://github.com/your-org/rails_ai_kit
+  changelog_uri: https://github.com/your-org/rails_ai_kit/blob/main/CHANGELOG.md
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: 3.0.0
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 4.0.7
+specification_version: 4
+summary: Vector-based classification layer on top of pgvector for Rails
+test_files: []
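The gemspec itself is not included in this diff. Reading the metadata above back into RubyGems terms, its dependency and Ruby-version declarations correspond roughly to the reconstruction below. It is an inferred sketch, not the author's file, and it uses only the `Gem::Specification` API that ships with Ruby.

```ruby
require "rubygems"

# Inferred from the published metadata; the real rails_ai_kit.gemspec is not shown.
spec = Gem::Specification.new do |s|
  s.name = "rails_ai_kit"
  s.version = "0.1.0"
  s.summary = "Vector-based classification layer on top of pgvector for Rails"
  s.authors = ["Rails AI Kit Contributors"]
  s.required_ruby_version = ">= 3.0.0"
  s.require_paths = ["lib"]

  # Four runtime dependencies, each a single open-ended ">=" constraint.
  s.add_dependency "activerecord", ">= 6.0"
  s.add_dependency "activesupport", ">= 6.0"
  s.add_dependency "faraday", ">= 1.0"
  s.add_dependency "neighbor", ">= 0.2"
end

# Print each runtime dependency with its requirement string.
spec.runtime_dependencies.each { |dep| puts "#{dep.name} #{dep.requirement}" }
```

All four runtime constraints are `>=` requirements with no upper bound, so future major releases of these gems would also satisfy them.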