jekyll_ai_related_posts 0.1.4 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: bde6fb501500ac67e8b2fbcc05707dad8acc44d3e06ea9e461a98a065e9f84e2
-   data.tar.gz: e0a588ca8656583ad3dcbb36bb76d8c3992107ef807e5425c3f0f0f0db021da7
+   metadata.gz: 4cdd84629ee4f629c87e6db6e716988d931626bcce1146209781a1ead8e7360c
+   data.tar.gz: 36ec471f3873c0eb6678c2c35c2b0d58da00104abb56f571f0922a31b7137ee0
  SHA512:
-   metadata.gz: a5e674a295aeb7c2e0bfd1bcec208d129dad7833a3690c4beb467e130220806e06867f49c4703a677cd000944ae0fdc66305bbcdc9679e7ccb51691f1ed9ec4b
-   data.tar.gz: f5c95c5eab1f65ec12b56b5ed949125eb24ec41a0e0097609bff313d654ff0888152e78070648fb744b4bf04c90a2f50a5698161d77df99545037d8c26f55d26
+   metadata.gz: d35cec25d660ea48f8062240305850943710930446b121a349611cb4212b8d368c3164b0b9cc44549eb757f7278fe10b5a2091f19a2c9ff3212ed8b9ee7d3775
+   data.tar.gz: 0aa5878a2ada754441f1bc8e69e765407bfbaaea4f80a74c0951b1690c70f2a6e9c272bf8df6f86efbca3bea11e94067a723637fe577b997417dfb661bdf9a5d
data/CHANGELOG.md CHANGED
@@ -1,5 +1,11 @@
  ## [Unreleased]
 
+ - feat: Generic LLM provider support (OpenAI-compatible APIs like OpenRouter) with configurable `api_url` and `model`. Embedding dimensions are auto-discovered from the API.
+
+ ## [0.1.4] - 2024-10-12
+
+ - Better log messages (improved clarity about what's happening).
+
  ## [0.1.3] - 2024-05-15
 
  - Better (nicer to read) log messages.
data/README.md CHANGED
@@ -46,23 +46,40 @@ exclude:
 
  ## Configuration
 
- All config for this plugin sits under a top-level `ai_related_posts` key.
+ All config for this plugin sits under a top-level `ai_related_posts` key in
+ Jekyll's `_config.yml`.
 
- The only required config is `openai_api_key` -- we need to authenticate to the
+ The only required config is an API key -- we need to authenticate to the
  API to fetch embedding vectors.
 
- - **openai_api_key** Your OpenAI API key, used to fetch embeddings.
+ - **api_key** (or `openai_api_key` for backward compatibility) Your API key, used to fetch embeddings.
+ - **api_url** (optional, default `https://api.openai.com`). The base URL for the embeddings API.
+ - **model** (optional, default `text-embedding-3-small`). The model to use for embeddings.
  - **fetch_enabled** (optional, default `true`). If true, fetch embeddings. If
    false, don't fetch embeddings. If this is a string (like `prod`), fetch
    embeddings only when the `JEKYLL_ENV` environment variable is equal to the
    string. (This is useful if you want to reduce API costs by only fetching
    embeddings on production builds.)
 
- ### Example Config
+ **Important:** The plugin stores the model and dimensions in the cache database. If you change the model or dimensions in your config, the plugin will detect the mismatch and exit with an error. You can either:
+ - Update your config to match the cached values, or
+ - Delete the cache file (`.ai_related_posts_cache.sqlite3`) and it will be regenerated with the new model/dimensions.
+
+ ### Example Config: OpenAI (default)
+
+ ```yaml
+ ai_related_posts:
+   api_key: sk-proj-abc123
+   fetch_enabled: prod
+ ```
+
+ ### Example Config: OpenRouter
 
  ```yaml
  ai_related_posts:
-   openai_api_key: sk-proj-abc123
+   api_key: sk-or-v1-abc123
+   api_url: https://openrouter.ai/api
+   model: openai/text-embedding-3-small
    fetch_enabled: prod
  ```
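
The `fetch_enabled` rules described in the README hunk above can be sketched in plain Ruby. This is an illustration only: `fetch_enabled_for?` is a hypothetical helper, not the gem's implementation, and treating a missing setting as `true` is an assumption drawn from the documented default.

```ruby
# Hypothetical helper (not part of the gem): decides whether embeddings
# should be fetched, following the README's description of `fetch_enabled`.
def fetch_enabled_for?(setting, jekyll_env)
  case setting
  when nil, true then true          # assumed: an unset key falls back to the documented default
  when false then false
  else setting.to_s == jekyll_env   # string form: fetch only in the matching environment
  end
end

# With `fetch_enabled: prod`, fetching happens only when JEKYLL_ENV is "prod".
puts fetch_enabled_for?("prod", ENV.fetch("JEKYLL_ENV", "development"))
```

Under this reading, a local `jekyll serve` with `fetch_enabled: prod` would skip the API entirely, while a build run with `JEKYLL_ENV=prod` would fetch any missing embeddings.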
@@ -128,14 +145,17 @@ fees if done frequently).
  ## How It Works
 
  Jekyll AI Related Posts is implemented as a Jekyll Generator plugin. During the
- build process, the plugin will call the [OpenAI Embeddings
- API](https://platform.openai.com/docs/guides/embeddings) to fetch the vector
- embedding for a string containing the title, tags, and categories of your
- article. It's not necessary to use the full post text, in most cases the title
- and tags produce very accurate results because the LLM knows when topics are
- related even if they never use identical words. This is also why the LLM
- produces better results than LSI. These vector embeddings are cached in a SQLite
- database. To query for related posts, we query the cached vectors using the
+ build process, the plugin will call an embeddings API (by default, the [OpenAI
+ Embeddings API](https://platform.openai.com/docs/guides/embeddings)) to fetch
+ the vector embedding for a string containing the title, tags, and categories of
+ your article. The plugin works with any OpenAI-compatible embeddings API, such as
+ [OpenRouter](https://openrouter.ai/).
+
+ It's not necessary to use the full post text, in most cases the title and tags
+ produce very accurate results because the LLM knows when topics are related even
+ if they never use identical words. This is also why the LLM produces better
+ results than LSI. These vector embeddings are cached in a SQLite database. To
+ query for related posts, we query the cached vectors using the
  [sqlite-vss](https://github.com/asg017/sqlite-vss) plugin.
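
To make the "query the cached vectors" step concrete, here is a minimal sketch of nearest-neighbor lookup over embeddings in plain Ruby. It uses cosine similarity on toy 3-dimensional vectors as a stand-in; the plugin itself delegates the search to sqlite-vss, whose distance metric and indexing are not shown in this diff. The post titles, the vectors, and the `cosine_similarity` and `related_to` helpers are all made up for illustration.

```ruby
# Cosine similarity between two equal-length numeric vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
  dot / (norm.call(a) * norm.call(b))
end

# Returns the titles of the `limit` posts most similar to `title`.
# `embeddings` maps title => vector (a stand-in for the cached rows).
def related_to(title, embeddings, limit: 2)
  target = embeddings.fetch(title)
  embeddings
    .reject { |other, _| other == title }
    .sort_by { |_, vec| -cosine_similarity(target, vec) }
    .first(limit)
    .map(&:first)
end

# Toy data: real embeddings have hundreds or thousands of dimensions.
toy_embeddings = {
  "Intro to Jekyll" => [0.9, 0.1, 0.0],
  "Jekyll plugins"  => [0.8, 0.2, 0.1],
  "Baking bread"    => [0.0, 0.1, 0.9]
}

p related_to("Intro to Jekyll", toy_embeddings)
```

A linear scan like this is fine for a toy example; a vector index exists to avoid comparing against every row on larger sites.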
 
  ## Development
@@ -0,0 +1,48 @@
+ # frozen_string_literal: true
+
+ require_relative "lib/jekyll_ai_related_posts/version"
+
+ Gem::Specification.new do |spec|
+   spec.name = "jekyll_ai_related_posts"
+   spec.version = JekyllAiRelatedPosts::VERSION
+   spec.authors = [ "Mike Kasberg" ]
+   spec.email = [ "kasberg.mike@gmail.com" ]
+
+   spec.summary = "Populate ai_related_posts using Open AI embeddings"
+   spec.description = "Populate ai_related_posts using Open AI embeddings"
+   spec.homepage = "https://github.com/mkasberg/jekyll_ai_related_posts"
+   spec.license = "MIT"
+   spec.required_ruby_version = ">= 3.0.0"
+
+   spec.metadata["allowed_push_host"] = "https://rubygems.org"
+
+   spec.metadata["homepage_uri"] = spec.homepage
+   spec.metadata["source_code_uri"] = "https://github.com/mkasberg/jekyll_ai_related_posts"
+   spec.metadata["changelog_uri"] = "https://github.com/mkasberg/jekyll_ai_related_posts/blob/main/CHANGELOG.md"
+
+   # Specify which files should be added to the gem when it is released.
+   # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
+   spec.files = Dir.chdir(__dir__) do
+     `git ls-files -z`.split("\x0").reject do |f|
+       (File.expand_path(f) == __FILE__) ||
+         f.start_with?(*%w[bin/ test/ spec/ features/ .git .github appveyor Gemfile])
+     end
+   end
+   spec.bindir = "exe"
+   spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
+   spec.require_paths = [ "lib" ]
+
+   # Uncomment to register a new dependency of your gem
+   # spec.add_dependency "example-gem", "~> 1.0"
+   spec.add_dependency "activerecord", "~> 7.0"
+   spec.add_dependency "faraday", "~> 2.9"
+   spec.add_dependency "jekyll", ">= 3.0"
+   spec.add_dependency "sqlite3", "~> 1.4"
+   spec.add_dependency "sqlite-vss", "~> 0.1.2"
+   spec.add_dependency "zeitwerk", "~> 2.6"
+
+   spec.add_development_dependency "ostruct", "~> 0.6"
+
+   # For more information and examples about making a new gem, check out our
+   # guide at: https://bundler.io/guides/creating_gem.html
+ end
@@ -0,0 +1,70 @@
+ # frozen_string_literal: true
+
+ require "faraday"
+
+ module JekyllAiRelatedPosts
+   class ApiEmbeddings
+     DEFAULT_API_URL = "https://api.openai.com"
+     DEFAULT_MODEL = "text-embedding-3-small"
+     DEFAULT_DIMENSIONS = 1536
+
+     def initialize(api_key, api_url: nil, model: nil, connection: nil)
+       @api_url = api_url || DEFAULT_API_URL
+       @model = model || DEFAULT_MODEL
+       @dimensions = nil
+
+       @connection = if connection.nil?
+         Faraday.new(url: @api_url) do |builder|
+           builder.request :authorization, "Bearer", api_key
+           builder.request :json
+           builder.response :json
+           builder.response :raise_error
+         end
+       else
+         connection
+       end
+     end
+
+     def dimensions
+       @dimensions ||= discover_dimensions
+     end
+
+     def model
+       @model
+     end
+
+     def embedding_for(text)
+       res = @connection.post("v1/embeddings") do |req|
+         req.body = {
+           input: text,
+           model: @model
+         }
+       end
+
+       data = res.body
+       unless data.is_a?(Hash) &&
+              data["data"].is_a?(Array) &&
+              data["data"][0].is_a?(Hash) &&
+              data["data"][0]["embedding"].is_a?(Array)
+         Jekyll.logger.error "AI Related Posts:", "Unexpected API response structure!"
+         Jekyll.logger.error "AI Related Posts:", "Response body: #{data.inspect}"
+         raise Error, "Unexpected API response: embedding data not found"
+       end
+
+       embedding = data["data"][0]["embedding"]
+       @dimensions ||= embedding.length
+       embedding
+     rescue Faraday::Error => e
+       Jekyll.logger.error "AI Related Posts:", "Error response from embeddings API!"
+       Jekyll.logger.error "AI Related Posts:", e.inspect
+
+       raise
+     end
+
+     private
+
+     def discover_dimensions
+       embedding_for("test").length
+     end
+   end
+ end
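
The validation in `embedding_for` implies an OpenAI-style response body of the form `{"data" => [{"embedding" => [...]}]}`, and `dimensions` is simply the length of that array. A standalone sketch of the same checks, with no HTTP involved, might look like the following. `first_embedding` and the sample bodies are hypothetical and exist only for this illustration; the gem raises its own `Error` class rather than `ArgumentError`.

```ruby
# Hypothetical helper (not part of the gem): extracts the first embedding
# from an already-parsed, OpenAI-style response body, checking each level
# of nesting before indexing into it.
def first_embedding(body)
  data = body.is_a?(Hash) ? body["data"] : nil
  first = data.is_a?(Array) ? data[0] : nil
  embedding = first.is_a?(Hash) ? first["embedding"] : nil
  raise ArgumentError, "embedding data not found" unless embedding.is_a?(Array)

  embedding
end

# Made-up sample body; real embeddings are much longer.
sample_body = { "data" => [{ "embedding" => [0.1, 0.2, 0.3] }] }

vector = first_embedding(sample_body)
puts "dimensions: #{vector.length}"
```

Checking types level by level matters because a provider's error payload is typically still valid JSON, so indexing into it blindly would fail with a less helpful `NoMethodError`.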
@@ -16,6 +16,8 @@ module JekyllAiRelatedPosts
          cache_hits: 0,
          cache_misses: 0
        }
+       @embeddings_fetcher = new_fetcher if fetch_enabled?
+
        setup_database
 
        @indexed_posts = {}
@@ -24,7 +26,7 @@ module JekyllAiRelatedPosts
        end
 
        if fetch_enabled?
-         @embeddings_fetcher = new_fetcher
+         validate_cache_metadata
 
          @site.posts.docs.each do |p|
            ensure_embedding_cached(p)
@@ -85,9 +87,15 @@ module JekyllAiRelatedPosts
      def new_fetcher
        case @site.config["ai_related_posts"]["embeddings_source"]
        when "mock"
-         MockEmbeddings.new
+         model = @site.config["ai_related_posts"]["model"]
+         dimensions = @site.config["ai_related_posts"]["dimensions"]
+         MockEmbeddings.new(model: model, dimensions: dimensions)
        else
-         OpenAiEmbeddings.new(@site.config["ai_related_posts"]["openai_api_key"])
+         api_key = @site.config["ai_related_posts"]["api_key"] ||
+                   @site.config["ai_related_posts"]["openai_api_key"]
+         api_url = @site.config["ai_related_posts"]["api_url"]
+         model = @site.config["ai_related_posts"]["model"]
+         ApiEmbeddings.new(api_key, api_url: api_url, model: model)
        end
      end
 
@@ -158,6 +166,14 @@ module JekyllAiRelatedPosts
        post.data["ai_related_posts"] = related_posts
      end
 
+     def dimensions
+       @embeddings_fetcher&.dimensions || ApiEmbeddings::DEFAULT_DIMENSIONS
+     end
+
+     def model
+       @embeddings_fetcher&.model || ApiEmbeddings::DEFAULT_MODEL
+     end
+
      def embedding_text(post)
        text = "Title: #{post.data["title"]}"
        text += "; Categories: #{post.data["categories"].join(", ")}" unless post.data["categories"].empty?
@@ -203,14 +219,51 @@ module JekyllAiRelatedPosts
        SQL
        ActiveRecord::Base.connection.execute(create_posts)
 
-       create_vss_posts = <<-SQL
-         CREATE VIRTUAL TABLE IF NOT EXISTS vss_posts using vss0(
-           post_embedding(#{OpenAiEmbeddings::DIMENSIONS})
+       unless table_exists?("vss_posts")
+         create_vss_posts = <<-SQL
+           CREATE VIRTUAL TABLE vss_posts using vss0(
+             post_embedding(#{dimensions})
+           );
+         SQL
+         ActiveRecord::Base.connection.execute(create_vss_posts)
+       end
+
+       create_cache_metadata = <<-SQL
+         CREATE TABLE IF NOT EXISTS cache_metadata(
+           key TEXT PRIMARY KEY,
+           value TEXT
          );
        SQL
-       ActiveRecord::Base.connection.execute(create_vss_posts)
+       ActiveRecord::Base.connection.execute(create_cache_metadata)
 
        Jekyll.logger.debug "AI Related Posts:", "DB setup complete"
      end
+
+     def table_exists?(name)
+       ActiveRecord::Base.connection.table_exists?(name)
+     end
+
+     def validate_cache_metadata
+       model = @embeddings_fetcher&.model || ApiEmbeddings::DEFAULT_MODEL
+
+       stored_model = ActiveRecord::Base.connection.execute(
+         "SELECT value FROM cache_metadata WHERE key = 'model';"
+       ).first
+
+       if stored_model && stored_model["value"] != model
+         Jekyll.logger.error "AI Related Posts:", "Cache model mismatch!"
+         Jekyll.logger.error "AI Related Posts:", " Configured model: #{model}"
+         Jekyll.logger.error "AI Related Posts:", " Cached model: #{stored_model["value"]}"
+         Jekyll.logger.error "AI Related Posts:", "Either update your config to match the cached model, or delete the cache file (.ai_related_posts_cache.sqlite3) and it will be regenerated."
+         raise Error, "Cache model mismatch: configured=#{model}, cached=#{stored_model["value"]}"
+       end
+
+       # Store/update metadata if not present
+       if stored_model.nil?
+         ActiveRecord::Base.connection.execute(
+           ActiveRecord::Base.sanitize_sql([ "INSERT INTO cache_metadata (key, value) VALUES ('model', ?);", model ])
+         )
+       end
+     end
    end
  end
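
The control flow of `validate_cache_metadata` can be illustrated without a database. The sketch below swaps the `cache_metadata` table for a plain Hash and raises `ArgumentError` where the gem raises its own `Error`; `check_cached_model!` is a hypothetical name used only here. Like the hunk above, it compares only the model name.

```ruby
# Hypothetical stand-in for the check above: `metadata` plays the role of
# the cache_metadata table (key => value), not the gem's actual storage.
def check_cached_model!(metadata, configured_model)
  cached = metadata["model"]
  if cached.nil?
    # First run against this cache: remember which model produced it.
    metadata["model"] = configured_model
  elsif cached != configured_model
    # Vectors from different models are not comparable, so refuse to mix them.
    raise ArgumentError, "Cache model mismatch: configured=#{configured_model}, cached=#{cached}"
  end
  metadata
end

metadata = {}
check_cached_model!(metadata, "text-embedding-3-small") # records the model
check_cached_model!(metadata, "text-embedding-3-small") # same model: no error
puts metadata.inspect
```

Deleting the cache file corresponds to starting again from an empty Hash, which is why that recovery path in the error message works.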
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module JekyllAiRelatedPosts
-   VERSION = "0.1.4"
+   VERSION = "0.2.0"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: jekyll_ai_related_posts
  version: !ruby/object:Gem::Version
-   version: 0.1.4
+   version: 0.2.0
  platform: ruby
  authors:
  - Mike Kasberg
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2024-10-12 00:00:00.000000000 Z
+ date: 2026-06-06 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: activerecord
@@ -94,6 +94,20 @@ dependencies:
      - - "~>"
        - !ruby/object:Gem::Version
          version: '2.6'
+ - !ruby/object:Gem::Dependency
+   name: ostruct
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '0.6'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '0.6'
  description: Populate ai_related_posts using Open AI embeddings
  email:
  - kasberg.mike@gmail.com
@@ -109,10 +123,11 @@ files:
  - Rakefile
  - gemfiles/current.gemfile
  - gemfiles/jekyll3.gemfile
+ - jekyll_ai_related_posts.gemspec
  - lib/jekyll_ai_related_posts.rb
+ - lib/jekyll_ai_related_posts/api_embeddings.rb
  - lib/jekyll_ai_related_posts/generator.rb
  - lib/jekyll_ai_related_posts/models/post.rb
- - lib/jekyll_ai_related_posts/open_ai_embeddings.rb
  - lib/jekyll_ai_related_posts/version.rb
  homepage: https://github.com/mkasberg/jekyll_ai_related_posts
  licenses:
@@ -137,7 +152,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
      - !ruby/object:Gem::Version
        version: '0'
  requirements: []
- rubygems_version: 3.5.6
+ rubygems_version: 3.5.3
  signing_key:
  specification_version: 4
  summary: Populate ai_related_posts using Open AI embeddings
@@ -1,38 +0,0 @@
- # frozen_string_literal: true
-
- require "faraday"
-
- module JekyllAiRelatedPosts
-   class OpenAiEmbeddings
-     DIMENSIONS = 1536
-
-     def initialize(api_key, connection: nil)
-       @connection = if connection.nil?
-         Faraday.new(url: "https://api.openai.com") do |builder|
-           builder.request :authorization, "Bearer", api_key
-           builder.request :json
-           builder.response :json
-           builder.response :raise_error
-         end
-       else
-         connection
-       end
-     end
-
-     def embedding_for(text)
-       res = @connection.post("/v1/embeddings") do |req|
-         req.body = {
-           input: text,
-           model: "text-embedding-3-small"
-         }
-       end
-
-       res.body["data"].first["embedding"]
-     rescue Faraday::Error => e
-       Jekyll.logger.error "AI Related Posts:", "Error response from OpenAI API!"
-       Jekyll.logger.error "AI Related Posts:", e.inspect
-
-       raise
-     end
-   end
- end