legion-apollo 0.4.0 → 0.5.2
- checksums.yaml +4 -4
- data/CHANGELOG.md +30 -0
- data/README.md +24 -3
- data/lib/legion/apollo/helpers/confidence.rb +1 -0
- data/lib/legion/apollo/local/graph.rb +1 -1
- data/lib/legion/apollo/local/migrations/004_add_versioning_tiers_inference.rb +53 -0
- data/lib/legion/apollo/local/migrations/005_add_raw_content_temporal_windows.rb +22 -0
- data/lib/legion/apollo/local.rb +256 -45
- data/lib/legion/apollo/routes.rb +41 -22
- data/lib/legion/apollo/settings.rb +19 -1
- data/lib/legion/apollo/version.rb +1 -1
- data/lib/legion/apollo.rb +55 -27
- metadata +3 -1
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 9320cef3392467f9819e47c9cacc3893f45ffab9570825cb5d6b270ff3ce6a67
|
|
4
|
+
data.tar.gz: d38343aa96c9ba0b1d634f6e4d3850e22c899fe81c13550dce19809cec290a94
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 2dcaf3fc8fbf087b8e59c079e71921aebf8c53620e32d9cd8939a00981474747b78e978412d28a5c3ff4f204df01b9538f23a81c82a76a0d94afab8080219e46
|
|
7
|
+
data.tar.gz: 8d078c7e1c727bd23b08bfdb09af0abe7581296132d715b7ed7c4c5205d4645c004f1f980af6546239961c6649e9433e66a725341e2275e3cdb2e1de7f1687a3
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,35 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## [0.5.2] - 2026-04-27
|
|
4
|
+
|
|
5
|
+
### Added
|
|
6
|
+
- Store `raw_content` alongside indexed `content` in Apollo Local so callers can preserve verbatim source text separately from retrieval text (#25, #26)
|
|
7
|
+
- Add `valid_from`/`valid_to` temporal windows and `as_of:` query filtering for local knowledge entries (#27)
|
|
8
|
+
|
|
9
|
+
### Fixed
|
|
10
|
+
- Sanitize Apollo ingest and query text by scrubbing invalid UTF-8 and removing null bytes before routing to local or global backends (#29)
|
|
11
|
+
|
|
12
|
+
## [0.5.1] - 2026-04-27
|
|
13
|
+
|
|
14
|
+
### Fixed
|
|
15
|
+
- Guard Apollo Local tag queries and promotion against nil, shutdown, or unavailable local DB connections before SQL and Ruby fallback paths (#30)
|
|
16
|
+
|
|
17
|
+
## [0.5.0] - 2026-04-18
|
|
18
|
+
|
|
19
|
+
### Added
|
|
20
|
+
- Migration 004: versioning, tiers, inference, expiry metadata, and source linkage columns on `local_knowledge`; `local_source_links` table (#6-#11)
|
|
21
|
+
- Inference tagging: `is_inference` flag on ingest, `include_inferences:` filter on query, `INITIAL_INFERENCE_CONFIDENCE` (0.35) for LLM-derived entries (#9)
|
|
22
|
+
- Temporal expiry metadata: `forget_reason` and custom `expires_at` on ingest (#8)
|
|
23
|
+
- Versioned knowledge: `parent_knowledge_id`/`is_latest`/`supersession_type` on ingest, automatic parent supersession, `version_chain` traversal, `include_history:` query filter (#7)
|
|
24
|
+
- L0/L1/L2 tiered retrieval: `tier:` parameter on query with summary projection and truncation fallback (#6)
|
|
25
|
+
- Source-to-fact linkage: `source_uri`/`source_hash`/`relevance_score`/`extraction_method` on ingest, `source_links_for` query method, `local_source_links` table (#10)
|
|
26
|
+
- `SUPERSEDES` relation type in `Local::Graph` (#11)
|
|
27
|
+
- Versioning and expiry settings defaults
|
|
28
|
+
|
|
29
|
+
### Fixed
|
|
30
|
+
- FTS5 search crashes on punctuation (`.`, `:`, `-`, `+`, etc.) by tokenizing input into quoted alphanumeric terms with implicit AND semantics; ILIKE fallback now escapes `%` and `_` wildcards (#22)
|
|
31
|
+
- Apollo query returns HTTP 500 on non-Postgres backends: `direct_query` exceptions normalized to `:backend_query_failed` symbol, `apollo_status_code` maps known unavailability symbols to 503 (#23)
|
|
32
|
+
|
|
3
33
|
## [0.4.0] - 2026-04-02
|
|
4
34
|
|
|
5
35
|
### Changed
|
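The 0.5.2 "Fixed" entry above describes scrubbing invalid UTF-8 and removing null bytes before text is routed to a backend. The helper that performs this is not shown in this diff, so the following is only a minimal sketch of the idea. `sanitize_text` is a hypothetical name, and the sketch assumes the incoming bytes are meant to be UTF-8.

```ruby
# Hypothetical helper (not taken from legion-apollo): returns a copy of the
# input that is valid UTF-8 and contains no null bytes.
def sanitize_text(value)
  # Assumption: the bytes are intended as UTF-8, so we only relabel the encoding.
  text = value.to_s.dup.force_encoding(Encoding::UTF_8)
  # Drop invalid byte sequences first, then strip NUL characters.
  text.scrub('').delete("\u0000")
end

dirty = "ab\xFFc\x00d"
clean = sanitize_text(dirty)
puts clean.valid_encoding?
```

Scrubbing before deleting matters here: string operations on a value with an invalid encoding can raise, so the sketch makes the string valid first.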
data/README.md
CHANGED
|
@@ -1,10 +1,10 @@
|
|
|
1
1
|
# legion-apollo
|
|
2
2
|
|
|
3
|
-
Apollo client
|
|
3
|
+
Apollo is the LegionIO knowledge client. It gives extensions one API for writing, retrieving, and merging knowledge across the global Apollo service and the node-local SQLite store.
|
|
4
4
|
|
|
5
|
-
**Version**: 0.
|
|
5
|
+
**Version**: 0.5.2
|
|
6
6
|
|
|
7
|
-
|
|
7
|
+
`legion-apollo` provides `query`, `ingest`, and `retrieve` with smart routing: co-located `lex-apollo`, RabbitMQ transport, node-local SQLite, or graceful failure. `Apollo::Local` mirrors the same public API for offline and low-latency retrieval without requiring remote infrastructure.
|
|
8
8
|
|
|
9
9
|
## Usage
|
|
10
10
|
|
|
@@ -21,6 +21,24 @@ results = Legion::Apollo.query(text: 'local note', scope: :local)
|
|
|
21
21
|
|
|
22
22
|
# Query both and merge (deduped by content hash, ranked by confidence)
|
|
23
23
|
results = Legion::Apollo.query(text: 'ruby', scope: :all)
|
|
24
|
+
|
|
25
|
+
# Preserve verbatim source text separately from indexed retrieval content
|
|
26
|
+
Legion::Apollo.ingest(
|
|
27
|
+
content: 'Summarized policy note for search',
|
|
28
|
+
raw_content: 'Exact source text from the original record',
|
|
29
|
+
tags: %w[policy source],
|
|
30
|
+
scope: :local
|
|
31
|
+
)
|
|
32
|
+
|
|
33
|
+
# Query the local store as it was valid at a point in time
|
|
34
|
+
Legion::Apollo.ingest(
|
|
35
|
+
content: 'Policy version active in Q2',
|
|
36
|
+
tags: %w[policy],
|
|
37
|
+
valid_from: '2026-04-01T00:00:00.000Z',
|
|
38
|
+
valid_to: '2026-06-30T23:59:59.999Z',
|
|
39
|
+
scope: :local
|
|
40
|
+
)
|
|
41
|
+
results = Legion::Apollo.query(text: 'policy', scope: :local, as_of: '2026-05-01T00:00:00.000Z')
|
|
24
42
|
```
|
|
25
43
|
|
|
26
44
|
## Scopes
|
|
@@ -37,9 +55,12 @@ results = Legion::Apollo.query(text: 'ruby', scope: :all)
|
|
|
37
55
|
|
|
38
56
|
Features:
|
|
39
57
|
- Content-hash dedup (MD5 of normalized content)
|
|
58
|
+
- `raw_content` preservation for verbatim source text
|
|
59
|
+
- `valid_from` / `valid_to` temporal windows with `as_of:` query filtering
|
|
40
60
|
- Optional LLM embeddings (1024-dim) with cosine rerank when `Legion::LLM.can_embed?`
|
|
41
61
|
- TTL expiry (default 5-year retention)
|
|
42
62
|
- FTS5 full-text search with `ILIKE` fallback
|
|
63
|
+
- Null-byte removal and invalid UTF-8 scrubbing before persistence or backend routing
|
|
43
64
|
|
|
44
65
|
## Configuration
|
|
45
66
|
|
|
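The README feature list mentions content-hash dedup using an MD5 of normalized content. As a rough illustration, the sketch below reuses the normalization visible in the `content_hash` context lines of this diff (trim, lowercase, collapse whitespace) and pairs it with a hypothetical in-memory store. `ingest_once` and the `store` hash are assumptions made for the example, not part of the gem.

```ruby
require 'digest'

# Same normalization as the content_hash method shown in the local.rb context:
# trim, lowercase, collapse whitespace runs, then MD5.
def content_hash(content)
  normalized = content.to_s.strip.downcase.gsub(/\s+/, ' ')
  Digest::MD5.hexdigest(normalized)
end

# Hypothetical stand-in for the duplicate check: a Hash keyed by content hash.
def ingest_once(store, content)
  hash = content_hash(content)
  return :deduplicated if store.key?(hash)

  store[hash] = content
  :stored
end

store = {}
puts ingest_once(store, 'Ruby  is Fun')
puts ingest_once(store, '  ruby is fun ')
```

Because the hash is computed over the normalized text, entries that differ only in case or spacing collapse to one record.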
data/lib/legion/apollo/local/graph.rb
CHANGED
|
@@ -11,7 +11,7 @@ module Legion
|
|
|
11
11
|
# Relationships are directional typed edges between two entities.
|
|
12
12
|
# Graph traversal expands one frontier batch per depth to avoid per-node queries.
|
|
13
13
|
module Graph # rubocop:disable Metrics/ModuleLength
|
|
14
|
-
VALID_RELATION_TYPES = %w[AFFECTS OWNED_BY DEPENDS_ON RELATED_TO].freeze
|
|
14
|
+
VALID_RELATION_TYPES = %w[AFFECTS OWNED_BY DEPENDS_ON RELATED_TO SUPERSEDES].freeze
|
|
15
15
|
|
|
16
16
|
class << self # rubocop:disable Metrics/ClassLength
|
|
17
17
|
include Legion::Logging::Helper
|
|
data/lib/legion/apollo/local/migrations/004_add_versioning_tiers_inference.rb
ADDED
|
@@ -0,0 +1,53 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
Sequel.migration do # rubocop:disable Metrics/BlockLength
|
|
4
|
+
up do # rubocop:disable Metrics/BlockLength
|
|
5
|
+
alter_table(:local_knowledge) do
|
|
6
|
+
add_column :is_inference, :boolean, default: false, null: false
|
|
7
|
+
add_column :parent_knowledge_id, Integer, null: true
|
|
8
|
+
add_column :is_latest, :boolean, default: true, null: false
|
|
9
|
+
add_column :supersession_type, String, size: 32, null: true
|
|
10
|
+
add_column :forget_reason, String, size: 128, null: true
|
|
11
|
+
add_column :summary_l0, String, size: 500, null: true
|
|
12
|
+
add_column :summary_l1, :text, null: true
|
|
13
|
+
add_column :knowledge_tier, String, size: 4, null: false, default: 'L2'
|
|
14
|
+
add_column :l0_generated_at, String, null: true
|
|
15
|
+
add_column :l1_generated_at, String, null: true
|
|
16
|
+
|
|
17
|
+
add_index :is_latest, name: :idx_local_knowledge_is_latest
|
|
18
|
+
add_index :is_inference, name: :idx_local_knowledge_is_inference
|
|
19
|
+
add_index :knowledge_tier, name: :idx_local_knowledge_tier
|
|
20
|
+
add_index :parent_knowledge_id, name: :idx_local_knowledge_parent
|
|
21
|
+
end
|
|
22
|
+
|
|
23
|
+
create_table(:local_source_links) do
|
|
24
|
+
primary_key :id
|
|
25
|
+
Integer :entry_id, null: false
|
|
26
|
+
String :source_uri, text: true
|
|
27
|
+
String :source_hash, size: 64
|
|
28
|
+
Float :relevance_score, default: 1.0
|
|
29
|
+
String :extraction_method, size: 64
|
|
30
|
+
String :created_at, null: false
|
|
31
|
+
|
|
32
|
+
index :entry_id, name: :idx_source_links_entry
|
|
33
|
+
index :source_hash, name: :idx_source_links_hash
|
|
34
|
+
end
|
|
35
|
+
end
|
|
36
|
+
|
|
37
|
+
down do
|
|
38
|
+
drop_table(:local_source_links) if table_exists?(:local_source_links)
|
|
39
|
+
|
|
40
|
+
alter_table(:local_knowledge) do
|
|
41
|
+
drop_column :is_inference
|
|
42
|
+
drop_column :parent_knowledge_id
|
|
43
|
+
drop_column :is_latest
|
|
44
|
+
drop_column :supersession_type
|
|
45
|
+
drop_column :forget_reason
|
|
46
|
+
drop_column :summary_l0
|
|
47
|
+
drop_column :summary_l1
|
|
48
|
+
drop_column :knowledge_tier
|
|
49
|
+
drop_column :l0_generated_at
|
|
50
|
+
drop_column :l1_generated_at
|
|
51
|
+
end
|
|
52
|
+
end
|
|
53
|
+
end
|
|
data/lib/legion/apollo/local/migrations/005_add_raw_content_temporal_windows.rb
ADDED
|
@@ -0,0 +1,22 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
Sequel.migration do
|
|
4
|
+
up do
|
|
5
|
+
alter_table(:local_knowledge) do
|
|
6
|
+
add_column :raw_content, :text, null: true
|
|
7
|
+
add_column :valid_from, String, null: true
|
|
8
|
+
add_column :valid_to, String, null: true
|
|
9
|
+
|
|
10
|
+
add_index :valid_from, name: :idx_local_knowledge_valid_from
|
|
11
|
+
add_index :valid_to, name: :idx_local_knowledge_valid_to
|
|
12
|
+
end
|
|
13
|
+
end
|
|
14
|
+
|
|
15
|
+
down do
|
|
16
|
+
alter_table(:local_knowledge) do
|
|
17
|
+
drop_column :raw_content
|
|
18
|
+
drop_column :valid_from
|
|
19
|
+
drop_column :valid_to
|
|
20
|
+
end
|
|
21
|
+
end
|
|
22
|
+
end
|
data/lib/legion/apollo/local.rb
CHANGED
|
@@ -5,6 +5,7 @@ require 'legion/logging'
|
|
|
5
5
|
require 'socket'
|
|
6
6
|
require 'time'
|
|
7
7
|
require_relative 'local/graph'
|
|
8
|
+
require_relative 'helpers/confidence'
|
|
8
9
|
require_relative 'helpers/similarity'
|
|
9
10
|
require_relative 'helpers/tag_normalizer'
|
|
10
11
|
|
|
@@ -90,7 +91,7 @@ module Legion
|
|
|
90
91
|
{ success: false, error: e.message }
|
|
91
92
|
end
|
|
92
93
|
|
|
93
|
-
def query(text:, limit: nil, min_confidence: nil, tags: nil, **) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize
|
|
94
|
+
def query(text:, limit: nil, min_confidence: nil, tags: nil, **opts) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize,Metrics/CyclomaticComplexity
|
|
94
95
|
return not_started_error unless started?
|
|
95
96
|
|
|
96
97
|
text = normalize_text_input(text)
|
|
@@ -98,19 +99,27 @@ module Legion
|
|
|
98
99
|
limit ||= local_setting(:default_limit, 5)
|
|
99
100
|
min_confidence ||= local_setting(:min_confidence, 0.3)
|
|
100
101
|
multiplier = local_setting(:fts_candidate_multiplier, 3)
|
|
102
|
+
as_of = normalize_temporal_value(opts[:as_of])
|
|
101
103
|
log.info do
|
|
102
104
|
"Apollo::Local query executing text_length=#{text.to_s.length} " \
|
|
103
105
|
"limit=#{limit} min_confidence=#{min_confidence} tag_count=#{Array(tags).size}"
|
|
104
106
|
end
|
|
105
107
|
log.debug { "Apollo::Local query limit=#{limit} min_confidence=#{min_confidence} tags=#{Array(tags).size}" }
|
|
106
108
|
|
|
107
|
-
candidates = fts_search(text, limit: limit * multiplier)
|
|
108
|
-
|
|
109
|
+
candidates = fts_search(text, limit: limit * multiplier, as_of: as_of)
|
|
110
|
+
include_inferences = opts.fetch(:include_inferences, true)
|
|
111
|
+
include_history = opts.fetch(:include_history, false)
|
|
112
|
+
candidates = filter_candidates(candidates, min_confidence: min_confidence, tags: tags,
|
|
113
|
+
options: { include_inferences: include_inferences,
|
|
114
|
+
include_history: include_history, as_of: as_of })
|
|
109
115
|
candidates = cosine_rerank(text, candidates) if can_rerank?
|
|
110
116
|
results = candidates.first(limit)
|
|
111
117
|
|
|
118
|
+
tier = opts[:tier]
|
|
119
|
+
results = results.map { |r| project_tier(r, tier) } if tier
|
|
120
|
+
|
|
112
121
|
log.info { "Apollo::Local query completed count=#{results.size}" }
|
|
113
|
-
{ success: true, results: results, count: results.size, mode: :local }
|
|
122
|
+
{ success: true, results: results, count: results.size, mode: :local, tier: tier }
|
|
114
123
|
rescue StandardError => e
|
|
115
124
|
handle_exception(
|
|
116
125
|
e,
|
|
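The `query` hunk above reads `include_inferences` (default `true`) and `include_history` (default `false`) from the options and passes them to `filter_candidates`. A minimal sketch of how such flags could be applied to plain hashes follows. `flag_set?` and `filter_entries` are hypothetical names, and the rows are invented sample data.

```ruby
# The diff accepts both 1 and true for boolean columns, so this sketch does too.
def flag_set?(value)
  [1, true].include?(value)
end

def filter_entries(entries, include_inferences: true, include_history: false)
  entries = entries.reject { |e| flag_set?(e[:is_inference]) } unless include_inferences
  unless include_history
    # A missing is_latest value is treated as latest, matching the nil check in the diff.
    entries = entries.select { |e| e[:is_latest].nil? || flag_set?(e[:is_latest]) }
  end
  entries
end

rows = [
  { id: 1, is_inference: 1, is_latest: 1 },
  { id: 2, is_inference: false, is_latest: 0 },
  { id: 3, is_inference: 0, is_latest: true },
  { id: 4 }
]
puts filter_entries(rows, include_inferences: false).map { |r| r[:id] }.inspect
```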
@@ -151,10 +160,11 @@ module Legion
|
|
|
151
160
|
end
|
|
152
161
|
|
|
153
162
|
def query_by_tags(tags:, limit: 50) # rubocop:disable Metrics/MethodLength
|
|
154
|
-
|
|
155
|
-
|
|
163
|
+
connection = local_db_connection
|
|
156
164
|
tags = normalize_tags_input(tags)
|
|
157
|
-
|
|
165
|
+
return { success: false, error: :not_started } unless local_db_usable?(connection)
|
|
166
|
+
|
|
167
|
+
results = query_by_tags_via_sql(connection, tags: tags, limit: limit)
|
|
158
168
|
|
|
159
169
|
log.info { "Apollo::Local query_by_tags completed tag_count=#{tags.size} count=#{results.size}" }
|
|
160
170
|
{ success: true, results: results, count: results.size }
|
|
@@ -170,11 +180,13 @@ module Legion
|
|
|
170
180
|
end
|
|
171
181
|
|
|
172
182
|
def promote_to_global(tags:, min_confidence: 0.6) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize,Metrics/CyclomaticComplexity,Metrics/PerceivedComplexity
|
|
173
|
-
return { success: false, error: :not_started } unless
|
|
183
|
+
return { success: false, error: :not_started } unless local_db_usable?(local_db_connection)
|
|
174
184
|
|
|
175
185
|
tags = normalize_tags_input(tags)
|
|
176
186
|
entries = query_by_tags(tags: tags)
|
|
177
|
-
|
|
187
|
+
return entries unless entries[:success]
|
|
188
|
+
|
|
189
|
+
unless entries[:results]&.any?
|
|
178
190
|
log.info { "Apollo::Local promote_to_global skipped tag_count=#{tags.size} reason=no_entries" }
|
|
179
191
|
return { success: true, promoted: 0 }
|
|
180
192
|
end
|
|
@@ -192,6 +204,7 @@ module Legion
|
|
|
192
204
|
end
|
|
193
205
|
result = Legion::Apollo.ingest(
|
|
194
206
|
content: entry[:content],
|
|
207
|
+
raw_content: entry[:raw_content] || entry[:content],
|
|
195
208
|
tags: entry_tags + ['promoted_from_local'],
|
|
196
209
|
source_channel: 'local_promotion',
|
|
197
210
|
submitted_by: "node:#{hostname}",
|
|
@@ -223,6 +236,41 @@ module Legion
|
|
|
223
236
|
{ success: false, error: e.message }
|
|
224
237
|
end
|
|
225
238
|
|
|
239
|
+
def version_chain(entry_id:, max_depth: 50) # rubocop:disable Metrics/MethodLength
|
|
240
|
+
return not_started_error unless started?
|
|
241
|
+
|
|
242
|
+
chain = []
|
|
243
|
+
current_id = entry_id
|
|
244
|
+
seen = Set.new
|
|
245
|
+
|
|
246
|
+
max_depth.times do
|
|
247
|
+
break unless current_id
|
|
248
|
+
break if seen.include?(current_id)
|
|
249
|
+
|
|
250
|
+
seen.add(current_id)
|
|
251
|
+
row = db[:local_knowledge].where(id: current_id).first
|
|
252
|
+
break unless row
|
|
253
|
+
|
|
254
|
+
chain << row
|
|
255
|
+
current_id = row[:parent_knowledge_id]
|
|
256
|
+
end
|
|
257
|
+
|
|
258
|
+
{ success: true, chain: chain, count: chain.size }
|
|
259
|
+
rescue StandardError => e
|
|
260
|
+
handle_exception(e, level: :error, operation: 'apollo.local.version_chain', entry_id: entry_id)
|
|
261
|
+
{ success: false, error: e.message }
|
|
262
|
+
end
|
|
263
|
+
|
|
264
|
+
def source_links_for(entry_id:)
|
|
265
|
+
return not_started_error unless started?
|
|
266
|
+
|
|
267
|
+
links = db[:local_source_links].where(entry_id: entry_id).all
|
|
268
|
+
{ success: true, links: links, count: links.size }
|
|
269
|
+
rescue StandardError => e
|
|
270
|
+
handle_exception(e, level: :error, operation: 'apollo.local.source_links_for', entry_id: entry_id)
|
|
271
|
+
{ success: false, error: e.message }
|
|
272
|
+
end
|
|
273
|
+
|
|
226
274
|
private
|
|
227
275
|
|
|
228
276
|
def self_knowledge_files
|
|
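The `version_chain` method added in this hunk walks `parent_knowledge_id` links with both a depth cap and a visited set. The sketch below reproduces that traversal against a plain Hash instead of the `local_knowledge` table, so it can run without a database. `walk_version_chain` and the `rows_by_id` argument are hypothetical and exist only for this illustration.

```ruby
require 'set'

# Follows parent links from entry_id, stopping at a missing row, a nil parent,
# an already visited id (cycle guard), or after max_depth steps.
def walk_version_chain(rows_by_id, entry_id, max_depth: 50)
  chain = []
  seen = Set.new
  current_id = entry_id

  max_depth.times do
    break if current_id.nil? || seen.include?(current_id)

    seen.add(current_id)
    row = rows_by_id[current_id]
    break unless row

    chain << row
    current_id = row[:parent_knowledge_id]
  end

  chain
end

rows = {
  3 => { id: 3, parent_knowledge_id: 2 },
  2 => { id: 2, parent_knowledge_id: 1 },
  1 => { id: 1, parent_knowledge_id: nil }
}
puts walk_version_chain(rows, 3).map { |r| r[:id] }.inspect
```

The visited set keeps a corrupted chain that points back at itself from looping until the depth cap is reached.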
@@ -294,6 +342,26 @@ module Legion
|
|
|
294
342
|
Legion::Data::Local.connection
|
|
295
343
|
end
|
|
296
344
|
|
|
345
|
+
def local_db_connection
|
|
346
|
+
return nil unless started? && data_local_available?
|
|
347
|
+
|
|
348
|
+
db
|
|
349
|
+
rescue StandardError => e
|
|
350
|
+
handle_exception(e, level: :debug, operation: 'apollo.local.local_db_connection')
|
|
351
|
+
nil
|
|
352
|
+
end
|
|
353
|
+
|
|
354
|
+
def local_db_usable?(connection)
|
|
355
|
+
return false unless started? && connection
|
|
356
|
+
return false if connection.respond_to?(:closed?) && connection.closed?
|
|
357
|
+
|
|
358
|
+
connection.test_connection if connection.respond_to?(:test_connection)
|
|
359
|
+
true
|
|
360
|
+
rescue StandardError => e
|
|
361
|
+
handle_exception(e, level: :debug, operation: 'apollo.local.local_db_usable')
|
|
362
|
+
false
|
|
363
|
+
end
|
|
364
|
+
|
|
297
365
|
def content_hash(content)
|
|
298
366
|
normalized = content.to_s.strip.downcase.gsub(/\s+/, ' ')
|
|
299
367
|
Digest::MD5.hexdigest(normalized)
|
|
@@ -353,9 +421,12 @@ module Legion
|
|
|
353
421
|
|
|
354
422
|
result = ingest(
|
|
355
423
|
content: entry[:content],
|
|
424
|
+
raw_content: entry[:raw_content] || entry[:content],
|
|
356
425
|
tags: clean_tags,
|
|
357
426
|
confidence: ((entry[:confidence] || 0.5) * 0.9).round(10),
|
|
358
|
-
source_channel: 'global_hydration'
|
|
427
|
+
source_channel: 'global_hydration',
|
|
428
|
+
valid_from: entry[:valid_from],
|
|
429
|
+
valid_to: entry[:valid_to]
|
|
359
430
|
)
|
|
360
431
|
hydrated += 1 if result[:success]
|
|
361
432
|
end
|
|
@@ -365,6 +436,8 @@ module Legion
|
|
|
365
436
|
end
|
|
366
437
|
|
|
367
438
|
def ingest_without_lock(content:, tags:, **opts) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize
|
|
439
|
+
content = normalize_text_input(content)
|
|
440
|
+
raw_content = normalize_text_input(opts.key?(:raw_content) ? opts[:raw_content] : content)
|
|
368
441
|
hash = content_hash(content)
|
|
369
442
|
return deduplicated_ingest(hash) if duplicate?(hash)
|
|
370
443
|
|
|
@@ -374,8 +447,11 @@ module Legion
|
|
|
374
447
|
end
|
|
375
448
|
log.debug { "Apollo::Local ingest hash=#{hash} tags=#{Array(tags).size} source_channel=#{opts[:source_channel]}" }
|
|
376
449
|
|
|
377
|
-
|
|
378
|
-
|
|
450
|
+
metadata = opts.dup
|
|
451
|
+
metadata.delete(:raw_content)
|
|
452
|
+
row = build_ingest_row(content: content, raw_content: raw_content, hash: hash, tags: tags, **metadata)
|
|
453
|
+
id = persist_ingest_row(row, metadata)
|
|
454
|
+
mark_parent_superseded(metadata[:parent_knowledge_id]) if metadata[:parent_knowledge_id]
|
|
379
455
|
|
|
380
456
|
log.info { "Apollo::Local ingest stored id=#{id} hash=#{hash}" }
|
|
381
457
|
{ success: true, mode: :local, id: id }
|
|
@@ -385,31 +461,94 @@ module Legion
|
|
|
385
461
|
deduplicated_ingest(hash)
|
|
386
462
|
end
|
|
387
463
|
|
|
388
|
-
def build_ingest_row(content:, hash:, tags:, **opts)
|
|
464
|
+
def build_ingest_row(content:, raw_content:, hash:, tags:, **opts) # rubocop:disable Metrics/MethodLength
|
|
465
|
+
is_inference = opts[:is_inference] == true
|
|
466
|
+
default_confidence = is_inference ? Legion::Apollo::Helpers::Confidence::INITIAL_INFERENCE_CONFIDENCE : 1.0
|
|
467
|
+
ingest_metadata_columns(
|
|
468
|
+
content: content,
|
|
469
|
+
raw_content: raw_content,
|
|
470
|
+
hash: hash,
|
|
471
|
+
tags: tags,
|
|
472
|
+
opts: opts,
|
|
473
|
+
is_inference: is_inference,
|
|
474
|
+
default_confidence: default_confidence
|
|
475
|
+
).merge(embedding_columns(content, opts)).merge(timestamp_columns)
|
|
476
|
+
end
|
|
477
|
+
|
|
478
|
+
def ingest_metadata_columns(context)
|
|
479
|
+
ingest_base_columns(context)
|
|
480
|
+
.merge(ingest_lineage_columns(context[:opts]))
|
|
481
|
+
.merge(ingest_temporal_columns(context[:opts]))
|
|
482
|
+
end
|
|
483
|
+
|
|
484
|
+
def ingest_base_columns(context)
|
|
485
|
+
opts = context[:opts]
|
|
389
486
|
{
|
|
390
|
-
content:
|
|
391
|
-
|
|
392
|
-
|
|
393
|
-
|
|
394
|
-
|
|
395
|
-
|
|
396
|
-
|
|
397
|
-
|
|
487
|
+
content: context[:content],
|
|
488
|
+
raw_content: context[:raw_content],
|
|
489
|
+
content_hash: context[:hash],
|
|
490
|
+
tags: serialized_tags(context[:tags]),
|
|
491
|
+
confidence: opts[:confidence] || context[:default_confidence],
|
|
492
|
+
is_inference: context[:is_inference]
|
|
493
|
+
}.merge(ingest_source_columns(opts))
|
|
494
|
+
end
|
|
495
|
+
|
|
496
|
+
def ingest_source_columns(opts)
|
|
497
|
+
{ source_channel: opts[:source_channel], source_agent: opts[:source_agent],
|
|
498
|
+
submitted_by: opts[:submitted_by] }
|
|
499
|
+
end
|
|
500
|
+
|
|
501
|
+
def ingest_lineage_columns(opts)
|
|
502
|
+
{
|
|
503
|
+
forget_reason: opts[:forget_reason],
|
|
504
|
+
parent_knowledge_id: opts[:parent_knowledge_id],
|
|
505
|
+
supersession_type: opts[:supersession_type]
|
|
506
|
+
}
|
|
398
507
|
end
|
|
399
508
|
|
|
400
|
-
def
|
|
509
|
+
def ingest_temporal_columns(opts)
|
|
510
|
+
{
|
|
511
|
+
valid_from: normalize_temporal_value(opts[:valid_from]),
|
|
512
|
+
valid_to: normalize_temporal_value(opts[:valid_to])
|
|
513
|
+
}
|
|
514
|
+
end
|
|
515
|
+
|
|
516
|
+
def persist_ingest_row(row, opts = {})
|
|
401
517
|
db.transaction do
|
|
402
518
|
id = db[:local_knowledge].insert(row)
|
|
403
519
|
sync_fts!(id, row[:content], row[:tags])
|
|
520
|
+
create_source_link(id, opts) if opts[:source_uri]
|
|
404
521
|
id
|
|
405
522
|
end
|
|
406
523
|
end
|
|
407
524
|
|
|
525
|
+
def create_source_link(entry_id, opts)
|
|
526
|
+
db[:local_source_links].insert(
|
|
527
|
+
entry_id: entry_id,
|
|
528
|
+
source_uri: opts[:source_uri],
|
|
529
|
+
source_hash: opts[:source_hash],
|
|
530
|
+
relevance_score: opts[:relevance_score] || 1.0,
|
|
531
|
+
extraction_method: opts[:extraction_method],
|
|
532
|
+
created_at: Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ')
|
|
533
|
+
)
|
|
534
|
+
rescue StandardError => e
|
|
535
|
+
handle_exception(e, level: :warn, operation: 'apollo.local.create_source_link', entry_id: entry_id)
|
|
536
|
+
end
|
|
537
|
+
|
|
408
538
|
def deduplicated_ingest(hash)
|
|
409
539
|
log.info { "Apollo::Local ingest deduplicated hash=#{hash}" }
|
|
410
540
|
{ success: true, mode: :deduplicated }
|
|
411
541
|
end
|
|
412
542
|
|
|
543
|
+
def mark_parent_superseded(parent_id)
|
|
544
|
+
return unless parent_id
|
|
545
|
+
|
|
546
|
+
db[:local_knowledge].where(id: parent_id).update(is_latest: false)
|
|
547
|
+
log.info { "Apollo::Local marked entry id=#{parent_id} as superseded" }
|
|
548
|
+
rescue StandardError => e
|
|
549
|
+
handle_exception(e, level: :warn, operation: 'apollo.local.mark_parent_superseded', parent_id: parent_id)
|
|
550
|
+
end
|
|
551
|
+
|
|
413
552
|
def generate_embedding(content) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize,Metrics/CyclomaticComplexity,Metrics/PerceivedComplexity
|
|
414
553
|
unless defined?(Legion::LLM) && Legion::LLM.respond_to?(:can_embed?) && Legion::LLM.can_embed?
|
|
415
554
|
log.debug 'Apollo::Local embedding skipped because embeddings are unavailable'
|
|
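Two behaviours in this hunk interact on ingest: an entry flagged `is_inference` defaults to a lower confidence (0.35 per the changelog, versus 1.0 otherwise), and an entry with a `parent_knowledge_id` causes its parent to be marked `is_latest: false`. The sketch below models both with an in-memory class. `InMemoryKnowledge` is a hypothetical stand-in, not the gem's storage layer.

```ruby
# Hypothetical in-memory model of versioned ingest; names are illustrative only.
class InMemoryKnowledge
  INITIAL_INFERENCE_CONFIDENCE = 0.35 # value stated in the 0.5.0 changelog entry

  def initialize
    @rows = {}
    @next_id = 1
  end

  def ingest(content:, parent_knowledge_id: nil, is_inference: false, confidence: nil)
    id = @next_id
    @next_id += 1
    default = is_inference ? INITIAL_INFERENCE_CONFIDENCE : 1.0
    @rows[id] = { id: id, content: content, parent_knowledge_id: parent_knowledge_id,
                  is_inference: is_inference, is_latest: true,
                  confidence: confidence || default }
    # Supersede the parent, if one was named and exists.
    @rows[parent_knowledge_id][:is_latest] = false if @rows.key?(parent_knowledge_id)
    id
  end

  def fetch(id)
    @rows[id]
  end

  def latest
    @rows.values.select { |row| row[:is_latest] }
  end
end

store = InMemoryKnowledge.new
v1 = store.ingest(content: 'policy v1')
store.ingest(content: 'policy v2', parent_knowledge_id: v1)
puts store.latest.map { |row| row[:content] }.inspect
```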
@@ -447,13 +586,13 @@ module Legion
|
|
|
447
586
|
log.debug { "Apollo::Local FTS synced id=#{id}" }
|
|
448
587
|
end
|
|
449
588
|
|
|
450
|
-
def embedding_columns(content)
|
|
589
|
+
def embedding_columns(content, opts = {})
|
|
451
590
|
embedding, embedded_at = generate_embedding(content)
|
|
452
591
|
|
|
453
592
|
{
|
|
454
593
|
embedding: embedding ? Legion::JSON.dump(embedding) : nil,
|
|
455
594
|
embedded_at: embedded_at,
|
|
456
|
-
expires_at: compute_expires_at
|
|
595
|
+
expires_at: opts[:expires_at] || compute_expires_at
|
|
457
596
|
}
|
|
458
597
|
end
|
|
459
598
|
|
|
@@ -466,33 +605,45 @@ module Legion
|
|
|
466
605
|
Legion::JSON.dump(normalize_tags_input(tags))
|
|
467
606
|
end
|
|
468
607
|
|
|
469
|
-
def fts_search(text, limit:) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize
|
|
470
|
-
if text.to_s.strip.empty?
|
|
471
|
-
return db[:local_knowledge]
|
|
472
|
-
.where(Sequel.lit('expires_at > ?', Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ')))
|
|
473
|
-
.limit(limit)
|
|
474
|
-
.all
|
|
475
|
-
end
|
|
476
|
-
|
|
477
|
-
escaped = text.to_s.gsub('"', '""')
|
|
608
|
+
def fts_search(text, limit:, as_of: nil) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize
|
|
478
609
|
now = Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ')
|
|
610
|
+
return active_knowledge_dataset(now: now, as_of: as_of).limit(limit).all if text.to_s.strip.empty?
|
|
611
|
+
|
|
612
|
+
tokens = text.to_s.scan(/[\p{L}\p{N}_]+/)
|
|
613
|
+
return ilike_search(text, now: now, limit: limit, as_of: as_of) if tokens.empty?
|
|
614
|
+
|
|
615
|
+
escaped = tokens.map { |t| %("#{t}") }.join(' ')
|
|
616
|
+
temporal_sql, temporal_params = temporal_window_sql(as_of, table_alias: 'lk')
|
|
479
617
|
db.fetch(
|
|
480
618
|
'SELECT lk.* FROM local_knowledge lk ' \
|
|
481
619
|
'INNER JOIN local_knowledge_fts fts ON lk.id = fts.rowid ' \
|
|
482
|
-
|
|
483
|
-
|
|
620
|
+
"WHERE local_knowledge_fts MATCH ? AND lk.expires_at > ?#{temporal_sql} " \
|
|
621
|
+
'ORDER BY fts.rank LIMIT ?',
|
|
622
|
+
escaped, now, *temporal_params, limit
|
|
484
623
|
).all
|
|
485
624
|
rescue StandardError => e
|
|
486
625
|
handle_exception(e, level: :debug, operation: 'apollo.local.fts_search', limit: limit, fallback: :ilike)
|
|
487
|
-
|
|
488
|
-
|
|
489
|
-
|
|
626
|
+
ilike_search(text, now: Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ'), limit: limit, as_of: as_of)
|
|
627
|
+
end
|
|
628
|
+
|
|
629
|
+
def ilike_search(text, now:, limit:, as_of: nil)
|
|
630
|
+
safe_text = text.to_s.gsub('\\', '\\\\\\\\').gsub('%', '\%').gsub('_', '\_')
|
|
631
|
+
active_knowledge_dataset(now: now, as_of: as_of)
|
|
632
|
+
.where(Sequel.lit("content LIKE ? ESCAPE '\\' COLLATE NOCASE", "%#{safe_text}%"))
|
|
490
633
|
.limit(limit)
|
|
491
634
|
.all
|
|
492
635
|
end
|
|
493
636
|
|
|
494
|
-
def filter_candidates(candidates, min_confidence:, tags:)
|
|
637
|
+
def filter_candidates(candidates, min_confidence:, tags:, options: {}) # rubocop:disable Metrics/CyclomaticComplexity,Metrics/PerceivedComplexity,Metrics/MethodLength,Metrics/AbcSize
|
|
638
|
+
include_inferences = options.fetch(:include_inferences, true)
|
|
639
|
+
include_history = options.fetch(:include_history, false)
|
|
640
|
+
as_of = options[:as_of]
|
|
495
641
|
candidates = candidates.select { |c| (c[:confidence] || 0) >= min_confidence }
|
|
642
|
+
candidates = candidates.select { |c| temporally_valid?(c, as_of) }
|
|
643
|
+
candidates = candidates.reject { |c| [1, true].include?(c[:is_inference]) } unless include_inferences
|
|
644
|
+
unless include_history
|
|
645
|
+
candidates = candidates.select { |c| c[:is_latest].nil? || c[:is_latest] == 1 || c[:is_latest] == true }
|
|
646
|
+
end
|
|
496
647
|
if tags && !tags.empty?
|
|
497
648
|
tag_set = Array(tags).map(&:to_s)
|
|
498
649
|
candidates = candidates.select do |c|
|
|
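The rewritten `fts_search` turns free text into quoted alphanumeric terms before handing it to FTS5, and `ilike_search` escapes `%`, `_`, and backslashes for the `LIKE ... ESCAPE '\'` fallback. The standalone sketch below isolates those two string transformations so they can be tried without SQLite. `fts_match_expression` and `like_pattern` are hypothetical helper names.

```ruby
# Builds an FTS5 MATCH expression from quoted word tokens, or nil when the
# input has no letters, digits, or underscores (the diff falls back to LIKE then).
def fts_match_expression(text)
  tokens = text.to_s.scan(/[\p{L}\p{N}_]+/)
  return nil if tokens.empty?

  tokens.map { |token| %("#{token}") }.join(' ')
end

# Escapes LIKE wildcards and the escape character itself, then wraps the text
# for a substring match. Intended for use with ESCAPE '\'.
def like_pattern(text)
  escaped = text.to_s.gsub(/[\\%_]/) { |char| "\\#{char}" }
  "%#{escaped}%"
end

puts fts_match_expression('v1.2: foo-bar+baz')
puts like_pattern('50%_off')
```

Quoting each token means punctuation never reaches the FTS5 query parser, and adjacent quoted terms are combined with implicit AND, as the changelog entry describes.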
@@ -503,6 +654,36 @@ module Legion
|
|
|
503
654
|
candidates
|
|
504
655
|
end
|
|
505
656
|
|
|
657
|
+
def active_knowledge_dataset(now:, as_of: nil)
|
|
658
|
+
apply_temporal_window(db[:local_knowledge].where(Sequel.lit('expires_at > ?', now)), as_of)
|
|
659
|
+
end
|
|
660
|
+
|
|
661
|
+
def apply_temporal_window(dataset, as_of)
|
|
662
|
+
return dataset if as_of.to_s.empty?
|
|
663
|
+
|
|
664
|
+
dataset.where(
|
|
665
|
+
Sequel.lit('(valid_from IS NULL OR valid_from <= ?) AND (valid_to IS NULL OR valid_to >= ?)', as_of, as_of)
|
|
666
|
+
)
|
|
667
|
+
end
|
|
668
|
+
|
|
669
|
+
def temporal_window_sql(as_of, table_alias:)
|
|
670
|
+
return ['', []] if as_of.to_s.empty?
|
|
671
|
+
|
|
672
|
+
[
|
|
673
|
+
" AND (#{table_alias}.valid_from IS NULL OR #{table_alias}.valid_from <= ?) " \
|
|
674
|
+
"AND (#{table_alias}.valid_to IS NULL OR #{table_alias}.valid_to >= ?)",
|
|
675
|
+
[as_of, as_of]
|
|
676
|
+
]
|
|
677
|
+
end
|
|
678
|
+
|
|
679
|
+
def temporally_valid?(row, as_of)
|
|
680
|
+
return true if as_of.to_s.empty?
|
|
681
|
+
|
|
682
|
+
valid_from = row[:valid_from]
|
|
683
|
+
valid_to = row[:valid_to]
|
|
684
|
+
(valid_from.nil? || valid_from <= as_of) && (valid_to.nil? || valid_to >= as_of)
|
|
685
|
+
end
|
|
686
|
+
|
|
506
687
|
def parse_tags(tags_json)
|
|
507
688
|
return [] if tags_json.nil? || tags_json.empty?
|
|
508
689
|
|
|
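The temporal helpers in this hunk compare `valid_from`, `valid_to`, and `as_of` as strings. That only orders correctly when every value shares one fixed-width UTC format, which is what `normalize_temporal_value` (later in this file) produces. The sketch below shows the same idea in isolation. `normalize_timestamp` and `valid_at?` are hypothetical names, and the entry is sample data.

```ruby
require 'time'

ISO_UTC = '%Y-%m-%dT%H:%M:%S.%LZ'

# Converts a Time or parseable string to a fixed-width UTC ISO-8601 string.
# Note: strings without a zone are interpreted in the local time zone by Time.parse.
def normalize_timestamp(value)
  return nil if value.nil?

  time = value.is_a?(Time) ? value : Time.parse(value.to_s)
  time.getutc.strftime(ISO_UTC)
end

# Inclusive window check; a nil bound means the window is open on that side.
def valid_at?(entry, as_of)
  return true if as_of.nil?

  from = entry[:valid_from]
  to = entry[:valid_to]
  (from.nil? || from <= as_of) && (to.nil? || to >= as_of)
end

entry = { valid_from: '2026-04-01T00:00:00.000Z', valid_to: '2026-06-30T23:59:59.999Z' }
puts valid_at?(entry, normalize_timestamp('2026-05-01T02:00:00+02:00'))
```

One difference from the diff: the sketch lets a parse failure raise, whereas `normalize_temporal_value` rescues and returns the stripped input text.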
@@ -571,6 +752,17 @@ module Legion
|
|
|
571
752
|
value.to_s
|
|
572
753
|
end
|
|
573
754
|
|
|
755
|
+
def normalize_temporal_value(value)
|
|
756
|
+
return nil if value.nil?
|
|
757
|
+
|
|
758
|
+
text = value.respond_to?(:utc) ? value.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ') : value.to_s.strip
|
|
759
|
+
return nil if text.empty?
|
|
760
|
+
|
|
761
|
+
Time.parse(text).utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ')
|
|
762
|
+
rescue StandardError
|
|
763
|
+
text
|
|
764
|
+
end
|
|
765
|
+
|
|
574
766
|
def normalize_tags_input(tags)
|
|
575
767
|
Legion::Apollo::Helpers::TagNormalizer.normalize(Array(tags)).first(max_tags_limit)
|
|
576
768
|
rescue StandardError => e
|
|
@@ -589,9 +781,9 @@ module Legion
|
|
|
589
781
|
Legion::Apollo::Helpers::TagNormalizer::MAX_TAGS
|
|
590
782
|
end
|
|
591
783
|
|
|
592
|
-
def query_by_tags_via_sql(tags:, limit:) # rubocop:disable Metrics/AbcSize,Metrics/MethodLength
|
|
784
|
+
def query_by_tags_via_sql(connection, tags:, limit:) # rubocop:disable Metrics/AbcSize,Metrics/MethodLength
|
|
593
785
|
now = Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ')
|
|
594
|
-
dataset =
|
|
786
|
+
dataset = connection[:local_knowledge].where(Sequel.lit('expires_at > ?', now))
|
|
595
787
|
|
|
596
788
|
Array(tags).map(&:to_s).each do |tag|
|
|
597
789
|
dataset = dataset.where(
|
|
@@ -611,11 +803,15 @@ module Legion
|
|
|
611
803
|
tag_count: Array(tags).size,
|
|
612
804
|
limit: limit
|
|
613
805
|
)
|
|
614
|
-
|
|
806
|
+
raise unless local_db_usable?(connection)
|
|
807
|
+
|
|
808
|
+
query_by_tags_via_ruby(connection, tags: tags, limit: limit)
|
|
615
809
|
end
|
|
616
810
|
|
|
617
|
-
def query_by_tags_via_ruby(tags:, limit:)
|
|
618
|
-
|
|
811
|
+
def query_by_tags_via_ruby(connection, tags:, limit:)
|
|
812
|
+
raise Sequel::DatabaseConnectionError, 'local database unavailable' unless local_db_usable?(connection)
|
|
813
|
+
|
|
814
|
+
candidates = connection[:local_knowledge]
|
|
619
815
|
.where(Sequel.lit('expires_at > ?', Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ')))
|
|
620
816
|
.all
|
|
621
817
|
|
|
@@ -626,7 +822,7 @@ module Legion
|
|
|
626
822
|
end
|
|
627
823
|
|
|
628
824
|
def update_upsert_entry(existing, content, tags_json, opts) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize
|
|
629
|
-
content = content
|
|
825
|
+
content = normalize_text_input(content)
|
|
630
826
|
new_hash = content_hash(content)
|
|
631
827
|
embedding, embedded_at = generate_embedding(content)
|
|
632
828
|
now = Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ')
|
|
@@ -658,6 +854,21 @@ module Legion
           log.debug { "Apollo::Local FTS rebuilt id=#{id}" }
         end
 
+        def project_tier(entry, tier) # rubocop:disable Metrics/MethodLength
+          case tier
+          when :l0
+            entry.slice(:id, :content_hash, :confidence, :tags, :source_channel, :is_inference, :is_latest).merge(
+              summary: entry[:summary_l0] || entry[:content]&.slice(0, 200)
+            )
+          when :l1
+            entry.slice(:id, :content_hash, :confidence, :tags, :source_channel, :is_inference, :is_latest).merge(
+              summary: entry[:summary_l1] || entry[:content]&.slice(0, 1000)
+            )
+          else
+            entry
+          end
+        end
+
         def not_started_error
           { success: false, error: :not_started }
         end
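The tier projection added in the hunk above can be read as the following stand-alone sketch. It assumes entries are plain symbol-keyed hashes; `project_tier_sketch`, `TIER_FIELDS`, and `TIER_LIMITS` are hypothetical names introduced for illustration and are not part of the gem.

```ruby
# Illustrative sketch only: mirrors the :l0 / :l1 / full behaviour shown in the diff.
TIER_FIELDS = %i[id content_hash confidence tags source_channel is_inference is_latest].freeze
TIER_LIMITS = { l0: 200, l1: 1000 }.freeze # fallback summary lengths per tier

def project_tier_sketch(entry, tier)
  limit = TIER_LIMITS[tier]
  return entry unless limit # any other tier (including nil) returns the full entry

  summary_key = :"summary_#{tier}" # :summary_l0 or :summary_l1
  entry.slice(*TIER_FIELDS).merge(
    # Prefer a stored summary; otherwise truncate the content.
    summary: entry[summary_key] || entry[:content]&.slice(0, limit)
  )
end

entry = { id: 1, content: 'x' * 500, confidence: 0.9, tags: ['ruby'], embedding: [0.1, 0.2] }
l0 = project_tier_sketch(entry, :l0)
puts l0.keys.inspect
puts l0[:summary].length
```

With no stored `summary_l0`, the `:l0` projection falls back to the first 200 characters of `content` and drops fields such as `embedding` that are outside the projected key list.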
data/lib/legion/apollo/routes.rb
CHANGED
@@ -50,7 +50,7 @@ module Legion
         end
       end
 
-      def self.register_query_route(app) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize,Metrics/CyclomaticComplexity
+      def self.register_query_route(app) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize,Metrics/CyclomaticComplexity,Metrics/PerceivedComplexity
         app.post '/api/apollo/query' do
           unless apollo_api_available?
             halt 503, json_error('apollo_unavailable', 'apollo is not available', status_code: 503)
@@ -59,21 +59,25 @@ module Legion
           body = parse_request_body
           default_limit = defined?(Legion::Settings) ? (Legion::Settings[:apollo]&.dig(:default_limit) || 5) : 5
           result = Legion::Apollo.query(
-            text:
-            limit:
-            min_confidence:
-            status:
-            tags:
-            domain:
-            agent_id:
-            scope:
+            text: body[:query],
+            limit: body[:limit] || default_limit,
+            min_confidence: body[:min_confidence],
+            status: body[:status] || [:confirmed],
+            tags: body[:tags],
+            domain: body[:domain],
+            agent_id: body[:agent_id] || 'api',
+            scope: normalize_scope(body[:scope]),
+            tier: body[:tier]&.to_sym,
+            as_of: body[:as_of],
+            include_inferences: body.fetch(:include_inferences, true),
+            include_history: body.fetch(:include_history, false)
           )
           json_response(result, status_code: apollo_status_code(result))
         end
       end
 
       def self.register_ingest_route(app) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize,Metrics/CyclomaticComplexity,Metrics/PerceivedComplexity
-        app.post '/api/apollo/ingest' do
+        app.post '/api/apollo/ingest' do # rubocop:disable Metrics/BlockLength
           unless apollo_api_available?
             halt 503, json_error('apollo_unavailable', 'apollo is not available', status_code: 503)
           end
@@ -84,15 +88,27 @@ module Legion
           effective_max_tags = [max_tags, Legion::Apollo::Helpers::TagNormalizer::MAX_TAGS].min
           tags = Legion::Apollo::Helpers::TagNormalizer.normalize(Array(body[:tags])).first(effective_max_tags)
           result = Legion::Apollo.ingest(
-            content:
-
-
-
-
-
-
-
-
+            content: body[:content],
+            raw_content: body[:raw_content],
+            content_type: body[:content_type] || :observation,
+            tags: tags,
+            source_agent: body[:source_agent] || 'api',
+            source_provider: body[:source_provider],
+            source_channel: body[:source_channel] || 'rest_api',
+            knowledge_domain: body[:knowledge_domain],
+            context: body[:context] || {},
+            scope: normalize_scope(body[:scope]),
+            is_inference: body[:is_inference] == true,
+            forget_reason: body[:forget_reason],
+            expires_at: body[:expires_at],
+            valid_from: body[:valid_from],
+            valid_to: body[:valid_to],
+            parent_knowledge_id: body[:parent_knowledge_id],
+            supersession_type: body[:supersession_type],
+            source_uri: body[:source_uri],
+            source_hash: body[:source_hash],
+            relevance_score: body[:relevance_score],
+            extraction_method: body[:extraction_method]
           )
           json_response(result, status_code: apollo_status_code(result, success_status: 201))
         end
@@ -197,13 +213,16 @@ module Legion
           :global
         end
 
-        def apollo_status_code(result, success_status: 200)
+        def apollo_status_code(result, success_status: 200) # rubocop:disable Metrics/MethodLength
           return 202 if result[:success] && result[:mode] == :async
           return success_status if result[:success]
 
           case result[:error]
-          when :no_path_available, :not_started
-
+          when :no_path_available, :not_started, :local_not_started,
+               :upstream_query_failed, :backend_query_failed
+            503
+          else
+            500
           end
         rescue StandardError => e
           handle_exception(e, level: :debug, operation: :apollo_status_code)
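The status mapping in the hunk above can be summarised with a small stand-alone sketch. It assumes `result` is a symbol-keyed hash and leaves out the `rescue` branch, whose body is not visible in this diff; `status_code_sketch` and `UNAVAILABLE_ERRORS` are hypothetical names used only for illustration.

```ruby
# Illustrative sketch only: error symbols that map to 503 in the new version.
UNAVAILABLE_ERRORS = %i[
  no_path_available not_started local_not_started
  upstream_query_failed backend_query_failed
].freeze

def status_code_sketch(result, success_status: 200)
  return 202 if result[:success] && result[:mode] == :async # accepted for async processing
  return success_status if result[:success]

  # Known availability failures become 503; everything else becomes 500.
  UNAVAILABLE_ERRORS.include?(result[:error]) ? 503 : 500
end

puts status_code_sketch({ success: true }, success_status: 201)
puts status_code_sketch({ success: false, error: :backend_query_failed })
puts status_code_sketch({ success: false, error: 'unexpected failure' })
```

Because `direct_query` now returns the symbol `:backend_query_failed` and keeps the exception message in `detail`, a backend failure is matched by the `when` branch and reported as 503.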
data/lib/legion/apollo/settings.rb
CHANGED
@@ -10,7 +10,9 @@ module Legion
           max_tags: 20,
           default_limit: 5,
           min_confidence: 0.3,
-          local: local_defaults
+          local: local_defaults,
+          versioning: versioning_defaults,
+          expiry: expiry_defaults
         }
       end
 
@@ -24,6 +26,22 @@ module Legion
           default_limit: 5
         }
       end
+
+      def self.versioning_defaults
+        {
+          enabled: true,
+          supersession_threshold: 0.85,
+          max_chain_depth: 50
+        }
+      end
+
+      def self.expiry_defaults
+        {
+          enabled: true,
+          sweep_interval: 3600,
+          warn_before_expiry: 86_400
+        }
+      end
     end
   end
 end
data/lib/legion/apollo.rb
CHANGED
@@ -97,9 +97,11 @@ module Legion
         return not_started_error unless started?
 
         normalized_tags = normalize_tags_input(tags)
-
+        normalized_content = normalize_text_input(content)
+        normalized_raw_content = normalize_text_input(opts.key?(:raw_content) ? opts[:raw_content] : content)
+        payload = { **opts, content: normalized_content, raw_content: normalized_raw_content, tags: normalized_tags }
         log.info do
-          "Apollo ingest requested scope=#{scope} content_length=#{content.to_s.length} " \
+          "Apollo ingest requested scope=#{scope} content_length=#{payload[:content].to_s.length} " \
             "tags=#{payload[:tags].size} source_channel=#{payload[:source_channel]}"
         end
         log.debug do
@@ -247,7 +249,7 @@ module Legion
         Legion::Extensions::Apollo::Runners::Knowledge.handle_query(**normalize_query_payload(payload))
       rescue StandardError => e
         handle_exception(e, level: :error, operation: 'apollo.direct_query', payload_keys: payload.keys)
-        { success: false, error: e.message }
+        { success: false, error: :backend_query_failed, detail: e.message }
       end
 
       def direct_ingest(payload)
@@ -288,7 +290,9 @@ module Legion
           "Apollo query using local store text_length=#{payload[:text].to_s.length} " \
             "limit=#{payload[:limit]}"
         end
-        result = Legion::Apollo::Local.query(**payload.slice(:text, :limit, :min_confidence, :tags
+        result = Legion::Apollo::Local.query(**payload.slice(:text, :limit, :min_confidence, :tags,
+                                                             :tier, :include_inferences, :include_history,
+                                                             :as_of))
         return result unless result[:success]
 
         entries = normalize_local_entries(Array(result[:results]))
@@ -322,7 +326,9 @@ module Legion
 
         if Legion::Apollo::Local.started?
           attempted = true
-          local = Legion::Apollo::Local.query(**payload.slice(:text, :limit, :min_confidence, :tags
+          local = Legion::Apollo::Local.query(**payload.slice(:text, :limit, :min_confidence, :tags,
+                                                              :tier, :include_inferences, :include_history,
+                                                              :as_of))
           if local[:success]
             any_success = true
             entries.concat(normalize_local_entries(Array(local[:results]))) if local[:results]
@@ -341,6 +347,9 @@ module Legion
         return { success: false, error: :no_path_available } unless attempted
 
         unless any_success
+          symbol_errors = errors.compact.grep(Symbol).uniq
+          return { success: false, error: symbol_errors.first } if symbol_errors.size == 1
+
           combined_error = errors.compact.map(&:to_s).reject(&:empty?).join('; ')
           combined_error = :upstream_query_failed if combined_error.empty?
           return { success: false, error: combined_error }
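The error-selection logic added in the hunk above is easier to follow in isolation. The sketch below reproduces only the lines visible in the diff; `collapse_errors_sketch` is a hypothetical name, and the surrounding query method is assumed to have collected `errors` as an array of symbols, strings, or nils.

```ruby
# Illustrative sketch only: pick the error to report when every path failed.
def collapse_errors_sketch(errors)
  # If exactly one distinct Symbol error was collected, report it as a Symbol
  # so callers (for example an HTTP status mapper) can match on it.
  symbol_errors = errors.compact.grep(Symbol).uniq
  return { success: false, error: symbol_errors.first } if symbol_errors.size == 1

  # Otherwise fall back to a joined string, or a generic symbol when nothing usable remains.
  combined = errors.compact.map(&:to_s).reject(&:empty?).join('; ')
  combined = :upstream_query_failed if combined.empty?
  { success: false, error: combined }
end

p collapse_errors_sketch(%i[local_not_started local_not_started])
p collapse_errors_sketch(%i[local_not_started backend_query_failed])
p collapse_errors_sketch([nil, ''])
```

One consequence of this ordering: when the list holds a single distinct symbol alongside string errors, the symbol is returned and the strings are not included in the result.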
@@ -372,18 +381,29 @@ module Legion
                  else
                    Array(e[:tags])
                  end
-          { id: e[:id], content: e[:content], content_hash: hash,
-            confidence: e[:confidence] || 0.5, content_type: 'fact', tags: tags, source: :local
+          { id: e[:id], content: e[:content], raw_content: e[:raw_content] || e[:content], content_hash: hash,
+            confidence: e[:confidence] || 0.5, content_type: 'fact', tags: tags, source: :local,
+            valid_from: e[:valid_from], valid_to: e[:valid_to] }
         end
       end
 
       def normalize_global_entries(entries)
-        entries.map
-
-
-
-
-
+        entries.map { |entry| normalize_global_entry(entry) }
+      end
+
+      def normalize_global_entry(entry)
+        { id: entry[:id], content: entry[:content], raw_content: normalized_raw_content(entry),
+          content_hash: normalized_content_hash(entry), confidence: entry[:confidence] || 0.5,
+          content_type: entry[:content_type] || 'fact', tags: Array(entry[:tags]), source: :global,
+          valid_from: entry[:valid_from], valid_to: entry[:valid_to] }
+      end
+
+      def normalized_raw_content(entry)
+        entry[:raw_content] || entry[:content]
+      end
+
+      def normalized_content_hash(entry)
+        entry[:content_hash] || Digest::MD5.hexdigest(entry[:content].to_s.strip.downcase.gsub(/\s+/, ' '))
       end
 
       def dedup_and_rank(entries, limit:)
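The two fallbacks introduced for global entries in the hunk above (raw content defaulting to indexed content, and a content hash computed from whitespace-normalized, downcased text) can be exercised with a short sketch. `normalize_entry_sketch` is a hypothetical helper that keeps only those two fields plus `content`; it is not the gem's method.

```ruby
require 'digest'

# Illustrative sketch only: the raw_content and content_hash fallbacks from the diff.
def normalize_entry_sketch(entry)
  content = entry[:content]
  {
    content: content,
    # Entries without raw_content fall back to the indexed content.
    raw_content: entry[:raw_content] || content,
    # A missing hash is derived from trimmed, downcased, whitespace-collapsed content.
    content_hash: entry[:content_hash] ||
      Digest::MD5.hexdigest(content.to_s.strip.downcase.gsub(/\s+/, ' '))
  }
end

a = normalize_entry_sketch({ content: "Hello   World \n" })
b = normalize_entry_sketch({ content: 'hello world' })
puts a[:content_hash] == b[:content_hash]
```

Because the hash ignores case and runs of whitespace, two entries that differ only in formatting produce the same `content_hash`.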
@@ -465,20 +485,28 @@ module Legion
       end
 
       def normalize_text_input(value) # rubocop:disable Metrics/CyclomaticComplexity,Metrics/MethodLength
-        case value
-
-
-
-
-
-
-
-
-
-
-
-
-
+        text = case value
+               when String
+                 value
+               when Array
+                 parts = value.filter_map { |entry| extract_text_fragment(entry) }
+                 joined = parts.map(&:to_s).map(&:strip).reject(&:empty?).join("\n")
+                 joined.empty? ? value.to_s : joined
+               when Hash
+                 extract_text_fragment(value).to_s
+               when nil
+                 ''
+               else
+                 value.to_s
+               end
+        sanitize_text_input(text)
+      end
+
+      def sanitize_text_input(value)
+        text = value.to_s.dup
+        text = text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: '')
+        text = text.scrub('') unless text.valid_encoding?
+        text.delete("\u0000")
       end
 
       def normalize_tags_input(tags)
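The new `sanitize_text_input` step from the hunk above can be tried on its own. The sketch copies the four lines shown in the diff into a free-standing method; `sanitize_text_sketch` is a hypothetical name, and the sample inputs are made up for illustration.

```ruby
# Illustrative sketch only: same steps as sanitize_text_input in the diff.
def sanitize_text_sketch(value)
  text = value.to_s.dup
  # Transcode to UTF-8, dropping bytes that cannot be converted.
  text = text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: '')
  # A string already tagged as UTF-8 may still carry invalid bytes; scrub removes them.
  text = text.scrub('') unless text.valid_encoding?
  # Strip NUL characters.
  text.delete("\u0000")
end

puts sanitize_text_sketch("abc\xFFdef")
puts sanitize_text_sketch("a\u0000b")
puts sanitize_text_sketch(nil).inspect
```

The `scrub` call matters for input that is already labelled UTF-8: the sketch does not rely on `encode` alone to remove invalid bytes in that case, and neither does the code in the diff.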
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: legion-apollo
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.5.2
 platform: ruby
 authors:
 - Esity
@@ -86,6 +86,8 @@ files:
 - lib/legion/apollo/local/migrations/001_create_local_knowledge.rb
 - lib/legion/apollo/local/migrations/002_create_graph_tables.rb
 - lib/legion/apollo/local/migrations/003_harden_graph_relationships.rb
+- lib/legion/apollo/local/migrations/004_add_versioning_tiers_inference.rb
+- lib/legion/apollo/local/migrations/005_add_raw_content_temporal_windows.rb
 - lib/legion/apollo/messages/access_boost.rb
 - lib/legion/apollo/messages/ingest.rb
 - lib/legion/apollo/messages/query.rb