legion-apollo 0.4.0 → 0.5.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 7468f022e582731ace40da083285569d57c98e38ef375e5745f43d93e9c9d171
- data.tar.gz: 3d2bb59a17a6060e690c7321fc914d2dc2e2bdd7bb1ec108917bf86fc7f26c72
+ metadata.gz: 9320cef3392467f9819e47c9cacc3893f45ffab9570825cb5d6b270ff3ce6a67
+ data.tar.gz: d38343aa96c9ba0b1d634f6e4d3850e22c899fe81c13550dce19809cec290a94
  SHA512:
- metadata.gz: a47cb6e18c47432ec0f3a15872f52dc193d7253fda94833da1c63d6046ad39bba8894f10dacc1c3472d7ef538157aad39cf1cf9894061623a9f1a162f69a5c16
- data.tar.gz: 3ca7dc35c483089a4b2c6bb0af57b524afe3a3cce571b893a14a4aa01f74fbb01b51b5eb09040389db09a6b5f2516d0bd6d97fc499bf8c912aadc6c3b622a587
+ metadata.gz: 2dcaf3fc8fbf087b8e59c079e71921aebf8c53620e32d9cd8939a00981474747b78e978412d28a5c3ff4f204df01b9538f23a81c82a76a0d94afab8080219e46
+ data.tar.gz: 8d078c7e1c727bd23b08bfdb09af0abe7581296132d715b7ed7c4c5205d4645c004f1f980af6546239961c6649e9433e66a725341e2275e3cdb2e1de7f1687a3
data/CHANGELOG.md CHANGED
@@ -1,5 +1,35 @@
  # Changelog

+ ## [0.5.2] - 2026-04-27
+
+ ### Added
+ - Store `raw_content` alongside indexed `content` in Apollo Local so callers can preserve verbatim source text separately from retrieval text (#25, #26)
+ - Add `valid_from`/`valid_to` temporal windows and `as_of:` query filtering for local knowledge entries (#27)
+
+ ### Fixed
+ - Sanitize Apollo ingest and query text by scrubbing invalid UTF-8 and removing null bytes before routing to local or global backends (#29)
+
+ ## [0.5.1] - 2026-04-27
+
+ ### Fixed
+ - Guard Apollo Local tag queries and promotion against nil, shutdown, or unavailable local DB connections before SQL and Ruby fallback paths (#30)
+
+ ## [0.5.0] - 2026-04-18
+
+ ### Added
+ - Migration 004: versioning, tiers, inference, expiry metadata, and source linkage columns on `local_knowledge`; `local_source_links` table (#6-#11)
+ - Inference tagging: `is_inference` flag on ingest, `include_inferences:` filter on query, `INITIAL_INFERENCE_CONFIDENCE` (0.35) for LLM-derived entries (#9)
+ - Temporal expiry metadata: `forget_reason` and custom `expires_at` on ingest (#8)
+ - Versioned knowledge: `parent_knowledge_id`/`is_latest`/`supersession_type` on ingest, automatic parent supersession, `version_chain` traversal, `include_history:` query filter (#7)
+ - L0/L1/L2 tiered retrieval: `tier:` parameter on query with summary projection and truncation fallback (#6)
+ - Source-to-fact linkage: `source_uri`/`source_hash`/`relevance_score`/`extraction_method` on ingest, `source_links_for` query method, `local_source_links` table (#10)
+ - `SUPERSEDES` relation type in `Local::Graph` (#11)
+ - Versioning and expiry settings defaults
+
+ ### Fixed
+ - FTS5 search crashes on punctuation (`.`, `:`, `-`, `+`, etc.) by tokenizing input into quoted alphanumeric terms with implicit AND semantics; ILIKE fallback now escapes `%` and `_` wildcards (#22)
+ - Apollo query returns HTTP 500 on non-Postgres backends: `direct_query` exceptions normalized to `:backend_query_failed` symbol, `apollo_status_code` maps known unavailability symbols to 503 (#23)
+
  ## [0.4.0] - 2026-04-02

  ### Changed
data/README.md CHANGED
@@ -1,10 +1,10 @@
  # legion-apollo

- Apollo client library for the LegionIO framework.
+ Apollo is the LegionIO knowledge client. It gives extensions one API for writing, retrieving, and merging knowledge across the global Apollo service and the node-local SQLite store.

- **Version**: 0.3.2
+ **Version**: 0.5.2

- Provides `query`, `ingest`, and `retrieve` with smart routing: co-located lex-apollo service, RabbitMQ transport, or graceful failure. Supports a node-local SQLite knowledge store (`Apollo::Local`) that mirrors the same API without requiring any remote infrastructure.
+ `legion-apollo` provides `query`, `ingest`, and `retrieve` with smart routing: co-located `lex-apollo`, RabbitMQ transport, node-local SQLite, or graceful failure. `Apollo::Local` mirrors the same public API for offline and low-latency retrieval without requiring remote infrastructure.

  ## Usage

@@ -21,6 +21,24 @@ results = Legion::Apollo.query(text: 'local note', scope: :local)

  # Query both and merge (deduped by content hash, ranked by confidence)
  results = Legion::Apollo.query(text: 'ruby', scope: :all)
+
+ # Preserve verbatim source text separately from indexed retrieval content
+ Legion::Apollo.ingest(
+   content: 'Summarized policy note for search',
+   raw_content: 'Exact source text from the original record',
+   tags: %w[policy source],
+   scope: :local
+ )
+
+ # Query the local store as it was valid at a point in time
+ Legion::Apollo.ingest(
+   content: 'Policy version active in Q2',
+   tags: %w[policy],
+   valid_from: '2026-04-01T00:00:00.000Z',
+   valid_to: '2026-06-30T23:59:59.999Z',
+   scope: :local
+ )
+ results = Legion::Apollo.query(text: 'policy', scope: :local, as_of: '2026-05-01T00:00:00.000Z')
  ```

  ## Scopes
@@ -37,9 +55,12 @@ results = Legion::Apollo.query(text: 'ruby', scope: :all)

  Features:
  - Content-hash dedup (MD5 of normalized content)
+ - `raw_content` preservation for verbatim source text
+ - `valid_from` / `valid_to` temporal windows with `as_of:` query filtering
  - Optional LLM embeddings (1024-dim) with cosine rerank when `Legion::LLM.can_embed?`
  - TTL expiry (default 5-year retention)
  - FTS5 full-text search with `ILIKE` fallback
+ - Null-byte removal and invalid UTF-8 scrubbing before persistence or backend routing

  ## Configuration

@@ -17,6 +17,7 @@ module Legion
  WRITE_GATE_THRESHOLD = 0.3
  HIGH_CONFIDENCE = 0.8
  ARCHIVE_THRESHOLD = 0.1
+ INITIAL_INFERENCE_CONFIDENCE = 0.35

  STATUSES = %i[pending confirmed disputed deprecated archived].freeze

@@ -11,7 +11,7 @@ module Legion
  # Relationships are directional typed edges between two entities.
  # Graph traversal expands one frontier batch per depth to avoid per-node queries.
  module Graph # rubocop:disable Metrics/ModuleLength
-   VALID_RELATION_TYPES = %w[AFFECTS OWNED_BY DEPENDS_ON RELATED_TO].freeze
+   VALID_RELATION_TYPES = %w[AFFECTS OWNED_BY DEPENDS_ON RELATED_TO SUPERSEDES].freeze

    class << self # rubocop:disable Metrics/ClassLength
      include Legion::Logging::Helper
@@ -0,0 +1,53 @@
+ # frozen_string_literal: true
+
+ Sequel.migration do # rubocop:disable Metrics/BlockLength
+   up do # rubocop:disable Metrics/BlockLength
+     alter_table(:local_knowledge) do
+       add_column :is_inference, :boolean, default: false, null: false
+       add_column :parent_knowledge_id, Integer, null: true
+       add_column :is_latest, :boolean, default: true, null: false
+       add_column :supersession_type, String, size: 32, null: true
+       add_column :forget_reason, String, size: 128, null: true
+       add_column :summary_l0, String, size: 500, null: true
+       add_column :summary_l1, :text, null: true
+       add_column :knowledge_tier, String, size: 4, null: false, default: 'L2'
+       add_column :l0_generated_at, String, null: true
+       add_column :l1_generated_at, String, null: true
+
+       add_index :is_latest, name: :idx_local_knowledge_is_latest
+       add_index :is_inference, name: :idx_local_knowledge_is_inference
+       add_index :knowledge_tier, name: :idx_local_knowledge_tier
+       add_index :parent_knowledge_id, name: :idx_local_knowledge_parent
+     end
+
+     create_table(:local_source_links) do
+       primary_key :id
+       Integer :entry_id, null: false
+       String :source_uri, text: true
+       String :source_hash, size: 64
+       Float :relevance_score, default: 1.0
+       String :extraction_method, size: 64
+       String :created_at, null: false
+
+       index :entry_id, name: :idx_source_links_entry
+       index :source_hash, name: :idx_source_links_hash
+     end
+   end
+
+   down do
+     drop_table(:local_source_links) if table_exists?(:local_source_links)
+
+     alter_table(:local_knowledge) do
+       drop_column :is_inference
+       drop_column :parent_knowledge_id
+       drop_column :is_latest
+       drop_column :supersession_type
+       drop_column :forget_reason
+       drop_column :summary_l0
+       drop_column :summary_l1
+       drop_column :knowledge_tier
+       drop_column :l0_generated_at
+       drop_column :l1_generated_at
+     end
+   end
+ end
@@ -0,0 +1,22 @@
+ # frozen_string_literal: true
+
+ Sequel.migration do
+   up do
+     alter_table(:local_knowledge) do
+       add_column :raw_content, :text, null: true
+       add_column :valid_from, String, null: true
+       add_column :valid_to, String, null: true
+
+       add_index :valid_from, name: :idx_local_knowledge_valid_from
+       add_index :valid_to, name: :idx_local_knowledge_valid_to
+     end
+   end
+
+   down do
+     alter_table(:local_knowledge) do
+       drop_column :raw_content
+       drop_column :valid_from
+       drop_column :valid_to
+     end
+   end
+ end
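
Note on the migration above: `valid_from` and `valid_to` are stored as strings, so the `as_of:` filter only orders values correctly when every timestamp shares one fixed-width UTC format. The sketch below illustrates that idea, assuming the `%Y-%m-%dT%H:%M:%S.%LZ` format that appears elsewhere in this diff. `normalize_timestamp` and `valid_at?` are illustrative names, not part of the gem's API.

```ruby
require 'time'

ISO_UTC_MS = '%Y-%m-%dT%H:%M:%S.%LZ'

# Hypothetical helper: coerce a Time or a parseable string to fixed-width UTC text.
def normalize_timestamp(value)
  return nil if value.nil?

  time = value.respond_to?(:getutc) ? value : Time.parse(value.to_s)
  time.getutc.strftime(ISO_UTC_MS)
end

# Hypothetical helper: a row is valid when as_of falls inside its inclusive
# window. A nil bound is treated as open-ended, and a nil as_of disables the filter.
def valid_at?(row, as_of)
  return true if as_of.nil?

  from = row[:valid_from]
  to = row[:valid_to]
  (from.nil? || from <= as_of) && (to.nil? || to >= as_of)
end

row = {
  valid_from: normalize_timestamp('2026-04-01T00:00:00.000Z'),
  valid_to: normalize_timestamp('2026-06-30T23:59:59.999Z')
}

in_window = valid_at?(row, normalize_timestamp('2026-05-01T00:00:00Z'))
after_window = valid_at?(row, normalize_timestamp('2026-07-01T00:00:00Z'))
offset_input = normalize_timestamp('2026-05-01T02:00:00+02:00')
```

Because every value is rendered in UTC with the same width, plain string comparison matches chronological order. An input carrying a `+02:00` offset is converted to UTC before it is compared.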
@@ -5,6 +5,7 @@ require 'legion/logging'
  require 'socket'
  require 'time'
  require_relative 'local/graph'
+ require_relative 'helpers/confidence'
  require_relative 'helpers/similarity'
  require_relative 'helpers/tag_normalizer'

@@ -90,7 +91,7 @@ module Legion
90
91
  { success: false, error: e.message }
91
92
  end
92
93
 
93
- def query(text:, limit: nil, min_confidence: nil, tags: nil, **) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize
94
+ def query(text:, limit: nil, min_confidence: nil, tags: nil, **opts) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize,Metrics/CyclomaticComplexity
94
95
  return not_started_error unless started?
95
96
 
96
97
  text = normalize_text_input(text)
@@ -98,19 +99,27 @@ module Legion
98
99
  limit ||= local_setting(:default_limit, 5)
99
100
  min_confidence ||= local_setting(:min_confidence, 0.3)
100
101
  multiplier = local_setting(:fts_candidate_multiplier, 3)
102
+ as_of = normalize_temporal_value(opts[:as_of])
101
103
  log.info do
102
104
  "Apollo::Local query executing text_length=#{text.to_s.length} " \
103
105
  "limit=#{limit} min_confidence=#{min_confidence} tag_count=#{Array(tags).size}"
104
106
  end
105
107
  log.debug { "Apollo::Local query limit=#{limit} min_confidence=#{min_confidence} tags=#{Array(tags).size}" }
106
108
 
107
- candidates = fts_search(text, limit: limit * multiplier)
108
- candidates = filter_candidates(candidates, min_confidence: min_confidence, tags: tags)
109
+ candidates = fts_search(text, limit: limit * multiplier, as_of: as_of)
110
+ include_inferences = opts.fetch(:include_inferences, true)
111
+ include_history = opts.fetch(:include_history, false)
112
+ candidates = filter_candidates(candidates, min_confidence: min_confidence, tags: tags,
113
+ options: { include_inferences: include_inferences,
114
+ include_history: include_history, as_of: as_of })
109
115
  candidates = cosine_rerank(text, candidates) if can_rerank?
110
116
  results = candidates.first(limit)
111
117
 
118
+ tier = opts[:tier]
119
+ results = results.map { |r| project_tier(r, tier) } if tier
120
+
112
121
  log.info { "Apollo::Local query completed count=#{results.size}" }
113
- { success: true, results: results, count: results.size, mode: :local }
122
+ { success: true, results: results, count: results.size, mode: :local, tier: tier }
114
123
  rescue StandardError => e
115
124
  handle_exception(
116
125
  e,
@@ -151,10 +160,11 @@ module Legion
151
160
  end
152
161
 
153
162
  def query_by_tags(tags:, limit: 50) # rubocop:disable Metrics/MethodLength
154
- return { success: false, error: :not_started } unless started?
155
-
163
+ connection = local_db_connection
156
164
  tags = normalize_tags_input(tags)
157
- results = query_by_tags_via_sql(tags: tags, limit: limit)
165
+ return { success: false, error: :not_started } unless local_db_usable?(connection)
166
+
167
+ results = query_by_tags_via_sql(connection, tags: tags, limit: limit)
158
168
 
159
169
  log.info { "Apollo::Local query_by_tags completed tag_count=#{tags.size} count=#{results.size}" }
160
170
  { success: true, results: results, count: results.size }
@@ -170,11 +180,13 @@ module Legion
170
180
  end
171
181
 
172
182
  def promote_to_global(tags:, min_confidence: 0.6) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize,Metrics/CyclomaticComplexity,Metrics/PerceivedComplexity
173
- return { success: false, error: :not_started } unless started?
183
+ return { success: false, error: :not_started } unless local_db_usable?(local_db_connection)
174
184
 
175
185
  tags = normalize_tags_input(tags)
176
186
  entries = query_by_tags(tags: tags)
177
- unless entries[:success] && entries[:results]&.any?
187
+ return entries unless entries[:success]
188
+
189
+ unless entries[:results]&.any?
178
190
  log.info { "Apollo::Local promote_to_global skipped tag_count=#{tags.size} reason=no_entries" }
179
191
  return { success: true, promoted: 0 }
180
192
  end
@@ -192,6 +204,7 @@ module Legion
192
204
  end
193
205
  result = Legion::Apollo.ingest(
194
206
  content: entry[:content],
207
+ raw_content: entry[:raw_content] || entry[:content],
195
208
  tags: entry_tags + ['promoted_from_local'],
196
209
  source_channel: 'local_promotion',
197
210
  submitted_by: "node:#{hostname}",
@@ -223,6 +236,41 @@ module Legion
223
236
  { success: false, error: e.message }
224
237
  end
225
238
 
239
+ def version_chain(entry_id:, max_depth: 50) # rubocop:disable Metrics/MethodLength
240
+ return not_started_error unless started?
241
+
242
+ chain = []
243
+ current_id = entry_id
244
+ seen = Set.new
245
+
246
+ max_depth.times do
247
+ break unless current_id
248
+ break if seen.include?(current_id)
249
+
250
+ seen.add(current_id)
251
+ row = db[:local_knowledge].where(id: current_id).first
252
+ break unless row
253
+
254
+ chain << row
255
+ current_id = row[:parent_knowledge_id]
256
+ end
257
+
258
+ { success: true, chain: chain, count: chain.size }
259
+ rescue StandardError => e
260
+ handle_exception(e, level: :error, operation: 'apollo.local.version_chain', entry_id: entry_id)
261
+ { success: false, error: e.message }
262
+ end
263
+
264
+ def source_links_for(entry_id:)
265
+ return not_started_error unless started?
266
+
267
+ links = db[:local_source_links].where(entry_id: entry_id).all
268
+ { success: true, links: links, count: links.size }
269
+ rescue StandardError => e
270
+ handle_exception(e, level: :error, operation: 'apollo.local.source_links_for', entry_id: entry_id)
271
+ { success: false, error: e.message }
272
+ end
273
+
226
274
  private
227
275
 
228
276
  def self_knowledge_files
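
Note on `version_chain` in the hunk above: the traversal follows `parent_knowledge_id` from the given entry back through its ancestors and stops on a missing row, a repeated id, or the depth cap. A minimal sketch of the same loop follows, using a plain Hash in place of the `local_knowledge` table. `ROWS` and `walk_version_chain` are hypothetical names introduced only for illustration.

```ruby
require 'set'

# Hypothetical in-memory stand-in for the local_knowledge table, keyed by id.
ROWS = {
  1 => { id: 1, content: 'v1', parent_knowledge_id: nil },
  2 => { id: 2, content: 'v2', parent_knowledge_id: 1 },
  3 => { id: 3, content: 'v3', parent_knowledge_id: 2 },
  # Deliberately cyclic pair to exercise the cycle guard.
  10 => { id: 10, content: 'a', parent_knowledge_id: 11 },
  11 => { id: 11, content: 'b', parent_knowledge_id: 10 }
}.freeze

# Walk parent links from the starting entry toward the oldest ancestor.
def walk_version_chain(rows, entry_id, max_depth: 50)
  chain = []
  seen = Set.new
  current_id = entry_id

  max_depth.times do
    break if current_id.nil? || seen.include?(current_id)

    seen.add(current_id)
    row = rows[current_id]
    break unless row

    chain << row
    current_id = row[:parent_knowledge_id]
  end

  chain
end

linear_ids = walk_version_chain(ROWS, 3).map { |r| r[:id] }
cyclic_ids = walk_version_chain(ROWS, 10).map { |r| r[:id] }
capped_ids = walk_version_chain(ROWS, 3, max_depth: 2).map { |r| r[:id] }
```

The `seen` set keeps a corrupted parent link from looping forever, and `max_depth` bounds the work even for a very long but valid chain.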
@@ -294,6 +342,26 @@ module Legion
294
342
  Legion::Data::Local.connection
295
343
  end
296
344
 
345
+ def local_db_connection
346
+ return nil unless started? && data_local_available?
347
+
348
+ db
349
+ rescue StandardError => e
350
+ handle_exception(e, level: :debug, operation: 'apollo.local.local_db_connection')
351
+ nil
352
+ end
353
+
354
+ def local_db_usable?(connection)
355
+ return false unless started? && connection
356
+ return false if connection.respond_to?(:closed?) && connection.closed?
357
+
358
+ connection.test_connection if connection.respond_to?(:test_connection)
359
+ true
360
+ rescue StandardError => e
361
+ handle_exception(e, level: :debug, operation: 'apollo.local.local_db_usable')
362
+ false
363
+ end
364
+
297
365
  def content_hash(content)
298
366
  normalized = content.to_s.strip.downcase.gsub(/\s+/, ' ')
299
367
  Digest::MD5.hexdigest(normalized)
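
Note on `content_hash` above: the dedup key is an MD5 digest of the trimmed, lowercased, whitespace-collapsed content. The short sketch below repeats that normalization outside the gem to show which inputs collide. It is a standalone illustration, not the module's actual private method.

```ruby
require 'digest'

# Same normalization as shown in the diff: trim, lowercase, collapse whitespace runs.
def content_hash(content)
  normalized = content.to_s.strip.downcase.gsub(/\s+/, ' ')
  Digest::MD5.hexdigest(normalized)
end

a = content_hash("  Ruby   is\tfun\n")
b = content_hash('ruby is fun')
c = content_hash('ruby is fun!')

same = (a == b)       # differs only in case and whitespace
different = (a != c)  # punctuation is preserved, so the hash changes
```

As far as the hunks in this diff show, the hash is computed from `content` only, so two ingests with identical `content` but different `raw_content` would appear to deduplicate to the same entry.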
@@ -353,9 +421,12 @@ module Legion
353
421
 
354
422
  result = ingest(
355
423
  content: entry[:content],
424
+ raw_content: entry[:raw_content] || entry[:content],
356
425
  tags: clean_tags,
357
426
  confidence: ((entry[:confidence] || 0.5) * 0.9).round(10),
358
- source_channel: 'global_hydration'
427
+ source_channel: 'global_hydration',
428
+ valid_from: entry[:valid_from],
429
+ valid_to: entry[:valid_to]
359
430
  )
360
431
  hydrated += 1 if result[:success]
361
432
  end
@@ -365,6 +436,8 @@ module Legion
365
436
  end
366
437
 
367
438
  def ingest_without_lock(content:, tags:, **opts) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize
439
+ content = normalize_text_input(content)
440
+ raw_content = normalize_text_input(opts.key?(:raw_content) ? opts[:raw_content] : content)
368
441
  hash = content_hash(content)
369
442
  return deduplicated_ingest(hash) if duplicate?(hash)
370
443
 
@@ -374,8 +447,11 @@ module Legion
374
447
  end
375
448
  log.debug { "Apollo::Local ingest hash=#{hash} tags=#{Array(tags).size} source_channel=#{opts[:source_channel]}" }
376
449
 
377
- row = build_ingest_row(content: content, hash: hash, tags: tags, **opts)
378
- id = persist_ingest_row(row)
450
+ metadata = opts.dup
451
+ metadata.delete(:raw_content)
452
+ row = build_ingest_row(content: content, raw_content: raw_content, hash: hash, tags: tags, **metadata)
453
+ id = persist_ingest_row(row, metadata)
454
+ mark_parent_superseded(metadata[:parent_knowledge_id]) if metadata[:parent_knowledge_id]
379
455
 
380
456
  log.info { "Apollo::Local ingest stored id=#{id} hash=#{hash}" }
381
457
  { success: true, mode: :local, id: id }
@@ -385,31 +461,94 @@ module Legion
385
461
  deduplicated_ingest(hash)
386
462
  end
387
463
 
388
- def build_ingest_row(content:, hash:, tags:, **opts)
464
+ def build_ingest_row(content:, raw_content:, hash:, tags:, **opts) # rubocop:disable Metrics/MethodLength
465
+ is_inference = opts[:is_inference] == true
466
+ default_confidence = is_inference ? Legion::Apollo::Helpers::Confidence::INITIAL_INFERENCE_CONFIDENCE : 1.0
467
+ ingest_metadata_columns(
468
+ content: content,
469
+ raw_content: raw_content,
470
+ hash: hash,
471
+ tags: tags,
472
+ opts: opts,
473
+ is_inference: is_inference,
474
+ default_confidence: default_confidence
475
+ ).merge(embedding_columns(content, opts)).merge(timestamp_columns)
476
+ end
477
+
478
+ def ingest_metadata_columns(context)
479
+ ingest_base_columns(context)
480
+ .merge(ingest_lineage_columns(context[:opts]))
481
+ .merge(ingest_temporal_columns(context[:opts]))
482
+ end
483
+
484
+ def ingest_base_columns(context)
485
+ opts = context[:opts]
389
486
  {
390
- content: content,
391
- content_hash: hash,
392
- tags: serialized_tags(tags),
393
- source_channel: opts[:source_channel],
394
- source_agent: opts[:source_agent],
395
- submitted_by: opts[:submitted_by],
396
- confidence: opts[:confidence] || 1.0
397
- }.merge(embedding_columns(content)).merge(timestamp_columns)
487
+ content: context[:content],
488
+ raw_content: context[:raw_content],
489
+ content_hash: context[:hash],
490
+ tags: serialized_tags(context[:tags]),
491
+ confidence: opts[:confidence] || context[:default_confidence],
492
+ is_inference: context[:is_inference]
493
+ }.merge(ingest_source_columns(opts))
494
+ end
495
+
496
+ def ingest_source_columns(opts)
497
+ { source_channel: opts[:source_channel], source_agent: opts[:source_agent],
498
+ submitted_by: opts[:submitted_by] }
499
+ end
500
+
501
+ def ingest_lineage_columns(opts)
502
+ {
503
+ forget_reason: opts[:forget_reason],
504
+ parent_knowledge_id: opts[:parent_knowledge_id],
505
+ supersession_type: opts[:supersession_type]
506
+ }
398
507
  end
399
508
 
400
- def persist_ingest_row(row)
509
+ def ingest_temporal_columns(opts)
510
+ {
511
+ valid_from: normalize_temporal_value(opts[:valid_from]),
512
+ valid_to: normalize_temporal_value(opts[:valid_to])
513
+ }
514
+ end
515
+
516
+ def persist_ingest_row(row, opts = {})
401
517
  db.transaction do
402
518
  id = db[:local_knowledge].insert(row)
403
519
  sync_fts!(id, row[:content], row[:tags])
520
+ create_source_link(id, opts) if opts[:source_uri]
404
521
  id
405
522
  end
406
523
  end
407
524
 
525
+ def create_source_link(entry_id, opts)
526
+ db[:local_source_links].insert(
527
+ entry_id: entry_id,
528
+ source_uri: opts[:source_uri],
529
+ source_hash: opts[:source_hash],
530
+ relevance_score: opts[:relevance_score] || 1.0,
531
+ extraction_method: opts[:extraction_method],
532
+ created_at: Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ')
533
+ )
534
+ rescue StandardError => e
535
+ handle_exception(e, level: :warn, operation: 'apollo.local.create_source_link', entry_id: entry_id)
536
+ end
537
+
408
538
  def deduplicated_ingest(hash)
409
539
  log.info { "Apollo::Local ingest deduplicated hash=#{hash}" }
410
540
  { success: true, mode: :deduplicated }
411
541
  end
412
542
 
543
+ def mark_parent_superseded(parent_id)
544
+ return unless parent_id
545
+
546
+ db[:local_knowledge].where(id: parent_id).update(is_latest: false)
547
+ log.info { "Apollo::Local marked entry id=#{parent_id} as superseded" }
548
+ rescue StandardError => e
549
+ handle_exception(e, level: :warn, operation: 'apollo.local.mark_parent_superseded', parent_id: parent_id)
550
+ end
551
+
413
552
  def generate_embedding(content) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize,Metrics/CyclomaticComplexity,Metrics/PerceivedComplexity
414
553
  unless defined?(Legion::LLM) && Legion::LLM.respond_to?(:can_embed?) && Legion::LLM.can_embed?
415
554
  log.debug 'Apollo::Local embedding skipped because embeddings are unavailable'
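
Note on the confidence defaulting in `build_ingest_row` and `ingest_base_columns` above: an explicit `confidence` always wins; otherwise inference entries start at `INITIAL_INFERENCE_CONFIDENCE` and everything else at 1.0. The sketch below condenses that rule into one hypothetical helper, `resolve_confidence`, which does not exist in the gem.

```ruby
INITIAL_INFERENCE_CONFIDENCE = 0.35 # value taken from the diff

# Hypothetical helper condensing the defaulting rule shown in the hunk.
def resolve_confidence(opts)
  # Only the literal `true` marks an inference; strings and 1 do not.
  is_inference = opts[:is_inference] == true
  default = is_inference ? INITIAL_INFERENCE_CONFIDENCE : 1.0
  { is_inference: is_inference, confidence: opts[:confidence] || default }
end

plain = resolve_confidence({})
inferred = resolve_confidence(is_inference: true)
explicit = resolve_confidence(is_inference: true, confidence: 0.9)
string_flag = resolve_confidence(is_inference: 'true')
```

With the local query default of `min_confidence` 0.3 shown earlier in this diff, an inference stored at 0.35 still clears the filter unless the caller raises the threshold or passes `include_inferences: false`.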
@@ -447,13 +586,13 @@ module Legion
447
586
  log.debug { "Apollo::Local FTS synced id=#{id}" }
448
587
  end
449
588
 
450
- def embedding_columns(content)
589
+ def embedding_columns(content, opts = {})
451
590
  embedding, embedded_at = generate_embedding(content)
452
591
 
453
592
  {
454
593
  embedding: embedding ? Legion::JSON.dump(embedding) : nil,
455
594
  embedded_at: embedded_at,
456
- expires_at: compute_expires_at
595
+ expires_at: opts[:expires_at] || compute_expires_at
457
596
  }
458
597
  end
459
598
 
@@ -466,33 +605,45 @@ module Legion
466
605
  Legion::JSON.dump(normalize_tags_input(tags))
467
606
  end
468
607
 
469
- def fts_search(text, limit:) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize
470
- if text.to_s.strip.empty?
471
- return db[:local_knowledge]
472
- .where(Sequel.lit('expires_at > ?', Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ')))
473
- .limit(limit)
474
- .all
475
- end
476
-
477
- escaped = text.to_s.gsub('"', '""')
608
+ def fts_search(text, limit:, as_of: nil) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize
478
609
  now = Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ')
610
+ return active_knowledge_dataset(now: now, as_of: as_of).limit(limit).all if text.to_s.strip.empty?
611
+
612
+ tokens = text.to_s.scan(/[\p{L}\p{N}_]+/)
613
+ return ilike_search(text, now: now, limit: limit, as_of: as_of) if tokens.empty?
614
+
615
+ escaped = tokens.map { |t| %("#{t}") }.join(' ')
616
+ temporal_sql, temporal_params = temporal_window_sql(as_of, table_alias: 'lk')
479
617
  db.fetch(
480
618
  'SELECT lk.* FROM local_knowledge lk ' \
481
619
  'INNER JOIN local_knowledge_fts fts ON lk.id = fts.rowid ' \
482
- 'WHERE local_knowledge_fts MATCH ? AND lk.expires_at > ? ORDER BY fts.rank LIMIT ?',
483
- escaped, now, limit
620
+ "WHERE local_knowledge_fts MATCH ? AND lk.expires_at > ?#{temporal_sql} " \
621
+ 'ORDER BY fts.rank LIMIT ?',
622
+ escaped, now, *temporal_params, limit
484
623
  ).all
485
624
  rescue StandardError => e
486
625
  handle_exception(e, level: :debug, operation: 'apollo.local.fts_search', limit: limit, fallback: :ilike)
487
- db[:local_knowledge]
488
- .where(Sequel.lit('expires_at > ?', Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ')))
489
- .where(Sequel.ilike(:content, "%#{text}%"))
626
+ ilike_search(text, now: Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ'), limit: limit, as_of: as_of)
627
+ end
628
+
629
+ def ilike_search(text, now:, limit:, as_of: nil)
630
+ safe_text = text.to_s.gsub('\\', '\\\\\\\\').gsub('%', '\%').gsub('_', '\_')
631
+ active_knowledge_dataset(now: now, as_of: as_of)
632
+ .where(Sequel.lit("content LIKE ? ESCAPE '\\' COLLATE NOCASE", "%#{safe_text}%"))
490
633
  .limit(limit)
491
634
  .all
492
635
  end
493
636
 
494
- def filter_candidates(candidates, min_confidence:, tags:)
637
+ def filter_candidates(candidates, min_confidence:, tags:, options: {}) # rubocop:disable Metrics/CyclomaticComplexity,Metrics/PerceivedComplexity,Metrics/MethodLength,Metrics/AbcSize
638
+ include_inferences = options.fetch(:include_inferences, true)
639
+ include_history = options.fetch(:include_history, false)
640
+ as_of = options[:as_of]
495
641
  candidates = candidates.select { |c| (c[:confidence] || 0) >= min_confidence }
642
+ candidates = candidates.select { |c| temporally_valid?(c, as_of) }
643
+ candidates = candidates.reject { |c| [1, true].include?(c[:is_inference]) } unless include_inferences
644
+ unless include_history
645
+ candidates = candidates.select { |c| c[:is_latest].nil? || c[:is_latest] == 1 || c[:is_latest] == true }
646
+ end
496
647
  if tags && !tags.empty?
497
648
  tag_set = Array(tags).map(&:to_s)
498
649
  candidates = candidates.select do |c|
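
Note on the rewritten `fts_search` and `ilike_search` above: free text is reduced to letter, digit, and underscore runs, each wrapped in double quotes, so punctuation never reaches the FTS5 query parser. The fallback path escapes `LIKE` wildcards. The sketch below reproduces both transformations as standalone functions. `fts_match_expression` and `escape_like` are illustrative names, and `escape_like` uses a block form of `gsub` rather than the replacement strings in the diff.

```ruby
# Build an FTS5 MATCH string: quoted tokens separated by spaces are ANDed.
def fts_match_expression(text)
  tokens = text.to_s.scan(/[\p{L}\p{N}_]+/)
  return nil if tokens.empty? # caller would fall back to a LIKE search

  tokens.map { |t| %("#{t}") }.join(' ')
end

# Escape backslash, % and _ so they match literally under ESCAPE '\'.
def escape_like(text)
  text.to_s.gsub(/[\\%_]/) { |ch| "\\#{ch}" }
end

expr = fts_match_expression('lex-apollo: v0.5.2 + "notes"')
no_tokens = fts_match_expression('...:-+')
like = escape_like('100%_done')
```

One consequence of this tokenization is that a version string such as `v0.5.2` is searched as three separate terms (`v0`, `5`, `2`) rather than as one phrase.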
@@ -503,6 +654,36 @@ module Legion
503
654
  candidates
504
655
  end
505
656
 
657
+ def active_knowledge_dataset(now:, as_of: nil)
658
+ apply_temporal_window(db[:local_knowledge].where(Sequel.lit('expires_at > ?', now)), as_of)
659
+ end
660
+
661
+ def apply_temporal_window(dataset, as_of)
662
+ return dataset if as_of.to_s.empty?
663
+
664
+ dataset.where(
665
+ Sequel.lit('(valid_from IS NULL OR valid_from <= ?) AND (valid_to IS NULL OR valid_to >= ?)', as_of, as_of)
666
+ )
667
+ end
668
+
669
+ def temporal_window_sql(as_of, table_alias:)
670
+ return ['', []] if as_of.to_s.empty?
671
+
672
+ [
673
+ " AND (#{table_alias}.valid_from IS NULL OR #{table_alias}.valid_from <= ?) " \
674
+ "AND (#{table_alias}.valid_to IS NULL OR #{table_alias}.valid_to >= ?)",
675
+ [as_of, as_of]
676
+ ]
677
+ end
678
+
679
+ def temporally_valid?(row, as_of)
680
+ return true if as_of.to_s.empty?
681
+
682
+ valid_from = row[:valid_from]
683
+ valid_to = row[:valid_to]
684
+ (valid_from.nil? || valid_from <= as_of) && (valid_to.nil? || valid_to >= as_of)
685
+ end
686
+
506
687
  def parse_tags(tags_json)
507
688
  return [] if tags_json.nil? || tags_json.empty?
508
689
 
@@ -571,6 +752,17 @@ module Legion
571
752
  value.to_s
572
753
  end
573
754
 
755
+ def normalize_temporal_value(value)
756
+ return nil if value.nil?
757
+
758
+ text = value.respond_to?(:utc) ? value.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ') : value.to_s.strip
759
+ return nil if text.empty?
760
+
761
+ Time.parse(text).utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ')
762
+ rescue StandardError
763
+ text
764
+ end
765
+
574
766
  def normalize_tags_input(tags)
575
767
  Legion::Apollo::Helpers::TagNormalizer.normalize(Array(tags)).first(max_tags_limit)
576
768
  rescue StandardError => e
@@ -589,9 +781,9 @@ module Legion
589
781
  Legion::Apollo::Helpers::TagNormalizer::MAX_TAGS
590
782
  end
591
783
 
592
- def query_by_tags_via_sql(tags:, limit:) # rubocop:disable Metrics/AbcSize,Metrics/MethodLength
784
+ def query_by_tags_via_sql(connection, tags:, limit:) # rubocop:disable Metrics/AbcSize,Metrics/MethodLength
593
785
  now = Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ')
594
- dataset = db[:local_knowledge].where(Sequel.lit('expires_at > ?', now))
786
+ dataset = connection[:local_knowledge].where(Sequel.lit('expires_at > ?', now))
595
787
 
596
788
  Array(tags).map(&:to_s).each do |tag|
597
789
  dataset = dataset.where(
@@ -611,11 +803,15 @@ module Legion
611
803
  tag_count: Array(tags).size,
612
804
  limit: limit
613
805
  )
614
- query_by_tags_via_ruby(tags: tags, limit: limit)
806
+ raise unless local_db_usable?(connection)
807
+
808
+ query_by_tags_via_ruby(connection, tags: tags, limit: limit)
615
809
  end
616
810
 
617
- def query_by_tags_via_ruby(tags:, limit:)
618
- candidates = db[:local_knowledge]
811
+ def query_by_tags_via_ruby(connection, tags:, limit:)
812
+ raise Sequel::DatabaseConnectionError, 'local database unavailable' unless local_db_usable?(connection)
813
+
814
+ candidates = connection[:local_knowledge]
619
815
  .where(Sequel.lit('expires_at > ?', Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ')))
620
816
  .all
621
817
 
@@ -626,7 +822,7 @@ module Legion
626
822
  end
627
823
 
628
824
  def update_upsert_entry(existing, content, tags_json, opts) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize
629
- content = content.to_s
825
+ content = normalize_text_input(content)
630
826
  new_hash = content_hash(content)
631
827
  embedding, embedded_at = generate_embedding(content)
632
828
  now = Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ')
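
Note on `normalize_text_input`, which the hunk above now applies to upserted content: its body is not part of this diff. The changelog describes it as scrubbing invalid UTF-8 and removing null bytes. The sketch below shows one way to do that with core `String` methods. `sanitize_text` is a hypothetical name, the empty replacement string is an assumption, and the input is assumed to be tagged as UTF-8.

```ruby
# Hypothetical sanitizer: replace invalid byte sequences, then drop NUL bytes.
def sanitize_text(value, replacement: '')
  value.to_s.scrub(replacement).delete("\u0000")
end

dirty = "abc\xFFdef\x00ghi" # contains an invalid UTF-8 byte and a NUL
clean = sanitize_text(dirty)
marked = sanitize_text(dirty, replacement: '?')
from_nil = sanitize_text(nil)
```

Null bytes are worth removing before persistence because some backends reject them in text columns, and invalid byte sequences can raise encoding errors during later string operations.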
@@ -658,6 +854,21 @@ module Legion
658
854
  log.debug { "Apollo::Local FTS rebuilt id=#{id}" }
659
855
  end
660
856
 
857
+ def project_tier(entry, tier) # rubocop:disable Metrics/MethodLength
858
+ case tier
859
+ when :l0
860
+ entry.slice(:id, :content_hash, :confidence, :tags, :source_channel, :is_inference, :is_latest).merge(
861
+ summary: entry[:summary_l0] || entry[:content]&.slice(0, 200)
862
+ )
863
+ when :l1
864
+ entry.slice(:id, :content_hash, :confidence, :tags, :source_channel, :is_inference, :is_latest).merge(
865
+ summary: entry[:summary_l1] || entry[:content]&.slice(0, 1000)
866
+ )
867
+ else
868
+ entry
869
+ end
870
+ end
871
+
661
872
  def not_started_error
662
873
  { success: false, error: :not_started }
663
874
  end
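
Note on `project_tier` above: for `:l0` and `:l1` the entry is reduced to a fixed set of metadata keys plus a `summary`, which is the stored summary when present and otherwise a truncated prefix of `content` (200 or 1000 characters). The sketch below is a table-driven restatement of that method, intended to behave the same way. `PROJECTED_KEYS` and `SUMMARY_LIMITS` are names introduced here for illustration.

```ruby
PROJECTED_KEYS = %i[id content_hash confidence tags source_channel is_inference is_latest].freeze
SUMMARY_LIMITS = { l0: 200, l1: 1000 }.freeze

# Project an entry down to metadata plus a summary for the l0/l1 tiers.
def project_tier(entry, tier)
  limit = SUMMARY_LIMITS[tier]
  return entry unless limit # any other tier returns the full entry

  stored = entry[:"summary_#{tier}"]
  entry.slice(*PROJECTED_KEYS).merge(summary: stored || entry[:content]&.slice(0, limit))
end

entry = { id: 1, content: 'x' * 500, confidence: 0.8, summary_l1: 'stored l1', embedding: '[0.1]' }

l0 = project_tier(entry, :l0)
l1 = project_tier(entry, :l1)
full = project_tier(entry, :l2)
```

In the code shown, only the lowercase symbols `:l0` and `:l1` trigger projection. Any other value, including an uppercase `:L0`, returns the full entry unchanged.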
@@ -50,7 +50,7 @@ module Legion
50
50
  end
51
51
  end
52
52
 
53
- def self.register_query_route(app) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize,Metrics/CyclomaticComplexity
53
+ def self.register_query_route(app) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize,Metrics/CyclomaticComplexity,Metrics/PerceivedComplexity
54
54
  app.post '/api/apollo/query' do
55
55
  unless apollo_api_available?
56
56
  halt 503, json_error('apollo_unavailable', 'apollo is not available', status_code: 503)
@@ -59,21 +59,25 @@ module Legion
59
59
  body = parse_request_body
60
60
  default_limit = defined?(Legion::Settings) ? (Legion::Settings[:apollo]&.dig(:default_limit) || 5) : 5
61
61
  result = Legion::Apollo.query(
62
- text: body[:query],
63
- limit: body[:limit] || default_limit,
64
- min_confidence: body[:min_confidence],
65
- status: body[:status] || [:confirmed],
66
- tags: body[:tags],
67
- domain: body[:domain],
68
- agent_id: body[:agent_id] || 'api',
69
- scope: normalize_scope(body[:scope])
62
+ text: body[:query],
63
+ limit: body[:limit] || default_limit,
64
+ min_confidence: body[:min_confidence],
65
+ status: body[:status] || [:confirmed],
66
+ tags: body[:tags],
67
+ domain: body[:domain],
68
+ agent_id: body[:agent_id] || 'api',
69
+ scope: normalize_scope(body[:scope]),
70
+ tier: body[:tier]&.to_sym,
71
+ as_of: body[:as_of],
72
+ include_inferences: body.fetch(:include_inferences, true),
73
+ include_history: body.fetch(:include_history, false)
70
74
  )
71
75
  json_response(result, status_code: apollo_status_code(result))
72
76
  end
73
77
  end
74
78
 
75
79
  def self.register_ingest_route(app) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize,Metrics/CyclomaticComplexity,Metrics/PerceivedComplexity
76
- app.post '/api/apollo/ingest' do
80
+ app.post '/api/apollo/ingest' do # rubocop:disable Metrics/BlockLength
77
81
  unless apollo_api_available?
78
82
  halt 503, json_error('apollo_unavailable', 'apollo is not available', status_code: 503)
79
83
  end
@@ -84,15 +88,27 @@ module Legion
      effective_max_tags = [max_tags, Legion::Apollo::Helpers::TagNormalizer::MAX_TAGS].min
      tags = Legion::Apollo::Helpers::TagNormalizer.normalize(Array(body[:tags])).first(effective_max_tags)
      result = Legion::Apollo.ingest(
-       content:          body[:content],
-       content_type:     body[:content_type] || :observation,
-       tags:             tags,
-       source_agent:     body[:source_agent] || 'api',
-       source_provider:  body[:source_provider],
-       source_channel:   body[:source_channel] || 'rest_api',
-       knowledge_domain: body[:knowledge_domain],
-       context:          body[:context] || {},
-       scope:            normalize_scope(body[:scope])
+       content:             body[:content],
+       raw_content:         body[:raw_content],
+       content_type:        body[:content_type] || :observation,
+       tags:                tags,
+       source_agent:        body[:source_agent] || 'api',
+       source_provider:     body[:source_provider],
+       source_channel:      body[:source_channel] || 'rest_api',
+       knowledge_domain:    body[:knowledge_domain],
+       context:             body[:context] || {},
+       scope:               normalize_scope(body[:scope]),
+       is_inference:        body[:is_inference] == true,
+       forget_reason:       body[:forget_reason],
+       expires_at:          body[:expires_at],
+       valid_from:          body[:valid_from],
+       valid_to:            body[:valid_to],
+       parent_knowledge_id: body[:parent_knowledge_id],
+       supersession_type:   body[:supersession_type],
+       source_uri:          body[:source_uri],
+       source_hash:         body[:source_hash],
+       relevance_score:     body[:relevance_score],
+       extraction_method:   body[:extraction_method]
      )
      json_response(result, status_code: apollo_status_code(result, success_status: 201))
    end
@@ -197,13 +213,16 @@ module Legion
    :global
  end

- def apollo_status_code(result, success_status: 200)
+ def apollo_status_code(result, success_status: 200) # rubocop:disable Metrics/MethodLength
    return 202 if result[:success] && result[:mode] == :async
    return success_status if result[:success]

    case result[:error]
-   when :no_path_available, :not_started then 503
-   else 500
+   when :no_path_available, :not_started, :local_not_started,
+        :upstream_query_failed, :backend_query_failed
+     503
+   else
+     500
    end
  rescue StandardError => e
    handle_exception(e, level: :debug, operation: :apollo_status_code)
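The status mapping in this hunk can be read as a small lookup. The following is a sketch only: `status_for` and `UNAVAILABLE_ERRORS` are hypothetical names, and the `rescue` branch of the real method is left out because its return value is not visible in the diff.

```ruby
# Error symbols that the hunk maps to 503; anything else falls through to 500.
UNAVAILABLE_ERRORS = %i[
  no_path_available not_started local_not_started
  upstream_query_failed backend_query_failed
].freeze

# Hypothetical stand-in for apollo_status_code, without the rescue branch.
def status_for(result, success_status: 200)
  return 202 if result[:success] && result[:mode] == :async
  return success_status if result[:success]

  UNAVAILABLE_ERRORS.include?(result[:error]) ? 503 : 500
end

status_for({ success: true }, success_status: 201)              # ingest success
status_for({ success: false, error: :backend_query_failed })    # backend down
status_for({ success: false, error: 'timeout; refused' })       # free-form string error
```

Because the lookup matches symbols, an error carried as a free-form string (for example several messages joined together) lands in the 500 branch in this sketch.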
@@ -10,7 +10,9 @@ module Legion
            max_tags: 20,
            default_limit: 5,
            min_confidence: 0.3,
-           local: local_defaults
+           local: local_defaults,
+           versioning: versioning_defaults,
+           expiry: expiry_defaults
          }
        end

@@ -24,6 +26,22 @@ module Legion
            default_limit: 5
          }
        end
+
+       def self.versioning_defaults
+         {
+           enabled: true,
+           supersession_threshold: 0.85,
+           max_chain_depth: 50
+         }
+       end
+
+       def self.expiry_defaults
+         {
+           enabled: true,
+           sweep_interval: 3600,
+           warn_before_expiry: 86_400
+         }
+       end
      end
    end
  end
@@ -2,6 +2,6 @@

  module Legion
    module Apollo
-     VERSION = '0.4.0'
+     VERSION = '0.5.2'
    end
  end
data/lib/legion/apollo.rb CHANGED
@@ -97,9 +97,11 @@ module Legion
  return not_started_error unless started?

  normalized_tags = normalize_tags_input(tags)
- payload = { content: content, tags: normalized_tags, **opts }
+ normalized_content = normalize_text_input(content)
+ normalized_raw_content = normalize_text_input(opts.key?(:raw_content) ? opts[:raw_content] : content)
+ payload = { **opts, content: normalized_content, raw_content: normalized_raw_content, tags: normalized_tags }
  log.info do
-   "Apollo ingest requested scope=#{scope} content_length=#{content.to_s.length} " \
+   "Apollo ingest requested scope=#{scope} content_length=#{payload[:content].to_s.length} " \
      "tags=#{payload[:tags].size} source_channel=#{payload[:source_channel]}"
  end
  log.debug do
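The reordered payload literal puts `**opts` first so that the normalized values win over anything passed in through `opts`. A minimal sketch of that ordering, assuming a simplified `normalize` in place of the gem's `normalize_text_input`; `build_payload` is a hypothetical name used only here:

```ruby
# Simplified stand-in for normalize_text_input (assumption: only null bytes are stripped).
def normalize(value)
  value.to_s.delete("\u0000")
end

# Hypothetical helper reproducing the payload construction from the hunk.
def build_payload(content:, tags:, **opts)
  raw = opts.key?(:raw_content) ? opts[:raw_content] : content
  # Later keys win in a hash literal, so the normalized values replace
  # any :content or :raw_content that arrived through opts.
  { **opts, content: normalize(content), raw_content: normalize(raw), tags: tags }
end

build_payload(content: "a\u0000b", tags: ['x'], source_channel: 'rest_api')
build_payload(content: 'summary', tags: [], raw_content: "full\u0000 text")
build_payload(content: 'summary', tags: [], raw_content: nil)
```

In this sketch the fallback to `content` only happens when the `:raw_content` key is absent. A key that is present with a `nil` value normalizes to an empty string instead, which may be worth checking for callers that always forward the key.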
@@ -247,7 +249,7 @@ module Legion
    Legion::Extensions::Apollo::Runners::Knowledge.handle_query(**normalize_query_payload(payload))
  rescue StandardError => e
    handle_exception(e, level: :error, operation: 'apollo.direct_query', payload_keys: payload.keys)
-   { success: false, error: e.message }
+   { success: false, error: :backend_query_failed, detail: e.message }
  end

  def direct_ingest(payload)
@@ -288,7 +290,9 @@ module Legion
    "Apollo query using local store text_length=#{payload[:text].to_s.length} " \
      "limit=#{payload[:limit]}"
  end
- result = Legion::Apollo::Local.query(**payload.slice(:text, :limit, :min_confidence, :tags))
+ result = Legion::Apollo::Local.query(**payload.slice(:text, :limit, :min_confidence, :tags,
+                                                      :tier, :include_inferences, :include_history,
+                                                      :as_of))
  return result unless result[:success]

  entries = normalize_local_entries(Array(result[:results]))
@@ -322,7 +326,9 @@ module Legion

  if Legion::Apollo::Local.started?
    attempted = true
-   local = Legion::Apollo::Local.query(**payload.slice(:text, :limit, :min_confidence, :tags))
+   local = Legion::Apollo::Local.query(**payload.slice(:text, :limit, :min_confidence, :tags,
+                                                       :tier, :include_inferences, :include_history,
+                                                       :as_of))
    if local[:success]
      any_success = true
      entries.concat(normalize_local_entries(Array(local[:results]))) if local[:results]
@@ -341,6 +347,9 @@ module Legion
  return { success: false, error: :no_path_available } unless attempted

  unless any_success
+   symbol_errors = errors.compact.grep(Symbol).uniq
+   return { success: false, error: symbol_errors.first } if symbol_errors.size == 1
+
    combined_error = errors.compact.map(&:to_s).reject(&:empty?).join('; ')
    combined_error = :upstream_query_failed if combined_error.empty?
    return { success: false, error: combined_error }
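The new early return keeps a single symbolic error intact instead of flattening it into a string. A sketch of the aggregation in isolation, with `aggregate_error` as a hypothetical name for the logic shown in the hunk:

```ruby
# Hypothetical extraction of the error-aggregation logic from the merged query path.
def aggregate_error(errors)
  symbol_errors = errors.compact.grep(Symbol).uniq
  # Exactly one distinct symbol: return it unchanged so callers can match on it.
  return symbol_errors.first if symbol_errors.size == 1

  combined = errors.compact.map(&:to_s).reject(&:empty?).join('; ')
  combined.empty? ? :upstream_query_failed : combined
end

aggregate_error([:local_not_started, nil])       # single symbol survives
aggregate_error([:local_not_started, 'timeout']) # the string error is dropped
aggregate_error(%i[not_started local_not_started]) # two symbols become a joined string
aggregate_error([nil, ''])                       # nothing usable, generic symbol
```

One consequence visible in the sketch: when one symbol and one string error are both present, the symbol is returned and the string detail is not reported.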
@@ -372,18 +381,29 @@ module Legion
             else
               Array(e[:tags])
             end
-     { id: e[:id], content: e[:content], content_hash: hash,
-       confidence: e[:confidence] || 0.5, content_type: 'fact', tags: tags, source: :local }
+     { id: e[:id], content: e[:content], raw_content: e[:raw_content] || e[:content], content_hash: hash,
+       confidence: e[:confidence] || 0.5, content_type: 'fact', tags: tags, source: :local,
+       valid_from: e[:valid_from], valid_to: e[:valid_to] }
    end
  end

  def normalize_global_entries(entries)
-   entries.map do |e|
-     hash = e[:content_hash] || Digest::MD5.hexdigest(e[:content].to_s.strip.downcase.gsub(/\s+/, ' '))
-     { id: e[:id], content: e[:content], content_hash: hash,
-       confidence: e[:confidence] || 0.5, content_type: e[:content_type] || 'fact',
-       tags: Array(e[:tags]), source: :global }
-   end
+   entries.map { |entry| normalize_global_entry(entry) }
+ end
+
+ def normalize_global_entry(entry)
+   { id: entry[:id], content: entry[:content], raw_content: normalized_raw_content(entry),
+     content_hash: normalized_content_hash(entry), confidence: entry[:confidence] || 0.5,
+     content_type: entry[:content_type] || 'fact', tags: Array(entry[:tags]), source: :global,
+     valid_from: entry[:valid_from], valid_to: entry[:valid_to] }
+ end
+
+ def normalized_raw_content(entry)
+   entry[:raw_content] || entry[:content]
+ end
+
+ def normalized_content_hash(entry)
+   entry[:content_hash] || Digest::MD5.hexdigest(entry[:content].to_s.strip.downcase.gsub(/\s+/, ' '))
  end

  def dedup_and_rank(entries, limit:)
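The fallback content hash is computed over a normalized form of the text, so entries that differ only in case or whitespace hash identically. A small sketch of that normalization on its own; `content_hash` is a hypothetical name, and how `dedup_and_rank` uses the hash is not shown in this diff:

```ruby
require 'digest/md5'

# Hypothetical standalone version of the fallback hash:
# trim, lowercase, collapse whitespace runs, then MD5.
def content_hash(text)
  Digest::MD5.hexdigest(text.to_s.strip.downcase.gsub(/\s+/, ' '))
end

content_hash("  Hello   World\n") == content_hash('hello world') # same normalized text
content_hash('hello world') == content_hash('hello, world')      # punctuation still differs
```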
@@ -465,20 +485,28 @@ module Legion
  end

  def normalize_text_input(value) # rubocop:disable Metrics/CyclomaticComplexity,Metrics/MethodLength
-   case value
-   when String
-     value
-   when Array
-     parts = value.filter_map { |entry| extract_text_fragment(entry) }
-     joined = parts.map(&:to_s).map(&:strip).reject(&:empty?).join("\n")
-     joined.empty? ? value.to_s : joined
-   when Hash
-     extract_text_fragment(value).to_s
-   when nil
-     ''
-   else
-     value.to_s
-   end
+   text = case value
+          when String
+            value
+          when Array
+            parts = value.filter_map { |entry| extract_text_fragment(entry) }
+            joined = parts.map(&:to_s).map(&:strip).reject(&:empty?).join("\n")
+            joined.empty? ? value.to_s : joined
+          when Hash
+            extract_text_fragment(value).to_s
+          when nil
+            ''
+          else
+            value.to_s
+          end
+   sanitize_text_input(text)
+ end
+
+ def sanitize_text_input(value)
+   text = value.to_s.dup
+   text = text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: '')
+   text = text.scrub('') unless text.valid_encoding?
+   text.delete("\u0000")
  end

  def normalize_tags_input(tags)
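The new `sanitize_text_input` can be exercised in isolation. The sketch below copies its body into a standalone method (`sanitize_text` is a hypothetical name) and feeds it a string containing an invalid UTF-8 byte and a null byte:

```ruby
# Standalone copy of the sanitizing steps from the hunk, for experimentation.
def sanitize_text(value)
  text = value.to_s.dup
  text = text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: '')
  # Drop any invalid byte sequences that are still present after encode.
  text = text.scrub('') unless text.valid_encoding?
  text.delete("\u0000")
end

dirty = "ab\xFFc\0d" # invalid byte 0xFF plus a null byte
clean = sanitize_text(dirty)
puts clean
puts clean.valid_encoding?
```

As far as I know, `String#encode` leaves a string untouched when the source and target encodings already match, which would explain why the method follows up with `scrub` for input that is already tagged as UTF-8.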
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: legion-apollo
  version: !ruby/object:Gem::Version
-   version: 0.4.0
+   version: 0.5.2
  platform: ruby
  authors:
  - Esity
@@ -86,6 +86,8 @@ files:
  - lib/legion/apollo/local/migrations/001_create_local_knowledge.rb
  - lib/legion/apollo/local/migrations/002_create_graph_tables.rb
  - lib/legion/apollo/local/migrations/003_harden_graph_relationships.rb
+ - lib/legion/apollo/local/migrations/004_add_versioning_tiers_inference.rb
+ - lib/legion/apollo/local/migrations/005_add_raw_content_temporal_windows.rb
  - lib/legion/apollo/messages/access_boost.rb
  - lib/legion/apollo/messages/ingest.rb
  - lib/legion/apollo/messages/query.rb