legion-data 1.7.0 → 1.7.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 8f45999e6d1c3b0727dc351630f7c76da902a40513e9cbcd35dbce97d5c90304
- data.tar.gz: b3efe7c69e3ccb9540f44fc4e7a464428898abaa74bcddbab9c4b39f2eefc3b4
+ metadata.gz: a12da2249f20eb148411c6e052ef5b0bbdef86579907a84feab15d713f7b51f9
+ data.tar.gz: 433ea32fa8120428137d2114e54f5285653acc872956d04df3520cc996d277bc
  SHA512:
- metadata.gz: 6449f9218be46e31329571cca1d9d3233860b8dd3e307eb35fcc824d3385969dd25a3493a1b0e4261786e622fe754c3d20f152bbf3bf9bfb920c0f0041103df9
- data.tar.gz: b24487e1c279cda679fac9f27b3b4d7d5af00be5652782065e0f9b3a2b78502dce1d45c0db45b3f5f3632da9fd68f18b75714825d83dc0c0cd05953a26ae6c03
+ metadata.gz: 8f8c62cfbe83e98ccb220ab02432ed7f2e02837350b4a7eb30ac24e51de6486c128e9f17846ac89dcd67fb835b8f292ded46d61830a2aa037f3f61e1734ee288
+ data.tar.gz: e73434d2a098fa7b57d5c6dea6e75daf43ca6585492984f3c225449c7819ffde7e796e6ac113d3ee4d5caf30bb627085fc2cfec3400ec4c7ca43a9e980b5cb8c
data/CHANGELOG.md CHANGED
@@ -2,6 +2,33 @@
 
  ## [Unreleased]
 
+ ## [1.7.3] - 2026-04-27
+
+ ### Added
+ - Migration 074: widens Apollo `content_hash` to a fixed 64 characters and `knowledge_domain` / `source_provider` / `source_agent` to 255 characters so SHA-256 hashes and real-world identifiers fit without ingestion truncation failures. (Fixes #33, #34)
+ - Migration 075: adds task `idempotency_key` and `idempotency_expires_at` columns plus indexes for SHA-256 payload deduplication windows. (Fixes #14)
+ - Migration 076: adds `extract_step_timings` for per-step Extract pipeline timing visibility. (Fixes #15)
+ - `Task.idempotency_key_for`, `Task.find_active_by_idempotency_key`, and `Task.create_idempotent` for stable content-addressed task dispatch deduplication. (Fixes #14)
+ - Extract results now include `extract_id` and `step_timings`, and persist timing rows when the migration is present. (Fixes #15)
+ - `AuditLogHashChain` plus `AuditLog.compute_hash` / `AuditLog.verify_chain` as the canonical data-side audit log hash-chain implementation for standard write paths to share. (Refs #13)
+
+ ### Fixed
+ - Migration 051 now adds `tasks.created_at` on SQLite/MySQL without a non-constant default and backfills it from `created`, so later migration specs and fresh SQLite databases migrate cleanly.
+
+ ## [1.7.2] - 2026-04-27
+
+ ### Fixed
+ - Dev fallback to SQLite now logs at `:error` level with explicit warnings that data written to SQLite will not be visible when the configured network database reconnects.
+
+ ### Added
+ - `Connection.connection_info` — returns adapter, connection state, and fallback status for health checks and diagnostics.
+ - `Connection.fallback_active?` — returns true when the data layer fell back to SQLite from a configured network database; Apollo and other services can check this to detect degraded mode and log appropriate warnings.
+
+ ## [1.7.1] - 2026-04-27
+
+ ### Fixed
+ - `QueryFileLogger` now treats writes after `close` as no-ops, preventing repeated `IOError: closed stream` warnings from late Sequel query callbacks during shutdown. (Fixes #35)
+
  ## [1.7.0] - 2026-04-24
 
  ### Added
data/README.md CHANGED
@@ -139,12 +139,30 @@ MyMemoryTrace.all # queries legionio_local.db, never the shared DB
  `Legion::Data::Extract` provides a handler registry for extracting text from documents, used by `lex-knowledge` for corpus ingestion:
 
  ```ruby
- text = Legion::Data::Extract.extract('/path/to/document.pdf')
- text = Legion::Data::Extract.extract('/path/to/data.csv')
+ result = Legion::Data::Extract.extract('/path/to/document.pdf')
+ text = result[:text]
+ result[:step_timings] # per-step name, start_time, end_time, status, error, duration_ms
  ```
 
  Supported formats: `.txt`, `.md`, `.csv`, `.json`, `.jsonl`, `.html`, `.xlsx`, `.docx`, `.pdf`, `.pptx`, `.vtt`
 
+ When migration 076 is present, Extract also persists the same per-step timing rows to `extract_step_timings`
+ under the returned `extract_id`.
+
+ ### Task Idempotency
+
+ `Task.idempotency_key_for` computes a stable SHA-256 key from canonical JSON payloads. `Task.create_idempotent`
+ returns an existing non-terminal task for the same key inside the optional TTL window, or creates a new task
+ with `idempotency_key` and `idempotency_expires_at` populated:
+
+ ```ruby
+ task = Legion::Data::Model::Task.create_idempotent(
+   { status: 'pending', payload: Legion::JSON.dump(payload) },
+   payload: payload,
+   ttl: 300
+ )
+ ```
+
  ### Filesystem Spool (Write Buffer)
 
  When the database is unavailable, `Legion::Data::Spool` buffers writes to `~/.legionio/data/spool/` and replays once the connection is restored:
@@ -343,6 +361,7 @@ Legion::Data.reload_static_cache
  | `Chain` | `chains` | Task execution chains |
  | `AuditLog` | `audit_log` | Tamper-evident audit trail with hash chain |
  | `AuditRecord` | `audit_records` | Structured audit records |
+ | `ExtractStepTiming` | `extract_step_timings` | Per-step Extract pipeline timing metadata |
  | `RbacRoleAssignment` | `rbac_role_assignments` | RBAC principal -> role mappings |
  | `RbacRunnerGrant` | `rbac_runner_grants` | Per-runner permission grants |
  | `RbacCrossTeamGrant` | `rbac_cross_team_grants` | Cross-team access grants |
@@ -377,7 +396,7 @@ Apollo models require PostgreSQL with the `pgvector` extension. They are skipped
 
  ## Migrations
 
- 71 numbered Sequel DSL migrations run automatically on startup (`auto_migrate: true`). Key milestones:
+ 76 numbered Sequel DSL migrations run automatically on startup (`auto_migrate: true`). Key milestones:
 
  | Range | What was added |
  |-------|---------------|
@@ -393,6 +412,7 @@ Apollo models require PostgreSQL with the `pgvector` extension. They are skipped
  | 050 | Critical indexes across 13 tables |
  | 058–067 | Audit records, chains, knowledge tiers, tool embedding cache, identity system (providers, principals, identities, groups) |
  | 068–071 | Entity type on audit records, principal on nodes, approval queue resume, engine on relationships |
+ | 072–076 | Identity audit/multi-instance columns, Apollo identifier widening, task idempotency, Extract step timings |
 
  Run migrations standalone:
 
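The canonical-JSON keying described in the Task Idempotency section can be sketched in plain Ruby. This is an illustrative stand-alone version, not the gem's implementation: it uses stdlib `JSON` where the gem uses `Legion::JSON`, and the helper names are hypothetical.

```ruby
require 'digest'
require 'json'

# Sort hash keys recursively (string/symbol tolerant) so logically equal
# payloads serialize identically regardless of key order or key type.
def canonical(value)
  case value
  when Hash
    value.keys.map(&:to_s).sort.to_h { |k| [k, canonical(value.fetch(k) { value.fetch(k.to_sym) })] }
  when Array
    value.map { |v| canonical(v) }
  else
    value
  end
end

def idempotency_key_for(payload)
  Digest::SHA256.hexdigest(JSON.generate(canonical(payload)))
end

# Symbol keys in one order, string keys in another: same key.
a = idempotency_key_for({ b: 1, a: [2, 3] })
b = idempotency_key_for({ 'a' => [2, 3], 'b' => 1 })
```

Because the key is a SHA-256 hex digest, it always fits the `size: 64` column added by migration 075.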
@@ -0,0 +1,85 @@
+ # frozen_string_literal: true
+
+ require 'digest'
+ require 'legion/json'
+ require 'time'
+
+ module Legion
+   module Data
+     module AuditLogHashChain
+       GENESIS_HASH = ('0' * 64).freeze
+       CANONICAL_FIELDS = %i[
+         principal_id action resource source status detail created_at previous_hash
+       ].freeze
+
+       class << self
+         def compute_hash(record)
+           Digest::SHA256.hexdigest(canonical_payload(record))
+         end
+
+         def verify(records)
+           previous_hash = GENESIS_HASH
+           records.each do |record|
+             return invalid(record, :parent_mismatch) unless value_for(record, :previous_hash).to_s == previous_hash
+
+             expected = compute_hash(record)
+             return invalid(record, :hash_mismatch) unless value_for(record, :record_hash).to_s == expected
+
+             previous_hash = expected
+           end
+
+           { valid: true, length: records.size }
+         end
+
+         def canonical_payload(record)
+           CANONICAL_FIELDS.map do |field|
+             "#{field}:#{canonical_value(value_for(record, field))}"
+           end.join('|')
+         end
+
+         private
+
+         def invalid(record, reason)
+           { valid: false, broken_at: value_for(record, :id), reason: reason }
+         end
+
+         def canonical_value(value)
+           case value
+           when Time
+             value.utc.iso8601(6)
+           when DateTime
+             value.to_time.utc.iso8601(6)
+           when Hash
+             Legion::JSON.dump(canonical_hash(value))
+           when Array
+             Legion::JSON.dump(value.map { |item| canonical_json_value(item) })
+           else
+             value.to_s
+           end
+         end
+
+         def canonical_json_value(value)
+           case value
+           when Hash then canonical_hash(value)
+           when Array then value.map { |item| canonical_json_value(item) }
+           else value
+           end
+         end
+
+         def canonical_hash(hash)
+           hash.keys.map(&:to_s).sort.to_h do |key|
+             [key, canonical_json_value(hash.fetch(key) { hash.fetch(key.to_sym) })]
+           end
+         end
+
+         def value_for(record, field)
+           return record[field] if record.respond_to?(:[]) && !record[field].nil?
+           return record[field.to_s] if record.respond_to?(:[]) && !record[field.to_s].nil?
+           return record.public_send(field) if record.respond_to?(field)
+
+           nil
+         end
+       end
+     end
+   end
+ end
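The chain invariant this module enforces can be shown with a self-contained sketch: each record's hash covers its content plus the previous record's hash, so editing any earlier record breaks every later link. This simplified version hashes only `action|previous_hash` instead of the full `CANONICAL_FIELDS` payload; `build_chain` and the record shape are illustrative, not part of the gem.

```ruby
require 'digest'

GENESIS = '0' * 64

# Build a chain where each record_hash commits to the previous record's hash.
def build_chain(actions)
  prev = GENESIS
  actions.map do |action|
    hash = Digest::SHA256.hexdigest("#{action}|#{prev}")
    record = { action: action, previous_hash: prev, record_hash: hash }
    prev = hash
    record
  end
end

# Walk the chain from genesis, recomputing each link.
def verify(records)
  prev = GENESIS
  records.each do |r|
    return false unless r[:previous_hash] == prev
    return false unless r[:record_hash] == Digest::SHA256.hexdigest("#{r[:action]}|#{prev}")

    prev = r[:record_hash]
  end
  true
end

chain = build_chain(%w[login update delete])
tampered = chain.map(&:dup)
tampered[1][:action] = 'escalate' # silent edit: stored hash no longer matches
```

Tampering with record 1 fails verification at that record, which mirrors the `:hash_mismatch` result returned by `AuditLogHashChain.verify`.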
@@ -103,12 +103,13 @@ module Legion
 
    def initialize(path)
      @path = path
+     @closed = false
+     @mutex = Mutex.new
      dir = File.dirname(path)
      FileUtils.mkdir_p(dir)
      FileUtils.chmod(0o700, dir) if File.directory?(dir)
      @file = File.open(path, File::WRONLY | File::APPEND | File::CREAT, 0o600)
      @file.sync = true
-     @mutex = Mutex.new
    end
 
    def debug(message)
@@ -128,16 +129,23 @@ module Legion
    end
 
    def close
-     @mutex.synchronize { @file.close unless @file.closed? }
+     @mutex.synchronize do
+       @closed = true
+       @file.close unless @file.closed?
+     end
    end
 
    private
 
    def write(level, message)
      @mutex.synchronize do
+       return if @closed || @file.closed?
+
        @file.puts "[#{Time.now.strftime('%Y-%m-%d %H:%M:%S.%L')}] #{level} #{message}"
      end
    rescue IOError => e
+     return nil if @closed || @file.closed?
+
      handle_exception(e, level: :warn, handled: true, operation: :query_file_write, path: @path)
      nil
    end
@@ -155,16 +163,22 @@ module Legion
    def setup
      opts = sequel_opts
      log.info("Legion::Data::Connection setup adapter=#{adapter}")
+     @fallback_active = false
      @sequel = if adapter == :sqlite
                  ::Sequel.connect(opts.merge(adapter: :sqlite, database: sqlite_path))
                else
+                 attempted_adapter = adapter
                  begin
-                   ::Sequel.connect(connection_opts_for(adapter: adapter, opts: opts))
+                   ::Sequel.connect(connection_opts_for(adapter: attempted_adapter, opts: opts))
                  rescue StandardError => e
                    raise unless dev_fallback?
 
-                   handle_exception(e, level: :warn, handled: true, operation: :shared_connect, fallback: :sqlite)
+                   log.error("Legion::Data FALLING BACK TO SQLITE #{attempted_adapter} connection failed: #{e.message}")
+                   log.error("Legion::Data WARNING: Data written to SQLite will NOT be visible when #{attempted_adapter} reconnects. " \
+                             'Apollo knowledge, audit logs, and other DB-backed services will use a local-only store.')
+                   handle_exception(e, level: :error, handled: true, operation: :shared_connect, fallback: :sqlite)
                    @adapter = :sqlite
+                   @fallback_active = true
                    sqlite_opts = sequel_opts
                    ::Sequel.connect(sqlite_opts.merge(adapter: :sqlite, database: sqlite_path))
                  end
@@ -175,6 +189,25 @@
      connect_with_replicas
    end
 
+   # Returns connection metadata for health checks and diagnostics.
+   # Apollo and other services can use this to detect silent fallback.
+   def connection_info
+     {
+       adapter: adapter,
+       connected: Legion::Settings[:data][:connected],
+       fallback_active: @fallback_active || false,
+       configured_adapter: Legion::Settings[:data][:adapter]&.to_sym || :sqlite,
+       sequel_alive: (begin; !@sequel&.test_connection.nil?; rescue StandardError; false; end)
+     }
+   end
+
+   # Returns true if the data layer fell back to SQLite from a configured
+   # network database (PostgreSQL/MySQL). Services should check this and
+   # log warnings when operating in degraded mode.
+   def fallback_active?
+     @fallback_active == true
+   end
+
    def stats
      return { connected: false } unless @sequel
 
@@ -1,6 +1,7 @@
  # frozen_string_literal: true
 
  require 'legion/logging/helper'
+ require 'securerandom'
  require_relative 'extract/type_detector'
  require_relative 'extract/handlers/base'
 
@@ -11,29 +12,49 @@ module Legion
    include Legion::Logging::Helper
 
    def extract(source, type: :auto)
-     detected_type = type == :auto ? TypeDetector.detect(source) : type&.to_sym
-     return { success: false, text: nil, error: :unknown_type } unless detected_type
+     extract_id = SecureRandom.uuid
+     timings = []
+     detected_type = timed_step(:detect_type, timings) do
+       type == :auto ? TypeDetector.detect(source) : type&.to_sym
+     end
+     unless detected_type
+       result = { success: false, text: nil, error: :unknown_type, extract_id: extract_id,
+                  step_timings: timings }
+       persist_step_timings(extract_id, timings)
+       return result
+     end
 
-     handler = Handlers::Base.for_type(detected_type)
-     return { success: false, text: nil, error: :no_handler, type: detected_type } unless handler
+     handler = timed_step(:resolve_handler, timings) { Handlers::Base.for_type(detected_type) }
+     unless handler
+       result = { success: false, text: nil, error: :no_handler, type: detected_type, extract_id: extract_id,
+                  step_timings: timings }
+       persist_step_timings(extract_id, timings)
+       return result
+     end
 
-     unless handler.available?
+     available = timed_step(:check_availability, timings) { handler.available? }
+     unless available
        return { success: false, text: nil, error: :gem_not_installed,
-                gem: handler.gem_name, type: detected_type }
+                gem: handler.gem_name, type: detected_type, extract_id: extract_id,
+                step_timings: timings }.tap { persist_step_timings(extract_id, timings) }
      end
 
      log.info "Extract starting type=#{detected_type} handler=#{handler.name}"
-     result = handler.extract(source)
+     result = timed_step(:handler_extract, timings) { handler.extract(source) }
      if result[:text]
        log.info "Extract succeeded type=#{detected_type}"
-       { success: true, text: result[:text], metadata: result[:metadata], type: detected_type }
+       { success: true, text: result[:text], metadata: result[:metadata], type: detected_type,
+         extract_id: extract_id, step_timings: timings }
      else
        log.warn "Extract failed type=#{detected_type} error=#{result[:error]}"
-       { success: false, text: nil, error: result[:error], type: detected_type }
-     end
+       { success: false, text: nil, error: result[:error], type: detected_type,
+         extract_id: extract_id, step_timings: timings }
+     end.tap { persist_step_timings(extract_id, timings) }
    rescue StandardError => e
      handle_exception(e, level: :error, handled: true, operation: :extract, type: detected_type)
-     { success: false, text: nil, error: e.message, type: detected_type }
+     persist_step_timings(extract_id, timings) if extract_id
+     { success: false, text: nil, error: e.message, type: detected_type, extract_id: extract_id,
+       step_timings: timings }
    end
 
    def supported_types
@@ -54,6 +75,48 @@ module Legion
 
    private
 
+   def timed_step(name, timings)
+     monotonic_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+     start_time = Time.now.utc
+     result = yield
+     record_step_timing(timings, name: name, start_time: start_time, monotonic_start: monotonic_start,
+                        status: :success)
+     result
+   rescue StandardError => e
+     record_step_timing(timings, name: name, start_time: start_time, monotonic_start: monotonic_start,
+                        status: :error, error: "#{e.class}: #{e.message}")
+     raise
+   end
+
+   def record_step_timing(timings, name:, start_time:, monotonic_start:, status:, error: nil)
+     end_time = Time.now.utc
+     duration_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - monotonic_start) * 1000).round
+     timings << {
+       name: name.to_s,
+       start_time: start_time,
+       end_time: end_time,
+       status: status.to_s,
+       error: error,
+       duration_ms: duration_ms
+     }
+   end
+
+   def persist_step_timings(extract_id, timings)
+     return unless defined?(Legion::Data)
+
+     connection = Legion::Data.connection
+     return unless connection&.table_exists?(:extract_step_timings)
+
+     existing_steps = connection[:extract_step_timings].where(extract_id: extract_id).select_map(:name)
+     rows = timings.reject { |timing| existing_steps.include?(timing[:name]) }.map do |timing|
+       timing.merge(extract_id: extract_id)
+     end
+     connection[:extract_step_timings].multi_insert(rows) unless rows.empty?
+   rescue StandardError => e
+     handle_exception(e, level: :warn, handled: true, operation: :persist_extract_step_timings,
+                      extract_id: extract_id)
+   end
+
    def load_all_handlers
      return if @handlers_loaded
 
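The `timed_step` pattern above separates wall-clock timestamps (for display) from a monotonic clock (for durations, which is immune to system clock adjustments) and records a failed step before re-raising. A stand-alone sketch of the same idea, with simplified illustrative fields:

```ruby
# Wrap a block, appending a timing row whether it succeeds or raises.
def timed_step(name, timings)
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  started = Time.now.utc
  result = yield
  timings << { name: name.to_s, start_time: started, status: 'success',
               duration_ms: ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0) * 1000).round }
  result
rescue StandardError => e
  # Record the failure, then let the caller's rescue handle it.
  timings << { name: name.to_s, start_time: started, status: 'error', error: e.message,
               duration_ms: ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0) * 1000).round }
  raise
end

timings = []
value = timed_step(:parse, timings) { 21 * 2 }
begin
  timed_step(:boom, timings) { raise ArgumentError, 'bad input' }
rescue ArgumentError
  # the error step was still appended to timings before the re-raise
end
```

Because the duration comes from `CLOCK_MONOTONIC`, an NTP adjustment mid-step cannot produce a negative or wildly wrong `duration_ms`.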
@@ -14,7 +14,7 @@ Sequel.migration do
  else
    # SQLite/MySQL: add real column and backfill from created
    alter_table(:tasks) do
-     add_column :created_at, DateTime, default: Sequel::CURRENT_TIMESTAMP
+     add_column :created_at, DateTime
    end
 
    run 'UPDATE tasks SET created_at = created WHERE created_at IS NULL'
@@ -0,0 +1,49 @@
+ # frozen_string_literal: true
+
+ Sequel.migration do
+   up do
+     next unless adapter_scheme == :postgres
+     next unless table_exists?(:apollo_entries)
+
+     apollo_columns = schema(:apollo_entries).map(&:first)
+     alter_table(:apollo_entries) do
+       set_column_type :content_hash, String, fixed: true, size: 64 if apollo_columns.include?(:content_hash)
+       set_column_type :knowledge_domain, String, size: 255 if apollo_columns.include?(:knowledge_domain)
+       set_column_type :source_provider, String, size: 255 if apollo_columns.include?(:source_provider)
+       set_column_type :source_agent, String, size: 255 if apollo_columns.include?(:source_agent)
+     end
+
+     next unless table_exists?(:apollo_entries_archive)
+
+     archive_columns = schema(:apollo_entries_archive).map(&:first)
+     alter_table(:apollo_entries_archive) do
+       set_column_type :content_hash, String, fixed: true, size: 64 if archive_columns.include?(:content_hash)
+       set_column_type :knowledge_domain, String, size: 255 if archive_columns.include?(:knowledge_domain)
+       set_column_type :source_provider, String, size: 255 if archive_columns.include?(:source_provider)
+       set_column_type :source_agent, String, size: 255 if archive_columns.include?(:source_agent)
+     end
+   end
+
+   down do
+     next unless adapter_scheme == :postgres
+     next unless table_exists?(:apollo_entries)
+
+     apollo_columns = schema(:apollo_entries).map(&:first)
+     alter_table(:apollo_entries) do
+       set_column_type :content_hash, String, fixed: true, size: 32 if apollo_columns.include?(:content_hash)
+       set_column_type :knowledge_domain, String, size: 50 if apollo_columns.include?(:knowledge_domain)
+       set_column_type :source_provider, String, size: 50 if apollo_columns.include?(:source_provider)
+       set_column_type :source_agent, String, size: 50 if apollo_columns.include?(:source_agent)
+     end
+
+     next unless table_exists?(:apollo_entries_archive)
+
+     archive_columns = schema(:apollo_entries_archive).map(&:first)
+     alter_table(:apollo_entries_archive) do
+       set_column_type :content_hash, String, fixed: true, size: 32 if archive_columns.include?(:content_hash)
+       set_column_type :knowledge_domain, String, size: 50 if archive_columns.include?(:knowledge_domain)
+       set_column_type :source_provider, String, size: 50 if archive_columns.include?(:source_provider)
+       set_column_type :source_agent, String, size: 50 if archive_columns.include?(:source_agent)
+     end
+   end
+ end
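The 64-character target in this migration is not arbitrary: a SHA-256 digest rendered as hex is always exactly 64 characters, while the previous `CHAR(32)` only fits an MD5-length digest, which is why SHA-256 hashes were truncated on ingestion. A quick check:

```ruby
require 'digest'

# Hex digest lengths: SHA-256 needs 64 chars; MD5 fits in 32.
sha = Digest::SHA256.hexdigest('any content')
md5 = Digest::MD5.hexdigest('any content')
```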
@@ -0,0 +1,28 @@
+ # frozen_string_literal: true
+
+ Sequel.migration do
+   up do
+     next unless table_exists?(:tasks)
+
+     existing_columns = schema(:tasks).map(&:first)
+     alter_table(:tasks) do
+       add_column :idempotency_key, String, size: 64 unless existing_columns.include?(:idempotency_key)
+       add_column :idempotency_expires_at, DateTime unless existing_columns.include?(:idempotency_expires_at)
+     end
+
+     add_index :tasks, :idempotency_key, name: :idx_tasks_idempotency_key, if_not_exists: true
+     add_index :tasks, :idempotency_expires_at, name: :idx_tasks_idempotency_expires_at, if_not_exists: true
+   end
+
+   down do
+     next unless table_exists?(:tasks)
+
+     existing_columns = schema(:tasks).map(&:first)
+     alter_table(:tasks) do
+       drop_index :idempotency_key, name: :idx_tasks_idempotency_key, if_exists: true
+       drop_index :idempotency_expires_at, name: :idx_tasks_idempotency_expires_at, if_exists: true
+       drop_column :idempotency_expires_at if existing_columns.include?(:idempotency_expires_at)
+       drop_column :idempotency_key if existing_columns.include?(:idempotency_key)
+     end
+   end
+ end
@@ -0,0 +1,24 @@
+ # frozen_string_literal: true
+
+ Sequel.migration do
+   up do
+     create_table?(:extract_step_timings) do
+       primary_key :id
+       String :extract_id, size: 36, null: false
+       String :name, size: 100, null: false
+       DateTime :start_time, null: false
+       DateTime :end_time, null: false
+       String :status, size: 20, null: false
+       String :error, text: true
+       Integer :duration_ms, null: false, default: 0
+
+       index :extract_id, name: :idx_extract_step_timings_extract_id
+       index %i[extract_id name], name: :idx_extract_step_timings_extract_name
+       index :status, name: :idx_extract_step_timings_status
+     end
+   end
+
+   down do
+     drop_table?(:extract_step_timings)
+   end
+ end
@@ -14,7 +14,7 @@ module Legion
    %w[extension function relationship chain task runner node setting digital_worker
       apollo_entry apollo_relation apollo_expertise apollo_access_log audit_log
       audit_record identity_provider principal identity identity_group
-      identity_group_membership identity_audit_log]
+      identity_group_membership identity_audit_log extract_step_timing]
  end
 
  def load
@@ -1,6 +1,7 @@
  # frozen_string_literal: true
 
  require 'legion/logging/helper'
+ require 'legion/data/audit_log_hash_chain'
 
  module Legion
    module Data
@@ -33,6 +34,14 @@ module Legion
        def before_destroy
          raise 'audit_log records are immutable and cannot be deleted'
        end
+
+       def self.compute_hash(record)
+         Legion::Data::AuditLogHashChain.compute_hash(record)
+       end
+
+       def self.verify_chain(records = order(:created_at, :id).all)
+         Legion::Data::AuditLogHashChain.verify(records)
+       end
      end
    end
  end
@@ -0,0 +1,10 @@
+ # frozen_string_literal: true
+
+ module Legion
+   module Data
+     module Model
+       class ExtractStepTiming < Sequel::Model(:extract_step_timings)
+       end
+     end
+   end
+ end
@@ -1,9 +1,17 @@
  # frozen_string_literal: true
 
+ require 'digest'
+ require 'legion/json'
+ require 'time'
+
  module Legion
    module Data
      module Model
        class Task < Sequel::Model
+         TERMINAL_STATUSES = %w[
+           completed complete failed error cancelled canceled timeout timed_out
+         ].freeze
+
          many_to_one :relationship
          one_to_many :task_log
          many_to_one :parent, class: self
@@ -11,9 +19,51 @@ module Legion
          many_to_one :master, class: self
          one_to_many :slave, key: :master_id, class: self
 
+         def self.idempotency_key_for(payload)
+           Digest::SHA256.hexdigest(Legion::JSON.dump(canonical_payload(payload)))
+         end
+
+         def self.find_active_by_idempotency_key(key, now: Time.now)
+           return nil if key.to_s.empty?
+           return nil unless columns.include?(:idempotency_key)
+
+           where(idempotency_key: key)
+             .exclude(status: TERMINAL_STATUSES)
+             .where { (idempotency_expires_at =~ nil) | (idempotency_expires_at > now) }
+             .reverse_order(:created, :id)
+             .first
+         end
+
+         def self.create_idempotent(values, payload: nil, idempotency_key: nil, ttl: nil)
+           key = idempotency_key || idempotency_key_for(payload || values)
+           existing = find_active_by_idempotency_key(key)
+           return existing if existing
+
+           expires_at = ttl ? Time.now + ttl : nil
+           create(values.merge(idempotency_key: key, idempotency_expires_at: expires_at))
+         end
+
          def cancelled?
            !cancelled_at.nil?
          end
+
+         def self.canonical_payload(value)
+           case value
+           when Hash
+             value.keys.map(&:to_s).sort.to_h do |key|
+               [key, canonical_payload(value.fetch(key) { value.fetch(key.to_sym) })]
+             end
+           when Array
+             value.map { |item| canonical_payload(item) }
+           when Time
+             value.utc.iso8601(6)
+           when DateTime
+             value.to_time.utc.iso8601(6)
+           else
+             value
+           end
+         end
+         private_class_method :canonical_payload
        end
      end
    end
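The selection logic in `find_active_by_idempotency_key` can be mirrored over plain hashes to see its semantics: terminal tasks never match, expired windows never match, a `nil` expiry means no window, and the newest surviving task wins. The helper and the three-status list below are illustrative simplifications, not the model's code.

```ruby
# Simplified subset of the model's TERMINAL_STATUSES list.
TERMINAL = %w[completed failed cancelled].freeze

# A task is reusable for a key when it is non-terminal and its
# idempotency window has not expired (nil expiry means no window).
def find_active(tasks, key, now: Time.now)
  tasks.select { |t| t[:idempotency_key] == key }
       .reject { |t| TERMINAL.include?(t[:status]) }
       .select { |t| t[:idempotency_expires_at].nil? || t[:idempotency_expires_at] > now }
       .max_by { |t| t[:created] }
end

now = Time.now
tasks = [
  { idempotency_key: 'k1', status: 'completed', idempotency_expires_at: nil,       created: now - 30 },
  { idempotency_key: 'k1', status: 'pending',   idempotency_expires_at: now + 300, created: now - 10 },
  { idempotency_key: 'k1', status: 'pending',   idempotency_expires_at: now - 5,   created: now - 20 }
]
hit = find_active(tasks, 'k1', now: now)
```

Only the middle task survives all three filters, which is why `create_idempotent` returns it instead of creating a duplicate.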
@@ -2,6 +2,6 @@
 
  module Legion
    module Data
-     VERSION = '1.7.0'
+     VERSION = '1.7.3'
    end
  end
data/lib/legion/data.rb CHANGED
@@ -16,6 +16,7 @@ require_relative 'data/helper'
  require_relative 'data/rls'
  require_relative 'data/extract'
  require_relative 'data/audit_record'
+ require_relative 'data/audit_log_hash_chain'
 
  unless Legion::Logging::Helper.method_defined?(:handle_exception)
    module Legion
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: legion-data
  version: !ruby/object:Gem::Version
- version: 1.7.0
+ version: 1.7.3
  platform: ruby
  authors:
  - Esity
@@ -121,6 +121,7 @@ files:
  - lib/legion/data/archival.rb
  - lib/legion/data/archival/policy.rb
  - lib/legion/data/archiver.rb
+ - lib/legion/data/audit_log_hash_chain.rb
  - lib/legion/data/audit_record.rb
  - lib/legion/data/connection.rb
  - lib/legion/data/encryption/cipher.rb
@@ -218,6 +219,9 @@ files:
  - lib/legion/data/migrations/071_add_engine_to_relationships.rb
  - lib/legion/data/migrations/072_create_identity_audit_log.rb
  - lib/legion/data/migrations/073_add_identity_multi_instance_columns.rb
+ - lib/legion/data/migrations/074_widen_apollo_entry_identifiers.rb
+ - lib/legion/data/migrations/075_add_task_idempotency.rb
+ - lib/legion/data/migrations/076_create_extract_step_timings.rb
  - lib/legion/data/model.rb
  - lib/legion/data/models/apollo_access_log.rb
  - lib/legion/data/models/apollo_entry.rb
@@ -228,6 +232,7 @@ files:
  - lib/legion/data/models/chain.rb
  - lib/legion/data/models/digital_worker.rb
  - lib/legion/data/models/extension.rb
+ - lib/legion/data/models/extract_step_timing.rb
  - lib/legion/data/models/function.rb
  - lib/legion/data/models/identity.rb
  - lib/legion/data/models/identity_audit_log.rb