legion-data 1.6.30 → 1.7.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +34 -0
- data/README.md +23 -3
- data/lib/legion/data/audit_log_hash_chain.rb +85 -0
- data/lib/legion/data/connection.rb +37 -4
- data/lib/legion/data/extract.rb +74 -11
- data/lib/legion/data/migrations/051_fix_tasks_created_at.rb +1 -1
- data/lib/legion/data/migrations/072_create_identity_audit_log.rb +31 -0
- data/lib/legion/data/migrations/073_add_identity_multi_instance_columns.rb +39 -0
- data/lib/legion/data/migrations/074_widen_apollo_entry_identifiers.rb +49 -0
- data/lib/legion/data/migrations/075_add_task_idempotency.rb +28 -0
- data/lib/legion/data/migrations/076_create_extract_step_timings.rb +24 -0
- data/lib/legion/data/model.rb +1 -1
- data/lib/legion/data/models/audit_log.rb +9 -0
- data/lib/legion/data/models/extract_step_timing.rb +10 -0
- data/lib/legion/data/models/identity_audit_log.rb +14 -0
- data/lib/legion/data/models/task.rb +50 -0
- data/lib/legion/data/version.rb +1 -1
- data/lib/legion/data.rb +1 -0
- metadata +9 -1
checksums.yaml
CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a12da2249f20eb148411c6e052ef5b0bbdef86579907a84feab15d713f7b51f9
+  data.tar.gz: 433ea32fa8120428137d2114e54f5285653acc872956d04df3520cc996d277bc
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 8f8c62cfbe83e98ccb220ab02432ed7f2e02837350b4a7eb30ac24e51de6486c128e9f17846ac89dcd67fb835b8f292ded46d61830a2aa037f3f61e1734ee288
+  data.tar.gz: e73434d2a098fa7b57d5c6dea6e75daf43ca6585492984f3c225449c7819ffde7e796e6ac113d3ee4d5caf30bb627085fc2cfec3400ec4c7ca43a9e980b5cb8c
```
data/CHANGELOG.md
CHANGED

```diff
@@ -2,6 +2,40 @@
 
 ## [Unreleased]
 
+## [1.7.3] - 2026-04-27
+
+### Added
+- Migration 074: widens Apollo `content_hash` to 64 fixed characters and `knowledge_domain` / `source_provider` / `source_agent` to 255 characters so SHA-256 hashes and real-world identifiers fit without ingestion truncation failures. (Fixes #33, #34)
+- Migration 075: adds task `idempotency_key` and `idempotency_expires_at` columns plus indexes for SHA-256 payload deduplication windows. (Fixes #14)
+- Migration 076: adds `extract_step_timings` for per-step Extract pipeline timing visibility. (Fixes #15)
+- `Task.idempotency_key_for`, `Task.find_active_by_idempotency_key`, and `Task.create_idempotent` for stable content-addressed task dispatch deduplication. (Fixes #14)
+- Extract results now include `extract_id` and `step_timings`, and persist timing rows when the migration is present. (Fixes #15)
+- `AuditLogHashChain` plus `AuditLog.compute_hash` / `AuditLog.verify_chain` as the canonical data-side audit log hash-chain implementation for standard write paths to share. (Refs #13)
+
+### Fixed
+- Migration 051 now adds SQLite/MySQL `tasks.created_at` without a non-constant default before backfilling from `created`, allowing later migration specs and fresh SQLite databases to migrate cleanly.
+
+## [1.7.2] - 2026-04-27
+
+### Fixed
+- Dev-fallback to SQLite now logs at `:error` level with explicit warnings that data written to SQLite will not be visible when the configured network database reconnects.
+
+### Added
+- `Connection.connection_info` — returns adapter, connection state, and fallback status for health checks and diagnostics
+- `Connection.fallback_active?` — returns true when the data layer fell back to SQLite from a configured network database; Apollo and other services can check this to detect degraded mode and log appropriate warnings
+
+## [1.7.1] - 2026-04-27
+
+### Fixed
+- `QueryFileLogger` now treats writes after `close` as no-ops, preventing repeated `IOError: closed stream` warnings from late Sequel query callbacks during shutdown. (Fixes #35)
+
+## [1.7.0] - 2026-04-24
+
+### Added
+- Migration 072: `identity_audit_log` table (Postgres-only) with indexes
+- Migration 073: `employee_id` on principals, `account_type`/`qualifier`/`is_default`/`link_evidence` on identities, partial unique index for one-default-per-provider
+- `IdentityAuditLog` model added to model loader
+
 ## [1.6.30] - 2026-04-22
 
 ### Fixed
```
data/README.md
CHANGED

````diff
@@ -139,12 +139,30 @@ MyMemoryTrace.all # queries legionio_local.db, never the shared DB
 `Legion::Data::Extract` provides a handler registry for extracting text from documents, used by `lex-knowledge` for corpus ingestion:
 
 ```ruby
-
-text =
+result = Legion::Data::Extract.extract('/path/to/document.pdf')
+text = result[:text]
+result[:step_timings] # per-step name, start_time, end_time, status, error, duration_ms
 ```
 
 Supported formats: `.txt`, `.md`, `.csv`, `.json`, `.jsonl`, `.html`, `.xlsx`, `.docx`, `.pdf`, `.pptx`, `.vtt`
 
+When migration 076 is present, Extract also persists the same per-step timing rows to `extract_step_timings`
+under the returned `extract_id`.
+
+### Task Idempotency
+
+`Task.idempotency_key_for` computes a stable SHA-256 key from canonical JSON payloads. `Task.create_idempotent`
+returns an existing non-terminal task for the same key inside the optional TTL window, or creates a new task
+with `idempotency_key` and `idempotency_expires_at` populated:
+
+```ruby
+task = Legion::Data::Model::Task.create_idempotent(
+  { status: 'pending', payload: Legion::JSON.dump(payload) },
+  payload: payload,
+  ttl: 300
+)
+```
+
 ### Filesystem Spool (Write Buffer)
 
 When the database is unavailable, `Legion::Data::Spool` buffers writes to `~/.legionio/data/spool/` and replays once the connection is restored:
@@ -343,6 +361,7 @@ Legion::Data.reload_static_cache
 | `Chain` | `chains` | Task execution chains |
 | `AuditLog` | `audit_log` | Tamper-evident audit trail with hash chain |
 | `AuditRecord` | `audit_records` | Structured audit records |
+| `ExtractStepTiming` | `extract_step_timings` | Per-step Extract pipeline timing metadata |
 | `RbacRoleAssignment` | `rbac_role_assignments` | RBAC principal -> role mappings |
 | `RbacRunnerGrant` | `rbac_runner_grants` | Per-runner permission grants |
 | `RbacCrossTeamGrant` | `rbac_cross_team_grants` | Cross-team access grants |
@@ -377,7 +396,7 @@ Apollo models require PostgreSQL with the `pgvector` extension. They are skipped
 
 ## Migrations
 
-
+76 numbered Sequel DSL migrations run automatically on startup (`auto_migrate: true`). Key milestones:
 
 | Range | What was added |
 |-------|---------------|
@@ -393,6 +412,7 @@ Apollo models require PostgreSQL with the `pgvector` extension. They are skipped
 | 050 | Critical indexes across 13 tables |
 | 058–067 | Audit records, chains, knowledge tiers, tool embedding cache, identity system (providers, principals, identities, groups) |
 | 068–071 | Entity type on audit records, principal on nodes, approval queue resume, engine on relationships |
+| 072–076 | Identity audit/multi-instance columns, Apollo identifier widening, task idempotency, Extract step timings |
 
 Run migrations standalone:
````
data/lib/legion/data/audit_log_hash_chain.rb
ADDED

```ruby
# frozen_string_literal: true

require 'digest'
require 'legion/json'
require 'time'

module Legion
  module Data
    module AuditLogHashChain
      GENESIS_HASH = ('0' * 64).freeze
      CANONICAL_FIELDS = %i[
        principal_id action resource source status detail created_at previous_hash
      ].freeze

      class << self
        def compute_hash(record)
          Digest::SHA256.hexdigest(canonical_payload(record))
        end

        def verify(records)
          previous_hash = GENESIS_HASH
          records.each do |record|
            return invalid(record, :parent_mismatch) unless value_for(record, :previous_hash).to_s == previous_hash

            expected = compute_hash(record)
            return invalid(record, :hash_mismatch) unless value_for(record, :record_hash).to_s == expected

            previous_hash = expected
          end

          { valid: true, length: records.size }
        end

        def canonical_payload(record)
          CANONICAL_FIELDS.map do |field|
            "#{field}:#{canonical_value(value_for(record, field))}"
          end.join('|')
        end

        private

        def invalid(record, reason)
          { valid: false, broken_at: value_for(record, :id), reason: reason }
        end

        def canonical_value(value)
          case value
          when Time
            value.utc.iso8601(6)
          when DateTime
            value.to_time.utc.iso8601(6)
          when Hash
            Legion::JSON.dump(canonical_hash(value))
          when Array
            Legion::JSON.dump(value.map { |item| canonical_json_value(item) })
          else
            value.to_s
          end
        end

        def canonical_json_value(value)
          case value
          when Hash then canonical_hash(value)
          when Array then value.map { |item| canonical_json_value(item) }
          else value
          end
        end

        def canonical_hash(hash)
          hash.keys.map(&:to_s).sort.to_h do |key|
            [key, canonical_json_value(hash.fetch(key) { hash.fetch(key.to_sym) })]
          end
        end

        def value_for(record, field)
          return record[field] if record.respond_to?(:[]) && !record[field].nil?
          return record[field.to_s] if record.respond_to?(:[]) && !record[field.to_s].nil?
          return record.public_send(field) if record.respond_to?(field)

          nil
        end
      end
    end
  end
end
```
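The chaining scheme in this module can be sketched with the standard library alone. This is an illustration of the idea, not the gem's exact serialization: `Legion::JSON` is replaced with plain string joining, and only a few canonical fields are used.

```ruby
require 'digest'

GENESIS = '0' * 64

# Hash a record's canonical fields; previous_hash links each record to
# its predecessor, which is what makes the log tamper-evident.
def record_hash(record)
  fields = %i[principal_id action resource previous_hash]
  Digest::SHA256.hexdigest(fields.map { |f| "#{f}:#{record[f]}" }.join('|'))
end

# Walk the chain: each record must point at its predecessor's hash, and
# its stored hash must match a recomputation from its own fields.
def verify(records)
  prev = GENESIS
  records.each do |r|
    return { valid: false, reason: :parent_mismatch } unless r[:previous_hash] == prev

    expected = record_hash(r)
    return { valid: false, reason: :hash_mismatch } unless r[:record_hash] == expected

    prev = expected
  end
  { valid: true, length: records.size }
end

# Build a two-record chain.
r1 = { principal_id: 1, action: 'login', resource: 'node', previous_hash: GENESIS }
r1[:record_hash] = record_hash(r1)
r2 = { principal_id: 1, action: 'dispatch', resource: 'task', previous_hash: r1[:record_hash] }
r2[:record_hash] = record_hash(r2)

puts verify([r1, r2])
```

Editing any field of an earlier record then breaks verification, because the stored hash no longer matches the recomputed one.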
data/lib/legion/data/connection.rb
CHANGED

```diff
@@ -103,12 +103,13 @@ module Legion
 
       def initialize(path)
         @path = path
+        @closed = false
+        @mutex = Mutex.new
         dir = File.dirname(path)
         FileUtils.mkdir_p(dir)
         FileUtils.chmod(0o700, dir) if File.directory?(dir)
         @file = File.open(path, File::WRONLY | File::APPEND | File::CREAT, 0o600)
         @file.sync = true
-        @mutex = Mutex.new
       end
 
       def debug(message)
@@ -128,16 +129,23 @@ module Legion
       end
 
       def close
-        @mutex.synchronize
+        @mutex.synchronize do
+          @closed = true
+          @file.close unless @file.closed?
+        end
       end
 
       private
 
       def write(level, message)
         @mutex.synchronize do
+          return if @closed || @file.closed?
+
           @file.puts "[#{Time.now.strftime('%Y-%m-%d %H:%M:%S.%L')}] #{level} #{message}"
         end
       rescue IOError => e
+        return nil if @closed || @file.closed?
+
         handle_exception(e, level: :warn, handled: true, operation: :query_file_write, path: @path)
         nil
       end
@@ -155,16 +163,22 @@ module Legion
       def setup
         opts = sequel_opts
         log.info("Legion::Data::Connection setup adapter=#{adapter}")
+        @fallback_active = false
         @sequel = if adapter == :sqlite
                     ::Sequel.connect(opts.merge(adapter: :sqlite, database: sqlite_path))
                   else
+                    attempted_adapter = adapter
                     begin
-                      ::Sequel.connect(connection_opts_for(adapter:
+                      ::Sequel.connect(connection_opts_for(adapter: attempted_adapter, opts: opts))
                     rescue StandardError => e
                       raise unless dev_fallback?
 
-
+                      log.error("Legion::Data FALLING BACK TO SQLITE — #{attempted_adapter} connection failed: #{e.message}")
+                      log.error("Legion::Data WARNING: Data written to SQLite will NOT be visible when #{attempted_adapter} reconnects. " \
+                                'Apollo knowledge, audit logs, and other DB-backed services will use a local-only store.')
+                      handle_exception(e, level: :error, handled: true, operation: :shared_connect, fallback: :sqlite)
                       @adapter = :sqlite
+                      @fallback_active = true
                       sqlite_opts = sequel_opts
                       ::Sequel.connect(sqlite_opts.merge(adapter: :sqlite, database: sqlite_path))
                     end
@@ -175,6 +189,25 @@ module Legion
         connect_with_replicas
       end
 
+      # Returns connection metadata for health checks and diagnostics.
+      # Apollo and other services can use this to detect silent fallback.
+      def connection_info
+        {
+          adapter: adapter,
+          connected: Legion::Settings[:data][:connected],
+          fallback_active: @fallback_active || false,
+          configured_adapter: Legion::Settings[:data][:adapter]&.to_sym || :sqlite,
+          sequel_alive: (begin; !@sequel&.test_connection.nil?; rescue StandardError; false; end)
+        }
+      end
+
+      # Returns true if the data layer fell back to SQLite from a configured
+      # network database (PostgreSQL/MySQL). Services should check this and
+      # log warnings when operating in degraded mode.
+      def fallback_active?
+        @fallback_active == true
+      end
+
       def stats
         return { connected: false } unless @sequel
```
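A consuming service might gate its health endpoint on these APIs roughly as follows. The `Connection` module here is a stub with the same field shape as `connection_info` (the real gem is not loadable in isolation), so this is a sketch of the degraded-mode check pattern, not code from the package.

```ruby
# Stub standing in for Legion::Data::Connection; field names mirror
# the connection_info hash added in 1.7.2.
module Connection
  def self.connection_info
    { adapter: :sqlite, connected: true, fallback_active: true,
      configured_adapter: :postgres, sequel_alive: true }
  end

  def self.fallback_active?
    connection_info[:fallback_active] == true
  end
end

# Surface degraded mode to callers instead of silently serving reads
# and writes from a local-only SQLite store.
def health_status
  info = Connection.connection_info
  if Connection.fallback_active?
    { status: :degraded,
      detail: "running on #{info[:adapter]} instead of #{info[:configured_adapter]}" }
  else
    { status: :ok, detail: info[:adapter].to_s }
  end
end

puts health_status
```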
data/lib/legion/data/extract.rb
CHANGED

```diff
@@ -1,6 +1,7 @@
 # frozen_string_literal: true
 
 require 'legion/logging/helper'
+require 'securerandom'
 require_relative 'extract/type_detector'
 require_relative 'extract/handlers/base'
 
@@ -11,29 +12,49 @@ module Legion
       include Legion::Logging::Helper
 
       def extract(source, type: :auto)
-
-
+        extract_id = SecureRandom.uuid
+        timings = []
+        detected_type = timed_step(:detect_type, timings) do
+          type == :auto ? TypeDetector.detect(source) : type&.to_sym
+        end
+        unless detected_type
+          result = { success: false, text: nil, error: :unknown_type, extract_id: extract_id,
+                     step_timings: timings }
+          persist_step_timings(extract_id, timings)
+          return result
+        end
 
-        handler = Handlers::Base.for_type(detected_type)
-
+        handler = timed_step(:resolve_handler, timings) { Handlers::Base.for_type(detected_type) }
+        unless handler
+          result = { success: false, text: nil, error: :no_handler, type: detected_type, extract_id: extract_id,
+                     step_timings: timings }
+          persist_step_timings(extract_id, timings)
+          return result
+        end
 
-
+        available = timed_step(:check_availability, timings) { handler.available? }
+        unless available
           return { success: false, text: nil, error: :gem_not_installed,
-                   gem: handler.gem_name, type: detected_type
+                   gem: handler.gem_name, type: detected_type, extract_id: extract_id,
+                   step_timings: timings }.tap { persist_step_timings(extract_id, timings) }
         end
 
         log.info "Extract starting type=#{detected_type} handler=#{handler.name}"
-        result = handler.extract(source)
+        result = timed_step(:handler_extract, timings) { handler.extract(source) }
         if result[:text]
           log.info "Extract succeeded type=#{detected_type}"
-          { success: true, text: result[:text], metadata: result[:metadata], type: detected_type
+          { success: true, text: result[:text], metadata: result[:metadata], type: detected_type,
+            extract_id: extract_id, step_timings: timings }
         else
           log.warn "Extract failed type=#{detected_type} error=#{result[:error]}"
-          { success: false, text: nil, error: result[:error], type: detected_type
-
+          { success: false, text: nil, error: result[:error], type: detected_type,
+            extract_id: extract_id, step_timings: timings }
+        end.tap { persist_step_timings(extract_id, timings) }
       rescue StandardError => e
         handle_exception(e, level: :error, handled: true, operation: :extract, type: detected_type)
-
+        persist_step_timings(extract_id, timings) if extract_id
+        { success: false, text: nil, error: e.message, type: detected_type, extract_id: extract_id,
+          step_timings: timings }
       end
 
       def supported_types
@@ -54,6 +75,48 @@ module Legion
 
       private
 
+      def timed_step(name, timings)
+        monotonic_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+        start_time = Time.now.utc
+        result = yield
+        record_step_timing(timings, name: name, start_time: start_time, monotonic_start: monotonic_start,
+                           status: :success)
+        result
+      rescue StandardError => e
+        record_step_timing(timings, name: name, start_time: start_time, monotonic_start: monotonic_start,
+                           status: :error, error: "#{e.class}: #{e.message}")
+        raise
+      end
+
+      def record_step_timing(timings, name:, start_time:, monotonic_start:, status:, error: nil)
+        end_time = Time.now.utc
+        duration_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - monotonic_start) * 1000).round
+        timings << {
+          name: name.to_s,
+          start_time: start_time,
+          end_time: end_time,
+          status: status.to_s,
+          error: error,
+          duration_ms: duration_ms
+        }
+      end
+
+      def persist_step_timings(extract_id, timings)
+        return unless defined?(Legion::Data)
+
+        connection = Legion::Data.connection
+        return unless connection&.table_exists?(:extract_step_timings)
+
+        existing_steps = connection[:extract_step_timings].where(extract_id: extract_id).select_map(:name)
+        rows = timings.reject { |timing| existing_steps.include?(timing[:name]) }.map do |timing|
+          timing.merge(extract_id: extract_id)
+        end
+        connection[:extract_step_timings].multi_insert(rows) unless rows.empty?
+      rescue StandardError => e
+        handle_exception(e, level: :warn, handled: true, operation: :persist_extract_step_timings,
+                         extract_id: extract_id)
+      end
+
       def load_all_handlers
         return if @handlers_loaded
```
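The `timed_step` pattern in this diff is worth calling out: wall-clock timestamps are stored for display, but the duration comes from the monotonic clock, so it cannot go negative if the system clock is adjusted mid-step. A standalone version of the same helper:

```ruby
# Minimal version of the timed_step / record_step_timing pair above:
# records one timing row per step, including failed steps, then
# re-raises so the caller still sees the error.
def timed_step(name, timings)
  mono_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  start_time = Time.now.utc
  result = yield
  timings << { name: name.to_s, start_time: start_time, end_time: Time.now.utc,
               status: 'success', error: nil,
               duration_ms: ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - mono_start) * 1000).round }
  result
rescue StandardError => e
  timings << { name: name.to_s, start_time: start_time, end_time: Time.now.utc,
               status: 'error', error: "#{e.class}: #{e.message}",
               duration_ms: ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - mono_start) * 1000).round }
  raise
end

timings = []
timed_step(:detect_type, timings) { sleep 0.01; :pdf }
begin
  timed_step(:handler_extract, timings) { raise 'boom' }
rescue RuntimeError
  # the failed step is still recorded before the error propagates
end
timings.each { |t| puts "#{t[:name]} #{t[:status]} #{t[:duration_ms]}ms" }
```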
data/lib/legion/data/migrations/051_fix_tasks_created_at.rb
CHANGED

```diff
@@ -14,7 +14,7 @@ Sequel.migration do
     else
       # SQLite/MySQL: add real column and backfill from created
       alter_table(:tasks) do
-        add_column :created_at, DateTime
+        add_column :created_at, DateTime
       end
 
       run 'UPDATE tasks SET created_at = created WHERE created_at IS NULL'
```
data/lib/legion/data/migrations/072_create_identity_audit_log.rb
ADDED

```ruby
# frozen_string_literal: true

Sequel.migration do
  up do
    next unless adapter_scheme == :postgres

    create_table(:identity_audit_log) do
      column :id, :uuid, default: Sequel.lit('gen_random_uuid()'), primary_key: true
      foreign_key :principal_id, :principals, type: :uuid, on_delete: :set_null
      foreign_key :identity_id, :identities, type: :uuid, on_delete: :set_null
      String :provider_name, null: false
      String :event_type, null: false
      String :trust_level
      column :detail, :jsonb, null: false, default: Sequel.lit("'{}'")
      String :node_id
      String :session_id
      DateTime :created_at, null: false, default: Sequel::CURRENT_TIMESTAMP
    end

    add_index :identity_audit_log, :principal_id
    add_index :identity_audit_log, :event_type
    add_index :identity_audit_log, :created_at
    add_index :identity_audit_log, %i[principal_id event_type created_at]
  end

  down do
    next unless adapter_scheme == :postgres

    drop_table?(:identity_audit_log)
  end
end
```
data/lib/legion/data/migrations/073_add_identity_multi_instance_columns.rb
ADDED

```ruby
# frozen_string_literal: true

Sequel.migration do
  up do
    next unless adapter_scheme == :postgres

    alter_table(:principals) do
      add_column :employee_id, String
    end
    run 'CREATE INDEX idx_principals_employee_id ON principals (employee_id) WHERE employee_id IS NOT NULL'

    alter_table(:identities) do
      add_column :account_type, String, null: false, default: 'primary'
      add_column :qualifier, String
      add_column :is_default, TrueClass, null: false, default: false
      add_column :link_evidence, String
    end

    run 'CREATE UNIQUE INDEX identities_one_default_per_provider ON identities (principal_id, provider_id) WHERE is_default = true AND active = true'
  end

  down do
    next unless adapter_scheme == :postgres

    run 'DROP INDEX IF EXISTS identities_one_default_per_provider'

    alter_table(:identities) do
      drop_column :link_evidence
      drop_column :is_default
      drop_column :qualifier
      drop_column :account_type
    end

    run 'DROP INDEX IF EXISTS idx_principals_employee_id'
    alter_table(:principals) do
      drop_column :employee_id
    end
  end
end
```
data/lib/legion/data/migrations/074_widen_apollo_entry_identifiers.rb
ADDED

```ruby
# frozen_string_literal: true

Sequel.migration do
  up do
    next unless adapter_scheme == :postgres
    next unless table_exists?(:apollo_entries)

    apollo_columns = schema(:apollo_entries).map(&:first)
    alter_table(:apollo_entries) do
      set_column_type :content_hash, String, fixed: true, size: 64 if apollo_columns.include?(:content_hash)
      set_column_type :knowledge_domain, String, size: 255 if apollo_columns.include?(:knowledge_domain)
      set_column_type :source_provider, String, size: 255 if apollo_columns.include?(:source_provider)
      set_column_type :source_agent, String, size: 255 if apollo_columns.include?(:source_agent)
    end

    next unless table_exists?(:apollo_entries_archive)

    archive_columns = schema(:apollo_entries_archive).map(&:first)
    alter_table(:apollo_entries_archive) do
      set_column_type :content_hash, String, fixed: true, size: 64 if archive_columns.include?(:content_hash)
      set_column_type :knowledge_domain, String, size: 255 if archive_columns.include?(:knowledge_domain)
      set_column_type :source_provider, String, size: 255 if archive_columns.include?(:source_provider)
      set_column_type :source_agent, String, size: 255 if archive_columns.include?(:source_agent)
    end
  end

  down do
    next unless adapter_scheme == :postgres
    next unless table_exists?(:apollo_entries)

    apollo_columns = schema(:apollo_entries).map(&:first)
    alter_table(:apollo_entries) do
      set_column_type :content_hash, String, fixed: true, size: 32 if apollo_columns.include?(:content_hash)
      set_column_type :knowledge_domain, String, size: 50 if apollo_columns.include?(:knowledge_domain)
      set_column_type :source_provider, String, size: 50 if apollo_columns.include?(:source_provider)
      set_column_type :source_agent, String, size: 50 if apollo_columns.include?(:source_agent)
    end

    next unless table_exists?(:apollo_entries_archive)

    archive_columns = schema(:apollo_entries_archive).map(&:first)
    alter_table(:apollo_entries_archive) do
      set_column_type :content_hash, String, fixed: true, size: 32 if archive_columns.include?(:content_hash)
      set_column_type :knowledge_domain, String, size: 50 if archive_columns.include?(:knowledge_domain)
      set_column_type :source_provider, String, size: 50 if archive_columns.include?(:source_provider)
      set_column_type :source_agent, String, size: 50 if archive_columns.include?(:source_agent)
    end
  end
end
```
data/lib/legion/data/migrations/075_add_task_idempotency.rb
ADDED

```ruby
# frozen_string_literal: true

Sequel.migration do
  up do
    next unless table_exists?(:tasks)

    existing_columns = schema(:tasks).map(&:first)
    alter_table(:tasks) do
      add_column :idempotency_key, String, size: 64 unless existing_columns.include?(:idempotency_key)
      add_column :idempotency_expires_at, DateTime unless existing_columns.include?(:idempotency_expires_at)
    end

    add_index :tasks, :idempotency_key, name: :idx_tasks_idempotency_key, if_not_exists: true
    add_index :tasks, :idempotency_expires_at, name: :idx_tasks_idempotency_expires_at, if_not_exists: true
  end

  down do
    next unless table_exists?(:tasks)

    existing_columns = schema(:tasks).map(&:first)
    alter_table(:tasks) do
      drop_index :idempotency_key, name: :idx_tasks_idempotency_key, if_exists: true
      drop_index :idempotency_expires_at, name: :idx_tasks_idempotency_expires_at, if_exists: true
      drop_column :idempotency_expires_at if existing_columns.include?(:idempotency_expires_at)
      drop_column :idempotency_key if existing_columns.include?(:idempotency_key)
    end
  end
end
```
data/lib/legion/data/migrations/076_create_extract_step_timings.rb
ADDED

```ruby
# frozen_string_literal: true

Sequel.migration do
  up do
    create_table?(:extract_step_timings) do
      primary_key :id
      String :extract_id, size: 36, null: false
      String :name, size: 100, null: false
      DateTime :start_time, null: false
      DateTime :end_time, null: false
      String :status, size: 20, null: false
      String :error, text: true
      Integer :duration_ms, null: false, default: 0

      index :extract_id, name: :idx_extract_step_timings_extract_id
      index %i[extract_id name], name: :idx_extract_step_timings_extract_name
      index :status, name: :idx_extract_step_timings_status
    end
  end

  down do
    drop_table?(:extract_step_timings)
  end
end
```
data/lib/legion/data/model.rb
CHANGED

```diff
@@ -14,7 +14,7 @@ module Legion
         %w[extension function relationship chain task runner node setting digital_worker
            apollo_entry apollo_relation apollo_expertise apollo_access_log audit_log
            audit_record identity_provider principal identity identity_group
-           identity_group_membership]
+           identity_group_membership identity_audit_log extract_step_timing]
       end
 
       def load
```
data/lib/legion/data/models/audit_log.rb
CHANGED

```diff
@@ -1,6 +1,7 @@
 # frozen_string_literal: true
 
 require 'legion/logging/helper'
+require 'legion/data/audit_log_hash_chain'
 
 module Legion
   module Data
@@ -33,6 +34,14 @@ module Legion
         def before_destroy
           raise 'audit_log records are immutable and cannot be deleted'
         end
+
+        def self.compute_hash(record)
+          Legion::Data::AuditLogHashChain.compute_hash(record)
+        end
+
+        def self.verify_chain(records = order(:created_at, :id).all)
+          Legion::Data::AuditLogHashChain.verify(records)
+        end
       end
     end
   end
```
data/lib/legion/data/models/identity_audit_log.rb
ADDED

```ruby
# frozen_string_literal: true

return unless Legion::Data::Connection.adapter == :postgres

module Legion
  module Data
    module Model
      class IdentityAuditLog < Sequel::Model(:identity_audit_log)
        many_to_one :principal, class: 'Legion::Data::Model::Principal'
        many_to_one :identity, class: 'Legion::Data::Model::Identity'
      end
    end
  end
end
```
data/lib/legion/data/models/task.rb
CHANGED

```diff
@@ -1,9 +1,17 @@
 # frozen_string_literal: true
 
+require 'digest'
+require 'legion/json'
+require 'time'
+
 module Legion
   module Data
     module Model
       class Task < Sequel::Model
+        TERMINAL_STATUSES = %w[
+          completed complete failed error cancelled canceled timeout timed_out
+        ].freeze
+
         many_to_one :relationship
         one_to_many :task_log
         many_to_one :parent, class: self
@@ -11,9 +19,51 @@ module Legion
         many_to_one :master, class: self
         one_to_many :slave, key: :master_id, class: self
 
+        def self.idempotency_key_for(payload)
+          Digest::SHA256.hexdigest(Legion::JSON.dump(canonical_payload(payload)))
+        end
+
+        def self.find_active_by_idempotency_key(key, now: Time.now)
+          return nil if key.to_s.empty?
+          return nil unless columns.include?(:idempotency_key)
+
+          where(idempotency_key: key)
+            .exclude(status: TERMINAL_STATUSES)
+            .where { (idempotency_expires_at =~ nil) | (idempotency_expires_at > now) }
+            .reverse_order(:created, :id)
+            .first
+        end
+
+        def self.create_idempotent(values, payload: nil, idempotency_key: nil, ttl: nil)
+          key = idempotency_key || idempotency_key_for(payload || values)
+          existing = find_active_by_idempotency_key(key)
+          return existing if existing
+
+          expires_at = ttl ? Time.now + ttl : nil
+          create(values.merge(idempotency_key: key, idempotency_expires_at: expires_at))
+        end
+
         def cancelled?
           !cancelled_at.nil?
         end
+
+        def self.canonical_payload(value)
+          case value
+          when Hash
+            value.keys.map(&:to_s).sort.to_h do |key|
+              [key, canonical_payload(value.fetch(key) { value.fetch(key.to_sym) })]
+            end
+          when Array
+            value.map { |item| canonical_payload(item) }
+          when Time
+            value.utc.iso8601(6)
+          when DateTime
+            value.to_time.utc.iso8601(6)
+          else
+            value
+          end
+        end
+        private_class_method :canonical_payload
       end
     end
   end
```
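The key property of `idempotency_key_for` is that the SHA-256 key is insensitive to hash-key ordering, because payloads are canonicalized (keys sorted as strings, recursively) before hashing. That can be demonstrated with the standard library alone; `JSON` stands in for `Legion::JSON` here, so this is a simplified sketch, not the gem's exact serialization.

```ruby
require 'digest'
require 'json'

# Simplified canonicalization mirroring Task.canonical_payload above:
# sort hash keys as strings, recurse into nested hashes and arrays.
def canonical(value)
  case value
  when Hash
    value.keys.map(&:to_s).sort.to_h do |k|
      [k, canonical(value.fetch(k) { value.fetch(k.to_sym) })]
    end
  when Array
    value.map { |item| canonical(item) }
  else
    value
  end
end

def idempotency_key_for(payload)
  Digest::SHA256.hexdigest(JSON.dump(canonical(payload)))
end

a = { runner: 'bot', args: { retries: 3, queue: 'default' } }
b = { args: { queue: 'default', retries: 3 }, runner: 'bot' }
puts idempotency_key_for(a) == idempotency_key_for(b) # key order does not matter
```

Two dispatches of the same logical payload therefore produce the same key, which is what lets `create_idempotent` find and return the existing in-flight task instead of creating a duplicate.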
data/lib/legion/data/version.rb
CHANGED
data/lib/legion/data.rb
CHANGED

```diff
@@ -16,6 +16,7 @@ require_relative 'data/helper'
 require_relative 'data/rls'
 require_relative 'data/extract'
 require_relative 'data/audit_record'
+require_relative 'data/audit_log_hash_chain'
 
 unless Legion::Logging::Helper.method_defined?(:handle_exception)
   module Legion
```
metadata
CHANGED

```diff
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: legion-data
 version: !ruby/object:Gem::Version
-  version: 1.
+  version: 1.7.3
 platform: ruby
 authors:
 - Esity
@@ -121,6 +121,7 @@ files:
 - lib/legion/data/archival.rb
 - lib/legion/data/archival/policy.rb
 - lib/legion/data/archiver.rb
+- lib/legion/data/audit_log_hash_chain.rb
 - lib/legion/data/audit_record.rb
 - lib/legion/data/connection.rb
 - lib/legion/data/encryption/cipher.rb
@@ -216,6 +217,11 @@ files:
 - lib/legion/data/migrations/069_add_principal_id_to_nodes.rb
 - lib/legion/data/migrations/070_add_approval_queue_resume.rb
 - lib/legion/data/migrations/071_add_engine_to_relationships.rb
+- lib/legion/data/migrations/072_create_identity_audit_log.rb
+- lib/legion/data/migrations/073_add_identity_multi_instance_columns.rb
+- lib/legion/data/migrations/074_widen_apollo_entry_identifiers.rb
+- lib/legion/data/migrations/075_add_task_idempotency.rb
+- lib/legion/data/migrations/076_create_extract_step_timings.rb
 - lib/legion/data/model.rb
 - lib/legion/data/models/apollo_access_log.rb
 - lib/legion/data/models/apollo_entry.rb
@@ -226,8 +232,10 @@ files:
 - lib/legion/data/models/chain.rb
 - lib/legion/data/models/digital_worker.rb
 - lib/legion/data/models/extension.rb
+- lib/legion/data/models/extract_step_timing.rb
 - lib/legion/data/models/function.rb
 - lib/legion/data/models/identity.rb
+- lib/legion/data/models/identity_audit_log.rb
 - lib/legion/data/models/identity_group.rb
 - lib/legion/data/models/identity_group_membership.rb
 - lib/legion/data/models/identity_provider.rb
```