legion-data 1.7.0 → 1.7.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 8f45999e6d1c3b0727dc351630f7c76da902a40513e9cbcd35dbce97d5c90304
- data.tar.gz: b3efe7c69e3ccb9540f44fc4e7a464428898abaa74bcddbab9c4b39f2eefc3b4
+ metadata.gz: b562a2c2ce70e81d1b71038255dd8a23a34fc78805acadc34da983dc995b2690
+ data.tar.gz: ae6cb3c93f9498b9f16df5294c3a9ce871a47723da87933c7ae5ac48b2fcf4cb
  SHA512:
- metadata.gz: 6449f9218be46e31329571cca1d9d3233860b8dd3e307eb35fcc824d3385969dd25a3493a1b0e4261786e622fe754c3d20f152bbf3bf9bfb920c0f0041103df9
- data.tar.gz: b24487e1c279cda679fac9f27b3b4d7d5af00be5652782065e0f9b3a2b78502dce1d45c0db45b3f5f3632da9fd68f18b75714825d83dc0c0cd05953a26ae6c03
+ metadata.gz: ec953bd837aa49a737584ea20c8668eb35156953593584297fc80edca0d9c7b7594a55b91976006a572b7521b797e44310d7244c4055683b5cf6c673bcddd8fd
+ data.tar.gz: 750707075046c2b09edbd5a0f8bca49059e3b60bf6707fe0ffdb694b54df868e358a308d7e86fbbaf1b1e5c005a797517edde17cfdcf0782d9338532c067d460
data/.pre-commit-config.yaml ADDED
@@ -0,0 +1,29 @@
+ # Standard LegionIO pre-commit configuration
+ # Install: pre-commit install
+ # Manual: pre-commit run --all-files
+ repos:
+   - repo: https://github.com/pre-commit/pre-commit-hooks
+     rev: v5.0.0
+     hooks:
+       - id: trailing-whitespace
+       - id: end-of-file-fixer
+       - id: check-yaml
+       - id: check-json
+         exclude: Gemfile\.lock
+       - id: check-merge-conflict
+
+   - repo: local
+     hooks:
+       - id: rubocop
+         name: RuboCop (autofix)
+         entry: scripts/pre-commit-rubocop.sh
+         language: script
+         types: [ruby]
+         pass_filenames: true
+
+       - id: ruby-syntax
+         name: Ruby syntax check
+         entry: bash -c 'status=0; for file in "$@"; do ruby -c "$file" || status=$?; done; exit $status' --
+         language: system
+         types: [ruby]
+         pass_filenames: true
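The ruby-syntax hook's `entry` relies on a `bash -c` argument-plumbing detail worth spelling out: with `bash -c 'script' arg0 args...`, the first trailing argument fills `$0`, not `$@`, so the trailing `--` is a throwaway placeholder and every staged filename lands in `"$@"`. A self-contained sketch of the same pattern (`check_all` is a hypothetical stand-in that tests file existence instead of running `ruby -c`):

```shell
# The `--` after the command string occupies the $0 slot, so the loop over
# "$@" sees every filename, including the first one.
check_all() {
  bash -c 'status=0; for f in "$@"; do [ -e "$f" ] || status=1; done; exit $status' -- "$@"
}
```

Without the placeholder, the first filename would be consumed as `$0` and silently skipped by the loop, which is exactly the class of bug the 1.7.4 changelog entry describes.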
data/CHANGELOG.md CHANGED
@@ -2,6 +2,43 @@
 
  ## [Unreleased]
 
+ ## [1.7.4] - 2026-04-28
+
+ ### Fixed
+ - Pre-commit RuboCop hook now distinguishes missing tools from real RuboCop failures and propagates failures instead of silently passing.
+ - Ruby syntax pre-commit hook now checks every staged Ruby file instead of only the first argument.
+ - Connection setup now refreshes the configured adapter before each setup call and clears fallback state on shutdown so fallback health checks do not stay stale across reconnects.
+
+ ### Changed
+ - README refreshed for the current migration count, version line, fallback diagnostics, pre-commit workflow, and recent model surface.
+
+ ## [1.7.3] - 2026-04-27
+
+ ### Added
+ - Migration 074: widens Apollo `content_hash` to 64 fixed characters and `knowledge_domain` / `source_provider` / `source_agent` to 255 characters so SHA-256 hashes and real-world identifiers fit without ingestion truncation failures. (Fixes #33, #34)
+ - Migration 075: adds task `idempotency_key` and `idempotency_expires_at` columns plus indexes for SHA-256 payload deduplication windows. (Fixes #14)
+ - Migration 076: adds `extract_step_timings` for per-step Extract pipeline timing visibility. (Fixes #15)
+ - `Task.idempotency_key_for`, `Task.find_active_by_idempotency_key`, and `Task.create_idempotent` for stable content-addressed task dispatch deduplication. (Fixes #14)
+ - Extract results now include `extract_id` and `step_timings`, and persist timing rows when the migration is present. (Fixes #15)
+ - `AuditLogHashChain` plus `AuditLog.compute_hash` / `AuditLog.verify_chain` as the canonical data-side audit log hash-chain implementation for standard write paths to share. (Refs #13)
+
+ ### Fixed
+ - Migration 051 now adds SQLite/MySQL `tasks.created_at` without a non-constant default before backfilling from `created`, allowing later migration specs and fresh SQLite databases to migrate cleanly.
+
+ ## [1.7.2] - 2026-04-27
+
+ ### Fixed
+ - Dev-fallback to SQLite now logs at `:error` level with explicit warnings that data written to SQLite will not be visible when the configured network database reconnects.
+
+ ### Added
+ - `Connection.connection_info` — returns adapter, connection state, and fallback status for health checks and diagnostics
+ - `Connection.fallback_active?` — returns true when the data layer fell back to SQLite from a configured network database; Apollo and other services can check this to detect degraded mode and log appropriate warnings
+
+ ## [1.7.1] - 2026-04-27
+
+ ### Fixed
+ - `QueryFileLogger` now treats writes after `close` as no-ops, preventing repeated `IOError: closed stream` warnings from late Sequel query callbacks during shutdown. (Fixes #35)
+
  ## [1.7.0] - 2026-04-24
 
  ### Added
data/README.md CHANGED
@@ -1,8 +1,8 @@
  # legion-data
 
- Persistent database storage for the [LegionIO](https://github.com/LegionIO/LegionIO) async job engine and AI coding assistant platform. Provides database connectivity via the [Sequel ORM](https://sequel.jeremyevans.net/), automatic schema migrations (71 numbered migrations), Sequel models for the full LegionIO control plane, and a parallel local SQLite database for on-node agentic cognitive state.
+ Persistent database storage for the [LegionIO](https://github.com/LegionIO/LegionIO) async job engine and AI coding assistant platform. Provides database connectivity via the [Sequel ORM](https://sequel.jeremyevans.net/), automatic schema migrations (76 numbered migrations), Sequel models for the full LegionIO control plane, and a parallel local SQLite database for on-node agentic cognitive state.
 
- **Version**: 1.6.25 | **Ruby**: >= 3.4 | **License**: Apache-2.0
+ **Version**: 1.7.4 | **Ruby**: >= 3.4 | **License**: Apache-2.0
 
  ---
 
@@ -51,17 +51,19 @@ Legion::Data (singleton module)
  │ ├── .adapter # Reads adapter from settings (:sqlite, :mysql2, :postgres)
  │ ├── .setup # Establish connection (dev_mode fallback to SQLite if unreachable)
  │ ├── .sequel # Raw Sequel::Database accessor
+ │ ├── .connection_info # Adapter, liveness, and fallback diagnostics
+ │ ├── .fallback_active? # True when dev fallback moved a network DB to SQLite
  │ ├── .stats # Pool metrics, tuning snapshot, adapter-specific DB stats
  │ └── .shutdown # Disconnect and close query file logger
 
- ├── Migration # Auto-migration system (71 numbered Sequel DSL migrations)
+ ├── Migration # Auto-migration system (76 numbered Sequel DSL migrations)
 
  ├── Model # Sequel model autoloader
  │ └── Models: Extension, Function, Runner, Node, Task, TaskLog, Setting,
  │     DigitalWorker, Relationship, AuditLog, AuditRecord, Chain,
  │     RbacRoleAssignment, RbacRunnerGrant, RbacCrossTeamGrant,
  │     IdentityProvider, Principal, Identity, IdentityGroup,
- │     IdentityGroupMembership,
+ │     IdentityGroupMembership, IdentityAuditLog,
  │     ApolloEntry, ApolloRelation, ApolloExpertise, ApolloAccessLog (PG only)
 
  ├── Local # Parallel local SQLite for agentic cognitive state
@@ -117,6 +119,10 @@ Legion::Data.local.db_path # => "legionio_local.db"
  Legion::Data.connected? # => true
  Legion::Data.stats # => { shared: {...}, local: {...} }
 
+ # Inspect shared DB diagnostics, including dev fallback state
+ Legion::Data::Connection.connection_info
+ # => { adapter: :sqlite, connected: true, fallback_active: false, ... }
+
  # Shut down both connections
  Legion::Data.shutdown
  ```
@@ -139,12 +145,30 @@ MyMemoryTrace.all # queries legionio_local.db, never the shared DB
  `Legion::Data::Extract` provides a handler registry for extracting text from documents, used by `lex-knowledge` for corpus ingestion:
 
  ```ruby
- text = Legion::Data::Extract.extract('/path/to/document.pdf')
- text = Legion::Data::Extract.extract('/path/to/data.csv')
+ result = Legion::Data::Extract.extract('/path/to/document.pdf')
+ text = result[:text]
+ result[:step_timings] # per-step name, start_time, end_time, status, error, duration_ms
  ```
 
  Supported formats: `.txt`, `.md`, `.csv`, `.json`, `.jsonl`, `.html`, `.xlsx`, `.docx`, `.pdf`, `.pptx`, `.vtt`
 
+ When migration 076 is present, Extract also persists the same per-step timing rows to `extract_step_timings`
+ under the returned `extract_id`.
+
+ ### Task Idempotency
+
+ `Task.idempotency_key_for` computes a stable SHA-256 key from canonical JSON payloads. `Task.create_idempotent`
+ returns an existing non-terminal task for the same key inside the optional TTL window, or creates a new task
+ with `idempotency_key` and `idempotency_expires_at` populated:
+
+ ```ruby
+ task = Legion::Data::Model::Task.create_idempotent(
+   { status: 'pending', payload: Legion::JSON.dump(payload) },
+   payload: payload,
+   ttl: 300
+ )
+ ```
+
  ### Filesystem Spool (Write Buffer)
 
  When the database is unavailable, `Legion::Data::Spool` buffers writes to `~/.legionio/data/spool/` and replays once the connection is restored:
@@ -296,6 +320,8 @@ When `dev_mode: true` and a network database is unreachable, the shared connecti
  { "data": { "dev_mode": true, "dev_fallback": true } }
  ```
 
+ Fallback is intentionally loud. `Connection.setup` logs the degraded mode at error level, `Connection.fallback_active?` returns `true`, and `Connection.connection_info` reports the configured adapter, actual adapter, connection state, and Sequel liveness. Data written during fallback is local-only SQLite data and will not appear in the configured network database after reconnect.
+
  ### HashiCorp Vault Integration
 
  When Vault is connected, credentials are fetched dynamically from `database/creds/legion`, overriding any static `creds` block.
@@ -343,6 +369,7 @@ Legion::Data.reload_static_cache
  | `Chain` | `chains` | Task execution chains |
  | `AuditLog` | `audit_log` | Tamper-evident audit trail with hash chain |
  | `AuditRecord` | `audit_records` | Structured audit records |
+ | `ExtractStepTiming` | `extract_step_timings` | Per-step Extract pipeline timing metadata |
  | `RbacRoleAssignment` | `rbac_role_assignments` | RBAC principal -> role mappings |
  | `RbacRunnerGrant` | `rbac_runner_grants` | Per-runner permission grants |
  | `RbacCrossTeamGrant` | `rbac_cross_team_grants` | Cross-team access grants |
@@ -377,7 +404,7 @@ Apollo models require PostgreSQL with the `pgvector` extension. They are skipped
 
  ## Migrations
 
- 71 numbered Sequel DSL migrations run automatically on startup (`auto_migrate: true`). Key milestones:
+ 76 numbered Sequel DSL migrations run automatically on startup (`auto_migrate: true`). Key milestones:
 
  | Range | What was added |
  |-------|---------------|
@@ -393,6 +420,8 @@ Apollo models require PostgreSQL with the `pgvector` extension. They are skipped
  | 050 | Critical indexes across 13 tables |
  | 058–067 | Audit records, chains, knowledge tiers, tool embedding cache, identity system (providers, principals, identities, groups) |
  | 068–071 | Entity type on audit records, principal on nodes, approval queue resume, engine on relationships |
+ | 072–073 | Identity audit log and multi-instance identity columns |
+ | 074–076 | Apollo field width fixes, task idempotency columns, and Extract step timing rows |
 
  Run migrations standalone:
 
@@ -442,6 +471,15 @@ bundle exec rspec # all tests must pass
  bundle exec rubocop -A # zero offenses expected
  ```
 
+ This repo also includes a pre-commit configuration:
+
+ ```bash
+ pre-commit install
+ pre-commit run --all-files
+ ```
+
+ The local RuboCop hook auto-corrects staged Ruby files when RuboCop is available and fails the commit when RuboCop reports real offenses. The Ruby syntax hook checks every staged Ruby file.
+
  Follow the [LegionIO contribution guide](https://github.com/LegionIO/.github/blob/main/CONTRIBUTING.md). Open a PR against `main`.
 
  ---
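The fallback diagnostics surfaced in the README above lend themselves to a simple consumer-side health check. A minimal sketch (the `info` hashes here are stubs shaped like the README's `connection_info` example output; `degraded?` is a hypothetical helper, not part of the gem):

```ruby
# A service is degraded if the data layer flagged a fallback, is not
# connected, or is running on a different adapter than was configured.
def degraded?(info)
  return true if info[:fallback_active]
  return true unless info[:connected]

  info[:adapter] != info[:configured_adapter]
end

healthy = { adapter: :postgres, configured_adapter: :postgres, connected: true, fallback_active: false }
fallen  = { adapter: :sqlite, configured_adapter: :postgres, connected: true, fallback_active: true }
```

The adapter comparison is the belt-and-suspenders part: even if a flag were lost, `configured_adapter != adapter` still exposes a silent SQLite fallback.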
data/lib/legion/data/audit_log_hash_chain.rb ADDED
@@ -0,0 +1,85 @@
+ # frozen_string_literal: true
+
+ require 'digest'
+ require 'legion/json'
+ require 'time'
+
+ module Legion
+   module Data
+     module AuditLogHashChain
+       GENESIS_HASH = ('0' * 64).freeze
+       CANONICAL_FIELDS = %i[
+         principal_id action resource source status detail created_at previous_hash
+       ].freeze
+
+       class << self
+         def compute_hash(record)
+           Digest::SHA256.hexdigest(canonical_payload(record))
+         end
+
+         def verify(records)
+           previous_hash = GENESIS_HASH
+           records.each do |record|
+             return invalid(record, :parent_mismatch) unless value_for(record, :previous_hash).to_s == previous_hash
+
+             expected = compute_hash(record)
+             return invalid(record, :hash_mismatch) unless value_for(record, :record_hash).to_s == expected
+
+             previous_hash = expected
+           end
+
+           { valid: true, length: records.size }
+         end
+
+         def canonical_payload(record)
+           CANONICAL_FIELDS.map do |field|
+             "#{field}:#{canonical_value(value_for(record, field))}"
+           end.join('|')
+         end
+
+         private
+
+         def invalid(record, reason)
+           { valid: false, broken_at: value_for(record, :id), reason: reason }
+         end
+
+         def canonical_value(value)
+           case value
+           when Time
+             value.utc.iso8601(6)
+           when DateTime
+             value.to_time.utc.iso8601(6)
+           when Hash
+             Legion::JSON.dump(canonical_hash(value))
+           when Array
+             Legion::JSON.dump(value.map { |item| canonical_json_value(item) })
+           else
+             value.to_s
+           end
+         end
+
+         def canonical_json_value(value)
+           case value
+           when Hash then canonical_hash(value)
+           when Array then value.map { |item| canonical_json_value(item) }
+           else value
+           end
+         end
+
+         def canonical_hash(hash)
+           hash.keys.map(&:to_s).sort.to_h do |key|
+             [key, canonical_json_value(hash.fetch(key) { hash.fetch(key.to_sym) })]
+           end
+         end
+
+         def value_for(record, field)
+           return record[field] if record.respond_to?(:[]) && !record[field].nil?
+           return record[field.to_s] if record.respond_to?(:[]) && !record[field.to_s].nil?
+           return record.public_send(field) if record.respond_to?(field)
+
+           nil
+         end
+       end
+     end
+   end
+ end
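The chaining scheme in the file above — each record hashes its canonical fields together with the previous record's hash, starting from a genesis value — can be sketched with only the standard library. This is a simplified illustration with a reduced field set; `record_hash` and `verify_chain` are illustrative stand-ins, not the gem's API:

```ruby
require 'digest'

GENESIS = '0' * 64

# Fold the previous record's hash into each record's digest, so editing
# any earlier record invalidates every later link in the chain.
def record_hash(record, previous_hash)
  payload = %i[principal_id action status].map { |f| "#{f}:#{record[f]}" }.join('|')
  Digest::SHA256.hexdigest("#{payload}|previous_hash:#{previous_hash}")
end

def verify_chain(records)
  prev = GENESIS
  records.each do |r|
    return false unless r[:previous_hash] == prev
    return false unless r[:record_hash] == record_hash(r, prev)

    prev = r[:record_hash]
  end
  true
end

r1 = { principal_id: 1, action: 'login', status: 'ok', previous_hash: GENESIS }
r1[:record_hash] = record_hash(r1, r1[:previous_hash])
r2 = { principal_id: 1, action: 'logout', status: 'ok', previous_hash: r1[:record_hash] }
r2[:record_hash] = record_hash(r2, r2[:previous_hash])
```

Tampering with `r1` after the fact breaks verification of the whole chain, which is the property the immutable `audit_log` table relies on.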
data/lib/legion/data/connection.rb CHANGED
@@ -103,12 +103,13 @@ module Legion
 
      def initialize(path)
        @path = path
+       @closed = false
+       @mutex = Mutex.new
        dir = File.dirname(path)
        FileUtils.mkdir_p(dir)
        FileUtils.chmod(0o700, dir) if File.directory?(dir)
        @file = File.open(path, File::WRONLY | File::APPEND | File::CREAT, 0o600)
        @file.sync = true
-       @mutex = Mutex.new
      end
 
      def debug(message)
@@ -128,16 +129,23 @@ module Legion
      end
 
      def close
-       @mutex.synchronize { @file.close unless @file.closed? }
+       @mutex.synchronize do
+         @closed = true
+         @file.close unless @file.closed?
+       end
      end
 
      private
 
      def write(level, message)
        @mutex.synchronize do
+         return if @closed || @file.closed?
+
          @file.puts "[#{Time.now.strftime('%Y-%m-%d %H:%M:%S.%L')}] #{level} #{message}"
        end
      rescue IOError => e
+       return nil if @closed || @file.closed?
+
        handle_exception(e, level: :warn, handled: true, operation: :query_file_write, path: @path)
        nil
      end
@@ -153,18 +161,25 @@ module Legion
      end
 
      def setup
+       @adapter = Legion::Settings[:data][:adapter]&.to_sym || :sqlite
        opts = sequel_opts
        log.info("Legion::Data::Connection setup adapter=#{adapter}")
+       @fallback_active = false
        @sequel = if adapter == :sqlite
                    ::Sequel.connect(opts.merge(adapter: :sqlite, database: sqlite_path))
                  else
+                   attempted_adapter = adapter
                    begin
-                     ::Sequel.connect(connection_opts_for(adapter: adapter, opts: opts))
+                     ::Sequel.connect(connection_opts_for(adapter: attempted_adapter, opts: opts))
                    rescue StandardError => e
                      raise unless dev_fallback?
 
-                     handle_exception(e, level: :warn, handled: true, operation: :shared_connect, fallback: :sqlite)
+                     log.error("Legion::Data FALLING BACK TO SQLITE #{attempted_adapter} network DB connection failed: #{e.message}")
+                     log.error("Legion::Data WARNING: Data written to SQLite will NOT be visible when #{attempted_adapter} reconnects. " \
+                               'Apollo knowledge, audit logs, and other DB-backed services will use a local-only store.')
+                     handle_exception(e, level: :error, handled: true, operation: :shared_connect, fallback: :sqlite)
                      @adapter = :sqlite
+                     @fallback_active = true
                      sqlite_opts = sequel_opts
                      ::Sequel.connect(sqlite_opts.merge(adapter: :sqlite, database: sqlite_path))
                    end
@@ -175,6 +190,25 @@ module Legion
        connect_with_replicas
      end
 
+     # Returns connection metadata for health checks and diagnostics.
+     # Apollo and other services can use this to detect silent fallback.
+     def connection_info
+       {
+         adapter: adapter,
+         connected: Legion::Settings[:data][:connected],
+         fallback_active: @fallback_active || false,
+         configured_adapter: Legion::Settings[:data][:adapter]&.to_sym || :sqlite,
+         sequel_alive: (begin; !@sequel&.test_connection.nil?; rescue StandardError; false; end)
+       }
+     end
+
+     # Returns true if the data layer fell back to SQLite from a configured
+     # network database (PostgreSQL/MySQL). Services should check this and
+     # log warnings when operating in degraded mode.
+     def fallback_active?
+       @fallback_active == true
+     end
+
      def stats
        return { connected: false } unless @sequel
 
@@ -228,6 +262,7 @@ module Legion
        @sequel&.disconnect
        @query_file_logger&.close
        @query_file_logger = nil
+       @fallback_active = false
        Legion::Settings[:data][:connected] = false
        log.info 'Legion::Data connection closed'
      end
data/lib/legion/data/extract.rb CHANGED
@@ -1,6 +1,7 @@
  # frozen_string_literal: true
 
  require 'legion/logging/helper'
+ require 'securerandom'
  require_relative 'extract/type_detector'
  require_relative 'extract/handlers/base'
 
@@ -11,29 +12,49 @@ module Legion
      include Legion::Logging::Helper
 
      def extract(source, type: :auto)
-       detected_type = type == :auto ? TypeDetector.detect(source) : type&.to_sym
-       return { success: false, text: nil, error: :unknown_type } unless detected_type
+       extract_id = SecureRandom.uuid
+       timings = []
+       detected_type = timed_step(:detect_type, timings) do
+         type == :auto ? TypeDetector.detect(source) : type&.to_sym
+       end
+       unless detected_type
+         result = { success: false, text: nil, error: :unknown_type, extract_id: extract_id,
+                    step_timings: timings }
+         persist_step_timings(extract_id, timings)
+         return result
+       end
 
-       handler = Handlers::Base.for_type(detected_type)
-       return { success: false, text: nil, error: :no_handler, type: detected_type } unless handler
+       handler = timed_step(:resolve_handler, timings) { Handlers::Base.for_type(detected_type) }
+       unless handler
+         result = { success: false, text: nil, error: :no_handler, type: detected_type, extract_id: extract_id,
+                    step_timings: timings }
+         persist_step_timings(extract_id, timings)
+         return result
+       end
 
-       unless handler.available?
+       available = timed_step(:check_availability, timings) { handler.available? }
+       unless available
          return { success: false, text: nil, error: :gem_not_installed,
-                  gem: handler.gem_name, type: detected_type }
+                  gem: handler.gem_name, type: detected_type, extract_id: extract_id,
+                  step_timings: timings }.tap { persist_step_timings(extract_id, timings) }
        end
 
        log.info "Extract starting type=#{detected_type} handler=#{handler.name}"
-       result = handler.extract(source)
+       result = timed_step(:handler_extract, timings) { handler.extract(source) }
        if result[:text]
          log.info "Extract succeeded type=#{detected_type}"
-         { success: true, text: result[:text], metadata: result[:metadata], type: detected_type }
+         { success: true, text: result[:text], metadata: result[:metadata], type: detected_type,
+           extract_id: extract_id, step_timings: timings }
        else
          log.warn "Extract failed type=#{detected_type} error=#{result[:error]}"
-         { success: false, text: nil, error: result[:error], type: detected_type }
-       end
+         { success: false, text: nil, error: result[:error], type: detected_type,
+           extract_id: extract_id, step_timings: timings }
+       end.tap { persist_step_timings(extract_id, timings) }
      rescue StandardError => e
        handle_exception(e, level: :error, handled: true, operation: :extract, type: detected_type)
-       { success: false, text: nil, error: e.message, type: detected_type }
+       persist_step_timings(extract_id, timings) if extract_id
+       { success: false, text: nil, error: e.message, type: detected_type, extract_id: extract_id,
+         step_timings: timings }
      end
 
      def supported_types
@@ -54,6 +75,48 @@ module Legion
 
      private
 
+     def timed_step(name, timings)
+       monotonic_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+       start_time = Time.now.utc
+       result = yield
+       record_step_timing(timings, name: name, start_time: start_time, monotonic_start: monotonic_start,
+                          status: :success)
+       result
+     rescue StandardError => e
+       record_step_timing(timings, name: name, start_time: start_time, monotonic_start: monotonic_start,
+                          status: :error, error: "#{e.class}: #{e.message}")
+       raise
+     end
+
+     def record_step_timing(timings, name:, start_time:, monotonic_start:, status:, error: nil)
+       end_time = Time.now.utc
+       duration_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - monotonic_start) * 1000).round
+       timings << {
+         name: name.to_s,
+         start_time: start_time,
+         end_time: end_time,
+         status: status.to_s,
+         error: error,
+         duration_ms: duration_ms
+       }
+     end
+
+     def persist_step_timings(extract_id, timings)
+       return unless defined?(Legion::Data)
+
+       connection = Legion::Data.connection
+       return unless connection&.table_exists?(:extract_step_timings)
+
+       existing_steps = connection[:extract_step_timings].where(extract_id: extract_id).select_map(:name)
+       rows = timings.reject { |timing| existing_steps.include?(timing[:name]) }.map do |timing|
+         timing.merge(extract_id: extract_id)
+       end
+       connection[:extract_step_timings].multi_insert(rows) unless rows.empty?
+     rescue StandardError => e
+       handle_exception(e, level: :warn, handled: true, operation: :persist_extract_step_timings,
+                        extract_id: extract_id)
+     end
+
      def load_all_handlers
        return if @handlers_loaded
 
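The `timed_step` helper above follows a reusable pattern: wall-clock timestamps for the persisted row, a monotonic clock for the duration (immune to system clock adjustments), and error capture that still re-raises. A self-contained sketch of the same idea, outside the gem:

```ruby
# Time a block: record name, start time, status, and duration; on failure,
# record the error and re-raise so callers still see the exception.
def timed_step(name, timings)
  mono_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  started_at = Time.now.utc
  result = yield
  timings << { name: name.to_s, start_time: started_at, status: 'success', error: nil,
               duration_ms: ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - mono_start) * 1000).round }
  result
rescue StandardError => e
  timings << { name: name.to_s, start_time: started_at, status: 'error', error: "#{e.class}: #{e.message}",
               duration_ms: ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - mono_start) * 1000).round }
  raise
end

timings = []
value = timed_step(:parse, timings) { 21 * 2 }
```

Because the rescue branch re-raises, wrapping a step in `timed_step` never swallows a failure; it only adds an `error` row before the exception propagates.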
@@ -14,7 +14,7 @@ Sequel.migration do
      else
        # SQLite/MySQL: add real column and backfill from created
        alter_table(:tasks) do
-         add_column :created_at, DateTime, default: Sequel::CURRENT_TIMESTAMP
+         add_column :created_at, DateTime
        end
 
        run 'UPDATE tasks SET created_at = created WHERE created_at IS NULL'
data/lib/legion/data/migrations/074_widen_apollo_entry_identifiers.rb ADDED
@@ -0,0 +1,49 @@
+ # frozen_string_literal: true
+
+ Sequel.migration do
+   up do
+     next unless adapter_scheme == :postgres
+     next unless table_exists?(:apollo_entries)
+
+     apollo_columns = schema(:apollo_entries).map(&:first)
+     alter_table(:apollo_entries) do
+       set_column_type :content_hash, String, fixed: true, size: 64 if apollo_columns.include?(:content_hash)
+       set_column_type :knowledge_domain, String, size: 255 if apollo_columns.include?(:knowledge_domain)
+       set_column_type :source_provider, String, size: 255 if apollo_columns.include?(:source_provider)
+       set_column_type :source_agent, String, size: 255 if apollo_columns.include?(:source_agent)
+     end
+
+     next unless table_exists?(:apollo_entries_archive)
+
+     archive_columns = schema(:apollo_entries_archive).map(&:first)
+     alter_table(:apollo_entries_archive) do
+       set_column_type :content_hash, String, fixed: true, size: 64 if archive_columns.include?(:content_hash)
+       set_column_type :knowledge_domain, String, size: 255 if archive_columns.include?(:knowledge_domain)
+       set_column_type :source_provider, String, size: 255 if archive_columns.include?(:source_provider)
+       set_column_type :source_agent, String, size: 255 if archive_columns.include?(:source_agent)
+     end
+   end
+
+   down do
+     next unless adapter_scheme == :postgres
+     next unless table_exists?(:apollo_entries)
+
+     apollo_columns = schema(:apollo_entries).map(&:first)
+     alter_table(:apollo_entries) do
+       set_column_type :content_hash, String, fixed: true, size: 32 if apollo_columns.include?(:content_hash)
+       set_column_type :knowledge_domain, String, size: 50 if apollo_columns.include?(:knowledge_domain)
+       set_column_type :source_provider, String, size: 50 if apollo_columns.include?(:source_provider)
+       set_column_type :source_agent, String, size: 50 if apollo_columns.include?(:source_agent)
+     end
+
+     next unless table_exists?(:apollo_entries_archive)
+
+     archive_columns = schema(:apollo_entries_archive).map(&:first)
+     alter_table(:apollo_entries_archive) do
+       set_column_type :content_hash, String, fixed: true, size: 32 if archive_columns.include?(:content_hash)
+       set_column_type :knowledge_domain, String, size: 50 if archive_columns.include?(:knowledge_domain)
+       set_column_type :source_provider, String, size: 50 if archive_columns.include?(:source_provider)
+       set_column_type :source_agent, String, size: 50 if archive_columns.include?(:source_agent)
+     end
+   end
+ end
data/lib/legion/data/migrations/075_add_task_idempotency.rb ADDED
@@ -0,0 +1,28 @@
+ # frozen_string_literal: true
+
+ Sequel.migration do
+   up do
+     next unless table_exists?(:tasks)
+
+     existing_columns = schema(:tasks).map(&:first)
+     alter_table(:tasks) do
+       add_column :idempotency_key, String, size: 64 unless existing_columns.include?(:idempotency_key)
+       add_column :idempotency_expires_at, DateTime unless existing_columns.include?(:idempotency_expires_at)
+     end
+
+     add_index :tasks, :idempotency_key, name: :idx_tasks_idempotency_key, if_not_exists: true
+     add_index :tasks, :idempotency_expires_at, name: :idx_tasks_idempotency_expires_at, if_not_exists: true
+   end
+
+   down do
+     next unless table_exists?(:tasks)
+
+     existing_columns = schema(:tasks).map(&:first)
+     alter_table(:tasks) do
+       drop_index :idempotency_key, name: :idx_tasks_idempotency_key, if_exists: true
+       drop_index :idempotency_expires_at, name: :idx_tasks_idempotency_expires_at, if_exists: true
+       drop_column :idempotency_expires_at if existing_columns.include?(:idempotency_expires_at)
+       drop_column :idempotency_key if existing_columns.include?(:idempotency_key)
+     end
+   end
+ end
data/lib/legion/data/migrations/076_create_extract_step_timings.rb ADDED
@@ -0,0 +1,24 @@
+ # frozen_string_literal: true
+
+ Sequel.migration do
+   up do
+     create_table?(:extract_step_timings) do
+       primary_key :id
+       String :extract_id, size: 36, null: false
+       String :name, size: 100, null: false
+       DateTime :start_time, null: false
+       DateTime :end_time, null: false
+       String :status, size: 20, null: false
+       String :error, text: true
+       Integer :duration_ms, null: false, default: 0
+
+       index :extract_id, name: :idx_extract_step_timings_extract_id
+       index %i[extract_id name], name: :idx_extract_step_timings_extract_name
+       index :status, name: :idx_extract_step_timings_status
+     end
+   end
+
+   down do
+     drop_table?(:extract_step_timings)
+   end
+ end
data/lib/legion/data/model.rb CHANGED
@@ -14,7 +14,7 @@ module Legion
        %w[extension function relationship chain task runner node setting digital_worker
           apollo_entry apollo_relation apollo_expertise apollo_access_log audit_log
           audit_record identity_provider principal identity identity_group
-          identity_group_membership identity_audit_log]
+          identity_group_membership identity_audit_log extract_step_timing]
      end
 
      def load
data/lib/legion/data/models/audit_log.rb CHANGED
@@ -1,6 +1,7 @@
  # frozen_string_literal: true
 
  require 'legion/logging/helper'
+ require 'legion/data/audit_log_hash_chain'
 
  module Legion
    module Data
@@ -33,6 +34,14 @@ module Legion
        def before_destroy
          raise 'audit_log records are immutable and cannot be deleted'
        end
+
+       def self.compute_hash(record)
+         Legion::Data::AuditLogHashChain.compute_hash(record)
+       end
+
+       def self.verify_chain(records = order(:created_at, :id).all)
+         Legion::Data::AuditLogHashChain.verify(records)
+       end
      end
    end
  end
data/lib/legion/data/models/extract_step_timing.rb ADDED
@@ -0,0 +1,10 @@
+ # frozen_string_literal: true
+
+ module Legion
+   module Data
+     module Model
+       class ExtractStepTiming < Sequel::Model(:extract_step_timings)
+       end
+     end
+   end
+ end
data/lib/legion/data/models/task.rb CHANGED
@@ -1,9 +1,17 @@
  # frozen_string_literal: true
 
+ require 'digest'
+ require 'legion/json'
+ require 'time'
+
  module Legion
    module Data
      module Model
        class Task < Sequel::Model
+         TERMINAL_STATUSES = %w[
+           completed complete failed error cancelled canceled timeout timed_out
+         ].freeze
+
          many_to_one :relationship
          one_to_many :task_log
          many_to_one :parent, class: self
@@ -11,9 +19,51 @@ module Legion
          many_to_one :master, class: self
          one_to_many :slave, key: :master_id, class: self
 
+         def self.idempotency_key_for(payload)
+           Digest::SHA256.hexdigest(Legion::JSON.dump(canonical_payload(payload)))
+         end
+
+         def self.find_active_by_idempotency_key(key, now: Time.now)
+           return nil if key.to_s.empty?
+           return nil unless columns.include?(:idempotency_key)
+
+           where(idempotency_key: key)
+             .exclude(status: TERMINAL_STATUSES)
+             .where { (idempotency_expires_at =~ nil) | (idempotency_expires_at > now) }
+             .reverse_order(:created, :id)
+             .first
+         end
+
+         def self.create_idempotent(values, payload: nil, idempotency_key: nil, ttl: nil)
+           key = idempotency_key || idempotency_key_for(payload || values)
+           existing = find_active_by_idempotency_key(key)
+           return existing if existing
+
+           expires_at = ttl ? Time.now + ttl : nil
+           create(values.merge(idempotency_key: key, idempotency_expires_at: expires_at))
+         end
+
          def cancelled?
            !cancelled_at.nil?
          end
+
+         def self.canonical_payload(value)
+           case value
+           when Hash
+             value.keys.map(&:to_s).sort.to_h do |key|
+               [key, canonical_payload(value.fetch(key) { value.fetch(key.to_sym) })]
+             end
+           when Array
+             value.map { |item| canonical_payload(item) }
+           when Time
+             value.utc.iso8601(6)
+           when DateTime
+             value.to_time.utc.iso8601(6)
+           else
+             value
+           end
+         end
+         private_class_method :canonical_payload
        end
      end
    end
data/lib/legion/data/version.rb CHANGED
@@ -2,6 +2,6 @@
 
  module Legion
    module Data
-     VERSION = '1.7.0'
+     VERSION = '1.7.4'
    end
  end
data/lib/legion/data.rb CHANGED
@@ -16,6 +16,7 @@ require_relative 'data/helper'
  require_relative 'data/rls'
  require_relative 'data/extract'
  require_relative 'data/audit_record'
+ require_relative 'data/audit_log_hash_chain'
 
  unless Legion::Logging::Helper.method_defined?(:handle_exception)
    module Legion
data/scripts/pre-commit-rubocop.sh ADDED
@@ -0,0 +1,16 @@
+ #!/usr/bin/env bash
+ # Pre-commit hook: run RuboCop with autofix on staged Ruby files.
+ set -uo pipefail
+
+ FILES=("$@")
+
+ if command -v rubocop >/dev/null 2>&1; then
+   exec rubocop -A --force-exclusion "${FILES[@]}"
+ fi
+
+ if bundle exec rubocop -v >/dev/null 2>&1; then
+   exec bundle exec rubocop -A --force-exclusion "${FILES[@]}"
+ fi
+
+ echo "RuboCop is not available locally; CI will enforce RuboCop."
+ exit 0
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: legion-data
  version: !ruby/object:Gem::Version
-   version: 1.7.0
+   version: 1.7.4
  platform: ruby
  authors:
  - Esity
@@ -107,6 +107,7 @@ files:
  - ".github/dependabot.yml"
  - ".github/workflows/ci.yml"
  - ".gitignore"
+ - ".pre-commit-config.yaml"
  - ".rubocop.yml"
  - AGENTS.md
  - CHANGELOG.md
@@ -121,6 +122,7 @@ files:
  - lib/legion/data/archival.rb
  - lib/legion/data/archival/policy.rb
  - lib/legion/data/archiver.rb
+ - lib/legion/data/audit_log_hash_chain.rb
  - lib/legion/data/audit_record.rb
  - lib/legion/data/connection.rb
  - lib/legion/data/encryption/cipher.rb
@@ -218,6 +220,9 @@ files:
  - lib/legion/data/migrations/071_add_engine_to_relationships.rb
  - lib/legion/data/migrations/072_create_identity_audit_log.rb
  - lib/legion/data/migrations/073_add_identity_multi_instance_columns.rb
+ - lib/legion/data/migrations/074_widen_apollo_entry_identifiers.rb
+ - lib/legion/data/migrations/075_add_task_idempotency.rb
+ - lib/legion/data/migrations/076_create_extract_step_timings.rb
  - lib/legion/data/model.rb
  - lib/legion/data/models/apollo_access_log.rb
  - lib/legion/data/models/apollo_entry.rb
@@ -228,6 +233,7 @@ files:
  - lib/legion/data/models/chain.rb
  - lib/legion/data/models/digital_worker.rb
  - lib/legion/data/models/extension.rb
+ - lib/legion/data/models/extract_step_timing.rb
  - lib/legion/data/models/function.rb
  - lib/legion/data/models/identity.rb
  - lib/legion/data/models/identity_audit_log.rb
@@ -252,6 +258,7 @@ files:
  - lib/legion/data/storage_tiers.rb
  - lib/legion/data/vector.rb
  - lib/legion/data/version.rb
+ - scripts/pre-commit-rubocop.sh
  homepage: https://github.com/LegionIO/legion-data
  licenses:
  - Apache-2.0