language-operator 0.1.65 → 0.1.66
- checksums.yaml +4 -4
- data/Gemfile.lock +1 -1
- data/README.md +20 -1
- data/components/agent/Gemfile +1 -1
- data/docs/observability.md +208 -0
- data/lib/language_operator/agent/task_executor.rb +11 -1
- data/lib/language_operator/agent.rb +24 -14
- data/lib/language_operator/cli/commands/agent/base.rb +140 -47
- data/lib/language_operator/cli/commands/agent/code_operations.rb +157 -16
- data/lib/language_operator/cli/errors/suggestions.rb +1 -1
- data/lib/language_operator/constants.rb +1 -0
- data/lib/language_operator/kubernetes/client.rb +1 -1
- data/lib/language_operator/version.rb +1 -1
- data/synth/003/Makefile +12 -2
- data/synth/003/agent.txt +1 -1
- metadata +4 -6
- data/lib/language_operator/cli/commands/agent/learning.rb +0 -408
- data/synth/003/agent.optimized.rb +0 -66
- data/synth/003/agent.synthesized.rb +0 -41
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: f9121bdf48b4ee7bad4c9918d68ddf554966acd1c264cd8e1a1cadc13e5c6459
|
|
4
|
+
data.tar.gz: cf90195c887a60165e1aee1990e2dafe2b69a37c736362d1bcbf341dd21dfb3f
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: f533c69e38c4bc604b939e54b743228a4640611292d5afbe7cbaa79f0b4f05436c8e1360cb64a8f1de623acddc08d291d39a4381817a530bd4b96f4408699913
|
|
7
|
+
data.tar.gz: 1511bba1d7471abaa63e1029fe9b229414dd9f2b745fb2f38b83d073ac529506b00a913fd56418c4a461c06bf061a0526973a5213574ed2cc70332a0249d8d00
|
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
|
@@ -2,4 +2,23 @@
|
|
|
2
2
|
|
|
3
3
|
[](https://rubygems.org/gems/language-operator)
|
|
4
4
|
|
|
5
|
-
This gem is experimental, used by [language-operator](https://github.com/language-operator/language-operator), and not ready for production.
|
|
5
|
+
This gem is experimental, used by [language-operator](https://github.com/language-operator/language-operator), and not ready for production.
|
|
6
|
+
|
|
7
|
+
## Observability
|
|
8
|
+
|
|
9
|
+
The gem includes comprehensive OpenTelemetry instrumentation for monitoring agent executions and enabling the learning system to optimize performance.
|
|
10
|
+
|
|
11
|
+
**Span Hierarchy:**
|
|
12
|
+
```
|
|
13
|
+
agent_executor (parent span - overall agent run)
|
|
14
|
+
└── task_executor.execute_task (child span - task execution)
|
|
15
|
+
    └── execute_tool #{tool_name} (grandchild span - tool calls)
|
|
16
|
+
```
|
|
17
|
+
|
|
18
|
+
**Key Features:**
|
|
19
|
+
- Automatic trace generation following OpenTelemetry GenAI conventions
|
|
20
|
+
- Learning system integration via standardized span names and attributes
|
|
21
|
+
- Optional data capture with privacy controls
|
|
22
|
+
- Performance monitoring and debugging support
|
|
23
|
+
|
|
24
|
+
For detailed information, see [docs/observability.md](./docs/observability.md).
|
data/components/agent/Gemfile
CHANGED
data/docs/observability.md
ADDED
|
@@ -0,0 +1,208 @@
|
|
|
1
|
+
# Observability and Telemetry
|
|
2
|
+
|
|
3
|
+
The Language Operator gem includes comprehensive OpenTelemetry instrumentation to enable observability, debugging, and optimization of agent executions.
|
|
4
|
+
|
|
5
|
+
## OpenTelemetry Integration
|
|
6
|
+
|
|
7
|
+
The gem automatically instruments agent executions with OpenTelemetry spans, following the [OpenTelemetry Semantic Conventions for GenAI](https://opentelemetry.io/docs/specs/semconv/gen-ai/).
|
|
8
|
+
|
|
9
|
+
### Configuration
|
|
10
|
+
|
|
11
|
+
Configure telemetry via environment variables:
|
|
12
|
+
|
|
13
|
+
```bash
|
|
14
|
+
# Basic telemetry (always enabled)
|
|
15
|
+
OTEL_EXPORTER_OTLP_ENDPOINT=https://your-otel-collector:4317
|
|
16
|
+
|
|
17
|
+
# Data capture controls (optional - defaults to metadata only)
|
|
18
|
+
CAPTURE_TASK_INPUTS=true # Capture full task inputs as JSON
|
|
19
|
+
CAPTURE_TASK_OUTPUTS=true # Capture full task outputs as JSON
|
|
20
|
+
CAPTURE_TOOL_ARGS=true # Capture tool call arguments
|
|
21
|
+
CAPTURE_TOOL_RESULTS=true # Capture tool call results
|
|
22
|
+
```
|
|
23
|
+
|
|
24
|
+
**Security Note:** Data capture is disabled by default to prevent sensitive information leakage. Only enable full data capture in secure environments.
|
|
25
|
+
|
|
26
|
+
## Span Hierarchy
|
|
27
|
+
|
|
28
|
+
The gem creates a hierarchical trace structure that enables the learning system to identify and analyze complete agent executions:
|
|
29
|
+
|
|
30
|
+
```
|
|
31
|
+
agent_executor (parent span - overall agent run)
|
|
32
|
+
└── task_executor.execute_task (child span - task 1)
|
|
33
|
+
    └── execute_tool github (grandchild span - tool call 1)
|
|
34
|
+
    └── execute_tool slack (grandchild span - tool call 2)
|
|
35
|
+
└── task_executor.execute_task (child span - task 2)
|
|
36
|
+
└── task_executor.execute_task (child span - task 3)
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
### Span Names
|
|
40
|
+
|
|
41
|
+
| Span Name | Purpose | Created By |
|
|
42
|
+
|-----------|---------|------------|
|
|
43
|
+
| `agent_executor` | Overall agent execution | `LanguageOperator::Agent.execute_main_block()` |
|
|
44
|
+
| `task_executor.execute_task` | Individual task execution | `TaskExecutor#execute_task()` |
|
|
45
|
+
| `execute_tool #{tool_name}` | Tool calls from LLM responses | `TaskTracer#record_single_tool_call()` |
|
|
46
|
+
| `execute_tool.#{tool_name}` | Direct tool calls from symbolic tasks | `Client::Base` tool wrapper |
|
|
47
|
+
|
|
48
|
+
## Span Attributes
|
|
49
|
+
|
|
50
|
+
### Agent Executor Span
|
|
51
|
+
|
|
52
|
+
The top-level `agent_executor` span includes:
|
|
53
|
+
|
|
54
|
+
```
|
|
55
|
+
agent.name: "my-agent" # Agent identifier
|
|
56
|
+
agent.task_count: 5 # Number of tasks in agent
|
|
57
|
+
agent.mode: "autonomous" # Execution mode (autonomous/scheduled/interactive)
|
|
58
|
+
```
|
|
59
|
+
|
|
60
|
+
### Task Executor Span
|
|
61
|
+
|
|
62
|
+
Each `task_executor.execute_task` span includes:
|
|
63
|
+
|
|
64
|
+
```
|
|
65
|
+
# Core identification (CRITICAL for learning system)
|
|
66
|
+
task.name: "fetch_user_data" # Task identifier
|
|
67
|
+
gen_ai.operation.name: "execute_task" # Operation type
|
|
68
|
+
|
|
69
|
+
# Execution metadata
|
|
70
|
+
task.max_retries: 3 # Retry configuration
|
|
71
|
+
task.timeout: 30000 # Timeout in milliseconds
|
|
72
|
+
task.type: "hybrid" # Task type (neural/symbolic/hybrid)
|
|
73
|
+
task.has_neural: "true" # Has neural implementation
|
|
74
|
+
task.has_symbolic: "false" # Has symbolic implementation
|
|
75
|
+
|
|
76
|
+
# Agent context
|
|
77
|
+
agent.name: "my-agent" # Agent identifier (explicit for learning system)
|
|
78
|
+
|
|
79
|
+
# Data capture (when enabled)
|
|
80
|
+
task.inputs: '{"user_id": 123}' # JSON-encoded inputs (CAPTURE_TASK_INPUTS=true)
|
|
81
|
+
task.outputs: '{"user": {...}}' # JSON-encoded outputs (CAPTURE_TASK_OUTPUTS=true)
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
### Tool Call Spans
|
|
85
|
+
|
|
86
|
+
Tool calls create spans with names like `execute_tool #{tool_name}` and include:
|
|
87
|
+
|
|
88
|
+
```
|
|
89
|
+
# GenAI semantic attributes
|
|
90
|
+
gen_ai.operation.name: "execute_tool" # Operation type
|
|
91
|
+
gen_ai.tool.name: "github" # Tool identifier
|
|
92
|
+
gen_ai.tool.call.id: "call_123" # Call ID (if available)
|
|
93
|
+
|
|
94
|
+
# Data capture (when enabled)
|
|
95
|
+
gen_ai.tool.call.arguments: '{"repo": "..."}' # JSON arguments (CAPTURE_TOOL_ARGS=true)
|
|
96
|
+
gen_ai.tool.call.result: '{"status": "ok"}' # JSON result (CAPTURE_TOOL_RESULTS=true)
|
|
97
|
+
|
|
98
|
+
# Size metadata (always captured)
|
|
99
|
+
gen_ai.tool.call.arguments.size: 45 # Arguments size in bytes
|
|
100
|
+
gen_ai.tool.call.result.size: 1024 # Result size in bytes
|
|
101
|
+
```
|
|
102
|
+
|
|
103
|
+
## Learning System Integration
|
|
104
|
+
|
|
105
|
+
This span naming convention enables the language-operator Kubernetes controller to:
|
|
106
|
+
|
|
107
|
+
1. **Identify Task Executions**: Query traces by `task_executor.execute_task` spans
|
|
108
|
+
2. **Group by Agent**: Filter by `agent.name` attribute
|
|
109
|
+
3. **Analyze Patterns**: Extract execution patterns from span attributes
|
|
110
|
+
4. **Build Optimizations**: Create optimized implementations based on trace analysis
|
|
111
|
+
|
|
112
|
+
### Example OTLP Query
|
|
113
|
+
|
|
114
|
+
To find all task executions for an agent:
|
|
115
|
+
|
|
116
|
+
```sql
|
|
117
|
+
SELECT * FROM spans
|
|
118
|
+
WHERE name = 'task_executor.execute_task'
|
|
119
|
+
AND attributes['agent.name'] = 'my-agent'
|
|
120
|
+
AND start_time > NOW() - INTERVAL '1 hour'
|
|
121
|
+
```
|
|
122
|
+
|
|
123
|
+
## Data Privacy and Security
|
|
124
|
+
|
|
125
|
+
### Default Behavior (Secure)
|
|
126
|
+
|
|
127
|
+
By default, the gem captures:
|
|
128
|
+
- ✅ Task names and metadata
|
|
129
|
+
- ✅ Execution timing and counts
|
|
130
|
+
- ✅ Tool names and call frequencies
|
|
131
|
+
- ✅ Data sizes (bytes)
|
|
132
|
+
- ❌ **NOT** actual data content
|
|
133
|
+
|
|
134
|
+
### Full Data Capture (Optional)
|
|
135
|
+
|
|
136
|
+
When explicitly enabled, the gem additionally captures:
|
|
137
|
+
- ⚠️ Complete task inputs and outputs as JSON
|
|
138
|
+
- ⚠️ Tool call arguments and results
|
|
139
|
+
- ⚠️ LLM prompts and responses
|
|
140
|
+
|
|
141
|
+
**Warning:** Only enable full data capture in development or secure production environments. Captured data may contain sensitive information.
|
|
142
|
+
|
|
143
|
+
### Data Sanitization
|
|
144
|
+
|
|
145
|
+
When full capture is enabled, the gem:
|
|
146
|
+
- Truncates large payloads (>1000 chars for span attributes)
|
|
147
|
+
- Converts complex objects to JSON automatically
|
|
148
|
+
- Respects OpenTelemetry attribute limits
|
|
149
|
+
|
|
150
|
+
## Performance Impact
|
|
151
|
+
|
|
152
|
+
Telemetry overhead is minimal:
|
|
153
|
+
- **Default mode**: <5% performance overhead
|
|
154
|
+
- **Full capture mode**: ~10% performance overhead
|
|
155
|
+
- **Span creation**: <1ms per span
|
|
156
|
+
- **Data serialization**: 1-5ms for complex objects
|
|
157
|
+
|
|
158
|
+
## Debugging with Traces
|
|
159
|
+
|
|
160
|
+
### Common Queries
|
|
161
|
+
|
|
162
|
+
**Find slow tasks:**
|
|
163
|
+
```sql
|
|
164
|
+
SELECT attributes['task.name'], duration_ms
|
|
165
|
+
FROM spans
|
|
166
|
+
WHERE name = 'task_executor.execute_task'
|
|
167
|
+
AND duration_ms > 5000
|
|
168
|
+
ORDER BY duration_ms DESC
|
|
169
|
+
```
|
|
170
|
+
|
|
171
|
+
**Tool usage analysis:**
|
|
172
|
+
```sql
|
|
173
|
+
SELECT attributes['gen_ai.tool.name'], COUNT(*)
|
|
174
|
+
FROM spans
|
|
175
|
+
WHERE name LIKE 'execute_tool%'
|
|
176
|
+
GROUP BY attributes['gen_ai.tool.name']
|
|
177
|
+
```
|
|
178
|
+
|
|
179
|
+
**Agent execution frequency:**
|
|
180
|
+
```sql
|
|
181
|
+
SELECT attributes['agent.name'], COUNT(*) as executions
|
|
182
|
+
FROM spans
|
|
183
|
+
WHERE name = 'agent_executor'
|
|
184
|
+
AND start_time > NOW() - INTERVAL '24 hours'
|
|
185
|
+
GROUP BY attributes['agent.name']
|
|
186
|
+
```
|
|
187
|
+
|
|
188
|
+
### Trace Sampling
|
|
189
|
+
|
|
190
|
+
For high-volume agents, consider trace sampling:
|
|
191
|
+
|
|
192
|
+
```bash
|
|
193
|
+
# Sample 10% of traces
|
|
194
|
+
OTEL_TRACES_SAMPLER=parentbased_traceidratio
|
|
195
|
+
OTEL_TRACES_SAMPLER_ARG=0.1
|
|
196
|
+
```
|
|
197
|
+
|
|
198
|
+
## Related Documentation
|
|
199
|
+
|
|
200
|
+
- [Agent Runtime Architecture](./agent-internals.md) - How agents execute
|
|
201
|
+
- [Best Practices](./best-practices.md) - Production deployment guidance
|
|
202
|
+
- [Understanding Generated Code](./understanding-generated-code.md) - Agent code structure
|
|
203
|
+
|
|
204
|
+
## External Resources
|
|
205
|
+
|
|
206
|
+
- [OpenTelemetry Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/)
|
|
207
|
+
- [Language Operator Controller](https://github.com/language-operator/language-operator) - Learning system implementation
|
|
208
|
+
- [OTLP Specification](https://opentelemetry.io/docs/specs/otlp/) - Wire format
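The capture flags and the truncation rule described in docs/observability.md can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the gem's implementation: the helper names `capture_flag?` and `sanitize_attribute`, the exact truthy check (`== 'true'`), and the `...` suffix are hypothetical. Only the environment variable names and the 1000-character limit come from the document.

```ruby
require 'json'

# Limit stated in docs/observability.md for span attribute payloads.
MAX_ATTRIBUTE_CHARS = 1000

# Environment variable names taken from the Configuration section.
CAPTURE_ENV_VARS = {
  inputs: 'CAPTURE_TASK_INPUTS',
  outputs: 'CAPTURE_TASK_OUTPUTS',
  tool_args: 'CAPTURE_TOOL_ARGS',
  tool_results: 'CAPTURE_TOOL_RESULTS'
}.freeze

# Hypothetical helper: true only when the matching variable is set to "true".
# `env` defaults to ENV but accepts any object that responds to #fetch.
def capture_flag?(kind, env = ENV)
  env.fetch(CAPTURE_ENV_VARS.fetch(kind), 'false').to_s.downcase == 'true'
end

# Hypothetical helper: serializes non-string values to JSON, then truncates
# anything longer than the limit (the "..." suffix is an assumption).
def sanitize_attribute(value, limit = MAX_ATTRIBUTE_CHARS)
  text = value.is_a?(String) ? value : JSON.generate(value)
  text.length > limit ? "#{text[0, limit]}..." : text
end
```

With no variables set, every `capture_flag?` call returns `false`, which matches the metadata-only default the document describes.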
|
data/lib/language_operator/agent/task_executor.rb
CHANGED
|
@@ -138,6 +138,10 @@ module LanguageOperator
|
|
|
138
138
|
# Execute with retry logic
|
|
139
139
|
result = execute_with_retry(task, task_name, inputs, timeout, max_retries, execution_start)
|
|
140
140
|
|
|
141
|
+
# Add task outputs to span for learning system (if enabled)
|
|
142
|
+
current_span = OpenTelemetry::Trace.current_span
|
|
143
|
+
current_span&.set_attribute('task.outputs', result.to_json) if current_span && capture_enabled?(:outputs)
|
|
144
|
+
|
|
141
145
|
# Emit Kubernetes event for successful task completion
|
|
142
146
|
emit_task_execution_event(task_name, success: true, execution_start: execution_start)
|
|
143
147
|
|
|
@@ -1023,13 +1027,19 @@ module LanguageOperator
|
|
|
1023
1027
|
attributes = {
|
|
1024
1028
|
# Core task identification (CRITICAL for learning system)
|
|
1025
1029
|
'task.name' => task_name.to_s,
|
|
1026
|
-
'task.inputs' => inputs.keys.map(&:to_s).join(','),
|
|
1027
1030
|
'task.max_retries' => max_retries,
|
|
1028
1031
|
|
|
1029
1032
|
# Semantic operation name for better trace organization
|
|
1030
1033
|
'gen_ai.operation.name' => 'execute_task'
|
|
1031
1034
|
}
|
|
1032
1035
|
|
|
1036
|
+
# Add task inputs - JSON-encoded if capture enabled, else just keys
|
|
1037
|
+
attributes['task.inputs'] = if capture_enabled?(:inputs)
|
|
1038
|
+
inputs.to_json
|
|
1039
|
+
else
|
|
1040
|
+
inputs.keys.map(&:to_s).join(',')
|
|
1041
|
+
end
|
|
1042
|
+
|
|
1033
1043
|
# Explicitly add agent name if available (redundant with resource attribute but ensures visibility)
|
|
1034
1044
|
if (agent_name = ENV.fetch('AGENT_NAME', nil))
|
|
1035
1045
|
attributes['agent.name'] = agent_name
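The effect of the new `task.inputs` branch is easiest to see with concrete values. The sketch below mirrors that branch as a standalone method. The method name `task_inputs_attribute` and the `capture:` keyword are hypothetical stand-ins for the gem's `capture_enabled?(:inputs)` check, whose implementation is not shown in this diff.

```ruby
require 'json'

# Standalone mirror of the branch added in this hunk. `capture` stands in for
# capture_enabled?(:inputs), which is assumed here, not reproduced.
def task_inputs_attribute(inputs, capture:)
  capture ? inputs.to_json : inputs.keys.map(&:to_s).join(',')
end

inputs = { user_id: 123, token: 'secret' }
puts task_inputs_attribute(inputs, capture: false) # user_id,token
puts task_inputs_attribute(inputs, capture: true)  # {"user_id":123,"token":"secret"}
```

With capture off, only the key names reach the span, so values such as `token` stay out of the trace.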
|
data/lib/language_operator/agent.rb
CHANGED
|
@@ -4,6 +4,7 @@ require_relative 'agent/base'
|
|
|
4
4
|
require_relative 'agent/executor'
|
|
5
5
|
require_relative 'agent/task_executor'
|
|
6
6
|
require_relative 'agent/web_server'
|
|
7
|
+
require_relative 'agent/instrumentation'
|
|
7
8
|
require_relative 'dsl'
|
|
8
9
|
require_relative 'logger'
|
|
9
10
|
|
|
@@ -24,6 +25,8 @@ module LanguageOperator
|
|
|
24
25
|
# agent.execute_goal("Summarize daily news")
|
|
25
26
|
# rubocop:disable Metrics/ModuleLength
|
|
26
27
|
module Agent
|
|
28
|
+
extend LanguageOperator::Agent::Instrumentation
|
|
29
|
+
|
|
27
30
|
# Module-level logger for Agent framework
|
|
28
31
|
@logger = LanguageOperator::Logger.new(component: 'Agent')
|
|
29
32
|
|
|
@@ -215,22 +218,29 @@ module LanguageOperator
|
|
|
215
218
|
agent: agent_def.name,
|
|
216
219
|
task_count: agent_def.tasks.size)
|
|
217
220
|
|
|
218
|
-
#
|
|
219
|
-
|
|
220
|
-
|
|
221
|
-
|
|
222
|
-
|
|
223
|
-
|
|
224
|
-
|
|
225
|
-
|
|
221
|
+
# Execute main block within agent_executor span for learning system integration
|
|
222
|
+
with_span('agent_executor', attributes: {
|
|
223
|
+
'agent.name' => agent_def.name,
|
|
224
|
+
'agent.task_count' => agent_def.tasks.size,
|
|
225
|
+
'agent.mode' => ENV.fetch('AGENT_MODE', 'unknown')
|
|
226
|
+
}) do
|
|
227
|
+
# Get inputs from environment or default to empty hash
|
|
228
|
+
inputs = {}
|
|
229
|
+
|
|
230
|
+
# Execute main block with task executor as context
|
|
231
|
+
result = agent_def.main.call(inputs, task_executor)
|
|
232
|
+
|
|
233
|
+
logger.info('Main block execution completed',
|
|
234
|
+
result: result)
|
|
235
|
+
|
|
236
|
+
# Call output handler if defined
|
|
237
|
+
if agent_def.output
|
|
238
|
+
logger.debug('Executing output handler', outputs: result)
|
|
239
|
+
execute_output_handler(agent_def, result, task_executor)
|
|
240
|
+
end
|
|
226
241
|
|
|
227
|
-
|
|
228
|
-
if agent_def.output
|
|
229
|
-
logger.debug('Executing output handler', outputs: result)
|
|
230
|
-
execute_output_handler(agent_def, result, task_executor)
|
|
242
|
+
result
|
|
231
243
|
end
|
|
232
|
-
|
|
233
|
-
result
|
|
234
244
|
end
|
|
235
245
|
|
|
236
246
|
# Execute main block (DSL v1) in persistent mode for autonomous agents
|
data/lib/language_operator/cli/commands/agent/base.rb
CHANGED
|
@@ -1,6 +1,7 @@
|
|
|
1
1
|
# frozen_string_literal: true
|
|
2
2
|
|
|
3
3
|
require 'thor'
|
|
4
|
+
require 'json'
|
|
4
5
|
require_relative '../../command_loader'
|
|
5
6
|
require_relative '../../wizards/agent_wizard'
|
|
6
7
|
|
|
@@ -9,7 +10,6 @@ require_relative 'workspace'
|
|
|
9
10
|
require_relative 'code_operations'
|
|
10
11
|
require_relative 'logs'
|
|
11
12
|
require_relative 'lifecycle'
|
|
12
|
-
require_relative 'learning'
|
|
13
13
|
|
|
14
14
|
# Include helper modules
|
|
15
15
|
require_relative 'helpers/cluster_llm_client'
|
|
@@ -35,7 +35,6 @@ module LanguageOperator
|
|
|
35
35
|
include CodeOperations
|
|
36
36
|
include Logs
|
|
37
37
|
include Lifecycle
|
|
38
|
-
include Learning
|
|
39
38
|
|
|
40
39
|
# NOTE: Core commands (create, list, inspect, delete) will be added below
|
|
41
40
|
# This file is a placeholder for the refactoring process
|
|
@@ -173,6 +172,9 @@ module LanguageOperator
|
|
|
173
172
|
# Main agent information
|
|
174
173
|
puts
|
|
175
174
|
status = agent.dig('status', 'phase') || 'Unknown'
|
|
175
|
+
creation_timestamp = agent.dig('metadata', 'creationTimestamp')
|
|
176
|
+
formatted_created = creation_timestamp ? Formatters::ValueFormatter.time_ago(Time.parse(creation_timestamp)) : nil
|
|
177
|
+
|
|
176
178
|
format_agent_details(
|
|
177
179
|
name: name,
|
|
178
180
|
namespace: ctx.namespace,
|
|
@@ -180,8 +182,8 @@ module LanguageOperator
|
|
|
180
182
|
status: format_status(status),
|
|
181
183
|
mode: agent.dig('spec', 'executionMode') || 'autonomous',
|
|
182
184
|
schedule: agent.dig('spec', 'schedule'),
|
|
183
|
-
persona: agent.dig('spec', 'persona'),
|
|
184
|
-
created:
|
|
185
|
+
persona: agent.dig('spec', 'persona') || 'None',
|
|
186
|
+
created: formatted_created
|
|
185
187
|
)
|
|
186
188
|
puts
|
|
187
189
|
|
|
@@ -191,7 +193,6 @@ module LanguageOperator
|
|
|
191
193
|
exec_data = get_execution_data(name, ctx)
|
|
192
194
|
|
|
193
195
|
exec_rows = {
|
|
194
|
-
'Total Runs' => exec_data[:total_runs],
|
|
195
196
|
'Last Run' => exec_data[:last_run] || 'Never'
|
|
196
197
|
}
|
|
197
198
|
exec_rows['Next Run'] = exec_data[:next_run] || 'N/A' if agent.dig('spec', 'schedule')
|
|
@@ -200,6 +201,10 @@ module LanguageOperator
|
|
|
200
201
|
puts
|
|
201
202
|
end
|
|
202
203
|
|
|
204
|
+
# Learning status
|
|
205
|
+
display_learning_section(agent, name, ctx)
|
|
206
|
+
puts
|
|
207
|
+
|
|
203
208
|
# Resources
|
|
204
209
|
resources = agent.dig('spec', 'resources')
|
|
205
210
|
if resources
|
|
@@ -302,62 +307,71 @@ module LanguageOperator
|
|
|
302
307
|
Formatters::ProgressFormatter.with_spinner("Deleting agent '#{name}'") do
|
|
303
308
|
ctx.client.delete_resource(RESOURCE_AGENT, name, ctx.namespace)
|
|
304
309
|
end
|
|
310
|
+
|
|
311
|
+
# Verify deletion completed
|
|
312
|
+
verify_agent_deletion(ctx, name)
|
|
305
313
|
end
|
|
306
314
|
end
|
|
307
315
|
|
|
308
|
-
|
|
309
|
-
long_desc <<-DESC
|
|
310
|
-
List the versioned ConfigMaps created by the operator for an agent.
|
|
316
|
+
private
|
|
311
317
|
|
|
312
|
-
|
|
318
|
+
# Display learning status section in agent inspect
|
|
319
|
+
def display_learning_section(agent, _name, _ctx)
|
|
320
|
+
annotations = agent.dig('metadata', 'annotations')
|
|
321
|
+
annotations = annotations.respond_to?(:to_h) ? annotations.to_h : (annotations || {})
|
|
313
322
|
|
|
314
|
-
|
|
315
|
-
|
|
316
|
-
aictl agent versions my-agent --cluster production
|
|
317
|
-
DESC
|
|
318
|
-
option :cluster, type: :string, desc: 'Override current cluster context'
|
|
319
|
-
def versions(name)
|
|
320
|
-
handle_command_error('list agent versions') do
|
|
321
|
-
ctx = CLI::Helpers::ClusterContext.from_options(options)
|
|
323
|
+
# Determine learning state
|
|
324
|
+
learning_enabled = !annotations.key?(Constants::KubernetesLabels::LEARNING_DISABLED_LABEL)
|
|
322
325
|
|
|
323
|
-
|
|
324
|
-
|
|
326
|
+
# Get runs pending learning from agent status
|
|
327
|
+
runs_pending_learning = agent.dig('status', 'runsPendingLearning') || 0
|
|
328
|
+
learning_threshold = 10 # Standard threshold
|
|
325
329
|
|
|
326
|
-
|
|
327
|
-
|
|
330
|
+
# Calculate progress percentage
|
|
331
|
+
progress_percent = [(runs_pending_learning.to_f / learning_threshold * 100).round, 100].min
|
|
332
|
+
runs_display = if runs_pending_learning >= learning_threshold
|
|
333
|
+
"#{runs_pending_learning}/#{learning_threshold} #{pastel.green('(Ready)')}"
|
|
334
|
+
else
|
|
335
|
+
"#{runs_pending_learning}/#{learning_threshold} (#{progress_percent}%)"
|
|
336
|
+
end
|
|
328
337
|
|
|
329
|
-
|
|
330
|
-
|
|
331
|
-
labels = cm.dig('metadata', 'labels') || {}
|
|
332
|
-
labels['agent'] == name && labels['version']
|
|
333
|
-
end
|
|
334
|
-
|
|
335
|
-
# Sort by version (assuming numeric versions)
|
|
336
|
-
agent_configs.sort! do |a, b|
|
|
337
|
-
version_a = a.dig('metadata', 'labels', 'version').to_i
|
|
338
|
-
version_b = b.dig('metadata', 'labels', 'version').to_i
|
|
339
|
-
version_b <=> version_a # Reverse order (newest first)
|
|
340
|
-
end
|
|
338
|
+
status_color = learning_enabled ? :green : :yellow
|
|
339
|
+
status_text = learning_enabled ? 'Enabled' : 'Disabled'
|
|
341
340
|
|
|
342
|
-
|
|
343
|
-
|
|
341
|
+
highlighted_box(
|
|
342
|
+
title: 'Learning',
|
|
343
|
+
color: :cyan,
|
|
344
|
+
rows: {
|
|
345
|
+
'Status' => pastel.send(status_color).bold(status_text),
|
|
346
|
+
'Threshold' => "#{pastel.cyan('10 successful runs')} (auto-learning trigger)",
|
|
347
|
+
'Confidence Target' => "#{pastel.cyan('85%')} (pattern detection)",
|
|
348
|
+
'Runs Recorded' => runs_display
|
|
349
|
+
}
|
|
350
|
+
)
|
|
344
351
|
end
|
|
345
352
|
|
|
346
|
-
private
|
|
347
|
-
|
|
348
353
|
# Shared helper methods that are used across multiple commands
|
|
349
354
|
# These will be extracted from the original agent.rb
|
|
350
355
|
|
|
351
|
-
def handle_agent_not_found(name, ctx, error)
|
|
356
|
+
def handle_agent_not_found(name, ctx, error = nil)
|
|
352
357
|
# Get available agents for fuzzy matching
|
|
353
358
|
agents = ctx.client.list_resources(RESOURCE_AGENT, namespace: ctx.namespace)
|
|
354
359
|
available_names = agents.map { |a| a.dig('metadata', 'name') }
|
|
355
360
|
|
|
356
|
-
|
|
357
|
-
|
|
358
|
-
|
|
359
|
-
|
|
360
|
-
|
|
361
|
+
# Create error if not provided
|
|
362
|
+
error ||= K8s::Error::NotFound.new('GET', "/apis/langop.io/v1alpha1/namespaces/#{ctx.namespace}/languageagents/#{name}", 404, 'Not Found')
|
|
363
|
+
|
|
364
|
+
begin
|
|
365
|
+
CLI::Errors::Handler.handle_not_found(error, {
|
|
366
|
+
resource_type: RESOURCE_AGENT,
|
|
367
|
+
resource_name: name,
|
|
368
|
+
cluster: ctx.name,
|
|
369
|
+
available_resources: available_names
|
|
370
|
+
})
|
|
371
|
+
rescue CLI::Errors::NotFoundError
|
|
372
|
+
# Error message already displayed by handler, just exit gracefully
|
|
373
|
+
exit 1
|
|
374
|
+
end
|
|
361
375
|
end
|
|
362
376
|
|
|
363
377
|
def display_agent_created(agent, ctx, _description, _synthesis_result)
|
|
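The "Runs Recorded" value computed by `display_learning_section` in the hunk above can be checked in isolation. This is a minimal sketch, assuming a hypothetical method name `learning_progress` and a plain-text `(Ready)` label in place of the `pastel` coloring. The threshold default of 10 and the capped percentage come from the diff.

```ruby
# Hypothetical standalone version of the progress calculation; terminal
# colors are omitted so the result is a plain string.
def learning_progress(runs_pending, threshold = 10)
  # Percentage is capped at 100, as in the diff.
  percent = [(runs_pending.to_f / threshold * 100).round, 100].min
  label = runs_pending >= threshold ? '(Ready)' : "(#{percent}%)"
  "#{runs_pending}/#{threshold} #{label}"
end

puts learning_progress(3)  # 3/10 (30%)
puts learning_progress(12) # 12/10 (Ready)
```

Once the threshold is reached the percentage is never shown, so the cap at 100 has no visible effect in the output.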
@@ -372,8 +386,8 @@ module LanguageOperator
|
|
|
372
386
|
status: format_status(status),
|
|
373
387
|
mode: agent.dig('spec', 'executionMode') || 'autonomous',
|
|
374
388
|
schedule: agent.dig('spec', 'schedule'),
|
|
375
|
-
persona: agent.dig('spec', 'persona') || '
|
|
376
|
-
created:
|
|
389
|
+
persona: agent.dig('spec', 'persona') || 'None',
|
|
390
|
+
created: 'just now'
|
|
377
391
|
)
|
|
378
392
|
|
|
379
393
|
puts
|
|
@@ -526,11 +540,17 @@ module LanguageOperator
|
|
|
526
540
|
end
|
|
527
541
|
|
|
528
542
|
table_data = agents.map do |agent|
|
|
543
|
+
status = if agent.dig('metadata', 'deletionTimestamp')
|
|
544
|
+
'Pending Deletion'
|
|
545
|
+
else
|
|
546
|
+
agent.dig('status', 'phase') || 'Unknown'
|
|
547
|
+
end
|
|
548
|
+
|
|
529
549
|
{
|
|
530
550
|
name: agent.dig('metadata', 'name'),
|
|
531
551
|
namespace: agent.dig('metadata', 'namespace') || context.namespace,
|
|
532
552
|
mode: agent.dig('spec', 'executionMode') || 'autonomous',
|
|
533
|
-
status:
|
|
553
|
+
status: status
|
|
534
554
|
}
|
|
535
555
|
end
|
|
536
556
|
|
|
@@ -556,11 +576,17 @@ module LanguageOperator
|
|
|
556
576
|
agents = ctx.client.list_resources(RESOURCE_AGENT, namespace: ctx.namespace)
|
|
557
577
|
|
|
558
578
|
agents.each do |agent|
|
|
579
|
+
status = if agent.dig('metadata', 'deletionTimestamp')
|
|
580
|
+
'Pending Deletion'
|
|
581
|
+
else
|
|
582
|
+
agent.dig('status', 'phase') || 'Unknown'
|
|
583
|
+
end
|
|
584
|
+
|
|
559
585
|
all_agents << {
|
|
560
586
|
cluster: cluster[:name],
|
|
561
587
|
name: agent.dig('metadata', 'name'),
|
|
562
588
|
mode: agent.dig('spec', 'executionMode') || 'autonomous',
|
|
563
|
-
status:
|
|
589
|
+
status: status,
|
|
564
590
|
next_run: agent.dig('status', 'nextRun') || 'N/A',
|
|
565
591
|
executions: agent.dig('status', 'executionCount') || 0
|
|
566
592
|
}
|
|
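Both listing hunks above add the same status rule: an agent with a `deletionTimestamp` is shown as pending deletion, otherwise its phase is used. A sketch of that rule as one shared helper follows. The name `display_status` is hypothetical, and the diff itself keeps the logic inline in both places.

```ruby
# Hypothetical helper capturing the status rule repeated in both hunks.
# `agent` is a Hash shaped like a Kubernetes resource.
def display_status(agent)
  return 'Pending Deletion' if agent.dig('metadata', 'deletionTimestamp')

  agent.dig('status', 'phase') || 'Unknown'
end

puts display_status({ 'metadata' => { 'deletionTimestamp' => '2025-01-01T00:00:00Z' } })
puts display_status({ 'status' => { 'phase' => 'Running' } })
```

The first call prints `Pending Deletion` and the second prints `Running`. An agent with neither field falls back to `Unknown`.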
@@ -828,6 +854,73 @@ module LanguageOperator
|
|
|
828
854
|
rescue StandardError
|
|
829
855
|
schedule
|
|
830
856
|
end
|
|
857
|
+
|
|
858
|
+
def verify_agent_deletion(ctx, name)
|
|
859
|
+
max_wait = 30 # Wait up to 30 seconds
|
|
860
|
+
interval = 2 # Check every 2 seconds
|
|
861
|
+
elapsed = 0
|
|
862
|
+
|
|
863
|
+
Formatters::ProgressFormatter.with_spinner('Verifying deletion') do
|
|
864
|
+
loop do
|
|
865
|
+
begin
|
|
866
|
+
agent = ctx.client.get_resource(RESOURCE_AGENT, name, ctx.namespace)
|
|
867
|
+
|
|
868
|
+
# Check if deletion is stuck on finalizers
|
|
869
|
+
deletion_timestamp = agent.dig('metadata', 'deletionTimestamp')
|
|
870
|
+
if deletion_timestamp
|
|
871
|
+
finalizers = agent.dig('metadata', 'finalizers') || []
|
|
872
|
+
if finalizers.any?
|
|
873
|
+
if elapsed >= max_wait
|
|
874
|
+
deletion_stuck_error(name, finalizers)
|
|
875
|
+
return
|
|
876
|
+
end
|
|
877
|
+
end
|
|
878
|
+
end
|
|
879
|
+
rescue K8s::Error::NotFound
|
|
880
|
+
# Agent successfully deleted
|
|
881
|
+
break
|
|
882
|
+
end
|
|
883
|
+
|
|
884
|
+
if elapsed >= max_wait
|
|
885
|
+
deletion_timeout_error(name)
|
|
886
|
+
return
|
|
887
|
+
end
|
|
888
|
+
|
|
889
|
+
sleep interval
|
|
890
|
+
elapsed += interval
|
|
891
|
+
end
|
|
892
|
+
end
|
|
893
|
+
|
|
894
|
+
# Deletion verified - no additional success message needed
|
|
895
|
+
end
|
|
896
|
+
|
|
897
|
+
def deletion_stuck_error(name, finalizers)
|
|
898
|
+
puts
|
|
899
|
+
Formatters::ProgressFormatter.error("Deletion of agent '#{name}' is stuck")
|
|
900
|
+
puts
|
|
901
|
+
puts "The agent has the following finalizers preventing deletion:"
|
|
902
|
+
finalizers.each { |f| puts " - #{pastel.yellow(f)}" }
|
|
903
|
+
puts
|
|
904
|
+
puts "This usually indicates the operator is not running properly."
|
|
905
|
+
puts
|
|
906
|
+
puts "To diagnose:"
|
|
907
|
+
puts " kubectl get pods -n kube-system | grep language-operator"
|
|
908
|
+
puts " kubectl logs -n kube-system -l app.kubernetes.io/name=language-operator"
|
|
909
|
+
puts
|
|
910
|
+
puts "Emergency cleanup (advanced users only):"
|
|
911
|
+
puts " kubectl patch languageagent #{name} -p '{\"metadata\":{\"finalizers\":null}}' --type=merge"
|
|
912
|
+
end
|
|
913
|
+
|
|
914
|
+
def deletion_timeout_error(name)
|
|
915
|
+
puts
|
|
916
|
+
Formatters::ProgressFormatter.warn("Could not verify deletion of agent '#{name}' within 30 seconds")
|
|
917
|
+
puts
|
|
918
|
+
puts "Check deletion status with:"
|
|
919
|
+
puts " aictl agent list"
|
|
920
|
+
puts " kubectl get languageagent #{name}"
|
|
921
|
+
puts
|
|
922
|
+
puts "If the agent shows 'Unknown' status, it may be pending deletion."
|
|
923
|
+
end
|
|
831
924
|
end
|
|
832
925
|
end
|
|
833
926
|
end
|
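The polling pattern in `verify_agent_deletion` (check, compare elapsed time with a limit, sleep, repeat) can be sketched without the Kubernetes client. The version below is an illustration under stated assumptions. The method name `wait_for_deletion`, the symbol return values, and the injectable `sleeper` are hypothetical. The block returns a truthy value while the resource still exists, whereas the real code relies on `K8s::Error::NotFound` being raised.

```ruby
# Polls until the block reports the resource is gone or the limit is reached.
# Returns :deleted or :timeout. `sleeper` is injectable so the loop can be
# exercised without real delays.
def wait_for_deletion(max_wait: 30, interval: 2, sleeper: ->(s) { sleep(s) })
  elapsed = 0
  loop do
    return :deleted unless yield          # block is falsy once the resource is gone
    return :timeout if elapsed >= max_wait

    sleeper.call(interval)
    elapsed += interval
  end
end

# Usage: the resource "disappears" on the third check.
checks = 0
puts wait_for_deletion(sleeper: ->(_s) {}) { (checks += 1) < 3 } # deleted
```

Returning a symbol instead of printing inside the loop leaves the caller to choose between the stuck-finalizer message and the timeout message.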