RubyGems - language-operator - Versions diffs - 0.1.63 → 0.1.66 - Mend

language-operator 0.1.63 → 0.1.66

Files changed (48)

checksums.yaml +4 -4
data/.plan.md +127 -0
data/.rspec +3 -0
data/Gemfile +2 -0
data/Gemfile.lock +4 -1
data/Makefile +34 -80
data/README.md +20 -1
data/components/agent/Gemfile +1 -1
data/docs/cheat-sheet.md +173 -0
data/docs/observability.md +208 -0
data/lib/language_operator/agent/base.rb +10 -1
data/lib/language_operator/agent/event_config.rb +172 -0
data/lib/language_operator/agent/safety/ast_validator.rb +1 -1
data/lib/language_operator/agent/safety/safe_executor.rb +5 -1
data/lib/language_operator/agent/task_executor.rb +97 -7
data/lib/language_operator/agent/telemetry.rb +25 -3
data/lib/language_operator/agent/web_server.rb +6 -9
data/lib/language_operator/agent.rb +24 -14
data/lib/language_operator/cli/commands/agent/base.rb +155 -64
data/lib/language_operator/cli/commands/agent/code_operations.rb +157 -16
data/lib/language_operator/cli/commands/cluster.rb +2 -2
data/lib/language_operator/cli/commands/status.rb +2 -2
data/lib/language_operator/cli/commands/system/synthesize.rb +1 -1
data/lib/language_operator/cli/errors/suggestions.rb +1 -1
data/lib/language_operator/cli/formatters/value_formatter.rb +1 -1
data/lib/language_operator/cli/helpers/ux_helper.rb +3 -4
data/lib/language_operator/config.rb +3 -3
data/lib/language_operator/constants/kubernetes_labels.rb +2 -2
data/lib/language_operator/constants.rb +1 -0
data/lib/language_operator/dsl/task_definition.rb +18 -7
data/lib/language_operator/instrumentation/task_tracer.rb +44 -3
data/lib/language_operator/kubernetes/client.rb +112 -1
data/lib/language_operator/templates/schema/CHANGELOG.md +28 -0
data/lib/language_operator/templates/schema/agent_dsl_openapi.yaml +1 -1
data/lib/language_operator/templates/schema/agent_dsl_schema.json +1 -1
data/lib/language_operator/type_coercion.rb +22 -8
data/lib/language_operator/version.rb +1 -1
data/synth/002/agent.rb +23 -12
data/synth/002/output.log +88 -15
data/synth/003/Makefile +17 -4
data/synth/003/agent.txt +1 -1
data/synth/004/Makefile +54 -0
data/synth/004/README.md +281 -0
data/synth/004/instructions.txt +1 -0
metadata +11 -6
data/lib/language_operator/cli/commands/agent/learning.rb +0 -289
data/synth/003/agent.optimized.rb +0 -66
data/synth/003/agent.synthesized.rb +0 -41

data/docs/observability.md ADDED

@@ -0,0 +1,208 @@
+# Observability and Telemetry
+The Language Operator gem includes comprehensive OpenTelemetry instrumentation to enable observability, debugging, and optimization of agent executions.
+## OpenTelemetry Integration
+The gem automatically instruments agent executions with OpenTelemetry spans, following the [OpenTelemetry Semantic Conventions for GenAI](https://opentelemetry.io/docs/specs/semconv/gen-ai/).
+### Configuration
+Configure telemetry via environment variables:
+```bash
+# Basic telemetry (always enabled)
+OTEL_EXPORTER_OTLP_ENDPOINT=https://your-otel-collector:4317
+# Data capture controls (optional - defaults to metadata only)
+CAPTURE_TASK_INPUTS=true      # Capture full task inputs as JSON
+CAPTURE_TASK_OUTPUTS=true     # Capture full task outputs as JSON
+CAPTURE_TOOL_ARGS=true        # Capture tool call arguments
+CAPTURE_TOOL_RESULTS=true     # Capture tool call results
+```
+**Security Note:** Data capture is disabled by default to prevent sensitive information leakage. Only enable full data capture in secure environments.
+## Span Hierarchy
+The gem creates a hierarchical trace structure that enables the learning system to identify and analyze complete agent executions:
+```
+agent_executor (parent span - overall agent run)
+  └── task_executor.execute_task (child span - task 1)
+      └── execute_tool github (grandchild span - tool call 1)
+      └── execute_tool slack (grandchild span - tool call 2)
+  └── task_executor.execute_task (child span - task 2)
+  └── task_executor.execute_task (child span - task 3)
+```
+### Span Names
+| Span Name | Purpose | Created By |
+|-----------|---------|------------|
+| `agent_executor` | Overall agent execution | `LanguageOperator::Agent.execute_main_block()` |
+| `task_executor.execute_task` | Individual task execution | `TaskExecutor#execute_task()` |
+| `execute_tool #{tool_name}` | Tool calls from LLM responses | `TaskTracer#record_single_tool_call()` |
+| `execute_tool.#{tool_name}` | Direct tool calls from symbolic tasks | `Client::Base` tool wrapper |
+## Span Attributes
+### Agent Executor Span
+The top-level `agent_executor` span includes:
+```
+agent.name: "my-agent"           # Agent identifier
+agent.task_count: 5              # Number of tasks in agent
+agent.mode: "autonomous"         # Execution mode (autonomous/scheduled/interactive)
+```
+### Task Executor Span
+Each `task_executor.execute_task` span includes:
+```
+# Core identification (CRITICAL for learning system)
+task.name: "fetch_user_data"            # Task identifier
+gen_ai.operation.name: "execute_task"   # Operation type
+# Execution metadata
+task.max_retries: 3                     # Retry configuration
+task.timeout: 30000                     # Timeout in milliseconds
+task.type: "hybrid"                     # Task type (neural/symbolic/hybrid)
+task.has_neural: "true"                 # Has neural implementation
+task.has_symbolic: "false"              # Has symbolic implementation
+# Agent context
+agent.name: "my-agent"                  # Agent identifier (explicit for learning system)
+# Data capture (when enabled)
+task.inputs: '{"user_id": 123}'         # JSON-encoded inputs (CAPTURE_TASK_INPUTS=true)
+task.outputs: '{"user": {...}}'         # JSON-encoded outputs (CAPTURE_TASK_OUTPUTS=true)
+```
+### Tool Call Spans
+Tool calls create spans with names like `execute_tool #{tool_name}` and include:
+```
+# GenAI semantic attributes
+gen_ai.operation.name: "execute_tool"           # Operation type
+gen_ai.tool.name: "github"                      # Tool identifier
+gen_ai.tool.call.id: "call_123"                 # Call ID (if available)
+# Data capture (when enabled)
+gen_ai.tool.call.arguments: '{"repo": "..."}'   # JSON arguments (CAPTURE_TOOL_ARGS=true)
+gen_ai.tool.call.result: '{"status": "ok"}'     # JSON result (CAPTURE_TOOL_RESULTS=true)
+# Size metadata (always captured)
+gen_ai.tool.call.arguments.size: 45             # Arguments size in bytes
+gen_ai.tool.call.result.size: 1024              # Result size in bytes
+```
+## Learning System Integration
+This span naming convention enables the language-operator Kubernetes controller to:
+1. **Identify Task Executions**: Query traces by `task_executor.execute_task` spans
+2. **Group by Agent**: Filter by `agent.name` attribute
+3. **Analyze Patterns**: Extract execution patterns from span attributes
+4. **Build Optimizations**: Create optimized implementations based on trace analysis
+### Example Trace Query
+To find all task executions for an agent (illustrative SQL; the exact syntax depends on your trace backend):
+```sql
+SELECT * FROM spans
+WHERE name = 'task_executor.execute_task'
+  AND attributes['agent.name'] = 'my-agent'
+  AND start_time > NOW() - INTERVAL '1 hour'
+```
+## Data Privacy and Security
+### Default Behavior (Secure)
+By default, the gem captures:
+- ✅ Task names and metadata
+- ✅ Execution timing and counts
+- ✅ Tool names and call frequencies
+- ✅ Data sizes (bytes)
+- ❌ **NOT** actual data content
+### Full Data Capture (Optional)
+When explicitly enabled, the gem additionally captures:
+- ⚠️ Complete task inputs and outputs as JSON
+- ⚠️ Tool call arguments and results
+- ⚠️ LLM prompts and responses
+**Warning:** Only enable full data capture in development or secure production environments. Captured data may contain sensitive information.
+### Data Sanitization
+When full capture is enabled, the gem:
+- Truncates large payloads (>1000 chars for span attributes)
+- Converts complex objects to JSON automatically
+- Respects OpenTelemetry attribute limits
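The capture flags and sanitization rules described in this file could be combined roughly as follows. This is a minimal sketch, not the gem's implementation: `capture_enabled?` is called in the `task_executor.rb` changes, but its body is not part of this diff, so the version here (along with `sanitize_attribute` and `MAX_ATTRIBUTE_CHARS`) is hypothetical. It also assumes that only the literal string `true` switches a flag on.

```ruby
require 'json'

# Limit quoted in the "Data Sanitization" section above.
MAX_ATTRIBUTE_CHARS = 1000

# Hypothetical helper: true only when the matching CAPTURE_* variable is "true".
# `env` defaults to ENV but accepts any Hash-like object for testing.
def capture_enabled?(kind, env = ENV)
  var = {
    inputs: 'CAPTURE_TASK_INPUTS',
    outputs: 'CAPTURE_TASK_OUTPUTS',
    tool_args: 'CAPTURE_TOOL_ARGS',
    tool_results: 'CAPTURE_TOOL_RESULTS'
  }.fetch(kind)
  env.fetch(var, 'false') == 'true'
end

# Hypothetical helper: JSON-encode non-string values and cap the length
# so the result is safe to store as a span attribute.
def sanitize_attribute(value, limit: MAX_ATTRIBUTE_CHARS)
  text = value.is_a?(String) ? value : JSON.generate(value)
  text.length > limit ? text[0, limit] : text
end
```

With no variables set, `capture_enabled?(:inputs)` is false, which matches the metadata-only default described above.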
+## Performance Impact
+Telemetry overhead is minimal:
+- **Default mode**: <5% performance overhead
+- **Full capture mode**: ~10% performance overhead
+- **Span creation**: <1ms per span
+- **Data serialization**: 1-5ms for complex objects
+## Debugging with Traces
+### Common Queries
+**Find slow tasks:**
+```sql
+SELECT attributes['task.name'], duration_ms
+FROM spans
+WHERE name = 'task_executor.execute_task'
+  AND duration_ms > 5000
+ORDER BY duration_ms DESC
+```
+**Tool usage analysis:**
+```sql
+SELECT attributes['gen_ai.tool.name'], COUNT(*)
+FROM spans
+WHERE name LIKE 'execute_tool%'
+GROUP BY attributes['gen_ai.tool.name']
+```
+**Agent execution frequency:**
+```sql
+SELECT attributes['agent.name'], COUNT(*) as executions
+FROM spans
+WHERE name = 'agent_executor'
+  AND start_time > NOW() - INTERVAL '24 hours'
+GROUP BY attributes['agent.name']
+```
+### Trace Sampling
+For high-volume agents, consider trace sampling:
+```bash
+# Sample 10% of traces
+OTEL_TRACES_SAMPLER=parentbased_traceidratio
+OTEL_TRACES_SAMPLER_ARG=0.1
+```
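The two variables above select a parent-based, ratio-driven sampler. As a rough illustration of the kind of decision that implies, here is a pure-Ruby sketch. The function names are hypothetical, this is not the OpenTelemetry SDK's code, and the exact comparison a given SDK performs may differ.

```ruby
# Hypothetical stand-in for a trace-ID-ratio decision.
# `trace_id_bytes` is a 16-byte binary String.
def ratio_sampled?(trace_id_bytes, ratio)
  # Read the last 8 bytes of the trace ID as an unsigned big-endian integer
  # and keep the trace if it falls below ratio * (2^64 - 1).
  value = trace_id_bytes.byteslice(8, 8).unpack1('Q>')
  value < (ratio * (2**64 - 1)).ceil
end

# Parent-based wrapper: a child span follows its parent's decision;
# only root spans (parent_sampled == nil) consult the ratio.
def parent_based_sampled?(parent_sampled, trace_id_bytes, ratio)
  return parent_sampled unless parent_sampled.nil?

  ratio_sampled?(trace_id_bytes, ratio)
end
```

Because the decision depends only on the trace ID and the parent's flag, every span in one trace is either kept or dropped together.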
+## Related Documentation
+- [Agent Runtime Architecture](./agent-internals.md) - How agents execute
+- [Best Practices](./best-practices.md) - Production deployment guidance
+- [Understanding Generated Code](./understanding-generated-code.md) - Agent code structure
+## External Resources
+- [OpenTelemetry Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/)
+- [Language Operator Controller](https://github.com/language-operator/language-operator) - Learning system implementation
+- [OTLP Specification](https://opentelemetry.io/docs/specs/otlp/) - Wire format

data/lib/language_operator/agent/base.rb CHANGED

@@ -2,6 +2,7 @@
 require_relative '../client'
 require_relative '../constants'
+require_relative '../kubernetes/client'
 require_relative 'telemetry'
 require_relative 'instrumentation'
@@ -21,7 +22,7 @@ module LanguageOperator
     class Base < LanguageOperator::Client::Base
       include Instrumentation
-      attr_reader :workspace_path, :mode
+      attr_reader :workspace_path, :mode, :kubernetes_client
       # Initialize the agent
       #
@@ -40,6 +41,14 @@ module LanguageOperator
         @workspace_path = ENV.fetch('WORKSPACE_PATH', '/workspace')
         @mode = agent_mode_with_default
         @executor = nil
+        # Initialize Kubernetes client for event emission (only in K8s environments)
+        @kubernetes_client = begin
+          LanguageOperator::Kubernetes::Client.instance if ENV.fetch('KUBERNETES_SERVICE_HOST', nil)
+        rescue StandardError => e
+          logger.warn('Failed to initialize Kubernetes client', error: e.message)
+          nil
+        end
       end
       # Run the agent in its configured mode

data/lib/language_operator/agent/event_config.rb ADDED

@@ -0,0 +1,172 @@
+# frozen_string_literal: true
+require_relative '../config'
+module LanguageOperator
+  module Agent
+    # Event emission configuration for agent runtime
+    #
+    # Manages configuration for Kubernetes event emission including:
+    # - Event filtering and batching options
+    # - Error handling preferences
+    # - Performance tuning settings
+    #
+    # @example Load event configuration
+    #   config = EventConfig.load
+    #   puts "Events enabled: #{config[:enabled]}"
+    #   puts "Max events per minute: #{config[:rate_limit_per_minute]}"
+    module EventConfig
+      # Load event emission configuration from environment variables
+      #
+      # @return [Hash] Event configuration hash
+      def self.load
+        Config.from_env(
+          {
+            # Core event emission settings
+            enabled: 'ENABLE_K8S_EVENTS',
+            disabled: 'DISABLE_K8S_EVENTS',
+            # Event filtering
+            emit_success_events: 'EMIT_SUCCESS_EVENTS',
+            emit_failure_events: 'EMIT_FAILURE_EVENTS',
+            emit_validation_events: 'EMIT_VALIDATION_EVENTS',
+            # Performance and rate limiting
+            rate_limit_per_minute: 'EVENT_RATE_LIMIT_PER_MINUTE',
+            batch_size: 'EVENT_BATCH_SIZE',
+            batch_timeout_ms: 'EVENT_BATCH_TIMEOUT_MS',
+            # Error handling
+            retry_failed_events: 'RETRY_FAILED_EVENTS',
+            max_event_retries: 'MAX_EVENT_RETRIES',
+            retry_delay_ms: 'EVENT_RETRY_DELAY_MS',
+            # Event content control
+            include_task_metadata: 'INCLUDE_TASK_METADATA',
+            include_error_details: 'INCLUDE_ERROR_DETAILS',
+            truncate_long_messages: 'TRUNCATE_LONG_MESSAGES',
+            max_message_length: 'MAX_EVENT_MESSAGE_LENGTH'
+          },
+          defaults: {
+            enabled: 'true',
+            disabled: 'false',
+            emit_success_events: 'true',
+            emit_failure_events: 'true',
+            emit_validation_events: 'true',
+            rate_limit_per_minute: '60',
+            batch_size: '1',
+            batch_timeout_ms: '1000',
+            retry_failed_events: 'true',
+            max_event_retries: '3',
+            retry_delay_ms: '1000',
+            include_task_metadata: 'true',
+            include_error_details: 'true',
+            truncate_long_messages: 'true',
+            max_message_length: '1000'
+          },
+          types: {
+            enabled: :boolean,
+            disabled: :boolean,
+            emit_success_events: :boolean,
+            emit_failure_events: :boolean,
+            emit_validation_events: :boolean,
+            rate_limit_per_minute: :integer,
+            batch_size: :integer,
+            batch_timeout_ms: :integer,
+            retry_failed_events: :boolean,
+            max_event_retries: :integer,
+            retry_delay_ms: :integer,
+            include_task_metadata: :boolean,
+            include_error_details: :boolean,
+            truncate_long_messages: :boolean,
+            max_message_length: :integer
+          }
+        )
+      end
+      # Check if event emission is enabled overall
+      #
+      # Events are enabled if:
+      # - Running in Kubernetes (KUBERNETES_SERVICE_HOST set)
+      # - Not explicitly disabled (DISABLE_K8S_EVENTS != 'true')
+      # - Not switched off via the enable flag (ENABLE_K8S_EVENTS != 'false'; defaults to 'true')
+      #
+      # @param config [Hash] Configuration hash from load
+      # @return [Boolean] True if events should be emitted
+      def self.enabled?(config = nil)
+        config ||= load
+        # Must be in Kubernetes environment
+        return false unless ENV.fetch('KUBERNETES_SERVICE_HOST', nil)
+        # Respect explicit disable flag (legacy)
+        return false if config[:disabled]
+        # Check enable flag
+        config[:enabled]
+      end
+      # Check if specific event type should be emitted
+      #
+      # @param event_type [Symbol] Event type (:success, :failure, :validation)
+      # @param config [Hash] Configuration hash from load
+      # @return [Boolean] True if this event type should be emitted
+      def self.should_emit?(event_type, config = nil)
+        return false unless enabled?(config)
+        config ||= load
+        case event_type
+        when :success
+          config[:emit_success_events]
+        when :failure
+          config[:emit_failure_events]
+        when :validation
+          config[:emit_validation_events]
+        else
+          false
+        end
+      end
+      # Get rate limiting configuration
+      #
+      # @param config [Hash] Configuration hash from load
+      # @return [Hash] Rate limiting settings
+      def self.rate_limit_config(config = nil)
+        config ||= load
+        {
+          per_minute: config[:rate_limit_per_minute],
+          batch_size: config[:batch_size],
+          batch_timeout_ms: config[:batch_timeout_ms]
+        }
+      end
+      # Get retry configuration for failed events
+      #
+      # @param config [Hash] Configuration hash from load
+      # @return [Hash] Retry settings
+      def self.retry_config(config = nil)
+        config ||= load
+        {
+          enabled: config[:retry_failed_events],
+          max_retries: config[:max_event_retries],
+          delay_ms: config[:retry_delay_ms]
+        }
+      end
+      # Get content configuration for event messages
+      #
+      # @param config [Hash] Configuration hash from load
+      # @return [Hash] Content settings
+      def self.content_config(config = nil)
+        config ||= load
+        {
+          include_task_metadata: config[:include_task_metadata],
+          include_error_details: config[:include_error_details],
+          truncate_long_messages: config[:truncate_long_messages],
+          max_message_length: config[:max_message_length]
+        }
+      end
+    end
+  end
+end
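The precedence rules in `EventConfig.enabled?` and `EventConfig.should_emit?` can be restated as a small dependency-free sketch. The function names and the plain-Hash `env` argument are hypothetical, and the sketch assumes only the literal string `true` counts as truthy; how `Config.from_env` actually coerces booleans is not shown in this diff.

```ruby
# Hypothetical mirror of EventConfig.enabled?; `env` is a Hash standing in for ENV.
def events_enabled?(env)
  # Events only make sense inside a Kubernetes pod.
  return false unless env['KUBERNETES_SERVICE_HOST']
  # The legacy disable flag wins over the enable flag.
  return false if env.fetch('DISABLE_K8S_EVENTS', 'false') == 'true'

  env.fetch('ENABLE_K8S_EVENTS', 'true') == 'true'
end

# Hypothetical mirror of EventConfig.should_emit?.
def should_emit_event?(event_type, env)
  return false unless events_enabled?(env)

  var = {
    success: 'EMIT_SUCCESS_EVENTS',
    failure: 'EMIT_FAILURE_EVENTS',
    validation: 'EMIT_VALIDATION_EVENTS'
  }[event_type]
  return false unless var # unknown event types are never emitted

  env.fetch(var, 'true') == 'true'
end
```

Under these assumptions, a pod with no event-related variables set emits all three event types, and setting `DISABLE_K8S_EVENTS=true` silences everything regardless of the other flags.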

data/lib/language_operator/agent/safety/ast_validator.rb CHANGED

@@ -25,7 +25,7 @@ module LanguageOperator
           const_set const_get remove_const
           define_method define_singleton_method
           undef_method remove_method alias_method
-          exit exit! abort raise fail throw
+          exit exit! abort throw
           trap at_exit
           open
         ].freeze

data/lib/language_operator/agent/safety/safe_executor.rb CHANGED

@@ -36,7 +36,8 @@ module LanguageOperator
           # Step 3: Execute using instance_eval with smart constant injection
           # Only inject constants that won't conflict with user-defined ones
-          safe_constants = %w[Numeric Integer Float String Array Hash TrueClass FalseClass Time Date]
+          safe_constants = %w[Numeric Integer Float String Array Hash TrueClass FalseClass Time Date
+                              ArgumentError TypeError RuntimeError StandardError]
           # Find which constants user code defines to avoid redefinition warnings
           user_defined_constants = safe_constants.select { |const| code.include?("#{const} =") }
@@ -129,6 +130,9 @@ module LanguageOperator
             when :TrueClass, :FalseClass, :NilClass
               # Allow boolean and nil types
               ::Object.const_get(name)
+            when :ArgumentError, :TypeError, :RuntimeError, :StandardError
+              # Allow standard Ruby exception classes for error handling
+              ::Object.const_get(name)
             else
               # Security-by-default: explicitly deny access to any other constants
               # This prevents sandbox bypass through const_missing fallback
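Taken together, the two safety changes let sandboxed task code call `raise` and reference a small set of exception classes, while every other constant stays denied. The allowlist idea can be sketched in isolation as below. `resolve_sandbox_constant` and `ALLOWED_EXCEPTION_CONSTANTS` are hypothetical names, and raising `SecurityError` on denial is an assumption; the diff does not show what the real `else` branch raises.

```ruby
# The four exception classes added to the allowlist in this diff.
ALLOWED_EXCEPTION_CONSTANTS = %i[ArgumentError TypeError RuntimeError StandardError].freeze

# Hypothetical resolver: return the real class for allowlisted names,
# refuse everything else (deny by default).
def resolve_sandbox_constant(name)
  if ALLOWED_EXCEPTION_CONSTANTS.include?(name)
    Object.const_get(name)
  else
    raise SecurityError, "Access to constant '#{name}' is not allowed"
  end
end
```

For example, `resolve_sandbox_constant(:ArgumentError)` returns the `ArgumentError` class, whereas `resolve_sandbox_constant(:File)` raises.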

data/lib/language_operator/agent/task_executor.rb CHANGED

@@ -106,15 +106,11 @@ module LanguageOperator
       def execute_task(task_name, inputs: {}, timeout: nil, max_retries: nil)
         execution_start = Time.now
         max_retries ||= @config[:max_retries]
         # Reset JSON parsing retry flag for this task
         @parsing_retry_attempted = false
-        with_span('task_executor.execute_task', attributes: {
-                    'task.name' => task_name.to_s,
-                    'task.inputs' => inputs.keys.map(&:to_s).join(','),
-                    'task.max_retries' => max_retries
-                  }) do
+        with_span('task_executor.execute_task', attributes: build_task_execution_attributes(task_name, inputs, max_retries)) do
           # Fast task lookup using pre-built cache
           task_name_sym = task_name.to_sym
           task_info = @task_cache[task_name_sym]
@@ -140,15 +136,31 @@ module LanguageOperator
           OpenTelemetry::Trace.current_span&.set_attribute('task.timeout', timeout)
           # Execute with retry logic
-          execute_with_retry(task, task_name, inputs, timeout, max_retries, execution_start)
+          result = execute_with_retry(task, task_name, inputs, timeout, max_retries, execution_start)
+          # Add task outputs to span for learning system (if enabled)
+          current_span = OpenTelemetry::Trace.current_span
+          current_span&.set_attribute('task.outputs', result.to_json) if current_span && capture_enabled?(:outputs)
+          # Emit Kubernetes event for successful task completion
+          emit_task_execution_event(task_name, success: true, execution_start: execution_start)
+          result
         end
       rescue ArgumentError => e
         # Validation errors should not be retried - re-raise immediately
         log_task_error(task_name, e, :validation, execution_start)
+        emit_task_execution_event(task_name, success: false, execution_start: execution_start, error: e, event_type: :validation)
         raise TaskValidationError.new(task_name, e.message, e)
+      rescue TaskValidationError => e
+        # TaskValidationError from validate_inputs should be logged as :validation
+        log_task_error(task_name, e, :validation, execution_start)
+        emit_task_execution_event(task_name, success: false, execution_start: execution_start, error: e, event_type: :validation)
+        raise e
       rescue StandardError => e
         # Catch any unexpected errors that escaped retry logic
         log_task_error(task_name, e, :system, execution_start)
+        emit_task_execution_event(task_name, success: false, execution_start: execution_start, error: e)
         raise create_appropriate_error(task_name, e)
       end
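The order of the `rescue` clauses in this hunk matters: Ruby checks them top to bottom, so the validation-specific clauses have to come before the `StandardError` catch-all. A small self-contained sketch of that behavior follows, using a hypothetical stand-in for `TaskValidationError` (the gem's real class takes more constructor arguments) and a hypothetical `classify_failure` helper.

```ruby
# Hypothetical stand-in; the real class lives in the gem.
class TaskValidationError < StandardError; end

# Runs the block and reports which rescue clause, if any, handled it.
def classify_failure
  yield
  :success
rescue ArgumentError
  :validation # raw validation failure
rescue TaskValidationError
  :validation # already-wrapped validation failure
rescue StandardError
  :system     # anything else that escaped retry logic
end
```

If the `StandardError` clause came first, it would match both validation errors as well, and they would be reported as `:system`.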
@@ -371,6 +383,39 @@ module LanguageOperator
         'Agent::TaskExecutor'
       end
+      # Emit Kubernetes event for task execution
+      #
+      # @param task_name [Symbol, String] Task name
+      # @param success [Boolean] Whether task succeeded
+      # @param execution_start [Time] Task execution start time
+      # @param error [Exception, nil] Error if task failed
+      # @param event_type [Symbol, nil] Event type override (:success, :failure, :validation)
+      def emit_task_execution_event(task_name, success:, execution_start:, error: nil, event_type: nil)
+        return unless @agent.respond_to?(:kubernetes_client)
+        duration_ms = ((Time.now - execution_start) * 1000).round(2)
+        metadata = {
+          'task_type' => determine_task_type(@tasks[task_name.to_sym])
+        }
+        if error
+          metadata['error_type'] = error.class.name
+          metadata['error_category'] = categorize_error(error).to_s
+        end
+        @agent.kubernetes_client.emit_execution_event(
+          task_name.to_s,
+          success: success,
+          duration_ms: duration_ms,
+          metadata: metadata
+        )
+      rescue StandardError => e
+        logger.warn('Failed to emit task execution event',
+                    task: task_name,
+                    error: e.message)
+      end
       # Summarize hash values for logging (truncate long strings)
       # Optimized for performance with lazy computation
       #
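Since `kubernetes_client` is `nil` outside Kubernetes (see the `base.rb` change), one way to exercise the event-emission path in isolation is with a recording fake. The sketch below is hypothetical throughout: `RecordingClient` and `emit_task_event` are illustrative names, the early return on a `nil` client is a choice made for this sketch, and the `emit_execution_event` keyword signature is inferred from the call site in this hunk.

```ruby
# Hypothetical fake that records calls instead of talking to the Kubernetes API.
class RecordingClient
  attr_reader :events

  def initialize
    @events = []
  end

  def emit_execution_event(task_name, success:, duration_ms:, metadata: {})
    @events << { task: task_name, success: success,
                 duration_ms: duration_ms, metadata: metadata }
  end
end

# Hypothetical helper: compute the duration and metadata, then hand off to the client.
# `now` is injectable so the duration can be tested deterministically.
def emit_task_event(client, task_name, success:, execution_start:, error: nil, now: Time.now)
  return unless client # nothing to do outside Kubernetes

  metadata = {}
  metadata['error_type'] = error.class.name if error
  client.emit_execution_event(
    task_name.to_s,
    success: success,
    duration_ms: ((now - execution_start) * 1000).round(2),
    metadata: metadata
  )
end
```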
@@ -620,6 +665,8 @@ module LanguageOperator
       # @param task [TaskDefinition] The task definition
       # @return [String] Task type
       def determine_task_type(task)
+        return nil unless task
         if task.neural? && task.symbolic?
           'hybrid'
         elsif task.neural?
@@ -964,6 +1011,49 @@ module LanguageOperator
         end
         cache
       end
+      # Build semantic attributes for task execution span
+      #
+      # Includes attributes required for learning status tracking:
+      # - task.name: Task identifier for learning controller
+      # - agent.name: Agent identifier (explicit for learning system)
+      # - gen_ai.operation.name: Semantic operation name
+      #
+      # @param task_name [Symbol] Name of the task being executed
+      # @param inputs [Hash] Task input parameters
+      # @param max_retries [Integer] Maximum retry attempts
+      # @return [Hash] Span attributes
+      def build_task_execution_attributes(task_name, inputs, max_retries)
+        attributes = {
+          # Core task identification (CRITICAL for learning system)
+          'task.name' => task_name.to_s,
+          'task.max_retries' => max_retries,
+          # Semantic operation name for better trace organization
+          'gen_ai.operation.name' => 'execute_task'
+        }
+        # Add task inputs - JSON-encoded if capture enabled, else just keys
+        attributes['task.inputs'] = if capture_enabled?(:inputs)
+                                      inputs.to_json
+                                    else
+                                      inputs.keys.map(&:to_s).join(',')
+                                    end
+        # Explicitly add agent name if available (redundant with resource attribute but ensures visibility)
+        if (agent_name = ENV.fetch('AGENT_NAME', nil))
+          attributes['agent.name'] = agent_name
+        end
+        # Add task type information if available
+        if (task_info = @task_cache[task_name.to_sym])
+          attributes['task.type'] = task_info[:type]
+          attributes['task.has_neural'] = task_info[:neural].to_s
+          attributes['task.has_symbolic'] = task_info[:symbolic].to_s
+        end
+        attributes
+      end
     end
   end
 end
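The privacy-relevant part of `build_task_execution_attributes` is the `task.inputs` branch: full JSON when capture is enabled, key names only otherwise. A minimal sketch of that difference, with a hypothetical `task_inputs_attribute` helper that takes the capture decision as an explicit argument:

```ruby
require 'json'

# Hypothetical helper mirroring the task.inputs branch in the diff.
def task_inputs_attribute(inputs, capture:)
  if capture
    inputs.to_json                    # full values, may contain sensitive data
  else
    inputs.keys.map(&:to_s).join(',') # key names only
  end
end
```

For inputs such as `{ user_id: 123, repo: 'x' }`, the default path records just `user_id,repo`, so values never reach the trace backend unless capture is switched on.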

data/lib/language_operator/agent/telemetry.rb CHANGED

@@ -70,6 +70,11 @@ module LanguageOperator
         # Build resource attributes from environment variables
         #
+        # Includes semantic attributes required for learning status tracking:
+        # - agent.name: Required for learning controller to identify agent executions
+        # - agent.mode: Agent operating mode (autonomous, scheduled, reactive)
+        # - service.version: Agent runtime version for observability
+        #
         # @return [Hash] Resource attributes
         def build_resource_attributes
           attributes = {}
@@ -83,9 +88,26 @@ module LanguageOperator
           # Kubernetes pod name
           attributes['k8s.pod.name'] = ENV['HOSTNAME'] if ENV['HOSTNAME']
-          # Agent-specific attributes
-          attributes['agent.name'] = ENV['AGENT_NAME'] if ENV['AGENT_NAME']
-          attributes['agent.mode'] = ENV['AGENT_MODE'] if ENV['AGENT_MODE']
+          # Agent-specific attributes (CRITICAL for learning system)
+          if (agent_name = ENV.fetch('AGENT_NAME', nil))
+            attributes['agent.name'] = agent_name
+            # Also set as service.name for better trace organization
+            attributes['service.name'] = "language-operator-agent-#{agent_name}"
+          else
+            warn 'AGENT_NAME environment variable not set - learning status tracking may not work correctly'
+          end
+          if (agent_mode = ENV.fetch('AGENT_MODE', nil))
+            attributes['agent.mode'] = agent_mode
+          end
+          # Agent runtime version for observability
+          attributes['service.version'] = LanguageOperator::VERSION if defined?(LanguageOperator::VERSION)
+          # Agent cluster context
+          if (cluster_name = ENV.fetch('AGENT_CLUSTER', nil))
+            attributes['agent.cluster'] = cluster_name
+          end
           attributes
         end

data/lib/language_operator/agent/web_server.rb CHANGED

@@ -179,16 +179,13 @@ module LanguageOperator
         # Drain and cleanup all executors in the pool
         executors_cleaned = 0
-        begin
-          loop do
-            executor = @executor_pool.pop(timeout: 0.1)
-            if executor
-              executor.cleanup_connections
-              executors_cleaned += 1
-            end
+        until @executor_pool.empty?
+          executor = @executor_pool.pop unless @executor_pool.empty?
+          if executor
+            executor.cleanup_connections
+            executors_cleaned += 1
           end
-        rescue ThreadError
-          # Pool is empty, we're done
         end
         puts "Cleaned up #{executors_cleaned} executors from pool"
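For comparison, a common way to drain a pool without blocking is a non-blocking `pop`, which folds the emptiness check and the removal into one call. The sketch below assumes the pool is a standard Ruby `Queue`; the diff does not show the type of `@executor_pool`, and `drain_pool` is a hypothetical helper.

```ruby
# Hypothetical helper: drain every executor from a ::Queue and clean it up.
# Returns the number of executors cleaned.
def drain_pool(pool)
  cleaned = 0
  begin
    loop do
      executor = pool.pop(true) # non-blocking; raises ThreadError when empty
      executor.cleanup_connections
      cleaned += 1
    end
  rescue ThreadError
    # Pool is empty, nothing left to drain.
  end
  cleaned
end
```

Checking `empty?` and then calling `pop` as two separate steps is not atomic, so if another thread could still touch the pool during shutdown, the single non-blocking call avoids blocking on a queue that was emptied in between.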