language-operator 0.1.31 → 0.1.35
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.rubocop.yml +7 -8
- data/CHANGELOG.md +14 -0
- data/CI_STATUS.md +56 -0
- data/Gemfile.lock +2 -2
- data/Makefile +22 -6
- data/lib/language_operator/agent/base.rb +10 -6
- data/lib/language_operator/agent/executor.rb +19 -97
- data/lib/language_operator/agent/safety/ast_validator.rb +62 -43
- data/lib/language_operator/agent/safety/safe_executor.rb +27 -2
- data/lib/language_operator/agent/scheduler.rb +60 -0
- data/lib/language_operator/agent/task_executor.rb +548 -0
- data/lib/language_operator/agent.rb +90 -27
- data/lib/language_operator/cli/base_command.rb +117 -0
- data/lib/language_operator/cli/commands/agent.rb +339 -407
- data/lib/language_operator/cli/commands/cluster.rb +274 -290
- data/lib/language_operator/cli/commands/install.rb +110 -119
- data/lib/language_operator/cli/commands/model.rb +284 -184
- data/lib/language_operator/cli/commands/persona.rb +218 -284
- data/lib/language_operator/cli/commands/quickstart.rb +4 -5
- data/lib/language_operator/cli/commands/status.rb +31 -35
- data/lib/language_operator/cli/commands/system.rb +221 -233
- data/lib/language_operator/cli/commands/tool.rb +356 -422
- data/lib/language_operator/cli/commands/use.rb +19 -22
- data/lib/language_operator/cli/helpers/resource_dependency_checker.rb +0 -18
- data/lib/language_operator/cli/wizards/quickstart_wizard.rb +0 -1
- data/lib/language_operator/client/config.rb +20 -21
- data/lib/language_operator/config.rb +115 -3
- data/lib/language_operator/constants.rb +54 -0
- data/lib/language_operator/dsl/agent_context.rb +7 -7
- data/lib/language_operator/dsl/agent_definition.rb +111 -26
- data/lib/language_operator/dsl/config.rb +30 -66
- data/lib/language_operator/dsl/main_definition.rb +114 -0
- data/lib/language_operator/dsl/schema.rb +84 -43
- data/lib/language_operator/dsl/task_definition.rb +315 -0
- data/lib/language_operator/dsl.rb +0 -1
- data/lib/language_operator/instrumentation/task_tracer.rb +285 -0
- data/lib/language_operator/logger.rb +4 -4
- data/lib/language_operator/synthesis_test_harness.rb +324 -0
- data/lib/language_operator/templates/examples/agent_synthesis.tmpl +26 -8
- data/lib/language_operator/templates/schema/CHANGELOG.md +26 -0
- data/lib/language_operator/templates/schema/agent_dsl_openapi.yaml +1 -1
- data/lib/language_operator/templates/schema/agent_dsl_schema.json +84 -42
- data/lib/language_operator/type_coercion.rb +250 -0
- data/lib/language_operator/ux/base.rb +81 -0
- data/lib/language_operator/ux/concerns/README.md +155 -0
- data/lib/language_operator/ux/concerns/headings.rb +90 -0
- data/lib/language_operator/ux/concerns/input_validation.rb +146 -0
- data/lib/language_operator/ux/concerns/provider_helpers.rb +167 -0
- data/lib/language_operator/ux/create_agent.rb +252 -0
- data/lib/language_operator/ux/create_model.rb +267 -0
- data/lib/language_operator/ux/quickstart.rb +594 -0
- data/lib/language_operator/version.rb +1 -1
- data/lib/language_operator.rb +2 -0
- data/requirements/ARCHITECTURE.md +1 -0
- data/requirements/SCRATCH.md +153 -0
- data/requirements/dsl.md +0 -0
- data/requirements/features +1 -0
- data/requirements/personas +1 -0
- data/requirements/proposals +1 -0
- data/requirements/tasks/iterate.md +14 -15
- data/requirements/tasks/optimize.md +13 -4
- data/synth/001/Makefile +90 -0
- data/synth/001/agent.rb +26 -0
- data/synth/001/agent.yaml +7 -0
- data/synth/001/output.log +44 -0
- data/synth/Makefile +39 -0
- data/synth/README.md +342 -0
- metadata +37 -10
- data/lib/language_operator/dsl/workflow_definition.rb +0 -259
- data/test_agent_dsl.rb +0 -108
|
@@ -0,0 +1,153 @@
|
|
|
1
|
+
# Knowledge Base
|
|
2
|
+
|
|
3
|
+
Living document of critical insights, patterns, and gotchas for this codebase.
|
|
4
|
+
|
|
5
|
+
## DSL Architecture (v1 - Current)
|
|
6
|
+
|
|
7
|
+
**Core Model:** Task/Main (imperative, replacing declarative workflow/step)
|
|
8
|
+
|
|
9
|
+
**Key Components:**
|
|
10
|
+
- `TaskDefinition`: Organic functions with stable contracts (inputs/outputs), evolving implementations (neural→symbolic)
|
|
11
|
+
- `MainDefinition`: Imperative entry point using Ruby control flow + `execute_task()`
|
|
12
|
+
- `TypeSchema`: 7-type system (string, integer, number, boolean, array, hash, any)
|
|
13
|
+
|
|
14
|
+
**Migration Strategy:**
|
|
15
|
+
- DSL v0 (workflow/step) marked deprecated but fully functional
|
|
16
|
+
- Both models supported in schema generation for backward compatibility
|
|
17
|
+
- Deprecation clearly noted in descriptions, safe methods updated
|
|
18
|
+
|
|
19
|
+
## Testing Patterns
|
|
20
|
+
|
|
21
|
+
**RSpec Best Practices:**
|
|
22
|
+
- Use single-quoted heredocs (`<<~'RUBY'`) when testing code with interpolation to avoid context issues
|
|
23
|
+
- RuboCop requires uppercase annotation keywords with colon+space (e.g., `# TODO: fix`)
|
|
24
|
+
- Symbol hash keys: Use `.keys.first.to_s` or `.values.first` for pattern properties, not direct string access
|
|
25
|
+
|
|
26
|
+
**Parser Gem Quirks:**
|
|
27
|
+
- Very forgiving - accepts syntax variations Ruby rejects
|
|
28
|
+
- Makes syntax error testing difficult (2 pending tests skipped for this reason)
|
|
29
|
+
- AST validation works well for semantic checks, less so for syntactic ones
|
|
30
|
+
|
|
31
|
+
## Schema Generation
|
|
32
|
+
|
|
33
|
+
**JSON Schema Patterns:**
|
|
34
|
+
- `patternProperties` for flexible parameter validation without enumeration
|
|
35
|
+
- Regex patterns as keys validate both names and types dynamically
|
|
36
|
+
- Always include `examples` for complex schemas (aids understanding)
|
|
37
|
+
|
|
38
|
+
## Security (AST Validator)
|
|
39
|
+
|
|
40
|
+
**Safe Methods Lists:**
|
|
41
|
+
- DSL v1: `task`, `main`, `execute_task`, `inputs`, `outputs`, `instructions`
|
|
42
|
+
- DSL v0: Removed `workflow`, `step`, `depends_on`, `prompt`
|
|
43
|
+
- Helpers: Added `TypeCoercion` for validation
|
|
44
|
+
|
|
45
|
+
**Blocked Patterns:**
|
|
46
|
+
- System execution: `system`, `exec`, `spawn`, `fork`
|
|
47
|
+
- Dynamic evaluation: `eval`, `instance_eval`, `class_eval`, `send`
|
|
48
|
+
- File operations: Direct `File` access, dangerous IO
|
|
49
|
+
- Works in both task blocks and main blocks
|
|
50
|
+
|
|
51
|
+
## Critical File Map
|
|
52
|
+
|
|
53
|
+
| File | Purpose | Complexity |
|
|
54
|
+
|------|---------|------------|
|
|
55
|
+
| `lib/language_operator/dsl/schema.rb` | JSON Schema generation (DSL→schema) | High (1100+ lines) |
|
|
56
|
+
| `lib/language_operator/dsl/task_definition.rb` | Task contract+validation | Medium (316 lines) |
|
|
57
|
+
| `lib/language_operator/dsl/main_definition.rb` | Main block execution | Low (115 lines) |
|
|
58
|
+
| `lib/language_operator/agent/task_executor.rb` | Neural/symbolic task execution | Medium (233 lines) |
|
|
59
|
+
| `lib/language_operator/agent/safety/ast_validator.rb` | Code security validation | High |
|
|
60
|
+
| `spec/language_operator/dsl/schema_spec.rb` | Schema test coverage (186 tests) | High |
|
|
61
|
+
| `spec/language_operator/agent/task_executor_spec.rb` | Task executor tests (19 tests) | Medium |
|
|
62
|
+
|
|
63
|
+
## Current Status
|
|
64
|
+
|
|
65
|
+
**Completed (2025-11-14):**
|
|
66
|
+
- ✅ Issue #26: Schema generation for task/main model
|
|
67
|
+
- ✅ Issue #25: AST validator updated for DSL v1
|
|
68
|
+
- ✅ Issues #21-23: TaskDefinition, MainDefinition, TypeCoercion implemented
|
|
69
|
+
- ✅ Issue #28: TaskExecutor for task execution runtime
|
|
70
|
+
- ✅ Issue #32 (partial): DependencyGraph and ParallelExecutor for implicit parallelism
|
|
71
|
+
|
|
72
|
+
**Test Suite Health:**
|
|
73
|
+
- 135 examples, 0 failures, 2 pending (syntax error tests)
|
|
74
|
+
- 186 schema-specific tests, all passing
|
|
75
|
+
- 19 TaskExecutor tests, all passing
|
|
76
|
+
- 20 DependencyGraph tests, all passing
|
|
77
|
+
- 11 ParallelExecutor tests, all passing
|
|
78
|
+
- RuboCop clean
|
|
79
|
+
|
|
80
|
+
## Task Execution (DSL v1)
|
|
81
|
+
|
|
82
|
+
**Neural Task Flow:**
|
|
83
|
+
1. TaskExecutor builds prompt from task instructions + inputs + output schema
|
|
84
|
+
2. LLM called via `agent.send_message` with full tool access
|
|
85
|
+
3. Response parsed as JSON (supports ```json blocks or raw objects)
|
|
86
|
+
4. Outputs validated against schema via TaskDefinition#validate_outputs
|
|
87
|
+
5. Fail fast on any error (critical for re-synthesis)
|
|
88
|
+
|
|
89
|
+
**Symbolic Task Flow:**
|
|
90
|
+
1. TaskExecutor calls TaskDefinition#call with inputs and self as context
|
|
91
|
+
2. TaskDefinition validates inputs, executes code block, validates outputs
|
|
92
|
+
3. Context provides `execute_task`, `execute_llm`, `execute_tool` helpers
|
|
93
|
+
4. Fail fast on any error
|
|
94
|
+
|
|
95
|
+
**Runtime Wiring:**
|
|
96
|
+
- Agent module detects DSL v1 (main block) vs v0 (workflow)
|
|
97
|
+
- Autonomous mode: `execute_main_block` creates TaskExecutor, calls MainDefinition
|
|
98
|
+
- Scheduled mode: `Scheduler#start_with_main` creates TaskExecutor, schedules main
|
|
99
|
+
- MainDefinition receives TaskExecutor as execution context via instance_exec
|
|
100
|
+
|
|
101
|
+
## Parallel Execution (DSL v1)
|
|
102
|
+
|
|
103
|
+
**Architecture:**
|
|
104
|
+
- DependencyGraph: AST-based analysis extracts task dependencies from main block
|
|
105
|
+
- ParallelExecutor: Level-based execution using Concurrent::FixedThreadPool
|
|
106
|
+
- Default pool size: 4 threads (configurable)
|
|
107
|
+
|
|
108
|
+
**How It Works:**
|
|
109
|
+
1. Parse main block code to extract `execute_task` calls
|
|
110
|
+
2. Build dependency graph based on variable flow (which tasks use outputs from which other tasks)
|
|
111
|
+
3. Assign execution levels via topological sort
|
|
112
|
+
4. Execute each level in parallel (all tasks in level run concurrently)
|
|
113
|
+
5. Wait for level completion before starting next level
|
|
114
|
+
|
|
115
|
+
**Performance:**
|
|
116
|
+
- Measured 2x speedup for I/O-bound parallel tasks
|
|
117
|
+
- Thread pool handles > pool size tasks gracefully
|
|
118
|
+
- Fail-fast error handling (collects all errors, raises RuntimeError)
|
|
119
|
+
|
|
120
|
+
**Current Status (2025-11-14):**
|
|
121
|
+
- ✅ DependencyGraph: Complete and tested (20 tests)
|
|
122
|
+
- ✅ ParallelExecutor: Complete and tested (11 tests)
|
|
123
|
+
- ⚠️ Integration: Partial - blocked on variable-to-result mapping
|
|
124
|
+
|
|
125
|
+
**Blocking Issue:**
|
|
126
|
+
The fundamental challenge is mapping variable names from code to task results:
|
|
127
|
+
```ruby
|
|
128
|
+
# User code:
|
|
129
|
+
s1 = execute_task(:fetch1)
|
|
130
|
+
merged = execute_task(:merge, inputs: { s1: s1 })
|
|
131
|
+
|
|
132
|
+
# ParallelExecutor passes: { fetch1: {...} }
|
|
133
|
+
# But merge expects: { s1: {...} }
|
|
134
|
+
```
|
|
135
|
+
|
|
136
|
+
**Solution Options:**
|
|
137
|
+
1. Enhanced AST analysis (complex, 2-3 days)
|
|
138
|
+
2. Naming convention: var name = task name (simple, 1 day)
|
|
139
|
+
3. Explicit dependency DSL (medium, 1-2 days)
|
|
140
|
+
4. Defer to follow-up issue (pragmatic, 0 days)
|
|
141
|
+
|
|
142
|
+
**Recommendation:** Option 4 - defer integration, ship infrastructure
|
|
143
|
+
|
|
144
|
+
## Quick Wins / Common Gotchas
|
|
145
|
+
|
|
146
|
+
1. **Hash Key Access:** Ruby symbols ≠ strings. Always check key types in tests.
|
|
147
|
+
2. **Heredoc Interpolation:** Use `'RUBY'` (single quotes) to prevent RSpec context leakage.
|
|
148
|
+
3. **Pattern Properties:** Schema validation via regex - powerful for type systems.
|
|
149
|
+
4. **Migration-Friendly:** Keep deprecated features functional with clear warnings.
|
|
150
|
+
5. **Parser Tolerance:** Don't rely on parser for syntax validation - it's too forgiving.
|
|
151
|
+
6. **Tool Execution:** Tools accessed via LLM interface, not direct RPC (execute_tool → execute_llm)
|
|
152
|
+
7. **Error Wrapping:** TaskExecutor wraps errors in RuntimeError with task context for debugging
|
|
153
|
+
8. **Concurrent Ruby Futures:** Use `future.wait` + `future.rejected?` to check status, not `rescue` around `future.value`
|
data/requirements/dsl.md
ADDED
|
File without changes
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
../../requirements/features
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
../../requirements/personas
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
../../requirements/proposals
|
|
@@ -1,35 +1,34 @@
|
|
|
1
1
|
# Task
|
|
2
2
|
|
|
3
|
-
##
|
|
4
|
-
|
|
5
|
-
Adopt the [ruby-engineer](../../../requirements/personas/ruby-engineer.md) persona while executing these instructions, please.
|
|
3
|
+
## Prerequisites
|
|
6
4
|
|
|
7
|
-
|
|
5
|
+
Please read the following context files:
|
|
8
6
|
|
|
9
|
-
|
|
7
|
+
* Persona: requirements/personas/ruby-engineer.md
|
|
8
|
+
* Feature Spec: requirements/proposals/dsl-v1.md
|
|
9
|
+
* Scratch: requirements/SCRATCH.md
|
|
10
10
|
|
|
11
|
-
##
|
|
11
|
+
## Persona
|
|
12
12
|
|
|
13
|
-
|
|
14
|
-
Issues are found using the `gh` command for this project:
|
|
15
|
-
- Owner: language-operator
|
|
16
|
-
- Repository: language-operator-gem
|
|
13
|
+
**CRITICAL**: Adopt the ruby-engineer persona while executing these instructions, please.
|
|
17
14
|
|
|
18
15
|
## Instructions
|
|
19
16
|
|
|
20
17
|
Follow these directions closely:
|
|
21
18
|
|
|
22
|
-
1. Use the
|
|
19
|
+
1. Use the `gh` tool to find the top issue for this repository (language-operator/language-operator-gem) with the "ready" label.
|
|
23
20
|
2. Investigate if it's valid, or a mis-use of the intended feature.
|
|
24
21
|
3. **CRITICAL:** Switch to plan mode, and propose an implementation plan. Await my feedback.
|
|
25
22
|
4. Add your implementation plan as a comment on the issue.
|
|
26
|
-
5. Implement
|
|
23
|
+
5. Implement your plan.
|
|
27
24
|
6. Run existing tests, and add new ones if necessary. Remember to include CI. Remember the linter and that bundler will fail if it's out of sync with its lockfile.
|
|
28
25
|
7. **CRITICAL:** Test the actual functionality manually before committing. If it's a CLI command, run it. If it's library code, test it in the appropriate context. Never commit untested code.
|
|
29
|
-
8. Commit the change
|
|
26
|
+
8. Commit the change with a semantic, ONE LINE message, like 'feat: create task_definition structure'.
|
|
30
27
|
9. **CRITICAL:** Halt while CI runs and await my feedback.
|
|
31
|
-
10.
|
|
32
|
-
11. Resolve the issue.
|
|
28
|
+
10. Add resolution details as a comment on the GitHub issue.
|
|
29
|
+
11. Resolve the GitHub issue.
|
|
30
|
+
|
|
31
|
+
Consider if you need to update requirements/SCRATCH.md for the next run.
|
|
33
32
|
|
|
34
33
|
## Output
|
|
35
34
|
|
|
@@ -5,17 +5,26 @@
|
|
|
5
5
|
|
|
6
6
|
Optimize
|
|
7
7
|
|
|
8
|
+
## Inputs
|
|
9
|
+
|
|
10
|
+
- :persona string - the persona to adopt when executing this task (default: ruby-engineer)
|
|
11
|
+
|
|
8
12
|
## Persona
|
|
9
13
|
|
|
10
|
-
Adopt the
|
|
14
|
+
Adopt the `requirements/personas/:persona.md` persona while executing these instructions, please.
|
|
11
15
|
|
|
12
16
|
## Instructions
|
|
13
17
|
|
|
14
|
-
Suggest an improvement that could improve the quality of the codebase or developer experience. Things like
|
|
18
|
+
Suggest an improvement that could improve the quality of the codebase or developer experience. Things like:
|
|
19
|
+
- opportunities to reduce lines of code
|
|
20
|
+
- DRYing up code
|
|
21
|
+
- Dead code paths
|
|
22
|
+
- Duplicate utility implementations
|
|
23
|
+
- Magic strings
|
|
24
|
+
- Other forms of tech debt
|
|
15
25
|
|
|
16
26
|
An important thing to consider is that this code has been written by different agents with different contexts, who may not have been aware of overall patterns. These kinds of optimizations are high priority.
|
|
17
27
|
|
|
18
28
|
## Output
|
|
19
29
|
|
|
20
|
-
|
|
21
|
-
A proposed list of three concrete optimizations.
|
|
30
|
+
Propose ONE high-impact optimization or refactor.
|
data/synth/001/Makefile
ADDED
|
@@ -0,0 +1,90 @@
|
|
|
1
|
+
.PHONY: synthesize synthesize-all synthesize-sonnet synthesize-gpt-4 run run-docker validate clean compare help
|
|
2
|
+
|
|
3
|
+
# Default model - inherit from SYNTHESIS_MODEL env var, no hardcoded fallback
|
|
4
|
+
MODEL ?= $(SYNTHESIS_MODEL)
|
|
5
|
+
|
|
6
|
+
# Model-specific names
|
|
7
|
+
SONNET_MODEL = claude-3-7-sonnet-20250219
|
|
8
|
+
GPT4_MODEL = gpt-4-turbo
|
|
9
|
+
|
|
10
|
+
help:
|
|
11
|
+
@echo "Synthesis Test Targets:"
|
|
12
|
+
@echo " make synthesize - Generate agent.rb using default model"
|
|
13
|
+
@echo " make synthesize-all - Generate for all configured models"
|
|
14
|
+
@echo " make synthesize-sonnet - Generate agent.sonnet.rb"
|
|
15
|
+
@echo " make synthesize-gpt-4 - Generate agent.gpt-4.rb"
|
|
16
|
+
@echo " make run - Execute agent.rb locally with bundle exec"
|
|
17
|
+
@echo " make run-docker - Execute agent.rb in Docker container"
|
|
18
|
+
@echo " make validate - Validate agent.rb syntax"
|
|
19
|
+
@echo " make clean - Remove all generated .rb files"
|
|
20
|
+
@echo " make compare - Compare outputs from different models"
|
|
21
|
+
|
|
22
|
+
synthesize:
|
|
23
|
+
@echo "Synthesizing agent.rb with model: $(MODEL)..."
|
|
24
|
+
@bundle exec ruby -I ../../lib -e "\
|
|
25
|
+
require 'language_operator/synthesis_test_harness'; \
|
|
26
|
+
harness = LanguageOperator::SynthesisTestHarness.new(model: '$(MODEL)'); \
|
|
27
|
+
code = harness.synthesize('agent.yaml'); \
|
|
28
|
+
File.write('agent.rb', code); \
|
|
29
|
+
puts 'Generated: agent.rb'"
|
|
30
|
+
|
|
31
|
+
synthesize-sonnet:
|
|
32
|
+
@echo "Synthesizing with Claude Sonnet..."
|
|
33
|
+
@bundle exec ruby -I ../../lib -e "\
|
|
34
|
+
require 'language_operator/synthesis_test_harness'; \
|
|
35
|
+
harness = LanguageOperator::SynthesisTestHarness.new(model: '$(SONNET_MODEL)'); \
|
|
36
|
+
code = harness.synthesize('agent.yaml'); \
|
|
37
|
+
File.write('agent.sonnet.rb', code); \
|
|
38
|
+
puts 'Generated: agent.sonnet.rb'"
|
|
39
|
+
|
|
40
|
+
synthesize-gpt-4:
|
|
41
|
+
@echo "Synthesizing with GPT-4..."
|
|
42
|
+
@bundle exec ruby -I ../../lib -e "\
|
|
43
|
+
require 'language_operator/synthesis_test_harness'; \
|
|
44
|
+
harness = LanguageOperator::SynthesisTestHarness.new(model: '$(GPT4_MODEL)'); \
|
|
45
|
+
code = harness.synthesize('agent.yaml'); \
|
|
46
|
+
File.write('agent.gpt-4.rb', code); \
|
|
47
|
+
puts 'Generated: agent.gpt-4.rb'"
|
|
48
|
+
|
|
49
|
+
synthesize-all: synthesize-sonnet synthesize-gpt-4
|
|
50
|
+
@echo "All synthesis complete!"
|
|
51
|
+
|
|
52
|
+
run:
|
|
53
|
+
@if [ ! -f agent.rb ]; then \
|
|
54
|
+
echo "Error: agent.rb not found. Run 'make synthesize' first."; \
|
|
55
|
+
exit 1; \
|
|
56
|
+
fi
|
|
57
|
+
@echo "Executing agent.rb in Docker container..."
|
|
58
|
+
@docker run --rm -i \
|
|
59
|
+
--network host \
|
|
60
|
+
-e AGENT_NAME=hello-world \
|
|
61
|
+
-e AGENT_MODE=autonomous \
|
|
62
|
+
-e AGENT_CODE_PATH=/agent/agent.rb \
|
|
63
|
+
-e LLM_MODEL=$(SYNTHESIS_MODEL) \
|
|
64
|
+
-e MODEL_ENDPOINTS=$(SYNTHESIS_ENDPOINT) \
|
|
65
|
+
-e OPENAI_API_KEY=$(SYNTHESIS_API_KEY) \
|
|
66
|
+
-e ANTHROPIC_API_KEY=$(ANTHROPIC_API_KEY) \
|
|
67
|
+
-v $(PWD)/agent.rb:/agent/agent.rb:ro \
|
|
68
|
+
ghcr.io/language-operator/agent:dev
|
|
69
|
+
|
|
70
|
+
validate:
|
|
71
|
+
@if [ ! -f agent.rb ]; then \
|
|
72
|
+
echo "Error: agent.rb not found. Run 'make synthesize' first."; \
|
|
73
|
+
exit 1; \
|
|
74
|
+
fi
|
|
75
|
+
@echo "Validating agent.rb..."
|
|
76
|
+
@ruby -c agent.rb && echo "Syntax: OK"
|
|
77
|
+
|
|
78
|
+
clean:
|
|
79
|
+
@echo "Cleaning generated files..."
|
|
80
|
+
@rm -f agent.rb agent.*.rb
|
|
81
|
+
@echo "Clean complete!"
|
|
82
|
+
|
|
83
|
+
compare:
|
|
84
|
+
@echo "Comparing model outputs..."
|
|
85
|
+
@if [ -f agent.sonnet.rb ] && [ -f agent.gpt-4.rb ]; then \
|
|
86
|
+
echo "=== Claude Sonnet vs GPT-4 ==="; \
|
|
87
|
+
diff -u agent.sonnet.rb agent.gpt-4.rb || true; \
|
|
88
|
+
else \
|
|
89
|
+
echo "Error: Run 'make synthesize-all' first to generate comparison files"; \
|
|
90
|
+
fi
|
data/synth/001/agent.rb
ADDED
|
@@ -0,0 +1,26 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
require 'language_operator'
|
|
4
|
+
|
|
5
|
+
agent 'hello-world' do
|
|
6
|
+
description 'Logs a message to stdout'
|
|
7
|
+
|
|
8
|
+
task :log_message,
|
|
9
|
+
instructions: "log the message 'Hello, world!' to agent logs",
|
|
10
|
+
inputs: {},
|
|
11
|
+
outputs: { result: 'string' }
|
|
12
|
+
|
|
13
|
+
main do |_inputs|
|
|
14
|
+
puts 'Hello, world!'
|
|
15
|
+
{ result: 'message logged' }
|
|
16
|
+
end
|
|
17
|
+
|
|
18
|
+
constraints do
|
|
19
|
+
max_iterations 999_999
|
|
20
|
+
timeout '10m'
|
|
21
|
+
end
|
|
22
|
+
|
|
23
|
+
output do
|
|
24
|
+
workspace 'results/output.txt'
|
|
25
|
+
end
|
|
26
|
+
end
|
|
@@ -0,0 +1,44 @@
|
|
|
1
|
+
Executing agent.rb in Docker container...
|
|
2
|
+
📁 /agent/agent.rb
|
|
3
|
+
[1;36m·[0m OpenTelemetry disabled
|
|
4
|
+
[1;36m·[0m Configuring LLM (provider=openai_compatible, model=mistralai/magistral-small-2509, timeout=300)
|
|
5
|
+
[1;36m·[0m LLM configuration complete
|
|
6
|
+
[1;36m·[0m No MCP servers configured, agent will run without tools
|
|
7
|
+
[1;36m·[0m Chat session initialized (with_tools=false)
|
|
8
|
+
[1;36m·[0m Audit logger initialized (log_path=/tmp/langop-audit.jsonl)
|
|
9
|
+
[1;36m·[0m Safety manager initialized (enabled=true, budget_tracking=false, rate_limiting=false, content_filtering=false, audit_logging=true)
|
|
10
|
+
[1;36m·[0m Starting workflow execution: hello-world
|
|
11
|
+
[1;34m·[0m [1mPrompt sent to LLM:[0m
|
|
12
|
+
# Task: Logs a message to stdout
|
|
13
|
+
|
|
14
|
+
## Objectives:
|
|
15
|
+
- Log the message 'Hello, world!' to agent logs
|
|
16
|
+
|
|
17
|
+
## Workflow Steps:
|
|
18
|
+
Log message
|
|
19
|
+
|
|
20
|
+
## Constraints:
|
|
21
|
+
- Maximum iterations: 999999
|
|
22
|
+
- Timeout: 10m
|
|
23
|
+
|
|
24
|
+
Please complete this task following the workflow steps.
|
|
25
|
+
|
|
26
|
+
[1;36m·[0m LLM request (34.877s)
|
|
27
|
+
[1;35m·[0m [1mLLM Response:[0m
|
|
28
|
+
To solve the task of logging 'Hello, world!' to stdout, the most straightforward and universally applicable solution is to use a print statement. Given that the task does not specify a programming language, Python's `print` function is chosen as it is widely used and easily understandable. The constraints of maximum iterations and timeout do not affect this simple task, as it is a one-time operation.
|
|
29
|
+
|
|
30
|
+
Here is the solution:
|
|
31
|
+
|
|
32
|
+
```python
|
|
33
|
+
print('Hello, world!')
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
This code snippet will output 'Hello, world!' to the standard output, fulfilling the task's requirement. The simplicity of this solution ensures that it is both correct and appropriate for the given instructions.
|
|
37
|
+
|
|
38
|
+
[1;33m·[0m Could not write output to workspace: No such file or directory @ rb_sysopen - /workspace/output.txt
|
|
39
|
+
[1;36m·[0m Output (first 500 chars): [THINK]Alright, I need to log the message 'Hello, world!' to stdout. The task is straightforward, but let's make sure I understand all the requirements.
|
|
40
|
+
|
|
41
|
+
First, the objective is clear: log 'Hello, world!' to agent logs. The workflow step says "Log message," which seems to be the action required.
|
|
42
|
+
|
|
43
|
+
Constraints mention a maximum of 999999 iterations and a timeout of 10 minutes. But since this is just logging a message, it's likely a one-time action, so iterations might not be relevant here unless th
|
|
44
|
+
[1;36m·[0m Workflow execution completed (34.9s, total_tokens=1020, estimated_cost=$0.0)
|
data/synth/Makefile
ADDED
|
@@ -0,0 +1,39 @@
|
|
|
1
|
+
.PHONY: test test-all clean help list
|
|
2
|
+
|
|
3
|
+
help:
|
|
4
|
+
@echo "Synthesis Test Suite"
|
|
5
|
+
@echo ""
|
|
6
|
+
@echo "Available targets:"
|
|
7
|
+
@echo " make test - Run synthesis for all test cases"
|
|
8
|
+
@echo " make test-all - Run synthesis for all models in all test cases"
|
|
9
|
+
@echo " make clean - Clean all generated files"
|
|
10
|
+
@echo " make list - List all test cases"
|
|
11
|
+
@echo ""
|
|
12
|
+
@echo "Individual test cases:"
|
|
13
|
+
@echo " make test-001 - Run test 001 (hello-world)"
|
|
14
|
+
@echo ""
|
|
15
|
+
@echo "Usage:"
|
|
16
|
+
@echo " cd 001 && make synthesize - Synthesize single test with default model"
|
|
17
|
+
@echo " cd 001 && make synthesize-all - Synthesize with all models"
|
|
18
|
+
@echo " cd 001 && make run - Execute synthesized code"
|
|
19
|
+
|
|
20
|
+
test:
|
|
21
|
+
@echo "Running synthesis tests..."
|
|
22
|
+
@$(MAKE) -C 001 synthesize
|
|
23
|
+
|
|
24
|
+
test-001:
|
|
25
|
+
@echo "Running test 001..."
|
|
26
|
+
@$(MAKE) -C 001 synthesize
|
|
27
|
+
|
|
28
|
+
test-all:
|
|
29
|
+
@echo "Running synthesis with all models..."
|
|
30
|
+
@$(MAKE) -C 001 synthesize-all
|
|
31
|
+
|
|
32
|
+
clean:
|
|
33
|
+
@echo "Cleaning all test artifacts..."
|
|
34
|
+
@$(MAKE) -C 001 clean
|
|
35
|
+
@echo "Clean complete!"
|
|
36
|
+
|
|
37
|
+
list:
|
|
38
|
+
@echo "Available test cases:"
|
|
39
|
+
@echo " 001 - hello-world (Say something in your logs)"
|