language-operator 0.0.1 → 0.1.30
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.rubocop.yml +125 -0
- data/CHANGELOG.md +53 -0
- data/Gemfile +8 -0
- data/Gemfile.lock +284 -0
- data/LICENSE +229 -21
- data/Makefile +77 -0
- data/README.md +3 -11
- data/Rakefile +34 -0
- data/bin/aictl +7 -0
- data/completions/_aictl +232 -0
- data/completions/aictl.bash +121 -0
- data/completions/aictl.fish +114 -0
- data/docs/architecture/agent-runtime.md +585 -0
- data/docs/dsl/agent-reference.md +591 -0
- data/docs/dsl/best-practices.md +1078 -0
- data/docs/dsl/chat-endpoints.md +895 -0
- data/docs/dsl/constraints.md +671 -0
- data/docs/dsl/mcp-integration.md +1177 -0
- data/docs/dsl/webhooks.md +932 -0
- data/docs/dsl/workflows.md +744 -0
- data/examples/README.md +569 -0
- data/examples/agent_example.rb +86 -0
- data/examples/chat_endpoint_agent.rb +118 -0
- data/examples/github_webhook_agent.rb +171 -0
- data/examples/mcp_agent.rb +158 -0
- data/examples/oauth_callback_agent.rb +296 -0
- data/examples/stripe_webhook_agent.rb +219 -0
- data/examples/webhook_agent.rb +80 -0
- data/lib/language_operator/agent/base.rb +110 -0
- data/lib/language_operator/agent/executor.rb +440 -0
- data/lib/language_operator/agent/instrumentation.rb +54 -0
- data/lib/language_operator/agent/metrics_tracker.rb +183 -0
- data/lib/language_operator/agent/safety/ast_validator.rb +272 -0
- data/lib/language_operator/agent/safety/audit_logger.rb +104 -0
- data/lib/language_operator/agent/safety/budget_tracker.rb +175 -0
- data/lib/language_operator/agent/safety/content_filter.rb +93 -0
- data/lib/language_operator/agent/safety/manager.rb +207 -0
- data/lib/language_operator/agent/safety/rate_limiter.rb +150 -0
- data/lib/language_operator/agent/safety/safe_executor.rb +115 -0
- data/lib/language_operator/agent/scheduler.rb +183 -0
- data/lib/language_operator/agent/telemetry.rb +116 -0
- data/lib/language_operator/agent/web_server.rb +610 -0
- data/lib/language_operator/agent/webhook_authenticator.rb +226 -0
- data/lib/language_operator/agent.rb +149 -0
- data/lib/language_operator/cli/commands/agent.rb +1252 -0
- data/lib/language_operator/cli/commands/cluster.rb +335 -0
- data/lib/language_operator/cli/commands/install.rb +404 -0
- data/lib/language_operator/cli/commands/model.rb +266 -0
- data/lib/language_operator/cli/commands/persona.rb +396 -0
- data/lib/language_operator/cli/commands/quickstart.rb +22 -0
- data/lib/language_operator/cli/commands/status.rb +156 -0
- data/lib/language_operator/cli/commands/tool.rb +537 -0
- data/lib/language_operator/cli/commands/use.rb +47 -0
- data/lib/language_operator/cli/errors/handler.rb +180 -0
- data/lib/language_operator/cli/errors/suggestions.rb +176 -0
- data/lib/language_operator/cli/formatters/code_formatter.rb +81 -0
- data/lib/language_operator/cli/formatters/log_formatter.rb +290 -0
- data/lib/language_operator/cli/formatters/progress_formatter.rb +53 -0
- data/lib/language_operator/cli/formatters/table_formatter.rb +179 -0
- data/lib/language_operator/cli/formatters/value_formatter.rb +113 -0
- data/lib/language_operator/cli/helpers/cluster_context.rb +62 -0
- data/lib/language_operator/cli/helpers/cluster_validator.rb +101 -0
- data/lib/language_operator/cli/helpers/editor_helper.rb +58 -0
- data/lib/language_operator/cli/helpers/kubeconfig_validator.rb +167 -0
- data/lib/language_operator/cli/helpers/resource_dependency_checker.rb +74 -0
- data/lib/language_operator/cli/helpers/schedule_builder.rb +108 -0
- data/lib/language_operator/cli/helpers/user_prompts.rb +69 -0
- data/lib/language_operator/cli/main.rb +232 -0
- data/lib/language_operator/cli/templates/tools/generic.yaml +66 -0
- data/lib/language_operator/cli/wizards/agent_wizard.rb +246 -0
- data/lib/language_operator/cli/wizards/quickstart_wizard.rb +588 -0
- data/lib/language_operator/client/base.rb +214 -0
- data/lib/language_operator/client/config.rb +136 -0
- data/lib/language_operator/client/cost_calculator.rb +37 -0
- data/lib/language_operator/client/mcp_connector.rb +123 -0
- data/lib/language_operator/client.rb +19 -0
- data/lib/language_operator/config/cluster_config.rb +101 -0
- data/lib/language_operator/config/tool_patterns.yaml +57 -0
- data/lib/language_operator/config/tool_registry.rb +96 -0
- data/lib/language_operator/config.rb +138 -0
- data/lib/language_operator/dsl/adapter.rb +124 -0
- data/lib/language_operator/dsl/agent_context.rb +90 -0
- data/lib/language_operator/dsl/agent_definition.rb +427 -0
- data/lib/language_operator/dsl/chat_endpoint_definition.rb +115 -0
- data/lib/language_operator/dsl/config.rb +119 -0
- data/lib/language_operator/dsl/context.rb +50 -0
- data/lib/language_operator/dsl/execution_context.rb +47 -0
- data/lib/language_operator/dsl/helpers.rb +109 -0
- data/lib/language_operator/dsl/http.rb +184 -0
- data/lib/language_operator/dsl/mcp_server_definition.rb +73 -0
- data/lib/language_operator/dsl/parameter_definition.rb +124 -0
- data/lib/language_operator/dsl/registry.rb +36 -0
- data/lib/language_operator/dsl/shell.rb +125 -0
- data/lib/language_operator/dsl/tool_definition.rb +112 -0
- data/lib/language_operator/dsl/webhook_authentication.rb +114 -0
- data/lib/language_operator/dsl/webhook_definition.rb +106 -0
- data/lib/language_operator/dsl/workflow_definition.rb +259 -0
- data/lib/language_operator/dsl.rb +160 -0
- data/lib/language_operator/errors.rb +60 -0
- data/lib/language_operator/kubernetes/client.rb +279 -0
- data/lib/language_operator/kubernetes/resource_builder.rb +194 -0
- data/lib/language_operator/loggable.rb +47 -0
- data/lib/language_operator/logger.rb +141 -0
- data/lib/language_operator/retry.rb +123 -0
- data/lib/language_operator/retryable.rb +132 -0
- data/lib/language_operator/tool_loader.rb +242 -0
- data/lib/language_operator/validators.rb +170 -0
- data/lib/language_operator/version.rb +1 -1
- data/lib/language_operator.rb +65 -3
- data/requirements/tasks/challenge.md +9 -0
- data/requirements/tasks/iterate.md +36 -0
- data/requirements/tasks/optimize.md +21 -0
- data/requirements/tasks/tag.md +5 -0
- data/test_agent_dsl.rb +108 -0
- metadata +503 -20
@@ -0,0 +1,895 @@

# Chat Endpoint Guide

Complete guide to exposing agents as OpenAI-compatible chat completion endpoints.

## Table of Contents

- [Overview](#overview)
- [Basic Configuration](#basic-configuration)
- [System Prompt](#system-prompt)
- [Model Parameters](#model-parameters)
- [API Endpoints](#api-endpoints)
- [Streaming Support](#streaming-support)
- [Authentication](#authentication)
- [Usage Examples](#usage-examples)
- [Integration with OpenAI SDK](#integration-with-openai-sdk)
- [Complete Examples](#complete-examples)
- [Best Practices](#best-practices)

## Overview

Language Operator agents can expose OpenAI-compatible chat completion endpoints, allowing them to be used as drop-in replacements for LLM APIs in existing applications.

### What is a Chat Endpoint?

A chat endpoint transforms an agent into an API-compatible language model that:
- Accepts OpenAI-format chat completion requests
- Supports both streaming and non-streaming responses
- Provides model listing via `/v1/models`
- Returns usage statistics (token counts)
- Works with existing OpenAI SDKs and tools

### Use Cases

- **Domain-specific models**: Create specialized "models" for specific tasks
- **Agent as a service**: Expose agents to other applications
- **LLM proxy**: Add custom logic, caching, or rate limiting
- **Testing**: Use agents as mock LLM endpoints
- **Integration**: Connect agents to LangChain, AutoGPT, etc.

## Basic Configuration

Define a chat endpoint using the `as_chat_endpoint` block:

```ruby
agent "github-expert" do
  description "GitHub API and workflow expert"
  mode :reactive

  as_chat_endpoint do
    system_prompt "You are a GitHub expert assistant"
    temperature 0.7
    max_tokens 2000
  end
end
```

**Key points:**
- Agent automatically switches to `:reactive` mode
- Endpoints are automatically created at `/v1/chat/completions` and `/v1/models`
- Agent processes chat messages and returns completions
- Works with existing OpenAI client libraries

## System Prompt

The system prompt defines the agent's behavior and expertise. It's prepended to every conversation.

### Basic System Prompt

```ruby
as_chat_endpoint do
  system_prompt "You are a helpful customer service assistant"
end
```

### Detailed System Prompt

Use a heredoc for multi-line prompts:

```ruby
as_chat_endpoint do
  system_prompt <<~PROMPT
    You are a GitHub expert assistant with deep knowledge of:
    - GitHub API and workflows
    - Pull requests, issues, and code review
    - GitHub Actions and CI/CD
    - Repository management and best practices

    Provide helpful, accurate answers about GitHub topics.
    Keep responses concise but informative.
  PROMPT
end
```

### System Prompt Best Practices

**Be specific about expertise:**
```ruby
system_prompt <<~PROMPT
  You are a Kubernetes troubleshooting expert specializing in:
  - Pod scheduling and resource issues
  - Network policy debugging
  - Storage and volume problems
  - Performance optimization
PROMPT
```

**Include behavioral guidelines:**
```ruby
system_prompt <<~PROMPT
  You are a financial analyst assistant.

  Guidelines:
  - Base all analysis on factual data
  - Clearly distinguish facts from interpretations
  - Use industry-standard terminology
  - Never provide investment advice
  - Always cite sources when referencing data
PROMPT
```

**Set tone and style:**
```ruby
system_prompt <<~PROMPT
  You are a friendly technical support agent.

  Communication style:
  - Use clear, simple language
  - Be patient and encouraging
  - Provide step-by-step instructions
  - Offer to clarify if anything is unclear
PROMPT
```

## Model Parameters

Configure LLM behavior with standard OpenAI parameters.

### Temperature

Controls randomness in responses (0.0 - 2.0):

```ruby
as_chat_endpoint do
  temperature 0.7  # Balanced creativity and consistency
end
```

**Guidelines:**
- `0.0` - Deterministic, focused responses (good for factual tasks)
- `0.5-0.7` - Balanced (default for most use cases)
- `1.0+` - More creative and varied (good for brainstorming)

### Max Tokens

Maximum tokens in the response:

```ruby
as_chat_endpoint do
  max_tokens 2000  # Limit response length
end
```

**Guidelines:**
- Set based on expected response length
- Consider cost implications
- Default: 2000 tokens

### Top P (Nucleus Sampling)

Alternative to temperature for controlling randomness (0.0 - 1.0):

```ruby
as_chat_endpoint do
  top_p 0.9  # Consider top 90% probability mass
end
```

**Note:** Use either `temperature` or `top_p`, not both.

### Frequency Penalty

Reduces repetition of token sequences (-2.0 to 2.0):

```ruby
as_chat_endpoint do
  frequency_penalty 0.5  # Discourage repetition
end
```

**Guidelines:**
- `0.0` - No penalty (default)
- `0.5-1.0` - Moderate reduction in repetition
- Higher values: Stronger penalty against repetition

### Presence Penalty

Encourages talking about new topics (-2.0 to 2.0):

```ruby
as_chat_endpoint do
  presence_penalty 0.6  # Encourage topic diversity
end
```

### Stop Sequences

Sequences that halt generation when encountered:

```ruby
as_chat_endpoint do
  stop ["\n\n", "END", "###"]  # Stop on these sequences
end
```

### Model Name

Custom model identifier returned in API responses:

```ruby
as_chat_endpoint do
  model "github-expert-v1"  # Custom model name
end
```

**Default:** Agent name (e.g., `"github-expert"`)

### Complete Parameter Configuration

```ruby
as_chat_endpoint do
  system_prompt "You are a helpful assistant"

  # Model identification
  model "my-custom-model-v1"

  # Sampling parameters
  temperature 0.7
  top_p 0.9

  # Length controls
  max_tokens 2000
  stop ["\n\n\n"]

  # Repetition controls
  frequency_penalty 0.5
  presence_penalty 0.6
end
```

## API Endpoints

Chat endpoints expose OpenAI-compatible HTTP endpoints.

### POST /v1/chat/completions

Chat completion endpoint (streaming and non-streaming).

**Request format:**
```json
{
  "model": "github-expert-v1",
  "messages": [
    {"role": "user", "content": "How do I create a pull request?"}
  ],
  "stream": false
}
```

**Response format (non-streaming):**
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "github-expert-v1",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "To create a pull request on GitHub..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 45,
    "total_tokens": 60
  }
}
```

### GET /v1/models

List available models.

**Request:**
```bash
GET /v1/models
```

**Response:**
```json
{
  "object": "list",
  "data": [
    {
      "id": "github-expert-v1",
      "object": "model",
      "created": 1677652288,
      "owned_by": "language-operator"
    }
  ]
}
```
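
The same listing works through the OpenAI SDK. A minimal sketch, assuming the agent from the examples above is serving on `localhost:8080`:

```python
from openai import OpenAI

# Point the SDK at the agent; the key is unused but required by the client.
client = OpenAI(api_key="not-needed", base_url="http://localhost:8080/v1")

# Prints the model id the agent advertises, e.g. "github-expert-v1".
for model in client.models.list():
    print(model.id)
```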

### Health Check Endpoints

**GET /health** - Health check
```bash
curl http://localhost:8080/health
# Returns: {"status":"healthy"}
```

**GET /ready** - Readiness check
```bash
curl http://localhost:8080/ready
# Returns: {"status":"ready"}
```

## Streaming Support

Chat endpoints support Server-Sent Events (SSE) for streaming responses.

### Enabling Streaming

Set `stream: true` in the request:

```bash
curl -N -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "github-expert-v1",
    "messages": [{"role": "user", "content": "Explain GitHub Actions"}],
    "stream": true
  }'
```

### Streaming Response Format

Responses are sent as SSE events:

```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"github-expert-v1","choices":[{"index":0,"delta":{"content":"To"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"github-expert-v1","choices":[{"index":0,"delta":{"content":" create"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"github-expert-v1","choices":[{"index":0,"delta":{"content":" a"},"finish_reason":null}]}

...

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"github-expert-v1","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```

**Key points:**
- Each chunk contains a delta with new content
- Final chunk includes `finish_reason: "stop"`
- Stream ends with `data: [DONE]` (see the parsing sketch below)
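
If you are not using an SDK, the stream can be consumed directly. A minimal sketch with the `requests` library, assuming the agent from the earlier examples on `localhost:8080` (requires Python 3.9+ for `removeprefix`):

```python
import json

import requests

payload = {
    "model": "github-expert-v1",
    "messages": [{"role": "user", "content": "Explain GitHub Actions"}],
    "stream": True,
}

# stream=True keeps the connection open so chunks print as they arrive.
with requests.post(
    "http://localhost:8080/v1/chat/completions", json=payload, stream=True
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line:
            continue  # SSE events are separated by blank lines
        data = line.decode("utf-8").removeprefix("data: ")
        if data == "[DONE]":
            break  # end-of-stream sentinel
        delta = json.loads(data)["choices"][0]["delta"]
        print(delta.get("content", ""), end="", flush=True)
```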

### Streaming vs Non-Streaming

**Non-streaming (default):**
- Complete response returned at once
- Simpler to consume
- Higher perceived latency
- Better for batch processing

**Streaming (`stream: true`):**
- Response sent incrementally
- Lower perceived latency
- Better user experience
- More complex to consume

## Authentication

The chat endpoint examples above are unauthenticated. To protect custom routes on the same agent, combine the chat endpoint with authenticated webhooks:

```ruby
agent "secure-chat-agent" do
  mode :reactive

  # Chat endpoint
  as_chat_endpoint do
    system_prompt "You are a helpful assistant"
    temperature 0.7
  end

  # Webhook for custom routes (can add auth)
  webhook "/authenticated" do
    method :post

    authenticate do
      verify_api_key(
        header: 'X-API-Key',
        secret: ENV['API_KEY']
      )
    end

    on_request do |context|
      # Custom authenticated logic
    end
  end
end
```

**Note:** Standard OpenAI SDK clients expect the `/v1/chat/completions` endpoint. For production deployments, add authentication at the infrastructure level (API gateway, ingress controller, etc.).
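
If a gateway in front of the agent enforces an API key, SDK clients can still be used by attaching the expected header to every request. A sketch with the Python SDK; the gateway URL and `X-API-Key` header are illustrative assumptions, not part of the agent itself:

```python
from openai import OpenAI

client = OpenAI(
    api_key="not-needed",                      # unused by the agent itself
    base_url="https://agents.example.com/v1",  # hypothetical gateway address
    default_headers={"X-API-Key": "my-gateway-key"},  # whatever the gateway checks
)

response = client.chat.completions.create(
    model="github-expert-v1",
    messages=[{"role": "user", "content": "How do I create a PR?"}],
)
print(response.choices[0].message.content)
```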

## Usage Examples

### Using curl (Non-streaming)

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "github-expert-v1",
    "messages": [
      {"role": "user", "content": "How do I create a pull request?"}
    ]
  }'
```

### Using curl (Streaming)

```bash
curl -N -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "github-expert-v1",
    "messages": [
      {"role": "user", "content": "Explain GitHub Actions"}
    ],
    "stream": true
  }'
```

### Multi-turn Conversation

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "github-expert-v1",
    "messages": [
      {"role": "user", "content": "What is a pull request?"},
      {"role": "assistant", "content": "A pull request is a way to propose changes..."},
      {"role": "user", "content": "How do I review one?"}
    ]
  }'
```

### With System Message

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "github-expert-v1",
    "messages": [
      {"role": "system", "content": "You are an expert in GitHub Actions."},
      {"role": "user", "content": "How do I set up CI/CD?"}
    ]
  }'
```

## Integration with OpenAI SDK

Chat endpoints are compatible with OpenAI client libraries.

### Python

```python
from openai import OpenAI

# Point client to your agent
client = OpenAI(
    api_key="not-needed",  # Not used, but required by SDK
    base_url="http://localhost:8080/v1"
)

# Non-streaming
response = client.chat.completions.create(
    model="github-expert-v1",
    messages=[
        {"role": "user", "content": "How do I create a PR?"}
    ]
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="github-expert-v1",
    messages=[
        {"role": "user", "content": "Explain GitHub Actions"}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

### JavaScript/TypeScript

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'not-needed',
  baseURL: 'http://localhost:8080/v1',
});

// Non-streaming
const response = await client.chat.completions.create({
  model: 'github-expert-v1',
  messages: [
    { role: 'user', content: 'How do I create a PR?' }
  ],
});
console.log(response.choices[0].message.content);

// Streaming
const stream = await client.chat.completions.create({
  model: 'github-expert-v1',
  messages: [
    { role: 'user', content: 'Explain GitHub Actions' }
  ],
  stream: true,
});

for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
```

### Ruby

```ruby
require 'openai'

client = OpenAI::Client.new(
  access_token: "not-needed",
  uri_base: "http://localhost:8080/v1/"
)

# Non-streaming
response = client.chat(
  parameters: {
    model: "github-expert-v1",
    messages: [
      { role: "user", content: "How do I create a PR?" }
    ]
  }
)
puts response.dig("choices", 0, "message", "content")

# Streaming
client.chat(
  parameters: {
    model: "github-expert-v1",
    messages: [
      { role: "user", content: "Explain GitHub Actions" }
    ],
    stream: proc do |chunk, _bytesize|
      print chunk.dig("choices", 0, "delta", "content")
    end
  }
)
```

### LangChain Integration

```python
from langchain_openai import ChatOpenAI

# Use agent as LangChain LLM
llm = ChatOpenAI(
    model="github-expert-v1",
    openai_api_key="not-needed",
    openai_api_base="http://localhost:8080/v1"
)

# Use in LangChain chains
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["topic"],
    template="Explain {topic} in GitHub"
)

chain = LLMChain(llm=llm, prompt=prompt)
result = chain.run(topic="pull requests")
print(result)
```

## Complete Examples

### GitHub Expert Agent

```ruby
agent "github-expert" do
  description "GitHub API and workflow expert"
  mode :reactive

  as_chat_endpoint do
    system_prompt <<~PROMPT
      You are a GitHub expert assistant with deep knowledge of:
      - GitHub API and workflows
      - Pull requests, issues, and code review
      - GitHub Actions and CI/CD
      - Repository management and best practices

      Provide helpful, accurate answers about GitHub topics.
      Keep responses concise but informative.
    PROMPT

    model "github-expert-v1"
    temperature 0.7
    max_tokens 2000
  end

  constraints do
    timeout '30s'
    requests_per_minute 30
    daily_budget 1000  # $10/day
  end
end
```

### Customer Support Agent

```ruby
agent "customer-support" do
  description "Friendly customer support assistant"
  mode :reactive

  as_chat_endpoint do
    system_prompt <<~PROMPT
      You are a friendly customer support representative.

      Guidelines:
      - Be empathetic and understanding
      - Provide clear, step-by-step solutions
      - Ask clarifying questions when needed
      - Escalate to human support for complex issues
      - Always maintain a professional, helpful tone

      Available topics:
      - Account management
      - Billing and payments
      - Technical troubleshooting
      - Product features and usage
    PROMPT

    model "support-assistant-v1"
    temperature 0.8  # Slightly more conversational
    max_tokens 1500
    presence_penalty 0.6  # Encourage topic variety
  end

  constraints do
    timeout '15s'  # Quick responses for support
    requests_per_minute 60
    hourly_budget 500
    daily_budget 5000

    # Safety
    blocked_topics ['violence', 'hate-speech']
  end
end
```

### Technical Documentation Assistant

```ruby
agent "docs-assistant" do
  description "Technical documentation expert"
  mode :reactive

  as_chat_endpoint do
    system_prompt <<~PROMPT
      You are a technical documentation assistant specializing in API documentation.

      Your expertise:
      - REST API design and documentation
      - OpenAPI/Swagger specifications
      - Authentication and authorization patterns
      - Rate limiting and pagination
      - Error handling best practices

      When answering:
      - Provide code examples when relevant
      - Explain concepts clearly with examples
      - Reference industry standards (REST, OpenAPI, etc.)
      - Include best practices and gotchas
      - Format responses with proper markdown
    PROMPT

    model "docs-expert-v1"
    temperature 0.5  # More consistent/factual
    max_tokens 3000  # Longer for detailed explanations
    frequency_penalty 0.3  # Reduce repetition in docs
  end

  constraints do
    timeout '45s'
    requests_per_minute 20
    daily_budget 2000
  end
end
```

### Code Review Assistant

```ruby
agent "code-reviewer" do
  description "Automated code review assistant"
  mode :reactive

  as_chat_endpoint do
    system_prompt <<~PROMPT
      You are a senior software engineer conducting code reviews.

      Focus areas:
      - Code correctness and logic errors
      - Security vulnerabilities
      - Performance issues
      - Code style and best practices
      - Test coverage
      - Documentation quality

      Review approach:
      - Be constructive and specific
      - Explain the "why" behind suggestions
      - Prioritize issues by severity
      - Suggest concrete improvements
      - Acknowledge good practices

      Format reviews with:
      - Summary of overall code quality
      - Specific issues with line references
      - Suggested improvements
      - Security concerns (if any)
    PROMPT

    model "code-reviewer-v1"
    temperature 0.3  # Consistent, focused reviews
    max_tokens 4000  # Detailed reviews
    frequency_penalty 0.5  # Avoid repetitive comments
  end

  constraints do
    timeout '1m'  # Allow time for thorough review
    requests_per_hour 50
    daily_budget 3000
  end
end
```

### SQL Query Helper

```ruby
agent "sql-helper" do
  description "SQL query assistance and optimization"
  mode :reactive

  as_chat_endpoint do
    system_prompt <<~PROMPT
      You are a database expert specializing in SQL query writing and optimization.

      Expertise:
      - SQL syntax (PostgreSQL, MySQL, SQLite)
      - Query optimization and performance
      - Index design
      - Join strategies
      - Aggregation and window functions
      - Common table expressions (CTEs)

      When helping:
      - Write clean, readable SQL
      - Explain query logic
      - Suggest optimizations
      - Warn about performance pitfalls
      - Include comments in complex queries
      - Consider different SQL dialects
    PROMPT

    model "sql-expert-v1"
    temperature 0.4  # Precise for SQL
    max_tokens 2500
    stop ["```\n\n"]  # Stop after code block
  end

  constraints do
    timeout '30s'
    requests_per_minute 40
    daily_budget 1500
  end
end
```

## Best Practices

### System Prompt Design

1. **Be specific about expertise** - Define clear areas of knowledge
2. **Include guidelines** - Specify how the agent should respond
3. **Set boundaries** - Define what the agent should/shouldn't do
4. **Provide context** - Explain the agent's role and purpose
5. **Use examples** - Show expected behavior in the prompt

### Parameter Tuning

1. **Temperature**
   - Lower (0.0-0.3) for factual, consistent responses
   - Medium (0.5-0.7) for balanced interactions
   - Higher (0.8-1.0) for creative, varied responses

2. **Max Tokens**
   - Set based on expected response length
   - Consider cost implications
   - Balance between completeness and efficiency

3. **Penalties**
   - Use `frequency_penalty` to reduce repetition
   - Use `presence_penalty` to encourage topic diversity
   - Start low (0.0-0.5) and adjust based on behavior

### Performance

1. **Set appropriate timeouts** - Balance thoroughness and responsiveness
2. **Use streaming for long responses** - Better user experience
3. **Cache responses** - When appropriate for repeated queries (a client-side sketch follows this list)
4. **Monitor token usage** - Track costs and optimize prompts
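
A minimal client-side cache sketch, keyed on a hash of the conversation, so repeated identical queries skip the endpoint. In-memory only; a production setup might use Redis or an HTTP cache instead:

```python
import hashlib
import json

from openai import OpenAI

client = OpenAI(api_key="not-needed", base_url="http://localhost:8080/v1")
_cache: dict[str, str] = {}


def cached_completion(messages: list) -> str:
    # Key on the exact conversation so any change misses the cache.
    key = hashlib.sha256(json.dumps(messages, sort_keys=True).encode()).hexdigest()
    if key not in _cache:
        response = client.chat.completions.create(
            model="github-expert-v1", messages=messages
        )
        _cache[key] = response.choices[0].message.content
    return _cache[key]


print(cached_completion([{"role": "user", "content": "What is a pull request?"}]))
```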

### Cost Management

1. **Set budget constraints** - Use `daily_budget` and `hourly_budget`
2. **Limit max_tokens** - Prevent unexpectedly long responses
3. **Monitor usage** - Track requests and token consumption
4. **Optimize prompts** - Shorter system prompts reduce costs

```ruby
constraints do
  hourly_budget 100  # $1/hour
  daily_budget 1000  # $10/day
  requests_per_minute 30
end
```

### Security

1. **Don't expose credentials** - Never include API keys in prompts
2. **Validate inputs** - Sanitize user messages
3. **Filter outputs** - Use `blocked_patterns` for PII
4. **Add authentication** - Use API gateway or webhook auth
5. **Rate limit** - Prevent abuse with `requests_per_minute`

### Testing

1. **Test with OpenAI SDK** - Verify compatibility (a minimal sketch follows this list)
2. **Test streaming** - Ensure SSE works correctly
3. **Test error cases** - Handle malformed requests
4. **Test conversation history** - Multi-turn interactions
5. **Load test** - Verify performance under load
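
A minimal pytest sketch for the first two items, assuming the `github-expert` agent from the examples above is reachable on `localhost:8080`:

```python
from openai import OpenAI

client = OpenAI(api_key="not-needed", base_url="http://localhost:8080/v1")


def test_models_endpoint_lists_the_agent():
    # /v1/models should advertise the configured model name.
    ids = [model.id for model in client.models.list()]
    assert "github-expert-v1" in ids


def test_completion_round_trip():
    response = client.chat.completions.create(
        model="github-expert-v1",
        messages=[{"role": "user", "content": "What is a pull request?"}],
    )
    assert response.choices[0].message.content  # non-empty answer


def test_streaming_yields_content_chunks():
    stream = client.chat.completions.create(
        model="github-expert-v1",
        messages=[{"role": "user", "content": "Explain GitHub Actions"}],
        stream=True,
    )
    # At least one content delta should arrive before [DONE].
    assert any(c.choices and c.choices[0].delta.content for c in stream)
```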

### Monitoring

1. **Track usage metrics** - Requests, tokens, costs (a usage-counter sketch follows this list)
2. **Monitor latency** - Response time distribution
3. **Log errors** - Capture and analyze failures
4. **Monitor quality** - Track user feedback
5. **Alert on anomalies** - Unusual usage patterns
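
The `usage` block returned with every non-streaming completion (see API Endpoints) is enough to build a simple token counter. A sketch:

```python
from collections import Counter

from openai import OpenAI

client = OpenAI(api_key="not-needed", base_url="http://localhost:8080/v1")
totals = Counter()


def tracked_completion(messages: list) -> str:
    response = client.chat.completions.create(
        model="github-expert-v1", messages=messages
    )
    # Accumulate the token counts the endpoint reports with each response.
    totals.update(
        prompt_tokens=response.usage.prompt_tokens,
        completion_tokens=response.usage.completion_tokens,
    )
    return response.choices[0].message.content


tracked_completion([{"role": "user", "content": "What is a fork?"}])
print(dict(totals))  # e.g. {'prompt_tokens': 15, 'completion_tokens': 45}
```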

## See Also

- [Agent Reference](agent-reference.md) - Complete agent DSL reference
- [MCP Integration](mcp-integration.md) - Tool server capabilities
- [Webhooks](webhooks.md) - Reactive agent configuration
- [Best Practices](best-practices.md) - Production deployment patterns