loki-mode 4.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +691 -0
- package/SKILL.md +191 -0
- package/VERSION +1 -0
- package/autonomy/.loki/dashboard/index.html +2634 -0
- package/autonomy/CONSTITUTION.md +508 -0
- package/autonomy/README.md +201 -0
- package/autonomy/config.example.yaml +152 -0
- package/autonomy/loki +526 -0
- package/autonomy/run.sh +3636 -0
- package/bin/loki-mode.js +26 -0
- package/bin/postinstall.js +60 -0
- package/docs/ACKNOWLEDGEMENTS.md +234 -0
- package/docs/COMPARISON.md +325 -0
- package/docs/COMPETITIVE-ANALYSIS.md +333 -0
- package/docs/INSTALLATION.md +547 -0
- package/docs/auto-claude-comparison.md +276 -0
- package/docs/cursor-comparison.md +225 -0
- package/docs/dashboard-guide.md +355 -0
- package/docs/screenshots/README.md +149 -0
- package/docs/screenshots/dashboard-agents.png +0 -0
- package/docs/screenshots/dashboard-tasks.png +0 -0
- package/docs/thick2thin.md +173 -0
- package/package.json +48 -0
- package/references/advanced-patterns.md +453 -0
- package/references/agent-types.md +243 -0
- package/references/agents.md +1043 -0
- package/references/business-ops.md +550 -0
- package/references/competitive-analysis.md +216 -0
- package/references/confidence-routing.md +371 -0
- package/references/core-workflow.md +275 -0
- package/references/cursor-learnings.md +207 -0
- package/references/deployment.md +604 -0
- package/references/lab-research-patterns.md +534 -0
- package/references/mcp-integration.md +186 -0
- package/references/memory-system.md +467 -0
- package/references/openai-patterns.md +647 -0
- package/references/production-patterns.md +568 -0
- package/references/prompt-repetition.md +192 -0
- package/references/quality-control.md +437 -0
- package/references/sdlc-phases.md +410 -0
- package/references/task-queue.md +361 -0
- package/references/tool-orchestration.md +691 -0
- package/skills/00-index.md +120 -0
- package/skills/agents.md +249 -0
- package/skills/artifacts.md +174 -0
- package/skills/github-integration.md +218 -0
- package/skills/model-selection.md +125 -0
- package/skills/parallel-workflows.md +526 -0
- package/skills/patterns-advanced.md +188 -0
- package/skills/production.md +292 -0
- package/skills/quality-gates.md +180 -0
- package/skills/testing.md +149 -0
- package/skills/troubleshooting.md +109 -0
@@ -0,0 +1,647 @@
# OpenAI Agent Patterns Reference

Research-backed patterns from OpenAI's Agents SDK, Deep Research, and autonomous agent frameworks.

---

## Overview

OpenAI's agent ecosystem provides four key architectural innovations for Loki Mode:

1. **Tracing Spans** - Hierarchical event tracking with span types
2. **Guardrails & Tripwires** - Input/output validation with early termination
3. **Handoff Callbacks** - Data preparation during agent transfers
4. **Multi-Tiered Fallbacks** - Model and workflow-level failure recovery

---

## Tracing Spans Architecture

### Span Types (Agents SDK Pattern)

Every operation is wrapped in a typed span for observability:

```yaml
span_types:
  agent_span:
    - Wraps entire agent execution
    - Contains: agent_name, instructions_hash, model

  generation_span:
    - Wraps LLM API calls
    - Contains: model, tokens_in, tokens_out, latency_ms

  function_span:
    - Wraps tool/function calls
    - Contains: function_name, arguments, result, success

  guardrail_span:
    - Wraps validation checks
    - Contains: guardrail_name, triggered, blocking

  handoff_span:
    - Wraps agent-to-agent transfers
    - Contains: from_agent, to_agent, context_passed

  custom_span:
    - User-defined operations
    - Contains: operation_name, metadata
```

### Hierarchical Trace Structure

```json
{
  "trace_id": "trace_abc123def456",
  "workflow_name": "implement_feature",
  "group_id": "session_xyz789",
  "spans": [
    {
      "span_id": "span_001",
      "parent_id": null,
      "type": "agent_span",
      "agent_name": "orchestrator",
      "started_at": "2026-01-07T10:00:00Z",
      "ended_at": "2026-01-07T10:05:00Z",
      "children": ["span_002", "span_003"]
    },
    {
      "span_id": "span_002",
      "parent_id": "span_001",
      "type": "guardrail_span",
      "guardrail_name": "input_validation",
      "triggered": false,
      "blocking": true
    },
    {
      "span_id": "span_003",
      "parent_id": "span_001",
      "type": "handoff_span",
      "from_agent": "orchestrator",
      "to_agent": "backend-dev",
      "context_passed": ["task_spec", "related_files"]
    }
  ]
}
```
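The flat `spans` list above encodes its hierarchy through `parent_id`. As a minimal sketch of how a reader might rebuild that hierarchy, the following indexes spans by parent and renders them depth-first. The helper names `build_span_tree` and `render_tree` are hypothetical and not part of Loki Mode or the Agents SDK.

```python
from collections import defaultdict


def build_span_tree(spans):
    """Index a flat span list by parent_id so children can be looked up."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)
    return children


def render_tree(children, parent_id=None, depth=0):
    """Return one indented line per span, in depth-first order."""
    lines = []
    for span in children.get(parent_id, []):
        lines.append("  " * depth + f'{span["span_id"]} ({span["type"]})')
        lines.extend(render_tree(children, span["span_id"], depth + 1))
    return lines


# Trimmed copy of the trace above: only the fields the sketch needs
spans = [
    {"span_id": "span_001", "parent_id": None, "type": "agent_span"},
    {"span_id": "span_002", "parent_id": "span_001", "type": "guardrail_span"},
    {"span_id": "span_003", "parent_id": "span_001", "type": "handoff_span"},
]
tree_lines = render_tree(build_span_tree(spans))
print("\n".join(tree_lines))
```

Keying on `parent_id` alone means the redundant `children` array in the JSON is not needed to rebuild the tree; it could still be used as a consistency check.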

### Storage Location

```
.loki/traces/
├── active/
│   └── {trace_id}.json      # Currently running traces
└── completed/
    └── {date}/
        └── {trace_id}.json  # Archived traces by date
```
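Moving a finished trace from `active/` to `completed/{date}/` could look like the sketch below. The function name `archive_trace` is hypothetical, and the ISO `YYYY-MM-DD` format for `{date}` is an assumption; the layout above does not specify one.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_trace(trace_id, root=".loki/traces", day=None):
    """Move active/{trace_id}.json to completed/{date}/{trace_id}.json."""
    root = Path(root)
    # Assumption: {date} is an ISO date such as 2026-01-07
    day = day or date.today().isoformat()
    source = root / "active" / f"{trace_id}.json"
    target_dir = root / "completed" / day
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.move(str(source), str(target))
    return target
```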

---

## Guardrails & Tripwires System

### Input Guardrails

Run **before** agent execution to validate user input:

```python
@input_guardrail(blocking=True)
async def validate_task_scope(input, context):
    """
    Blocks tasks outside project scope.
    Based on OpenAI Agents SDK pattern.
    """
    # Check if task references files outside project
    if references_external_paths(input):
        return GuardrailResult(
            tripwire_triggered=True,
            reason="Task references paths outside project root"
        )

    # Check for disallowed operations
    if contains_destructive_operation(input):
        return GuardrailResult(
            tripwire_triggered=True,
            reason="Destructive operation requires human approval"
        )

    return GuardrailResult(tripwire_triggered=False)
```

### Output Guardrails

Run **after** agent execution to validate results:

```python
@output_guardrail
async def validate_code_quality(output, context):
    """
    Blocks low-quality code output.
    """
    if output.type == "code":
        issues = run_static_analysis(output.content)
        critical = [i for i in issues if i.severity == "critical"]

        if critical:
            return GuardrailResult(
                tripwire_triggered=True,
                reason=f"Critical issues found: {critical}"
            )

    return GuardrailResult(tripwire_triggered=False)
```

### Execution Modes

| Mode | Behavior | Use When |
|------|----------|----------|
| **Blocking** | Guardrail completes before agent starts | Sensitive operations, expensive models |
| **Parallel** | Guardrail runs concurrently with agent | Fast checks, acceptable token loss |

```python
# Blocking mode: prevents token consumption
@input_guardrail(blocking=True, run_in_parallel=False)
async def expensive_validation(input):
    # Agent won't start until this completes
    pass

# Parallel mode: faster, but may waste tokens if validation fails
@input_guardrail(blocking=True, run_in_parallel=True)
async def fast_validation(input):
    # Runs alongside agent start
    pass
```
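The trade-off between the two modes can be sketched with plain `asyncio`. This is an illustration under stated assumptions, not the Agents SDK implementation: `run_blocking`, `run_parallel`, and `TripwireTriggered` are hypothetical names, and the guardrail is modeled as a coroutine that returns `True` when the tripwire fires.

```python
import asyncio


class TripwireTriggered(Exception):
    """Hypothetical exception raised when a guardrail trips."""


async def run_blocking(guardrail, agent, task):
    """Blocking mode: the agent never starts if the guardrail trips."""
    if await guardrail(task):
        raise TripwireTriggered("blocked before agent start")
    return await agent(task)


async def run_parallel(guardrail, agent, task):
    """Parallel mode: start both, cancel the agent if the guardrail trips."""
    agent_task = asyncio.create_task(agent(task))
    if await guardrail(task):
        agent_task.cancel()  # work the agent already did is wasted
        raise TripwireTriggered("agent cancelled mid-run")
    return await agent_task


async def demo():
    started = []  # records every task the agent began working on

    async def guardrail(task):
        await asyncio.sleep(0.01)
        return "rm -rf" in task  # True means the tripwire fires

    async def agent(task):
        started.append(task)
        await asyncio.sleep(0.05)
        return f"done: {task}"

    ok = await run_parallel(guardrail, agent, "add tests")
    for runner in (run_blocking, run_parallel):
        try:
            await runner(guardrail, agent, "rm -rf /")
        except TripwireTriggered:
            pass
    return ok, started


ok, started = asyncio.run(demo())
print(ok, started)
```

In this run the destructive task appears in `started` only once: the parallel runner let the agent begin before the guardrail finished, while the blocking runner did not.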

### Tripwire Exceptions

When a tripwire triggers, execution halts immediately:

```python
class InputGuardrailTripwireTriggered(Exception):
    """Raised when input validation fails."""


class OutputGuardrailTripwireTriggered(Exception):
    """Raised when output validation fails."""

    def __init__(self, message, constraints=None):
        super().__init__(message)
        # Constraints to apply on retry; read by the agent loop below
        self.constraints = constraints or []


# In agent loop (wrapper name is illustrative):
async def run_with_guardrails(task):
    try:
        return await run_agent(task)
    except InputGuardrailTripwireTriggered as e:
        log_blocked_attempt(e)
        return early_exit(reason=str(e))
    except OutputGuardrailTripwireTriggered as e:
        rollback_changes()
        return retry_with_constraints(e.constraints)
```

### Layered Defense Strategy

> "Think of guardrails as a layered defense mechanism. While a single one is unlikely to provide sufficient protection, using multiple, specialized guardrails together creates more resilient agents." - OpenAI, *A Practical Guide to Building Agents*

```yaml
guardrail_layers:
  layer_1_input:
    - scope_validation      # Is task within bounds?
    - pii_detection         # Contains sensitive data?
    - injection_detection   # Prompt injection attempt?

  layer_2_pre_execution:
    - cost_estimation       # Will this exceed budget?
    - dependency_check      # Are dependencies available?
    - conflict_detection    # Will this conflict with in-progress work?

  layer_3_output:
    - static_analysis       # Code quality issues?
    - secret_detection      # Secrets in output?
    - spec_compliance       # Matches OpenAPI spec?

  layer_4_post_action:
    - test_validation       # Tests pass?
    - review_approval       # Review passed?
    - deployment_safety     # Safe to deploy?
```
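One way to read the layers above is as an ordered pipeline that stops at the first failing check. The sketch below assumes that reading. `LayerResult`, `run_layers`, and the two lambda checks are hypothetical stand-ins, not Loki Mode's actual guardrails.

```python
from dataclasses import dataclass


@dataclass
class LayerResult:
    passed: bool
    layer: str = ""
    check: str = ""


def run_layers(layers, payload):
    """Run each layer's checks in order; stop at the first failing check."""
    for layer_name, checks in layers.items():
        for check_name, check in checks.items():
            if not check(payload):
                return LayerResult(False, layer_name, check_name)
    return LayerResult(True)


# Toy checks: each returns True when the payload is acceptable
layers = {
    "layer_1_input": {
        "scope_validation": lambda p: not p["path"].startswith("/"),
    },
    "layer_3_output": {
        "secret_detection": lambda p: "API_KEY=" not in p["output"],
    },
}
clean = run_layers(layers, {"path": "src/app.py", "output": "print('hi')"})
leaky = run_layers(layers, {"path": "src/app.py", "output": "API_KEY=abc"})
print(clean, leaky)
```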

---

## Handoff Callbacks

### on_handoff Pattern

Prepare data when transferring between agents:

```python
async def on_handoff_to_backend_dev(handoff_context):
    """
    Called when orchestrator hands off to backend-dev agent.
    Fetches context the receiving agent will need.
    """
    # Pre-fetch relevant files
    relevant_files = await find_related_files(handoff_context.task)

    # Load architectural context
    architecture = await read_file(".loki/specs/architecture.md")

    # Get recent changes to affected areas
    recent_commits = await git_log(paths=relevant_files, limit=10)

    return HandoffData(
        files=relevant_files,
        architecture=architecture,
        recent_changes=recent_commits,
        constraints=handoff_context.constraints
    )

# Register callback
handoff(
    to_agent=backend_dev,
    on_handoff=on_handoff_to_backend_dev
)
```

### Handoff Context Transfer

```json
{
  "handoff_id": "ho_abc123",
  "from_agent": "orchestrator",
  "to_agent": "backend-dev",
  "timestamp": "2026-01-07T10:05:00Z",
  "context": {
    "task_id": "task-001",
    "goal": "Implement user authentication endpoint",
    "constraints": [
      "Use existing auth patterns from src/auth/",
      "Maintain backwards compatibility",
      "Add rate limiting"
    ],
    "pre_fetched": {
      "files": ["src/auth/middleware.ts", "src/routes/index.ts"],
      "architecture": "...",
      "recent_changes": [...]
    }
  },
  "return_expected": true,
  "timeout_seconds": 600
}
```

---

## Multi-Tiered Fallback System

### Model-Level Fallbacks

```python
async def execute_with_model_fallback(task, preferred_model):
    """
    Try preferred model, fall back to alternatives on failure.
    Based on OpenAI safety patterns.
    """
    fallback_chain = {
        "opus": ["sonnet", "haiku"],
        "sonnet": ["haiku", "opus"],
        "haiku": ["sonnet"]
    }

    models_to_try = [preferred_model] + fallback_chain.get(preferred_model, [])

    for model in models_to_try:
        try:
            result = await run_agent(task, model=model)
            if result.success:
                return result
        except RateLimitError:
            log_warning(f"Rate limit on {model}, trying fallback")
            continue
        except ModelUnavailableError:
            log_warning(f"{model} unavailable, trying fallback")
            continue

    # All models failed
    return escalate_to_human(task, reason="All model fallbacks exhausted")
```
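To see the chain in action without a real provider, the following sketch swaps in a stub model call that rate-limits `opus`. `run_with_fallback`, `flaky`, and this local `RateLimitError` are hypothetical stand-ins for the helpers used above; only the fallback chain itself is taken from the original.

```python
import asyncio


class RateLimitError(Exception):
    """Stand-in for a provider's rate-limit error."""


FALLBACK_CHAIN = {
    "opus": ["sonnet", "haiku"],
    "sonnet": ["haiku", "opus"],
    "haiku": ["sonnet"],
}


async def run_with_fallback(call_model, preferred):
    """Try the preferred model, then each fallback in order."""
    tried = []
    for model in [preferred] + FALLBACK_CHAIN.get(preferred, []):
        tried.append(model)
        try:
            return model, await call_model(model), tried
        except RateLimitError:
            continue  # move on to the next model in the chain
    return None, None, tried  # every model failed


async def flaky(model):
    # Stub model call: opus is always rate limited in this demo
    if model == "opus":
        raise RateLimitError(model)
    return f"answer from {model}"


model, answer, tried = asyncio.run(run_with_fallback(flaky, "opus"))
print(model, answer, tried)
```

Returning the list of models tried alongside the result makes it easy to log which fallbacks were exercised.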

### Workflow-Level Fallbacks

```python
async def execute_with_workflow_fallback(task):
    """
    If complex workflow fails, fall back to simpler operations.
    """
    # Try full workflow first
    try:
        return await full_implementation_workflow(task)
    except WorkflowError as e:
        log_warning(f"Full workflow failed: {e}")

    # Fall back to simpler approach
    try:
        return await simplified_workflow(task)
    except WorkflowError as e:
        log_warning(f"Simplified workflow failed: {e}")

    # Last resort: decompose and try piece by piece
    try:
        subtasks = decompose_task(task)
        results = []
        for subtask in subtasks:
            result = await execute_single_step(subtask)
            results.append(result)
        return combine_results(results)
    except Exception as e:
        return escalate_to_human(task, reason=f"All workflows failed: {e}")
```

### Fallback Decision Tree

```
Task Execution
    |
    +-- Try preferred approach
    |   |
    |   +-- Success? --> Done
    |   |
    |   +-- Rate limit? --> Try next model in chain
    |   |
    |   +-- Error? --> Try simpler workflow
    |
    +-- All workflows failed?
    |   |
    |   +-- Decompose into subtasks
    |   |
    |   +-- Execute piece by piece
    |
    +-- Still failing?
        |
        +-- Escalate to human
        +-- Log detailed failure context
        +-- Save state for resume
```

---

## Confidence-Based Human Escalation

### Confidence Scoring

```python
def calculate_confidence(task_result):
    """
    Score confidence 0-1 based on multiple signals.
    Low confidence triggers human review.
    """
    signals = []

    # Test coverage signal
    if task_result.test_coverage >= 0.9:
        signals.append(1.0)
    elif task_result.test_coverage >= 0.7:
        signals.append(0.7)
    else:
        signals.append(0.3)

    # Review consensus signal
    if task_result.review_unanimous:
        signals.append(1.0)
    elif task_result.review_majority:
        signals.append(0.7)
    else:
        signals.append(0.3)

    # Retry count signal
    retry_penalty = min(task_result.retry_count * 0.2, 0.8)
    signals.append(1.0 - retry_penalty)

    return sum(signals) / len(signals)

# Escalation threshold
CONFIDENCE_THRESHOLD = 0.6

if calculate_confidence(result) < CONFIDENCE_THRESHOLD:
    escalate_to_human(
        task,
        reason="Low confidence score",
        context=result
    )
```
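A worked example makes the averaging concrete. The sketch below recomputes the three signals for one sample result; the `TaskResult` dataclass and the sample values are assumptions chosen for illustration, while the tiers and the retry penalty are the ones used above.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    test_coverage: float
    review_unanimous: bool
    review_majority: bool
    retry_count: int


def confidence_signals(r):
    """Return the [coverage, review, retry] signals in the same tiers as above."""
    coverage = 1.0 if r.test_coverage >= 0.9 else 0.7 if r.test_coverage >= 0.7 else 0.3
    review = 1.0 if r.review_unanimous else 0.7 if r.review_majority else 0.3
    retries = 1.0 - min(r.retry_count * 0.2, 0.8)
    return [coverage, review, retries]


# 75% coverage, majority (not unanimous) review, two retries
result = TaskResult(test_coverage=0.75, review_unanimous=False,
                    review_majority=True, retry_count=2)
signals = confidence_signals(result)
score = sum(signals) / len(signals)
print(signals, round(score, 2))
```

Here the signals are 0.7, 0.7, and 0.6, so the mean is about 0.67, just above the 0.6 threshold. With four retries the retry signal falls to 0.2 and the mean to roughly 0.53, which would trigger escalation.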

### Automatic Escalation Triggers

```yaml
human_escalation_triggers:
  # Retry-based
  - condition: retry_count > 3
    action: pause_and_escalate
    reason: "Multiple failures indicate unclear requirements"

  # Domain-based
  - condition: domain in ["payments", "auth", "pii"]
    action: require_approval
    reason: "Sensitive domain requires human review"

  # Confidence-based
  - condition: confidence_score < 0.6
    action: pause_and_escalate
    reason: "Low confidence in solution quality"

  # Time-based
  - condition: wall_time > expected_time * 3
    action: pause_and_escalate
    reason: "Task taking much longer than expected"

  # Cost-based
  - condition: tokens_used > budget * 0.8
    action: pause_and_escalate
    reason: "Approaching token budget limit"
```
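The `condition` strings above are written as expressions, but the document does not say how they are evaluated. A minimal sketch, assuming each condition is hand-translated into a Python predicate over a state dictionary (rather than parsed from the YAML), might look like this. `TRIGGERS`, `fired_triggers`, and the state keys are hypothetical.

```python
# Each entry: (condition label, action, predicate over the task state)
TRIGGERS = [
    ("retry_count > 3", "pause_and_escalate",
     lambda s: s["retry_count"] > 3),
    ("domain in sensitive set", "require_approval",
     lambda s: s["domain"] in {"payments", "auth", "pii"}),
    ("confidence_score < 0.6", "pause_and_escalate",
     lambda s: s["confidence_score"] < 0.6),
    ("wall_time > expected_time * 3", "pause_and_escalate",
     lambda s: s["wall_time"] > s["expected_time"] * 3),
    ("tokens_used > budget * 0.8", "pause_and_escalate",
     lambda s: s["tokens_used"] > s["budget"] * 0.8),
]


def fired_triggers(state):
    """Return (condition, action) for every trigger whose predicate holds."""
    return [(name, action) for name, action, predicate in TRIGGERS
            if predicate(state)]


state = {
    "retry_count": 1,
    "domain": "auth",
    "confidence_score": 0.9,
    "wall_time": 120,
    "expected_time": 60,
    "tokens_used": 85_000,
    "budget": 100_000,
}
fired = fired_triggers(state)
print(fired)
```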

---

## AGENTS.md Integration

### Reading Target Project's AGENTS.md

```python
async def load_project_context():
    """
    Read AGENTS.md from the target project if it exists.
    Based on OpenAI/AAIF standard.
    """
    agents_md_locations = [
        "AGENTS.md",
        ".github/AGENTS.md",
        "docs/AGENTS.md"
    ]

    for location in agents_md_locations:
        if await file_exists(location):
            content = await read_file(location)
            return parse_agents_md(content)

    # No AGENTS.md found - use defaults
    return default_project_context()

def parse_agents_md(content):
    """
    Extract structured guidance from AGENTS.md.
    """
    sections = parse_markdown_sections(content)

    return ProjectContext(
        build_commands=sections.get("build", []),
        test_commands=sections.get("test", []),
        code_style=sections.get("code style", {}),
        architecture_notes=sections.get("architecture", ""),
        deployment_notes=sections.get("deployment", ""),
        security_notes=sections.get("security", "")
    )
```
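`parse_markdown_sections` is referenced above but not defined. One possible shape, offered as a sketch only, splits the file on ATX headings and keys each body by its lowercased heading text. It returns plain strings, which is an assumption: the original's defaults suggest some sections may be lists or dicts. It also does not skip `#` lines inside fenced code blocks.

```python
import re


def parse_markdown_sections(content):
    """Split markdown into {lowercased heading: body text}."""
    sections = {}
    current = None
    for line in content.splitlines():
        match = re.match(r"^#{1,6}\s+(.*\S)\s*$", line)
        if match:
            # New heading: start collecting lines under its lowercased title
            current = match.group(1).lower()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


sample = "# Project\n\n## Build\nnpm run build\n\n## Test\nnpm test\n"
sections = parse_markdown_sections(sample)
print(sections)
```

With this shape, a lookup such as `sections.get("build", "")` would return the raw command text for the caller to split further.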

### Context Priority

```
1. AGENTS.md (closest to current file, monorepo-aware)
2. CLAUDE.md (Claude-specific instructions)
3. .loki/CONTINUITY.md (session state)
4. Package-level documentation
5. README.md (general project info)
```
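The "closest to current file" rule in item 1 can be sketched as a walk up the directory tree from the file being edited to the repository root. The function name `find_nearest_agents_md` and the choice to stop at `repo_root` are assumptions for illustration.

```python
from pathlib import Path


def find_nearest_agents_md(start_file, repo_root):
    """Walk up from start_file's directory; return the first AGENTS.md found."""
    repo_root = Path(repo_root).resolve()
    directory = Path(start_file).resolve().parent
    while True:
        candidate = directory / "AGENTS.md"
        if candidate.is_file():
            return candidate
        # Stop at the repository root, or at the filesystem root as a safeguard
        if directory == repo_root or directory == directory.parent:
            return None
        directory = directory.parent
```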

---

## Reasoning Model Guidance

### When to Use Extended Thinking

Based on OpenAI's o3/o4-mini patterns:

```yaml
use_extended_reasoning:
  always:
    - System architecture design
    - Security vulnerability analysis
    - Complex debugging (multi-file, unclear root cause)
    - API design decisions
    - Performance optimization strategy

  sometimes:
    - Code review (only for critical/complex changes)
    - Refactoring planning (when multiple approaches exist)
    - Integration design (when crossing system boundaries)

  never:
    - Simple bug fixes
    - Documentation updates
    - Unit test writing
    - Formatting/linting
    - File operations
```

### Backtracking Pattern

```python
async def execute_with_backtracking(task, max_backtracks=3):
    """
    Allow agent to backtrack and try different approaches.
    Based on Deep Research's adaptive planning.
    """
    attempts = []

    for attempt in range(max_backtracks + 1):
        # Generate approach considering previous failures
        approach = await plan_approach(
            task,
            failed_approaches=attempts
        )

        result = await execute_approach(approach)

        if result.success:
            return result

        # Record failed approach for learning
        attempts.append({
            "approach": approach,
            "failure_reason": result.error,
            "partial_progress": result.partial_output
        })

        # Backtrack: reset to clean state
        await rollback_to_checkpoint(task.checkpoint_id)

    return FailedResult(
        reason="Max backtracks exceeded",
        attempts=attempts
    )
```

---

## Session State Management

### Automatic State Persistence

```python
class Session:
    """
    Automatic conversation history and state management.
    Inspired by OpenAI Agents SDK Sessions.
    """

    def __init__(self, session_id):
        self.session_id = session_id
        self.state_file = f".loki/state/sessions/{session_id}.json"
        self.history = []
        self.context = {}

    async def save_state(self):
        state = {
            "session_id": self.session_id,
            "history": self.history,
            "context": self.context,
            "last_updated": now()
        }
        await write_json(self.state_file, state)

    async def load_state(self):
        if await file_exists(self.state_file):
            state = await read_json(self.state_file)
            self.history = state["history"]
            self.context = state["context"]

    async def add_turn(self, role, content, metadata=None):
        self.history.append({
            "role": role,
            "content": content,
            "metadata": metadata,
            "timestamp": now()
        })
        await self.save_state()
```
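The class above relies on helpers (`now`, `write_json`, `read_json`, `file_exists`) that are not shown. As a rough sketch of the same save-on-every-turn idea using only the standard library, the synchronous variant below writes to a caller-supplied directory. `FileSession` is a hypothetical name, and the UTC ISO timestamp format is an assumption.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


class FileSession:
    """Synchronous, stdlib-only variant of the Session class above."""

    def __init__(self, session_id, state_dir):
        self.session_id = session_id
        self.state_file = Path(state_dir) / f"{session_id}.json"
        self.history = []
        self.context = {}

    def _now(self):
        # Assumption: timestamps are stored as UTC ISO-8601 strings
        return datetime.now(timezone.utc).isoformat()

    def save_state(self):
        state = {
            "session_id": self.session_id,
            "history": self.history,
            "context": self.context,
            "last_updated": self._now(),
        }
        self.state_file.parent.mkdir(parents=True, exist_ok=True)
        self.state_file.write_text(json.dumps(state, indent=2))

    def load_state(self):
        if self.state_file.exists():
            state = json.loads(self.state_file.read_text())
            self.history = state["history"]
            self.context = state["context"]

    def add_turn(self, role, content, metadata=None):
        self.history.append({
            "role": role,
            "content": content,
            "metadata": metadata,
            "timestamp": self._now(),
        })
        self.save_state()  # persist after every turn


# Usage: a second session object with the same id resumes the history
with tempfile.TemporaryDirectory() as tmp:
    first = FileSession("demo", tmp)
    first.add_turn("user", "Implement login endpoint")
    resumed = FileSession("demo", tmp)
    resumed.load_state()
    restored_roles = [turn["role"] for turn in resumed.history]
print(restored_roles)
```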

---

## Sources

**OpenAI Official:**
- [Agents SDK Documentation](https://openai.github.io/openai-agents-python/)
- [Practical Guide to Building Agents](https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf)
- [Building Agents Track](https://developers.openai.com/tracks/building-agents/)
- [AGENTS.md Specification](https://agents.md/)

**Deep Research & Reasoning:**
- [Introducing Deep Research](https://openai.com/index/introducing-deep-research/)
- [Deep Research System Card](https://cdn.openai.com/deep-research-system-card.pdf)
- [Introducing o3 and o4-mini](https://openai.com/index/introducing-o3-and-o4-mini/)
- [Reasoning Best Practices](https://platform.openai.com/docs/guides/reasoning-best-practices)

**Safety & Monitoring:**
- [Chain of Thought Monitoring](https://openai.com/index/chain-of-thought-monitoring/)
- [Agent Builder Safety](https://platform.openai.com/docs/guides/agent-builder-safety)
- [Computer-Using Agent](https://openai.com/index/computer-using-agent/)

**Standards & Interoperability:**
- [Agentic AI Foundation](https://openai.com/index/agentic-ai-foundation/)
- [OpenAI for Developers 2025](https://developers.openai.com/blog/openai-for-developers-2025/)