loki-mode 4.2.0

Files changed (54)
  1. package/LICENSE +21 -0
  2. package/README.md +691 -0
  3. package/SKILL.md +191 -0
  4. package/VERSION +1 -0
  5. package/autonomy/.loki/dashboard/index.html +2634 -0
  6. package/autonomy/CONSTITUTION.md +508 -0
  7. package/autonomy/README.md +201 -0
  8. package/autonomy/config.example.yaml +152 -0
  9. package/autonomy/loki +526 -0
  10. package/autonomy/run.sh +3636 -0
  11. package/bin/loki-mode.js +26 -0
  12. package/bin/postinstall.js +60 -0
  13. package/docs/ACKNOWLEDGEMENTS.md +234 -0
  14. package/docs/COMPARISON.md +325 -0
  15. package/docs/COMPETITIVE-ANALYSIS.md +333 -0
  16. package/docs/INSTALLATION.md +547 -0
  17. package/docs/auto-claude-comparison.md +276 -0
  18. package/docs/cursor-comparison.md +225 -0
  19. package/docs/dashboard-guide.md +355 -0
  20. package/docs/screenshots/README.md +149 -0
  21. package/docs/screenshots/dashboard-agents.png +0 -0
  22. package/docs/screenshots/dashboard-tasks.png +0 -0
  23. package/docs/thick2thin.md +173 -0
  24. package/package.json +48 -0
  25. package/references/advanced-patterns.md +453 -0
  26. package/references/agent-types.md +243 -0
  27. package/references/agents.md +1043 -0
  28. package/references/business-ops.md +550 -0
  29. package/references/competitive-analysis.md +216 -0
  30. package/references/confidence-routing.md +371 -0
  31. package/references/core-workflow.md +275 -0
  32. package/references/cursor-learnings.md +207 -0
  33. package/references/deployment.md +604 -0
  34. package/references/lab-research-patterns.md +534 -0
  35. package/references/mcp-integration.md +186 -0
  36. package/references/memory-system.md +467 -0
  37. package/references/openai-patterns.md +647 -0
  38. package/references/production-patterns.md +568 -0
  39. package/references/prompt-repetition.md +192 -0
  40. package/references/quality-control.md +437 -0
  41. package/references/sdlc-phases.md +410 -0
  42. package/references/task-queue.md +361 -0
  43. package/references/tool-orchestration.md +691 -0
  44. package/skills/00-index.md +120 -0
  45. package/skills/agents.md +249 -0
  46. package/skills/artifacts.md +174 -0
  47. package/skills/github-integration.md +218 -0
  48. package/skills/model-selection.md +125 -0
  49. package/skills/parallel-workflows.md +526 -0
  50. package/skills/patterns-advanced.md +188 -0
  51. package/skills/production.md +292 -0
  52. package/skills/quality-gates.md +180 -0
  53. package/skills/testing.md +149 -0
  54. package/skills/troubleshooting.md +109 -0
@@ -0,0 +1,647 @@
+ # OpenAI Agent Patterns Reference
+
+ Research-backed patterns from OpenAI's Agents SDK, Deep Research, and autonomous agent frameworks.
+
+ ---
+
+ ## Overview
+
+ OpenAI's agent ecosystem provides four key architectural innovations for Loki Mode:
+
+ 1. **Tracing Spans** - Hierarchical event tracking with span types
+ 2. **Guardrails & Tripwires** - Input/output validation with early termination
+ 3. **Handoff Callbacks** - Data preparation during agent transfers
+ 4. **Multi-Tiered Fallbacks** - Model and workflow-level failure recovery
+
+ ---
+
+ ## Tracing Spans Architecture
+
+ ### Span Types (Agents SDK Pattern)
+
+ Every operation is wrapped in a typed span for observability:
+
+ ```yaml
+ span_types:
+   agent_span:
+     - Wraps entire agent execution
+     - Contains: agent_name, instructions_hash, model
+
+   generation_span:
+     - Wraps LLM API calls
+     - Contains: model, tokens_in, tokens_out, latency_ms
+
+   function_span:
+     - Wraps tool/function calls
+     - Contains: function_name, arguments, result, success
+
+   guardrail_span:
+     - Wraps validation checks
+     - Contains: guardrail_name, triggered, blocking
+
+   handoff_span:
+     - Wraps agent-to-agent transfers
+     - Contains: from_agent, to_agent, context_passed
+
+   custom_span:
+     - User-defined operations
+     - Contains: operation_name, metadata
+ ```
+
+ ### Hierarchical Trace Structure
+
+ ```json
+ {
+   "trace_id": "trace_abc123def456",
+   "workflow_name": "implement_feature",
+   "group_id": "session_xyz789",
+   "spans": [
+     {
+       "span_id": "span_001",
+       "parent_id": null,
+       "type": "agent_span",
+       "agent_name": "orchestrator",
+       "started_at": "2026-01-07T10:00:00Z",
+       "ended_at": "2026-01-07T10:05:00Z",
+       "children": ["span_002", "span_003"]
+     },
+     {
+       "span_id": "span_002",
+       "parent_id": "span_001",
+       "type": "guardrail_span",
+       "guardrail_name": "input_validation",
+       "triggered": false,
+       "blocking": true
+     },
+     {
+       "span_id": "span_003",
+       "parent_id": "span_001",
+       "type": "handoff_span",
+       "from_agent": "orchestrator",
+       "to_agent": "backend-dev",
+       "context_passed": ["task_spec", "related_files"]
+     }
+   ]
+ }
+ ```
+
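One way to produce a trace shaped like the JSON above is a small context-manager tracer that keeps a stack of open spans. This is only a sketch: the `Tracer` class and its `span` method are hypothetical names, not part of Loki Mode or the Agents SDK, and timestamps and `group_id` are left out for brevity.

```python
import itertools
from contextlib import contextmanager


class Tracer:
    """Collects spans into a flat list linked by parent_id (hypothetical helper)."""

    def __init__(self, trace_id, workflow_name):
        self.trace = {"trace_id": trace_id, "workflow_name": workflow_name, "spans": []}
        self._stack = []                  # IDs of the spans that are currently open
        self._ids = itertools.count(1)    # span_001, span_002, ...

    @contextmanager
    def span(self, span_type, **fields):
        span_id = f"span_{next(self._ids):03d}"
        parent_id = self._stack[-1] if self._stack else None
        record = {"span_id": span_id, "parent_id": parent_id,
                  "type": span_type, "children": [], **fields}
        self.trace["spans"].append(record)
        if parent_id is not None:
            # Register this span as a child of the innermost open span
            parent = next(s for s in self.trace["spans"] if s["span_id"] == parent_id)
            parent["children"].append(span_id)
        self._stack.append(span_id)
        try:
            yield record
        finally:
            self._stack.pop()


tracer = Tracer("trace_abc123def456", "implement_feature")
with tracer.span("agent_span", agent_name="orchestrator"):
    with tracer.span("guardrail_span", guardrail_name="input_validation",
                     triggered=False, blocking=True):
        pass
    with tracer.span("handoff_span", from_agent="orchestrator", to_agent="backend-dev"):
        pass
```

Nesting the `with` blocks is what sets `parent_id` and `children`, so the hierarchy mirrors the call structure without any manual bookkeeping.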
+ ### Storage Location
+
+ ```
+ .loki/traces/
+ ├── active/
+ │   └── {trace_id}.json      # Currently running traces
+ └── completed/
+     └── {date}/
+         └── {trace_id}.json  # Archived traces by date
+ ```
+
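Moving a finished trace from `active/` to `completed/{date}/` could look roughly like the sketch below. The `archive_trace` helper is hypothetical, and the ISO `YYYY-MM-DD` directory name is an assumption, since the layout above does not specify the `{date}` format.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def archive_trace(root: Path, trace_id: str, day: date) -> Path:
    """Move a finished trace from active/ to completed/{date}/ (hypothetical helper)."""
    source = root / "active" / f"{trace_id}.json"
    target_dir = root / "completed" / day.isoformat()   # assumed date format
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    source.replace(target)   # rename; assumes both directories share a filesystem
    return target


# Demonstration in a temporary directory rather than a real .loki/traces/ tree
root = Path(tempfile.mkdtemp()) / ".loki" / "traces"
(root / "active").mkdir(parents=True)
(root / "active" / "trace_abc123def456.json").write_text(json.dumps({"spans": []}))
archived = archive_trace(root, "trace_abc123def456", date(2026, 1, 7))
```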
+ ---
+
+ ## Guardrails & Tripwires System
+
+ ### Input Guardrails
+
+ Run **before** agent execution to validate user input:
+
+ ```python
+ @input_guardrail(blocking=True)
+ async def validate_task_scope(input, context):
+     """
+     Blocks tasks outside project scope.
+     Based on OpenAI Agents SDK pattern.
+     """
+     # Check if task references files outside project
+     if references_external_paths(input):
+         return GuardrailResult(
+             tripwire_triggered=True,
+             reason="Task references paths outside project root"
+         )
+
+     # Check for disallowed operations
+     if contains_destructive_operation(input):
+         return GuardrailResult(
+             tripwire_triggered=True,
+             reason="Destructive operation requires human approval"
+         )
+
+     return GuardrailResult(tripwire_triggered=False)
+ ```
+
+ ### Output Guardrails
+
+ Run **after** agent execution to validate results:
+
+ ```python
+ @output_guardrail
+ async def validate_code_quality(output, context):
+     """
+     Blocks low-quality code output.
+     """
+     if output.type == "code":
+         issues = run_static_analysis(output.content)
+         critical = [i for i in issues if i.severity == "critical"]
+
+         if critical:
+             return GuardrailResult(
+                 tripwire_triggered=True,
+                 reason=f"Critical issues found: {critical}"
+             )
+
+     return GuardrailResult(tripwire_triggered=False)
+ ```
+
+ ### Execution Modes
+
+ | Mode | Behavior | Use When |
+ |------|----------|----------|
+ | **Blocking** | Guardrail completes before agent starts | Sensitive operations, expensive models |
+ | **Parallel** | Guardrail runs concurrently with agent | Fast checks, acceptable token loss |
+
+ ```python
+ # Blocking mode: prevents token consumption
+ @input_guardrail(blocking=True, run_in_parallel=False)
+ async def expensive_validation(input):
+     # Agent won't start until this completes
+     pass
+
+ # Parallel mode: faster, but may waste tokens if the guardrail fails
+ @input_guardrail(blocking=True, run_in_parallel=True)
+ async def fast_validation(input):
+     # Runs alongside agent start
+     pass
+ ```
+
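The difference between the two modes can be sketched with plain `asyncio`. Everything here is illustrative: `run_with_guardrail`, `TripwireTriggered`, and the toy `guardrail` and `agent` coroutines are hypothetical stand-ins, not the decorator-based API shown above.

```python
import asyncio


class TripwireTriggered(Exception):
    """Stand-in for the tripwire exceptions (hypothetical name)."""


async def run_with_guardrail(guardrail, agent, task, run_in_parallel):
    if not run_in_parallel:
        # Blocking mode: the agent does not start until the guardrail passes.
        if await guardrail(task):
            raise TripwireTriggered("blocked before agent start")
        return await agent(task)

    # Parallel mode: start the agent, then cancel it if the guardrail trips.
    agent_task = asyncio.create_task(agent(task))
    if await guardrail(task):
        agent_task.cancel()
        raise TripwireTriggered("blocked while agent was running")
    return await agent_task


started = []   # records every task the agent actually began working on


async def guardrail(task):
    await asyncio.sleep(0.01)
    return "rm -rf" in task        # True means the tripwire fired


async def agent(task):
    started.append(task)
    await asyncio.sleep(0.05)
    return f"done: {task}"


async def demo():
    outcomes = []
    for parallel in (False, True):
        try:
            await run_with_guardrail(guardrail, agent, "rm -rf /tmp/x", parallel)
        except TripwireTriggered as exc:
            outcomes.append(str(exc))
    return outcomes


outcomes = asyncio.run(demo())
```

In the blocking run the agent never starts, while in the parallel run it starts and is then cancelled, which is the "acceptable token loss" trade-off named in the table.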
+ ### Tripwire Exceptions
+
+ When a tripwire triggers, execution halts immediately:
+
+ ```python
+ class InputGuardrailTripwireTriggered(Exception):
+     """Raised when input validation fails."""
+     pass
+
+ class OutputGuardrailTripwireTriggered(Exception):
+     """Raised when output validation fails."""
+
+     def __init__(self, reason, constraints=None):
+         super().__init__(reason)
+         # Constraints to apply when the agent loop retries
+         self.constraints = constraints or []
+
+ # In agent loop:
+ try:
+     result = await run_agent(task)
+ except InputGuardrailTripwireTriggered as e:
+     log_blocked_attempt(e)
+     return early_exit(reason=str(e))
+ except OutputGuardrailTripwireTriggered as e:
+     rollback_changes()
+     return retry_with_constraints(e.constraints)
+ ```
+
+ ### Layered Defense Strategy
+
+ > "Think of guardrails as a layered defense mechanism. While a single one is unlikely to provide sufficient protection, using multiple, specialized guardrails together creates more resilient agents." - OpenAI Agents SDK
+
+ ```yaml
+ guardrail_layers:
+   layer_1_input:
+     - scope_validation      # Is task within bounds?
+     - pii_detection         # Contains sensitive data?
+     - injection_detection   # Prompt injection attempt?
+
+   layer_2_pre_execution:
+     - cost_estimation       # Will this exceed budget?
+     - dependency_check      # Are dependencies available?
+     - conflict_detection    # Will this conflict with in-progress work?
+
+   layer_3_output:
+     - static_analysis       # Code quality issues?
+     - secret_detection      # Secrets in output?
+     - spec_compliance       # Matches OpenAPI spec?
+
+   layer_4_post_action:
+     - test_validation       # Tests pass?
+     - review_approval       # Review passed?
+     - deployment_safety     # Safe to deploy?
+ ```
+
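A layered configuration like this could be evaluated by walking the layers in order and stopping at the first tripwire. The sketch below assumes each check is a plain function returning a `GuardrailResult`. The `run_layers` helper and the two toy checks are hypothetical and far simpler than real scope or secret detection.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class GuardrailResult:
    tripwire_triggered: bool
    reason: str = ""


Check = Callable[[str], GuardrailResult]


def run_layers(layers: Dict[str, List[Check]], payload: str) -> Optional[str]:
    """Run layers in order; return 'layer/check: reason' for the first tripwire, else None."""
    for layer_name, checks in layers.items():
        for check in checks:
            result = check(payload)
            if result.tripwire_triggered:
                return f"{layer_name}/{check.__name__}: {result.reason}"
    return None


def scope_validation(payload: str) -> GuardrailResult:
    # Toy rule: treat any parent-directory reference as out of scope
    return GuardrailResult("../" in payload, "path escapes project root")


def secret_detection(payload: str) -> GuardrailResult:
    # Toy rule: flag strings that look like an AWS access key prefix
    return GuardrailResult("AKIA" in payload, "possible AWS access key")


layers = {
    "layer_1_input": [scope_validation],
    "layer_3_output": [secret_detection],
}
```

Because dictionaries preserve insertion order, earlier layers are checked first, so later checks are skipped once an earlier one has fired.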
+ ---
+
+ ## Handoff Callbacks
+
+ ### on_handoff Pattern
+
+ Prepare data when transferring between agents:
+
+ ```python
+ async def on_handoff_to_backend_dev(handoff_context):
+     """
+     Called when orchestrator hands off to backend-dev agent.
+     Fetches context the receiving agent will need.
+     """
+     # Pre-fetch relevant files
+     relevant_files = await find_related_files(handoff_context.task)
+
+     # Load architectural context
+     architecture = await read_file(".loki/specs/architecture.md")
+
+     # Get recent changes to affected areas
+     recent_commits = await git_log(paths=relevant_files, limit=10)
+
+     return HandoffData(
+         files=relevant_files,
+         architecture=architecture,
+         recent_changes=recent_commits,
+         constraints=handoff_context.constraints
+     )
+
+ # Register callback
+ handoff(
+     to_agent=backend_dev,
+     on_handoff=on_handoff_to_backend_dev
+ )
+ ```
+
+ ### Handoff Context Transfer
+
+ ```json
+ {
+   "handoff_id": "ho_abc123",
+   "from_agent": "orchestrator",
+   "to_agent": "backend-dev",
+   "timestamp": "2026-01-07T10:05:00Z",
+   "context": {
+     "task_id": "task-001",
+     "goal": "Implement user authentication endpoint",
+     "constraints": [
+       "Use existing auth patterns from src/auth/",
+       "Maintain backwards compatibility",
+       "Add rate limiting"
+     ],
+     "pre_fetched": {
+       "files": ["src/auth/middleware.ts", "src/routes/index.ts"],
+       "architecture": "...",
+       "recent_changes": [...]
+     }
+   },
+   "return_expected": true,
+   "timeout_seconds": 600
+ }
+ ```
+
+ ---
+
+ ## Multi-Tiered Fallback System
+
+ ### Model-Level Fallbacks
+
+ ```python
+ async def execute_with_model_fallback(task, preferred_model):
+     """
+     Try preferred model, fall back to alternatives on failure.
+     Based on OpenAI safety patterns.
+     """
+     fallback_chain = {
+         "opus": ["sonnet", "haiku"],
+         "sonnet": ["haiku", "opus"],
+         "haiku": ["sonnet"]
+     }
+
+     models_to_try = [preferred_model] + fallback_chain.get(preferred_model, [])
+
+     for model in models_to_try:
+         try:
+             result = await run_agent(task, model=model)
+             if result.success:
+                 return result
+         except RateLimitError:
+             log_warning(f"Rate limit on {model}, trying fallback")
+             continue
+         except ModelUnavailableError:
+             log_warning(f"{model} unavailable, trying fallback")
+             continue
+
+     # All models failed
+     return escalate_to_human(task, reason="All model fallbacks exhausted")
+ ```
+
+ ### Workflow-Level Fallbacks
+
+ ```python
+ async def execute_with_workflow_fallback(task):
+     """
+     If complex workflow fails, fall back to simpler operations.
+     """
+     # Try full workflow first
+     try:
+         return await full_implementation_workflow(task)
+     except WorkflowError as e:
+         log_warning(f"Full workflow failed: {e}")
+
+     # Fall back to simpler approach
+     try:
+         return await simplified_workflow(task)
+     except WorkflowError as e:
+         log_warning(f"Simplified workflow failed: {e}")
+
+     # Last resort: decompose and try piece by piece
+     try:
+         subtasks = decompose_task(task)
+         results = []
+         for subtask in subtasks:
+             result = await execute_single_step(subtask)
+             results.append(result)
+         return combine_results(results)
+     except Exception as e:
+         return escalate_to_human(task, reason=f"All workflows failed: {e}")
+ ```
+
+ ### Fallback Decision Tree
+
+ ```
+ Task Execution
+     |
+     +-- Try preferred approach
+     |   |
+     |   +-- Success? --> Done
+     |   |
+     |   +-- Rate limit? --> Try next model in chain
+     |   |
+     |   +-- Error? --> Try simpler workflow
+     |
+     +-- All workflows failed?
+     |   |
+     |   +-- Decompose into subtasks
+     |   |
+     |   +-- Execute piece by piece
+     |
+     +-- Still failing?
+         |
+         +-- Escalate to human
+         +-- Log detailed failure context
+         +-- Save state for resume
+ ```
+
+ ---
+
+ ## Confidence-Based Human Escalation
+
+ ### Confidence Scoring
+
+ ```python
+ def calculate_confidence(task_result):
+     """
+     Score confidence 0-1 based on multiple signals.
+     Low confidence triggers human review.
+     """
+     signals = []
+
+     # Test coverage signal
+     if task_result.test_coverage >= 0.9:
+         signals.append(1.0)
+     elif task_result.test_coverage >= 0.7:
+         signals.append(0.7)
+     else:
+         signals.append(0.3)
+
+     # Review consensus signal
+     if task_result.review_unanimous:
+         signals.append(1.0)
+     elif task_result.review_majority:
+         signals.append(0.7)
+     else:
+         signals.append(0.3)
+
+     # Retry count signal
+     retry_penalty = min(task_result.retry_count * 0.2, 0.8)
+     signals.append(1.0 - retry_penalty)
+
+     return sum(signals) / len(signals)
+
+ # Escalation threshold
+ CONFIDENCE_THRESHOLD = 0.6
+
+ if calculate_confidence(result) < CONFIDENCE_THRESHOLD:
+     escalate_to_human(
+         task,
+         reason="Low confidence score",
+         context=result
+     )
+ ```
+
+ ### Automatic Escalation Triggers
+
+ ```yaml
+ human_escalation_triggers:
+   # Retry-based
+   - condition: retry_count > 3
+     action: pause_and_escalate
+     reason: "Multiple failures indicate unclear requirements"
+
+   # Domain-based
+   - condition: domain in ["payments", "auth", "pii"]
+     action: require_approval
+     reason: "Sensitive domain requires human review"
+
+   # Confidence-based
+   - condition: confidence_score < 0.6
+     action: pause_and_escalate
+     reason: "Low confidence in solution quality"
+
+   # Time-based
+   - condition: wall_time > expected_time * 3
+     action: pause_and_escalate
+     reason: "Task taking much longer than expected"
+
+   # Cost-based
+   - condition: tokens_used > budget * 0.8
+     action: pause_and_escalate
+     reason: "Approaching token budget limit"
+ ```
+
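The trigger table above could be evaluated in code along the following lines. This is a sketch under assumptions: the `TaskStatus` dataclass and `escalation_actions` function are hypothetical, and the conditions are hard-coded rather than parsed from the YAML `condition` strings.

```python
from dataclasses import dataclass
from typing import List, Tuple

SENSITIVE_DOMAINS = {"payments", "auth", "pii"}


@dataclass
class TaskStatus:
    retry_count: int
    domain: str
    confidence_score: float
    wall_time: float        # seconds elapsed so far
    expected_time: float    # seconds originally estimated
    tokens_used: int
    budget: int             # token budget for the task


def escalation_actions(s: TaskStatus) -> List[Tuple[str, str]]:
    """Return (action, reason) for every trigger that fires, in table order."""
    triggers = [
        (s.retry_count > 3, "pause_and_escalate",
         "Multiple failures indicate unclear requirements"),
        (s.domain in SENSITIVE_DOMAINS, "require_approval",
         "Sensitive domain requires human review"),
        (s.confidence_score < 0.6, "pause_and_escalate",
         "Low confidence in solution quality"),
        (s.wall_time > s.expected_time * 3, "pause_and_escalate",
         "Task taking much longer than expected"),
        (s.tokens_used > s.budget * 0.8, "pause_and_escalate",
         "Approaching token budget limit"),
    ]
    return [(action, reason) for fired, action, reason in triggers if fired]


status = TaskStatus(retry_count=1, domain="auth", confidence_score=0.9,
                    wall_time=120, expected_time=100, tokens_used=9_000, budget=10_000)
actions = escalation_actions(status)
```

Here the task is healthy on retries, confidence, and time, but it touches a sensitive domain and has used 90% of its budget, so two triggers fire.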
+ ---
+
+ ## AGENTS.md Integration
+
+ ### Reading Target Project's AGENTS.md
+
+ ```python
+ async def load_project_context():
+     """
+     Read AGENTS.md from the target project if it exists.
+     Based on OpenAI/AAIF standard.
+     """
+     agents_md_locations = [
+         "AGENTS.md",
+         ".github/AGENTS.md",
+         "docs/AGENTS.md"
+     ]
+
+     for location in agents_md_locations:
+         if await file_exists(location):
+             content = await read_file(location)
+             return parse_agents_md(content)
+
+     # No AGENTS.md found - use defaults
+     return default_project_context()
+
+ def parse_agents_md(content):
+     """
+     Extract structured guidance from AGENTS.md.
+     """
+     sections = parse_markdown_sections(content)
+
+     return ProjectContext(
+         build_commands=sections.get("build", []),
+         test_commands=sections.get("test", []),
+         code_style=sections.get("code style", {}),
+         architecture_notes=sections.get("architecture", ""),
+         deployment_notes=sections.get("deployment", ""),
+         security_notes=sections.get("security", "")
+     )
+ ```
+
+ ### Context Priority
+
+ ```
+ 1. AGENTS.md (closest to current file, monorepo-aware)
+ 2. CLAUDE.md (Claude-specific instructions)
+ 3. .loki/CONTINUITY.md (session state)
+ 4. Package-level documentation
+ 5. README.md (general project info)
+ ```
+
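"Closest to current file" in item 1 can be read as walking up from the file being edited until an `AGENTS.md` is found. The sketch below assumes that interpretation. The `nearest_agents_md` function is hypothetical and only checks for an `AGENTS.md` directly inside each directory, not the `.github/` or `docs/` locations.

```python
import tempfile
from pathlib import Path
from typing import Optional


def nearest_agents_md(start: Path, repo_root: Path) -> Optional[Path]:
    """Walk from start's directory up to repo_root; return the first AGENTS.md found."""
    directory = (start if start.is_dir() else start.parent).resolve()
    repo_root = repo_root.resolve()
    while True:
        candidate = directory / "AGENTS.md"
        if candidate.is_file():
            return candidate
        # Stop at the repository root (or the filesystem root as a safety net)
        if directory == repo_root or directory == directory.parent:
            return None
        directory = directory.parent


# Toy monorepo: a root AGENTS.md plus a package-level one for packages/api
repo = Path(tempfile.mkdtemp()).resolve()
(repo / "packages" / "api" / "src").mkdir(parents=True)
(repo / "packages" / "web").mkdir(parents=True)
(repo / "AGENTS.md").write_text("# root guidance\n")
(repo / "packages" / "api" / "AGENTS.md").write_text("# api guidance\n")
handler = repo / "packages" / "api" / "src" / "handler.py"
handler.write_text("")
page = repo / "packages" / "web" / "index.ts"
page.write_text("")
```

Files under `packages/api` resolve to the package-level file, while files in packages without their own `AGENTS.md` fall back to the one at the repository root.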
+ ---
+
+ ## Reasoning Model Guidance
+
+ ### When to Use Extended Thinking
+
+ Based on OpenAI's o3/o4-mini patterns:
+
+ ```yaml
+ use_extended_reasoning:
+   always:
+     - System architecture design
+     - Security vulnerability analysis
+     - Complex debugging (multi-file, unclear root cause)
+     - API design decisions
+     - Performance optimization strategy
+
+   sometimes:
+     - Code review (only for critical/complex changes)
+     - Refactoring planning (when multiple approaches exist)
+     - Integration design (when crossing system boundaries)
+
+   never:
+     - Simple bug fixes
+     - Documentation updates
+     - Unit test writing
+     - Formatting/linting
+     - File operations
+ ```
+
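As a rough illustration, the three tiers above could drive a simple routing function. The task-type identifiers and the `high_stakes` flag are hypothetical: the flag stands in for the parenthetical conditions on the "sometimes" entries, and how it is determined is left open.

```python
ALWAYS = {"architecture_design", "security_analysis", "complex_debugging",
          "api_design", "performance_strategy"}
SOMETIMES = {"code_review", "refactoring_planning", "integration_design"}


def use_extended_reasoning(task_type: str, high_stakes: bool = False) -> bool:
    """Decide whether a task should be routed to extended reasoning."""
    if task_type in ALWAYS:
        return True
    if task_type in SOMETIMES:
        # Only worth the extra cost when the change is critical or complex
        return high_stakes
    # Everything else (bug fixes, docs, unit tests, formatting, file operations)
    return False
```

For example, `use_extended_reasoning("code_review", high_stakes=True)` selects extended reasoning, while a routine review or a documentation update does not.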
+ ### Backtracking Pattern
+
+ ```python
+ async def execute_with_backtracking(task, max_backtracks=3):
+     """
+     Allow agent to backtrack and try different approaches.
+     Based on Deep Research's adaptive planning.
+     """
+     attempts = []
+
+     for attempt in range(max_backtracks + 1):
+         # Generate approach considering previous failures
+         approach = await plan_approach(
+             task,
+             failed_approaches=attempts
+         )
+
+         result = await execute_approach(approach)
+
+         if result.success:
+             return result
+
+         # Record failed approach for learning
+         attempts.append({
+             "approach": approach,
+             "failure_reason": result.error,
+             "partial_progress": result.partial_output
+         })
+
+         # Backtrack: reset to clean state
+         await rollback_to_checkpoint(task.checkpoint_id)
+
+     return FailedResult(
+         reason="Max backtracks exceeded",
+         attempts=attempts
+     )
+ ```
+
+ ---
+
+ ## Session State Management
+
+ ### Automatic State Persistence
+
+ ```python
+ class Session:
+     """
+     Automatic conversation history and state management.
+     Inspired by OpenAI Agents SDK Sessions.
+     """
+
+     def __init__(self, session_id):
+         self.session_id = session_id
+         self.state_file = f".loki/state/sessions/{session_id}.json"
+         self.history = []
+         self.context = {}
+
+     async def save_state(self):
+         state = {
+             "session_id": self.session_id,
+             "history": self.history,
+             "context": self.context,
+             "last_updated": now()
+         }
+         await write_json(self.state_file, state)
+
+     async def load_state(self):
+         if await file_exists(self.state_file):
+             state = await read_json(self.state_file)
+             self.history = state["history"]
+             self.context = state["context"]
+
+     async def add_turn(self, role, content, metadata=None):
+         self.history.append({
+             "role": role,
+             "content": content,
+             "metadata": metadata,
+             "timestamp": now()
+         })
+         await self.save_state()
+ ```
+
+ ---
+
+ ## Sources
+
+ **OpenAI Official:**
+ - [Agents SDK Documentation](https://openai.github.io/openai-agents-python/)
+ - [Practical Guide to Building Agents](https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf)
+ - [Building Agents Track](https://developers.openai.com/tracks/building-agents/)
+ - [AGENTS.md Specification](https://agents.md/)
+
+ **Deep Research & Reasoning:**
+ - [Introducing Deep Research](https://openai.com/index/introducing-deep-research/)
+ - [Deep Research System Card](https://cdn.openai.com/deep-research-system-card.pdf)
+ - [Introducing o3 and o4-mini](https://openai.com/index/introducing-o3-and-o4-mini/)
+ - [Reasoning Best Practices](https://platform.openai.com/docs/guides/reasoning-best-practices)
+
+ **Safety & Monitoring:**
+ - [Chain of Thought Monitoring](https://openai.com/index/chain-of-thought-monitoring/)
+ - [Agent Builder Safety](https://platform.openai.com/docs/guides/agent-builder-safety)
+ - [Computer-Using Agent](https://openai.com/index/computer-using-agent/)
+
+ **Standards & Interoperability:**
+ - [Agentic AI Foundation](https://openai.com/index/agentic-ai-foundation/)
+ - [OpenAI for Developers 2025](https://developers.openai.com/blog/openai-for-developers-2025/)