loki-mode 4.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (54)
  1. package/LICENSE +21 -0
  2. package/README.md +691 -0
  3. package/SKILL.md +191 -0
  4. package/VERSION +1 -0
  5. package/autonomy/.loki/dashboard/index.html +2634 -0
  6. package/autonomy/CONSTITUTION.md +508 -0
  7. package/autonomy/README.md +201 -0
  8. package/autonomy/config.example.yaml +152 -0
  9. package/autonomy/loki +526 -0
  10. package/autonomy/run.sh +3636 -0
  11. package/bin/loki-mode.js +26 -0
  12. package/bin/postinstall.js +60 -0
  13. package/docs/ACKNOWLEDGEMENTS.md +234 -0
  14. package/docs/COMPARISON.md +325 -0
  15. package/docs/COMPETITIVE-ANALYSIS.md +333 -0
  16. package/docs/INSTALLATION.md +547 -0
  17. package/docs/auto-claude-comparison.md +276 -0
  18. package/docs/cursor-comparison.md +225 -0
  19. package/docs/dashboard-guide.md +355 -0
  20. package/docs/screenshots/README.md +149 -0
  21. package/docs/screenshots/dashboard-agents.png +0 -0
  22. package/docs/screenshots/dashboard-tasks.png +0 -0
  23. package/docs/thick2thin.md +173 -0
  24. package/package.json +48 -0
  25. package/references/advanced-patterns.md +453 -0
  26. package/references/agent-types.md +243 -0
  27. package/references/agents.md +1043 -0
  28. package/references/business-ops.md +550 -0
  29. package/references/competitive-analysis.md +216 -0
  30. package/references/confidence-routing.md +371 -0
  31. package/references/core-workflow.md +275 -0
  32. package/references/cursor-learnings.md +207 -0
  33. package/references/deployment.md +604 -0
  34. package/references/lab-research-patterns.md +534 -0
  35. package/references/mcp-integration.md +186 -0
  36. package/references/memory-system.md +467 -0
  37. package/references/openai-patterns.md +647 -0
  38. package/references/production-patterns.md +568 -0
  39. package/references/prompt-repetition.md +192 -0
  40. package/references/quality-control.md +437 -0
  41. package/references/sdlc-phases.md +410 -0
  42. package/references/task-queue.md +361 -0
  43. package/references/tool-orchestration.md +691 -0
  44. package/skills/00-index.md +120 -0
  45. package/skills/agents.md +249 -0
  46. package/skills/artifacts.md +174 -0
  47. package/skills/github-integration.md +218 -0
  48. package/skills/model-selection.md +125 -0
  49. package/skills/parallel-workflows.md +526 -0
  50. package/skills/patterns-advanced.md +188 -0
  51. package/skills/production.md +292 -0
  52. package/skills/quality-gates.md +180 -0
  53. package/skills/testing.md +149 -0
  54. package/skills/troubleshooting.md +109 -0
@@ -0,0 +1,568 @@
# Production Patterns Reference

Practitioner-tested patterns from Hacker News discussions and real-world deployments. These patterns represent what actually works in production, not theoretical frameworks.

---

## Overview

This reference consolidates battle-tested insights from:
- HN discussions on autonomous agents in production (2025)
- Coding with LLMs practitioner experiences
- Simon Willison's Superpowers coding agent patterns
- Multi-agent orchestration real-world deployments

---

## What Actually Works in Production

### Human-in-the-Loop (HITL) is Non-Negotiable

**Key Insight:** "Zero companies don't have a human in the loop" for customer-facing applications.

```yaml
hitl_patterns:
  always_human:
    - Customer-facing responses
    - Financial transactions
    - Security-critical operations
    - Legal/compliance decisions

  automation_candidates:
    - Internal tooling
    - Developer assistance
    - Data preprocessing
    - Code generation (with review)

  implementation:
    - Classification layer routes to human vs automated
    - Confidence thresholds trigger escalation
    - Audit trails for all automated decisions
```
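
A minimal sketch of the classification-and-audit layer described above, in plain Python. `Action`, `route_action`, and the category sets are hypothetical names for illustration, not loki-mode code; a real deployment would back the audit log with durable storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical category sets mirroring the YAML above
ALWAYS_HUMAN = {"customer_response", "financial_txn", "security_op", "legal_decision"}
AUTOMATION_OK = {"internal_tooling", "dev_assist", "data_preprocessing", "codegen"}

@dataclass
class Action:
    category: str
    confidence: float
    payload: dict = field(default_factory=dict)

audit_log = []

def route_action(action: Action, confidence_threshold: float = 0.9) -> str:
    """Route to 'human' or 'automated' and always leave an audit entry."""
    if action.category in ALWAYS_HUMAN:
        decision = "human"
    elif action.category in AUTOMATION_OK and action.confidence >= confidence_threshold:
        decision = "automated"
    else:
        # Unknown category or low confidence: escalate to a human
        decision = "human"
    audit_log.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "category": action.category,
        "confidence": action.confidence,
        "decision": decision,
    })
    return decision

print(route_action(Action("codegen", 0.97)))        # automated
print(route_action(Action("financial_txn", 0.99)))  # human, always
```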

### Narrow Scope Wins

**Key Insight:** Successful agents operate within tightly constrained domains.

```yaml
scope_constraints:
  max_steps_before_review: 3-5
  task_characteristics:
    - Specific, well-defined objectives
    - Pre-classified inputs
    - Deterministic success criteria
    - Verifiable outputs

  successful_domains:
    - Email scanning and classification
    - Invoice processing
    - Code refactoring (bounded)
    - Documentation generation
    - Test writing

  failure_prone_domains:
    - Open-ended feature implementation
    - Novel algorithm design
    - Security-critical code
    - Cross-system integrations
```
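
One way to enforce the `max_steps_before_review` constraint is a step budget that interrupts the agent loop for review. This is an illustrative sketch; `ReviewRequired` and `StepBudget` are invented names, not loki-mode APIs.

```python
class ReviewRequired(Exception):
    """Raised when the agent must pause for human review."""

class StepBudget:
    """Pause an agent loop every `max_steps` actions (3-5 in practice)."""

    def __init__(self, max_steps: int = 4):
        self.max_steps = max_steps
        self.steps = 0

    def tick(self, description: str) -> None:
        self.steps += 1
        if self.steps >= self.max_steps:
            self.steps = 0  # reset for the next review window
            raise ReviewRequired(f"Review checkpoint after: {description}")

budget = StepBudget(max_steps=3)
try:
    for step in ["read file", "edit function", "run tests", "commit"]:
        budget.tick(step)
except ReviewRequired as e:
    print(e)  # Review checkpoint after: run tests
```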

### Confidence-Based Routing

**Key Insight:** Treat agents as preprocessors, not decision-makers.

```python
def confidence_based_routing(agent_output):
    """
    Route based on confidence, not capability.
    Based on production practitioner patterns.
    """
    confidence = agent_output.confidence_score

    if confidence >= 0.95:
        # High confidence: auto-approve with logging
        return AutoApprove(audit_log=True)

    elif confidence >= 0.70:
        # Medium confidence: quick human review
        return HumanReview(priority="normal", timeout="1h")

    elif confidence >= 0.40:
        # Low confidence: detailed human review
        return HumanReview(priority="high", context="full")

    else:
        # Very low confidence: escalate immediately
        return Escalate(reason="low_confidence", require_senior=True)
```

### Classification Before Automation

**Key Insight:** Separate inputs before processing.

```yaml
classification_first:
  step_1_classify:
    workable:
      - Clear requirements
      - Existing patterns
      - Test coverage available
    non_workable:
      - Ambiguous requirements
      - Novel architecture
      - Missing dependencies
    escalate_immediately:
      - Security concerns
      - Compliance requirements
      - Customer-facing changes

  step_2_route:
    workable: "Automated pipeline"
    non_workable: "Human clarification"
    escalate: "Senior review"
```
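
A deliberately crude sketch of the classify-then-route step. Keyword matching stands in for whatever classifier a real system would use (a trained model or an LLM call); the keyword sets and function names are assumptions for illustration.

```python
ESCALATE_KEYWORDS = {"security", "compliance", "customer-facing"}
NON_WORKABLE_KEYWORDS = {"tbd", "unclear", "novel architecture", "missing dependency"}

def classify_task(description: str) -> str:
    """Step 1: bucket the task. Step 2 routes on the returned label."""
    text = description.lower()
    if any(k in text for k in ESCALATE_KEYWORDS):
        return "escalate"          # senior review
    if any(k in text for k in NON_WORKABLE_KEYWORDS):
        return "non_workable"      # human clarification
    return "workable"              # automated pipeline

ROUTES = {
    "workable": "Automated pipeline",
    "non_workable": "Human clarification",
    "escalate": "Senior review",
}

print(ROUTES[classify_task("Fix pagination bug; tests exist")])          # Automated pipeline
print(ROUTES[classify_task("Requirements unclear, novel architecture")]) # Human clarification
print(ROUTES[classify_task("Change customer-facing checkout flow")])     # Senior review
```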

### Deterministic Outer Loops

**Key Insight:** Wrap agent outputs with rule-based validation.

```python
def deterministic_validation_loop(task, max_attempts=3):
    """
    Use LLMs only where genuine ambiguity exists.
    Wrap with deterministic rules.
    """
    for attempt in range(max_attempts):
        # LLM handles the ambiguous part
        output = agent.execute(task)

        # Deterministic validation (NOT LLM)
        validation_errors = []

        # Rule: Must have tests
        if not output.has_tests:
            validation_errors.append("Missing tests")

        # Rule: Must pass linting
        lint_result = run_linter(output.code)
        if lint_result.errors:
            validation_errors.append(f"Lint errors: {lint_result.errors}")

        # Rule: Must compile
        compile_result = compile_code(output.code)
        if not compile_result.success:
            validation_errors.append(f"Compile error: {compile_result.error}")

        # Rule: Tests must pass
        if output.has_tests:
            test_result = run_tests(output.code)
            if not test_result.all_passed:
                validation_errors.append(f"Test failures: {test_result.failures}")

        if not validation_errors:
            return output

        # Feed errors back for retry
        task = task.with_feedback(validation_errors)

    return FailedResult(reason="Max attempts exceeded")
```

---

## Context Engineering Patterns

### Context Curation Over Automatic Selection

**Key Insight:** Manually choose which files and information to provide.

```yaml
context_curation:
  principles:
    - "Less is more": focused context beats comprehensive context
    - Manual selection outperforms automatic RAG
    - Remove outdated information aggressively

  anti_patterns:
    - Dumping entire codebase into context
    - Relying on automatic context selection
    - Accumulating conversation history indefinitely

  implementation:
    per_task_context:
      - 2-5 most relevant files
      - Specific functions, not entire modules
      - Recent changes only (last 1-2 days)
      - Clear success criteria

    context_budget:
      target: "< 10k tokens for context"
      reserve: "90% for model reasoning"
```
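
The context budget can be enforced mechanically. The sketch below greedily packs manually ranked files into a ~10k-token budget, using a rough 4-characters-per-token heuristic; both the heuristic and the function names are illustrative assumptions.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text and code
    return len(text) // 4

def build_context(candidate_files: list[tuple[str, str]], budget: int = 10_000) -> str:
    """
    Greedily pack manually ranked (path, content) pairs, most relevant first,
    until the ~10k-token budget is exhausted.
    """
    parts, used = [], 0
    for path, content in candidate_files:
        cost = estimate_tokens(content)
        if used + cost > budget:
            break  # leave the rest of the window for model reasoning
        parts.append(f"### {path}\n{content}")
        used += cost
    return "\n\n".join(parts)

files = [("src/auth.py", "def login(): ..."), ("src/db.py", "def connect(): ...")]
print(estimate_tokens(build_context(files)))
```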

### Information Abstraction

**Key Insight:** Summarize rather than feeding full data.

```python
def abstract_for_agent(raw_data, task_context):
    """
    Design abstractions that preserve decision-relevant information.
    Based on practitioner insights.
    """
    # BAD: Feed 10,000 database rows
    # raw_data = db.query("SELECT * FROM users")

    # GOOD: Summarize to decision-relevant info
    summary = {
        "query_status": "success",
        "total_results": len(raw_data),
        "sample": raw_data[:5],
        "schema": extract_schema(raw_data),
        "statistics": {
            "null_count": count_nulls(raw_data),
            "unique_values": count_uniques(raw_data),
            "date_range": get_date_range(raw_data)
        }
    }

    return summary
```

### Separate Conversations Per Task

**Key Insight:** Fresh contexts yield better results than accumulated sessions.

```yaml
conversation_management:
  new_conversation_triggers:
    - Different domain (backend -> frontend)
    - New feature vs bug fix
    - After completing major task
    - When errors accumulate (3+ in a row)

  preserve_across_sessions:
    - CLAUDE.md / CONTINUITY.md
    - Architectural decisions
    - Key constraints

  discard_between_sessions:
    - Debugging attempts
    - Abandoned approaches
    - Intermediate drafts
```
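
A small sketch of the preserve/discard split: session end writes only decisions and constraints to a continuity file, and a fresh session starts from that file alone. The file layout here is an assumption modeled on the CONTINUITY.md idea above, not a prescribed format.

```python
from pathlib import Path

CONTINUITY_FILE = Path("CONTINUITY.md")

def end_session(decisions: list[str], constraints: list[str]) -> None:
    """Persist only what should survive: decisions and constraints."""
    lines = ["# Continuity", "", "## Architectural decisions"]
    lines += [f"- {d}" for d in decisions]
    lines += ["", "## Key constraints"] + [f"- {c}" for c in constraints]
    CONTINUITY_FILE.write_text("\n".join(lines))

def start_session() -> str:
    """A fresh conversation begins with the continuity notes only,
    never with the previous session's debugging transcript."""
    return CONTINUITY_FILE.read_text() if CONTINUITY_FILE.exists() else ""

end_session(["Use Postgres for the queue"], ["No secrets in code"])
print(start_session())
```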

---

## Skills System Pattern

### On-Demand Skill Loading

**Key Insight:** Skills remain dormant until the model actively seeks them out.

```yaml
skills_architecture:
  core_interaction: "< 2k tokens"
  skill_loading: "On-demand via search"

  implementation:
    skill_discovery:
      - Shell script searches skill files
      - Model requests specific skills by name
      - Skills loaded only when needed

    skill_structure:
      name: "unique-skill-name"
      trigger: "Pattern that activates skill"
      content: "Detailed instructions"
      dependencies: ["other-skills"]

  benefits:
    - Minimal base context
    - Extensible without bloat
    - Skills can be updated independently
```
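
A minimal sketch of on-demand skill discovery: scan skill files for a matching trigger and load only the match. The `name:`/`trigger:` header convention mirrors the `skill_structure` above but is otherwise an assumption; loki-mode's actual loader may differ.

```python
from pathlib import Path

def load_skill(skills_dir: str, query: str) -> str | None:
    """
    Search skill files for a trigger match and load only that skill.
    Assumes Markdown skill files with 'name:' and 'trigger:' header lines.
    """
    for path in Path(skills_dir).glob("*.md"):
        text = path.read_text()
        # Parse 'key: value' pairs from the first few header lines
        header = dict(
            line.split(":", 1) for line in text.splitlines()[:5] if ":" in line
        )
        trigger = header.get("trigger", "").strip().lower()
        if trigger and trigger in query.lower():
            return text  # loaded on demand; everything else stays dormant
    return None

skill = load_skill("skills", "how do I set up quality gates?")
print("loaded" if skill else "no matching skill; base context stays < 2k tokens")
```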

### Sub-Agents for Context Isolation

**Key Insight:** Prevent massive token waste by isolating context-noisy subtasks.

```python
async def context_isolated_search(query, codebase_path):
    """
    Use a sub-agent for grep/search to prevent context pollution.
    Based on Simon Willison's patterns.
    """
    # Main agent stays focused
    # Sub-agent handles noisy file searching

    search_agent = spawn_subagent(
        role="codebase-searcher",
        context_limit="10k tokens",
        permissions=["read-only"]
    )

    results = await search_agent.execute(
        task=f"Find files related to: {query}",
        codebase=codebase_path
    )

    # Return only relevant paths, not full content
    return FilteredResults(
        paths=results.relevant_files[:10],
        summaries=results.file_summaries,
        confidence=results.relevance_scores
    )
```

---

## Planning Before Execution

### Explicit Plan-Then-Code Workflow

**Key Insight:** Have models articulate detailed plans without immediately writing code.

```yaml
plan_then_code:
  phase_1_planning:
    outputs:
      - spec.md: "Detailed requirements"
      - todo.md: "Tagged tasks [BUG], [FEAT], [REFACTOR]"
      - approach.md: "Implementation strategy"
    constraints:
      - NO CODE in this phase
      - Human review before proceeding
      - Clear success criteria

  phase_2_review:
    checks:
      - Plan addresses all requirements
      - Approach is feasible
      - No missing dependencies
      - Tests are specified

  phase_3_implementation:
    constraints:
      - Follow plan exactly
      - One task at a time
      - Test after each change
      - Report deviations immediately
```
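
The three phases can be enforced with a simple state machine that refuses to emit code before the plan is approved. All names here (`Phase`, `PhaseGate`) are hypothetical illustrations of the workflow, not loki-mode APIs.

```python
from enum import Enum, auto

class Phase(Enum):
    PLANNING = auto()
    REVIEW = auto()
    IMPLEMENTATION = auto()

class PhaseGate:
    """Refuse to emit code until the plan has passed human review."""

    def __init__(self):
        self.phase = Phase.PLANNING
        self.plan_approved = False

    def submit_plan(self, spec: str, todo: str, approach: str) -> None:
        assert self.phase is Phase.PLANNING
        self.phase = Phase.REVIEW  # plan artifacts go to a human, not a model

    def approve(self) -> None:
        assert self.phase is Phase.REVIEW
        self.plan_approved = True
        self.phase = Phase.IMPLEMENTATION

    def write_code(self, task: str) -> str:
        if self.phase is not Phase.IMPLEMENTATION or not self.plan_approved:
            raise RuntimeError("NO CODE before the plan is reviewed")
        return f"# implementing: {task}"

gate = PhaseGate()
gate.submit_plan("spec.md", "todo.md", "approach.md")
gate.approve()
print(gate.write_code("[FEAT] add pagination"))
```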

---

## Multi-Agent Orchestration Patterns

### Event-Driven Coordination

**Key Insight:** Move beyond synchronous prompt chaining to asynchronous, decoupled systems.

```yaml
event_driven_orchestration:
  problems_with_synchronous:
    - Doesn't scale
    - Mixes orchestration with prompt logic
    - Single failure breaks entire chain
    - No retry/recovery mechanism

  async_architecture:
    message_queue:
      - Agents communicate via events
      - Decoupled execution
      - Natural retry/dead-letter handling

    state_management:
      - Persistent task state
      - Checkpoint/resume capability
      - Clear ownership of data

    error_handling:
      - Per-agent retry policies
      - Circuit breakers
      - Graceful degradation
```
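
A toy version of the async architecture using only the standard library: agents consume events from a queue, failed events are retried and eventually dead-lettered, and no agent calls another directly. A production system would use a real message broker; this sketch only shows the decoupling.

```python
import asyncio

async def worker(name: str, queue: asyncio.Queue, results: asyncio.Queue):
    """Each agent consumes events independently; a failure is retried, not fatal."""
    while True:
        event = await queue.get()
        try:
            results.put_nowait(f"{name} handled {event['type']}")
        except Exception:
            event["retries"] = event.get("retries", 0) + 1
            if event["retries"] < 3:
                queue.put_nowait(event)                      # retry
            else:
                results.put_nowait(f"dead-letter: {event}")  # degrade gracefully
        finally:
            queue.task_done()

async def main():
    queue, results = asyncio.Queue(), asyncio.Queue()
    for event_type in ["plan", "implement", "review"]:
        queue.put_nowait({"type": event_type})
    workers = [asyncio.create_task(worker(f"agent-{i}", queue, results)) for i in range(2)]
    await queue.join()  # decoupled: no synchronous prompt chaining
    for w in workers:
        w.cancel()
    while not results.empty():
        print(results.get_nowait())

asyncio.run(main())
```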

### Policy-First Enforcement

**Key Insight:** Govern agent behavior at runtime, not just training time.

```python
class PolicyEngine:
    """
    Runtime governance for agent behavior.
    Based on autonomous control plane patterns.
    """

    def __init__(self, policies):
        self.policies = policies

    async def enforce(self, agent_action, context):
        for policy in self.policies:
            result = await policy.evaluate(agent_action, context)

            if result.blocked:
                return BlockedAction(
                    reason=result.reason,
                    policy=policy.name,
                    remediation=result.suggested_action
                )

            if result.modified:
                agent_action = result.modified_action

        return AllowedAction(agent_action)

# Example policies
policies = [
    NoProductionDataDeletion(),
    NoSecretsInCode(),
    MaxTokenBudget(limit=100000),
    RequireTestsForCode(),
    BlockExternalNetworkCalls(in_sandbox=True)
]
```
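
`PolicyEngine` above is pseudocode; a concrete policy might look like the following sketch, where `PolicyResult` and the secret patterns are assumptions chosen for illustration.

```python
import asyncio
import re
from dataclasses import dataclass
from types import SimpleNamespace

@dataclass
class PolicyResult:
    blocked: bool = False
    modified: bool = False
    reason: str = ""
    suggested_action: str = ""
    modified_action: object = None

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),               # AWS access key id shape
    re.compile(r"(?i)api[_-]?key\s*=\s*['\"]\w+"), # inline API key assignment
]

class NoSecretsInCode:
    name = "no-secrets-in-code"

    async def evaluate(self, agent_action, context) -> PolicyResult:
        code = getattr(agent_action, "code", "")
        for pattern in SECRET_PATTERNS:
            if pattern.search(code):
                return PolicyResult(
                    blocked=True,
                    reason=f"matched secret pattern {pattern.pattern!r}",
                    suggested_action="move the credential to a secrets manager",
                )
        return PolicyResult()

result = asyncio.run(
    NoSecretsInCode().evaluate(SimpleNamespace(code="key = 'AKIAABCDEFGHIJKLMNOP'"), context={})
)
print(result.blocked, result.reason)
```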

### Simulation Layer

**Key Insight:** Evaluate changes before deploying to the real environment.

```yaml
simulation_layer:
  purpose: "Test agent behavior in a safe environment"

  implementation:
    sandbox_environment:
      - Isolated container
      - Mocked external services
      - Synthetic data
      - Full audit logging

    validation_checks:
      - Run tests in sandbox first
      - Compare outputs to expected
      - Check for policy violations
      - Measure resource consumption

    promotion_criteria:
      - All tests pass
      - No policy violations
      - Resource usage within limits
      - Human approval (for sensitive changes)
```
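
The promotion criteria reduce to a short, deterministic gate. This sketch assumes a `SandboxRun` record populated by the sandbox harness; the field names and CPU budget are illustrative.

```python
from dataclasses import dataclass

@dataclass
class SandboxRun:
    tests_passed: bool
    policy_violations: list
    cpu_seconds: float
    sensitive: bool
    human_approved: bool = False

def may_promote(run: SandboxRun, cpu_budget: float = 60.0) -> bool:
    """Apply the promotion criteria in order; any failure blocks the deploy."""
    if not run.tests_passed:
        return False
    if run.policy_violations:
        return False
    if run.cpu_seconds > cpu_budget:
        return False
    if run.sensitive and not run.human_approved:
        return False
    return True

print(may_promote(SandboxRun(True, [], 12.5, sensitive=False)))  # True
print(may_promote(SandboxRun(True, [], 12.5, sensitive=True)))   # False: needs approval
```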

---

## Evaluation and Benchmarking

### Problems with Current Benchmarks

**Key Insight:** LLM-as-judge creates shared blind spots.

```yaml
benchmark_problems:
  llm_judge_issues:
    - Same architecture = same failure modes
    - Math errors accepted as correct
    - "Do-nothing" baseline passes 38% of the time

  contamination:
    - Published benchmarks become training targets
    - Overfitting to specific datasets
    - Inflated scores don't reflect real performance

  solutions:
    held_back_sets: "90% public, 10% private"
    human_evaluation: "Final published results require humans"
    production_testing: "A/B tests measure actual value"
    objective_outcomes: "Simulated environments with verifiable results"
```

### Practical Evaluation Approach

```python
import random

def evaluate_agent_change(before_agent, after_agent, task_set):
    """
    Production-oriented evaluation.
    Based on HN practitioner recommendations.
    """
    results = {
        "before": [],
        "after": [],
        "human_preference": []
    }

    for task in task_set:
        # Run both agents
        before_result = before_agent.execute(task)
        after_result = after_agent.execute(task)

        # Objective metrics (NOT LLM-judged)
        results["before"].append({
            "tests_pass": run_tests(before_result),
            "lint_clean": run_linter(before_result),
            "time_taken": before_result.duration,
            "tokens_used": before_result.tokens
        })

        results["after"].append({
            "tests_pass": run_tests(after_result),
            "lint_clean": run_linter(after_result),
            "time_taken": after_result.duration,
            "tokens_used": after_result.tokens
        })

        # Sample for human review
        if random.random() < 0.1:  # 10% sample
            results["human_preference"].append({
                "task": task,
                "before": before_result,
                "after": after_result,
                "pending_review": True
            })

    return EvaluationReport(results)
```

---

## Cost and Token Economics

### Real-World Cost Patterns

```yaml
cost_patterns:
  claude_code:
    heavy_use: "$25/1-2 hours on large codebases"
    api_range: "$1-5/hour depending on efficiency"
    max_tier: "$200/month often needs 2-3 subscriptions"

  token_economics:
    sub_agents_multiply_cost: "Each duplicates context"
    example: "5-task parallel job = 50,000+ tokens per subtask"

  optimization:
    context_isolation: "Use sub-agents for noisy tasks"
    information_abstraction: "Summarize, don't dump"
    fresh_conversations: "Reset after major tasks"
    skill_on_demand: "Load only when needed"
```
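
Back-of-the-envelope arithmetic for the "sub-agents multiply cost" point, with illustrative numbers: each sub-agent re-reads the shared context, so a 5-task parallel job over a ~48k-token context costs 50k+ tokens per subtask, and abstracting the shared context is the main lever.

```python
def per_subtask_tokens(shared_context: int, task_tokens: int) -> int:
    """Each sub-agent re-reads the full shared context, so cost duplicates."""
    return shared_context + task_tokens

# Five parallel subtasks, each duplicating a ~48k-token shared context:
per_task = per_subtask_tokens(shared_context=48_000, task_tokens=2_000)
print(per_task)      # 50000 tokens per subtask
print(per_task * 5)  # 250000 tokens for the whole 5-task job

# Mitigation: abstract the shared context down to ~5k tokens first
print(per_subtask_tokens(shared_context=5_000, task_tokens=2_000) * 5)  # 35000
```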

---

## Sources

**Hacker News Discussions:**
- [What Actually Works in Production for Autonomous Agents](https://news.ycombinator.com/item?id=44623207)
- [Coding with LLMs in Summer 2025](https://news.ycombinator.com/item?id=44623953)
- [Superpowers: How I'm Using Coding Agents](https://news.ycombinator.com/item?id=45547344)
- [Claude Code Experience After Two Weeks](https://news.ycombinator.com/item?id=44596472)
- [AI Agent Benchmarks Are Broken](https://news.ycombinator.com/item?id=44531697)
- [How to Orchestrate Multi-Agent Workflows](https://news.ycombinator.com/item?id=45955997)
- [Context Engineering vs Prompt Engineering](https://news.ycombinator.com/item?id=44427757)

**Show HN Projects:**
- [Self-Evolving Agents Repository](https://news.ycombinator.com/item?id=45099226)
- [Package Manager for Agent Skills](https://news.ycombinator.com/item?id=46422264)
- [Wispbit - AI Code Review Agent](https://news.ycombinator.com/item?id=44722603)
- [Agtrace - Monitoring for AI Coding Agents](https://news.ycombinator.com/item?id=46425670)