loki-mode 4.2.0
- package/LICENSE +21 -0
- package/README.md +691 -0
- package/SKILL.md +191 -0
- package/VERSION +1 -0
- package/autonomy/.loki/dashboard/index.html +2634 -0
- package/autonomy/CONSTITUTION.md +508 -0
- package/autonomy/README.md +201 -0
- package/autonomy/config.example.yaml +152 -0
- package/autonomy/loki +526 -0
- package/autonomy/run.sh +3636 -0
- package/bin/loki-mode.js +26 -0
- package/bin/postinstall.js +60 -0
- package/docs/ACKNOWLEDGEMENTS.md +234 -0
- package/docs/COMPARISON.md +325 -0
- package/docs/COMPETITIVE-ANALYSIS.md +333 -0
- package/docs/INSTALLATION.md +547 -0
- package/docs/auto-claude-comparison.md +276 -0
- package/docs/cursor-comparison.md +225 -0
- package/docs/dashboard-guide.md +355 -0
- package/docs/screenshots/README.md +149 -0
- package/docs/screenshots/dashboard-agents.png +0 -0
- package/docs/screenshots/dashboard-tasks.png +0 -0
- package/docs/thick2thin.md +173 -0
- package/package.json +48 -0
- package/references/advanced-patterns.md +453 -0
- package/references/agent-types.md +243 -0
- package/references/agents.md +1043 -0
- package/references/business-ops.md +550 -0
- package/references/competitive-analysis.md +216 -0
- package/references/confidence-routing.md +371 -0
- package/references/core-workflow.md +275 -0
- package/references/cursor-learnings.md +207 -0
- package/references/deployment.md +604 -0
- package/references/lab-research-patterns.md +534 -0
- package/references/mcp-integration.md +186 -0
- package/references/memory-system.md +467 -0
- package/references/openai-patterns.md +647 -0
- package/references/production-patterns.md +568 -0
- package/references/prompt-repetition.md +192 -0
- package/references/quality-control.md +437 -0
- package/references/sdlc-phases.md +410 -0
- package/references/task-queue.md +361 -0
- package/references/tool-orchestration.md +691 -0
- package/skills/00-index.md +120 -0
- package/skills/agents.md +249 -0
- package/skills/artifacts.md +174 -0
- package/skills/github-integration.md +218 -0
- package/skills/model-selection.md +125 -0
- package/skills/parallel-workflows.md +526 -0
- package/skills/patterns-advanced.md +188 -0
- package/skills/production.md +292 -0
- package/skills/quality-gates.md +180 -0
- package/skills/testing.md +149 -0
- package/skills/troubleshooting.md +109 -0
package/references/prompt-repetition.md
@@ -0,0 +1,192 @@

# Prompt Repetition Pattern Reference

Research-backed technique from arXiv 2512.14982v1: "Prompt Repetition Improves Non-Reasoning LLMs"

---

## Overview

**Key Finding:** Repeating prompts improves accuracy from 21.33% → 97.33% on position-dependent tasks without latency penalty.

**Why It Works:** Causal language models process tokens sequentially without future context. Repetition enables bidirectional attention within the parallelizable prefill stage.

---

## When to Apply

### ✅ USE Prompt Repetition For:
- **Haiku agents** (non-reasoning model)
- **Structured tasks** (unit tests, linting, formatting)
- **Position-dependent operations** (finding items in lists, parsing structured data)
- **Simple bug fixes** (typos, imports, syntax errors)

### ❌ DO NOT Use For:
- **Opus agents** (reasoning model - neutral/slightly negative effect)
- **Sonnet agents** (reasoning model - neutral effect)
- **Complex reasoning tasks** (architecture decisions, planning)
- **Creative generation** (doesn't help with open-ended tasks)

---
## Implementation Pattern

### Basic Repetition (2x)

```python
# Standard prompt (no repetition)
Task(
    model="haiku",
    description="Run unit tests",
    prompt="Execute all unit tests in tests/ directory and report results"
)

# With prompt repetition (2x)
base_prompt = "Execute all unit tests in tests/ directory and report results"
repeated_prompt = f"{base_prompt}\n\n{base_prompt}"

Task(
    model="haiku",
    description="Run unit tests",
    prompt=repeated_prompt
)
```

### Enhanced Repetition (3x)

For tasks requiring attention to position-dependent elements:

```python
# 3x repetition for complex structured tasks
base_prompt = "Find all TODO comments in codebase and categorize by priority"
repeated_prompt = f"{base_prompt}\n\n{base_prompt}\n\n{base_prompt}"

Task(
    model="haiku",
    description="Categorize TODOs",
    prompt=repeated_prompt
)
```

---
## Performance Impact

### Benchmarks from Research Paper

| Model | Task | Baseline | 2x Repetition | 3x Repetition |
|-------|------|----------|---------------|---------------|
| Gemini 2.0 Flash-Lite | NameIndex | 21.33% | 97.33% | 98.67% |
| GPT-4o | NameIndex | 56.67% | 86.67% | 90.00% |
| Claude 3 Sonnet | NameIndex | 48.00% | 82.67% | 85.33% |
| Deepseek V3 | NameIndex | 62.67% | 88.00% | 91.33% |

**Aggregate Results:**
- Wins: 47/70 tests improved
- Losses: 0/70 tests degraded
- Neutral: 23/70 tests unchanged

### Latency Impact

**Zero latency penalty** - repetition occurs in the parallelizable prefill stage, not sequential generation.

---
## Loki Mode Integration

### Automatic Application

Loki Mode automatically applies prompt repetition for Haiku agents on eligible tasks:

```python
def prepare_task_prompt(task, model):
    """Prepare prompt with optional repetition based on model and task type."""
    base_prompt = task.prompt

    # Apply repetition for Haiku on structured tasks
    if model == "haiku" and is_structured_task(task):
        # 2x repetition for standard tasks
        if task.complexity == "simple":
            return f"{base_prompt}\n\n{base_prompt}"

        # 3x repetition for position-critical tasks
        elif requires_position_accuracy(task):
            return f"{base_prompt}\n\n{base_prompt}\n\n{base_prompt}"

    return base_prompt  # No repetition for reasoning models


def is_structured_task(task):
    """Determine if task benefits from prompt repetition."""
    structured_keywords = [
        "test", "lint", "format", "parse", "find", "list",
        "extract", "categorize", "count", "filter"
    ]
    return any(kw in task.description.lower() for kw in structured_keywords)


def requires_position_accuracy(task):
    """Check if task requires precise position/order handling."""
    position_keywords = [
        "order", "sequence", "position", "index", "nth",
        "first", "last", "middle", "between"
    ]
    return any(kw in task.description.lower() for kw in position_keywords)
```

### Manual Override

Disable repetition for specific tasks:

```python
Task(
    model="haiku",
    description="Generate creative names",
    prompt="Suggest 10 creative product names",
    metadata={"disable_prompt_repetition": True}
)
```
---

## Research Citations

**Paper:** Leviathan, Y., Kalman, M., & Matias, Y. (2025). *Prompt Repetition Improves Non-Reasoning LLMs*. arXiv:2512.14982v1.

**Key Quotes:**
- "Prompt repetition wins 47 out of 70 tests, with 0 losses"
- "No increase in output lengths or generation times"
- "Results neutral to slightly positive when reasoning enabled"

---

## Best Practices

1. **Always repeat for Haiku** on structured tasks (unit tests, linting, parsing)
2. **Never repeat for Opus/Sonnet** (reasoning models see no benefit)
3. **Use 2x repetition** as default (diminishing returns beyond 3x)
4. **Test with/without** repetition on critical tasks to validate improvement
5. **Monitor token usage** - input tokens increase 2-3x (but still cost-effective due to accuracy gains)
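Practices 1-3 can be folded into a single guard; a sketch (the function and model-name checks are illustrative, not part of Loki Mode's API):

```python
def repeat_prompt(prompt: str, model: str, times: int = 2) -> str:
    """Repeat the prompt for non-reasoning models only.

    Reasoning models (opus/sonnet) get the prompt unchanged; repetition
    is clamped to the 2-3x range where the paper reports gains.
    """
    if model in ("opus", "sonnet"):
        return prompt
    times = max(2, min(times, 3))  # diminishing returns beyond 3x
    return "\n\n".join([prompt] * times)

print(repeat_prompt("Run the linter", "haiku").count("Run the linter"))  # 2
```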
---

## Cost-Benefit Analysis

### Example: Unit Test Execution

**Without Repetition:**
- Accuracy: 65%
- Input tokens: 500
- Retries needed: 2-3
- Total cost: 500 + (500 × 2) = 1500 tokens

**With 2x Repetition:**
- Accuracy: 95%
- Input tokens: 1000 (2x)
- Retries needed: 0-1
- Total cost: 1000 + (1000 × 0.5) = 1500 tokens

**Result:** Same total cost, with accuracy up from 65% to 95% (a 46% relative improvement).
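The token figures above can be reproduced with a one-line expected-cost model (the retry counts are the averages implied by the bullet ranges):

```python
def total_tokens(input_tokens: int, avg_retries: float) -> int:
    """First attempt plus the expected retried attempts."""
    return int(input_tokens * (1 + avg_retries))

print(total_tokens(500, 2.0))   # without repetition -> 1500
print(total_tokens(1000, 0.5))  # with 2x repetition -> 1500
```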
---

**Version:** 1.0.0 | **Research Source:** arXiv 2512.14982v1 (2025)
package/references/quality-control.md
@@ -0,0 +1,437 @@

# Quality Control Reference

Quality gates, code review process, and severity blocking rules.
Enhanced with 2025 research on anti-sycophancy, heterogeneous teams, and OpenAI Agents SDK patterns.

---

## Core Principle: Guardrails, Not Just Acceleration

**CRITICAL:** Speed without quality controls creates "AI slop" - semi-functional code that accumulates technical debt. Loki Mode enforces strict quality guardrails.

**Research Insight:** Heterogeneous review teams outperform homogeneous ones by 4-6% (A-HMAD, 2025).
**OpenAI Insight:** "Think of guardrails as a layered defense mechanism. Multiple specialized guardrails create resilient agents."

---
## Guardrails & Tripwires System (OpenAI SDK Pattern)

### Input Guardrails (Run Before Execution)

```python
# Layer 1: Validate task scope and safety
@input_guardrail(blocking=True)
async def validate_task_scope(input, context):
    # Check if the task is within project bounds
    if references_external_paths(input):
        return GuardrailResult(
            tripwire_triggered=True,
            reason="Task references paths outside project"
        )
    # Check for destructive operations
    if contains_destructive_operation(input):
        return GuardrailResult(
            tripwire_triggered=True,
            reason="Destructive operation requires human approval"
        )
    return GuardrailResult(tripwire_triggered=False)

# Layer 2: Detect prompt injection
@input_guardrail(blocking=True)
async def detect_injection(input, context):
    if has_injection_patterns(input):
        return GuardrailResult(
            tripwire_triggered=True,
            reason="Potential prompt injection detected"
        )
    return GuardrailResult(tripwire_triggered=False)
```
### Output Guardrails (Run After Execution)

```python
# Validate code quality before accepting
@output_guardrail
async def validate_code_output(output, context):
    if output.type == "code":
        issues = run_static_analysis(output.content)
        critical = [i for i in issues if i.severity == "critical"]
        if critical:
            return GuardrailResult(
                tripwire_triggered=True,
                reason=f"Critical issues: {critical}"
            )
    return GuardrailResult(tripwire_triggered=False)

# Check for secrets in output
@output_guardrail
async def check_secrets(output, context):
    if contains_secrets(output.content):
        return GuardrailResult(
            tripwire_triggered=True,
            reason="Output contains potential secrets"
        )
    return GuardrailResult(tripwire_triggered=False)
```
### Execution Modes

| Mode | Behavior | Use When |
|------|----------|----------|
| **Blocking** | Guardrail completes before agent starts | Expensive models, sensitive ops |
| **Parallel** | Guardrail runs with agent | Fast checks, acceptable token loss |

```python
# Blocking: prevents token consumption on fail
@input_guardrail(blocking=True, run_in_parallel=False)
async def expensive_validation(input): pass

# Parallel: faster but may waste tokens
@input_guardrail(blocking=True, run_in_parallel=True)
async def fast_validation(input): pass
```

### Tripwire Handling

When a guardrail triggers its tripwire, execution halts immediately:

```python
try:
    result = await run_agent(task)
except InputGuardrailTripwireTriggered as e:
    log_blocked_attempt(e)
    return early_exit(reason=str(e))
except OutputGuardrailTripwireTriggered as e:
    rollback_changes()
    return retry_with_constraints(e.constraints)
```
### Layered Defense Strategy

```yaml
guardrail_layers:
  layer_1_input:
    - scope_validation      # Is task within bounds?
    - pii_detection         # Contains sensitive data?
    - injection_detection   # Prompt injection attempt?

  layer_2_pre_execution:
    - cost_estimation       # Will this exceed budget?
    - dependency_check      # Are dependencies available?
    - conflict_detection    # Conflicts with in-progress work?

  layer_3_output:
    - static_analysis       # Code quality issues?
    - secret_detection      # Secrets in output?
    - spec_compliance       # Matches OpenAPI spec?

  layer_4_post_action:
    - test_validation       # Tests pass?
    - review_approval       # Review passed?
    - deployment_safety     # Safe to deploy?
```

See `references/openai-patterns.md` for the full guardrails implementation.

---
## Quality Gates

**Never ship code without passing all quality gates:**

### 1. Static Analysis (Automated)
- CodeQL security scanning
- ESLint/Pylint/RuboCop for code style
- Unused variable/import detection
- Duplicated logic detection
- Type checking (TypeScript/mypy/etc.)

### 2. 3-Reviewer Parallel System (AI-Driven)

Every code change goes through 3 specialized reviewers **simultaneously**:

```
IMPLEMENT -> BLIND REVIEW (parallel) -> DEBATE (if disagreement) -> AGGREGATE -> FIX -> RE-REVIEW
                  |
                  +-- code-reviewer (Sonnet) - Code quality, patterns, best practices
                  +-- business-logic-reviewer (Sonnet) - Requirements, edge cases, UX
                  +-- security-reviewer (Sonnet) - Vulnerabilities, OWASP Top 10
```

**Important:**
- ALWAYS launch all 3 reviewers in a single message (3 Task calls)
- ALWAYS specify model: "sonnet" for each reviewer
- ALWAYS use blind review mode (reviewers cannot see each other's findings initially)
- NEVER dispatch reviewers sequentially (always parallel - 3x faster)
- NEVER aggregate before all 3 reviewers complete
### Anti-Sycophancy Protocol (CONSENSAGENT Research)

**Problem:** Reviewers may reinforce each other's findings instead of critically engaging.

**Solution: Blind Review + Devil's Advocate**

```python
# Phase 1: Independent blind review
reviews = []
for reviewer in [code_reviewer, business_reviewer, security_reviewer]:
    review = Task(
        subagent_type="general-purpose",
        model="opus",
        prompt=f"""
        {reviewer.prompt}

        CRITICAL: Be skeptical. Your job is to find problems.
        List specific concerns with file:line references.
        Do NOT rubber-stamp. Finding zero issues is suspicious.
        """
    )
    reviews.append(review)

# Phase 2: Check for disagreement
if has_disagreement(reviews):
    # Structured debate - max 2 rounds
    debate_result = structured_debate(reviews, max_rounds=2)
else:
    # All agreed - run devil's advocate
    devil_review = Task(
        subagent_type="general-purpose",
        model="opus",
        prompt="""
        The other reviewers found no issues. Your job is to be contrarian.
        Find problems they missed. Challenge assumptions.
        If truly nothing is wrong, explain why each potential issue category is covered.
        """
    )
    reviews.append(devil_review)
```
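`has_disagreement` is left undefined in the snippet above; one plausible implementation, assuming each review carries a list of (severity, message) findings (the shape is an assumption for illustration, not Loki Mode's actual schema):

```python
from collections import namedtuple

Review = namedtuple("Review", "findings")  # findings: list of (severity, message)

def has_disagreement(reviews, blocking=("critical", "high", "medium")):
    """True when some reviewers would block the change and others would not."""
    verdicts = {any(sev in blocking for sev, _ in r.findings) for r in reviews}
    return len(verdicts) > 1

# One blocking finding vs. a clean review -> triggers the debate phase
print(has_disagreement([Review([("high", "SQL injection")]), Review([])]))  # True
```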
### Heterogeneous Team Composition

**Each reviewer has a distinct personality/focus:**

| Reviewer | Model | Expertise | Personality |
|----------|-------|-----------|-------------|
| Code Quality | Opus | SOLID, patterns, maintainability | Perfectionist |
| Business Logic | Opus | Requirements, edge cases, UX | Pragmatic |
| Security | Opus | OWASP, auth, injection | Paranoid |

This diversity prevents groupthink and catches more issues.

### 3. Severity-Based Blocking

| Severity | Action | Continue? |
|----------|--------|-----------|
| **Critical** | BLOCK - Fix immediately | NO |
| **High** | BLOCK - Fix immediately | NO |
| **Medium** | BLOCK - Fix before proceeding | NO |
| **Low** | Add `// TODO(review): ...` comment | YES |
| **Cosmetic** | Add `// FIXME(nitpick): ...` comment | YES |

**Critical/High/Medium = BLOCK and fix before proceeding**
**Low/Cosmetic = Add TODO/FIXME comment, continue**
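The severity table maps directly onto a small gate function; a sketch (the finding shape and annotation strings are illustrative, not Loki Mode's implementation):

```python
BLOCKING_SEVERITIES = {"critical", "high", "medium"}

def severity_gate(findings):
    """Return (can_continue, annotations) per the severity table above."""
    blockers = [f for f in findings if f["severity"] in BLOCKING_SEVERITIES]
    if blockers:
        return False, [f"BLOCK: {f['message']}" for f in blockers]
    annotations = []
    for f in findings:
        tag = "TODO(review)" if f["severity"] == "low" else "FIXME(nitpick)"
        annotations.append(f"// {tag}: {f['message']}")
    return True, annotations

ok, notes = severity_gate([{"severity": "low", "message": "rename temp var"}])
print(ok, notes)  # True ['// TODO(review): rename temp var']
```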
### 4. Test Coverage Gates
- Unit tests: 100% pass, >80% coverage
- Integration tests: 100% pass
- E2E tests: critical flows pass

### 5. Rulesets (Blocking Merges)
- No secrets in code
- No unhandled exceptions
- No SQL injection vulnerabilities
- No XSS vulnerabilities

---

## Code Review Protocol

### Launching Reviewers (Parallel)

```python
# CORRECT: Launch all 3 in parallel
Task(subagent_type="general-purpose", model="opus",
     description="Code quality review",
     prompt="Review for code quality, patterns, SOLID principles...")

Task(subagent_type="general-purpose", model="opus",
     description="Business logic review",
     prompt="Review for requirements alignment, edge cases, UX...")

Task(subagent_type="general-purpose", model="opus",
     description="Security review",
     prompt="Review for vulnerabilities, OWASP Top 10...")

# WRONG: Sequential reviewers (3x slower)
# Don't do: await reviewer1; await reviewer2; await reviewer3;
```

### After Fixes

- ALWAYS re-run ALL 3 reviewers after fixes (not just the one that found the issue)
- Wait for all reviews to complete before aggregating results

---
## Structured Prompting for Subagents

**Every subagent dispatch MUST include:**

```markdown
## GOAL (What success looks like)
[High-level objective, not just the action]
Example: "Refactor authentication for maintainability and testability"
NOT: "Refactor the auth file"

## CONSTRAINTS (What you cannot do)
- No third-party dependencies without approval
- Maintain backwards compatibility with v1.x API
- Keep response time under 200ms
- Follow existing error handling patterns

## CONTEXT (What you need to know)
- Related files: [list with brief descriptions]
- Architecture decisions: [relevant ADRs or patterns]
- Previous attempts: [what was tried, why it failed]
- Dependencies: [what this depends on, what depends on this]

## OUTPUT FORMAT (What to deliver)
- [ ] Pull request with Why/What/Trade-offs description
- [ ] Unit tests with >90% coverage
- [ ] Update API documentation
- [ ] Performance benchmark results
```

---

## Task Completion Report

**Every completed task MUST include decision documentation:**

```markdown
## Task Completion Report

### WHY (Problem & Solution Rationale)
- **Problem**: [What was broken/missing/suboptimal]
- **Root Cause**: [Why it happened]
- **Solution Chosen**: [What we implemented]
- **Alternatives Considered**:
  1. [Option A]: Rejected because [reason]
  2. [Option B]: Rejected because [reason]

### WHAT (Changes Made)
- **Files Modified**: [with line ranges and purpose]
  - `src/auth.ts:45-89` - Extracted token validation to separate function
  - `src/auth.test.ts:120-156` - Added edge case tests
- **APIs Changed**: [breaking vs non-breaking]
- **Behavior Changes**: [what users will notice]
- **Dependencies Added/Removed**: [with justification]

### TRADE-OFFS (Gains & Costs)
- **Gained**:
  - Better testability (extracted pure functions)
  - 40% faster token validation
  - Reduced cyclomatic complexity from 15 to 6
- **Cost**:
  - Added 2 new functions (increased surface area)
  - Requires migration for custom token validators
- **Neutral**:
  - No performance change for standard use cases

### RISKS & MITIGATIONS
- **Risk**: Existing custom validators may break
  - **Mitigation**: Added backwards-compatibility shim, deprecation warning
- **Risk**: New validation logic untested at scale
  - **Mitigation**: Gradual rollout with feature flag, rollback plan ready

### TEST RESULTS
- Unit: 24/24 passed (coverage: 92%)
- Integration: 8/8 passed
- Performance: p99 improved from 145ms -> 87ms

### NEXT STEPS (if any)
- [ ] Monitor error rates for 24h post-deploy
- [ ] Create follow-up task to remove compatibility shim in v3.0
```
---

## Preventing "AI Slop"

### Warning Signs
- Tests pass but code quality degraded
- Copy-paste duplication instead of abstraction
- Over-engineered solutions to simple problems
- Missing error handling
- No logging/observability
- Generic variable names (data, temp, result)
- Magic numbers without constants
- Commented-out code
- TODO comments without GitHub issues

### When Detected
1. Fail the task immediately
2. Add to the failed queue with detailed feedback
3. Re-dispatch with stricter constraints
4. Update CONTINUITY.md with the anti-pattern to avoid
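Several of the warning signs are mechanically detectable before a reviewer ever sees the code; a rough scanner sketch (the regexes are crude heuristics for illustration, not Loki Mode's detector):

```python
import re

SLOP_PATTERNS = {
    "generic variable name": re.compile(r"\b(data|temp|result)\s*="),
    "magic number": re.compile(r"[=<>(]\s*\d{3,}\b"),
    "commented-out code": re.compile(r"^\s*(#|//)\s*\w+\(.*\)\s*$", re.MULTILINE),
}

def scan_for_slop(source: str) -> list[str]:
    """Return the names of warning signs found in the source text."""
    return [name for name, pattern in SLOP_PATTERNS.items() if pattern.search(source)]

print(scan_for_slop("temp = fetch(9999)\n# do_thing(x)\n"))
```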
---

## Quality Gate Hooks

### Pre-Write Hook (BLOCKING)
```bash
#!/bin/bash
# .loki/hooks/pre-write.sh
# Blocks writes that violate rules

# Check for hardcoded secrets
if grep -E "(password|secret|key).*=.*['\"][^'\"]{8,}" "$1"; then
    echo "BLOCKED: Potential secret detected"
    exit 1
fi

# Check for console.log in production code
if grep -n "console\.log" "$1" | grep -v "test"; then
    echo "BLOCKED: Remove console.log statements"
    exit 1
fi
```

### Post-Write Hook (AUTO-FIX)
```bash
#!/bin/bash
# .loki/hooks/post-write.sh
# Auto-fixes after writes

# Format code
npx prettier --write "$1"

# Fix linting issues
npx eslint --fix "$1"

# Type check
npx tsc --noEmit
```

---

## Constitution Reference

Quality gates are enforced by `autonomy/CONSTITUTION.md`:

**Pre-Commit (BLOCKING):**
- Linting (auto-fix enabled)
- Type checking (strict mode)
- Contract tests (80% coverage minimum)
- Spec validation (Spectral)

**Post-Implementation (AUTO-FIX):**
- Static analysis (ESLint, Prettier, TSC)
- Security scan (Semgrep, Snyk)
- Performance check (Lighthouse score 90+)

**Runtime Invariants:**
- `SPEC_BEFORE_CODE`: Implementation tasks require spec reference
- `TASK_HAS_COMMIT`: Completed tasks have git commit SHA
- `QUALITY_GATES_PASSED`: Completed tasks passed all quality checks
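The invariants lend themselves to a simple pre-completion check; a sketch assuming a task record with `spec_ref`, `commit_sha`, and `gates_passed` fields (the field names are illustrative, not CONSTITUTION.md's actual schema):

```python
INVARIANTS = {
    "SPEC_BEFORE_CODE": lambda t: bool(t.get("spec_ref")),
    "TASK_HAS_COMMIT": lambda t: bool(t.get("commit_sha")),
    "QUALITY_GATES_PASSED": lambda t: t.get("gates_passed") is True,
}

def violated_invariants(task: dict) -> list[str]:
    """Return the names of violated invariants; an empty list means compliant."""
    return [name for name, holds in INVARIANTS.items() if not holds(task)]

done = {"spec_ref": "specs/auth.yaml", "commit_sha": "abc123", "gates_passed": True}
print(violated_invariants(done))  # []
```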
|