loki-mode 4.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +691 -0
- package/SKILL.md +191 -0
- package/VERSION +1 -0
- package/autonomy/.loki/dashboard/index.html +2634 -0
- package/autonomy/CONSTITUTION.md +508 -0
- package/autonomy/README.md +201 -0
- package/autonomy/config.example.yaml +152 -0
- package/autonomy/loki +526 -0
- package/autonomy/run.sh +3636 -0
- package/bin/loki-mode.js +26 -0
- package/bin/postinstall.js +60 -0
- package/docs/ACKNOWLEDGEMENTS.md +234 -0
- package/docs/COMPARISON.md +325 -0
- package/docs/COMPETITIVE-ANALYSIS.md +333 -0
- package/docs/INSTALLATION.md +547 -0
- package/docs/auto-claude-comparison.md +276 -0
- package/docs/cursor-comparison.md +225 -0
- package/docs/dashboard-guide.md +355 -0
- package/docs/screenshots/README.md +149 -0
- package/docs/screenshots/dashboard-agents.png +0 -0
- package/docs/screenshots/dashboard-tasks.png +0 -0
- package/docs/thick2thin.md +173 -0
- package/package.json +48 -0
- package/references/advanced-patterns.md +453 -0
- package/references/agent-types.md +243 -0
- package/references/agents.md +1043 -0
- package/references/business-ops.md +550 -0
- package/references/competitive-analysis.md +216 -0
- package/references/confidence-routing.md +371 -0
- package/references/core-workflow.md +275 -0
- package/references/cursor-learnings.md +207 -0
- package/references/deployment.md +604 -0
- package/references/lab-research-patterns.md +534 -0
- package/references/mcp-integration.md +186 -0
- package/references/memory-system.md +467 -0
- package/references/openai-patterns.md +647 -0
- package/references/production-patterns.md +568 -0
- package/references/prompt-repetition.md +192 -0
- package/references/quality-control.md +437 -0
- package/references/sdlc-phases.md +410 -0
- package/references/task-queue.md +361 -0
- package/references/tool-orchestration.md +691 -0
- package/skills/00-index.md +120 -0
- package/skills/agents.md +249 -0
- package/skills/artifacts.md +174 -0
- package/skills/github-integration.md +218 -0
- package/skills/model-selection.md +125 -0
- package/skills/parallel-workflows.md +526 -0
- package/skills/patterns-advanced.md +188 -0
- package/skills/production.md +292 -0
- package/skills/quality-gates.md +180 -0
- package/skills/testing.md +149 -0
- package/skills/troubleshooting.md +109 -0
@@ -0,0 +1,292 @@ package/skills/production.md
# Production Patterns

## HN 2025 Battle-Tested Patterns

### Narrow Scope Wins

```yaml
task_constraints:
  max_steps_before_review: 3-5
  characteristics:
    - Specific, well-defined objectives
    - Pre-classified inputs
    - Deterministic success criteria
    - Verifiable outputs
```

### Deterministic Outer Loops

**Wrap agent outputs with rule-based validation (NOT LLM-judged):**

```
1. Agent generates output
2. Run linter (deterministic)
3. Run tests (deterministic)
4. Check compilation (deterministic)
5. Only then: human or AI review
```
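
A minimal sketch of that outer loop in Python; the gate commands are placeholders for whatever linter, test runner, and compiler the project actually uses:

```python
import subprocess

# Placeholder commands - substitute the project's real lint/test/build steps.
GATES = [
    ("lint", ["npx", "eslint", "."]),
    ("tests", ["npm", "test"]),
    ("compile", ["npx", "tsc", "--noEmit"]),
]

def passes_deterministic_gates() -> bool:
    """Run rule-based checks in order; any failure sends work back to the agent."""
    for name, cmd in GATES:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"[{name}] failed:\n{result.stdout}{result.stderr}")
            return False
    return True  # only now hand off to human or AI review
```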

### Context Engineering

```yaml
principles:
  - "Less is more" - focused beats comprehensive
  - Manual selection outperforms automatic RAG
  - Fresh conversations per major task
  - Remove outdated information aggressively

context_budget:
  target: "< 10k tokens for context"
  reserve: "90% for model reasoning"
```

---

## Proactive Context Management (OpenCode Pattern)

**Prevent context overflow in long autonomous sessions:**

```yaml
compaction_strategy:
  trigger: "Every 25 iterations OR context feels heavy"

  preserve_always:
    - CONTINUITY.md content (current state)
    - Current task specification
    - Recent Mistakes & Learnings (last 5)
    - Active queue items

  consolidate:
    - Completed task summaries -> semantic memory
    - Resolved errors -> anti-patterns
    - Successful patterns -> procedural memory

  discard:
    - Verbose tool outputs
    - Intermediate reasoning
    - Superseded plans
```
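
A rough sketch of that policy as code, assuming transcript entries are tagged with a `kind` field (the tag names here are illustrative):

```python
COMPACT_EVERY = 25  # mirrors the trigger above

PRESERVE = {"continuity", "task_spec", "recent_learning", "queue_item"}
CONSOLIDATE = {"completed_task", "resolved_error", "successful_pattern"}

def compact(iteration: int, transcript: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (entries kept in context, entries destined for long-term memory).

    Anything not preserved or consolidated - verbose tool output, intermediate
    reasoning, superseded plans - is simply dropped.
    """
    if iteration % COMPACT_EVERY != 0:
        return transcript, []
    kept = [e for e in transcript if e.get("kind") in PRESERVE]
    to_memory = [e for e in transcript if e.get("kind") in CONSOLIDATE]
    return kept, to_memory
```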

---

## Sub-Agents for Context Isolation

**Run expensive explorations in isolated contexts:**

```python
# Heavy analysis that would bloat main context
Task(
    subagent_type="Explore",
    model="haiku",
    description="Find all auth-related files",
    prompt="Search codebase for authentication patterns. Return only file paths."
)
# Main context stays clean; only results return
```

---

## Git Worktree Isolation (Cursor Pattern)

**Use git worktrees for parallel implementation agents:**

```bash
# Create isolated worktree for feature
git worktree add ../project-feature-auth feature/auth

# Agent works in isolated worktree
cd ../project-feature-auth
# ... implement feature ...

# Merge back when complete (from the primary checkout; path assumed)
cd ../project
git checkout main
git merge feature/auth
git worktree remove ../project-feature-auth
```

**Benefits:**
- Multiple agents can work in parallel without conflicts
- Each agent has clean, isolated file state
- Merges happen explicitly, not through file racing

---

## Atomic Checkpoint/Rollback (Cursor Pattern)

```yaml
checkpoint_strategy:
  when:
    - Before spawning any subagent
    - Before any destructive operation
    - After completing a task successfully

  how:
    - git commit -m "CHECKPOINT: before {operation}"
    - Record commit hash in CONTINUITY.md

  rollback:
    - git reset --hard {checkpoint_hash}
    - Update CONTINUITY.md with rollback reason
    - Add to Mistakes & Learnings
```
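
A minimal sketch of those steps with plain `git` calls; the CONTINUITY.md location and note format are assumptions:

```python
import subprocess
from pathlib import Path

CONTINUITY = Path("CONTINUITY.md")  # assumed location

def checkpoint(operation: str) -> str:
    """Commit the working tree and return the checkpoint hash."""
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", f"CHECKPOINT: before {operation}"], check=True)
    sha = subprocess.run(["git", "rev-parse", "HEAD"],
                         capture_output=True, text=True, check=True).stdout.strip()
    with CONTINUITY.open("a") as f:
        f.write(f"- checkpoint {sha}: before {operation}\n")
    return sha

def rollback(sha: str, reason: str) -> None:
    """Reset to a recorded checkpoint and record why."""
    subprocess.run(["git", "reset", "--hard", sha], check=True)
    with CONTINUITY.open("a") as f:
        f.write(f"- rolled back to {sha}: {reason}\n")  # also add to Mistakes & Learnings
```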

---

## CI/CD Automation (Zencoder Patterns)

### CI Failure Analysis and Auto-Resolution

```yaml
ci_failure_workflow:
  1. Detect CI failure (webhook or poll)
  2. Parse error logs for root cause
  3. Classify failure type:
     - Test failure: Fix code, re-run tests
     - Lint failure: Auto-fix with --fix flag
     - Build failure: Check dependencies, configs
     - Flaky test: Mark and investigate separately
  4. Apply fix and push
  5. Monitor CI result
  6. If still failing after 3 attempts: escalate
```
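
A sketch of the classification step (step 3), assuming the CI log is available as plain text; the regexes are illustrative, not exhaustive:

```python
import re

# Illustrative patterns only - real logs vary by CI provider and toolchain.
# Flaky-test detection needs run history, so it is not derivable from one log.
FAILURE_PATTERNS = [
    ("test_failure",  re.compile(r"AssertionError|\d+ (failing|failed)", re.I)),
    ("lint_failure",  re.compile(r"(eslint|pylint|flake8).*\b(error|warning)\b", re.I)),
    ("build_failure", re.compile(r"cannot find module|compilation failed|error TS\d+", re.I)),
]

ACTIONS = {
    "test_failure": "fix code, re-run tests",
    "lint_failure": "auto-fix with --fix flag and re-push",
    "build_failure": "check dependencies and configs",
    "unknown": "escalate after 3 attempts",
}

def classify_ci_failure(log: str) -> str:
    for label, pattern in FAILURE_PATTERNS:
        if pattern.search(log):
            return label
    return "unknown"
```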

### Automated Review Comment Resolution

```yaml
pr_comment_workflow:
  trigger: "New review comment on PR"

  workflow:
    1. Parse comment for actionable feedback
    2. Classify: bug, style, question, suggestion
    3. For bugs/style: implement fix
    4. For questions: add code comment or respond
    5. For suggestions: evaluate and implement if beneficial
    6. Push changes and mark comment resolved
```

### Continuous Dependency Management

```yaml
dependency_workflow:
  schedule: "Weekly or on security advisory"

  workflow:
    1. Run npm audit / pip-audit / cargo audit
    2. Classify vulnerabilities by severity
    3. For Critical/High: immediate update
    4. For Medium: schedule update
    5. Run full test suite after updates
    6. Create PR with changelog
```

---

## Batch Processing (Claude API)

**50% cost reduction for large-scale async operations.**

### When to Use Batch API

| Use Case | Batch? | Reasoning |
|----------|--------|-----------|
| Single code review | No | Immediate feedback needed |
| Review 100+ files | Yes | 50% savings, async OK |
| Generate tests for all modules | Yes | Bulk operation |
| Interactive development | No | Need real-time responses |
| Large-scale evaluations | Yes | Cost-critical at scale |
| QA phase bulk analysis | Yes | Can wait for results |

### Batch API Limits

```yaml
limits:
  max_requests: 100,000 per batch
  max_size: 256 MB per batch
  processing_time: Most < 1 hour (max 24h)
  results_available: 29 days

pricing:
  discount: 50% on all tokens
  stacks_with: Prompt caching (30-98% cache hits)
```

### Implementation Pattern

```python
import time

import anthropic
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request

client = anthropic.Anthropic()

# Batch all code review requests
def batch_code_review(files: list[str]) -> str:
    requests = [
        Request(
            custom_id=f"review-{i}-{file.replace('/', '-')}",
            params=MessageCreateParamsNonStreaming(
                model="claude-sonnet-4-5",
                max_tokens=2048,
                messages=[{
                    "role": "user",
                    "content": f"Review this code for bugs, security, performance:\n\n{open(file).read()}"
                }]
            )
        )
        for i, file in enumerate(files)
    ]

    batch = client.messages.batches.create(requests=requests)
    return batch.id  # Poll for results later

# Poll for completion
def wait_for_batch(batch_id: str):
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        time.sleep(60)

# Stream results
def process_results(batch_id: str):
    for result in client.messages.batches.results(batch_id):
        if result.result.type == "succeeded":
            # Process successful review
            handle_review(result.custom_id, result.result.message)
        elif result.result.type == "errored":
            # Retry or log error
            handle_error(result.custom_id, result.result.error)
```

### Batch + Prompt Caching

**Stack discounts for maximum savings:**

```python
# All requests share cached system prompt
SHARED_SYSTEM = [
    {"type": "text", "text": "You are a code reviewer..."},
    {"type": "text", "text": CODING_STANDARDS,  # Large shared context
     "cache_control": {"type": "ephemeral"}}
]

requests = [
    Request(
        custom_id=f"review-{file}",
        params=MessageCreateParamsNonStreaming(
            model="claude-sonnet-4-5",
            max_tokens=2048,
            system=SHARED_SYSTEM,  # Identical across all requests
            messages=[{"role": "user", "content": f"Review: {code}"}]
        )
    )
    for file, code in files_with_code
]
```

**Cost math:**
- Base: $3/MTok input, $15/MTok output (Sonnet)
- Batch discount: 50% -> $1.50/$7.50
- Cache hit: 90% reduction on cached tokens
- Combined: Up to 95% savings on large batches
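
A quick sanity check of that math, assuming the discounts stack multiplicatively and 90% of input tokens are cache hits:

```python
base_input = 3.00          # $/MTok, Sonnet input
batch_discount = 0.50      # Batch API: 50% off
cache_read_factor = 0.10   # cached tokens billed at ~10% of the input rate
cache_hit_rate = 0.90      # assumed share of input tokens served from cache

effective = base_input * batch_discount * (
    cache_hit_rate * cache_read_factor + (1 - cache_hit_rate)
)
print(f"${effective:.3f}/MTok input")             # ~$0.285, ~90% below base
print(f"{1 - effective / base_input:.0%} saved")  # approaches 95% as hit rate -> 100%
```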

@@ -0,0 +1,180 @@ package/skills/quality-gates.md
# Quality Gates

**Never ship code without passing all quality gates.**

## The 7 Quality Gates

1. **Input Guardrails** - Validate scope, detect injection, check constraints (OpenAI SDK)
2. **Static Analysis** - CodeQL, ESLint/Pylint, type checking
3. **Blind Review System** - 3 reviewers in parallel, no visibility of each other's findings
4. **Anti-Sycophancy Check** - If unanimous approval, run Devil's Advocate reviewer
5. **Output Guardrails** - Validate code quality, spec compliance, no secrets (tripwire on fail)
6. **Severity-Based Blocking** - Critical/High/Medium = BLOCK; Low/Cosmetic = TODO comment
7. **Test Coverage Gates** - Unit: 100% pass, >80% coverage; Integration: 100% pass

## Guardrails Execution Modes

- **Blocking**: Guardrail completes before agent starts (use for expensive operations)
- **Parallel**: Guardrail runs with agent (use for fast checks, accept token loss risk)

**Research:** Blind review + Devil's Advocate reduces false positives by 30% (CONSENSAGENT, 2025)

---

## Velocity-Quality Feedback Loop (CRITICAL)

**Research from arXiv 2511.04427v2 - empirical study of 807 repositories.**

### Key Findings

| Metric | Finding | Implication |
|--------|---------|-------------|
| Initial Velocity | +281% lines added | Impressive but TRANSIENT |
| Quality Degradation | +30% static warnings, +41% complexity | PERSISTENT problem |
| Cancellation Point | 3.28x complexity OR 4.94x warnings | Completely negates velocity gains |

### The Trap to Avoid

```
Initial excitement -> Velocity spike -> Quality degradation accumulates
                                                    |
                                                    v
                                Complexity cancels velocity gains
                                                    |
                                                    v
                                Frustration -> Abandonment cycle
```

**CRITICAL RULE:** Every velocity gain MUST be accompanied by quality verification.

### Mandatory Quality Checks (Per Task)

```yaml
velocity_quality_balance:
  before_commit:
    - static_analysis: "Run ESLint/Pylint/CodeQL - warnings must not increase"
    - complexity_check: "Cyclomatic complexity must not increase >10%"
    - test_coverage: "Coverage must not decrease"

  thresholds:
    max_new_warnings: 0            # Zero tolerance for new warnings
    max_complexity_increase: 10%   # Per file, per commit
    min_coverage: 80%              # Never drop below

  if_threshold_violated:
    action: "BLOCK commit, fix before proceeding"
    reason: "Velocity gains without quality are net negative"
```
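
A pre-commit sketch of those thresholds; the metric collector is a stub, and the baseline file is a hypothetical addition to `.loki/metrics/quality/`:

```python
import json
from pathlib import Path

BASELINE = Path(".loki/metrics/quality/baseline.json")  # hypothetical baseline snapshot

def collect_current() -> dict:
    """Stub: wire to the project's real linter, complexity tool, and coverage report."""
    return {"warnings": 12, "complexity": 148.0, "coverage": 0.84}

def gate_commit() -> bool:
    baseline = json.loads(BASELINE.read_text())
    current = collect_current()
    violations = []
    if current["warnings"] > baseline["warnings"]:
        violations.append("new static-analysis warnings")     # max_new_warnings: 0
    if current["complexity"] > baseline["complexity"] * 1.10:
        violations.append("complexity up more than 10%")      # max_complexity_increase
    if current["coverage"] < 0.80:
        violations.append("coverage below 80%")                # min_coverage
    if violations:
        print("BLOCK commit:", "; ".join(violations))
        return False
    return True
```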

### Metrics to Track

```
.loki/metrics/quality/
+-- warnings.json     # Static analysis warning count over time
+-- complexity.json   # Cyclomatic complexity per file
+-- coverage.json     # Test coverage percentage
+-- velocity.json     # Lines added/commits per hour
+-- ratio.json        # Quality/Velocity ratio (must stay positive)
```

---

## Blind Review System

**Launch 3 reviewers in parallel in a single message:**

```python
# ALWAYS launch all 3 in ONE message
Task(model="sonnet", description="Code review: correctness", prompt="Review for bugs...")
Task(model="sonnet", description="Code review: security", prompt="Review for vulnerabilities...")
Task(model="sonnet", description="Code review: performance", prompt="Review for efficiency...")
```

**Rules:**
- ALWAYS use sonnet for reviews (balanced quality/cost)
- NEVER aggregate before all 3 complete
- ALWAYS re-run ALL 3 after fixes
- If unanimous approval -> run Devil's Advocate

---

## Severity-Based Blocking

| Severity | Action |
|----------|--------|
| Critical | BLOCK - fix immediately |
| High | BLOCK - fix before commit |
| Medium | BLOCK - fix before merge |
| Low | TODO comment, fix later |
| Cosmetic | Note, optional fix |

See `references/quality-control.md` for complete details.

---

## Scale Considerations

> **Source:** [Cursor Scaling Learnings](../references/cursor-learnings.md) - integrators became bottlenecks at 100+ agents

### Review Intensity Scaling

At high agent counts, full 3-reviewer blind review for every change creates bottlenecks.

```yaml
review_scaling:
  low_scale:  # <10 agents
    all_changes: "Full 3-reviewer blind review"
    rationale: "Quality critical, throughput acceptable"

  medium_scale:  # 10-50 agents
    high_risk: "Full 3-reviewer blind review"
    medium_risk: "2-reviewer review"
    low_risk: "1 reviewer + automated checks"
    rationale: "Balance quality and throughput"

  high_scale:  # 50+ agents
    critical_changes: "Full 3-reviewer blind review"
    standard_changes: "Automated checks + spot review"
    trivial_changes: "Automated checks only"
    rationale: "Trust workers, avoid bottlenecks"

  risk_classification:
    high_risk:
      - Security-related changes
      - Authentication/authorization
      - Payment processing
      - Data migrations
      - API breaking changes
    medium_risk:
      - New features
      - Business logic changes
      - Database schema changes
    low_risk:
      - Bug fixes with tests
      - Refactoring with no behavior change
      - Documentation
      - Dependency updates (minor)
```

### Judge Agent Integration

Use judge agents to determine when full review is needed:

```yaml
judge_review_decision:
  inputs:
    - change_type: "feature|bugfix|refactor|docs"
    - files_changed: 5
    - lines_changed: 120
    - test_coverage: 85%
    - static_analysis: "0 new warnings"
  output:
    review_level: "full|partial|automated"
    rationale: "Medium-risk feature with good coverage"
```
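
A rule-based sketch of that decision; the thresholds are illustrative, and in practice a judge agent could replace or refine them:

```python
HIGH_RISK_AREAS = {"security", "auth", "payments", "data-migration", "breaking-api"}

def review_level(change_type: str, areas: set[str], lines_changed: int,
                 coverage: float, new_warnings: int) -> str:
    """Return 'full', 'partial', or 'automated' (illustrative thresholds)."""
    if areas & HIGH_RISK_AREAS:
        return "full"                                   # always 3-reviewer blind review
    if change_type == "docs" or (change_type == "refactor" and new_warnings == 0):
        return "automated"
    if coverage >= 0.80 and new_warnings == 0 and lines_changed < 200:
        return "partial"                                # 1-2 reviewers + automated checks
    return "full"

# The example inputs above resolve to a partial review:
print(review_level("feature", set(), 120, 0.85, 0))     # -> "partial"
```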

### Cursor's Key Learning

> "Dedicated integrator/reviewer roles created more bottlenecks than they solved. Workers were already capable of handling conflicts themselves."

**Implication:** At scale, trust automated checks and worker judgment. Reserve full review for high-risk changes only.

@@ -0,0 +1,149 @@ package/skills/testing.md
# Testing

## E2E Testing with Playwright MCP

**Use Playwright MCP for browser-based testing.**

```python
# E2E test after feature implementation
Task(
    subagent_type="general-purpose",
    model="sonnet",
    description="Run E2E tests for auth flow",
    prompt="""Use Playwright MCP to test:
    1. Navigate to /login
    2. Fill email and password fields
    3. Click submit button
    4. Verify redirect to /dashboard
    5. Check user name appears in header

    Use accessibility tree refs, not coordinates."""
)
```

**Best Practices:**
- Use accessibility tree refs instead of coordinates
- Test critical user flows after each feature
- Capture screenshots for error states
- Run after unit tests, before deployment

---

## Property-Based Testing (Kiro Pattern)

**Auto-generate edge case tests from specifications.**

```yaml
property_based_testing:
  purpose: "Verify code meets spec constraints with hundreds of random inputs"
  tools: "fast-check (JS/TS), hypothesis (Python), QuickCheck (Haskell)"

  extract_properties_from:
    - OpenAPI schema: "minLength, maxLength, pattern, enum, minimum, maximum"
    - Business rules: "requirements.md invariants"
    - Data models: "TypeScript interfaces, DB constraints"

  examples:
    - "email field always matches email regex"
    - "price is never negative"
    - "created_at <= updated_at always"
    - "array length never exceeds maxItems"
```
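
For the Python case, a minimal `hypothesis` sketch of two of the example properties; `create_order` is a stand-in for the real function under test:

```python
from datetime import datetime
from hypothesis import given, strategies as st

def create_order(price: float, created_at: datetime) -> dict:
    """Stand-in for the real implementation being checked against the spec."""
    return {"price": max(price, 0.0),
            "created_at": created_at,
            "updated_at": created_at}

@given(price=st.floats(allow_nan=False, allow_infinity=False),
       created_at=st.datetimes())
def test_order_invariants(price, created_at):
    order = create_order(price, created_at)
    assert order["price"] >= 0                          # price is never negative
    assert order["created_at"] <= order["updated_at"]   # created_at <= updated_at
```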

**When to use:**
- After implementing API endpoints (validate against OpenAPI)
- After data model changes (validate invariants)
- Before deployment (edge case regression)

---

## Event-Driven Hooks (Kiro Pattern)

**Trigger quality checks on file operations, not just at phase boundaries.**

```yaml
hooks_system:
  location: ".loki/hooks/"

  triggers:
    on_file_write:
      - lint: "npx eslint --fix {file}"
      - typecheck: "npx tsc --noEmit"
      - secrets_scan: "detect-secrets scan {file}"

    on_task_complete:
      - contract_test: "npm run test:contract"
      - spec_lint: "spectral lint .loki/specs/openapi.yaml"

    on_phase_complete:
      - memory_consolidate: "Extract patterns to semantic memory"
      - metrics_update: "Log efficiency scores"
      - checkpoint: "git commit with phase summary"
```
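
A small dispatcher sketch for these triggers; the commands mirror the config above, and `{file}` substitution is the only templating assumed:

```python
import subprocess

HOOKS = {
    "on_file_write": [
        "npx eslint --fix {file}",
        "npx tsc --noEmit",
        "detect-secrets scan {file}",
    ],
    "on_task_complete": [
        "npm run test:contract",
        "spectral lint .loki/specs/openapi.yaml",
    ],
}

def fire(event: str, file: str = "") -> bool:
    """Run every hook registered for an event; report failure if any command fails."""
    ok = True
    for template in HOOKS.get(event, []):
        cmd = template.format(file=file)
        if subprocess.run(cmd, shell=True).returncode != 0:
            print(f"hook failed: {cmd}")
            ok = False
    return ok

# fire("on_file_write", file="src/auth.ts")
```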

**Benefits:**
- Catches issues 5-10x earlier than phase-end review
- Reduces rework cycles
- Aligns with Constitutional AI (continuous self-critique)

---

## Visual Design Input

**When given design mockups or screenshots:**

1. **Discovery Phase:** Extract visual requirements from mockups
2. **Development Phase:** Implement UI matching design specs
3. **QA Phase:** Visual regression testing with Playwright screenshots
4. **Verification:** Compare screenshots against original designs

```python
# Capture screenshot for comparison
Task(
    model="sonnet",
    description="Visual regression test",
    prompt="""Use Playwright MCP to:
    1. Navigate to implemented feature
    2. Capture screenshot
    3. Compare against design mockup at designs/feature.png
    4. Report visual differences"""
)
```

---

## Test Strategy by Phase

| Phase | Test Type | Tool |
|-------|-----------|------|
| Development | Unit tests | Haiku (parallel) |
| Development | Integration tests | Sonnet |
| QA | E2E tests | Playwright MCP |
| QA | Property-based tests | fast-check/hypothesis |
| Pre-deployment | Full regression | All of above |

---

## Review-to-Memory Learning

**Pipe code review findings into semantic memory to prevent repeat mistakes.**

```yaml
review_learning:
  trigger: "After every code review cycle"

  workflow:
    1. Complete 3-reviewer blind review
    2. Aggregate findings by severity
    3. For each Critical/High/Medium finding:
       - Extract pattern description
       - Document prevention strategy
       - Save to .loki/memory/semantic/anti-patterns/
    4. Link to episodic memory for traceability

  output_format:
    pattern: "Using any instead of proper TypeScript types"
    category: "type-safety"
    severity: "high"
    prevention: "Always define explicit interfaces for API responses"
```
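
A sketch of the memory-write step, assuming one markdown file per finding under the `.loki/memory/semantic/anti-patterns/` path named above:

```python
import re
from datetime import date
from pathlib import Path

ANTI_PATTERNS = Path(".loki/memory/semantic/anti-patterns")

def record_anti_pattern(pattern: str, category: str, severity: str, prevention: str) -> Path:
    """Persist a review finding so later sessions can avoid repeating it."""
    ANTI_PATTERNS.mkdir(parents=True, exist_ok=True)
    slug = re.sub(r"[^a-z0-9]+", "-", pattern.lower()).strip("-")[:60]
    path = ANTI_PATTERNS / f"{slug}.md"
    path.write_text(
        f"# {pattern}\n\n"
        f"- category: {category}\n"
        f"- severity: {severity}\n"
        f"- prevention: {prevention}\n"
        f"- recorded: {date.today().isoformat()}\n"
    )
    return path

# record_anti_pattern("Using any instead of proper TypeScript types",
#                     "type-safety", "high",
#                     "Always define explicit interfaces for API responses")
```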