@sentry/warden 0.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.agents/skills/find-bugs/SKILL.md +75 -0
- package/.agents/skills/vercel-react-best-practices/AGENTS.md +2934 -0
- package/.agents/skills/vercel-react-best-practices/SKILL.md +136 -0
- package/.agents/skills/vercel-react-best-practices/rules/advanced-event-handler-refs.md +55 -0
- package/.agents/skills/vercel-react-best-practices/rules/advanced-init-once.md +42 -0
- package/.agents/skills/vercel-react-best-practices/rules/advanced-use-latest.md +39 -0
- package/.agents/skills/vercel-react-best-practices/rules/async-api-routes.md +38 -0
- package/.agents/skills/vercel-react-best-practices/rules/async-defer-await.md +80 -0
- package/.agents/skills/vercel-react-best-practices/rules/async-dependencies.md +51 -0
- package/.agents/skills/vercel-react-best-practices/rules/async-parallel.md +28 -0
- package/.agents/skills/vercel-react-best-practices/rules/async-suspense-boundaries.md +99 -0
- package/.agents/skills/vercel-react-best-practices/rules/bundle-barrel-imports.md +59 -0
- package/.agents/skills/vercel-react-best-practices/rules/bundle-conditional.md +31 -0
- package/.agents/skills/vercel-react-best-practices/rules/bundle-defer-third-party.md +49 -0
- package/.agents/skills/vercel-react-best-practices/rules/bundle-dynamic-imports.md +35 -0
- package/.agents/skills/vercel-react-best-practices/rules/bundle-preload.md +50 -0
- package/.agents/skills/vercel-react-best-practices/rules/client-event-listeners.md +74 -0
- package/.agents/skills/vercel-react-best-practices/rules/client-localstorage-schema.md +71 -0
- package/.agents/skills/vercel-react-best-practices/rules/client-passive-event-listeners.md +48 -0
- package/.agents/skills/vercel-react-best-practices/rules/client-swr-dedup.md +56 -0
- package/.agents/skills/vercel-react-best-practices/rules/js-batch-dom-css.md +107 -0
- package/.agents/skills/vercel-react-best-practices/rules/js-cache-function-results.md +80 -0
- package/.agents/skills/vercel-react-best-practices/rules/js-cache-property-access.md +28 -0
- package/.agents/skills/vercel-react-best-practices/rules/js-cache-storage.md +70 -0
- package/.agents/skills/vercel-react-best-practices/rules/js-combine-iterations.md +32 -0
- package/.agents/skills/vercel-react-best-practices/rules/js-early-exit.md +50 -0
- package/.agents/skills/vercel-react-best-practices/rules/js-hoist-regexp.md +45 -0
- package/.agents/skills/vercel-react-best-practices/rules/js-index-maps.md +37 -0
- package/.agents/skills/vercel-react-best-practices/rules/js-length-check-first.md +49 -0
- package/.agents/skills/vercel-react-best-practices/rules/js-min-max-loop.md +82 -0
- package/.agents/skills/vercel-react-best-practices/rules/js-set-map-lookups.md +24 -0
- package/.agents/skills/vercel-react-best-practices/rules/js-tosorted-immutable.md +57 -0
- package/.agents/skills/vercel-react-best-practices/rules/rendering-activity.md +26 -0
- package/.agents/skills/vercel-react-best-practices/rules/rendering-animate-svg-wrapper.md +47 -0
- package/.agents/skills/vercel-react-best-practices/rules/rendering-conditional-render.md +40 -0
- package/.agents/skills/vercel-react-best-practices/rules/rendering-content-visibility.md +38 -0
- package/.agents/skills/vercel-react-best-practices/rules/rendering-hoist-jsx.md +46 -0
- package/.agents/skills/vercel-react-best-practices/rules/rendering-hydration-no-flicker.md +82 -0
- package/.agents/skills/vercel-react-best-practices/rules/rendering-hydration-suppress-warning.md +30 -0
- package/.agents/skills/vercel-react-best-practices/rules/rendering-svg-precision.md +28 -0
- package/.agents/skills/vercel-react-best-practices/rules/rendering-usetransition-loading.md +75 -0
- package/.agents/skills/vercel-react-best-practices/rules/rerender-defer-reads.md +39 -0
- package/.agents/skills/vercel-react-best-practices/rules/rerender-dependencies.md +45 -0
- package/.agents/skills/vercel-react-best-practices/rules/rerender-derived-state-no-effect.md +40 -0
- package/.agents/skills/vercel-react-best-practices/rules/rerender-derived-state.md +29 -0
- package/.agents/skills/vercel-react-best-practices/rules/rerender-functional-setstate.md +74 -0
- package/.agents/skills/vercel-react-best-practices/rules/rerender-lazy-state-init.md +58 -0
- package/.agents/skills/vercel-react-best-practices/rules/rerender-memo-with-default-value.md +38 -0
- package/.agents/skills/vercel-react-best-practices/rules/rerender-memo.md +44 -0
- package/.agents/skills/vercel-react-best-practices/rules/rerender-move-effect-to-event.md +45 -0
- package/.agents/skills/vercel-react-best-practices/rules/rerender-simple-expression-in-memo.md +35 -0
- package/.agents/skills/vercel-react-best-practices/rules/rerender-transitions.md +40 -0
- package/.agents/skills/vercel-react-best-practices/rules/rerender-use-ref-transient-values.md +73 -0
- package/.agents/skills/vercel-react-best-practices/rules/server-after-nonblocking.md +73 -0
- package/.agents/skills/vercel-react-best-practices/rules/server-auth-actions.md +96 -0
- package/.agents/skills/vercel-react-best-practices/rules/server-cache-lru.md +41 -0
- package/.agents/skills/vercel-react-best-practices/rules/server-cache-react.md +76 -0
- package/.agents/skills/vercel-react-best-practices/rules/server-dedup-props.md +65 -0
- package/.agents/skills/vercel-react-best-practices/rules/server-parallel-fetching.md +83 -0
- package/.agents/skills/vercel-react-best-practices/rules/server-serialization.md +38 -0
- package/.claude/settings.json +57 -0
- package/.claude/settings.local.json +88 -0
- package/.claude/skills/agent-prompt/SKILL.md +54 -0
- package/.claude/skills/agent-prompt/references/agentic-patterns.md +94 -0
- package/.claude/skills/agent-prompt/references/anti-patterns.md +140 -0
- package/.claude/skills/agent-prompt/references/context-design.md +124 -0
- package/.claude/skills/agent-prompt/references/core-principles.md +75 -0
- package/.claude/skills/agent-prompt/references/model-guidance.md +118 -0
- package/.claude/skills/agent-prompt/references/output-formats.md +98 -0
- package/.claude/skills/agent-prompt/references/skill-structure.md +115 -0
- package/.claude/skills/agent-prompt/references/system-prompts.md +115 -0
- package/.claude/skills/notseer/SKILL.md +131 -0
- package/.claude/skills/skill-writer/SKILL.md +140 -0
- package/.claude/skills/testing-guidelines/SKILL.md +132 -0
- package/.claude/skills/warden-skill/SKILL.md +250 -0
- package/.claude/skills/warden-skill/references/config-schema.md +133 -0
- package/.dex/config.toml +2 -0
- package/.github/workflows/ci.yml +33 -0
- package/.github/workflows/release.yml +54 -0
- package/.github/workflows/warden.yml +40 -0
- package/AGENTS.md +89 -0
- package/CONTRIBUTING.md +60 -0
- package/LICENSE +105 -0
- package/README.md +43 -0
- package/SPEC.md +263 -0
- package/action.yml +87 -0
- package/assets/favicon.png +0 -0
- package/assets/warden-icon-bw.svg +5 -0
- package/assets/warden-icon-purple.png +0 -0
- package/assets/warden-icon-purple.svg +5 -0
- package/docs/.claude/settings.local.json +11 -0
- package/docs/astro.config.mjs +43 -0
- package/docs/package.json +19 -0
- package/docs/pnpm-lock.yaml +4000 -0
- package/docs/public/favicon.svg +5 -0
- package/docs/src/components/Code.astro +141 -0
- package/docs/src/components/PackageManagerTabs.astro +183 -0
- package/docs/src/components/Terminal.astro +212 -0
- package/docs/src/layouts/Base.astro +380 -0
- package/docs/src/pages/cli.astro +167 -0
- package/docs/src/pages/config.astro +394 -0
- package/docs/src/pages/guide.astro +449 -0
- package/docs/src/pages/index.astro +490 -0
- package/docs/src/styles/global.css +551 -0
- package/docs/tsconfig.json +3 -0
- package/docs/vercel.json +5 -0
- package/eslint.config.js +33 -0
- package/package.json +73 -0
- package/src/action/index.ts +1 -0
- package/src/action/main.ts +868 -0
- package/src/cli/args.test.ts +477 -0
- package/src/cli/args.ts +415 -0
- package/src/cli/commands/add.ts +447 -0
- package/src/cli/commands/init.test.ts +136 -0
- package/src/cli/commands/init.ts +132 -0
- package/src/cli/commands/setup-app/browser.ts +38 -0
- package/src/cli/commands/setup-app/credentials.ts +45 -0
- package/src/cli/commands/setup-app/manifest.ts +48 -0
- package/src/cli/commands/setup-app/server.ts +172 -0
- package/src/cli/commands/setup-app.ts +156 -0
- package/src/cli/commands/sync.ts +114 -0
- package/src/cli/context.ts +131 -0
- package/src/cli/files.test.ts +155 -0
- package/src/cli/files.ts +89 -0
- package/src/cli/fix.test.ts +310 -0
- package/src/cli/fix.ts +387 -0
- package/src/cli/git.test.ts +119 -0
- package/src/cli/git.ts +318 -0
- package/src/cli/index.ts +14 -0
- package/src/cli/main.ts +672 -0
- package/src/cli/output/box.ts +235 -0
- package/src/cli/output/formatters.test.ts +187 -0
- package/src/cli/output/formatters.ts +269 -0
- package/src/cli/output/icons.ts +13 -0
- package/src/cli/output/index.ts +44 -0
- package/src/cli/output/ink-runner.tsx +337 -0
- package/src/cli/output/jsonl.test.ts +347 -0
- package/src/cli/output/jsonl.ts +126 -0
- package/src/cli/output/reporter.ts +435 -0
- package/src/cli/output/tasks.ts +374 -0
- package/src/cli/output/tty.test.ts +117 -0
- package/src/cli/output/tty.ts +60 -0
- package/src/cli/output/verbosity.test.ts +40 -0
- package/src/cli/output/verbosity.ts +31 -0
- package/src/cli/terminal.test.ts +148 -0
- package/src/cli/terminal.ts +301 -0
- package/src/config/index.ts +3 -0
- package/src/config/loader.test.ts +313 -0
- package/src/config/loader.ts +103 -0
- package/src/config/schema.ts +168 -0
- package/src/config/writer.test.ts +119 -0
- package/src/config/writer.ts +84 -0
- package/src/diff/classify.test.ts +162 -0
- package/src/diff/classify.ts +92 -0
- package/src/diff/coalesce.test.ts +208 -0
- package/src/diff/coalesce.ts +133 -0
- package/src/diff/context.test.ts +226 -0
- package/src/diff/context.ts +201 -0
- package/src/diff/index.ts +4 -0
- package/src/diff/parser.test.ts +212 -0
- package/src/diff/parser.ts +149 -0
- package/src/event/context.ts +132 -0
- package/src/event/index.ts +2 -0
- package/src/event/schedule-context.ts +101 -0
- package/src/examples/examples.integration.test.ts +66 -0
- package/src/examples/index.test.ts +101 -0
- package/src/examples/index.ts +122 -0
- package/src/examples/setup.ts +25 -0
- package/src/index.ts +115 -0
- package/src/output/dedup.test.ts +419 -0
- package/src/output/dedup.ts +607 -0
- package/src/output/github-checks.test.ts +300 -0
- package/src/output/github-checks.ts +476 -0
- package/src/output/github-issues.ts +329 -0
- package/src/output/index.ts +5 -0
- package/src/output/issue-renderer.ts +197 -0
- package/src/output/renderer.test.ts +727 -0
- package/src/output/renderer.ts +217 -0
- package/src/output/stale.test.ts +375 -0
- package/src/output/stale.ts +155 -0
- package/src/output/types.ts +34 -0
- package/src/sdk/index.ts +1 -0
- package/src/sdk/runner.test.ts +806 -0
- package/src/sdk/runner.ts +1232 -0
- package/src/skills/index.ts +36 -0
- package/src/skills/loader.test.ts +300 -0
- package/src/skills/loader.ts +423 -0
- package/src/skills/remote.test.ts +704 -0
- package/src/skills/remote.ts +604 -0
- package/src/triggers/matcher.test.ts +277 -0
- package/src/triggers/matcher.ts +152 -0
- package/src/types/index.ts +194 -0
- package/src/utils/async.ts +18 -0
- package/src/utils/index.test.ts +84 -0
- package/src/utils/index.ts +50 -0
- package/tsconfig.json +25 -0
- package/vitest.config.ts +8 -0
- package/vitest.integration.config.ts +11 -0
- package/warden.toml +19 -0
@@ -0,0 +1,94 @@
# Agentic Patterns

Patterns for building effective tool-using agents.

## Tool Boundaries

Define clear tool access for safety:

```typescript
allowedTools: ['Read', 'Grep'],
disallowedTools: ['Write', 'Edit', 'Bash', 'WebFetch', 'WebSearch'],
```

Document available tools in the system prompt so the agent knows its capabilities.
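
As a rough sketch of how the two lists might be kept in one place and turned into a capabilities note for the system prompt: `ToolPolicy`, `readOnlyPolicy`, and `describeTools` are hypothetical names for illustration, not part of Warden's API. Only the two field names and the tool names come from the snippet above.

```typescript
// Hypothetical shape: only the two field names come from the snippet above.
interface ToolPolicy {
  allowedTools: string[];
  disallowedTools: string[];
}

const readOnlyPolicy: ToolPolicy = {
  allowedTools: ['Read', 'Grep'],
  disallowedTools: ['Write', 'Edit', 'Bash', 'WebFetch', 'WebSearch'],
};

// Render a short capabilities note that could be appended to a system prompt.
function describeTools(policy: ToolPolicy): string {
  // A tool listed in both arrays is a configuration mistake, so fail loudly.
  const conflicts = policy.allowedTools.filter(
    (tool) => policy.disallowedTools.indexOf(tool) !== -1,
  );
  if (conflicts.length > 0) {
    throw new Error(`Tools both allowed and disallowed: ${conflicts.join(', ')}`);
  }
  return [
    `You can use these tools: ${policy.allowedTools.join(', ')}.`,
    `These tools are unavailable: ${policy.disallowedTools.join(', ')}.`,
  ].join('\n');
}
```

Generating the note from the same object that configures the agent is one way to keep the prompt and the actual permissions from drifting apart.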

## Investigation Before Reporting

From [Anthropic's guidance](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-4-best-practices):

> ALWAYS read and understand relevant files before proposing code edits.
> Do not speculate about code you have not inspected.

Encourage thorough analysis:

```markdown
## Analysis Approach

1. **Understand intent**: What is the code trying to do?
2. **Trace data flow**: Follow variables from input to usage
3. **Consider edge cases**: Empty, null, zero, negative values?
4. **Check error paths**: Are failures handled correctly?
5. **Verify assumptions**: What might not be true?
```

## Confidence Levels

Require explicit confidence:

```markdown
"confidence" reflects certainty this is a real issue:
- **high**: Clear violation, no ambiguity
- **medium**: Likely issue, context might justify it
- **low**: Possible concern, needs human review
```
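
On the consuming side, an explicit confidence field makes it straightforward to drop findings below a chosen level. The following is a minimal sketch under assumptions: the `Finding` shape, `CONFIDENCE_RANK`, and `filterByConfidence` are hypothetical, and only the three level names come from the prompt above.

```typescript
// Hypothetical types; only the three level names come from the prompt above.
type Confidence = 'high' | 'medium' | 'low';

interface Finding {
  id: string;
  confidence: Confidence;
}

// Higher rank means the agent was more certain.
const CONFIDENCE_RANK: Record<Confidence, number> = { low: 0, medium: 1, high: 2 };

// Keep only findings at or above the chosen confidence level.
function filterByConfidence(findings: Finding[], minimum: Confidence): Finding[] {
  return findings.filter(
    (finding) => CONFIDENCE_RANK[finding.confidence] >= CONFIDENCE_RANK[minimum],
  );
}
```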

## Handling Uncertainty

From [OpenAI's agent guidelines](https://cookbook.openai.com/examples/gpt4-1_prompting_guide):

> Different tools should have different uncertainty thresholds.

For analysis:

```markdown
Only report bugs you are confident are real. Do not speculate or
report "potential" issues. If you're unsure, don't report it.
```

## Persistence for Agentic Tasks

From [OpenAI's GPT-5 guide](https://cookbook.openai.com/examples/gpt-5/gpt-5_prompting_guide):

> Keep going until the task is resolved before yielding back to the user.

For multi-step analysis, encourage completion:

```markdown
Continue investigating until you have checked all relevant code paths.
Use Read and Grep to trace data flow through the codebase.
```

## Subagent Isolation

From [agentic best practices](https://www.ranthebuilder.cloud/post/agentic-ai-prompting-best-practices-for-smarter-vibe-coding):

> Each subagent should run in complete isolation. Every call should be
> like a pure function - same input, same output, no shared state.

Warden achieves this by analyzing each hunk independently.
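
To illustrate the idea of per-hunk isolation (this is not Warden's actual runner), the sketch below hands each analysis call its own copy of one hunk and nothing else. `Hunk` and `analyzeIndependently` are hypothetical names, and the analyzer is passed in as a plain function; in a real agent it would presumably be an asynchronous model call.

```typescript
// Hypothetical shape for illustration; not Warden's real hunk type.
interface Hunk {
  file: string;
  patch: string;
}

// Run one analysis per hunk. Each call receives a fresh copy of its own hunk,
// so an analyzer cannot leak state into later calls through the input object.
function analyzeIndependently<T>(hunks: Hunk[], analyze: (hunk: Hunk) => T): T[] {
  return hunks.map((hunk) => analyze({ file: hunk.file, patch: hunk.patch }));
}
```

Because no accumulator or conversation history is threaded between calls, the result for one hunk does not depend on which hunks came before it.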

## Context Management

For long-running tasks, from [Anthropic's guidance](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-4-best-practices):

```markdown
Your context window will be automatically compacted. Do not stop tasks
early due to token budget concerns. Save progress before context refreshes.
```

## Sources

- [Anthropic: Claude 4.x Best Practices](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-4-best-practices)
- [OpenAI: GPT-5 Prompting Guide](https://cookbook.openai.com/examples/gpt-5/gpt-5_prompting_guide)
- [Agentic AI Prompting Best Practices](https://www.ranthebuilder.cloud/post/agentic-ai-prompting-best-practices-for-smarter-vibe-coding)
@@ -0,0 +1,140 @@
# Anti-Patterns

Common mistakes to avoid when writing prompts.

## Over-Emphasis and Anchoring

From [Anthropic's Opus 4.5 guidance](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-4-best-practices):

> Claude Opus 4.5 is more responsive to the system prompt than previous
> models. Where you might have said "CRITICAL: You MUST...", you can
> use more normal prompting.

From [Vercel's AGENTS.md research](https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals):

> Instructions stating "You MUST invoke the skill" caused agents to anchor
> excessively on documentation patterns while missing project context.

**Avoid:**
```markdown
CRITICAL: You MUST ALWAYS check for SQL injection. NEVER skip this.
IT IS ABSOLUTELY ESSENTIAL that you...
```

**Prefer:**
```markdown
Understand the code's intent first, then check for SQL injection:
Is user input concatenated into queries instead of parameterized?
```

The key insight: "MUST" language causes anchoring on the instruction at the expense of contextual understanding.

## Scope Creep

Each skill should do one thing well.

**Avoid:**
```markdown
Find security issues, performance problems, and bugs too.
```

**Prefer:** Create separate skills for each concern.

## Vague Severity

**Avoid:**
```markdown
Use high severity for important issues and medium for less important ones.
```

**Prefer:**
```markdown
- **critical**: Crash, data loss, or silent data corruption
- **high**: Incorrect behavior in common scenarios
- **medium**: Incorrect behavior in edge cases
```
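
One way to keep severity definitions concrete is to store them once and generate the prompt text from that single source, so the prompt and any validation code agree. The names `Severity`, `SEVERITY_CRITERIA`, `renderSeverityGuide`, and `isSeverity` below are hypothetical; the three criteria strings are copied from the example above.

```typescript
// Hypothetical helper; the criteria text is copied from the example above.
type Severity = 'critical' | 'high' | 'medium';

const SEVERITY_CRITERIA: Record<Severity, string> = {
  critical: 'Crash, data loss, or silent data corruption',
  high: 'Incorrect behavior in common scenarios',
  medium: 'Incorrect behavior in edge cases',
};

// Render the criteria as the bullet list a skill prompt would embed.
function renderSeverityGuide(): string {
  const order: Severity[] = ['critical', 'high', 'medium'];
  return order.map((level) => `- **${level}**: ${SEVERITY_CRITERIA[level]}`).join('\n');
}

// Check a severity string returned by the agent against the same definitions.
function isSeverity(value: string): value is Severity {
  return Object.prototype.hasOwnProperty.call(SEVERITY_CRITERIA, value);
}
```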

## Negative-Only Instructions

**Avoid:**
```markdown
Do not output markdown.
Do not include explanations.
Do not wrap in code fences.
```

**Prefer:**
```markdown
Return ONLY valid JSON starting with {"findings":
```

## Missing Exclusions

Without explicit exclusions, skills report everything tangentially related.

**Avoid:** Omitting "What NOT to Report" section.

**Prefer:**
```markdown
## What NOT to Report

- Security vulnerabilities (use security-review skill)
- Style or formatting issues
- Performance concerns (unless causing incorrect behavior)
```

## Hallucination-Prone Patterns

From [Anthropic's guidance](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-4-best-practices):

> Never speculate about code you have not opened.

**Avoid:** Asking for analysis without providing code context.

**Prefer:** Always include actual code in the prompt (Warden does this automatically).

## Over-Engineering Output

From [Anthropic's guidance](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-4-best-practices):

> Claude Opus 4.5 has a tendency to overengineer by creating extra files
> or adding unnecessary abstractions.

For prompts, keep output requirements minimal:

**Avoid:**
```markdown
Include a detailed analysis section, then a summary, then recommendations,
then a risk assessment matrix, then...
```

**Prefer:**
```markdown
Return JSON with findings array. Keep descriptions to 1-2 sentences.
```

## Conflicting Instructions

**Avoid:**
```markdown
Be thorough and check everything.
...
Only report high-confidence issues.
```

**Prefer:** Consistent stance throughout the prompt.

## Missing Examples

For complex output formats, include an example:

**Avoid:** Schema only without example.

**Prefer:**
```markdown
Example response format:
{"findings": [{"id": "sql-injection-1", "severity": "high", ...}]}

Full schema:
...
```
@@ -0,0 +1,124 @@
# Context Design

Research on how context delivery affects agent performance. Based on [Vercel's AGENTS.md evaluation](https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals).

## Key Finding: Passive Context Wins

Vercel's evaluation showed dramatic performance differences:

| Approach | Pass Rate |
|----------|-----------|
| No docs | 53% |
| Skills (default) | 53% |
| Skills with explicit instructions | 79% |
| AGENTS.md (passive context) | 100% |

## Why Passive Context Outperforms

Three structural advantages:

### 1. Eliminated Decision Burden

Information exists automatically rather than requiring agent judgment about when retrieval is necessary.

**Implication for skills:** Warden injects skill prompts directly into the system prompt. This is passive context - the agent doesn't need to decide to load the skill.

### 2. Persistent Availability

Documentation remains accessible throughout every conversation turn via system prompt.

**Implication for skills:** Keep skill prompts self-contained. Don't require the agent to fetch additional context to understand the task.

### 3. Avoided Sequencing Problems

No competing instructions about whether to explore first vs consult docs first.

**Implication for skills:** Be explicit about the analysis approach. Don't leave ordering ambiguous.

## Instruction Wording Matters

Subtle wording differences produced dramatically divergent outcomes:

| Wording | Effect |
|---------|--------|
| "You MUST invoke the skill" | Agent anchored on docs, missed project context |
| "Explore project first, then invoke skill" | Agent built context first, better results |

### Recommendations

**Avoid:**
```markdown
You MUST check for SQL injection on every code change.
```

**Prefer:**
```markdown
Understand the code's intent first, then check for SQL injection:
Is user input concatenated into queries?
```

## Retrieval-Led Reasoning

From Vercel's research:

> "Prefer retrieval-led reasoning over pre-training-led reasoning"

This means: look at the actual code before applying general knowledge.

**For Warden skills:**
```markdown
## Analysis Approach

1. Read the code context provided
2. Use Read/Grep to investigate related files if needed
3. Apply skill criteria to what you've observed
4. Only report issues you've verified in the code
```

## Compression Works

Vercel reduced docs from 40KB to 8KB (80% reduction) with no loss in effectiveness.

**Implication for skills:**
- Concise prompts work as well as verbose ones
- Use structured formats (tables, lists) over prose
- Index references rather than including full content

**Example - compressed reference:**
```markdown
### Injection Types
SQL|Command|Template|Header|XSS|Path traversal
```

vs verbose:
```markdown
There are several types of injection vulnerabilities you should check for.
First, SQL injection occurs when... Second, command injection...
```
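
If reference lists are maintained as data, the compressed form can be generated rather than hand-written. The helper below is a small sketch; `compressReference` is a hypothetical name, and the pipe-delimited layout simply mirrors the compressed example above.

```typescript
// Hypothetical helper: build a compact, pipe-delimited reference section
// in the same layout as the compressed example above.
function compressReference(title: string, items: string[]): string {
  // Drop surrounding whitespace and empty entries so the line stays dense.
  const cleaned = items.map((item) => item.trim()).filter((item) => item.length > 0);
  return `### ${title}\n${cleaned.join('|')}`;
}
```

For instance, calling it with the title `Injection Types` and the six injection names reproduces the two-line compressed block shown earlier.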

## When Skills Add Value

Skills remain valuable for:
- User-triggered workflows (version upgrades, migrations)
- Explicit "apply this standard" requests
- Tasks requiring specific tool sequences

For general knowledge that should always apply, passive context wins.

## Application to Warden

Warden's architecture aligns with these findings:

1. **Passive injection** - Skill prompts are injected into system prompt
2. **Hunk context provided** - Code is in the user prompt, not requiring retrieval
3. **Read-only tools available** - Agent can investigate but starts with context

To maximize effectiveness:
- Write self-contained skill prompts
- Include analysis approach (explore → apply criteria)
- Avoid "MUST" language that causes anchoring
- Keep prompts concise with structured references
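
The first two architecture points can be pictured as a simple prompt-assembly step. This is an illustrative sketch only: `Skill`, `Hunk`, `PromptPair`, and `buildPrompts` are hypothetical names, the user-prompt wording is invented for the example, and none of it is taken from Warden's source.

```typescript
// Hypothetical shapes; names are illustrative, not Warden's internals.
interface Skill {
  prompt: string;
}

interface Hunk {
  file: string;
  patch: string;
}

interface PromptPair {
  system: string;
  user: string;
}

function buildPrompts(skill: Skill, hunk: Hunk): PromptPair {
  return {
    // Passive context: the skill text is always present, with no lookup step.
    system: skill.prompt.trim(),
    // The code under review travels with the request, so the agent starts
    // with context instead of having to retrieve it.
    user: ['Review the following change.', `File: ${hunk.file}`, '', hunk.patch].join('\n'),
  };
}
```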

## Sources

- [Vercel: AGENTS.md Outperforms Skills](https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals)
@@ -0,0 +1,75 @@
# Core Principles

Foundational rules for writing effective prompts. Derived from [Anthropic's official documentation](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-4-best-practices).

## 1. Be Explicit

Claude 4.x models respond well to clear, explicit instructions.

**Less effective:**
```
Review this code for issues.
```

**More effective:**
```
Analyze the code changes for security issues. Only report genuine
security concerns, not style issues.
```

## 2. Provide Context and Motivation

Explain *why* a behavior matters.

**Less effective:**
```
Never report style issues.
```

**More effective:**
```
Do not report style issues. This skill focuses on security vulnerabilities.
Style issues are handled by code-simplifier and would create noise here.
```

## 3. Be Vigilant with Examples

Claude pays close attention to examples. Ensure they demonstrate desired behaviors only.

## 4. Prefer Positive Instructions

Tell Claude what to do, not what to avoid.

**Less effective:**
```
Do not use markdown in your response.
```

**More effective:**
```
Return ONLY valid JSON starting with {"findings":
```

## 5. Scope Narrowly

Broad prompts decrease accuracy. Each skill should have one clear focus.

**Less effective:**
```
Find bugs, security issues, performance problems, and style violations.
```

**More effective:**
```
Identify functional bugs that cause incorrect behavior. Focus on null
handling, off-by-one errors, and async issues.
```

## 6. Match Prompt Style to Output

The formatting in your prompt influences Claude's response style. Remove markdown from prompts if you want less markdown in output.

## Sources

- [Anthropic: Claude 4.x Best Practices](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-4-best-practices)
- [OpenAI: Prompt Engineering Guide](https://platform.openai.com/docs/guides/prompt-engineering)
@@ -0,0 +1,118 @@
# Model-Specific Guidance

Optimizations for Claude 4.x models (Sonnet 4.5, Opus 4.5).

## Claude 4.x Strengths

| Capability | Implication |
|------------|-------------|
| Precise instruction following | Can use simpler, more natural prompts |
| Structured output (JSON) | Reliable parsing without complex extraction |
| Parallel tool calling | Can investigate multiple files simultaneously |
| Long-horizon state tracking | Maintains context across extended sessions |

## Prompting Adjustments

### Simpler Language

Claude 4.x doesn't need aggressive emphasis:

**Before (older models):**
```markdown
CRITICAL: You MUST use this tool when...
```

**After (Claude 4.x):**
```markdown
Use this tool when...
```

### More Direct Communication

From [Anthropic's guidance](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-4-best-practices):

> Claude 4.5 models have a more concise and natural communication style.
> More direct and grounded, provides fact-based progress rather than
> self-celebratory updates.

### Explicit Thoroughness

Claude 4.x may be conservative. If you want thorough analysis:

```markdown
Go beyond the basics to create a fully-featured analysis. Include
as many relevant findings as possible.
```

## Thinking and Reflection

Claude 4.x supports thinking between tool calls:

```markdown
After receiving tool results, carefully reflect on their quality and
determine optimal next steps before proceeding.
```

### Interleaved Thinking

For complex multi-step analysis:

```markdown
Use your thinking to plan and iterate based on new information,
then take the best next action.
```

### Thinking Sensitivity

From [Anthropic's guidance](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-4-best-practices):

> When extended thinking is disabled, Claude Opus 4.5 is particularly
> sensitive to the word "think" and its variants.

If not using extended thinking, prefer alternatives:
- "consider" instead of "think about"
- "evaluate" instead of "think through"
- "reflect on" instead of "think over"
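
The three substitutions above are mechanical enough to apply automatically when a prompt is assembled. The following sketch assumes a hypothetical helper, `avoidThinkWording`; only the phrase pairs come from the list above, and capitalization of a replaced phrase is not preserved.

```typescript
// Replacement pairs taken from the list above; the helper itself is hypothetical.
const THINK_ALTERNATIVES: Array<[RegExp, string]> = [
  [/\bthink about\b/gi, 'consider'],
  [/\bthink through\b/gi, 'evaluate'],
  [/\bthink over\b/gi, 'reflect on'],
];

// Rewrite "think" phrasing for prompts that run without extended thinking.
// Note: replacements are lowercase, so a sentence-initial phrase loses its capital.
function avoidThinkWording(prompt: string): string {
  return THINK_ALTERNATIVES.reduce(
    (text, [pattern, replacement]) => text.replace(pattern, replacement),
    prompt,
  );
}
```

A blanket rewrite like this is a blunt tool; reviewing the changed prompt by hand is still advisable.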

## Tool Usage

Claude 4.x excels at parallel tool execution:

```markdown
If you intend to call multiple tools and there are no dependencies,
make all independent calls in parallel.
```

### Proactive vs Conservative

To make Claude more proactive:

```markdown
By default, implement changes rather than only suggesting them.
Infer the most useful action and proceed.
```

To make Claude more conservative:

```markdown
Do not take action unless clearly instructed. Default to providing
information and recommendations rather than making changes.
```

## Code Exploration

From [Anthropic's guidance](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-4-best-practices):

> Claude Opus 4.5 can be overly conservative when exploring code.

If needed, add explicit instructions:

```markdown
ALWAYS read and understand relevant files before reporting issues.
Do not speculate about code you have not inspected.
```

## Sources

- [Anthropic: Claude 4.x Best Practices](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-4-best-practices)
- [Anthropic: What's New in Claude 4.5](https://docs.claude.com/en/about-claude/models/whats-new-claude-4-5)
@@ -0,0 +1,98 @@
# Output Formats

How to specify structured output for reliable parsing.

## Enforce JSON-Only Output

Be explicit about format requirements:

```markdown
IMPORTANT: Your response must be ONLY a valid JSON object.
No markdown, no explanation, no code fences.

Example response format:
{"findings": [{"id": "example-1", "severity": "medium", ...}]}
```

## Provide Complete Schema

Include all available fields so Claude knows the full structure:

```json
{
  "findings": [
    {
      "id": "unique-identifier",
      "severity": "critical|high|medium|low|info",
      "confidence": "high|medium|low",
      "title": "Short descriptive title",
      "description": "Detailed explanation",
      "location": {
        "path": "path/to/file.ts",
        "startLine": 10,
        "endLine": 15
      },
      "suggestedFix": {
        "description": "How to fix",
        "diff": "unified diff format"
      }
    }
  ]
}
```

## Field Requirements

Document which fields are required vs optional:

```markdown
Requirements:
- Return ONLY valid JSON starting with {"findings":
- "findings" array can be empty if no issues found
- "location.path" is auto-filled - just provide startLine (and optionally endLine). Omit location for general findings.
- "confidence" reflects certainty given codebase context
- "suggestedFix" is optional - only include when the fix is complete, correct, and applies to the same file being analyzed. If the fix requires changes to a different file, describe it in the description instead.
```
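
The schema and field requirements can also be expressed as TypeScript types with a structural check. This is a minimal sketch, not Warden's actual `FindingSchema`: the type names and `isFinding` are hypothetical, and treating `id`, `severity`, `confidence`, `title`, and `description` as required is an assumption, since the document only states that `location`, `endLine`, and `suggestedFix` may be omitted.

```typescript
type Severity = "critical" | "high" | "medium" | "low" | "info";
type Confidence = "high" | "medium" | "low";

interface FindingLocation {
  path?: string;    // described above as auto-filled, so the model may omit it
  startLine: number;
  endLine?: number; // optional per the requirements above
}

interface SuggestedFix {
  description: string;
  diff: string; // unified diff format
}

interface Finding {
  id: string;
  severity: Severity;
  confidence: Confidence;
  title: string;
  description: string;
  location?: FindingLocation; // omitted for general findings
  suggestedFix?: SuggestedFix;
}

const SEVERITIES: readonly string[] = ["critical", "high", "medium", "low", "info"];
const CONFIDENCES: readonly string[] = ["high", "medium", "low"];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hand-rolled structural check standing in for a schema library.
function isFinding(value: unknown): value is Finding {
  if (!isRecord(value)) return false;
  const { id, severity, confidence, title, description, location, suggestedFix } = value;
  if (typeof id !== "string" || typeof title !== "string" || typeof description !== "string") return false;
  if (typeof severity !== "string" || SEVERITIES.indexOf(severity) === -1) return false;
  if (typeof confidence !== "string" || CONFIDENCES.indexOf(confidence) === -1) return false;
  if (location !== undefined) {
    if (!isRecord(location)) return false;
    const { startLine, endLine } = location;
    if (typeof startLine !== "number") return false;
    if (endLine !== undefined && typeof endLine !== "number") return false;
  }
  if (suggestedFix !== undefined) {
    if (!isRecord(suggestedFix)) return false;
    const { description: fixDescription, diff } = suggestedFix;
    if (typeof fixDescription !== "string" || typeof diff !== "string") return false;
  }
  return true;
}
```

Keeping the prompt's schema and the validating types side by side makes it easier to notice when one changes without the other.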

## Set Length Expectations

State length limits up front to prevent verbose output:

```markdown
Keep descriptions SHORT (1-2 sentences max)
Be concise - focus only on the changes shown
```

## Empty Results

Explicitly allow empty arrays:

```markdown
Return an empty findings array if no issues match the skill's criteria:
{"findings": []}
```

## Warden's JSON Extraction

The runner handles common output issues (`src/sdk/runner.ts`):

- Strips markdown code fences if present
- Finds the `{"findings"` pattern when the JSON is embedded in prose
- Extracts the balanced JSON object, including nested objects
- Validates against `FindingSchema` with Zod

This provides resilience, but clean JSON output is still preferred.
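
The steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `src/sdk/runner.ts`: `extractBalancedObject` and `extractFindings` are hypothetical names, and the final `Array.isArray` check is a stand-in for the Zod validation, which is omitted to keep the example dependency-free.

```typescript
// Walks forward from an opening brace and returns the substring up to its
// matching close, ignoring braces that appear inside JSON strings.
function extractBalancedObject(text: string, start: number): string | null {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // never balanced: the output was likely truncated
}

function extractFindings(raw: string): { findings: unknown[] } | null {
  // 1. Drop lines that consist only of a code-fence marker such as ```json.
  const text = raw.replace(/^```[\w-]*[ \t]*$/gm, "");
  // 2. Locate the start of the findings object inside any surrounding prose.
  const start = text.search(/\{\s*"findings"/);
  if (start === -1) return null;
  // 3. Extract the balanced object, then parse it.
  const json = extractBalancedObject(text, start);
  if (json === null) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(json);
  } catch {
    return null;
  }
  // 4. Stand-in for schema validation: only checks that "findings" is an array.
  if (typeof parsed !== "object" || parsed === null) return null;
  const findings = (parsed as { findings?: unknown }).findings;
  return Array.isArray(findings) ? { findings } : null;
}
```

Tracking string state in the brace counter matters because finding descriptions and diffs routinely contain `{` and `}` characters.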

## Severity Level Definitions

Severity reflects **urgency and required action**, not the type of issue. Each skill defines what "significant impact" means in its domain.

| Level    | Definition                                               |
|----------|----------------------------------------------------------|
| critical | Must fix before merge: significant impact if ignored     |
| high     | Should fix before merge: notable issue affecting quality |
| medium   | Worth reviewing: potential issue, may need action        |
| low      | Minor: address when convenient                           |
| info     | Informational: no action required                        |

Avoid vague definitions like "important" or "less important."
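
Because the levels are defined by urgency, they map naturally onto an ordering. The sketch below is one possible encoding, with hypothetical names (`SEVERITY_RANK`, `shouldFixBeforeMerge`, `sortByUrgency`); the pre-merge cutoff at `high` is an assumption read off the table's wording, not a documented Warden policy.

```typescript
type Severity = "critical" | "high" | "medium" | "low" | "info";

// Lower rank means more urgent; the order follows the table above.
const SEVERITY_RANK: Record<Severity, number> = {
  critical: 0,
  high: 1,
  medium: 2,
  low: 3,
  info: 4,
};

// Assumed policy: "critical" and "high" are the two levels whose definitions
// say "before merge".
function shouldFixBeforeMerge(severity: Severity): boolean {
  return SEVERITY_RANK[severity] <= SEVERITY_RANK.high;
}

// Returns a new array ordered from most to least urgent; the input is not mutated.
function sortByUrgency<T extends { severity: Severity }>(findings: readonly T[]): T[] {
  return findings.slice().sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
}
```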