agentscamp 0.1.0
- package/LICENSE +21 -0
- package/README.md +64 -0
- package/content/agents/accessibility-auditor.md +66 -0
- package/content/agents/agent-architect.md +65 -0
- package/content/agents/agent-reliability-reviewer.md +40 -0
- package/content/agents/agent-tool-integration-engineer.md +38 -0
- package/content/agents/api-architect.md +84 -0
- package/content/agents/backend-developer.md +92 -0
- package/content/agents/browser-agent-engineer.md +37 -0
- package/content/agents/cloud-architect.md +72 -0
- package/content/agents/code-reviewer.md +69 -0
- package/content/agents/data-engineer.md +67 -0
- package/content/agents/data-scientist.md +79 -0
- package/content/agents/debugger.md +89 -0
- package/content/agents/dependency-manager.md +64 -0
- package/content/agents/devops-engineer.md +94 -0
- package/content/agents/documentation-engineer.md +52 -0
- package/content/agents/finetuning-engineer.md +43 -0
- package/content/agents/frontend-developer.md +78 -0
- package/content/agents/git-github-expert.md +66 -0
- package/content/agents/golang-pro.md +72 -0
- package/content/agents/graphql-architect.md +85 -0
- package/content/agents/kubernetes-specialist.md +87 -0
- package/content/agents/llm-cost-optimizer.md +39 -0
- package/content/agents/llm-evaluation-engineer.md +42 -0
- package/content/agents/llm-inference-engineer.md +42 -0
- package/content/agents/llm-integration-engineer.md +39 -0
- package/content/agents/llm-observability-engineer.md +41 -0
- package/content/agents/mcp-server-engineer.md +43 -0
- package/content/agents/ml-engineer.md +67 -0
- package/content/agents/mobile-developer.md +89 -0
- package/content/agents/performance-engineer.md +79 -0
- package/content/agents/postgres-migration-engineer.md +42 -0
- package/content/agents/prompt-engineer.md +58 -0
- package/content/agents/prompt-injection-auditor.md +42 -0
- package/content/agents/python-pro.md +77 -0
- package/content/agents/rag-pipeline-engineer.md +42 -0
- package/content/agents/react-specialist.md +83 -0
- package/content/agents/refactoring-specialist.md +78 -0
- package/content/agents/retrieval-engineer.md +41 -0
- package/content/agents/rust-pro.md +89 -0
- package/content/agents/security-auditor.md +78 -0
- package/content/agents/sql-pro.md +53 -0
- package/content/agents/sre-engineer.md +66 -0
- package/content/agents/system-architect.md +77 -0
- package/content/agents/terraform-specialist.md +73 -0
- package/content/agents/test-engineer.md +79 -0
- package/content/agents/typescript-pro.md +82 -0
- package/content/agents/vector-search-engineer.md +43 -0
- package/content/agents/voice-agent-engineer.md +38 -0
- package/content/agents/workflow-orchestrator.md +70 -0
- package/content/commands/add-docstrings.md +92 -0
- package/content/commands/add-human-approval.md +40 -0
- package/content/commands/add-mcp-server.md +50 -0
- package/content/commands/add-streaming-endpoint.md +34 -0
- package/content/commands/benchmark-rerankers.md +44 -0
- package/content/commands/breakdown-task.md +86 -0
- package/content/commands/commit.md +117 -0
- package/content/commands/create-pr.md +109 -0
- package/content/commands/db-migrate.md +47 -0
- package/content/commands/explain-code.md +71 -0
- package/content/commands/explain-error.md +98 -0
- package/content/commands/extract-function.md +107 -0
- package/content/commands/find-bug.md +93 -0
- package/content/commands/fix-failing-test.md +106 -0
- package/content/commands/new-component.md +119 -0
- package/content/commands/plan-feature.md +71 -0
- package/content/commands/profile-postgres-queries.md +41 -0
- package/content/commands/red-team-llm.md +45 -0
- package/content/commands/refactor.md +82 -0
- package/content/commands/review-pr.md +101 -0
- package/content/commands/run-evals.md +34 -0
- package/content/commands/scaffold-pgvector-schema.md +42 -0
- package/content/commands/scaffold-vllm-config.md +44 -0
- package/content/commands/security-scan.md +129 -0
- package/content/commands/set-perf-budget.md +47 -0
- package/content/commands/setup-claude-ci.md +60 -0
- package/content/commands/sync-branch.md +138 -0
- package/content/commands/update-readme.md +108 -0
- package/content/commands/write-tests.md +81 -0
- package/content/manifest.json +1709 -0
- package/content/skills/adr-writer.md +90 -0
- package/content/skills/branch-rebaser.md +86 -0
- package/content/skills/bundle-analyzer.md +77 -0
- package/content/skills/changelog-from-prs.md +81 -0
- package/content/skills/chunking-strategy-optimizer.md +34 -0
- package/content/skills/claude-settings-auditor.md +38 -0
- package/content/skills/conventional-commits.md +80 -0
- package/content/skills/coverage-gap-finder.md +72 -0
- package/content/skills/dead-code-finder.md +65 -0
- package/content/skills/dependency-audit.md +64 -0
- package/content/skills/embedding-index-tuner.md +34 -0
- package/content/skills/embedding-set-inspector.md +34 -0
- package/content/skills/finetune-dataset-builder.md +33 -0
- package/content/skills/graphrag-scaffolder.md +39 -0
- package/content/skills/hook-writer.md +39 -0
- package/content/skills/human-in-the-loop-gate.md +33 -0
- package/content/skills/llm-as-judge-scorer.md +33 -0
- package/content/skills/llm-eval-suite-scaffolder.md +30 -0
- package/content/skills/llm-guardrails-designer.md +33 -0
- package/content/skills/llm-output-schema-generator.md +32 -0
- package/content/skills/mcp-server-scaffolder.md +33 -0
- package/content/skills/mock-data-factory.md +75 -0
- package/content/skills/multimodal-document-extractor.md +39 -0
- package/content/skills/openapi-doc-writer.md +88 -0
- package/content/skills/plugin-scaffolder.md +38 -0
- package/content/skills/postgres-index-strategist.md +38 -0
- package/content/skills/pr-description.md +87 -0
- package/content/skills/prompt-cache-optimizer.md +34 -0
- package/content/skills/prompt-optimizer.md +40 -0
- package/content/skills/prompt-pii-redactor.md +33 -0
- package/content/skills/provider-fallback-wrapper.md +33 -0
- package/content/skills/qlora-finetune-runner.md +33 -0
- package/content/skills/readme-generator.md +84 -0
- package/content/skills/secret-scanner.md +65 -0
- package/content/skills/sql-optimizer.md +77 -0
- package/content/skills/test-scaffolder.md +74 -0
- package/content/skills/tool-definition-generator.md +33 -0
- package/content/skills/web-research-pipeline.md +39 -0
- package/dist/index.js +384 -0
- package/package.json +44 -0
@@ -0,0 +1,82 @@
---
description: "Refactor the target for readability and structure without changing behavior."
argument-hint: "[file or function]"
---

Refactor `$ARGUMENTS` to improve readability, structure, and maintainability while keeping observable behavior exactly the same. If `$ARGUMENTS` is empty, ask which file or function to target before making any changes.

> [!WARNING]
> This is a behavior-preserving refactor, not a rewrite. Do not add features, change public APIs, alter return values, or modify side effects. If you discover a genuine bug, stop and report it instead of silently "fixing" it.

## 1. Establish a baseline

Before touching anything, understand the current behavior and how it is verified.

- Read `$ARGUMENTS` and the code that calls it. Note the public interface: function signatures, exported symbols, and any side effects (I/O, network, mutation, logging).
- Find the relevant tests. Run them to confirm they pass before you start, so you have a known-good baseline.

```bash
# Adjust to the project's test runner
npm test # or: pytest, go test ./..., cargo test
```

If there are no tests covering the target, say so. Add a minimal characterization test that captures current behavior before refactoring, or ask the user how to proceed.
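
A characterization test records what the code does today, right or wrong, so the refactor has something to be checked against. The sketch below assumes Node.js and uses a hypothetical `formatPrice` function as a stand-in for the real target; neither the function nor the `characterize` helper is part of this command.

```js
// Hypothetical legacy function standing in for the refactor target.
function formatPrice(cents) {
  if (cents == null) return "n/a";
  return "$" + (cents / 100).toFixed(2);
}

// Run every [input, expected] pair through fn and collect mismatches.
// Expected values are copied from what the code returns today,
// not from what it "should" return.
function characterize(fn, cases) {
  const failures = [];
  for (const [input, expected] of cases) {
    const actual = fn(input);
    if (actual !== expected) {
      failures.push({ input, expected, actual });
    }
  }
  return failures;
}

const baselineCases = [
  [1999, "$19.99"],
  [0, "$0.00"],
  [null, "n/a"],
  [-50, "$-0.50"], // surprising, but it is current behavior, so it is pinned
];

const failures = characterize(formatPrice, baselineCases);
console.log(failures.length === 0 ? "baseline holds" : failures);
```

Re-running the same cases after each refactoring step shows immediately whether behavior drifted.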

## 2. Identify what to improve

Look for concrete, well-known issues rather than stylistic preference:

- **Naming** — vague or misleading identifiers (`data`, `tmp`, `doStuff`).
- **Duplication** — repeated logic that can be extracted into one place.
- **Long functions** — units doing several jobs that should be split.
- **Deep nesting** — guard clauses and early returns can flatten control flow.
- **Dead code** — unused variables, unreachable branches, stale comments.
- **Leaky structure** — mixed levels of abstraction in one function.
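
As an illustration of the naming and duplication items in the list above, here is a small before/after sketch. The function names are hypothetical and exist only for this example; the point is that the extracted helper leaves the result unchanged.

```js
// Before: the same normalization is written twice, and `tmp` says nothing.
function isSameEmailBefore(a, b) {
  const tmp = a.trim().toLowerCase();
  const tmp2 = b.trim().toLowerCase();
  return tmp === tmp2;
}

// After: the repeated logic lives in one named helper.
function normalizeEmail(email) {
  return email.trim().toLowerCase();
}

function isSameEmailAfter(a, b) {
  return normalizeEmail(a) === normalizeEmail(b);
}

// Both versions must agree on every input for this to count as a refactor.
console.log(isSameEmailBefore(" Ada@Example.com", "ada@example.com "));
console.log(isSameEmailAfter(" Ada@Example.com", "ada@example.com "));
```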

## 3. Apply changes in small steps

Make one focused transformation at a time. Prefer many small, verifiable edits over one large rewrite.

A typical move is replacing nested conditionals with guard clauses:

```js
// Before
function getDiscount(user) {
  if (user) {
    if (user.isActive) {
      return user.isPremium ? 0.2 : 0.1;
    }
  }
  return 0;
}

// After
function getDiscount(user) {
  if (!user || !user.isActive) return 0;
  return user.isPremium ? 0.2 : 0.1;
}
```

After each meaningful step, re-run the tests so a regression is caught immediately and isolated to the change that caused it.

## 4. Verify behavior is unchanged

- Run the full test suite again; it must pass with no modifications to the assertions.
- Run the linter and type checker to confirm nothing was broken.

```bash
npm run lint
npm run build # or the project's typecheck command
```

> [!NOTE]
> If a test had to change to keep passing, that means behavior changed. Revert and rethink, or surface the discrepancy to the user.

## 5. Summarize

Report back concisely:

- **What changed** — the specific refactorings applied (e.g. "extracted `validateInput`, flattened nesting in `parse`").
- **Why** — the readability or structure problem each change addressed.
- **Verification** — that tests, lint, and types still pass.
- **Follow-ups** — anything out of scope you noticed (suspected bugs, missing test coverage) listed separately, not acted on.
@@ -0,0 +1,101 @@
---
description: "Review a pull request for correctness, security, and style, and summarize findings."
argument-hint: "[PR number]"
---

Review pull request **#$ARGUMENTS** end to end. Produce a focused, actionable review that a maintainer can act on immediately. Do not submit an approval or merge the PR yourself; your job is to analyze and report, and the verdict in your summary is a recommendation only.

## Gather context

Pull the PR metadata and the full diff before forming any opinion. Use the GitHub CLI so you read the same state reviewers see.

```bash
# Title, body, author, branches, labels, and CI status
gh pr view $ARGUMENTS

# Full diff for the PR
gh pr diff $ARGUMENTS

# Files changed with additions/deletions
gh pr view $ARGUMENTS --json files --jq '.files[] | "\(.path) +\(.additions) -\(.deletions)"'

# CI / check results
gh pr checks $ARGUMENTS
```

Read the PR description and any linked issues to understand the *intended* behavior. Then check out the branch locally so you can inspect surrounding code, run the test suite, and verify claims.

```bash
gh pr checkout $ARGUMENTS
```

> [!NOTE]
> Review the change against its stated goal. A technically clean diff that does not solve the problem described in the PR is still a problem worth flagging.

## What to evaluate

### Correctness

Trace the changed logic against the intended behavior. Look for off-by-one errors, incorrect conditionals, unhandled `null`/`undefined`, broken edge cases, race conditions, and resource leaks. Confirm new behavior is covered by tests and that existing tests still pass.

```bash
# Run the project's test suite (adapt to the repo)
npm test
```

### Security

Inspect every place untrusted input enters the system. Flag injection risks (SQL, shell, template), missing authentication or authorization checks, unsafe deserialization, path traversal, and secrets committed to the repo.

```bash
# Scan the diff for accidentally committed secrets
gh pr diff $ARGUMENTS | grep -nEi '(api[_-]?key|secret|token|password|BEGIN [A-Z ]*PRIVATE KEY)'
```

> [!WARNING]
> Never echo a real secret you discover into the review. Report the file and line, recommend rotation, and ask the author to remove it from history.

### Style and maintainability

Check naming, dead code, duplicated logic, oversized functions, and adherence to the project's lint rules and conventions. Prefer the codebase's existing patterns over personal preference.

```bash
npm run lint
```

## Classify each finding

Tag every finding by severity so the author knows what blocks merge:

- **Blocker** — must fix before merge (bugs, security holes, broken tests).
- **Should-fix** — important but not strictly blocking.
- **Nit** — minor style or polish; optional.

For each finding, cite the exact `file:line`, explain *why* it matters, and propose a concrete fix or a suggested diff.
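
One way to keep findings consistent is to collect them as structured records and group them by severity before writing the summary. This is only a sketch: the severity labels come from the list above, while the record shape (`severity`, `file`, `line`, `why`, `fix`) and the function names are assumptions made for the example.

```js
// Severity labels from the classification above, in reporting order.
const SEVERITY_ORDER = ["Blocker", "Should-fix", "Nit"];

// Group findings by severity; reject labels outside the agreed set.
function groupFindings(findings) {
  const groups = new Map(SEVERITY_ORDER.map((severity) => [severity, []]));
  for (const finding of findings) {
    if (!groups.has(finding.severity)) {
      throw new Error(`Unknown severity: ${finding.severity}`);
    }
    groups.get(finding.severity).push(finding);
  }
  return groups;
}

// Render one finding as a bullet that cites file:line, the reason, and a fix.
function renderFinding({ file, line, why, fix }) {
  return `- \`${file}:${line}\`: ${why} Suggested fix: ${fix}`;
}

// Hypothetical findings, for illustration only.
const groups = groupFindings([
  { severity: "Nit", file: "src/a.ts", line: 10, why: "Unused import.", fix: "remove it." },
  { severity: "Blocker", file: "src/b.ts", line: 42, why: "Null dereference.", fix: "guard the lookup." },
]);

for (const [severity, items] of groups) {
  console.log(`### ${severity}`);
  items.forEach((item) => console.log(renderFinding(item)));
}
console.log("blocks merge:", groups.get("Blocker").length > 0);
```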

## Output format

Summarize the review in this structure:

```markdown
## Review of PR #$ARGUMENTS — <title>

**Verdict:** Approve / Request changes / Comment

### Summary
<2-3 sentences on what the PR does and overall quality.>

### Blockers
- `path/to/file.ts:42` — <issue and fix>

### Should-fix
- `path/to/file.ts:88` — <issue and fix>

### Nits
- `path/to/file.ts:101` — <issue and fix>

### What looks good
- <notable strengths worth calling out>
```

Keep feedback specific and respectful. End with a clear recommendation and the single most important next step for the author.
@@ -0,0 +1,34 @@
---
description: "Run the project's LLM evaluation suite (DeepEval, promptfoo, or RAGAS) and report scores against thresholds before a merge."
argument-hint: "<eval suite path / config, or the feature to evaluate>"
allowed-tools: "Read, Grep, Glob, Bash"
model: sonnet
---

## Scope

Treat `$ARGUMENTS` as the eval target — a path to the eval suite/config, or the feature whose suite should run. Restate what you're evaluating in one sentence first.

This command runs the **LLM evaluation suite** (e.g. [DeepEval](/tools/deepeval), [promptfoo](/tools/promptfoo), or [RAGAS](/tools/ragas)) — it is **not** a unit-test runner. If the project has no eval suite yet, say so and point to scaffolding one rather than inventing ad-hoc checks.

> [!NOTE]
> Evals are non-deterministic and cost tokens (judge metrics call an LLM). Run the full frozen dataset, not a cherry-picked subset, or the result is meaningless.

## Step 1 — Locate the suite

Find the eval config/tests (e.g. `deepeval`/pytest eval files, `promptfooconfig.yaml`, or a RAGAS script) and the frozen dataset. Confirm the metrics and their thresholds. If none exists, stop and recommend scaffolding one — do not fabricate a suite.

## Step 2 — Run it

Execute the suite over the **full** dataset using the project's runner. Capture the raw output. Do not modify prompts or the dataset to make it pass.

## Step 3 — Report against thresholds and baseline

Produce a table: metric | score | threshold | baseline | pass/fail | delta vs baseline. Call out any metric below threshold or regressed from baseline explicitly.

## Step 4 — Verdict

Give a clear merge verdict: **pass** (all metrics clear threshold, no regression) or **block** (which metric failed, by how much). For a block, point at the likely stage — retrieval, prompt, or model — rather than guessing a fix.
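
Steps 3 and 4 can be sketched as a small post-processing function over the suite's scores. The result shape, the metric names, and the `tolerance` parameter are assumptions for illustration, and the sketch assumes higher scores are better for every metric; it does not reflect any particular eval tool's output format.

```js
// Hypothetical result shape: { metric, score, threshold, baseline }.
// Assumes higher scores are better for every metric.
function buildEvalReport(results, tolerance = 0) {
  const rows = results.map(({ metric, score, threshold, baseline }) => {
    const delta = score - baseline;
    const belowThreshold = score < threshold;
    const regressed = delta < -tolerance; // tolerance 0 means any drop counts
    return { metric, score, threshold, baseline, delta, pass: !belowThreshold && !regressed };
  });
  const failing = rows.filter((row) => !row.pass);
  return { rows, failing, verdict: failing.length === 0 ? "pass" : "block" };
}

// Render the rows using the column order requested in Step 3.
function toMarkdownTable(rows) {
  const header = [
    "| metric | score | threshold | baseline | pass/fail | delta vs baseline |",
    "| --- | --- | --- | --- | --- | --- |",
  ];
  const body = rows.map((row) => {
    const sign = row.delta >= 0 ? "+" : "";
    const status = row.pass ? "pass" : "fail";
    return `| ${row.metric} | ${row.score} | ${row.threshold} | ${row.baseline} | ${status} | ${sign}${row.delta.toFixed(3)} |`;
  });
  return [...header, ...body].join("\n");
}

// Illustrative numbers only.
const report = buildEvalReport([
  { metric: "faithfulness", score: 0.9, threshold: 0.8, baseline: 0.85 },
  { metric: "answer_relevancy", score: 0.7, threshold: 0.8, baseline: 0.75 },
]);
console.log(toMarkdownTable(report.rows));
console.log("verdict:", report.verdict);
```

Because eval scores are noisy, a project may prefer a small non-zero `tolerance`; that is a policy decision to record alongside the thresholds rather than something to adjust per run.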

> [!WARNING]
> Never tune the prompt against the same cases you're reporting on in the same run, and never relax a threshold just to go green. If a threshold is wrong, change it deliberately in its own commit with a rationale.
@@ -0,0 +1,42 @@
---
description: "Scaffold a production-ready pgvector schema and HNSW index for a corpus — matching the project's migration tooling, distance metric, and embedding dimensions."
argument-hint: "<table/corpus name and embedding dimensions, or a description of the data>"
allowed-tools: "Read, Grep, Glob, Edit, Write, Bash"
model: sonnet
---

## Scope

Treat `$ARGUMENTS` as the corpus to store: a table/collection name, the embedding dimensions (and ideally the embedding model, so the distance metric is correct), and any metadata fields you'll filter on. If the dimensions or model aren't given, ask — guessing the vector size is the one thing you cannot paper over later.

Goal: produce a **migration-managed** pgvector schema and index that's correct on the first apply — right dimension, right operator class, indexed filter columns — not ad-hoc `CREATE TABLE` run by hand.

> [!NOTE]
> This scaffolds the schema; it does not embed your data. Embedding and ingestion are a separate step (see [pgvector](/tools/pgvector) and the [vector-search-engineer](/agents/data-ai/vector-search-engineer)).

## Step 1 — Detect the project's conventions

Before writing any SQL, find how this project manages schema: look for a migrations directory and tool (e.g. Prisma, Drizzle, Alembic, Flyway, golang-migrate, Rails, Knex) and match its file naming and format. Confirm Postgres is the database and check whether `vector` is already enabled. Never hand-write DDL out of band when a migration tool owns the schema — generate a migration in the project's format.

## Step 2 — Enable the extension

Add `CREATE EXTENSION IF NOT EXISTS vector;` as the first step of the migration (or confirm it's already enabled, including on the managed provider if there is one — most require enabling it explicitly).

## Step 3 — Define the table and vector column

Create the table (or alter an existing one) with a `vector(N)` column where **N is the embedding model's exact output dimension**. Include the content/reference columns and the metadata columns you'll filter on. State the dimension and model in a comment so the next person knows what produced these vectors.

## Step 4 — Choose the operator class to match the metric

Pick the index operator class to match the embedding model's distance metric — `vector_cosine_ops` for cosine (most common), `vector_l2_ops` for Euclidean, `vector_ip_ops` for inner product. A mismatch here silently degrades recall, so state the assumption explicitly.

## Step 5 — Create the HNSW index (and filter indexes)

Add an HNSW index on the vector column with the chosen operator class, and **B-tree indexes on the metadata columns you filter on** so filtered search doesn't fall back to a scan. Leave HNSW `m` / `ef_construction` at sensible defaults but note that they're tunable — point to the [Embedding Index Tuner](/skills/database/embedding-index-tuner) for fitting them to a recall target.
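
Steps 2 through 5 can be sketched as one generator that emits the SQL for a migration file. This is an illustrative sketch, not the command's implementation: the table layout (`id`, `content`, `text` filter columns), the index names, and the `buildSchemaSql` helper are all assumptions, and the output would still need to be wrapped in whatever format the project's migration tool expects.

```js
// Operator classes as named in Step 4.
const OPERATOR_CLASS = new Map([
  ["cosine", "vector_cosine_ops"],
  ["l2", "vector_l2_ops"],
  ["inner_product", "vector_ip_ops"],
]);

// Only plain lowercase identifiers are accepted, since they are
// interpolated into DDL.
const IDENTIFIER = /^[a-z_][a-z0-9_]*$/;

function buildSchemaSql({ table, dimensions, metric, filterColumns = [] }) {
  for (const name of [table, ...filterColumns]) {
    if (!IDENTIFIER.test(name)) throw new Error(`Invalid identifier: ${name}`);
  }
  if (!Number.isInteger(dimensions) || dimensions <= 0) {
    throw new Error("dimensions must be a positive integer");
  }
  const opclass = OPERATOR_CLASS.get(metric);
  if (!opclass) throw new Error(`Unsupported metric: ${metric}`);

  const columns = [
    "id bigserial PRIMARY KEY",
    "content text NOT NULL",
    ...filterColumns.map((column) => `${column} text`), // column type is an assumption
    `embedding vector(${dimensions}) NOT NULL`,
  ];
  return [
    "CREATE EXTENSION IF NOT EXISTS vector;",
    `-- embedding: ${dimensions} dimensions, ${metric} distance (record the model name here too)`,
    `CREATE TABLE ${table} (\n  ${columns.join(",\n  ")}\n);`,
    // HNSW m / ef_construction are left at pgvector's defaults.
    `CREATE INDEX ${table}_embedding_hnsw_idx ON ${table} USING hnsw (embedding ${opclass});`,
    // Plain CREATE INDEX builds a B-tree index.
    ...filterColumns.map((column) => `CREATE INDEX ${table}_${column}_idx ON ${table} (${column});`),
  ].join("\n");
}

console.log(buildSchemaSql({
  table: "documents",
  dimensions: 1536,
  metric: "cosine",
  filterColumns: ["tenant_id"],
}));
```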
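
## Step 6 — Emit a sample query and the apply command

Provide a parameterized nearest-neighbour query with a metadata `WHERE` clause and an `ORDER BY embedding <=> $1 LIMIT 20` (over-retrieve, then rerank). `<=>` is pgvector's cosine-distance operator; use `<->` for Euclidean or `<#>` for inner product so the query operator matches the index's operator class, otherwise the index is not used. Tell the user the exact command to apply the migration with their project's migration tool. Remind them that building the index on a large existing table should use `CREATE INDEX CONCURRENTLY` to avoid locking writes, and that this form cannot run inside a transaction block, which matters for migration tools that wrap each migration in one.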
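
The sample query has to use the distance operator that corresponds to the index's operator class. A minimal sketch of that pairing follows; the operators are pgvector's, while the `buildNearestNeighbourQuery` helper, the selected columns, and the parameter order (`$1` for the query embedding, `$2` for the filter value) are assumptions for illustration.

```js
// pgvector distance operators, keyed by the metric chosen for the index.
const DISTANCE_OPERATOR = new Map([
  ["cosine", "<=>"],        // pairs with vector_cosine_ops
  ["l2", "<->"],            // pairs with vector_l2_ops
  ["inner_product", "<#>"], // pairs with vector_ip_ops
]);

const IDENTIFIER = /^[a-z_][a-z0-9_]*$/;

// Identifiers are validated and interpolated; values stay as bind parameters.
function buildNearestNeighbourQuery({ table, metric, filterColumn, limit = 20 }) {
  for (const name of [table, filterColumn]) {
    if (!IDENTIFIER.test(name)) throw new Error(`Invalid identifier: ${name}`);
  }
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new Error("limit must be a positive integer");
  }
  const operator = DISTANCE_OPERATOR.get(metric);
  if (!operator) throw new Error(`Unsupported metric: ${metric}`);

  return `SELECT id, content FROM ${table} WHERE ${filterColumn} = $2 ` +
    `ORDER BY embedding ${operator} $1 LIMIT ${limit};`;
}

console.log(buildNearestNeighbourQuery({
  table: "documents",
  metric: "cosine",
  filterColumn: "tenant_id",
}));
```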

> [!WARNING]
> Get the **dimension** and **operator class** right before any data is loaded. Changing the vector dimension later means re-creating the column and re-embedding the whole corpus; changing the metric means re-building the index. Both are far cheaper to decide now than to migrate later.
@@ -0,0 +1,44 @@
---
description: "Scaffold a vLLM serving config for a model on a target GPU — pick precision/quantization and parallelism to fit, set batching and context length, and expose an OpenAI-compatible server."
argument-hint: "<model + target GPU(s) and VRAM, or a description of the serving workload>"
allowed-tools: "Read, Grep, Glob, Bash, Edit, Write"
model: sonnet
---

## Scope

Treat `$ARGUMENTS` as what to serve: a model (id/size), the target GPU(s) and VRAM, and ideally the workload shape (chat vs. batch, prompt/response lengths, target concurrency). If the GPU/VRAM isn't given, ask — it determines whether the model fits at all and at what precision.

Goal: a **runnable, fits-the-GPU** vLLM serving config with an OpenAI-compatible endpoint — sane defaults a human can then load-test and tune, not a guess that OOMs on first launch.

> [!NOTE]
> This scaffolds a starting config; it does not load-test or tune to an SLO. For benchmarking and tuning throughput/p95 against a budget, hand off to the [llm-inference-engineer](/agents/data-ai/llm-inference-engineer). For local single-user running, [Ollama](/tools/ollama) is simpler than vLLM.

## Step 1 — Size the model against the GPU

Estimate the model's memory at candidate precisions (FP16/BF16 vs. FP8 vs. AWQ/GPTQ int4) plus KV-cache headroom for your context length and concurrency. Decide whether it fits one GPU or needs **tensor parallelism** (`--tensor-parallel-size N`). State the assumption.
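
The sizing arithmetic can be sketched as a back-of-the-envelope estimate: weights are roughly parameter count times bytes per parameter, and the KV cache stores one key and one value vector per layer, per KV head, per token. The function names, the example model shape, and the 0.9 utilization figure below are assumptions for illustration. The estimate ignores activations, quantization metadata, and runtime overhead, and treats the KV cache as a worst case in which every sequence is at full context.

```js
// Approximate bytes per parameter; int4 ignores the small overhead of
// quantization scales and zero points.
const BYTES_PER_PARAM = new Map([
  ["fp16", 2],
  ["bf16", 2],
  ["fp8", 1],
  ["int4", 0.5],
]);

const GIB = 1024 ** 3;

function estimateMemoryGiB({
  paramsBillions, precision, layers, kvHeads, headDim,
  maxModelLen, maxNumSeqs, kvBytesPerElement = 2,
}) {
  const bytesPerParam = BYTES_PER_PARAM.get(precision);
  if (bytesPerParam === undefined) throw new Error(`Unknown precision: ${precision}`);

  const weightBytes = paramsBillions * 1e9 * bytesPerParam;
  // Each token stores one key and one value vector per layer per KV head.
  const kvBytesPerToken = 2 * layers * kvHeads * headDim * kvBytesPerElement;
  // Worst case: every concurrent sequence filled to the full context length.
  const kvBytes = kvBytesPerToken * maxModelLen * maxNumSeqs;

  return {
    weightsGiB: weightBytes / GIB,
    kvCacheGiB: kvBytes / GIB,
    totalGiB: (weightBytes + kvBytes) / GIB,
  };
}

// Tensor parallelism is approximated as pooling VRAM across GPUs.
function fits(estimate, { vramGiB, gpuCount = 1, gpuMemoryUtilization = 0.9 }) {
  return estimate.totalGiB <= vramGiB * gpuCount * gpuMemoryUtilization;
}

// Illustrative shape for an 8B-parameter model; not taken from a real config.
const estimate = estimateMemoryGiB({
  paramsBillions: 8, precision: "fp16", layers: 32, kvHeads: 8, headDim: 128,
  maxModelLen: 8192, maxNumSeqs: 16,
});
console.log(estimate, "fits on one 24 GiB GPU:", fits(estimate, { vramGiB: 24 }));
```

If the estimate does not fit, the levers are the ones the later steps name: lower precision, a shorter `--max-model-len`, fewer concurrent sequences, or more GPUs.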

## Step 2 — Choose precision/quantization

Pick the highest precision that fits with headroom; drop to FP8 or 4-bit quantization only as needed to fit, and **flag that quantization can affect quality** so it gets re-checked against an eval set, not assumed safe.

## Step 3 — Set the core serving flags

Produce the `vllm serve` invocation (or equivalent config) with the parameters that matter:

- `--max-model-len` — context length sized to your prompts (don't over-allocate; it costs KV-cache memory).
- `--gpu-memory-utilization` — how much VRAM vLLM may use (leave headroom).
- `--max-num-seqs` — concurrency / batch width.
- `--tensor-parallel-size` — for multi-GPU models.
- quantization flag if used.
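
The flags in the list above can be assembled into an argument array for `vllm serve`. This is a sketch under stated assumptions: the `buildVllmServeArgs` helper and the model id are hypothetical, `--quantization` is used here as the quantization flag, and the accepted values for each flag should be checked against the installed vLLM version.

```js
// Build the argv for `vllm serve` from sizing decisions made earlier.
function buildVllmServeArgs({
  model, maxModelLen, gpuMemoryUtilization, maxNumSeqs,
  tensorParallelSize = 1, quantization,
}) {
  if (!(gpuMemoryUtilization > 0 && gpuMemoryUtilization <= 1)) {
    throw new Error("gpuMemoryUtilization must be in (0, 1]");
  }
  const args = [
    "serve", model,
    "--max-model-len", String(maxModelLen),
    "--gpu-memory-utilization", String(gpuMemoryUtilization),
    "--max-num-seqs", String(maxNumSeqs),
  ];
  // Only add multi-GPU and quantization flags when they apply.
  if (tensorParallelSize > 1) {
    args.push("--tensor-parallel-size", String(tensorParallelSize));
  }
  if (quantization) {
    args.push("--quantization", quantization);
  }
  return args;
}

// Hypothetical model id and settings, for illustration only.
const args = buildVllmServeArgs({
  model: "example-org/example-8b-instruct",
  maxModelLen: 8192,
  gpuMemoryUtilization: 0.9,
  maxNumSeqs: 16,
  tensorParallelSize: 2,
});
console.log(["vllm", ...args].join(" "));
```

Keeping the arguments as an array, rather than one string, also makes it straightforward to launch the server without going through a shell.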

## Step 4 — Expose the OpenAI-compatible endpoint

Confirm the server exposes `/v1/chat/completions` (and `/v1/completions`) so existing OpenAI clients work by changing the base URL. Note the host/port and any served-model-name.

## Step 5 — Emit the config and a smoke test

Output the final command/config plus a one-line `curl` (or OpenAI-client snippet) to verify the endpoint responds, and the env/launch notes (GPU visibility, model download/cache).
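
As an alternative to `curl`, the smoke test can be a few lines of JavaScript against the chat completions route. The sketch assumes Node.js 18 or later (for the global `fetch`), a server at `http://localhost:8000`, and a hypothetical served model name; the helper names are made up for this example.

```js
// Build the request separately so it can be inspected without a running server.
function buildSmokeTestRequest(baseUrl, model) {
  return {
    url: new URL("/v1/chat/completions", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: "ping" }],
        max_tokens: 8, // keep the check cheap
      }),
    },
  };
}

// Send one request and return the reply text, or throw on a non-2xx status.
async function smokeTest(baseUrl, model) {
  const { url, init } = buildSmokeTestRequest(baseUrl, model);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Smoke test failed: HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}

const request = buildSmokeTestRequest("http://localhost:8000", "example-model");
console.log(request.url);
```

With the server running, `smokeTest("http://localhost:8000", "example-model").then(console.log)` should print a short reply; the model name has to match the served model name noted in Step 4.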

> [!WARNING]
> The two failure modes to pre-empt: an out-of-memory crash on launch (precision/context/concurrency too high for the VRAM) and a silent quality drop from quantization. Size conservatively with KV-cache headroom, and re-run your eval set after any quantization before trusting the deployment — see [vLLM](/tools/vllm).
@@ -0,0 +1,129 @@
---
description: "Scan the current diff or given paths for security vulnerabilities."
argument-hint: "[paths]"
allowed-tools: "Read, Grep, Glob, Bash"
---

Audit code for security vulnerabilities and report what you find by severity. This command is **read-only** — investigate and report, but do not edit code, rewrite history, or "fix it while you're in there." Work through the steps below and trace every finding to a concrete line.

## Scope

Decide what to audit before reading a single file:

- If `$ARGUMENTS` is provided, treat it as the set of paths or globs to scan (`src/api`, `app/**/*.ts`, `lib/auth.ts`). Restrict the audit to those files and the code they directly call.
- If `$ARGUMENTS` is empty, scan the **current diff** — the uncommitted and recently committed changes — so the review matches what is about to ship.

```bash
# No arguments → derive the scope from the diff
git diff --name-only HEAD # unstaged + staged changes vs. HEAD
git diff --name-only origin/main...HEAD # everything on this branch
```

> [!NOTE]
> Default to the diff, not the whole repository. A focused review of what changed is far more useful than a shallow pass over the entire codebase. Only widen scope when `$ARGUMENTS` tells you to.

## Step 1 — Map untrusted input

Vulnerabilities live where untrusted data crosses a trust boundary. Before pattern-matching, identify every entry point in scope: HTTP handlers, request bodies, query/path params, headers, cookies, file uploads, webhook payloads, message-queue consumers, CLI args, and env-driven config.

```bash
# Common request/entry-point surfaces (adapt to the stack)
grep -rnE 'req\.(body|query|params|headers|cookies)|request\.(get|args|json)|process\.argv' <scope>
```

Trace each tainted value from entry point to where it is used. The checks below all reduce to one question: *does untrusted input reach a dangerous sink without being neutralized?*

## Step 2 — Injection (SQL, command, template)

Look for untrusted input concatenated or interpolated into an interpreter instead of parameterized.

```bash
# SQL built with string concatenation/interpolation instead of bound params
# (single-quoted so the shell does not strip the backslash from \$)
grep -rnE '(query|execute|raw)\(.*(\+|\$\{|%s|f"|f'\'')' <scope>

# Shell execution with interpolated input
grep -rnE 'exec\(|execSync|spawn\(|os\.system|subprocess\.(run|call|Popen)|child_process' <scope>

# Server-side template rendering from user input (SSTI)
grep -rnE 'render(_template_string|String)?\(|Template\(|\$\{[^}]*req' <scope>
```

- **SQL:** flag anything that is not a parameterized query / prepared statement. ORMs are safe only until someone reaches for `.raw()`.
- **Command:** flag any shell string built from input; the fix is an `exec`-family call with an argument array and no shell, or an allowlist.
- **Template:** flag user data passed as the *template* rather than as *data* bound into a fixed template.
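
The fixes recommended for the SQL and command items above look roughly like this in Node.js. It is a sketch only: the `articles` table, the function names, and the `{ text, values }` query shape (as accepted by clients such as node-postgres) are assumptions, and `countLines` assumes a `wc` binary is on the path.

```js
const { execFile } = require("node:child_process");

// SQL: the statement text is fixed; untrusted input travels separately
// as a bound value and is never spliced into the SQL string.
function buildSearchQuery(term) {
  return {
    text: "SELECT id, title FROM articles WHERE title ILIKE $1",
    values: [`%${term}%`],
  };
}

// Command: an argument array with no shell, so metacharacters in
// `filename` are passed through as plain text. `--` stops option parsing.
function countLines(filename, callback) {
  execFile("wc", ["-l", "--", filename], callback);
}

const query = buildSearchQuery("x' OR 1=1 --");
console.log(query.text);   // unchanged regardless of input
console.log(query.values); // the hostile string stays data
```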

## Step 3 — Missing authorization (and authentication)

Authentication asks *who are you*; authorization asks *are you allowed to touch this object*. The second is the one people forget. For every state-changing or data-returning handler, confirm there is an explicit ownership/role check.

- Find endpoints that take an object id (`/users/:id`, `/orders/:id`) and verify the handler checks the object belongs to the caller — not just that the caller is logged in. Missing that check is **IDOR / broken object-level authorization**.
- Watch for checks done in the UI or middleware but **not** re-enforced on the server.
- Flag admin/privileged routes that rely only on a hidden URL or a client-supplied role.
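
The check to look for in each handler is an object-level one, roughly like the sketch below. It is framework-agnostic and hypothetical: `caller` is assumed to come from a verified server-side session and `order` from the database, and the field names (`id`, `ownerId`, `roles`) are made up for the example.

```js
// Decide whether `caller` may access `order`.
// Being logged in is necessary but not sufficient.
function authorizeOrderAccess(caller, order) {
  if (!caller) {
    return { allowed: false, status: 401 }; // not authenticated
  }
  if (!order) {
    return { allowed: false, status: 404 };
  }
  const isOwner = order.ownerId === caller.id;
  // The role must come from the server-side session, never the request body.
  const isAdmin = Array.isArray(caller.roles) && caller.roles.includes("admin");
  if (!isOwner && !isAdmin) {
    return { allowed: false, status: 403 }; // authenticated, not authorized
  }
  return { allowed: true, status: 200 };
}

const order = { id: 7, ownerId: "u1" };
console.log(authorizeOrderAccess({ id: "u1", roles: [] }, order));
console.log(authorizeOrderAccess({ id: "u2", roles: [] }, order));
```

A handler that only performs the first of these checks is the IDOR pattern described above.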

> [!WARNING]
> "The user can't reach this page" is not authorization. Anyone can call the endpoint directly. Every protected action needs a server-side check at the point it mutates or returns data.

## Step 4 — Hardcoded secrets

Scan for credentials committed to the repo. Report file and line — **never paste the secret value into your output**, even partially.

```bash
grep -rnEi '(api[_-]?key|secret|token|passwd|password|client[_-]?secret|aws_[a-z_]*key)\s*[:=]\s*["'\''][^"'\'' ]{8,}' <scope>
grep -rnE 'BEGIN [A-Z ]*PRIVATE KEY' <scope>
```

If you find a live credential, treat it as a **Critical** finding: it must be rotated, not just deleted, because it is already in git history.
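
To make the "file and line only" rule concrete, here is a sketch that applies the same pattern as the first `grep` above but records only the location and the matched key name, never the value. The `findSecretLocations` helper and the sample input are hypothetical.

```js
// Same pattern as the grep above, as a case-insensitive JavaScript regex.
const SECRET_ASSIGNMENT =
  /(api[_-]?key|secret|token|passwd|password|client[_-]?secret|aws_[a-z_]*key)\s*[:=]\s*["'][^"' ]{8,}/i;

// Return { file, line, kind } for each suspicious line.
// `kind` is the matched key name; the secret value is never captured.
function findSecretLocations(file, contents) {
  const hits = [];
  contents.split("\n").forEach((lineText, index) => {
    const match = SECRET_ASSIGNMENT.exec(lineText);
    if (match) {
      hits.push({ file, line: index + 1, kind: match[1] });
    }
  });
  return hits;
}

// Fake credential for illustration only.
const sample = 'const port = 3000;\nconst API_KEY = "abcd1234efgh";';
console.log(findSecretLocations("src/config.js", sample));
```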

## Step 5 — SSRF, path traversal, and unsafe deserialization

```bash
# SSRF: server-side fetch where the URL/host comes from input
grep -rnE '(fetch|axios|requests\.(get|post)|http\.(get|request)|urlopen)\(' <scope>

# Path traversal: filesystem paths built from input
grep -rnE '(readFile|open|sendFile|createReadStream|path\.join)\(.*(req|input|params|argv)' <scope>

# Unsafe deserialization
grep -rnE 'pickle\.loads|yaml\.load\(|unserialize|Marshal\.load|ObjectInputStream' <scope>
```

- **SSRF:** a user-controlled URL fed to a server-side request lets an attacker hit internal services and cloud metadata (`169.254.169.254`). Require an allowlist of hosts/schemes; blocklists leak.
- **Path traversal:** input reaching a file path enables `../../etc/passwd`. Require canonicalization plus a check that the resolved path stays inside the intended root.
- **Deserialization:** `pickle.loads`, `yaml.load` (without `SafeLoader`), and PHP `unserialize` on untrusted bytes can lead to remote code execution. Require safe loaders or a strict schema.
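
The path-traversal and SSRF fixes named in the list above can be sketched in Node.js as follows. The root directory, the allowlisted host, and both function names are hypothetical. The path check does not resolve symlinks and the URL check does not cover redirects or DNS rebinding, so treat both as a starting point rather than a complete defence.

```js
const path = require("node:path");

// Path traversal: canonicalize, then confirm the result is still under root.
function resolveInsideRoot(root, userPath) {
  const resolvedRoot = path.resolve(root);
  const resolved = path.resolve(resolvedRoot, userPath);
  const insideRoot =
    resolved === resolvedRoot || resolved.startsWith(resolvedRoot + path.sep);
  if (!insideRoot) {
    throw new Error("Path escapes the allowed root");
  }
  return resolved;
}

// SSRF: allowlist of schemes and hosts, checked on the parsed URL.
const ALLOWED_HOSTS = new Set(["images.example.com"]); // hypothetical host

function assertAllowedUrl(rawUrl) {
  const url = new URL(rawUrl);
  if (url.protocol !== "https:") {
    throw new Error("Scheme not allowed");
  }
  if (!ALLOWED_HOSTS.has(url.hostname)) {
    throw new Error("Host not allowed");
  }
  return url;
}

console.log(resolveInsideRoot("/srv/files", "reports/q1.txt"));
console.log(assertAllowedUrl("https://images.example.com/cat.png").hostname);
```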
|
|
93
|
+
|
|
94
|
+
## Step 6 — Missing input validation
|
|
95
|
+
|
|
96
|
+
Even where there's no obvious sink, unvalidated input causes downstream damage: oversized payloads, type confusion, mass assignment, and bypassed business rules.
|
|
97
|
+
|
|
98
|
+
- Check that request bodies are validated against a schema (zod, pydantic, JSON Schema) before use — not trusted by shape.
|
|
99
|
+
- Flag **mass assignment**: spreading a request body straight into a DB write (`User.create({ ...req.body })`) lets a caller set `isAdmin`. Require an explicit allowlist of writable fields.
|
|
100
|
+
- Confirm numeric/length/enum bounds, and that file uploads are checked for type and size.
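As a rough Python sketch of the explicit allowlist this step asks for: the field names, the length bound, and the helper `pick_writable` are all hypothetical, and a schema library would normally replace the hand-written type check.

```python
# Hypothetical allowlist and bound; field names are illustrative only.
WRITABLE_USER_FIELDS = ("display_name", "email")
MAX_FIELD_LENGTH = 256


def pick_writable(body: dict) -> dict:
    """Copy only allowlisted fields, rejecting wrong types and oversized values."""
    clean = {}
    for field in WRITABLE_USER_FIELDS:
        if field not in body:
            continue
        value = body[field]
        if not isinstance(value, str) or len(value) > MAX_FIELD_LENGTH:
            raise ValueError(f"invalid value for {field!r}")
        clean[field] = value
    return clean
```

A caller that sends `isAdmin` alongside `display_name` gets only `display_name` written, because unknown keys are never copied.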
|
|
101
|
+
|
|
102
|
+
## Step 7 — Report findings
|
|
103
|
+
|
|
104
|
+
Rank by **severity**, give each finding a concrete fix, and state your **confidence**.
|
|
105
|
+
|
|
106
|
+
```markdown
## Security scan — <diff or scope>

**Summary:** <1–2 sentences: N findings, highest severity, overall posture.>

### Confirmed
- **[Critical] SQL injection** — `src/api/search.ts:48`
  - Untrusted `req.query.q` is concatenated into the SQL string.
  - **Fix:** use a parameterized query (`db.query(sql, [q])`).
  - **Confidence:** high — input flows directly to the sink with no escaping.

### To double-check
- **[Medium] Possible SSRF** — `src/lib/fetchImage.ts:21`
  - `url` comes from the request; host allowlisting may exist upstream — verify the caller.
  - **Fix:** enforce a scheme + host allowlist at this function.
  - **Confidence:** medium — needs the call site confirmed.
```
|
|
123
|
+
|
|
124
|
+
Severity guide: **Critical** (RCE, auth bypass, live secret) · **High** (injection, SSRF, IDOR on sensitive data) · **Medium** (missing validation, traversal behind a guard) · **Low** (defense-in-depth, hardening).
|
|
125
|
+
|
|
126
|
+
> [!NOTE]
|
|
127
|
+
> Separate **confirmed** issues — where you traced tainted input to a dangerous sink — from things **to double-check** that depend on context you could not verify (an upstream guard, a framework default, a sanitizer elsewhere). Honest confidence is more useful than false certainty in either direction.
|
|
128
|
+
|
|
129
|
+
End with the single highest-priority issue to address first. Do not modify any files — this command only reports.
|
|
@@ -0,0 +1,47 @@
|
|
|
1
|
+
---
|
|
2
|
+
description: "Define and enforce a cost and latency budget for an LLM feature or endpoint — set p95/p99 latency and cost-per-request ceilings, instrument to measure them against real traffic, and wire a check that fails when the budget is breached."
|
|
3
|
+
argument-hint: "<the LLM endpoint/feature to budget, plus any target numbers (e.g. 'chat API, p95 < 2s, < $0.02/req')>"
|
|
4
|
+
allowed-tools: "Read, Grep, Glob, Bash"
|
|
5
|
+
model: sonnet
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## Scope
|
|
9
|
+
|
|
10
|
+
Treat `$ARGUMENTS` as the LLM feature or endpoint to put a budget around — and any target numbers the user gave. The job is to turn "it should be fast and cheap" into **explicit, measured ceilings** that a build or monitor can enforce, so cost and latency can't regress silently. A budget nobody checks is a wish; this command produces one that fails loudly.
|
|
11
|
+
|
|
12
|
+
> [!NOTE]
|
|
13
|
+
> This sets and enforces the budget. To then *find and cut* what's over budget, hand off to the [llm-cost-optimizer](/agents/data-ai/llm-cost-optimizer) agent; for the techniques behind the targets, see [LLM Cost and Latency Engineering](/guides/advanced/llm-cost-latency-engineering).
|
|
14
|
+
|
|
15
|
+
## Step 1 — Pin the budget numbers
|
|
16
|
+
|
|
17
|
+
Settle the ceilings before measuring anything:
|
|
18
|
+
|
|
19
|
+
- **Latency** — p50/p95/p99 targets (budget the **tail**, p95/p99, not the average — users feel the tail). Distinguish total time from time-to-first-token for streamed responses.
|
|
20
|
+
- **Cost** — a cost-per-request ceiling, and/or a daily/monthly spend cap for the feature.
|
|
21
|
+
- **Scope** — which endpoint/feature/model this budget covers, since different routes warrant different budgets.
|
|
22
|
+
|
|
23
|
+
If the user didn't give numbers, propose provisional defaults from the feature's UX (interactive vs. batch), state them explicitly, and revise them once the Step 2 baseline is measured.
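One way to pin the numbers down is to write them as a small, validated record. The Python sketch below is illustrative only: the `Budget` class and its field names are hypothetical, and the values reuse the example targets from the argument hint (`p95 < 2s`, `< $0.02/req`) plus a placeholder p99.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Budget:
    """Ceilings for one endpoint or feature; every value here is a placeholder."""
    scope: str                                  # endpoint/feature/model covered
    p95_latency_s: float
    p99_latency_s: float
    cost_per_request_usd: float
    ttft_p95_s: Optional[float] = None          # time-to-first-token, streaming only
    daily_spend_cap_usd: Optional[float] = None

    def __post_init__(self) -> None:
        if self.p95_latency_s > self.p99_latency_s:
            raise ValueError("p95 ceiling cannot exceed p99 ceiling")
        if min(self.p95_latency_s, self.cost_per_request_usd) <= 0:
            raise ValueError("ceilings must be positive")


CHAT_API = Budget(scope="chat API", p95_latency_s=2.0, p99_latency_s=4.0,
                  cost_per_request_usd=0.02)
```

Keeping total latency and time-to-first-token as separate fields mirrors the distinction drawn in the latency bullet.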
|
|
24
|
+
|
|
25
|
+
## Step 2 — Establish the baseline
|
|
26
|
+
|
|
27
|
+
Measure current cost and latency against **representative** traffic — real prompt/response sizes and concurrency, not a single warm request. Pull from existing observability/traces ([Helicone](/tools/helicone), [Portkey](/tools/portkey), or your logs) where available. Report p50/p95/p99 and cost-per-request as they stand, so the budget is grounded in reality and you know the gap.
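A baseline report of this kind could be computed along the following lines. This is a sketch that assumes per-request latencies and costs have already been exported from traces or logs into plain lists; `percentile` uses the nearest-rank definition, which is one of several common conventions, and both function names are hypothetical.

```python
import math


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_s: list, costs_usd: list) -> dict:
    """Baseline report: latency percentiles plus mean and worst-case cost."""
    return {
        "p50_s": percentile(latencies_s, 50),
        "p95_s": percentile(latencies_s, 95),
        "p99_s": percentile(latencies_s, 99),
        "mean_cost_usd": sum(costs_usd) / len(costs_usd),
        "max_cost_usd": max(costs_usd),
    }
```

With only a handful of samples, p99 collapses to the maximum, which is one reason the sample needs to be representative in size as well as shape.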
|
|
28
|
+
|
|
29
|
+
## Step 3 — Instrument the metrics
|
|
30
|
+
|
|
31
|
+
Ensure the numbers are actually captured per request: latency (and time-to-first-token), input/output tokens, and computed cost. If instrumentation is missing, add the minimal measurement needed — you can't enforce a budget you don't record.
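The minimal measurement might look like the Python sketch below. Several things here are assumptions: `call_model` stands in for whatever client wrapper the codebase already has and is assumed to return `(text, input_tokens, output_tokens)`, the per-token prices are placeholders, and time-to-first-token is omitted because capturing it needs a hook into the streaming path.

```python
import time
from dataclasses import dataclass

# Placeholder prices in USD per million tokens; substitute your provider's rates.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


@dataclass
class RequestMetrics:
    latency_s: float
    input_tokens: int
    output_tokens: int
    cost_usd: float


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request from its token counts and the placeholder prices."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


def measure(call_model, prompt: str):
    """Time one call and derive cost from the token counts it reports."""
    start = time.perf_counter()
    text, input_tokens, output_tokens = call_model(prompt)
    latency_s = time.perf_counter() - start
    metrics = RequestMetrics(latency_s, input_tokens, output_tokens,
                             request_cost(input_tokens, output_tokens))
    return text, metrics
```

The returned `RequestMetrics` would then be written to whatever log or trace sink the baseline step reads from.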
|
|
32
|
+
|
|
33
|
+
## Step 4 — Wire the enforcement
|
|
34
|
+
|
|
35
|
+
Make the budget fail loudly when breached, at the right gate:
|
|
36
|
+
|
|
37
|
+
- **CI / pre-merge** — a latency/cost regression test over a representative sample that fails the build when p95 or cost-per-request exceeds the ceiling.
|
|
38
|
+
- **Runtime** — alerts or guardrails on p95/p99 and on the daily/monthly spend cap (gateway budgets and rate limits can hard-stop runaway cost).
|
|
39
|
+
|
|
40
|
+
Pick the gate that matches the risk: regression-prone code → CI; runaway-spend risk → runtime caps.
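Both gates can be sketched in a few lines of Python. The names `check_budget` and `SpendCap` and the metric keys are hypothetical; a CI job would exit non-zero when `check_budget` returns anything, and the runtime cap here has no daily reset or persistence, which a real guardrail (or a gateway budget) would need.

```python
def check_budget(measured: dict, ceilings: dict) -> list:
    """Return one message per metric over its ceiling; an empty list means within budget."""
    breaches = []
    for metric, ceiling in ceilings.items():
        value = measured.get(metric)
        if value is None:
            breaches.append(f"{metric}: not measured")
        elif value > ceiling:
            breaches.append(f"{metric}: {value} > ceiling {ceiling}")
    return breaches


class SpendCap:
    """Runtime guardrail: refuse further spend once the cap would be exceeded."""

    def __init__(self, daily_cap_usd: float) -> None:
        self.daily_cap_usd = daily_cap_usd
        self.spent_usd = 0.0

    def record(self, cost_usd: float) -> None:
        self.spent_usd += cost_usd

    def allows(self, estimated_cost_usd: float) -> bool:
        return self.spent_usd + estimated_cost_usd <= self.daily_cap_usd
```

Treating a missing metric as a breach is a deliberate choice in this sketch: a check that silently passes when nothing was measured would not fail loudly.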
|
|
41
|
+
|
|
42
|
+
## Step 5 — Document the budget
|
|
43
|
+
|
|
44
|
+
Record the ceilings, where they're enforced, the current baseline vs. target, and what to do on a breach (route to the [llm-cost-optimizer](/agents/data-ai/llm-cost-optimizer)). A budget that lives only in someone's head isn't enforced.
|
|
45
|
+
|
|
46
|
+
> [!WARNING]
|
|
47
|
+
> Budget the tail, not the mean. An average latency under target hides the p99 requests that make users churn — and an average cost hides the expensive outlier prompts that dominate the bill. Set and enforce p95/p99 and per-request ceilings, not just the average.
|
|
@@ -0,0 +1,60 @@
|
|
|
1
|
+
---
|
|
2
|
+
description: "Wire Claude Code into this repo's CI the safe way — install the GitHub App or scaffold the workflow YAML, scope permissions to the minimum, set secrets correctly, and verify with a real trigger."
|
|
3
|
+
argument-hint: "<what CI should do — e.g. 'review PRs', 'fix failing tests', 'respond to @claude mentions'>"
|
|
4
|
+
allowed-tools: "Read, Grep, Glob, Bash, Write, Edit"
|
|
5
|
+
model: sonnet
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## Scope
|
|
9
|
+
|
|
10
|
+
Treat `$ARGUMENTS` as the job Claude should do in CI — review PRs, respond to `@claude` mentions, fix failing tests on a schedule, draft release notes. Restate it in one sentence, including the trigger (mention, PR opened, cron) and the smallest set of abilities the job needs, before touching anything.
|
|
11
|
+
|
|
12
|
+
Goal: a working `anthropics/claude-code-action@v1` workflow with **minimum permissions**, secrets handled correctly, and a verified first run — not just a YAML file that looks right.
|
|
13
|
+
|
|
14
|
+
## Step 1 — Detect the starting point
|
|
15
|
+
|
|
16
|
+
Check for an existing setup: `.github/workflows/*.yml` referencing `claude-code-action`, an installed GitHub App, an `ANTHROPIC_API_KEY` secret (`gh secret list`), and any checked-in `.claude/settings.json` whose permission rules will also apply in CI. Extend what exists rather than duplicating it.
|
|
17
|
+
|
|
18
|
+
## Step 2 — Choose the integration mode
|
|
19
|
+
|
|
20
|
+
Map `$ARGUMENTS` to one of the action's two modes:
|
|
21
|
+
|
|
22
|
+
- **Mention mode** (no `prompt` input) — the action answers `@claude` comments on issues and PRs. Right for on-demand help and "fix this" requests.
|
|
23
|
+
- **Prompt mode** (`prompt` input set) — runs automatically on the workflow's trigger. Right for PR-opened reviews, scheduled audits, release notes.
|
|
24
|
+
|
|
25
|
+
State the trigger events the workflow will subscribe to and why.
|
|
26
|
+
|
|
27
|
+
## Step 3 — Prefer the installer, fall back to manual
|
|
28
|
+
|
|
29
|
+
If the user can run interactive commands, recommend `claude /install-github-app` — it installs the GitHub App, stores the secret, and scaffolds the workflow in one flow. Otherwise scaffold manually:
|
|
30
|
+
|
|
31
|
+
```yaml
name: Claude Code
on:
  issue_comment:
    types: [created]
jobs:
  claude:
    runs-on: ubuntu-latest
    steps:
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
```
|
|
44
|
+
|
|
45
|
+
Adapt `on:` to the chosen trigger; add `prompt:` for prompt mode. For Bedrock/Vertex shops, use `use_bedrock`/`use_vertex` with OIDC instead of a static key.
|
|
46
|
+
|
|
47
|
+
## Step 4 — Scope it down
|
|
48
|
+
|
|
49
|
+
Add `claude_args` with the narrowest flags that let the job succeed — e.g. a reviewer gets `--max-turns 12` and read-heavy tools; a test-fixer gets `Edit` plus `Bash(npm test:*)` only. Never pass `--dangerously-skip-permissions` in CI; the runner is not a sandbox you control. Confirm the workflow doesn't run with secrets on arbitrary fork PRs.
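A quick self-check for the two hazards named here could look like the Python sketch below. It is a plain-text heuristic with hypothetical names (`RISKY_PATTERNS`, `audit_workflow`), not a YAML-aware linter. It flags `pull_request_target` because workflows on that trigger run in the base repository's context with access to secrets, even for pull requests from forks.

```python
# Strings worth a human look in a workflow file; a heuristic list, not exhaustive.
RISKY_PATTERNS = {
    "--dangerously-skip-permissions": "permission checks disabled",
    "pull_request_target": "runs with secrets for fork PRs; review carefully",
}


def audit_workflow(text: str) -> list:
    """Return one finding per non-comment line that contains a risky string."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # ignore YAML comments
        for needle, why in RISKY_PATTERNS.items():
            if needle in line:
                findings.append(f"line {lineno}: {needle} ({why})")
    return findings
```

An empty result does not prove the workflow is safe; it only means neither string appears outside a comment.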
|
|
50
|
+
|
|
51
|
+
> [!WARNING]
|
|
52
|
+
> Treat the bot like any contributor with write access: minimum tools, bounded turns, and the merge button stays human — the action cannot approve PRs by design, so don't engineer around that gate.
|
|
53
|
+
|
|
54
|
+
## Step 5 — Secrets, correctly
|
|
55
|
+
|
|
56
|
+
Verify `ANTHROPIC_API_KEY` exists as a repo (or org) secret — `gh secret set ANTHROPIC_API_KEY` if not — and that the key is a dedicated CI key, not someone's personal one, so it can be rotated without breaking laptops. Never echo the key in workflow logs.
|
|
57
|
+
|
|
58
|
+
## Step 6 — Verify with a real trigger
|
|
59
|
+
|
|
60
|
+
Don't declare success on a green YAML lint. Fire the actual trigger: open a scratch PR and comment `@claude what does this PR change?` (mention mode) or push a trivial PR (prompt mode). Confirm the action ran, the response landed, and the cost is visible in the run output. Hand back: the workflow file path, the trigger, the permission envelope, and how to tune it later via `claude_args` — pointing at [Running Claude Code in CI](/guides/advanced/claude-code-ci-github-actions) for the deeper reference.
|