@roleplay-sh/cli 0.1.5 → 0.1.7
- package/.env.example +10 -7
- package/CHANGELOG.md +26 -5
- package/CONTRIBUTING.md +7 -1
- package/README.md +57 -18
- package/RELEASE.md +12 -7
- package/SECURITY.md +3 -1
- package/dist/cli.js +1220 -695
- package/dist/cli.js.map +1 -1
- package/dist/index.d.ts +153 -42
- package/dist/index.js +109 -15
- package/dist/index.js.map +1 -1
- package/package.json +2 -2
package/.env.example
CHANGED
@@ -1,7 +1,7 @@
-#
+# Agent credentials used by your own HTTP/CLI target.
 AGENT_API_KEY=
 
-#
+# Workbench project settings. Create these after starting a Builder or Team trial.
 ROLEPLAY_CLOUD_URL=https://app.roleplay.sh
 ROLEPLAY_PROJECT_ID=
 ROLEPLAY_API_KEY=
@@ -11,14 +11,17 @@ ROLEPLAY_AGENT_NAME=
 ROLEPLAY_TARGET_URL=http://localhost:3000/agent
 ROLEPLAY_TARGET_COMMAND=
 
-#
-# Provider choices:
-ROLEPLAY_LLM_PROVIDER
+# Adaptive attacker and judge configuration.
+# Provider choices: openai, anthropic, google, openai-compatible.
+ROLEPLAY_LLM_PROVIDER=<provider>
 ROLEPLAY_LLM_MODEL=
+ROLEPLAY_JUDGE_MODE=semantic
+ROLEPLAY_JUDGE_PROVIDER=<provider>
+ROLEPLAY_JUDGE_MODEL=
 ROLEPLAY_ATTACKER_PROVIDER=
 ROLEPLAY_ATTACKER_MODEL=
-
-
+
+# Provider API keys. Set only the one you use; do not commit real secrets.
 ROLEPLAY_OPENAI_API_KEY=
 ROLEPLAY_ANTHROPIC_API_KEY=
 ROLEPLAY_GOOGLE_API_KEY=
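The relationship between the provider identifiers and the per-provider key variables in this file can be illustrated with a short TypeScript sketch. The names `Provider`, `providerKeyVar`, and `missingKey` are hypothetical and not taken from the package. Only the three key variables visible in this diff are mapped; the variable used for `openai-compatible` does not appear here, so the sketch leaves it unresolved rather than guessing.

```typescript
// Hypothetical helper for illustration; not the CLI's implementation.
type Provider = "openai" | "anthropic" | "google" | "openai-compatible";

// Only the key variables that appear in .env.example are mapped.
const KEY_VARS: Partial<Record<Provider, string>> = {
  openai: "ROLEPLAY_OPENAI_API_KEY",
  anthropic: "ROLEPLAY_ANTHROPIC_API_KEY",
  google: "ROLEPLAY_GOOGLE_API_KEY",
};

// Returns the key variable name for a provider, or undefined when unknown.
function providerKeyVar(provider: Provider): string | undefined {
  return KEY_VARS[provider];
}

// Returns the name of a key variable that is known but unset, or null.
function missingKey(
  provider: Provider,
  env: Record<string, string | undefined>,
): string | null {
  const name = providerKeyVar(provider);
  if (name === undefined) return null; // variable name not shown in this diff
  const value = env[name];
  return value !== undefined && value.trim() !== "" ? null : name;
}
```

Passing the environment as a plain record keeps the check easy to test without touching the real process environment.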
package/CHANGELOG.md
CHANGED
@@ -4,11 +4,32 @@ All notable changes to roleplay.sh will be documented in this file.
 
 This project follows semantic versioning after the public `0.1.0` release.
 
-## 0.1.
+## 0.1.7 - Unreleased
+
+### Added
+
+- Guided `roleplay setup` for Workbench project, target, provider, and judge configuration.
+- Explicit judge modes: `rules`, `semantic`, and `hybrid`.
+- Command-specific help for `run`, `doctor`, and `setup`.
+- Judge metadata in saved reports so users can see how evidence was evaluated.
+
+### Changed
+
+- Real targets now require an explicit provider and judge choice instead of silently defaulting to a named provider.
+- Public README and release copy now present roleplay.sh as a provider-neutral Workbench runner.
+- `doctor` now separates attacker provider readiness, judge readiness, entitlement, and upload readiness.
+
+## 0.1.6 - 2026-06-14
+
+### Changed
+
+- Aligned CLI copy with the paid roleplay.sh Workbench model.
+
+## 0.1.4 - 2026-06-14
 
 ### Changed
 
-- Updated CLI upload, doctor, and setup copy for the paid roleplay.sh
+- Updated CLI upload, doctor, and setup copy for the paid roleplay.sh Workbench.
 - Clarified that production uploads require a Builder or Team trial, project API key, and sanitized upload policy.
 - Kept public command syntax stable while preserving mock smoke tests and BYO provider usage for real runs.
 
@@ -16,14 +37,14 @@ This project follows semantic versioning after the public `0.1.0` release.
 
 ### Added
 
-- Adaptive
+- Adaptive attacker providers for OpenAI, Anthropic, Google Gemini, and OpenAI-compatible APIs.
 - LLM transcript judging against scenario success and failure criteria.
 - `--provider`, `--attacker-provider`, `--judge-provider`, model, and OpenAI-compatible base URL flags.
 - Scenario YAML support for attacker and judge provider settings.
 
 ### Changed
 
-- Real HTTP and CLI targets
+- Real HTTP and CLI targets use provider-backed mode for `social-engineering-core`.
 - Mock mode remains available as an explicit deterministic smoke-test path with `--target mock --provider mock`.
 
 ## 0.1.2 - 2026-06-03
@@ -39,7 +60,7 @@ This project follows semantic versioning after the public `0.1.0` release.
 - Dedicated public CLI package for local attack-pack execution.
 - Built-in `social-engineering-core` attack pack.
 - Local reports and replayable transcripts.
-- Sanitized
+- Sanitized workbench upload support.
 
 ## 0.1.0 - 2026-05-17
 
package/CONTRIBUTING.md
CHANGED
@@ -11,7 +11,13 @@ pnpm test
 pnpm build
 ```
 
-Use local attack-pack execution for tests and examples. External
+Use local attack-pack execution for tests and examples. External provider behavior is part of the public CLI; keep provider additions explicit, tested, documented, and vendor-neutral in user-facing examples.
+
+Judge changes must preserve all three user-facing modes:
+
+- `rules` for deterministic smoke/offline checks.
+- `semantic` for provider-backed security evaluation.
+- `hybrid` for semantic evaluation plus deterministic guardrails.
 
 ## Pull requests
 
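One way a contributor might guard the three-mode requirement above at compile time is an exhaustive switch. The TypeScript sketch below uses hypothetical names (`JudgeMode`, `parseJudgeMode`, `needsProvider`) and is not taken from the package source. The per-mode provider requirement is inferred from the mode descriptions in this diff (`rules` is deterministic; `semantic` and `hybrid` are provider-backed) and should be read as an assumption.

```typescript
// Hypothetical sketch; the real CLI may model judge modes differently.
const JUDGE_MODES = ["rules", "semantic", "hybrid"] as const;
type JudgeMode = (typeof JUDGE_MODES)[number];

// Validates a raw string (for example a flag or env value) as a judge mode.
function parseJudgeMode(raw: string): JudgeMode {
  const mode = JUDGE_MODES.find((m) => m === raw);
  if (mode === undefined) {
    throw new Error(
      `unknown judge mode "${raw}"; expected one of ${JUDGE_MODES.join(", ")}`,
    );
  }
  return mode;
}

// Assumption: only the deterministic rules judge runs without a provider.
function needsProvider(mode: JudgeMode): boolean {
  switch (mode) {
    case "rules":
      return false;
    case "semantic":
    case "hybrid":
      return true;
    default: {
      // Fails to compile if a mode is added to JUDGE_MODES but not handled.
      const unreachable: never = mode;
      throw new Error(`unhandled judge mode: ${String(unreachable)}`);
    }
  }
}
```

With this shape, adding or removing a mode forces every switch over `JudgeMode` to be revisited, which lines up with the contribution rule that all three modes stay supported.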
package/README.md
CHANGED
@@ -1,8 +1,8 @@
 # roleplay.sh CLI
 
-
+Included local runner for roleplay.sh social-engineering tests.
 
-`roleplay` runs
+`roleplay` runs attack packs against your local, HTTP, CLI, or mock AI agent target, saves replayable evidence, and uploads sanitized proof to the roleplay.sh Workbench.
 
 ## Install
 
@@ -16,23 +16,30 @@ Or run without installing:
 npx @roleplay-sh/cli --help
 ```
 
-##
+## Smoke Test Only
+
+Use mock mode to confirm the CLI is installed and can save local evidence. This does not test a real agent.
 
 ```bash
 roleplay init
-roleplay run social-engineering-core --target mock --provider mock --fail-on critical
+roleplay run social-engineering-core --target mock --provider mock --judge rules --fail-on critical
 roleplay report latest
 roleplay replay latest
 ```
 
-##
+## Run A Real Local Test
+
+Start a Builder or Team trial in the roleplay.sh Workbench, create a project API key, choose your provider, choose how results should be judged, then run the included local runner against your agent.
 
 HTTP target:
 
 ```bash
 roleplay run social-engineering-core \
   --target http://localhost:3000/agent \
-  --provider
+  --provider <provider> \
+  --judge semantic \
+  --project <project-id> \
+  --api-key <project-api-key> \
   --fail-on critical
 ```
 
@@ -41,22 +48,49 @@ CLI target:
 ```bash
 roleplay run social-engineering-core \
   --target-command "node ./agent.js" \
-  --provider
+  --provider <provider> \
+  --judge hybrid \
+  --project <project-id> \
+  --api-key <project-api-key> \
   --fail-on critical \
   --yes
 ```
 
-
+## Judge Choices
+
+- `--judge rules`: deterministic local rule judge. Best for smoke tests and offline checks.
+- `--judge semantic`: provider-backed security judge. Recommended for real agent tests.
+- `--judge hybrid`: semantic judge plus deterministic guardrails. Recommended for CI once your provider is configured.
+
+Rules-only judging can be used against real targets only with `--allow-rules-only`, so it is never mistaken for full semantic evaluation.
+
+## Provider Configuration
+
+roleplay.sh is provider-neutral. Pick the provider you want to use for adaptive attacker turns and semantic judging.
 
 ```bash
-export
+export ROLEPLAY_PROJECT_ID="<project-id>"
+export ROLEPLAY_API_KEY="<project-api-key>"
+export ROLEPLAY_LLM_PROVIDER="<provider>"
+export ROLEPLAY_JUDGE_MODE="hybrid"
+export ROLEPLAY_JUDGE_PROVIDER="<provider>"
+export ROLEPLAY_<PROVIDER>_API_KEY="your-provider-key"
 ```
 
-Supported
+Supported provider identifiers: `openai`, `anthropic`, `google`, and `openai-compatible`.
 
-
+Use `--attacker-provider` and `--judge-provider` when you want different providers for attacker turns and transcript judging.
 
-
+## Guided Setup
+
+```bash
+roleplay setup
+roleplay doctor --cloud
+```
+
+`roleplay setup` writes safe placeholders to `.env.example`. It does not store raw provider or Workbench API keys by default.
+
+## Upload Sanitized Proof
 
 ```bash
 ROLEPLAY_CLOUD_URL=https://app.roleplay.sh \
@@ -69,26 +103,31 @@ Sanitized upload is the default. Full transcripts, raw scenario YAML, and local
 
 ## Commands
 
+- `roleplay setup` guides Workbench and local runner setup.
 - `roleplay init` creates local config and starter scenarios.
 - `roleplay run` runs a scenario file or built-in attack pack.
 - `roleplay report` prints a saved run report.
 - `roleplay replay` replays transcript evidence.
-- `roleplay upload` uploads sanitized findings to the
+- `roleplay upload` uploads sanitized findings to the Workbench.
 - `roleplay list` lists local runs.
-- `roleplay doctor` checks
+- `roleplay doctor` checks install, Workbench, provider, judge, and upload readiness.
 - `roleplay mcp` exposes roleplay.sh through MCP.
 
 ## CI Example
 
 ```yaml
 - name: Run roleplay.sh attack pack
-  run: pnpm dlx @roleplay-sh/cli run social-engineering-core --fail-on critical
+  run: pnpm dlx @roleplay-sh/cli run social-engineering-core --judge hybrid --fail-on critical
   env:
     ROLEPLAY_TARGET_URL: ${{ secrets.ROLEPLAY_TARGET_URL }}
-
-
+    ROLEPLAY_PROJECT_ID: ${{ secrets.ROLEPLAY_PROJECT_ID }}
+    ROLEPLAY_API_KEY: ${{ secrets.ROLEPLAY_API_KEY }}
+    ROLEPLAY_LLM_PROVIDER: ${{ secrets.ROLEPLAY_LLM_PROVIDER }}
+    ROLEPLAY_JUDGE_MODE: hybrid
+    ROLEPLAY_JUDGE_PROVIDER: ${{ secrets.ROLEPLAY_JUDGE_PROVIDER }}
+    ROLEPLAY_LLM_API_KEY: ${{ secrets.ROLEPLAY_LLM_API_KEY }}
 
-- name: Upload sanitized
+- name: Upload sanitized proof
   if: always()
   run: pnpm dlx @roleplay-sh/cli upload all --source ci --mode sanitized_findings
   env:
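The README changes above state that rules-only judging against a real target needs `--allow-rules-only`, and the changelog states that real targets need an explicit provider and judge choice. A pre-flight check along those lines could look like the TypeScript sketch below. The names `RunOptions` and `preflight` are hypothetical, the sketch assumes `mock` is the only non-real target, and it assumes a provider is still required when `--allow-rules-only` is set. The actual logic in `dist/cli.js` is not shown in this view and may differ.

```typescript
// Hypothetical option shape; field names mirror the documented flags only loosely.
interface RunOptions {
  target: string; // "mock", an HTTP URL, or a target command
  judge?: "rules" | "semantic" | "hybrid";
  provider?: string;
  allowRulesOnly?: boolean;
}

// Returns a list of human-readable problems; an empty list means the run may start.
function preflight(opts: RunOptions): string[] {
  const problems: string[] = [];
  // Assumption: the mock target is the only one exempt from these checks.
  if (opts.target === "mock") return problems;

  if (opts.judge === undefined) {
    problems.push("real targets need an explicit --judge choice");
  }
  if (opts.provider === undefined) {
    problems.push("real targets need an explicit --provider choice");
  }
  if (opts.judge === "rules" && !opts.allowRulesOnly) {
    problems.push("--judge rules against a real target requires --allow-rules-only");
  }
  return problems;
}
```

Collecting every problem instead of failing on the first one would let a command such as `doctor` report all missing settings in a single pass, though whether the CLI does this is not visible here.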
package/RELEASE.md
CHANGED
@@ -29,8 +29,8 @@ The publish workflow uses GitHub OIDC and intentionally does not require an npm
 Create a GitHub release or push a version tag:
 
 ```bash
-git tag v0.1.
-git push origin v0.1.
+git tag v0.1.7
+git push origin v0.1.7
 ```
 
 The publish workflow runs checks and then publishes with:
@@ -46,19 +46,24 @@ npm view @roleplay-sh/cli version
 npm install -g @roleplay-sh/cli
 roleplay --help
 roleplay init
-roleplay run social-engineering-core --target mock --provider mock --fail-on critical
+roleplay run social-engineering-core --target mock --provider mock --judge rules --fail-on critical
 roleplay report latest
 roleplay replay latest
 ```
 
-For real
+For real provider-backed verification:
 
 ```bash
-export
-
+export ROLEPLAY_PROJECT_ID=<project-id>
+export ROLEPLAY_API_KEY=<project-api-key>
+export ROLEPLAY_LLM_PROVIDER=<provider>
+export ROLEPLAY_JUDGE_MODE=semantic
+export ROLEPLAY_JUDGE_PROVIDER=<provider>
+export ROLEPLAY_<PROVIDER>_API_KEY=<provider-key>
+roleplay run social-engineering-core --target http://localhost:3000/agent --provider <provider> --judge semantic --max-turns 1 --fail-on critical
 ```
 
-For
+For workbench upload verification, start a Builder or Team trial, create a project API key at `https://app.roleplay.sh`, and run:
 
 ```bash
 ROLEPLAY_CLOUD_URL=https://app.roleplay.sh \
package/SECURITY.md
CHANGED
@@ -12,7 +12,9 @@ Do not include real API keys, customer data, private prompts, transcripts, or pr
 
 ## Data Handling
 
-roleplay.sh stores runs locally under `.roleplay/runs`. Scenario files, hidden context, transcripts, and reports may contain sensitive information. Full transcripts stay local unless you explicitly upload them to the
+roleplay.sh stores runs locally under `.roleplay/runs`. Scenario files, hidden context, transcripts, and reports may contain sensitive information. Full transcripts stay local unless you explicitly upload them to the workbench with full-transcript mode enabled in both the project policy and the CLI command.
+
+Provider API keys should stay in your local environment or CI secret store. `roleplay setup` writes placeholders only and does not store raw provider or Workbench API keys by default.
 
 ## CLI Target Execution
 