@roleplay-sh/cli 0.1.1 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/.env.example CHANGED
@@ -10,3 +10,17 @@ ROLEPLAY_AGENT_NAME=
  # Built-in social-engineering-core target. Set exactly one for CI.
  ROLEPLAY_TARGET_URL=http://localhost:3000/agent
  ROLEPLAY_TARGET_COMMAND=
+
+ # Optional LLM provider settings for adaptive attacker turns and semantic judging.
+ # Provider choices: mock, openai, anthropic, google, openai-compatible.
+ ROLEPLAY_LLM_PROVIDER=mock
+ ROLEPLAY_LLM_MODEL=
+ ROLEPLAY_ATTACKER_PROVIDER=
+ ROLEPLAY_ATTACKER_MODEL=
+ ROLEPLAY_JUDGE_PROVIDER=
+ ROLEPLAY_JUDGE_MODEL=
+ ROLEPLAY_OPENAI_API_KEY=
+ ROLEPLAY_ANTHROPIC_API_KEY=
+ ROLEPLAY_GOOGLE_API_KEY=
+ ROLEPLAY_LLM_API_KEY=
+ ROLEPLAY_LLM_BASE_URL=
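The new `.env.example` block suggests a layered configuration: shared `ROLEPLAY_LLM_*` defaults plus per-role `ROLEPLAY_ATTACKER_*` and `ROLEPLAY_JUDGE_*` overrides, with one key variable per provider. The sketch below shows one plausible way those variables could resolve. It is not the CLI's code: the override order, the provider-to-key mapping, and the helper names (`resolveRole`, `apiKeyFor`) are assumptions made for illustration.

```typescript
// Hypothetical sketch: resolving the ROLEPLAY_* variables from .env.example
// into attacker and judge settings. The fallback order and the
// provider-to-key mapping are assumptions, not the CLI's actual code.
type ProviderName = "mock" | "openai" | "anthropic" | "google" | "openai-compatible";

const PROVIDERS: readonly ProviderName[] = ["mock", "openai", "anthropic", "google", "openai-compatible"];

type Env = Record<string, string | undefined>;

interface RoleSettings {
  provider: ProviderName;
  model?: string;
  apiKey?: string;
  baseUrl?: string;
}

// Empty values (left blank in .env.example) count as unset.
function read(env: Env, key: string): string | undefined {
  const value = env[key]?.trim();
  return value ? value : undefined;
}

function parseProvider(value: string): ProviderName {
  const match = PROVIDERS.find((p) => p === value);
  if (!match) {
    throw new Error(`Unknown provider "${value}"; expected one of: ${PROVIDERS.join(", ")}`);
  }
  return match;
}

// Assumed mapping from each provider to the key variable listed in .env.example.
function apiKeyFor(env: Env, provider: ProviderName): string | undefined {
  switch (provider) {
    case "openai":
      return read(env, "ROLEPLAY_OPENAI_API_KEY");
    case "anthropic":
      return read(env, "ROLEPLAY_ANTHROPIC_API_KEY");
    case "google":
      return read(env, "ROLEPLAY_GOOGLE_API_KEY");
    case "openai-compatible":
      return read(env, "ROLEPLAY_LLM_API_KEY");
    case "mock":
      return undefined; // the deterministic mock needs no credentials
  }
}

// Role-specific variables override the shared ROLEPLAY_LLM_* defaults.
function resolveRole(env: Env, role: "ATTACKER" | "JUDGE"): RoleSettings {
  const raw = read(env, `ROLEPLAY_${role}_PROVIDER`) ?? read(env, "ROLEPLAY_LLM_PROVIDER");
  if (!raw) {
    throw new Error(`No provider configured for ${role.toLowerCase()}`);
  }
  const provider = parseProvider(raw);
  return {
    provider,
    model: read(env, `ROLEPLAY_${role}_MODEL`) ?? read(env, "ROLEPLAY_LLM_MODEL"),
    apiKey: apiKeyFor(env, provider),
    // Only the OpenAI-compatible provider takes a custom base URL in this sketch.
    baseUrl: provider === "openai-compatible" ? read(env, "ROLEPLAY_LLM_BASE_URL") : undefined,
  };
}

// Example: an OpenAI attacker with an Anthropic judge.
const example: Env = {
  ROLEPLAY_LLM_PROVIDER: "openai",
  ROLEPLAY_OPENAI_API_KEY: "sk-test",
  ROLEPLAY_JUDGE_PROVIDER: "anthropic",
  ROLEPLAY_ANTHROPIC_API_KEY: "ak-test",
};
console.log(resolveRole(example, "ATTACKER").provider); // prints: openai
console.log(resolveRole(example, "JUDGE").provider); // prints: anthropic
```

Layering the per-role variables over shared defaults in this way would let a single `ROLEPLAY_LLM_PROVIDER` cover the common case while still allowing a different judge, mirroring the `--attacker-provider` and `--judge-provider` flags described in the README.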
package/CHANGELOG.md CHANGED
@@ -4,6 +4,35 @@ All notable changes to roleplay.sh will be documented in this file.
 
  This project follows semantic versioning after the public `0.1.0` release.
 
+ ## 0.1.3 - 2026-06-06
+
+ ### Added
+
+ - Adaptive LLM attacker providers for OpenAI, Anthropic, Google Gemini, and OpenAI-compatible APIs.
+ - LLM transcript judging against scenario success and failure criteria.
+ - `--provider`, `--attacker-provider`, `--judge-provider`, model, and OpenAI-compatible base URL flags.
+ - Scenario YAML support for attacker and judge provider settings.
+
+ ### Changed
+
+ - Real HTTP and CLI targets default to LLM provider mode for `social-engineering-core`.
+ - Mock mode remains available as an explicit deterministic smoke-test path with `--target mock --provider mock`.
+
+ ## 0.1.2 - 2026-06-03
+
+ ### Changed
+
+ - Corrected packaged documentation to match the public launch scope.
+
+ ## 0.1.1 - 2026-06-03
+
+ ### Added
+
+ - Dedicated public CLI package for local attack-pack execution.
+ - Built-in `social-engineering-core` attack pack.
+ - Local reports and replayable transcripts.
+ - Sanitized Team Cloud upload support.
+
  ## 0.1.0 - 2026-05-17
 
  ### Added
@@ -11,21 +40,18 @@ This project follows semantic versioning after the public `0.1.0` release.
  - Initial `roleplay` CLI.
  - Scenario YAML validation with Zod.
  - HTTP, CLI, and mock target adapters.
- - Mock and OpenAI roleplayed-user providers.
- - Mock and OpenAI judge implementations.
+ - Local deterministic roleplayed-user provider.
+ - Local deterministic judge implementation.
  - Local run storage under `.roleplay/runs`.
  - JSON and Markdown report generation.
- - `init`, `scenario:create`, `run`, `report`, `replay`, `list`, `doctor`, `redteam`, and experimental `mcp` commands.
+ - `init`, `run`, `report`, `replay`, `list`, `upload`, `doctor`, and `mcp` commands.
  - Example agents and scenarios.
  - Vitest test suite, linting, strict TypeScript, tsup build, CI, and npm publish workflow.
  - Package smoke test that verifies tarball contents and installed CLI behavior.
  - Failed-run artifact persistence for target/provider/judge errors.
  - Safer CLI target execution defaults and explicit `shell: true` opt-in.
- - Red-team target validation and optional `--save` for generated scenarios.
  - HTTP target diagnostics for text responses, missing fields, and timeouts.
 
  ### Notes
 
- - MCP support is a roadmap stub in this release.
- - Mock provider and mock judge are the stable path for first local usage.
- - OpenAI mode requires `OPENAI_API_KEY` and should be treated as experimental until more live usage is collected.
+ - Local attack-pack execution is the supported path for first usage.
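The changelog names three target adapters (HTTP, CLI, mock) and, for 0.1.3, says that only `--target mock --provider mock` is the deterministic smoke-test path. The sketch below shows one way such a CLI might map the `--target` and `--target-command` flags from the README onto an adapter kind. The type names, the "both flags is an error" rule, and the protocol check are illustrative assumptions, not the package's implementation.

```typescript
// Hypothetical sketch: picking the HTTP, CLI, or mock target adapter from the
// --target / --target-command flags used in the README. The rules below
// (e.g. rejecting both flags at once) are assumptions for illustration.
type Target =
  | { kind: "mock" }
  | { kind: "http"; url: URL }
  | { kind: "cli"; command: string };

interface TargetFlags {
  target?: string; // value of --target: "mock" or an HTTP(S) URL
  targetCommand?: string; // value of --target-command
}

function resolveTarget(flags: TargetFlags): Target {
  if (flags.target && flags.targetCommand) {
    throw new Error("Pass either --target or --target-command, not both");
  }
  if (flags.targetCommand) {
    return { kind: "cli", command: flags.targetCommand };
  }
  if (flags.target === "mock") {
    return { kind: "mock" };
  }
  if (flags.target) {
    const url = new URL(flags.target); // throws a TypeError on a malformed URL
    if (url.protocol !== "http:" && url.protocol !== "https:") {
      throw new Error(`Unsupported target protocol: ${url.protocol}`);
    }
    return { kind: "http", url };
  }
  throw new Error("No target given: pass --target or --target-command");
}

// Per the 0.1.3 notes, a deterministic run needs the mock target AND the mock provider.
function isDeterministicSmokeTest(target: Target, provider: string | undefined): boolean {
  return target.kind === "mock" && provider === "mock";
}
```

Modeling the target as a discriminated union keeps each adapter's required data (a parsed URL or a command string) attached to its kind, so later code can switch on `kind` without re-parsing flags.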
package/CONTRIBUTING.md CHANGED
@@ -11,7 +11,7 @@ pnpm test
  pnpm build
  ```
 
- Use mock providers for tests and examples unless you are intentionally testing OpenAI integration.
+ Use local attack-pack execution for tests and examples. External model-provider behavior is now part of the public CLI; keep provider additions explicit, tested, and documented.
 
  ## Pull requests
 
package/README.md CHANGED
@@ -20,7 +20,7 @@ npx @roleplay-sh/cli --help
 
  ```bash
  roleplay init
- roleplay run social-engineering-core --target mock --fail-on critical
+ roleplay run social-engineering-core --target mock --provider mock --fail-on critical
  roleplay report latest
  roleplay replay latest
  ```
@@ -32,6 +32,7 @@ HTTP target:
  ```bash
  roleplay run social-engineering-core \
  --target http://localhost:3000/agent \
+ --provider openai \
  --fail-on critical
  ```
 
@@ -40,10 +41,19 @@ CLI target:
  ```bash
  roleplay run social-engineering-core \
  --target-command "node ./agent.js" \
+ --provider openai \
  --fail-on critical \
  --yes
  ```
 
+ Set the provider API key before running a real attack pack:
+
+ ```bash
+ export ROLEPLAY_OPENAI_API_KEY="your-openai-key"
+ ```
+
+ Supported providers are `openai`, `anthropic`, `google`, and `openai-compatible`. Use `--attacker-provider` and `--judge-provider` when you want different providers for adaptive attacker turns and transcript judging. Use `--target mock --provider mock` for deterministic local smoke tests.
+
  ## Upload Sanitized Findings To Team Cloud
 
  Create a project and API key in Team Cloud at `https://app.roleplay.sh`, then run:
@@ -75,6 +85,8 @@ Sanitized upload is the default. Full transcripts, raw scenario YAML, and local
  run: pnpm dlx @roleplay-sh/cli run social-engineering-core --fail-on critical
  env:
  ROLEPLAY_TARGET_URL: ${{ secrets.ROLEPLAY_TARGET_URL }}
+ ROLEPLAY_LLM_PROVIDER: openai
+ ROLEPLAY_OPENAI_API_KEY: ${{ secrets.ROLEPLAY_OPENAI_API_KEY }}
 
  - name: Upload sanitized findings
  if: always()
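The CI step in the README relies on `--fail-on critical` to fail the job, while the following upload step uses GitHub Actions' `if: always()` so it still runs after that failure. A severity gate of this kind can be sketched as below. Only `critical` appears in the docs; the other severity names, their ordering, and the `exitCodeFor` helper are hypothetical.

```typescript
// Hypothetical sketch of a `--fail-on <severity>` gate. Only "critical" appears
// in the docs; the other severity names and their order are assumptions.
const SEVERITIES = ["low", "medium", "high", "critical"] as const;
type Severity = (typeof SEVERITIES)[number];

interface Finding {
  scenario: string;
  severity: Severity;
}

// Position in SEVERITIES doubles as the comparison rank.
function rank(severity: Severity): number {
  return SEVERITIES.indexOf(severity);
}

// Non-zero exit when any finding is at or above the threshold, which fails the CI step.
function exitCodeFor(findings: readonly Finding[], failOn: Severity): 0 | 1 {
  return findings.some((f) => rank(f.severity) >= rank(failOn)) ? 1 : 0;
}
```

With `failOn` set to `critical`, a run whose worst finding is `high` would exit 0, while any `critical` finding would exit 1.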
package/RELEASE.md CHANGED
@@ -29,8 +29,8 @@ The publish workflow uses GitHub OIDC and intentionally does not require an npm
  Create a GitHub release or push a version tag:
 
  ```bash
- git tag v0.1.1
- git push origin v0.1.1
+ git tag v0.1.3
+ git push origin v0.1.3
  ```
 
  The publish workflow runs checks and then publishes with:
@@ -46,11 +46,18 @@ npm view @roleplay-sh/cli version
  npm install -g @roleplay-sh/cli
  roleplay --help
  roleplay init
- roleplay run social-engineering-core --target mock --fail-on critical
+ roleplay run social-engineering-core --target mock --provider mock --fail-on critical
  roleplay report latest
  roleplay replay latest
  ```
 
+ For real LLM-backed verification:
+
+ ```bash
+ export ROLEPLAY_OPENAI_API_KEY=<openai-key>
+ roleplay run social-engineering-core --target http://localhost:3000/agent --provider openai --max-turns 1 --fail-on critical
+ ```
+
  For Team Cloud upload verification, create a project API key at `https://app.roleplay.sh` and run:
 
  ```bash
package/SECURITY.md CHANGED
@@ -12,9 +12,7 @@ Do not include real API keys, customer data, private prompts, transcripts, or pr
 
  ## Data Handling
 
- roleplay.sh stores runs locally under `.roleplay/runs`. Scenario files, hidden context, transcripts, and reports may contain sensitive information.
-
- When using OpenAI providers or judges, scenario data and transcripts are sent to the external provider. Use `--provider mock --judge mock` for local-only testing.
+ roleplay.sh stores runs locally under `.roleplay/runs`. Scenario files, hidden context, transcripts, and reports may contain sensitive information. Full transcripts stay local unless you explicitly upload them to Team Cloud with full-transcript mode enabled in both the project policy and the CLI command.
 
  ## CLI Target Execution
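The revised SECURITY.md states a two-sided opt-in: a full transcript leaves the machine only when both the Team Cloud project policy and the CLI command enable full-transcript mode. That rule can be sketched as a plain AND, as below. The type and field names are hypothetical, and falling back to a sanitized upload (rather than refusing) when only one side opts in is an assumption.

```typescript
// Hypothetical sketch of the SECURITY.md rule: a full transcript is uploaded
// only when BOTH the Team Cloud project policy and the CLI command opt in.
// Names are assumptions, as is falling back to sanitized instead of erroring.
type UploadMode = "sanitized" | "full-transcript";

interface LocalRun {
  id: string;
  findings: { scenario: string; severity: string }[];
  transcript: string[];
}

interface UploadPayload {
  id: string;
  findings: LocalRun["findings"];
  transcript?: string[];
}

function resolveUploadMode(projectAllowsFull: boolean, cliRequestsFull: boolean): UploadMode {
  return projectAllowsFull && cliRequestsFull ? "full-transcript" : "sanitized";
}

// Sanitized payloads never carry the transcript field at all.
function buildUploadPayload(run: LocalRun, mode: UploadMode): UploadPayload {
  const payload: UploadPayload = { id: run.id, findings: run.findings };
  if (mode === "full-transcript") {
    payload.transcript = run.transcript;
  }
  return payload;
}
```

Omitting the field entirely in sanitized mode, rather than sending an empty array, makes it harder for a later serializer change to leak transcript data by accident.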