@plune-ai/cli 0.2.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Plune Contributors
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,139 @@
+ # @plune-ai/cli
+
+ > AI-powered assertion testing for LLM apps — a test runner for model behaviour.
+
+ [![npm](https://img.shields.io/npm/v/@plune-ai/cli)](https://www.npmjs.com/package/@plune-ai/cli)
+ [![CI](https://github.com/plune-ai/cli/actions/workflows/ci.yml/badge.svg)](https://github.com/plune-ai/cli/actions/workflows/ci.yml)
+ [![license](https://img.shields.io/npm/l/@plune-ai/cli)](./LICENSE)
+
+ Plune runs an assertion suite against an LLM provider and gives you a pass/fail report —
+ locally, in CI, or as a regression diff between two runs. You describe the checks in one
+ `plune.yaml`; Plune calls the model, evaluates each assertion, caches results, and reports
+ token cost. Ten built-in assertion types cover plain text, JSON-schema, LLM-as-judge, and
+ RAG metrics (faithfulness, answer-relevance, context-precision).
+
+ ## Install
+
+ ```bash
+ npm install -g @plune-ai/cli # or: pnpm add -g @plune-ai/cli
+ plune --version
+ ```
+
+ Or run it without installing:
+
+ ```bash
+ npx -y @plune-ai/cli run
+ ```
+
+ Requires **Node.js ≥ 20**.
+
+ ## Quickstart
+
+ ```bash
+ # 1. Scaffold plune.yaml, an example dataset, and .env.example
+ plune init
+
+ # 2. Add your provider key (read from the environment / .env — never written to disk)
+ echo 'ANTHROPIC_API_KEY=sk-ant-...' >> .env
+
+ # 3. Run the assertions
+ plune run
+ # → 1/1 passed · 0 failed · 0 errored · $0.0008
+
+ # 4. Re-render the last run, or diff two runs to catch regressions
+ plune report --format markdown
+ plune diff baseline.json current.json --fail-on-regression
+ ```
+
+ Each run writes its full result to `.plune/last-run.json`.
+
+ ## Configuration
+
+ Plune reads a single `plune.yaml`, discovered by walking up from the working directory (or
+ passed with `-c <path>`). This is what `plune init` scaffolds:
+
+ ```yaml
+ version: 1
+ provider:
+   type: anthropic # anthropic | openai | openrouter
+   model: claude-3-5-sonnet-latest
+ evals:
+   - id: example
+     prompt: "Answer concisely. {{question}}" # {{vars}} come from each dataset row
+     dataset: datasets/example.jsonl # a file path, or an inline `examples:` list
+     assertions:
+       - type: contains
+         value: "Paris"
+ ```
+
+ Datasets are JSONL, one row per line, shaped `{ "vars": { ... }, "expected"?: "..." }`. The
+ provider API key is read from the environment based on `provider.type`:
+
+ | Provider | `provider.type` | Environment variable |
+ | ---------- | --------------- | -------------------- |
+ | Anthropic | `anthropic` | `ANTHROPIC_API_KEY` |
+ | OpenAI | `openai` | `OPENAI_API_KEY` |
+ | OpenRouter | `openrouter` | `OPENROUTER_API_KEY` |
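
The dataset row shape and the `{{vars}}` substitution described above can be sketched in a few lines of TypeScript. This is a minimal illustration, not Plune's code: `DatasetRow`, `parseJsonl`, and `renderPrompt` are hypothetical names, and leaving unknown placeholders untouched is an assumption (the real tool may report an error instead).

```typescript
// Hypothetical helpers for illustration; these names are not part of @plune-ai/cli.

/** One dataset row, following the shape described in the README. */
interface DatasetRow {
  vars: Record<string, string>;
  expected?: string;
}

/** Parse JSONL text: one JSON object per non-empty line. */
function parseJsonl(text: string): DatasetRow[] {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as DatasetRow);
}

/**
 * Substitute {{name}} placeholders with values from `vars`.
 * Assumption: placeholders with no matching variable are left as-is.
 */
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, name: string) => {
    const value = vars[name];
    return value !== undefined ? value : match;
  });
}

const rows = parseJsonl(
  '{"vars":{"question":"What is the capital of France?"},"expected":"Paris"}\n'
);
const prompts = rows.map((row) => renderPrompt("Answer concisely. {{question}}", row.vars));
console.log(prompts[0]);
```

Each line of the JSONL file yields one rendered prompt, so a dataset with N rows would produce N model calls per eval under this reading.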
+
+ ### Assertion types
+
+ | Type | Passes when… |
+ | --------------------- | ---------------------------------------------------------------- |
+ | `exact-match` | output equals `value` (optional `trim`, `ignore_case`) |
+ | `contains` | output contains `value` |
+ | `contains-any` | output contains at least one of `values` |
+ | `contains-all` | output contains every one of `values` |
+ | `json-schema` | output validates against the JSON `schema` |
+ | `llm-judge` | an LLM grades the output against `criteria` (≥ `pass_threshold`) |
+ | `semantic-similarity` | embedding similarity to `reference` ≥ `threshold` |
+ | `faithfulness` | output is grounded in `context` (RAG) |
+ | `answer-relevance` | output actually answers the `question` (RAG) |
+ | `context-precision` | `context` is relevant to the `question` (RAG) |
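
To make the first four rows of the table concrete, here is a sketch of how the plain-text assertions could be evaluated. It is a reading of the table, not Plune's implementation: the `TextAssertion` type and `checkText` function are hypothetical, and case-sensitive substring matching for the `contains*` types is an assumption.

```typescript
// Illustrative only: one possible reading of the four plain-text assertion types.

type TextAssertion =
  | { type: "exact-match"; value: string; trim?: boolean; ignore_case?: boolean }
  | { type: "contains"; value: string }
  | { type: "contains-any"; values: string[] }
  | { type: "contains-all"; values: string[] };

function checkText(output: string, assertion: TextAssertion): boolean {
  switch (assertion.type) {
    case "exact-match": {
      let actual = output;
      let expected = assertion.value;
      if (assertion.trim) {
        actual = actual.trim();
        expected = expected.trim();
      }
      if (assertion.ignore_case) {
        actual = actual.toLowerCase();
        expected = expected.toLowerCase();
      }
      return actual === expected;
    }
    case "contains":
      // Assumption: substring match is case-sensitive.
      return output.includes(assertion.value);
    case "contains-any":
      return assertion.values.some((v) => output.includes(v));
    case "contains-all":
      return assertion.values.every((v) => output.includes(v));
  }
}

const answer = "The capital of France is Paris.";
console.log(checkText(answer, { type: "contains", value: "Paris" }));
console.log(checkText(answer, { type: "contains-all", values: ["Paris", "Berlin"] }));
```

The remaining types (`json-schema`, `llm-judge`, `semantic-similarity`, and the RAG metrics) need a schema validator, a model call, or embeddings, so they are left out of this sketch.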
+
+ ## Commands
+
+ | Command | Summary |
+ | ------- | ------- |
+ | `plune run` | Run the suite. Flags: `--dry-run`, `--only <id\|tag>` (repeatable), `--bail`, `--no-cache`, `--concurrency <n>`, `--format console\|json\|markdown`, `-o, --output <file>`. |
+ | `plune report` | Re-render the most recent run. Flags: `--format`, `-o`. |
+ | `plune diff <baseline> <current>` | Compare two `plune run --format json` outputs and report pass→fail regressions. Flags: `--fail-on-regression`, `--format`, `-o`. |
+ | `plune init` | Scaffold `plune.yaml`, a sample dataset, and `.env.example`. Flags: `--yes` (non-interactive), `--force`. |
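
The pass→fail comparison that `plune diff` reports can be sketched as follows. The README does not document the JSON layout of a run, so the `RunSnapshot` shape below (a flat list of `{ id, status }` entries) is purely hypothetical, as is the `findRegressions` helper.

```typescript
// Hypothetical run shape; the real `plune run --format json` output may differ.
type Status = "passed" | "failed" | "errored";

interface RunSnapshot {
  results: { id: string; status: Status }[];
}

/** Return the ids that passed in the baseline but fail in the current run. */
function findRegressions(baseline: RunSnapshot, current: RunSnapshot): string[] {
  const before = new Map<string, Status>(
    baseline.results.map((r): [string, Status] => [r.id, r.status])
  );
  return current.results
    .filter((r) => before.get(r.id) === "passed" && r.status === "failed")
    .map((r) => r.id);
}

const baselineRun: RunSnapshot = {
  results: [
    { id: "capital", status: "passed" },
    { id: "summary", status: "failed" },
  ],
};
const currentRun: RunSnapshot = {
  results: [
    { id: "capital", status: "failed" },
    { id: "summary", status: "failed" },
    { id: "new-check", status: "failed" },
  ],
};
console.log(findRegressions(baselineRun, currentRun));
```

In this sketch a check that was already failing, or that has no baseline entry, is not counted as a regression; only a pass that turns into a fail is.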
+
+ Global flags: `-c, --config <path>` · `-v, --verbose` · `--no-color`.
+
+ **Exit codes:** `0` everything passed · `1` an assertion failed · `2` configuration or execution error.
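
As a rough illustration of the exit-code convention, the function below maps a run summary to one of the three codes. The `Summary` fields follow the `{ total, passed, failed, errored }` keys shown in the README; treating any errored check as exit code `2`, and giving it precedence over failures, is an assumption rather than documented behaviour.

```typescript
// Sketch only: `exitCodeFor` is a hypothetical helper, not a Plune export.
interface Summary {
  total: number;
  passed: number;
  failed: number;
  errored: number;
}

function exitCodeFor(summary: Summary): 0 | 1 | 2 {
  // Assumption: an errored check counts as an execution error and wins over failures.
  if (summary.errored > 0) return 2;
  if (summary.failed > 0) return 1;
  return 0;
}

console.log(exitCodeFor({ total: 3, passed: 3, failed: 0, errored: 0 }));
console.log(exitCodeFor({ total: 3, passed: 2, failed: 1, errored: 0 }));
```

A CI script can rely on the same convention directly: any non-zero exit from `plune run` fails the job, and the specific code distinguishes a failed assertion from a broken setup.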
+
+ ## Programmatic API
+
+ The same engine that powers `plune run` is exported for use from your own code. Unlike the
+ CLI, the library does **not** parse argv or auto-load `.env` — set the provider key in
+ `process.env` yourself.
+
+ ```ts
+ import { run } from '@plune-ai/cli';
+ import type { RunResult } from '@plune-ai/cli';
+
+ const result: RunResult = await run({ dryRun: false, configPath: 'plune.yaml' });
+ console.log(result.summary); // { total, passed, failed, errored, ... }
+ ```
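
Because the library does not load `.env` itself, a caller has to populate `process.env` before invoking `run`. The sketch below shows one way to do that with a small hand-rolled parser; `parseDotenv` is a hypothetical helper that handles only simple `KEY=value` lines, comments, and matching surrounding quotes, so a dedicated dotenv library may be a better fit for real projects.

```typescript
// Hypothetical helper: a deliberately small .env parser, not part of @plune-ai/cli.
function parseDotenv(text: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments.
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key, or no '=' at all
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const quoted =
      value.length >= 2 &&
      ((value.startsWith('"') && value.endsWith('"')) ||
        (value.startsWith("'") && value.endsWith("'")));
    if (quoted) value = value.slice(1, -1);
    env[key] = value;
  }
  return env;
}

const parsed = parseDotenv('# provider key\nANTHROPIC_API_KEY="sk-ant-example"\n\nOTHER=1\n');
console.log(Object.keys(parsed));
```

In a Node.js script, the parsed values could be merged with `Object.assign(process.env, parseDotenv(fs.readFileSync('.env', 'utf8')))` before calling `run`.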
+
+ ## Use in CI
+
+ Run Plune on every pull request and post a regression diff as a sticky comment with the
+ companion GitHub Action, [**plune-ai/eval-action**](https://github.com/plune-ai/eval-action):
+
+ ```yaml
+ - uses: plune-ai/eval-action@v1
+   with:
+     config: plune.yaml
+     fail-on-regression: true
+ ```
+
+ ## Contributing
+
+ Bug reports and pull requests are welcome — see [CONTRIBUTING.md](./CONTRIBUTING.md). For
+ security issues, see [SECURITY.md](./SECURITY.md).
+
+ ## License
+
+ [MIT](./LICENSE) © Plune Contributors