@kindlm/cli 0.4.0 → 0.4.1
- package/README.md +108 -0
- package/package.json +3 -2
package/README.md
ADDED
@@ -0,0 +1,108 @@
+# KindLM
+
+
+
+Behavioral regression testing for AI agents. Test what your agents **do** — not just what they say.
+
+## Why KindLM?
+
+LLM evals measure text quality. KindLM tests **behavior** — the tool calls your agent makes, the decisions it takes, and whether it leaks PII or violates compliance rules. It runs in CI so regressions never ship.
+
+## Features
+
+- **Tool call assertions** — verify agents call the right tools with the right arguments, in the right order
+- **Schema validation** — structured output checked against JSON Schema (AJV)
+- **PII detection** — catch leaked SSNs, credit cards, emails, phone numbers, IBANs
+- **LLM-as-judge** — score responses against natural-language criteria (0.0–1.0)
+- **Drift detection** — semantic + field-level comparison against saved baselines
+- **Keyword guards** — require or forbid specific phrases in output
+- **Latency & cost budgets** — fail tests that exceed time or token-cost thresholds
+- **EU AI Act compliance** — generate Annex IV documentation from test results
+- **CI-native** — exit code 0/1, JUnit XML reporter, GitHub Actions ready
+
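Annotation (not part of the package): the "tool call assertions" feature in the README hunk above could work roughly as in the minimal sketch below. The `ToolCall` type and the `calledInOrder` helper are hypothetical names, and treating "in the right order" as a subsequence match is an assumption rather than KindLM's documented behavior.

```typescript
// Hypothetical shape of a recorded tool call; not KindLM's actual type.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Returns true when every name in `expected` appears in `calls` in the
// given order. Unrelated calls may be interleaved (subsequence match,
// which is an assumption made for this sketch).
function calledInOrder(calls: ToolCall[], expected: string[]): boolean {
  let next = 0;
  for (const call of calls) {
    if (next < expected.length && call.name === expected[next]) {
      next += 1;
    }
  }
  return next === expected.length;
}

// Illustrative trace for a refund conversation.
const trace: ToolCall[] = [
  { name: "lookup_order", args: { orderId: "12345" } },
  { name: "check_policy", args: {} },
  { name: "issue_refund", args: { orderId: "12345" } },
];
```

With this trace, `calledInOrder(trace, ["lookup_order", "issue_refund"])` holds, while the reversed expectation does not.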
+## Supported Providers
+
+| Provider | Example config |
+|----------|---------------|
+| OpenAI | `openai:gpt-4o` |
+| Anthropic | `anthropic:claude-sonnet-4-5-20250929` |
+| Ollama | `ollama:llama3` |
+| Google Gemini | `google:gemini-2.0-flash` |
+| AWS Bedrock | `bedrock:anthropic.claude-sonnet-4-5-20250929-v1:0` |
+| Azure OpenAI | `azure:my-gpt4o-deployment` |
+
+## Quick Start
+
+```bash
+npm install -g @kindlm/cli
+kindlm init
+```
+
+Edit the generated `kindlm.yaml`:
+
+```yaml
+version: "1"
+defaults:
+  provider: openai:gpt-4o
+  temperature: 0
+  runs: 3
+
+suites:
+  - name: refund-agent
+    system_prompt: "You are a refund support agent."
+    tests:
+      - name: looks-up-order
+        input: "I want to return order #12345"
+        assert:
+          - type: tool_called
+            value: lookup_order
+          - type: no_pii
+          - type: judge
+            criteria: "Response is empathetic and professional"
+            threshold: 0.8
+```
+
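Annotation (not part of the package): the `no_pii` assertion in the config above is not explained in this diff. One plausible approach, sketched here under stated assumptions, combines simple patterns with a Luhn checksum for card numbers. The regular expressions are deliberately simplified, and `findPii` and `passesLuhn` are hypothetical names, not KindLM APIs.

```typescript
// Simplified illustrative patterns; not KindLM's own detectors.
const SSN = /\b\d{3}-\d{2}-\d{4}\b/;
const EMAIL = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/;
const CARD_CANDIDATE = /\b(?:\d[ -]?){13,19}\b/g;

// Luhn checksum: doubles every second digit counting from the right.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // ASCII digit to number
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return digits.length > 0 && sum % 10 === 0;
}

// Returns the kinds of PII found in `text`; an empty array means clean.
function findPii(text: string): string[] {
  const found: string[] = [];
  if (SSN.test(text)) found.push("ssn");
  if (EMAIL.test(text)) found.push("email");
  const candidates: string[] = text.match(CARD_CANDIDATE) ?? [];
  // Only digit runs that pass the checksum count as card numbers.
  if (candidates.some((c) => passesLuhn(c.replace(/\D/g, "")))) {
    found.push("credit_card");
  }
  return found;
}
```

The checksum step is there to cut false positives: an order number or timestamp may look like a card number but rarely passes Luhn.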
+Run your tests:
+
+```bash
+kindlm test
+```
+
+Output:
+
+```
+refund-agent
+  ✓ looks-up-order (3/3 runs passed)
+    ✓ tool_called: lookup_order
+    ✓ no_pii
+    ✓ judge: 0.92 ≥ 0.8
+
+1 suite, 1 test, 3 assertions — all passed
+```
+
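Annotation (not part of the package): the sample output above reports `3/3 runs passed` and a judge score compared against the `0.8` threshold. The diff does not say how repeated runs are aggregated, so the sketch below assumes that a run passes only when every assertion holds and that a test passes only when every run passes. `RunResult` and `summarize` are hypothetical names.

```typescript
// Hypothetical per-run result; field names are illustrative only.
interface RunResult {
  toolCalled: boolean;
  piiFree: boolean;
  judgeScore: number; // expected range 0.0 to 1.0
}

// Summarizes repeated runs of one test under the assumptions stated
// in the lead-in. `meanJudge` is NaN when `runs` is empty.
function summarize(runs: RunResult[], threshold: number) {
  const passedRuns = runs.filter(
    (r) => r.toolCalled && r.piiFree && r.judgeScore >= threshold
  ).length;
  const meanJudge =
    runs.reduce((sum, r) => sum + r.judgeScore, 0) / runs.length;
  return {
    passed: runs.length > 0 && passedRuns === runs.length,
    label: `${passedRuns}/${runs.length} runs passed`,
    meanJudge,
  };
}

// Three illustrative runs matching the `runs: 3` default in the config.
const sampleRuns: RunResult[] = [
  { toolCalled: true, piiFree: true, judgeScore: 0.9 },
  { toolCalled: true, piiFree: true, judgeScore: 0.92 },
  { toolCalled: true, piiFree: true, judgeScore: 0.94 },
];
```

Requiring every run to pass is the strictest choice; a pass-rate threshold would be an equally reasonable design for flaky models.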
+## CI Integration
+
+```yaml
+# .github/workflows/test.yml
+- run: npm install -g @kindlm/cli
+- run: kindlm test --reporter junit --output results.xml
+```
+
+## Repository Layout
+
+```
+packages/
+  core/    @kindlm/core — Business logic, zero I/O dependencies
+  cli/     @kindlm/cli — CLI entry point
+  cloud/   @kindlm/cloud — Cloudflare Workers API + D1 database
+docs/      Technical specs and documentation
+site/      Documentation website (Next.js)
+```
+
+## Documentation
+
+Full docs: [kindlm.dev](https://kindlm.dev) | Source: [`docs/`](./docs/)
+
+## License
+
+MIT (core + CLI) | AGPL (cloud)
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@kindlm/cli",
-  "version": "0.4.
+  "version": "0.4.1",
   "type": "module",
   "license": "MIT",
   "description": "CLI for KindLM — behavioral regression testing for AI agents",
@@ -38,7 +38,8 @@
   "module": "./dist/index.js",
   "types": "./dist/index.d.ts",
   "files": [
-    "dist"
+    "dist",
+    "README.md"
   ],
   "scripts": {
     "build": "tsup",