safestar 1.0.0 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/readme.md +159 -0
package/package.json
CHANGED
package/readme.md
ADDED
<p align="center">
<h1 align="center">SafeStar</h1>
<p align="center"><strong>The "Git" for AI Behavior.</strong></p>
<p align="center">Snapshot, version, and diff AI model outputs. Detect drift before your users do.</p>
</p>

<p align="center">
<a href="https://www.npmjs.com/package/safestar"><img src="https://img.shields.io/npm/v/safestar.svg" alt="npm version"></a>
<a href="https://github.com/your-username/safestar/actions"><img src="https://github.com/your-username/safestar/workflows/AI%20Guardrails/badge.svg" alt="Build Status"></a>
<a href="https://opensource.org/licenses/ISC"><img src="https://img.shields.io/badge/License-ISC-blue.svg" alt="License: ISC"></a>
</p>

---

## Why SafeStar?

You updated a prompt. Tests pass. You deploy. Three days later, users complain the bot is "acting weird."

**The problem:** Traditional tests don't catch AI behavior drift, meaning subtle changes in tone, verbosity, or consistency that emerge over time or after model updates.

**SafeStar fixes this** by treating AI outputs like code:

- 📸 **Snapshot** a known-good baseline
- 🔍 **Diff** against it in CI/CD
- 🚨 **Fail the build** if behavior drifts beyond tolerance

No SaaS. No external dependencies. Works with any CLI command.

---

## Installation

```bash
npm install --save-dev safestar
```

---

## Quick Start

### 1. Define a Scenario

Create `scenarios/refund.yaml`:

```yaml
name: refund_bot_test
description: Ensure the refund bot doesn't hallucinate or get rude.

prompt: "I want a refund immediately."

# Run your AI however you want: Python, Node, curl, anything
exec: "python3 scripts/my_agent.py"

# Test multiple times to catch variance
runs: 5

# Heuristic guardrails
checks:
  max_length: 200
  must_contain:
    - "refund"
  must_not_contain:
    - "I am just an AI"
```

> **Note:** SafeStar passes the prompt to your command in the `PROMPT` environment variable: `process.env.PROMPT` in Node.js, `os.environ["PROMPT"]` in Python, `$PROMPT` in a shell command.
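
The scenario calls `scripts/my_agent.py`, but the README does not show that script. Below is a minimal, hypothetical sketch of what it could look like. It assumes only what the note says: the prompt arrives in the `PROMPT` environment variable, and the reply is printed to stdout. `build_reply` is an illustrative placeholder for a real model call.

```python
"""Hypothetical stand-in for scripts/my_agent.py.

Reads the prompt from the PROMPT environment variable and prints a reply
to stdout. build_reply() is a placeholder for a real model call.
"""
import os


def build_reply(prompt: str) -> str:
    # Placeholder logic only: a real agent would call an LLM API here.
    if "refund" in prompt.lower():
        return "I can help with that. Your refund request has been logged."
    return "Could you tell me a bit more about what you need?"


def main() -> None:
    # Assumed contract: prompt in via PROMPT, reply out on stdout.
    # Send any logging to stderr so it does not mix with the checked reply.
    prompt = os.environ.get("PROMPT", "")
    print(build_reply(prompt))


if __name__ == "__main__":
    main()
```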

### 2. Run & Baseline

Run your scenario:
```bash
npx safestar run scenarios/refund.yaml
```

Happy with the output? Lock it in as your gold standard. Note that `baseline` takes the scenario's `name` (`refund_bot_test`), not the file path:
```bash
npx safestar baseline refund_bot_test
```
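
The Philosophy section says baselines are `.json` files you commit. This section does not document SafeStar's actual file location or schema, so the sketch below only illustrates the idea. It records the outputs of a known-good run plus a summary statistic. The `<name>.json` layout and the field names are hypothetical.

```python
"""Sketch: store known-good outputs as a JSON baseline (hypothetical format)."""
import json
import statistics
from pathlib import Path


def make_baseline(name: str, outputs: list[str]) -> dict:
    # Keep the raw outputs plus a summary statistic to compare against later.
    lengths = [len(o) for o in outputs]
    return {
        "name": name,
        "runs": len(outputs),
        "avg_length": statistics.mean(lengths),
        "outputs": outputs,
    }


def save_baseline(baseline: dict, directory: Path) -> Path:
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{baseline['name']}.json"
    # Sorted keys and fixed indentation keep git diffs of this file readable.
    path.write_text(json.dumps(baseline, indent=2, sort_keys=True) + "\n")
    return path


def load_baseline(name: str, directory: Path) -> dict:
    return json.loads((directory / f"{name}.json").read_text())
```

Writing the JSON deterministically (sorted keys, fixed indentation) means that re-baselining a scenario produces a small, reviewable `git diff`.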

### 3. Diff in CI/CD

```bash
npx safestar diff scenarios/refund.yaml
```

**Example output:**
```
--- SAFESTAR REPORT ---
Status: FAIL

Metrics:
Avg Length: 45 chars
Drift: +210% vs baseline (WARNING)
Variance: 9.8 (High instability)

Violations:
- must_not_contain "sorry sorry": failed in 2 runs
```
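
The report does not spell out how its metrics are computed. As an illustration only, here is one plausible way to derive length-based metrics like those shown:

- mean output length
- percent change of that mean versus the baseline
- population variance of lengths across runs

These formulas and the `drift_tolerance_pct` threshold are assumptions, not SafeStar's implementation.

```python
"""Sketch: length-based drift metrics (assumed formulas, not SafeStar's)."""
import statistics


def length_metrics(outputs: list[str], baseline_avg: float) -> dict:
    if baseline_avg <= 0:
        raise ValueError("baseline_avg must be positive")
    lengths = [len(o) for o in outputs]
    avg = statistics.mean(lengths)
    # Percent change of the mean length relative to the baseline mean.
    drift_pct = (avg - baseline_avg) / baseline_avg * 100
    # Population variance of lengths across runs: a high value means the
    # same prompt yields answers of very different sizes.
    variance = statistics.pvariance(lengths)
    return {"avg_length": avg, "drift_pct": drift_pct, "variance": variance}


def exceeds_tolerance(metrics: dict, drift_tolerance_pct: float) -> bool:
    # drift_tolerance_pct is a hypothetical knob, not a documented option.
    return abs(metrics["drift_pct"]) > drift_tolerance_pct
```

In CI, mapping a `True` result from `exceeds_tolerance` to a non-zero exit code is what makes the workflow step, and therefore the build, fail.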

---

## Checks Reference

| Check | Description |
|-------|-------------|
| `max_length` | Fail if the output is longer than N characters |
| `must_contain` | Fail if any listed string is missing from the output |
| `must_not_contain` | Fail if any listed string appears in the output |
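
The three checks are simple string heuristics. Below is a minimal sketch of how they could be evaluated for each run and tallied across runs, matching the "failed in N runs" wording of the example report. The function names are hypothetical. The matching rule (plain, case-sensitive substring tests) is also an assumption.

```python
"""Sketch: evaluate the documented checks per run and count failures."""


def check_output(output: str, checks: dict) -> list[str]:
    violations = []
    max_length = checks.get("max_length")
    if max_length is not None and len(output) > max_length:
        violations.append(f"max_length {max_length}: exceeded")
    # Plain, case-sensitive substring matching is an assumption here.
    for needle in checks.get("must_contain", []):
        if needle not in output:
            violations.append(f'must_contain "{needle}": missing')
    for needle in checks.get("must_not_contain", []):
        if needle in output:
            violations.append(f'must_not_contain "{needle}": found')
    return violations


def summarize(outputs: list[str], checks: dict) -> dict[str, int]:
    # Count in how many runs each violation occurred.
    counts: dict[str, int] = {}
    for output in outputs:
        for violation in check_output(output, checks):
            counts[violation] = counts.get(violation, 0) + 1
    return counts
```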

---

## `exec` Examples

SafeStar works with anything that prints to stdout:

```yaml
# Python
exec: "python3 bot.py"

# Node.js
exec: "node agent.js"

# cURL (test an API directly; prints the raw JSON response)
# Single-quoted YAML keeps the backslashes; the shell's double quotes expand $OPENAI_KEY and $PROMPT
# ($PROMPT is inserted unescaped, so quotes inside the prompt would break the JSON body)
exec: 'curl -s https://api.openai.com/v1/chat/completions -H "Authorization: Bearer $OPENAI_KEY" -H "Content-Type: application/json" -d "{\"model\":\"gpt-4\",\"messages\":[{\"role\":\"user\",\"content\":\"$PROMPT\"}]}"'

# Any CLI
exec: "./my-binary --prompt \"$PROMPT\""
```
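
The `$PROMPT` references above suggest two things: `exec` is run through a shell with `PROMPT` set in its environment, and only stdout is taken as the output. The hypothetical `run_exec` helper below sketches that mechanism in Python under those assumptions. It is not SafeStar's (Node.js) code.

```python
"""Sketch: run an exec command N times with PROMPT in its environment."""
import os
import subprocess


def run_exec(command: str, prompt: str, runs: int = 1, timeout: float = 60) -> list[str]:
    env = {**os.environ, "PROMPT": prompt}
    outputs = []
    for _ in range(runs):
        # shell=True lets "$PROMPT" in the command string expand, as in the
        # YAML examples; only stdout is kept as the output to check.
        result = subprocess.run(
            command,
            shell=True,
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,  # a failing command raises instead of yielding output
        )
        # Stripping surrounding whitespace is an assumption of this sketch.
        outputs.append(result.stdout.strip())
    return outputs
```

Using `check=True` treats a crashing command as an error rather than as an empty answer. Whether SafeStar does the same is not documented here.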

---

## GitHub Actions

```yaml
name: AI Guardrails
on: [push]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx safestar diff scenarios/refund.yaml
```

---

## Philosophy

- **Zero dependencies** – Runs anywhere Node runs
- **No SaaS** – Your data stays on your machine
- **Language agnostic** – If it prints to stdout, SafeStar can test it
- **Git-native** – Baselines are `.json` files you commit

---

## License

ISC