safestar 1.0.0 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/package.json +1 -1
  2. package/readme.md +159 -0
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "safestar",
-   "version": "1.0.0",
+   "version": "1.1.0",
    "description": "Snapshot, version, and diff AI behavior over time.",
    "main": "dist/index.js",
    "bin": {
package/readme.md ADDED
@@ -0,0 +1,159 @@
+ <h1 align="center">SafeStar</h1>
+ <p align="center"><strong>The "Git" for AI Behavior.</strong></p>
+ <p align="center">Snapshot, version, and diff AI model outputs. Detect drift before your users do.</p>
+
+ <p align="center">
+   <a href="https://www.npmjs.com/package/safestar"><img src="https://img.shields.io/npm/v/safestar.svg" alt="npm version"></a>
+   <a href="https://github.com/your-username/safestar/actions"><img src="https://github.com/your-username/safestar/workflows/AI%20Guardrails/badge.svg" alt="Build Status"></a>
+   <a href="https://opensource.org/licenses/ISC"><img src="https://img.shields.io/badge/License-ISC-blue.svg" alt="License: ISC"></a>
+ </p>
+
+ ---
+
+ ## Why SafeStar?
+
+ You updated a prompt. Tests pass. You deploy. Three days later, users complain the bot is "acting weird."
+
+ **The problem:** Traditional tests don't catch AI behavior drift, the subtle changes in tone, verbosity, or consistency that emerge over time or after model updates.
+
+ **SafeStar fixes this** by treating AI outputs like code:
+ - 📸 **Snapshot** a known-good baseline
+ - 🔍 **Diff** against it in CI/CD
+ - 🚨 **Fail the build** if behavior drifts beyond tolerance
+
+ No SaaS. No external dependencies. Works with any CLI command.
+
+ ---
+
+ ## Installation
+
+ ```bash
+ npm install --save-dev safestar
+ ```
+
+ ---
+
+ ## Quick Start
+
+ ### 1. Define a Scenario
+
+ Create `scenarios/refund.yaml`:
+
+ ```yaml
+ name: refund_bot_test
+ description: Ensure the refund bot doesn't hallucinate or get rude.
+
+ prompt: "I want a refund immediately."
+
+ # Run your AI however you want: Python, Node, curl, anything
+ exec: "python3 scripts/my_agent.py"
+
+ # Test multiple times to catch variance
+ runs: 5
+
+ # Heuristic guardrails
+ checks:
+   max_length: 200
+   must_contain:
+     - "refund"
+   must_not_contain:
+     - "I am just an AI"
+ ```
+
+ > **Note:** SafeStar passes the prompt via `process.env.PROMPT` (or equivalent in your language).
+
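+ For reference, a minimal `scripts/my_agent.py` for the scenario above might look like the sketch below. The canned reply is a placeholder for your real model call; the contract it follows (prompt in via the `PROMPT` environment variable, reply out via stdout) is the one described in the note above:
+
+ ```python
+ import os
+
+ # SafeStar injects the scenario prompt via the PROMPT environment variable.
+ prompt = os.environ.get("PROMPT", "")
+
+ # Call your real model here; this stub just returns a canned reply.
+ reply = f"Happy to help with that refund. You said: {prompt}"
+
+ # SafeStar captures whatever the command prints to stdout.
+ print(reply)
+ ```
+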
+ ### 2. Run & Baseline
+
+ Run your scenario:
+
+ ```bash
+ npx safestar run scenarios/refund.yaml
+ ```
+
+ Happy with the output? Lock it in as your gold standard:
+
+ ```bash
+ npx safestar baseline refund_bot_test
+ ```
+
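+ Baselines are plain `.json` files (see Philosophy below), so commit them and CI will diff against the same gold standard everywhere. The path here is only illustrative; commit whatever file `safestar baseline` actually writes:
+
+ ```bash
+ # Hypothetical path: add whichever baseline file `safestar baseline` created.
+ git add baselines/refund_bot_test.json
+ git commit -m "Lock refund bot baseline"
+ ```
+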
+ ### 3. Diff in CI/CD
+
+ ```bash
+ npx safestar diff scenarios/refund.yaml
+ ```
+
+ **Example output:**
+
+ ```
+ --- SAFESTAR REPORT ---
+ Status: FAIL
+
+ Metrics:
+   Avg Length: 45 chars
+   Drift: +210% vs baseline (WARNING)
+   Variance: 9.8 (High instability)
+
+ Violations:
+   - must_not_contain "I am just an AI": failed in 2 runs
+ ```
+
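+ For the diff to fail a build, `safestar diff` presumably exits nonzero on a FAIL (the GitHub Actions recipe below relies on the same behavior). Assuming so, you can gate any shell step on it:
+
+ ```bash
+ # `./deploy.sh` is a placeholder for whatever should only run when behavior is stable.
+ npx safestar diff scenarios/refund.yaml && ./deploy.sh
+ ```
+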
+ ---
+
+ ## Checks Reference
+
+ | Check | Description |
+ |-------|-------------|
+ | `max_length` | Fail if output exceeds N characters |
+ | `must_contain` | Fail if any of the listed strings is missing from the output |
+ | `must_not_contain` | Fail if any of the listed strings is found in the output |
+
+ ---
+
+ ## `exec` Examples
+
+ SafeStar works with anything that prints to stdout:
+
+ ```yaml
+ # Python
+ exec: "python3 bot.py"
+
+ # Node.js
+ exec: "node agent.js"
+
+ # cURL (test an API directly; single-quote the YAML so $OPENAI_KEY and $PROMPT still expand in the shell)
+ exec: 'curl -s https://api.openai.com/v1/chat/completions -H "Authorization: Bearer $OPENAI_KEY" -H "Content-Type: application/json" -d "{\"model\":\"gpt-4\",\"messages\":[{\"role\":\"user\",\"content\":\"$PROMPT\"}]}"'
+
+ # Any CLI
+ exec: "./my-binary --prompt \"$PROMPT\""
+ ```
+
+ ---
+
+ ## GitHub Actions
+
+ ```yaml
+ name: AI Guardrails
+ on: [push]
+
+ jobs:
+   test:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - uses: actions/setup-node@v4
+         with:
+           node-version: 20
+       - run: npm ci
+       - run: npx safestar diff scenarios/refund.yaml
+ ```
+
+ ---
+
+ ## Philosophy
+
+ - **Zero dependencies** – Runs anywhere Node runs
+ - **No SaaS** – Your data stays on your machine
+ - **Language agnostic** – If it prints to stdout, SafeStar can test it
+ - **Git-native** – Baselines are `.json` files you commit
+
+ ---
+
+ ## License
+
+ ISC