safestar 1.0.0 → 1.2.0

Files changed (2)
  1. package/package.json +1 -1
  2. package/readme.md +165 -0
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "safestar",
- "version": "1.0.0",
+ "version": "1.2.0",
  "description": "Snapshot, version, and diff AI behavior over time.",
  "main": "dist/index.js",
  "bin": {
package/readme.md ADDED
@@ -0,0 +1,165 @@
+ <p align="center">
+ <h1 align="center">SafeStar</h1>
+ <p align="center"><strong>The "Git" for AI Behavior.</strong></p>
+ <p align="center">Snapshot, version, and diff AI model outputs. Detect drift before your users do.</p>
+ </p>
+
+ <p align="center">
+ <a href="https://www.npmjs.com/package/safestar"><img src="https://img.shields.io/npm/v/safestar.svg" alt="npm version"></a>
+ <a href="https://github.com/your-username/safestar/actions"><img src="https://github.com/your-username/safestar/workflows/AI%20Guardrails/badge.svg" alt="Build Status"></a>
+ <a href="https://opensource.org/licenses/ISC"><img src="https://img.shields.io/badge/License-ISC-blue.svg" alt="License: ISC"></a>
+ </p>
+
+ ---
+
+ ## Why SafeStar?
+ <img src="https://github.com/user-attachments/assets/62a87419-ddaa-4fc0-bb50-90e2c5bf2dfd">
+
+
+ You updated a prompt. Tests pass. You deploy. Three days later, users complain the bot is "acting weird."
+
+ **The problem:** Traditional tests don't catch AI behavior drift: subtle changes in tone, verbosity, or consistency that emerge over time or after model updates.
+
+ **SafeStar fixes this** by treating AI outputs like code:
+ - 📸 **Snapshot** a known-good baseline
+ - 🔍 **Diff** against it in CI/CD
+ - 🚨 **Fail the build** if behavior drifts beyond tolerance
+
+ No SaaS. No external dependencies. Works with any CLI command.
+ <img src="https://github.com/user-attachments/assets/f3a14fec-93f6-42c3-b8e2-35af6b701b5f">
+
+ ---
+
+ ## Installation
+
+ ```bash
+ npm install --save-dev safestar
+ ```
+
+ ---
+
+ ## Quick Start
+
+ ### 1. Define a Scenario
+
+ Create `scenarios/refund.yaml`:
+
+ ```yaml
+ name: refund_bot_test
+ description: Ensure the refund bot doesn't hallucinate or get rude.
+
+ prompt: "I want a refund immediately."
+
+ # Run your AI however you want: Python, Node, curl, anything
+ exec: "python3 scripts/my_agent.py"
+
+ # Test multiple times to catch variance
+ runs: 5
+
+ # Heuristic guardrails
+ checks:
+   max_length: 200
+   must_contain:
+     - "refund"
+   must_not_contain:
+     - "I am just an AI"
+ ```
+
+ > **Note:** SafeStar passes the prompt to your command in the `PROMPT` environment variable (`process.env.PROMPT` in Node, or the equivalent in your language).
+ <img src="https://github.com/user-attachments/assets/8ccd918e-4331-4f2c-981e-c7c9535865f0">
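
As a rough illustration of the note above, a script such as `scripts/my_agent.py` might read the prompt from the `PROMPT` environment variable and print its reply to stdout. This is only a sketch: `build_reply` is a hypothetical stand-in for a real model call, and the canned replies are invented for the example.

```python
import os


def build_reply(prompt: str) -> str:
    """Hypothetical stand-in for a model call; returns a canned reply."""
    if "refund" in prompt.lower():
        return "Your refund request has been received and is being processed."
    return "How can I help you today?"


def main() -> None:
    # The README states that the prompt arrives in the PROMPT environment variable.
    prompt = os.environ.get("PROMPT", "")
    # Whatever is printed to stdout is what gets checked and snapshotted.
    print(build_reply(prompt))


if __name__ == "__main__":
    main()
```

Keeping the reply logic in its own function makes it easy to swap in a real API call later without touching the environment-variable handling.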
+
+ ### 2. Run & Baseline
+
+ Run your scenario:
+ ```bash
+ npx safestar run scenarios/refund.yaml
+ ```
+
+ Happy with the output? Lock it as your gold standard:
+ ```bash
+ npx safestar baseline refund_bot_test
+ ```
+ <img src="https://github.com/user-attachments/assets/2021ff79-e094-470e-be19-8490d8d3ae6b">
+
+ ### 3. Diff in CI/CD
+
+ ```bash
+ npx safestar diff scenarios/refund.yaml
+ ```
+
+ **Example output:**
+ ```
+ --- SAFESTAR REPORT ---
+ Status: FAIL
+
+ Metrics:
+ Avg Length: 45 chars
+ Drift: +210% vs baseline (WARNING)
+ Variance: 9.8 (High instability)
+
+ Violations:
+ - must_not_contain "sorry sorry": failed in 2 runs
+ ```
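
The README does not say how the `Avg Length`, `Drift`, and `Variance` figures in the report are computed. One plausible reading, sketched here purely as an assumption, is that drift is the percentage change in average output length against the baseline and variance is the population variance of output lengths across runs. The function name `length_metrics` and its return keys are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List


def length_metrics(outputs: List[str], baseline_avg_length: float) -> Dict[str, float]:
    """Length-based metrics under the assumptions stated above (not SafeStar's documented formula)."""
    if baseline_avg_length <= 0:
        raise ValueError("baseline_avg_length must be positive")
    lengths = [len(text) for text in outputs]
    avg = mean(lengths)
    return {
        "avg_length": avg,
        # Percentage change relative to the baseline average length.
        "drift_pct": (avg - baseline_avg_length) / baseline_avg_length * 100,
        # Spread of lengths across runs; higher values suggest unstable output.
        "variance": pvariance(lengths),
    }
```

For example, two runs of 40 and 50 characters against a baseline average of 30 characters would give an average of 45 and a drift of +50% under this reading.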
+
+ ---
+ <img src="https://github.com/user-attachments/assets/c2c6bbe6-0971-43d3-9845-0a5f0bdd0092">
+
+ ## Checks Reference
+
+ | Check | Description |
+ |-------|-------------|
+ | `max_length` | Fail if output exceeds N characters |
+ | `must_contain` | Fail if any string is missing from output |
+ | `must_not_contain` | Fail if any string is found in output |
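
The three checks in the table can be pictured as simple string tests. The sketch below assumes case-sensitive substring matching and character counts via `len`; the README does not confirm either detail, and `run_checks` is a hypothetical helper, not part of SafeStar's API.

```python
from typing import Any, Dict, List


def run_checks(output: str, checks: Dict[str, Any]) -> List[str]:
    """Return a list of violation messages; an empty list means the output passed."""
    violations: List[str] = []

    max_length = checks.get("max_length")
    if max_length is not None and len(output) > max_length:
        violations.append(f"max_length {max_length}: output has {len(output)} chars")

    # Every listed string must appear somewhere in the output.
    for needle in checks.get("must_contain", []):
        if needle not in output:
            violations.append(f'must_contain "{needle}": missing')

    # None of the listed strings may appear in the output.
    for needle in checks.get("must_not_contain", []):
        if needle in output:
            violations.append(f'must_not_contain "{needle}": found')

    return violations
```

Applied to the `checks` block from the refund scenario, an output such as "Your refund is on its way." would pass all three checks under these assumptions.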
+
+ ---
+
+ ## `exec` Examples
+
+ SafeStar works with anything that prints to stdout:
+
+ ```yaml
+ # Python
+ exec: "python3 bot.py"
+
+ # Node.js
+ exec: "node agent.js"
+
+ # cURL (test an API directly)
+ exec: 'curl -s https://api.openai.com/v1/chat/completions -H "Authorization: Bearer $OPENAI_KEY" -H "Content-Type: application/json" -d "{\"model\":\"gpt-4\",\"messages\":[{\"role\":\"user\",\"content\":\"$PROMPT\"}]}"'
+
+ # Any CLI
+ exec: "./my-binary --prompt \"$PROMPT\""
+ ```
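
To make the "anything that prints to stdout" contract concrete, here is a minimal sketch of how a runner could invoke an `exec` command: set `PROMPT` in the child environment, run the command through a shell, and capture stdout once per run. SafeStar itself is a Node package and its internals are not shown in this diff, so `run_scenario` and its parameters are hypothetical.

```python
import os
import subprocess
import sys
from typing import List


def run_scenario(command: str, prompt: str, runs: int = 1, timeout: float = 60.0) -> List[str]:
    """Run `command` `runs` times with PROMPT set and return each captured stdout."""
    env = {**os.environ, "PROMPT": prompt}
    outputs: List[str] = []
    for _ in range(runs):
        result = subprocess.run(
            command,
            shell=True,           # lets the command reference $PROMPT as in the examples above
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,           # raise if the command exits with a nonzero status
        )
        outputs.append(result.stdout.strip())
    return outputs
```

Running the command several times, as the scenario's `runs` setting suggests, is what makes it possible to measure variance between outputs rather than judging a single sample.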
+
+ ---
+
+ ## GitHub Actions
+
+ ```yaml
+ name: AI Guardrails
+ on: [push]
+
+ jobs:
+   test:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - run: npm ci
+       - run: npx safestar diff scenarios/refund.yaml
+ ```
+
+ ---
+
+ ## Philosophy
+
+ - **Zero dependencies** – Runs anywhere Node runs
+ - **No SaaS** – Your data stays on your machine
+ - **Language agnostic** – If it prints to stdout, SafeStar can test it
+ - **Git-native** – Baselines are `.json` files you commit
+
+ ---
+
+ ## License
+
+ ISC