@beingmartinbmc/ojas 0.2.0 → 0.4.0
- package/README.md +327 -167
- package/dist/cli/index.d.ts +23 -0
- package/dist/cli/index.d.ts.map +1 -0
- package/dist/cli/index.js +240 -0
- package/dist/cli/index.js.map +1 -0
- package/dist/index.d.ts +4 -0
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +11 -1
- package/dist/index.js.map +1 -1
- package/dist/scorecard/badge.d.ts +46 -0
- package/dist/scorecard/badge.d.ts.map +1 -0
- package/dist/scorecard/badge.js +95 -0
- package/dist/scorecard/badge.js.map +1 -0
- package/dist/scorecard/index.d.ts +88 -0
- package/dist/scorecard/index.d.ts.map +1 -0
- package/dist/scorecard/index.js +186 -0
- package/dist/scorecard/index.js.map +1 -0
- package/docs/BACKLOG.md +2 -2
- package/docs/EVIDENCE.md +152 -10
- package/docs/EVIDENCE_MATRIX.md +97 -8
- package/docs/MCP.md +2 -2
- package/package.json +4 -2
package/README.md
CHANGED
|
@@ -6,8 +6,8 @@
|
|
|
6
6
|
|
|
7
7
|
**AI Health Infrastructure for Autonomous Agents**
|
|
8
8
|
|
|
9
|
-
[](https://github.com/beingmartinbmc/ojas/actions/workflows/ci.yml)
|
|
10
|
+
[](https://www.npmjs.com/package/@beingmartinbmc/ojas)
|
|
11
11
|
[](#operations)
|
|
12
12
|
[](docs/MCP.md)
|
|
13
13
|
[](#quickstart)
|
|
@@ -16,59 +16,34 @@
|
|
|
16
16
|
|
|
17
17
|
</div>
|
|
18
18
|
|
|
19
|
-
|
|
20
|
-
|
|
21
|
-
Traditional observability tells you whether software is running. Ojas tries to tell you whether an agent is still *cognitively healthy enough to continue operating* — and is honest about where that signal is strong vs. where it is heuristic.
|
|
22
|
-
|
|
23
|
-
It introduces a new infrastructure category: **AI Health Systems**.
|
|
24
|
-
|
|
25
|
-
Deployment trust boundary, security posture, and evidence caveats live in [`docs/TRUST.md`](./docs/TRUST.md).
|
|
26
|
-
|
|
27
|
-
<a id="what-is-proven"></a>
|
|
28
|
-
### What is currently proven
|
|
29
|
-
|
|
30
|
-
Ojas v0.3 ships at **evidence level L2 / L2.5** — synthetic, reproducible
|
|
31
|
-
A/B benchmarks against controlled stand-in agents on canonical failure
|
|
32
|
-
modes. Each claim below has a repro command and a named limitation; the
|
|
33
|
-
full matrix lives in [`docs/EVIDENCE_MATRIX.md`](./docs/EVIDENCE_MATRIX.md),
|
|
34
|
-
and known failure modes in [`docs/KNOWN_FAILURES.md`](./docs/KNOWN_FAILURES.md).
|
|
35
|
-
|
|
36
|
-
| Claim | Value | Evidence | Repro |
|
|
37
|
-
|---|---:|---|---|
|
|
38
|
-
| Prompt-injection compliance reduction | 58% → 0% (−100%) | L2 / 33 attacks (incl. homoglyph, zero-width, full-width, letter-spaced, base64, policy-laundering variants) | `npm run benchmark` |
|
|
39
|
-
| Attacks quarantined by Raksha detector stack | **100%** (33/33) | L2 | `npm run benchmark` |
|
|
40
|
-
| Benign false-positive rate (30 controls across 5 categories) | **0%** — tolerance ≤ 5% | L2 | `npm run benchmark` |
|
|
41
|
-
| Health-score calibration: monotonic vs failure rate; ρ = −0.31 over 500 trials; score spans [0.31, 0.87]; isotonic Brier 0.230 → 0.219 | L2.5 diagnostic, not probability | L2.5 | `npm run benchmark` |
|
|
42
|
-
| Malicious memory writes committed | 6/6 → 1/6 (83% blocked) | L2 / 16 candidates | `npm run benchmark` |
|
|
43
|
-
| Wasted-token reduction (noisy retrieval) | −62% | L2 | `npm run benchmark` |
|
|
44
|
-
| Wasted-token reduction (heavy retrieval) | −95% | L2 | `npm run benchmark` |
|
|
45
|
-
| Tool-failure loop detection speedup | 10× faster | L2 / 3 scripted tools | `npm run benchmark` |
|
|
46
|
-
| Retrieval-QA task success rate (baseline → Ojas) | 35% → 95%, bootstrap 95 % CI across 5 seeds × 20 questions | **L2.5** | `npm run benchmark` |
|
|
47
|
-
| Retrieval-QA adversarial inclusion (lower is better) | 100% → 11%, same CI methodology | **L2.5** | `npm run benchmark` |
|
|
48
|
-
| Retrieval-QA relevant-doc recall preserved | 100% (no Aahar false positives in this run) | **L2.5** | `npm run benchmark` |
|
|
19
|
+
---
|
|
49
20
|
|
|
50
|
-
|
|
51
|
-
|
|
21
|
+
Ojas adds a continuous health layer to autonomous AI agents — context
|
|
22
|
+
hygiene, prompt-injection tripwires, drift detection, recovery diagnosis,
|
|
23
|
+
and stress probes.
|
|
52
24
|
|
|
53
|
-
|
|
54
|
-
|
|
55
|
-
|
|
25
|
+
Traditional observability tells you whether software is running.
|
|
26
|
+
Ojas tries to tell you whether an agent is still *cognitively healthy
|
|
27
|
+
enough to continue operating* — and is honest about where that signal
|
|
28
|
+
is strong vs. where it is heuristic.
|
|
56
29
|
|
|
57
|
-
|
|
30
|
+
It introduces a new infrastructure category: **AI Health Systems**.
|
|
58
31
|
|
|
59
32
|
---
|
|
60
33
|
|
|
61
34
|
<a id="demo"></a>
|
|
62
|
-
## Quick demo: one failure mode, before and after
|
|
63
35
|
|
|
64
|
-
|
|
36
|
+
## 30-second demo
|
|
37
|
+
|
|
38
|
+
A common agent failure: **noisy retrieval + prompt injection**.
|
|
39
|
+
The agent gets 8 retrieved documents — one hostile, most irrelevant.
|
|
40
|
+
Run the same task through a tiny deterministic agent twice, once raw and
|
|
41
|
+
once through `ojas.feed()`:
|
|
65
42
|
|
|
66
43
|
```bash
|
|
67
44
|
npm run demo:before-after
|
|
68
45
|
```
|
|
69
46
|
|
|
70
|
-
Example output:
|
|
71
|
-
|
|
72
47
|
```text
|
|
73
48
|
Task: What is the refund window for Pro plans?
|
|
74
49
|
Retrieved 8 docs (1 answer-bearing, 2 adjacent, 4 noisy, 1 adversarial).
|
|
@@ -86,151 +61,145 @@ Baseline answer:
|
|
|
86
61
|
|
|
87
62
|
Answer with Ojas:
|
|
88
63
|
Pro plans have a 14-day refund window from the purchase date (source: kb-policies).
|
|
89
|
-
|
|
90
|
-
Why Ojas changed the context (Pulse events):
|
|
91
|
-
• raksha/prompt_injection_quarantined severity=critical
|
|
92
|
-
• aahar/context_items_rejected severity=warning
|
|
93
64
|
```
|
|
94
65
|
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
|
|
98
|
-
|
|
99
|
-
|
|
100
|
-
4. **Did the final answer stay grounded?** → cites `kb-policies`, not a hallucinated source.
|
|
101
|
-
5. **Did Ojas explain itself?** → emitted Pulse events name *why* each item was removed (Raksha quarantine vs Aahar nutrition reject); the prompt is not silently rewritten.
|
|
66
|
+
1. Ojas removed the malicious document → `injection_included` flips `yes → no`.
|
|
67
|
+
2. It preserved the relevant policy doc → `result` flips `failed → passed`.
|
|
68
|
+
3. Token count dropped from 235 to 40.
|
|
69
|
+
4. The final answer stays grounded and cites `kb-policies`.
|
|
70
|
+
5. Pulse events explain *why* each item was removed — the prompt is not silently rewritten.
|
|
102
71
|
|
|
103
|
-
Source: [`examples/before-after.ts`](./examples/before-after.ts) — no
|
|
72
|
+
Source: [`examples/before-after.ts`](./examples/before-after.ts) — no
|
|
73
|
+
external deps, no API keys. Demo caveats: [`docs/TRUST.md`](./docs/TRUST.md).
|
|
104
74
|
|
|
105
75
|
---
|
|
106
76
|
|
|
107
|
-
<a id="
|
|
108
|
-
## Why Ojas Exists
|
|
109
|
-
|
|
110
|
-
Autonomous agents are no longer simple request–response systems. They plan, retrieve, remember, call tools, revise goals, and operate across long sessions.
|
|
77
|
+
<a id="canonical-pipeline"></a>
|
|
111
78
|
|
|
112
|
-
|
|
79
|
+
## Canonical Pipeline (12-Step Agent Health Loop)
|
|
113
80
|
|
|
114
|
-
-
|
|
115
|
-
- noisy retrieval pollutes reasoning
|
|
116
|
-
- memory stores stale or unsafe information
|
|
117
|
-
- tool failures create loops and retry storms
|
|
118
|
-
- long sessions cause drift and contradiction
|
|
119
|
-
- prompt injection manipulates agent behavior
|
|
120
|
-
- bigger context windows amplify noise instead of solving it
|
|
121
|
-
- production agents can degrade silently without obvious runtime errors
|
|
81
|
+
The full call order an Ojas-instrumented runtime should follow on every agent turn:
|
|
122
82
|
|
|
123
|
-
|
|
124
|
-
|
|
125
|
-
|
|
126
|
-
|
|
127
|
-
|
|
128
|
-
|
|
129
|
-
<a id="what"></a>
|
|
130
|
-
## What Ojas Does
|
|
131
|
-
|
|
132
|
-
Ojas wraps an agent runtime with a continuous health cycle:
|
|
133
|
-
|
|
134
|
-
1. **Cleans and ranks context** before the agent consumes it
|
|
135
|
-
2. **Scans for canonical and semantic prompt-injection patterns** and unsafe memory writes *(deterministic detector stack; see [known failures](./docs/KNOWN_FAILURES.md))*
|
|
136
|
-
3. **Tracks cognitive vital signs** during execution
|
|
137
|
-
4. **Measures token, latency, and tool-use efficiency**
|
|
138
|
-
5. **Detects drift, loops, instability, and degradation**
|
|
139
|
-
6. **Consolidates execution traces** into useful memory
|
|
140
|
-
7. **Stress-tests agents** against hostile or unstable conditions, with **AbortSignal cancellation** on timeout
|
|
141
|
-
8. **Diagnoses failures** and recommends recovery protocols
|
|
142
|
-
|
|
143
|
-
> Ojas helps agents think with cleaner inputs, recover from failure, and become more reliable over time.
|
|
144
|
-
|
|
145
|
-
---
|
|
146
|
-
|
|
147
|
-
## The Seven Modules
|
|
148
|
-
|
|
149
|
-
Seven specialised modules. One unified health score.
|
|
150
|
-
|
|
151
|
-
| Module | Role | Headline signals |
|
|
152
|
-
|---|---|---|
|
|
153
|
-
| 🥗 **[Aahar](docs/MODULES.md#aahar)** | Cognitive nutrition (context curation) | signal-to-noise, freshness, token efficiency |
|
|
154
|
-
| 😴 **[Nidra](docs/MODULES.md#nidra)** | Recovery & memory consolidation | drift score, processed-trace coverage |
|
|
155
|
-
| 💪 **[Vyayam](docs/MODULES.md#vyayam)** | Resilience & stress engineering | hallucination resistance under load, recovery time |
|
|
156
|
-
| 🛡️ **[Raksha](docs/MODULES.md#raksha)** | Immune defense: deterministic detector stack + async ML classifier plugins | threat resistance (residual risk after quarantine) |
|
|
157
|
-
| 🔥 **[Agni](docs/MODULES.md#agni)** | Cognitive metabolism | token efficiency, latency, tool economy, cost pressure |
|
|
158
|
-
| 📈 **[Pulse](docs/MODULES.md#pulse)** | Continuous health telemetry | structured events bus with per-module severity |
|
|
159
|
-
| 🩺 **[Chikitsa](docs/MODULES.md#chikitsa)** | Repair & rehabilitation | repair readiness, rollback safety, playbook coverage |
|
|
160
|
-
|
|
161
|
-
Each maps to an analogue of a human-health system — nutrition, sleep, exercise, immunity, metabolism, vital signs, and rehabilitation.
|
|
162
|
-
|
|
163
|
-
---
|
|
83
|
+
```
|
|
84
|
+
register → ingest traces → score/build context → scan for injection →
|
|
85
|
+
recommend model route → detect hallucination / distill → record outcome →
|
|
86
|
+
fitness gate → diagnose/recover if unhealthy → consolidate memory →
|
|
87
|
+
audit memory → handoff/report
|
|
88
|
+
```
|
|
164
89
|
|
|
165
|
-
|
|
90
|
+
Run the end-to-end demo:
|
|
166
91
|
|
|
167
|
-
|
|
92
|
+
```bash
|
|
93
|
+
npm run demo:canonical
|
|
94
|
+
```
|
|
168
95
|
|
|
169
|
-
|
|
|
170
|
-
|
|
171
|
-
|
|
|
172
|
-
|
|
|
173
|
-
|
|
|
174
|
-
|
|
|
175
|
-
|
|
|
176
|
-
|
|
|
177
|
-
|
|
|
178
|
-
|
|
|
179
|
-
|
|
|
180
|
-
|
|
|
96
|
+
| Step | Operation | Ojas API |
|
|
97
|
+
|------|-----------|----------|
|
|
98
|
+
| 1 | Register agent | `new Ojas(config)` + `ojas.bind(agent)` |
|
|
99
|
+
| 2 | Ingest traces | `ojas.recordTrace(trace)` |
|
|
100
|
+
| 3 | Score / build context | `ojas.feed(items, { query })` |
|
|
101
|
+
| 4 | Scan for injection | Raksha (runs inside `feed()`) |
|
|
102
|
+
| 5 | Recommend model route | `ConfidenceRoutingTable.recommend()` |
|
|
103
|
+
| 6 | Detect hallucination / distill | `raksha.detectHallucination()` + `createResponseDistiller()` |
|
|
104
|
+
| 7 | Record outcome | `chikitsa.recordTaskOutcome()` |
|
|
105
|
+
| 8 | Fitness gate | `ojas.healthCheck()` vs threshold |
|
|
106
|
+
| 9 | Diagnose / recover | `chikitsa.diagnose()` + `ojas.recover()` |
|
|
107
|
+
| 10 | Consolidate memory | `ojas.recover(true)` (Nidra) |
|
|
108
|
+
| 11 | Audit memory | `nidra.getMemories()` + `detectColdMemories()` |
|
|
109
|
+
| 12 | Handoff / report | `chikitsa.generateHandoff()` |
|
|
110
|
+
|
|
111
|
+
Source: [`examples/canonical-pipeline.ts`](./examples/canonical-pipeline.ts) — no external deps, no API keys.
|
|
181
112
|
|
|
182
113
|
---
|
|
183
114
|
|
|
184
115
|
<a id="quickstart"></a>
|
|
116
|
+
|
|
185
117
|
## Quick Start
|
|
186
118
|
|
|
187
|
-
|
|
119
|
+
### Install from npm
|
|
188
120
|
|
|
189
121
|
```bash
|
|
190
122
|
npm install @beingmartinbmc/ojas
|
|
191
123
|
```
|
|
192
124
|
|
|
193
|
-
|
|
125
|
+
### Or clone for development
|
|
194
126
|
|
|
195
127
|
```bash
|
|
128
|
+
git clone https://github.com/beingmartinbmc/ojas.git
|
|
129
|
+
cd ojas
|
|
196
130
|
npm install
|
|
197
131
|
npm run build
|
|
198
|
-
npm run demo # end-to-end walkthrough across all seven modules
|
|
199
|
-
npm run benchmark # A/B evidence harness
|
|
200
132
|
npm test # 595 tests across 33 suites
|
|
201
|
-
npm run
|
|
133
|
+
npm run benchmark # A/B evidence harness
|
|
202
134
|
```
|
|
203
135
|
|
|
204
|
-
|
|
136
|
+
---
|
|
137
|
+
|
|
138
|
+
## Quality Gates
|
|
139
|
+
|
|
140
|
+
```bash
|
|
141
|
+
npm run check # lint + build + typecheck + tests
|
|
142
|
+
npm run benchmark # deterministic A/B evidence harness (11 suites)
|
|
143
|
+
npm run verify:evidence # checks committed docs match latest benchmark run
|
|
144
|
+
```
|
|
145
|
+
|
|
146
|
+
CI runs all three on every push and PR.
|
|
147
|
+
|
|
148
|
+
---
|
|
149
|
+
|
|
150
|
+
<a id="benchmark-snapshot"></a>
|
|
151
|
+
|
|
152
|
+
## Latest Benchmark Snapshot
|
|
153
|
+
|
|
154
|
+
| Benchmark | Baseline | Ojas | Delta |
|
|
155
|
+
|---|---:|---:|---:|
|
|
156
|
+
| Retrieval-QA task success | 35% | 95% | **+60 pp** |
|
|
157
|
+
| Adversarial doc inclusion | 100% | 11% | **−89 pp** |
|
|
158
|
+
| Prompt-injection compliance (51 attacks) | 52.9% | 3.9% | **−92.6%** |
|
|
159
|
+
| Injection detection p99 latency | — | 1.43 ms | **L2** |
|
|
160
|
+
| Wasted tokens (heavy retrieval) | 12,680 | 680 | **−94.6%** |
|
|
161
|
+
| Malicious memory writes | 6/6 | 1/6 | **−83%** |
|
|
162
|
+
| Tool-failure detection speed | 20 calls | 2 calls | **10×** |
|
|
163
|
+
| Hallucination detection (fabricated) | 0% | 100% TPR | **L2** |
|
|
164
|
+
| Hallucination false-positive rate | — | 0% | **L2** |
|
|
165
|
+
| Model router fail-closed | — | 100% flagship on sparse | **L2** |
|
|
166
|
+
| Response distiller code-safe | — | 100% code blocks | **L2** |
|
|
167
|
+
| Distiller intensity monotonicity | — | lite ≤ full ≤ ultra | **L2** |
|
|
168
|
+
| MCP envelope compliance | — | 18/18 tools | **L2** |
|
|
169
|
+
| Fitness gate consistency | — | 100% | **L2** |
|
|
170
|
+
| Fitness gate risk-boost monotonicity | — | 100% | **L2** |
|
|
171
|
+
| Memory write 4-tier policy | — | 97% tier accuracy | **L2** |
|
|
172
|
+
| Recovery protocol coverage | — | 7/7 types, 9/9 actions | **L2** |
|
|
173
|
+
| Health-score calibration (Spearman ρ) | — | −0.313 | **L2.5** |
|
|
174
|
+
| Threshold-band accuracy | — | 84.8% | **L2** |
|
|
175
|
+
|
|
176
|
+
Overall: 18/18 suites pass. 18 suites total, 748 ms. All numbers from deterministic synthetic benchmarks (`npm run benchmark`).
|
|
177
|
+
Full methodology and per-suite breakdowns: [`docs/EVIDENCE.md`](./docs/EVIDENCE.md).
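
The Delta column above mixes two units: percentage-point differences (pp) for the retrieval-QA rates, and relative percent change for rows such as prompt-injection compliance and wasted tokens. The sketch below re-derives three of the table's figures under those two conventions. The helper names `pointChange` and `relativeChange` are illustrative and are not part of the Ojas API.

```typescript
// Illustrative helpers (not part of Ojas): two ways to express a before/after delta.

/** Absolute difference between two rates, in percentage points. */
function pointChange(baselinePct: number, treatedPct: number): number {
  return treatedPct - baselinePct;
}

/** Change relative to the baseline, in percent, rounded to one decimal place. */
function relativeChange(baseline: number, treated: number): number {
  return Math.round(((treated - baseline) / baseline) * 1000) / 10;
}

console.log(pointChange(35, 95));        // retrieval-QA task success, in pp
console.log(relativeChange(52.9, 3.9));  // prompt-injection compliance, in %
console.log(relativeChange(12680, 680)); // wasted tokens (heavy retrieval), in %
```

Reading the two units side by side matters when comparing rows: a drop from 100% to 11% is 89 points, while a drop from 52.9% to 3.9% is 49 points but a much larger share of its own baseline.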
|
|
205
178
|
|
|
206
179
|
---
|
|
207
180
|
|
|
208
181
|
<a id="usage"></a>
|
|
209
|
-
## Basic Usage
|
|
210
182
|
|
|
211
|
-
|
|
183
|
+
## Basic Usage
|
|
212
184
|
|
|
213
185
|
```typescript
|
|
214
186
|
import { Ojas } from '@beingmartinbmc/ojas';
|
|
215
187
|
|
|
216
|
-
const ojas = new Ojas({
|
|
217
|
-
agentId: 'research-agent',
|
|
218
|
-
});
|
|
188
|
+
const ojas = new Ojas({ agentId: 'research-agent' });
|
|
219
189
|
|
|
220
190
|
ojas.bind(myAgent);
|
|
221
191
|
|
|
222
192
|
const healthyContext = ojas.feed(rawRetrieval);
|
|
223
193
|
|
|
224
194
|
const report = ojas.healthCheck(healthyContext);
|
|
225
|
-
|
|
226
195
|
console.log(report.overall.value);
|
|
227
196
|
console.log(report.moduleScores);
|
|
228
197
|
console.log(report.recommendations);
|
|
229
198
|
```
|
|
230
199
|
|
|
231
|
-
### Connect over MCP
|
|
200
|
+
### Connect over MCP
|
|
232
201
|
|
|
233
|
-
|
|
202
|
+
MCP hosts can launch the packaged stdio server without cloning the repo:
|
|
234
203
|
|
|
235
204
|
```json
|
|
236
205
|
{
|
|
@@ -247,62 +216,253 @@ After Ojas is published, MCP hosts can launch the packaged stdio server without
|
|
|
247
216
|
}
|
|
248
217
|
```
|
|
249
218
|
|
|
250
|
-
For
|
|
219
|
+
For local development, use `node dist/mcp/server.js`.
|
|
220
|
+
Full IDE configuration: [`docs/MCP.md`](docs/MCP.md#mcp-config).
|
|
221
|
+
|
|
222
|
+
---
|
|
223
|
+
|
|
224
|
+
## Health Score Interpretation
|
|
251
225
|
|
|
252
|
-
|
|
226
|
+
Ojas computes a composite health score (0–100) from all seven modules.
|
|
227
|
+
Scores are **advisory diagnostic signals**, not ground-truth probabilities.
|
|
228
|
+
Use them for triage, trend tracking, and go/no-go gates tuned to your workload.
|
|
229
|
+
|
|
230
|
+
| Score | State | Meaning |
|
|
231
|
+
|---:|---|---|
|
|
232
|
+
| 85–100 | **Healthy** | Safe to continue |
|
|
233
|
+
| 70–84 | **Watch** | Proceed, but monitor closely |
|
|
234
|
+
| 50–69 | **Degraded** | Recovery recommended |
|
|
235
|
+
| < 50 | **Critical** | Stop or enter safe mode |
|
|
236
|
+
|
|
237
|
+
Calibration details: [`docs/EVIDENCE_MATRIX.md`](./docs/EVIDENCE_MATRIX.md).
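
The score bands in the table above can be expressed as a small lookup. This is a minimal sketch, assuming the score is already on the 0–100 scale and that each band's lower bound is inclusive (the table does not say how fractional values such as 84.5 are handled). `classifyHealth` and `HealthState` are hypothetical names, not Ojas exports.

```typescript
// Hypothetical helper: maps a 0-100 composite score to the bands in the table.
type HealthState = "Healthy" | "Watch" | "Degraded" | "Critical";

function classifyHealth(score: number): HealthState {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be within 0-100, got ${score}`);
  }
  if (score >= 85) return "Healthy";   // safe to continue
  if (score >= 70) return "Watch";     // proceed, but monitor closely
  if (score >= 50) return "Degraded";  // recovery recommended
  return "Critical";                   // stop or enter safe mode
}

// Example go/no-go gate; the cut-off should be tuned to your workload.
const state = classifyHealth(72);
const mayContinue = state === "Healthy" || state === "Watch";
console.log(state, mayContinue);
```

Because the scores are advisory signals rather than probabilities, a gate like this is best treated as a triage aid, not a correctness guarantee.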
|
|
238
|
+
|
|
239
|
+
---
|
|
240
|
+
|
|
241
|
+
<a id="what"></a>
|
|
242
|
+
|
|
243
|
+
## What Ojas Does
|
|
244
|
+
|
|
245
|
+
Ojas wraps an agent runtime with a continuous health cycle:
|
|
246
|
+
|
|
247
|
+
1. **Cleans and ranks context** before the agent consumes it
|
|
248
|
+
2. **Scans for prompt-injection patterns** and unsafe memory writes
|
|
249
|
+
*(deterministic detector stack; see [known failures](./docs/KNOWN_FAILURES.md))*
|
|
250
|
+
3. **Tracks cognitive vital signs** during execution
|
|
251
|
+
4. **Measures token, latency, and tool-use efficiency**
|
|
252
|
+
5. **Detects drift, loops, instability, and degradation**
|
|
253
|
+
6. **Consolidates execution traces** into useful memory
|
|
254
|
+
7. **Stress-tests agents** against hostile or unstable conditions,
|
|
255
|
+
with **AbortSignal cancellation** on timeout
|
|
256
|
+
8. **Diagnoses failures** and recommends recovery protocols
|
|
257
|
+
|
|
258
|
+
> Ojas helps agents think with cleaner inputs, recover from failure,
|
|
259
|
+
> and become more reliable over time.
|
|
260
|
+
|
|
261
|
+
### Ojas is not
|
|
262
|
+
|
|
263
|
+
- a full prompt-injection firewall
|
|
264
|
+
- a replacement for evals
|
|
265
|
+
- a production auth layer
|
|
266
|
+
- a guarantee of agent correctness
|
|
267
|
+
- a substitute for least-privilege tools
|
|
268
|
+
|
|
269
|
+
It is one layer in a defense-in-depth strategy. See [`docs/TRUST.md`](./docs/TRUST.md)
|
|
270
|
+
and [`docs/SECURITY.md`](./docs/SECURITY.md) for the full posture.
|
|
271
|
+
|
|
272
|
+
---
|
|
273
|
+
|
|
274
|
+
<a id="arch"></a>
|
|
275
|
+
|
|
276
|
+
## Architecture
|
|
277
|
+
|
|
278
|
+
```
|
|
279
|
+
┌──────────────────────────────────────────────────┐
|
|
280
|
+
│ Ojas Runtime │
|
|
281
|
+
│ │
|
|
282
|
+
Context │ ┌────────┐ ┌────────┐ ┌────────┐ │ Agent
|
|
283
|
+
─────────►│ │ Raksha │──►│ Aahar │──►│ Context│──────────►│ .process()
|
|
284
|
+
│ │ (scan) │ │(filter)│ │ (fed) │ │
|
|
285
|
+
│ └────────┘ └────────┘ └────────┘ │
|
|
286
|
+
│ │
|
|
287
|
+
│ ┌────────┐ ┌────────┐ ┌────────┐ │
|
|
288
|
+
│ │ Pulse │ │ Agni │ │ Nidra │ │
|
|
289
|
+
│ │(events)│ │ (cost) │ │(memory)│ │
|
|
290
|
+
│ └───┬────┘ └───┬────┘ └───┬────┘ │
|
|
291
|
+
│ │ │ │ │
|
|
292
|
+
│ ┌───▼───────────▼───────────▼───┐ │
|
|
293
|
+
│ │ Health Score │ │
|
|
294
|
+
│ └───────────┬───────────────────┘ │
|
|
295
|
+
│ │ │
|
|
296
|
+
│ ┌─────────▼─────────┐ │
|
|
297
|
+
│ │ Chikitsa │ │
|
|
298
|
+
│ │ (diagnose/repair) │ │
|
|
299
|
+
│ └─────────┬─────────┘ │
|
|
300
|
+
│ │ │
|
|
301
|
+
│ ┌─────────▼─────────┐ │
|
|
302
|
+
│ │ Vyayam │ │
|
|
303
|
+
│ │ (stress test) │ │
|
|
304
|
+
│ └───────────────────┘ │
|
|
305
|
+
└──────────────────────────────────────────────────┘
|
|
306
|
+
```
|
|
307
|
+
|
|
308
|
+
Context flows left-to-right: **Raksha** scans for threats, **Aahar**
|
|
309
|
+
filters and ranks, then the clean context reaches the agent. After
|
|
310
|
+
execution, **Pulse** records events, **Agni** tracks cost, **Nidra**
|
|
311
|
+
consolidates memory. **Chikitsa** diagnoses failures and **Vyayam**
|
|
312
|
+
stress-tests resilience. All feed into the composite health score.
|
|
313
|
+
|
|
314
|
+
---
|
|
315
|
+
|
|
316
|
+
## The Seven Modules
|
|
317
|
+
|
|
318
|
+
| Module | Role | Headline signals |
|
|
319
|
+
|---|---|---|
|
|
320
|
+
| 🥗 **[Aahar](docs/MODULES.md#aahar)** | Cognitive nutrition (context curation) | signal-to-noise, freshness, token efficiency |
|
|
321
|
+
| 😴 **[Nidra](docs/MODULES.md#nidra)** | Recovery & memory consolidation | drift score, processed-trace coverage |
|
|
322
|
+
| 💪 **[Vyayam](docs/MODULES.md#vyayam)** | Resilience & stress engineering | hallucination resistance under load, recovery time |
|
|
323
|
+
| 🛡️ **[Raksha](docs/MODULES.md#raksha)** | Immune defense: detector stack + ML classifier plugins | threat resistance (residual risk after quarantine) |
|
|
324
|
+
| 🔥 **[Agni](docs/MODULES.md#agni)** | Cognitive metabolism | token efficiency, latency, tool economy, cost pressure |
|
|
325
|
+
| 📈 **[Pulse](docs/MODULES.md#pulse)** | Continuous health telemetry | structured events bus with per-module severity |
|
|
326
|
+
| 🩺 **[Chikitsa](docs/MODULES.md#chikitsa)** | Repair & rehabilitation | repair readiness, rollback safety, playbook coverage |
|
|
327
|
+
|
|
328
|
+
Each maps to a human-health analogue — nutrition, sleep, exercise,
|
|
329
|
+
immunity, metabolism, vital signs, and rehabilitation.
|
|
330
|
+
|
|
331
|
+
---
|
|
332
|
+
|
|
333
|
+
<a id="why"></a>
|
|
334
|
+
|
|
335
|
+
## Why Ojas Exists
|
|
336
|
+
|
|
337
|
+
Autonomous agents plan, retrieve, remember, call tools, revise goals,
|
|
338
|
+
and operate across long sessions. That creates a new class of failures:
|
|
339
|
+
|
|
340
|
+
- Bad context causes hallucinations
|
|
341
|
+
- Noisy retrieval pollutes reasoning
|
|
342
|
+
- Memory stores stale or unsafe information
|
|
343
|
+
- Tool failures create loops and retry storms
|
|
344
|
+
- Long sessions cause drift and contradiction
|
|
345
|
+
- Prompt injection manipulates agent behavior
|
|
346
|
+
- Bigger context windows amplify noise instead of solving it
|
|
347
|
+
- Production agents degrade silently without obvious runtime errors
|
|
348
|
+
|
|
349
|
+
A larger model can still consume bad context. A better memory system can
|
|
350
|
+
still remember the wrong things. A more powerful agent can still fail
|
|
351
|
+
under stress.
|
|
352
|
+
|
|
353
|
+
The next leap in agents is not only intelligence.
|
|
354
|
+
**It is agent health.** Ojas provides the missing health layer.
|
|
355
|
+
|
|
356
|
+
---
|
|
357
|
+
|
|
358
|
+
<a id="what-is-proven"></a>
|
|
359
|
+
|
|
360
|
+
## What is currently proven
|
|
361
|
+
|
|
362
|
+
Ojas v0.3 ships at **evidence level L2 / L2.5** — synthetic, reproducible
|
|
363
|
+
A/B benchmarks against controlled stand-in agents on canonical failure
|
|
364
|
+
modes. Each claim below has a repro command and a named limitation.
|
|
365
|
+
|
|
366
|
+
| Claim | Value | Evidence | Repro |
|
|
367
|
+
|---|---:|---|---|
|
|
368
|
+
| Prompt-injection compliance reduction | 53% → 4% (−92.6%) | L2 / 51 attacks (33 original + 18 parametric variants) | `npm run benchmark` |
|
|
369
|
+
| Attacks quarantined by Raksha detector stack | **94.1%** (48/51) | L2 | `npm run benchmark` |
|
|
370
|
+
| Benign false-positive rate (30 controls × 5 categories) | **0%** — tolerance ≤ 5% | L2 | `npm run benchmark` |
|
|
371
|
+
| Health-score calibration | ρ = −0.31 over 500 trials; score spans [0.31, 0.87]; isotonic Brier 0.230 → 0.219 | L2.5 diagnostic | `npm run benchmark` |
|
|
372
|
+
| Malicious memory writes committed | 6/6 → 1/6 (83% blocked) | L2 / 16 candidates | `npm run benchmark` |
|
|
373
|
+
| Wasted-token reduction (noisy retrieval) | −62% | L2 | `npm run benchmark` |
|
|
374
|
+
| Wasted-token reduction (heavy retrieval) | −95% | L2 | `npm run benchmark` |
|
|
375
|
+
| Tool-failure loop detection speedup | 10× faster | L2 / 3 scripted tools | `npm run benchmark` |
|
|
376
|
+
| Retrieval-QA task success rate | 35% → 95%, bootstrap 95% CI × 5 seeds × 20 questions | **L2.5** | `npm run benchmark` |
|
|
377
|
+
| Retrieval-QA adversarial inclusion | 100% → 11%, same CI methodology | **L2.5** | `npm run benchmark` |
|
|
378
|
+
| Retrieval-QA relevant-doc recall | 100% (no Aahar false positives) | **L2.5** | `npm run benchmark` |
|
|
379
|
+
|
|
380
|
+
These prove the **mechanisms** work as designed. They are **not** evidence of:
|
|
381
|
+
|
|
382
|
+
- Production security against real adversaries
|
|
383
|
+
(detector-stack bypasses are listed in [`docs/KNOWN_FAILURES.md`](./docs/KNOWN_FAILURES.md))
|
|
384
|
+
- Real-LLM token / latency / cost numbers
|
|
385
|
+
(char/4 estimator, not a real tokenizer)
|
|
386
|
+
- Generalisation across organisations or threat models
|
|
387
|
+
(L3 / L4 work is on the [trust roadmap](./docs/BACKLOG.md#trust-roadmap))
|
|
388
|
+
|
|
389
|
+
Full matrix: [`docs/EVIDENCE_MATRIX.md`](./docs/EVIDENCE_MATRIX.md).
|
|
390
|
+
Known failure modes: [`docs/KNOWN_FAILURES.md`](./docs/KNOWN_FAILURES.md).
|
|
391
|
+
Methodology: [`docs/EVIDENCE.md`](./docs/EVIDENCE.md).
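
The "char/4 estimator" caveat above is worth making concrete, since it bounds how far the token numbers can be trusted. The sketch below shows the general shape of such a heuristic. It is an assumption, not the benchmark's actual code: the function name `estimateTokens` and the choice to round up are both hypothetical.

```typescript
// Assumed form of a char/4 heuristic; the rounding mode is a guess.
function estimateTokens(text: string): number {
  // String.length counts UTF-16 code units, not characters or real tokens.
  return Math.ceil(text.length / 4);
}

console.log(estimateTokens("What is the refund window for Pro plans?"));
```

A real tokenizer splits on learned subword units, so counts for code, non-English text, or long identifiers can differ noticeably from this estimate.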
|
|
253
392
|
|
|
254
393
|
---
|
|
255
394
|
|
|
256
395
|
<a id="evidence"></a>
|
|
396
|
+
|
|
257
397
|
## Reproducible Evidence
|
|
258
398
|
|
|
259
|
-
|
|
399
|
+
Eighteen A/B benchmark suites compare a deliberately vulnerable agent
|
|
400
|
+
**without Ojas** vs the **same agent + Ojas**, including two L2.5
|
|
401
|
+
diagnostic suites plus ablation and flaky-tool realism suites.
|
|
260
402
|
|
|
261
403
|
| # | Suite | Modules | Headline result |
|
|
262
404
|
|---|---|---|---|
|
|
263
|
-
| 1 | Prompt-injection resistance | raksha · aahar | Compliance rate **
|
|
264
|
-
| 2 | Context pollution survival | aahar | **−62% tokens**; signal-to-noise **0.53 → 1.0** (1.9×);
|
|
265
|
-
| 3 | Tool-failure loop detection | pulse · nidra · chikitsa | Intervention at **2 failures vs 20**; repair plans 3/3
|
|
266
|
-
| 4 | Memory-write safety | raksha · nidra | Malicious writes
|
|
267
|
-
| 5 | Cognitive drift detection | nidra · pulse | Drift detected in **5/5**
|
|
268
|
-
| 6 | Vyayam resilience under stress | vyayam · raksha · aahar |
|
|
269
|
-
| 7 | Cost pressure on bloated contexts | aahar · agni | **−95% tokens** and **−75% latency** on heavy
|
|
270
|
-
| 8 | Retrieval-QA realistic synthetic
|
|
271
|
-
| 9 | Health-score calibration | all modules | Spearman ρ = **−0.313
|
|
272
|
-
| 10 | Ablation matrix | all modules | Per-module contribution measured
|
|
273
|
-
| 11 | Flaky-tool resilience | vyayam · pulse | Detection
|
|
274
|
-
|
|
275
|
-
>
|
|
405
|
+
| 1 | Prompt-injection resistance | raksha · aahar | Compliance rate **53% → 4%** (−92.6%); 48/51 quarantined; 30/30 benign preserved |
|
|
406
|
+
| 2 | Context pollution survival | aahar | **−62% tokens**; signal-to-noise **0.53 → 1.0** (1.9×); confidence +41% |
|
|
407
|
+
| 3 | Tool-failure loop detection | pulse · nidra · chikitsa | Intervention at **2 failures vs 20**; repair plans 3/3 |
|
|
408
|
+
| 4 | Memory-write safety | raksha · nidra | Malicious writes **6/6 → 1/6**; 5/5 low-confidence downgraded |
|
|
409
|
+
| 5 | Cognitive drift detection | nidra · pulse | Drift detected in **5/5** sessions; avg 19.6 traces |
|
|
410
|
+
| 6 | Vyayam resilience under stress | vyayam · raksha · aahar | Stress scenarios **7/8 → 7/8** (no regression) |
|
|
411
|
+
| 7 | Cost pressure on bloated contexts | aahar · agni | **−95% tokens** and **−75% latency** on heavy retrieval |
|
|
412
|
+
| 8 | Retrieval-QA realistic synthetic | aahar · raksha | Task success **35% → 95%**; adversarial **100% → 11%** |
|
|
413
|
+
| 9 | Health-score calibration | all modules | Spearman ρ = **−0.313**; monotonicity holds; score range [0.306, 0.869] |
|
|
414
|
+
| 10 | Ablation matrix | all modules | Per-module contribution measured |
|
|
415
|
+
| 11 | Flaky-tool resilience | vyayam · pulse | Detection under non-deterministic faults |
|
|
416
|
+
|
|
417
|
+
> Overall: 11/11 suites pass. Targeted failure suites improved and diagnostic/no-regression suites met their acceptance criteria.
|
|
276
418
|
|
|
277
419
|
```bash
|
|
278
|
-
npm install
|
|
279
|
-
npm run build
|
|
280
420
|
npm run benchmark # console table
|
|
281
421
|
npm run benchmark:write # regenerates docs/EVIDENCE.md + benchmarks/results/latest.json
|
|
282
422
|
```
|
|
283
423
|
|
|
284
|
-
|
|
424
|
+
Seeded with `OJAS_BENCH_SEED` for deterministic reproduction.
|
|
425
|
+
Opt-in real-LLM generation via `OJAS_BENCH_LLM=1` and `OJAS_BENCH_JUDGE=1`.
|
|
426
|
+
Source: `benchmarks/`.
|
|
427
|
+
|
|
428
|
+
---
|
|
429
|
+
|
|
430
|
+
## Documentation
|
|
431
|
+
|
|
432
|
+
| If you want to… | Read |
|
|
433
|
+
|---|---|
|
|
434
|
+
| See it work in 30 seconds | [Quick demo](#demo) |
|
|
435
|
+
| Run it in five minutes | [Quick Start](#quickstart) → [Basic Usage](#usage) |
|
|
436
|
+
| Understand the model and design | [Why Ojas Exists](#why) → [What Ojas Does](#what) → [Architecture](#arch) |
|
|
437
|
+
| Wire it into Claude Code / Cursor / Windsurf | [MCP Server](docs/MCP.md) → [MCP Config](docs/MCP.md#mcp-config) |
|
|
438
|
+
| Drive an agent from another tool | [MCP Tools (18)](docs/MCP.md#tools-setup) → [Response Envelope](docs/MCP.md#envelope) |
|
|
439
|
+
| Embed it in your own runtime | [Agent Adapter Interface](docs/CONFIGURATION.md#adapter) → [Configuration](docs/CONFIGURATION.md#config) |
|
|
440
|
+
| Understand a single module | [Aahar](docs/MODULES.md#aahar) · [Nidra](docs/MODULES.md#nidra) · [Vyayam](docs/MODULES.md#vyayam) · [Raksha](docs/MODULES.md#raksha) · [Agni](docs/MODULES.md#agni) · [Pulse](docs/MODULES.md#pulse) · [Chikitsa](docs/MODULES.md#chikitsa) |
|
|
441
|
+
| Reproduce the published numbers | [Evidence](#evidence) → [`docs/EVIDENCE.md`](./docs/EVIDENCE.md) |
|
|
442
|
+
| Integrate with LangChain / OpenAI / Vercel AI | [`examples/`](examples/) |
|
|
443
|
+
| Ship to a shared deployment | [`docs/TRUST.md`](./docs/TRUST.md) → [`docs/SECURITY.md`](./docs/SECURITY.md) |
|
|
285
444
|
|
|
286
445
|
---
|
|
287
446
|
|
|
288
447
|
<a id="operations"></a>
|
|
448
|
+
|
|
289
449
|
## Operations
|
|
290
450
|
|
|
291
451
|
| Resource | What's inside |
|
|
292
452
|
|---|---|
|
|
293
|
-
| [`docs/MODULES.md`](./docs/MODULES.md) | Deep-dive on each of the seven modules
|
|
294
|
-
| [`docs/MCP.md`](./docs/MCP.md) | MCP server, IDE
|
|
295
|
-
| [`docs/TRUST.md`](./docs/TRUST.md) | Trust boundary, demo limitations, production caveats
|
|
296
|
-
| [`docs/CONFIGURATION.md`](./docs/CONFIGURATION.md) | SDK
|
|
297
|
-
| [`docs/ARCHITECTURE.md`](./docs/ARCHITECTURE.md) | Four-phase health cycle
|
|
298
|
-
| [`docs/SECURITY.md`](./docs/SECURITY.md) | Trust model, Raksha defense-in-depth, persistence encryption
|
|
299
|
-
| [`docs/EVIDENCE_MATRIX.md`](./docs/EVIDENCE_MATRIX.md) | Evidence levels L0–L4, claim-by-claim limitations
|
|
300
|
-
| [`docs/KNOWN_FAILURES.md`](./docs/KNOWN_FAILURES.md) | Known limitations, remaining bypass categories
|
|
301
|
-
| [`docs/BACKLOG.md`](./docs/BACKLOG.md) | Deferred work named honestly
|
|
302
|
-
| [`docs/EVIDENCE.md`](./docs/EVIDENCE.md) | Latest A/B benchmark results
|
|
303
|
-
| Quality gates | `npm run check` runs `lint` + `build` + aux typecheck + `test` (595 tests across 33 suites, ESLint clean) |
|
|
453
|
+
| [`docs/MODULES.md`](./docs/MODULES.md) | Deep-dive on each of the seven modules |
|
|
454
|
+
| [`docs/MCP.md`](./docs/MCP.md) | MCP server, IDE config, all 18 tools |
|
|
455
|
+
| [`docs/TRUST.md`](./docs/TRUST.md) | Trust boundary, demo limitations, production caveats |
|
|
456
|
+
| [`docs/CONFIGURATION.md`](./docs/CONFIGURATION.md) | SDK config, agent adapter contract, retention caps |
|
|
457
|
+
| [`docs/ARCHITECTURE.md`](./docs/ARCHITECTURE.md) | Four-phase health cycle, design principles |
|
|
458
|
+
| [`docs/SECURITY.md`](./docs/SECURITY.md) | Trust model, Raksha defense-in-depth, persistence encryption |
|
|
459
|
+
| [`docs/EVIDENCE_MATRIX.md`](./docs/EVIDENCE_MATRIX.md) | Evidence levels L0–L4, claim-by-claim limitations |
|
|
460
|
+
| [`docs/KNOWN_FAILURES.md`](./docs/KNOWN_FAILURES.md) | Known limitations, remaining bypass categories |
|
|
461
|
+
| [`docs/BACKLOG.md`](./docs/BACKLOG.md) | Deferred work named honestly |
|
|
462
|
+
| [`docs/EVIDENCE.md`](./docs/EVIDENCE.md) | Latest A/B benchmark results (auto-regenerated) |
|
|
304
463
|
| License | [MIT](./LICENSE) |
|
|
305
464
|
|
|
306
465
|
---
|
|
307
466
|
|
|
308
|
-
*ओजस (Ojas) — the vital essence that sustains life, immunity,
|
|
467
|
+
*ओजस (Ojas) — the vital essence that sustains life, immunity,
|
|
468
|
+
resilience, and intelligence.*
|
|
@@ -0,0 +1,23 @@
|
|
|
1
|
+
#!/usr/bin/env node
|
|
2
|
+
/**
|
|
3
|
+
* Ojas CLI — `npx ojas <command>`
|
|
4
|
+
*
|
|
5
|
+
* A thin, dependency-free command surface over the scorecard/badge
|
|
6
|
+
* library so an agent author can produce a shareable health artifact
|
|
7
|
+
* without writing any code:
|
|
8
|
+
*
|
|
9
|
+
* npx ojas scorecard --db ./ojas.sqlite --agent my-agent
|
|
10
|
+
* npx ojas badge --db ./ojas.sqlite --agent my-agent --out badge.svg
|
|
11
|
+
* npx ojas scorecard --report ./report.json --format markdown
|
|
12
|
+
*
|
|
13
|
+
* Sources (in precedence order):
|
|
14
|
+
* --report <file> A JSON `AgentHealthReport` (e.g. dumped from your app).
|
|
15
|
+
* --db <file> A SQLite store written by the Ojas MCP server; the CLI
|
|
16
|
+
* reads the agent's last persisted health report.
|
|
17
|
+
*
|
|
18
|
+
* The CLI does NOT compute a fresh health check — it has no agent to probe.
|
|
19
|
+
* It renders whatever the most recent report was, which keeps it honest:
|
|
20
|
+
* the badge reflects real measured state, never a synthetic placeholder.
|
|
21
|
+
*/
|
|
22
|
+
export declare function main(argv?: string[]): void;
|
|
23
|
+
//# sourceMappingURL=index.d.ts.map
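
The doc comment above states that `--report` takes precedence over `--db`. A minimal sketch of that precedence rule follows. It is not the implementation shipped in `dist/cli/index.js`: the names `Source`, `flagValue`, and `resolveSource` are hypothetical, and the real parser may handle flags differently.

```typescript
// Hypothetical sketch of the documented precedence; not the shipped CLI code.
type Source =
  | { kind: "report"; path: string }
  | { kind: "db"; path: string };

/** Returns the value following `flag`, or undefined if the flag is absent or has no value. */
function flagValue(argv: string[], flag: string): string | undefined {
  const i = argv.indexOf(flag);
  const value = i >= 0 ? argv[i + 1] : undefined;
  return value !== undefined && !value.startsWith("--") ? value : undefined;
}

/** Picks the input source: --report wins over --db when both are given. */
function resolveSource(argv: string[]): Source | null {
  const report = flagValue(argv, "--report");
  if (report !== undefined) return { kind: "report", path: report };
  const db = flagValue(argv, "--db");
  if (db !== undefined) return { kind: "db", path: db };
  return null; // no usable source was supplied
}

console.log(resolveSource(["scorecard", "--db", "./ojas.sqlite", "--report", "./report.json"]));
```

Either source only supplies a previously recorded report, which matches the comment's point that the CLI renders existing state rather than running a fresh health check.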
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../src/cli/index.ts"],"names":[],"mappings":";AACA;;;;;;;;;;;;;;;;;;;GAmBG;AAkLH,wBAAgB,IAAI,CAAC,IAAI,GAAE,MAAM,EAA0B,GAAG,IAAI,CAqBjE"}
|