@artemiskit/sdk 0.3.0
- package/CHANGELOG.md +134 -0
- package/README.md +173 -0
- package/adapters/openai/dist/index.js +5625 -0
- package/dist/index.js +42577 -0
- package/dist/matchers/index.js +224 -0
- package/dist/matchers/jest.js +257 -0
- package/dist/matchers/vitest.js +257 -0
- package/package.json +78 -0
- package/src/__tests__/artemiskit.test.ts +425 -0
- package/src/__tests__/matchers.test.ts +450 -0
- package/src/artemiskit.ts +791 -0
- package/src/guardian/action-validator.ts +585 -0
- package/src/guardian/circuit-breaker.ts +655 -0
- package/src/guardian/guardian.ts +497 -0
- package/src/guardian/guardrails.ts +536 -0
- package/src/guardian/index.ts +142 -0
- package/src/guardian/intent-classifier.ts +378 -0
- package/src/guardian/interceptor.ts +381 -0
- package/src/guardian/policy.ts +446 -0
- package/src/guardian/types.ts +436 -0
- package/src/index.ts +164 -0
- package/src/matchers/core.ts +315 -0
- package/src/matchers/index.ts +26 -0
- package/src/matchers/jest.ts +112 -0
- package/src/matchers/vitest.ts +84 -0
- package/src/types.ts +259 -0
- package/tsconfig.json +11 -0
package/CHANGELOG.md
ADDED
@@ -0,0 +1,134 @@

# @artemiskit/sdk

## 0.3.0

### Minor Changes

- ## v0.3.0 - SDK, Guardian Mode & OWASP Compliance

This major release delivers the full programmatic SDK, runtime protection with Guardian Mode, OWASP LLM Top 10 2025 attack vectors, and agentic framework adapters.

### Programmatic SDK (`@artemiskit/sdk`)

The new SDK package provides a complete programmatic API for LLM evaluation:

- **ArtemisKit class** with `run()`, `redteam()`, and `stress()` methods
- **Jest integration** with custom matchers (`toPassAllCases`, `toHaveSuccessRate`, etc.)
- **Vitest integration** with identical matchers
- **Event handling** for real-time progress updates
- **13 custom matchers** for run, red team, and stress test assertions

```typescript
import { ArtemisKit } from "@artemiskit/sdk";
import { jestMatchers } from "@artemiskit/sdk/jest";

expect.extend(jestMatchers);

const kit = new ArtemisKit({ provider: "openai", model: "gpt-4o" });
const results = await kit.run({ scenario: "./tests.yaml" });
expect(results).toPassAllCases();
```
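
To make the matcher idea concrete, here is a minimal sketch of how a success-rate check in the style of `toHaveSuccessRate` could be written as a function that returns the `{ pass, message }` shape Jest expects from custom matchers. The `RunSummary` type and its `passed`/`total` fields are assumptions for illustration only; the changelog does not show the SDK's real result type or matcher internals.

```typescript
// Hypothetical result shape; the real SDK result type may differ.
interface RunSummary {
  passed: number;
  total: number;
}

// Return shape Jest expects from a custom matcher: a pass flag plus a lazy message.
interface MatcherResult {
  pass: boolean;
  message: () => string;
}

// Sketch of a success-rate matcher: passes when passed / total >= minRate.
function toHaveSuccessRate(received: RunSummary, minRate: number): MatcherResult {
  // Treat an empty run as a 0% success rate rather than dividing by zero.
  const rate = received.total === 0 ? 0 : received.passed / received.total;
  const pass = rate >= minRate;
  return {
    pass,
    message: () =>
      `expected success rate ${rate.toFixed(2)} ${pass ? "not " : ""}to be at least ${minRate}`,
  };
}

const outcome = toHaveSuccessRate({ passed: 9, total: 10 }, 0.95);
console.log(outcome.pass, outcome.message());
```

A function with this shape could be registered through `expect.extend({ toHaveSuccessRate })`, which is the same mechanism the changelog example uses for `jestMatchers`.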

### Guardian Mode (Runtime Protection)

New Guardian Mode provides runtime protection for AI/LLM applications:

- **Three operating modes**: `testing`, `guardian`, `hybrid`
- **Prompt injection detection** and blocking
- **PII detection & redaction** (email, SSN, phone, API keys)
- **Action validation** for agent tool/function calls
- **Intent classification** with risk assessment
- **Circuit breaker** for automatic blocking on repeated violations
- **Rate limiting** and **cost limiting**
- **Custom policies** via TypeScript or YAML
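
The circuit breaker item in the list above can be pictured with a small sketch: count violations inside a sliding window, block once a threshold is reached, and allow traffic again after a cooldown. Every name here (`ViolationCircuitBreaker`, `threshold`, `windowMs`, `cooldownMs`) is hypothetical; this is one plausible design, not the contents of `circuit-breaker.ts`.

```typescript
// Hypothetical violation-counting circuit breaker; not the package's implementation.
class ViolationCircuitBreaker {
  private violations: number[] = []; // timestamps (ms) of recent violations
  private openedAt: number | null = null; // when the breaker last tripped
  private readonly threshold: number; // violations needed to trip
  private readonly windowMs: number; // how far back violations are counted
  private readonly cooldownMs: number; // how long the breaker stays open

  constructor(threshold: number, windowMs: number, cooldownMs: number) {
    this.threshold = threshold;
    this.windowMs = windowMs;
    this.cooldownMs = cooldownMs;
  }

  // Record a violation at time `now`; trips the breaker once the threshold is reached.
  recordViolation(now: number): void {
    this.violations = this.violations.filter((t) => now - t < this.windowMs);
    this.violations.push(now);
    if (this.violations.length >= this.threshold) {
      this.openedAt = now;
      this.violations = [];
    }
  }

  // True while requests should be blocked; closes again after the cooldown.
  isOpen(now: number): boolean {
    if (this.openedAt === null) return false;
    if (now - this.openedAt >= this.cooldownMs) {
      this.openedAt = null;
      return false;
    }
    return true;
  }
}

const breaker = new ViolationCircuitBreaker(3, 60_000, 30_000);
breaker.recordViolation(0);
breaker.recordViolation(1_000);
console.log(breaker.isOpen(1_500)); // false: only two violations so far
breaker.recordViolation(2_000);
console.log(breaker.isOpen(2_500)); // true: the third violation tripped the breaker
console.log(breaker.isOpen(40_000)); // false: the cooldown has elapsed
```

Passing the clock in as `now` instead of calling `Date.now()` internally keeps the sketch deterministic and easy to test.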

```typescript
import { createGuardian } from "@artemiskit/sdk/guardian";

const guardian = createGuardian({ mode: "guardian", blockOnFailure: true });
const protectedClient = guardian.protect(myLLMClient);
```
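
As a rough illustration of the PII detection and redaction feature listed for Guardian Mode, the sketch below replaces matches with a bracketed label. The patterns, the `redactPII` name, and the return shape are all assumptions made for this example; they cover only simple US-style formats, omit API keys because key formats vary by vendor, and do not reflect the rules shipped in `guardrails.ts`.

```typescript
// Illustrative patterns only; real PII detection needs broader, locale-aware rules.
const PII_PATTERNS: Array<{ label: string; pattern: RegExp }> = [
  { label: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: "SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  { label: "PHONE", pattern: /\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/g },
];

// Hypothetical helper: replace each match with its label and report what was found.
function redactPII(text: string): { redacted: string; found: string[] } {
  const found: string[] = [];
  let redacted = text;
  for (const { label, pattern } of PII_PATTERNS) {
    redacted = redacted.replace(pattern, () => {
      found.push(label);
      return `[${label}]`;
    });
  }
  return { redacted, found };
}

const sample = redactPII("Mail bob@example.com or call 555-123-4567, SSN 123-45-6789");
console.log(sample.redacted);
console.log(sample.found);
```

Returning the list of detected labels alongside the redacted text lets a caller decide whether to block the request outright or pass the redacted version through.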

### OWASP LLM Top 10 2025 Attack Vectors

New red team mutations aligned with OWASP LLM Top 10 2025:

| Mutation             | OWASP | Description                    |
| -------------------- | ----- | ------------------------------ |
| `bad-likert-judge`   | LLM01 | Exploit evaluation capability  |
| `crescendo`          | LLM01 | Multi-turn gradual escalation  |
| `deceptive-delight`  | LLM01 | Positive framing bypass        |
| `system-extraction`  | LLM07 | System prompt leakage          |
| `output-injection`   | LLM05 | XSS, SQLi in output            |
| `excessive-agency`   | LLM06 | Unauthorized action claims     |
| `hallucination-trap` | LLM09 | Confident fabrication triggers |

```bash
akit redteam scenario.yaml --owasp LLM01,LLM05
akit redteam scenario.yaml --owasp-full
```
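
The table above can be read as a lookup from mutation name to OWASP category. The sketch below shows how a comma-separated category list such as `LLM01,LLM05` might be expanded into mutation names. The mapping is copied from the table; the `mutationsForCategories` helper is hypothetical, and how the `akit` CLI actually resolves `--owasp` is not shown in this changelog.

```typescript
// Mutation-to-category pairs copied from the changelog table.
const MUTATION_CATEGORIES: Record<string, string> = {
  "bad-likert-judge": "LLM01",
  "crescendo": "LLM01",
  "deceptive-delight": "LLM01",
  "system-extraction": "LLM07",
  "output-injection": "LLM05",
  "excessive-agency": "LLM06",
  "hallucination-trap": "LLM09",
};

// Hypothetical helper: turn a comma-separated category list into mutation names.
function mutationsForCategories(csv: string): string[] {
  const wanted = new Set(
    csv
      .split(",")
      .map((c) => c.trim().toUpperCase())
      .filter((c) => c.length > 0),
  );
  return Object.entries(MUTATION_CATEGORIES)
    .filter(([, category]) => wanted.has(category))
    .map(([mutation]) => mutation);
}

console.log(mutationsForCategories("LLM01,LLM05"));
```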

### Agentic Framework Adapters

New adapters for testing agentic AI systems:

**LangChain Adapter** (`@artemiskit/adapter-langchain`)

- Test chains, agents, and runnables
- Capture intermediate steps and tool usage
- Support for LCEL, ReAct agents, RAG chains

**DeepAgents Adapter** (`@artemiskit/adapter-deepagents`)

- Test multi-agent systems and workflows
- Capture agent traces and inter-agent messages
- Support for sequential, parallel, and hierarchical workflows

```typescript
import { createLangChainAdapter } from "@artemiskit/adapter-langchain";
import { createDeepAgentsAdapter } from "@artemiskit/adapter-deepagents";

const adapter = createLangChainAdapter(myChain, {
  captureIntermediateSteps: true,
});
const result = await adapter.generate({ prompt: "Test query" });
```

### Supabase Storage Enhancements

Enhanced cloud storage capabilities:

- **Analytics tables** for metrics tracking
- **Case results table** for granular analysis
- **Baseline management** for regression detection
- **Trend analysis** queries

### Bug Fixes

- **adapter-openai**: Use `max_completion_tokens` for newer OpenAI models (o1, o3, gpt-4.5)
- **redteam**: Resolve TypeScript and flaky test issues in OWASP mutations
- **adapters**: Fix TypeScript build errors for agentic adapters
- **core**: Add `langchain` and `deepagents` to ProviderType union
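
The `adapter-openai` fix above concerns which token-limit field is sent with a request: `max_tokens` or `max_completion_tokens`. A minimal sketch of that choice follows. The model prefixes come from the changelog entry, but prefix matching and the `tokenLimitParam` helper are assumptions; the adapter's actual detection logic is not shown here.

```typescript
// Model-name prefixes taken from the changelog entry (o1, o3, gpt-4.5).
// Prefix matching is an assumption; the adapter may detect these models differently.
const COMPLETION_TOKEN_MODEL_PREFIXES = ["o1", "o3", "gpt-4.5"];

type TokenLimitParam = { max_tokens: number } | { max_completion_tokens: number };

// Hypothetical helper: pick the request field that carries the output token limit.
function tokenLimitParam(model: string, limit: number): TokenLimitParam {
  const usesCompletionTokens = COMPLETION_TOKEN_MODEL_PREFIXES.some((prefix) =>
    model.startsWith(prefix),
  );
  return usesCompletionTokens ? { max_completion_tokens: limit } : { max_tokens: limit };
}

console.log(tokenLimitParam("o1-mini", 512));
console.log(tokenLimitParam("gpt-4o", 512));
```

The returned object can be spread into a request body, so the rest of the request-building code does not need to know which field name applies.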

### Examples

New comprehensive examples organized by feature:

- `examples/guardian/` - Guardian Mode examples (testing, guardian, hybrid modes)
- `examples/sdk/` - SDK usage examples (Jest, Vitest, events)
- `examples/adapters/` - Agentic adapter examples
- `examples/owasp/` - OWASP LLM Top 10 test scenarios

### Documentation

- Complete SDK documentation with API reference
- Guardian Mode guide with all three modes explained
- Agentic adapters documentation (LangChain, DeepAgents)
- Test matchers reference for Jest/Vitest
- OWASP LLM Top 10 testing scenarios

### Patch Changes

- Updated dependencies
  - @artemiskit/core@0.3.0
  - @artemiskit/redteam@0.3.0
package/README.md
ADDED
@@ -0,0 +1,173 @@

# @artemiskit/sdk

Programmatic SDK for [ArtemisKit](https://github.com/code-sensei/artemiskit) - integrate LLM testing directly into your Node.js applications, CI/CD pipelines, and test frameworks.

## Features

- 🚀 **Simple API** - Run tests, red team evaluations, and stress tests programmatically
- 📊 **Event Emitters** - Real-time progress tracking with `onCaseStart`, `onCaseComplete`, `onProgress`
- 🧪 **Test Framework Integration** - Custom matchers for Jest and Vitest
- 🔴 **Red Team Testing** - Adversarial security testing built-in
- ⚡ **Stress Testing** - Load testing with configurable concurrency
- 📝 **TypeScript First** - Full type definitions included

## Installation

```bash
# Using bun
bun add @artemiskit/sdk

# Using npm
npm install @artemiskit/sdk
```

## Quick Start

```typescript
import { ArtemisKit } from '@artemiskit/sdk';

const kit = new ArtemisKit({
  provider: 'openai',
  model: 'gpt-4',
  project: 'my-project',
});

// Run test scenarios
const result = await kit.run({
  scenario: './my-tests.yaml',
});

if (!result.success) {
  console.error('Tests failed!');
  process.exit(1);
}

console.log('All tests passed! ✅');
```

## API Reference

### ArtemisKit Class

#### Constructor

```typescript
const kit = new ArtemisKit({
  project?: string;
  provider?: 'openai' | 'azure-openai' | 'anthropic' | ...;
  model?: string;
  timeout?: number;
  retries?: number;
  concurrency?: number;
});
```

#### run(options)

Run test scenarios against your LLM.

```typescript
const result = await kit.run({
  scenario: './tests.yaml',
  tags?: string[],
  concurrency?: number,
  timeout?: number,
});
```

#### redteam(options)

Run red team adversarial security testing.

```typescript
const result = await kit.redteam({
  scenario: './tests.yaml',
  mutations?: string[],
  countPerCase?: number,
});
```

#### stress(options)

Run stress/load testing.

```typescript
const result = await kit.stress({
  scenario: './tests.yaml',
  concurrency?: number,
  duration?: number,
  rampUp?: number,
});
```

### Event Handling

```typescript
kit
  .onCaseStart((event) => {
    console.log(`Starting ${event.caseId}`);
  })
  .onCaseComplete((event) => {
    console.log(`${event.result.name}: ${event.result.ok ? '✅' : '❌'}`);
  })
  .onProgress((event) => {
    console.log(`[${event.phase}] ${event.message}`);
  });
```
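
The chained calls in the event handling example work because each registration method hands back the object it was called on. The stand-alone sketch below shows that pattern with a tiny emitter. `ChainableEvents`, `runCase`, and the trimmed event types are hypothetical stand-ins built from the fields used in the README example; this is not the `ArtemisKit` class itself.

```typescript
// Hypothetical event payloads, reduced to the fields used in the README example.
interface CaseStartEvent {
  caseId: string;
}
interface CaseCompleteEvent {
  result: { name: string; ok: boolean };
}

// Minimal emitter whose registration methods return `this`, which enables chaining.
class ChainableEvents {
  private startHandlers: Array<(e: CaseStartEvent) => void> = [];
  private completeHandlers: Array<(e: CaseCompleteEvent) => void> = [];

  onCaseStart(handler: (e: CaseStartEvent) => void): this {
    this.startHandlers.push(handler);
    return this;
  }

  onCaseComplete(handler: (e: CaseCompleteEvent) => void): this {
    this.completeHandlers.push(handler);
    return this;
  }

  // Simulates running one case: fire start handlers, then completion handlers.
  runCase(caseId: string, ok: boolean): void {
    for (const h of this.startHandlers) h({ caseId });
    for (const h of this.completeHandlers) h({ result: { name: caseId, ok } });
  }
}

const log: string[] = [];
new ChainableEvents()
  .onCaseStart((e) => log.push(`start:${e.caseId}`))
  .onCaseComplete((e) => log.push(`${e.result.name}:${e.result.ok ? "ok" : "fail"}`))
  .runCase("greeting", true);
console.log(log);
```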

## Jest/Vitest Integration

### Setup

```typescript
// vitest.setup.ts or jest.setup.ts
import '@artemiskit/sdk/vitest';
// or
import '@artemiskit/sdk/jest';
```

### Usage

```typescript
import { describe, it, expect } from 'vitest';
import { ArtemisKit } from '@artemiskit/sdk';

describe('LLM Tests', () => {
  const kit = new ArtemisKit({ provider: 'openai' });

  it('should pass all test cases', async () => {
    const result = await kit.run({ scenario: './tests.yaml' });
    expect(result).toPassAllCases();
  });

  it('should have high success rate', async () => {
    const result = await kit.run({ scenario: './tests.yaml' });
    expect(result).toHaveSuccessRate(0.95);
  });
});
```

### Available Matchers

**Run Result:**
- `toPassAllCases()`
- `toHaveSuccessRate(rate)`
- `toPassCasesWithTag(tag)`
- `toHaveMedianLatencyBelow(ms)`
- `toHaveP95LatencyBelow(ms)`

**Red Team:**
- `toPassRedTeam()`
- `toHaveDefenseRate(rate)`
- `toHaveNoCriticalVulnerabilities()`
- `toHaveNoHighSeverityVulnerabilities()`

**Stress Test:**
- `toPassStressTest()`
- `toHaveStressSuccessRate(rate)`
- `toAchieveRPS(rps)`
- `toHaveStressP95LatencyBelow(ms)`
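
The latency matchers in the list above depend on a percentile over per-case latencies. The README does not say which percentile method the SDK uses, so the sketch below assumes the nearest-rank method and uses hypothetical helper names (`percentile`, `p95LatencyBelow`) to show what a check like `toHaveP95LatencyBelow(ms)` has to compute.

```typescript
// Nearest-rank percentile over latency samples in milliseconds.
// This is one common definition; the SDK may interpolate or use another method.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so whole-number ranks stay exact.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical check in the spirit of toHaveP95LatencyBelow(ms).
function p95LatencyBelow(latenciesMs: number[], limitMs: number): boolean {
  return percentile(latenciesMs, 95) < limitMs;
}

const latencies = [120, 95, 310, 150, 101, 99, 180, 140, 130, 110];
console.log(percentile(latencies, 50), percentile(latencies, 95));
console.log(p95LatencyBelow(latencies, 500));
```

With only ten samples the 95th percentile is simply the slowest request, which is why a single outlier can fail a P95 assertion on a small scenario file.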

## License

Apache-2.0