@artemiskit/sdk 0.3.0

package/CHANGELOG.md ADDED
@@ -0,0 +1,134 @@
1
+ # @artemiskit/sdk
2
+
3
+ ## 0.3.0
4
+
5
+ ### Minor Changes
6
+
7
+ - ## v0.3.0 - SDK, Guardian Mode & OWASP Compliance
8
+
9
+ This release delivers the full programmatic SDK, runtime protection with Guardian Mode, OWASP LLM Top 10 2025 attack vectors, and agentic framework adapters.
10
+
11
+ ### Programmatic SDK (`@artemiskit/sdk`)
12
+
13
+ The new SDK package provides a complete programmatic API for LLM evaluation:
14
+
15
+ - **ArtemisKit class** with `run()`, `redteam()`, and `stress()` methods
16
+ - **Jest integration** with custom matchers (`toPassAllCases`, `toHaveSuccessRate`, etc.)
17
+ - **Vitest integration** with identical matchers
18
+ - **Event handling** for real-time progress updates
19
+ - **13 custom matchers** for run, red team, and stress test assertions
20
+
21
+ ```typescript
22
+ import { ArtemisKit } from "@artemiskit/sdk";
23
+ import { jestMatchers } from "@artemiskit/sdk/jest";
24
+
25
+ expect.extend(jestMatchers);
26
+
27
+ const kit = new ArtemisKit({ provider: "openai", model: "gpt-4o" });
28
+ const results = await kit.run({ scenario: "./tests.yaml" });
29
+ expect(results).toPassAllCases();
30
+ ```
31
+
32
+ ### Guardian Mode (Runtime Protection)
33
+
34
+ New Guardian Mode provides runtime protection for AI/LLM applications:
35
+
36
+ - **Three operating modes**: `testing`, `guardian`, `hybrid`
37
+ - **Prompt injection detection** and blocking
38
+ - **PII detection & redaction** (email, SSN, phone, API keys)
39
+ - **Action validation** for agent tool/function calls
40
+ - **Intent classification** with risk assessment
41
+ - **Circuit breaker** for automatic blocking on repeated violations
42
+ - **Rate limiting** and **cost limiting**
43
+ - **Custom policies** via TypeScript or YAML
44
+
45
+ ```typescript
46
+ import { createGuardian } from "@artemiskit/sdk/guardian";
47
+
48
+ const guardian = createGuardian({ mode: "guardian", blockOnFailure: true });
49
+ const protectedClient = guardian.protect(myLLMClient);
50
+ ```
51
+
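The changelog does not describe how the circuit breaker decides to trip, so the following is only a minimal sketch of the general technique: count violations inside a sliding time window and reject everything for a cooldown period once a threshold is reached. The class name, option names, and thresholds are all hypothetical and are not taken from `@artemiskit/sdk`.

```typescript
// Hypothetical sketch; none of these names come from @artemiskit/sdk.
interface BreakerOptions {
  maxViolations: number; // violations tolerated inside the window before tripping
  windowMs: number; // sliding window length in milliseconds
  cooldownMs: number; // how long the breaker stays open once tripped
}

class ViolationCircuitBreaker {
  private readonly opts: BreakerOptions;
  private violations: number[] = []; // timestamps of recent violations
  private openUntil = 0; // breaker is open while now < openUntil

  constructor(opts: BreakerOptions) {
    this.opts = opts;
  }

  /** Record a policy violation observed at time `now` (milliseconds). */
  recordViolation(now: number): void {
    const cutoff = now - this.opts.windowMs;
    // Drop violations that have fallen out of the sliding window.
    this.violations = this.violations.filter((t) => t > cutoff);
    this.violations.push(now);
    if (this.violations.length >= this.opts.maxViolations) {
      this.openUntil = now + this.opts.cooldownMs;
      this.violations = [];
    }
  }

  /** True while requests should be rejected outright. */
  isOpen(now: number): boolean {
    return now < this.openUntil;
  }
}

// Usage with explicit timestamps so the behaviour is deterministic.
const breaker = new ViolationCircuitBreaker({
  maxViolations: 3,
  windowMs: 60_000,
  cooldownMs: 30_000,
});
breaker.recordViolation(1_000);
breaker.recordViolation(2_000);
const openAfterTwo = breaker.isOpen(2_000); // still closed
breaker.recordViolation(3_000);
const openAfterThree = breaker.isOpen(3_000); // tripped
const openAfterCooldown = breaker.isOpen(33_000); // cooldown has elapsed
```

Passing the clock in as an argument keeps the sketch testable; a real integration would more likely read `Date.now()` internally.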
52
+ ### OWASP LLM Top 10 2025 Attack Vectors
53
+
54
+ New red team mutations aligned with OWASP LLM Top 10 2025:
55
+
56
+ | Mutation | OWASP | Description |
57
+ | -------------------- | ----- | ------------------------------ |
58
+ | `bad-likert-judge` | LLM01 | Exploit evaluation capability |
59
+ | `crescendo` | LLM01 | Multi-turn gradual escalation |
60
+ | `deceptive-delight` | LLM01 | Positive framing bypass |
61
+ | `system-extraction` | LLM07 | System prompt leakage |
62
+ | `output-injection` | LLM05 | XSS, SQLi in output |
63
+ | `excessive-agency` | LLM06 | Unauthorized action claims |
64
+ | `hallucination-trap` | LLM09 | Confident fabrication triggers |
65
+
66
+ ```bash
67
+ akit redteam scenario.yaml --owasp LLM01,LLM05
68
+ akit redteam scenario.yaml --owasp-full
69
+ ```
70
+
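To make the relationship between the table and the `--owasp` flag concrete, here is a small sketch that selects mutation names by OWASP category. The mutation-to-category pairs are copied from the table above; the function name and the lookup logic are assumptions for illustration, since the diff does not show how the CLI resolves the flag.

```typescript
// Mutation-to-category pairs copied from the table above.
const MUTATION_CATEGORIES: Record<string, string> = {
  "bad-likert-judge": "LLM01",
  "crescendo": "LLM01",
  "deceptive-delight": "LLM01",
  "system-extraction": "LLM07",
  "output-injection": "LLM05",
  "excessive-agency": "LLM06",
  "hallucination-trap": "LLM09",
};

// Hypothetical helper: turn a comma-separated category list such as
// "LLM01,LLM05" into the mutation names tagged with those categories.
function mutationsForCategories(csv: string): string[] {
  const wanted = new Set(
    csv
      .split(",")
      .map((id) => id.trim().toUpperCase())
      .filter((id) => id.length > 0),
  );
  return Object.entries(MUTATION_CATEGORIES)
    .filter(([, category]) => wanted.has(category))
    .map(([name]) => name);
}

const selected = mutationsForCategories("LLM01,LLM05");
// selected: ["bad-likert-judge", "crescendo", "deceptive-delight", "output-injection"]
```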
71
+ ### Agentic Framework Adapters
72
+
73
+ New adapters for testing agentic AI systems:
74
+
75
+ **LangChain Adapter** (`@artemiskit/adapter-langchain`)
76
+
77
+ - Test chains, agents, and runnables
78
+ - Capture intermediate steps and tool usage
79
+ - Support for LCEL, ReAct agents, RAG chains
80
+
81
+ **DeepAgents Adapter** (`@artemiskit/adapter-deepagents`)
82
+
83
+ - Test multi-agent systems and workflows
84
+ - Capture agent traces and inter-agent messages
85
+ - Support for sequential, parallel, and hierarchical workflows
86
+
87
+ ```typescript
88
+ import { createLangChainAdapter } from "@artemiskit/adapter-langchain";
89
+ import { createDeepAgentsAdapter } from "@artemiskit/adapter-deepagents";
90
+
91
+ const adapter = createLangChainAdapter(myChain, {
92
+ captureIntermediateSteps: true,
93
+ });
94
+ const result = await adapter.generate({ prompt: "Test query" });
95
+ ```
96
+
97
+ ### Supabase Storage Enhancements
98
+
99
+ Enhanced cloud storage capabilities:
100
+
101
+ - **Analytics tables** for metrics tracking
102
+ - **Case results table** for granular analysis
103
+ - **Baseline management** for regression detection
104
+ - **Trend analysis** queries
105
+
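The changelog names baseline management for regression detection without describing the comparison itself. One plausible approach, sketched below under stated assumptions, is to compare a current run summary against a stored baseline with explicit tolerances. The `RunSummary` shape, the function name, and the default tolerances are hypothetical and do not reflect the Supabase schema or the SDK's API.

```typescript
// Hypothetical summary shape; not the actual stored schema.
interface RunSummary {
  successRate: number; // fraction of passing cases, 0..1
  p95LatencyMs: number; // 95th percentile latency in milliseconds
}

interface Tolerance {
  successRateDrop: number; // allowed absolute drop in success rate
  latencyIncreaseRatio: number; // allowed relative increase in p95 latency
}

// Returns one human-readable finding per detected regression.
function detectRegressions(
  baseline: RunSummary,
  current: RunSummary,
  tolerance: Tolerance = { successRateDrop: 0.02, latencyIncreaseRatio: 0.2 },
): string[] {
  const findings: string[] = [];
  if (baseline.successRate - current.successRate > tolerance.successRateDrop) {
    findings.push(
      `success rate fell from ${baseline.successRate} to ${current.successRate}`,
    );
  }
  const latencyLimit = baseline.p95LatencyMs * (1 + tolerance.latencyIncreaseRatio);
  if (current.p95LatencyMs > latencyLimit) {
    findings.push(
      `p95 latency rose from ${baseline.p95LatencyMs} ms to ${current.p95LatencyMs} ms`,
    );
  }
  return findings;
}

const findings = detectRegressions(
  { successRate: 0.96, p95LatencyMs: 800 },
  { successRate: 0.9, p95LatencyMs: 850 },
);
// Only the success-rate drop exceeds its tolerance; 850 ms is within 20% of 800 ms.
```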
106
+ ### Bug Fixes
107
+
108
+ - **adapter-openai**: Use `max_completion_tokens` for newer OpenAI models (o1, o3, gpt-4.5)
109
+ - **redteam**: Resolve TypeScript and flaky test issues in OWASP mutations
110
+ - **adapters**: Fix TypeScript build errors for agentic adapters
111
+ - **core**: Add `langchain` and `deepagents` to ProviderType union
112
+
113
+ ### Examples
114
+
115
+ New comprehensive examples organized by feature:
116
+
117
+ - `examples/guardian/` - Guardian Mode examples (testing, guardian, hybrid modes)
118
+ - `examples/sdk/` - SDK usage examples (Jest, Vitest, events)
119
+ - `examples/adapters/` - Agentic adapter examples
120
+ - `examples/owasp/` - OWASP LLM Top 10 test scenarios
121
+
122
+ ### Documentation
123
+
124
+ - Complete SDK documentation with API reference
125
+ - Guardian Mode guide with all three modes explained
126
+ - Agentic adapters documentation (LangChain, DeepAgents)
127
+ - Test matchers reference for Jest/Vitest
128
+ - OWASP LLM Top 10 testing scenarios
129
+
130
+ ### Patch Changes
131
+
132
+ - Updated dependencies
133
+ - @artemiskit/core@0.3.0
134
+ - @artemiskit/redteam@0.3.0
package/README.md ADDED
@@ -0,0 +1,173 @@
1
+ # @artemiskit/sdk
2
+
3
+ Programmatic SDK for [ArtemisKit](https://github.com/code-sensei/artemiskit) - integrate LLM testing directly into your Node.js applications, CI/CD pipelines, and test frameworks.
4
+
5
+ ## Features
6
+
7
+ - 🚀 **Simple API** - Run tests, red team evaluations, and stress tests programmatically
8
+ - 📊 **Event Emitters** - Real-time progress tracking with `onCaseStart`, `onCaseComplete`, `onProgress`
9
+ - 🧪 **Test Framework Integration** - Custom matchers for Jest and Vitest
10
+ - 🔴 **Red Team Testing** - Adversarial security testing built-in
11
+ - ⚡ **Stress Testing** - Load testing with configurable concurrency
12
+ - 📝 **TypeScript First** - Full type definitions included
13
+
14
+ ## Installation
15
+
16
+ ```bash
17
+ # Using bun
18
+ bun add @artemiskit/sdk
19
+
20
+ # Using npm
21
+ npm install @artemiskit/sdk
22
+ ```
23
+
24
+ ## Quick Start
25
+
26
+ ```typescript
27
+ import { ArtemisKit } from '@artemiskit/sdk';
28
+
29
+ const kit = new ArtemisKit({
30
+ provider: 'openai',
31
+ model: 'gpt-4',
32
+ project: 'my-project',
33
+ });
34
+
35
+ // Run test scenarios
36
+ const result = await kit.run({
37
+ scenario: './my-tests.yaml',
38
+ });
39
+
40
+ if (!result.success) {
41
+ console.error('Tests failed!');
42
+ process.exit(1);
43
+ }
44
+
45
+ console.log('All tests passed! ✅');
46
+ ```
47
+
48
+ ## API Reference
49
+
50
+ ### ArtemisKit Class
51
+
52
+ #### Constructor
53
+
54
+ ```typescript
55
+ const kit = new ArtemisKit({
56
+ project?: string;
57
+ provider?: 'openai' | 'azure-openai' | 'anthropic' | ...;
58
+ model?: string;
59
+ timeout?: number;
60
+ retries?: number;
61
+ concurrency?: number;
62
+ });
63
+ ```
64
+
65
+ #### run(options)
66
+
67
+ Run test scenarios against your LLM.
68
+
69
+ ```typescript
70
+ const result = await kit.run({
71
+ scenario: './tests.yaml',
72
+ tags?: string[],
73
+ concurrency?: number,
74
+ timeout?: number,
75
+ });
76
+ ```
77
+
78
+ #### redteam(options)
79
+
80
+ Run red team adversarial security testing.
81
+
82
+ ```typescript
83
+ const result = await kit.redteam({
84
+ scenario: './tests.yaml',
85
+ mutations?: string[],
86
+ countPerCase?: number,
87
+ });
88
+ ```
89
+
90
+ #### stress(options)
91
+
92
+ Run stress/load testing.
93
+
94
+ ```typescript
95
+ const result = await kit.stress({
96
+ scenario: './tests.yaml',
97
+ concurrency?: number,
98
+ duration?: number,
99
+ rampUp?: number,
100
+ });
101
+ ```
102
+
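The README does not say how stress-test metrics are derived. As a rough illustration of the quantities that matchers such as `toAchieveRPS` and `toHaveStressP95LatencyBelow` refer to, the sketch below computes requests per second and a nearest-rank percentile from raw latency samples. The function names and the nearest-rank method are assumptions; the SDK may aggregate differently.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Throughput as completed requests divided by wall-clock seconds.
function requestsPerSecond(completed: number, durationMs: number): number {
  if (durationMs <= 0) throw new Error("duration must be positive");
  return completed / (durationMs / 1000);
}

// Ten hypothetical latency samples (milliseconds) collected over two seconds.
const latenciesMs = [120, 80, 100, 400, 90, 110, 95, 105, 85, 130];
const p50 = percentile(latenciesMs, 50); // 100
const p95 = percentile(latenciesMs, 95); // 400
const rps = requestsPerSecond(latenciesMs.length, 2000); // 5
```

With only ten samples the single slow request dominates the 95th percentile, which is why tail-latency assertions are usually only meaningful on larger runs.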
103
+ ### Event Handling
104
+
105
+ ```typescript
106
+ kit
107
+ .onCaseStart((event) => {
108
+ console.log(`Starting ${event.caseId}`);
109
+ })
110
+ .onCaseComplete((event) => {
111
+ console.log(`${event.result.name}: ${event.result.ok ? '✅' : '❌'}`);
112
+ })
113
+ .onProgress((event) => {
114
+ console.log(`[${event.phase}] ${event.message}`);
115
+ });
116
+ ```
117
+
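The chained calls above work only if each registration method returns the object it was called on. The sketch below shows that pattern in isolation. The class, the `emit*` methods, and the event shapes are hypothetical, modelled on the fields used in the README snippet (`event.caseId`, `event.result.name`, `event.result.ok`); it is not the SDK's implementation.

```typescript
// Hypothetical event payloads, limited to the fields the README snippet reads.
interface CaseStartEvent {
  caseId: string;
}
interface CaseCompleteEvent {
  result: { name: string; ok: boolean };
}

type Handler<T> = (event: T) => void;

class ChainableEvents {
  private startHandlers: Handler<CaseStartEvent>[] = [];
  private completeHandlers: Handler<CaseCompleteEvent>[] = [];

  // Returning `this` is what makes `.onCaseStart(...).onCaseComplete(...)` possible.
  onCaseStart(handler: Handler<CaseStartEvent>): this {
    this.startHandlers.push(handler);
    return this;
  }

  onCaseComplete(handler: Handler<CaseCompleteEvent>): this {
    this.completeHandlers.push(handler);
    return this;
  }

  emitCaseStart(event: CaseStartEvent): void {
    for (const handler of this.startHandlers) handler(event);
  }

  emitCaseComplete(event: CaseCompleteEvent): void {
    for (const handler of this.completeHandlers) handler(event);
  }
}

const eventLog: string[] = [];
const events = new ChainableEvents()
  .onCaseStart((e) => {
    eventLog.push(`start:${e.caseId}`);
  })
  .onCaseComplete((e) => {
    eventLog.push(`${e.result.name}:${e.result.ok ? "pass" : "fail"}`);
  });

events.emitCaseStart({ caseId: "greeting" });
events.emitCaseComplete({ result: { name: "greeting", ok: true } });
// eventLog: ["start:greeting", "greeting:pass"]
```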
118
+ ## Jest/Vitest Integration
119
+
120
+ ### Setup
121
+
122
+ ```typescript
123
+ // Vitest: vitest.setup.ts
124
+ import '@artemiskit/sdk/vitest';
125
+ // or, for Jest: jest.setup.ts
126
+ import '@artemiskit/sdk/jest';
127
+ ```
128
+
129
+ ### Usage
130
+
131
+ ```typescript
132
+ import { describe, it, expect } from 'vitest';
133
+ import { ArtemisKit } from '@artemiskit/sdk';
134
+
135
+ describe('LLM Tests', () => {
136
+ const kit = new ArtemisKit({ provider: 'openai' });
137
+
138
+ it('should pass all test cases', async () => {
139
+ const result = await kit.run({ scenario: './tests.yaml' });
140
+ expect(result).toPassAllCases();
141
+ });
142
+
143
+ it('should have high success rate', async () => {
144
+ const result = await kit.run({ scenario: './tests.yaml' });
145
+ expect(result).toHaveSuccessRate(0.95);
146
+ });
147
+ });
148
+ ```
149
+
150
+ ### Available Matchers
151
+
152
+ **Run Result:**
153
+ - `toPassAllCases()`
154
+ - `toHaveSuccessRate(rate)`
155
+ - `toPassCasesWithTag(tag)`
156
+ - `toHaveMedianLatencyBelow(ms)`
157
+ - `toHaveP95LatencyBelow(ms)`
158
+
159
+ **Red Team:**
160
+ - `toPassRedTeam()`
161
+ - `toHaveDefenseRate(rate)`
162
+ - `toHaveNoCriticalVulnerabilities()`
163
+ - `toHaveNoHighSeverityVulnerabilities()`
164
+
165
+ **Stress Test:**
166
+ - `toPassStressTest()`
167
+ - `toHaveStressSuccessRate(rate)`
168
+ - `toAchieveRPS(rps)`
169
+ - `toHaveStressP95LatencyBelow(ms)`
170
+
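For readers unfamiliar with custom matchers, Jest and Vitest both accept functions passed to `expect.extend` that return an object with a `pass` boolean and a `message` function. The sketch below shows what a matcher in the spirit of `toHaveSuccessRate(rate)` could look like. The `RunResultLike` shape is an assumption (only `name` and `ok` appear in the README), and the function is written standalone so it runs without either test framework; it is not the SDK's actual matcher.

```typescript
// Hypothetical result shape; the real run result type is not shown in this diff.
interface RunResultLike {
  cases: { name: string; ok: boolean }[];
}

// The return shape expected from matchers registered via expect.extend.
interface MatcherResult {
  pass: boolean;
  message: () => string;
}

function toHaveSuccessRate(
  received: RunResultLike,
  expectedRate: number,
): MatcherResult {
  const total = received.cases.length;
  const passed = received.cases.filter((c) => c.ok).length;
  // Treat an empty run as a 0% success rate rather than dividing by zero.
  const actualRate = total === 0 ? 0 : passed / total;
  const pass = actualRate >= expectedRate;
  return {
    pass,
    message: () =>
      pass
        ? `expected success rate below ${expectedRate}, got ${actualRate} (${passed}/${total})`
        : `expected success rate of at least ${expectedRate}, got ${actualRate} (${passed}/${total})`,
  };
}

const sampleRun: RunResultLike = {
  cases: [
    { name: "greeting", ok: true },
    { name: "refusal", ok: true },
    { name: "summary", ok: false },
    { name: "math", ok: true },
  ],
};
const lenient = toHaveSuccessRate(sampleRun, 0.75); // 3 of 4 cases pass
const strict = toHaveSuccessRate(sampleRun, 0.9);
```

The two message branches exist because the frameworks show the message when the assertion fails, and with `.not` the assertion fails when `pass` is true.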
171
+ ## License
172
+
173
+ Apache-2.0