@artemiskit/reports 0.2.3 → 0.3.0
- package/CHANGELOG.md +172 -0
- package/adapters/openai/dist/index.js +5612 -0
- package/dist/index.d.ts +1 -0
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +209 -4
- package/dist/junit/generator.d.ts +44 -0
- package/dist/junit/generator.d.ts.map +1 -0
- package/package.json +2 -2
- package/src/index.ts +8 -0
- package/src/junit/generator.ts +350 -0
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,177 @@
 # @artemiskit/reports
 
+## 0.3.0
+
+### Minor Changes
+
+- ## v0.3.0 - SDK, Guardian Mode & OWASP Compliance
+
+This major release delivers the full programmatic SDK, runtime protection with Guardian Mode, OWASP LLM Top 10 2025 attack vectors, and agentic framework adapters.
+
+### Programmatic SDK (`@artemiskit/sdk`)
+
+The new SDK package provides a complete programmatic API for LLM evaluation:
+
+- **ArtemisKit class** with `run()`, `redteam()`, and `stress()` methods
+- **Jest integration** with custom matchers (`toPassAllCases`, `toHaveSuccessRate`, etc.)
+- **Vitest integration** with identical matchers
+- **Event handling** for real-time progress updates
+- **13 custom matchers** for run, red team, and stress test assertions
+
+```typescript
+import { ArtemisKit } from "@artemiskit/sdk";
+import { jestMatchers } from "@artemiskit/sdk/jest";
+
+expect.extend(jestMatchers);
+
+const kit = new ArtemisKit({ provider: "openai", model: "gpt-4o" });
+const results = await kit.run({ scenario: "./tests.yaml" });
+expect(results).toPassAllCases();
+```
+
+### Guardian Mode (Runtime Protection)
+
+New Guardian Mode provides runtime protection for AI/LLM applications:
+
+- **Three operating modes**: `testing`, `guardian`, `hybrid`
+- **Prompt injection detection** and blocking
+- **PII detection & redaction** (email, SSN, phone, API keys)
+- **Action validation** for agent tool/function calls
+- **Intent classification** with risk assessment
+- **Circuit breaker** for automatic blocking on repeated violations
+- **Rate limiting** and **cost limiting**
+- **Custom policies** via TypeScript or YAML
+
+```typescript
+import { createGuardian } from "@artemiskit/sdk/guardian";
+
+const guardian = createGuardian({ mode: "guardian", blockOnFailure: true });
+const protectedClient = guardian.protect(myLLMClient);
+```
+
+### OWASP LLM Top 10 2025 Attack Vectors
+
+New red team mutations aligned with OWASP LLM Top 10 2025:
+
+| Mutation             | OWASP | Description                    |
+| -------------------- | ----- | ------------------------------ |
+| `bad-likert-judge`   | LLM01 | Exploit evaluation capability  |
+| `crescendo`          | LLM01 | Multi-turn gradual escalation  |
+| `deceptive-delight`  | LLM01 | Positive framing bypass        |
+| `system-extraction`  | LLM07 | System prompt leakage          |
+| `output-injection`   | LLM05 | XSS, SQLi in output            |
+| `excessive-agency`   | LLM06 | Unauthorized action claims     |
+| `hallucination-trap` | LLM09 | Confident fabrication triggers |
+
+```bash
+akit redteam scenario.yaml --owasp LLM01,LLM05
+akit redteam scenario.yaml --owasp-full
+```
+
+### Agentic Framework Adapters
+
+New adapters for testing agentic AI systems:
+
+**LangChain Adapter** (`@artemiskit/adapter-langchain`)
+
+- Test chains, agents, and runnables
+- Capture intermediate steps and tool usage
+- Support for LCEL, ReAct agents, RAG chains
+
+**DeepAgents Adapter** (`@artemiskit/adapter-deepagents`)
+
+- Test multi-agent systems and workflows
+- Capture agent traces and inter-agent messages
+- Support for sequential, parallel, and hierarchical workflows
+
+```typescript
+import { createLangChainAdapter } from "@artemiskit/adapter-langchain";
+import { createDeepAgentsAdapter } from "@artemiskit/adapter-deepagents";
+
+const adapter = createLangChainAdapter(myChain, {
+  captureIntermediateSteps: true,
+});
+const result = await adapter.generate({ prompt: "Test query" });
+```
+
+### Supabase Storage Enhancements
+
+Enhanced cloud storage capabilities:
+
+- **Analytics tables** for metrics tracking
+- **Case results table** for granular analysis
+- **Baseline management** for regression detection
+- **Trend analysis** queries
+
+### Bug Fixes
+
+- **adapter-openai**: Use `max_completion_tokens` for newer OpenAI models (o1, o3, gpt-4.5)
+- **redteam**: Resolve TypeScript and flaky test issues in OWASP mutations
+- **adapters**: Fix TypeScript build errors for agentic adapters
+- **core**: Add `langchain` and `deepagents` to ProviderType union
+
+### Examples
+
+New comprehensive examples organized by feature:
+
+- `examples/guardian/` - Guardian Mode examples (testing, guardian, hybrid modes)
+- `examples/sdk/` - SDK usage examples (Jest, Vitest, events)
+- `examples/adapters/` - Agentic adapter examples
+- `examples/owasp/` - OWASP LLM Top 10 test scenarios
+
+### Documentation
+
+- Complete SDK documentation with API reference
+- Guardian Mode guide with all three modes explained
+- Agentic adapters documentation (LangChain, DeepAgents)
+- Test matchers reference for Jest/Vitest
+- OWASP LLM Top 10 testing scenarios
+
+### Patch Changes
+
+- Updated dependencies
+- @artemiskit/core@0.3.0
+
+## 0.2.4
+
+### Patch Changes
+
+- 16604a6: ## New Features
+
+### Validate Command
+
+New `artemiskit validate` command for validating scenario files without running them:
+
+- **YAML syntax validation** - Catches formatting errors
+- **Schema validation** - Validates against ArtemisKit schema using Zod
+- **Semantic validation** - Detects duplicate case IDs, undefined variables
+- **Warnings** - Identifies deprecated fields, missing descriptions, performance hints
+
+Options:
+
+- `--json` - Output results as JSON
+- `--strict` - Treat warnings as errors
+- `--quiet` - Only show errors
+- `--export junit` - Export to JUnit XML for CI integration
+
+### JUnit XML Export
+
+Added JUnit XML export support for CI/CD integration with Jenkins, GitHub Actions, GitLab CI, and other systems:
+
+- `akit run scenarios/ --export junit` - Export run results
+- `akit redteam scenarios/chatbot.yaml --export junit` - Export security test results
+- `akit validate scenarios/ --export junit` - Export validation results
+
+JUnit reports include:
+
+- Test suite metadata (run ID, provider, model, success rate)
+- Individual test cases with pass/fail status
+- Failure details with matcher type and expected values
+- Timing information for each test
+
+- Updated dependencies [16604a6]
+- @artemiskit/core@0.2.4
+
 ## 0.2.3
 
 ### Patch Changes
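Note on the JUnit export described in the 0.2.4 entry: the changelog lists what a JUnit report carries (suite metadata, per-case pass/fail status, failure details, timing). The TypeScript sketch below shows roughly how such a mapping can look. The `SuiteResult` and `CaseResult` shapes and the `toJUnitXml` and `escapeXml` helpers are hypothetical names chosen for illustration; they are not taken from `package/src/junit/generator.ts`, whose contents are not shown in this hunk, and the real generator may emit different attributes.

```typescript
// Hypothetical result shapes for illustration only; not the types
// exported by @artemiskit/reports.
interface CaseResult {
  id: string;
  passed: boolean;
  durationMs: number;
  failureMessage?: string;
}

interface SuiteResult {
  name: string;
  cases: CaseResult[];
}

// Escape the five characters that are special in XML text and attribute values.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render one suite as a JUnit-style <testsuite> document.
function toJUnitXml(suite: SuiteResult): string {
  const failures = suite.cases.filter((c) => !c.passed).length;
  const totalSeconds =
    suite.cases.reduce((sum, c) => sum + c.durationMs, 0) / 1000;

  const testcases = suite.cases.map((c) => {
    const open =
      `  <testcase name="${escapeXml(c.id)}"` +
      ` classname="${escapeXml(suite.name)}"` +
      ` time="${(c.durationMs / 1000).toFixed(3)}"`;
    // Passing cases are self-closing; failing cases carry a <failure> child.
    if (c.passed) return `${open}/>`;
    const message = escapeXml(c.failureMessage ?? "assertion failed");
    return `${open}>\n    <failure message="${message}"/>\n  </testcase>`;
  });

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    `<testsuite name="${escapeXml(suite.name)}" tests="${suite.cases.length}"` +
      ` failures="${failures}" time="${totalSeconds.toFixed(3)}">`,
    ...testcases,
    "</testsuite>",
  ].join("\n");
}

// Example input: one passing case and one failing case.
const xml = toJUnitXml({
  name: "chatbot",
  cases: [
    { id: "greets-user", passed: true, durationMs: 120 },
    {
      id: "refuses <script>",
      passed: false,
      durationMs: 80,
      failureMessage: 'expected "contains"',
    },
  ],
});
console.log(xml);
```

CI systems such as Jenkins, GitHub Actions, and GitLab CI generally read the `tests`, `failures`, and `time` attributes plus the per-case `<failure>` elements, which is why the sketch escapes every user-supplied string before placing it in an attribute.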