npm - @artemiskit/cli - Versions diffs - 0.1.8 → 0.2.2 - Mend

@artemiskit/cli 0.1.8 → 0.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (36)

package/CHANGELOG.md +139 -0
package/bin/artemis.ts +0 -0
package/dist/index.js +72343 -34002
package/dist/src/cli.d.ts.map +1 -1
package/dist/src/commands/baseline.d.ts +9 -0
package/dist/src/commands/baseline.d.ts.map +1 -0
package/dist/src/commands/compare.d.ts.map +1 -1
package/dist/src/commands/init.d.ts.map +1 -1
package/dist/src/commands/redteam.d.ts.map +1 -1
package/dist/src/commands/run.d.ts.map +1 -1
package/dist/src/commands/stress.d.ts.map +1 -1
package/dist/src/config/loader.d.ts +3 -1
package/dist/src/config/loader.d.ts.map +1 -1
package/dist/src/config/schema.d.ts +16 -0
package/dist/src/config/schema.d.ts.map +1 -1
package/dist/src/ui/index.d.ts +3 -1
package/dist/src/ui/index.d.ts.map +1 -1
package/dist/src/ui/panels.d.ts +21 -0
package/dist/src/ui/panels.d.ts.map +1 -1
package/dist/src/ui/prompts.d.ts +92 -0
package/dist/src/ui/prompts.d.ts.map +1 -0
package/dist/src/utils/adapter.d.ts.map +1 -1
package/package.json +6 -6
package/src/cli.ts +2 -0
package/src/commands/baseline.ts +473 -0
package/src/commands/compare.ts +25 -0
package/src/commands/init.ts +173 -69
package/src/commands/redteam.ts +63 -10
package/src/commands/run.ts +863 -141
package/src/commands/stress.ts +76 -3
package/src/config/loader.ts +5 -2
package/src/config/schema.ts +4 -0
package/src/ui/index.ts +19 -0
package/src/ui/panels.ts +153 -5
package/src/ui/prompts.ts +749 -0
package/src/utils/adapter.ts +15 -0

package/CHANGELOG.md CHANGED

@@ -1,5 +1,144 @@
 # @artemiskit/cli
+## 0.2.2
+### Patch Changes
+- d5ca7c6: Add baseline command and CI mode for regression detection
+  ### New Features
+  - **Baseline Command**: New `akit baseline` command with `set`, `list`, `get`, `remove` subcommands
+    - Lookup by run ID (default) or scenario name (`--scenario` flag)
+    - Store and manage baseline metrics for regression comparison
+  - **CI Mode**: New `--ci` flag for machine-readable output
+    - Outputs environment variable format for easy parsing
+    - Auto-detects CI environments (GitHub Actions, GitLab CI, etc.)
+    - Suppresses colors and spinners
+  - **Summary Formats**: New `--summary` flag with `json`, `text`, `security` formats
+    - JSON summary for pipeline parsing
+    - Security summary for compliance reporting
+  - **Regression Detection**: New `--baseline` and `--threshold` flags
+    - Compare runs against saved baselines
+    - Configurable regression threshold (default 5%)
+    - Exit code 1 on regression detection
+- Updated dependencies [d5ca7c6]
+  - @artemiskit/core@0.2.2
+  - @artemiskit/adapter-openai@0.1.9
+  - @artemiskit/adapter-vercel-ai@0.1.9
+  - @artemiskit/redteam@0.2.2
+  - @artemiskit/reports@0.2.2
+## 0.2.1
+### Patch Changes
+- fix: improve LLM grader compatibility with reasoning models
+  - Remove temperature parameter from LLM grader (reasoning models like o1, o3, gpt-5-mini only support temperature=1)
+  - Increase maxTokens from 200 to 1000 to accommodate reasoning models that use tokens for internal thinking
+  - Improve grader prompt for stricter JSON-only output format
+  - Add fallback parsing for malformed JSON responses
+  - Add markdown code block stripping from grader responses
+  - Add `modelFamily` configuration option to Azure OpenAI provider for correct parameter detection when deployment names differ from model names
+- Updated dependencies
+  - @artemiskit/core@0.2.1
+  - @artemiskit/adapter-openai@0.1.8
+  - @artemiskit/adapter-vercel-ai@0.1.8
+  - @artemiskit/redteam@0.2.1
+  - @artemiskit/reports@0.2.1
+## 0.2.0
+### Minor Changes
+- d2c3835: ## v0.2.0 - Enhanced Evaluation Features
+  ### CLI (`@artemiskit/cli`)
+  #### New Features
+  - **Multi-turn mutations**: Added `--mutations multi_turn` flag for red team testing with 4 built-in strategies:
+    - `gradual_escalation`: Gradually intensifies requests over conversation turns
+    - `context_switching`: Shifts topics to lower defenses before attack
+    - `persona_building`: Establishes trust through roleplay
+    - `distraction`: Uses side discussions to slip in harmful requests
+  - **Custom multi-turn conversations**: Support for array prompts in red team scenarios (consistent with `run` command format). The last user message becomes the attack target, preceding messages form conversation context.
+  - **Custom attacks**: Added `--custom-attacks` flag to load custom attack patterns from YAML files with template variables and variations.
+  - **Encoding mutations**: Added `--mutations encoding` for obfuscation attacks (base64, ROT13, hex, unicode).
+  - **Directory scanning**: Run all scenarios in a directory with `akit run scenarios/`
+  - **Glob pattern matching**: Use patterns like `akit run scenarios/**/*.yaml`
+  - **Parallel execution**: Added `--parallel` flag for concurrent scenario execution
+  - **Scenario tags**: Filter scenarios with `--tags` flag
+  ### Core (`@artemiskit/core`)
+  #### New Features
+  - **Combined matchers**: New `type: combined` expectation with `operator: and|or` for complex assertion logic
+  - **`not_contains` expectation**: Negative containment check to ensure responses don't include specific text
+  - **`similarity` expectation**: Semantic similarity matching with two modes:
+    - Embedding-based: Uses vector embeddings for fast semantic comparison
+    - LLM-based fallback: Uses LLM to evaluate semantic similarity when embeddings unavailable
+    - Configurable threshold (default 0.75)
+  - **`inline` expectation**: Safe expression-based custom matchers in YAML using JavaScript-like expressions (e.g., `response.length > 100`, `response.includes('hello')`)
+  - **p90 latency metric**: Added p90 percentile to stress test latency metrics
+  - **Token usage tracking**: Monitor token consumption per request in stress tests
+  - **Cost estimation**: Estimate API costs with model pricing data
+  ### Red Team (`@artemiskit/redteam`)
+  #### New Features
+  - **MultiTurnMutation class**: Full implementation with strategy support and custom conversation prefixes
+  - **Custom attack loader**: Parse and load custom attack patterns from YAML
+  - **Encoding mutation**: Obfuscate attack payloads using various encoding schemes
+  - **CVSS-like severity scoring**: Detailed attack severity scoring with:
+    - `CvssScore` interface with attack vector, complexity, impact metrics
+    - `CvssCalculator` class for score calculation and aggregation
+    - Predefined scores for all mutations and detection categories
+    - Human-readable score descriptions and vector strings
+  ### Reports (`@artemiskit/reports`)
+  #### New Features
+  - **Run comparison HTML report**: Visual diff between two runs showing:
+    - Metrics overview with baseline vs current comparison
+    - Change summary (regressions, improvements, unchanged)
+    - Case-by-case comparison table with filtering
+    - Side-by-side response comparison for each case
+  - **Comparison JSON export**: Structured comparison data for programmatic use
+  ### CLI Enhancements
+  - **Compare command `--html` flag**: Generate HTML comparison report
+  - **Compare command `--json` flag**: Generate JSON comparison data
+  ### Documentation
+  - Updated all CLI command documentation
+  - Added comprehensive examples for custom multi-turn scenarios
+  - Documented combined matchers and `not_contains` expectations
+  - Added mutation strategy reference tables
+### Patch Changes
+- Updated dependencies [d2c3835]
+  - @artemiskit/core@0.2.0
+  - @artemiskit/redteam@0.2.0
+  - @artemiskit/reports@0.2.0
+  - @artemiskit/adapter-openai@0.1.7
+  - @artemiskit/adapter-vercel-ai@0.1.7
 ## 0.1.8
 ### Patch Changes
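
The 0.2.2 entry above describes regression detection only in outline: compare a run against a saved baseline, apply a threshold (default 5%), and exit with code 1 on regression. The sketch below illustrates that logic. It is not the package's implementation: `RunMetrics`, `RegressionResult`, and `checkRegression` are hypothetical names, and treating the threshold as an absolute drop in success rate is an assumption the changelog does not confirm.

```typescript
// Hypothetical shapes: the changelog does not show how akit stores baselines.
interface RunMetrics {
  scenario: string;
  successRate: number; // fraction in [0, 1]
}

interface RegressionResult {
  drop: number;        // baseline success rate minus current success rate
  regressed: boolean;
  exitCode: 0 | 1;     // 1 mirrors "Exit code 1 on regression detection"
}

// Assumption: the threshold is an absolute drop in success rate.
// The default of 0.05 mirrors "default 5%" from the changelog.
function checkRegression(
  baseline: RunMetrics,
  current: RunMetrics,
  threshold = 0.05,
): RegressionResult {
  const drop = baseline.successRate - current.successRate;
  // Small epsilon so a drop that equals the threshold is not flagged
  // merely because of floating-point rounding.
  const regressed = drop > threshold + 1e-9;
  return { drop, regressed, exitCode: regressed ? 1 : 0 };
}

const saved: RunMetrics = { scenario: "checkout-bot", successRate: 0.9 };
const small = checkRegression(saved, { scenario: "checkout-bot", successRate: 0.88 });
const large = checkRegression(saved, { scenario: "checkout-bot", successRate: 0.8 });
console.log(small.regressed, large.regressed, large.exitCode);
```

In a CI job, the returned `exitCode` would be passed to the process exit call so the pipeline step fails when a regression is found.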

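The 0.2.0 notes list a p90 latency metric for stress tests without saying how it is computed. As a rough illustration, the sketch below uses the nearest-rank percentile definition. The `percentile` helper is hypothetical, and the package may use a different method, such as interpolation.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
// Assumption: this is one common definition, not necessarily the one akit uses.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile requires at least one sample");
  }
  if (p <= 0 || p > 100) {
    throw new RangeError("p must be in the range (0, 100]");
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank exact for typical inputs.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Request latencies in milliseconds from a made-up stress run.
const latenciesMs = [120, 80, 100, 300, 90, 110, 95, 105, 85, 500];
console.log(percentile(latenciesMs, 50), percentile(latenciesMs, 90));
```

With ten samples, the p90 value is the ninth smallest latency, so a single slow outlier does not dominate the metric the way it would with a maximum.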
package/bin/artemis.ts CHANGED

File without changes