@kat-ai/eval 0.1.0 → 0.1.1

This diff shows the changes between publicly released versions of the package as they appear in the supported registries, and is provided for informational purposes only.
Files changed (2)
  1. package/README.md +53 -71
  2. package/package.json +2 -2
package/README.md CHANGED
@@ -1,114 +1,96 @@
 # @kat-ai/eval
 
-Evaluation framework for KAT RAG systems. Provides layered quality metrics for introspection, retrieval, and end-to-end agent behavior.
+Layered evaluation toolkit for KAT applications.
 
-## Installation
+`@kat-ai/eval` gives you programmatic checks for three parts of the system:
+
+- manifest quality after introspection
+- retrieval quality against a Pinecone Assistant
+- end-to-end agent behavior through a chat endpoint
+
+## Install
 
 ```bash
-npm install @kat-ai/eval
+npm install @kat-ai/sdk @kat-ai/eval
 ```
 
-Release baseline (compatible package set):
+The package uses the same environment variables as the rest of KAT:
 
-```bash
-npm install @kat-ai/sdk@0.1.0 @kat-ai/eval@0.1.0 @kat-ai/cli@0.1.0
+```env
+OPENAI_API_KEY=sk-...
+PINECONE_API_KEY=...
 ```
 
 ## Quick Start
 
-```typescript
-import { evaluateIntrospection, evaluateRetrieval, evaluateAgent } from '@kat-ai/eval';
+```ts
+import {
+  evaluateAgent,
+  evaluateIntrospection,
+  evaluateRetrieval,
+} from '@kat-ai/eval';
 
-// Layer 1: Evaluate manifest quality
 const introspectionResult = await evaluateIntrospection({
-  assistantName: 'my-kb',
-  manifest: generatedManifest,
+  assistantName: 'my-assistant',
+  manifest,
   groundTruth: [
-    { query: 'What products do you cover?', expectedEntities: ['toaster', 'blender'] },
+    {
+      query: 'What products do you cover?',
+      expectedEntities: ['toaster', 'blender'],
+    },
   ],
 });
-console.log(`Introspection score: ${introspectionResult.overallScore}/100`);
 
-// Layer 2: Evaluate retrieval quality
 const retrievalResult = await evaluateRetrieval({
-  assistantName: 'my-kb',
+  assistantName: 'my-assistant',
   queries: [
-    { query: 'How to fix a toaster?', expectedTopics: ['heating element', 'troubleshooting'] },
+    {
+      query: 'How do I reset model T100?',
+      expectedTopics: ['reset', 'model t100'],
+    },
   ],
 });
-console.log(`Retrieval score: ${retrievalResult.overallScore}/100`);
 
-// Layer 3: Evaluate agent behavior
 const agentResult = await evaluateAgent({
   agentEndpoint: 'http://localhost:3000/api/chat',
   scenarios: [
     {
-      name: 'basic-troubleshoot',
-      initialQuery: 'My toaster won't heat up',
+      name: 'basic-troubleshooting',
+      initialQuery: 'My toaster will not heat up',
       expectedOutcome: 'answer',
-      evaluation: { mustContain: ['heating element'] },
+      evaluation: {
+        mustContain: ['heating element'],
+      },
     },
   ],
 });
-console.log(`Agent score: ${agentResult.overallScore}/100`);
 ```
 
-## Eval Layers
-
-### Layer 1: Introspection Eval
-
-Evaluates whether introspection correctly understands a KB's content:
-
-- **Entity Coverage**: Does the manifest capture all entities in the KB?
-- **Slot Accuracy**: Are extracted slots correct for the domain?
-- **Scope Precision**: Are in/out scope boundaries accurate?
-- **Capability Match**: Do capabilities match actual KB content?
-
-### Layer 2: Retrieval Eval
-
-Evaluates whether RAG retrieves relevant chunks:
-
-- **Relevance**: Are retrieved chunks relevant to the query?
-- **Recall**: Are expected topics found in retrieved chunks?
-- **Precision**: What percentage of retrieved content is relevant?
-- **Noise Ratio**: How much irrelevant content is retrieved?
-
-### Layer 3: Agent Eval
+## Package Exports
 
-Evaluates end-to-end agent behavior:
+- `@kat-ai/eval`: all layers and shared helpers
+- `@kat-ai/eval/introspection`: manifest-quality evaluation
+- `@kat-ai/eval/retrieval`: retrieval-quality evaluation
+- `@kat-ai/eval/agent`: end-to-end agent evaluation
 
-- **Accuracy**: Does the agent produce the expected outcome type?
-- **Relevance**: Is the answer relevant to the query?
-- **Completeness**: Does the answer fully address the question?
-- **Helpfulness**: Is the response actionable and helpful?
+## CLI
 
-## CLI Usage
+The same baseline can be run from the CLI:
 
 ```bash
-# Run all eval layers with the canonical baseline bundle
-kat eval --assistant my-kb --endpoint http://localhost:3000/api/chat --baseline
-# Equivalent explicit path:
-# kat eval --assistant my-kb --endpoint http://localhost:3000/api/chat --scenarios ./eval/baseline/naive-rag-baseline.json
-
-# Run specific layer
-kat eval --layer introspection --assistant my-kb --scenarios ./eval/baseline/introspection-ground-truth.json
-kat eval --layer retrieval --assistant my-kb --scenarios ./eval/baseline/retrieval-queries.json
-kat eval --layer agent --endpoint http://localhost:3000/api/chat --scenarios ./eval/baseline/agent-scenarios.json
-
-# Output as JSON
-kat eval --assistant my-kb --output json > results.json
+npx @kat-ai/cli eval \
+  --assistant my-assistant \
+  --endpoint http://localhost:3000/api/chat \
+  --baseline
 ```
 
-Baseline fixtures are checked in at:
-- `eval/baseline/naive-rag-baseline.json`
-- `eval/baseline/introspection-ground-truth.json`
-- `eval/baseline/retrieval-queries.json`
-- `eval/baseline/agent-scenarios.json`
+## Related Packages
 
-When running `--output json`:
-- `--layer all` outputs an array: `[{ layer, result }, ...]`
-- single-layer runs output only that layer's `result` object
+- `@kat-ai/sdk`: core runtime and shared types
+- `@kat-ai/cli`: command-line workflow for running evals and generating apps
+- `@kat-ai/react`: optional UI layer for chat experiences
 
-## License
+## Docs
 
-MIT
+- Repository: [github.com/pinecone-io/KAT](https://github.com/pinecone-io/KAT)
+- Eval docs: [docs-site/docs/cli/eval.md](https://github.com/pinecone-io/KAT/blob/main/docs-site/docs/cli/eval.md)
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@kat-ai/eval",
-  "version": "0.1.0",
+  "version": "0.1.1",
   "description": "Layered evaluation toolkit for KAT introspection, retrieval, and agent quality",
   "type": "module",
   "main": "./dist/index.cjs",
@@ -41,7 +41,7 @@
     "@pinecone-database/pinecone": "^6.1.3",
     "ai": "^4.0.0",
     "zod": "^3.23.0",
-    "@kat-ai/sdk": "0.1.0"
+    "@kat-ai/sdk": "0.1.1"
   },
   "devDependencies": {
     "@types/node": "^20.19.25",
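
The 0.1.0 README printed an `overallScore` out of 100 for each layer's result. Assuming that result shape is unchanged in 0.1.1, a caller might gate CI on those scores; the `failingLayers` helper and the 70-point threshold below are illustrative, not part of the package:

```typescript
// Hypothetical shape of a per-layer result, based on the `overallScore`
// field shown in the 0.1.0 README. Not a type exported by @kat-ai/eval.
type EvalResult = { overallScore: number };

// Return the names of layers whose score falls below a threshold.
function failingLayers(
  results: Record<string, EvalResult>,
  threshold = 70,
): string[] {
  return Object.entries(results)
    .filter(([, r]) => r.overallScore < threshold)
    .map(([layer]) => layer);
}

// Example with stand-in scores; in practice these would come from
// evaluateIntrospection, evaluateRetrieval, and evaluateAgent.
const failing = failingLayers({
  introspection: { overallScore: 82 },
  retrieval: { overallScore: 64 },
  agent: { overallScore: 75 },
});
console.log(failing); // layers below the default threshold
```
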