katt 0.0.7 → 0.0.9

package/.nvmrc ADDED
@@ -0,0 +1 @@
+ 24
@@ -0,0 +1,92 @@
+
+ # Contributor Covenant 3.0 Code of Conduct
+
+ ## Our Pledge
+
+ We pledge to make our community welcoming, safe, and equitable for all.
+
+ We are committed to fostering an environment that respects and promotes the dignity, rights, and contributions of all individuals, regardless of characteristics including race, ethnicity, caste, color, age, physical characteristics, neurodiversity, disability, sex or gender, gender identity or expression, sexual orientation, language, philosophy or religion, national or social origin, socio-economic position, level of education, or other status. The same privileges of participation are extended to everyone who participates in good faith and in accordance with this Covenant.
+
+ ## Encouraged Behaviors
+
+ While acknowledging differences in social norms, we all strive to meet our community's expectations for positive behavior. We also understand that our words and actions may be interpreted differently than we intend based on culture, background, or native language.
+
+ With these considerations in mind, we agree to behave mindfully toward each other and act in ways that center our shared values, including:
+
+ 1. Respecting the **purpose of our community**, our activities, and our ways of gathering.
+ 2. Engaging **kindly and honestly** with others.
+ 3. Respecting **different viewpoints** and experiences.
+ 4. **Taking responsibility** for our actions and contributions.
+ 5. Gracefully giving and accepting **constructive feedback**.
+ 6. Committing to **repairing harm** when it occurs.
+ 7. Behaving in other ways that promote and sustain the **well-being of our community**.
+
+
+ ## Restricted Behaviors
+
+ We agree to restrict the following behaviors in our community. Instances, threats, and promotion of these behaviors are violations of this Code of Conduct.
+
+ 1. **Harassment.** Violating explicitly expressed boundaries or engaging in unnecessary personal attention after any clear request to stop.
+ 2. **Character attacks.** Making insulting, demeaning, or pejorative comments directed at a community member or group of people.
+ 3. **Stereotyping or discrimination.** Characterizing anyone’s personality or behavior on the basis of immutable identities or traits.
+ 4. **Sexualization.** Behaving in a way that would generally be considered inappropriately intimate in the context or purpose of the community.
+ 5. **Violating confidentiality**. Sharing or acting on someone's personal or private information without their permission.
+ 6. **Endangerment.** Causing, encouraging, or threatening violence or other harm toward any person or group.
+ 7. Behaving in other ways that **threaten the well-being** of our community.
+
+ ### Other Restrictions
+
+ 1. **Misleading identity.** Impersonating someone else for any reason, or pretending to be someone else to evade enforcement actions.
+ 2. **Failing to credit sources.** Not properly crediting the sources of content you contribute.
+ 3. **Promotional materials**. Sharing marketing or other commercial content in a way that is outside the norms of the community.
+ 4. **Irresponsible communication.** Failing to responsibly present content which includes, links or describes any other restricted behaviors.
+
+
+ ## Reporting an Issue
+
+ Tensions can occur between community members even when they are trying their best to collaborate. Not every conflict represents a code of conduct violation, and this Code of Conduct reinforces encouraged behaviors and norms that can help avoid conflicts and minimize harm.
+
+ When an incident does occur, it is important to report it promptly. To report a possible violation, **Open a discussion in this repository.**
+
+ Community Moderators take reports of violations seriously and will make every effort to respond in a timely manner. They will investigate all reports of code of conduct violations, reviewing messages, logs, and recordings, or interviewing witnesses and other participants. Community Moderators will keep investigation and enforcement actions as transparent as possible while prioritizing safety and confidentiality. In order to honor these values, enforcement actions are carried out in private with the involved parties, but communicating to the whole community may be part of a mutually agreed upon resolution.
+
+
+ ## Addressing and Repairing Harm
+
+ ****
+
+ If an investigation by the Community Moderators finds that this Code of Conduct has been violated, the following enforcement ladder may be used to determine how best to repair harm, based on the incident's impact on the individuals involved and the community as a whole. Depending on the severity of a violation, lower rungs on the ladder may be skipped.
+
+ 1) Warning
+    1) Event: A violation involving a single incident or series of incidents.
+    2) Consequence: A private, written warning from the Community Moderators.
+    3) Repair: Examples of repair include a private written apology, acknowledgement of responsibility, and seeking clarification on expectations.
+ 2) Temporarily Limited Activities
+    1) Event: A repeated incidence of a violation that previously resulted in a warning, or the first incidence of a more serious violation.
+    2) Consequence: A private, written warning with a time-limited cooldown period designed to underscore the seriousness of the situation and give the community members involved time to process the incident. The cooldown period may be limited to particular communication channels or interactions with particular community members.
+    3) Repair: Examples of repair may include making an apology, using the cooldown period to reflect on actions and impact, and being thoughtful about re-entering community spaces after the period is over.
+ 3) Temporary Suspension
+    1) Event: A pattern of repeated violation which the Community Moderators have tried to address with warnings, or a single serious violation.
+    2) Consequence: A private written warning with conditions for return from suspension. In general, temporary suspensions give the person being suspended time to reflect upon their behavior and possible corrective actions.
+    3) Repair: Examples of repair include respecting the spirit of the suspension, meeting the specified conditions for return, and being thoughtful about how to reintegrate with the community when the suspension is lifted.
+ 4) Permanent Ban
+    1) Event: A pattern of repeated code of conduct violations that other steps on the ladder have failed to resolve, or a violation so serious that the Community Moderators determine there is no way to keep the community safe with this person as a member.
+    2) Consequence: Access to all community spaces, tools, and communication channels is removed. In general, permanent bans should be rarely used, should have strong reasoning behind them, and should only be resorted to if working through other remedies has failed to change the behavior.
+    3) Repair: There is no possible repair in cases of this severity.
+
+ This enforcement ladder is intended as a guideline. It does not limit the ability of Community Managers to use their discretion and judgment, in keeping with the best interests of our community.
+
+
+ ## Scope
+
+ This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public or other spaces. Examples of representing our community include using an official email address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
+
+
+ ## Attribution
+
+ This Code of Conduct is adapted from the Contributor Covenant, version 3.0, permanently available at [https://www.contributor-covenant.org/version/3/0/](https://www.contributor-covenant.org/version/3/0/).
+
+ Contributor Covenant is stewarded by the Organization for Ethical Source and licensed under CC BY-SA 4.0. To view a copy of this license, visit [https://creativecommons.org/licenses/by-sa/4.0/](https://creativecommons.org/licenses/by-sa/4.0/)
+
+ For answers to common questions about Contributor Covenant, see the FAQ at [https://www.contributor-covenant.org/faq](https://www.contributor-covenant.org/faq). Translations are provided at [https://www.contributor-covenant.org/translations](https://www.contributor-covenant.org/translations). Additional enforcement and community guideline resources can be found at [https://www.contributor-covenant.org/resources](https://www.contributor-covenant.org/resources). The enforcement ladder was inspired by the work of [Mozilla’s code of conduct team](https://github.com/mozilla/inclusion).
+
package/README.md CHANGED
@@ -12,16 +12,10 @@ Katt is a lightweight testing framework for running AI Evals, inspired by [Jest]
  - [Articles](#articles)
  - [Hello World - Example](#hello-world---example)
  - [Main Features](#main-features)
- - [Usage](#usage)
  - [Installation](#installation)
  - [Basic Usage](#basic-usage)
- - [Using promptFile](#using-promptfile)
  - [Specifying AI Models](#specifying-ai-models)
  - [Development](#development)
- - [Setup](#setup)
- - [Available Scripts](#available-scripts)
- - [Verification Process](#verification-process)
- - [Project Structure](#project-structure)
  - [How It Works](#how-it-works)
  - [Requirements](#requirements)
  - [License](#license)
@@ -29,7 +23,8 @@ Katt is a lightweight testing framework for running AI Evals, inspired by [Jest]
 
  ## Overview
 
- Katt is designed to evaluate and validate the behavior of AI agents like **Claude Code**, **GitHub Copilot**, **OpenAI Codex** and more. It provides a simple, intuitive API for writing tests that interact with AI models and assert their responses.
+ #### Run your own benchmarks and evaluations
+ **Katt** is designed to evaluate and validate the behavior of AI agents like **Claude Code**, **GitHub Copilot**, **OpenAI Codex** and more. It provides a simple, intuitive API for writing tests that interact with AI models and assert their responses.
 
  ## API Documentation
 
@@ -68,6 +63,7 @@ describe("Greeting agent", () => {
  - **Classification Matcher**: Built-in `toBeClassifiedAs()` matcher to grade a response against a target label on a 1-5 scale
  - **Concurrent Execution**: Runs eval files concurrently for faster test execution
  - **Model Selection**: Support for specifying custom AI models
+ - **Runtime Selection**: Run prompts through GitHub Copilot (default) or Codex
  - **Configurable Timeouts**: Override prompt wait time per test or via `katt.json`
 
  ## Usage
@@ -127,11 +123,14 @@ describe("Model selection", () => {
  });
  ```
 
- You can also set a default model for the project by adding a `katt.json` file in the project root:
+ You can also set runtime defaults in `katt.json`.
+
+ Copilot (default runtime):
 
  ```json
  {
-   "copilot": {
+   "agent": "gh-copilot",
+   "agentOptions": {
      "model": "gpt-5-mini"
    },
    "prompt": {
@@ -140,10 +139,29 @@ You can also set a default model for the project by adding a `katt.json` file in
  }
  ```
 
+ Codex:
+
+ ```json
+ {
+   "agent": "codex",
+   "agentOptions": {
+     "model": "gpt-5-codex",
+     "profile": "default",
+     "sandbox": "workspace-write"
+   },
+   "prompt": {
+     "timeoutMs": 240000
+   }
+ }
+ ```
+
  When this file exists:
 
- - `prompt("...")` and `promptFile("...")` use `copilot.model` by default
- - `prompt("...", { model: "..." })` still overrides the config value
+ - Supported agents are:
+   - `gh-copilot` (default when `agent` is missing or unsupported)
+   - `codex`
+ - `prompt("...")` and `promptFile("...")` merge `agentOptions` with call-time options
+ - `prompt("...", { model: "..." })` overrides the model from config
  - `prompt.timeoutMs` sets the default wait timeout for long-running prompts
 
  ## Development
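
Reviewer note: the README hunk above says that `agent` falls back to `gh-copilot` when missing or unsupported, and that `agentOptions` are merged with call-time options, with a call-time `model` taking precedence. The sketch below illustrates one way that behavior could look. It is not taken from the katt source: the type names, `resolveAgent`, and `mergeOptions` are hypothetical, and the shallow merge is an assumption, since the README only says "merge".

```typescript
// Hypothetical shapes: only `agent`, `agentOptions`, and `prompt.timeoutMs`
// appear in the README; every type and function name here is illustrative.
type AgentName = "gh-copilot" | "codex";

interface KattConfig {
  agent?: string;
  agentOptions?: Record<string, unknown>;
  prompt?: { timeoutMs?: number };
}

const SUPPORTED_AGENTS: readonly AgentName[] = ["gh-copilot", "codex"];

// Fall back to "gh-copilot" when `agent` is missing or unsupported.
function resolveAgent(config: KattConfig): AgentName {
  const match = SUPPORTED_AGENTS.find((name) => name === config.agent);
  return match ?? "gh-copilot";
}

// Assumed shallow merge: call-time options win over config `agentOptions`.
function mergeOptions(
  config: KattConfig,
  callOptions: Record<string, unknown> = {},
): Record<string, unknown> {
  return { ...(config.agentOptions ?? {}), ...callOptions };
}

const config: KattConfig = {
  agent: "codex",
  agentOptions: { model: "gpt-5-codex", sandbox: "workspace-write" },
};

// The call-time model replaces the configured one; `sandbox` is kept.
const merged = mergeOptions(config, { model: "gpt-5.2" });
console.log(resolveAgent(config), merged);
```

If katt merges nested option objects rather than top-level keys, the spread in `mergeOptions` would need to become a deep merge; the diff does not say which applies.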
@@ -201,8 +219,8 @@ katt/
  ## Requirements
 
  - Node.js
- - GitHub Copilot CLI installed (see [GitHub Copilot CLI installation docs](https://docs.github.com/en/copilot/how-tos/copilot-cli/install-copilot-cli))
- - Access to AI models (e.g., OpenAI API key for Codex)
+ - For `gh-copilot` runtime: access to GitHub Copilot with a logged-in user
+ - For `codex` runtime: Codex CLI installed and authenticated (`codex login`)
 
  ## License
 
package/SECURITY.md ADDED
@@ -0,0 +1,10 @@
+ # Security Policy
+
+ ## Supported Versions
+
+ Since Katt is under development, only the latest version will be supported.
+
+ ## Reporting a Vulnerability
+
+ - Create an issue on this repository.
+ - Describe the vulnerability and the level of it.
@@ -0,0 +1 @@
+ heeey
@@ -0,0 +1,19 @@
+ import { describe, expect, it, prompt } from "katt";
+
+ describe('Hello World', () => {
+   it('should return the date in a json format', async () => {
+     const currentData = new Date(Date.now());
+
+     const result = await prompt('Return the current year in the format "{ year: YYYY }"');
+     expect(result).toContain(`{ year: ${currentData.getFullYear()} }`);
+   });
+
+   it('should classify a response as helpful', async () => {
+     const response = await prompt('You are a helpful assistant. Give one short tip for learning JavaScript.');
+     await expect(response).toBeClassifiedAs('helpful', { threshold: 3 });
+   });
+ });
+
+
+ const result2 = await prompt('If you read this just say heeey');
+ expect(result2.toLowerCase()).toMatchSnapshot();
@@ -0,0 +1,15 @@
+ import { describe, expect, it, prompt, promptFile } from "katt";
+
+ describe('Working with files', () => {
+   it('It should load the file and compare', async () => {
+     const result = await promptFile('./customPrompt.md');
+     expect(result.toLowerCase()).toContain('hola');
+   });
+ });
+
+ describe('Working with prompt as expectation', () => {
+   it('It should be friendly', async () => {
+     const result = await prompt('You are a friendly assistant. If you read this, say "Hola"!', { model: 'gpt-5.2' });
+     expect(result).promptCheck('To be friendly, the response should contain a greeting.');
+   });
+ });
@@ -0,0 +1 @@
+ If you read this, say "Hola"!