katt 0.0.8 → 0.0.10

@@ -0,0 +1,92 @@
1
+
2
+ # Contributor Covenant 3.0 Code of Conduct
3
+
4
+ ## Our Pledge
5
+
6
+ We pledge to make our community welcoming, safe, and equitable for all.
7
+
8
+ We are committed to fostering an environment that respects and promotes the dignity, rights, and contributions of all individuals, regardless of characteristics including race, ethnicity, caste, color, age, physical characteristics, neurodiversity, disability, sex or gender, gender identity or expression, sexual orientation, language, philosophy or religion, national or social origin, socio-economic position, level of education, or other status. The same privileges of participation are extended to everyone who participates in good faith and in accordance with this Covenant.
9
+
10
+ ## Encouraged Behaviors
11
+
12
+ While acknowledging differences in social norms, we all strive to meet our community's expectations for positive behavior. We also understand that our words and actions may be interpreted differently than we intend based on culture, background, or native language.
13
+
14
+ With these considerations in mind, we agree to behave mindfully toward each other and act in ways that center our shared values, including:
15
+
16
+ 1. Respecting the **purpose of our community**, our activities, and our ways of gathering.
17
+ 2. Engaging **kindly and honestly** with others.
18
+ 3. Respecting **different viewpoints** and experiences.
19
+ 4. **Taking responsibility** for our actions and contributions.
20
+ 5. Gracefully giving and accepting **constructive feedback**.
21
+ 6. Committing to **repairing harm** when it occurs.
22
+ 7. Behaving in other ways that promote and sustain the **well-being of our community**.
23
+
24
+
25
+ ## Restricted Behaviors
26
+
27
+ We agree to restrict the following behaviors in our community. Instances, threats, and promotion of these behaviors are violations of this Code of Conduct.
28
+
29
+ 1. **Harassment.** Violating explicitly expressed boundaries or engaging in unnecessary personal attention after any clear request to stop.
30
+ 2. **Character attacks.** Making insulting, demeaning, or pejorative comments directed at a community member or group of people.
31
+ 3. **Stereotyping or discrimination.** Characterizing anyone’s personality or behavior on the basis of immutable identities or traits.
32
+ 4. **Sexualization.** Behaving in a way that would generally be considered inappropriately intimate in the context or purpose of the community.
33
+ 5. **Violating confidentiality.** Sharing or acting on someone's personal or private information without their permission.
34
+ 6. **Endangerment.** Causing, encouraging, or threatening violence or other harm toward any person or group.
35
+ 7. Behaving in other ways that **threaten the well-being** of our community.
36
+
37
+ ### Other Restrictions
38
+
39
+ 1. **Misleading identity.** Impersonating someone else for any reason, or pretending to be someone else to evade enforcement actions.
40
+ 2. **Failing to credit sources.** Not properly crediting the sources of content you contribute.
41
+ 3. **Promotional materials.** Sharing marketing or other commercial content in a way that is outside the norms of the community.
42
+ 4. **Irresponsible communication.** Failing to responsibly present content that includes, links to, or describes any other restricted behaviors.
43
+
44
+
45
+ ## Reporting an Issue
46
+
47
+ Tensions can occur between community members even when they are trying their best to collaborate. Not every conflict represents a code of conduct violation, and this Code of Conduct reinforces encouraged behaviors and norms that can help avoid conflicts and minimize harm.
48
+
49
+ When an incident does occur, it is important to report it promptly. To report a possible violation, **open a discussion in this repository.**
50
+
51
+ Community Moderators take reports of violations seriously and will make every effort to respond in a timely manner. They will investigate all reports of code of conduct violations, reviewing messages, logs, and recordings, or interviewing witnesses and other participants. Community Moderators will keep investigation and enforcement actions as transparent as possible while prioritizing safety and confidentiality. In order to honor these values, enforcement actions are carried out in private with the involved parties, but communicating to the whole community may be part of a mutually agreed upon resolution.
52
+
53
+
54
+ ## Addressing and Repairing Harm
55
+
58
+ If an investigation by the Community Moderators finds that this Code of Conduct has been violated, the following enforcement ladder may be used to determine how best to repair harm, based on the incident's impact on the individuals involved and the community as a whole. Depending on the severity of a violation, lower rungs on the ladder may be skipped.
59
+
60
+ 1) Warning
61
+ 1) Event: A violation involving a single incident or series of incidents.
62
+ 2) Consequence: A private, written warning from the Community Moderators.
63
+ 3) Repair: Examples of repair include a private written apology, acknowledgement of responsibility, and seeking clarification on expectations.
64
+ 2) Temporarily Limited Activities
65
+ 1) Event: A repeated incidence of a violation that previously resulted in a warning, or the first incidence of a more serious violation.
66
+ 2) Consequence: A private, written warning with a time-limited cooldown period designed to underscore the seriousness of the situation and give the community members involved time to process the incident. The cooldown period may be limited to particular communication channels or interactions with particular community members.
67
+ 3) Repair: Examples of repair may include making an apology, using the cooldown period to reflect on actions and impact, and being thoughtful about re-entering community spaces after the period is over.
68
+ 3) Temporary Suspension
69
+ 1) Event: A pattern of repeated violation which the Community Moderators have tried to address with warnings, or a single serious violation.
70
+ 2) Consequence: A private written warning with conditions for return from suspension. In general, temporary suspensions give the person being suspended time to reflect upon their behavior and possible corrective actions.
71
+ 3) Repair: Examples of repair include respecting the spirit of the suspension, meeting the specified conditions for return, and being thoughtful about how to reintegrate with the community when the suspension is lifted.
72
+ 4) Permanent Ban
73
+ 1) Event: A pattern of repeated code of conduct violations that other steps on the ladder have failed to resolve, or a violation so serious that the Community Moderators determine there is no way to keep the community safe with this person as a member.
74
+ 2) Consequence: Access to all community spaces, tools, and communication channels is removed. In general, permanent bans should be rarely used, should have strong reasoning behind them, and should only be resorted to if working through other remedies has failed to change the behavior.
75
+ 3) Repair: There is no possible repair in cases of this severity.
76
+
77
+ This enforcement ladder is intended as a guideline. It does not limit the ability of Community Moderators to use their discretion and judgment, in keeping with the best interests of our community.
78
+
79
+
80
+ ## Scope
81
+
82
+ This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public or other spaces. Examples of representing our community include using an official email address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
83
+
84
+
85
+ ## Attribution
86
+
87
+ This Code of Conduct is adapted from the Contributor Covenant, version 3.0, permanently available at [https://www.contributor-covenant.org/version/3/0/](https://www.contributor-covenant.org/version/3/0/).
88
+
89
+ Contributor Covenant is stewarded by the Organization for Ethical Source and licensed under CC BY-SA 4.0. To view a copy of this license, visit [https://creativecommons.org/licenses/by-sa/4.0/](https://creativecommons.org/licenses/by-sa/4.0/).
90
+
91
+ For answers to common questions about Contributor Covenant, see the FAQ at [https://www.contributor-covenant.org/faq](https://www.contributor-covenant.org/faq). Translations are provided at [https://www.contributor-covenant.org/translations](https://www.contributor-covenant.org/translations). Additional enforcement and community guideline resources can be found at [https://www.contributor-covenant.org/resources](https://www.contributor-covenant.org/resources). The enforcement ladder was inspired by the work of [Mozilla’s code of conduct team](https://github.com/mozilla/inclusion).
92
+
package/README.md CHANGED
@@ -12,24 +12,21 @@ Katt is a lightweight testing framework for running AI Evals, inspired by [Jest]
12
12
  - [Articles](#articles)
13
13
  - [Hello World - Example](#hello-world---example)
14
14
  - [Main Features](#main-features)
15
- - [Usage](#usage)
16
15
  - [Installation](#installation)
17
16
  - [Basic Usage](#basic-usage)
18
- - [Using promptFile](#using-promptfile)
19
17
  - [Specifying AI Models](#specifying-ai-models)
20
18
  - [Development](#development)
21
- - [Setup](#setup)
22
- - [Available Scripts](#available-scripts)
23
- - [Verification Process](#verification-process)
24
- - [Project Structure](#project-structure)
25
19
  - [How It Works](#how-it-works)
20
+ - [Execution Flow](#execution-flow)
21
+ - [Architecture](#architecture)
26
22
  - [Requirements](#requirements)
27
23
  - [License](#license)
28
24
  - [Contributing](#contributing)
29
25
 
30
26
  ## Overview
31
27
 
32
- Katt is designed to evaluate and validate the behavior of AI agents like **Claude Code**, **GitHub Copilot**, **OpenAI Codex** and more. It provides a simple, intuitive API for writing tests that interact with AI models and assert their responses.
28
+ #### Run your own benchmarks and evaluations
29
+ **Katt** is designed to evaluate and validate the behavior of AI agents like **Claude Code**, **GitHub Copilot**, **OpenAI Codex** and more. It provides a simple, intuitive API for writing tests that interact with AI models and assert their responses.
33
30
 
34
31
  ## API Documentation
35
32
 
@@ -130,7 +127,7 @@ describe("Model selection", () => {
130
127
 
131
128
  You can also set runtime defaults in `katt.json`.
132
129
 
133
- Copilot (default runtime):
130
+ GitHub Copilot (default runtime):
134
131
 
135
132
  ```json
136
133
  {
@@ -189,29 +186,36 @@ npm install
189
186
 
190
187
  ### Verification Process
191
188
 
192
- After making changes, run the following sequence:
189
+ To verify your changes before opening a pull request, run:
193
190
 
194
- 1. `npm run format`
191
+ 1. `npm test`
195
192
  2. `npm run typecheck`
196
- 3. `npm run test`
197
- 4. `npm run build`
198
- 5. `npm run test:build`
193
+ 3. `npm run lint`
194
+ 4. `npm run format`
199
195
 
200
- ## Project Structure
196
+ For more details, see the [verification process section in CONTRIBUTING.md](./CONTRIBUTING.md#verification-process).
197
+ ## How It Works
201
198
 
199
+ Katt runs eval files as executable test programs and coordinates collection, assertion failures, and reporting through its runtime context.
200
+
201
+ ## Execution Flow
202
+
203
+ ```mermaid
204
+ sequenceDiagram
205
+ participant User as User/CI
206
+ participant CLI as katt CLI
207
+ participant FS as File Scanner
208
+ participant Eval as Eval Runtime
209
+ participant Report as Reporter
210
+
211
+ User->>CLI: Run `npx katt`
212
+ CLI->>FS: Discover `*.eval.js` and `*.eval.ts`
213
+ FS-->>CLI: Return eval file list
214
+ CLI->>Eval: Execute eval files
215
+ Eval-->>CLI: Return pass/fail results
216
+ CLI->>Report: Print per-test output + summary
217
+ Report-->>User: Exit code (`0` pass, `1` fail)
202
218
  ```
203
- katt/
204
- ├── src/ # Source code
205
- │ ├── cli/ # CLI implementation
206
- │ ├── lib/ # Core libraries (describe, it, expect, prompt)
207
- │ └── types/ # TypeScript type definitions
208
- ├── examples/ # Example eval files
209
- ├── specs/ # Markdown specifications
210
- ├── package.json # Package configuration
211
- └── tsconfig.json # TypeScript configuration
212
- ```
213
-
214
- ## How It Works
215
219
 
216
220
  1. Katt searches the current directory recursively for `*.eval.js` and `*.eval.ts` files
217
221
  2. It skips `.git` and `node_modules` directories
@@ -221,6 +225,22 @@ katt/
221
225
  6. A summary is displayed showing passed/failed tests and total duration
222
226
  7. Katt exits with code `0` on success or `1` on failure
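
The discovery and exit-code steps above can be sketched roughly as follows. This is a minimal illustration and not Katt's actual implementation: `selectEvalFiles` and `exitCodeFor` are hypothetical names, and the sketch assumes the recursive directory walk has already produced a flat list of relative paths using `/` as the separator.

```typescript
// Directory names that discovery skips, per step 2 above.
const SKIPPED_DIRS: readonly string[] = [".git", "node_modules"];

// Matches the two eval file suffixes named in step 1.
const EVAL_FILE = /\.eval\.(js|ts)$/;

// Hypothetical helper (not part of Katt's API): keep only eval files that
// do not live under a skipped directory.
function selectEvalFiles(paths: readonly string[]): string[] {
  return paths.filter((path) => {
    const fileName = path.slice(path.lastIndexOf("/") + 1);
    const parentDirs = path.split("/").slice(0, -1);
    const insideSkippedDir = parentDirs.some(
      (segment) => SKIPPED_DIRS.indexOf(segment) !== -1
    );
    return !insideSkippedDir && EVAL_FILE.test(fileName);
  });
}

// Hypothetical helper: map per-test outcomes to the documented exit codes
// (0 when everything passed, 1 otherwise). Treating an empty run as a
// success is an assumption of this sketch, not documented behavior.
function exitCodeFor(passed: readonly boolean[]): 0 | 1 {
  return passed.every(Boolean) ? 0 : 1;
}

const demoPaths = [
  "hello.eval.ts",
  "examples/files.eval.js",
  "node_modules/pkg/inner.eval.js",
  ".git/hooks/sample.eval.ts",
  "src/index.ts",
];

// Keeps only hello.eval.ts and examples/files.eval.js.
console.log(selectEvalFiles(demoPaths));
// One failing test is enough to produce exit code 1.
console.log(exitCodeFor([true, false]));
```

Filtering a pre-collected path list keeps the sketch free of file-system access; a real runner would more likely prune skipped directories while walking the tree so that it never descends into `node_modules`.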
223
227
 
228
+ ## Architecture
229
+
230
+ ```mermaid
231
+ flowchart LR
232
+ User["Developer"] --> CLI["katt CLI"]
233
+ CLI --> EvalFiles["Eval files (*.eval.ts / *.eval.js)"]
234
+ CLI --> Config["katt.json config"]
235
+ EvalFiles --> Runtime["Test runtime (describe/it context)"]
236
+ Config --> Runtime
237
+ Runtime --> Assertions["Assertions + snapshots"]
238
+ Runtime --> Prompts["prompt() / promptFile()"]
239
+ Prompts --> AI["AI runtime (GitHub Copilot or Codex CLI)"]
240
+ Assertions --> Report["Terminal report + exit code"]
241
+ AI --> Report
242
+ ```
243
+
224
244
  ## Requirements
225
245
 
226
246
  - Node.js
package/SECURITY.md ADDED
@@ -0,0 +1,10 @@
1
+ # Security Policy
2
+
3
+ ## Supported Versions
4
+
5
+ Katt is still under active development, so only the latest released version is supported.
6
+
7
+ ## Reporting a Vulnerability
8
+
9
+ - Create an issue on this repository.
10
+ - Describe the vulnerability and its severity.
@@ -7,13 +7,4 @@ describe('Hello World', () => {
7
7
  const result = await prompt('Return the current year in the format "{ year: YYYY }"');
8
8
  expect(result).toContain(`{ year: ${currentData.getFullYear()} }`);
9
9
  });
10
-
11
- it('should classify a response as helpful', async () => {
12
- const response = await prompt('You are a helpful assistant. Give one short tip for learning JavaScript.');
13
- await expect(response).toBeClassifiedAs('helpful', { threshold: 3 });
14
- });
15
10
  });
16
-
17
-
18
- const result2 = await prompt('If you read this just say heeey');
19
- expect(result2.toLowerCase()).toMatchSnapshot();
@@ -6,10 +6,3 @@ describe('Working with files', () => {
6
6
  expect(result.toLowerCase()).toContain('hola');
7
7
  });
8
8
  });
9
-
10
- describe('Working with prompt as expectation', () => {
11
- it('It should be friendly', async () => {
12
- const result = await prompt('You are a friendly assistant. If you read this, say "Hola"!', { model: 'gpt-5.2' });
13
- expect(result).promptCheck('To be friendly, the response should contain a greeting.');
14
- });
15
- });