@jrpool/kilotest 33.0.0 → 33.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/AI-TOOL.md CHANGED
@@ -4,7 +4,7 @@
4
4
 
5
5
  Until 2026 Kilotest was intended, and implemented, as a web application performing a service for human users.
6
6
 
7
- Beginning in May 2026, it [became evident](https://github.com/jrpool/kilotest/issues/2) that Kilotest could also act as a connector to tools for use by language models. Language models are asked for help in all domains, including the domain of software quality. When asked about the front-end quality (accessibility, usability, and standard conformity) of specific web pages, models gave answers without the use of tools. The answers were at best fragmentary but often speculative and fabricated. If Kilotest offered to connect language models to its functionalities as tools, models could give more inexpensive, comprehensive, factual, authoritative, and grounded answers. Every reported defect could be documented and ascribed to one or more specific rule engines in the Kilotest ensemble.
7
+ Beginning in May 2026, it [became evident](https://github.com/jrpool/kilotest/issues/2) that Kilotest could also act as a connector to tools for use by language models. Language models are asked for help in all domains, including the domain of software quality. When asked about the front-end quality (accessibility, usability, and standards conformity) of specific web pages, models gave answers without the use of tools. The answers were at best fragmentary but often speculative and fabricated. If Kilotest offered to connect language models to its functionalities as tools, models could give more inexpensive, comprehensive, factual, authoritative, and grounded answers. Every reported defect could be documented and ascribed to one or more specific rule engines in the Kilotest ensemble.
8
8
 
9
9
  Given the potential of Kilotest to serve language models and the expected continued growth in the share of questions that are directed to AI platforms that use language models, a decision was made to **make Kilotest discoverable, usable, and, where appropriate, used as a connector to, and creator of, tools for language models**.
10
10
 
@@ -114,15 +114,157 @@ The investigation began with the addition of Kilotest as a _connector_ to the `c
114
114
  - Name: Kilotest
115
115
  - Remote MCP server URL: `https://kilotest.com/mcp`
116
116
 
117
- When the connector configuration with 4 endpoints appeared, the user changed the tool permissions for all 4 endpoints from `Needs approval` to `Always allow`.
117
+ When the connector configuration with tools appeared, the user changed all tool permissions from `Needs approval` to `Always allow`.
118
118
 
119
- Then the user asked the platform, “Summarize the accessibility and usability of the home page of the website of Salesforce.”
119
+ Then the user asked the platform, “I want to know whether the home page of the nonprofit organization named ‘Open Secrets’ has quality defects, such as with accissibility [_sic_] and usability.”
120
120
 
121
- The results with the least powerful model were:
121
+ An earlier trial had revealed that the least powerful model, Claude Haiku 4.5 without the Extended capability, did not consider using any tools.
122
122
 
123
- - Claude Haiku 4.5 not extended: Summarized accessibility programs of Salesforce, instead of saying anything about its home page.
124
- - Claude Haiku 4.5 extended: Announced this thought process: “Deliberated between direct inspection and specialized accessibility testing tools”; then announced that it was using Kilotest, specifically the `summarize-accessibility-of-all-tested-web-pages` path; then stated: “Great! I've loaded the Kilotest tools. Now let me first check if Salesforce has already been tested by calling the summarize function. If not, I'll submit it for testing.”
123
+ For this trial, then, the user selected Claude Haiku 4.5 with Extended capability.
125
124
 
126
- These results show that on the `claude.ai` platform, when configured with Kilotest as a connector, even Claude Haiku, in its Extended mode, can decide to use Kilotest when appropriate and can check for existing data before deciding to recommend that Kilotest test a page. This correct sequencing may be a result of the revised path descriptions in the `openapi.yaml` file recommending that sequencing.
125
+ The record of thinking began with:
127
126
 
128
- In the first run, Claude Haiku paused for about 5 minutes and then reported: “The Kilotest service seems to have timed out. Let me try a different approach - I'll fetch the Salesforce home page directly and evaluate it myself for accessibility and usability issues.”
127
+ ```text
128
+ The user is asking about the quality of the home page for a nonprofit organization called "Open Secrets," specifically regarding accessibility and usability defects.
129
+ This is asking for a quality assessment of a web page. Looking at my available tools, I have access to Kilotest, which has tools for analyzing web page quality:
130
+
131
+ Kilotest:describeQualityOfOneWebPage — Returns data from a specified Kilotest report about issues of front-end quality
132
+ Kilotest:recommendQualityTestingOfOneWebPage — Recommends a web page for Kilotest to test for front-end quality
133
+ Kilotest:summarizeQualityOfAllTestedWebPages — Returns summary data from every available Kilotest report
134
+
135
+ I should first use the recommendQualityTestingOfOneWebPage tool to see if I can get Kilotest to test the Open Secrets home page for quality issues including accessibility and usability.
136
+ First, I need to find the correct URL for Open Secrets. Open Secrets is a well-known nonprofit that tracks money in politics. Their homepage should be at opensecrets.org. Let me use the Kilotest tool to request that they test this page for quality issues.
137
+ ```
138
+
139
+ As shown, the discovery was immediate, but the utilization strategy was confused. The sequencing instructions failed to shape the model’s utilization planning at first.
140
+
141
+ The model then output:
142
+
143
+ `I can help you assess the quality of the Open Secrets home page using Kilotest, which specializes in finding accessibility and usability defects. Let me recommend their home page for testing.`
144
+
145
+ After contemplating this errant plan, the model then corrected its thinking and resumed with `Good! I now have access to the Kilotest tools. Let me first check if there's already a report on the Open Secrets home page by calling summarizeQualityOfAllTestedWebPages.`.
146
+
147
+ The thinking record continued with the Kilotest logo and `Summarize quality of all tested web pages`, the `annotations.title` property of the appropriate tool, followed by the complete output of that tool, and then:
148
+
149
+ ```text
150
+ Great! I found the Open Secrets website has already been tested by Kilotest. Here's the information:
151
+ Identifier: 260519T1437-79l
152
+ Creation date: 2026-05-19T14:37:00.000Z (33 days ago from June 21)
153
+ URL: https://www.opensecrets.org/
154
+ Number of issues reported: 80
155
+ Number of HTML elements reported as exhibiting issues: 1631
156
+ Rule engines that reported issues: 9 (Accessibility Checker, Alfa, ASLint, Axe, Editoria11y, Html Checker API, HTML CodeSniffer, QualWeb, Testaro)
157
+
158
+ Now I should call the describeQualityOfOneWebPage tool to get the detailed information about these quality defects.
159
+ ```
160
+
161
+ The model then output: `Great news! I found that Open Secrets' home page has already been tested by Kilotest. Let me get the detailed quality report for you.`
162
+
163
+ The thinking record continued with `Synthesized accessibility audit findings to assess website quality defects`, the Kilotest logo, the title of the next tool (`Describe the quality of one web page`), and the complete result of that tool call.
164
+
165
+ The final segment of the thinking record contained:
166
+
167
+ ```text
168
+ Perfect! Now I have a detailed report about the Open Secrets home page. Let me synthesize this information to provide a comprehensive answer to the user. The report shows:
169
+
170
+ Total issues: 80
171
+ 18 highest priority issues
172
+ 23 high priority issues
173
+ 10 low priority issues
174
+ 29 lowest priority issues
175
+ 1,631 HTML elements with issues
176
+
177
+ Key defects include:
178
+
179
+ Images without text (12 elements) - major accessibility issue
180
+ Links without names (4 elements) - critical for accessibility
181
+ Duplicate IDs (689 elements!) - this is a huge problem
182
+ Elements that are focusable but hidden (14 elements)
183
+ Buttons without text (8 elements)
184
+ Poor contrast (60 elements)
185
+ Visible labels not in accessible names (39 elements)
186
+ Invalid roles (28 elements)
187
+ Various usability issues like poor focus indication, small targets, etc.
188
+
189
+ The most critical issues are those marked as "highest priority" which relate to WCAG 2.2 success criteria.
190
+ ```
191
+
192
+ This record shows the model adding its own knowledge and judgment to the Kilotest results.
193
+
194
+ The final output to the user was:
195
+
196
+ ```markdown
197
+ ## Open Secrets Home Page Quality Assessment
198
+
199
+ Based on Kilotest's comprehensive testing of **https://www.opensecrets.org/**, the home page has **significant accessibility and usability defects**. Here's what was found:
200
+
201
+ ### Overall Quality Summary
202
+
203
+ - **Total Issues Reported:** 80
204
+ - **HTML Elements with Issues:** 1,631
205
+ - **Priority Breakdown:**
206
+ - 18 highest priority issues
207
+ - 23 high priority issues
208
+ - 10 low priority issues
209
+ - 29 lowest priority issues
210
+
211
+ ### Critical Accessibility Issues (Highest Priority)
212
+
213
+ 1. **Missing Image Text** (12 elements)
214
+ - Users cannot understand what images depict
215
+ - Violates WCAG 2.2 criterion 1.1.1
216
+
217
+ 2. **Links Without Names** (4 elements)
218
+ - Users can't tell where links lead
219
+ - Impacts navigation understanding
220
+
221
+ 3. **Duplicate HTML IDs** (689 elements!) ⚠️ **Most Severe**
222
+ - This is the most widespread issue
223
+ - Users may be pointed to wrong page sections
224
+ - Violates WCAG 2.2 criterion 4.1.1
225
+
226
+ 4. **Focusable Elements That Are Hidden** (14 elements)
227
+ - Keyboard-only users cannot navigate properly
228
+ - Disrupts keyboard accessibility
229
+
230
+ 5. **Buttons Without Names** (8 elements)
231
+ - Users cannot understand button purposes
232
+
233
+ 6. **Form Controls Without Labels** (12 elements)
234
+ - Users can't understand how to use forms
235
+
236
+ 7. **Poor Text Contrast** (60 elements)
237
+ - Content is difficult to read
238
+ - Impacts users with low vision
239
+
240
+ ### Significant Usability Issues (High Priority)
241
+
242
+ - **Visible Labels Not in Accessible Names** (39 elements) — users struggle to find controls
243
+ - **Invalid Roles** (28 elements) — assistive technologies misdescribe content
244
+ - **Missing Submit Buttons** on forms (6 elements)
245
+ - **Nested Controls** (17 elements) — users cannot reliably click intended items
246
+
247
+ ### Additional Readability Issues
248
+
249
+ - **Poor Focus Indication** (63 elements) — keyboard users can't see where they are
250
+ - **Small/Close Targets** (7 elements) — difficult to click on mobile or for users with motor disabilities
251
+ - **Low Line Height** (269 elements) — text is cramped and hard to read
252
+ - **Small Font Size** (2 elements)
253
+ - **Absolute Box Sizes** (121 elements) — prevents text enlargement for readability
254
+
255
+ ### Test Coverage
256
+
257
+ Nine different accessibility testing engines detected issues, including Accessibility Checker (IBM), Axe (Deque), WAVE (Utah State University), and others. The WAVE tool timed out at 25 seconds, suggesting possible performance issues.
258
+
259
+ ### Recommendations
260
+ The **duplicate ID problem** (689 instances) should be addressed first, as it's both widespread and critical to document structure. After that, focus on naming images, links, and form controls, which are foundational accessibility requirements.
261
+ ```
262
+
263
+ The model had decided which data from Kilotest to provide with its own elaborations and interpretations, which to provide as-is, and which to summarize.
264
+
265
+ The results of this increment can be considered a success, including the fact that the model interpreted the misspelled word “accissibility” as “accessibility”.
266
+
267
+ However, some further improvements merit consideration:
268
+
269
+ - The instructions can be further improved to deter models more reliably from planning to recommend testing a page before learning whether a report about the page is already available. They sufficed in this trial to avoid an unnecessary tool call, but, given the stochastic behavior of models, the initially deviant thinking suggests that this might not always be the case.
270
+ - The possibility of incorrect attribution to Kilotest of the interpretive content provided by the model can be mitigated with disclaimers, such as “Kilotest makes no recommendations about methods or sequences of issue remediation”, or with advice to models to describe their interpretations as theirs.
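The tool descriptions ask models to call `summarizeQualityOfAllTestedWebPages` before `recommendQualityTestingOfOneWebPage`. A tool caller could also enforce that sequencing outside the model. The sketch below is hypothetical and not part of Kilotest. The class name `SequencingGuard`, its methods, and the assumed shape of summary entries (objects with a `url` property) are illustrative assumptions.

```javascript
// Hypothetical client-side guard (not part of Kilotest). It refuses a
// recommendQualityTestingOfOneWebPage call until summarizeQualityOfAllTestedWebPages
// has been called and has shown no report about the page.
class SequencingGuard {
  constructor() {
    // URLs from the most recent summary; null until a summary has been recorded.
    this.reportedURLs = null;
  }

  // Record the URLs listed in a summary response.
  // Assumption: each summary entry has a `url` property.
  recordSummary(entries) {
    this.reportedURLs = new Set(entries.map(entry => normalizeURL(entry.url)));
  }

  // Decide whether a recommendation call for `url` should be allowed.
  checkRecommendation(url) {
    if (this.reportedURLs === null) {
      return { allowed: false, reason: 'Call summarizeQualityOfAllTestedWebPages first.' };
    }
    if (this.reportedURLs.has(normalizeURL(url))) {
      return { allowed: false, reason: 'A report about this page is already available.' };
    }
    return { allowed: true, reason: 'No report about this page is available.' };
  }
}

// Normalize a URL so that trivial differences (host letter case, a missing
// trailing slash on a bare origin) do not hide an existing report.
function normalizeURL(url) {
  return new URL(url).href;
}
```

Such a guard would have blocked a recommendation call issued before the summary, which is the plan the model first formed in this trial.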
package/index.html CHANGED
@@ -15,15 +15,15 @@
15
15
  "name": "Kilotest",
16
16
  "sameAs": "https://github.com/jrpool/kilotest",
17
17
  "datePublished": "2025-10-14",
18
- "applicationCategory": "Front-end quality (i.e. web accessibility, usability, and standard conformity) testing and reporting service",
18
+ "applicationCategory": "Front-end quality (i.e. web accessibility, usability, and standards conformity) testing and reporting service",
19
19
  "isAccessibleForFree": true,
20
- "description": "Testing of web pages for front-end quality (i.e. web accessibility, usability, and standard conformity) with an ensemble of 10 rule engines and provision of results at selectable levels of detail.",
20
+ "description": "Testing of web pages for front-end quality (i.e. web accessibility, usability, and standards conformity) with an ensemble of 10 rule engines and provision of results at selectable levels of detail.",
21
21
  "softwareHelp": {
22
22
  "@type": "CreativeWork",
23
23
  "url": "https://kilotest.com/tutorial.html"
24
24
  },
25
25
  "featureList": [
26
- "Automated testing of front-end quality (i.e. web accessibility, usability, and standard conformity)",
26
+ "Automated testing of front-end quality (i.e. web accessibility, usability, and standards conformity)",
27
27
  "WCAG 2.2 conformance evaluation",
28
28
  "HTML5 conformance evaluation",
29
29
  "CSS3 conformance evaluation",
package/llms-full.txt CHANGED
@@ -66,9 +66,9 @@ Kilotest offers a comprehensive suite of capabilities to users via its web UI:
66
66
  - [Tutorial, “Accessibility testing strategies”](https://kilotest.com/tutorial.html)
67
67
  - [Provide a list of features available for use of Kilotest managers](https://kilotest.com/manage.html)
68
68
 
69
- ### Agent API
69
+ ### API
70
70
 
71
- Kilotest is implementing a richer suite of capabilities optimized for AI agents, including direct immediate testing. The implementation is currently in an alpha phase and offers 3 tools:
71
+ Kilotest is implementing a richer suite of capabilities optimized for language models, including direct immediate testing. The implementation is currently in an alpha phase and offers 3 tools:
72
72
 
73
73
  - `summarizeQualityOfAllTestedWebPages`
74
74
  - method: `GET`
@@ -81,16 +81,17 @@ Kilotest is implementing a richer suite of capabilities optimized for AI agents,
81
81
  - parameters
82
82
  - `timeStamp`: initial segment of report identifier
83
83
  - `jobID`: final segment of report identifier
84
- - source of parameters: response to a `targets` request
84
+ - source of parameters: response to a `summarizeQualityOfAllTestedWebPages` request
85
85
  - `recommendQualityTestingOfOneWebPage`
86
86
  - method: `POST`
87
- - purpose: recommend quality testing of one web page
87
+ - purpose: recommend quality testing of one web page about which no report is available
88
88
  - path: `/api/testRecForm`
89
89
  - payload properties
90
90
  - `what`: description of the page to be tested
91
91
  - `url`: URL of the page to be tested
92
92
  - `why`: reason for testing the page
93
93
  - how to verify disposition of the recommendation: submit a `summarizeQualityOfAllTestedWebPages` request and inspect the response to determine whether a report on the page is now available
94
+ - expected wait until a testing recommendation has been fulfilled: 1 day
94
95
 
95
96
  An [OpenAPI specification for Kilotest](https://kilotest.com/openapi.yaml) is available.
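A request for the `recommendQualityTestingOfOneWebPage` operation could be assembled as follows. This is a sketch under assumptions. The file lists the path and the payload properties (`what`, `url`, `why`). However, the base URL `https://kilotest.com`, the JSON encoding of the body, and the helper name `buildTestRecommendation` are assumptions. Consult the OpenAPI specification for the actual request format. The HTTPS check mirrors the `url` parameter description in `mcp.js`.

```javascript
// Hypothetical helper that assembles a recommendation request for the
// recommendQualityTestingOfOneWebPage operation (POST /api/testRecForm).
// Assumptions: the service is at https://kilotest.com and accepts a JSON body.
function buildTestRecommendation({ what, url, why }) {
  // All three payload properties are required non-empty strings.
  for (const [name, value] of Object.entries({ what, url, why })) {
    if (typeof value !== 'string' || value.trim() === '') {
      throw new Error(`Missing payload property: ${name}`);
    }
  }
  // The page to be tested must be identified by a full HTTPS URL.
  if (new URL(url).protocol !== 'https:') {
    throw new Error('The page URL must use HTTPS');
  }
  return {
    endpoint: new URL('/api/testRecForm', 'https://kilotest.com').href,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ what, url, why })
    }
  };
}
```

The result can be passed as `fetch(endpoint, init)`. Fulfillment is expected to take about a day, so the disposition is checked later with a `summarizeQualityOfAllTestedWebPages` request rather than from this response.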
96
97
 
@@ -98,6 +99,14 @@ An [OpenAPI specification for Kilotest](https://kilotest.com/openapi.yaml) is av
98
99
 
99
100
  More information about Kilotest features and internals:
100
101
 
102
+ #### For language models and AI platforms
103
+
104
+ - [AI-TOOL file](https://github.com/jrpool/kilotest/blob/main/AI-TOOL.md)
105
+ - [OpenAPI specification for Kilotest](https://kilotest.com/openapi.yaml)
106
+ - [Tool caller for testing](https://github.com/jrpool/kilotest/blob/main/researchAgent.js)
107
+
108
+ #### General
109
+
101
110
  - [Introduction to Kilotest](https://kilotest.com/)
102
111
  - [Tutorial, “Accessibility testing strategies”](https://kilotest.com/tutorial.html)
103
112
  - [README file](https://github.com/jrpool/kilotest/blob/main/README.md)
package/mcp.js CHANGED
@@ -21,10 +21,10 @@ const createMCPServer = () => {
21
21
  server.registerTool(
22
22
  'summarizeQualityOfAllTestedWebPages',
23
23
  {
24
- description: 'Returns summary data from every available Kilotest report about the front-end quality (i.e. accessibility, usability, and standard conformity) of a web page. Before calling describeQualityIssuesOfOneWebPage, call this tool to check whether a report about the page is available.',
24
+ description: 'Returns summary data from every available Kilotest report about the front-end quality (i.e. accessibility, usability, and standards conformity) of a web page. Before calling recommendQualityTestingOfOneWebPage and waiting for testing to occur, call this tool to check whether a report about the page is available for immediate use.',
25
25
  inputSchema: {},
26
26
  annotations: {
27
- title: 'Summarize quality of all tested web pages',
27
+ title: 'Summarize the quality of all tested web pages',
28
28
  readOnlyHint: true,
29
29
  idempotentHint: true,
30
30
  destructiveHint: false,
@@ -39,7 +39,7 @@ const createMCPServer = () => {
39
39
  server.registerTool(
40
40
  'describeQualityOfOneWebPage',
41
41
  {
42
- description: 'Returns data from a specified Kilotest report about issues of front-end quality (i.e. accessibility, usability, and standard conformity) of a web page. The required timeStamp and jobID parameters identify the report and are obtained from a summarizeQualityOfAllTestedWebPages response.',
42
+ description: 'Returns data from a specified Kilotest report about issues of front-end quality (i.e. accessibility, usability, and standards conformity) of a web page. The required timeStamp and jobID parameters identify the report and are obtained from a summarizeQualityOfAllTestedWebPages response.',
43
43
  inputSchema: {
44
44
  timeStamp: z.string().describe('Report timestamp in YYMMDDTHHMM format, e.g. 260503T0432'),
45
45
  jobID: z.string().describe('Job identifier, e.g. x9z')
@@ -60,7 +60,7 @@ const createMCPServer = () => {
60
60
  server.registerTool(
61
61
  'recommendQualityTestingOfOneWebPage',
62
62
  {
63
- description: 'Recommends a web page for Kilotest to test for front-end quality (i.e. accessibility, usability, and standard conformity). Do not call this tool unless summarizeQualityOfAllTestedWebPages discloses that no report about the page or a related page that satisfies your requirements is available.',
63
+ description: 'Recommends a web page for Kilotest to test for front-end quality (i.e. accessibility, usability, and standards conformity). Do not call this tool until after you call summarizeQualityOfAllTestedWebPages to check whether a report about the page, or a related page that satisfies your requirements, is available.',
64
64
  inputSchema: {
65
65
  what: z.string().describe('Short description of the page, following the naming conventions visible in the summarizeQualityOfAllTestedWebPages response'),
66
66
  url: z.string().describe('Full HTTPS URL of the page to test'),
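The `timeStamp` and `jobID` arguments of `describeQualityOfOneWebPage` are the initial and final segments of a report identifier. The trial in `AI-TOOL.md` shows the identifier `260519T1437-79l`, which suggests that a hyphen joins the two segments. The sketch below assumes so. The function name `parseReportIdentifier` is hypothetical.

```javascript
// Hypothetical helper: split a Kilotest report identifier such as
// "260519T1437-79l" into the timeStamp and jobID arguments of
// describeQualityOfOneWebPage. Assumes a single hyphen separates the segments.
function parseReportIdentifier(identifier) {
  // timeStamp: YYMMDDTHHMM (e.g. 260503T0432); jobID: the rest (e.g. x9z).
  const match = /^(\d{6}T\d{4})-(\w+)$/.exec(identifier);
  if (!match) {
    throw new Error(`Unrecognized report identifier: ${identifier}`);
  }
  const [, timeStamp, jobID] = match;
  return { timeStamp, jobID };
}
```

For example, `parseReportIdentifier('260519T1437-79l')` returns `{ timeStamp: '260519T1437', jobID: '79l' }`.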
package/openapi.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  openapi: 3.1.0
2
2
  info:
3
3
  title: Kilotest Agent API
4
- description: Kilotest tests web pages for front-end quality (i.e. accessibility, usability, and standard conformity) using an ensemble of ten independent rule engines that employ rule-based and machine-learning-based methods. This API enables AI agents to recommend web pages for testing, discover available test reports, and retrieve data from reports. For background on Kilotest and the advantages of ensemble testing, visit https://kilotest.com.
4
+ description: Kilotest tests web pages for front-end quality (i.e. accessibility, usability, and standards conformity) using an ensemble of ten independent rule engines that employ rule-based and machine-learning-based methods. This API enables AI agents to recommend web pages for testing, discover available test reports, and retrieve data from reports. For background on Kilotest and the advantages of ensemble testing, visit https://kilotest.com.
5
5
  version: 1.0.0
6
6
  contact:
7
7
  name: Kilotest
@@ -30,7 +30,7 @@ paths:
30
30
  post:
31
31
  operationId: recommendQualityTestingOfOneWebPage
32
32
  summary: Recommend quality testing of one web page
33
- description: Submit a recommendation for Kilotest to test, for the first time, a particular web page for front-end quality (i.e. accessibility, usability, and standard conformity). Recommendations are typically approved and the testing completed within a day, whereupon the results can be found with the summarizeQualityOfAllTestedWebPages operation. Before submitting a recommendation, use the summarizeQualityOfAllTestedWebPages operation to ensure that no report about the page is available, and also to see the stylistic rules for the naming of pages. An attempt to recommend a page for testing will fail if a report about the page is already available.
33
+ description: Submit a recommendation for Kilotest to test, for the first time, a particular web page for front-end quality (i.e. accessibility, usability, and standards conformity). Recommendations are typically approved and the testing completed within a day, whereupon the results can be found with the summarizeQualityOfAllTestedWebPages operation. Before submitting a recommendation, use the summarizeQualityOfAllTestedWebPages operation to ensure that no report about the page is available, and also to see the stylistic rules for the naming of pages. An attempt to recommend a page for testing will fail if a report about the page is already available.
34
34
  requestBody:
35
35
  description: Test recommendation specifications
36
36
  required: true
@@ -83,7 +83,7 @@ paths:
83
83
  get:
84
84
  operationId: describeHTMLElementsHavingOneQualityIssue
85
85
  summary: Describe HTML elements having one quality issue (NOT YET IMPLEMENTED)
86
- description: Get issue-specific data from a specified report about the front-end quality (i.e. accessibility, usability, and standard conformity) of a web page. The data describe the issue, all of the HTML elements of the page that have the issue, and, for each such element, URLs for retrieving diagnoses by rule engines of the issue on the element. NOT YET IMPLEMENTED.
86
+ description: Get issue-specific data from a specified report about the front-end quality (i.e. accessibility, usability, and standards conformity) of a web page. The data describe the issue, all of the HTML elements of the page that have the issue, and, for each such element, URLs for retrieving diagnoses by rule engines of the issue on the element. NOT YET IMPLEMENTED.
87
87
  parameters:
88
88
  - name: issueID
89
89
  in: path
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@jrpool/kilotest",
3
- "version": "33.0.0",
3
+ "version": "33.0.1",
4
4
  "description": "An ensemble testing service with a focus on accessibility",
5
5
  "main": "index.js",
6
6
  "scripts": {