@jrpool/kilotest 31.2.4 → 33.0.1

package/AI-TOOL.md CHANGED
@@ -1,27 +1,40 @@
1
- # Kilotest as an AI Tool
1
+ # Kilotest as connector to and creator of AI Tools
2
2
 
3
3
  ## Introduction
4
4
 
5
- Until 2026 Kilotest was intended, and implemented, as a web application for human users.
5
+ Until 2026 Kilotest was intended, and implemented, as a web application performing a service for human users.
6
6
 
7
- Beginning in May 2026, it [became evident](https://github.com/jrpool/kilotest/issues/2) that Kilotest could also act as a tool for use by language models. Language models are asked for help in all domains, including the domain of software quality. When asked about the front-end quality (accessibility, usability, and standard conformity) of specific web pages, models gave answers without the use of tools, and the answers were inferior: simplistic, speculative, and fabricated. If a model were to use Kilotest as a tool, the model could give comprehensive, authoritative, grounded, and factual answers. Every reported defect could be documented and ascribed to one or more specific tools in the Kilotest ensemble.
7
+ Beginning in May 2026, it [became evident](https://github.com/jrpool/kilotest/issues/2) that Kilotest could also act as a connector to tools for use by language models. Language models are asked for help in all domains, including the domain of software quality. When asked about the front-end quality (accessibility, usability, and standards conformity) of specific web pages, models gave answers without the use of tools. The answers were at best fragmentary and often speculative or fabricated. If Kilotest offered to connect language models to its functionalities as tools, models could give cheaper answers that were also more comprehensive, factual, authoritative, and grounded. Every reported defect could be documented and ascribed to one or more specific rule engines in the Kilotest ensemble.
8
8
 
9
- Given the potential of Kilotest as an AI tool and the expected continued growth in the share of questions that are directed to language models, a decision was made to make Kilotest discoverable and usable as a tool for language models.
9
+ Given the potential of Kilotest to serve language models and the expected continued growth in the share of questions that are directed to AI platforms that use language models, a decision was made to **make Kilotest discoverable, usable, and, where appropriate, used as a connector to, and creator of, tools for language models**.
10
+
11
+ ## Terms
12
+
13
+ - **Language model**: A model (e.g., Claude Haiku 4.5, Kimi K2.6, GPT-5.4, Gemini 3.1 Pro) that can consume and generate text, images, or other content.
14
+ - **AI platform**: A platform (e.g., Claude Desktop, Perplexity, ChatGPT, Gemini) that gives human users access to the services of language models and connects language models to productivity resources.
15
+ - **Tool**: A specialized productivity resource providing a capability that the language model itself needs but does not have. In the OpenAPI specification, what is here called a tool corresponds to an _operation_.
16
+ - **Connector**: A service that allows AI platforms to enable their language models to discover, evaluate, and use tools.
17
+
18
+ Tools can exist independently of the connectors that make them available to language models, but Kilotest can be both a provider of tools and a connector to them. It already provides capabilities to human users; it can reuse the functions that underlie those capabilities and reconfigure them as tools for language models. Kilotest can also provide connectors to those tools and make those connectors compatible with AI platforms.
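To make the correspondence between OpenAPI operations and tools concrete, here is a minimal sketch that lists one tool per operation of an already-parsed OpenAPI document (for example, `openapi.yaml` after YAML parsing). It is not the logic of Kilotest or of any particular OpenAPI-to-MCP bridge; the function name `operationsAsTools` and the sample paths used to exercise it are hypothetical.

```javascript
// Hypothetical sketch: derive one tool descriptor per OpenAPI operation.
// Real OpenAPI-to-MCP bridges also translate parameter schemas and may
// shorten names; this only illustrates the operation-to-tool mapping.
const HTTP_METHODS = ["get", "put", "post", "delete", "options", "head", "patch", "trace"];

function operationsAsTools(spec) {
  const tools = [];
  for (const [path, item] of Object.entries(spec.paths ?? {})) {
    for (const method of HTTP_METHODS) {
      const op = item[method];
      if (!op) continue; // skip non-operation keys such as "parameters"
      tools.push({
        // Fall back to "method path" when an operation has no operationId.
        name: op.operationId ?? `${method} ${path}`,
        description: op.summary ?? op.description ?? "",
        method: method.toUpperCase(),
        path,
      });
    }
  }
  return tools;
}
```

Using the `operationId` as the tool name is the natural choice because it is the one identifier OpenAPI requires to be unique across the whole document.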
10
19
 
11
20
  ## Internal additions
12
21
 
13
- The internal changes that have been made in the Kilotest codebase to support the use of Kilotest as an AI tool are:
22
+ The internal changes that have been made in the Kilotest codebase to support the use of Kilotest as a set of tools and connectors to them are:
14
23
 
15
24
  - An API, consisting of:
16
25
  - Additions to `index.js`.
17
- - Addition to `util.js`.
26
+ - Additions to `util.js`.
18
27
  - Within directories providing API functionality:
19
28
  - `api.js` modules.
20
29
  - `util.js` modules, if needed, containing resources shared by the `index.js` and `api.js` modules in those directories.
30
+ - A [`JSON-LD`](https://json-ld.org/) script in the `index.html` file, providing structured data about the Kilotest API.
21
31
  - Additions to the `env.example` file.
22
32
  - An `llms.txt` file and an `llms-full.txt` file, documenting the use of Kilotest by language models, conforming to the [llms-txt](https://llmstxt.org/) specification.
23
33
  - An `openapi.yaml` file, documenting the Kilotest API, conforming to the [OpenAPI specification](https://spec.openapis.org/oas/v3.1.0).
34
+ - An `mcp.js` file, providing an MCP server for Kilotest.
24
35
  - A `sitemap.xml` file.
36
+ - Additions to the `README.md` file.
37
+ - This `AI-TOOL.md` file.
25
38
  - A `researchAgent.js` file, testing the API.
26
39
 
27
40
  ## External actions
@@ -32,35 +45,226 @@ The external actions that have been taken to support the use of Kilotest as an A
32
45
  - A [request](https://rapidapi.com/studio/api_91f2ce07-2572-48bd-a34d-ff01ed6cd039/publish/general) to add Kilotest to the Rapid API Hub.
33
46
  - An [issue](https://github.com/APIs-guru/openapi-directory/issues/2677) to add Kilotest to `openapi-directory`.
34
47
  - A [pull request](https://github.com/w3c/wai-evaluation-tools-list/pull/1153) to add Kilotest to the WAI evaluation tools list.
35
- - [Configuration of Claude Desktop](https://github.com/ivo-toby/mcp-openapi-server#option-1-using-with-claude-desktop-stdio-transport) on the local development host to let Claude models find Kilotest.
48
+ - [Configuration of Claude Desktop](https://github.com/ivo-toby/mcp-openapi-server#option-1-using-with-claude-desktop-stdio-transport) on the local development host to connect Claude Desktop models to Kilotest tools.
49
+ - Deployment of an MCP server in HTTP mode on the Kilotest service host.
36
50
 
37
51
  ## Use cases
38
52
 
39
- The rationale for Kilotest as a tool for language models is set forth in the `llms-full.txt` file and is not repeated here.
53
+ The rationale for Kilotest as a connector to tools for language models is set forth in the `llms-full.txt` file and is not repeated here.
40
54
 
41
55
  Some common anticipated use cases for this role are:
42
56
 
43
- - A user of a web-builder platform with responsibility for a website asks an AI platform for help in creating or improving the website.
44
- - A prospective customer of a web development service asks an AI platform to evaluate the quality of websites in the portfolios of candidate vendors.
45
- - A professional web developer within an organization asks an AI platform for a code review.
46
- - A risk-management professional within an organization asks an AI platform to report on any web accessibility defects that could expose the organization to prosecution or civil litigation.
47
- - An person who depends on web accessibility because of disabilities asks an AI platform to provide documentary support for a complaint to the owner of a website about accessibility defects.
48
- - A disability-rights advocate or attorney concerned with inaccessibility in a particular industry asks an AI platform to perform a front-end-quality comparison of some websites in that industry.
57
+ 1. A user of a web-builder platform with responsibility for a website asks an AI platform for help in creating or improving the website.
58
+ 1. A prospective customer of a web development service asks an AI platform to evaluate the quality of websites in the portfolios of candidate vendors.
59
+ 1. A professional web developer within an organization asks an AI platform for a code review.
60
+ 1. A risk-management professional within an organization asks an AI platform to report on any web accessibility defects that could expose the organization to prosecution or civil litigation for disability discrimination.
61
+ 1. A person who depends on web accessibility because of disabilities asks an AI platform to provide documentary support for a complaint to the owner of a website about accessibility defects.
62
+ 1. A disability-rights advocate or attorney concerned with inaccessibility in a particular industry asks an AI platform to perform a front-end-quality comparison of some websites in that industry.
49
63
 
50
- ## Strategy
64
+ Among these use cases, case 3 would make it feasible for the user to tell an AI platform explicitly and formally that Kilotest is an available connector to tools relevant to the task. None of the other 5 use cases would make that feasible. In those 5 use cases, the user has a question but relies on the AI platform to know or discover which relevant connectors exist, to select appropriate connectors, and to provide language models that can use those connectors and the tools they connect to.
51
65
 
52
- ### Ease of use
66
+ Use cases 1, 2, 4, 5, and 6 exemplify a widespread expectation of, and demand for, AI platform capability. What they have in common is: “I have a question; answer it.” If AI platforms can employ Kilotest as an expert in relevant cases, they will be more successful in satisfying that demand. At present, achieving this is difficult because of platform limitations and a lack of standardization.
53
67
 
54
- Any use of Kilotest as a tool of language models will fail for some of the anticipated users if they are obligated to be aware of Kilotest and to translate that awareness into instructions or documentation. Therefore, for success in all the use cases, users should be able to tell models what is wanted and rely on models to figure out whether they need tools and, if so, which ones and how to use them.
68
+ ## Strategy
69
+
70
+ The Kilotest project is attempting to solve the problem just described, following the strategy below.
55
71
 
56
72
  ### Increments
57
73
 
58
- The standardization of tool discovery and utilization by and for language models and AI platforms is partial. Major differences in protocols exist among model and platform families. Therefore, small testable increments of improvement in the tool functionality of Kilotest can best be defined by model and platform family. For example, working on discoverability first and then on usability would not be effective, because both are complex and testability would be postponed until both were complete. Instead, a coherent model and platform family should be identified and any and all improvements to make use cases successful within that family should be made and tested, before development proceeds to the next family.
74
+ Connector and tool discovery and utilization have been only partly standardized. Major differences in protocols exist among model and platform families. Therefore, small testable increments of improvement in the connector and tool functionality of Kilotest can best be defined by model and platform family and by use-case characteristics.
59
75
 
60
- One benefit of this type of incrementalism is that, after the first increment succeeds, it becomes possible to make and test external changes publicizing the fact that a particular platform-model combination delivers unprecedentedly comprehensive, truthful, and low-cost answers to questions about front-end web quality.
76
+ One benefit of such incrementalism is that, after the first increment succeeds, it becomes possible to make and test external changes publicizing the fact that, for a particular class of use cases, a particular platform-model combination delivers unprecedentedly inexpensive, comprehensive, and truthful answers to questions about front-end web quality.
61
77
 
62
78
  Another benefit is that subsequent increments can be defined incrementally rather than in advance. Lessons learned from the work on each increment can inform the choice of what to work on next.
63
79
 
64
80
  #### Increment 1
65
81
 
66
- In the first increment, the objective is to make Kilotest automatically discovered and used by Anthropic Claude models when used within the Claude Desktop application. That investigation is under way. Results will be summarized here.
82
+ In the first increment, the objective was to make Anthropic Claude models use Kilotest to help them answer questions from developers using the Claude Desktop application for use case 3, where the code in question is already deployed as a public web page.
83
+
84
+ As mentioned above, Claude Desktop was installed on the local development host and then configured for Kilotest. That configuration consisted of the addition of a property to the Claude Desktop configuration file, located at `~/Library/Application Support/Claude/claude_desktop_config.json`. The added property is:
85
+
86
+ ```json
87
+ "mcpServers": {
88
+ "kilotest": {
89
+ "command": "npx",
90
+ "args": [
91
+ "-y",
92
+ "@ivotoby/openapi-mcp-server",
93
+ "--disable-abbreviation",
94
+ "true"
95
+ ],
96
+ "env": {
97
+ "API_BASE_URL": "https://kilotest.com",
98
+ "OPENAPI_SPEC_PATH": "https://kilotest.com/openapi.yaml"
99
+ }
100
+ }
101
+ },
102
+ ```
103
+
104
+ That configuration notifies models that Kilotest is available as an MCP server, but does not explain what tools it provides, what they are useful for, or how to use them.
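Because the configuration file may already define other MCP servers, the `mcpServers` property shown above is better merged into any existing `mcpServers` object than pasted over it. A minimal sketch, assuming the file has already been read and parsed with `JSON.parse`; the helper name `addKilotestServer` is hypothetical and not part of Kilotest or Claude Desktop:

```javascript
// The "kilotest" entry exactly as in the configuration shown above.
const kilotestServer = {
  command: "npx",
  args: ["-y", "@ivotoby/openapi-mcp-server", "--disable-abbreviation", "true"],
  env: {
    API_BASE_URL: "https://kilotest.com",
    OPENAPI_SPEC_PATH: "https://kilotest.com/openapi.yaml",
  },
};

// Hypothetical helper: returns a new configuration object that contains
// the Kilotest server while preserving all other top-level properties
// and any MCP servers already configured. The input is not mutated.
function addKilotestServer(config) {
  const servers = { ...(config.mcpServers ?? {}) };
  servers.kilotest = kilotestServer;
  return { ...config, mcpServers: servers };
}
```

The result can be written back with `JSON.stringify(merged, null, 2)`; Claude Desktop typically needs to be restarted before it picks up the change.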
105
+
106
+ Experimentation revealed that Claude Sonnet 4.6 with Medium effort chose to use Kilotest when answering questions about the accessibility and usability of particular public web pages. Low effort did not result in the use of Kilotest. The less capable Claude Haiku model did not use Kilotest.
107
+
108
+ #### Increment 2
109
+
110
+ Increment 2 somewhat lightens the burden on the user by shifting the user environment from an installed application to a web browser, and by providing an easier interface for users to tell the platform about Kilotest. The platform is the `claude.ai` web application instead of Claude Desktop. Instead of editing a JSON file, the user tells the platform about Kilotest with a setting.
111
+
112
+ The investigation began with the addition of Kilotest as a _connector_ to the `claude.ai` web application. The user used the `Customize/Connectors/Add connector/Add custom connector` interface, providing these data before activating the `Add` button:
113
+
114
+ - Name: Kilotest
115
+ - Remote MCP server URL: `https://kilotest.com/mcp`
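A remote MCP server such as the one at `https://kilotest.com/mcp` exchanges JSON-RPC 2.0 messages with the platform. The sketch below shows, under stated assumptions, the messages a client sends: `initialize`, the `notifications/initialized` notification, `tools/list`, and `tools/call`. The protocol version and client name are illustrative; tool arguments must follow the input schemas returned by `tools/list`, which are not reproduced here; and the sketch omits response parsing and some headers that the Streamable HTTP transport may require.

```javascript
// Minimal sketch of MCP client messages (JSON-RPC 2.0). Assumed values:
// the protocol version string and the client name/version.
const MCP_URL = "https://kilotest.com/mcp";
let nextId = 1;

// Requests carry an id; the server echoes it in its response.
function request(method, params) {
  return { jsonrpc: "2.0", id: nextId++, method, params };
}

const initialize = () =>
  request("initialize", {
    protocolVersion: "2025-06-18", // assumed; negotiated with the server
    capabilities: {},
    clientInfo: { name: "example-client", version: "0.0.1" }, // hypothetical
  });

// Notifications carry no id and expect no response.
const initialized = () => ({ jsonrpc: "2.0", method: "notifications/initialized" });

const listTools = () => request("tools/list", {});

// args must match the tool's inputSchema as reported by tools/list.
const callTool = (name, args) => request("tools/call", { name, arguments: args });

// Sends one message over Streamable HTTP (not invoked in this sketch).
async function post(message, sessionId) {
  const headers = {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
  };
  if (sessionId) headers["Mcp-Session-Id"] = sessionId;
  return fetch(MCP_URL, { method: "POST", headers, body: JSON.stringify(message) });
}
```

A platform would send `initialize`, then `notifications/initialized`, then `tools/list` to learn which tools exist before issuing any `tools/call`.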
116
+
117
+ When the connector configuration with its tools appeared, the user changed all tool permissions from `Needs approval` to `Always allow`.
118
+
119
+ Then the user asked the platform, “I want to know whether the home page of the nonprofit organization named ‘Open Secrets’ has quality defects, such as with accissibility [_sic_] and usability.”
120
+
121
+ An earlier trial had revealed that the least powerful model, Claude Haiku 4.5 without the Extended capability, did not consider using any tools.
122
+
123
+ For this trial, then, the user selected Claude Haiku 4.5 with Extended capability.
124
+
125
+ The record of thinking began with:
126
+
127
+ ```text
128
+ The user is asking about the quality of the home page for a nonprofit organization called "Open Secrets," specifically regarding accessibility and usability defects.
129
+ This is asking for a quality assessment of a web page. Looking at my available tools, I have access to Kilotest, which has tools for analyzing web page quality:
130
+
131
+ Kilotest:describeQualityOfOneWebPage — Returns data from a specified Kilotest report about issues of front-end quality
132
+ Kilotest:recommendQualityTestingOfOneWebPage — Recommends a web page for Kilotest to test for front-end quality
133
+ Kilotest:summarizeQualityOfAllTestedWebPages — Returns summary data from every available Kilotest report
134
+
135
+ I should first use the recommendQualityTestingOfOneWebPage tool to see if I can get Kilotest to test the Open Secrets home page for quality issues including accessibility and usability.
136
+ First, I need to find the correct URL for Open Secrets. Open Secrets is a well-known nonprofit that tracks money in politics. Their homepage should be at opensecrets.org. Let me use the Kilotest tool to request that they test this page for quality issues.
137
+ ```
138
+
139
+ As shown, discovery was immediate, but the utilization strategy was confused: at first, the sequencing instructions did not stop the model from planning to recommend a test before checking whether a report on the page already existed.
140
+
141
+ The model then output:
142
+
143
+ `I can help you assess the quality of the Open Secrets home page using Kilotest, which specializes in finding accessibility and usability defects. Let me recommend their home page for testing.`
144
+
145
+ After contemplating this errant plan, the model corrected its thinking and resumed with `Good! I now have access to the Kilotest tools. Let me first check if there's already a report on the Open Secrets home page by calling summarizeQualityOfAllTestedWebPages.`.
146
+
147
+ The thinking record continued with the Kilotest logo, the text `Summarize quality of all tested web pages` (the `annotations.title` property of the appropriate tool), the complete output of that tool, and then:
148
+
149
+ ```text
150
+ Great! I found the Open Secrets website has already been tested by Kilotest. Here's the information:
151
+ Identifier: 260519T1437-79l
152
+ Creation date: 2026-05-19T14:37:00.000Z (33 days ago from June 21)
153
+ URL: https://www.opensecrets.org/
154
+ Number of issues reported: 80
155
+ Number of HTML elements reported as exhibiting issues: 1631
156
+ Rule engines that reported issues: 9 (Accessibility Checker, Alfa, ASLint, Axe, Editoria11y, Html Checker API, HTML CodeSniffer, QualWeb, Testaro)
157
+
158
+ Now I should call the describeQualityOfOneWebPage tool to get the detailed information about these quality defects.
159
+ ```
160
+
161
+ The model then output: `Great news! I found that Open Secrets' home page has already been tested by Kilotest. Let me get the detailed quality report for you.`
162
+
163
+ The thinking record continued with `Synthesized accessibility audit findings to assess website quality defects`, the Kilotest logo, the title of the next tool (`Describe the quality of one web page`), and the complete result of that tool call.
164
+
165
+ The final segment of the thinking record contained:
166
+
167
+ ```text
168
+ Perfect! Now I have a detailed report about the Open Secrets home page. Let me synthesize this information to provide a comprehensive answer to the user. The report shows:
169
+
170
+ Total issues: 80
171
+ 18 highest priority issues
172
+ 23 high priority issues
173
+ 10 low priority issues
174
+ 29 lowest priority issues
175
+ 1,631 HTML elements with issues
176
+
177
+ Key defects include:
178
+
179
+ Images without text (12 elements) - major accessibility issue
180
+ Links without names (4 elements) - critical for accessibility
181
+ Duplicate IDs (689 elements!) - this is a huge problem
182
+ Elements that are focusable but hidden (14 elements)
183
+ Buttons without text (8 elements)
184
+ Poor contrast (60 elements)
185
+ Visible labels not in accessible names (39 elements)
186
+ Invalid roles (28 elements)
187
+ Various usability issues like poor focus indication, small targets, etc.
188
+
189
+ The most critical issues are those marked as "highest priority" which relate to WCAG 2.2 success criteria.
190
+ ```
191
+
192
+ This record shows the model adding its own knowledge and judgment to the Kilotest results.
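The priority counts in the record above agree with the reported total: 18 + 23 + 10 + 29 = 80. A client relaying such figures could check that agreement before presenting them. A minimal sketch, with a hypothetical object shape that is not the actual Kilotest response format:

```javascript
// Hypothetical shape for the priority breakdown in the record above.
const report = {
  totalIssues: 80,
  byPriority: { highest: 18, high: 23, low: 10, lowest: 29 },
};

// Returns true when the per-priority counts add up to the reported total.
function countsAreConsistent({ totalIssues, byPriority }) {
  const sum = Object.values(byPriority).reduce((acc, n) => acc + n, 0);
  return sum === totalIssues;
}
```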
193
+
194
+ The final output to the user was:
195
+
196
+ ```markdown
197
+ ## Open Secrets Home Page Quality Assessment
198
+
199
+ Based on Kilotest's comprehensive testing of **https://www.opensecrets.org/**, the home page has **significant accessibility and usability defects**. Here's what was found:
200
+
201
+ ### Overall Quality Summary
202
+
203
+ - **Total Issues Reported:** 80
204
+ - **HTML Elements with Issues:** 1,631
205
+ - **Priority Breakdown:**
206
+ - 18 highest priority issues
207
+ - 23 high priority issues
208
+ - 10 low priority issues
209
+ - 29 lowest priority issues
210
+
211
+ ### Critical Accessibility Issues (Highest Priority)
212
+
213
+ 1. **Missing Image Text** (12 elements)
214
+ - Users cannot understand what images depict
215
+ - Violates WCAG 2.2 criterion 1.1.1
216
+
217
+ 2. **Links Without Names** (4 elements)
218
+ - Users can't tell where links lead
219
+ - Impacts navigation understanding
220
+
221
+ 3. **Duplicate HTML IDs** (689 elements!) ⚠️ **Most Severe**
222
+ - This is the most widespread issue
223
+ - Users may be pointed to wrong page sections
224
+ - Violates WCAG 2.2 criterion 4.1.1
225
+
226
+ 4. **Focusable Elements That Are Hidden** (14 elements)
227
+ - Keyboard-only users cannot navigate properly
228
+ - Disrupts keyboard accessibility
229
+
230
+ 5. **Buttons Without Names** (8 elements)
231
+ - Users cannot understand button purposes
232
+
233
+ 6. **Form Controls Without Labels** (12 elements)
234
+ - Users can't understand how to use forms
235
+
236
+ 7. **Poor Text Contrast** (60 elements)
237
+ - Content is difficult to read
238
+ - Impacts users with low vision
239
+
240
+ ### Significant Usability Issues (High Priority)
241
+
242
+ - **Visible Labels Not in Accessible Names** (39 elements) — users struggle to find controls
243
+ - **Invalid Roles** (28 elements) — assistive technologies misdescribe content
244
+ - **Missing Submit Buttons** on forms (6 elements)
245
+ - **Nested Controls** (17 elements) — users cannot reliably click intended items
246
+
247
+ ### Additional Readability Issues
248
+
249
+ - **Poor Focus Indication** (63 elements) — keyboard users can't see where they are
250
+ - **Small/Close Targets** (7 elements) — difficult to click on mobile or for users with motor disabilities
251
+ - **Low Line Height** (269 elements) — text is cramped and hard to read
252
+ - **Small Font Size** (2 elements)
253
+ - **Absolute Box Sizes** (121 elements) — prevents text enlargement for readability
254
+
255
+ ### Test Coverage
256
+
257
+ Nine different accessibility testing engines detected issues, including Accessibility Checker (IBM), Axe (Deque), WAVE (Utah State University), and others. The WAVE tool timed out at 25 seconds, suggesting possible performance issues.
258
+
259
+ ### Recommendations
260
+ The **duplicate ID problem** (689 instances) should be addressed first, as it's both widespread and critical to document structure. After that, focus on naming images, links, and form controls, which are foundational accessibility requirements.
261
+ ```
262
+
263
+ The model had decided which data from Kilotest to provide with its own elaborations and interpretations, which to provide as-is, and which to summarize.
264
+
265
+ The increment can be considered a success. Notably, the model also interpreted the misspelled word “accissibility” as “accessibility”.
266
+
267
+ However, some further improvements merit consideration:
268
+
269
+ - The instructions can be improved to deter models more reliably from planning to recommend testing a page before learning whether a report about the page is already available. In this trial they sufficed to prevent an unnecessary tool call, but, given the stochastic behavior of models, the initially deviant thinking suggests that they might not always suffice.
270
+ - The risk that interpretive content provided by the model will be wrongly attributed to Kilotest can be mitigated with disclaimers, such as “Kilotest makes no recommendations about methods or sequences of issue remediation”, or with advice to models to present their interpretations as their own.
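The sequencing that these instructions aim to enforce (consult the summary of all tested pages first, describe an existing report if one exists, and recommend testing only otherwise) can be expressed as a small decision function. This is a hypothetical sketch, not Kilotest code: the record fields `identifier`, `url`, and `created` are assumptions loosely modeled on the summary data in the trial above, and the URL comparison rules are illustrative.

```javascript
// Hypothetical planner for choosing the next Kilotest tool call.
// Assumed summary shape: an array of { identifier, url, created } records,
// where created is an ISO 8601 timestamp.

// Treats http/https, a leading "www.", and trailing slashes as equivalent.
function normalize(url) {
  const u = new URL(url);
  const host = u.hostname.replace(/^www\./, "");
  const path = u.pathname.replace(/\/+$/, "");
  return `${host}${path}${u.search}`;
}

function planNextCall(pageUrl, summaries) {
  const target = normalize(pageUrl);
  const matches = summaries.filter((s) => normalize(s.url) === target);
  if (matches.length === 0) {
    // No report yet: only now is recommending a test appropriate.
    return { tool: "recommendQualityTestingOfOneWebPage", page: pageUrl };
  }
  // ISO 8601 timestamps in the same format sort lexicographically,
  // so the greatest string is the most recent report.
  const latest = matches.reduce((a, b) => (a.created >= b.created ? a : b));
  return { tool: "describeQualityOfOneWebPage", reportId: latest.identifier };
}
```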
package/IDEAS.md CHANGED
@@ -1,4 +1,4 @@
1
- # development
1
+ # ideas
2
2
 
3
3
  Notes about the further development of this project.
4
4
 
package/README.md CHANGED
@@ -4,7 +4,7 @@ An ensemble testing service with a focus on accessibility
4
4
 
5
5
  ## Features
6
6
 
7
- This application uses an ensemble of 10 tools to test public web pages for standards conformity, usability, and accessibility.
7
+ This application uses an ensemble of 10 rule engines to test public web pages for front-end quality (i.e. accessibility, usability, and conformity to standards).
8
8
 
9
9
  The testing paradigm employed by Kilotest is discussed in these papers:
10
10
 
package/SERVICE.md CHANGED
@@ -239,7 +239,7 @@ In case Testaro again becomes a dependency of Kilotest, the notes below on secur
239
239
 
240
240
  ### Browser privileges
241
241
 
242
- Testaro uses Playwright to launch and control headless browsers, often `chromium`. Those browsers navigate to web pages that are tested by the tools that Testaro integrates. The `qualWeb` tool launches its own browser via Puppeteer as a dependency to perform its tests.
242
+ Testaro uses Playwright to launch and control headless browsers, often `chromium`. Those browsers navigate to web pages that are tested by the rule engines that Testaro integrates. The `qualWeb` tool launches its own browser via Puppeteer as a dependency to perform its tests.
243
243
 
244
244
  When either Playwright or Puppeteer launches a `chromium` browser, in most environments it is [sandboxed](https://www.geeksforgeeks.org/ethical-hacking/what-is-browser-sandboxing/). Sandboxing is a security feature that prevents the browser from accessing potentially unsafe system resources. But in the Ubuntu Linux operating system that was installed on the Vultr Cloud Compute host a sandboxed browser requires an [unprivileged user namespace](https://ubuntu.com/blog/ubuntu-23-10-restricted-unprivileged-user-namespaces), and when Ubuntu was installed its configuration disallowed such namespaces. The file `/etc/sysctl.d/99-kilotest-userns.conf` with the content `kernel.apparmor_restrict_unprivileged_userns = 1` prohibited unprivileged user namespaces and thereby made sandboxed browsers unlaunchable.
245
245
 
@@ -37,7 +37,7 @@
37
37
  <li>Related WCAG standard: __wcag__</li>
38
38
  </ul>
39
39
  <h2>Diagnoses</h2>
40
- <p>Here is how tools diagnose the <q>__issue__</q> issue for HTML element __catalogIndex__ of the __target__ page.</p>
40
+ <p>Here is how rule engines diagnose the <q>__issue__</q> issue for HTML element __catalogIndex__ of the __target__ page.</p>
41
41
  <ol>
42
42
  __diagnoses__
43
43
  </ol>
package/index.html CHANGED
@@ -15,22 +15,22 @@
15
15
  "name": "Kilotest",
16
16
  "sameAs": "https://github.com/jrpool/kilotest",
17
17
  "datePublished": "2025-10-14",
18
- "applicationCategory": "Web accessibility, usability, and standard conformity testing and reporting service",
18
+ "applicationCategory": "Front-end quality (i.e. web accessibility, usability, and standards conformity) testing and reporting service",
19
19
  "isAccessibleForFree": true,
20
- "description": "Testing of web pages for accessibility, usability, and standard conformity with an ensemble of 10 tools and provision of results at selectable levels of detail.",
20
+ "description": "Testing of web pages for front-end quality (i.e. web accessibility, usability, and standards conformity) with an ensemble of 10 rule engines and provision of results at selectable levels of detail.",
21
21
  "softwareHelp": {
22
22
  "@type": "CreativeWork",
23
23
  "url": "https://kilotest.com/tutorial.html"
24
24
  },
25
25
  "featureList": [
26
- "Automated testing of web accessibility, usability, and standard conformity",
26
+ "Automated testing of front-end quality (i.e. web accessibility, usability, and standards conformity)",
27
27
  "WCAG 2.2 conformance evaluation",
28
28
  "HTML5 conformance evaluation",
29
29
  "CSS3 conformance evaluation",
30
- "Ensemble of 10 testing tools performing more than a thousand tests",
30
+ "Ensemble of 10 rule engines performing more than a thousand tests",
31
31
  "Multi-page, single-page, single-issue, and single-element reporting",
32
32
  "Priority-ranked issue lists",
33
- "Public API for agent integration"
33
+ "Public API, connector, and tools for language-model integration"
34
34
  ],
35
35
  "url": "https://kilotest.com/",
36
36
  "maintainer": {
@@ -63,17 +63,17 @@
63
63
  <summary>More than a thousand? Really?</summary>
64
64
  <h2>Ensemble testing</h2>
65
65
  <p>Kilotest does <strong>ensemble testing</strong> of web pages for <a href="https://www.w3.org/mission/accessibility/">accessibility</a>, usability, and conformity to HTML standards.</p>
66
- <p>Efficient testing <strong>first</strong> leverages <strong>automated tools</strong> and corrects the defects that they discover, <strong>then</strong> employs human testers.</p>
67
- <p>Choosing an automated tool is hard, because they <a href="https://www.w3.org/WAI/test-evaluate/tools/list/">are numerous</a> and <a href="https://arxiv.org/pdf/2304.07591">vary in what and how they test</a>.</p>
68
- <p><strong>Ensemble testing</strong>, instead of choosing a tool, combines multiple tools. It can discover more issues than single-tool testing, and the tools can check one another&rsquo;s quality.</p>
69
- <h2>The tools</h2>
70
- <p>The Kilotest ensemble is <a href="https://github.com/jrpool/testaro?tab=readme-ov-file#dependencies">10 tools</a> developed by teams in the USA, Canada, Denmark, Australia, and Portugal.</p>
71
- <p>The tools perform about 1,300 tests in total. <a href="https://medium.com/cvs-health-tech-blog/how-to-run-a-thousand-accessibility-tests-63692ad120c3">Open-source software</a> developed initially at CVS Health drives the tools, tries to weed out invalid results, and distills the findings into about 350 <q>issues</q>.</p>
66
+ <p>Efficient testing <strong>first</strong> leverages <strong>automated rule engines</strong> and corrects the defects that they discover, <strong>then</strong> employs human testers.</p>
67
+ <p>Choosing an automated rule engine is hard, because they <a href="https://www.w3.org/WAI/test-evaluate/tools/list/">are numerous</a> and <a href="https://arxiv.org/pdf/2304.07591">vary in what and how they test</a>.</p>
68
+ <p><strong>Ensemble testing</strong>, instead of choosing a rule engine, combines multiple rule engines. It can discover more issues than single-rule-engine testing, and the rule engines can check one another&rsquo;s quality.</p>
69
+ <h2>The rule engines</h2>
70
+ <p>The Kilotest ensemble is <a href="https://github.com/jrpool/testaro?tab=readme-ov-file#dependencies">10 rule engines</a> developed by teams in the USA, Canada, Denmark, Australia, and Portugal.</p>
71
+ <p>The rule engines perform about 1,300 tests in total. <a href="https://medium.com/cvs-health-tech-blog/how-to-run-a-thousand-accessibility-tests-63692ad120c3">Open-source software</a> developed initially at CVS Health drives the rule engines, tries to weed out invalid results, and distills the findings into about 350 <q>issues</q>.</p>
72
72
  <h2>Status</h2>
73
73
  <p>Kilotest is a proof of concept, launched in October 2025 and being actively developed. Your suggestions, bug reports, and collaboration proposals are welcome! Post them on <a href="https://github.com/jrpool/kilotest/issues">GitHub</a> or email them to <a href="mailto:info@kilotest.com">info@kilotest.com</a>.</p>
74
74
  <p>In the current version, users can get results of already performed tests and can recommend new tests. Kilotest managers can approve recommendations.</p>
75
75
  <h2 id="a11y">Accessibility</h2>
76
- <p>Kilotest intends to be accessible. At a minimum, Kilotest intends to conform to the rules of the tools in the Kilotest ensemble. Those rules enforce various standards and best practices, including many success criteria of conformance levels AA and AAA of the <a href="https://www.w3.org/WAI/standards-guidelines/wcag/">Web Content Accessibility Guidelines (WCAG)</a>.</p>
76
+ <p>Kilotest intends to be accessible. At a minimum, Kilotest intends to conform to the rules of the rule engines in the Kilotest ensemble. Those rules enforce various standards and best practices, including many success criteria of conformance levels AA and AAA of the <a href="https://www.w3.org/WAI/standards-guidelines/wcag/">Web Content Accessibility Guidelines (WCAG)</a>.</p>
77
77
  <p>Humans and automated agents encountering any accessibility barriers during the use of Kilotest are encouraged to report them on <a href="https://github.com/jrpool/kilotest/issues">GitHub</a> or by email to <a href="mailto:info@kilotest.com">info@kilotest.com</a>.</p>
78
78
  </details>
79
79
  <h2>What do you want to know?</h2>