@cleocode/skills 2026.4.5 → 2026.4.7
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/skills/ct-adr-recorder/SKILL.md +175 -0
- package/skills/ct-adr-recorder/manifest-entry.json +30 -0
- package/skills/ct-adr-recorder/references/cascade.md +82 -0
- package/skills/ct-adr-recorder/references/examples.md +141 -0
- package/skills/ct-artifact-publisher/SKILL.md +146 -0
- package/skills/ct-artifact-publisher/manifest-entry.json +30 -0
- package/skills/ct-artifact-publisher/references/artifact-types.md +126 -0
- package/skills/ct-artifact-publisher/references/handler-interface.md +187 -0
- package/skills/ct-consensus-voter/SKILL.md +158 -0
- package/skills/ct-consensus-voter/manifest-entry.json +30 -0
- package/skills/ct-consensus-voter/references/matrix-examples.md +140 -0
- package/skills/ct-grade/references/token-tracking.md +2 -2
- package/skills/ct-grade/scripts/run_all.py +1 -1
- package/skills/ct-grade-v2-1/manifest-entry.json +1 -1
- package/skills/ct-ivt-looper/SKILL.md +181 -0
- package/skills/ct-ivt-looper/manifest-entry.json +30 -0
- package/skills/ct-ivt-looper/references/escalation.md +91 -0
- package/skills/ct-ivt-looper/references/frameworks.md +119 -0
- package/skills/ct-ivt-looper/references/loop-anatomy.md +156 -0
- package/skills/ct-orchestrator/manifest-entry.json +1 -1
- package/skills/ct-provenance-keeper/SKILL.md +161 -0
- package/skills/ct-provenance-keeper/manifest-entry.json +30 -0
- package/skills/ct-provenance-keeper/references/signing.md +188 -0
- package/skills/ct-provenance-keeper/references/slsa.md +121 -0
- package/skills/ct-release-orchestrator/SKILL.md +134 -0
- package/skills/ct-release-orchestrator/manifest-entry.json +30 -0
- package/skills/ct-release-orchestrator/references/composition.md +138 -0
- package/skills/ct-release-orchestrator/references/release-types.md +130 -0
- package/skills/ct-skill-creator/manifest-entry.json +1 -1
- package/skills/ct-skill-creator/references/provider-deployment.md +9 -9
- package/skills/ct-skill-validator/evals/evals.json +1 -1
- package/skills/ct-skill-validator/manifest-entry.json +1 -1
- package/skills/manifest.json +252 -16
- package/skills/ct-skill-creator/.cleo/.context-state.json +0 -13
@@ -0,0 +1,126 @@

# Per-Artifact-Type Notes

Detailed notes for each of the nine registered artifact types. Read the entry for the type you are shipping.

## npm-package

**Registry**: `https://registry.npmjs.org` (default) or a private registry.
**Auth**: `NPM_TOKEN` env var, or OIDC trusted publishing in CI.
**Default publish command**: `npm publish`
**Version source**: `package.json:version`
**Idempotency**: the registry rejects a version that has already been published.

### Edge cases

- **Scoped packages**: `@scope/pkg` requires `--access public` to publish publicly.
- **Provenance flag**: set `options.provenance: true` to emit SLSA L3 attestation via OIDC.
- **Monorepo**: each workspace member is a separate artifact entry; order them by dependency chain (leaves first).
- **Unpublish**: allowed within 72 hours of publishing if no other public package depends on the version; after that, only under npm's stricter unpublish criteria.
- **Dry-run**: `npm publish --dry-run` produces the tarball without pushing.
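
The edge cases above reduce to a handful of `npm publish` flags. The sketch below assembles them from three inputs. The function name `npm_publish_args` and its positional parameters are hypothetical stand-ins for whatever the handler reads from `release.artifacts[]`; they are not part of the skill's API.

```bash
# Hypothetical helper: build `npm publish` arguments from the edge cases above.
# $1 = package name, $2 = provenance ("true"/"false"), $3 = dry_run ("true"/"false").
npm_publish_args() {
  local name="$1" provenance="$2" dry_run="$3"
  local -a args=(publish)

  # Scoped packages publish as restricted unless access is set explicitly.
  if [[ "$name" == "@"*/* ]]; then
    args+=(--access public)
  fi
  # Provenance needs an OIDC-capable CI environment to mint the attestation.
  if [[ "$provenance" == "true" ]]; then
    args+=(--provenance)
  fi
  # --dry-run packs and reports without uploading.
  if [[ "$dry_run" == "true" ]]; then
    args+=(--dry-run)
  fi
  echo "${args[*]}"
}
```

For example, `npm_publish_args "@acme/widgets" true false` prints `publish --access public --provenance`. The helper prints arguments instead of running npm, which keeps it trivially testable in dry-run and BATS contexts.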

## python-wheel

**Registry**: `https://upload.pypi.org/legacy/` (default) or private.
**Auth**: `TWINE_USERNAME=__token__`, `TWINE_PASSWORD=<pypi-api-token>`.
**Default build**: `python -m build`
**Default publish**: `twine upload dist/*`
**Version source**: `pyproject.toml:project.version`

### Edge cases

- **Build backend**: `setuptools`, `hatchling`, `poetry`, and `pdm` all emit wheels into `dist/` but may differ in metadata. Validate the wheel with `twine check dist/*.whl` before upload.
- **No re-upload**: PyPI never accepts the same filename twice, even after a release is deleted, so treat deletion as permanent. To flag a bad release, yank it (PEP 592) from the project's web management page.
- **Dry-run**: `twine upload --repository testpypi` is the de-facto dry-run target.
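
Here is a minimal sketch of the publish sequence this entry implies. It assumes a hypothetical `pypi_upload_plan` helper that prints the commands instead of running them. In dry-run mode the target index switches to TestPyPI, which is one of twine's built-in repository names alongside `pypi`.

```bash
# Hypothetical helper: print the command sequence for a python-wheel/sdist artifact.
# $1 = dry_run ("true"/"false"); dry-run uploads go to TestPyPI instead of PyPI.
pypi_upload_plan() {
  local dry_run="$1"
  local repo="pypi"
  if [[ "$dry_run" == "true" ]]; then
    repo="testpypi"
  fi

  echo "python -m build"
  # Catch metadata problems (e.g. an unrenderable long description) before upload.
  echo "twine check dist/*"
  echo "twine upload --repository $repo dist/*"
}
```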

## python-sdist

Same registry and auth as python-wheel. The build command is `python -m build --sdist`, which produces a `.tar.gz` source distribution. Usually shipped alongside a wheel, not standalone.

## go-module

**Registry**: `proxy.golang.org` (immutable).
**Auth**: none; the proxy fetches a tagged version from the public repository the first time it is requested.
**Default build**: `go mod tidy`
**Default publish**: push the git tag, then request the version through the proxy (e.g. `GOPROXY=proxy.golang.org go list -m <module>@<version>`) so it is cached.
**Version source**: the git tag (`vX.Y.Z`); for v2 and later, the `module` path in `go.mod` must carry the matching major suffix.

### Edge cases

- **Immutable**: once the proxy caches a version, it cannot be changed or removed.
- **Retraction**: add a `retract` directive to `go.mod` and publish a new version that contains it; the retracted versions are then flagged as bad.
- **Module path**: must match the repo URL; a rename requires a new module path.
- **Major versions ≥ 2**: require the major suffix in the module path (`.../v2`).
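
The major-version rule is easy to break when cutting a first v2 tag. A handler's `validate` step could compare the `module` line of `go.mod` with the tag about to be pushed. The check below is hypothetical (`go_major_suffix_ok` is not part of the skill). It encodes the Go rule that v0/v1 modules carry no major suffix and v2+ modules must end in `/vN`.

```bash
# Hypothetical check: does the go.mod module path agree with the release tag?
# $1 = module path from go.mod, $2 = release tag (e.g. v2.1.0).
go_major_suffix_ok() {
  local module="$1" tag="$2"
  local major="${tag#v}"
  major="${major%%.*}"
  [[ "$major" =~ ^[0-9]+$ ]] || return 1

  if (( 10#$major < 2 )); then
    # v0/v1: a major-version suffix is not allowed.
    [[ ! "$module" =~ /v[0-9]+$ ]]
    return
  fi
  # v2+: the module path must end in /v<major>.
  [[ "$module" == */v"$major" ]]
}
```

Because the proxy is immutable, running this check before the tag push is far cheaper than retracting a bad version afterwards.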

## cargo-crate

**Registry**: `crates.io` (default).
**Auth**: `CARGO_REGISTRY_TOKEN` env var.
**Default build**: `cargo build --release`
**Default publish**: `cargo publish`
**Version source**: `Cargo.toml:package.version`

### Edge cases

- **Workspaces**: each publishable crate is a separate entry; order by dependency.
- **`cargo publish --dry-run`**: required by ARTP-002; the skill MUST run this first.
- **Yank**: `cargo yank --vers <version>` stops new lockfiles from resolving to that version (existing `Cargo.lock` files still build); there is no true unpublish.
- **Rate limits**: crates.io throttles publishes; back off on 429.
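
One way to implement "back off on 429" is a wrapper around `cargo publish` that doubles its sleep after each rate-limited attempt. This sketch assumes a rate-limit failure can be recognized by the string `429` in the command's output, and it returns every other failure immediately. `publish_with_backoff` and its arguments are hypothetical.

```bash
# Hypothetical wrapper: retry a publish command with exponential backoff on rate limits.
# $1 = max attempts, $2 = initial delay in seconds, remaining args = the command.
publish_with_backoff() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt output status

  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    output=$("$@" 2>&1)
    status=$?
    printf '%s\n' "$output"
    (( status == 0 )) && return 0
    # Only rate-limit failures are retried; anything else is returned at once.
    if [[ "$output" != *429* ]] || (( attempt == max_attempts )); then
      return "$status"
    fi
    echo "rate-limited (attempt $attempt/$max_attempts); retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))
  done
  return 1
}
```

For example, `publish_with_backoff 5 10 cargo publish` makes at most five attempts and sleeps 10, 20, 40, and 80 seconds between them.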

## ruby-gem

**Registry**: `rubygems.org` (default).
**Auth**: `GEM_HOST_API_KEY` env var.
**Default build**: `gem build *.gemspec`
**Default publish**: `gem push *.gem`
**Version source**: `*.gemspec:version`

### Edge cases

- **Yank**: `gem yank <gem> -v <version>` removes the version from the index (soft delete).
- **Dry-run**: `gem push` has no native dry-run; simulate it by building only.
- **Signing**: RubyGems can sign gems with a certificate; configure it in the gemspec (`signing_key`, `cert_chain`).

## docker-image

**Registry**: configurable (Docker Hub, GHCR, ECR, etc.).
**Auth**: `docker login` session, or OIDC federation for GHCR.
**Default build**: `docker build -t <registry>/<image>:<tag> .`
**Default publish**: `docker push <registry>/<image>:<tag>`
**Version source**: the tag string (not a manifest field).

### Edge cases

- **Multi-arch**: use `docker buildx build` with `--platform linux/amd64,linux/arm64` and `--push` in one step.
- **Digest**: `docker inspect --format='{{.Id}}' <image>` returns the local image ID, not the registry digest. After pushing, read the content-addressed manifest digest with `docker inspect --format='{{index .RepoDigests 0}}' <image>`.
- **Cosign signing**: delegate to ct-provenance-keeper; cosign keyless signs the image by digest.
- **Overwrites**: pushing to an existing tag silently overwrites it; rely on digest tracking for integrity.
- **Dry-run**: no native dry-run; `docker buildx build` without `--push` is the closest equivalent.
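
Tags can be overwritten, so a handler should record the registry digest rather than the tag. The sketch below uses a hypothetical helper, `split_repo_digest`, that validates a `RepoDigests` value (for example `ghcr.io/acme/app@sha256:<64 hex chars>`) and splits it into repository and digest.

```bash
# Hypothetical helper: validate a RepoDigest and print "<repository> <digest>".
# Input is the value printed by `docker inspect --format='{{index .RepoDigests 0}}'`.
split_repo_digest() {
  local ref="$1"
  local repo="${ref%@*}" digest="${ref##*@}"

  if [[ "$ref" != *@* ]] || [[ ! "$digest" =~ ^sha256:[0-9a-f]{64}$ ]]; then
    echo "ERROR: not a sha256 repo digest: $ref" >&2
    return 1
  fi
  # Record both halves: sign and deploy by digest, never by the mutable tag.
  printf '%s %s\n' "$repo" "$digest"
}
```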

## github-release

**Registry**: `https://github.com/<owner>/<repo>/releases/tag/<tag>`.
**Auth**: `GITHUB_TOKEN` env var or OIDC.
**Default build**: none; artifacts are uploaded from disk.
**Default publish**: `gh release create <tag> <files>`
**Version source**: git tag.

### Edge cases

- **Idempotency**: `gh release create` fails if the tag already has a release. Check with `gh release view <tag>` first, and use `gh release upload <tag> <files> --clobber` to replace assets on an existing release.
- **Deletion**: `gh release delete <tag>` removes the release but keeps the git tag; add `--cleanup-tag` to delete the tag as well for a full rollback.
- **Body**: pulled from the changelog section for the release version; include checksums in the body.
- **Assets**: each uploaded file produces a separate downloadable URL.
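
The idempotency note can be folded into a single publish step: it creates the release on the first run and only replaces assets on re-runs. This is a sketch that assumes an authenticated GitHub CLI; `gh_release_publish` is a hypothetical name.

```bash
# Hypothetical idempotent publish step for a github-release artifact.
# $1 = tag, remaining args = asset files to attach.
gh_release_publish() {
  local tag="$1"
  shift

  if gh release view "$tag" > /dev/null 2>&1; then
    # Release already exists: replace same-named assets instead of failing.
    gh release upload "$tag" "$@" --clobber
  else
    gh release create "$tag" "$@"
  fi
}
```

To add release notes on the create branch, pass `--notes-file <path>` pointing at the extracted changelog section.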

## generic-tarball

**Registry**: configurable; any HTTP target or object store.
**Auth**: custom per target.
**Default build**: `tar czf <output> --exclude=.git .`
**Default publish**: custom; the handler MUST provide one.
**Version source**: computed from the release version, not from a manifest.

### Edge cases

- **Reproducibility**: with GNU tar, set `--sort=name --owner=0 --group=0 --numeric-owner --mtime='UTC 2020-01-01'`, and compress with `gzip -n` so the gzip header carries no timestamp.
- **Exclude list**: always exclude `.git`, `node_modules`, `target`, `dist` (unless that is the payload), and any secret-bearing files. If `<output>` is written inside the directory being archived, exclude it too.
- **Checksum file**: emit `checksums.txt` alongside the tarball for distribution verification.
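
The reproducibility and checksum notes combine into one build step. This sketch assumes GNU tar 1.28 or later (for `--sort`) and GNU coreutils. `build_reproducible_tarball` is a hypothetical name. The output path must live outside the source directory so the archive never contains itself.

```bash
# Hypothetical reproducible build for a generic-tarball artifact.
# $1 = source directory, $2 = output .tar.gz path (must be outside $1).
build_reproducible_tarball() {
  local src="$1" out="$2"
  local -a rc

  # Fixed ordering, ownership, and timestamps make the tar stream deterministic;
  # gzip -n omits the name and timestamp gzip would otherwise embed.
  tar --sort=name \
      --owner=0 --group=0 --numeric-owner \
      --mtime='UTC 2020-01-01' \
      --exclude=.git --exclude=node_modules \
      -C "$src" -cf - . | gzip -n > "$out"
  rc=("${PIPESTATUS[@]}")
  (( rc[0] == 0 && rc[1] == 0 )) || return 1

  # Write checksums.txt next to the archive in sha256sum's own format,
  # so consumers can verify with `sha256sum -c checksums.txt`.
  ( cd "$(dirname "$out")" && sha256sum "$(basename "$out")" > checksums.txt )
}
```

Building the same tree twice, even after `touch`-ing its files, yields byte-identical archives because every member's mtime is pinned.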

@@ -0,0 +1,187 @@

# Handler Interface Pseudocode

Every artifact handler is a triple of Bash functions with a strict contract. This file shows the interface and a full pseudocode example for implementing a new handler.

## The Contract

```bash
{prefix}_validate(artifact_config_json) -> exit 0 | 1
{prefix}_build(artifact_config_json, dry_run) -> exit 0 | 1
{prefix}_publish(artifact_config_json, dry_run) -> exit 0 | 1
```

- `artifact_config_json` is a single entry from `release.artifacts[]`, passed as a JSON string.
- `dry_run` is the literal string `"true"` or `"false"`.
- All three functions return 0 on success and non-zero on failure.
- None of the three may print credentials, write credentials to files, or pass credentials via command-line arguments (all three MUST NOT).

## Required Behavior

### validate

- Parse the config JSON.
- Confirm every required tool is available on PATH (`command -v <tool>`).
- Confirm the package manifest exists (`package.json`, `pyproject.toml`, `Cargo.toml`, etc.).
- Confirm the version in the manifest matches the release version.
- Confirm credentials declared as required are present in the environment.
- Return 0 if all checks pass, 1 otherwise.

### build

- Parse the config JSON.
- If `dry_run == "true"`, log the build command and return 0 without running it.
- Otherwise, run the build command and capture stdout/stderr.
- Verify the expected output location exists and is non-empty.
- Compute the SHA-256 checksum of every output file and emit it to stdout in `<sha256> <filename>` format.
- Return 0 on success.
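
When a build produces several files, the checksum step above can be a small loop. The sketch below uses a hypothetical `emit_checksums` helper that prints one `<sha256> <filename>` line per regular file, sorted so the output is stable across runs. It also fails when the output location is empty, as the contract requires.

```bash
# Hypothetical helper for the build step: one "<sha256> <filename>" line per file.
# $1 = output directory produced by the build command.
emit_checksums() {
  local out_dir="$1" file digest found=0
  if [[ ! -d "$out_dir" ]]; then
    echo "ERROR: output dir missing: $out_dir" >&2
    return 1
  fi

  # NUL-delimited, sorted traversal handles spaces in names and keeps order stable.
  while IFS= read -r -d '' file; do
    digest=$(sha256sum "$file" | awk '{print $1}')
    printf '%s %s\n' "$digest" "$file"
    found=1
  done < <(find "$out_dir" -type f -print0 | sort -z)

  # The contract requires a non-empty output location.
  if (( found == 0 )); then
    echo "ERROR: no files in $out_dir" >&2
    return 1
  fi
}
```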

### publish

- Parse the config JSON.
- If `dry_run == "true"`, log the publish command and return 0.
- Otherwise, verify credentials are present in the environment.
- Run the publish command.
- Capture the registry response (URL, digest, published timestamp).
- Emit the response as JSON on stdout for the pipeline to capture.
- Return 0 on success.
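
Here is a sketch of the last publish step. It assumes a hypothetical `emit_publish_result` helper and the keys `url`, `digest`, and `publishedAt`; the contract lists the fields but not their JSON names. To avoid a jq dependency, it refuses values that would need JSON escaping rather than emitting broken JSON.

```bash
# Hypothetical helper: emit the registry response as one JSON object on stdout.
# $1 = published URL, $2 = content digest; the timestamp is taken at call time (UTC).
emit_publish_result() {
  local url="$1" digest="$2" published_at
  local combined="${url}${digest}"

  # Refuse quotes and backslashes instead of hand-rolling JSON escaping.
  if [[ "$combined" == *\"* || "$combined" == *\\* ]]; then
    echo "ERROR: url/digest contain characters that need JSON escaping" >&2
    return 1
  fi
  published_at=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  printf '{"url":"%s","digest":"%s","publishedAt":"%s"}\n' "$url" "$digest" "$published_at"
}
```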

## Full Pseudocode Example: `my_custom` handler

```bash
# Example: publishing a custom tarball to a private registry.
# Assumes bash 4+, jq, GNU tar, sha256sum, and curl 7.55+ (for -H @-).

my_custom_validate() {
  local config="$1"
  local tool version registry env_var

  # Tool availability (jq first: every later check parses the config with it)
  for tool in jq tar curl sha256sum; do
    if ! command -v "$tool" > /dev/null; then
      echo "ERROR: $tool not found on PATH" >&2
      return 1
    fi
  done

  # Config shape
  version=$(jq -r '.version // empty' <<< "$config")
  if [[ -z "$version" ]]; then
    echo "ERROR: config missing .version" >&2
    return 1
  fi

  registry=$(jq -r '.registry // empty' <<< "$config")
  if [[ -z "$registry" ]]; then
    echo "ERROR: config missing .registry" >&2
    return 1
  fi

  # Credential check (env var name only; never echo the value)
  env_var=$(jq -r '.credentials.envVar // empty' <<< "$config")
  if [[ -n "$env_var" ]] && [[ -z "${!env_var:-}" ]]; then
    echo "ERROR: credential env var $env_var not set" >&2
    return 1
  fi

  return 0
}

my_custom_build() {
  local config="$1"
  local dry_run="$2"
  local version
  version=$(jq -r '.version' <<< "$config")
  local output="build/my-project-${version}.tar.gz"
  # Exclude ./build so the archive never picks up its own, partially written output.
  local -a tar_args=(czf "$output" --exclude=.git --exclude=node_modules --exclude=./build .)

  if [[ "$dry_run" == "true" ]]; then
    echo "[dry-run] would run: tar ${tar_args[*]}"
    return 0
  fi

  mkdir -p build || return 1
  tar "${tar_args[@]}" || return 1

  if [[ ! -s "$output" ]]; then
    echo "ERROR: build output missing or empty: $output" >&2
    return 1
  fi

  # Emit checksum for the pipeline to capture.
  local digest
  digest=$(sha256sum "$output" | awk '{print $1}')
  echo "${digest} ${output}"

  return 0
}

my_custom_publish() {
  local config="$1"
  local dry_run="$2"
  local version registry env_var
  version=$(jq -r '.version' <<< "$config")
  registry=$(jq -r '.registry' <<< "$config")
  env_var=$(jq -r '.credentials.envVar // empty' <<< "$config")
  local output="build/my-project-${version}.tar.gz"

  if [[ "$dry_run" == "true" ]]; then
    echo "[dry-run] would POST $output to ${registry}/packages"
    return 0
  fi

  # NEVER echo or log the credential value itself.
  if [[ -z "$env_var" ]]; then
    echo "ERROR: config missing .credentials.envVar" >&2
    return 1
  fi
  if [[ -z "${!env_var:-}" ]]; then
    echo "ERROR: credential env var $env_var is unset" >&2
    return 1
  fi

  if [[ ! -s "$output" ]]; then
    echo "ERROR: build output missing: $output (run build first)" >&2
    return 1
  fi

  # Pipe the header into curl (-H @- reads headers from stdin) so the token never
  # appears on a command line or touches disk; printf is a builtin, so no process
  # argument list ever contains it. --fail turns HTTP errors into a non-zero exit.
  local response
  response=$(printf 'Authorization: Bearer %s\n' "${!env_var}" |
    curl -sS --fail -H @- -F "file=@${output}" "${registry}/packages") || return 1

  # Emit structured response for the pipeline.
  echo "$response"

  return 0
}
```

## Registering the Handler

```bash
source lib/release-artifacts.sh
register_artifact_handler "my-custom-type" "my_custom"
```

After registration, a `release.artifacts[]` entry with `"type": "my-custom-type"` will invoke the three functions automatically during the pipeline.
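
`lib/release-artifacts.sh` is not shown here, so what follows is only a guess at the shape of `register_artifact_handler` plus a dispatcher. An associative array maps each type to its prefix. Registration checks that all three contract functions exist, and dispatch runs validate, build, and publish in order. `ARTIFACT_HANDLERS` and `run_artifact_handler` are hypothetical names, and the real library may differ.

```bash
# Hypothetical registry: artifact type -> function prefix.
declare -A ARTIFACT_HANDLERS=()

register_artifact_handler() {
  local type="$1" prefix="$2" fn
  # Refuse registration unless the prefix implements the full contract.
  for fn in validate build publish; do
    if ! declare -F "${prefix}_${fn}" > /dev/null; then
      echo "ERROR: ${prefix}_${fn} is not defined" >&2
      return 1
    fi
  done
  ARTIFACT_HANDLERS["$type"]="$prefix"
}

# Run validate -> build -> publish for one artifact, stopping at the first failure.
run_artifact_handler() {
  local type="$1" config="$2" dry_run="$3" prefix=""
  [[ -n "$type" ]] && prefix="${ARTIFACT_HANDLERS[$type]:-}"
  if [[ -z "$prefix" ]]; then
    echo "ERROR: no handler registered for type '$type'" >&2
    return 1
  fi
  "${prefix}_validate" "$config" || return 1
  "${prefix}_build" "$config" "$dry_run" || return 1
  "${prefix}_publish" "$config" "$dry_run"
}
```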

## Testing a Handler

Every new handler MUST ship with BATS tests that cover:

1. `validate` returns 1 on missing tools.
2. `validate` returns 1 on missing config fields.
3. `validate` returns 1 on missing credentials.
4. `build` respects `dry_run`.
5. `build` produces output and emits a valid SHA-256 checksum.
6. `publish` respects `dry_run`.
7. `publish` refuses to run with missing credentials.
8. The full pipeline (`validate` → `build` → `publish` with `dry_run=true`) exits 0 end-to-end.

Tests live in `tests/unit/release-artifacts-*.bats` alongside the existing handlers.

@@ -0,0 +1,158 @@

---
name: ct-consensus-voter
description: "Runs structured multi-agent voting for decision tasks with confidence scores, conflict detection, and HITL escalation when the threshold is not met. Use when two or more agents must vote on options: architecture choices, tool selection, policy decisions, when a task carries agent_type:analysis, or on phrases like 'reach consensus', 'vote on options', 'resolve the debate', 'pick the best approach'. Produces a voting matrix JSON, enforces the 0.5 threshold, flags ties within 0.1 confidence as contested and escalates to human tiebreak."
---

# Consensus Voter

## Overview

Runs a structured vote across two or more agents on a decision question, records confidence and rationale per option, and emits a voting matrix that the rest of the pipeline can read. The skill enforces a configurable consensus threshold, detects contested verdicts, and escalates to HITL when evidence is insufficient.

## Core Principle

> Consensus requires evidence, rationale, and a threshold. Without all three, escalate.

## Immutable Constraints

| ID | Rule | Enforcement |
|----|------|-------------|
| CONS-001 | Vote MUST use the structured voting matrix format. | `validateConsensusProtocol` rejects entries with fewer than 2 options. |
| CONS-002 | Every position MUST carry a rationale string. | An entry with a missing rationale is rejected. |
| CONS-003 | Every confidence score MUST lie in `[0.0, 1.0]`. | Out-of-range scores fail validation. |
| CONS-004 | Every position MUST cite at least one evidence record. | Bare votes are rejected; evidence is non-optional. |
| CONS-005 | Conflicts MUST be flagged with a severity level (`critical` / `high` / `medium` / `low`). | A conflict without severity is a protocol violation. |
| CONS-006 | The vote MUST escalate to HITL when the top option's confidence is below the threshold (default 0.5). | Exit code 65 (`HANDOFF_REQUIRED`). |
| CONS-007 | Manifest entry MUST set `agent_type: "analysis"`. | Validator rejects any other value. |

## Voting Matrix Schema

Every consensus run produces a single JSON document that matches this shape:

```json
{
  "questionId": "CONS-0042",
  "question": "Which ORM should the monorepo standardize on?",
  "options": [
    {
      "name": "drizzle-v1-beta",
      "confidence": 0.82,
      "rationale": "defineRelations unblocks the cascade query",
      "evidence": [
        { "file": "drizzle-release-notes.md", "section": "v1.0.0-beta", "type": "doc" },
        { "file": "packages/core/src/orchestration/protocol-validators.ts", "section": "validateArchitectureDecisionProtocol", "type": "code" }
      ]
    },
    {
      "name": "kysely",
      "confidence": 0.41,
      "rationale": "cleaner long-term abstraction but invalidates migrations",
      "evidence": [
        { "file": "kysely-docs.md", "section": "migrations", "type": "doc" }
      ]
    }
  ],
  "threshold": 0.5,
  "verdict": "PROVEN",
  "actualConsensus": 0.82,
  "conflicts": []
}
```

Worked examples of every verdict, including conflict records, and notes on each schema field are in [references/matrix-examples.md](references/matrix-examples.md).

## Scoring Rules

| Rule | Description |
|------|-------------|
| **Score range** | `confidence ∈ [0.0, 1.0]`; out-of-range scores are rejected. |
| **Threshold** | Default 0.5. Can be overridden per task; MUST be recorded in the matrix. |
| **Top option** | The option with the highest confidence; ties break by rationale length (deterministic). |
| **Pass condition** | Top option confidence ≥ threshold **and** no conflict has `severity: critical`. |
| **Fail condition** | Top option confidence < threshold, OR a critical conflict is present. |

## Verdicts

| Verdict | Condition | Action |
|---------|-----------|--------|
| `PROVEN` | Top option ≥ threshold, not contested, no critical conflict, and reproducible evidence | Write manifest, exit 0, hand off to ct-adr-recorder |
| `REFUTED` | Counter-evidence invalidates the top option | Write manifest, exit 0, do not promote to ADR |
| `CONTESTED` | Top two options within 0.1 confidence | Flag as contested, exit 65 (HITL tiebreak) |
| `INSUFFICIENT_EVIDENCE` | No option reaches the threshold, OR fewer than 2 options have evidence | Exit 65, request additional research |
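
The threshold and contest rules can be checked mechanically. This sketch covers only the numeric part. A top score equal to the threshold counts as passing, as in the table above. The threshold is checked before the 0.1 contest margin, so a 0.44 vs 0.39 vote comes out `INSUFFICIENT_EVIDENCE` rather than `CONTESTED`. Critical conflicts and `REFUTED` need reviewer judgment and are not modeled. `consensus_verdict` is a hypothetical helper, not part of `cleo`.

```bash
# Hypothetical sketch of the numeric verdict rules (threshold, then 0.1 contest margin).
# Usage: consensus_verdict <threshold> <confidence>...  ->  "<verdict> <actualConsensus>"
consensus_verdict() {
  local threshold="$1"
  shift
  if (( $# < 2 )); then
    echo "ERROR: a vote needs at least 2 options (CONS-001)" >&2
    return 1
  fi

  printf '%s\n' "$@" | awk -v t="$threshold" '
    BEGIN { top = -1; second = -1 }
    # CONS-003: every score must lie in [0.0, 1.0].
    $1 < 0 || $1 > 1 { bad = 1 }
    {
      if ($1 > top) { second = top; top = $1 }
      else if ($1 > second) { second = $1 }
    }
    END {
      if (bad) { print "ERROR: confidence outside [0.0, 1.0]" > "/dev/stderr"; exit 1 }
      if (top < t) verdict = "INSUFFICIENT_EVIDENCE"
      else if (top - second <= 0.1 + 1e-9) verdict = "CONTESTED"
      else verdict = "PROVEN"
      print verdict, top
    }'
}
```

With the schema example above, `consensus_verdict 0.5 0.82 0.41` prints `PROVEN 0.82`. The second field is the value recorded as `actualConsensus`.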

## Conflict Detection

A conflict exists when two options have confidence within 0.1 of each other *and* their rationales are mutually exclusive. The skill MUST record conflicts in the matrix:

```json
{
  "conflicts": [
    {
      "conflictId": "c-0042-01",
      "severity": "high",
      "conflictType": "contradiction",
      "positions": [
        { "option": "drizzle-v1-beta", "confidence": 0.82 },
        { "option": "kysely", "confidence": 0.79 }
      ],
      "resolution": { "status": "pending", "resolutionType": "escalate" }
    }
  ]
}
```

Conflicts with `severity: critical` always escalate, regardless of top-option confidence. Severity is assigned by the skill based on the blast radius of each option (e.g., reversible tool choice = `low`; irreversible schema migration = `high`; security-impacting = `critical`).

## HITL Escalation

The skill MUST escalate when:

1. Top option confidence < threshold.
2. Any conflict has `severity: critical`.
3. Top two options differ by less than 0.1 (contested).
4. Fewer than 2 options have evidence records.

On escalation:

1. Write the matrix to disk with `verdict: CONTESTED` or `verdict: INSUFFICIENT_EVIDENCE`.
2. Record the manifest entry with `agent_type: "analysis"` and `verdict` populated.
3. Exit with code 65 (`HANDOFF_REQUIRED`).
4. Do not attempt to re-run the vote in the same session.

## Integration

Validate the matrix through `cleo check protocol`:

```bash
cleo check protocol \
  --protocolType consensus \
  --votingMatrixFile ./.cleo/consensus/CONS-0042.json \
  --taskId T4797
```

Exit code 0 = matrix is valid and verdict is `PROVEN` or `REFUTED`. Exit code 65 = `HANDOFF_REQUIRED` (contested or insufficient evidence). Exit code 61 = `E_PROTOCOL_CONSENSUS` (matrix shape is invalid).
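
Here is a sketch of how a caller might branch on those exit codes. `route_consensus_check` is a hypothetical wrapper that uses only the flags shown above. It returns `cleo`'s exit code unchanged so the caller can still propagate 65 upward.

```bash
# Hypothetical wrapper: run the consensus protocol check and route on its exit code.
# $1 = voting matrix file, $2 = task id.
route_consensus_check() {
  local matrix_file="$1" task_id="$2" rc=0
  cleo check protocol --protocolType consensus \
    --votingMatrixFile "$matrix_file" --taskId "$task_id" || rc=$?

  case "$rc" in
    0)  echo "valid: PROVEN or REFUTED; continue the pipeline" ;;
    65) echo "HANDOFF_REQUIRED: escalate to HITL; do not re-vote this session" ;;
    61) echo "E_PROTOCOL_CONSENSUS: fix the matrix shape and re-validate" ;;
    *)  echo "unexpected exit code $rc" ;;
  esac
  return "$rc"
}
```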

This skill typically hands off to ct-adr-recorder on a `PROVEN` verdict so the decision can be formalized.

## Anti-Patterns

| Pattern | Problem | Solution |
|---------|---------|----------|
| Binary votes without confidence scores | Loses nuance (violates CONS-003) | Every position carries a score in `[0.0, 1.0]` |
| Positions without evidence | Bare opinions cannot produce consensus (violates CONS-004) | Every position cites at least one file/section/type |
| Accepting unanimous consensus uncritically | May indicate groupthink | The skill MUST still record rationale and check for hidden assumptions |
| Skipping minority positions | Loses valid concerns | Record every option the agents considered, including rejected ones |
| Premature escalation | Wastes human attention | Escalate only on the four listed conditions, not merely because some position has low confidence |
| Treating the threshold as advisory | Breaks CONS-006 | The threshold is a hard gate; below it, escalate |
| Reusing a matrix across questions | Pollutes evidence chains | Each question gets its own `questionId` and its own matrix file |

## Critical Rules Summary

1. Every vote MUST produce a voting matrix with at least 2 options.
2. Every option MUST carry `confidence`, `rationale`, and at least one evidence record.
3. The threshold is a hard gate; below it, escalate to HITL with exit 65.
4. When the top two options are within 0.1 confidence of each other, the verdict MUST be `CONTESTED`.
5. Critical-severity conflicts always escalate, regardless of top-option confidence.
6. Manifest entry MUST set `agent_type: "analysis"` and include the verdict.
7. On PROVEN, hand off to ct-adr-recorder; on CONTESTED or INSUFFICIENT_EVIDENCE, hand off to HITL.
8. Always validate via `cleo check protocol --protocolType consensus`.

@@ -0,0 +1,30 @@
{
  "_comment": "CLEO-only metadata -- add to packages/skills/skills/manifest.json",
  "name": "ct-consensus-voter",
  "version": "1.0.0",
  "tier": 2,
  "token_budget": 6000,
  "protocol": "consensus",
  "capabilities": {
    "inputs": ["task-id", "question", "candidate-options"],
    "outputs": ["voting-matrix", "manifest-entry"],
    "dispatch_triggers": [
      "reach consensus",
      "vote on options",
      "resolve the debate",
      "pick the best approach",
      "run consensus"
    ],
    "compatible_subagent_types": ["general-purpose"],
    "chains_to": ["ct-adr-recorder", "ct-research-agent"],
    "dispatch_keywords": {
      "primary": ["consensus", "vote", "decide", "verdict"],
      "secondary": ["confidence", "rationale", "evidence", "contested"]
    }
  },
  "constraints": {
    "max_context_tokens": 60000,
    "requires_session": false,
    "requires_epic": false
  }
}

@@ -0,0 +1,140 @@

# Voting Matrix Examples

Three worked examples covering the outcomes the skill must handle: a clean win, a contested verdict, and an escalation for insufficient evidence.

## Example 1: Clean Win (PROVEN)

Question: "Which lock file strategy should the Rust workspace use?"

```json
{
  "questionId": "CONS-0101",
  "question": "Which lock file strategy should the Rust workspace use?",
  "options": [
    {
      "name": "single-workspace-lock",
      "confidence": 0.91,
      "rationale": "Single Cargo.lock at the workspace root guarantees consistent dependency resolution across all 14 crates and matches how cargo build --workspace expects dependencies.",
      "evidence": [
        { "file": "Cargo.toml", "section": "[workspace]", "type": "code" },
        { "file": "cargo-docs/workspaces.md", "section": "Lock files", "type": "doc" }
      ]
    },
    {
      "name": "per-crate-locks",
      "confidence": 0.23,
      "rationale": "Per-crate lock files isolate version churn but fight the workspace resolver.",
      "evidence": [
        { "file": "cargo-docs/resolver.md", "section": "V2", "type": "doc" }
      ]
    }
  ],
  "threshold": 0.5,
  "verdict": "PROVEN",
  "actualConsensus": 0.91,
  "conflicts": []
}
```

Outcome:
- `verdict == PROVEN` because top option confidence (0.91) > threshold (0.5) and no critical conflicts.
- The skill exits 0 and hands off to `ct-adr-recorder` so the decision is formalized as an ADR.
- The runner-up is recorded with rationale so the next reviewer knows what was rejected and why.

## Example 2: Contested Verdict (CONTESTED)

Question: "Which test framework should the new agent-runtime crate use?"

```json
{
  "questionId": "CONS-0202",
  "question": "Which test framework should the new agent-runtime crate use?",
  "options": [
    {
      "name": "cargo-test-builtin",
      "confidence": 0.74,
      "rationale": "Built into the toolchain, zero additional dependencies, works with every CI runner out of the box.",
      "evidence": [
        { "file": "cargo-docs/testing.md", "section": "Writing tests", "type": "doc" }
      ]
    },
    {
      "name": "nextest",
      "confidence": 0.72,
      "rationale": "Parallel execution with isolated test processes is significantly faster on multi-core CI runners and surfaces flaky tests.",
      "evidence": [
        { "file": "nextest-rs/docs/index.md", "section": "Benefits", "type": "doc" },
        { "file": "ci/bench-results.md", "section": "nextest vs cargo test", "type": "data" }
      ]
    }
  ],
  "threshold": 0.5,
  "verdict": "CONTESTED",
  "actualConsensus": 0.74,
  "conflicts": [
    {
      "conflictId": "c-0202-01",
      "severity": "medium",
      "conflictType": "partial-overlap",
      "positions": [
        { "option": "cargo-test-builtin", "confidence": 0.74 },
        { "option": "nextest", "confidence": 0.72 }
      ],
      "resolution": { "status": "pending", "resolutionType": "escalate" }
    }
  ]
}
```

Outcome:
- The top option is above the threshold, BUT the runner-up is within 0.1 (0.74 - 0.72 = 0.02).
- The skill flags the result as `CONTESTED` and exits 65 for HITL tiebreak.
- The human reviewer picks one option or requests additional evidence.
- Neither option is automatically promoted to an ADR.

## Example 3: Insufficient Evidence (INSUFFICIENT_EVIDENCE)

Question: "Should SignalDock adopt WebSockets instead of SSE for real-time updates?"

```json
{
  "questionId": "CONS-0303",
  "question": "Should SignalDock adopt WebSockets instead of SSE for real-time updates?",
  "options": [
    {
      "name": "stay-on-sse",
      "confidence": 0.44,
      "rationale": "SSE works today and is simpler to scale behind HTTP/2 edge nodes, but we have no benchmarks under real agent load.",
      "evidence": [
        { "file": "signaldock/docs/transport.md", "section": "SSE", "type": "doc" }
      ]
    },
    {
      "name": "migrate-to-websockets",
      "confidence": 0.39,
      "rationale": "Bidirectional channel reduces polling round-trips but the migration cost and edge-proxy behavior under load are both unknown.",
      "evidence": [
        { "file": "mozilla-websocket-guide.md", "section": "Bidirectional", "type": "doc" }
      ]
    }
  ],
  "threshold": 0.5,
  "verdict": "INSUFFICIENT_EVIDENCE",
  "actualConsensus": 0.44,
  "conflicts": []
}
```

Outcome:
- Neither option reaches the 0.5 threshold.
- The skill marks the verdict `INSUFFICIENT_EVIDENCE` and exits 65.
- The recommended next step is to run `ct-research-agent` to collect benchmarks under load, then re-run the vote.
- Importantly, the skill does NOT guess a winner; it asks for more evidence.

## Schema Notes

- `questionId` is a unique identifier. Never reuse it across questions, even for a rerun after new evidence; use a new id such as `CONS-0303-r1`.
- `threshold` is always recorded, even when it matches the default. Future readers must be able to see what gate applied.
- `actualConsensus` is the top option's confidence, not a weighted average. Readers should not have to recompute it.
- `conflicts` is always an array, even when empty. `conflicts: []` is clearer than a missing field.
- `resolution.resolutionType` values: `merge`, `choose-a`, `choose-b`, `new`, `defer`, `escalate`. The skill never picks `merge` or `choose-*` itself; those are reviewer actions.
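
The rerun naming convention is mechanical enough to automate. Here is a hypothetical `next_rerun_id` helper that appends `-r1` to a fresh id and increments an existing `-rN` suffix.

```bash
# Hypothetical helper: CONS-0303 -> CONS-0303-r1, CONS-0303-r1 -> CONS-0303-r2.
next_rerun_id() {
  local id="$1"
  if [[ "$id" =~ ^(.+)-r([0-9]+)$ ]]; then
    # Force base 10 so a suffix like r09 is not parsed as octal.
    echo "${BASH_REMATCH[1]}-r$(( 10#${BASH_REMATCH[2]} + 1 ))"
  else
    echo "${id}-r1"
  fi
}
```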

@@ -116,5 +116,5 @@ For ct-grade specifically, both arms of an A/B test experience the same approxim
 - `src/core/metrics/token-estimation.ts` — CLEO's token estimation implementation
 - `docs/specs/CLEO-METRICS-VALIDATION-SYSTEM-SPEC.md` — Metrics system specification
 - `.cleo/setup-otel.sh` — OTel environment setup script
-- `packages/
-- `packages/
+- `packages/skills/skills/ct-grade/scripts/token_tracker.py` — Token aggregation script
+- `packages/skills/skills/ct-grade/scripts/generate_report.py` — Report generator (uses confidence labels)
@@ -25,7 +25,7 @@ from datetime import datetime
 from pathlib import Path

 SCRIPT_DIR = Path(__file__).parent.resolve()
-SKILL_DIR = SCRIPT_DIR.parent # packages/
+SKILL_DIR = SCRIPT_DIR.parent # packages/skills/skills/ct-grade/


 # ---------------------------------------------------------------------------