@aryaminus/controlkeel-opencode 0.3.4 → 0.3.6
@@ -16,6 +16,8 @@ metadata:
 author: controlkeel
 version: "2.0"
 category: benchmark
+ck_mcp_tools:
+- ck_observability
 ---

 # Benchmark Operator Skill
@@ -29,6 +31,15 @@ Use this skill when the task is benchmark orchestration instead of normal govern
 3. Review catch rate, block rate, expected-rule hit rate, latency, and overhead.
 4. Export the run if you need external analysis.

+## Local observability feedback loop
+
+For generated observability coverage, prefer the human-gated loop:
+
+1. Inspect `ck_observability` reports for `saved_evals`, `benchmark_drafts`, `benchmark_scenarios`, and `benchmark_history`.
+2. Use CLI-only commands for mutations: draft approval, materialization, and benchmark execution are not exposed through MCP in this skill.
+3. Run generated observability benchmarks only after a dry-run review and explicit operator approval.
+4. Treat `promotions` as advisory evidence; do not mutate policy, router, prompt, or autofix artifacts from benchmark results alone.
+
 ## Additional resources

 - [Benchmark operator playbook](references/benchmark-playbook.md)
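
The human-gated loop added in this hunk can be sketched as a small guard. This is an illustration only: `MUTATING_ACTIONS`, `LoopState`, `allowed_via_mcp`, and `can_run_generated_benchmark` are hypothetical names, not part of ControlKeel's API or CLI. The only details taken from the skill text are that draft approval, materialization, and benchmark execution are CLI-only mutations, and that a generated benchmark run needs a dry-run review plus explicit operator approval. Treating every other action as a read-only inspection is an assumption.

```python
from dataclasses import dataclass

# Hypothetical labels for the three mutations the skill text keeps CLI-only.
MUTATING_ACTIONS = {"draft_approval", "materialization", "benchmark_execution"}


@dataclass
class LoopState:
    """Hypothetical record of where one generated benchmark sits in the loop."""
    dry_run_reviewed: bool = False
    operator_approved: bool = False


def allowed_via_mcp(action: str) -> bool:
    """Assumption: anything that is not a listed mutation is a read-only inspection."""
    return action not in MUTATING_ACTIONS


def can_run_generated_benchmark(state: LoopState) -> bool:
    """Allow a run only after a dry-run review and explicit operator approval."""
    return state.dry_run_reviewed and state.operator_approved
```

Keeping the two checks separate mirrors the text: the MCP surface is for inspection, and the decision to execute stays with the operator.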
@@ -17,6 +17,7 @@ metadata:
 version: "2.0"
 category: policy
 ck_mcp_tools:
+- ck_observability
 - ck_outcome_tracker
 ---

@@ -32,6 +33,10 @@ Use this skill only when the task is offline policy training or artifact promoti
 4. Summarize held-out metrics against the heuristic baseline before promotion.
 5. Consider real-world success inputs using `ck_outcome_tracker` (leaderboards, recorded session outcomes).

+## Observability promotion evidence
+
+Before policy artifact promotion, inspect `ck_observability` report `promotions` and `benchmark_history`. Promotion candidates are advisory and non-mutating; require human approval, successful held-out or generated benchmark evidence, and explicit policy-training intent before changing router, budget-hint, prompt, or policy artifacts.
+
 ## Additional resources

 - [Promotion rules](references/promotion-rules.md)
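
The three preconditions named in the added paragraph can be expressed as a checklist that reports what is still missing. This is a minimal sketch under stated assumptions: `PromotionEvidence` and `missing_requirements` are hypothetical names invented for illustration, and nothing here reads a real `ck_observability` report.

```python
from dataclasses import dataclass


@dataclass
class PromotionEvidence:
    """Hypothetical summary an operator might assemble before promotion."""
    human_approved: bool
    benchmark_passed: bool  # held-out or generated benchmark evidence
    policy_training_intent: bool


def missing_requirements(evidence: PromotionEvidence) -> list[str]:
    """Return the unmet requirements; an empty list means none are outstanding."""
    checks = [
        ("human approval", evidence.human_approved),
        ("successful held-out or generated benchmark evidence", evidence.benchmark_passed),
        ("explicit policy-training intent", evidence.policy_training_intent),
    ]
    return [name for name, ok in checks if not ok]
```

Returning the list of gaps, rather than a single boolean, keeps the result advisory: a human still decides whether to change any artifact.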
@@ -16,6 +16,7 @@ metadata:
 version: "2.0"
 category: proof
 ck_mcp_tools:
+- ck_observability
 - ck_context
 - ck_memory_search
 - ck_memory_record
@@ -36,6 +37,10 @@ Use this skill when you need the durable system-of-record view instead of only t
 5. Use `ck_memory_record` to preserve new decisions or operator intent that future agents should recover explicitly.
 6. Use `ck_memory_archive` to retire stale or superseded memories so retrieval quality does not decay.

+## Observability evidence
+
+When closing or resuming work, include local observability loop evidence when relevant: saved evals, benchmark drafts, materialized scenarios, generated benchmark history, and advisory promotion candidates. Prefer the read-only `ck_observability` MCP surface for summaries and record durable checkpoints with `ck_memory_record`; do not treat advisory promotion candidates as executed changes.
+
 ## Additional resources

 - [Proof workflow](references/proof-workflow.md)
@@ -16,6 +16,7 @@ metadata:
 version: "2.0"
 category: release
 ck_mcp_tools:
+- ck_observability
 - ck_context
 - ck_deployment_advisor
 ---
@@ -32,6 +33,10 @@ Use this skill when the operator asks whether a mission or session is ready for
 4. Summarize approvals, rejections, and any remaining human work.
 5. Provide automatic deployment resources via `ck_deployment_advisor` (Dockerize, CI pipes) for the relevant stack (Phoenix, etc.).

+## Observability readiness
+
+Before calling a feature ready, check whether relevant local observability evidence exists. Use `ck_observability` reports for `benchmark_history` and `promotions` to summarize readiness, uncovered scenarios, missed runs, and advisory promotion candidates. A ready advisory candidate is not a policy/router/prompt promotion; it still requires explicit human review.
+
 ## Additional resources

 - [Release checklist](references/release-checklist.md)
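
The readiness summary described in the added paragraph might be condensed as follows. The report schema is not shown in this diff, so every field name read below (`uncovered_scenarios`, `missed_runs`, `candidates`) is an assumption made for illustration, as is the function name `summarize_readiness`.

```python
def summarize_readiness(benchmark_history: dict, promotions: dict) -> dict:
    """Condense two report payloads into a readiness summary.

    The keys read here are hypothetical; the real `ck_observability`
    report schema is not part of this diff.
    """
    uncovered = list(benchmark_history.get("uncovered_scenarios", []))
    missed = list(benchmark_history.get("missed_runs", []))
    candidates = list(promotions.get("candidates", []))
    return {
        "uncovered_scenarios": uncovered,
        "missed_runs": missed,
        "advisory_candidates": candidates,
        # True when the reports list any uncovered scenario or missed run.
        "gaps_found": bool(uncovered or missed),
        # Advisory candidates never count as promotions on their own.
        "requires_human_review": True,
    }
```

The summary always sets `requires_human_review`, reflecting the rule that a ready advisory candidate is not itself a promotion.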
package/package.json
CHANGED