code-ai-installer 4.3.1 → 4.3.2
- package/README.md +2 -1
- package/dist/mcp/audit_ledger.d.ts +6 -4
- package/dist/mcp/audit_ledger.js +6 -5
- package/dist/mcp/scorecard.d.ts +4 -2
- package/dist/mcp/scorecard.js +12 -7
- package/dist/mcp/tools/aggregate_run_metrics.d.ts +3 -2
- package/dist/mcp/tools/aggregate_run_metrics.js +4 -3
- package/dist/mcp/tools/sign_off.js +7 -7
- package/domains/development/agents/auditor.md +2 -2
- package/domains/development/locales/en/agents/auditor.md +2 -2
- package/package.json +1 -1
package/README.md
CHANGED
@@ -157,8 +157,9 @@ Depending on `--target`, `code-ai` restructures your project:
 
 ## 🧬 Versions & migration
 
-`code-ai-installer` is on **v4.3.
+`code-ai-installer` is on **v4.3.2**.
 
+- **v4.3.2** — the Auditor now sees **`/bugfix`** runs. A run scorecard is recorded when a run reaches its mode's final gate (full / hotfix → RG, bugfix → TEST) instead of only at RG, so bugfix work counts toward the ≥3-run audit threshold rather than staying invisible. Reads each domain's `pipeline.yaml` (no hardcoded gate); existing ledger entries are unaffected and only runs completed after the update are recorded.
 - **v4.3.1** — the `code-ai-mcp` registration is now **pinned** to the installed version (`npx -p code-ai-installer@<version>`) and re-pinned on every reinstall, so an updated server actually takes effect instead of an unpinned `npx` silently reusing a stale global/cache copy. The server also logs `code-ai-mcp v<version> · domain=<domain>` to stderr at startup, so the live build is visible in Claude's MCP logs.
 - **v4.3.0** — `render_diff` MCP tool (unified diff → a standalone HTML review page); MCP gate-flow + stop-at-user-gate sections added to the content / analytics / product conductors; Auditor trigger — a `/audit` command plus a Release-Gate nudge that surfaces after every 3rd completed run (development pilot).
 - **v4.1.0** — MCP servers now register in your **global (user-scope)** config via a direct, idempotent `~/.claude.json` merge (no dependency on the `claude` CLI); the conductor halts at each user gate — one at a time, no batching, no auto-pass on green.
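The v4.3.2 entry says the final gate is read from each domain's `pipeline.yaml` rather than hardcoded. A minimal TypeScript sketch of that idea follows, assuming a pipeline can be reduced to an ordered gate list per mode. The `Pipeline` shape, the `nextGateSketch` and `isTerminalGate` helpers, and every gate name other than RG and TEST are hypothetical, not the package's actual schema or API.

```typescript
// Hypothetical, simplified view of a parsed pipeline.yaml: each mode maps to
// its ordered list of gates. Only RG and TEST come from the changelog; the
// other gate names are placeholders for illustration.
type Mode = "full" | "hotfix" | "bugfix";
type Pipeline = Record<Mode, readonly string[]>;

const examplePipeline: Pipeline = {
  full: ["PLAN", "DEV", "TEST", "RG"],
  hotfix: ["DEV", "RG"],
  bugfix: ["DEV", "TEST"],
};

/** Next gate after `gate` in the given mode, or null when `gate` is the last one. */
function nextGateSketch(pipeline: Pipeline, mode: Mode, gate: string): string | null {
  const gates = pipeline[mode];
  const index = gates.indexOf(gate);
  if (index === -1) {
    throw new Error(`gate ${gate} is not part of mode ${mode}`);
  }
  return gates[index + 1] ?? null;
}

/** A gate is terminal for a mode when no gate follows it. */
function isTerminalGate(pipeline: Pipeline, mode: Mode, gate: string): boolean {
  return nextGateSketch(pipeline, mode, gate) === null;
}
```

Under these assumptions `isTerminalGate(examplePipeline, "bugfix", "TEST")` is true while `isTerminalGate(examplePipeline, "full", "TEST")` is false, which is how a bugfix run can count as complete without ever reaching RG.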
package/dist/mcp/audit_ledger.d.ts
CHANGED
@@ -1,15 +1,17 @@
 import { RunScorecard } from "./scorecard.js";
 import type { TaskState } from "./task_state.js";
+import type { GateName } from "../shared/index.js";
 /** Append one scorecard as a JSON line to the run ledger (creates dir if needed). */
 export declare function appendScorecard(card: RunScorecard): Promise<void>;
 /** Read all scorecards from the ledger. Malformed / partial lines are skipped. */
 export declare function readLedger(): Promise<RunScorecard[]>;
 /**
- * Build + append a scorecard for a completed run. Called from sign_off when
- *
- * break a sign-off (telemetry
+ * Build + append a scorecard for a completed run. Called from sign_off when the
+ * mode's terminal gate is signed (full/hotfix → RG, bugfix → TEST). Best-effort
+ * by contract: callers MUST NOT let a ledger failure break a sign-off (telemetry
+ * is never load-bearing for the gate).
  */
-export declare function recordRunScorecard(state: TaskState): Promise<void>;
+export declare function recordRunScorecard(state: TaskState, terminalGate: GateName): Promise<void>;
 /** Number of COMPLETED (RG-signed) runs in the ledger. */
 export declare function countCompletedRuns(): Promise<number>;
 /**
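The declarations above describe the ledger as one JSON line per scorecard, with malformed or partial lines skipped on read. The sketch below illustrates that read/write contract on in-memory text only. `LedgerEntry` is a hypothetical two-field stand-in for the real `RunScorecard` schema, and `toLedgerLine` / `parseLedgerText` are illustrative names, not functions from the package.

```typescript
// Hypothetical minimal entry shape; the real RunScorecard schema has more fields.
type LedgerEntry = { task_id: string; completed: boolean };

/** Runtime check that a parsed JSON value has the fields this sketch relies on. */
function isLedgerEntry(value: unknown): value is LedgerEntry {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["task_id"] === "string" && typeof record["completed"] === "boolean";
}

/** Serialize one entry as a single JSON line, ready to append to a ledger file. */
function toLedgerLine(entry: LedgerEntry): string {
  return JSON.stringify(entry) + "\n";
}

/** Parse ledger text, skipping blank, malformed, or partially written lines. */
function parseLedgerText(text: string): LedgerEntry[] {
  const entries: LedgerEntry[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const parsed: unknown = JSON.parse(line);
      if (isLedgerEntry(parsed)) entries.push(parsed);
    } catch {
      // A truncated or corrupt line is ignored rather than failing the whole read.
    }
  }
  return entries;
}
```

Skipping bad lines instead of throwing keeps one interrupted append from hiding every earlier run, which fits the stated rule that telemetry must never be load-bearing.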
package/dist/mcp/audit_ledger.js
CHANGED
@@ -72,13 +72,14 @@ async function readSideCounts(taskId) {
     };
 }
 /**
- * Build + append a scorecard for a completed run. Called from sign_off when
- *
- * break a sign-off (telemetry
+ * Build + append a scorecard for a completed run. Called from sign_off when the
+ * mode's terminal gate is signed (full/hotfix → RG, bugfix → TEST). Best-effort
+ * by contract: callers MUST NOT let a ledger failure break a sign-off (telemetry
+ * is never load-bearing for the gate).
  */
-export async function recordRunScorecard(state) {
+export async function recordRunScorecard(state, terminalGate) {
     const extras = await readSideCounts(state.task_id);
-    await appendScorecard(buildScorecard(state, extras));
+    await appendScorecard(buildScorecard(state, extras, terminalGate));
 }
 /** Number of COMPLETED (RG-signed) runs in the ledger. */
 export async function countCompletedRuns() {
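The README describes an audit nudge that surfaces after every 3rd completed run. The following is only a guess at how a completed-run count could be turned into such a nudge: the package's own nudge logic is not part of this diff, so the function name `auditNudgeSketch`, the `interval` parameter, and the message text are all assumptions.

```typescript
/**
 * Hypothetical nudge rule: return a hint on every `interval`-th completed run,
 * otherwise undefined. The message wording is made up for illustration.
 */
function auditNudgeSketch(completedRuns: number, interval: number = 3): string | undefined {
  if (completedRuns <= 0 || completedRuns % interval !== 0) {
    return undefined;
  }
  return `${completedRuns} completed runs recorded; consider running /audit.`;
}
```

With the default interval this returns a hint for 3, 6, 9, and so on completed runs and stays silent otherwise, so a count that now includes bugfix runs reaches the first nudge sooner.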
package/dist/mcp/scorecard.d.ts
CHANGED
@@ -1,10 +1,12 @@
 import { z } from "zod";
+import { GateName } from "../shared/index.js";
 import type { TaskState } from "./task_state.js";
 /**
  * Run scorecard — a compact, per-pipeline-run summary derived ENTIRELY from
  * telemetry the MCP state machine already persists (task state + jsonl side
  * logs). One scorecard is appended to the run ledger when a run completes
- * (
+ * (its mode's terminal gate is signed — RG for full/hotfix, TEST for bugfix).
+ * The Auditor's aggregation tool crunches >=3 of these.
  *
  * Deliberately records raw signals, not judgments — interpretation belongs to
  * the aggregator (numbers) and, later, the Auditor agent (findings).
@@ -137,4 +139,4 @@ export type ScorecardExtras = {
  * Build a run scorecard from a task's persisted state plus side-log counts.
  * Pure + synchronous: deterministic and unit-testable on synthetic state.
  */
-export declare function buildScorecard(state: TaskState, extras: ScorecardExtras): RunScorecard;
+export declare function buildScorecard(state: TaskState, extras: ScorecardExtras, terminalGate?: GateName): RunScorecard;
package/dist/mcp/scorecard.js
CHANGED
@@ -4,7 +4,8 @@ import { ClassificationOutcome, GateName, PipelineMode, Signer, } from "../share
  * Run scorecard — a compact, per-pipeline-run summary derived ENTIRELY from
  * telemetry the MCP state machine already persists (task state + jsonl side
  * logs). One scorecard is appended to the run ledger when a run completes
- * (
+ * (its mode's terminal gate is signed — RG for full/hotfix, TEST for bugfix).
+ * The Auditor's aggregation tool crunches >=3 of these.
  *
  * Deliberately records raw signals, not judgments — interpretation belongs to
  * the aggregator (numbers) and, later, the Auditor agent (findings).
@@ -35,10 +36,10 @@ export const RunScorecard = z.object({
     schema_version: z.literal(SCORECARD_SCHEMA_VERSION),
     task_id: z.string().min(1),
     mode: PipelineMode,
-    /** True when
+    /** True when the mode's terminal gate has been signed (RG for full/hotfix, TEST for bugfix). */
     completed: z.boolean(),
     created_at: z.string(),
-    /** Timestamp of the
+    /** Timestamp of the terminal-gate sign-off, or null if not completed. */
     completed_at: z.string().nullable(),
     gates: z.array(GateScore),
     dev_rollback_count: z.number().int().nonnegative(),
@@ -52,7 +53,7 @@ export const RunScorecard = z.object({
  * Build a run scorecard from a task's persisted state plus side-log counts.
  * Pure + synchronous: deterministic and unit-testable on synthetic state.
  */
-export function buildScorecard(state, extras) {
+export function buildScorecard(state, extras, terminalGate = "RG") {
     // Group sign-offs by gate: count + last signer (last in chronological array).
     const signoffCount = new Map();
     const lastSigner = new Map();
@@ -73,9 +74,13 @@ export function buildScorecard(state, extras) {
         classification: lastClass.get(gate) ?? null,
         exceptions_count: extras.exceptions_by_gate[gate] ?? 0,
     }));
-
-
-
+    // A run is complete when its mode's TERMINAL gate is signed (full/hotfix → RG,
+    // bugfix → TEST). The terminal gate is supplied by the caller (sign_off knows it
+    // from pipeline.yaml); defaults to "RG" so the full-mode path and standalone
+    // callers are unchanged.
+    const terminalSignoffs = state.signoffs.filter((s) => s.gate === terminalGate);
+    const completed = terminalSignoffs.length > 0;
+    const completed_at = completed ? terminalSignoffs[terminalSignoffs.length - 1].timestamp : null;
     const exceptions_count = Object.values(extras.exceptions_by_gate).reduce((a, b) => a + (b ?? 0), 0);
     // Group skill invocations by skill + gate into counts.
     const skillAgg = new Map();
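The completion rule added to `buildScorecard` can be exercised in isolation. This is a reduced sketch that mirrors only the three added lines: the `Signoff` type carries just the two fields the rule reads, `deriveCompletion` is a hypothetical helper name, and the timestamps and the DEV gate are invented sample data.

```typescript
// Reduced sign-off record: only the fields the completion rule looks at.
type Signoff = { gate: string; timestamp: string };

type Completion = { completed: boolean; completed_at: string | null };

/**
 * A run is complete when its terminal gate has at least one sign-off;
 * completed_at is the timestamp of the last such sign-off in array order.
 * The default of "RG" matches the default shown in the diff.
 */
function deriveCompletion(signoffs: readonly Signoff[], terminalGate: string = "RG"): Completion {
  const terminalSignoffs = signoffs.filter((s) => s.gate === terminalGate);
  const last = terminalSignoffs[terminalSignoffs.length - 1];
  return {
    completed: terminalSignoffs.length > 0,
    completed_at: last?.timestamp ?? null,
  };
}

// Sample bugfix run: TEST is signed twice and RG is never reached.
const bugfixSignoffs: Signoff[] = [
  { gate: "DEV", timestamp: "2025-01-01T09:00:00Z" },
  { gate: "TEST", timestamp: "2025-01-01T10:00:00Z" },
  { gate: "TEST", timestamp: "2025-01-01T11:00:00Z" },
];

const asBugfix = deriveCompletion(bugfixSignoffs, "TEST");
const withDefault = deriveCompletion(bugfixSignoffs);
```

Here `asBugfix` is completed with `completed_at` equal to the later TEST timestamp, while `withDefault` is not completed. The second result is the pre-4.3.2 behaviour that left bugfix runs out of the Auditor's sample.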
package/dist/mcp/tools/aggregate_run_metrics.d.ts
CHANGED
@@ -6,7 +6,8 @@ import type { AggregateRunMetricsInput, AggregateRunMetricsOutput } from "../../
  * the Auditor agent's job (a later ADR).
  *
  * Attribution: per-AGENT via gate→produced_by[0] from pipeline.yaml (single
- * source); per-WORKFLOW via mode. Only COMPLETED runs (
+ * source); per-WORKFLOW via mode. Only COMPLETED runs (the mode's terminal gate
+ * signed — RG for full/hotfix, TEST for bugfix) are
  * aggregated; `min_runs` (default 3) is the design's small-sample guard — below
  * it `met_threshold=false` and the Auditor should stay silent.
  *
@@ -14,6 +15,6 @@ import type { AggregateRunMetricsInput, AggregateRunMetricsOutput } from "../../
  * per_skill (incl. the gates a skill was pulled at) via get_skill instrumentation
  * (ADR-DEV-121) — the remaining gap is trigger_accuracy (relevant-vs-invoked),
  * which needs a relevance oracle; no explicit human-rejection signal; completed =
- *
+ * the mode's terminal gate signed (RG for full/hotfix, TEST for bugfix).
  */
 export declare function aggregateRunMetrics(input: AggregateRunMetricsInput): Promise<AggregateRunMetricsOutput>;
package/dist/mcp/tools/aggregate_run_metrics.js
CHANGED
@@ -8,7 +8,8 @@ import { resolveActiveDomain } from "../config.js";
  * the Auditor agent's job (a later ADR).
  *
  * Attribution: per-AGENT via gate→produced_by[0] from pipeline.yaml (single
- * source); per-WORKFLOW via mode. Only COMPLETED runs (
+ * source); per-WORKFLOW via mode. Only COMPLETED runs (the mode's terminal gate
+ * signed — RG for full/hotfix, TEST for bugfix) are
  * aggregated; `min_runs` (default 3) is the design's small-sample guard — below
  * it `met_threshold=false` and the Auditor should stay silent.
  *
@@ -16,7 +17,7 @@ import { resolveActiveDomain } from "../config.js";
  * per_skill (incl. the gates a skill was pulled at) via get_skill instrumentation
  * (ADR-DEV-121) — the remaining gap is trigger_accuracy (relevant-vs-invoked),
  * which needs a relevance oracle; no explicit human-rejection signal; completed =
- *
+ * the mode's terminal gate signed (RG for full/hotfix, TEST for bugfix).
  */
 export async function aggregateRunMetrics(input) {
     const minRuns = input.min_runs;
@@ -117,7 +118,7 @@ export async function aggregateRunMetrics(input) {
         gates: [...a.gates].sort(),
     }));
     const notes = [
-        "completed =
+        "completed = the mode's terminal gate is signed (RG for full/hotfix, TEST for bugfix).",
         "Skill invocations ARE captured (get_skill instrumentation, ADR-DEV-121): per_skill carries invocation counts + the gates each skill was pulled at, so skill.gates can be tuned to observed usage. The remaining gap is trigger_accuracy (relevant-vs-invoked), which needs a relevance oracle — not this data layer.",
         "No explicit human gate-rejection signal; rejections manifest as rollbacks/exceptions.",
         "Per-agent attribution uses gate→produced_by[0] from the active domain's pipeline.yaml.",
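The doc comment above says only completed runs are aggregated and that `min_runs` (default 3) acts as a small-sample guard. A minimal sketch of that guard follows, assuming each ledger entry exposes a mode and a completed flag. `RunSummary`, `summarizeRuns`, and the `run_count` and `per_mode` output fields are hypothetical; only `min_runs` and `met_threshold` are names taken from the source.

```typescript
// Hypothetical reduced ledger entry: just the fields this guard needs.
type RunSummary = { mode: string; completed: boolean };

type Summary = {
  run_count: number;                // completed runs only
  met_threshold: boolean;           // false means: too few runs, stay silent
  per_mode: Record<string, number>; // completed runs per workflow mode
};

/** Count completed runs per mode and apply the small-sample guard. */
function summarizeRuns(runs: readonly RunSummary[], minRuns: number = 3): Summary {
  const completedRuns = runs.filter((r) => r.completed);
  const perMode: Record<string, number> = {};
  for (const run of completedRuns) {
    perMode[run.mode] = (perMode[run.mode] ?? 0) + 1;
  }
  return {
    run_count: completedRuns.length,
    met_threshold: completedRuns.length >= minRuns,
    per_mode: perMode,
  };
}

// Two completed bugfix runs plus one completed full run reach the default threshold.
const sampleRuns: RunSummary[] = [
  { mode: "bugfix", completed: true },
  { mode: "bugfix", completed: true },
  { mode: "full", completed: true },
  { mode: "full", completed: false },
];
const sampleSummary = summarizeRuns(sampleRuns);
```

If the two bugfix entries were never recorded as completed, as was the case before 4.3.2, the same sample would have a `run_count` of 1 and `met_threshold` would be false.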
package/dist/mcp/tools/sign_off.js
CHANGED
@@ -1,4 +1,4 @@
-import { getGateConfig, loadPipeline } from "../pipeline.js";
+import { getGateConfig, getNextGate, loadPipeline } from "../pipeline.js";
 import { readTaskState, writeTaskState } from "../task_state.js";
 import { recordRunScorecard, countCompletedRuns, auditNudgeFor } from "../audit_ledger.js";
 import { resolveActiveDomain } from "../config.js";
@@ -57,14 +57,14 @@ export async function signOff(input) {
         evidence: input.evidence,
     });
     await writeTaskState(state);
-    // Auditor telemetry: when the
-    //
-    // Best-effort: a ledger failure (or nudge
-    // sign-off (telemetry is not load-bearing for the gate).
+    // Auditor telemetry: when the run reaches its mode's TERMINAL gate (full/hotfix
+    // → RG, bugfix → TEST), the run is complete — append a scorecard to the local
+    // ledger, then compute the /audit nudge. Best-effort: a ledger failure (or nudge
+    // failure) must NEVER break a sign-off (telemetry is not load-bearing for the gate).
     let audit_nudge;
-    if (input.gate ===
+    if (getNextGate(pipeline, state.mode, input.gate) === null) {
         try {
-            await recordRunScorecard(state);
+            await recordRunScorecard(state, input.gate);
             audit_nudge = auditNudgeFor(await countCompletedRuns());
         }
         catch {
package/domains/development/agents/auditor.md
CHANGED
@@ -22,7 +22,7 @@ schema_version: 1
 - NOT at every gate and NOT in the background. One pass, once **≥3 completed runs** have accumulated (the threshold is configurable).
 - Below the threshold it stays silent. It draws no conclusions from n=1 (small sample).
 - Per-gate telemetry is already written by the state machine; the Auditor makes one pass over the accumulated data.
-- Triggering a pass: the `/audit` command (manual) or the `audit_nudge` hint that `sign_off` returns after RG.
+- Triggering a pass: the `/audit` command (manual) or the `audit_nudge` hint that `sign_off` returns once the run completes (the mode's final gate is signed: RG for full/hotfix, TEST for bugfix).
 
 ---
 
@@ -57,7 +57,7 @@ schema_version: 1
 - **Human-gated even under autonomy:** destructive changes to existing assets (deletion, major rewrite, capability removal).
 - **Always:** a mandatory report after any autonomous action. Nothing invisible.
 - **Dedup on addition:** before auto-adding a skill, an overlap check (`related` + controlled vocabulary); the report lists the additions for later cleanup.
-- The propose→approve mechanism and the autonomy toggle are already implemented (`propose_change` → `review_proposal`: risk matrix + dedup). A pass is triggered by the `/audit` command or by the nudge after
+- The propose→approve mechanism and the autonomy toggle are already implemented (`propose_change` → `review_proposal`: risk matrix + dedup). A pass is triggered by the `/audit` command or by the nudge after a run completes.
 
 ---
 
package/domains/development/locales/en/agents/auditor.md
CHANGED
@@ -22,7 +22,7 @@ Close the self-improvement loop: build → run → measure → improve. Once rea
 - NOT at every gate and NOT in the background. One pass, once **≥3 completed runs** have accumulated (threshold configurable).
 - Below the threshold — silent. It draws no conclusions from n=1 (small sample).
 - Per-gate telemetry is already persisted by the state machine; the Auditor makes one pass over the accumulated data.
-- Triggering a pass: the `/audit` command (manual) or the `audit_nudge` that `sign_off` returns
+- Triggering a pass: the `/audit` command (manual) or the `audit_nudge` that `sign_off` returns when a run completes (its mode's terminal gate is signed — RG for full/hotfix, TEST for bugfix).
 
 ---
 
@@ -57,7 +57,7 @@ Close the self-improvement loop: build → run → measure → improve. Once rea
 - **Human-gated even under autonomy:** destructive changes to existing assets (delete, major rewrite, capability removal).
 - **Always:** a mandatory report after any autonomous action. Nothing invisible.
 - **Additive dedup:** before auto-adding a skill — an overlap check (`related` + controlled vocab); the report lists additions for later pruning.
-- The propose→approve mechanism and the autonomy toggle already exist (`propose_change` → `review_proposal`: risk matrix + dedup). A pass is triggered by `/audit` or the post-
+- The propose→approve mechanism and the autonomy toggle already exist (`propose_change` → `review_proposal`: risk matrix + dedup). A pass is triggered by `/audit` or the post-completion nudge.
 
 ---
 
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "code-ai-installer",
-  "version": "4.3.
+  "version": "4.3.2",
   "description": "Production-ready CLI to install code-ai agents and skills for multiple AI coding assistants. Bundles the code-ai-mcp MCP server for Claude Code.",
   "license": "MIT",
   "author": "Denish1209",