code-ai-installer 4.3.1 → 4.3.2

package/README.md CHANGED
@@ -157,8 +157,9 @@ Depending on `--target`, `code-ai` restructures your project:
157
157
 
158
158
  ## 🧬 Versions & migration
159
159
 
160
- `code-ai-installer` is on **v4.3.1**.
160
+ `code-ai-installer` is on **v4.3.2**.
161
161
 
162
+ - **v4.3.2** — the Auditor now sees **`/bugfix`** runs. A run scorecard is recorded when a run reaches its mode's final gate (full / hotfix → RG, bugfix → TEST) instead of only at RG, so bugfix work counts toward the ≥3-run audit threshold rather than staying invisible. Reads each domain's `pipeline.yaml` (no hardcoded gate); existing ledger entries are unaffected and only runs completed after the update are recorded.
162
163
  - **v4.3.1** — the `code-ai-mcp` registration is now **pinned** to the installed version (`npx -p code-ai-installer@<version>`) and re-pinned on every reinstall, so an updated server actually takes effect instead of an unpinned `npx` silently reusing a stale global/cache copy. The server also logs `code-ai-mcp v<version> · domain=<domain>` to stderr at startup, so the live build is visible in Claude's MCP logs.
163
164
  - **v4.3.0** — `render_diff` MCP tool (unified diff → a standalone HTML review page); MCP gate-flow + stop-at-user-gate sections added to the content / analytics / product conductors; Auditor trigger — a `/audit` command plus a Release-Gate nudge that surfaces after every 3rd completed run (development pilot).
164
165
  - **v4.1.0** — MCP servers now register in your **global (user-scope)** config via a direct, idempotent `~/.claude.json` merge (no dependency on the `claude` CLI); the conductor halts at each user gate — one at a time, no batching, no auto-pass on green.
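The v4.3.2 entry above says a run counts as complete once it reaches its mode's final gate, read from `pipeline.yaml` rather than hardcoded. A minimal TypeScript sketch of that idea follows. The `Pipeline` shape, the `PLAN` and `DEV` gate names, the gate order, and this `getNextGate` body are assumptions for illustration only; they are not the package's actual schema or implementation.

```typescript
// Hypothetical pipeline shape: each mode maps to an ordered list of gates.
// The real pipeline.yaml schema and gate order may differ.
type Mode = "full" | "hotfix" | "bugfix";
type Pipeline = Record<Mode, string[]>;

const pipeline: Pipeline = {
  full: ["PLAN", "DEV", "TEST", "RG"],
  hotfix: ["DEV", "TEST", "RG"],
  bugfix: ["DEV", "TEST"],
};

// Returns the gate that follows `gate` in the given mode, or null when `gate`
// is the last one. Throws if the gate does not belong to the mode.
function getNextGate(p: Pipeline, mode: Mode, gate: string): string | null {
  const gates = p[mode];
  const idx = gates.indexOf(gate);
  if (idx === -1) throw new Error(`gate ${gate} is not part of mode ${mode}`);
  return gates[idx + 1] ?? null;
}

// A gate is terminal for a mode when nothing follows it.
function isTerminalGate(p: Pipeline, mode: Mode, gate: string): boolean {
  return getNextGate(p, mode, gate) === null;
}

console.log(isTerminalGate(pipeline, "bugfix", "TEST")); // TEST ends a bugfix run
console.log(isTerminalGate(pipeline, "full", "TEST"));   // full mode continues to RG
```

Deriving the terminal gate from the ordered gate list means a domain that defines a different last gate needs no code change, which appears to be the point of the "no hardcoded gate" remark.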
@@ -1,15 +1,17 @@
1
1
  import { RunScorecard } from "./scorecard.js";
2
2
  import type { TaskState } from "./task_state.js";
3
+ import type { GateName } from "../shared/index.js";
3
4
  /** Append one scorecard as a JSON line to the run ledger (creates dir if needed). */
4
5
  export declare function appendScorecard(card: RunScorecard): Promise<void>;
5
6
  /** Read all scorecards from the ledger. Malformed / partial lines are skipped. */
6
7
  export declare function readLedger(): Promise<RunScorecard[]>;
7
8
  /**
8
- * Build + append a scorecard for a completed run. Called from sign_off when RG
9
- * is signed. Best-effort by contract: callers MUST NOT let a ledger failure
10
- * break a sign-off (telemetry is never load-bearing for the gate).
9
+ * Build + append a scorecard for a completed run. Called from sign_off when the
10
+ * mode's terminal gate is signed (full/hotfix → RG, bugfix → TEST). Best-effort
11
+ * by contract: callers MUST NOT let a ledger failure break a sign-off (telemetry
12
+ * is never load-bearing for the gate).
11
13
  */
12
- export declare function recordRunScorecard(state: TaskState): Promise<void>;
14
+ export declare function recordRunScorecard(state: TaskState, terminalGate: GateName): Promise<void>;
13
15
  /** Number of COMPLETED (RG-signed) runs in the ledger. */
14
16
  export declare function countCompletedRuns(): Promise<number>;
15
17
  /**
@@ -72,13 +72,14 @@ async function readSideCounts(taskId) {
72
72
  };
73
73
  }
74
74
  /**
75
- * Build + append a scorecard for a completed run. Called from sign_off when RG
76
- * is signed. Best-effort by contract: callers MUST NOT let a ledger failure
77
- * break a sign-off (telemetry is never load-bearing for the gate).
75
+ * Build + append a scorecard for a completed run. Called from sign_off when the
76
+ * mode's terminal gate is signed (full/hotfix → RG, bugfix → TEST). Best-effort
77
+ * by contract: callers MUST NOT let a ledger failure break a sign-off (telemetry
78
+ * is never load-bearing for the gate).
78
79
  */
79
- export async function recordRunScorecard(state) {
80
+ export async function recordRunScorecard(state, terminalGate) {
80
81
  const extras = await readSideCounts(state.task_id);
81
- await appendScorecard(buildScorecard(state, extras));
82
+ await appendScorecard(buildScorecard(state, extras, terminalGate));
82
83
  }
83
84
  /** Number of COMPLETED (RG-signed) runs in the ledger. */
84
85
  export async function countCompletedRuns() {
@@ -1,10 +1,12 @@
1
1
  import { z } from "zod";
2
+ import { GateName } from "../shared/index.js";
2
3
  import type { TaskState } from "./task_state.js";
3
4
  /**
4
5
  * Run scorecard — a compact, per-pipeline-run summary derived ENTIRELY from
5
6
  * telemetry the MCP state machine already persists (task state + jsonl side
6
7
  * logs). One scorecard is appended to the run ledger when a run completes
7
- * (RG signed). The Auditor's aggregation tool crunches >=3 of these.
8
+ * (its mode's terminal gate is signed: RG for full/hotfix, TEST for bugfix).
9
+ * The Auditor's aggregation tool crunches >=3 of these.
8
10
  *
9
11
  * Deliberately records raw signals, not judgments — interpretation belongs to
10
12
  * the aggregator (numbers) and, later, the Auditor agent (findings).
@@ -137,4 +139,4 @@ export type ScorecardExtras = {
137
139
  * Build a run scorecard from a task's persisted state plus side-log counts.
138
140
  * Pure + synchronous: deterministic and unit-testable on synthetic state.
139
141
  */
140
- export declare function buildScorecard(state: TaskState, extras: ScorecardExtras): RunScorecard;
142
+ export declare function buildScorecard(state: TaskState, extras: ScorecardExtras, terminalGate?: GateName): RunScorecard;
@@ -4,7 +4,8 @@ import { ClassificationOutcome, GateName, PipelineMode, Signer, } from "../share
4
4
  * Run scorecard — a compact, per-pipeline-run summary derived ENTIRELY from
5
5
  * telemetry the MCP state machine already persists (task state + jsonl side
6
6
  * logs). One scorecard is appended to the run ledger when a run completes
7
- * (RG signed). The Auditor's aggregation tool crunches >=3 of these.
7
+ * (its mode's terminal gate is signed: RG for full/hotfix, TEST for bugfix).
8
+ * The Auditor's aggregation tool crunches >=3 of these.
8
9
  *
9
10
  * Deliberately records raw signals, not judgments — interpretation belongs to
10
11
  * the aggregator (numbers) and, later, the Auditor agent (findings).
@@ -35,10 +36,10 @@ export const RunScorecard = z.object({
35
36
  schema_version: z.literal(SCORECARD_SCHEMA_VERSION),
36
37
  task_id: z.string().min(1),
37
38
  mode: PipelineMode,
38
- /** True when an RG sign-off exists (the state machine does not set completed_at). */
39
+ /** True when the mode's terminal gate has been signed (RG for full/hotfix, TEST for bugfix). */
39
40
  completed: z.boolean(),
40
41
  created_at: z.string(),
41
- /** Timestamp of the RG sign-off, or null if not completed. */
42
+ /** Timestamp of the terminal-gate sign-off, or null if not completed. */
42
43
  completed_at: z.string().nullable(),
43
44
  gates: z.array(GateScore),
44
45
  dev_rollback_count: z.number().int().nonnegative(),
@@ -52,7 +53,7 @@ export const RunScorecard = z.object({
52
53
  * Build a run scorecard from a task's persisted state plus side-log counts.
53
54
  * Pure + synchronous: deterministic and unit-testable on synthetic state.
54
55
  */
55
- export function buildScorecard(state, extras) {
56
+ export function buildScorecard(state, extras, terminalGate = "RG") {
56
57
  // Group sign-offs by gate: count + last signer (last in chronological array).
57
58
  const signoffCount = new Map();
58
59
  const lastSigner = new Map();
@@ -73,9 +74,13 @@ export function buildScorecard(state, extras) {
73
74
  classification: lastClass.get(gate) ?? null,
74
75
  exceptions_count: extras.exceptions_by_gate[gate] ?? 0,
75
76
  }));
76
- const rgSignoffs = state.signoffs.filter((s) => s.gate === "RG");
77
- const completed = rgSignoffs.length > 0;
78
- const completed_at = completed ? rgSignoffs[rgSignoffs.length - 1].timestamp : null;
77
+ // A run is complete when its mode's TERMINAL gate is signed (full/hotfix → RG,
78
+ // bugfix → TEST). The terminal gate is supplied by the caller (sign_off knows it
79
+ // from pipeline.yaml); defaults to "RG" so the full-mode path and standalone
80
+ // callers are unchanged.
81
+ const terminalSignoffs = state.signoffs.filter((s) => s.gate === terminalGate);
82
+ const completed = terminalSignoffs.length > 0;
83
+ const completed_at = completed ? terminalSignoffs[terminalSignoffs.length - 1].timestamp : null;
79
84
  const exceptions_count = Object.values(extras.exceptions_by_gate).reduce((a, b) => a + (b ?? 0), 0);
80
85
  // Group skill invocations by skill + gate into counts.
81
86
  const skillAgg = new Map();
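The hunk above swaps the hardcoded `"RG"` filter for a caller-supplied `terminalGate` that defaults to `"RG"`. The sketch below isolates only that completion logic to show why a bugfix run previously never counted as complete. The `Signoff` type and the `completionOf` helper are simplified stand-ins introduced for illustration; the real `buildScorecard` takes a full `TaskState` and returns many more fields.

```typescript
// Simplified stand-in for the sign-off records kept in task state.
type Signoff = { gate: string; timestamp: string };

// Mirrors the completion rule in the diff: a run is complete when its terminal
// gate has at least one sign-off; completed_at is the last such timestamp.
function completionOf(
  signoffs: Signoff[],
  terminalGate = "RG",
): { completed: boolean; completed_at: string | null } {
  const terminal = signoffs.filter((s) => s.gate === terminalGate);
  const last = terminal[terminal.length - 1];
  return {
    completed: last !== undefined,
    completed_at: last ? last.timestamp : null,
  };
}

// A bugfix run never signs RG, so the old RG-only rule reports it as incomplete.
const bugfixRun: Signoff[] = [
  { gate: "DEV", timestamp: "2025-01-01T10:00:00Z" },
  { gate: "TEST", timestamp: "2025-01-01T11:00:00Z" },
  { gate: "TEST", timestamp: "2025-01-01T12:00:00Z" },
];

console.log(completionOf(bugfixRun));         // default "RG": not completed
console.log(completionOf(bugfixRun, "TEST")); // terminal gate TEST: completed
```

Keeping `"RG"` as the default leaves existing two-argument callers on the old behavior, which matches the comment in the hunk about standalone callers being unchanged.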
@@ -6,7 +6,8 @@ import type { AggregateRunMetricsInput, AggregateRunMetricsOutput } from "../../
6
6
  * the Auditor agent's job (a later ADR).
7
7
  *
8
8
  * Attribution: per-AGENT via gate→produced_by[0] from pipeline.yaml (single
9
- * source); per-WORKFLOW via mode. Only COMPLETED runs (RG signed) are
9
+ * source); per-WORKFLOW via mode. Only COMPLETED runs (the mode's terminal gate
10
+ * signed — RG for full/hotfix, TEST for bugfix) are
10
11
  * aggregated; `min_runs` (default 3) is the design's small-sample guard — below
11
12
  * it `met_threshold=false` and the Auditor should stay silent.
12
13
  *
@@ -14,6 +15,6 @@ import type { AggregateRunMetricsInput, AggregateRunMetricsOutput } from "../../
14
15
  * per_skill (incl. the gates a skill was pulled at) via get_skill instrumentation
15
16
  * (ADR-DEV-121) — the remaining gap is trigger_accuracy (relevant-vs-invoked),
16
17
  * which needs a relevance oracle; no explicit human-rejection signal; completed =
17
- * RG signed (completed_at unset).
18
+ * the mode's terminal gate signed (RG for full/hotfix, TEST for bugfix).
18
19
  */
19
20
  export declare function aggregateRunMetrics(input: AggregateRunMetricsInput): Promise<AggregateRunMetricsOutput>;
@@ -8,7 +8,8 @@ import { resolveActiveDomain } from "../config.js";
8
8
  * the Auditor agent's job (a later ADR).
9
9
  *
10
10
  * Attribution: per-AGENT via gate→produced_by[0] from pipeline.yaml (single
11
- * source); per-WORKFLOW via mode. Only COMPLETED runs (RG signed) are
11
+ * source); per-WORKFLOW via mode. Only COMPLETED runs (the mode's terminal gate
12
+ * signed — RG for full/hotfix, TEST for bugfix) are
12
13
  * aggregated; `min_runs` (default 3) is the design's small-sample guard — below
13
14
  * it `met_threshold=false` and the Auditor should stay silent.
14
15
  *
@@ -16,7 +17,7 @@ import { resolveActiveDomain } from "../config.js";
16
17
  * per_skill (incl. the gates a skill was pulled at) via get_skill instrumentation
17
18
  * (ADR-DEV-121) — the remaining gap is trigger_accuracy (relevant-vs-invoked),
18
19
  * which needs a relevance oracle; no explicit human-rejection signal; completed =
19
- * RG signed (completed_at unset).
20
+ * the mode's terminal gate signed (RG for full/hotfix, TEST for bugfix).
20
21
  */
21
22
  export async function aggregateRunMetrics(input) {
22
23
  const minRuns = input.min_runs;
@@ -117,7 +118,7 @@ export async function aggregateRunMetrics(input) {
117
118
  gates: [...a.gates].sort(),
118
119
  }));
119
120
  const notes = [
120
- "completed = RG sign-off present (state machine does not populate completed_at).",
121
+ "completed = the mode's terminal gate is signed (RG for full/hotfix, TEST for bugfix).",
121
122
  "Skill invocations ARE captured (get_skill instrumentation, ADR-DEV-121): per_skill carries invocation counts + the gates each skill was pulled at, so skill.gates can be tuned to observed usage. The remaining gap is trigger_accuracy (relevant-vs-invoked), which needs a relevance oracle — not this data layer.",
122
123
  "No explicit human gate-rejection signal; rejections manifest as rollbacks/exceptions.",
123
124
  "Per-agent attribution uses gate→produced_by[0] from the active domain's pipeline.yaml.",
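The aggregator comments above describe a small-sample guard: only completed runs are aggregated, and below `min_runs` (default 3) the result carries `met_threshold=false`. A rough sketch of that guard follows. The `Card` type, the `summarize` helper, and the `completed_runs` and `runs_by_mode` field names are assumptions for illustration; only `min_runs` and `met_threshold` come from the source.

```typescript
// Simplified stand-in for a ledger entry.
type Card = { task_id: string; mode: string; completed: boolean };

// Counts completed runs per mode and reports whether the sample is large
// enough for the Auditor to say anything at all.
function summarize(cards: Card[], minRuns = 3) {
  const completed = cards.filter((c) => c.completed);
  const runsByMode: Record<string, number> = {};
  for (const c of completed) {
    runsByMode[c.mode] = (runsByMode[c.mode] ?? 0) + 1;
  }
  return {
    completed_runs: completed.length,
    met_threshold: completed.length >= minRuns,
    runs_by_mode: runsByMode,
  };
}

const ledger: Card[] = [
  { task_id: "t1", mode: "bugfix", completed: true },
  { task_id: "t2", mode: "bugfix", completed: true },
  { task_id: "t3", mode: "full", completed: true },
  { task_id: "t4", mode: "full", completed: false },
];

console.log(summarize(ledger));             // three completed runs: threshold met
console.log(summarize(ledger.slice(0, 2))); // two completed runs: below threshold
```

With bugfix runs now marked completed, they contribute to the count in the same way full and hotfix runs do.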
@@ -1,4 +1,4 @@
1
- import { getGateConfig, loadPipeline } from "../pipeline.js";
1
+ import { getGateConfig, getNextGate, loadPipeline } from "../pipeline.js";
2
2
  import { readTaskState, writeTaskState } from "../task_state.js";
3
3
  import { recordRunScorecard, countCompletedRuns, auditNudgeFor } from "../audit_ledger.js";
4
4
  import { resolveActiveDomain } from "../config.js";
@@ -57,14 +57,14 @@ export async function signOff(input) {
57
57
  evidence: input.evidence,
58
58
  });
59
59
  await writeTaskState(state);
60
- // Auditor telemetry: when the terminal gate is signed, the run is complete —
61
- // append a scorecard to the local ledger, then compute the /audit nudge.
62
- // Best-effort: a ledger failure (or nudge failure) must NEVER break a
63
- // sign-off (telemetry is not load-bearing for the gate).
60
+ // Auditor telemetry: when the run reaches its mode's TERMINAL gate (full/hotfix →
61
+ // RG, bugfix → TEST), the run is complete: append a scorecard to the local
62
+ // ledger, then compute the /audit nudge. Best-effort: a ledger failure (or nudge
63
+ // failure) must NEVER break a sign-off (telemetry is not load-bearing for the gate).
64
64
  let audit_nudge;
65
- if (input.gate === "RG") {
65
+ if (getNextGate(pipeline, state.mode, input.gate) === null) {
66
66
  try {
67
- await recordRunScorecard(state);
67
+ await recordRunScorecard(state, input.gate);
68
68
  audit_nudge = auditNudgeFor(await countCompletedRuns());
69
69
  }
70
70
  catch {
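The comment in this hunk states the contract that telemetry is never load-bearing: a ledger or nudge failure must not break a sign-off. A minimal sketch of that best-effort pattern follows. The `signOffBestEffort` function, its injected callbacks, and the `SignOffResult` shape are hypothetical and exist only to make the pattern runnable in isolation; they are not the package's `signOff` signature.

```typescript
type SignOffResult = { signed: true; audit_nudge: string | undefined };

// Records telemetry only at the terminal gate, and swallows any failure so the
// sign-off itself always succeeds.
async function signOffBestEffort(
  isTerminal: boolean,
  record: () => Promise<void>,
  countRuns: () => Promise<number>,
  nudgeFor: (completedRuns: number) => string | undefined,
): Promise<SignOffResult> {
  // (The gate sign-off itself would be persisted before this point.)
  let audit_nudge: string | undefined;
  if (isTerminal) {
    try {
      await record();
      audit_nudge = nudgeFor(await countRuns());
    } catch {
      // Deliberately ignored: a telemetry failure must not fail the gate.
    }
  }
  return { signed: true, audit_nudge };
}

// Usage: a failing ledger write still yields a successful sign-off.
signOffBestEffort(
  true,
  async () => { throw new Error("disk full"); },
  async () => 3,
  () => "consider /audit",
).then((result) => console.log(result.signed, result.audit_nudge));
```

Because the ledger write and the nudge share one `try` block, a failed write also suppresses the nudge, so a count that may be stale is never reported.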
@@ -22,7 +22,7 @@ schema_version: 1
22
22
  - NOT at every gate and NOT in the background. One pass, once **≥3 completed runs** have accumulated (threshold configurable).
23
23
  - Below the threshold it stays silent. It draws no conclusions from n=1 (small sample).
24
24
  - Per-gate telemetry is already written by the state machine; the Auditor makes one pass over the accumulated data.
25
- - Triggering a pass: the `/audit` command (manual) or the `audit_nudge` hint that `sign_off` returns after RG.
25
+ - Triggering a pass: the `/audit` command (manual) or the `audit_nudge` hint that `sign_off` returns once a run completes (the mode's final gate is signed: RG for full/hotfix, TEST for bugfix).
26
26
 
27
27
  ---
28
28
 
@@ -57,7 +57,7 @@ schema_version: 1
57
57
  - **Through a human even under autonomy:** destructive changes to existing assets (deletion, major rewrite, capability removal).
58
58
  - **Always:** a mandatory report after any autonomous action. Nothing invisible.
59
59
  - **Dedup on addition:** before auto-adding a skill, an overlap check (`related` + controlled vocabulary); the report lists the additions for later cleanup.
60
- - The propose→approve mechanism and the autonomy toggle are already implemented (`propose_change` → `review_proposal`: risk matrix + dedup). A pass is started by the `/audit` command or by the hint after RG.
60
+ - The propose→approve mechanism and the autonomy toggle are already implemented (`propose_change` → `review_proposal`: risk matrix + dedup). A pass is started by the `/audit` command or by the hint after a run completes.
61
61
 
62
62
  ---
63
63
 
@@ -22,7 +22,7 @@ Close the self-improvement loop: build → run → measure → improve. Once rea
22
22
  - NOT at every gate and NOT in the background. One pass, once **≥3 completed runs** have accumulated (threshold configurable).
23
23
  - Below the threshold — silent. It draws no conclusions from n=1 (small sample).
24
24
  - Per-gate telemetry is already persisted by the state machine; the Auditor makes one pass over the accumulated data.
25
- - Triggering a pass: the `/audit` command (manual) or the `audit_nudge` that `sign_off` returns after RG.
25
+ - Triggering a pass: the `/audit` command (manual) or the `audit_nudge` that `sign_off` returns when a run completes (its mode's terminal gate is signed — RG for full/hotfix, TEST for bugfix).
26
26
 
27
27
  ---
28
28
 
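The bullets in this hunk say a pass is triggered either manually or by the `audit_nudge` returned when a run completes, and the README changelog describes the nudge as surfacing after every 3rd completed run. One way such a nudge could be computed is sketched below. The `nudgeForCount` name, the modulo rule, and the message text are assumptions, not the behavior of the package's `auditNudgeFor`.

```typescript
// Returns a hint on every `every`-th completed run, otherwise undefined.
// Hypothetical rule: the real nudge logic may use a different condition.
function nudgeForCount(completedRuns: number, every = 3): string | undefined {
  if (completedRuns <= 0 || completedRuns % every !== 0) return undefined;
  return `${completedRuns} completed runs recorded; consider running /audit.`;
}

for (const n of [1, 2, 3, 4, 6]) {
  console.log(n, nudgeForCount(n) ?? "(silent)");
}
```

Under this rule the hint stays silent below the threshold and repeats at 3, 6, 9, and so on, so it never fires at every gate.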
@@ -57,7 +57,7 @@ Close the self-improvement loop: build → run → measure → improve. Once rea
57
57
  - **Human-gated even under autonomy:** destructive changes to existing assets (delete, major rewrite, capability removal).
58
58
  - **Always:** a mandatory report after any autonomous action. Nothing invisible.
59
59
  - **Additive dedup:** before auto-adding a skill — an overlap check (`related` + controlled vocab); the report lists additions for later pruning.
60
- - The propose→approve mechanism and the autonomy toggle already exist (`propose_change` → `review_proposal`: risk matrix + dedup). A pass is triggered by `/audit` or the post-RG nudge.
60
+ - The propose→approve mechanism and the autonomy toggle already exist (`propose_change` → `review_proposal`: risk matrix + dedup). A pass is triggered by `/audit` or the post-completion nudge.
61
61
 
62
62
  ---
63
63
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "code-ai-installer",
3
- "version": "4.3.1",
3
+ "version": "4.3.2",
4
4
  "description": "Production-ready CLI to install code-ai agents and skills for multiple AI coding assistants. Bundles the code-ai-mcp MCP server for Claude Code.",
5
5
  "license": "MIT",
6
6
  "author": "Denish1209",