loki-mode 7.15.0 → 7.16.1

@@ -0,0 +1,127 @@
+ # R4: Visible Trust Trajectory - Design Note
+
+ Status: implemented in worktree (not yet merged). Author: R4 release team.
+ Verified against live source on 2026-06-03 (v7.8.3 worktree base; R1/R3/R5
+ already shipped, so the arc is further along than the loki-plan doc states).
+
+ ## The story no competitor tells
+
+ Devin, Cursor, Windsurf, Claude Code, Aider et al. show you a single run.
+ None show you whether the agent is getting more trustworthy on YOUR repo over
+ time. Loki already runs a 3-reviewer council + RARV-C closure on every run and
+ persists the result. R4 makes the resulting TRUST TRAJECTORY visible:
+
+ - council approve-rate trending UP
+ - gate pass-rate trending UP
+ - iterations-to-completion trending DOWN
+ - human interventions trending DOWN
+
+ If the agent is earning autonomy on this repo, the trajectory shows it. That is
+ compounding, repo-specific proof of trust -> stickiness.
+
+ ## Honest-data rule (non-negotiable)
+
+ Every number derives from REAL persisted run records. Never fabricate a trend.
+ With fewer than 2 runs, the trajectory is reported as "not enough history yet"
+ (`insufficient=true`), never a fake direction.
+
+ ## Data source (REUSED, not new)
+
+ R3 already established `.loki/proofs/<run_id>/proof.json` as the persistent,
+ one-per-run history record (written by `autonomy/lib/proof-generator.py` at run
+ completion, on both success and failure, unless `LOKI_PROOF=0`). The R3 cost
+ timeline endpoint (`dashboard/server.py` `/api/cost/timeline`) already mines
+ this exact directory for per-run cost history.
+
+ R4 mines the SAME directory for the trust signals already present in each
+ proof.json:
+
+ | Trust signal             | proof.json path                                | Notes |
+ |--------------------------|------------------------------------------------|-------|
+ | council pass (per run)   | `council.final_verdict`                        | APPROVE/APPROVED/COMPLETE => pass |
+ | council ratio (per run)  | `council.reviewers[].vote` (APPROVE/...)       | secondary signal when verdict absent |
+ | gate pass-rate (per run) | `quality_gates.passed` / `quality_gates.total` | already aggregated by generator |
+ | iterations (per run)     | `iterations` (int or {count})                  | iterations-to-completion |
+ | files changed (per run)  | `files_changed.count`                          | context, not a trust axis |
+ | timestamp                | `generated_at` (ISO 8601)                      | ordering axis |
+
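As a rough illustration of the table above, the sketch below pulls those fields out of one parsed proof.json. The function name `extract_signals`, the output keys, and the decision to count reviewer votes with the same pass set as the final verdict are assumptions made for this example. They are not taken from `autonomy/lib/trust_trajectory.py`.

```python
# Verdicts the table treats as a council pass.
PASS_VERDICTS = {"APPROVE", "APPROVED", "COMPLETE"}


def extract_signals(run_id, proof):
    """Hypothetical helper: map one parsed proof.json dict to per-run signals."""
    council = proof.get("council")
    if not isinstance(council, dict):
        council = {}

    # Primary signal: final verdict. Secondary: share of approving reviewers,
    # used only when the verdict is absent.
    council_pass = None
    verdict = council.get("final_verdict")
    if isinstance(verdict, str):
        council_pass = 1.0 if verdict.strip().upper() in PASS_VERDICTS else 0.0
    else:
        reviewers = council.get("reviewers") or []
        votes = [str(r.get("vote", "")).strip().upper()
                 for r in reviewers if isinstance(r, dict)]
        if votes:
            # Assumption: votes use the same vocabulary as the verdict.
            council_pass = sum(v in PASS_VERDICTS for v in votes) / len(votes)

    # Gate pass-rate is only defined when both counters are present and total > 0.
    gates = proof.get("quality_gates")
    if not isinstance(gates, dict):
        gates = {}
    passed, total = gates.get("passed"), gates.get("total")
    gate_rate = None
    if isinstance(passed, int) and isinstance(total, int) and total > 0:
        gate_rate = passed / total

    # `iterations` may be a bare int or an object carrying `count`.
    iterations = proof.get("iterations")
    if isinstance(iterations, dict):
        iterations = iterations.get("count")
    if isinstance(iterations, bool) or not isinstance(iterations, int):
        iterations = None

    return {
        "run_id": run_id,
        "generated_at": proof.get("generated_at"),
        "council_pass": council_pass,
        "gate_pass_rate": gate_rate,
        "iterations": iterations,
    }
```

Missing or malformed fields come back as `None` rather than a default number, which keeps an absent signal distinguishable from a real zero.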
+ Human interventions: there is no per-run intervention counter persisted in
+ proof.json today. Rather than fabricate one or add new instrumentation in this
+ slice, R4 reports interventions as a derived best-effort signal ONLY when the
+ proof carries it (`council.interventions` or top-level `interventions`), and
+ otherwise marks that axis `available=false` with an honest note. This keeps the
+ honest-data rule intact and leaves a clean seam for a future per-run
+ intervention counter (a one-line add in proof-generator.py).
+
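One way to read the interventions rule is sketched below. It assumes the value, when present, is a non-negative integer and that `council.interventions` is checked before the top-level field. Neither point is stated in the note, and the function name and return shape are hypothetical.

```python
def interventions_axis(proof):
    """Hypothetical helper: report interventions only if the proof carries them."""
    council = proof.get("council")
    candidates = []
    if isinstance(council, dict):
        candidates.append(council.get("interventions"))
    candidates.append(proof.get("interventions"))

    for value in candidates:
        # Assumption: a valid counter is a non-negative int (bool is excluded).
        if isinstance(value, int) and not isinstance(value, bool) and value >= 0:
            return {"available": True, "value": value}

    # No counter persisted: say so instead of inventing a number.
    return {
        "available": False,
        "value": None,
        "note": "no per-run intervention counter in proof.json",
    }
```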
+ ## Direction calculation (up / down / flat)
+
+ For each numeric axis across the time-ordered run series:
+
+ 1. Split the series into an earlier half and a later half (median split; odd
+    counts drop the middle point so the two halves never overlap).
+ 2. Compare the mean of the later half vs the earlier half.
+ 3. `delta = later_mean - earlier_mean`. Direction:
+    - `flat` if `|delta| <= epsilon` (epsilon scaled per axis; rates use 0.01).
+    - `up` / `down` by sign of delta.
+ 4. "Good direction" is axis-specific: higher is better for council/gate pass
+    rates; lower is better for iterations and interventions. The `improving`
+    boolean encodes whether the direction is the good one, so the UI can color
+    green/red without re-encoding the per-axis polarity.
+
+ Rationale for half-split vs least-squares slope: half-split is robust to a
+ single noisy run, needs no float regression in bash, and is trivially testable
+ with fixtures. A 2-run series degrades to last-vs-first, which is correct.
+
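The half-split rule can be sketched in a few lines. This is a minimal illustration, not the shipped module: the function name, the return shape, returning `None` for fewer than 2 points, and treating `flat` as not improving are all assumptions. The default `epsilon` of 0.01 is the value the note gives for rate axes; other axes would pass their own.

```python
from statistics import mean


def direction(values, higher_is_better, epsilon=0.01):
    """Half-split trend for one axis; `values` must already be time-ordered."""
    if len(values) < 2:
        return None  # not enough history: the caller reports insufficient=true

    # Integer division drops the middle point for odd counts, so the halves
    # never overlap. With 2 values this is simply first vs last.
    half = len(values) // 2
    earlier, later = values[:half], values[-half:]
    delta = mean(later) - mean(earlier)

    if abs(delta) <= epsilon:
        label = "flat"
    elif delta > 0:
        label = "up"
    else:
        label = "down"

    # Assumption: a flat axis counts as "not improving".
    improving = label != "flat" and ((label == "up") == higher_is_better)
    return {"direction": label, "delta": delta, "improving": improving}
```

For example, `direction([8, 7, 100, 4, 3], higher_is_better=False)` compares the mean of `[8, 7]` with the mean of `[4, 3]`. The middle value 100 is dropped, so the result is `down` and `improving` is true for an iterations-style axis.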
+ ## Persistence (under .loki/metrics/, REUSED dir)
+
+ The aggregated trajectory is persisted to
+ `.loki/metrics/trust-trajectory.json` (schema_version 1). This is a derived
+ cache, written by the `loki trust` command and the dashboard endpoint so other
+ surfaces can read one consistent copy. It is NOT authoritative state: it
+ is always recomputable from `.loki/proofs/`. Deleting it loses nothing.
+
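A minimal sketch of writing that cache follows. The path and `schema_version` come from the note. The function name, the payload keys beyond `schema_version`, and the write-to-temp-then-rename step are assumptions chosen for this example, not a description of what `loki trust` does.

```python
import json
import os
import tempfile
from pathlib import Path


def write_trajectory_cache(repo_root, trajectory):
    """Hypothetical helper: persist the derived trajectory under .loki/metrics/."""
    metrics_dir = Path(repo_root) / ".loki" / "metrics"
    metrics_dir.mkdir(parents=True, exist_ok=True)
    target = metrics_dir / "trust-trajectory.json"

    # schema_version is set last so the caller's dict cannot override it.
    payload = {**trajectory, "schema_version": 1}

    # Assumption: write to a temp file in the same directory, then rename, so
    # a reader never sees a half-written cache.
    fd, tmp_name = tempfile.mkstemp(dir=metrics_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2, sort_keys=True)
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```

Because the file is a cache, a failed or skipped write costs nothing: the next run recomputes it from `.loki/proofs/`.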
+ ## Surfaces
+
+ 1. CLI: `loki trust [--json]` (NEW Bun-native command, mirrors `loki kpis`
+    exactly). Falls through to a bash `cmd_trust` when Bun is absent (kpis had
+    no bash fallback; R4 adds one because the Python derivation is shared and
+    trivial to call from bash, giving real bash+Bun parity).
+    - `loki kpis` stays a single-run snapshot. R4 does NOT duplicate it; `trust`
+      is the across-runs trajectory view. `loki kpis` output gains a one-line
+      pointer to `loki trust` (no behavior change).
+
+ 2. Dashboard endpoint: `GET /api/trust/trajectory` (NEW, mirrors
+    `/api/cost/timeline`). Reads `.loki/proofs/*/proof.json`, returns the
+    per-run series + per-axis direction + insufficient flag.
+
+ 3. Dashboard panel: standalone `dashboard/static/trust.html` + `/trust` route
+    (mirrors `cost.html` + `/cost`), plus a nav entry and SPA section in
+    `build-standalone.js` (mirrors the cost panel wiring exactly).
+
+ 4. WS push: the `_push_loki_state_loop` broadcasts a `trust_update` message
+    when the trajectory's overall improving-count changes (mirrors the R3
+    `budget_status` transition push). No new channel; reuses manager.broadcast.
+
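The transition rule behind the WS push in item 4 can be sketched as a small change detector. Everything named here is hypothetical: the class, the message fields other than the `trust_update` type, and the choice to emit on the first observation. The sketch only decides whether a message is due. Handing it to `manager.broadcast` is left to the caller, since that signature is not shown in the note.

```python
def improving_count(axes):
    """Count axes whose direction is currently the good one."""
    return sum(1 for axis in axes.values() if axis and axis.get("improving"))


class TrustUpdateGate:
    """Hypothetical helper: emit a message only when the improving-count changes."""

    def __init__(self):
        self._last = None  # nothing observed yet

    def message_if_changed(self, axes):
        count = improving_count(axes)
        if count == self._last:
            return None  # no transition, nothing to push
        # Assumption: the first observation also counts as a change.
        self._last = count
        return {"type": "trust_update", "improving_count": count}
```

Pushing only on transitions keeps the socket quiet between runs, which is the same idea the note attributes to the R3 `budget_status` push.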
+ ## Parity + no-duplication audit
+
+ - Data: reuses `.loki/proofs/` (R1/R3). No new run-time instrumentation.
+ - Endpoint: new route, but follows the `/api/cost/timeline` read pattern and
+   the `_proofs_dir()` / `_safe_json_read` helpers in spirit.
+ - Panel: new `trust.html`, structurally a sibling of `cost.html`.
+ - CLI: new `trust`, structurally a sibling of `kpis`. `kpis` unchanged except a
+   one-line see-also.
+ - Shared derivation: a single Python module
+   (`autonomy/lib/trust_trajectory.py`) is the source of truth; the dashboard
+   endpoint imports it, and the bash `cmd_trust` shells out to it. The Bun
+   command reimplements the same pure logic in TS (parity-tested), matching how
+   `kpis` has both a TS derivation and reads the same JSON the bash side writes.
+
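The read pattern referred to above might look roughly like the sketch below. `safe_json_read` and `load_proofs` are stand-ins written for this example, not the dashboard's own `_safe_json_read` or `_proofs_dir()`, whose signatures are not shown here. Ordering by the raw `generated_at` string assumes every record uses the same ISO 8601 format and timezone.

```python
import json
from pathlib import Path


def safe_json_read(path):
    """Return the parsed JSON object, or None if unreadable or malformed."""
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, ValueError):  # ValueError covers JSON and decoding errors
        return None
    return data if isinstance(data, dict) else None


def load_proofs(proofs_dir):
    """Collect (run_id, proof) pairs from <proofs_dir>/<run_id>/proof.json."""
    root = Path(proofs_dir)
    if not root.is_dir():
        return []

    runs = []
    for proof_path in root.glob("*/proof.json"):
        proof = safe_json_read(proof_path)
        if proof is None:
            continue  # malformed record: skipped, not fatal
        runs.append((proof_path.parent.name, proof))

    # Assumption: timestamps share one ISO 8601 format, so string order is
    # chronological order.
    runs.sort(key=lambda item: str(item[1].get("generated_at", "")))
    return runs
```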
+ ## Test plan
+
+ - Python: `tests/test_trust_trajectory.py` - aggregation from fixture
+   proof.json files, direction calc (up/down/flat) per axis polarity, the
+   insufficient-history (<2 runs) case, no-PII (only derived numbers + run_id +
+   timestamps leave the function), malformed proof.json skipped, not fatal.
+ - TS: `loki-ts/tests/metrics/trust.test.ts` - same aggregation + direction
+   parity on identical fixtures, insufficient case, JSON/human formatting.
+ - All mocked from on-disk fixtures. No provider calls, no paid calls.