role-os 2.5.0 → 2.7.0

@@ -0,0 +1,155 @@
+ # Specialist — role schema extension + registry entry
+
+ This schema is **additive and non-breaking**. A role without a `specialist:` block behaves
+ exactly as today (Claude-backed). A role with the block declares a trained adapter that the
+ gate may route to per dispatch — see `policy/specialist-tier.md` for the law.
+
+ There are two related but distinct shapes:
+
+ 1. **The `specialist:` block on a role** — declares that the role has a specialist available,
+    and where to find it.
+ 2. **The registry entry** — the on-disk record (`.role-os/specialists.json`) that the gate
+    loads. The registry holds the version history; the role block points into it.
+
+ ## 1. The `specialist:` block on a role
+
+ A role may include a `specialist:` block. The block is consumed by the gate; the role's
+ behavior definition (its `.md` file under `starter-pack/agents/`) does not change.
+
+ ```json
+ {
+   "role": "<existing role name>",
+   "specialist": {
+     "backend_url": "<string — e.g. http://localhost:8000>",
+     "adapter_id": "<string — the pinned adapter the backend should serve>",
+     "gate_threshold": <number in [0, 1] — OvA score below this fails open to Claude>,
+     "fallback": "claude",
+     "workload_quota": <number in (0, 1] — max share of dispatches per window>,
+     "certified_level": "<string — e.g. L0 (uncertified), L1, L2…>"
+   }
+ }
+ ```
+
+ Field meanings:
+
+ | Field | Type | Required | Meaning |
+ |-------|------|----------|---------|
+ | `backend_url` | string | yes | Base URL of the HTTP service implementing the [Specialist HTTP contract](../policy/specialist-tier.md#specialist-http-contract). v0.1 contract: `POST <backend_url>/verify`. |
+ | `adapter_id` | string | yes | The trained adapter pin. The backend must echo it; mismatch fails open. |
+ | `gate_threshold` | number | yes | OvA score floor. `score < gate_threshold` fails open to Claude. v0.1 default in code: 0.75. |
+ | `fallback` | string | yes | Must be `"claude"` in v0.1. Reserved for future families. |
+ | `workload_quota` | number | yes | Max share of dispatches per window. v0.1 window default: 200 dispatches. |
+ | `certified_level` | string | yes | The current certification level. `"L0"` means uncertified; the gate refuses to route to an uncertified specialist (see Reject 2 in the policy). |
+
+ A role without a `specialist:` block — or with `specialist: null` — is Claude-backed
+ throughout. Removing the block is a valid way to disable specialist dispatch for a role.
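
To make the field semantics concrete, here is a minimal Python sketch of the per-dispatch decision that the table implies. It is not role-os code. The function name `choose_backend`, the shape of the `response` dictionary (`adapter_id`, `score`), and the way the quota share is computed are all assumptions made for illustration; the actual contract is the one in `policy/specialist-tier.md`.

```python
from typing import Optional

WINDOW_DEFAULT = 200  # v0.1 window default, taken from the table above


def choose_backend(
    block: Optional[dict],
    response: Optional[dict],
    specialist_dispatches_in_window: int = 0,
    window: int = WINDOW_DEFAULT,
) -> str:
    """Return "specialist" or the fallback for one dispatch (illustrative only)."""
    # No block, or `specialist: null`: the role is Claude-backed throughout.
    if not block:
        return "claude"
    # The gate refuses to route to an uncertified (L0) specialist.
    if block["certified_level"] == "L0":
        return block["fallback"]
    # Assumed reading of the quota: the specialist's share of the window must
    # stay at or below workload_quota once this dispatch is counted.
    if (specialist_dispatches_in_window + 1) / window > block["workload_quota"]:
        return block["fallback"]
    # No usable answer from the backend: fail open.
    if response is None:
        return block["fallback"]
    # The backend must echo the pinned adapter; a mismatch fails open.
    if response.get("adapter_id") != block["adapter_id"]:
        return block["fallback"]
    # A score below the floor fails open.
    if response.get("score", 0.0) < block["gate_threshold"]:
        return block["fallback"]
    return "specialist"
```

Every failure path returns the block's `fallback`, which in v0.1 can only be `"claude"`, so the sketch never leaves a dispatch unserved.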
+
+ ## 2. The registry entry
+
+ The registry lives at `.role-os/specialists.json` (overridable via `ROLEOS_SPECIALISTS_PATH`).
+ It is the on-disk record the gate loads at boot.
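
As a rough illustration of the path override, the following Python sketch resolves the registry location and parses the file. The helper names `registry_path` and `load_registry` are hypothetical, and what the gate does when the file is missing is not stated here, so the sketch simply lets the error propagate.

```python
import json
import os
from pathlib import Path

DEFAULT_REGISTRY_PATH = Path(".role-os") / "specialists.json"


def registry_path(env=None) -> Path:
    """Resolve the registry location, honouring ROLEOS_SPECIALISTS_PATH."""
    env = os.environ if env is None else env
    override = env.get("ROLEOS_SPECIALISTS_PATH")
    return Path(override) if override else DEFAULT_REGISTRY_PATH


def load_registry(env=None) -> dict:
    """Read and parse the registry file; validation is a separate step."""
    with registry_path(env).open(encoding="utf-8") as fh:
        return json.load(fh)
```

Passing `env` explicitly keeps the lookup testable without mutating the process environment.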
+
+ ```json
+ {
+   "schema": "roleos-specialist-registry/v1",
+   "specialists": [
+     {
+       "role": "<existing role name>",
+       "backend_url": "<string>",
+       "fallback": "claude",
+       "workload_quota": <number>,
+       "active_version": <string | null — id from versions[], or null if no version is active>,
+       "versions": [
+         {
+           "id": "<string — opaque version id>",
+           "adapter_id": "<string>",
+           "base_model": "<string — must NOT be a Claude-family id>",
+           "gate_threshold": <number in [0, 1]>,
+           "certified_level": "<string — L0 / L1 / L2 / …>",
+           "exam_hash": "<string — sha256 of the certification exam this version was scored against>",
+           "field_audit_window": <number — rolling window for field audit (e.g. 200 dispatches)>,
+           "created_at": "<ISO-8601 timestamp>",
+           "notes": "<string — optional, operator-facing>"
+         }
+       ]
+     }
+   ]
+ }
+ ```
+
+ Field meanings (registry-specific — block fields above carry the same meaning):
+
+ | Field | Type | Required | Meaning |
+ |-------|------|----------|---------|
+ | `schema` | string | yes | Schema id with a major version. v0.1 = `roleos-specialist-registry/v1`. |
+ | `specialists[].active_version` | string \| null | yes | The `versions[].id` that the gate currently routes to, or `null` if no version is currently active. An all-L0 registry starts at `null`. Promotion is the only way this changes from `null` to a version id. |
+ | `versions[].id` | string | yes | Opaque to the gate; usually a content-addressable id. Unique within `versions[]`. |
+ | `versions[].base_model` | string | yes | The base model the adapter sits on. **Rejected at load** if it resolves to a Claude-family id (see Reject 1). |
+ | `versions[].exam_hash` | string | yes | SHA-256 of the certification exam this version was scored against. Two versions with different `exam_hash` cannot be compared without recomputing — the eval gate enforces this. |
+ | `versions[].field_audit_window` | number | yes | The rolling-window size for field audit. The eval harness writes outcomes against this. |
+ | `versions[].created_at` | string | yes | When this version entered the registry. Used for ordering, not for any decision. |
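
The `active_version` pointer can be followed with a few lines of code. The sketch below is a hypothetical helper (the name `active_version_record` is not from role-os) that returns the matching `versions[]` record, or `None` when no version is active.

```python
from typing import Optional


def active_version_record(entry: dict) -> Optional[dict]:
    """Return the versions[] record that active_version points at, or None."""
    pointer = entry.get("active_version")
    if pointer is None:
        return None  # no version is active for this role
    for version in entry["versions"]:
        if version["id"] == pointer:
            return version
    # A pointer that matches nothing is a dangling pointer.
    raise LookupError(f"active_version {pointer!r} not found in versions[]")
```

Because ids are unique within `versions[]`, the first match is the only match.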
+
+ ## Reject conditions enforced at registry load
+
+ (Mirrors the policy's reject conditions, applied at the registry layer.)
+
+ - **R1.** `base_model` resolves to a Claude-family id → entry refused.
+ - **R2.** `active_version` is set to a version with `certified_level: "L0"` → promotion
+   refused. (A registry shipped with all-L0 specialists is valid; promotion is the gate.)
+ - **R3.** Two versions with the same `id` in `versions[]` → registry refused (id collision).
+ - **R4.** `active_version` is non-null and does not appear in `versions[]` → registry refused
+   (dangling pointer).
+ - **R5.** `gate_threshold` outside `[0, 1]` → entry refused.
+ - **R6.** `workload_quota` outside `(0, 1]` → entry refused.
+ - **R7.** `schema` does not match the supported major version → registry refused.
+
+ R1, R3, and R4 are correctness invariants — there is no flag to bypass them.
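
The reject conditions above can be sketched as a single pass over a parsed registry. This is an illustration under stated assumptions, not the gate's loader: `check_registry` and `is_claude_family` are hypothetical names, the Claude-family test is a plain substring match standing in for whatever "resolves to" means in the policy, the schema check is an exact string comparison, and required fields are assumed to be present.

```python
from typing import List

SUPPORTED_SCHEMA = "roleos-specialist-registry/v1"


def is_claude_family(base_model: str) -> bool:
    # Assumption: a substring test approximates "resolves to a Claude-family id".
    return "claude" in base_model.lower()


def check_registry(registry: dict) -> List[str]:
    """Return the reject codes (R1..R7) that a parsed registry triggers."""
    if registry.get("schema") != SUPPORTED_SCHEMA:
        return ["R7"]  # unsupported schema: nothing else is inspected
    hits: List[str] = []
    for entry in registry.get("specialists", []):
        versions = entry.get("versions", [])
        ids = [v["id"] for v in versions]
        by_id = {v["id"]: v for v in versions}
        if len(ids) != len(set(ids)):
            hits.append("R3")  # id collision
        active = entry.get("active_version")
        if active is not None:
            if active not in by_id:
                hits.append("R4")  # dangling pointer
            elif by_id[active]["certified_level"] == "L0":
                hits.append("R2")  # uncertified version marked active
        if not 0 < entry["workload_quota"] <= 1:
            hits.append("R6")  # quota outside (0, 1]
        for v in versions:
            if is_claude_family(v["base_model"]):
                hits.append("R1")
            if not 0 <= v["gate_threshold"] <= 1:
                hits.append("R5")  # threshold outside [0, 1]
    return hits
```

The sketch only reports codes. Whether a hit refuses a single entry or the whole registry follows the list above.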
+
+ ## What is NOT in the registry
+
+ - **Adapter binaries.** The registry references adapters by `adapter_id`; the binaries live
+   with the serving substrate (gpu-container's vLLM container in v1). A registry without
+   matching backend artifacts is valid — calls will fail open at dispatch time, not at load.
+ - **Eval harness state.** The certification exam and the field audit data live in the eval
+   harness (built in the training kickoffs). The registry only holds `exam_hash` and
+   `field_audit_window` as pins.
+ - **Shadow-probe history.** The shadow-probe log is its own append-only log
+   (`.role-os/specialist-shadow-probes.jsonl`). It is not in the registry — registries are
+   pointer state, logs are history.
+ - **Operator state.** Halt-clear receipts and rollback receipts live in
+   `.role-os/specialist-events.jsonl`, not in the registry itself.
+
+ ## Example registry (one role, one uncertified version, no active version)
+
+ ```json
+ {
+   "schema": "roleos-specialist-registry/v1",
+   "specialists": [
+     {
+       "role": "Verifier",
+       "backend_url": "http://localhost:8000",
+       "fallback": "claude",
+       "workload_quota": 0.7,
+       "active_version": null,
+       "versions": [
+         {
+           "id": "v0-stub",
+           "adapter_id": "verifier-l4-stub-2026-06-04",
+           "base_model": "Qwen/Qwen3-7B",
+           "gate_threshold": 0.75,
+           "certified_level": "L0",
+           "exam_hash": "0000000000000000000000000000000000000000000000000000000000000000",
+           "field_audit_window": 200,
+           "created_at": "2026-06-04T00:00:00Z",
+           "notes": "v0.1 stub entry — uncertified, not yet promoted to active."
+         }
+       ]
+     }
+   ]
+ }
+ ```
+
+ The role has a version on file, but `active_version` is `null` — so every dispatch for this
+ role goes to Claude. The L0 version cannot be promoted (Reject 2); a certified L1+ version
+ would be added to `versions[]` by the eval harness and then promoted via
+ `roleos specialist promote <role> <version>`.
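
The promotion step described above can be pictured as a small in-memory operation on the parsed registry. The sketch below is not the implementation behind `roleos specialist promote`; the function name `promote` and its error types are assumptions, and it only encodes the two rules this document states: the target must exist in `versions[]`, and it must not be L0.

```python
def promote(registry: dict, role: str, version_id: str) -> None:
    """Point a role's active_version at a certified version, in place."""
    entry = next(
        (s for s in registry["specialists"] if s["role"] == role), None
    )
    if entry is None:
        raise KeyError(f"no specialist registered for role {role!r}")
    version = next(
        (v for v in entry["versions"] if v["id"] == version_id), None
    )
    if version is None:
        # Setting the pointer anyway would create a dangling pointer (R4).
        raise ValueError(f"{version_id!r} is not in versions[]")
    if version["certified_level"] == "L0":
        # An uncertified version can never become active (R2).
        raise ValueError(f"{version_id!r} is L0; promotion refused")
    entry["active_version"] = version_id
```

Applied to the example registry, `promote(registry, "Verifier", "v0-stub")` raises, and `active_version` stays `null` until a certified version is added.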