role-os 2.5.0 → 2.7.0
- package/CHANGELOG.md +25 -0
- package/bin/roleos.mjs +10 -0
- package/package.json +1 -1
- package/src/citation-panel.mjs +9 -7
- package/src/specialist/budget-consult.mjs +120 -0
- package/src/specialist/client.mjs +131 -0
- package/src/specialist/dispatch.mjs +237 -0
- package/src/specialist/events.mjs +56 -0
- package/src/specialist/gate.mjs +202 -0
- package/src/specialist/registry.mjs +219 -0
- package/src/specialist/shadow.mjs +122 -0
- package/src/specialist/state.mjs +125 -0
- package/src/specialist-cmd.mjs +378 -0
- package/src/verify-citations.mjs +1 -0
- package/starter-pack/policy/specialist-tier.md +288 -0
- package/starter-pack/schemas/specialist.md +155 -0
@@ -0,0 +1,155 @@

# Specialist: role schema extension + registry entry
|
This schema is **additive and non-breaking**. A role without a `specialist:` block behaves
exactly as it does today (Claude-backed). A role with the block declares a trained adapter that the
gate may route to on a per-dispatch basis; see `policy/specialist-tier.md` for the governing rules.
|
There are two related but distinct shapes:

1. **The `specialist:` block on a role**: declares that the role has a specialist available,
   and where to find it.
2. **The registry entry**: the on-disk record (`.role-os/specialists.json`) that the gate
   loads. The registry holds the version history; the role block points into it.
|
## 1. The `specialist:` block on a role

A role may include a `specialist:` block. The block is consumed by the gate; the role's
behavior definition (its `.md` file under `starter-pack/agents/`) does not change.
|
```json
{
  "role": "<existing role name>",
  "specialist": {
    "backend_url": "<string: e.g. http://localhost:8000>",
    "adapter_id": "<string: the pinned adapter the backend should serve>",
    "gate_threshold": <number in [0, 1]: OvA score below this fails open to Claude>,
    "fallback": "claude",
    "workload_quota": <number in (0, 1]: max share of dispatches per window>,
    "certified_level": "<string: e.g. L0 (uncertified), L1, L2…>"
  }
}
```
|
Field meanings:

| Field | Type | Required | Meaning |
|-------|------|----------|---------|
| `backend_url` | string | yes | Base URL of the HTTP service implementing the [Specialist HTTP contract](../policy/specialist-tier.md#specialist-http-contract). v0.1 contract: `POST <backend_url>/verify`. |
| `adapter_id` | string | yes | The pinned trained adapter. The backend must echo it back; a mismatch fails open to Claude. |
| `gate_threshold` | number | yes | OvA score floor. `score < gate_threshold` fails open to Claude. v0.1 default in code: 0.75. |
| `fallback` | string | yes | Must be `"claude"` in v0.1. Reserved for future families. |
| `workload_quota` | number | yes | Maximum share of dispatches per window. v0.1 window default: 200 dispatches. |
| `certified_level` | string | yes | The current certification level. `"L0"` means uncertified; the gate refuses to route to an uncertified specialist (see Reject 2 in the policy). |
|
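The table's fail-open rules can be read as an ordered series of checks. Below is a minimal sketch in JavaScript, the package's own language. It rests on some assumptions. The backend reply is assumed to look like `{ adapter_id, score }`. Quota use is assumed to be tracked as a list of recent routing decisions. "Max share per window" is read as the specialist count divided by the full window size. `routeDispatch`, `recentRoutes`, and the reason strings are hypothetical names, not part of role-os.

```javascript
// Hypothetical sketch of the per-dispatch routing rules in the table above.
// Assumed shapes (not the role-os API): `response` is the backend reply as
// { adapter_id, score }, and `recentRoutes` lists recent decisions
// ("specialist" | "claude"), newest last.
const DEFAULT_WINDOW = 200; // v0.1 window default from the table

function routeDispatch(block, response, recentRoutes, windowSize = DEFAULT_WINDOW) {
  // No block (or specialist: null): the role is Claude-backed.
  if (!block) return { route: "claude", reason: "no-specialist-block" };
  // Never route to an uncertified (L0) specialist.
  if (block.certified_level === "L0") return { route: "claude", reason: "uncertified" };

  // Quota (assumed reading of "max share"): routing this dispatch must not
  // push specialist dispatches above workload_quota of the full window.
  const recent = recentRoutes.slice(-windowSize);
  const used = recent.filter((r) => r === "specialist").length;
  if ((used + 1) / windowSize > block.workload_quota) {
    return { route: "claude", reason: "quota-exhausted" };
  }

  // Fail open when the backend gives no reply at all.
  if (!response) return { route: "claude", reason: "no-response" };
  // Fail open when the backend echoes a different adapter pin.
  if (response.adapter_id !== block.adapter_id) {
    return { route: "claude", reason: "adapter-mismatch" };
  }
  // score < gate_threshold fails open to Claude; a score equal to it passes.
  if (typeof response.score !== "number" || response.score < block.gate_threshold) {
    return { route: "claude", reason: "below-threshold" };
  }
  return { route: "specialist", reason: "ok" };
}
```

Take `workload_quota: 0.7` with the default 200-dispatch window. Under this reading, the sketch allows at most 140 specialist dispatches per window. A real gate would probably consult the quota before calling the backend at all.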
|
A role without a `specialist:` block (or with `specialist: null`) is Claude-backed
throughout. Removing the block is a valid way to disable specialist dispatch for a role.
|
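An absent or `null` block is valid and simply means Claude-only dispatch. A block check therefore has work to do only when the block is present. Below is a minimal sketch that assumes the role is available as a parsed object. `validateSpecialistBlock` is a hypothetical helper. Its ranges come from the JSON template above.

```javascript
// Hypothetical validator for a role's `specialist:` block, following the field
// table above. Returns a list of problems; an empty list means the role is fine.
function validateSpecialistBlock(role) {
  const block = role.specialist;
  // Absent or null block: valid; the role is Claude-backed throughout.
  if (block === undefined || block === null) return [];

  const problems = [];
  for (const field of ["backend_url", "adapter_id", "certified_level"]) {
    if (typeof block[field] !== "string" || block[field] === "") {
      problems.push(`${field} must be a non-empty string`);
    }
  }
  if (block.fallback !== "claude") problems.push('fallback must be "claude" in v0.1');

  const t = block.gate_threshold;
  // The negated range test also rejects NaN.
  if (typeof t !== "number" || !(t >= 0 && t <= 1)) {
    problems.push("gate_threshold must be in [0, 1]");
  }
  const q = block.workload_quota;
  if (typeof q !== "number" || !(q > 0 && q <= 1)) {
    problems.push("workload_quota must be in (0, 1]");
  }
  return problems;
}
```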
|
## 2. The registry entry

The registry lives at `.role-os/specialists.json` (overridable via `ROLEOS_SPECIALISTS_PATH`).
It is the on-disk record the gate loads at boot.
|
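Resolving the registry location comes down to an environment override with a fixed default. Below is a minimal sketch; `registryPath` is a hypothetical helper. Two behaviors are assumptions: an empty `ROLEOS_SPECIALISTS_PATH` is treated as unset, and the default is taken as relative to the working directory.

```javascript
// Hypothetical helper: where the gate would read the registry from.
// Assumes the default is relative to the working directory and that an
// empty ROLEOS_SPECIALISTS_PATH counts as unset (hence `||`, not `??`).
const DEFAULT_REGISTRY_PATH = ".role-os/specialists.json";

function registryPath(env = process.env) {
  return env.ROLEOS_SPECIALISTS_PATH || DEFAULT_REGISTRY_PATH;
}
```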
|
```json
{
  "schema": "roleos-specialist-registry/v1",
  "specialists": [
    {
      "role": "<existing role name>",
      "backend_url": "<string>",
      "fallback": "claude",
      "workload_quota": <number in (0, 1]>,
      "active_version": "<string: id from versions[]>" | null,
      "versions": [
        {
          "id": "<string: opaque version id>",
          "adapter_id": "<string>",
          "base_model": "<string: must NOT be a Claude-family id>",
          "gate_threshold": <number in [0, 1]>,
          "certified_level": "<string: L0 / L1 / L2 / …>",
          "exam_hash": "<string: sha256 of the certification exam this version was scored against>",
          "field_audit_window": <number: rolling window for field audit (e.g. 200 dispatches)>,
          "created_at": "<ISO-8601 timestamp>",
          "notes": "<string: optional, operator-facing>"
        }
      ]
    }
  ]
}
```
|
Field meanings (registry-specific; fields shared with the role block carry the same meaning as above):

| Field | Type | Required | Meaning |
|-------|------|----------|---------|
| `schema` | string | yes | Schema id with a major version. v0.1 = `roleos-specialist-registry/v1`. |
| `specialists[].active_version` | string \| null | yes | The `versions[].id` that the gate currently routes to, or `null` if no version is active. An all-L0 registry starts at `null`. Promotion is the only way this changes from `null` to a version id. |
| `versions[].id` | string | yes | Opaque to the gate; usually a content-addressable id. Unique within `versions[]`. |
| `versions[].base_model` | string | yes | The base model the adapter sits on. **Rejected at load** if it resolves to a Claude-family id (see Reject 1). |
| `versions[].exam_hash` | string | yes | SHA-256 of the certification exam this version was scored against. Two versions with different `exam_hash` values cannot be compared without recomputing; the eval gate enforces this. |
| `versions[].field_audit_window` | number | yes | The rolling-window size for field audit. The eval harness writes outcomes against this window. |
| `versions[].created_at` | string | yes | When this version entered the registry. Used for ordering, not for any decision. |
|
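The gate routes to at most one version per role. The entry-level fields plus that version's fields cover the same ground as the role block in section 1. The sketch below illustrates that reading. `resolveActiveVersion` and `scoresComparable` are hypothetical helpers. The mapping onto the role-block shape is an assumption, based only on the statement that shared fields carry the same meaning.

```javascript
// Hypothetical: resolve the version the gate routes to for one registry entry
// and project it onto the role-block shape from section 1 (assumed mapping).
function resolveActiveVersion(entry) {
  // null active_version: nothing is live, so every dispatch goes to Claude.
  if (entry.active_version === null) return null;
  const version = entry.versions.find((v) => v.id === entry.active_version);
  if (!version) throw new Error(`dangling active_version: ${entry.active_version}`);
  return {
    backend_url: entry.backend_url, // entry-level fields
    fallback: entry.fallback,
    workload_quota: entry.workload_quota,
    adapter_id: version.adapter_id, // per-version fields
    gate_threshold: version.gate_threshold,
    certified_level: version.certified_level,
  };
}

// Scores of two versions are only comparable when they share an exam_hash.
function scoresComparable(a, b) {
  return a.exam_hash === b.exam_hash;
}
```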
|
## Reject conditions enforced at registry load

(These mirror the policy's reject conditions, applied at the registry layer.)

- **R1.** `base_model` resolves to a Claude-family id → entry refused.
- **R2.** `active_version` is set to a version with `certified_level: "L0"` → promotion
  refused. (A registry shipped with all-L0 specialists is valid; promotion is the gate.)
- **R3.** Two versions with the same `id` in `versions[]` → registry refused (id collision).
- **R4.** `active_version` is non-null and does not match any `versions[].id` → registry
  refused (dangling pointer).
- **R5.** `gate_threshold` outside `[0, 1]` → entry refused.
- **R6.** `workload_quota` outside `(0, 1]` → entry refused.
- **R7.** `schema` does not match the supported major version → registry refused.

R1, R3, and R4 are correctness invariants; there is no flag to bypass them.
|
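The load-time checks map onto a single pass over the parsed registry. Below is a minimal sketch that assumes the registry JSON has already been parsed; `validateRegistry` is hypothetical. It reports entry-level and registry-level refusals together as one list of messages. It approximates "resolves to a Claude-family id" with a case-insensitive `claude` substring match, which the real resolver may not do. An exact `schema` string comparison stands in for the major-version check.

```javascript
// Hypothetical load-time validator for R1-R7. Entry-level and registry-level
// refusals are reported together as messages prefixed with the rule id.
const SUPPORTED_SCHEMA = "roleos-specialist-registry/v1";
// Assumption: a case-insensitive "claude" match approximates "Claude-family id".
const isClaudeFamily = (id) => /claude/i.test(String(id));
const inRange = (x, lo, hi, loInclusive) =>
  typeof x === "number" && (loInclusive ? x >= lo : x > lo) && x <= hi;

function validateRegistry(registry) {
  // R7: an unsupported schema refuses the whole registry; stop here.
  if (registry.schema !== SUPPORTED_SCHEMA) {
    return [`R7: unsupported schema ${registry.schema}`];
  }
  const errors = [];
  for (const entry of registry.specialists) {
    const seen = new Set();
    for (const v of entry.versions) {
      // R3: version ids must be unique within versions[].
      if (seen.has(v.id)) errors.push(`R3: ${entry.role}: duplicate version id ${v.id}`);
      seen.add(v.id);
      // R1: the adapter must not sit on a Claude-family base model.
      if (isClaudeFamily(v.base_model)) errors.push(`R1: ${entry.role}/${v.id}: Claude-family base_model`);
      // R5: gate_threshold in [0, 1].
      if (!inRange(v.gate_threshold, 0, 1, true)) errors.push(`R5: ${entry.role}/${v.id}: gate_threshold out of range`);
    }
    // R6: workload_quota in (0, 1].
    if (!inRange(entry.workload_quota, 0, 1, false)) errors.push(`R6: ${entry.role}: workload_quota out of range`);

    if (entry.active_version !== null) {
      const active = entry.versions.find((v) => v.id === entry.active_version);
      // R4: a non-null active_version must name a version in versions[].
      if (!active) errors.push(`R4: ${entry.role}: dangling active_version ${entry.active_version}`);
      // R2: an L0 version can never be the active one.
      else if (active.certified_level === "L0") errors.push(`R2: ${entry.role}: active version ${active.id} is L0`);
    }
  }
  return errors;
}
```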
|
## What is NOT in the registry

- **Adapter binaries.** The registry references adapters by `adapter_id`; the binaries live
  with the serving substrate (gpu-container's vLLM container in v1). A registry without
  matching backend artifacts is valid: calls fail open at dispatch time, not at load time.
- **Eval harness state.** The certification exam and the field audit data live in the eval
  harness (built in the training kickoffs). The registry holds only `exam_hash` and
  `field_audit_window` as pins.
- **Shadow-probe history.** Shadow probes are recorded in their own append-only log
  (`.role-os/specialist-shadow-probes.jsonl`), not in the registry: registries are
  pointer state, logs are history.
- **Operator state.** Halt-clear receipts and rollback receipts live in
  `.role-os/specialist-events.jsonl`, not in the registry itself.
|
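The pointer-state versus history split shows up in how each file is written. The registry records where pointers currently point, while the `.jsonl` logs only ever gain lines. Below is a minimal in-memory sketch of the JSON Lines convention those logs use. `appendJsonl` and `readJsonl` are hypothetical helpers, and the receipt fields in the usage lines are invented for illustration. A real writer would append to the file itself (for example with Node's `fs.appendFileSync`) rather than build a string.

```javascript
// Hypothetical in-memory helpers for an append-only JSON Lines log:
// one JSON object per line, existing lines never rewritten.
function appendJsonl(logText, record) {
  // JSON.stringify without indentation escapes newlines inside strings,
  // so every record occupies exactly one line.
  return logText + JSON.stringify(record) + "\n";
}

function readJsonl(logText) {
  return logText
    .split("\n")
    .filter((line) => line.trim() !== "") // skip the trailing empty segment
    .map((line) => JSON.parse(line));
}

// Usage with invented receipt fields:
let log = "";
log = appendJsonl(log, { kind: "halt-clear", role: "Verifier" });
log = appendJsonl(log, { kind: "rollback", role: "Verifier", to: null });
console.log(readJsonl(log).map((r) => r.kind)); // [ 'halt-clear', 'rollback' ]
```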
|
## Example registry (one role, one uncertified version, no active version)

```json
{
  "schema": "roleos-specialist-registry/v1",
  "specialists": [
    {
      "role": "Verifier",
      "backend_url": "http://localhost:8000",
      "fallback": "claude",
      "workload_quota": 0.7,
      "active_version": null,
      "versions": [
        {
          "id": "v0-stub",
          "adapter_id": "verifier-l4-stub-2026-06-04",
          "base_model": "Qwen/Qwen3-7B",
          "gate_threshold": 0.75,
          "certified_level": "L0",
          "exam_hash": "0000000000000000000000000000000000000000000000000000000000000000",
          "field_audit_window": 200,
          "created_at": "2026-06-04T00:00:00Z",
          "notes": "v0.1 stub entry: uncertified, not yet promoted to active."
        }
      ]
    }
  ]
}
```
|
The role has a version on file, but `active_version` is `null`, so every dispatch for this
role goes to Claude. The L0 version cannot be promoted (Reject 2); a certified L1+ version
would be added to `versions[]` by the eval harness and then promoted via
`roleos specialist promote <role> <version>`.
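The promotion rule in the paragraph above can be sketched as a guarded pointer move. This is a hypothetical illustration, not the implementation behind `roleos specialist promote`. `promote` returns an updated copy of one registry entry. It throws on an unknown version (the R4 condition) or an L0 version (Reject 2). Persisting the result and writing any receipt are out of scope.

```javascript
// Hypothetical promotion guard for one registry entry; returns a new entry
// instead of mutating the input or touching .role-os/specialists.json.
function promote(entry, versionId) {
  const version = entry.versions.find((v) => v.id === versionId);
  // Same condition as R4: the target must exist in versions[].
  if (!version) {
    throw new Error(`unknown version ${versionId} for role ${entry.role}`);
  }
  // Same condition as R2: an uncertified (L0) version is never promoted.
  if (version.certified_level === "L0") {
    throw new Error(`refusing to promote L0 version ${versionId}`);
  }
  return { ...entry, active_version: versionId };
}
```

Run against the example registry, `promote(entry, "v0-stub")` throws. `active_version` therefore stays `null`, and every dispatch keeps going to Claude.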