open-research-protocol 0.4.16 → 0.4.17
- package/docs/ORP_LINK_RUNNER_PLAN.md +6 -6
- package/docs/ORP_PUBLIC_LAUNCH_CHECKLIST.md +2 -2
- package/docs/ORP_REASONING_KERNEL_AGENT_PILOT.md +3 -3
- package/docs/ORP_REASONING_KERNEL_AGENT_REPLICATION.md +2 -2
- package/docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md +2 -2
- package/docs/ORP_REASONING_KERNEL_COMPARISON_PILOT.md +4 -4
- package/docs/ORP_REASONING_KERNEL_CONTINUATION_PILOT.md +3 -3
- package/docs/ORP_REASONING_KERNEL_EVALUATION_PLAN.md +8 -8
- package/docs/ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md +25 -25
- package/docs/ORP_REASONING_KERNEL_EVOLUTION.md +4 -4
- package/docs/ORP_REASONING_KERNEL_PICKUP_PILOT.md +4 -4
- package/docs/ORP_REASONING_KERNEL_TECHNICAL_VALIDATION.md +19 -19
- package/docs/ORP_REASONING_KERNEL_V0_1.md +8 -8
- package/package.json +1 -1
- package/packages/orp-workspace-launcher/README.md +5 -5
- package/packages/orp-workspace-launcher/src/ledger.js +6 -6
- package/packages/orp-workspace-launcher/src/orp-command.js +3 -3
- package/scripts/render-terminal-demo.py +3 -3
@@ -243,10 +243,10 @@ Behavior:
 
 Formal schemas live in:
 
-- [link-project.schema.json](
-- [link-session.schema.json](
-- [runner-machine.schema.json](
-- [runner-runtime.schema.json](
+- [link-project.schema.json](../spec/v1/link-project.schema.json)
+- [link-session.schema.json](../spec/v1/link-session.schema.json)
+- [runner-machine.schema.json](../spec/v1/runner-machine.schema.json)
+- [runner-runtime.schema.json](../spec/v1/runner-runtime.schema.json)
 
 Planned file locations and schema usage:
 
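The added links in this hunk are relative to the Markdown file that contains them. As a minimal sketch of how such a target resolves, assuming the file list's `package/docs/ORP_LINK_RUNNER_PLAN.md` is the containing document (the helper name `resolve_link` is hypothetical and not part of the package):

```python
import posixpath


def resolve_link(doc_path: str, link_target: str) -> str:
    """Resolve a relative Markdown link against the file that contains it."""
    base_dir = posixpath.dirname(doc_path)
    # normpath collapses "./" and "../" segments without touching the filesystem.
    return posixpath.normpath(posixpath.join(base_dir, link_target))


# Assumed containing document, taken from the file list of this diff.
doc = "package/docs/ORP_LINK_RUNNER_PLAN.md"
for target in (
    "../spec/v1/link-project.schema.json",
    "../spec/v1/runner-runtime.schema.json",
):
    print(target, "->", resolve_link(doc, target))
```

Under that assumption, `../spec/v1/...` lands in `package/spec/v1/`, a sibling of `package/docs/`.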
@@ -454,14 +454,14 @@ The CLI and Rust app now share one client-side project/session/runner contract,
 - [x] Add route/helper logging for failed poll/start/complete flows, lease mismatches, missing-routeable-session failures, and repeated retry patterns.
 - [x] Surface runner health in the Rust desktop app so operators can see online/syncing/working/error states locally.
 - [x] Add an internal rollout and recovery runbook:
-- [RUNNER_INTERNAL_OPERATIONS.md](
+- [RUNNER_INTERNAL_OPERATIONS.md](./RUNNER_INTERNAL_OPERATIONS.md)
 - [x] Deploy the hosted runner backend changes to the real internal environment.
 - [x] Run a live internal smoke on deployed infrastructure.
 - Completed on March 16, 2026 against `https://orp.earth`.
 - Verified `orp link project bind`, `orp link session register`, `orp runner enable`, `orp runner sync`, `orp checkpoint queue`, `orp runner work --once`, and `orp agent work --once`.
 - Confirmed the production `orp` checkpoint job `78cd459a-fc0b-451b-af06-be2d27379169` completed successfully and produced checkpoint response `41087b8b-9556-4ec1-90c6-eefb69bac585`.
 - [x] Add and verify a reusable Rust-side smoke harness for the desktop wrapper path.
-- Implemented at
+- Implemented in the companion Rust workspace at `orp-rust/src/bin/runner_smoke.rs`.
 - Verified on March 16, 2026 against `https://orp.earth`.
 - Confirmed Rust-side smoke job `853a55f9-b0e5-42f7-8f2f-cdc8db1a354c` completed successfully and produced checkpoint response `6b5aee77-f176-4249-a127-978b987da946`.
 
@@ -51,7 +51,7 @@ Use this checklist when releasing ORP as the unified public CLI and product surf
 - `orp agent work --once --json` remains available as the compatibility path
 - Confirm the checkpoint response lands back in the hosted workspace.
 - Confirm the hosted operator console reflects the same lifecycle at `/dashboard/admin/runners`.
-- Use [RUNNER_INTERNAL_OPERATIONS.md](
+- Use [RUNNER_INTERNAL_OPERATIONS.md](./RUNNER_INTERNAL_OPERATIONS.md) for the internal rollout and recovery flow.
 
 ## 4. Package release
 
@@ -89,4 +89,4 @@ Use this checklist when releasing ORP as the unified public CLI and product surf
 - Keep the web app and CLI rollout loosely coupled.
 - Launch the ORP CLI first if the web app/domain transition is still in progress.
 - Do not change domain, auth, runner, and package names all in one step unless all staging checks are green.
-- Follow [ORP_WEB_DOMAIN_TRANSITION_PLAN.md](
+- Follow [ORP_WEB_DOMAIN_TRANSITION_PLAN.md](./ORP_WEB_DOMAIN_TRANSITION_PLAN.md) for the hosted cutover sequence.
@@ -5,12 +5,12 @@ the ORP Reasoning Kernel.
 
 Supporting artifact:
 
-- [docs/benchmarks/orp_reasoning_kernel_agent_pilot_v0_1.json](
+- [docs/benchmarks/orp_reasoning_kernel_agent_pilot_v0_1.json](./benchmarks/orp_reasoning_kernel_agent_pilot_v0_1.json)
 
 Supporting corpus and harness:
 
-- [examples/kernel/comparison/comparison-corpus.json](
-- [scripts/orp-kernel-agent-pilot.py](
+- [examples/kernel/comparison/comparison-corpus.json](../examples/kernel/comparison/comparison-corpus.json)
+- [scripts/orp-kernel-agent-pilot.py](../scripts/orp-kernel-agent-pilot.py)
 
 ## What This Pilot Measures
 
@@ -5,11 +5,11 @@ pilot for the live ORP kernel agent evaluation.
 
 Supporting artifact:
 
-- [docs/benchmarks/orp_reasoning_kernel_agent_replication_v0_2.json](
+- [docs/benchmarks/orp_reasoning_kernel_agent_replication_v0_2.json](./benchmarks/orp_reasoning_kernel_agent_replication_v0_2.json)
 
 Supporting harness:
 
-- [scripts/orp-kernel-agent-replication.py](
+- [scripts/orp-kernel-agent-replication.py](../scripts/orp-kernel-agent-replication.py)
 
 The harness now supports:
 
@@ -5,11 +5,11 @@ for the ORP Reasoning Kernel.
 
 Supporting artifact:
 
-- [docs/benchmarks/orp_reasoning_kernel_canonical_continuation_v0_1.json](
+- [docs/benchmarks/orp_reasoning_kernel_canonical_continuation_v0_1.json](./benchmarks/orp_reasoning_kernel_canonical_continuation_v0_1.json)
 
 Supporting harness:
 
-- [scripts/orp-kernel-canonical-continuation.py](
+- [scripts/orp-kernel-canonical-continuation.py](../scripts/orp-kernel-canonical-continuation.py)
 
 ## What This Pilot Measures
 
@@ -9,12 +9,12 @@ artifact styles:
 
 Supporting artifact:
 
-- [docs/benchmarks/orp_reasoning_kernel_comparison_v0_1.json](
+- [docs/benchmarks/orp_reasoning_kernel_comparison_v0_1.json](./benchmarks/orp_reasoning_kernel_comparison_v0_1.json)
 
 Supporting corpus and harness:
 
-- [examples/kernel/comparison/comparison-corpus.json](
-- [scripts/orp-kernel-comparison.py](
+- [examples/kernel/comparison/comparison-corpus.json](../examples/kernel/comparison/comparison-corpus.json)
+- [scripts/orp-kernel-comparison.py](../scripts/orp-kernel-comparison.py)
 
 ## What This Pilot Measures
 
@@ -91,7 +91,7 @@ This pilot does **not** prove that the kernel:
 - is universally superior across all teams or domains
 
 Those still require the larger studies in
-[docs/ORP_REASONING_KERNEL_EVALUATION_PLAN.md](
+[docs/ORP_REASONING_KERNEL_EVALUATION_PLAN.md](./ORP_REASONING_KERNEL_EVALUATION_PLAN.md).
 
 ## Why The Scoring Is Structured This Way
 
@@ -5,15 +5,15 @@ Reasoning Kernel.
 
 Supporting artifact:
 
-- [docs/benchmarks/orp_reasoning_kernel_continuation_v0_1.json](
+- [docs/benchmarks/orp_reasoning_kernel_continuation_v0_1.json](./benchmarks/orp_reasoning_kernel_continuation_v0_1.json)
 
 Supporting harness:
 
-- [scripts/orp-kernel-continuation-pilot.py](
+- [scripts/orp-kernel-continuation-pilot.py](../scripts/orp-kernel-continuation-pilot.py)
 
 Related harder benchmark:
 
-- [docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md](
+- [docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md](./ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md)
 
 ## What This Pilot Measures
 
@@ -13,14 +13,14 @@ to:
 
 Supporting references:
 
-- [docs/ORP_REASONING_KERNEL_COMPARISON_PILOT.md](
-- [docs/ORP_REASONING_KERNEL_PICKUP_PILOT.md](
-- [docs/ORP_REASONING_KERNEL_AGENT_PILOT.md](
-- [docs/ORP_REASONING_KERNEL_AGENT_REPLICATION.md](
-- [docs/ORP_REASONING_KERNEL_CONTINUATION_PILOT.md](
-- [docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md](
-- [docs/ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md](
-- [docs/ORP_REASONING_KERNEL_TECHNICAL_VALIDATION.md](
+- [docs/ORP_REASONING_KERNEL_COMPARISON_PILOT.md](./ORP_REASONING_KERNEL_COMPARISON_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_PICKUP_PILOT.md](./ORP_REASONING_KERNEL_PICKUP_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_AGENT_PILOT.md](./ORP_REASONING_KERNEL_AGENT_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_AGENT_REPLICATION.md](./ORP_REASONING_KERNEL_AGENT_REPLICATION.md)
+- [docs/ORP_REASONING_KERNEL_CONTINUATION_PILOT.md](./ORP_REASONING_KERNEL_CONTINUATION_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md](./ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md](./ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md)
+- [docs/ORP_REASONING_KERNEL_TECHNICAL_VALIDATION.md](./ORP_REASONING_KERNEL_TECHNICAL_VALIDATION.md)
 
 ## Evaluation Principles
 
@@ -13,13 +13,13 @@ stronger when we can say, precisely:
 
 Supporting references:
 
-- [docs/ORP_REASONING_KERNEL_V0_1.md](
-- [docs/ORP_REASONING_KERNEL_TECHNICAL_VALIDATION.md](
-- [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](
-- [docs/ORP_REASONING_KERNEL_AGENT_PILOT.md](
-- [docs/ORP_REASONING_KERNEL_AGENT_REPLICATION.md](
-- [docs/ORP_REASONING_KERNEL_CONTINUATION_PILOT.md](
-- [docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md](
+- [docs/ORP_REASONING_KERNEL_V0_1.md](./ORP_REASONING_KERNEL_V0_1.md)
+- [docs/ORP_REASONING_KERNEL_TECHNICAL_VALIDATION.md](./ORP_REASONING_KERNEL_TECHNICAL_VALIDATION.md)
+- [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](./benchmarks/orp_reasoning_kernel_v0_1_validation.json)
+- [docs/ORP_REASONING_KERNEL_AGENT_PILOT.md](./ORP_REASONING_KERNEL_AGENT_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_AGENT_REPLICATION.md](./ORP_REASONING_KERNEL_AGENT_REPLICATION.md)
+- [docs/ORP_REASONING_KERNEL_CONTINUATION_PILOT.md](./ORP_REASONING_KERNEL_CONTINUATION_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md](./ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md)
 
 ## Evidence Grades
 
@@ -41,24 +41,24 @@ the current kernel release.
 
 | Claim | Grade | Current Evidence | Why It Matters |
 | --- | --- | --- | --- |
-| ORP has a real typed kernel artifact surface. | A | [spec/v1/kernel.schema.json](
-| `orp init` seeds a valid starter kernel artifact and validates it in the default flow. | A | [tests/test_orp_init.py](
-| All seven v0.1 artifact classes can scaffold and validate successfully. | A | [tests/test_orp_kernel.py](
-| Hard mode blocks invalid promotable artifacts. | A | [tests/test_orp_kernel.py](
-| Soft mode records invalidity without blocking work. | A | [tests/test_orp_kernel.py](
-| Existing `structure_kernel` gates remain compatible when no explicit kernel config is present. | A | [tests/test_orp_kernel.py](
-| One-shot local kernel CLI operations are within human-scale latency on the reference machine. | A | [scripts/orp-kernel-benchmark.py](
-| A small cross-domain reference corpus fits the current class set cleanly. | A | [examples/kernel/corpus](
-| Each artifact class rejects a candidate when a required field is removed. | A | [tests/test_orp_kernel_corpus.py](
-| The CLI validator stays aligned with the published kernel schema. | A | [tests/test_orp_kernel_corpus.py](
-| Equivalent YAML and JSON artifacts validate to the same semantic result. | A | [tests/test_orp_kernel_corpus.py](
-| The validator rejects adversarial near-miss artifacts. | A | [tests/test_orp_kernel_corpus.py](
-| On a matched internal comparison corpus, kernel artifacts outperform both free-form and generic checklist artifacts on structural scoring. | A | [docs/ORP_REASONING_KERNEL_COMPARISON_PILOT.md](
-| On a matched internal pickup proxy, kernel artifacts preserve more explicit handoff-critical information than both free-form and generic checklist artifacts. | A | [docs/ORP_REASONING_KERNEL_PICKUP_PILOT.md](
-| On a matched live Codex recoverability simulation, kernel artifacts preserve full required-field recoverability, outperform free-form artifacts on all matched cases, and outperform generic checklist artifacts on average without per-case losses. | A | [docs/ORP_REASONING_KERNEL_AGENT_PILOT.md](
-| On a `10`-repeat full-corpus live Codex replication pilot, the kernel’s recoverability advantage stays stable across fresh-session reruns, with zero invention, no run-level losses, and perfect per-field stability on required kernel fields. | A | [docs/ORP_REASONING_KERNEL_AGENT_REPLICATION.md](
-| On a matched full-corpus live continuation pilot, kernel artifacts support the strongest continuation score, never underperform the generic checklist baseline, and keep invention at zero. | A | [docs/ORP_REASONING_KERNEL_CONTINUATION_PILOT.md](
-| On a harder matched full-corpus canonical continuation pilot, kernel artifacts beat free-form on every case, beat checklist on average, and keep the lowest invention rate while revealing checklist as a real competitive baseline on some cases. | A | [docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md](
+| ORP has a real typed kernel artifact surface. | A | [spec/v1/kernel.schema.json](../spec/v1/kernel.schema.json), [cli/orp.py](../cli/orp.py) | The kernel is not just prose. It is an enforceable CLI surface. |
+| `orp init` seeds a valid starter kernel artifact and validates it in the default flow. | A | [tests/test_orp_init.py](../tests/test_orp_init.py), [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](./benchmarks/orp_reasoning_kernel_v0_1_validation.json) | New repos get the kernel by default instead of needing manual adoption. |
+| All seven v0.1 artifact classes can scaffold and validate successfully. | A | [tests/test_orp_kernel.py](../tests/test_orp_kernel.py), [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](./benchmarks/orp_reasoning_kernel_v0_1_validation.json) | The kernel is broad enough for multiple project artifact types. |
+| Hard mode blocks invalid promotable artifacts. | A | [tests/test_orp_kernel.py](../tests/test_orp_kernel.py), [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](./benchmarks/orp_reasoning_kernel_v0_1_validation.json) | ORP can enforce structural promotion standards rather than only advising. |
+| Soft mode records invalidity without blocking work. | A | [tests/test_orp_kernel.py](../tests/test_orp_kernel.py), [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](./benchmarks/orp_reasoning_kernel_v0_1_validation.json) | ORP can stay fluid at intake while still surfacing missing structure. |
+| Existing `structure_kernel` gates remain compatible when no explicit kernel config is present. | A | [tests/test_orp_kernel.py](../tests/test_orp_kernel.py), [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](./benchmarks/orp_reasoning_kernel_v0_1_validation.json) | The kernel does not silently break earlier ORP configurations. |
+| One-shot local kernel CLI operations are within human-scale latency on the reference machine. | A | [scripts/orp-kernel-benchmark.py](../scripts/orp-kernel-benchmark.py), [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](./benchmarks/orp_reasoning_kernel_v0_1_validation.json) | The kernel is operationally lightweight enough to use during normal work. |
+| A small cross-domain reference corpus fits the current class set cleanly. | A | [examples/kernel/corpus](../examples/kernel/corpus), [tests/test_orp_kernel_corpus.py](../tests/test_orp_kernel_corpus.py), [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](./benchmarks/orp_reasoning_kernel_v0_1_validation.json) | The kernel now has explicit cross-domain fit evidence, not only rationale. |
+| Each artifact class rejects a candidate when a required field is removed. | A | [tests/test_orp_kernel_corpus.py](../tests/test_orp_kernel_corpus.py), [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](./benchmarks/orp_reasoning_kernel_v0_1_validation.json) | Class-specific enforcement is directly proven instead of inferred from a subset of cases. |
+| The CLI validator stays aligned with the published kernel schema. | A | [tests/test_orp_kernel_corpus.py](../tests/test_orp_kernel_corpus.py), [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](./benchmarks/orp_reasoning_kernel_v0_1_validation.json) | The kernel no longer relies on an undocumented validator rule set drifting away from the schema. |
+| Equivalent YAML and JSON artifacts validate to the same semantic result. | A | [tests/test_orp_kernel_corpus.py](../tests/test_orp_kernel_corpus.py), [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](./benchmarks/orp_reasoning_kernel_v0_1_validation.json) | The protocol is representation-stable rather than format-sensitive. |
+| The validator rejects adversarial near-miss artifacts. | A | [tests/test_orp_kernel_corpus.py](../tests/test_orp_kernel_corpus.py), [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](./benchmarks/orp_reasoning_kernel_v0_1_validation.json) | The kernel is stronger against malformed or gameable inputs than before. |
+| On a matched internal comparison corpus, kernel artifacts outperform both free-form and generic checklist artifacts on structural scoring. | A | [docs/ORP_REASONING_KERNEL_COMPARISON_PILOT.md](./ORP_REASONING_KERNEL_COMPARISON_PILOT.md), [docs/benchmarks/orp_reasoning_kernel_comparison_v0_1.json](./benchmarks/orp_reasoning_kernel_comparison_v0_1.json), [scripts/orp-kernel-comparison.py](../scripts/orp-kernel-comparison.py) | ORP now has direct comparative evidence for structural artifact quality on a matched internal corpus, not only rationale. |
+| On a matched internal pickup proxy, kernel artifacts preserve more explicit handoff-critical information than both free-form and generic checklist artifacts. | A | [docs/ORP_REASONING_KERNEL_PICKUP_PILOT.md](./ORP_REASONING_KERNEL_PICKUP_PILOT.md), [docs/benchmarks/orp_reasoning_kernel_pickup_v0_1.json](./benchmarks/orp_reasoning_kernel_pickup_v0_1.json), [scripts/orp-kernel-pickup.py](../scripts/orp-kernel-pickup.py) | ORP now has a second comparative signal showing that kernel structure turns into more explicit pickup value, not just fuller-looking artifacts. |
+| On a matched live Codex recoverability simulation, kernel artifacts preserve full required-field recoverability, outperform free-form artifacts on all matched cases, and outperform generic checklist artifacts on average without per-case losses. | A | [docs/ORP_REASONING_KERNEL_AGENT_PILOT.md](./ORP_REASONING_KERNEL_AGENT_PILOT.md), [docs/benchmarks/orp_reasoning_kernel_agent_pilot_v0_1.json](./benchmarks/orp_reasoning_kernel_agent_pilot_v0_1.json), [scripts/orp-kernel-agent-pilot.py](../scripts/orp-kernel-agent-pilot.py) | ORP now has direct in-environment agent evidence that the kernel’s structural advantage survives contact with a real fresh downstream Codex session. |
+| On a `10`-repeat full-corpus live Codex replication pilot, the kernel’s recoverability advantage stays stable across fresh-session reruns, with zero invention, no run-level losses, and perfect per-field stability on required kernel fields. | A | [docs/ORP_REASONING_KERNEL_AGENT_REPLICATION.md](./ORP_REASONING_KERNEL_AGENT_REPLICATION.md), [docs/benchmarks/orp_reasoning_kernel_agent_replication_v0_2.json](./benchmarks/orp_reasoning_kernel_agent_replication_v0_2.json), [scripts/orp-kernel-agent-replication.py](../scripts/orp-kernel-agent-replication.py) | ORP now has stronger repeatability evidence that the live agent result is not just a single-run artifact and that the structural advantage survives at field level, not only in aggregate means. |
+| On a matched full-corpus live continuation pilot, kernel artifacts support the strongest continuation score, never underperform the generic checklist baseline, and keep invention at zero. | A | [docs/ORP_REASONING_KERNEL_CONTINUATION_PILOT.md](./ORP_REASONING_KERNEL_CONTINUATION_PILOT.md), [docs/benchmarks/orp_reasoning_kernel_continuation_v0_1.json](./benchmarks/orp_reasoning_kernel_continuation_v0_1.json), [scripts/orp-kernel-continuation-pilot.py](../scripts/orp-kernel-continuation-pilot.py) | ORP now has direct agent-first evidence that kernel artifacts are not only recoverable, but also a safe and effective base for downstream continuation. |
+| On a harder matched full-corpus canonical continuation pilot, kernel artifacts beat free-form on every case, beat checklist on average, and keep the lowest invention rate while revealing checklist as a real competitive baseline on some cases. | A | [docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md](./ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md), [docs/benchmarks/orp_reasoning_kernel_canonical_continuation_v0_1.json](./benchmarks/orp_reasoning_kernel_canonical_continuation_v0_1.json), [scripts/orp-kernel-canonical-continuation.py](../scripts/orp-kernel-canonical-continuation.py) | ORP now has a stricter downstream-agent benchmark where the task is not merely “continue safely,” but “produce the next canonical artifact” without inventing unsupported structure. |
 
 ## What Is Strong But Not Fully Sealed
 
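The matrix rows on hard mode ("blocks invalid promotable artifacts") and soft mode ("records invalidity without blocking work") describe two responses to the same validation outcome. The sketch below only illustrates that distinction and is not the validator in `cli/orp.py`. The field names in `REQUIRED_FIELDS`, the `ValidationResult` type, and the `validate` function are all hypothetical; the real per-class required fields are defined by the kernel schema.

```python
from dataclasses import dataclass, field

# Hypothetical required fields, for illustration only.
REQUIRED_FIELDS = ("artifact_class", "summary")


@dataclass
class ValidationResult:
    valid: bool
    missing: list = field(default_factory=list)
    blocked: bool = False


def validate(artifact: dict, mode: str = "hard") -> ValidationResult:
    """Check required fields; only hard mode turns invalidity into a block."""
    missing = [name for name in REQUIRED_FIELDS if not artifact.get(name)]
    valid = not missing
    # Soft mode still reports what is missing, but never blocks the work.
    blocked = (mode == "hard") and not valid
    return ValidationResult(valid=valid, missing=missing, blocked=blocked)


incomplete = {"artifact_class": "task"}
print(validate(incomplete, mode="hard"))
print(validate(incomplete, mode="soft"))
```

Both calls report the same missing field; they differ only in whether `blocked` is set.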
@@ -28,8 +28,8 @@ The current core kernel remains the canonical source of truth for:
 
 Those semantics live in:
 
-- [spec/v1/kernel.schema.json](
-- [cli/orp.py](
+- [spec/v1/kernel.schema.json](../spec/v1/kernel.schema.json)
+- [cli/orp.py](../cli/orp.py)
 
 The kernel should not self-mutate from a single chat, a single agent guess, or
 one repo’s habits.
@@ -74,7 +74,7 @@ Use it for changes like:
 
 Proposal shape is governed by:
 
-- [spec/v1/kernel-proposal.schema.json](
+- [spec/v1/kernel-proposal.schema.json](../spec/v1/kernel-proposal.schema.json)
 
 ### `orp kernel migrate`
 
@@ -93,7 +93,7 @@ It should begin as an extension or proposal before becoming universal.
 
 Extension shape is defined in:
 
-- [spec/v1/kernel-extension.schema.json](
+- [spec/v1/kernel-extension.schema.json](../spec/v1/kernel-extension.schema.json)
 
 That gives ORP a place to trial domain-specific structure without forcing it
 into every project prematurely.
@@ -5,12 +5,12 @@ Reasoning Kernel.
 
 Supporting artifact:
 
-- [docs/benchmarks/orp_reasoning_kernel_pickup_v0_1.json](
+- [docs/benchmarks/orp_reasoning_kernel_pickup_v0_1.json](./benchmarks/orp_reasoning_kernel_pickup_v0_1.json)
 
 Supporting corpus and harness:
 
-- [examples/kernel/comparison/comparison-corpus.json](
-- [scripts/orp-kernel-pickup.py](
+- [examples/kernel/comparison/comparison-corpus.json](../examples/kernel/comparison/comparison-corpus.json)
+- [scripts/orp-kernel-pickup.py](../scripts/orp-kernel-pickup.py)
 
 ## What This Pilot Measures
 
@@ -96,7 +96,7 @@ methodology.
 It is stronger evidence than a rationale-only claim, but it remains an
 internal, deterministic proxy. The next step after this is still a live
 human/agent pickup study as described in
-[docs/ORP_REASONING_KERNEL_EVALUATION_PLAN.md](
+[docs/ORP_REASONING_KERNEL_EVALUATION_PLAN.md](./ORP_REASONING_KERNEL_EVALUATION_PLAN.md).
 
 ## Bottom Line
 
@@ -6,18 +6,18 @@ for `v0.1`.
 
 The supporting benchmark artifact for this document is:
 
-- [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](
+- [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](./benchmarks/orp_reasoning_kernel_v0_1_validation.json)
 
 For the honest claim-by-claim evidence status and remaining research gaps, see:
 
-- [docs/ORP_REASONING_KERNEL_COMPARISON_PILOT.md](
-- [docs/ORP_REASONING_KERNEL_PICKUP_PILOT.md](
-- [docs/ORP_REASONING_KERNEL_AGENT_PILOT.md](
-- [docs/ORP_REASONING_KERNEL_AGENT_REPLICATION.md](
-- [docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md](
-- [docs/ORP_REASONING_KERNEL_EVOLUTION.md](
-- [docs/ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md](
-- [docs/ORP_REASONING_KERNEL_EVALUATION_PLAN.md](
+- [docs/ORP_REASONING_KERNEL_COMPARISON_PILOT.md](./ORP_REASONING_KERNEL_COMPARISON_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_PICKUP_PILOT.md](./ORP_REASONING_KERNEL_PICKUP_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_AGENT_PILOT.md](./ORP_REASONING_KERNEL_AGENT_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_AGENT_REPLICATION.md](./ORP_REASONING_KERNEL_AGENT_REPLICATION.md)
+- [docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md](./ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_EVOLUTION.md](./ORP_REASONING_KERNEL_EVOLUTION.md)
+- [docs/ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md](./ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md)
+- [docs/ORP_REASONING_KERNEL_EVALUATION_PLAN.md](./ORP_REASONING_KERNEL_EVALUATION_PLAN.md)
 
 ## 1. Definition
 
@@ -38,10 +38,10 @@ It operates in three roles:
 
 The kernel is implemented through:
 
-- [spec/v1/kernel.schema.json](
+- [spec/v1/kernel.schema.json](../spec/v1/kernel.schema.json)
 - `orp kernel scaffold`
 - `orp kernel validate`
-- `structure_kernel` gate enforcement in [cli/orp.py](
+- `structure_kernel` gate enforcement in [cli/orp.py](../cli/orp.py)
 
 ## 2. What Problem It Solves
 
@@ -153,8 +153,8 @@ The schema currently supports:
 
 Each class has a minimum required field set in:
 
-- [kernel.schema.json](
-- [cli/orp.py](
+- [kernel.schema.json](../spec/v1/kernel.schema.json)
+- [cli/orp.py](../cli/orp.py)
 
 ### CLI operations
 
@@ -191,7 +191,7 @@ and the default profile validates it in hard mode.
 
 The repeatable harness is:
 
-- [scripts/orp-kernel-benchmark.py](
+- [scripts/orp-kernel-benchmark.py](../scripts/orp-kernel-benchmark.py)
 
 The harness benchmarks and validates:
 
@@ -374,11 +374,11 @@ The benchmark report now records ten claims, all currently passing:
 
 These claims are backed by:
 
-- [tests/test_orp_kernel.py](
-- [tests/test_orp_init.py](
-- [tests/test_orp_kernel_benchmark.py](
-- [tests/test_orp_kernel_corpus.py](
-- [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](
+- [tests/test_orp_kernel.py](../tests/test_orp_kernel.py)
+- [tests/test_orp_init.py](../tests/test_orp_init.py)
+- [tests/test_orp_kernel_benchmark.py](../tests/test_orp_kernel_benchmark.py)
+- [tests/test_orp_kernel_corpus.py](../tests/test_orp_kernel_corpus.py)
+- [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](./benchmarks/orp_reasoning_kernel_v0_1_validation.json)
 
 ## 8. Why This Applies To All Project Types
 
@@ -13,17 +13,17 @@ truth.
 
 For the supporting benchmark evidence and alternatives analysis behind this
 design, see
-[docs/ORP_REASONING_KERNEL_TECHNICAL_VALIDATION.md](
+[docs/ORP_REASONING_KERNEL_TECHNICAL_VALIDATION.md](./ORP_REASONING_KERNEL_TECHNICAL_VALIDATION.md).
 
 For the explicit evidence gaps and next comparative experiments, see:
 
-- [docs/ORP_REASONING_KERNEL_COMPARISON_PILOT.md](
-- [docs/ORP_REASONING_KERNEL_PICKUP_PILOT.md](
-- [docs/ORP_REASONING_KERNEL_AGENT_PILOT.md](
-- [docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md](
-- [docs/ORP_REASONING_KERNEL_EVOLUTION.md](
-- [docs/ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md](
-- [docs/ORP_REASONING_KERNEL_EVALUATION_PLAN.md](
+- [docs/ORP_REASONING_KERNEL_COMPARISON_PILOT.md](./ORP_REASONING_KERNEL_COMPARISON_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_PICKUP_PILOT.md](./ORP_REASONING_KERNEL_PICKUP_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_AGENT_PILOT.md](./ORP_REASONING_KERNEL_AGENT_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md](./ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_EVOLUTION.md](./ORP_REASONING_KERNEL_EVOLUTION.md)
+- [docs/ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md](./ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md)
+- [docs/ORP_REASONING_KERNEL_EVALUATION_PLAN.md](./ORP_REASONING_KERNEL_EVALUATION_PLAN.md)
 
 It should make three things true at once:
 
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "open-research-protocol",
-"version": "0.4.
+"version": "0.4.17",
 "description": "ORP CLI (Open Research Protocol): workspace ledgers, secrets, scheduling, governed execution, and agent-friendly research workflows.",
 "license": "MIT",
 "author": "Fractal Research Group <cody@frg.earth>",
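The diff title gives this release as 0.4.16 → 0.4.17, which changes only the last component of the version. A minimal sketch of classifying such a bump follows, assuming plain `MAJOR.MINOR.PATCH` strings with no pre-release suffix; `parse_version` and `bump_kind` are illustrative names, not part of the package.

```python
def parse_version(text: str) -> tuple:
    """Split 'MAJOR.MINOR.PATCH' into integers so 0.4.10 sorts after 0.4.9."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def bump_kind(old: str, new: str) -> str:
    """Name the most significant component that changed between two versions."""
    before, after = parse_version(old), parse_version(new)
    if after <= before:
        return "not an upgrade"
    if after[0] != before[0]:
        return "major"
    if after[1] != before[1]:
        return "minor"
    return "patch"


# Version strings taken from the title of this diff.
print(bump_kind("0.4.16", "0.4.17"))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes once a component reaches two digits.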
@@ -10,7 +10,7 @@ Create a local workspace ledger with no hosted account required:
|
|
|
10
10
|
|
|
11
11
|
```bash
|
|
12
12
|
orp workspace create main-cody-1
|
|
13
|
-
orp workspace create research-lab --path /
|
|
13
|
+
orp workspace create research-lab --path /absolute/path/to/research-lab --resume-tool claude --resume-session-id 469d99b2-2997-42bf-a8f5-3812c808ef29
|
|
14
14
|
```
|
|
15
15
|
|
|
16
16
|
Inspect the saved ledger:
|
|
@@ -30,21 +30,21 @@ orp workspace tabs main --json
|
|
|
30
30
|
Add a new saved tab manually:
|
|
31
31
|
|
|
32
32
|
```bash
|
|
33
|
-
orp workspace ledger add main --path /
|
|
34
|
-
orp workspace add-tab main --path /
|
|
33
|
+
orp workspace ledger add main --path /absolute/path/to/frg-site --resume-command "codex resume 019d348d-5031-78e1-9840-a66deaac33ae"
|
|
34
|
+
orp workspace add-tab main --path /absolute/path/to/anthropic-lab --resume-tool claude --resume-session-id claude-456
|
|
35
35
|
```
|
|
36
36
|
|
|
37
37
|
Remove a saved tab manually:
|
|
38
38
|
|
|
39
39
|
```bash
|
|
40
|
-
orp workspace ledger remove main --path /
|
|
40
|
+
orp workspace ledger remove main --path /absolute/path/to/frg-site
|
|
41
41
|
orp workspace remove-tab main --resume-session-id claude-456 --resume-tool claude
|
|
42
42
|
```
|
|
43
43
|
|
|
44
44
|
Work directly with a local manifest file:
|
|
45
45
|
|
|
46
46
|
```bash
|
|
47
|
-
orp workspace add-tab --workspace-file ./workspace.json --path /
|
|
47
|
+
orp workspace add-tab --workspace-file ./workspace.json --path /absolute/path/to/orp
|
|
48
48
|
orp workspace tabs --workspace-file ./workspace.json
|
|
49
49
|
orp workspace tabs --workspace-file ./workspace.json --json
|
|
50
50
|
```
|
|
package/packages/orp-workspace-launcher/src/ledger.js
CHANGED
|
@@ -497,9 +497,9 @@ Options:
|
|
|
497
497
|
-h, --help Show this help text
|
|
498
498
|
|
|
499
499
|
Examples:
|
|
500
|
-
orp workspace add-tab main --path /
|
|
501
|
-
orp workspace add-tab main --path /
|
|
502
|
-
orp workspace add-tab main --path /
|
|
500
|
+
orp workspace add-tab main --path /absolute/path/to/new-project
|
|
501
|
+
orp workspace add-tab main --path /absolute/path/to/new-project --resume-command "codex resume 019d..."
|
|
502
|
+
orp workspace add-tab main --path /absolute/path/to/new-project --resume-tool claude --resume-session-id claude-456
|
|
503
503
|
`);
|
|
504
504
|
}
|
|
505
505
|
|
|
@@ -523,8 +523,8 @@ Options:
|
|
|
523
523
|
Examples:
|
|
524
524
|
orp workspace create main-cody-1
|
|
525
525
|
orp workspace create main-cody-1 --slot main
|
|
526
|
-
orp workspace create research-lab --path /
|
|
527
|
-
orp workspace create research-lab --path /
|
|
526
|
+
orp workspace create research-lab --path /absolute/path/to/research-lab
|
|
527
|
+
orp workspace create research-lab --path /absolute/path/to/research-lab --resume-tool claude --resume-session-id 469d99b2-2997-42bf-a8f5-3812c808ef29
|
|
528
528
|
`);
|
|
529
529
|
}
|
|
530
530
|
|
|
@@ -551,7 +551,7 @@ Options:
|
|
|
551
551
|
|
|
552
552
|
Examples:
|
|
553
553
|
orp workspace remove-tab main --index 11
|
|
554
|
-
orp workspace remove-tab main --path /
|
|
554
|
+
orp workspace remove-tab main --path /absolute/path/to/frg-site --resume-session-id 019d348d-5031-78e1-9840-a66deaac33ae
|
|
555
555
|
orp workspace remove-tab main --title frg-site
|
|
556
556
|
`);
|
|
557
557
|
}
|
|
package/packages/orp-workspace-launcher/src/orp-command.js
CHANGED
|
@@ -48,11 +48,11 @@ Examples:
|
|
|
48
48
|
orp workspace create main-cody-1
|
|
49
49
|
orp workspace create main-cody-1 --slot main
|
|
50
50
|
orp workspace ledger main
|
|
51
|
-
orp workspace ledger add main --path /
|
|
51
|
+
orp workspace ledger add main --path /absolute/path/to/new-project --resume-command "codex resume 019d..."
|
|
52
52
|
orp workspace ledger remove main --title frg-site
|
|
53
53
|
orp workspace tabs main-cody-1
|
|
54
|
-
orp workspace add-tab main --path /
|
|
55
|
-
orp workspace remove-tab main --path /
|
|
54
|
+
orp workspace add-tab main --path /absolute/path/to/new-project --resume-command "codex resume 019d..."
|
|
55
|
+
orp workspace remove-tab main --path /absolute/path/to/frg-site --resume-session-id 019d348d-5031-78e1-9840-a66deaac33ae
|
|
56
56
|
orp workspace slot set main main-cody-1
|
|
57
57
|
orp workspace slot set offhand research-lab
|
|
58
58
|
orp workspace slot list
|
|
package/scripts/render-terminal-demo.py
CHANGED
|
@@ -75,12 +75,12 @@ SCENES = [
|
|
|
75
75
|
"output_font": "small",
|
|
76
76
|
"line_height": 30,
|
|
77
77
|
"output": [
|
|
78
|
-
("ORP 0.4.
|
|
78
|
+
("ORP 0.4.16", ACCENT),
|
|
79
79
|
("Agent-first CLI for workspace ledgers, secrets, scheduling, and research workflows.", INK),
|
|
80
80
|
("Repo", SKY),
|
|
81
|
-
(" root:
|
|
81
|
+
(" root: ~/code/open-research-protocol", INK),
|
|
82
82
|
(" config: orp.yml (missing)", ACCENT),
|
|
83
|
-
(" git: yes, branch=main, commit=
|
|
83
|
+
(" git: yes, branch=main, commit=1a2b3c4", SOFT),
|
|
84
84
|
("Daily Loop", SKY),
|
|
85
85
|
(" orp workspace tabs main", ACCENT),
|
|
86
86
|
(' orp secrets add --alias <alias> --label "<label>" --provider <provider>', INK),
|