open-research-protocol 0.4.5 → 0.4.7

@@ -0,0 +1,499 @@
1
+ # ORP Reasoning Kernel v0.1
2
+
3
+ Status: draft
4
+
5
+ This document defines the first ORP framing for a reasoning kernel that fits
6
+ the current CLI-first contract.
7
+
8
+ ## Purpose
9
+
10
+ The ORP Reasoning Kernel is the artifact-shaping grammar that interprets
11
+ intent, validates structure, and governs promotion into canonical repository
12
+ truth.
13
+
14
+ For the supporting benchmark evidence and alternatives analysis behind this
15
+ design, see
16
+ `docs/ORP_REASONING_KERNEL_TECHNICAL_VALIDATION.md`.
17
+
18
+ It should make three things true at once:
19
+
20
+ - humans can speak naturally at the boundary
21
+ - agents can reason structurally while operating
22
+ - repositories remain canonically legible over time
23
+
24
+ The kernel is not the repository's source of truth by itself. The repository's
25
+ canonical artifacts remain the source of truth. The kernel defines the shape
26
+ those artifacts must satisfy before ORP treats them as solid enough to trust,
27
+ reuse, test, and hand off.
28
+
29
+ Short form:
30
+
31
+ - natural language at the boundary
32
+ - kernel structure at promotion
33
+ - canonical artifacts at the core
34
+
35
+ Or even shorter:
36
+
37
+ - loose in, structured through, solid out
38
+
39
+ ## Why It Belongs In ORP
40
+
41
+ ORP already has most of the right boundary lines:
42
+
43
+ - the CLI is the canonical contract
44
+ - canonical artifacts are distinct from chat
45
+ - packets are process metadata, not evidence
46
+ - evidence remains in canonical artifact paths
47
+ - `structure_kernel` already exists as a named gate phase in the v1 schemas
48
+
49
+ That means the kernel should enter ORP as a protocol and artifact discipline,
50
+ not as a UI-only idea and not as a parallel chat system.
51
+
52
+ ## Non-Goals
53
+
54
+ The kernel should not become:
55
+
56
+ - a mandatory prompt rewriter for every user message
57
+ - a style police layer for normal English
58
+ - a second truth system outside repository artifacts
59
+ - a heavyweight blocker that turns ORP into bureaucracy
60
+ - an attempt to define one grand ontology for all human reasoning
61
+
62
+ ORP should stay fluid at intake and rigorous at promotion.
63
+
64
+ ## Core Model
65
+
66
+ The kernel should operate across three layers.
67
+
68
+ ### 1. Raw Intent
69
+
70
+ This is the boundary layer where humans and agents can remain loose.
71
+
72
+ Examples:
73
+
74
+ - a chat message
75
+ - a checkpoint note
76
+ - a bug report
77
+ - a design ask
78
+ - a quick research hypothesis
79
+ - a rough implementation request
80
+
81
+ Raw intent is allowed to be partial, ambiguous, and exploratory.
82
+
83
+ ### 2. Working Interpretation
84
+
85
+ This is the agent's kernel-shaped reading of the request. It is structured, but
86
+ still revisable.
87
+
88
+ Examples of working-interpretation fields:
89
+
90
+ - artifact class
91
+ - object
92
+ - goal
93
+ - boundary
94
+ - constraints
95
+ - invariants
96
+ - failure modes
97
+ - evidence expectations
98
+ - next action class
99
+ - candidate canonical target
100
+
101
+ This layer is not yet repository truth. It is the agent's current structured
102
+ map.
103
+
104
+ ### 3. Canonical Artifact
105
+
106
+ This is the layer ORP will trust as repository truth.
107
+
108
+ Examples:
109
+
110
+ - task card
111
+ - design decision
112
+ - hypothesis record
113
+ - experiment record
114
+ - checkpoint summary
115
+ - policy object
116
+ - result record
117
+
118
+ Artifacts become canonical only after they satisfy the kernel's typed
119
+ completeness rules.
120
+
121
+ ## Kernel Roles
122
+
123
+ The kernel should have three roles.
124
+
125
+ ### Interpreter
126
+
127
+ Turn natural language into structured intent.
128
+
129
+ Example:
130
+
131
+ - raw ask: "build the trace widget for terminal sessions so I can watch what
132
+ lanes are doing and quickly tell if one is drifting"
133
+ - interpreted structure:
134
+ - object: terminal trace widget
135
+ - goal: expose lane state and drift
136
+ - boundary: terminal-first orchestration UX
137
+ - failure mode: hidden drift and unclear status
138
+ - next action: spec then incremental implementation
139
+
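The interpreted structure above can be pictured as a small typed record. This is a minimal sketch, not ORP's implementation: the `WorkingInterpretation` class and the `unresolved_fields` helper are hypothetical names, and the field set is a subset of the working-interpretation fields listed earlier.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class WorkingInterpretation:
    """Hypothetical container for the Interpreter's structured, revisable reading."""

    object: str
    goal: str
    boundary: List[str] = field(default_factory=list)
    failure_modes: List[str] = field(default_factory=list)
    next_action: Optional[str] = None
    artifact_class: Optional[str] = None  # may stay unset until the request is classified


def unresolved_fields(interp: WorkingInterpretation) -> List[str]:
    """Return the names of fields that are still empty so the gaps can be surfaced."""
    return [name for name, value in asdict(interp).items() if value in (None, "", [])]


# The trace-widget ask from the example above, as interpreted structure.
interp = WorkingInterpretation(
    object="terminal trace widget",
    goal="expose lane state and drift",
    boundary=["terminal-first orchestration UX"],
    failure_modes=["hidden drift and unclear status"],
    next_action="spec then incremental implementation",
)

print(unresolved_fields(interp))
```

Because the record is revisable, an empty field is reported as a gap rather than treated as an error, which matches the soft posture this layer is meant to have.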
140
+ ### Validator
141
+
142
+ Check whether a proposed artifact is structurally sufficient for its class.
143
+
144
+ ### Canonizer
145
+
146
+ Govern whether and how accepted work enters canonical repository artifacts.
147
+
148
+ The kernel does not replace repository truth. It controls the grammar for how
149
+ repository truth gets shaped.
150
+
151
+ ## Soft Mode And Hard Mode
152
+
153
+ This is the most important operating distinction.
154
+
155
+ ### Soft Mode
156
+
157
+ Used at intake and during ideation.
158
+
159
+ The kernel may:
160
+
161
+ - classify the request
162
+ - infer missing structure
163
+ - surface ambiguity
164
+ - suggest what is missing
165
+ - route the next action
166
+
167
+ The kernel should not block the user from speaking naturally.
168
+
169
+ ### Hard Mode
170
+
171
+ Used when work is promoted into canonical repository artifacts.
172
+
173
+ If something is going to become:
174
+
175
+ - a task card
176
+ - a design decision
177
+ - a hypothesis
178
+ - an experiment
179
+ - a policy object
180
+ - a checkpoint summary
181
+ - a stable result record
182
+
183
+ then it must satisfy the kernel's minimum structural rules for that artifact
184
+ class.
185
+
186
+ This gives ORP the right balance:
187
+
188
+ - ideation stays fluid
189
+ - repository truth stays solid
190
+
191
+ ## Kernel Primitives
192
+
193
+ These are the first useful cross-artifact fields.
194
+
195
+ - `artifact_class`
196
+ - `object`
197
+ - `goal`
198
+ - `boundary`
199
+ - `constraints`
200
+ - `invariants`
201
+ - `failure_modes`
202
+ - `evidence_expectations`
203
+ - `success_criteria`
204
+ - `next_action`
205
+ - `canonical_target`
206
+ - `artifact_refs`
207
+
208
+ Not every artifact class needs every field. The kernel should be typed.
209
+
210
+ ## Typed Artifact Classes
211
+
212
+ The kernel should start with a small set of artifact classes rather than trying
213
+ to model everything at once.
214
+
215
+ ### `task`
216
+
217
+ Minimum useful fields:
218
+
219
+ - object
220
+ - goal
221
+ - boundary
222
+ - constraints
223
+ - success_criteria
224
+
225
+ ### `decision`
226
+
227
+ Minimum useful fields:
228
+
229
+ - question
230
+ - chosen_path
231
+ - rejected_alternatives
232
+ - rationale
233
+ - consequences
234
+
235
+ ### `hypothesis`
236
+
237
+ Minimum useful fields:
238
+
239
+ - claim
240
+ - boundary
241
+ - assumptions
242
+ - test_path
243
+ - falsifiers
244
+
245
+ ### `experiment`
246
+
247
+ Minimum useful fields:
248
+
249
+ - objective
250
+ - method
251
+ - inputs
252
+ - outputs
253
+ - evidence_expectations
254
+ - interpretation_limits
255
+
256
+ ### `checkpoint`
257
+
258
+ Minimum useful fields:
259
+
260
+ - completed_unit
261
+ - current_state
262
+ - risks
263
+ - next_handoff_target
264
+ - artifact_refs
265
+
266
+ ### `policy`
267
+
268
+ Minimum useful fields:
269
+
270
+ - scope
271
+ - rule
272
+ - rationale
273
+ - invariants
274
+ - enforcement_surface
275
+
276
+ ### `result`
277
+
278
+ Minimum useful fields:
279
+
280
+ - claim
281
+ - evidence_paths
282
+ - status
283
+ - interpretation_limits
284
+ - next_follow_up
285
+
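The typed minimums above, combined with the soft-mode and hard-mode split, can be sketched as a lookup table plus a small gate function. This is an illustrative sketch under stated assumptions: `REQUIRED_FIELDS`, `missing_fields`, and `gate` are hypothetical names, and the result keys (`kernel_valid`, `missing_fields`, `overall`, `exit_code`) are borrowed from the validation report shipped in this release rather than from the actual logic in `cli/orp.py`, which is not shown here.

```python
from typing import Any, Dict, List

# Minimum fields per artifact class, copied from the lists above.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "task": ["object", "goal", "boundary", "constraints", "success_criteria"],
    "decision": ["question", "chosen_path", "rejected_alternatives", "rationale", "consequences"],
    "hypothesis": ["claim", "boundary", "assumptions", "test_path", "falsifiers"],
    "experiment": ["objective", "method", "inputs", "outputs",
                   "evidence_expectations", "interpretation_limits"],
    "checkpoint": ["completed_unit", "current_state", "risks",
                   "next_handoff_target", "artifact_refs"],
    "policy": ["scope", "rule", "rationale", "invariants", "enforcement_surface"],
    "result": ["claim", "evidence_paths", "status", "interpretation_limits", "next_follow_up"],
}


def missing_fields(artifact: Dict[str, Any]) -> List[str]:
    """List required fields that are absent or empty for the artifact's class."""
    artifact_class = artifact.get("artifact_class")
    if artifact_class not in REQUIRED_FIELDS:
        raise ValueError(f"unknown artifact_class: {artifact_class!r}")
    return [name for name in REQUIRED_FIELDS[artifact_class] if not artifact.get(name)]


def gate(artifact: Dict[str, Any], mode: str) -> Dict[str, Any]:
    """Hard mode blocks an incomplete artifact; soft mode only reports the gaps."""
    missing = missing_fields(artifact)
    blocked = mode == "hard" and bool(missing)
    return {
        "kernel_valid": not missing,
        "missing_fields": missing,
        "overall": "FAIL" if blocked else "PASS",
        "exit_code": 1 if blocked else 0,
    }


# A task draft that lacks constraints and success criteria.
draft = {
    "artifact_class": "task",
    "object": "terminal trace widget",
    "goal": "expose lane state and drift",
    "boundary": ["terminal-first orchestration workflow"],
}

print(gate(draft, "hard"))
print(gate(draft, "soft"))
```

In this sketch the same draft is invalid in both modes; only the consequence differs. Hard mode returns a failing exit code, while soft mode passes and leaves the missing fields as an advisory.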
286
+ ## Kernel Operators
287
+
288
+ These are the actions an agent can take through the kernel.
289
+
290
+ ### Classify
291
+
292
+ Determine what kind of artifact or action is being requested.
293
+
294
+ ### Expand
295
+
296
+ Turn shorthand into a fuller structure.
297
+
298
+ ### Tighten
299
+
300
+ Remove ambiguity and define boundaries more explicitly.
301
+
302
+ ### Challenge
303
+
304
+ Surface assumptions, missing invariants, or weak success criteria.
305
+
306
+ ### Route
307
+
308
+ Choose the next action:
309
+
310
+ - explore
311
+ - implement
312
+ - critique
313
+ - test
314
+ - summarize
315
+ - ask
316
+ - update an artifact
317
+
318
+ ### Promote
319
+
320
+ Move working interpretation into canonical artifact form.
321
+
322
+ ### Downgrade
323
+
324
+ If verification or structure fails, lower the claim or keep it provisional.
325
+
326
+ ### Record
327
+
328
+ Write the appropriate process metadata and artifact references.
329
+
330
+ ## The Kernel Test
331
+
332
+ The kernel test is the minimum structural standard an artifact must meet before promotion.
333
+
334
+ A proposed artifact passes the kernel test when another human or agent can tell:
335
+
336
+ - what it is
337
+ - why it exists
338
+ - what boundaries apply
339
+ - what assumptions it depends on
340
+ - what success or failure would look like
341
+ - where its evidence or dependent artifacts live
342
+
343
+ This should not be one giant universal checklist. It should be a typed test
344
+ based on artifact class.
345
+
346
+ Example:
347
+
348
+ A proposed task called `Build terminal drift monitor` is not yet solid enough.
349
+
350
+ Kernel questions:
351
+
352
+ - what is the object?
353
+ - what boundary defines the monitor?
354
+ - what signals does it consume?
355
+ - what counts as drift?
356
+ - what constraints exist?
357
+ - what invariants must hold?
358
+ - how will success be measured?
359
+ - what artifact should this produce?
360
+
361
+ After expansion:
362
+
363
+ - object: terminal lane drift monitor
364
+ - goal: reveal divergence between intended lane behavior and actual outputs
365
+ - inputs: lane logs, timing, state transitions, error events
366
+ - outputs: summarized health state and drill-down trace
367
+ - constraints: terminal-native, low overhead, no GUI dependency
368
+ - invariants: does not alter lane execution
369
+ - failure_modes: false positives, information overload, missing event capture
370
+ - success_criteria: operator identifies a stalled or drifting lane within ten
371
+ seconds
372
+ - canonical_target: monitor spec plus event schema plus sample logs
373
+
374
+ Now it is structurally legible enough to promote.
375
+
376
+ ## Relationship To Evidence
377
+
378
+ The kernel does not redefine ORP's evidence boundary.
379
+
380
+ It should reinforce it.
381
+
382
+ - packets remain process metadata
383
+ - summaries remain process metadata
384
+ - kernel traces remain process metadata
385
+ - evidence remains in canonical artifact paths such as code, data, papers,
386
+ proofs, logs, and experiment outputs
387
+
388
+ The kernel governs shape, not proof.
389
+
390
+ ## Relationship To Chat
391
+
392
+ Chat should not become the source of truth by accident.
393
+
394
+ The correct split is:
395
+
396
+ - raw conversation is exploratory
397
+ - working interpretation is semi-structured
398
+ - canonical artifacts are the repository truth
399
+
400
+ This prevents chat from turning into a mushy, implicit spec layer.
401
+
402
+ ## CLI Integration Points
403
+
404
+ The kernel should live in the CLI contract, not in UI-only layers.
405
+
406
+ ### 1. `gate`
407
+
408
+ `structure_kernel` already exists as a gate phase in:
409
+
410
+ - `spec/v1/orp.config.schema.json`
411
+ - `spec/v1/packet.schema.json`
412
+
413
+ This is the most natural hard-mode enforcement surface.
414
+
415
+ The first kernel validations should plug in here.
416
+
417
+ ### 2. `checkpoint`
418
+
419
+ Checkpoint notes may stay natural-language, but ORP should eventually support a
420
+ kernel-shaped checkpoint summary for more reliable handoff.
421
+
422
+ ### 3. `packet`
423
+
424
+ Packets may record kernel validation status and artifact references as process
425
+ metadata only.
426
+
427
+ ### 4. `report`
428
+
429
+ Reports may render kernel-shaped summaries for human review, but should not
430
+ pretend that kernel structure is evidence.
431
+
432
+ ### 5. `ready`
433
+
434
+ Where readiness depends on canonical artifact quality, ORP may eventually
435
+ require kernel-valid promoted artifacts as part of the readiness bar.
436
+
437
+ ## Boundary With Rust And Web
438
+
439
+ Rust and web should reflect the kernel, not redefine it.
440
+
441
+ That means:
442
+
443
+ - kernel schema and validation rules belong in the CLI
444
+ - Rust may expose kernel views, prompts, or editing affordances
445
+ - web may expose kernel-backed artifact cards and review surfaces
446
+ - neither Rust nor web should invent competing kernel semantics
447
+
448
+ This follows the same boundary already established for link, session, runner,
449
+ and governance truth.
450
+
451
+ ## First Implementation Slice
452
+
453
+ The right v0.1 implementation slice is intentionally small.
454
+
455
+ ### Phase 1
456
+
457
+ - add this design note
458
+ - add a machine-readable kernel schema in `spec/v1/kernel.schema.json`
459
+ - define typed artifact classes and required fields
460
+
461
+ ### Phase 2
462
+
463
+ - allow optional kernel blocks in process artifacts
464
+ - expose kernel validation results in run artifacts and packets
465
+ - make `structure_kernel` a real validation lane in `gate`
466
+
467
+ ### Phase 3
468
+
469
+ - add explicit CLI surfaces such as:
470
+ - `orp kernel validate`
471
+ - `orp kernel explain`
472
+ - `orp kernel promote`
473
+
474
+ These should be optional helpers, not mandatory user-facing prompt wrappers.
475
+
476
+ ## Design Discipline
477
+
478
+ The kernel should enter ORP operationally, not metaphysically.
479
+
480
+ Start with:
481
+
482
+ - a small number of artifact classes
483
+ - a small number of required fields
484
+ - a promotion test
485
+ - clear validation semantics
486
+
487
+ Do not begin by trying to encode all human reasoning.
488
+
489
+ ## Canonical Statement
490
+
491
+ The clean ORP statement for this model is:
492
+
493
+ The ORP Reasoning Kernel defines the shape of truth, while canonical artifacts
494
+ remain the source of truth.
495
+
496
+ Or, in product language:
497
+
498
+ Natural language at the boundary. Kernel structure at promotion. Canonical
499
+ artifacts at the core.
@@ -0,0 +1,197 @@
1
+ {
2
+ "schema_version": "1.0.0",
3
+ "kind": "orp_reasoning_kernel_validation_report",
4
+ "metadata": {
5
+ "generated_at_utc": "2026-03-23T04:42:53Z",
6
+ "repo_commit": "5c87faf4fbd54d203cc0ca05683544355c306d55",
7
+ "repo_branch": "main",
8
+ "package_version": "0.4.7",
9
+ "python_version": "3.9.6",
10
+ "node_version": "v24.10.0",
11
+ "platform": "macOS-26.3-arm64-arm-64bit"
12
+ },
13
+ "benchmarks": {
14
+ "init_starter_kernel": {
15
+ "iterations": 5,
16
+ "observed": {
17
+ "init": {
18
+ "mean_ms": 245.853,
19
+ "median_ms": 242.029,
20
+ "min_ms": 239.454,
21
+ "max_ms": 257.57
22
+ },
23
+ "validate": {
24
+ "mean_ms": 169.097,
25
+ "median_ms": 167.938,
26
+ "min_ms": 165.273,
27
+ "max_ms": 173.245
28
+ },
29
+ "gate_run": {
30
+ "mean_ms": 242.618,
31
+ "median_ms": 239.599,
32
+ "min_ms": 238.174,
33
+ "max_ms": 252.913
34
+ }
35
+ },
36
+ "targets": {
37
+ "init_mean_lt_ms": 350.0,
38
+ "validate_mean_lt_ms": 200.0,
39
+ "gate_mean_lt_ms": 300.0
40
+ },
41
+ "meets_targets": {
42
+ "init": true,
43
+ "validate": true,
44
+ "gate_run": true
45
+ },
46
+ "sample_run_records": [
47
+ "orp/artifacts/run-20260323-044247-956825/RUN.json",
48
+ "orp/artifacts/run-20260323-044248-621472/RUN.json"
49
+ ]
50
+ },
51
+ "artifact_roundtrip": {
52
+ "artifact_classes_total": 7,
53
+ "rows": [
54
+ {
55
+ "artifact_class": "task",
56
+ "scaffold_ms": 162.963,
57
+ "validate_ms": 161.02
58
+ },
59
+ {
60
+ "artifact_class": "decision",
61
+ "scaffold_ms": 162.639,
62
+ "validate_ms": 161.466
63
+ },
64
+ {
65
+ "artifact_class": "hypothesis",
66
+ "scaffold_ms": 162.337,
67
+ "validate_ms": 165.228
68
+ },
69
+ {
70
+ "artifact_class": "experiment",
71
+ "scaffold_ms": 171.011,
72
+ "validate_ms": 160.825
73
+ },
74
+ {
75
+ "artifact_class": "checkpoint",
76
+ "scaffold_ms": 161.705,
77
+ "validate_ms": 163.51
78
+ },
79
+ {
80
+ "artifact_class": "policy",
81
+ "scaffold_ms": 160.807,
82
+ "validate_ms": 163.85
83
+ },
84
+ {
85
+ "artifact_class": "result",
86
+ "scaffold_ms": 163.882,
87
+ "validate_ms": 162.509
88
+ }
89
+ ],
90
+ "observed": {
91
+ "scaffold": {
92
+ "mean_ms": 163.621,
93
+ "median_ms": 162.639,
94
+ "min_ms": 160.807,
95
+ "max_ms": 171.011
96
+ },
97
+ "validate": {
98
+ "mean_ms": 162.63,
99
+ "median_ms": 162.509,
100
+ "min_ms": 160.825,
101
+ "max_ms": 165.228
102
+ }
103
+ },
104
+ "targets": {
105
+ "scaffold_mean_lt_ms": 200.0,
106
+ "validate_mean_lt_ms": 200.0
107
+ },
108
+ "meets_targets": {
109
+ "scaffold": true,
110
+ "validate": true
111
+ }
112
+ },
113
+ "gate_modes": {
114
+ "hard_mode": {
115
+ "ms": 174.339,
116
+ "exit_code": 1,
117
+ "overall": "FAIL",
118
+ "kernel_valid": false,
119
+ "missing_fields": [
120
+ "constraints",
121
+ "success_criteria"
122
+ ]
123
+ },
124
+ "soft_mode": {
125
+ "ms": 173.082,
126
+ "exit_code": 0,
127
+ "overall": "PASS",
128
+ "kernel_valid": false
129
+ },
130
+ "legacy_compatibility": {
131
+ "ms": 172.431,
132
+ "exit_code": 0,
133
+ "overall": "PASS",
134
+ "has_kernel_validation": false
135
+ },
136
+ "meets_expectations": {
137
+ "hard_blocks_invalid_artifact": true,
138
+ "soft_allows_invalid_artifact_with_advisory": true,
139
+ "legacy_structure_kernel_remains_compatible": true
140
+ }
141
+ }
142
+ },
143
+ "claims": [
144
+ {
145
+ "id": "starter_kernel_bootstrap",
146
+ "claim": "orp init seeds a valid starter kernel artifact and a passing default structure_kernel gate.",
147
+ "status": "pass",
148
+ "evidence": [
149
+ "benchmarks.init_starter_kernel",
150
+ "cli/orp.py",
151
+ "tests/test_orp_init.py"
152
+ ]
153
+ },
154
+ {
155
+ "id": "typed_artifact_roundtrip",
156
+ "claim": "All seven v0.1 artifact classes can be scaffolded and validated through the CLI.",
157
+ "status": "pass",
158
+ "evidence": [
159
+ "benchmarks.artifact_roundtrip",
160
+ "spec/v1/kernel.schema.json",
161
+ "tests/test_orp_kernel.py"
162
+ ]
163
+ },
164
+ {
165
+ "id": "promotion_enforcement_modes",
166
+ "claim": "Hard mode blocks invalid promotable artifacts, while soft mode records advisory issues without blocking.",
167
+ "status": "pass",
168
+ "evidence": [
169
+ "benchmarks.gate_modes",
170
+ "tests/test_orp_kernel.py"
171
+ ]
172
+ },
173
+ {
174
+ "id": "legacy_structure_kernel_compatibility",
175
+ "claim": "Existing structure_kernel gates without explicit kernel config remain compatible.",
176
+ "status": "pass",
177
+ "evidence": [
178
+ "benchmarks.gate_modes",
179
+ "cli/orp.py"
180
+ ]
181
+ },
182
+ {
183
+ "id": "local_cli_kernel_ergonomics",
184
+ "claim": "One-shot kernel CLI operations remain within human-scale local ergonomics targets on the reference machine.",
185
+ "status": "pass",
186
+ "evidence": [
187
+ "benchmarks.init_starter_kernel",
188
+ "benchmarks.artifact_roundtrip"
189
+ ]
190
+ }
191
+ ],
192
+ "summary": {
193
+ "all_claims_pass": true,
194
+ "artifact_classes_total": 7,
195
+ "all_performance_targets_met": true
196
+ }
197
+ }
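The `artifact_roundtrip` summary in the report above can be cross-checked from its own rows. The following is a small sketch, not part of the package: the `summarize` helper is a hypothetical name, and the timings are copied verbatim from the `rows` array.

```python
from statistics import mean, median
from typing import Dict, List

# (artifact_class, scaffold_ms, validate_ms), copied from the report's "rows" array.
ROWS = [
    ("task", 162.963, 161.02),
    ("decision", 162.639, 161.466),
    ("hypothesis", 162.337, 165.228),
    ("experiment", 171.011, 160.825),
    ("checkpoint", 161.705, 163.51),
    ("policy", 160.807, 163.85),
    ("result", 163.882, 162.509),
]

# Mean targets from the report's "targets" block.
TARGETS_MS = {"scaffold": 200.0, "validate": 200.0}


def summarize(samples: List[float]) -> Dict[str, float]:
    """Recompute the observed statistics, rounded to three decimals like the report."""
    return {
        "mean_ms": round(mean(samples), 3),
        "median_ms": round(median(samples), 3),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


observed = {
    "scaffold": summarize([row[1] for row in ROWS]),
    "validate": summarize([row[2] for row in ROWS]),
}
meets_targets = {name: observed[name]["mean_ms"] < TARGETS_MS[name] for name in observed}

print(observed)
print(meets_targets)
```

Recomputing from the rows reproduces the reported means (163.621 ms for scaffold, 162.63 ms for validate) and medians, and both means sit under the 200 ms targets, which agrees with the `meets_targets` block.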
@@ -8,6 +8,8 @@ These files are intentionally **minimal** and **illustrative**.
8
8
 
9
9
  Additional v1 runtime draft examples:
10
10
 
11
+ - `orp.reasoning-kernel.starter.yml` — minimal kernel-aware profile showing a real `structure_kernel` gate.
12
+ - `kernel/trace-widget.task.kernel.yml` — example typed kernel artifact for a promotable task.
11
13
  - `orp.sunflower-coda.atomic.yml` — discovery-first profile for atomic board workflows.
12
14
  - `orp.sunflower-coda.live-compare.yml` — side-by-side gate-compare profiles for sunflower Problems 857/20/367.
13
15
  - `orp.sunflower-coda.pr-governance.yml` — local-first PR governance profile set (pre-open, draft-readiness, full flow).
@@ -0,0 +1,18 @@
1
+ schema_version: "1.0.0"
2
+ artifact_class: task
3
+ object: terminal trace widget
4
+ goal: reveal lane state and drift fast enough for a human operator to act
5
+ boundary:
6
+ - terminal-first orchestration workflow
7
+ - session and lane visibility only
8
+ constraints:
9
+ - low friction in the terminal
10
+ - must not alter lane execution
11
+ - should remain useful without GUI affordances
12
+ success_criteria:
13
+ - operator can identify a stalled or drifting lane within ten seconds
14
+ canonical_target:
15
+ - docs/TRACE_WIDGET_SPEC.md
16
+ artifact_refs:
17
+ - docs/TRACE_WIDGET_SPEC.md
18
+ - analysis/trace-widget-sample-output.txt