open-research-protocol 0.4.7 → 0.4.9

Files changed (50)
  1. package/README.md +15 -0
  2. package/cli/orp.py +1158 -43
  3. package/docs/AGENT_LOOP.md +3 -0
  4. package/docs/ORP_REASONING_KERNEL_AGENT_PILOT.md +125 -0
  5. package/docs/ORP_REASONING_KERNEL_AGENT_REPLICATION.md +97 -0
  6. package/docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md +100 -0
  7. package/docs/ORP_REASONING_KERNEL_COMPARISON_PILOT.md +116 -0
  8. package/docs/ORP_REASONING_KERNEL_CONTINUATION_PILOT.md +86 -0
  9. package/docs/ORP_REASONING_KERNEL_EVALUATION_PLAN.md +261 -0
  10. package/docs/ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md +131 -0
  11. package/docs/ORP_REASONING_KERNEL_EVOLUTION.md +123 -0
  12. package/docs/ORP_REASONING_KERNEL_PICKUP_PILOT.md +107 -0
  13. package/docs/ORP_REASONING_KERNEL_TECHNICAL_VALIDATION.md +140 -22
  14. package/docs/ORP_REASONING_KERNEL_V0_1.md +11 -0
  15. package/docs/ORP_YOUTUBE_INSPECT.md +97 -0
  16. package/docs/benchmarks/orp_reasoning_kernel_agent_pilot_v0_1.json +796 -0
  17. package/docs/benchmarks/orp_reasoning_kernel_agent_replication_task_smoke.json +487 -0
  18. package/docs/benchmarks/orp_reasoning_kernel_agent_replication_v0_1.json +1927 -0
  19. package/docs/benchmarks/orp_reasoning_kernel_agent_replication_v0_2.json +10217 -0
  20. package/docs/benchmarks/orp_reasoning_kernel_canonical_continuation_task_smoke.json +174 -0
  21. package/docs/benchmarks/orp_reasoning_kernel_canonical_continuation_v0_1.json +598 -0
  22. package/docs/benchmarks/orp_reasoning_kernel_comparison_v0_1.json +688 -0
  23. package/docs/benchmarks/orp_reasoning_kernel_continuation_task_smoke.json +150 -0
  24. package/docs/benchmarks/orp_reasoning_kernel_continuation_v0_1.json +448 -0
  25. package/docs/benchmarks/orp_reasoning_kernel_pickup_v0_1.json +594 -0
  26. package/docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json +769 -41
  27. package/examples/README.md +2 -0
  28. package/examples/kernel/comparison/comparison-corpus.json +337 -0
  29. package/examples/kernel/comparison/next-task-continuation.json +55 -0
  30. package/examples/kernel/corpus/operations/habanero-routing.checkpoint.kernel.yml +12 -0
  31. package/examples/kernel/corpus/operations/runner-routing.policy.kernel.yml +9 -0
  32. package/examples/kernel/corpus/product/project-home.decision.kernel.yml +11 -0
  33. package/examples/kernel/corpus/research/kernel-handoff.experiment.kernel.yml +16 -0
  34. package/examples/kernel/corpus/research/lane-drift.hypothesis.kernel.yml +11 -0
  35. package/examples/kernel/corpus/software/trace-widget.task.kernel.yml +13 -0
  36. package/examples/kernel/corpus/writing/kernel-launch.result.kernel.yml +12 -0
  37. package/llms.txt +3 -0
  38. package/package.json +4 -1
  39. package/scripts/orp-kernel-agent-pilot.py +673 -0
  40. package/scripts/orp-kernel-agent-replication.py +307 -0
  41. package/scripts/orp-kernel-benchmark.py +471 -2
  42. package/scripts/orp-kernel-canonical-continuation.py +381 -0
  43. package/scripts/orp-kernel-ci-check.py +138 -0
  44. package/scripts/orp-kernel-comparison.py +592 -0
  45. package/scripts/orp-kernel-continuation-pilot.py +384 -0
  46. package/scripts/orp-kernel-pickup.py +401 -0
  47. package/spec/v1/kernel-extension.schema.json +96 -0
  48. package/spec/v1/kernel-proposal.schema.json +115 -0
  49. package/spec/v1/kernel.schema.json +2 -1
  50. package/spec/v1/youtube-source.schema.json +151 -0
package/docs/ORP_REASONING_KERNEL_TECHNICAL_VALIDATION.md +140 -22
@@ -8,6 +8,17 @@ The supporting benchmark artifact for this document is:
8
8
 
9
9
  - [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](/Volumes/Code_2TB/code/orp/docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json)
10
10
 
11
+ For the honest claim-by-claim evidence status and remaining research gaps, see:
12
+
13
+ - [docs/ORP_REASONING_KERNEL_COMPARISON_PILOT.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_COMPARISON_PILOT.md)
14
+ - [docs/ORP_REASONING_KERNEL_PICKUP_PILOT.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_PICKUP_PILOT.md)
15
+ - [docs/ORP_REASONING_KERNEL_AGENT_PILOT.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_AGENT_PILOT.md)
16
+ - [docs/ORP_REASONING_KERNEL_AGENT_REPLICATION.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_AGENT_REPLICATION.md)
17
+ - [docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md)
18
+ - [docs/ORP_REASONING_KERNEL_EVOLUTION.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_EVOLUTION.md)
19
+ - [docs/ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md)
20
+ - [docs/ORP_REASONING_KERNEL_EVALUATION_PLAN.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_EVALUATION_PLAN.md)
21
+
11
22
  ## 1. Definition
12
23
 
13
24
  The ORP Reasoning Kernel is the typed artifact grammar and validation layer
@@ -151,6 +162,9 @@ The kernel currently exposes:
151
162
 
152
163
  - `orp kernel scaffold`
153
164
  - `orp kernel validate`
165
+ - `orp kernel stats`
166
+ - `orp kernel propose`
167
+ - `orp kernel migrate`
154
168
 
155
169
  ### Gate integration
156
170
 
@@ -187,14 +201,22 @@ The harness benchmarks and validates:
187
201
  `orp kernel scaffold` + `orp kernel validate` for every artifact class
188
202
  3. Enforcement path
189
203
  hard mode, soft mode, and legacy compatibility
190
-
191
- The benchmark report was generated on:
192
-
193
- - commit `5c87faf4fbd54d203cc0ca05683544355c306d55`
194
- - package version `0.4.6`
195
- - Python `3.9.6`
196
- - Node `v24.10.0`
197
- - `macOS-26.3-arm64-arm-64bit`
204
+ 4. Cross-domain corpus path
205
+ validate a small reference corpus spanning software, product, research,
206
+ operations, and writing
207
+ 5. Class-specific requirement path
208
+ remove every required field, one at a time, across every artifact class and
209
+ verify rejection
210
+ 6. Schema alignment path
211
+ confirm the CLI validator and published kernel schema stay synchronized
212
+ 7. Representation invariance path
213
+ confirm equivalent YAML and JSON artifacts validate to the same result
214
+ 8. Mutation stress path
215
+ reject adversarial near-miss artifacts such as wrong types, whitespace-only
216
+ text, bad schema metadata, and unexpected fields
217
+
218
+ The precise environment metadata for the current recorded benchmark run lives in
219
+ the benchmark artifact itself.
198
220
 
199
221
  ## 6. What The Benchmarks Show
200
222
 
@@ -202,9 +224,9 @@ The benchmark report was generated on:
202
224
 
203
225
  Reference run, 5 iterations:
204
226
 
205
- - `orp init` mean: `245.958 ms`
206
- - starter `orp kernel validate` mean: `165.837 ms`
207
- - default `orp gate run` mean: `240.768 ms`
227
+ - `orp init` mean: `242.098 ms`
228
+ - starter `orp kernel validate` mean: `162.684 ms`
229
+ - default `orp gate run` mean: `239.282 ms`
208
230
 
209
231
  Interpretation:
210
232
 
@@ -220,8 +242,8 @@ All seven artifact classes successfully scaffolded and validated.
220
242
 
221
243
  Observed means:
222
244
 
223
- - scaffold mean: `157.864 ms`
224
- - validate mean: `156.060 ms`
245
+ - scaffold mean: `161.405 ms`
246
+ - validate mean: `161.641 ms`
225
247
 
226
248
  Interpretation:
227
249
 
@@ -233,9 +255,9 @@ Interpretation:
233
255
 
234
256
  Reference single-run timings:
235
257
 
236
- - hard mode invalid artifact: `164.938 ms`, `FAIL`
237
- - soft mode invalid artifact: `163.174 ms`, `PASS` with advisory invalid state
238
- - legacy compatibility gate: `161.567 ms`, `PASS` without `kernel_validation`
258
+ - hard mode invalid artifact: `172.719 ms`, `FAIL`
259
+ - soft mode invalid artifact: `166.790 ms`, `PASS` with advisory invalid state
260
+ - legacy compatibility gate: `175.379 ms`, `PASS` without `kernel_validation`
239
261
 
240
262
  Interpretation:
241
263
 
@@ -243,27 +265,119 @@ Interpretation:
243
265
  - existing `structure_kernel` surfaces do not regress when no explicit kernel
244
266
  config is present
245
267
 
268
+ ### D. Cross-domain corpus fit
269
+
270
+ Reference corpus run:
271
+
272
+ - fixtures: `7`
273
+ - domains: `5`
274
+ - artifact classes covered: `7`
275
+ - corpus validate mean: `169.879 ms`
276
+
277
+ Interpretation:
278
+
279
+ - The kernel now has a small but explicit cross-domain reference corpus, not
280
+ just abstract cross-domain claims.
281
+ - This does not prove universal fit, but it does show that the current class
282
+ set can represent a concrete spread of software, product, research,
283
+ operations, and writing artifacts cleanly.
284
+
285
+ ### E. Class-specific requirement enforcement
286
+
287
+ Reference enforcement run:
288
+
289
+ - cases: `36`
290
+ - mean validation time: `154.307 ms`
291
+ - every required field across every class triggered rejection when removed
292
+
293
+ Interpretation:
294
+
295
+ - The class requirements are not only documented; they are actively enforced.
296
+ - ORP now has evidence that each current artifact class rejects an incomplete
297
+ candidate when a required field is missing.
298
+
299
+ ### F. Schema-to-validator alignment
300
+
301
+ Reference alignment run:
302
+
303
+ - schema required-field map matches the CLI required-field map
304
+ - schema field set total: `37`
305
+ - CLI field set total: `37`
306
+
307
+ Interpretation:
308
+
309
+ - The validator is now auditable against the published schema rather than
310
+ drifting as a separate undocumented ruleset.
311
+
312
+ ### G. Representation invariance
313
+
314
+ Reference invariance run:
315
+
316
+ - YAML artifact: valid
317
+ - JSON artifact: valid
318
+ - semantic validation result: equivalent
319
+
320
+ Interpretation:
321
+
322
+ - The kernel behaves as a structural protocol rather than a formatting
323
+ preference.
324
+
325
+ ### H. Adversarial mutation detection
326
+
327
+ Reference mutation run:
328
+
329
+ - cases: `7`
330
+ - mean validation time: `152.650 ms`
331
+ - all cases rejected correctly
332
+
333
+ Covered mutations:
334
+
335
+ - unexpected field
336
+ - whitespace-only required text
337
+ - wrong field type
338
+ - non-string list item
339
+ - unsupported artifact class
340
+ - wrong schema version
341
+ - empty required list
342
+
343
+ Interpretation:
344
+
345
+ - The validator now has evidence against adversarial near-miss inputs, not only
346
+ against missing fields.
347
+
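The class-specific requirement check (section E) and the mutation check (section H) can be pictured with a small Python sketch. It removes each required field in turn from a toy artifact and confirms that a toy validator rejects the result. The `REQUIRED` map, the field names, the `artifact_class` key, and `validate` are hypothetical stand-ins for illustration only; they are not ORP's published schema or its CLI validator.

```python
import copy

# Hypothetical required-field map; the real one is defined by ORP's kernel schema.
REQUIRED = {
    "task": ["object", "goal", "boundary"],
    "decision": ["question", "chosen_path"],
}


def validate(artifact):
    """Toy validator: return a list of problems (an empty list means valid)."""
    cls = artifact.get("artifact_class")
    if cls not in REQUIRED:
        return [f"unsupported artifact class: {cls!r}"]
    problems = []
    for field in REQUIRED[cls]:
        value = artifact.get(field)
        # Rejects missing values, wrong types, and whitespace-only text.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty required field: {field}")
    allowed = set(REQUIRED[cls]) | {"artifact_class"}
    for field in artifact:
        if field not in allowed:
            problems.append(f"unexpected field: {field}")
    return problems


def removal_cases(artifact):
    """Yield (field, mutated copy) pairs with one required field removed each."""
    for field in REQUIRED[artifact["artifact_class"]]:
        mutated = copy.deepcopy(artifact)
        del mutated[field]
        yield field, mutated


base = {
    "artifact_class": "task",
    "object": "trace widget",
    "goal": "ship the widget",
    "boundary": "ui only",
}
assert validate(base) == []

# Every single-field removal should be rejected.
rejected = [field for field, case in removal_cases(base) if validate(case)]
print(rejected)
```

If the toy validator is sound, `rejected` lists every required field of the class, which mirrors the "every required field triggered rejection when removed" result reported above.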
246
348
  ## 7. Claims And Evidence
247
349
 
248
- The benchmark report records five claims, all currently passing:
350
+ The benchmark report now records ten claims, all currently passing:
249
351
 
250
- 1. `starter_kernel_bootstrap`
352
+ 1. `schema_validator_alignment`
353
+ The CLI validator stays aligned with the published kernel schema.
354
+ 2. `starter_kernel_bootstrap`
251
355
  ORP seeds a valid starter artifact and a passing default kernel gate.
252
- 2. `typed_artifact_roundtrip`
356
+ 3. `typed_artifact_roundtrip`
253
357
  All seven artifact classes scaffold and validate successfully.
254
- 3. `promotion_enforcement_modes`
358
+ 4. `promotion_enforcement_modes`
255
359
  Hard mode blocks invalid artifacts; soft mode records advisory invalidity.
256
- 4. `legacy_structure_kernel_compatibility`
360
+ 5. `legacy_structure_kernel_compatibility`
257
361
  Older `structure_kernel` gates remain compatible.
258
- 5. `local_cli_kernel_ergonomics`
362
+ 6. `local_cli_kernel_ergonomics`
259
363
  One-shot kernel operations remain within human-scale local latency
260
364
  thresholds on the reference machine.
365
+ 7. `cross_domain_corpus_fit`
366
+ The current kernel class set fits a small cross-domain reference corpus
367
+ cleanly.
368
+ 8. `class_specific_requirement_enforcement`
369
+ Each artifact class rejects a candidate when a required field is removed.
370
+ 9. `representation_invariance`
371
+ Equivalent YAML and JSON artifacts validate to the same semantic result.
372
+ 10. `adversarial_mutation_detection`
373
+ The validator rejects adversarial near-miss artifacts.
261
374
 
262
375
  These claims are backed by:
263
376
 
264
377
  - [tests/test_orp_kernel.py](/Volumes/Code_2TB/code/orp/tests/test_orp_kernel.py)
265
378
  - [tests/test_orp_init.py](/Volumes/Code_2TB/code/orp/tests/test_orp_init.py)
266
379
  - [tests/test_orp_kernel_benchmark.py](/Volumes/Code_2TB/code/orp/tests/test_orp_kernel_benchmark.py)
380
+ - [tests/test_orp_kernel_corpus.py](/Volumes/Code_2TB/code/orp/tests/test_orp_kernel_corpus.py)
267
381
  - [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](/Volumes/Code_2TB/code/orp/docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json)
268
382
 
269
383
  ## 8. Why This Applies To All Project Types
@@ -347,6 +461,10 @@ The current evidence supports that claim:
347
461
  - it works across all current artifact classes
348
462
  - it enforces hard vs soft promotion semantics correctly
349
463
  - it preserves compatibility with pre-kernel `structure_kernel` gates
464
+ - it stays aligned with the published schema
465
+ - it fits a small cross-domain reference corpus
466
+ - it behaves consistently across YAML and JSON
467
+ - it rejects malformed near-miss artifacts
350
468
  - it stays within human-scale local CLI latency targets
351
469
 
352
470
  That makes it a good `v0.1` kernel: minimal, general, validated, and already
package/docs/ORP_REASONING_KERNEL_V0_1.md +11 -0
@@ -15,6 +15,16 @@ For the supporting benchmark evidence and alternatives analysis behind this
15
15
  design, see
16
16
  [docs/ORP_REASONING_KERNEL_TECHNICAL_VALIDATION.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_TECHNICAL_VALIDATION.md).
17
17
 
18
+ For the explicit evidence gaps and next comparative experiments, see:
19
+
20
+ - [docs/ORP_REASONING_KERNEL_COMPARISON_PILOT.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_COMPARISON_PILOT.md)
21
+ - [docs/ORP_REASONING_KERNEL_PICKUP_PILOT.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_PICKUP_PILOT.md)
22
+ - [docs/ORP_REASONING_KERNEL_AGENT_PILOT.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_AGENT_PILOT.md)
23
+ - [docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md)
24
+ - [docs/ORP_REASONING_KERNEL_EVOLUTION.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_EVOLUTION.md)
25
+ - [docs/ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md)
26
+ - [docs/ORP_REASONING_KERNEL_EVALUATION_PLAN.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_EVALUATION_PLAN.md)
27
+
18
28
  It should make three things true at once:
19
29
 
20
30
  - humans can speak naturally at the boundary
@@ -441,6 +451,7 @@ Rust and web should reflect the kernel, not redefine it.
441
451
  That means:
442
452
 
443
453
  - kernel schema and validation rules belong in the CLI
454
+ - kernel observation, proposal, and migration rules belong in the CLI
444
455
  - Rust may expose kernel views, prompts, or editing affordances
445
456
  - web may expose kernel-backed artifact cards and review surfaces
446
457
  - neither Rust nor web should invent competing kernel semantics
package/docs/ORP_YOUTUBE_INSPECT.md +97 -0
@@ -0,0 +1,97 @@
1
+ # ORP YouTube Inspect
2
+
3
+ `orp youtube inspect` is ORP's first-class public-source ingestion surface for
4
+ YouTube videos.
5
+
6
+ It gives agents and users a stable way to turn a YouTube link into:
7
+
8
+ - normalized video metadata,
9
+ - public caption transcript text when available,
10
+ - segment-level timing rows,
11
+ - and one agent-friendly `text_bundle` field that can be handed directly into
12
+ summarization, extraction, comparison, or kernel-shaped artifact creation.
13
+
14
+ ## Why this exists
15
+
16
+ Agents often receive a raw YouTube URL and are asked:
17
+
18
+ - what is this video about?
19
+ - summarize it,
20
+ - extract claims,
21
+ - capture action items,
22
+ - compare it against repo work,
23
+ - or turn it into a canonical ORP artifact.
24
+
25
+ Without a built-in surface, each agent has to improvise scraping, transcript
26
+ discovery, and output shape. ORP now treats this as a real protocol ability.
27
+
28
+ ## Command
29
+
30
+ ```bash
31
+ orp youtube inspect https://www.youtube.com/watch?v=<video_id> --json
32
+ ```
33
+
34
+ Optional persistence:
35
+
36
+ ```bash
37
+ orp youtube inspect https://www.youtube.com/watch?v=<video_id> --save --json
38
+ orp youtube inspect https://www.youtube.com/watch?v=<video_id> --out analysis/source.youtube.json --json
39
+ ```
40
+
41
+ ## Output shape
42
+
43
+ The canonical artifact schema is:
44
+
45
+ - `spec/v1/youtube-source.schema.json`
46
+
47
+ The command returns:
48
+
49
+ - source identity:
50
+ - `source_url`
51
+ - `canonical_url`
52
+ - `video_id`
53
+ - metadata:
54
+ - `title`
55
+ - `author_name`
56
+ - `author_url`
57
+ - `thumbnail_url`
58
+ - `channel_id`
59
+ - `description`
60
+ - `duration_seconds`
61
+ - `published_at`
62
+ - `playability_status`
63
+ - transcript fields:
64
+ - `transcript_available`
65
+ - `transcript_language`
66
+ - `transcript_track_name`
67
+ - `transcript_kind`
68
+ - `transcript_fetch_mode`
69
+ - `transcript_text`
70
+ - `transcript_segments`
71
+ - agent-ready bundle:
72
+ - `text_bundle`
73
+ - capture notes:
74
+ - `warnings`
75
+
76
+ ## Save behavior
77
+
78
+ `--save` writes the artifact to:
79
+
80
+ ```text
81
+ orp/external/youtube/<video_id>.json
82
+ ```
83
+
84
+ This keeps YouTube ingestion consistent with ORP's larger local-first artifact
85
+ discipline while staying outside the evidence boundary by default.
86
+
87
+ ## Important boundary
88
+
89
+ `orp youtube inspect` returns public source context. It does **not** make the
90
+ result canonical evidence by itself.
91
+
92
+ If a video matters for repo truth, the agent should still:
93
+
94
+ 1. inspect the video,
95
+ 2. summarize or structure the relevant claims,
96
+ 3. promote that into a typed ORP artifact when appropriate,
97
+ 4. and cite the saved source artifact path alongside any downstream result.
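A downstream script might sanity-check a saved source artifact before citing it. The Python sketch below assumes the fields listed under "Output shape" appear as top-level keys of the saved JSON, which this document does not confirm. `load_source` and `summarize_source` are hypothetical helper names, and the sample values (including the warning text) are illustrative only.

```python
import json
from pathlib import Path

# Field names come from the "Output shape" list; a flat top-level layout is assumed.
IDENTITY_FIELDS = ("source_url", "canonical_url", "video_id")


def load_source(path):
    """Read a saved YouTube source artifact (JSON) from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_source(artifact):
    """Report missing identity fields, transcript availability, and warnings."""
    missing = [name for name in IDENTITY_FIELDS if not artifact.get(name)]
    return {
        "missing": missing,
        "has_transcript": bool(artifact.get("transcript_available")),
        "warnings": list(artifact.get("warnings") or []),
    }


# Illustrative in-memory artifact; a real one would come from load_source(...).
sample = {
    "source_url": "https://www.youtube.com/watch?v=<video_id>",
    "canonical_url": "https://www.youtube.com/watch?v=<video_id>",
    "video_id": "",
    "transcript_available": False,
    "warnings": ["no public captions found"],
}
report = summarize_source(sample)
print(report)
```

For an artifact written with `--save`, the same check would run as `summarize_source(load_source("orp/external/youtube/<video_id>.json"))`, with the real video ID substituted into the path.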