npm - open-research-protocol - Versions diffs - 0.4.7 → 0.4.9 - Mend

open-research-protocol 0.4.7 → 0.4.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (50)

package/README.md +15 -0
package/cli/orp.py +1158 -43
package/docs/AGENT_LOOP.md +3 -0
package/docs/ORP_REASONING_KERNEL_AGENT_PILOT.md +125 -0
package/docs/ORP_REASONING_KERNEL_AGENT_REPLICATION.md +97 -0
package/docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md +100 -0
package/docs/ORP_REASONING_KERNEL_COMPARISON_PILOT.md +116 -0
package/docs/ORP_REASONING_KERNEL_CONTINUATION_PILOT.md +86 -0
package/docs/ORP_REASONING_KERNEL_EVALUATION_PLAN.md +261 -0
package/docs/ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md +131 -0
package/docs/ORP_REASONING_KERNEL_EVOLUTION.md +123 -0
package/docs/ORP_REASONING_KERNEL_PICKUP_PILOT.md +107 -0
package/docs/ORP_REASONING_KERNEL_TECHNICAL_VALIDATION.md +140 -22
package/docs/ORP_REASONING_KERNEL_V0_1.md +11 -0
package/docs/ORP_YOUTUBE_INSPECT.md +97 -0
package/docs/benchmarks/orp_reasoning_kernel_agent_pilot_v0_1.json +796 -0
package/docs/benchmarks/orp_reasoning_kernel_agent_replication_task_smoke.json +487 -0
package/docs/benchmarks/orp_reasoning_kernel_agent_replication_v0_1.json +1927 -0
package/docs/benchmarks/orp_reasoning_kernel_agent_replication_v0_2.json +10217 -0
package/docs/benchmarks/orp_reasoning_kernel_canonical_continuation_task_smoke.json +174 -0
package/docs/benchmarks/orp_reasoning_kernel_canonical_continuation_v0_1.json +598 -0
package/docs/benchmarks/orp_reasoning_kernel_comparison_v0_1.json +688 -0
package/docs/benchmarks/orp_reasoning_kernel_continuation_task_smoke.json +150 -0
package/docs/benchmarks/orp_reasoning_kernel_continuation_v0_1.json +448 -0
package/docs/benchmarks/orp_reasoning_kernel_pickup_v0_1.json +594 -0
package/docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json +769 -41
package/examples/README.md +2 -0
package/examples/kernel/comparison/comparison-corpus.json +337 -0
package/examples/kernel/comparison/next-task-continuation.json +55 -0
package/examples/kernel/corpus/operations/habanero-routing.checkpoint.kernel.yml +12 -0
package/examples/kernel/corpus/operations/runner-routing.policy.kernel.yml +9 -0
package/examples/kernel/corpus/product/project-home.decision.kernel.yml +11 -0
package/examples/kernel/corpus/research/kernel-handoff.experiment.kernel.yml +16 -0
package/examples/kernel/corpus/research/lane-drift.hypothesis.kernel.yml +11 -0
package/examples/kernel/corpus/software/trace-widget.task.kernel.yml +13 -0
package/examples/kernel/corpus/writing/kernel-launch.result.kernel.yml +12 -0
package/llms.txt +3 -0
package/package.json +4 -1
package/scripts/orp-kernel-agent-pilot.py +673 -0
package/scripts/orp-kernel-agent-replication.py +307 -0
package/scripts/orp-kernel-benchmark.py +471 -2
package/scripts/orp-kernel-canonical-continuation.py +381 -0
package/scripts/orp-kernel-ci-check.py +138 -0
package/scripts/orp-kernel-comparison.py +592 -0
package/scripts/orp-kernel-continuation-pilot.py +384 -0
package/scripts/orp-kernel-pickup.py +401 -0
package/spec/v1/kernel-extension.schema.json +96 -0
package/spec/v1/kernel-proposal.schema.json +115 -0
package/spec/v1/kernel.schema.json +2 -1
package/spec/v1/youtube-source.schema.json +151 -0

package/docs/ORP_REASONING_KERNEL_TECHNICAL_VALIDATION.md CHANGED

@@ -8,6 +8,17 @@ The supporting benchmark artifact for this document is:
 - [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](/Volumes/Code_2TB/code/orp/docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json)
+For the honest claim-by-claim evidence status and remaining research gaps, see:
+- [docs/ORP_REASONING_KERNEL_COMPARISON_PILOT.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_COMPARISON_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_PICKUP_PILOT.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_PICKUP_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_AGENT_PILOT.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_AGENT_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_AGENT_REPLICATION.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_AGENT_REPLICATION.md)
+- [docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_EVOLUTION.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_EVOLUTION.md)
+- [docs/ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md)
+- [docs/ORP_REASONING_KERNEL_EVALUATION_PLAN.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_EVALUATION_PLAN.md)
 ## 1. Definition
 The ORP Reasoning Kernel is the typed artifact grammar and validation layer
@@ -151,6 +162,9 @@ The kernel currently exposes:
 - `orp kernel scaffold`
 - `orp kernel validate`
+- `orp kernel stats`
+- `orp kernel propose`
+- `orp kernel migrate`
 ### Gate integration
@@ -187,14 +201,22 @@ The harness benchmarks and validates:
    `orp kernel scaffold` + `orp kernel validate` for every artifact class
 3. Enforcement path
    hard mode, soft mode, and legacy compatibility
-The benchmark report was generated on:
-- commit `5c87faf4fbd54d203cc0ca05683544355c306d55`
-- package version `0.4.6`
-- Python `3.9.6`
-- Node `v24.10.0`
-- `macOS-26.3-arm64-arm-64bit`
+4. Cross-domain corpus path
+   validate a small reference corpus spanning software, product, research,
+   operations, and writing
+5. Class-specific requirement path
+   remove every required field, one at a time, across every artifact class and
+   verify rejection
+6. Schema alignment path
+   confirm the CLI validator and published kernel schema stay synchronized
+7. Representation invariance path
+   confirm equivalent YAML and JSON artifacts validate to the same result
+8. Mutation stress path
+   reject adversarial near-miss artifacts such as wrong types, whitespace-only
+   text, bad schema metadata, and unexpected fields
+The precise environment metadata for the current recorded benchmark run lives in
+the benchmark artifact itself.
 ## 6. What The Benchmarks Show
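Reviewer sketch for the class-specific requirement path added in this hunk (remove every required field, one at a time, and verify rejection). It is a minimal illustration, not the package's harness. `REQUIRED_FIELDS`, `validate`, `removal_cases`, and `run_requirement_path` are hypothetical names. The `task` and `decision` class names echo the corpus fixture file names, but their field lists are invented placeholders rather than the real kernel schema.

```python
from copy import deepcopy

# Hypothetical required-field map; the real one lives in the kernel schema.
REQUIRED_FIELDS = {
    "task": ["object", "goal", "done_when"],
    "decision": ["question", "chosen_path", "rationale"],
}


def validate(artifact: dict) -> list[str]:
    """Return a list of problems; an empty list means the artifact is valid."""
    cls = artifact.get("artifact_class")
    if cls not in REQUIRED_FIELDS:
        return [f"unsupported artifact class: {cls!r}"]
    return [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS[cls]
        if name not in artifact
    ]


def removal_cases(complete: dict):
    """Yield (field, candidate) pairs; each candidate lacks exactly one required field."""
    for name in REQUIRED_FIELDS[complete["artifact_class"]]:
        candidate = deepcopy(complete)
        del candidate[name]
        yield name, candidate


def run_requirement_path(fixtures: list[dict]) -> dict:
    """Count single-field removals and how many of them the validator rejects."""
    cases = rejected = 0
    for fixture in fixtures:
        assert validate(fixture) == [], "fixtures must start out valid"
        for _, candidate in removal_cases(fixture):
            cases += 1
            rejected += bool(validate(candidate))
    return {"cases": cases, "rejected": rejected}


# Placeholder fixtures, one per toy class.
FIXTURES = [
    {"artifact_class": "task", "object": "trace widget",
     "goal": "render traces", "done_when": "tests pass"},
    {"artifact_class": "decision", "question": "where does project home live?",
     "chosen_path": "repo root", "rationale": "simplest"},
]

report = run_requirement_path(FIXTURES)
print(report)  # {'cases': 6, 'rejected': 6}
```

The path passes only when `rejected` equals `cases`. The six cases here come from the placeholder field lists; the recorded run in this diff reports 36 cases.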
@@ -202,9 +224,9 @@ The benchmark report was generated on:
 Reference run, 5 iterations:
-- `orp init` mean: `245.958 ms`
-- starter `orp kernel validate` mean: `165.837 ms`
-- default `orp gate run` mean: `240.768 ms`
+- `orp init` mean: `242.098 ms`
+- starter `orp kernel validate` mean: `162.684 ms`
+- default `orp gate run` mean: `239.282 ms`
 Interpretation:
@@ -220,8 +242,8 @@ All seven artifact classes successfully scaffolded and validated.
 Observed means:
-- scaffold mean: `157.864 ms`
-- validate mean: `156.060 ms`
+- scaffold mean: `161.405 ms`
+- validate mean: `161.641 ms`
 Interpretation:
@@ -233,9 +255,9 @@ Interpretation:
 Reference single-run timings:
-- hard mode invalid artifact: `164.938 ms`, `FAIL`
-- soft mode invalid artifact: `163.174 ms`, `PASS` with advisory invalid state
-- legacy compatibility gate: `161.567 ms`, `PASS` without `kernel_validation`
+- hard mode invalid artifact: `172.719 ms`, `FAIL`
+- soft mode invalid artifact: `166.790 ms`, `PASS` with advisory invalid state
+- legacy compatibility gate: `175.379 ms`, `PASS` without `kernel_validation`
 Interpretation:
@@ -243,27 +265,119 @@ Interpretation:
 - existing `structure_kernel` surfaces do not regress when no explicit kernel
   config is present
+### D. Cross-domain corpus fit
+Reference corpus run:
+- fixtures: `7`
+- domains: `5`
+- artifact classes covered: `7`
+- corpus validate mean: `169.879 ms`
+Interpretation:
+- The kernel now has a small but explicit cross-domain reference corpus, not
+  just abstract cross-domain claims.
+- This does not prove universal fit, but it does show that the current class
+  set can represent a concrete spread of software, product, research,
+  operations, and writing artifacts cleanly.
+### E. Class-specific requirement enforcement
+Reference enforcement run:
+- cases: `36`
+- mean validation time: `154.307 ms`
+- every required field across every class triggered rejection when removed
+Interpretation:
+- The class requirements are not only documented; they are actively enforced.
+- ORP now has evidence that each current artifact class rejects an incomplete
+  candidate when a required field is missing.
+### F. Schema-to-validator alignment
+Reference alignment run:
+- schema required-field map matches the CLI required-field map
+- schema field set total: `37`
+- CLI field set total: `37`
+Interpretation:
+- The validator is now auditable against the published schema rather than
+  drifting as a separate undocumented ruleset.
+### G. Representation invariance
+Reference invariance run:
+- YAML artifact: valid
+- JSON artifact: valid
+- semantic validation result: equivalent
+Interpretation:
+- The kernel behaves as a structural protocol rather than a formatting
+  preference.
+### H. Adversarial mutation detection
+Reference mutation run:
+- cases: `7`
+- mean validation time: `152.650 ms`
+- all cases rejected correctly
+Covered mutations:
+- unexpected field
+- whitespace-only required text
+- wrong field type
+- non-string list item
+- unsupported artifact class
+- wrong schema version
+- empty required list
+Interpretation:
+- The validator now has evidence against adversarial near-miss inputs, not only
+  against missing fields.
 ## 7. Claims And Evidence
-The benchmark report records five claims, all currently passing:
+The benchmark report now records ten claims, all currently passing:
-1. `starter_kernel_bootstrap`
+1. `schema_validator_alignment`
+   The CLI validator stays aligned with the published kernel schema.
+2. `starter_kernel_bootstrap`
    ORP seeds a valid starter artifact and a passing default kernel gate.
-2. `typed_artifact_roundtrip`
+3. `typed_artifact_roundtrip`
    All seven artifact classes scaffold and validate successfully.
-3. `promotion_enforcement_modes`
+4. `promotion_enforcement_modes`
    Hard mode blocks invalid artifacts; soft mode records advisory invalidity.
-4. `legacy_structure_kernel_compatibility`
+5. `legacy_structure_kernel_compatibility`
    Older `structure_kernel` gates remain compatible.
-5. `local_cli_kernel_ergonomics`
+6. `local_cli_kernel_ergonomics`
    One-shot kernel operations remain within human-scale local latency
    thresholds on the reference machine.
+7. `cross_domain_corpus_fit`
+   The current kernel class set fits a small cross-domain reference corpus
+   cleanly.
+8. `class_specific_requirement_enforcement`
+   Each artifact class rejects a candidate when a required field is removed.
+9. `representation_invariance`
+   Equivalent YAML and JSON artifacts validate to the same semantic result.
+10. `adversarial_mutation_detection`
+   The validator rejects adversarial near-miss artifacts.
 These claims are backed by:
 - [tests/test_orp_kernel.py](/Volumes/Code_2TB/code/orp/tests/test_orp_kernel.py)
 - [tests/test_orp_init.py](/Volumes/Code_2TB/code/orp/tests/test_orp_init.py)
 - [tests/test_orp_kernel_benchmark.py](/Volumes/Code_2TB/code/orp/tests/test_orp_kernel_benchmark.py)
+- [tests/test_orp_kernel_corpus.py](/Volumes/Code_2TB/code/orp/tests/test_orp_kernel_corpus.py)
 - [docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json](/Volumes/Code_2TB/code/orp/docs/benchmarks/orp_reasoning_kernel_v0_1_validation.json)
 ## 8. Why This Applies To All Project Types
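Reviewer sketch for the seven mutations listed under section H of this hunk. It shows one way a validator could reject each near-miss by patching a known-good artifact. Everything here is an assumption for illustration: the schema version string, the single toy class, the field names, and the `validate` function are hypothetical and do not reflect the actual `orp kernel validate` implementation.

```python
# Toy schema; all values are assumed, not taken from kernel.schema.json.
SCHEMA_VERSION = "1.0.0"
ARTIFACT_CLASSES = {"task"}
META_FIELDS = {"schema_version", "artifact_class"}
TEXT_FIELDS = {"object", "goal"}      # required, non-blank strings
LIST_FIELDS = {"constraints"}         # required, non-empty lists of strings


def validate(artifact: dict) -> list[str]:
    """Return a list of problems; an empty list means the artifact is valid."""
    problems = []
    if artifact.get("schema_version") != SCHEMA_VERSION:
        problems.append("wrong schema version")
    if artifact.get("artifact_class") not in ARTIFACT_CLASSES:
        problems.append("unsupported artifact class")
    allowed = META_FIELDS | TEXT_FIELDS | LIST_FIELDS
    for name in sorted(set(artifact) - allowed):
        problems.append(f"unexpected field: {name}")
    for name in sorted(TEXT_FIELDS):
        value = artifact.get(name)
        if not isinstance(value, str):
            problems.append(f"{name}: expected text")
        elif not value.strip():
            problems.append(f"{name}: whitespace-only text")
    for name in sorted(LIST_FIELDS):
        value = artifact.get(name)
        if not isinstance(value, list):
            problems.append(f"{name}: expected list")
        elif not value:
            problems.append(f"{name}: empty required list")
        elif not all(isinstance(item, str) for item in value):
            problems.append(f"{name}: non-string list item")
    return problems


VALID = {
    "schema_version": SCHEMA_VERSION,
    "artifact_class": "task",
    "object": "trace widget",
    "goal": "render traces",
    "constraints": ["no new dependencies"],
}

# One patch per mutation named in the doc; each overrides or adds a single key.
MUTATIONS = {
    "unexpected field": {"extra": "x"},
    "whitespace-only required text": {"goal": "   "},
    "wrong field type": {"object": 42},
    "non-string list item": {"constraints": ["ok", 7]},
    "unsupported artifact class": {"artifact_class": "memo"},
    "wrong schema version": {"schema_version": "9.9.9"},
    "empty required list": {"constraints": []},
}

assert validate(VALID) == []  # the baseline must pass before mutating it
results = {label: validate({**VALID, **patch}) for label, patch in MUTATIONS.items()}
for label, problems in results.items():
    print(f"{label}: {'rejected' if problems else 'ACCEPTED'}")
```

Starting every case from the same valid baseline keeps each rejection attributable to exactly one mutation.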
@@ -347,6 +461,10 @@ The current evidence supports that claim:
 - it works across all current artifact classes
 - it enforces hard vs soft promotion semantics correctly
 - it preserves compatibility with pre-kernel `structure_kernel` gates
+- it stays aligned with the published schema
+- it fits a small cross-domain reference corpus
+- it behaves consistently across YAML and JSON
+- it rejects malformed near-miss artifacts
 - it stays within human-scale local CLI latency targets
 That makes it a good `v0.1` kernel: minimal, general, validated, and already

package/docs/ORP_REASONING_KERNEL_V0_1.md CHANGED

@@ -15,6 +15,16 @@ For the supporting benchmark evidence and alternatives analysis behind this
 design, see
 [docs/ORP_REASONING_KERNEL_TECHNICAL_VALIDATION.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_TECHNICAL_VALIDATION.md).
+For the explicit evidence gaps and next comparative experiments, see:
+- [docs/ORP_REASONING_KERNEL_COMPARISON_PILOT.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_COMPARISON_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_PICKUP_PILOT.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_PICKUP_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_AGENT_PILOT.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_AGENT_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_CANONICAL_CONTINUATION_PILOT.md)
+- [docs/ORP_REASONING_KERNEL_EVOLUTION.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_EVOLUTION.md)
+- [docs/ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_EVIDENCE_MATRIX.md)
+- [docs/ORP_REASONING_KERNEL_EVALUATION_PLAN.md](/Volumes/Code_2TB/code/orp/docs/ORP_REASONING_KERNEL_EVALUATION_PLAN.md)
 It should make three things true at once:
 - humans can speak naturally at the boundary
@@ -441,6 +451,7 @@ Rust and web should reflect the kernel, not redefine it.
 That means:
 - kernel schema and validation rules belong in the CLI
+- kernel observation, proposal, and migration rules belong in the CLI
 - Rust may expose kernel views, prompts, or editing affordances
 - web may expose kernel-backed artifact cards and review surfaces
 - neither Rust nor web should invent competing kernel semantics

package/docs/ORP_YOUTUBE_INSPECT.md ADDED

@@ -0,0 +1,97 @@
+# ORP YouTube Inspect
+`orp youtube inspect` is ORP's first-class public-source ingestion surface for
+YouTube videos.
+It gives agents and users a stable way to turn a YouTube link into:
+- normalized video metadata,
+- public caption transcript text when available,
+- segment-level timing rows,
+- and one agent-friendly `text_bundle` field that can be handed directly into
+  summarization, extraction, comparison, or kernel-shaped artifact creation.
+## Why this exists
+Agents often receive a raw YouTube URL and are asked:
+- what is this video about?
+- summarize it,
+- extract claims,
+- capture action items,
+- compare it against repo work,
+- or turn it into a canonical ORP artifact.
+Without a built-in surface, each agent has to improvise scraping, transcript
+discovery, and output shape. ORP now treats this as a real protocol ability.
+## Command
+```bash
+orp youtube inspect https://www.youtube.com/watch?v=<video_id> --json
+```
+Optional persistence:
+```bash
+orp youtube inspect https://www.youtube.com/watch?v=<video_id> --save --json
+orp youtube inspect https://www.youtube.com/watch?v=<video_id> --out analysis/source.youtube.json --json
+```
+## Output shape
+The canonical artifact schema is:
+- `spec/v1/youtube-source.schema.json`
+The command returns:
+- source identity:
+  - `source_url`
+  - `canonical_url`
+  - `video_id`
+- metadata:
+  - `title`
+  - `author_name`
+  - `author_url`
+  - `thumbnail_url`
+  - `channel_id`
+  - `description`
+  - `duration_seconds`
+  - `published_at`
+  - `playability_status`
+- transcript fields:
+  - `transcript_available`
+  - `transcript_language`
+  - `transcript_track_name`
+  - `transcript_kind`
+  - `transcript_fetch_mode`
+  - `transcript_text`
+  - `transcript_segments`
+- agent-ready bundle:
+  - `text_bundle`
+- capture notes:
+  - `warnings`
+## Save behavior
+`--save` writes the artifact to:
+```text
+orp/external/youtube/<video_id>.json
+```
+This keeps YouTube ingestion consistent with ORP's larger local-first artifact
+discipline while staying outside the evidence boundary by default.
+## Important boundary
+`orp youtube inspect` returns public source context. It does **not** make the
+result canonical evidence by itself.
+If a video matters for repo truth, the agent should still:
+1. inspect the video,
+2. summarize or structure the relevant claims,
+3. promote that into a typed ORP artifact when appropriate,
+4. and cite the saved source artifact path alongside any downstream result.
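Reviewer sketch of how a downstream consumer might use the artifact described in the added doc. The field names (`text_bundle`, `transcript_available`, `transcript_text`, `warnings`, and so on) and the save path come from the doc. The helper names, the fallback order in `pick_agent_text`, and the example record are assumptions for illustration, not the package's behavior or real command output.

```python
import json
from pathlib import Path


def saved_artifact_path(video_id: str, repo_root: str = ".") -> Path:
    """Location the doc gives for `--save`: orp/external/youtube/<video_id>.json."""
    return Path(repo_root) / "orp" / "external" / "youtube" / f"{video_id}.json"


def pick_agent_text(artifact: dict) -> str:
    """Choose the text to hand downstream (assumed preference order)."""
    if artifact.get("text_bundle"):
        return artifact["text_bundle"]
    if artifact.get("transcript_available") and artifact.get("transcript_text"):
        return artifact["transcript_text"]
    # No bundle or transcript: fall back to whatever metadata is present.
    parts = [artifact.get("title"), artifact.get("description")]
    return "\n\n".join(part for part in parts if part)


def source_citation(artifact: dict, repo_root: str = ".") -> dict:
    """Minimal record for citing the saved source artifact next to a downstream result."""
    video_id = artifact["video_id"]
    return {
        "video_id": video_id,
        "canonical_url": artifact.get("canonical_url"),
        "source_artifact": saved_artifact_path(video_id, repo_root).as_posix(),
        "warnings": list(artifact.get("warnings") or []),
    }


# Placeholder record shaped like the documented fields; every value is invented.
example = json.loads("""
{
  "video_id": "VIDEO_ID_PLACEHOLDER",
  "canonical_url": "https://www.youtube.com/watch?v=VIDEO_ID_PLACEHOLDER",
  "title": "Example title",
  "description": "Example description",
  "transcript_available": false,
  "transcript_text": "",
  "text_bundle": "",
  "warnings": ["placeholder warning"]
}
""")

text = pick_agent_text(example)
citation = source_citation(example)
print(text)
print(citation["source_artifact"])
```

Keeping the citation separate from the extracted text mirrors the boundary stated in the doc: the saved artifact is source context, and any promoted ORP artifact should point back to it rather than replace it.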