npm - pi-autoresearch-vkf - Versions diffs - 0.5.0 → 0.5.2 - Mend

pi-autoresearch-vkf 0.5.0 → 0.5.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

package/CHANGELOG.md CHANGED

@@ -1,6 +1,26 @@
 # Changelog
-## 0.5.0 — unreleased
+## 0.5.2
+Prefixed all skill names with `autoresearch-vkf-` to avoid namespace conflicts
+with other tooling. Renamed `knowledge-gather`, `claim-extract`, `claim-verify`,
+`contradiction-miner`, `cross-domain-transfer`, `idea-tournament`,
+`hypothesis-loop`, and `research-report`; all cross-references in the skills,
+the README, and the extension were updated accordingly. No behavior change.
+## 0.5.1
+Fix tool/skill name collisions with pi-autoresearch (both can now load together).
+- Tools `run_experiment` → **`vkf_run_experiment`** and `log_experiment` →
+  **`vkf_log_experiment`** (pi-autoresearch registers the bare names; pi requires
+  globally-unique tool names across loaded extensions).
+- Skill `autoresearch-create` → **`autoresearch-vkf`** (pi-autoresearch ships a
+  skill of the same name). Invoke the loop via the `autoresearch-vkf` skill now.
+- Docs/skills/benchmark updated accordingly. No behavior change.
+## 0.5.0 — first published release
 Self-contained workspace (breaking path change).
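
The 0.5.1 entry says pi requires globally-unique tool names across loaded extensions. The sketch below illustrates that rule only. `ToolRegistry`, `ToolDef`, and the `owner` field are hypothetical names invented for this example; they are not pi's actual API, and pi's real collision handling may differ.

```typescript
// Hypothetical registry, NOT pi's real API. It only illustrates the
// global-uniqueness rule described in the changelog.
type ToolDef = { name: string; owner: string };

class ToolRegistry {
  private readonly tools = new Map<string, ToolDef>();

  // Reject a second registration of the same tool name, whoever owns it.
  register(tool: ToolDef): void {
    const existing = this.tools.get(tool.name);
    if (existing) {
      throw new Error(
        `Tool "${tool.name}" from ${tool.owner} collides with ${existing.owner}`,
      );
    }
    this.tools.set(tool.name, tool);
  }

  // Names of all registered tools, in registration order.
  names(): string[] {
    return Array.from(this.tools.keys());
  }
}
```

Under this assumption, a bare `run_experiment` registered by two extensions would throw, while `run_experiment` and `vkf_run_experiment` can be registered side by side.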

package/README.md CHANGED

@@ -51,7 +51,7 @@ pi install file:/path/to/pi-autoresearch-vkf
 ### Knowledge sources (how ingestion works)
 The extension stores and reasons over knowledge; it does **not** fetch papers
-itself. Gathering is done by the host agent through the `knowledge-gather` skill,
+itself. Gathering is done by the host agent through the `autoresearch-vkf-knowledge-gather` skill,
 using the agent's built-in **`WebSearch` + `WebFetch`** against free, openly
 accessible databases — no API keys, no paid services, no MCP setup:
@@ -74,7 +74,7 @@ In a project you want to optimize:
 optimize the test suite runtime, using the research literature and remembering what works
 ```
-The **autoresearch-create** skill drives it: confirm goal/metric/command → init →
+The **autoresearch-vkf** skill drives it: confirm goal/metric/command → init →
 gather literature → extract & verify claims → loop (recall → experiment →
 write-back) → report. All state lives in one self-contained `.autoresearch-vkf/`
 folder at the project root, so work **survives restarts and context resets**.
@@ -86,12 +86,12 @@ goal ─► recall_memory ─► gather literature ─► remember_claim (candid
    │                                              │
    │                                         verify_claim ──► trusted claims
    ▼                                              │
- hypothesis-loop:  recall ─► pick idea ─► run_experiment ─► log_experiment
+ autoresearch-vkf-hypothesis-loop:  recall ─► pick idea ─► vkf_run_experiment ─► vkf_log_experiment
    │                                                            │
    │                                  writes experiment card back to memory,
    │                                  updates the claim's belief & lifecycle
    ▼
- research-report   (paper → claim → hypothesis → patch → metric Δ → memory update)
+ autoresearch-vkf-research-report   (paper → claim → hypothesis → patch → metric Δ → memory update)
 ```
 ### One self-contained workspace
@@ -135,8 +135,8 @@ verifier — is the defense against **memory poisoning**.
 | `score_ideas` | Rank untested ideas by `EV × feasibility × evidence × novelty × info_gain ÷ cost`. |
 | `find_contradictions` | Mine memory for tensions between claims — each a seed for a novel hypothesis. |
 | `find_transfers` | Cross-domain mechanism search: same *how*, different *where*. |
-| `run_experiment` | Run the measurement command; capture `METRIC name=value`. |
-| `log_experiment` | Record a result, write it back to memory, update belief & lifecycle. |
+| `vkf_run_experiment` | Run the measurement command; capture `METRIC name=value`. |
+| `vkf_log_experiment` | Record a result, write it back to memory, update belief & lifecycle. |
 | `promote_to_global` | Copy a trusted card into the cross-project global memory. |
 | `export_dashboard` | Write browser dashboards: a live progress page + the `vkf html` idea-lineage graph. |
 | `research_status` | Show session experiments + memory lifecycle. |
@@ -145,15 +145,15 @@ verifier — is the defense against **memory poisoning**.
 | Skill | Role |
 |-------|------|
-| `autoresearch-create` | Orchestrator / spine — the entry point. |
-| `knowledge-gather` | Find candidate techniques via WebSearch/WebFetch (arXiv / Semantic Scholar / OpenAlex / GitHub). |
-| `claim-extract` | Distill sources into reusable claim cards. |
-| `claim-verify` | Check citations & codebase fit — the trust layer. |
-| `contradiction-miner` | Turn tensions in memory into novel hypotheses. |
-| `cross-domain-transfer` | Import a mechanism from another field. |
-| `idea-tournament` | Multi-perspective debate to pick the 2–3 ideas worth testing. |
-| `hypothesis-loop` | Pick the next idea and run the smallest falsifying experiment. |
-| `research-report` | The auditable lineage report. |
+| `autoresearch-vkf` | Orchestrator / spine — the entry point. |
+| `autoresearch-vkf-knowledge-gather` | Find candidate techniques via WebSearch/WebFetch (arXiv / Semantic Scholar / OpenAlex / GitHub). |
+| `autoresearch-vkf-claim-extract` | Distill sources into reusable claim cards. |
+| `autoresearch-vkf-claim-verify` | Check citations & codebase fit — the trust layer. |
+| `autoresearch-vkf-contradiction-miner` | Turn tensions in memory into novel hypotheses. |
+| `autoresearch-vkf-cross-domain-transfer` | Import a mechanism from another field. |
+| `autoresearch-vkf-idea-tournament` | Multi-perspective debate to pick the 2–3 ideas worth testing. |
+| `autoresearch-vkf-hypothesis-loop` | Pick the next idea and run the smallest falsifying experiment. |
+| `autoresearch-vkf-research-report` | The auditable lineage report. |
 ### The `.autoresearch-vkf/` workspace
@@ -282,7 +282,7 @@ Verify what will ship first with `npm pack --dry-run`.
 All four planned phases are in: the lean MVP (Phase 1), the **novelty scorer**
 (Phase 2), the **hypothesis-synthesis layer** (Phase 3 — `find_contradictions`,
-`find_transfers`, `idea-tournament`), and **global cross-project memory + the
+`find_transfers`, `autoresearch-vkf-idea-tournament`), and **global cross-project memory + the
 benchmark** (Phase 4).
 Possible next steps:
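
The `score_ideas` row above ranks ideas by `EV × feasibility × evidence × novelty × info_gain ÷ cost`. A minimal sketch of that arithmetic follows. The `IdeaScores` shape, the field names, and the positive-cost guard are assumptions made for illustration; the diff does not show how the extension computes or normalizes these factors.

```typescript
// Hypothetical input shape; the real tool's fields are not shown in this diff.
interface IdeaScores {
  id: string;
  ev: number;          // expected value of the idea
  feasibility: number;
  evidence: number;
  novelty: number;
  infoGain: number;
  cost: number;        // assumed to be strictly positive
}

// Priority = EV × feasibility × evidence × novelty × info_gain ÷ cost.
function priority(idea: IdeaScores): number {
  if (idea.cost <= 0) {
    throw new RangeError("cost must be positive");
  }
  const numerator =
    idea.ev * idea.feasibility * idea.evidence * idea.novelty * idea.infoGain;
  return numerator / idea.cost;
}

// Return a new array sorted from highest to lowest priority.
function rankIdeas(ideas: IdeaScores[]): IdeaScores[] {
  return ideas.slice().sort((a, b) => priority(b) - priority(a));
}
```

Because the factors are multiplied, a zero in any one of them drops the idea to the bottom of the ranking regardless of the others.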

package/extensions/pi-autoresearch-vkf/index.ts CHANGED

@@ -112,7 +112,7 @@ export default function autoresearchExtension(pi: ExtensionAPI): void {
       if (existing) {
         refreshWidget(ctx, root);
         return textResult(
-          `A research session already exists: "${existing.name}".\nSession: ${sp.dir}\nMemory:  ${memoryPaths(root).dir}\nContinue the loop with recall_memory → run_experiment → log_experiment.`,
+          `A research session already exists: "${existing.name}".\nSession: ${sp.dir}\nMemory:  ${memoryPaths(root).dir}\nContinue the loop with recall_memory → vkf_run_experiment → vkf_log_experiment.`,
           { created: false },
         );
       }
@@ -144,7 +144,7 @@ export default function autoresearchExtension(pi: ExtensionAPI): void {
           `Memory bundle: ${memoryPaths(root).dir} ${fresh ? "(new)" : "(existing)"} — profile ${config.memoryProfile}.`,
           `Optimizing ${config.metricName} (${config.direction} is better).`,
           "",
-          "Next: gather literature (knowledge-gather skill) → remember_claim candidates → verify_claim → recall_memory to pick an idea → run_experiment → log_experiment.",
+          "Next: gather literature (autoresearch-vkf-knowledge-gather skill) → remember_claim candidates → verify_claim → recall_memory to pick an idea → vkf_run_experiment → vkf_log_experiment.",
         ].join("\n"),
         { created: true },
       );
@@ -491,7 +491,7 @@ export default function autoresearchExtension(pi: ExtensionAPI): void {
       if (ideas.length === 0) {
         return textResult(
-          "No untested ideas to score. Gather literature (knowledge-gather) and remember_claim some candidates first.",
+          "No untested ideas to score. Gather literature (autoresearch-vkf-knowledge-gather) and remember_claim some candidates first.",
           { ranked: 0 },
         );
       }
@@ -620,7 +620,7 @@ export default function autoresearchExtension(pi: ExtensionAPI): void {
     },
   });
-  // ── run_experiment ─────────────────────────────────────────────────────────
+  // ── vkf_run_experiment ───────────────────────────────────────────────────────
   const RunParams = Type.Object({
     command: Type.Optional(Type.String({ description: "Command to run (via `bash -lc`). Defaults to the session's configured command." })),
     claim_id: Type.Optional(Type.String({ description: "The claim/idea this run is testing, for logging." })),
@@ -629,10 +629,10 @@ export default function autoresearchExtension(pi: ExtensionAPI): void {
   });
   pi.registerTool({
-    name: "run_experiment",
+    name: "vkf_run_experiment",
     label: "Run experiment",
     description:
-      "Run the measurement command and capture its output and any `METRIC name=number` lines. Does not judge or record an outcome — read the metric, then record it with log_experiment.",
+      "Run the measurement command and capture its output and any `METRIC name=number` lines. Does not judge or record an outcome — read the metric, then record it with vkf_log_experiment.",
     parameters: RunParams,
     async execute(_id, params: Static<typeof RunParams>, signal, _onUpdate, ctx): Promise<AgentToolResult<{ code: number; metrics: Record<string, number> }>> {
       const root = resolveRoot(ctx);
@@ -667,7 +667,7 @@ export default function autoresearchExtension(pi: ExtensionAPI): void {
     },
   });
-  // ── log_experiment ─────────────────────────────────────────────────────────
+  // ── vkf_log_experiment ───────────────────────────────────────────────────────
   const LogParams = Type.Object({
     description: Type.String({ description: "What was changed in this experiment, in words." }),
     value: Type.Number({ description: "The metric value obtained." }),
@@ -681,7 +681,7 @@ export default function autoresearchExtension(pi: ExtensionAPI): void {
   });
   pi.registerTool({
-    name: "log_experiment",
+    name: "vkf_log_experiment",
     label: "Log experiment",
     description:
       "Record an experiment's result. Appends to the session log AND writes an experiment card back to the VKF memory (a win OR a loss is durable knowledge), updating the tested claim's belief and lifecycle. This write-back is what lets future runs avoid repeating work.",

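The `vkf_run_experiment` description above says the tool captures `METRIC name=number` lines and returns `metrics: Record<string, number>`. The parsing code itself is not part of this diff, so the function below is only a sketch of one way to do it. `parseMetrics`, the regular expression, and the choice to skip non-numeric values are assumptions.

```typescript
// Hypothetical parser; the extension's actual implementation is not shown.
// Collects lines of the form `METRIC name=number` from command output.
function parseMetrics(output: string): Record<string, number> {
  const metrics: Record<string, number> = {};
  const pattern = /^METRIC\s+([A-Za-z_][\w.-]*)=(\S+)$/;
  for (const line of output.split(/\r?\n/)) {
    const match = pattern.exec(line.trim());
    if (!match) continue;
    const value = Number(match[2]);
    // Assumption: values that do not parse to a finite number are ignored.
    if (Number.isFinite(value)) {
      metrics[match[1]] = value;
    }
  }
  return metrics;
}
```

If a metric name appears more than once, this sketch keeps the last value seen.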
package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "pi-autoresearch-vkf",
-  "version": "0.5.0",
+  "version": "0.5.2",
   "type": "module",
   "description": "Autoresearch with verifiable long-term scientific memory. A pi extension that gathers literature, stores it as VKF claims, runs experiments, and writes verified results back to a git-native knowledge bundle so future runs build on what was learned instead of rediscovering it.",
   "keywords": [

package/skills/{autoresearch-create → autoresearch-vkf}/SKILL.md RENAMED

@@ -1,5 +1,5 @@
 ---
-name: autoresearch-create
+name: autoresearch-vkf
 description: Run an autoresearch loop with verifiable long-term memory. Use when asked to optimize/improve a measurable target (test speed, bundle size, model loss, build time, Lighthouse score, …) by drawing on the research literature and remembering what was learned across runs. Orchestrates init → gather literature → extract & verify claims → recall → experiment → write results back to VKF memory → report.
 ---
@@ -21,8 +21,8 @@ You are the spine. Delegate the specialized work to the sub-skills below.
 - `score_ideas` — rank untested ideas by priority (EV × feasibility × evidence × novelty × info_gain ÷ cost).
 - `find_contradictions` — mine memory for tensions that seed novel hypotheses.
 - `find_transfers` — cross-domain mechanism search for surprising analogies.
-- `run_experiment` — run the measurement command, capture `METRIC name=value`.
-- `log_experiment` — record a result and write it back to memory (updates belief & lifecycle).
+- `vkf_run_experiment` — run the measurement command, capture `METRIC name=value`.
+- `vkf_log_experiment` — record a result and write it back to memory (updates belief & lifecycle).
 - `research_status` — show session + memory state.
 ## The two layers
@@ -51,23 +51,23 @@ transaction record — promotion is an explicit, audited step.
    gathering anything. If prior runs already learned something, build on it and
    skip rediscovery.
-4. **Gather literature** → use the **knowledge-gather** skill to find candidate
+4. **Gather literature** → use the **autoresearch-vkf-knowledge-gather** skill to find candidate
    techniques (via `WebSearch`/`WebFetch` against free databases — arXiv, Semantic
-   Scholar, OpenAlex), then **claim-extract** to turn them into structured claims
-   via `remember_claim`. Then **claim-verify** to check citations and codebase fit.
+   Scholar, OpenAlex), then **autoresearch-vkf-claim-extract** to turn them into structured claims
+   via `remember_claim`. Then **autoresearch-vkf-claim-verify** to check citations and codebase fit.
 4b. **Synthesize new ideas** (optional but high-value) → mine memory for novelty
-   instead of only retrieving it: **contradiction-miner** (tensions →
-   hypotheses), **cross-domain-transfer** (import a mechanism from another field).
-   When many ideas compete for budget, run the **idea-tournament** skill to pick
+   instead of only retrieving it: **autoresearch-vkf-contradiction-miner** (tensions →
+   hypotheses), **autoresearch-vkf-cross-domain-transfer** (import a mechanism from another field).
+   When many ideas compete for budget, run the **autoresearch-vkf-idea-tournament** skill to pick
    the 2–3 worth testing.
-5. **Loop** → use the **hypothesis-loop** skill: `recall_memory` → pick the
+5. **Loop** → use the **autoresearch-vkf-hypothesis-loop** skill: `recall_memory` → pick the
    highest-value, sufficiently-novel idea → implement the smallest falsifying
-   change → `run_experiment` → `log_experiment` → repeat. Keep wins, revert
+   change → `vkf_run_experiment` → `vkf_log_experiment` → repeat. Keep wins, revert
    regressions; either way the result is now in memory.
-6. **Report** → use the **research-report** skill to produce the lineage report
+6. **Report** → use the **autoresearch-vkf-research-report** skill to produce the lineage report
    (paper → claim → hypothesis → patch → metric Δ → status → memory update).
 Keep `.autoresearch-vkf/session/prompt.md` current so a fresh agent can continue. The loop is

package/skills/{claim-extract → autoresearch-vkf-claim-extract}/SKILL.md RENAMED

@@ -1,6 +1,6 @@
 ---
-name: claim-extract
-description: Convert gathered literature into structured, reusable VKF claim cards (research atoms). Use after knowledge-gather to stage candidate claims in memory with remember_claim. Turns noisy papers into small, checkable, reusable assertions.
+name: autoresearch-vkf-claim-extract
+description: Convert gathered literature into structured, reusable VKF claim cards (research atoms). Use after autoresearch-vkf-knowledge-gather to stage candidate claims in memory with remember_claim. Turns noisy papers into small, checkable, reusable assertions.
 ---
 # Extract claims from literature
@@ -19,7 +19,7 @@ Call `remember_claim` with:
   "Replacing static gradient clipping with EMA-based adaptive clipping lowers
   early-training validation loss for small transformers."
 - **mechanism** — *why* it should work. This is the most valuable field: it's
-  what later lets the hypothesis-loop transfer the idea across domains.
+  what later lets the autoresearch-vkf-hypothesis-loop transfer the idea across domains.
 - **context** — where it applies (architecture, scale, dataset regime).
 - **implementation_recipe** — concretely how to apply it in this codebase.
 - **failure_modes** — known/suspected ways it breaks or interacts badly.
@@ -41,4 +41,4 @@ Call `remember_claim` with:
   theoretical, or anecdotal in the confidence/reliability you assign.
 Everything you stage here is a **candidate** (status `draft`) with a transaction
-record — nothing is trusted yet. Hand off to **claim-verify**.
+record — nothing is trusted yet. Hand off to **autoresearch-vkf-claim-verify**.

package/skills/{claim-verify → autoresearch-vkf-claim-verify}/SKILL.md RENAMED

@@ -1,6 +1,6 @@
 ---
-name: claim-verify
-description: Verify staged candidate claims before the loop builds on them — check that the cited source really says it, classify the evidence, and confirm codebase relevance. Use after claim-extract to promote or downgrade claims with verify_claim. This is the trust layer that prevents memory poisoning.
+name: autoresearch-vkf-claim-verify
+description: Verify staged candidate claims before the loop builds on them — check that the cited source really says it, classify the evidence, and confirm codebase relevance. Use after autoresearch-vkf-claim-extract to promote or downgrade claims with verify_claim. This is the trust layer that prevents memory poisoning.
 ---
 # Verify claims
@@ -32,7 +32,7 @@ Check, in order:
   `conflicts_with` the other card's id.
 - `deprecated` — true but stale/superseded/not applicable here.
 - `rejected` — misread, hallucinated, or unsupported.
-- (`locally_tested` / `replicated` are normally set by `log_experiment`, not here.)
+- (`locally_tested` / `replicated` are normally set by `vkf_log_experiment`, not here.)
 Always give a **reason** — it becomes part of the audit trail (a VKF
 transaction). After each call, the tool reports `vkf validate` so you can see the
@@ -46,5 +46,5 @@ bundle stays governed.
   the source.
 - A claim's truth in a paper ≠ its usefulness for our goal. Keep those separate.
-When the trusted set is healthy, hand back to **autoresearch-create** for the
-**hypothesis-loop**.
+When the trusted set is healthy, hand back to **autoresearch-vkf** for the
+**autoresearch-vkf-hypothesis-loop**.

package/skills/{contradiction-miner → autoresearch-vkf-contradiction-miner}/SKILL.md RENAMED

@@ -1,5 +1,5 @@
 ---
-name: contradiction-miner
+name: autoresearch-vkf-contradiction-miner
 description: Generate novel hypotheses from tensions already in memory — conflicting claims, ideas that won in one place and lost in another, and different mechanisms aimed at the same goal. Use when the loop needs fresh, non-obvious ideas rather than more literature.
 ---
@@ -44,8 +44,8 @@ Record it with `remember_claim`, setting:
 - a `mechanism` (required — a hypothesis with no mechanism is just noise),
 - an honest `confidence` (these are speculative; start low–medium).
-It enters memory as a **candidate** like any other idea — then `claim-verify` and
-the `hypothesis-loop` (via `score_ideas`) decide whether it's worth testing.
+It enters memory as a **candidate** like any other idea — then `autoresearch-vkf-claim-verify` and
+the `autoresearch-vkf-hypothesis-loop` (via `score_ideas`) decide whether it's worth testing.
 ## Discipline

package/skills/{cross-domain-transfer → autoresearch-vkf-cross-domain-transfer}/SKILL.md RENAMED

@@ -1,5 +1,5 @@
 ---
-name: cross-domain-transfer
+name: autoresearch-vkf-cross-domain-transfer
 description: Generate novel ideas by importing a mechanism from another field into the current problem. Use when you want surprising analogies that keyword search misses — search by mechanism, not keywords.
 ---
@@ -47,11 +47,11 @@ Record the best candidate with `remember_claim`:
 - `failure_modes` — note where the analogy might break (the assumptions the source
   domain has that yours doesn't).
-Then let `claim-verify` and `score_ideas` decide if it earns an experiment.
+Then let `autoresearch-vkf-claim-verify` and `score_ideas` decide if it earns an experiment.
 ## Discipline
 - **Require a mechanistic reason for transfer**, not just surface similarity. "Both
   use matrices" is not a transfer.
 - If you gathered claims only from your own domain, there's nothing to transfer
-  *from* — use `knowledge-gather` to pull in adjacent fields first.
+  *from* — use `autoresearch-vkf-knowledge-gather` to pull in adjacent fields first.

package/skills/{hypothesis-loop → autoresearch-vkf-hypothesis-loop}/SKILL.md RENAMED

@@ -1,5 +1,5 @@
 ---
-name: hypothesis-loop
+name: autoresearch-vkf-hypothesis-loop
 description: The core experiment loop — recall memory, pick the highest-value sufficiently-novel idea, run the smallest falsifying experiment, and write the result back to memory. Use to drive iterations of an autoresearch loop after claims have been gathered and verified.
 ---
@@ -33,9 +33,9 @@ ideas deliberately and you never repeat settled work.
      regress), *risk* (what could break), *novelty basis* (why it's not a repeat).
 4. **Run the smallest falsifying experiment.** Make the minimal change in scope,
-   then `run_experiment`. Read the `METRIC` line — don't eyeball logs.
+   then `vkf_run_experiment`. Read the `METRIC` line — don't eyeball logs.
-5. **Judge honestly, then `log_experiment`.** Record the value, the tested
+5. **Judge honestly, then `vkf_log_experiment`.** Record the value, the tested
    `claim_id`, whether you `kept` it, conditions, and notes. The tool:
    - derives win/loss/inconclusive vs the baseline,
    - writes an **experiment card back to memory** (a loss is durable knowledge),
@@ -54,4 +54,4 @@ ideas deliberately and you never repeat settled work.
 - **One variable at a time** so the result attributes cleanly to the hypothesis.
 When you've made meaningful progress or exhausted promising ideas, hand to
-**research-report**.
+**autoresearch-vkf-research-report**.
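
Step 5 above says `vkf_log_experiment` derives win/loss/inconclusive against the baseline. The sketch below shows one plausible rule. The `"lower"`/`"higher"` direction values, the `tolerance` parameter, and the name `deriveOutcome` are assumptions; the diff does not reveal the tool's real thresholds or types.

```typescript
// Hypothetical types; the tool's real direction values are not shown.
type Direction = "lower" | "higher";
type Outcome = "win" | "loss" | "inconclusive";

// Compare a measured value with the baseline in the "better" direction.
// `tolerance` is an assumed noise band inside which the result is inconclusive.
function deriveOutcome(
  value: number,
  baseline: number,
  direction: Direction,
  tolerance: number = 0,
): Outcome {
  // Positive improvement means the value moved in the desired direction.
  const improvement = direction === "lower" ? baseline - value : value - baseline;
  if (improvement > tolerance) return "win";
  if (improvement < -tolerance) return "loss";
  return "inconclusive";
}
```

For a runtime metric where lower is better, `deriveOutcome(90, 100, "lower")` yields `"win"` under these assumptions.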

package/skills/{idea-tournament → autoresearch-vkf-idea-tournament}/SKILL.md RENAMED

@@ -1,5 +1,5 @@
 ---
-name: idea-tournament
+name: autoresearch-vkf-idea-tournament
 description: Run a structured multi-perspective tournament over candidate ideas to pick the 2-3 worth testing. Use when there are many candidate hypotheses competing for limited experiment budget.
 ---
@@ -7,7 +7,7 @@ description: Run a structured multi-perspective tournament over candidate ideas
 When many ideas compete for a limited experiment budget, don't just take the top
 of one ranking. Run a tournament: judge each idea from several perspectives, then
-advance only the best 2–3 to the `hypothesis-loop`.
+advance only the best 2–3 to the `autoresearch-vkf-hypothesis-loop`.
 ## Assemble the field
@@ -45,7 +45,7 @@ the numbers miss — especially the Skeptic's failure-mode and gaming checks.
   `verify_claim`) so the tournament's reasoning is remembered and they aren't
   re-litigated next round.
-Hand the 2–3 winners to the **hypothesis-loop**.
+Hand the 2–3 winners to the **autoresearch-vkf-hypothesis-loop**.
 ## Discipline

package/skills/{knowledge-gather → autoresearch-vkf-knowledge-gather}/SKILL.md RENAMED

@@ -1,5 +1,5 @@
 ---
-name: knowledge-gather
+name: autoresearch-vkf-knowledge-gather
 description: Gather frontier knowledge relevant to a research goal — search papers, repos, docs, and benchmarks for candidate techniques. Use as the discovery step of an autoresearch loop, before extracting claims. Collects candidate knowledge; it does not invent ideas or run experiments.
 ---
@@ -63,5 +63,5 @@ For each candidate, capture enough to become a claim later:
 - **Look for contradictions and gaps** between sources — they're the richest
   seeds for novel hypotheses later.
-Hand the collected candidates to **claim-extract**, which writes them into memory
+Hand the collected candidates to **autoresearch-vkf-claim-extract**, which writes them into memory
 with `remember_claim`.

package/skills/{research-report → autoresearch-vkf-research-report}/SKILL.md RENAMED

@@ -1,5 +1,5 @@
 ---
-name: research-report
+name: autoresearch-vkf-research-report
 description: Produce the autoresearch report with full idea lineage — paper → claim → hypothesis → patch → metric change → status → memory update. Use to summarize an autoresearch run into an auditable, human-readable report at .autoresearch-vkf/session/report.md.
 ---