@mcptoolshop/research-os 0.1.1 → 0.3.0

Files changed (8)

package/CHANGELOG.md CHANGED

@@ -2,6 +2,158 @@
 All notable changes to `research-os` are documented here.
+## [0.3.0] — 2026-05-09
+Tight release. One real, tested, dogfooded improvement: `--detector` flag on
+`research-os contradict map`. F-09 from Experiment 3 Session 1 (XRPL pack)
+earned the fix. No other v0.3 candidates shipped — F-01 (init version-stamp),
+F-02 (packs-dir docs), F-05 (discover --query example), F-08 (Windows process
+recovery) are deferred to v0.3.x.
+### Added
+- **`--detector <auto|heuristic|ollama-intern>`** flag on
+  `research-os contradict map`. Three explicit modes:
+  - `auto` (default) — preserves env-var-driven behavior. When the
+    configured Ollama model is available, runs the LLM detector;
+    otherwise falls through to heuristic. Mirrors v0.2.x behavior.
+  - `heuristic` — bypasses Ollama entirely. No model availability check,
+    no LLM calls. Always works. Always completes quickly.
+  - `ollama-intern` — requires the configured model. Exits with code 2
+    and a visible failure message if the model is unavailable, instead
+    of silently falling back to heuristic.
+  Invalid `--detector` values exit with code 2. The mode chosen is
+  announced visibly on the first output line of every run; there are no
+  silent shifts. Full reference: [`docs/contradict-map.md`](docs/contradict-map.md).
+- **12 new tests** in `test/contradictions-detector-flag.test.ts` covering
+  all three modes (heuristic never instantiates the Ollama client;
+  ollama-intern errors visibly when model unavailable; auto preserves
+  existing behavior; invalid value fails fast), heuristic ledger validity,
+  and a regression fixture that mirrors the XRPL Section 01 pattern (~60
+  claims with ~5-token shared vocabulary completes via heuristic in well
+  under 30 seconds).
+- **`docs/contradict-map.md`** — full CLI reference: detector modes, mode
+  announcements (verbatim strings), when-to-use-which guidance, and the
+  release thesis.
+- **Handbook page** at `/handbook/contradict-map` — condensed reference
+  matching the docs page.
+### Changed
+- **CLI `--help`** for `contradict map` now lists the three `--detector`
+  choices.
+- **Reference page** in the handbook updated to mention the flag and
+  link to the new contradict-map page.
+### Documentation
+- README status block updated to v0.3.0; version badge updated.
+- `docs/roadmap.md` Experiment 3 entry: F-09 chain blocker noted as
+  resolved in v0.3.0 (Experiment 3 itself remains in progress; closure
+  requires a third external-domain pack).
+- **Cross-repo:** `research-packs/docs/operator-playbook.md` updated in
+  the same release window. The earlier "clear `OLLAMA_INTERN_MODEL` to
+  force heuristic" workaround is replaced with `--detector heuristic`
+  as the canonical operator surface. The handbook mirror in this repo
+  is kept consistent with the canonical.
+- `SHIP_GATE.md` D2 updated: version bump + tag for v0.3.0.
+### Tests
+- **527 total** (515 at v0.2.0 → 527 at v0.3.0, +12 from
+  `test/contradictions-detector-flag.test.ts`).
+### Migration notes
+No code-level migration required. Existing scripts that don't pass
+`--detector` continue to work via `auto` mode.
+For operators who previously cleared `OLLAMA_INTERN_MODEL` to force the
+heuristic detector: that pattern still works in environments where the
+default model isn't installed, but the flag is the canonical surface
+and is environment-independent. Switch to `--detector heuristic` when
+re-running narrow-topic sections; the v0.3.0 operator-playbook update
+in `research-packs` documents the rationale.
+## [0.2.0] — 2026-05-09
+Tight release. Two real, tested, dogfooded improvements: `research-os pack publish`
+(Experiment 2) and the Pattern 2 readiness predicate fix (Session 11 escalation).
+No other v0.2 candidates shipped — remaining 7 items (large-page chunker, JSON-aware
+excerpt chunker, publisher derivation, model-fallback warnings, contradict detector
+strategy, GitHub source guidance, llms.txt guard) are deferred to v0.3/v0.x.
+### Added
+- **`research-os pack publish`** — exports a frozen pack into the canonical
+  [`research-packs`](https://github.com/mcp-tool-shop-org/research-packs) archive format.
+  CLI: `research-os pack publish --to <path> [--from <path>] [--operator-notes <text>] [--force] [--dry-run]`.
+  Exit 0 on PASS, 2 on refusal. Derives `pack.manifest.json` from pack artifacts,
+  generates `README.md` from `synthesis/final-report.md`, provisions `docs/how-to-read-this.md`
+  scaffold, verifies the admission contract (5 required files, sha256 receipt reproduction,
+  all fingerprinted artifacts). See [`docs/pack-publish.md`](docs/pack-publish.md).
+- **48 new tests** under `test/pack-publish/` covering all 8 minimum-scope behaviors
+  (copy, manifest derivation, sha256 verification, accepted-claims derivation,
+  preserved-contradiction-records derivation, README generation, how-to-read scaffold,
+  inline verify-pack) plus refusal cases (missing receipt, missing synthesis,
+  freeze-refusal present, non-empty target without --force, tampered artifacts).
+- **`docs/pack-publish.md`** — full CLI reference: flags, refusal cases, produced layout,
+  what the command does NOT do, typical operator workflow.
+- **`docs/pack-publish-dogfood.md`** — dogfood receipt: both existing `research-packs`
+  packages re-derived via `pack publish` and verified by `research-packs/scripts/verify-pack.mjs`.
+  `comfyui-workflow-durability` PASS (302 claims, 124 artifacts); `research-os-self-dogfood`
+  PASS (296 claims, 131 artifacts).
+- **Handbook page** at `/handbook/pack-publish` — condensed reference with flags, refusal
+  cases, typical workflow, and links to the full reference doc and dogfood receipt.
+### Changed
+- **Pattern 2 readiness predicate enforcement completed** (commit `22b5dba`).
+  `src/cowork/derive.ts:determineMode` and `src/audit/aggregate.ts:buildReadinessSummary`
+  now use `active_blockers.length === 0` semantics instead of `repair_claim_ids.length === 0`
+  and `repair_claims === 0`. Under Pattern 2, `needs_scope_repair`, `needs_source_repair`,
+  and `needs_human_review` decisions are settled state (review ran, gate passed with
+  sufficient accepted claims) — they are not active blockers.
+  **Behavioral change:** packs that previously returned `audit: repair_required` or
+  `handoff: repair_required` solely because claims carry intermediate reviewer decisions
+  (not because any active gate blocker remains) now correctly return
+  `audit: ready_for_synthesis` / `handoff: synthesis_ready`. The `active_blockers` field
+  is now the authoritative readiness signal; it was already correctly computed but was
+  not wired into the verdict in v0.1.
+  The v0.1 dogfood pack (`research-os-self-dogfood`) is regression-clean under the new
+  predicate — it used the heuristic reviewer (only `accepted_for_synthesis` and `rejected`
+  decisions), so `repair_claim_ids.length === 0` was equivalent to `active_blockers.length === 0`
+  by coincidence. That coincidence is gone; the intent is now enforced directly.
+### Documentation
+- README status block updated to v0.2.0; `pack publish` mentioned in the workflow chain.
+- `docs/roadmap.md` Experiment 2 entry updated: `IMPLEMENTED → CLOSED 2026-05-09`.
+- `SHIP_GATE.md` D2 updated: version bump + tag for v0.2.0.
+### Tests
+- **515 total** (467 at v0.1.1 → 515 at v0.2.0, +48 from `test/pack-publish/`).
+### Migration notes
+No migration required for existing v0.1.x packs. The Pattern 2 predicate change is
+forward-only: if your pack previously returned `repair_required` and the verdict was
+wrong (all active gate blockers were already resolved), re-running `research-os audit`
+or `research-os cowork handoff` after upgrading to v0.2.0 will return the correct
+`ready_for_synthesis` / `synthesis_ready` verdict. Existing freeze receipts remain valid.
 ## [0.1.1] — 2026-05-08
 Documentation and release-alignment patch. No code or behavior changes — all production source and tests are identical to v0.1.0 (463 vitest cases, all passing).
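
The three `--detector` modes described in the 0.3.0 changelog entry above can be sketched as a small resolution function. This is a minimal illustration, not the package's implementation: `resolveDetector`, `Resolution`, and the `isModelAvailable` probe are hypothetical names, the error messages are placeholders rather than the verbatim announcement strings, and the probe is synchronous here for brevity where a real availability check would likely be asynchronous.

```typescript
// Hypothetical sketch of the mode semantics in the 0.3.0 changelog.
// None of these names are taken from the research-os source.
type DetectorMode = "auto" | "heuristic" | "ollama-intern";

type Resolution =
  | { ok: true; detector: "heuristic" | "ollama-intern" }
  | { ok: false; exitCode: 2; message: string };

const VALID_MODES: readonly string[] = ["auto", "heuristic", "ollama-intern"];

function resolveDetector(
  flag: string | undefined,
  isModelAvailable: () => boolean, // assumed probe; only called when a mode needs it
): Resolution {
  const mode = flag ?? "auto"; // omitting the flag behaves as `auto`
  if (!VALID_MODES.includes(mode)) {
    // Invalid values fail fast with exit code 2.
    return { ok: false, exitCode: 2, message: `invalid --detector value: ${mode}` };
  }
  switch (mode as DetectorMode) {
    case "heuristic":
      // No availability check and no LLM calls in this mode.
      return { ok: true, detector: "heuristic" };
    case "ollama-intern":
      // Requires the model; never falls back silently.
      return isModelAvailable()
        ? { ok: true, detector: "ollama-intern" }
        : { ok: false, exitCode: 2, message: "configured model unavailable" };
    default:
      // auto: use the LLM detector when available, otherwise heuristic.
      return { ok: true, detector: isModelAvailable() ? "ollama-intern" : "heuristic" };
  }
}
```

Passing the probe as a function rather than a precomputed boolean keeps the `heuristic` path free of any model check, which is the property the changelog's tests are described as covering.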

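The Pattern 2 readiness change in the 0.2.0 changelog entry above comes down to which list decides the verdict. The sketch below contrasts the two predicates on a pack whose claims carry settled reviewer decisions but no active gate blockers. Only the field names `active_blockers` and `repair_claim_ids` and the two verdict strings come from the changelog; the `AuditState` shape, the function names, and the claim ID are assumptions for illustration and do not reproduce `determineMode` or `buildReadinessSummary`.

```typescript
// Hypothetical sketch of the predicate change; not the package's code.
interface AuditState {
  active_blockers: string[];  // unresolved gate blockers
  repair_claim_ids: string[]; // claims carrying intermediate reviewer decisions
}

type AuditVerdict = "ready_for_synthesis" | "repair_required";

// v0.1-style predicate: any repair-tagged claim blocks readiness.
function verdictBefore(state: AuditState): AuditVerdict {
  return state.repair_claim_ids.length === 0 ? "ready_for_synthesis" : "repair_required";
}

// v0.2.0-style predicate: only active blockers block readiness.
function verdictAfter(state: AuditState): AuditVerdict {
  return state.active_blockers.length === 0 ? "ready_for_synthesis" : "repair_required";
}

// A pack where review ran and the gate passed, but one claim still
// carries a settled repair decision (the claim ID is made up).
const settledPack: AuditState = {
  active_blockers: [],
  repair_claim_ids: ["claim-017"],
};

const before = verdictBefore(settledPack);
const after = verdictAfter(settledPack);
console.log(before, after); // prints "repair_required ready_for_synthesis"
```

When both lists are empty the two predicates agree, which matches the changelog's note that the v0.1 dogfood pack was regression-clean only by coincidence.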
package/README.md CHANGED

@@ -7,7 +7,7 @@
 </p>
 <p align="center">
-  <a href="https://github.com/mcp-tool-shop-org/research-os/releases/tag/v0.1.1"><img src="https://img.shields.io/badge/version-0.1.1-blue" alt="version 0.1.1"></a>
+  <a href="https://github.com/mcp-tool-shop-org/research-os/releases/tag/v0.3.0"><img src="https://img.shields.io/badge/version-0.3.0-blue" alt="version 0.3.0"></a>
   <a href="https://github.com/mcp-tool-shop-org/research-os/actions/workflows/ci.yml"><img src="https://github.com/mcp-tool-shop-org/research-os/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
   <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green" alt="MIT License"></a>
   <img src="https://img.shields.io/badge/node-%E2%89%A520-brightgreen" alt="Node ≥20">
@@ -24,7 +24,67 @@ Local-first CLI that turns an open-ended topic into a gated **research-pack**
 It is not a report generator. It is not an LLM-orchestration framework. It does not write your synthesis for you. It enforces the conditions under which synthesis can begin.
-**v0.1 has been used exactly once: by itself, on itself.** That single use found seven correctness gaps in `research-os`, each fixed before this release. The proof trail — seven sessions, two integration patterns earned, 463 vitest cases, one frozen pack — lives in [`docs/dogfood-proof.md`](docs/dogfood-proof.md). Live handbook: <https://mcp-tool-shop-org.github.io/research-os/handbook/>.
+Frozen packs are archived in [`mcp-tool-shop-org/research-packs`](https://github.com/mcp-tool-shop-org/research-packs) — live, with two day-one packages. See [`docs/roadmap.md`](docs/roadmap.md) for the v1.0 path.
+v0.1 has been pressure-tested in two dogfood arcs. The first — research-os researching its own spec — found seven correctness gaps before the v0.1.0 release, each requiring a real code fix and earning a law or integration pattern. The second (v1 Experiment 1: ComfyUI workflow durability, 11 sessions, a domain with no vocabulary overlap with research-os) closed 2026-05-09: pack frozen, archive live, Pattern 2 enforcement completed via commit `22b5dba`. The v0.1 proof trail lives in [`docs/dogfood-proof.md`](docs/dogfood-proof.md); the Experiment 1 proof lives in [`docs/experiment-1-proof.md`](docs/experiment-1-proof.md). Live handbook: <https://mcp-tool-shop-org.github.io/research-os/handbook/>.
+## Install
+**Requirements:** Node.js ≥ 20.
+```bash
+npm install -g @mcptoolshop/research-os
+```
+For contributors building from source:
+```bash
+git clone https://github.com/mcp-tool-shop-org/research-os.git
+cd research-os
+npm install
+npm run build
+npm link
+```
+## Quick start
+```bash
+# Create a new research-pack
+research-os init "How should X be structured?"
+# Add a section
+research-os section add 01-landscape --purpose "Map the current landscape"
+# Discover and approve sources, then gather
+research-os discover run 01-landscape
+research-os discover approve 01-landscape --top 8
+research-os gather 01-landscape --approved
+# Run the per-section chain
+research-os claim extract 01-landscape
+research-os claim audit-density 01-landscape
+research-os claim triage 01-landscape
+research-os contradict map 01-landscape --triaged-only
+research-os review 01-landscape --triaged-only --preset hermes-two-pass --profile hermes-two-pass
+research-os review-promote 01-landscape --profile hermes-two-pass
+research-os gate 01-landscape
+research-os section report 01-landscape
+# Pack-level finish
+research-os audit
+research-os index build --all
+research-os cowork handoff
+research-os synth workspace   # only if handoff returned synthesis_ready
+research-os freeze
+# Export to the research-packs archive
+research-os pack publish \
+  --to <research-packs>/packages/<name>
+```
+**For a real worked example**, see the dogfood pack at `research-os-packs/research-os-spec/` — every artifact, every receipt, every disposition, every freeze fingerprint, all on disk in append-only ledgers. That pack is what produced `docs/dogfood-proof.md`.
+**Requires [ollama-intern-mcp](https://github.com/mcp-tool-shop-org/ollama-intern-mcp) running locally** for LLM extraction, triage, review, and discovery. Default model is `hermes3:8b`; override with `OLLAMA_INTERN_MODEL=<model>`. Set `OLLAMA_HOST` if Ollama is not on the default `localhost:11434`.
 ## The 16 load-bearing laws
@@ -76,55 +136,6 @@ Each step is a CLI command. Each step writes to append-only artifacts. No step s
 This is the structural alternative to *search → summarize → pretty report*. The chain is the product.
-## Install
-**Requirements:** Node.js ≥ 20.
-```bash
-# From source (v0.1.0 is not yet published to npm)
-git clone https://github.com/mcp-tool-shop-org/research-os.git
-cd research-os
-npm install
-npm run build
-npm link   # makes `research-os` available on your PATH
-```
-## Quick start
-```bash
-# Create a new research-pack
-research-os init "How should X be structured?"
-# Add a section
-research-os section add 01-landscape --purpose "Map the current landscape"
-# Discover and approve sources, then gather
-research-os discover run 01-landscape
-research-os discover approve 01-landscape --top 8
-research-os gather 01-landscape --approved
-# Run the per-section chain
-research-os claim extract 01-landscape
-research-os claim audit-density 01-landscape
-research-os claim triage 01-landscape
-research-os contradict map 01-landscape --triaged-only
-research-os review 01-landscape --triaged-only --preset hermes-two-pass --profile hermes-two-pass
-research-os review-promote 01-landscape --profile hermes-two-pass
-research-os gate 01-landscape
-research-os section report 01-landscape
-# Pack-level finish
-research-os audit
-research-os index build --all
-research-os cowork handoff
-research-os synth workspace   # only if handoff returned synthesis_ready
-research-os freeze
-```
-**For a real worked example**, see the dogfood pack at `research-os-packs/research-os-spec/` — every artifact, every receipt, every disposition, every freeze fingerprint, all on disk in append-only ledgers. That pack is what produced `docs/dogfood-proof.md`.
-**Requires [ollama-intern-mcp](https://github.com/mcp-tool-shop-org/ollama-intern-mcp) running locally** for LLM extraction, triage, review, and discovery. Default model is `hermes3:8b`; override with `OLLAMA_INTERN_MODEL=<model>`. Set `OLLAMA_HOST` if Ollama is not on the default `localhost:11434`.
 ## Vocabulary
 | Term | Meaning |
@@ -140,24 +151,33 @@ research-os freeze
 ## Status
-**v0.1.0** — frozen 2026-05-08. The dogfood pack at `research-os-packs/research-os-spec/` (sibling repo) reached freeze with 296 accepted claims across 8 sections, 17 dispositioned, 30 operator-overridden, 0 active repair blockers, 0 unresolved contradictions, all gates `synthesis_eligible=true`. 463/463 vitest passing. Sixteen load-bearing laws cumulative. See [`docs/dogfood-proof.md`](docs/dogfood-proof.md) for the seven findings and the freeze receipt fingerprints.
+**v0.3.0** — published to npm as `@mcptoolshop/research-os@0.3.0`, 2026-05-09. Ships the `--detector <auto|heuristic|ollama-intern>` flag on `contradict map` (F-09 chain-blocker fix from Experiment 3 Session 1, XRPL pack). 527/527 vitest passing. See [CHANGELOG.md](CHANGELOG.md) and [`docs/contradict-map.md`](docs/contradict-map.md).
+**`contradict map --detector`** — Detector selection is now an explicit operator choice instead of a state-dependent env-var dance. `auto` (default) preserves prior behavior; `heuristic` always works without LLM and completes quickly on narrow-topic documentation sections; `ollama-intern` requires the configured model and exits visibly if unavailable. Mode is announced visibly on every run. The earlier "clear `OLLAMA_INTERN_MODEL` to force heuristic" workaround is superseded by `--detector heuristic` as the canonical operator surface; see the [research-packs operator playbook](https://github.com/mcp-tool-shop-org/research-packs/blob/main/docs/operator-playbook.md) and the [handbook contradict-map page](https://mcp-tool-shop-org.github.io/research-os/handbook/contradict-map/).
+**v0.2.0** — published 2026-05-09. Shipped `research-os pack publish` (Experiment 2) and the Pattern 2 readiness predicate fix. 515/515 vitest passing then. See [CHANGELOG.md](CHANGELOG.md). Frozen packs export to the canonical `research-packs` archive with a single command; admission contract is enforced by code, not checklist. See [`docs/pack-publish.md`](docs/pack-publish.md).
+**v0.1.0** — dogfood pack frozen 2026-05-08. The pack at `research-os-packs/research-os-spec/` (sibling repo) reached freeze with 296 accepted claims across 8 sections, 17 dispositioned, 30 operator-overridden, 0 active repair blockers, 0 unresolved contradictions, all gates `synthesis_eligible=true`. Sixteen load-bearing laws cumulative. See [`docs/dogfood-proof.md`](docs/dogfood-proof.md) for the seven findings and freeze receipt fingerprints.
+**research-packs archive monorepo** — live at [`mcp-tool-shop-org/research-packs`](https://github.com/mcp-tool-shop-org/research-packs) with two day-one packages. `comfyui-workflow-durability` (Experiment 1, 302 accepted claims, 8 sections) and `research-os-self-dogfood` (v0.1 dogfood backfill, 296 accepted claims, 8 sections). Both packages PASS `verify-pack.mjs`.
+**v1 Experiment 1 (ComfyUI workflow durability)** — CLOSED 2026-05-09. All 8 sections at Terminal A, pack frozen, archive live. See [`docs/experiment-1-proof.md`](docs/experiment-1-proof.md) and [`docs/roadmap.md`](docs/roadmap.md).
-### What v0.1 is not
+### What v0.3 is not
-- Not battle-tested by external users. The single dogfood run found seven bugs.
-- Not yet on npm. Install from source until `npm publish` happens.
+- Not battle-tested by external users. Two dogfood arcs have closed — one self-referential, one external-domain — and Experiment 3 (API stability under external pressure) is in progress: pack #1 of 3 (XRPL creator-token durability) earned the v0.3.0 `--detector` flag. Two more external-domain packs required for Experiment 3 closure.
 - Not a synthesis writer. The `synth workspace` command generates the structured workspace; humans (or Cowork) write the prose against accepted claim IDs.
-- Not API-stable under semver. v1.0.0 is an earned state, not a calendar date — see [`docs/roadmap.md`](docs/roadmap.md) for the five experiments that close the gap.
+- Not API-stable under semver. v1.0.0 is an earned state, not a calendar date — see [`docs/roadmap.md`](docs/roadmap.md) for the six experiments that close the gap.
 ### Known limitations
-- **Extractor provenance is not visible at the gate seam.** A section can pass the accepted-claim floor while relying on heuristic-fallback claims when the calibrated extractor (Ollama with the configured model) is unavailable. Recorded as a known weakness; future hardening will report accepted claims by extractor and require the floor's worth of accepted claims from the calibrated path.
-- **Reviewer model selection beyond the calibrated `hermes-two-pass` baseline is unresolved.** The dogfood arc validated one reviewer config; alternative models need their own seeded-failure recall calibration before they can be trusted.
-- **The dogfood pack used `mistral-nemo:12b` for extraction (canonical default is `hermes3:8b`).** Discovery hallucinated wrong-domain results for self-referential section names — corrected by query-precision discipline (see handbook) and operator-pre-staged URLs for ambiguous topics.
+- **Extractor provenance is not visible at the gate seam.** A section can pass the accepted-claim floor while relying on heuristic-fallback claims when the calibrated extractor (Ollama with the configured model) is unavailable. Recorded as Experiment 4 in the roadmap; future hardening will report accepted claims by extractor and require the floor's worth of accepted claims from the calibrated path.
+- **Reviewer model selection beyond the calibrated `hermes-two-pass` baseline is unresolved.** The dogfood arc validated one reviewer config; alternative models need their own seeded-failure recall calibration before they can be trusted. Experiment 5 in the roadmap.
+- **The v0.1 self-dogfood pack used `mistral-nemo:12b` for extraction (canonical default is `hermes3:8b`).** `hermes3:8b` was not available on this rig during the v0.1 arc. The substitution disclosure stands until a hermes3-based receipt is produced — Experiment 6 in the roadmap. For operators on rigs without `hermes3:8b`, set `OLLAMA_INTERN_MODEL` to an available model; operator-pre-staged URLs and query-precision discipline (see handbook) mitigate discovery hallucination on ambiguous topics.
 ## Roadmap to v1.0
-v1.0 is an earned state, not a release date. Five open experiments stand between v0.1 and v1.0 — API stability under external pressure, a non-self-referential dogfood pack, closing the extractor-provenance gap, generalizing reviewer calibration beyond `hermes-two-pass`, and a clean baseline run on `hermes3:8b`. Full plan in [`docs/roadmap.md`](docs/roadmap.md). The architecture lock holds throughout; v1.0 deepens what v0.1 proved rather than reopening it.
+v1.0 is an earned state, not a release date. Six open experiments stand between v0.1 and v1.0 — non-self-referential dogfood (currently in progress as the ComfyUI workflow durability pack), a `research-os pack publish` command that automates export into the canonical `research-packs` monorepo (Experiment 2, scoped behind Experiment 1's manual closeout), API stability under external pressure, closing the extractor-provenance gap, generalizing reviewer calibration beyond `hermes-two-pass`, and a clean baseline run on `hermes3:8b`. Experiment 1 is not done at pack freeze — it closes when the frozen pack ships as the first package in the `research-packs` monorepo alongside the v0.1 self-dogfood backfill. Full plan in [`docs/roadmap.md`](docs/roadmap.md). The architecture lock holds throughout; v1.0 deepens what v0.1 proved rather than reopening it.
 ## License