@mcptoolshop/research-os 0.1.1 → 0.3.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,158 @@
 
  All notable changes to `research-os` are documented here.
 
+ ## [0.3.0] — 2026-05-09
+
+ Tight release. One real, tested, dogfooded improvement: `--detector` flag on
+ `research-os contradict map`. F-09 from Experiment 3 Session 1 (XRPL pack)
+ earned the fix. No other v0.3 candidates shipped — F-01 (init version-stamp),
+ F-02 (packs-dir docs), F-05 (discover --query example), F-08 (Windows process
+ recovery) are deferred to v0.3.x.
+
+ ### Added
+
+ - **`--detector <auto|heuristic|ollama-intern>`** flag on
+ `research-os contradict map`. Three explicit modes:
+
+ - `auto` (default) — preserves env-var-driven behavior. When the
+ configured Ollama model is available, runs the LLM detector;
+ otherwise falls through to heuristic. Mirrors v0.2.x behavior.
+ - `heuristic` — bypasses Ollama entirely. No model availability check,
+ no LLM calls. Always works. Always completes quickly.
+ - `ollama-intern` — requires the configured model. Exits with code 2
+ and a visible failure message if the model is unavailable, instead
+ of silently falling back to heuristic.
+
+ Invalid `--detector` values exit with code 2. The mode chosen is
+ announced visibly on the first output line of every run; there are no
+ silent shifts. Full reference: [`docs/contradict-map.md`](docs/contradict-map.md).
+
+ - **12 new tests** in `test/contradictions-detector-flag.test.ts` covering
+ all three modes (heuristic never instantiates the Ollama client;
+ ollama-intern errors visibly when the model is unavailable; auto preserves
+ existing behavior; invalid value fails fast), heuristic ledger validity,
+ and a regression fixture that mirrors the XRPL Section 01 pattern (~60
+ claims sharing a ~5-token vocabulary; the heuristic run completes in well
+ under 30 seconds).
+
+ - **`docs/contradict-map.md`** — full CLI reference: detector modes, mode
+ announcements (verbatim strings), when-to-use-which guidance, and the
+ release thesis.
+
+ - **Handbook page** at `/handbook/contradict-map` — condensed reference
+ matching the docs page.
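The three mode rules listed under Added can be sketched as a small resolver. This is a hypothetical TypeScript sketch, not research-os source: `resolveDetector`, `DetectorError`, `announce`, and the model-availability callback are illustrative names, and the announcement wording is a placeholder (the verbatim strings are documented in `docs/contradict-map.md`). It shows that `heuristic` never probes Ollama, `ollama-intern` refuses instead of falling back, and both failure paths carry exit code 2.

```typescript
// Hypothetical sketch of the `--detector` rules in the v0.3.0 changelog.
// Not research-os source; names and announcement text are illustrative.

type DetectorMode = "auto" | "heuristic" | "ollama-intern";
type ResolvedDetector = "heuristic" | "ollama-intern";

const DETECTOR_MODES: readonly string[] = ["auto", "heuristic", "ollama-intern"];

// Both documented failure paths (invalid value, model unavailable) exit with code 2.
class DetectorError extends Error {
  readonly exitCode = 2;
}

function isDetectorMode(value: string): value is DetectorMode {
  return DETECTOR_MODES.includes(value);
}

function resolveDetector(
  flag: string | undefined,
  // Probed lazily, so `heuristic` never touches Ollama at all.
  isModelAvailable: () => boolean,
): { requested: DetectorMode; resolved: ResolvedDetector } {
  const raw = flag ?? "auto"; // omitting the flag means `auto`
  if (!isDetectorMode(raw)) {
    throw new DetectorError(`invalid --detector value: ${raw}`);
  }
  switch (raw) {
    case "heuristic":
      return { requested: raw, resolved: "heuristic" };
    case "ollama-intern":
      if (!isModelAvailable()) {
        // Visible refusal instead of a silent heuristic fallback.
        throw new DetectorError("--detector ollama-intern: configured model unavailable");
      }
      return { requested: raw, resolved: "ollama-intern" };
    case "auto":
      // v0.2.x behavior: LLM detector when the model is available, else heuristic.
      return { requested: raw, resolved: isModelAvailable() ? "ollama-intern" : "heuristic" };
  }
}

// Placeholder wording for the first output line of a run.
function announce(r: { requested: DetectorMode; resolved: ResolvedDetector }): string {
  return `detector: ${r.resolved} (requested: ${r.requested})`;
}
```

In a CLI entry point, the `announce(...)` line would be printed before any detector work starts, and a caught `DetectorError` would map to `process.exit(err.exitCode)`. Passing availability as a callback is one way to make "heuristic never instantiates the Ollama client" cheap to test.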
+
+ ### Changed
+
+ - **CLI `--help`** for `contradict map` now lists the three `--detector`
+ choices.
+ - **Reference page** in the handbook updated to mention the flag and
+ link to the new contradict-map page.
+
+ ### Documentation
+
+ - README status block updated to v0.3.0; version badge updated.
+ - `docs/roadmap.md` Experiment 3 entry: F-09 chain blocker noted as
+ resolved in v0.3.0 (Experiment 3 itself remains in progress; closure
+ requires a third external-domain pack).
+ - **Cross-repo:** `research-packs/docs/operator-playbook.md` updated in
+ the same release window. The earlier "clear `OLLAMA_INTERN_MODEL` to
+ force heuristic" workaround is replaced with `--detector heuristic`
+ as the canonical operator surface. The handbook mirror in this repo
+ is kept consistent with the canonical playbook.
+ - `SHIP_GATE.md` D2 updated: version bump + tag for v0.3.0.
+
+ ### Tests
+
+ - **527 total** (515 at v0.2.0 → 527 at v0.3.0, +12 from
+ `test/contradictions-detector-flag.test.ts`).
+
+ ### Migration notes
+
+ No code-level migration required. Existing scripts that don't pass
+ `--detector` continue to work via `auto` mode.
+
+ For operators who previously cleared `OLLAMA_INTERN_MODEL` to force the
+ heuristic detector: that pattern still works in environments where the
+ default model isn't installed, but the flag is the canonical surface
+ and is environment-independent. Switch to `--detector heuristic` when
+ re-running narrow-topic sections; the v0.3.0 operator-playbook update
+ in `research-packs` documents the rationale.
+
+ ## [0.2.0] — 2026-05-09
+
+ Tight release. Two real, tested, dogfooded improvements: `research-os pack publish`
+ (Experiment 2) and the Pattern 2 readiness predicate fix (Session 11 escalation).
+ No other v0.2 candidates shipped; the remaining 7 items (large-page chunker, JSON-aware
+ excerpt chunker, publisher derivation, model-fallback warnings, contradict detector
+ strategy, GitHub source guidance, llms.txt guard) are deferred to v0.3/v0.x.
+
+ ### Added
+
+ - **`research-os pack publish`** — exports a frozen pack into the canonical
+ [`research-packs`](https://github.com/mcp-tool-shop-org/research-packs) archive format.
+ CLI: `research-os pack publish --to <path> [--from <path>] [--operator-notes <text>] [--force] [--dry-run]`.
+ Exit 0 on PASS, 2 on refusal. Derives `pack.manifest.json` from pack artifacts,
+ generates `README.md` from `synthesis/final-report.md`, provisions a `docs/how-to-read-this.md`
+ scaffold, and verifies the admission contract (5 required files, sha256 receipt reproduction,
+ all fingerprinted artifacts). See [`docs/pack-publish.md`](docs/pack-publish.md).
+
+ - **48 new tests** under `test/pack-publish/` covering all 8 minimum-scope behaviors
+ (copy, manifest derivation, sha256 verification, accepted-claims derivation,
+ preserved-contradiction-records derivation, README generation, how-to-read scaffold,
+ inline verify-pack) plus refusal cases (missing receipt, missing synthesis,
+ freeze-refusal present, non-empty target without `--force`, tampered artifacts).
+
+ - **`docs/pack-publish.md`** — full CLI reference: flags, refusal cases, produced layout,
+ what the command does NOT do, typical operator workflow.
+
+ - **`docs/pack-publish-dogfood.md`** — dogfood receipt: both existing `research-packs`
+ packages re-derived via `pack publish` and verified by `research-packs/scripts/verify-pack.mjs`.
+ `comfyui-workflow-durability` PASS (302 claims, 124 artifacts); `research-os-self-dogfood`
+ PASS (296 claims, 131 artifacts).
+
+ - **Handbook page** at `/handbook/pack-publish` — condensed reference with flags, refusal
+ cases, typical workflow, and links to the full reference doc and dogfood receipt.
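The "sha256 receipt reproduction" part of the admission contract amounts to re-hashing every fingerprinted artifact and comparing it with the recorded digest. A minimal TypeScript sketch, assuming a receipt shaped as a map from relative path to hex digest; the receipt format, `verifyReceipt`, and the in-memory artifact map are hypothetical, and the actual checks live in `pack publish` and `research-packs/scripts/verify-pack.mjs`. Only `createHash` from Node's `node:crypto` is used.

```typescript
import { createHash } from "node:crypto";

// Hypothetical receipt shape: relative artifact path -> hex-encoded sha256 digest.
type Receipt = Record<string, string>;

function sha256Hex(data: string | Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Re-hashes every fingerprinted artifact. An empty result means the receipt
// reproduces; otherwise each entry names a missing or modified artifact.
function verifyReceipt(
  artifacts: ReadonlyMap<string, string | Uint8Array>,
  receipt: Receipt,
): string[] {
  const problems: string[] = [];
  for (const [path, expected] of Object.entries(receipt)) {
    const data = artifacts.get(path);
    if (data === undefined) {
      problems.push(`missing: ${path}`);
    } else if (sha256Hex(data) !== expected) {
      problems.push(`tampered: ${path}`);
    }
  }
  return problems;
}
```

Reading artifacts from disk (for example with `fs.readFileSync`) is left out to keep the sketch self-contained. A publish step built on it would refuse, exiting 2 as the changelog describes, whenever the returned list is non-empty.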
+
+ ### Changed
+
+ - **Pattern 2 readiness predicate enforcement completed** (commit `22b5dba`).
+ `src/cowork/derive.ts:determineMode` and `src/audit/aggregate.ts:buildReadinessSummary`
+ now use `active_blockers.length === 0` semantics instead of `repair_claim_ids.length === 0`
+ and `repair_claims === 0`. Under Pattern 2, `needs_scope_repair`, `needs_source_repair`,
+ and `needs_human_review` decisions are settled state (review ran, gate passed with
+ sufficient accepted claims) — they are not active blockers.
+
+ **Behavioral change:** packs that previously returned `audit: repair_required` or
+ `handoff: repair_required` solely because claims carry intermediate reviewer decisions
+ (not because any active gate blocker remains) now correctly return
+ `audit: ready_for_synthesis` / `handoff: synthesis_ready`. The `active_blockers` field
+ is now the authoritative readiness signal; it was already correctly computed but was
+ not wired into the verdict in v0.1.
+
+ The v0.1 dogfood pack (`research-os-self-dogfood`) is regression-clean under the new
+ predicate — it used the heuristic reviewer (only `accepted_for_synthesis` and `rejected`
+ decisions), so `repair_claim_ids.length === 0` was equivalent to `active_blockers.length === 0`
+ by coincidence. That coincidence is gone; the intent is now enforced directly.
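The predicate change is easiest to see side by side. A minimal TypeScript sketch with simplified, hypothetical shapes (`SectionState`, `verdictV01`, `verdictV02`); it assumes `repair_claim_ids` collects the claims whose reviewer decision is one of the three `needs_*` values, which matches how the entry above describes the v0.1 false positives. It is not the code in `src/audit/aggregate.ts` or `src/cowork/derive.ts`.

```typescript
// Hypothetical, simplified shapes; the real logic lives in
// src/audit/aggregate.ts and src/cowork/derive.ts.
type ReviewerDecision =
  | "accepted_for_synthesis"
  | "rejected"
  | "needs_scope_repair"
  | "needs_source_repair"
  | "needs_human_review";

interface SectionState {
  decisions: Record<string, ReviewerDecision>; // claim id -> reviewer decision
  active_blockers: string[]; // gate blockers that are still open
}

type AuditVerdict = "ready_for_synthesis" | "repair_required";

// Assumption: these are the decisions v0.1 counted as repair claims.
const REPAIR_DECISIONS: ReadonlySet<ReviewerDecision> = new Set<ReviewerDecision>([
  "needs_scope_repair",
  "needs_source_repair",
  "needs_human_review",
]);

// v0.1: any claim carrying a needs_* decision blocked readiness.
function verdictV01(s: SectionState): AuditVerdict {
  const repairClaimIds = Object.entries(s.decisions)
    .filter(([, decision]) => REPAIR_DECISIONS.has(decision))
    .map(([id]) => id);
  return repairClaimIds.length === 0 ? "ready_for_synthesis" : "repair_required";
}

// v0.2.0: only active gate blockers block readiness; needs_* is settled state.
function verdictV02(s: SectionState): AuditVerdict {
  return s.active_blockers.length === 0 ? "ready_for_synthesis" : "repair_required";
}
```

For a section whose claims carry only `accepted_for_synthesis` and `rejected` decisions, as in the heuristic-reviewer v0.1 dogfood pack, the two functions return the same verdict. They diverge once a `needs_*` decision appears while no blocker is open.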
+
+ ### Documentation
+
+ - README status block updated to v0.2.0; `pack publish` mentioned in the workflow chain.
+ - `docs/roadmap.md` Experiment 2 entry updated: `IMPLEMENTED → CLOSED 2026-05-09`.
+ - `SHIP_GATE.md` D2 updated: version bump + tag for v0.2.0.
+
+ ### Tests
+
+ - **515 total** (467 at v0.1.1 → 515 at v0.2.0, +48 from `test/pack-publish/`).
+
+ ### Migration notes
+
+ No migration required for existing v0.1.x packs. The Pattern 2 predicate change is
+ forward-only: if your pack previously returned `repair_required` and the verdict was
+ wrong (all active gate blockers were already resolved), re-running `research-os audit`
+ or `research-os cowork handoff` after upgrading to v0.2.0 will return the correct
+ `ready_for_synthesis` / `synthesis_ready` verdict. Existing freeze receipts remain valid.
+
  ## [0.1.1] — 2026-05-08
 
  Documentation and release-alignment patch. No code or behavior changes — all production source and tests are identical to v0.1.0 (463 vitest cases, all passing).
package/README.md CHANGED
@@ -7,7 +7,7 @@
  </p>
 
  <p align="center">
- <a href="https://github.com/mcp-tool-shop-org/research-os/releases/tag/v0.1.1"><img src="https://img.shields.io/badge/version-0.1.1-blue" alt="version 0.1.1"></a>
+ <a href="https://github.com/mcp-tool-shop-org/research-os/releases/tag/v0.3.0"><img src="https://img.shields.io/badge/version-0.3.0-blue" alt="version 0.3.0"></a>
  <a href="https://github.com/mcp-tool-shop-org/research-os/actions/workflows/ci.yml"><img src="https://github.com/mcp-tool-shop-org/research-os/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green" alt="MIT License"></a>
  <img src="https://img.shields.io/badge/node-%E2%89%A520-brightgreen" alt="Node ≥20">
@@ -24,7 +24,67 @@ Local-first CLI that turns an open-ended topic into a gated **research-pack**
 
  It is not a report generator. It is not an LLM-orchestration framework. It does not write your synthesis for you. It enforces the conditions under which synthesis can begin.
 
- **v0.1 has been used exactly once: by itself, on itself.** That single use found seven correctness gaps in `research-os`, each fixed before this release. The proof trail (seven sessions, two integration patterns earned, 463 vitest cases, one frozen pack) lives in [`docs/dogfood-proof.md`](docs/dogfood-proof.md). Live handbook: <https://mcp-tool-shop-org.github.io/research-os/handbook/>.
+ Frozen packs are archived in [`mcp-tool-shop-org/research-packs`](https://github.com/mcp-tool-shop-org/research-packs), which is live with two day-one packages. See [`docs/roadmap.md`](docs/roadmap.md) for the v1.0 path.
+
+ v0.1 has been pressure-tested in two dogfood arcs. The first — research-os researching its own spec — found seven correctness gaps before the v0.1.0 release, each requiring a real code fix and earning a law or integration pattern. The second (v1 Experiment 1: ComfyUI workflow durability, 11 sessions, a domain with no vocabulary overlap with research-os) closed 2026-05-09: pack frozen, archive live, Pattern 2 enforcement completed via commit `22b5dba`. The v0.1 proof trail lives in [`docs/dogfood-proof.md`](docs/dogfood-proof.md); the Experiment 1 proof lives in [`docs/experiment-1-proof.md`](docs/experiment-1-proof.md). Live handbook: <https://mcp-tool-shop-org.github.io/research-os/handbook/>.
+
+ ## Install
+
+ **Requirements:** Node.js ≥ 20.
+
+ ```bash
+ npm install -g @mcptoolshop/research-os
+ ```
+
+ For contributors building from source:
+
+ ```bash
+ git clone https://github.com/mcp-tool-shop-org/research-os.git
+ cd research-os
+ npm install
+ npm run build
+ npm link
+ ```
+
+ ## Quick start
+
+ ```bash
+ # Create a new research-pack
+ research-os init "How should X be structured?"
+
+ # Add a section
+ research-os section add 01-landscape --purpose "Map the current landscape"
+
+ # Discover and approve sources, then gather
+ research-os discover run 01-landscape
+ research-os discover approve 01-landscape --top 8
+ research-os gather 01-landscape --approved
+
+ # Run the per-section chain
+ research-os claim extract 01-landscape
+ research-os claim audit-density 01-landscape
+ research-os claim triage 01-landscape
+ research-os contradict map 01-landscape --triaged-only
+ research-os review 01-landscape --triaged-only --preset hermes-two-pass --profile hermes-two-pass
+ research-os review-promote 01-landscape --profile hermes-two-pass
+ research-os gate 01-landscape
+ research-os section report 01-landscape
+
+ # Pack-level finish
+ research-os audit
+ research-os index build --all
+ research-os cowork handoff
+ research-os synth workspace # only if handoff returned synthesis_ready
+ research-os freeze
+
+ # Export to the research-packs archive
+ research-os pack publish \
+   --to <research-packs>/packages/<name>
+ ```
+
+ **For a real worked example**, see the dogfood pack at `research-os-packs/research-os-spec/` — every artifact, every receipt, every disposition, every freeze fingerprint, all on disk in append-only ledgers. That pack is what produced `docs/dogfood-proof.md`.
+
+ **Requires [ollama-intern-mcp](https://github.com/mcp-tool-shop-org/ollama-intern-mcp) running locally** for LLM extraction, triage, review, and discovery. Default model is `hermes3:8b`; override with `OLLAMA_INTERN_MODEL=<model>`. Set `OLLAMA_HOST` if Ollama is not on the default `localhost:11434`.
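How the two environment variables combine with their defaults can be sketched as follows. This is a hypothetical TypeScript helper (`resolveOllamaConfig` is not a research-os export), and treating an empty value the same as an unset one is an assumption. Under that reading, clearing `OLLAMA_INTERN_MODEL` only reverts to `hermes3:8b`, which is consistent with the v0.3.0 changelog's note that the clear-the-variable workaround forces heuristic detection only where the default model is not installed.

```typescript
// Hypothetical helper mirroring the documented defaults; not research-os source.
interface OllamaConfig {
  model: string;
  host: string;
}

const DEFAULT_MODEL = "hermes3:8b";
const DEFAULT_HOST = "localhost:11434";

function resolveOllamaConfig(env: Record<string, string | undefined>): OllamaConfig {
  return {
    // `||` rather than `??`: an empty (cleared) variable falls back to the
    // default, the same as an unset one. This empty-string handling is assumed.
    model: env.OLLAMA_INTERN_MODEL || DEFAULT_MODEL,
    host: env.OLLAMA_HOST || DEFAULT_HOST,
  };
}
```

In practice this would be called as `resolveOllamaConfig(process.env)`.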
 
  ## The 16 load-bearing laws
 
@@ -76,55 +136,6 @@ Each step is a CLI command. Each step writes to append-only artifacts. No step s
 
  This is the structural alternative to *search → summarize → pretty report*. The chain is the product.
 
- ## Install
-
- **Requirements:** Node.js ≥ 20.
-
- ```bash
- # From source (v0.1.0 is not yet published to npm)
- git clone https://github.com/mcp-tool-shop-org/research-os.git
- cd research-os
- npm install
- npm run build
- npm link # makes `research-os` available on your PATH
- ```
-
- ## Quick start
-
- ```bash
- # Create a new research-pack
- research-os init "How should X be structured?"
-
- # Add a section
- research-os section add 01-landscape --purpose "Map the current landscape"
-
- # Discover and approve sources, then gather
- research-os discover run 01-landscape
- research-os discover approve 01-landscape --top 8
- research-os gather 01-landscape --approved
-
- # Run the per-section chain
- research-os claim extract 01-landscape
- research-os claim audit-density 01-landscape
- research-os claim triage 01-landscape
- research-os contradict map 01-landscape --triaged-only
- research-os review 01-landscape --triaged-only --preset hermes-two-pass --profile hermes-two-pass
- research-os review-promote 01-landscape --profile hermes-two-pass
- research-os gate 01-landscape
- research-os section report 01-landscape
-
- # Pack-level finish
- research-os audit
- research-os index build --all
- research-os cowork handoff
- research-os synth workspace # only if handoff returned synthesis_ready
- research-os freeze
- ```
-
- **For a real worked example**, see the dogfood pack at `research-os-packs/research-os-spec/` — every artifact, every receipt, every disposition, every freeze fingerprint, all on disk in append-only ledgers. That pack is what produced `docs/dogfood-proof.md`.
-
- **Requires [ollama-intern-mcp](https://github.com/mcp-tool-shop-org/ollama-intern-mcp) running locally** for LLM extraction, triage, review, and discovery. Default model is `hermes3:8b`; override with `OLLAMA_INTERN_MODEL=<model>`. Set `OLLAMA_HOST` if Ollama is not on the default `localhost:11434`.
-
  ## Vocabulary
 
  | Term | Meaning |
@@ -140,24 +151,33 @@ research-os freeze
 
  ## Status
 
- **v0.1.0** — frozen 2026-05-08. The dogfood pack at `research-os-packs/research-os-spec/` (sibling repo) reached freeze with 296 accepted claims across 8 sections, 17 dispositioned, 30 operator-overridden, 0 active repair blockers, 0 unresolved contradictions, all gates `synthesis_eligible=true`. 463/463 vitest passing. Sixteen load-bearing laws cumulative. See [`docs/dogfood-proof.md`](docs/dogfood-proof.md) for the seven findings and the freeze receipt fingerprints.
+ **v0.3.0** — published to npm as `@mcptoolshop/research-os@0.3.0`, 2026-05-09. Ships the `--detector <auto|heuristic|ollama-intern>` flag on `contradict map` (F-09 chain-blocker fix from Experiment 3 Session 1, XRPL pack). 527/527 vitest passing. See [CHANGELOG.md](CHANGELOG.md) and [`docs/contradict-map.md`](docs/contradict-map.md).
+
+ **`contradict map --detector`** — Detector selection is now an explicit operator choice instead of a state-dependent env-var dance. `auto` (default) preserves prior behavior; `heuristic` always works without an LLM and completes quickly on narrow-topic documentation sections; `ollama-intern` requires the configured model and exits visibly (code 2) if it is unavailable. Mode is announced visibly on every run. The earlier "clear `OLLAMA_INTERN_MODEL` to force heuristic" workaround is superseded by `--detector heuristic` as the canonical operator surface; see the [research-packs operator playbook](https://github.com/mcp-tool-shop-org/research-packs/blob/main/docs/operator-playbook.md) and the [handbook contradict-map page](https://mcp-tool-shop-org.github.io/research-os/handbook/contradict-map/).
+
+ **v0.2.0** — published 2026-05-09. Shipped `research-os pack publish` (Experiment 2) and the Pattern 2 readiness predicate fix. 515/515 vitest passing at release. See [CHANGELOG.md](CHANGELOG.md). Frozen packs export to the canonical `research-packs` archive with a single command; the admission contract is enforced by code, not by checklist. See [`docs/pack-publish.md`](docs/pack-publish.md).
+
+ **v0.1.0** — dogfood pack frozen 2026-05-08. The pack at `research-os-packs/research-os-spec/` (sibling repo) reached freeze with 296 accepted claims across 8 sections, 17 dispositioned, 30 operator-overridden, 0 active repair blockers, 0 unresolved contradictions, all gates `synthesis_eligible=true`. Sixteen load-bearing laws cumulative. See [`docs/dogfood-proof.md`](docs/dogfood-proof.md) for the seven findings and freeze receipt fingerprints.
+
+ **research-packs archive monorepo** — live at [`mcp-tool-shop-org/research-packs`](https://github.com/mcp-tool-shop-org/research-packs) with two day-one packages: `comfyui-workflow-durability` (Experiment 1, 302 accepted claims, 8 sections) and `research-os-self-dogfood` (v0.1 dogfood backfill, 296 accepted claims, 8 sections). Both packages PASS `verify-pack.mjs`.
+
+ **v1 Experiment 1 (ComfyUI workflow durability)** — CLOSED 2026-05-09. All 8 sections at Terminal A, pack frozen, archive live. See [`docs/experiment-1-proof.md`](docs/experiment-1-proof.md) and [`docs/roadmap.md`](docs/roadmap.md).
 
- ### What v0.1 is not
+ ### What v0.3 is not
 
- - Not battle-tested by external users. The single dogfood run found seven bugs.
- - Not yet on npm. Install from source until `npm publish` happens.
+ - Not battle-tested by external users. Two dogfood arcs have closed — one self-referential, one external-domain — and Experiment 3 (API stability under external pressure) is in progress: pack #1 of 3 (XRPL creator-token durability) earned the v0.3.0 `--detector` flag. Two more external-domain packs are required to close Experiment 3.
  - Not a synthesis writer. The `synth workspace` command generates the structured workspace; humans (or Cowork) write the prose against accepted claim IDs.
- - Not API-stable under semver. v1.0.0 is an earned state, not a calendar date — see [`docs/roadmap.md`](docs/roadmap.md) for the five experiments that close the gap.
+ - Not API-stable under semver. v1.0.0 is an earned state, not a calendar date — see [`docs/roadmap.md`](docs/roadmap.md) for the six experiments that close the gap.
 
  ### Known limitations
 
- - **Extractor provenance is not visible at the gate seam.** A section can pass the accepted-claim floor while relying on heuristic-fallback claims when the calibrated extractor (Ollama with the configured model) is unavailable. Recorded as a known weakness; future hardening will report accepted claims by extractor and require the floor's worth of accepted claims from the calibrated path.
- - **Reviewer model selection beyond the calibrated `hermes-two-pass` baseline is unresolved.** The dogfood arc validated one reviewer config; alternative models need their own seeded-failure recall calibration before they can be trusted.
- - **The dogfood pack used `mistral-nemo:12b` for extraction (canonical default is `hermes3:8b`).** Discovery hallucinated wrong-domain results for self-referential section names; these were corrected by query-precision discipline (see handbook) and operator-pre-staged URLs for ambiguous topics.
+ - **Extractor provenance is not visible at the gate seam.** A section can pass the accepted-claim floor while relying on heuristic-fallback claims when the calibrated extractor (Ollama with the configured model) is unavailable. Recorded as Experiment 4 in the roadmap; future hardening will report accepted claims by extractor and require the floor's worth of accepted claims from the calibrated path.
+ - **Reviewer model selection beyond the calibrated `hermes-two-pass` baseline is unresolved.** The dogfood arc validated one reviewer config; alternative models need their own seeded-failure recall calibration before they can be trusted. Tracked as Experiment 5 in the roadmap.
+ - **The v0.1 self-dogfood pack used `mistral-nemo:12b` for extraction (canonical default is `hermes3:8b`).** `hermes3:8b` was not available on this rig during the v0.1 arc. The substitution disclosure stands until a hermes3-based receipt is produced (Experiment 6 in the roadmap). For operators on rigs without `hermes3:8b`, set `OLLAMA_INTERN_MODEL` to an available model; operator-pre-staged URLs and query-precision discipline (see handbook) mitigate discovery hallucination on ambiguous topics.
 
  ## Roadmap to v1.0
 
- v1.0 is an earned state, not a release date. Five open experiments stand between v0.1 and v1.0 — API stability under external pressure, a non-self-referential dogfood pack, closing the extractor-provenance gap, generalizing reviewer calibration beyond `hermes-two-pass`, and a clean baseline run on `hermes3:8b`. Full plan in [`docs/roadmap.md`](docs/roadmap.md). The architecture lock holds throughout; v1.0 deepens what v0.1 proved rather than reopening it.
+ v1.0 is an earned state, not a release date. Six open experiments stand between v0.1 and v1.0 — non-self-referential dogfood (currently in progress as the ComfyUI workflow durability pack), a `research-os pack publish` command that automates export into the canonical `research-packs` monorepo (Experiment 2, scoped behind Experiment 1's manual closeout), API stability under external pressure, closing the extractor-provenance gap, generalizing reviewer calibration beyond `hermes-two-pass`, and a clean baseline run on `hermes3:8b`. Experiment 1 is not done at pack freeze — it closes when the frozen pack ships as the first package in the `research-packs` monorepo alongside the v0.1 self-dogfood backfill. Full plan in [`docs/roadmap.md`](docs/roadmap.md). The architecture lock holds throughout; v1.0 deepens what v0.1 proved rather than reopening it.
 
  ## License