@mcptoolshop/research-os 0.1.1 → 0.2.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,80 @@
2
2
 
3
3
  All notable changes to `research-os` are documented here.
4
4
 
5
+ ## [0.2.0] — 2026-05-09
6
+
7
+ Tight release. Two real, tested, dogfooded improvements: `research-os pack publish`
8
+ (Experiment 2) and the Pattern 2 readiness predicate fix (Session 11 escalation).
9
+ No other v0.2 candidates shipped — remaining 7 items (large-page chunker, JSON-aware
10
+ excerpt chunker, publisher derivation, model-fallback warnings, contradict detector
11
+ strategy, GitHub source guidance, llms.txt guard) are deferred to v0.3/v0.x.
12
+
13
+ ### Added
14
+
15
+ - **`research-os pack publish`** — exports a frozen pack into the canonical
16
+ [`research-packs`](https://github.com/mcp-tool-shop-org/research-packs) archive format.
17
+ CLI: `research-os pack publish --to <path> [--from <path>] [--operator-notes <text>] [--force] [--dry-run]`.
18
+ Exit 0 on PASS, 2 on refusal. Derives `pack.manifest.json` from pack artifacts,
19
+ generates `README.md` from `synthesis/final-report.md`, provisions `docs/how-to-read-this.md`
20
+ scaffold, verifies the admission contract (5 required files, sha256 receipt reproduction,
21
+ all fingerprinted artifacts). See [`docs/pack-publish.md`](docs/pack-publish.md).
22
+
23
+ - **48 new tests** under `test/pack-publish/` covering all 8 minimum-scope behaviors
24
+ (copy, manifest derivation, sha256 verification, accepted-claims derivation,
25
+ preserved-contradiction-records derivation, README generation, how-to-read scaffold,
26
+ inline verify-pack) plus refusal cases (missing receipt, missing synthesis,
27
+ freeze-refusal present, non-empty target without --force, tampered artifacts).
28
+
29
+ - **`docs/pack-publish.md`** — full CLI reference: flags, refusal cases, produced layout,
30
+ what the command does NOT do, typical operator workflow.
31
+
32
+ - **`docs/pack-publish-dogfood.md`** — dogfood receipt: both existing `research-packs`
33
+ packages re-derived via `pack publish` and verified by `research-packs/scripts/verify-pack.mjs`.
34
+ `comfyui-workflow-durability` PASS (302 claims, 124 artifacts); `research-os-self-dogfood`
35
+ PASS (296 claims, 131 artifacts).
36
+
37
+ - **Handbook page** at `/handbook/pack-publish` — condensed reference with flags, refusal
38
+ cases, typical workflow, and links to the full reference doc and dogfood receipt.
39
+
40
+ ### Changed
41
+
42
+ - **Pattern 2 readiness predicate enforcement completed** (commit `22b5dba`).
43
+ `src/cowork/derive.ts:determineMode` and `src/audit/aggregate.ts:buildReadinessSummary`
44
+ now use `active_blockers.length === 0` semantics instead of `repair_claim_ids.length === 0`
45
+ and `repair_claims === 0`. Under Pattern 2, `needs_scope_repair`, `needs_source_repair`,
46
+ and `needs_human_review` decisions are settled state (review ran, gate passed with
47
+ sufficient accepted claims) — they are not active blockers.
48
+
49
+ **Behavioral change:** packs that previously returned `audit: repair_required` or
50
+ `handoff: repair_required` solely because claims carry intermediate reviewer decisions
51
+ (not because any active gate blocker remains) now correctly return
52
+ `audit: ready_for_synthesis` / `handoff: synthesis_ready`. The `active_blockers` field
53
+ is now the authoritative readiness signal; it was already correctly computed but was
54
+ not wired into the verdict in v0.1.
55
+
56
+ The v0.1 dogfood pack (`research-os-self-dogfood`) is regression-clean under the new
57
+ predicate — it used the heuristic reviewer (only `accepted_for_synthesis` and `rejected`
58
+ decisions), so `repair_claim_ids.length === 0` was equivalent to `active_blockers.length === 0`
59
+ by coincidence. That coincidence is gone; the intent is now enforced directly.
60
+
61
+ ### Documentation
62
+
63
+ - README status block updated to v0.2.0; `pack publish` mentioned in the workflow chain.
64
+ - `docs/roadmap.md` Experiment 2 entry updated: `IMPLEMENTED → CLOSED 2026-05-09`.
65
+ - `SHIP_GATE.md` D2 updated: version bump + tag for v0.2.0.
66
+
67
+ ### Tests
68
+
69
+ - **515 total** (467 at v0.1.1 → 515 at v0.2.0, +48 from `test/pack-publish/`).
70
+
71
+ ### Migration notes
72
+
73
+ No migration required for existing v0.1.x packs. The Pattern 2 predicate change is
74
+ forward-only: if your pack previously returned `repair_required` and the verdict was
75
+ wrong (all active gate blockers were already resolved), re-running `research-os audit`
76
+ or `research-os cowork handoff` after upgrading to v0.2.0 will return the correct
77
+ `ready_for_synthesis` / `synthesis_ready` verdict. Existing freeze receipts remain valid.
78
+
5
79
  ## [0.1.1] — 2026-05-08
6
80
 
7
81
  Documentation and release-alignment patch. No code or behavior changes — all production source and tests are identical to v0.1.0 (463 vitest cases, all passing).
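Reviewer note: the Pattern 2 predicate change in the 0.2.0 entry above can be illustrated with a minimal sketch. Only the field names `active_blockers` and `repair_claim_ids` and the verdict strings come from the changelog. The `ReadinessInput` shape, the function names, and the sample claim IDs are assumptions for illustration, not the code in `src/cowork/derive.ts` or `src/audit/aggregate.ts`.

```typescript
// Hypothetical input shape: only the two field names are taken from the changelog.
interface ReadinessInput {
  active_blockers: string[];
  repair_claim_ids: string[];
}

type AuditVerdict = "ready_for_synthesis" | "repair_required";

// v0.1-style check (assumed form): any claim carrying an intermediate
// reviewer decision keeps the pack in repair_required.
function legacyVerdict(input: ReadinessInput): AuditVerdict {
  return input.repair_claim_ids.length === 0
    ? "ready_for_synthesis"
    : "repair_required";
}

// v0.2-style check (assumed form): only active gate blockers matter.
function pattern2Verdict(input: ReadinessInput): AuditVerdict {
  return input.active_blockers.length === 0
    ? "ready_for_synthesis"
    : "repair_required";
}

// A pack whose review ran and whose gate passed, but whose claims still
// carry settled decisions such as needs_scope_repair.
const settledPack: ReadinessInput = {
  active_blockers: [],
  repair_claim_ids: ["claim-017", "claim-042"], // made-up IDs
};

// A pack reviewed heuristically: no intermediate decisions at all.
const heuristicPack: ReadinessInput = {
  active_blockers: [],
  repair_claim_ids: [],
};

console.log(legacyVerdict(settledPack), pattern2Verdict(settledPack));
console.log(legacyVerdict(heuristicPack), pattern2Verdict(heuristicPack));
```

Under these assumptions the two checks disagree only for `settledPack`, which is the behavioral change the entry describes. They agree for `heuristicPack`, which matches the note that the v0.1 dogfood pack is regression-clean.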
package/README.md CHANGED
@@ -7,7 +7,7 @@
7
7
  </p>
8
8
 
9
9
  <p align="center">
10
- <a href="https://github.com/mcp-tool-shop-org/research-os/releases/tag/v0.1.1"><img src="https://img.shields.io/badge/version-0.1.1-blue" alt="version 0.1.1"></a>
10
+ <a href="https://github.com/mcp-tool-shop-org/research-os/releases/tag/v0.2.0"><img src="https://img.shields.io/badge/version-0.2.0-blue" alt="version 0.2.0"></a>
11
11
  <a href="https://github.com/mcp-tool-shop-org/research-os/actions/workflows/ci.yml"><img src="https://github.com/mcp-tool-shop-org/research-os/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
12
12
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green" alt="MIT License"></a>
13
13
  <img src="https://img.shields.io/badge/node-%E2%89%A520-brightgreen" alt="Node ≥20">
@@ -24,7 +24,67 @@ Local-first CLI that turns an open-ended topic into a gated **research-pack**
24
24
 
25
25
  It is not a report generator. It is not an LLM-orchestration framework. It does not write your synthesis for you. It enforces the conditions under which synthesis can begin.
26
26
 
27
- **v0.1 has been used exactly once: by itself, on itself.** That single use found seven correctness gaps in `research-os`, each fixed before this release. The proof trail (seven sessions, two integration patterns earned, 463 vitest cases, one frozen pack) lives in [`docs/dogfood-proof.md`](docs/dogfood-proof.md). Live handbook: <https://mcp-tool-shop-org.github.io/research-os/handbook/>.
27
+ Frozen packs are archived in [`mcp-tool-shop-org/research-packs`](https://github.com/mcp-tool-shop-org/research-packs), which is live with two day-one packages. See [`docs/roadmap.md`](docs/roadmap.md) for the v1.0 path.
28
+
29
+ v0.1 has been pressure-tested in two dogfood arcs. The first — research-os researching its own spec — found seven correctness gaps before the v0.1.0 release, each requiring a real code fix and earning a law or integration pattern. The second (v1 Experiment 1: ComfyUI workflow durability, 11 sessions, a domain with no vocabulary overlap with research-os) closed 2026-05-09: pack frozen, archive live, Pattern 2 enforcement completed via commit `22b5dba`. The v0.1 proof trail lives in [`docs/dogfood-proof.md`](docs/dogfood-proof.md); the Experiment 1 proof lives in [`docs/experiment-1-proof.md`](docs/experiment-1-proof.md). Live handbook: <https://mcp-tool-shop-org.github.io/research-os/handbook/>.
30
+
31
+ ## Install
32
+
33
+ **Requirements:** Node.js ≥ 20.
34
+
35
+ ```bash
36
+ npm install -g @mcptoolshop/research-os
37
+ ```
38
+
39
+ For contributors building from source:
40
+
41
+ ```bash
42
+ git clone https://github.com/mcp-tool-shop-org/research-os.git
43
+ cd research-os
44
+ npm install
45
+ npm run build
46
+ npm link
47
+ ```
48
+
49
+ ## Quick start
50
+
51
+ ```bash
52
+ # Create a new research-pack
53
+ research-os init "How should X be structured?"
54
+
55
+ # Add a section
56
+ research-os section add 01-landscape --purpose "Map the current landscape"
57
+
58
+ # Discover and approve sources, then gather
59
+ research-os discover run 01-landscape
60
+ research-os discover approve 01-landscape --top 8
61
+ research-os gather 01-landscape --approved
62
+
63
+ # Run the per-section chain
64
+ research-os claim extract 01-landscape
65
+ research-os claim audit-density 01-landscape
66
+ research-os claim triage 01-landscape
67
+ research-os contradict map 01-landscape --triaged-only
68
+ research-os review 01-landscape --triaged-only --preset hermes-two-pass --profile hermes-two-pass
69
+ research-os review-promote 01-landscape --profile hermes-two-pass
70
+ research-os gate 01-landscape
71
+ research-os section report 01-landscape
72
+
73
+ # Pack-level finish
74
+ research-os audit
75
+ research-os index build --all
76
+ research-os cowork handoff
77
+ research-os synth workspace # only if handoff returned synthesis_ready
78
+ research-os freeze
79
+
80
+ # Export to the research-packs archive
81
+ research-os pack publish \
82
+ --to <research-packs>/packages/<name>
83
+ ```
84
+
85
+ **For a real worked example**, see the dogfood pack at `research-os-packs/research-os-spec/` — every artifact, every receipt, every disposition, every freeze fingerprint, all on disk in append-only ledgers. That pack is what produced `docs/dogfood-proof.md`.
86
+
87
+ **Requires [ollama-intern-mcp](https://github.com/mcp-tool-shop-org/ollama-intern-mcp) running locally** for LLM extraction, triage, review, and discovery. Default model is `hermes3:8b`; override with `OLLAMA_INTERN_MODEL=<model>`. Set `OLLAMA_HOST` if Ollama is not on the default `localhost:11434`.
28
88
 
29
89
  ## The 16 load-bearing laws
30
90
 
@@ -76,55 +136,6 @@ Each step is a CLI command. Each step writes to append-only artifacts. No step s
76
136
 
77
137
  This is the structural alternative to *search → summarize → pretty report*. The chain is the product.
78
138
 
79
- ## Install
80
-
81
- **Requirements:** Node.js ≥ 20.
82
-
83
- ```bash
84
- # From source (v0.1.0 is not yet published to npm)
85
- git clone https://github.com/mcp-tool-shop-org/research-os.git
86
- cd research-os
87
- npm install
88
- npm run build
89
- npm link # makes `research-os` available on your PATH
90
- ```
91
-
92
- ## Quick start
93
-
94
- ```bash
95
- # Create a new research-pack
96
- research-os init "How should X be structured?"
97
-
98
- # Add a section
99
- research-os section add 01-landscape --purpose "Map the current landscape"
100
-
101
- # Discover and approve sources, then gather
102
- research-os discover run 01-landscape
103
- research-os discover approve 01-landscape --top 8
104
- research-os gather 01-landscape --approved
105
-
106
- # Run the per-section chain
107
- research-os claim extract 01-landscape
108
- research-os claim audit-density 01-landscape
109
- research-os claim triage 01-landscape
110
- research-os contradict map 01-landscape --triaged-only
111
- research-os review 01-landscape --triaged-only --preset hermes-two-pass --profile hermes-two-pass
112
- research-os review-promote 01-landscape --profile hermes-two-pass
113
- research-os gate 01-landscape
114
- research-os section report 01-landscape
115
-
116
- # Pack-level finish
117
- research-os audit
118
- research-os index build --all
119
- research-os cowork handoff
120
- research-os synth workspace # only if handoff returned synthesis_ready
121
- research-os freeze
122
- ```
123
-
124
- **For a real worked example**, see the dogfood pack at `research-os-packs/research-os-spec/` — every artifact, every receipt, every disposition, every freeze fingerprint, all on disk in append-only ledgers. That pack is what produced `docs/dogfood-proof.md`.
125
-
126
- **Requires [ollama-intern-mcp](https://github.com/mcp-tool-shop-org/ollama-intern-mcp) running locally** for LLM extraction, triage, review, and discovery. Default model is `hermes3:8b`; override with `OLLAMA_INTERN_MODEL=<model>`. Set `OLLAMA_HOST` if Ollama is not on the default `localhost:11434`.
127
-
128
139
  ## Vocabulary
129
140
 
130
141
  | Term | Meaning |
@@ -140,24 +151,31 @@ research-os freeze
140
151
 
141
152
  ## Status
142
153
 
143
- **v0.1.0** — frozen 2026-05-08. The dogfood pack at `research-os-packs/research-os-spec/` (sibling repo) reached freeze with 296 accepted claims across 8 sections, 17 dispositioned, 30 operator-overridden, 0 active repair blockers, 0 unresolved contradictions, all gates `synthesis_eligible=true`. 463/463 vitest passing. Sixteen load-bearing laws cumulative. See [`docs/dogfood-proof.md`](docs/dogfood-proof.md) for the seven findings and the freeze receipt fingerprints.
154
+ **v0.2.0** — published to npm as `@mcptoolshop/research-os@0.2.0`, 2026-05-09. Ships `research-os pack publish` (Experiment 2) and the Pattern 2 readiness predicate fix. 515/515 vitest passing. See [CHANGELOG.md](CHANGELOG.md).
155
+
156
+ **`research-os pack publish`** — Frozen packs export to the canonical `research-packs` archive with a single command. Derives `pack.manifest.json`, generates `README.md`, provisions `docs/how-to-read-this.md`, verifies the admission contract. See [`docs/pack-publish.md`](docs/pack-publish.md). Dogfood: both existing packages re-derived and `verify-pack.mjs` PASS — see [`docs/pack-publish-dogfood.md`](docs/pack-publish-dogfood.md).
157
+
158
+ **v0.1.0** — dogfood pack frozen 2026-05-08. The pack at `research-os-packs/research-os-spec/` (sibling repo) reached freeze with 296 accepted claims across 8 sections, 17 dispositioned, 30 operator-overridden, 0 active repair blockers, 0 unresolved contradictions, all gates `synthesis_eligible=true`. Sixteen load-bearing laws cumulative. See [`docs/dogfood-proof.md`](docs/dogfood-proof.md) for the seven findings and freeze receipt fingerprints.
159
+
160
+ **research-packs archive monorepo** — live at [`mcp-tool-shop-org/research-packs`](https://github.com/mcp-tool-shop-org/research-packs) with two day-one packages. `comfyui-workflow-durability` (Experiment 1, 302 accepted claims, 8 sections) and `research-os-self-dogfood` (v0.1 dogfood backfill, 296 accepted claims, 8 sections). Both packages PASS `verify-pack.mjs`.
161
+
162
+ **v1 Experiment 1 (ComfyUI workflow durability)** — CLOSED 2026-05-09. All 8 sections at Terminal A, pack frozen, archive live. See [`docs/experiment-1-proof.md`](docs/experiment-1-proof.md) and [`docs/roadmap.md`](docs/roadmap.md).
144
163
 
145
- ### What v0.1 is not
164
+ ### What v0.2 is not
146
165
 
147
- - Not battle-tested by external users. The single dogfood run found seven bugs.
148
- - Not yet on npm. Install from source until `npm publish` happens.
166
+ - Not battle-tested by external users. Two dogfood arcs have run: one self-referential, one external-domain. External user pressure is Experiment 3 (API stability) in the v1.0 roadmap.
149
167
  - Not a synthesis writer. The `synth workspace` command generates the structured workspace; humans (or Cowork) write the prose against accepted claim IDs.
150
- - Not API-stable under semver. v1.0.0 is an earned state, not a calendar date — see [`docs/roadmap.md`](docs/roadmap.md) for the five experiments that close the gap.
168
+ - Not API-stable under semver. v1.0.0 is an earned state, not a calendar date — see [`docs/roadmap.md`](docs/roadmap.md) for the six experiments that close the gap.
151
169
 
152
170
  ### Known limitations
153
171
 
154
- - **Extractor provenance is not visible at the gate seam.** A section can pass the accepted-claim floor while relying on heuristic-fallback claims when the calibrated extractor (Ollama with the configured model) is unavailable. Recorded as a known weakness; future hardening will report accepted claims by extractor and require the floor's worth of accepted claims from the calibrated path.
155
- - **Reviewer model selection beyond the calibrated `hermes-two-pass` baseline is unresolved.** The dogfood arc validated one reviewer config; alternative models need their own seeded-failure recall calibration before they can be trusted.
156
- - **The dogfood pack used `mistral-nemo:12b` for extraction (canonical default is `hermes3:8b`).** Discovery hallucinated wrong-domain results for self-referential section names; this was corrected by query-precision discipline (see handbook) and operator-pre-staged URLs for ambiguous topics.
172
+ - **Extractor provenance is not visible at the gate seam.** A section can pass the accepted-claim floor while relying on heuristic-fallback claims when the calibrated extractor (Ollama with the configured model) is unavailable. Recorded as Experiment 4 in the roadmap; future hardening will report accepted claims by extractor and require the floor's worth of accepted claims from the calibrated path.
173
+ - **Reviewer model selection beyond the calibrated `hermes-two-pass` baseline is unresolved.** The dogfood arc validated one reviewer config; alternative models need their own seeded-failure recall calibration before they can be trusted. Experiment 5 in the roadmap.
174
+ - **The v0.1 self-dogfood pack used `mistral-nemo:12b` for extraction (canonical default is `hermes3:8b`).** `hermes3:8b` was not available on this rig during the v0.1 arc. The substitution disclosure stands until a hermes3-based receipt is produced (Experiment 6 in the roadmap). For operators on rigs without `hermes3:8b`, set `OLLAMA_INTERN_MODEL` to an available model; operator-pre-staged URLs and query-precision discipline (see handbook) mitigate discovery hallucination on ambiguous topics.
157
175
 
158
176
  ## Roadmap to v1.0
159
177
 
160
- v1.0 is an earned state, not a release date. Five open experiments stand between v0.1 and v1.0 — API stability under external pressure, a non-self-referential dogfood pack, closing the extractor-provenance gap, generalizing reviewer calibration beyond `hermes-two-pass`, and a clean baseline run on `hermes3:8b`. Full plan in [`docs/roadmap.md`](docs/roadmap.md). The architecture lock holds throughout; v1.0 deepens what v0.1 proved rather than reopening it.
178
+ v1.0 is an earned state, not a release date. Six open experiments stand between v0.1 and v1.0 — non-self-referential dogfood (currently in progress as the ComfyUI workflow durability pack), a `research-os pack publish` command that automates export into the canonical `research-packs` monorepo (Experiment 2, scoped behind Experiment 1's manual closeout), API stability under external pressure, closing the extractor-provenance gap, generalizing reviewer calibration beyond `hermes-two-pass`, and a clean baseline run on `hermes3:8b`. Experiment 1 is not done at pack freeze — it closes when the frozen pack ships as the first package in the `research-packs` monorepo alongside the v0.1 self-dogfood backfill. Full plan in [`docs/roadmap.md`](docs/roadmap.md). The architecture lock holds throughout; v1.0 deepens what v0.1 proved rather than reopening it.
161
179
 
162
180
  ## License
163
181
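Reviewer note: both the changelog and the README status block say `pack publish` verifies an admission contract that includes sha256 receipt reproduction and a check for tampered artifacts. The sketch below shows one way such a check could work in principle. The `ReceiptEntry` shape, the function names, and the in-memory artifact map are assumptions for illustration. The diff does not show the real receipt format or the logic in `verify-pack.mjs`.

```typescript
import { createHash } from "node:crypto";

// Hypothetical receipt entry: a relative path plus the hex sha256 recorded at freeze.
interface ReceiptEntry {
  path: string;
  sha256: string;
}

// Hex-encoded sha256 digest of an artifact's content.
function sha256Hex(data: string | Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Returns the paths that are missing or whose recomputed digest differs
// from the recorded one. An empty result means the receipt reproduces.
function findMismatches(
  entries: ReceiptEntry[],
  readArtifact: (path: string) => string | Uint8Array | undefined,
): string[] {
  const mismatched: string[] = [];
  for (const entry of entries) {
    const content = readArtifact(entry.path);
    if (content === undefined || sha256Hex(content) !== entry.sha256.toLowerCase()) {
      mismatched.push(entry.path);
    }
  }
  return mismatched;
}

// In-memory stand-in for a pack directory (made-up file names).
const artifacts = new Map<string, string>([
  ["a.txt", "abc"],
  ["b.txt", "tampered"],
]);

const receipt: ReceiptEntry[] = [
  // sha256("abc")
  { path: "a.txt", sha256: "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad" },
  // sha256("") recorded, but the content on "disk" has changed
  { path: "b.txt", sha256: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" },
  // recorded in the receipt but absent from the pack
  { path: "c.txt", sha256: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" },
];

const mismatches = findMismatches(receipt, (p) => artifacts.get(p));
console.log(mismatches);
```

In a wrapper script, a non-empty mismatch list would plausibly correspond to the refusal outcome, which the changelog says the CLI reports as exit code 2.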