@mcptoolshop/research-os 0.1.0 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +96 -0
- package/README.md +78 -60
- package/dist/cli.js +558 -4
- package/dist/cli.js.map +1 -1
- package/dist/index.d.ts +10 -1
- package/dist/index.js +5 -3
- package/dist/index.js.map +1 -1
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,102 @@
 
 All notable changes to `research-os` are documented here.
 
+## [0.2.0] — 2026-05-09
+
+Tight release. Two real, tested, dogfooded improvements: `research-os pack publish`
+(Experiment 2) and the Pattern 2 readiness predicate fix (Session 11 escalation).
+No other v0.2 candidates shipped — remaining 7 items (large-page chunker, JSON-aware
+excerpt chunker, publisher derivation, model-fallback warnings, contradict detector
+strategy, GitHub source guidance, llms.txt guard) are deferred to v0.3/v0.x.
+
+### Added
+
+- **`research-os pack publish`** — exports a frozen pack into the canonical
+[`research-packs`](https://github.com/mcp-tool-shop-org/research-packs) archive format.
+CLI: `research-os pack publish --to <path> [--from <path>] [--operator-notes <text>] [--force] [--dry-run]`.
+Exit 0 on PASS, 2 on refusal. Derives `pack.manifest.json` from pack artifacts,
+generates `README.md` from `synthesis/final-report.md`, provisions `docs/how-to-read-this.md`
+scaffold, verifies the admission contract (5 required files, sha256 receipt reproduction,
+all fingerprinted artifacts). See [`docs/pack-publish.md`](docs/pack-publish.md).
+
+- **48 new tests** under `test/pack-publish/` covering all 8 minimum-scope behaviors
+(copy, manifest derivation, sha256 verification, accepted-claims derivation,
+preserved-contradiction-records derivation, README generation, how-to-read scaffold,
+inline verify-pack) plus refusal cases (missing receipt, missing synthesis,
+freeze-refusal present, non-empty target without --force, tampered artifacts).
+
+- **`docs/pack-publish.md`** — full CLI reference: flags, refusal cases, produced layout,
+what the command does NOT do, typical operator workflow.
+
+- **`docs/pack-publish-dogfood.md`** — dogfood receipt: both existing `research-packs`
+packages re-derived via `pack publish` and verified by `research-packs/scripts/verify-pack.mjs`.
+`comfyui-workflow-durability` PASS (302 claims, 124 artifacts); `research-os-self-dogfood`
+PASS (296 claims, 131 artifacts).
+
+- **Handbook page** at `/handbook/pack-publish` — condensed reference with flags, refusal
+cases, typical workflow, and links to the full reference doc and dogfood receipt.
+
+### Changed
+
+- **Pattern 2 readiness predicate enforcement completed** (commit `22b5dba`).
+`src/cowork/derive.ts:determineMode` and `src/audit/aggregate.ts:buildReadinessSummary`
+now use `active_blockers.length === 0` semantics instead of `repair_claim_ids.length === 0`
+and `repair_claims === 0`. Under Pattern 2, `needs_scope_repair`, `needs_source_repair`,
+and `needs_human_review` decisions are settled state (review ran, gate passed with
+sufficient accepted claims) — they are not active blockers.
+
+**Behavioral change:** packs that previously returned `audit: repair_required` or
+`handoff: repair_required` solely because claims carry intermediate reviewer decisions
+(not because any active gate blocker remains) now correctly return
+`audit: ready_for_synthesis` / `handoff: synthesis_ready`. The `active_blockers` field
+is now the authoritative readiness signal; it was already correctly computed but was
+not wired into the verdict in v0.1.
+
+The v0.1 dogfood pack (`research-os-self-dogfood`) is regression-clean under the new
+predicate — it used the heuristic reviewer (only `accepted_for_synthesis` and `rejected`
+decisions), so `repair_claim_ids.length === 0` was equivalent to `active_blockers.length === 0`
+by coincidence. That coincidence is gone; the intent is now enforced directly.
+
+### Documentation
+
+- README status block updated to v0.2.0; `pack publish` mentioned in the workflow chain.
+- `docs/roadmap.md` Experiment 2 entry updated: `IMPLEMENTED → CLOSED 2026-05-09`.
+- `SHIP_GATE.md` D2 updated: version bump + tag for v0.2.0.
+
+### Tests
+
+- **515 total** (467 at v0.1.1 → 515 at v0.2.0, +48 from `test/pack-publish/`).
+
+### Migration notes
+
+No migration required for existing v0.1.x packs. The Pattern 2 predicate change is
+forward-only: if your pack previously returned `repair_required` and the verdict was
+wrong (all active gate blockers were already resolved), re-running `research-os audit`
+or `research-os cowork handoff` after upgrading to v0.2.0 will return the correct
+`ready_for_synthesis` / `synthesis_ready` verdict. Existing freeze receipts remain valid.
+
+## [0.1.1] — 2026-05-08
+
+Documentation and release-alignment patch. No code or behavior changes — all production source and tests are identical to v0.1.0 (463 vitest cases, all passing).
+
+### Added
+- `docs/roadmap.md` — five experiments that stand between v0.1 and v1.0 (API stability under external pressure, non-self-referential dogfood, extractor-provenance gap closure, reviewer-calibration generalization, hermes3 baseline).
+- README "What v0.1 is not" section disclosing what hasn't been validated yet.
+- README "Known limitations" section naming the extractor-provenance gap and the model-substitution caveat.
+- README "Roadmap to v1.0" section linking to the roadmap doc.
+- Centered logo in README, hosted at `mcp-tool-shop-org/brand`.
+- Status badges: version, CI, license, Node ≥20, handbook.
+- Translated READMEs in 7 languages (ja, zh, es, fr, hi, it, pt-BR) plus the language nav bar.
+- `publishConfig.access=public` in `package.json` for the scoped npm package.
+
+### Fixed
+- README workflow chain order: `review`/`review-promote` come before `gate` (gate consumes review decisions, not the other way around). Quick-start commands updated to match.
+- README CLI invocation: `--triaged-only --preset hermes-two-pass --profile hermes-two-pass` matches the actual demonstrated workflow.
+- `pages.yml` workflow trigger: `branches: [master]` (was `[main]`); the deploy never fired on the v0.1.0 push because of this. Site is now live at <https://mcp-tool-shop-org.github.io/research-os/>.
+
+### Why a patch release
+The v0.1.0 tag was created before the documentation pass landed. The npm tarball at `0.1.0` already includes the corrected README (it was published after the doc commits), but the GitHub tag/release pointed to the pre-doc commit. v0.1.1 realigns everything: tag, GitHub Release, and npm tarball all point at the same coherent state.
+
 ## [0.1.0] — 2026-05-08
 
 ### Added
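The Pattern 2 predicate change recorded in the 0.2.0 changelog entry can be illustrated with a minimal TypeScript sketch. This is not the package's source: the `ReadinessInput` shape, the helper names `verdictV01` and `verdictV02`, and the sample claim IDs are hypothetical, invented for illustration. Only the field names `active_blockers` and `repair_claim_ids` and the verdict strings come from the changelog. The sketch also simplifies by modelling a single check, whereas the changelog names two call sites.

```typescript
// Hypothetical input shape; only the two field names come from the changelog.
interface ReadinessInput {
  active_blockers: string[];  // gate blockers that are still unresolved
  repair_claim_ids: string[]; // claims carrying intermediate reviewer decisions
}

type AuditVerdict = "ready_for_synthesis" | "repair_required";

// v0.1-style check as the changelog describes it: any claim with an
// intermediate reviewer decision forces repair_required.
function verdictV01(input: ReadinessInput): AuditVerdict {
  return input.repair_claim_ids.length === 0
    ? "ready_for_synthesis"
    : "repair_required";
}

// v0.2-style check: only active blockers force repair_required.
function verdictV02(input: ReadinessInput): AuditVerdict {
  return input.active_blockers.length === 0
    ? "ready_for_synthesis"
    : "repair_required";
}

// A pack whose claims carry settled intermediate decisions but which has
// no active gate blockers (claim IDs are made up).
const settledPack: ReadinessInput = {
  active_blockers: [],
  repair_claim_ids: ["claim-017", "claim-042"],
};

console.log(verdictV01(settledPack)); // repair_required
console.log(verdictV02(settledPack)); // ready_for_synthesis
```

A pack that used only `accepted_for_synthesis` and `rejected` decisions would have an empty `repair_claim_ids`, so both functions agree on it, which matches the changelog's note that the v0.1 dogfood pack is regression-clean.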
package/README.md
CHANGED
@@ -7,7 +7,7 @@
 </p>
 
 <p align="center">
-<a href="https://github.com/mcp-tool-shop-org/research-os/releases/tag/v0.
+<a href="https://github.com/mcp-tool-shop-org/research-os/releases/tag/v0.2.0"><img src="https://img.shields.io/badge/version-0.2.0-blue" alt="version 0.2.0"></a>
 <a href="https://github.com/mcp-tool-shop-org/research-os/actions/workflows/ci.yml"><img src="https://github.com/mcp-tool-shop-org/research-os/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
 <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green" alt="MIT License"></a>
 <img src="https://img.shields.io/badge/node-%E2%89%A520-brightgreen" alt="Node ≥20">
@@ -24,7 +24,67 @@ Local-first CLI that turns an open-ended topic into a gated **research-pack**
 
 It is not a report generator. It is not an LLM-orchestration framework. It does not write your synthesis for you. It enforces the conditions under which synthesis can begin.
 
-
+Frozen packs are archived in [`mcp-tool-shop-org/research-packs`](https://github.com/mcp-tool-shop-org/research-packs) — live, with two day-one packages. See [`docs/roadmap.md`](docs/roadmap.md) for the v1.0 path.
+
+v0.1 has been pressure-tested in two dogfood arcs. The first — research-os researching its own spec — found seven correctness gaps before the v0.1.0 release, each requiring a real code fix and earning a law or integration pattern. The second (v1 Experiment 1: ComfyUI workflow durability, 11 sessions, a domain with no vocabulary overlap with research-os) closed 2026-05-09: pack frozen, archive live, Pattern 2 enforcement completed via commit `22b5dba`. The v0.1 proof trail lives in [`docs/dogfood-proof.md`](docs/dogfood-proof.md); the Experiment 1 proof lives in [`docs/experiment-1-proof.md`](docs/experiment-1-proof.md). Live handbook: <https://mcp-tool-shop-org.github.io/research-os/handbook/>.
+
+## Install
+
+**Requirements:** Node.js ≥ 20.
+
+```bash
+npm install -g @mcptoolshop/research-os
+```
+
+For contributors building from source:
+
+```bash
+git clone https://github.com/mcp-tool-shop-org/research-os.git
+cd research-os
+npm install
+npm run build
+npm link
+```
+
+## Quick start
+
+```bash
+# Create a new research-pack
+research-os init "How should X be structured?"
+
+# Add a section
+research-os section add 01-landscape --purpose "Map the current landscape"
+
+# Discover and approve sources, then gather
+research-os discover run 01-landscape
+research-os discover approve 01-landscape --top 8
+research-os gather 01-landscape --approved
+
+# Run the per-section chain
+research-os claim extract 01-landscape
+research-os claim audit-density 01-landscape
+research-os claim triage 01-landscape
+research-os contradict map 01-landscape --triaged-only
+research-os review 01-landscape --triaged-only --preset hermes-two-pass --profile hermes-two-pass
+research-os review-promote 01-landscape --profile hermes-two-pass
+research-os gate 01-landscape
+research-os section report 01-landscape
+
+# Pack-level finish
+research-os audit
+research-os index build --all
+research-os cowork handoff
+research-os synth workspace # only if handoff returned synthesis_ready
+research-os freeze
+
+# Export to the research-packs archive
+research-os pack publish \
+--to <research-packs>/packages/<name>
+```
+
+**For a real worked example**, see the dogfood pack at `research-os-packs/research-os-spec/` — every artifact, every receipt, every disposition, every freeze fingerprint, all on disk in append-only ledgers. That pack is what produced `docs/dogfood-proof.md`.
+
+**Requires [ollama-intern-mcp](https://github.com/mcp-tool-shop-org/ollama-intern-mcp) running locally** for LLM extraction, triage, review, and discovery. Default model is `hermes3:8b`; override with `OLLAMA_INTERN_MODEL=<model>`. Set `OLLAMA_HOST` if Ollama is not on the default `localhost:11434`.
 
 ## The 16 load-bearing laws
 
@@ -76,55 +136,6 @@ Each step is a CLI command. Each step writes to append-only artifacts. No step s
 
 This is the structural alternative to *search → summarize → pretty report*. The chain is the product.
 
-## Install
-
-**Requirements:** Node.js ≥ 20.
-
-```bash
-# From source (v0.1.0 is not yet published to npm)
-git clone https://github.com/mcp-tool-shop-org/research-os.git
-cd research-os
-npm install
-npm run build
-npm link # makes `research-os` available on your PATH
-```
-
-## Quick start
-
-```bash
-# Create a new research-pack
-research-os init "How should X be structured?"
-
-# Add a section
-research-os section add 01-landscape --purpose "Map the current landscape"
-
-# Discover and approve sources, then gather
-research-os discover run 01-landscape
-research-os discover approve 01-landscape --top 8
-research-os gather 01-landscape --approved
-
-# Run the per-section chain
-research-os claim extract 01-landscape
-research-os claim audit-density 01-landscape
-research-os claim triage 01-landscape
-research-os contradict map 01-landscape --triaged-only
-research-os review 01-landscape --triaged-only --preset hermes-two-pass --profile hermes-two-pass
-research-os review-promote 01-landscape --profile hermes-two-pass
-research-os gate 01-landscape
-research-os section report 01-landscape
-
-# Pack-level finish
-research-os audit
-research-os index build --all
-research-os cowork handoff
-research-os synth workspace # only if handoff returned synthesis_ready
-research-os freeze
-```
-
-**For a real worked example**, see the dogfood pack at `research-os-packs/research-os-spec/` — every artifact, every receipt, every disposition, every freeze fingerprint, all on disk in append-only ledgers. That pack is what produced `docs/dogfood-proof.md`.
-
-**Requires [ollama-intern-mcp](https://github.com/mcp-tool-shop-org/ollama-intern-mcp) running locally** for LLM extraction, triage, review, and discovery. Default model is `hermes3:8b`; override with `OLLAMA_INTERN_MODEL=<model>`. Set `OLLAMA_HOST` if Ollama is not on the default `localhost:11434`.
-
 ## Vocabulary
 
 | Term | Meaning |
@@ -140,24 +151,31 @@ research-os freeze
 
 ## Status
 
-**v0.
+**v0.2.0** — published to npm as `@mcptoolshop/research-os@0.2.0`, 2026-05-09. Ships `research-os pack publish` (Experiment 2) and the Pattern 2 readiness predicate fix. 515/515 vitest passing. See [CHANGELOG.md](CHANGELOG.md).
+
+**`research-os pack publish`** — Frozen packs export to the canonical `research-packs` archive with a single command. Derives `pack.manifest.json`, generates `README.md`, provisions `docs/how-to-read-this.md`, verifies the admission contract. See [`docs/pack-publish.md`](docs/pack-publish.md). Dogfood: both existing packages re-derived and `verify-pack.mjs` PASS — see [`docs/pack-publish-dogfood.md`](docs/pack-publish-dogfood.md).
+
+**v0.1.0** — dogfood pack frozen 2026-05-08. The pack at `research-os-packs/research-os-spec/` (sibling repo) reached freeze with 296 accepted claims across 8 sections, 17 dispositioned, 30 operator-overridden, 0 active repair blockers, 0 unresolved contradictions, all gates `synthesis_eligible=true`. Sixteen load-bearing laws cumulative. See [`docs/dogfood-proof.md`](docs/dogfood-proof.md) for the seven findings and freeze receipt fingerprints.
+
+**research-packs archive monorepo** — live at [`mcp-tool-shop-org/research-packs`](https://github.com/mcp-tool-shop-org/research-packs) with two day-one packages. `comfyui-workflow-durability` (Experiment 1, 302 accepted claims, 8 sections) and `research-os-self-dogfood` (v0.1 dogfood backfill, 296 accepted claims, 8 sections). Both packages PASS `verify-pack.mjs`.
+
+**v1 Experiment 1 (ComfyUI workflow durability)** — CLOSED 2026-05-09. All 8 sections at Terminal A, pack frozen, archive live. See [`docs/experiment-1-proof.md`](docs/experiment-1-proof.md) and [`docs/roadmap.md`](docs/roadmap.md).
 
-### What v0.
+### What v0.2 is not
 
-- Not battle-tested by external users.
-- Not yet on npm. Install from source until `npm publish` happens.
+- Not battle-tested by external users. Two dogfood arcs have run — one self-referential, one external-domain. External user pressure is Experiment 3 (API stability) in the v1.0 roadmap.
 - Not a synthesis writer. The `synth workspace` command generates the structured workspace; humans (or Cowork) write the prose against accepted claim IDs.
-- Not API-stable under semver. v1.0.0 is an earned state, not a calendar date — see [`docs/roadmap.md`](docs/roadmap.md) for the
+- Not API-stable under semver. v1.0.0 is an earned state, not a calendar date — see [`docs/roadmap.md`](docs/roadmap.md) for the six experiments that close the gap.
 
 ### Known limitations
 
-- **Extractor provenance is not visible at the gate seam.** A section can pass the accepted-claim floor while relying on heuristic-fallback claims when the calibrated extractor (Ollama with the configured model) is unavailable. Recorded as
-- **Reviewer model selection beyond the calibrated `hermes-two-pass` baseline is unresolved.** The dogfood arc validated one reviewer config; alternative models need their own seeded-failure recall calibration before they can be trusted.
-- **The dogfood pack used `mistral-nemo:12b` for extraction (canonical default is `hermes3:8b`).**
+- **Extractor provenance is not visible at the gate seam.** A section can pass the accepted-claim floor while relying on heuristic-fallback claims when the calibrated extractor (Ollama with the configured model) is unavailable. Recorded as Experiment 4 in the roadmap; future hardening will report accepted claims by extractor and require the floor's worth of accepted claims from the calibrated path.
+- **Reviewer model selection beyond the calibrated `hermes-two-pass` baseline is unresolved.** The dogfood arc validated one reviewer config; alternative models need their own seeded-failure recall calibration before they can be trusted. Experiment 5 in the roadmap.
+- **The v0.1 self-dogfood pack used `mistral-nemo:12b` for extraction (canonical default is `hermes3:8b`).** `hermes3:8b` was not available on this rig during the v0.1 arc. The substitution disclosure stands until a hermes3-based receipt is produced — Experiment 6 in the roadmap. For operators on rigs without `hermes3:8b`, set `OLLAMA_INTERN_MODEL` to an available model; operator-pre-staged URLs and query-precision discipline (see handbook) mitigate discovery hallucination on ambiguous topics.
 
 ## Roadmap to v1.0
 
-v1.0 is an earned state, not a release date.
+v1.0 is an earned state, not a release date. Six open experiments stand between v0.1 and v1.0 — non-self-referential dogfood (currently in progress as the ComfyUI workflow durability pack), a `research-os pack publish` command that automates export into the canonical `research-packs` monorepo (Experiment 2, scoped behind Experiment 1's manual closeout), API stability under external pressure, closing the extractor-provenance gap, generalizing reviewer calibration beyond `hermes-two-pass`, and a clean baseline run on `hermes3:8b`. Experiment 1 is not done at pack freeze — it closes when the frozen pack ships as the first package in the `research-packs` monorepo alongside the v0.1 self-dogfood backfill. Full plan in [`docs/roadmap.md`](docs/roadmap.md). The architecture lock holds throughout; v1.0 deepens what v0.1 proved rather than reopening it.
 
 ## License
 
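Both the changelog and the README describe `pack publish` as verifying an admission contract that includes sha256 receipt reproduction, with tampered artifacts listed as a refusal case. The sketch below shows the general idea of such a check in TypeScript, under stated assumptions: the `Receipt` shape (artifact path mapped to a hex digest), the helper names, and the `claims/accepted.jsonl` path are hypothetical and are not taken from the package. Artifacts are held in memory so the example stays self-contained, whereas a real check would read files from the pack directory.

```typescript
import { createHash } from "node:crypto";

// Hypothetical receipt shape: artifact path -> expected sha256 hex digest.
type Receipt = { [path: string]: string };

function sha256Hex(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Returns the paths whose recomputed digest is missing or differs from
// the digest recorded in the receipt.
function findMismatches(receipt: Receipt, artifacts: Map<string, string>): string[] {
  const mismatches: string[] = [];
  for (const path of Object.keys(receipt)) {
    const content = artifacts.get(path);
    if (content === undefined || sha256Hex(content) !== receipt[path]) {
      mismatches.push(path);
    }
  }
  return mismatches;
}

// In-memory stand-ins for pack artifacts (contents are made up).
const artifacts = new Map<string, string>();
artifacts.set("synthesis/final-report.md", "# Final report\n");
artifacts.set("claims/accepted.jsonl", '{"id":"claim-001"}\n');

// Record digests for the pristine artifacts.
const receipt: Receipt = {};
artifacts.forEach((content, path) => {
  receipt[path] = sha256Hex(content);
});

// Pristine artifacts reproduce the receipt.
const clean = findMismatches(receipt, artifacts);

// Tamper with one artifact after the receipt was recorded.
artifacts.set("claims/accepted.jsonl", '{"id":"claim-999"}\n');
const tampered = findMismatches(receipt, artifacts);

console.log(clean.length === 0 ? "PASS" : "REFUSE", tampered);
```

Here `clean` is empty and `tampered` contains only the modified path. Per the changelog, the actual command reports PASS with exit code 0 and a refusal with exit code 2; how it maps individual mismatches onto that refusal is not shown in this diff.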