ralph-research 0.1.4 → 0.1.6
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +46 -5
- package/dist/cli/commands/demo.d.ts +2 -0
- package/dist/cli/commands/demo.js +5 -4
- package/dist/cli/commands/demo.js.map +1 -1
- package/dist/cli/program.js +1 -1
- package/dist/mcp/server.js +1 -1
- package/package.json +20 -2
- package/templates/code/README.md +42 -0
- package/templates/code/ralph.yaml +57 -0
- package/templates/code/scripts/experiment.mjs +29 -0
- package/templates/code/scripts/metric.mjs +8 -0
- package/templates/code/scripts/propose.mjs +14 -0
- package/templates/code/src/calculator.mjs +7 -0
- package/templates/code/tests/calculator.test.mjs +20 -0
- package/templates/writing/README.md +46 -0
package/README.md
CHANGED
@@ -1,6 +1,7 @@
 # ralph-research
 
 [](https://github.com/coyaSONG/ralph-research/actions/workflows/ci.yml)
+[](https://www.npmjs.com/package/ralph-research)
 [](LICENSE)
 [](package.json)
 [](tsconfig.json)
|
@@ -15,6 +16,22 @@ Local-first runtime for recursive research improvement over real artifacts.
 4. persist the run, decision, and frontier state
 5. promote only verified improvements
 
+```mermaid
+flowchart LR
+M[Manifest<br/>ralph.yaml] --> P[Proposer]
+P -->|candidate change<br/>in worktree| E[Experiment]
+E -->|outputs| X[Metric extractor]
+X --> R{Ratchet}
+R -->|wins frontier| A[Accept → main]
+R -->|else| J[Reject]
+A -.->|persists| S[(.ralph/<br/>runs · decisions · frontier)]
+J -.->|persists| S
+```
+
+If your viewer does not render Mermaid: the diagram is just the five
+numbered steps above, with every transition writing to durable state under
+`.ralph/`. That's the bit that makes the loop resumable.
+
 The current product bar is reliability, not breadth. The bundled success path is the `writing` template, while the runtime itself is manifest-driven and reusable for other local workflows.
 
 ## Trust Signals
|
@@ -44,8 +61,8 @@ The current product bar is reliability, not breadth. The bundled success path is
 | If you want to... | Use |
 | --- | --- |
 | Check whether a repo is runnable | `rrx validate` then `rrx doctor` |
-| Materialize the bundled example project | `rrx init --template writing` |
-| Run a disposable end-to-end demo | `rrx demo writing` |
+| Materialize the bundled example project | `rrx init --template writing` (or `--template code`) |
+| Run a disposable end-to-end demo | `rrx demo writing` (or `rrx demo code`) |
 | Launch the v1 goal-driven orchestrator | `rrx "improve the holdout top-3 model"` |
 | Launch the v1 goal-driven orchestrator explicitly | `rrx launch "improve the holdout top-3 model"` |
 | Resume a persisted TUI research session | `rrx resume latest` |
|
@@ -103,7 +120,7 @@ See [docs/operation-model.md](docs/operation-model.md) for the full lifecycle an
 
 ## Current Scope
 
-- Bundled
+- Bundled templates: `writing` (prose ratchet) and `code` (test-pass ratchet over a tiny calculator module)
 - Default template metric: local command metric, no API key required
 - Optional judge path: pairwise LLM judge packs
 - MCP tools:
|
@@ -113,9 +130,11 @@ See [docs/operation-model.md](docs/operation-model.md) for the full lifecycle an
 
 The runtime supports broader manifests than the bundled template demonstrates, but the shipped onboarding path is intentionally narrow until those flows are equally reliable.
 
-##
+## Bundled Templates
 
-
+### Writing template
+
+Self-contained prose improvement loop:
 
 - `docs/draft.md`: sample draft
 - `scripts/propose.mjs`: bounded rewrite
|
@@ -125,6 +144,18 @@ The bundled writing template is self-contained:
 
 `templates/writing/ralph.yaml` uses a local command metric by default, so the first run works without model credentials.
 
+### Code template
+
+Self-contained test-pass ratchet over a tiny calculator module:
+
+- `src/calculator.mjs`: deliberately-broken `sum`/`multiply`
+- `tests/calculator.test.mjs`: four assertions using the built-in `node:test` runner
+- `scripts/propose.mjs`: writes the fixed calculator implementation
+- `scripts/experiment.mjs`: runs `node --test --test-reporter=tap` and persists the pass/fail counts
+- `scripts/metric.mjs`: emits the pass count as the `tests_passed` metric
+
+`rrx demo code` materializes the template, runs one cycle, and shows the ratchet promoting the candidate from `tests_passed: 0` to `tests_passed: 4`.
+
 ## Progressive Runs
 
 `rrx run` executes one cycle by default and auto-resumes the latest recoverable run when one exists.
|
@@ -144,6 +175,7 @@ npx ralph-research run --until-target --until-no-improve 3 --json
 
 ## More Docs
 
+- [docs/quickstart.md](docs/quickstart.md): five-minute walkthrough from `npx ralph-research demo writing` to inspecting the persisted decision evidence
 - [docs/operation-model.md](docs/operation-model.md): lifecycle, persisted state, recovery classes
 - [docs/playbook.md](docs/playbook.md): situation-to-command operator guide
 - [docs/examples.md](docs/examples.md): quickstart and manifest examples pulled from shipped templates and fixtures
|
@@ -193,6 +225,15 @@ npm run typecheck
 npm run build
 ```
 
+## Support the Project
+
+If `ralph-research` saves you from wiring up your own write-evaluate-accept loop:
+
+- Star the repo on [GitHub](https://github.com/coyaSONG/ralph-research). It is the single clearest signal that the runtime is worth maintaining and helps surface it to other people who need the same shape of tool.
+- File issues with concrete reproductions. The issue templates ask for the version, OS, and exact commands so they convert quickly into fixes.
+- Open a PR for the gaps you actually hit. `CONTRIBUTING.md` covers the local loop; the bar is a Vitest regression that fails against the previous code.
+- If you want to talk shape and direction rather than file an issue, the manifest schema (`src/core/manifest/schema.ts`) and the recovery classifier (`src/core/state/research-session-recovery-classifier.ts`) are the two surfaces I most want feedback on.
+
 ## License
 
 MIT
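
The Mermaid diagram added to the README describes one propose, experiment, measure, ratchet pass in which every outcome is persisted. Below is a minimal in-memory sketch of that shape, not ralph-research's implementation: `runCycle`, its callback parameters, and the `store` array standing in for `.ralph/` are all hypothetical, and the `run-0001` naming only mirrors the run IDs the template READMEs pass to `rrx inspect`.

```javascript
// Hypothetical sketch of one ratchet cycle; none of these names are
// ralph-research APIs. `store` stands in for the state under `.ralph/`.
function runCycle({ incumbent, propose, experiment, extractMetric, accepts, store }) {
  const runId = `run-${String(store.length + 1).padStart(4, "0")}`;
  const candidate = propose(incumbent.artifact); // Proposer
  const outputs = experiment(candidate); // Experiment
  const value = extractMetric(outputs); // Metric extractor
  const accepted = accepts(value, incumbent.value); // Ratchet
  // Both branches record the run and its decision (the dashed "persists" edges).
  store.push({ runId, value, decision: accepted ? "accept" : "reject" });
  return accepted ? { artifact: candidate, value } : incumbent;
}

// Usage: a fixed proposer against a baseline scoring 0, as in the code template.
const store = [];
const steps = {
  propose: () => "fixed calculator",
  experiment: (artifact) => ({ passed: artifact === "fixed calculator" ? 4 : 0 }),
  extractMetric: (outputs) => outputs.passed,
  accepts: (candidate, incumbent) => candidate > incumbent,
  store,
};
let frontier = { artifact: "broken calculator", value: 0 };
frontier = runCycle({ incumbent: frontier, ...steps }); // run-0001: accept
frontier = runCycle({ incumbent: frontier, ...steps }); // run-0002: reject
console.log(store.map((run) => `${run.runId} ${run.decision}`).join(", "));
// run-0001 accept, run-0002 reject
```

Keeping the decision record separate from the returned frontier is what lets a reader audit rejected runs as well as accepted ones.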
|
package/dist/cli/commands/demo.d.ts
CHANGED
@@ -5,5 +5,7 @@ export interface DemoCommandOptions {
 force?: boolean;
 json?: boolean;
 }
+export declare const SUPPORTED_DEMO_TEMPLATES: readonly ["writing", "code"];
+export type SupportedDemoTemplate = (typeof SUPPORTED_DEMO_TEMPLATES)[number];
 export declare function runDemoCommand(template: string, options: DemoCommandOptions, io?: CommandIO): Promise<number>;
 export declare function registerDemoCommand(program: Command): void;
|
package/dist/cli/commands/demo.js
CHANGED
@@ -6,6 +6,7 @@ import { inspectRun } from "../../app/services/project-state-service.js";
 import { RunCycleService } from "../../app/services/run-cycle-service.js";
 import { DEFAULT_MANIFEST_FILENAME } from "../../core/manifest/schema.js";
 import { copyTemplate } from "../../shared/template-utils.js";
+export const SUPPORTED_DEMO_TEMPLATES = ["writing", "code"];
 const defaultCommandIO = {
 stdout: (message) => {
 process.stdout.write(`${message}\n`);
|
@@ -15,8 +16,8 @@ const defaultCommandIO = {
 },
 };
 export async function runDemoCommand(template, options, io = defaultCommandIO) {
-if (template
-const message = `Unsupported demo template ${template};
+if (!SUPPORTED_DEMO_TEMPLATES.includes(template)) {
+const message = `Unsupported demo template ${template}; supported templates: ${SUPPORTED_DEMO_TEMPLATES.join(", ")}`;
 if (options.json) {
 io.stderr(JSON.stringify({ ok: false, error: message }, null, 2));
 }
|
@@ -28,7 +29,7 @@ export async function runDemoCommand(template, options, io = defaultCommandIO) {
 try {
 const targetDir = options.path
 ? resolve(options.path)
-: await mkdtemp(join(tmpdir(),
+: await mkdtemp(join(tmpdir(), `rrx-demo-${template}-`));
 if (options.force) {
 await rm(targetDir, { recursive: true, force: true });
 }
|
@@ -87,7 +88,7 @@ export function registerDemoCommand(program) {
 program
 .command("demo")
 .description("Create and run a zero-config demo.")
-.argument("<template>",
+.argument("<template>", `Demo template name (one of: ${SUPPORTED_DEMO_TEMPLATES.join(", ")})`)
 .option("-p, --path <path>", "Destination directory")
 .option("--force", "Replace the destination directory if it already exists", false)
 .option("--json", "Emit machine-readable output", false)
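
In 0.1.6 the accepted demo names, the rejection message, and the commander argument description all derive from the one `SUPPORTED_DEMO_TEMPLATES` array. A standalone sketch of that check follows. `validateDemoTemplate` and `demoTempPrefix` are hypothetical helper names; the array, the message format, and the temp-dir prefix are copied from the diff.

```javascript
// Mirrors the constant added to dist/cli/commands/demo.js.
const SUPPORTED_DEMO_TEMPLATES = ["writing", "code"];

// Hypothetical helper: same membership test and message as runDemoCommand.
function validateDemoTemplate(template) {
  if (SUPPORTED_DEMO_TEMPLATES.includes(template)) {
    return { ok: true, template };
  }
  const message = `Unsupported demo template ${template}; supported templates: ${SUPPORTED_DEMO_TEMPLATES.join(", ")}`;
  return { ok: false, error: message };
}

// Hypothetical helper: the mkdtemp prefix used when --path is omitted.
// fs.mkdtemp appends six random characters to the prefix it is given.
function demoTempPrefix(template) {
  return `rrx-demo-${template}-`;
}

console.log(validateDemoTemplate("code").ok); // true
console.log(validateDemoTemplate("essay").error);
// Unsupported demo template essay; supported templates: writing, code
console.log(demoTempPrefix("code")); // rrx-demo-code-
```

Because the help text and the error both read from the same array, adding a third template is a one-line change that cannot leave `rrx demo --help` out of date. The per-template prefix also makes leftover demo directories in the OS temp dir easy to tell apart.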
|
package/dist/cli/commands/demo.js.map
CHANGED
@@ -1 +1 @@
-{"version":3,"file":"demo.js","sourceRoot":"","sources":["../../../src/cli/commands/demo.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,KAAK,EAAE,OAAO,EAAE,EAAE,EAAE,MAAM,kBAAkB,CAAC;AACtD,OAAO,EAAE,MAAM,EAAE,MAAM,SAAS,CAAC;AACjC,OAAO,EAAE,IAAI,EAAE,OAAO,EAAE,MAAM,WAAW,CAAC;AAG1C,OAAO,EAAE,KAAK,EAAE,MAAM,OAAO,CAAC;AAE9B,OAAO,EAAE,UAAU,EAAE,MAAM,6CAA6C,CAAC;AACzE,OAAO,EAAE,eAAe,EAAE,MAAM,yCAAyC,CAAC;AAC1E,OAAO,EAAE,yBAAyB,EAAE,MAAM,+BAA+B,CAAC;AAC1E,OAAO,EAAE,YAAY,EAAE,MAAM,gCAAgC,CAAC;AAS9D,MAAM,gBAAgB,GAAc;IAClC,MAAM,EAAE,CAAC,OAAO,EAAE,EAAE;QAClB,OAAO,CAAC,MAAM,CAAC,KAAK,CAAC,GAAG,OAAO,IAAI,CAAC,CAAC;IACvC,CAAC;IACD,MAAM,EAAE,CAAC,OAAO,EAAE,EAAE;QAClB,OAAO,CAAC,MAAM,CAAC,KAAK,CAAC,GAAG,OAAO,IAAI,CAAC,CAAC;IACvC,CAAC;CACF,CAAC;AAEF,MAAM,CAAC,KAAK,UAAU,cAAc,CAClC,QAAgB,EAChB,OAA2B,EAC3B,KAAgB,gBAAgB;IAEhC,IAAI,QAAQ,
+{"version":3,"file":"demo.js","sourceRoot":"","sources":["../../../src/cli/commands/demo.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,KAAK,EAAE,OAAO,EAAE,EAAE,EAAE,MAAM,kBAAkB,CAAC;AACtD,OAAO,EAAE,MAAM,EAAE,MAAM,SAAS,CAAC;AACjC,OAAO,EAAE,IAAI,EAAE,OAAO,EAAE,MAAM,WAAW,CAAC;AAG1C,OAAO,EAAE,KAAK,EAAE,MAAM,OAAO,CAAC;AAE9B,OAAO,EAAE,UAAU,EAAE,MAAM,6CAA6C,CAAC;AACzE,OAAO,EAAE,eAAe,EAAE,MAAM,yCAAyC,CAAC;AAC1E,OAAO,EAAE,yBAAyB,EAAE,MAAM,+BAA+B,CAAC;AAC1E,OAAO,EAAE,YAAY,EAAE,MAAM,gCAAgC,CAAC;AAS9D,MAAM,CAAC,MAAM,wBAAwB,GAAG,CAAC,SAAS,EAAE,MAAM,CAAU,CAAC;AAGrE,MAAM,gBAAgB,GAAc;IAClC,MAAM,EAAE,CAAC,OAAO,EAAE,EAAE;QAClB,OAAO,CAAC,MAAM,CAAC,KAAK,CAAC,GAAG,OAAO,IAAI,CAAC,CAAC;IACvC,CAAC;IACD,MAAM,EAAE,CAAC,OAAO,EAAE,EAAE;QAClB,OAAO,CAAC,MAAM,CAAC,KAAK,CAAC,GAAG,OAAO,IAAI,CAAC,CAAC;IACvC,CAAC;CACF,CAAC;AAEF,MAAM,CAAC,KAAK,UAAU,cAAc,CAClC,QAAgB,EAChB,OAA2B,EAC3B,KAAgB,gBAAgB;IAEhC,IAAI,CAAE,wBAA8C,CAAC,QAAQ,CAAC,QAAQ,CAAC,EAAE,CAAC;QACxE,MAAM,OAAO,GAAG,6BAA6B,QAAQ,0BAA0B,wBAAwB,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC;QACrH,IAAI,OAAO,CAAC,IAAI,EAAE,CAAC;YACjB,EAAE,CAAC,MAAM,CAAC,IAAI,CAAC,SAAS,CAAC,EAAE,EAAE,EAAE,KAAK,EAAE,KAAK,EAAE,OAAO,EAAE,EAAE,IAAI,EAAE,CAAC,CAAC,CAAC,CAAC;QACpE,CAAC;aAAM,CAAC;YACN,EAAE,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC;QACrB,CAAC;QACD,OAAO,CAAC,CAAC;IACX,CAAC;IAED,IAAI,CAAC;QACH,MAAM,SAAS,GAAG,OAAO,CAAC,IAAI;YAC5B,CAAC,CAAC,OAAO,CAAC,OAAO,CAAC,IAAI,CAAC;YACvB,CAAC,CAAC,MAAM,OAAO,CAAC,IAAI,CAAC,MAAM,EAAE,EAAE,YAAY,QAAQ,GAAG,CAAC,CAAC,CAAC;QAC3D,IAAI,OAAO,CAAC,KAAK,EAAE,CAAC;YAClB,MAAM,EAAE,CAAC,SAAS,EAAE,EAAE,SAAS,EAAE,IAAI,EAAE,KAAK,EAAE,IAAI,EAAE,CAAC,CAAC;QACxD,CAAC;QACD,MAAM,KAAK,CAAC,SAAS,EAAE,EAAE,SAAS,EAAE,IAAI,EAAE,CAAC,CAAC;QAE5C,MAAM,YAAY,CAAC,QAAQ,EAAE,SAAS,EAAE;YACtC,GAAG,CAAC,OAAO,CAAC,KAAK,KAAK,SAAS,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC,EAAE,KAAK,EAAE,OAAO,CAAC,KAAK,EAAE,CAAC;SACjE,CAAC,CAAC;QACH,MAAM,kBAAkB,CAAC,SAAS,CAAC,CAAC;QAEpC,MAAM,OAAO,GAAG,IAAI,eAAe,EAAE,CAAC;QACtC,MAAM,MAAM,GAAG,MAAM,OAAO,CAAC,GAAG,CAAC;YAC/B,QAAQ,EAAE,SAAS;YACnB,YAAY,EAAE,IAAI,CAAC,SAAS,EAAE,yBAAyB,CAAC;SACzD,CAAC,CAAC;QAEH,MAAM,KAAK,GAAG,MAAM,CAAC,SAAS,EAAE,GAAG,CAAC,KAAK,CAAC;QAC1C,IAAI,CAAC,KAAK,EAAE,CAAC;YACX,MAAM,IAAI,KAAK,CAAC,iDAAiD,MAAM,CAAC,MAAM,EAAE,CAAC,CAAC;QACpF,CAAC;QAED,MAAM,UAAU,GAAG,MAAM,UAAU,CAAC;YAClC,QAAQ,EAAE,SAAS;YACnB,YAAY,EAAE,IAAI,CAAC,SAAS,EAAE,yBAAyB,CAAC;YACxD,KAAK;SACN,CAAC,CAAC;QAEH,IAAI,OAAO,CAAC,IAAI,EAAE,CAAC;YACjB,EAAE,CAAC,MAAM,CACP,IAAI,CAAC,SAAS,CACZ;gBACE,EAAE,EAAE,IAAI;gBACR,QAAQ;gBACR,SAAS;gBACT,MAAM,EAAE,MAAM,CAAC,MAAM;gBACrB,KAAK;gBACL,OAAO,EAAE,UAAU,CAAC,cAAc;aACnC,EACD,IAAI,EACJ,CAAC,CACF,CACF,CAAC;QACJ,CAAC;aAAM,CAAC;YACN,EAAE,CAAC,MAAM,CACP;gBACE,mBAAmB,SAAS,EAAE;gBAC9B,iBAAiB,MAAM,CAAC,MAAM,EAAE;gBAChC,QAAQ,KAAK,EAAE;gBACf,aAAa,UAAU,CAAC,cAAc,CAAC,cAAc,IAAI,KAAK,EAAE;gBAChE,YAAY,SAAS,mBAAmB,KAAK,SAAS;aACvD,CAAC,IAAI,CAAC,IAAI,CAAC,CACb,CAAC;QACJ,CAAC;QAED,OAAO,CAAC,CAAC;IACX,CAAC;IAAC,OAAO,KAAK,EAAE,CAAC;QACf,MAAM,OAAO,GAAG,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,oBAAoB,CAAC;QAC9E,IAAI,OAAO,CAAC,IAAI,EAAE,CAAC;YACjB,EAAE,CAAC,MAAM,CAAC,IAAI,CAAC,SAAS,CAAC,EAAE,EAAE,EAAE,KAAK,EAAE,KAAK,EAAE,OAAO,EAAE,EAAE,IAAI,EAAE,CAAC,CAAC,CAAC,CAAC;QACpE,CAAC;aAAM,CAAC;YACN,EAAE,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC;QACrB,CAAC;QACD,OAAO,CAAC,CAAC;IACX,CAAC;AACH,CAAC;AAED,MAAM,UAAU,mBAAmB,CAAC,OAAgB;IAClD,OAAO;SACJ,OAAO,CAAC,MAAM,CAAC;SACf,WAAW,CAAC,oCAAoC,CAAC;SACjD,QAAQ,CACP,YAAY,EACZ,+BAA+B,wBAAwB,CAAC,IAAI,CAAC,IAAI,CAAC,GAAG,CACtE;SACA,MAAM,CAAC,mBAAmB,EAAE,uBAAuB,CAAC;SACpD,MAAM,CAAC,SAAS,EAAE,wDAAwD,EAAE,KAAK,CAAC;SAClF,MAAM,CAAC,QAAQ,EAAE,8BAA8B,EAAE,KAAK,CAAC;SACvD,MAAM,CAAC,KAAK,EAAE,QAAgB,EAAE,OAA2B,EAAE,EAAE;QAC9D,MAAM,QAAQ,GAAG,MAAM,cAAc,CAAC,QAAQ,EAAE,OAAO,CAAC,CAAC;QACzD,IAAI,QAAQ,KAAK,CAAC,EAAE,CAAC;YACnB,OAAO,CAAC,QAAQ,GAAG,QAAQ,CAAC;QAC9B,CAAC;IACH,CAAC,CAAC,CAAC;AACP,CAAC;AAED,KAAK,UAAU,kBAAkB,CAAC,QAAgB;IAChD,MAAM,KAAK,CAAC,KAAK,EAAE,CAAC,MAAM,CAAC,EAAE,EAAE,GAAG,EAAE,QAAQ,EAAE,CAAC,CAAC;IAChD,MAAM,KAAK,CAAC,KAAK,EAAE,CAAC,QAAQ,EAAE,WAAW,EAAE,qBAAqB,CAAC,EAAE,EAAE,GAAG,EAAE,QAAQ,EAAE,CAAC,CAAC;IACtF,MAAM,KAAK,CAAC,KAAK,EAAE,CAAC,QAAQ,EAAE,YAAY,EAAE,kBAAkB,CAAC,EAAE,EAAE,GAAG,EAAE,QAAQ,EAAE,CAAC,CAAC;IACpF,MAAM,KAAK,CAAC,KAAK,EAAE,CAAC,KAAK,EAAE,GAAG,CAAC,EAAE,EAAE,GAAG,EAAE,QAAQ,EAAE,CAAC,CAAC;IACpD,MAAM,KAAK,CAAC,KAAK,EAAE,CAAC,QAAQ,EAAE,IAAI,EAAE,cAAc,CAAC,EAAE,EAAE,GAAG,EAAE,QAAQ,EAAE,CAAC,CAAC;IACxE,MAAM,KAAK,CAAC,KAAK,EAAE,CAAC,QAAQ,EAAE,IAAI,EAAE,MAAM,CAAC,EAAE,EAAE,GAAG,EAAE,QAAQ,EAAE,CAAC,CAAC;AAClE,CAAC"}
package/dist/cli/program.js
CHANGED
@@ -18,7 +18,7 @@ export function createProgram(dependencies = {}) {
 program
 .name("rrx")
 .description("Local-first runtime for recursive research improvement.")
-.version("0.1.
+.version("0.1.6")
 .argument("[goal]", "Goal to pursue through the v1 TUI research orchestrator")
 .action(async (goal) => {
 if (goal === undefined) {
package/dist/mcp/server.js
CHANGED
@@ -14,7 +14,7 @@ export function createRalphResearchMcpServer(options = {}) {
 (() => new ResearchSessionRecoveryService());
 const server = new McpServer({
 name: "ralph-research",
-version: "0.1.
+version: "0.1.6",
 });
 server.registerTool("run_research_cycle", {
 description: "Run one or more research cycles using the shared ralph-research service layer.",
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "ralph-research",
-"version": "0.1.
+"version": "0.1.6",
 "description": "Local-first runtime for recursive research improvement.",
 "type": "module",
 "bin": {
|
@@ -24,9 +24,27 @@
 "research",
 "ratchet",
 "cli",
-"mcp"
+"mcp",
+"agent",
+"llm",
+"local-first",
+"typescript",
+"nodejs",
+"codex"
 ],
 "license": "MIT",
+"author": "coyaSONG",
+"homepage": "https://github.com/coyaSONG/ralph-research#readme",
+"repository": {
+"type": "git",
+"url": "git+https://github.com/coyaSONG/ralph-research.git"
+},
+"bugs": {
+"url": "https://github.com/coyaSONG/ralph-research/issues"
+},
+"engines": {
+"node": ">=24"
+},
 "dependencies": {
 "@modelcontextprotocol/sdk": "^1.17.4",
 "commander": "^14.0.1",
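
The new `engines` field declares Node 24 or later. npm treats `engines` as advisory (it warns) unless the installing user has enabled `engine-strict`, so a CLI that genuinely depends on the floor may want its own startup check. A minimal sketch, assuming only a bare major-version floor such as `>=24` needs handling; `meetsNodeFloor` is a hypothetical name, and this is not a general semver range parser.

```javascript
// Hypothetical startup guard for the `"node": ">=24"` engines floor.
// Handles only a bare major-version minimum, not general semver ranges.
function meetsNodeFloor(version, minMajor) {
  // process.versions.node has no leading "v"; process.version does.
  const major = Number(String(version).replace(/^v/, "").split(".")[0]);
  return Number.isInteger(major) && major >= minMajor;
}

console.log(meetsNodeFloor("24.0.0", 24)); // true
console.log(meetsNodeFloor("v22.11.0", 24)); // false

if (!meetsNodeFloor(process.versions.node, 24)) {
  console.warn(`Node >=24 expected; running ${process.version}`);
}
```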
|
package/templates/code/README.md
ADDED
@@ -0,0 +1,42 @@
+# Code template
+
+A self-contained `ralph-research` template that drives a test-pass ratchet
+over a tiny JavaScript calculator module. Uses only Node's built-in
+`node:test` runner, so the first cycle runs with no external toolchain.
+
+## What ships in this template
+
+- `src/calculator.mjs` — exports `sum` and `multiply` with deliberate bugs
+- `tests/calculator.test.mjs` — four assertions covering both functions
+- `scripts/propose.mjs` — overwrites `src/calculator.mjs` with the fixed
+implementation
+- `scripts/experiment.mjs` — runs `node --test --test-reporter=tap` against
+the test file and parses the TAP summary into `out/test-results.json`
+- `scripts/metric.mjs` — reads `out/test-results.json` and prints the
+pass count as the `tests_passed` metric
+- `ralph.yaml` — wires the above into a `single_best` frontier with an
+`epsilon_improve` ratchet
+
+## Running this template
+
+From the directory that contains `ralph.yaml`:
+
+```bash
+rrx validate
+rrx doctor
+rrx run --json
+rrx inspect run-0001 --json
+```
+
+On a fresh checkout the cycle promotes `tests_passed` from `0` to `4` and
+the ratchet accepts. Subsequent cycles run against the already-fixed
+calculator and are rejected because the candidate cannot improve on the
+incumbent.
+
+To extend the template into a real research loop, replace the proposer with
+a real candidate generator (for example, a small LLM call that rewrites
+`src/calculator.mjs` to add a new function) and broaden the test suite so
+the ratchet has something meaningful to compare on each cycle.
+
+See [`docs/operation-model.md`](../../docs/operation-model.md) for the
+runtime contract every manifest must honor.
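
The code template README says the first cycle is accepted (`tests_passed` 0 to 4) and later cycles are rejected because the candidate cannot improve on the incumbent. With `epsilon: 0` and `direction: maximize` in `ralph.yaml`, that behavior is consistent with a strict "beat the incumbent by more than epsilon" rule. The sketch below assumes exactly that rule; `epsilonImprove` is a hypothetical name, and the runtime's real `epsilon_improve` semantics (for instance, how a gain of exactly epsilon is treated) may differ.

```javascript
// Sketch of an epsilon-improve acceptance test, ASSUMING the rule
// "accept only if the candidate beats the incumbent by more than epsilon
// in the metric's direction". Not ralph-research's implementation.
function epsilonImprove({ candidate, incumbent, epsilon = 0, direction = "maximize" }) {
  const gain = direction === "maximize" ? candidate - incumbent : incumbent - candidate;
  return gain > epsilon;
}

// First cycle of the code template: 4 passing tests vs. a 0 baseline.
console.log(epsilonImprove({ candidate: 4, incumbent: 0 })); // true
// Later cycles: the fixed calculator cannot beat itself.
console.log(epsilonImprove({ candidate: 4, incumbent: 4 })); // false
// A metric to minimize (say, latency in ms) must drop by more than epsilon.
console.log(epsilonImprove({ candidate: 95, incumbent: 100, epsilon: 2, direction: "minimize" })); // true
```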
|
package/templates/code/ralph.yaml
ADDED
@@ -0,0 +1,57 @@
+schemaVersion: "0.1"
+
+project:
+name: code-demo
+artifact: code
+baselineRef: main
+workspace: git
+
+scope:
+allowedGlobs:
+- "src/**"
+- "tests/**"
+- "out/**"
+maxFilesChanged: 2
+maxLineDelta: 40
+
+proposer:
+type: command
+command: "node scripts/propose.mjs"
+
+experiment:
+run:
+command: "node scripts/experiment.mjs"
+outputs:
+- id: test-results
+path: out/test-results.json
+
+metrics:
+catalog:
+- id: tests_passed
+kind: numeric
+direction: maximize
+extractor:
+type: command
+command: "node scripts/metric.mjs"
+parser: plain_number
+
+constraints: []
+
+frontier:
+strategy: single_best
+primaryMetric: tests_passed
+
+ratchet:
+type: epsilon_improve
+metric: tests_passed
+epsilon: 0
+
+# Optional progressive-stop contract:
+# stopping:
+# target:
+# metric: tests_passed
+# op: ">="
+# value: 4
+
+storage:
+root: .ralph
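
The manifest ships a commented-out `stopping.target` block (`metric: tests_passed`, `op: ">="`, `value: 4`). A sketch of how such a target could be evaluated against one run's metrics follows. `targetReached` is hypothetical, and every operator other than `">="` is an assumption, since only that one appears in the template.

```javascript
// Comparison operators a target might use. Only ">=" appears in the
// template; the others are assumptions for illustration.
const OPS = {
  ">=": (a, b) => a >= b,
  ">": (a, b) => a > b,
  "<=": (a, b) => a <= b,
  "<": (a, b) => a < b,
};

// Hypothetical: true when the run's metric satisfies the stopping target.
function targetReached(metrics, target) {
  const compare = OPS[target.op];
  if (!compare) throw new Error(`Unsupported target op: ${target.op}`);
  const value = metrics[target.metric];
  return typeof value === "number" && compare(value, target.value);
}

const target = { metric: "tests_passed", op: ">=", value: 4 };
console.log(targetReached({ tests_passed: 4 }, target)); // true
console.log(targetReached({ tests_passed: 3 }, target)); // false
console.log(targetReached({}, target)); // false (metric missing)
```

Treating a missing metric as "not reached" keeps the loop running after a failed extraction; failing loudly instead would be an equally defensible choice.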
|
package/templates/code/scripts/experiment.mjs
ADDED
@@ -0,0 +1,29 @@
+import { spawnSync } from "node:child_process";
+import { mkdirSync, writeFileSync } from "node:fs";
+import { join } from "node:path";
+
+mkdirSync(join(process.cwd(), "out"), { recursive: true });
+
+const result = spawnSync(
+process.execPath,
+["--test", "--test-reporter=tap", "tests/calculator.test.mjs"],
+{
+cwd: process.cwd(),
+encoding: "utf8",
+},
+);
+
+const combined = `${result.stdout ?? ""}\n${result.stderr ?? ""}`;
+const passMatch = combined.match(/# pass (\d+)/);
+const failMatch = combined.match(/# fail (\d+)/);
+
+const passed = passMatch ? Number(passMatch[1]) : 0;
+const failed = failMatch ? Number(failMatch[1]) : 0;
+
+writeFileSync(
+join(process.cwd(), "out", "test-results.json"),
+`${JSON.stringify({ passed, failed }, null, 2)}\n`,
+"utf8",
+);
+
+console.log("experiment complete");
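
`experiment.mjs` recovers the pass and fail counts by matching the `# pass N` and `# fail N` summary lines that Node's TAP reporter prints at the end of a run. The sketch below lifts that parsing into a pure function and adds the metric side. `metric.mjs` itself is not shown in this diff, so `testsPassedMetric` is a hypothetical stand-in based on the README's description (read `out/test-results.json`, print the pass count for the `plain_number` parser), and the TAP text is a hand-written sample rather than captured output.

```javascript
// Same regexes and zero fallback as scripts/experiment.mjs.
function parseTapSummary(tapText) {
  const passMatch = tapText.match(/# pass (\d+)/);
  const failMatch = tapText.match(/# fail (\d+)/);
  return {
    passed: passMatch ? Number(passMatch[1]) : 0,
    failed: failMatch ? Number(failMatch[1]) : 0,
  };
}

// Hypothetical metric step: a plain_number extractor expects one bare number.
function testsPassedMetric(resultsJson) {
  return String(JSON.parse(resultsJson).passed);
}

// Hand-written sample of the summary lines at the end of a TAP run.
const sampleTap = ["# tests 4", "# suites 0", "# pass 4", "# fail 0"].join("\n");

const results = parseTapSummary(sampleTap);
console.log(results); // { passed: 4, failed: 0 }
console.log(testsPassedMetric(JSON.stringify(results))); // 4
console.log(parseTapSummary("runner crashed")); // { passed: 0, failed: 0 }
```

Because a missing summary falls back to zero, a crashed test run scores the same as zero passing tests; under a strict-improvement ratchet it is simply rejected rather than surfaced as an error.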
|
package/templates/code/scripts/propose.mjs
ADDED
@@ -0,0 +1,14 @@
+import { writeFileSync } from "node:fs";
+import { join } from "node:path";
+
+const fixedCalculator = `export function sum(a, b) {
+return a + b;
+}
+
+export function multiply(a, b) {
+return a * b;
+}
+`;
+
+writeFileSync(join(process.cwd(), "src", "calculator.mjs"), fixedCalculator, "utf8");
+console.log("proposal complete");
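
The manifest's `scope` block (`allowedGlobs` of `src/**`, `tests/**`, `out/**`, plus `maxFilesChanged: 2` and `maxLineDelta: 40`) suggests the runtime bounds what a proposer such as `propose.mjs` may change, but how ralph-research enforces it is not part of this diff. The sketch below is a hypothetical check under two simplifying assumptions: a glob of the form `dir/**` is treated as a path prefix, and the line delta is added plus removed lines summed over all files. The change records in the usage lines are illustrative inputs, not measured diffs.

```javascript
// Hypothetical scope check; "dir/**" globs are treated as path prefixes.
function checkScope(changes, { allowedGlobs, maxFilesChanged, maxLineDelta }) {
  const prefixes = allowedGlobs.map((glob) => glob.replace(/\*\*$/, ""));
  const problems = [];
  if (changes.length > maxFilesChanged) {
    problems.push(`${changes.length} files changed (max ${maxFilesChanged})`);
  }
  for (const { path } of changes) {
    if (!prefixes.some((prefix) => path.startsWith(prefix))) {
      problems.push(`outside allowedGlobs: ${path}`);
    }
  }
  const delta = changes.reduce((total, c) => total + c.added + c.removed, 0);
  if (delta > maxLineDelta) {
    problems.push(`line delta ${delta} (max ${maxLineDelta})`);
  }
  return { ok: problems.length === 0, problems };
}

const scope = { allowedGlobs: ["src/**", "tests/**", "out/**"], maxFilesChanged: 2, maxLineDelta: 40 };

// Illustrative: a small rewrite of src/calculator.mjs stays in scope.
console.log(checkScope([{ path: "src/calculator.mjs", added: 2, removed: 2 }], scope).ok); // true
// Illustrative: editing the proposer itself falls outside the allowed globs.
console.log(checkScope([{ path: "scripts/propose.mjs", added: 1, removed: 0 }], scope).problems);
// [ 'outside allowedGlobs: scripts/propose.mjs' ]
```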
|
package/templates/code/tests/calculator.test.mjs
ADDED
@@ -0,0 +1,20 @@
+import { test } from "node:test";
+import { strict as assert } from "node:assert";
+
+import { multiply, sum } from "../src/calculator.mjs";
+
+test("sum adds two positive integers", () => {
+assert.equal(sum(2, 3), 5);
+});
+
+test("sum handles a zero operand", () => {
+assert.equal(sum(0, 7), 7);
+});
+
+test("multiply multiplies two positive integers", () => {
+assert.equal(multiply(3, 4), 12);
+});
+
+test("multiply by one is identity", () => {
+assert.equal(multiply(1, 9), 9);
+});
|
package/templates/writing/README.md
ADDED
@@ -0,0 +1,46 @@
+# Writing template
+
+A self-contained `ralph-research` template that demonstrates the
+write-evaluate-accept loop on a markdown draft.
+
+## What ships in this template
+
+- `docs/draft.md` — the baseline draft the runtime improves
+- `scripts/propose.mjs` — overwrites `docs/draft.md` with a bounded rewrite
+- `scripts/experiment.mjs` — copies the candidate draft into `out/draft.md`
+- `scripts/metric.mjs` — emits a numeric `quality` score from keyword presence
+(no API key, no LLM call)
+- `prompts/judge.md` — starter prompt for an optional pairwise LLM judge
+- `ralph.yaml` — the manifest that wires the four pieces above into the
+runtime
+
+The manifest enables `quality` as a numeric metric backed by `metric.mjs`. The
+optional `judgePacks` block is commented-out scaffolding for when you swap
+the numeric metric for an LLM judge.
+
+## Running this template
+
+From the directory that contains `ralph.yaml`:
+
+```bash
+rrx validate # check the manifest parses
+rrx doctor # sanity-check the working tree
+rrx run --json # execute one cycle
+rrx inspect run-0001 --json
+```
+
+`rrx run` writes `.ralph/runs/run-0001/run.json`,
+`.ralph/runs/run-0001/decision.json`, and `.ralph/frontier.json`. Inspecting
+those three files is the fastest way to understand what the runtime
+actually persists.
+
+## Extending this template
+
+- Replace `scripts/metric.mjs` with a real quality metric you trust.
+- Uncomment the `judgePacks` block in `ralph.yaml` and point it at a real
+judge model to compare candidates pairwise.
+- Add files to `docs/` and broaden the `scope.allowedGlobs` if you want the
+proposer to touch more than a single draft.
+
+See [`docs/operation-model.md`](../../docs/operation-model.md) for the
+runtime contract every manifest must honor.
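
The writing template's `metric.mjs` is described as emitting a numeric `quality` score from keyword presence, with no API key or LLM call. That script's contents are not part of this diff, so the sketch below only illustrates the general idea; the keyword list and the scoring rule (number of distinct keywords present, case-insensitive) are both assumptions.

```javascript
// Hypothetical keyword-presence score: how many distinct keywords appear.
function keywordQuality(text, keywords) {
  const haystack = text.toLowerCase();
  return keywords.filter((keyword) => haystack.includes(keyword.toLowerCase())).length;
}

// Illustrative keyword list, not the template's.
const keywords = ["ratchet", "frontier", "resumable"];

console.log(keywordQuality("The ratchet keeps the frontier honest.", keywords)); // 2
console.log(keywordQuality("A plain draft.", keywords)); // 0
```

A score like this is cheap and deterministic but easy to game, since a proposer can raise it by appending keywords. That is presumably why the README's extension notes point toward a metric you trust or the pairwise `judgePacks` judge.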