dream-cycle 0.1.0
- package/LICENSE +21 -0
- package/README.md +176 -0
- package/bin/dream-cycle.mjs +71 -0
- package/lib/config.mjs +109 -0
- package/lib/doctor.mjs +150 -0
- package/lib/gate.mjs +289 -0
- package/lib/harden.mjs +38 -0
- package/lib/init.mjs +41 -0
- package/lib/radar.mjs +145 -0
- package/lib/run.mjs +206 -0
- package/package.json +14 -0
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 dream-cycle contributors
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
package/README.md
ADDED
@@ -0,0 +1,176 @@
+# dream-cycle
+
+Dream Cycle v0: synthesizes a **post-cutoff** evaluation corpus from any
+GitHub repo (repo2rlenv `pr_diff`), calibrates it with oracle/nop agents via
+harbor, gates it with falsification, and runs a reactive capability radar.
+
+**The radar proposes; the human decides.**
+
+```
+1. SYNTHESIZE  repo2rlenv generate pr_diff (post-cutoff PRs, no LLM)
+      │
+2. HARDEN      Dockerfiles (private repos: set-url after the reset)
+      │
+3. CALIBRATE   harbor oracle (gold ⇒ ~1.0) + nop (null ⇒ 0.0)
+      │
+4. CURATE      unevaluable tasks (oracle < 0.999) → DECLARED exclusions
+      │
+5. GATE        oracle must PASS · nop must FAIL (falsification)
+      │
+6. RADAR       compares result.json across runs → gaps with an issue draft
+```
+
+## Scope: where it fits in the research loop
+
+This package covers **only stage 0** of the self-improvement loop
+([BLUEPRINT-008](https://github.com/DarkCodePE/investigador/blob/main/docs/research-flow/BLUEPRINT-008-repo2rlenv-integration.md)):
+deciding WHAT gets improved, with a gated corpus and a gap radar. The later
+stages of the cycle live outside the package and are **not requirements** for
+using it:
+
+| Cycle stage | Piece | Relation to this package |
+|---|---|---|
+| 0: synthesize · gate · radar | **this package** | `dream-cycle run` / `radar` |
+| 2A+3A: harness self-improvement | [SIA](https://github.com/hexo-ai/sia) ([ADR-010](https://github.com/DarkCodePE/investigador/blob/main/docs/research-flow/ADR-010-sia-loop-integration.md)) | consumes the tasks this package synthesizes, via the `repo2rlenv-to-sia.py` adapter |
+| cross-run memory | Harbor→ReasoningBank bridge ([ADR-009](https://github.com/DarkCodePE/investigador/blob/main/docs/research-flow/ADR-009-harbor-reasoningbank-adapter.md)) | distills the trajectories of the harbor jobs this package produces |
+
+The Dream Cycle decides what gets improved; SIA rewrites; ReasoningBank remembers.
+SIA does not detect gaps: it needs to be handed the task (ADR-010 §Deliberately NOT),
+which is why this package's radar is what feeds that chain.
+
+## Installation
+
+```bash
+npm install -g dream-cycle   # or, without installing:
+npx dream-cycle doctor
+```
+
+## External requirements
+
+The package ships no npm dependencies, but the `run` subcommand orchestrates four
+external tools that you must install separately. Verify them with
+`dream-cycle doctor`.
+
+| Tool | What it is | Installation |
+|---|---|---|
+| [repo2rlenv](https://github.com/huggingface/Repo2RLEnv) | turns repos into evaluable RL environments (synthesizes the `pr_diff` corpus) | `uv tool install repo2rlenv==0.8.3` (PyPI, requires Python ≥ 3.12) |
+| [harbor](https://pypi.org/project/harbor/) | agent runner over environments (oracle/nop calibration) | `uv tool install harbor` |
+| [gh](https://cli.github.com/) | authenticated GitHub CLI (PR mining), **≥ 2.49** (`pr list --json baseRefOid`) | [cli/cli releases](https://github.com/cli/cli/releases) + `gh auth login` |
+| [docker](https://docs.docker.com/get-docker/) | builds and runs each task's environment | Docker Desktop or Engine, with the daemon running |
+
+If a tool is not on `PATH` (e.g. you use it from a local venv),
+point to it with its environment variable:
+
+| Environment variable | Effect |
+|---|---|
+| `REPO2RLENV_BIN` | absolute path to the `repo2rlenv` binary (e.g. `/path/to/.venv/bin/repo2rlenv`); a non-absolute value is only looked up on `PATH` |
+| `HARBOR_BIN` | path to the `harbor` binary |
+| `GH_BIN` | path to the `gh` binary (GitHub CLI) |
+| `DOCKER_BIN` | path to the `docker` binary |
+| `RUNS_DIR` | overrides the directory of the gate's result.json files (highest precedence) |
+| `RADAR_JSON` | radar: emits the report as JSON to stdout |
+| `RADAR_NO_WRITE` | radar: does not write `dream-cycle-radar-latest.json` |
+| `BENCH_JSON` | gate: emits the summary as JSON to stdout |
+| `BENCH_NO_WRITE` | gate: does not write result.json |
+| `DOCKER_HOST` | docker socket (default `unix:///var/run/docker.sock`) |
+
+Binary resolution: environment variable → `.dream-cycle/config.json`
+(`bins.<name>`) → `PATH`.
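As a rough illustration of that precedence, the sketch below builds the ordered candidate list for one tool. `binCandidates`, `sampleEnv`, and `sampleConfig` are hypothetical names introduced for this example; the package's own resolver in `lib/config.mjs` additionally checks that each candidate is executable, which this sketch deliberately skips.

```javascript
// Hypothetical helper: lists where a tool would be looked up, in priority order.
// It only orders the candidates; it never touches the filesystem.
const BIN_ENV_VARS = {
  repo2rlenv: 'REPO2RLENV_BIN',
  harbor: 'HARBOR_BIN',
  gh: 'GH_BIN',
  docker: 'DOCKER_BIN',
};

function binCandidates(name, config, env) {
  return [
    { source: 'env', value: env?.[BIN_ENV_VARS[name]] },
    { source: 'config', value: config?.bins?.[name] },
    { source: 'PATH', value: name },
  ].filter((c) => Boolean(c.value)); // unset env vars and null config entries drop out
}

// Sample inputs (assumed values, not taken from the package).
const sampleEnv = { HARBOR_BIN: '/opt/tools/harbor' };
const sampleConfig = { bins: { harbor: '/usr/local/bin/harbor', gh: null } };

console.log(binCandidates('harbor', sampleConfig, sampleEnv).map((c) => c.source).join(' > '));
// prints "env > config > PATH"
console.log(binCandidates('gh', sampleConfig, sampleEnv).map((c) => c.source).join(' > '));
// prints "PATH"
```

Keeping the ordering separate from the executability check makes the precedence easy to test without a real toolchain installed.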
+
+### repo2rlenv versions
+
+repo2rlenv ships releases frequently. This package does NOT pin it (it is an
+external tool, not an npm dependency); versions are handled in layers:
+
+1. **Declared tested version**: the pipeline is validated E2E against
+   `repo2rlenv 0.8.3`, which is why the recommended install is pinned. `doctor`
+   shows the detected version and **warns** if it differs from the tested one.
+2. **`validate` after `generate`**: `run` executes repo2rlenv's own `validate`
+   on the freshly synthesized corpus and aborts if the format does not
+   check out (exit 1).
+3. **Oracle/nop calibration as the final net**: even if the format passes, an upstream
+   behavior change that breaks the rewards is caught by the gate: an oracle
+   that does not reach ~1.0 or a nop that does not score 0.0 aborts the run (exits 3–5).
+
+To adopt a new version: install it, re-run `dream-cycle run` against
+a known repo, and if the gate passes, update the pin (README) and
+`R2E_TESTED_VERSION` (lib/doctor.mjs).
+
+## Subcommands
+
+### `dream-cycle doctor [--json]`
+
+Diagnoses node (≥20), docker (CLI + daemon), gh (CLI + auth), harbor, and
+repo2rlenv. Exit 0 if everything is present, 1 if something is missing.
+
+### `dream-cycle run <owner/repo> [--limit N] [--since YYYY-MM-DD]`
+
+Full pipeline against any repo. `--since` is the contamination
+guardrail: only PRs after that date are mined (default `2026-02-01`).
+
+```bash
+dream-cycle run astral-sh/ruff --limit 20 --since 2026-02-01
+```
+
+| Exit | Meaning |
+|---|---|
+| 0 | success |
+| 1 | subprocess/generic (generate/validate failed, missing binaries) |
+| 2 | usage error |
+| 3 | insufficient corpus (< minTasks tasks before or after curation) |
+| 4 | oracle gate FAILED after curation; investigate |
+| 5 | the gate PASSED with the null agent; the corpus does NOT discriminate |
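The pipeline-level exit codes in this table can be read as a small decision procedure. The sketch below is one plausible ordering of those checks, not the code in `lib/run.mjs`: `classifyRun` and its input fields are hypothetical, the thresholds mirror the documented defaults (`minTasks` 3, oracle threshold 0.999), and it assumes the stricter reading of the nop check, where any nonzero null-agent reward counts as a failed falsification.

```javascript
// Hypothetical decision procedure for exit codes 0, 3, 4 and 5.
// Codes 1 (subprocess/generic) and 2 (usage) are raised elsewhere.
function classifyRun({ tasksBefore, tasksAfter, oracleMean, nopMean }, opts = {}) {
  const { minTasks = 3, oracleThreshold = 0.999 } = opts;
  // 3: too few tasks, either as synthesized or once curation removed some
  if (tasksBefore < minTasks || tasksAfter < minTasks) return 3;
  // 4: the gold patch no longer scores ~1.0 on the curated corpus
  if (oracleMean < oracleThreshold) return 4;
  // 5: assumption: any reward for the null agent means the corpus does not discriminate
  if (nopMean > 0) return 5;
  return 0;
}

console.log(classifyRun({ tasksBefore: 8, tasksAfter: 6, oracleMean: 1.0, nopMean: 0.0 })); // 0
console.log(classifyRun({ tasksBefore: 8, tasksAfter: 2, oracleMean: 1.0, nopMean: 0.0 })); // 3
console.log(classifyRun({ tasksBefore: 8, tasksAfter: 6, oracleMean: 0.9, nopMean: 0.0 })); // 4
console.log(classifyRun({ tasksBefore: 8, tasksAfter: 6, oracleMean: 1.0, nopMean: 0.5 })); // 5
```

Checking the corpus size first means a run that is too small is reported as such, rather than as a misleading gate failure.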
+
+### `dream-cycle radar [--runs-dir <dir>] [--benchmark <name>] [--epsilon <n>] [--target <reward>]`
+
+Reactive radar: compares the gate's result.json files across runs and declares gaps
+(`regression`, `stagnation`, `capability_gap`, `coverage_gap`, `trust_gap`),
+each with an issue draft. Exit 1 = gaps found (a signal, not an error).
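The exact rule behind each gap type lives in `lib/radar.mjs` and is not spelled out here. As a rough intuition for how `--epsilon` and `--target` could shape two of them, the sketch below works under assumed definitions: `regression` when the reward mean drops by more than epsilon, and `stagnation` when it moves by at most epsilon while still below the target. `detectGaps` and the `rewardMean` field are hypothetical names for this example.

```javascript
// Hypothetical gap detection between two consecutive runs.
// The definitions of "regression" and "stagnation" here are assumptions.
function detectGaps(previous, current, { epsilon = 0.01, target = null } = {}) {
  const gaps = [];
  const delta = current.rewardMean - previous.rewardMean;
  if (delta < -epsilon) {
    // the score fell by more than the noise band
    gaps.push({ kind: 'regression', delta });
  } else if (Math.abs(delta) <= epsilon && target !== null && current.rewardMean < target) {
    // the score barely moved and the declared target is still out of reach
    gaps.push({ kind: 'stagnation', delta });
  }
  return gaps;
}

const regressed = detectGaps({ rewardMean: 0.9 }, { rewardMean: 0.8 });
const stalled = detectGaps({ rewardMean: 0.7 }, { rewardMean: 0.705 }, { target: 0.9 });
const improved = detectGaps({ rewardMean: 0.7 }, { rewardMean: 0.8 }, { target: 0.9 });

console.log(regressed.map((g) => g.kind)); // one gap: regression
console.log(stalled.map((g) => g.kind));   // one gap: stagnation
console.log(improved.length ? 1 : 0);      // exit-code style signal: 0 when no gaps
```

Returning a list rather than a boolean leaves room for attaching an issue draft to each gap, which a human then reviews.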
+
+### `dream-cycle init`
+
+Creates `.dream-cycle/` with `config.json` (defaults), `corpus/`, `jobs/`, `runs/`,
+and a `.gitignore`. Idempotent: it never overwrites an existing config. **Not a
+prerequisite**: without a config everything works with the defaults.
+
+## `.dream-cycle/` and what gets committed
+
+```
+.dream-cycle/
+├── config.json   # committed: YES (paths, bins, project defaults)
+├── .gitignore    # ignores the three below
+├── corpus/       # local artifacts: NO
+├── jobs/         # harbor jobs: NO
+└── runs/         # the gate's result.json files: the human decides (they serve as the radar's history)
+```
+
+## Honesty protocol
+
+- **Declared curation**: unevaluable tasks (oracle < 0.999 or no reward)
+  are excluded, but the exclusion stays VISIBLE in `tasksExcluded` in result.json
+  and changes the `corpusVersion` (which is computed over the effective set).
+- **Mandatory nop falsification**: a corpus is only worth anything if the null agent
+  FAILS it. If the nop passes, the corpus does not discriminate and the pipeline aborts (exit 5).
+- **Targets before results**: the gate in agent mode REFUSES to gate without
+  a `--target-reward-mean` declared by the caller (anti-fabrication).
+- **The radar proposes, the human decides**: every `issueDraft` requires human
+  review before being opened; the radar never opens issues or decides what to build.
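The "declared curation" rule from this protocol can be sketched as a split that keeps every exclusion and its reason, so nothing is dropped silently. This is a minimal illustration only: `curate`, the input shape, and the `reason` strings are hypothetical, and the real layout of `tasksExcluded` in result.json may differ.

```javascript
// Hypothetical curation step: tasks whose oracle reward is missing or below
// the threshold are excluded, and every exclusion is recorded with a reason.
function curate(oracleRewards, oracleThreshold = 0.999) {
  const kept = [];
  const tasksExcluded = [];
  for (const [id, reward] of Object.entries(oracleRewards)) {
    if (reward === null || reward === undefined) {
      tasksExcluded.push({ id, reason: 'no reward' });
    } else if (reward < oracleThreshold) {
      tasksExcluded.push({ id, reason: `oracle ${reward} < ${oracleThreshold}` });
    } else {
      kept.push(id);
    }
  }
  return { kept, tasksExcluded };
}

// Sample oracle rewards per task id (made-up values).
const curated = curate({ 'task-1': 1.0, 'task-2': 0.42, 'task-3': null, 'task-4': 0.9995 });
console.log(curated.kept.join(', '));                             // prints "task-1, task-4"
console.log(curated.tasksExcluded.map((t) => t.id).join(', '));   // prints "task-2, task-3"
```

Because the corpus version is computed over the effective set, any change to `kept` would also change that version, which makes a quiet re-curation detectable.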
+
+## Security
+
+- **NEVER put tokens in files**: the compose overlay for private repos only
+  contains the literal placeholder `${GITHUB_TOKEN}`; compose interpolates it from
+  the process environment. The token never touches disk.
+- Note: a build-arg stays visible in the local image's metadata
+  (`docker history`); acceptable for local runs, but **do not publish those images**.
+- The optional agent arm with claude-code is invoked by hand with a placeholder:
+
+```bash
+harbor run -p <corpus> -a claude-code -m <model> \
+  --ae CLAUDE_CODE_OAUTH_TOKEN=<token> --ae CLAUDE_FORCE_OAUTH=1 \
+  --env docker -o <jobs>/dc-<slug>-agent
+```
+
+## License
+
+MIT
package/bin/dream-cycle.mjs
ADDED
@@ -0,0 +1,71 @@
+#!/usr/bin/env node
+// dream-cycle.mjs: CLI dispatcher for the dream-cycle package.
+//
+// The ONLY place in the package that calls process.exit: the lib/* modules return exit codes
+// or throw UsageError (usage, exit 2) / PipelineError (exit = the error's code).
+
+import { readFileSync } from 'node:fs';
+import { UsageError, PipelineError } from '../lib/config.mjs';
+
+const HELP = `usage: dream-cycle <command> [options]
+
+commands:
+  doctor [--json]
+      checks external dependencies (node, docker, gh, harbor, repo2rlenv)
+
+  run <owner/repo> [--limit N] [--since YYYY-MM-DD]
+      full pipeline: synthesize (pr_diff) → harden → calibrate
+      (oracle+nop) → curate → gate → radar
+
+  radar [--runs-dir <dir>] [--benchmark <name>] [--epsilon <n>] [--target <reward>]
+      reactive radar over the gate's result.json history
+      (RADAR_JSON=1 for JSON, RADAR_NO_WRITE=1 to skip writing)
+
+  init
+      creates .dream-cycle/ (config.json, corpus/, jobs/, runs/); idempotent
+
+  --version | --help
+
+run exit codes:
+  0 success · 1 subprocess/generic · 2 usage · 3 insufficient corpus ·
+  4 oracle gate failed · 5 the null agent passed (the corpus does not discriminate)`;
+
+async function main() {
+  const [cmd, ...rest] = process.argv.slice(2);
+
+  if (cmd === '--version' || cmd === '-v') {
+    const pkg = JSON.parse(readFileSync(new URL('../package.json', import.meta.url), 'utf-8'));
+    console.log(pkg.version);
+    return 0;
+  }
+  if (cmd === '--help' || cmd === '-h' || cmd === 'help') {
+    console.log(HELP);
+    return 0;
+  }
+
+  const ctx = { argv: rest, cwd: process.cwd(), env: process.env };
+  switch (cmd) {
+    case 'doctor': return (await import('../lib/doctor.mjs')).doctorCli(ctx);
+    case 'run': return (await import('../lib/run.mjs')).runCli(ctx);
+    case 'radar': return (await import('../lib/radar.mjs')).radarCli(ctx);
+    case 'init': return (await import('../lib/init.mjs')).initCli(ctx);
+    default:
+      console.error(HELP);
+      return 2;
+  }
+}
+
+try {
+  process.exit(await main());
+} catch (e) {
+  if (e instanceof UsageError) {
+    console.error(`dream-cycle: ${e.message}`);
+    process.exit(2);
+  }
+  if (e instanceof PipelineError) {
+    console.error(`dream-cycle: ${e.message}`);
+    process.exit(e.code);
+  }
+  console.error(e?.stack ?? String(e));
+  process.exit(1);
+}
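The error handling at the bottom of the dispatcher boils down to a mapping from thrown error to exit code. The sketch below isolates that mapping so it can be exercised without calling `process.exit`. The two classes are re-declared locally to keep the example self-contained (in the package they come from `lib/config.mjs`), and `exitCodeFor` is a hypothetical helper, not a function the package exports.

```javascript
// Local copies of the package's error classes, for a self-contained example.
class UsageError extends Error {}
class PipelineError extends Error {
  constructor(message, code = 1) {
    super(message);
    this.code = code;
  }
}

// Hypothetical helper mirroring the dispatcher's catch block.
function exitCodeFor(err) {
  if (err instanceof UsageError) return 2;          // bad invocation
  if (err instanceof PipelineError) return err.code; // the pipeline picks its own code
  return 1;                                          // anything unexpected
}

console.log(exitCodeFor(new UsageError('unknown flag')));            // 2
console.log(exitCodeFor(new PipelineError('nop passed', 5)));        // 5
console.log(exitCodeFor(new PipelineError('generate failed')));      // 1
console.log(exitCodeFor(new TypeError('something else went wrong'))); // 1
```

Keeping `process.exit` in a single file, as the dispatcher does, is what makes the library modules testable this way.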
package/lib/config.mjs
ADDED
@@ -0,0 +1,109 @@
+// config.mjs: config loading/resolution, paths, and binaries for the dream-cycle package.
+//
+// Without .dream-cycle/config.json everything works with DEFAULT_CONFIG: init is NOT a
+// prerequisite. Config paths are relative to the host's cwd and are resolved
+// to absolute paths here. External binaries are resolved in order:
+//   env (REPO2RLENV_BIN/HARBOR_BIN/GH_BIN/DOCKER_BIN) → config.bins.<name> → PATH → null
+
+import { readFileSync, accessSync, constants, existsSync } from 'node:fs';
+import { join, resolve, isAbsolute, delimiter } from 'node:path';
+
+// ── Shared errors (the lib/* modules throw; ONLY bin/ calls process.exit) ──
+export class UsageError extends Error {}
+export class PipelineError extends Error {
+  constructor(message, code = 1) {
+    super(message);
+    this.code = code;
+  }
+}
+
+export const DEFAULT_CONFIG = {
+  version: 1,
+  paths: {
+    corpusDir: '.dream-cycle/corpus',
+    jobsDir: '.dream-cycle/jobs',
+    runsDir: '.dream-cycle/runs',
+  },
+  bins: { repo2rlenv: null, harbor: null, gh: null, docker: null },
+  defaults: {
+    limit: 20,
+    since: '2026-02-01', // contamination guardrail: post-cutoff PRs only
+    benchmark: 'repo2rlenv-capability',
+    epsilon: 0.01,
+    target: null,
+    minTasks: 3,
+    oracleThreshold: 0.999,
+  },
+};
+
+// Environment variable that can override each binary from the config
+const BIN_ENV_VARS = {
+  repo2rlenv: 'REPO2RLENV_BIN',
+  harbor: 'HARBOR_BIN',
+  gh: 'GH_BIN',
+  docker: 'DOCKER_BIN',
+};
+
+// Deep merge of plain objects (arrays and scalars are replaced)
+function deepMerge(base, over) {
+  for (const [k, v] of Object.entries(over ?? {})) {
+    if (v && typeof v === 'object' && !Array.isArray(v)
+      && base[k] && typeof base[k] === 'object' && !Array.isArray(base[k])) {
+      deepMerge(base[k], v);
+    } else {
+      base[k] = v;
+    }
+  }
+  return base;
+}
+
+// ── Config ──
+export function loadConfig(cwd) {
+  const configPath = join(cwd, '.dream-cycle', 'config.json');
+  if (!existsSync(configPath)) {
+    return { config: structuredClone(DEFAULT_CONFIG), configPath, exists: false };
+  }
+  let user;
+  try {
+    user = JSON.parse(readFileSync(configPath, 'utf-8'));
+  } catch (e) {
+    throw new UsageError(`invalid config.json (${configPath}): ${e.message}`);
+  }
+  return { config: deepMerge(structuredClone(DEFAULT_CONFIG), user), configPath, exists: true };
+}
+
+export function resolvePaths(config, cwd) {
+  const p = config?.paths ?? DEFAULT_CONFIG.paths;
+  return {
+    corpusDir: resolve(cwd, p.corpusDir ?? DEFAULT_CONFIG.paths.corpusDir),
+    jobsDir: resolve(cwd, p.jobsDir ?? DEFAULT_CONFIG.paths.jobsDir),
+    runsDir: resolve(cwd, p.runsDir ?? DEFAULT_CONFIG.paths.runsDir),
+  };
+}
+
+// ── Binaries ──
+// Homegrown which(): scans env.PATH and returns the first executable path (or null).
+// Also accepts absolute paths (checks X_OK directly).
+export function which(cmd, env) {
+  if (!cmd) return null;
+  if (isAbsolute(cmd)) {
+    try { accessSync(cmd, constants.X_OK); return cmd; } catch { return null; }
+  }
+  for (const dir of (env?.PATH ?? '').split(delimiter)) {
+    if (!dir) continue;
+    const candidate = join(dir, cmd);
+    try { accessSync(candidate, constants.X_OK); return candidate; } catch { /* next */ }
+  }
+  return null;
+}
+
+// Resolution order: env var → config.bins.<name> → PATH → null
+export function resolveBin(name, config, env) {
+  const candidates = [env?.[BIN_ENV_VARS[name]], config?.bins?.[name], name];
+  for (const c of candidates) {
+    if (!c) continue;
+    const found = which(c, env);
+    if (found) return found;
+  }
+  return null;
+}
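To see what the merge in `loadConfig` means for a partial user config, here is a small usage sketch. `deepMerge` is copied from the module above so the example runs on its own; `SAMPLE_DEFAULTS` and `userConfig` are made-up values for illustration, not the package's real defaults object.

```javascript
// Copied from lib/config.mjs: deep merge of plain objects (arrays and scalars are replaced).
function deepMerge(base, over) {
  for (const [k, v] of Object.entries(over ?? {})) {
    if (v && typeof v === 'object' && !Array.isArray(v)
      && base[k] && typeof base[k] === 'object' && !Array.isArray(base[k])) {
      deepMerge(base[k], v);
    } else {
      base[k] = v;
    }
  }
  return base;
}

// Illustrative inputs (assumed values).
const SAMPLE_DEFAULTS = {
  bins: { gh: null, docker: null },
  defaults: { limit: 20, epsilon: 0.01, minTasks: 3 },
};
const userConfig = { defaults: { limit: 5 }, bins: { gh: '/usr/local/bin/gh' } };

// Clone first, as loadConfig does, so the shared defaults are never mutated.
const merged = deepMerge(structuredClone(SAMPLE_DEFAULTS), userConfig);

console.log(merged.defaults.limit);          // 5 (overridden by the user)
console.log(merged.defaults.epsilon);        // 0.01 (kept from the defaults)
console.log(merged.bins.gh);                 // /usr/local/bin/gh
console.log(SAMPLE_DEFAULTS.defaults.limit); // 20 (the original object is untouched)
```

The practical upshot is that a project's `config.json` only needs the keys it wants to change; siblings inside `defaults`, `paths`, and `bins` keep their default values.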
package/lib/doctor.mjs
ADDED
@@ -0,0 +1,150 @@
+// doctor.mjs: detection of the Dream Cycle's external dependencies.
+//
+// The doctor NEVER crashes: every external invocation sits in a try/catch with a timeout.
+// It reports found/path/version per dependency, warnings (daemon down, gh without
+// auth), and actionable hints when something is missing.
+
+import { execFileSync } from 'node:child_process';
+import { loadConfig, resolveBin, UsageError } from './config.mjs';
+
+const EXEC_OPTS = { timeout: 5000, stdio: ['ignore', 'pipe', 'pipe'], encoding: 'utf-8' };
+
+// Latest repo2rlenv version the pipeline was validated against E2E.
+// Update after re-running a full `dream-cycle run` with a new version.
+const R2E_TESTED_VERSION = '0.8.3';
+
+// repo2rlenv mines PRs with `gh pr list --json baseRefOid`; that field exists
+// since gh 2.49. With an older gh, mining fails with "Unknown JSON field".
+const GH_MIN_VERSION = '2.49.0';
+
+// Compares "X.Y.Z" versions: negative if a < b
+function cmpVersions(a, b) {
+  const pa = a.split('.').map(Number);
+  const pb = b.split('.').map(Number);
+  for (let i = 0; i < 3; i++) {
+    if ((pa[i] ?? 0) !== (pb[i] ?? 0)) return (pa[i] ?? 0) - (pb[i] ?? 0);
+  }
+  return 0;
+}
+
+// Runs <bin> <args> and returns { ok, out }; never throws
+function tryExec(bin, args) {
+  try {
+    return { ok: true, out: (execFileSync(bin, args, EXEC_OPTS) ?? '').toString().trim() };
+  } catch {
+    return { ok: false, out: '' };
+  }
+}
+
+function firstLine(s) {
+  return s ? s.split('\n')[0].trim() : null;
+}
+
+export function runDoctor({ cwd, env }) {
+  const { config } = loadConfig(cwd);
+  const checks = [];
+
+  // node: the runtime itself; found if major >= 20 (the package's engines)
+  const major = Number(process.version.slice(1).split('.')[0]);
+  checks.push({
+    name: 'node', required: true, found: major >= 20,
+    path: process.execPath, version: process.version, warning: null,
+    hint: major >= 20 ? null : 'dream-cycle requires Node >= 20; upgrade your runtime',
+  });
+
+  // docker: CLI + daemon (daemon down = warning, not MISSING)
+  const dockerBin = resolveBin('docker', config, env);
+  const docker = { name: 'docker', required: true, found: false, path: dockerBin, version: null, warning: null, hint: null };
+  if (dockerBin) {
+    const v = tryExec(dockerBin, ['--version']);
+    docker.found = v.ok;
+    docker.version = v.ok ? firstLine(v.out) : null;
+    if (v.ok && !tryExec(dockerBin, ['info']).ok) {
+      docker.warning = 'CLI found but daemon unreachable';
+    }
+  }
+  if (!docker.found) docker.hint = 'install Docker or export DOCKER_BIN=/path/to/binary';
+  checks.push(docker);
+
+  // gh: CLI + authentication (no auth = warning: private repos will not work)
+  const ghBin = resolveBin('gh', config, env);
+  const gh = { name: 'gh', required: true, found: false, path: ghBin, version: null, warning: null, hint: null };
+  if (ghBin) {
+    const v = tryExec(ghBin, ['--version']);
+    gh.found = v.ok;
+    gh.version = v.ok ? firstLine(v.out) : null;
+    const warnings = [];
+    const semver = gh.version?.match(/(\d+\.\d+\.\d+)/)?.[1];
+    if (semver && cmpVersions(semver, GH_MIN_VERSION) < 0) {
+      warnings.push(`gh ${semver} < ${GH_MIN_VERSION}: pr_diff mining will fail (baseRefOid field); upgrade gh`);
+    }
+    if (v.ok && !tryExec(ghBin, ['auth', 'token']).ok) {
+      warnings.push('gh is not authenticated; private repos will not work');
+    }
+    if (warnings.length) gh.warning = warnings.join(' · ');
+  }
+  if (!gh.found) gh.hint = 'install GitHub CLI (gh) or export GH_BIN=/path/to/binary';
+  checks.push(gh);
+
+  // harbor: --version with a fallback to --help (old versions do not expose --version)
+  const harborBin = resolveBin('harbor', config, env);
+  const harbor = { name: 'harbor', required: true, found: false, path: harborBin, version: null, warning: null, hint: null };
+  if (harborBin) {
+    const v = tryExec(harborBin, ['--version']);
+    if (v.ok) {
+      harbor.found = true;
+      harbor.version = firstLine(v.out);
+    } else {
+      harbor.found = tryExec(harborBin, ['--help']).ok;
+    }
+  }
+  if (!harbor.found) harbor.hint = 'install harbor (uv tool install harbor; it is on PyPI) or export HARBOR_BIN=/path/to/binary';
+  checks.push(harbor);
+
+  // repo2rlenv: --version with a fallback to --help; warns if the version differs
+  // from the tested one (frequent upstream releases: validate + gate are the real net)
+  const r2eBin = resolveBin('repo2rlenv', config, env);
+  const r2e = { name: 'repo2rlenv', required: true, found: false, path: r2eBin, version: null, warning: null, hint: null };
+  if (r2eBin) {
+    const v = tryExec(r2eBin, ['--version']);
+    if (v.ok) {
+      r2e.found = true;
+      r2e.version = firstLine(v.out);
+      if (!r2e.version.includes(R2E_TESTED_VERSION)) {
+        r2e.warning = `version differs from the tested one (${R2E_TESTED_VERSION}); if \`run\` fails, try: uv tool install repo2rlenv==${R2E_TESTED_VERSION}`;
+      }
+    } else {
+      r2e.found = tryExec(r2eBin, ['--help']).ok;
+    }
+  }
+  if (!r2e.found) r2e.hint = 'install repo2rlenv (uv tool install repo2rlenv; requires Python >= 3.12) or export REPO2RLENV_BIN=/path/to/binary';
+  checks.push(r2e);
+
+  const ok = checks.every((c) => !c.required || c.found);
+  return { ok, checks };
+}
+
+export function doctorCli({ argv, cwd, env }) {
+  let json = false;
+  for (const a of argv) {
+    if (a === '--json') json = true;
+    else throw new UsageError(`doctor: unknown argument ${a}`);
+  }
+
+  const result = runDoctor({ cwd, env });
+
+  if (json) {
+    console.log(JSON.stringify(result, null, 2));
+  } else {
+    console.log('# dream-cycle doctor');
+    for (const c of result.checks) {
+      const status = c.found ? 'OK     ' : 'MISSING';
+      const ver = c.version ? ` (${c.version})` : '';
+      console.log(` ${status} ${c.name.padEnd(12)} ${c.path ?? '-'}${ver}`);
+      if (c.warning) console.log(` warning: ${c.warning}`);
+      if (c.hint) console.log(` hint: ${c.hint}`);
+    }
+    console.log(result.ok ? 'all set' : 'missing dependencies: the `run` pipeline will not work end to end');
+  }
+  return result.ok ? 0 : 1;
+}
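Two version checks in this module behave differently, and a short sketch makes the difference concrete. `cmpVersions` is copied from the module above; `isTestedVersion` is a hypothetical exact-match alternative to the substring test used for repo2rlenv, and the sample version strings are assumptions about what the tools print, not captured output.

```javascript
// Copied from lib/doctor.mjs: compares "X.Y.Z" versions, negative if a < b.
function cmpVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    if ((pa[i] ?? 0) !== (pb[i] ?? 0)) return (pa[i] ?? 0) - (pb[i] ?? 0);
  }
  return 0;
}

// Numeric comparison handles multi-digit components that string comparison gets wrong.
console.log(cmpVersions('2.48.1', '2.49.0') < 0);  // true
console.log(cmpVersions('2.100.0', '2.49.0') > 0); // true
console.log('2.100.0' > '2.49.0');                 // false (lexicographic)

// A substring test also accepts longer versions that merely start with the tested one.
console.log('repo2rlenv 0.8.30'.includes('0.8.3')); // true

// Hypothetical stricter check: extract the first X.Y.Z token and compare it exactly.
function isTestedVersion(versionLine, tested) {
  const found = versionLine.match(/\d+\.\d+\.\d+/)?.[0];
  return found === tested;
}

console.log(isTestedVersion('repo2rlenv 0.8.3', '0.8.3'));  // true
console.log(isTestedVersion('repo2rlenv 0.8.30', '0.8.3')); // false
```

Since a mismatch only produces a warning, the substring test is harmless in practice; the exact match simply avoids staying silent on a release such as 0.8.30.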
|
package/lib/gate.mjs
ADDED
|
@@ -0,0 +1,289 @@
|
|
|
1
|
+
// gate.mjs — Gate de capacidad F2 (BLUEPRINT-008 §F2, ADR-002).
|
|
2
|
+
// Vendorizado de scripts/repo2rlenv-capability-harness.mjs @ 89641e3 (2026-06-11).
|
|
3
|
+
// El original NO se toca; sincronizar manualmente si diverge.
|
|
4
|
+
//
|
|
5
|
+
// Integración por artefactos (BLUEPRINT-008 §4.2): NO corre el agente ni calcula
|
|
6
|
+
// rewards — agrega los /logs/verifier/reward.json que el verifier de cada tarea
|
|
7
|
+
// escribió en un job Harbor terminado, los cruza contra el corpus versionado y
|
|
8
|
+
// emite result.json con gate.checks.
|
|
9
|
+
//
|
|
10
|
+
// Cambios respecto del original (y NADA más):
|
|
11
|
+
// (a) RUNS_DIR: env.RUNS_DIR → opts.runsDir → default del config
|
|
12
|
+
// (b) die() lanza UsageError en vez de process.exit(2)
|
|
13
|
+
// (c) main() → runGate(opts) que RETORNA el summary
|
|
14
|
+
// (d) sin auto-ejecución al final del módulo
|
|
15
|
+
//
|
|
16
|
+
// BENCH_JSON=1 solo JSON a stdout
|
|
17
|
+
// BENCH_NO_WRITE=1 no escribe archivos
|
|
18
|
+
//
|
|
19
|
+
// Targets (fijados ANTES de cualquier run — protocolo de honestidad):
|
|
20
|
+
// oracle: rewardMean ≥ 0.999 (REWARD_SCHEMA: "Gold patch → 1.0"), coverage = 1,
|
|
21
|
+
// resolvedRate = 1 sobre tareas runtime, 0 tareas no confiables.
|
|
22
|
+
// agent: coverage = 1 + target de reward DECLARADO por el invocador; sin
|
|
23
|
+
// targetRewardMean el gate se NIEGA a gatear (anti-fabricación).
|
|
24
|
+
|
|
25
|
+
import { readFileSync, writeFileSync, mkdirSync, readdirSync, statSync, existsSync } from 'node:fs';
|
|
26
|
+
import { createHash } from 'node:crypto';
|
|
27
|
+
import { join, resolve, basename, dirname, sep } from 'node:path';
|
|
28
|
+
import { performance } from 'node:perf_hooks';
|
|
29
|
+
import { loadConfig, resolvePaths, UsageError } from './config.mjs';
|
|
30
|
+
|
|
31
|
+
export const BENCH_NAME = 'repo2rlenv-capability';
|
|
32
|
+
// Cotas de calibración por construcción (no hay paper para un corpus propio):
|
|
33
|
+
const CALIBRATION = {
|
|
34
|
+
oracleExpected: 1.0,
|
|
35
|
+
nullAgentExpected: 0.0,
|
|
36
|
+
source: 'Repo2RLEnv docs/reference/REWARD_SCHEMA.md ("Gold patch -> 1.0") + BLUEPRINT-008 §2.1',
|
|
37
|
+
};
|
|
38
|
+
|
|
39
|
+
// ── CLI ──
|
|
40
|
+
export function parseGateArgs(argv) {
|
|
41
|
+
const a = { jobs: [], corpora: [], mode: null, targetRewardMean: null, targetCostUsd: null, agentLabel: null, exclude: [] };
|
|
42
|
+
for (let i = 0; i < argv.length; i++) {
|
|
43
|
+
const k = argv[i], v = argv[i + 1];
|
|
44
|
+
if (k === '--job') { a.jobs.push(resolve(v)); i++; }
|
|
45
|
+
else if (k === '--corpus') { a.corpora.push(resolve(v)); i++; }
|
|
46
|
+
else if (k === '--exclude') { a.exclude.push(v); i++; } // curación VISIBLE: queda en tasksExcluded del result
|
|
47
|
+
else if (k === '--mode') { a.mode = v; i++; }
|
|
48
|
+
else if (k === '--target-reward-mean') { a.targetRewardMean = Number(v); i++; }
|
|
49
|
+
else if (k === '--target-cost-usd') { a.targetCostUsd = Number(v); i++; }
|
|
50
|
+
else if (k === '--agent-label') { a.agentLabel = v; i++; }
|
|
51
|
+
else die(`argumento desconocido: ${k}`);
|
|
52
|
+
}
|
|
53
|
+
if (!a.jobs.length || !a.corpora.length) die('uso: --job <dir> --corpus <dir> --mode oracle|agent');
|
|
54
|
+
if (!['oracle', 'agent'].includes(a.mode)) die('--mode debe ser oracle|agent');
|
|
55
|
+
if (a.mode === 'agent' && !(a.targetRewardMean > 0))
|
|
56
|
+
die('modo agent sin --target-reward-mean: me niego a gatear sin un target declarado (anti-fabricación, ADR-002 pregunta 4)');
|
|
57
|
+
return a;
|
|
58
|
+
}
|
|
59
|
+
function die(msg) { throw new UsageError(`repo2rlenv-capability-harness: ${msg}`); }
|
|
60
|
+
|
|
61
|
+
// ── Corpus: task.toml → {id, contentHash, pipeline, rewardKind} ──
|
|
62
|
+
function walk(dir, found = []) {
|
|
63
|
+
for (const e of readdirSync(dir)) {
|
|
64
|
+
const p = join(dir, e);
|
|
65
|
+
const st = statSync(p);
|
|
66
|
+
if (st.isDirectory()) walk(p, found);
|
|
67
|
+
else found.push(p);
|
|
68
|
+
}
|
|
69
|
+
return found;
|
|
70
|
+
}
|
|
71
|
+
function tomlStr(src, key) {
|
|
72
|
+
const m = src.match(new RegExp(`^${key}\\s*=\\s*"([^"]*)"`, 'm'));
|
|
73
|
+
return m ? m[1] : null;
|
|
74
|
+
}
|
|
75
|
+
function loadCorpus(corpora) {
|
|
76
|
+
const tasks = [];
|
|
77
|
+
for (const dir of corpora) {
|
|
78
|
+
for (const f of walk(dir).filter((p) => basename(p) === 'task.toml')) {
|
|
79
|
+
const src = readFileSync(f, 'utf-8');
|
|
80
|
+
const name = tomlStr(src, 'name') ?? '';
|
|
81
|
+
const id = name.includes('/') ? name.split('/').pop() : basename(join(f, '..'));
|
|
82
|
+
tasks.push({
|
|
83
|
+
id,
|
|
84
|
+
dirBase: basename(join(f, '..')),
|
|
85
|
+
contentHash: tomlStr(src, 'content_hash'),
|
|
86
|
+
pipeline: tomlStr(src, 'pipeline'),
|
|
87
|
+
rewardKind: /test_execution/.test(src) ? 'test_execution' : 'diff_similarity',
|
|
88
|
+
});
|
|
89
|
+
}
|
|
90
|
+
}
|
|
91
|
+
if (!tasks.length) die('corpus vacío: ningún task.toml encontrado');
|
|
92
|
+
const dup = tasks.map((t) => t.id).filter((id, i, all) => all.indexOf(id) !== i);
|
|
93
|
+
if (dup.length) die(`ids de tarea duplicados en el corpus: ${[...new Set(dup)].join(', ')}`);
|
|
94
|
+
return tasks;
|
|
95
|
+
}
|
|
96
|
+
function corpusVersion(tasks) {
|
|
97
|
+
const hashes = tasks.map((t) => t.contentHash ?? `MISSING:${t.id}`).sort();
|
|
98
|
+
return 'sha256:' + createHash('sha256').update(hashes.join('\n')).digest('hex');
|
|
99
|
+
}
|
|
100
|
+
|
|
101
|
+
// ── Harbor job: locate reward.json/reward.txt per task ──
|
|
102
|
+
// Match via the trial's config.json (task.path → original dir): Harbor TRUNCATES the
|
|
103
|
+
// trial name to 32 chars (wiki-138/wiki-139 collapse to wiki-13__xxx), so
|
|
104
|
+
// the directory name is ambiguous. config.json is the unambiguous source.
|
|
105
|
+
// Fall back to segMatches for compatibility with old jobs that lack a readable config.
|
|
106
|
+
function segMatches(seg, key) {
|
|
107
|
+
if (seg === key) return true;
|
|
108
|
+
if (!seg.startsWith(key)) return false;
|
|
109
|
+
return !/[a-zA-Z0-9]/.test(seg[key.length]); // boundary check: keeps RuVector-41 from matching RuVector-419
|
|
110
|
+
}
|
|
111
|
+
function trialDirFor(rewardPath) {
|
|
112
|
+
// .../<trial>/verifier/reward.json → .../<trial>
|
|
113
|
+
return dirname(dirname(rewardPath));
|
|
114
|
+
}
|
|
115
|
+
function taskPathOfTrial(trialDir) {
|
|
116
|
+
const cfg = join(trialDir, 'config.json');
|
|
117
|
+
if (!existsSync(cfg)) return null;
|
|
118
|
+
try { return JSON.parse(readFileSync(cfg, 'utf-8'))?.task?.path ?? null; } catch { return null; }
|
|
119
|
+
}
|
|
120
|
+
function rewardsByTask(jobs, tasks) {
|
|
121
|
+
const files = jobs.flatMap((j) => walk(j)).filter((p) => /reward\.(json|txt)$/.test(basename(p)));
|
|
122
|
+
const byTask = new Map();
|
|
123
|
+
for (const t of tasks) {
|
|
124
|
+
const keys = [...new Set([t.id, t.dirBase])];
|
|
125
|
+
const mine = files.filter((p) => {
|
|
126
|
+
const tp = taskPathOfTrial(trialDirFor(p));
|
|
127
|
+
if (tp) return basename(tp) === t.dirBase; // unambiguous match
|
|
128
|
+
return p.split(sep).some((seg) => keys.some((k) => segMatches(seg, k))); // fallback
|
|
129
|
+
});
|
|
130
|
+
const jsons = mine.filter((p) => p.endsWith('reward.json'));
|
|
131
|
+
byTask.set(t.id, jsons.length ? jsons : mine.filter((p) => p.endsWith('reward.txt')));
|
|
132
|
+
}
|
|
133
|
+
return byTask;
|
|
134
|
+
}
|
|
135
|
+
function readReward(path) {
|
|
136
|
+
if (path.endsWith('reward.txt')) return { reward: Number(readFileSync(path, 'utf-8').trim()) };
|
|
137
|
+
const r = JSON.parse(readFileSync(path, 'utf-8'));
|
|
138
|
+
// The deployed pr_diff verifier writes `final_reward`; REWARD_SCHEMA.md documents
|
|
139
|
+
// `reward` (runtime). Accept both: without this, pr_diff silently aggregates to 0.
|
|
140
|
+
if (r.reward === undefined && typeof r.final_reward === 'number') r.reward = r.final_reward;
|
|
141
|
+
return r;
|
|
142
|
+
}
|
|
143
|
+
|
|
144
|
+
// ── Aggregation ──
|
|
145
|
+
function aggregate(tasks, byTask) {
|
|
146
|
+
const perTask = [];
|
|
147
|
+
const missing = [];
|
|
148
|
+
for (const t of tasks) {
|
|
149
|
+
const files = byTask.get(t.id);
|
|
150
|
+
if (!files.length) { missing.push(t.id); continue; }
|
|
151
|
+
const rewards = files.map(readReward);
|
|
152
|
+
const r = rewards[rewards.length - 1]; // multiple trials: status fields come from the last one; `reward` is the mean
|
|
153
|
+
const reward = rewards.reduce((s, x) => s + (x.reward ?? 0), 0) / rewards.length;
|
|
154
|
+
perTask.push({
|
|
155
|
+
task: t.id,
|
|
156
|
+
contentHash: t.contentHash,
|
|
157
|
+
rewardKind: t.rewardKind,
|
|
158
|
+
reward: Number(reward.toFixed(6)),
|
|
159
|
+
trials: rewards.length,
|
|
160
|
+
resolved: r.resolved ?? null,
|
|
161
|
+
judgeStatus: r.judge_status ?? null,
|
|
162
|
+
capped: r.capped ?? null,
|
|
163
|
+
parseStatus: r.parse_status ?? null,
|
|
164
|
+
evalTrustworthy: r.eval_trustworthy ?? (r.parse_status ? r.parse_status === 'ok' : null),
|
|
165
|
+
});
|
|
166
|
+
}
|
|
167
|
+
const n = perTask.length;
|
|
168
|
+
const sum = perTask.reduce((s, p) => s + p.reward, 0);
|
|
169
|
+
const runtime = perTask.filter((p) => p.rewardKind === 'test_execution');
|
|
170
|
+
const untrusted = perTask.filter((p) => p.evalTrustworthy === false);
|
|
171
|
+
return {
|
|
172
|
+
perTask,
|
|
173
|
+
missing,
|
|
174
|
+
metrics: {
|
|
175
|
+
reward: {
|
|
176
|
+
mean: n ? Number((sum / n).toFixed(6)) : null,
|
|
177
|
+
meanPessimistic: Number((sum / tasks.length).toFixed(6)), // missing tasks count as 0
|
|
178
|
+
min: n ? Math.min(...perTask.map((p) => p.reward)) : null,
|
|
179
|
+
max: n ? Math.max(...perTask.map((p) => p.reward)) : null,
|
|
180
|
+
},
|
|
181
|
+
resolvedRate: runtime.length
|
|
182
|
+
? Number((runtime.filter((p) => p.resolved === true).length / runtime.length).toFixed(6))
|
|
183
|
+
: null,
|
|
184
|
+
coverage: Number((n / tasks.length).toFixed(6)),
|
|
185
|
+
untrustedCount: untrusted.length,
|
|
186
|
+
},
|
|
187
|
+
deterministic: perTask.every((p) => p.judgeStatus !== 'ok'), // no LLM judge contributed
|
|
188
|
+
judgeSummary: perTask.reduce((acc, p) => {
|
|
189
|
+
if (p.judgeStatus) acc[p.judgeStatus] = (acc[p.judgeStatus] ?? 0) + 1;
|
|
190
|
+
return acc;
|
|
191
|
+
}, {}),
|
|
192
|
+
};
|
|
193
|
+
}
|
|
194
|
+
|
|
195
|
+
// ── Gate (targets fixed above, before seeing results) ──
|
|
196
|
+
function gateChecks(mode, m, args) {
|
|
197
|
+
const checks = [];
|
|
198
|
+
checks.push([`coverage >= 1`, m.coverage >= 1]);
|
|
199
|
+
if (mode === 'oracle') {
|
|
200
|
+
checks.push([`reward.mean >= 0.999 (oráculo: gold patch ⇒ 1.0)`, m.reward.mean !== null && m.reward.mean >= 0.999]);
|
|
201
|
+
if (m.resolvedRate !== null) checks.push([`resolvedRate >= 1 (runtime)`, m.resolvedRate >= 1]);
|
|
202
|
+
checks.push([`untrustedCount == 0`, m.untrustedCount === 0]);
|
|
203
|
+
} else {
|
|
204
|
+
checks.push([`reward.mean >= ${args.targetRewardMean} (target declarado)`,
|
|
205
|
+
m.reward.mean !== null && m.reward.mean >= args.targetRewardMean]);
|
|
206
|
+
}
|
|
207
|
+
return checks;
|
|
208
|
+
}
|
|
209
|
+
|
|
210
|
+
// ── Core (formerly main): aggregates, gates, and RETURNS the summary ──
|
|
211
|
+
export function runGate(opts) {
|
|
212
|
+
const t0 = performance.now();
|
|
213
|
+
const env = opts.env ?? process.env;
|
|
214
|
+
const cwd = opts.cwd ?? process.cwd();
|
|
215
|
+
const args = {
|
|
216
|
+
jobs: [], corpora: [], mode: null, targetRewardMean: null,
|
|
217
|
+
targetCostUsd: null, agentLabel: null, exclude: [], ...opts,
|
|
218
|
+
};
|
|
219
|
+
if (!args.jobs.length || !args.corpora.length) die('uso: --job <dir> --corpus <dir> --mode oracle|agent');
|
|
220
|
+
if (!['oracle', 'agent'].includes(args.mode)) die('--mode debe ser oracle|agent');
|
|
221
|
+
if (args.mode === 'agent' && !(args.targetRewardMean > 0))
|
|
222
|
+
die('modo agent sin --target-reward-mean: me niego a gatear sin un target declarado (anti-fabricación, ADR-002 pregunta 4)');
|
|
223
|
+
|
|
224
|
+
// runs-dir precedence: env.RUNS_DIR > opts.runsDir > config default
|
|
225
|
+
const runsDir = env.RUNS_DIR
|
|
226
|
+
? resolve(cwd, env.RUNS_DIR)
|
|
227
|
+
: (opts.runsDir ?? resolvePaths(loadConfig(cwd).config, cwd).runsDir);
|
|
228
|
+
|
|
229
|
+
const allTasks = loadCorpus(args.corpora);
|
|
230
|
+
const unknownExcl = args.exclude.filter((e) => !allTasks.some((t) => t.id === e));
|
|
231
|
+
if (unknownExcl.length) die(`--exclude no matchea el corpus: ${unknownExcl.join(', ')}`);
|
|
232
|
+
const tasks = allTasks.filter((t) => !args.exclude.includes(t.id));
|
|
233
|
+
if (!tasks.length) die('todas las tareas excluidas');
|
|
234
|
+
const agg = aggregate(tasks, rewardsByTask(args.jobs, tasks));
|
|
235
|
+
const checks = gateChecks(args.mode, agg.metrics, args);
|
|
236
|
+
const passed = checks.filter(([, ok]) => ok).length;
|
|
237
|
+
|
|
238
|
+
const summary = {
|
|
239
|
+
runAt: new Date().toISOString(),
|
|
240
|
+
benchmark: BENCH_NAME,
|
|
241
|
+
mode: args.mode,
|
|
242
|
+
agent: args.agentLabel ?? (args.mode === 'oracle' ? 'oracle (gold patch)' : 'undeclared'),
|
|
243
|
+
corpusVersion: corpusVersion(tasks), // changes when the effective task set changes (exclusions included)
|
|
244
|
+
corpusSize: tasks.length,
|
|
245
|
+
tasksExcluded: args.exclude,
|
|
246
|
+
corpora: args.corpora,
|
|
247
|
+
jobDirs: args.jobs,
|
|
248
|
+
calibration: CALIBRATION,
|
|
249
|
+
metrics: agg.metrics,
|
|
250
|
+
deterministic: agg.deterministic,
|
|
251
|
+
judgeSummary: agg.judgeSummary,
|
|
252
|
+
costUsd: null, // bench axis: the oracle job dir carries no per-task cost/latency; declared, not invented
|
|
253
|
+
latencyMs: { harness: Number((performance.now() - t0).toFixed(2)), perTask: null },
|
|
254
|
+
perTask: agg.perTask,
|
|
255
|
+
tasksMissing: agg.missing,
|
|
256
|
+
gate: { passed, total: checks.length, allPass: passed === checks.length,
|
|
257
|
+
checks: checks.map(([label, ok]) => ({ label, ok })) },
|
|
258
|
+
};
|
|
259
|
+
|
|
260
|
+
if (!opts.quiet) {
|
|
261
|
+
if (opts.json || env.BENCH_JSON) console.log(JSON.stringify(summary, null, 2));
|
|
262
|
+
else {
|
|
263
|
+
console.log(`# ${BENCH_NAME} | mode: ${args.mode} | agent: ${summary.agent} | corpus ${summary.corpusVersion.slice(0, 18)}… (${tasks.length} tareas)`);
|
|
264
|
+
for (const [l, ok] of checks) console.log(` ${ok ? 'PASS' : 'FAIL'} ${l}`);
|
|
265
|
+
console.log(`reward.mean=${agg.metrics.reward.mean} resolvedRate=${agg.metrics.resolvedRate} coverage=${agg.metrics.coverage} | ${passed}/${checks.length} targets`);
|
|
266
|
+
if (agg.missing.length) console.log(`tareas sin reward en el job: ${agg.missing.join(', ')}`);
|
|
267
|
+
}
|
|
268
|
+
}
|
|
269
|
+
|
|
270
|
+
if (!(opts.noWrite || env.BENCH_NO_WRITE)) {
|
|
271
|
+
mkdirSync(runsDir, { recursive: true });
|
|
272
|
+
const stamp = summary.runAt.replace(/[:.]/g, '-');
|
|
273
|
+
writeFileSync(join(runsDir, `${BENCH_NAME}-${stamp}.json`), JSON.stringify(summary, null, 2));
|
|
274
|
+
writeFileSync(join(runsDir, `${BENCH_NAME}-latest.json`), JSON.stringify(summary, null, 2));
|
|
275
|
+
}
|
|
276
|
+
return summary;
|
|
277
|
+
}
|
|
278
|
+
|
|
279
|
+
// gate is NOT a bin subcommand; gateCli exists only for programmatic use
|
|
280
|
+
export function gateCli({ argv, cwd, env }) {
|
|
281
|
+
let summary;
|
|
282
|
+
try {
|
|
283
|
+
summary = runGate({ ...parseGateArgs(argv), cwd, env });
|
|
284
|
+
} catch (e) {
|
|
285
|
+
if (e instanceof UsageError) { console.error(e.message); return 2; }
|
|
286
|
+
throw e;
|
|
287
|
+
}
|
|
288
|
+
return summary.gate.allPass ? 0 : 1;
|
|
289
|
+
}
|
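The fallback matcher in `gate.mjs` can be exercised in isolation. The sketch below copies `segMatches` as it appears in the diff and runs it over a few path segments. The `RuVector-41`/`RuVector-419` and `wiki-13__xxx` names come from the source comments; the remaining segments and the `samples` array are made up for illustration.

```javascript
// Copy of segMatches from gate.mjs: a path segment matches a task key when it
// equals the key, or starts with the key followed by a non-alphanumeric char.
function segMatches(seg, key) {
  if (seg === key) return true;
  if (!seg.startsWith(key)) return false;
  return !/[a-zA-Z0-9]/.test(seg[key.length]);
}

// Hypothetical [segment, key] pairs, for illustration only.
const samples = [
  ['RuVector-41', 'RuVector-41'],      // exact match
  ['RuVector-41__abc', 'RuVector-41'], // key followed by a separator
  ['RuVector-419', 'RuVector-41'],     // longer id that merely shares the prefix
  ['wiki-13__xxx', 'wiki-138'],        // truncated trial dir vs. the full task id
];

const results = samples.map(([seg, key]) => segMatches(seg, key));
console.log(results); // [ true, true, false, false ]
```

The last pair suggests why the source prefers `config.json`: once a trial directory name has been truncated, a prefix match against the full task id can no longer succeed, so the directory name alone is not a reliable key.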
package/lib/harden.mjs
ADDED
|
@@ -0,0 +1,38 @@
|
|
|
1
|
+
// harden.mjs: Dockerfile hardening for private repos.
|
|
2
|
+
// Port of the Python patch in scripts/dream-cycle.sh (lines 51-63), upstream
|
|
3
|
+
// finding 2026-06-11: Repo2RLEnv strips the token from the remote URL BEFORE the
|
|
4
|
+
// fetch/reset; with --filter=blob:none on a private repo the lazy blobs return
|
|
5
|
+
// 401 → exit 128. Fix: move the set-url (which strips the token) to after the
|
|
6
|
+
// reset. Idempotent: if the pattern is no longer present, the file is skipped.
|
|
7
|
+
|
|
8
|
+
import { readFileSync, writeFileSync, readdirSync, existsSync, statSync } from 'node:fs';
|
|
9
|
+
import { join } from 'node:path';
|
|
10
|
+
|
|
11
|
+
export function hardenDockerfiles(corpusDir, log = console.log) {
|
|
12
|
+
const hardened = [];
|
|
13
|
+
if (!existsSync(corpusDir)) return hardened;
|
|
14
|
+
|
|
15
|
+
for (const task of readdirSync(corpusDir)) {
|
|
16
|
+
const dockerfile = join(corpusDir, task, 'environment', 'Dockerfile');
|
|
17
|
+
let st;
|
|
18
|
+
try { st = statSync(dockerfile); } catch { continue; }
|
|
19
|
+
if (!st.isFile()) continue;
|
|
20
|
+
|
|
21
|
+
let s = readFileSync(dockerfile, 'utf-8');
|
|
22
|
+
const m = s.match(/ \\\n && git -C \/workspace remote set-url origin (\S+)\n/);
|
|
23
|
+
if (!m) continue; // already patched or different format (idempotency)
|
|
24
|
+
|
|
25
|
+
const url = m[1];
|
|
26
|
+
s = s.replace(m[0], '\n');
|
|
27
|
+
// a replacer FUNCTION is required: the URL may contain `$`, and a replacement
|
|
28
|
+
// string would interpret it as a special pattern ($&, $1, ...)
|
|
29
|
+
s = s.replace(
|
|
30
|
+
/(RUN git reset --hard \S+ \\\n && git clean [^\n]+)\n/,
|
|
31
|
+
(_, g1) => g1 + ' \\\n && git remote set-url origin ' + url + '\n',
|
|
32
|
+
);
|
|
33
|
+
writeFileSync(dockerfile, s);
|
|
34
|
+
hardened.push(task);
|
|
35
|
+
log(` endurecido: ${task}`);
|
|
36
|
+
}
|
|
37
|
+
return hardened;
|
|
38
|
+
}
|
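The reordering that `hardenDockerfiles` performs can be sketched on an in-memory string. Everything in the sample Dockerfile below (URLs, commit id, exact whitespace) is an assumption made for illustration, and `moveSetUrl` is a hypothetical helper, not part of the package. It also adds one guard that `harden.mjs` does not have: it refuses to remove the `set-url` line unless the `git reset` anchor is present to receive it.

```javascript
// Hypothetical Dockerfile fragment; URLs, commit id and whitespace are assumed.
const before = [
  'RUN git clone --filter=blob:none https://x-access-token:${GITHUB_TOKEN}@github.com/owner/repo /workspace \\',
  ' && git -C /workspace remote set-url origin https://github.com/owner/repo',
  'WORKDIR /workspace',
  'RUN git reset --hard abc123 \\',
  ' && git clean -fdx',
  '',
].join('\n');

function moveSetUrl(src) {
  // Step 1: find the early set-url that strips the token from the remote.
  const m = src.match(/ \\\n && git -C \/workspace remote set-url origin (\S+)\n/);
  if (!m) return src; // already patched or different layout: leave untouched
  const url = m[1];
  const resetRe = /(RUN git reset --hard \S+ \\\n && git clean [^\n]+)\n/;
  // Extra guard (not in harden.mjs): never drop set-url without re-adding it.
  if (!resetRe.test(src)) return src;
  // Step 2: remove it from the clone step and append it after reset/clean.
  // A replacer function avoids `$` in the URL being read as a special pattern.
  return src
    .replace(m[0], '\n')
    .replace(resetRe, (_, g1) => g1 + ' \\\n && git remote set-url origin ' + url + '\n');
}

const after = moveSetUrl(before);
console.log(after);
```

Running `moveSetUrl` a second time on its own output returns the string unchanged, which mirrors the idempotency claim in the file header.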
package/lib/init.mjs
ADDED
|
@@ -0,0 +1,41 @@
|
|
|
1
|
+
// init.mjs: writes .dream-cycle/config.json idempotently.
|
|
2
|
+
//
|
|
3
|
+
// init is NOT a prerequisite (everything works with DEFAULT_CONFIG); it only materializes
|
|
4
|
+
// the skeleton for anyone who wants to customize paths/bins/defaults. Idempotent:
|
|
5
|
+
// it never overwrites an existing config.json or .gitignore.
|
|
6
|
+
|
|
7
|
+
import { existsSync, mkdirSync, writeFileSync } from 'node:fs';
|
|
8
|
+
import { join } from 'node:path';
|
|
9
|
+
import { DEFAULT_CONFIG, UsageError } from './config.mjs';
|
|
10
|
+
|
|
11
|
+
export function runInit({ cwd }) {
|
|
12
|
+
const base = join(cwd, '.dream-cycle');
|
|
13
|
+
const configPath = join(base, 'config.json');
|
|
14
|
+
|
|
15
|
+
// working dirs (recursive mkdir is idempotent on its own)
|
|
16
|
+
for (const d of [base, join(base, 'corpus'), join(base, 'jobs'), join(base, 'runs')]) {
|
|
17
|
+
mkdirSync(d, { recursive: true });
|
|
18
|
+
}
|
|
19
|
+
|
|
20
|
+
let created = false;
|
|
21
|
+
if (!existsSync(configPath)) {
|
|
22
|
+
writeFileSync(configPath, JSON.stringify(DEFAULT_CONFIG, null, 2) + '\n');
|
|
23
|
+
created = true;
|
|
24
|
+
}
|
|
25
|
+
|
|
26
|
+
// corpus/jobs/runs are local artifacts, kept out of version control
|
|
27
|
+
const gitignorePath = join(base, '.gitignore');
|
|
28
|
+
if (!existsSync(gitignorePath)) {
|
|
29
|
+
writeFileSync(gitignorePath, 'corpus/\njobs/\nruns/\n');
|
|
30
|
+
}
|
|
31
|
+
|
|
32
|
+
return { configPath, created };
|
|
33
|
+
}
|
|
34
|
+
|
|
35
|
+
export function initCli({ argv, cwd }) {
|
|
36
|
+
for (const a of argv) throw new UsageError(`init: argumento desconocido ${a}`);
|
|
37
|
+
const { configPath, created } = runInit({ cwd });
|
|
38
|
+
if (created) console.log(`config creada: ${configPath}`);
|
|
39
|
+
else console.log(`config existente, sin cambios: ${configPath}`);
|
|
40
|
+
return 0;
|
|
41
|
+
}
|
package/lib/radar.mjs
ADDED
|
@@ -0,0 +1,145 @@
|
|
|
1
|
+
// radar.mjs: Dream Cycle v0, a REACTIVE radar over the gate's result.json files.
|
|
2
|
+
// Vendored from scripts/dream-cycle-radar.mjs @ 89641e3 (2026-06-11).
|
|
3
|
+
// The original is NOT touched; sync manually if it diverges.
|
|
4
|
+
//
|
|
5
|
+
// Decision (dream-cycle-feasibility memory, 2026-06-10): no nightly cron and no
|
|
6
|
+
// module; the radar is the comparison between capability-gate runs. It detects
|
|
7
|
+
// and DECLARES gaps; it does not decide what to build (ADR-001 "Deliberately NOT #3": step
|
|
8
|
+
// 3 is human; the radar proposes, the human disposes).
|
|
9
|
+
//
|
|
10
|
+
// Comparable series = same (benchmark, corpusVersion, mode, agent). For each one:
|
|
11
|
+
// regression: reward.mean drops > epsilon vs the previous run
|
|
12
|
+
// stagnation: ≥3 runs without improvement and below target (if a target is declared)
|
|
13
|
+
// capability_gap: last run below target → lists tasks with reward < 0.5 (#2156 style)
|
|
14
|
+
// coverage_gap: tasksMissing non-empty in the last run
|
|
15
|
+
// trust_gap: untrustedCount > 0 in the last run
|
|
16
|
+
//
|
|
17
|
+
// Output: JSON report (+ an issue draft per gap) and exit 1 if there are gaps (a signal),
|
|
18
|
+
// 0 if everything is clean. The human turns drafts into issues; the radar never opens them.
|
|
19
|
+
//
|
|
20
|
+
// The ONLY change from the original: the --runs-dir default comes from the config
|
|
21
|
+
// (.dream-cycle/runs) instead of docs/research-flow/runs.
|
|
22
|
+
// RADAR_JSON=1 / RADAR_NO_WRITE=1
|
|
23
|
+
|
|
24
|
+
import { readFileSync, writeFileSync, mkdirSync, readdirSync } from 'node:fs';
|
|
25
|
+
import { join, resolve } from 'node:path';
|
|
26
|
+
import { loadConfig, resolvePaths, UsageError } from './config.mjs';
|
|
27
|
+
|
|
28
|
+
// ── Analytic core (exported for tests) ──
|
|
29
|
+
export function analyzeSeries(runs, { epsilon, target }) {
|
|
30
|
+
const gaps = [];
|
|
31
|
+
const sorted = [...runs].sort((a, b) => a.runAt.localeCompare(b.runAt));
|
|
32
|
+
const last = sorted[sorted.length - 1];
|
|
33
|
+
const id = `${last.benchmark} · ${(last.corpusVersion ?? '').slice(0, 18)} · ${last.mode}/${last.agent}`;
|
|
34
|
+
|
|
35
|
+
for (let i = 1; i < sorted.length; i++) {
|
|
36
|
+
const prev = sorted[i - 1].metrics?.reward?.mean;
|
|
37
|
+
const curr = sorted[i].metrics?.reward?.mean;
|
|
38
|
+
if (prev == null || curr == null) continue;
|
|
39
|
+
const delta = Number((curr - prev).toFixed(6));
|
|
40
|
+
if (delta < -epsilon) {
|
|
41
|
+
gaps.push({
|
|
42
|
+
type: 'regression', series: id, runAt: sorted[i].runAt, delta,
|
|
43
|
+
from: prev, to: curr,
|
|
44
|
+
issueDraft: `[radar] Regresión de capacidad en ${id}: reward.mean ${prev} → ${curr} (Δ ${delta}) entre ${sorted[i - 1].runAt} y ${sorted[i].runAt}. Revisar el diff de ese intervalo y re-correr el gate.`,
|
|
45
|
+
});
|
|
46
|
+
}
|
|
47
|
+
}
|
|
48
|
+
|
|
49
|
+
if (target != null) {
|
|
50
|
+
const lastMean = last.metrics?.reward?.mean;
|
|
51
|
+
if (lastMean != null && lastMean < target) {
|
|
52
|
+
const failingTasks = (last.perTask ?? []).filter((p) => (p.reward ?? 0) < 0.5).map((p) => p.task);
|
|
53
|
+
gaps.push({
|
|
54
|
+
type: 'capability_gap', series: id, runAt: last.runAt, mean: lastMean, target, failingTasks,
|
|
55
|
+
issueDraft: `[radar] Gap de capacidad en ${id}: reward.mean ${lastMean} < target ${target}. Tareas que fallan: ${failingTasks.join(', ') || '(ver perTask)'}. Candidato a tema de minado (Repo2RLEnv) o a patrón faltante en el bank.`,
|
|
56
|
+
});
|
|
57
|
+
}
|
|
58
|
+
const tail = sorted.slice(-3);
|
|
59
|
+
if (tail.length === 3 && tail.every((r) => (r.metrics?.reward?.mean ?? 1) < target)
|
|
60
|
+
&& tail[2].metrics.reward.mean <= tail[0].metrics.reward.mean) {
|
|
61
|
+
gaps.push({
|
|
62
|
+
type: 'stagnation', series: id, runAt: last.runAt,
|
|
63
|
+
means: tail.map((r) => r.metrics.reward.mean),
|
|
64
|
+
issueDraft: `[radar] Estancamiento en ${id}: 3 runs consecutivos bajo target ${target} sin mejora (${tail.map((r) => r.metrics.reward.mean).join(' → ')}).`,
|
|
65
|
+
});
|
|
66
|
+
}
|
|
67
|
+
}
|
|
68
|
+
|
|
69
|
+
if ((last.tasksMissing ?? []).length) {
|
|
70
|
+
gaps.push({
|
|
71
|
+
type: 'coverage_gap', series: id, runAt: last.runAt, tasksMissing: last.tasksMissing,
|
|
72
|
+
issueDraft: `[radar] Cobertura incompleta en ${id}: sin reward para ${last.tasksMissing.join(', ')}. El run no es comparable hasta cubrir el corpus completo.`,
|
|
73
|
+
});
|
|
74
|
+
}
|
|
75
|
+
if ((last.metrics?.untrustedCount ?? 0) > 0) {
|
|
76
|
+
gaps.push({
|
|
77
|
+
type: 'trust_gap', series: id, runAt: last.runAt, untrustedCount: last.metrics.untrustedCount,
|
|
78
|
+
issueDraft: `[radar] ${last.metrics.untrustedCount} tarea(s) con eval no confiable (parse fallback) en ${id}: el reward de esas tareas no significa nada — arreglar el parser/runner antes de comparar.`,
|
|
79
|
+
});
|
|
80
|
+
}
|
|
81
|
+
return gaps;
|
|
82
|
+
}
|
|
83
|
+
|
|
84
|
+
// ── Report (pure read: writes nothing to disk) ──
|
|
85
|
+
export function radarReport({ runsDir, benchmark, epsilon, target }) {
|
|
86
|
+
const files = readdirSync(runsDir)
|
|
87
|
+
.filter((f) => f.startsWith(benchmark + '-') && f.endsWith('.json') && !f.includes('latest'));
|
|
88
|
+
const runs = files.map((f) => {
|
|
89
|
+
try { return JSON.parse(readFileSync(join(runsDir, f), 'utf-8')); } catch { return null; }
|
|
90
|
+
}).filter((r) => r && r.runAt && r.metrics);
|
|
91
|
+
|
|
92
|
+
const bySeries = new Map();
|
|
93
|
+
for (const r of runs) {
|
|
94
|
+
const key = `${r.benchmark}|${r.corpusVersion}|${r.mode}|${r.agent}`;
|
|
95
|
+
if (!bySeries.has(key)) bySeries.set(key, []);
|
|
96
|
+
bySeries.get(key).push(r);
|
|
97
|
+
}
|
|
98
|
+
|
|
99
|
+
const gaps = [];
|
|
100
|
+
for (const series of bySeries.values()) gaps.push(...analyzeSeries(series, { epsilon, target }));
|
|
101
|
+
|
|
102
|
+
const report = {
|
|
103
|
+
runAt: new Date().toISOString(),
|
|
104
|
+
radar: 'dream-cycle-v0-reactive',
|
|
105
|
+
runsDir,
|
|
106
|
+
benchmark,
|
|
107
|
+
params: { epsilon, target },
|
|
108
|
+
seriesAnalyzed: bySeries.size,
|
|
109
|
+
runsAnalyzed: runs.length,
|
|
110
|
+
gaps,
|
|
111
|
+
note: 'El radar propone, no decide (ADR-001): los issueDraft requieren revisión humana antes de abrirse.',
|
|
112
|
+
};
|
|
113
|
+
return { report, hasGaps: gaps.length > 0 };
|
|
114
|
+
}
|
|
115
|
+
|
|
116
|
+
// ── CLI ──
|
|
117
|
+
export function radarCli({ argv, cwd, env }) {
|
|
118
|
+
const { config } = loadConfig(cwd);
|
|
119
|
+
const args = {
|
|
120
|
+
runsDir: resolvePaths(config, cwd).runsDir,
|
|
121
|
+
benchmark: 'repo2rlenv-capability', epsilon: 0.01, target: null,
|
|
122
|
+
};
|
|
123
|
+
for (let i = 0; i < argv.length; i++) {
|
|
124
|
+
const k = argv[i], v = argv[i + 1];
|
|
125
|
+
if (k === '--runs-dir') { args.runsDir = resolve(cwd, v); i++; }
|
|
126
|
+
else if (k === '--benchmark') { args.benchmark = v; i++; }
|
|
127
|
+
else if (k === '--epsilon') { args.epsilon = Number(v); i++; }
|
|
128
|
+
else if (k === '--target') { args.target = Number(v); i++; }
|
|
129
|
+
else throw new UsageError(`radar: argumento desconocido ${k}`);
|
|
130
|
+
}
|
|
131
|
+
|
|
132
|
+
const { report, hasGaps } = radarReport(args);
|
|
133
|
+
|
|
134
|
+
if (env.RADAR_JSON) console.log(JSON.stringify(report, null, 2));
|
|
135
|
+
else {
|
|
136
|
+
console.log(`# dream-cycle radar | ${report.seriesAnalyzed} series, ${report.runsAnalyzed} runs`);
|
|
137
|
+
if (!report.gaps.length) console.log(' sin gaps — todo dentro de baseline');
|
|
138
|
+
for (const g of report.gaps) console.log(` [${g.type}] ${g.issueDraft}`);
|
|
139
|
+
}
|
|
140
|
+
if (!env.RADAR_NO_WRITE) {
|
|
141
|
+
mkdirSync(args.runsDir, { recursive: true });
|
|
142
|
+
writeFileSync(join(args.runsDir, 'dream-cycle-radar-latest.json'), JSON.stringify(report, null, 2));
|
|
143
|
+
}
|
|
144
|
+
return hasGaps ? 1 : 0; // exit 1 = gaps found (a signal, not an error)
|
|
145
|
+
}
|
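The regression rule in `analyzeSeries` can be restated as a small standalone function. This is a simplified sketch, not the package's code: `findRegressions` is a hypothetical name, each run carries a flat `mean` field instead of `metrics.reward.mean`, and the sample values are invented.

```javascript
// Flag every run whose mean reward drops by more than epsilon relative to the
// chronologically previous run (same rule as the 'regression' gap type).
function findRegressions(runs, epsilon) {
  // ISO-8601 timestamps sort chronologically as plain strings.
  const sorted = [...runs].sort((a, b) => a.runAt.localeCompare(b.runAt));
  const out = [];
  for (let i = 1; i < sorted.length; i++) {
    const prev = sorted[i - 1].mean;
    const curr = sorted[i].mean;
    if (prev == null || curr == null) continue; // skip runs without a mean
    const delta = Number((curr - prev).toFixed(6)); // round away float noise
    if (delta < -epsilon) out.push({ runAt: sorted[i].runAt, from: prev, to: curr, delta });
  }
  return out;
}

// Hypothetical series, deliberately out of order.
const runs = [
  { runAt: '2026-06-12T00:00:00.000Z', mean: 0.9 },
  { runAt: '2026-06-10T00:00:00.000Z', mean: 0.8 },
  { runAt: '2026-06-11T00:00:00.000Z', mean: 0.795 },
  { runAt: '2026-06-13T00:00:00.000Z', mean: 0.7 },
];

const regressions = findRegressions(runs, 0.01);
console.log(regressions);
```

With `epsilon = 0.01`, the 0.8 → 0.795 dip is treated as noise, while the 0.9 → 0.7 drop is reported. Each consecutive pair is compared, so a series can yield several regression gaps.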
package/lib/run.mjs
ADDED
|
@@ -0,0 +1,206 @@
|
|
|
1
|
+
// run.mjs: full Dream Cycle v0 pipeline against ANY public repo
|
|
2
|
+
// (port of scripts/dream-cycle.sh @ 89641e3, BLUEPRINT-008 post-F4).
|
|
3
|
+
//
|
|
4
|
+
// Full chain, repo-agnostic (pr_diff pipeline: no heavy Docker bootstrap,
|
|
5
|
+
// works for any language):
|
|
6
|
+
// 1. SYNTHESIZE repo2rlenv generate pr_diff (post-cutoff, no LLM)
|
|
7
|
+
// 2. CALIBRATE harbor oracle (gold ⇒ ~1.0) + nop (null ⇒ 0.0)
|
|
8
|
+
// 3. CURATE unevaluable tasks (oracle < threshold) → DECLARED exclusions
|
|
9
|
+
// 4. GATE oracle mode (PASS expected) and nop (must FAIL: falsification)
|
|
10
|
+
// 5. RADAR reactive radar over the run history
|
|
11
|
+
//
|
|
12
|
+
// The radar proposes; the human decides which gap to tackle (ADR-001 Deliberately-NOT-#3).
|
|
13
|
+
// Deliberate divergence from the .sh: no prepending of ~/.local/tools/gh-latest to
|
|
14
|
+
// PATH; the substitute is GH_BIN or config.bins.gh.
|
|
15
|
+
//
|
|
16
|
+
// Exit codes: 0 success · 1 subprocess/generic · 2 usage · 3 insufficient corpus ·
|
|
17
|
+
// 4 oracle gate failed · 5 nop passed (the corpus does not discriminate)
|
|
18
|
+
|
|
19
|
+
import { existsSync, readdirSync, statSync, mkdtempSync, writeFileSync, rmSync, mkdirSync } from 'node:fs';
|
|
20
|
+
import { spawnSync } from 'node:child_process';
|
|
21
|
+
import { tmpdir } from 'node:os';
|
|
22
|
+
import { join } from 'node:path';
|
|
23
|
+
import { loadConfig, resolvePaths, resolveBin, UsageError, PipelineError } from './config.mjs';
|
|
24
|
+
import { hardenDockerfiles } from './harden.mjs';
|
|
25
|
+
import { runGate } from './gate.mjs';
|
|
26
|
+
import { radarReport } from './radar.mjs';
|
|
27
|
+
|
|
28
|
+
// First-level subdirectories (equivalent to the bash `ls -d */`)
|
|
29
|
+
function subdirs(dir) {
|
|
30
|
+
if (!existsSync(dir)) return [];
|
|
31
|
+
return readdirSync(dir).filter((e) => {
|
|
32
|
+
try { return statSync(join(dir, e)).isDirectory(); } catch { return false; }
|
|
33
|
+
});
|
|
34
|
+
}
|
|
35
|
+
|
|
36
|
+
export async function runPipeline({ repo, limit, since, cwd, env, log = console.log }) {
|
|
37
|
+
const { config } = loadConfig(cwd);
|
|
38
|
+
const defaults = config.defaults;
|
|
39
|
+
limit = limit ?? defaults.limit;
|
|
40
|
+
since = since ?? defaults.since; // contamination guardrail: only post-cutoff PRs
|
|
41
|
+
const minTasks = defaults.minTasks;
|
|
42
|
+
|
|
43
|
+
const { corpusDir, jobsDir, runsDir } = resolvePaths(config, cwd);
|
|
44
|
+
const slug = repo.replace('/', '__');
|
|
45
|
+
const corpus = join(corpusDir, `dc-${slug}`);
|
|
46
|
+
const oracleJob = join(jobsDir, `dc-${slug}-oracle`);
|
|
47
|
+
const nopJob = join(jobsDir, `dc-${slug}-nop`);
|
|
48
|
+
|
|
49
|
+
// OLLAMA_API_KEY is unused (pr_diff needs no LLM) but harbor requires it to be set
|
|
50
|
+
const childEnv = {
|
|
51
|
+
...env,
|
|
52
|
+
DOCKER_HOST: env.DOCKER_HOST ?? 'unix:///var/run/docker.sock',
|
|
53
|
+
OLLAMA_API_KEY: env.OLLAMA_API_KEY ?? 'dummy',
|
|
54
|
+
};
|
|
55
|
+
|
|
56
|
+
const r2eBin = resolveBin('repo2rlenv', config, env);
|
|
57
|
+
const harborBin = resolveBin('harbor', config, env);
|
|
58
|
+
const ghBin = resolveBin('gh', config, env);
|
|
59
|
+
if (!r2eBin || !harborBin) {
|
|
60
|
+
const missing = [!r2eBin && 'repo2rlenv', !harborBin && 'harbor'].filter(Boolean).join(', ');
|
|
61
|
+
throw new PipelineError(`faltan binarios: ${missing} — corre \`dream-cycle doctor\` para el diagnóstico completo`, 1);
|
|
62
|
+
}
|
|
63
|
+
|
|
64
|
+
log(`== dream-cycle v0 :: ${repo} (limit=${limit} since=${since}) ==`);
|
|
65
|
+
|
|
66
|
+
// 1. SYNTHESIZE (idempotent: skipped if the corpus already exists with tasks)
|
|
67
|
+
const hasTasks = subdirs(corpus).some((d) => existsSync(join(corpus, d, 'task.toml')));
|
|
68
|
+
if (existsSync(corpus) && hasTasks) {
|
|
69
|
+
log(`-- corpus existente: ${corpus} (saltando generate)`);
|
|
70
|
+
} else {
|
|
71
|
+
log(`-- generate pr_diff → ${corpus}`);
|
|
72
|
+
const gen = spawnSync(r2eBin, [
|
|
73
|
+
'generate', '--repo', repo, '--pipeline', 'pr_diff', '--out', corpus,
|
|
74
|
+
'--pipeline-opt', `limit=${limit}`, '--pipeline-opt', `since=${since}`,
|
|
75
|
+
], { stdio: 'inherit', env: childEnv });
|
|
76
|
+
if (gen.status !== 0) throw new PipelineError('repo2rlenv generate falló', 1);
|
|
77
|
+
}
|
|
78
|
+
const nTasks = subdirs(corpus).length;
|
|
79
|
+
if (nTasks < minTasks) {
|
|
80
|
+
throw new PipelineError(`corpus insuficiente (${nTasks} tareas) — repo sin PRs minables post-${since}`, 3);
|
|
81
|
+
}
|
|
82
|
+
const val = spawnSync(r2eBin, ['validate', corpus], { stdio: ['ignore', 'ignore', 'inherit'], env: childEnv });
|
|
83
|
+
if (val.status !== 0) throw new PipelineError('repo2rlenv validate falló', 1);
|
|
84
|
+
log(`-- validate OK (${nTasks} tareas)`);
|
|
85
|
+
|
|
86
|
+
// 1b. HARDEN Dockerfiles for private repos (upstream finding 2026-06-11)
|
|
87
|
+
hardenDockerfiles(corpus, log);
|
|
88
|
+
|
|
89
|
+
// 2. CALIBRATE (idempotent: skipped if the job already exists)
|
|
90
|
+
// Private repos: the emitted Dockerfile clones with ARG GITHUB_TOKEN; we feed it
|
|
91
|
+
// via a compose overlay plus the gh token. The token NEVER touches disk:
|
|
92
|
+
// the overlay carries only the ${GITHUB_TOKEN} placeholder and compose interpolates
|
|
93
|
+
// it from the environment. Note: a build-arg stays visible in the docker history of the
|
|
94
|
+
// local image; acceptable for local runs, but do not publish those images.
|
|
95
|
+
const harborExtra = [];
|
|
96
|
+
let overlayTmp = null;
|
|
97
|
+
try {
|
|
98
|
+
if (ghBin) {
|
|
99
|
+
const tok = spawnSync(ghBin, ['auth', 'token'], { encoding: 'utf-8', env: childEnv });
|
|
100
|
+
if (tok.status === 0 && tok.stdout && tok.stdout.trim()) {
|
|
101
|
+
childEnv.GITHUB_TOKEN = tok.stdout.trim();
|
|
102
|
+
overlayTmp = mkdtempSync(join(tmpdir(), 'dream-cycle-'));
|
|
103
|
+
const overlayPath = join(overlayTmp, 'overlay.yaml');
|
|
104
|
+
writeFileSync(overlayPath, 'services:\n main:\n build:\n args:\n GITHUB_TOKEN: ${GITHUB_TOKEN}\n');
|
|
105
|
+
harborExtra.push('--extra-docker-compose', overlayPath);
|
|
106
|
+
}
|
|
107
|
+
}
|
|
108
|
+
for (const agent of ['oracle', 'nop']) {
|
|
109
|
+
const jobDir = agent === 'oracle' ? oracleJob : nopJob;
|
|
110
|
+
if (existsSync(jobDir)) {
|
|
111
|
+
log(`-- job ${agent} existente (saltando)`);
|
|
112
|
+
continue;
|
|
113
|
+
}
|
|
114
|
+
log(`-- harbor run -a ${agent} ...`);
|
|
115
|
+
spawnSync(harborBin, ['run', '-p', corpus, '-a', agent, '--env', 'docker', '-o', jobDir, ...harborExtra],
|
|
116
|
+
{ stdio: 'ignore', env: childEnv });
|
|
117
|
+
// exit code IGNORED (the bash `|| true`): trials may be reported as
|
|
118
|
+
// 'failed' because of the known pydantic incompatibility; the verifier artifacts
|
|
119
|
+
// are the source of truth (memory harbor-repo2rlenv-schema-incompat)
|
|
120
|
+
}
|
|
121
|
+
} finally {
|
|
122
|
+
if (overlayTmp) rmSync(overlayTmp, { recursive: true, force: true });
|
|
123
|
+
}
|
|
124
|
+
|
|
125
|
+
// 3. CURATE: tasks with no reward or oracle < threshold → DECLARED exclusions
|
|
126
|
+
let exclusions = [];
|
|
127
|
+
try {
|
|
128
|
+
const s = runGate({
|
|
129
|
+
jobs: [oracleJob], corpora: [corpus], mode: 'oracle', exclude: [],
|
|
130
|
+
noWrite: true, quiet: true, env: childEnv,
|
|
131
|
+
});
|
|
132
|
+
exclusions = [
|
|
133
|
+
...s.perTask.filter((p) => p.reward < defaults.oracleThreshold).map((p) => p.task),
|
|
134
|
+
...s.tasksMissing,
|
|
135
|
+
];
|
|
136
|
+
} catch {
|
|
137
|
+
exclusions = []; // curation gate unreadable → no exclusions (the final gate will decide)
|
|
138
|
+
}
|
|
139
|
+
log(`-- curación: ${exclusions.length} tareas inevaluables declaradas`);
|
|
140
|
+
const nEval = nTasks - exclusions.length;
|
|
141
|
+
if (nEval < minTasks) throw new PipelineError(`tras curar quedan ${nEval} tareas — corpus no útil`, 3);
|
|
142
|
+
|
|
143
|
+
// 4. GATE: oracle must PASS (this run DOES write result.json to runsDir), nop must FAIL
|
|
144
|
+
log('-- gate oracle (esperado: PASS)');
|
|
145
|
+
const oracleSummary = runGate({
|
|
146
|
+
jobs: [oracleJob], corpora: [corpus], mode: 'oracle', exclude: exclusions,
|
|
147
|
+
runsDir, env: childEnv,
|
|
148
|
+
});
|
|
149
|
+
if (!oracleSummary.gate.allPass) throw new PipelineError('gate oracle FALLÓ tras curación — investigar', 4);
|
|
150
|
+
|
|
151
|
+
log('-- gate nop (esperado: FAIL — el corpus discrimina)');
|
|
152
|
+
let nopPassed = false;
|
|
153
|
+
try {
|
|
154
|
+
const nop = runGate({
|
|
155
|
+
jobs: [nopJob], corpora: [corpus], mode: 'oracle', exclude: exclusions,
|
|
156
|
+
noWrite: true, quiet: true, env: childEnv,
|
|
157
|
+
});
|
|
158
|
+
nopPassed = nop.gate.allPass === true;
|
|
159
|
+
} catch {
|
|
160
|
+
nopPassed = false; // a gate error on nop = "did not pass" = falsification OK
|
|
161
|
+
}
|
|
162
|
+
if (nopPassed) throw new PipelineError('el gate PASÓ con el agente nulo — el corpus NO discrimina', 5);
|
|
163
|
+
log('-- falsificación OK');
|
|
164
|
+
|
|
165
|
+
// 5. RADAR (a gap is a signal for the human, NOT an error: it does not propagate to the exit code)
|
|
166
|
+
log('-- radar');
|
|
167
|
+
const { report } = radarReport({
|
|
168
|
+
runsDir, benchmark: defaults.benchmark, epsilon: defaults.epsilon, target: defaults.target,
|
|
169
|
+
});
|
|
170
|
+
log(`# dream-cycle radar | ${report.seriesAnalyzed} series, ${report.runsAnalyzed} runs`);
|
|
171
|
+
if (!report.gaps.length) log(' no gaps; everything within baseline');
|
|
172
|
+
for (const g of report.gaps) log(` [${g.type}] ${g.issueDraft}`);
|
|
173
|
+
mkdirSync(runsDir, { recursive: true });
|
|
174
|
+
writeFileSync(join(runsDir, 'dream-cycle-radar-latest.json'), JSON.stringify(report, null, 2));
|
|
175
|
+
|
|
176
|
+
log(`== dream-cycle v0 complete for ${repo} ==`);
|
|
177
|
+
log(` corpus: ${corpus} (${nEval} evaluable) · gate: ${join(runsDir, 'repo2rlenv-capability-latest.json')}`);
|
|
178
|
+
log(` agent arm (optional, subscription): harbor run -p ${corpus} -a claude-code -m <model> \\`);
|
|
179
|
+
log(` --ae CLAUDE_CODE_OAUTH_TOKEN=<token> --ae CLAUDE_FORCE_OAUTH=1 --env docker -o ${join(jobsDir, `dc-${slug}-agent`)}`);
|
|
180
|
+
}
|
|
181
|
+
|
|
182
|
+
export async function runCli({ argv, cwd, env }) {
|
|
183
|
+
let repo = null, limit = null, since = null;
|
|
184
|
+
for (let i = 0; i < argv.length; i++) {
|
|
185
|
+
const k = argv[i];
|
|
186
|
+
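if (k === '--limit') { limit = Number(argv[++i]); if (!Number.isInteger(limit) || limit < 1) throw new UsageError('run: --limit must be a positive integer'); }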
|
|
187
|
+
else if (k === '--since') { since = argv[++i]; }
|
|
188
|
+
else if (k.startsWith('--')) throw new UsageError(`run: unknown argument ${k}`);
|
|
189
|
+
else if (repo === null) repo = k;
|
|
190
|
+
else throw new UsageError(`run: unexpected argument ${k}`);
|
|
191
|
+
}
|
|
192
|
+
if (!repo || !/^[^\/\s]+\/[^\/\s]+$/.test(repo)) {
|
|
193
|
+
throw new UsageError('usage: dream-cycle run <owner/repo> [--limit N] [--since YYYY-MM-DD]');
|
|
194
|
+
}
|
|
195
|
+
|
|
196
|
+
try {
|
|
197
|
+
await runPipeline({ repo, limit, since, cwd, env });
|
|
198
|
+
return 0;
|
|
199
|
+
} catch (e) {
|
|
200
|
+
if (e instanceof PipelineError) {
|
|
201
|
+
console.error(`!! ${e.message}`);
|
|
202
|
+
return e.code;
|
|
203
|
+
}
|
|
204
|
+
throw e;
|
|
205
|
+
}
|
|
206
|
+
}
|
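The curation and falsification steps in run.mjs above can be read as two small decisions: which tasks to exclude, and whether the corpus discriminates between an oracle agent and a null agent. The following is a minimal sketch of that logic only, not the package's implementation. The helper names `curate` and `falsify`, the stubbed gate functions, the task names, and the threshold value 0.5 are all hypothetical; the summary shape (`perTask`, `tasksMissing`, `gate.allPass`) and the exit codes 4 and 5 are taken from the code above.

```javascript
// Hypothetical sketch: curate() and falsify() are illustrative names,
// not functions exported by dream-cycle.

// Tasks the oracle scores below the threshold, plus tasks with no result,
// are declared unevaluable and excluded from the final gate.
function curate(summary, oracleThreshold) {
  return [
    ...summary.perTask.filter((p) => p.reward < oracleThreshold).map((p) => p.task),
    ...summary.tasksMissing,
  ];
}

// The corpus discriminates only if the oracle gate passes and the nop gate
// does not. A nop gate that throws is treated as "did not pass".
function falsify(runOracleGate, runNopGate) {
  if (!runOracleGate().gate.allPass) return { ok: false, code: 4 };
  let nopPassed = false;
  try {
    nopPassed = runNopGate().gate.allPass === true;
  } catch {
    nopPassed = false;
  }
  return nopPassed ? { ok: false, code: 5 } : { ok: true, code: 0 };
}

// Made-up data: the 0.5 threshold is an assumption, not the package default.
const summary = {
  perTask: [
    { task: 'pr-1', reward: 1.0 },
    { task: 'pr-2', reward: 0.2 },
  ],
  tasksMissing: ['pr-3'],
};
const exclusions = curate(summary, 0.5);

// Stub gates: the oracle passes, the nop gate crashes.
const result = falsify(
  () => ({ gate: { allPass: true } }),
  () => { throw new Error('nop gate crashed'); },
);
console.log(exclusions, result);
```

Treating a crashing nop gate as a failed pass keeps the check conservative in one direction only: the pipeline stops when the null agent demonstrably passes, and otherwise leaves the decision to the oracle gate.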
package/package.json
ADDED
|
@@ -0,0 +1,14 @@
|
|
|
1
|
+
{
|
|
2
|
+
"name": "dream-cycle",
|
|
3
|
+
"version": "0.1.0",
|
|
4
|
+
"description": "Dream Cycle v0: synthesize a post-cutoff eval corpus from any GitHub repo (repo2rlenv pr_diff), calibrate with oracle/nop agents via harbor, gate with falsification, and run a reactive capability radar. The radar proposes; the human decides.",
|
|
5
|
+
"type": "module",
|
|
6
|
+
"license": "MIT",
|
|
7
|
+
"bin": { "dream-cycle": "bin/dream-cycle.mjs" },
|
|
8
|
+
"files": ["bin", "lib", "README.md"],
|
|
9
|
+
"engines": { "node": ">=20" },
|
|
10
|
+
"scripts": { "test": "node --test tests/*.test.mjs", "prepublishOnly": "npm test" },
|
|
11
|
+
"repository": { "type": "git", "url": "git+https://github.com/DarkCodePE/investigador.git", "directory": "packages/dream-cycle" },
|
|
12
|
+
"keywords": ["eval", "benchmark", "rl-environment", "harbor", "repo2rlenv", "capability-gate", "agent-evaluation"],
|
|
13
|
+
"publishConfig": { "access": "public", "tag": "latest" }
|
|
14
|
+
}
|