rlhf-feedback-loop 0.5.0 → 0.6.0
- package/README.md +52 -271
- package/adapters/mcp/server-stdio.js +81 -1
- package/bin/cli.js +301 -87
- package/config/mcp-allowlists.json +2 -0
- package/package.json +24 -6
- package/scripts/code-reasoning.js +1 -0
package/README.md
CHANGED
@@ -1,308 +1,89 @@
 # RLHF Feedback Loop
 
 [](https://github.com/IgorGanapolsky/rlhf-feedback-loop/actions/workflows/ci.yml)
-[](https://www.npmjs.com/package/rlhf-feedback-loop)
 [](LICENSE)
 [](adapters/mcp/server-stdio.js)
 [](scripts/export-dpo-pairs.js)
 
-
+**Make your AI agent learn from mistakes.** Capture thumbs up/down feedback, block repeated failures, and export DPO training data — across ChatGPT, Claude, Codex, Gemini, and Amp.
 
-##
+## Architecture
 
-
+
 
-
+
 
-
-npx rlhf-feedback-loop init
-node .rlhf/capture-feedback.js --feedback=up --context="test"
-```
+## Get Started
 
-
+One command. Works with any MCP-compatible agent:
 
 ```bash
-
+claude mcp add rlhf -- npx -y rlhf-feedback-loop serve
 ```
 
-Full guide: [plugins/claude-skill/INSTALL.md](plugins/claude-skill/INSTALL.md)
-
-### Codex
-
 ```bash
-
+codex mcp add rlhf -- npx -y rlhf-feedback-loop serve
 ```
 
-Full guide: [plugins/codex-profile/INSTALL.md](plugins/codex-profile/INSTALL.md)
-
-### Gemini
-
 ```bash
-
+gemini mcp add rlhf -- npx -y rlhf-feedback-loop serve
 ```
 
-
-
-### Amp
-
-```bash
-cp plugins/amp-skill/SKILL.md .amp/skills/rlhf-feedback.md
-```
-
-Full guide: [plugins/amp-skill/INSTALL.md](plugins/amp-skill/INSTALL.md)
-
-### ChatGPT (GPT Actions)
-
-Import `adapters/chatgpt/openapi.yaml` in the GPT Builder Actions editor.
-
-Full guide: [adapters/chatgpt/INSTALL.md](adapters/chatgpt/INSTALL.md)
-
----
-
-## Value Proposition
-
-Most teams collect feedback but do not convert it into reliable behavior change.
-This project gives you a working loop:
-
-1. Capture thumbs up/down with context.
-2. Score outcomes with weighted rubrics and objective guardrails.
-3. Promote only schema-valid, rubric-eligible memories.
-4. Generate prevention rules from repeated mistakes and failed rubric dimensions.
-5. Export DPO-ready preference pairs with rubric deltas.
-6. Construct bounded context packs (constructor/loader/evaluator).
-7. Reuse the same core through API + MCP wrappers.
-8. Route intents through policy bundles with human checkpoints on high-risk actions.
-
-## Pricing
-
-| Plan | Price | What you get |
-|------|-------|-------------|
-| **Open Source** | $0 forever | Full source, self-hosted, MIT license, 314+ tests, 5-platform plugins |
-| **Cloud Pro** | $49/mo | Hosted HTTPS API on Railway, provisioned API key on payment, usage metering, email support |
-
-Get Cloud Pro: see the [landing page](docs/landing-page.html) or go straight to Stripe Checkout.
-
----
-
-## Quick Start
-
-```bash
-cp .env.example .env
-npm test
-npm run prove:adapters
-npm run prove:automation
-npm run start:api
-```
-
-Set `RLHF_API_KEY` before running the API (or explicitly set `RLHF_ALLOW_INSECURE=true` for isolated local testing only).
-
-Capture feedback:
-
-```bash
-node .claude/scripts/feedback/capture-feedback.js \
---feedback=down \
---context="Claimed done without test evidence" \
---what-went-wrong="No proof attached" \
---what-to-change="Always run tests and include output" \
---tags="verification,testing"
-```
-
-## Integration Adapters
-
-- ChatGPT Actions: `adapters/chatgpt/openapi.yaml`
-- Claude MCP: `adapters/claude/.mcp.json`
-- Codex MCP: `adapters/codex/config.toml`
-- Gemini tools: `adapters/gemini/function-declarations.json`
-- Amp skill: `adapters/amp/skills/rlhf-feedback/SKILL.md`
-
-## API Surface
-
-- `POST /v1/feedback/capture`
-- `GET /v1/feedback/stats`
-- `GET /v1/intents/catalog`
-- `POST /v1/intents/plan`
-- `GET /v1/feedback/summary`
-- `POST /v1/feedback/rules`
-- `POST /v1/dpo/export`
-- `POST /v1/context/construct`
-- `POST /v1/context/evaluate`
-- `GET /v1/context/provenance`
-
-Spec: `openapi/openapi.yaml`
-
-## Versioning
-
-- Package/runtime release version: `package.json`
-- API contract version: `openapi/openapi.yaml`
-- MCP server protocol version: `adapters/mcp/server-stdio.js` `serverInfo.version`
-
-## ContextFS
+That's it. Your agent can now capture feedback, recall past learnings mid-conversation, and block repeated mistakes.
 
-
-- Constructor: relevance-ranked context pack assembly
-- Loader: strict `maxItems` + `maxChars` budgeting
-- Evaluator: outcome/provenance logging for improvement loops
-
-Docs: [docs/CONTEXTFS.md](docs/CONTEXTFS.md)
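The Loader bullet in the removed ContextFS section mentions strict `maxItems` + `maxChars` budgeting. A minimal sketch of that idea, assuming items arrive already ranked by relevance; `loadWithinBudget` and the `{ id, text }` item shape are hypothetical names for illustration, not the package's actual API:

```javascript
'use strict';

// Hypothetical helper: keep ranked items until either budget would be exceeded.
function loadWithinBudget(rankedItems, { maxItems, maxChars }) {
  const picked = [];
  let usedChars = 0;
  for (const item of rankedItems) {
    if (picked.length >= maxItems) break;               // item-count cap reached
    if (usedChars + item.text.length > maxChars) break; // strict: never exceed the character cap
    picked.push(item);
    usedChars += item.text.length;
  }
  return { picked, usedChars };
}

// Items are assumed to be sorted most-relevant first.
const ranked = [
  { id: 'a', text: 'aaaaa' },
  { id: 'b', text: 'bbbbbbbb' },
  { id: 'c', text: 'cc' },
];

const pack = loadWithinBudget(ranked, { maxItems: 5, maxChars: 10 });
console.log(pack.picked.map((item) => item.id), pack.usedChars);
```

This sketch stops at the first item that would overflow the budget instead of skipping ahead to smaller items, which keeps the result a strict prefix of the ranking. Whether the real loader stops or skips is not stated in the README.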
-
-## MCP Policy Profiles
-
-Use least-privilege MCP profiles based on runtime risk:
-
-- `default`: full local toolset
-- `readonly`: read-heavy operations
-- `locked`: summary-only constrained mode
-
-Config: [config/mcp-allowlists.json](config/mcp-allowlists.json)
-
-## Rubric Engine
-
-Rubric config: `config/rubrics/default-v1.json`
-
-- Weighted criteria scoring (`1-5`)
-- Multi-judge disagreement detection
-- Objective guardrail checks (`testsPassed`, `pathSafety`, `budgetCompliant`)
-- Promotion gate blocks positive memory writes on unsafe/high-disagreement signals
-
-## Intent Router
-
-Versioned orchestration bundles define intent-to-action plans and checkpoint policy:
-
-- Bundle configs: `config/policy-bundles/*.json`
-- CLI list: `npm run intents:list`
-- CLI plan: `npm run intents:plan`
-
-The router marks high-risk intents as `checkpoint_required` unless explicitly approved.
-Details: [docs/INTENT_ROUTER.md](docs/INTENT_ROUTER.md)
-
-## Autonomous GitOps
-
-The repo now ships with PR-gated autonomous operations:
-
-- `CI` (`.github/workflows/ci.yml`): required quality gate (`npm test`, adapter proof, automation proof)
-- `Agent PR Auto-Merge` (`.github/workflows/agent-automerge.yml`): auto-merges eligible agent branches (`claude/*`, `codex/*`, `auto/*`, `agent/*`) after required checks pass
-- `Dependabot Auto-Merge` (`.github/workflows/dependabot-automerge.yml`): auto-approves and merges safe dependency updates after required checks pass
-- `Self-Healing Monitor` (`.github/workflows/self-healing-monitor.yml`): scheduled health checks, auto-created alert issue on failure, remediation PR generation when fixable
-- `Self-Healing Auto-Fix` (`.github/workflows/self-healing-auto-fix.yml`): scheduled safe-fix attempts that open remediation PRs
-- `Merge Branch to Main` (`.github/workflows/merge-branch.yml`): manual fallback that still uses PR flow and branch protections
-
-Required repo settings:
-
-- `main` protected + required check(s)
-- auto-merge enabled
-- branch deletion on merge enabled
-
-Secrets:
-
-- Required: `GH_PAT` (or rely on `GITHUB_TOKEN` where permitted)
-- Optional: `SENTRY_AUTH_TOKEN`, `SENTRY_DSN`
-- Optional (LLM router): `LLM_GATEWAY_BASE_URL`, `LLM_GATEWAY_API_KEY`, `TETRATE_API_KEY`
-
-Sync helper:
+Or install via npm for CLI and programmatic use:
 
 ```bash
-
-
-
-## Architecture
-
-### RLHF Feedback Loop
-
-```mermaid
-flowchart TD
-A["👍/👎 User Feedback"] --> B["Capture Layer\n(context + tags)"]
-B --> C{"Action Resolver"}
-C -->|store-learning| D["Schema Validator"]
-C -->|store-mistake| D
-C -->|no-action| X["Discard"]
-D -->|valid| E["Memory Store\n(learning / error)"]
-D -->|invalid| X
-E --> F["Analytics\n(trends + recurrence)"]
-F --> G["Prevention Rules Engine"]
-F --> H["DPO Export\n(prompt/chosen/rejected)"]
-E --> I["Rubric Engine\n(weighted scoring + guardrails)"]
-I -->|promotion gate| E
+npm install rlhf-feedback-loop
+npx rlhf-feedback-loop init
 ```
 
-
+## How It Works
 
-```mermaid
-flowchart LR
-subgraph Adapters
-GPT["ChatGPT\n(GPT Actions)"]
-CL["Claude\n(MCP Server)"]
-CX["Codex\n(MCP Config)"]
-GEM["Gemini\n(Function Calling)"]
-AMP["Amp\n(Skills Template)"]
-end
-
-subgraph Core["RLHF Feedback API"]
-SV["Schema Validation"]
-PR["Prevention Rules"]
-DPO["DPO Export"]
-BG["Budget Guard\n($10/mo cap)"]
-end
-
-GPT <--> Core
-CL <--> Core
-CX <--> Core
-GEM <--> Core
-AMP <--> Core
 ```
-
-
-
-
-
-
-
-
+Thumbs up/down
+      |
+      v
+Capture → JSONL log
+      |
+      v
+Rubric engine (block false positives)
+      |
+  +---+---+
+  |       |
+ Good    Bad
+  |       |
+  v       v
+Learn   Prevention rule
+  |       |
+  v       v
+LanceDB   ShieldCortex
+vectors   context packs
+  |
+  v
+DPO export → fine-tune your model
 ```
 
-
-Verification evidence: [docs/VERIFICATION_EVIDENCE.md](docs/VERIFICATION_EVIDENCE.md)
-Compatibility proof artifacts: [proof/compatibility/report.md](proof/compatibility/report.md), [proof/compatibility/report.json](proof/compatibility/report.json)
-Automation proof artifacts: [proof/automation/report.md](proof/automation/report.md), [proof/automation/report.json](proof/automation/report.json)
-
-## Budget Guardrail
-
-Default monthly cap is `$10` for paid external operations.
-The local budget ledger blocks additional spend if cap would be exceeded.
-
-## Semantic Cache (Cost + Latency)
-
-Context pack construction now supports semantic cache reuse for similar queries:
-
-- token-overlap (Jaccard) similarity gate
-- TTL-bound cache entries
-- full provenance (`context_pack_cache_hit`)
-
-Environment toggles:
-
-- `RLHF_SEMANTIC_CACHE_ENABLED=true|false` (default `true`)
-- `RLHF_SEMANTIC_CACHE_THRESHOLD=0.7`
-- `RLHF_SEMANTIC_CACHE_TTL_SECONDS=86400`
-
-This directly reduces repeated retrieval/LLM context assembly work and improves response latency under budget constraints.
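The removed Semantic Cache section describes a token-overlap (Jaccard) similarity gate with a `0.7` threshold. A minimal sketch of such a gate, assuming simple lowercase word tokenization; `tokenize`, `jaccard`, and `isCacheHit` are hypothetical names, and the package's real tokenizer and cache lookup are not shown in this diff:

```javascript
'use strict';

// Split a query into a set of lowercase alphanumeric tokens.
function tokenize(text) {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, defined here as 0 when both sets are empty.
function jaccard(a, b) {
  const setA = tokenize(a);
  const setB = tokenize(b);
  let intersection = 0;
  for (const token of setA) {
    if (setB.has(token)) intersection += 1;
  }
  const union = setA.size + setB.size - intersection;
  return union === 0 ? 0 : intersection / union;
}

// Default threshold mirrors RLHF_SEMANTIC_CACHE_THRESHOLD=0.7 from the README.
function isCacheHit(query, cachedQuery, threshold = 0.7) {
  return jaccard(query, cachedQuery) >= threshold;
}

const score = jaccard('fix failing unit tests', 'fix the failing unit tests');
console.log(score, isCacheHit('fix failing unit tests', 'fix the failing unit tests'));
```

Here the two queries share 4 of 5 distinct tokens, so the score is 0.8 and the gate passes. Two queries with no words in common score 0 and would trigger a fresh context pack build.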
-
-## Optional Tetrate Router
+All data stored locally as **JSONL** files — fully transparent, fully portable, no vendor lock-in. **LanceDB** indexes memories as vector embeddings for semantic search. **ShieldCortex** assembles context packs so your agent starts each task informed.
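Because the log is plain JSONL (one JSON object per line), it can be inspected with a few lines of Node.js. A small sketch, assuming each record carries a `signal` field (`positive` or `negative`) and a `context` string; the exact record schema is an assumption, and the sample lines stand in for a real `.rlhf/feedback-log.jsonl`:

```javascript
'use strict';

// Parse JSONL text: one JSON object per non-empty line.
function parseJsonl(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

// Count records per signal value.
function countSignals(records) {
  const counts = {};
  for (const record of records) {
    counts[record.signal] = (counts[record.signal] || 0) + 1;
  }
  return counts;
}

// Sample data; field names are assumed, not taken from the package schema.
const sample = [
  '{"signal":"positive","context":"Ran tests and attached output"}',
  '{"signal":"negative","context":"Claimed done without test evidence"}',
  '{"signal":"negative","context":"Skipped lint step"}',
].join('\n');

const counts = countSignals(parseJsonl(sample));
console.log(counts);
```

To run this against a real log, replace `sample` with `fs.readFileSync('.rlhf/feedback-log.jsonl', 'utf8')`, using the `logPath` that `init` writes into `.rlhf/config.json`.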
 
-
-Recommended only when routing paid LLM calls (PaperBanana, external judges, hosted control-plane features):
+## Why This Exists
 
-
-
--
+| Problem | What this does |
+|---------|---------------|
+| Agent keeps making the same mistake | Prevention rules auto-generated from repeated failures |
+| Agent claims "done" without proof | Rubric engine blocks positive feedback without test evidence |
+| Feedback collected but never used | DPO pairs exported for actual model fine-tuning |
+| Different tools, different formats | One MCP server works across 5 platforms |
+| Agent starts every task blank | In-session recall injects past learnings into current conversation |
 
-##
+## Deep Dive
 
--
--
--
+- [API Reference](openapi/openapi.yaml) — full OpenAPI spec
+- [Context Engine](docs/CONTEXTFS.md) — multi-agent memory orchestration
+- [Autonomous GitOps](docs/AUTONOMOUS_GITOPS.md) — self-healing CI/CD
+- [Contributing](CONTRIBUTING.md)
 
-
+## License
 
-
-- [docs/PLATFORM_RESEARCH_2026-03-03.md](docs/PLATFORM_RESEARCH_2026-03-03.md)
-- [docs/PLUGIN_DISTRIBUTION.md](docs/PLUGIN_DISTRIBUTION.md)
-- [docs/AUTONOMOUS_GITOPS.md](docs/AUTONOMOUS_GITOPS.md)
+MIT. See [LICENSE](LICENSE).
package/adapters/mcp/server-stdio.js
CHANGED
@@ -31,6 +31,9 @@ const {
   getAllowedTools,
   assertToolAllowed,
 } = require('../../scripts/mcp-policy');
+const {
+  searchSimilar,
+} = require('../../scripts/vector-store');
 
 const SERVER_INFO = {
   name: 'rlhf-feedback-loop-mcp',
@@ -212,6 +215,18 @@ const TOOLS = [
       },
     },
   },
+  {
+    name: 'recall',
+    description: 'Recall relevant past feedback, memories, and prevention rules for the current task. Call this at the start of any task to inject past learnings into the conversation.',
+    inputSchema: {
+      type: 'object',
+      required: ['query'],
+      properties: {
+        query: { type: 'string', description: 'Describe the current task or context to find relevant past feedback' },
+        limit: { type: 'number', description: 'Max memories to return (default 5)' },
+      },
+    },
+  },
 ];
 
 function toText(result) {
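The `recall` tool registered in the hunk above takes a required `query` string and an optional numeric `limit`. A sketch of the request an MCP client might send, assuming the standard MCP `tools/call` JSON-RPC method; `buildRecallRequest` is a hypothetical helper, and the message framing this server expects on stdio is not shown in the diff:

```javascript
'use strict';

// Hypothetical helper: build a JSON-RPC 2.0 request that invokes the `recall` tool.
function buildRecallRequest(id, query, limit) {
  const args = { query };
  if (limit !== undefined) args.limit = limit; // when omitted, the server falls back to 5
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'recall', arguments: args },
  };
}

const request = buildRecallRequest(1, 'refactor the export script', 3);
console.log(JSON.stringify(request));
```

Per the tool description, an agent would send this at the start of a task. The text reply can combine up to three sections, each included only when it has content: relevant past feedback, active prevention rules, and a recent feedback summary.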
@@ -237,6 +252,56 @@ function parseOptionalObject(input, name) {
 async function callTool(name, args = {}) {
   assertToolAllowed(name, getActiveMcpProfile());
 
+  if (name === 'recall') {
+    const query = args.query || '';
+    const limit = Number(args.limit || 5);
+    const parts = [];
+
+    // 1. Vector search for similar past feedback
+    try {
+      const similar = await searchSimilar(query, limit);
+      if (similar.length > 0) {
+        parts.push('## Relevant Past Feedback\n');
+        for (const mem of similar) {
+          const signal = mem.signal === 'positive' ? 'GOOD' : 'BAD';
+          parts.push(`**[${signal}]** ${mem.context}`);
+          if (mem.tags) parts.push(` Tags: ${mem.tags}`);
+          parts.push('');
+        }
+      }
+    } catch (_) {
+      // Vector store may not be initialized yet — fall back to JSONL
+    }
+
+    // 2. Load prevention rules
+    try {
+      const rulesPath = path.join(SAFE_DATA_DIR, 'prevention-rules.md');
+      if (fs.existsSync(rulesPath)) {
+        const rules = fs.readFileSync(rulesPath, 'utf8').trim();
+        if (rules.length > 50) {
+          parts.push('## Active Prevention Rules\n');
+          parts.push(rules);
+          parts.push('');
+        }
+      }
+    } catch (_) {}
+
+    // 3. Recent feedback summary
+    try {
+      const summary = feedbackSummary(10);
+      if (summary) {
+        parts.push('## Recent Feedback Summary\n');
+        parts.push(summary);
+      }
+    } catch (_) {}
+
+    const text = parts.length > 0
+      ? parts.join('\n')
+      : 'No past feedback found. This appears to be a fresh start.';
+
+    return { content: [{ type: 'text', text }] };
+  }
+
   if (name === 'capture_feedback') {
     const result = captureFeedback({
       signal: args.signal,
@@ -249,7 +314,22 @@ async function callTool(name, args = {}) {
       tags: args.tags || [],
       skill: args.skill,
     });
-
+
+    // Auto-recall: after capturing, return relevant context so the agent
+    // can immediately adjust behavior based on past learnings
+    let recallText = '';
+    try {
+      const similar = await searchSimilar(args.context || '', 3);
+      if (similar.length > 0) {
+        recallText = '\n\n---\n## Related Past Feedback (auto-recall)\n';
+        for (const mem of similar) {
+          const signal = mem.signal === 'positive' ? 'GOOD' : 'BAD';
+          recallText += `- **[${signal}]** ${mem.context}\n`;
+        }
+      }
+    } catch (_) {}
+
+    return { content: [{ type: 'text', text: toText(result) + recallText }] };
   }
 
   if (name === 'feedback_summary') {
package/bin/cli.js
CHANGED
@@ -3,23 +3,131 @@
  * rlhf-feedback-loop CLI
  *
  * Usage:
- *   npx rlhf-feedback-loop init
- *
- *
+ *   npx rlhf-feedback-loop init         # scaffold .rlhf/ config + .mcp.json
+ *   npx rlhf-feedback-loop capture      # capture feedback
+ *   npx rlhf-feedback-loop export-dpo   # export DPO training pairs
+ *   npx rlhf-feedback-loop stats        # feedback analytics
+ *   npx rlhf-feedback-loop rules        # generate prevention rules
+ *   npx rlhf-feedback-loop self-heal    # run self-healing check + fix
+ *   npx rlhf-feedback-loop prove        # run proof harness
+ *   npx rlhf-feedback-loop start-api    # start HTTPS API server
  */
 
 'use strict';
 
 const fs = require('fs');
 const path = require('path');
+const { execSync } = require('child_process');
 
 const COMMAND = process.argv[2];
 const CWD = process.cwd();
+const PKG_ROOT = path.join(__dirname, '..');
+
+function parseArgs(argv) {
+  const args = {};
+  argv.forEach((arg) => {
+    if (!arg.startsWith('--')) return;
+    const [key, ...rest] = arg.slice(2).split('=');
+    args[key] = rest.length ? rest.join('=') : true;
+  });
+  return args;
+}
+
+function pkgVersion() {
+  const pkg = JSON.parse(fs.readFileSync(path.join(PKG_ROOT, 'package.json'), 'utf8'));
+  return pkg.version;
+}
+
+// --- Platform auto-detection helpers ---
+
+const HOME = process.env.HOME || process.env.USERPROFILE || '';
+const MCP_SERVER_ENTRY = {
+  command: 'node',
+  args: [path.relative(CWD, path.join(PKG_ROOT, 'adapters', 'mcp', 'server-stdio.js'))],
+};
+
+function mergeMcpJson(filePath, label) {
+  if (!fs.existsSync(filePath)) {
+    const dir = path.dirname(filePath);
+    if (!fs.existsSync(dir)) fs.mkdirSync(dir, { recursive: true });
+    fs.writeFileSync(filePath, JSON.stringify({ mcpServers: { 'rlhf-feedback-loop': MCP_SERVER_ENTRY } }, null, 2) + '\n');
+    console.log(` ${label}: wrote ${path.relative(CWD, filePath)}`);
+    return true;
+  }
+  const existing = JSON.parse(fs.readFileSync(filePath, 'utf8'));
+  if (existing.mcpServers && existing.mcpServers['rlhf-feedback-loop']) return false;
+  existing.mcpServers = existing.mcpServers || {};
+  existing.mcpServers['rlhf-feedback-loop'] = MCP_SERVER_ENTRY;
+  fs.writeFileSync(filePath, JSON.stringify(existing, null, 2) + '\n');
+  console.log(` ${label}: updated ${path.relative(CWD, filePath)}`);
+  return true;
+}
+
+function detectPlatform(name, checks) {
+  for (const check of checks) {
+    try { if (check()) return true; } catch (_) {}
+  }
+  return false;
+}
+
+function whichExists(cmd) {
+  try { execSync(`which ${cmd}`, { stdio: 'pipe' }); return true; } catch (_) { return false; }
+}
+
+function setupClaude() {
+  return mergeMcpJson(path.join(CWD, '.mcp.json'), 'Claude Code');
+}
+
+function setupCodex() {
+  const configPath = path.join(HOME, '.codex', 'config.toml');
+  const block = `\n[mcp_servers.rlhf_feedback_loop]\ncommand = "node"\nargs = ["${MCP_SERVER_ENTRY.args[0]}"]\n`;
+  if (!fs.existsSync(configPath)) {
+    fs.mkdirSync(path.dirname(configPath), { recursive: true });
+    fs.writeFileSync(configPath, block);
+    console.log(' Codex: created ~/.codex/config.toml');
+    return true;
+  }
+  const content = fs.readFileSync(configPath, 'utf8');
+  if (content.includes('[mcp_servers.rlhf_feedback_loop]')) return false;
+  fs.appendFileSync(configPath, block);
+  console.log(' Codex: appended MCP server to ~/.codex/config.toml');
+  return true;
+}
+
+function setupGemini() {
+  const settingsPath = path.join(HOME, '.gemini', 'settings.json');
+  if (fs.existsSync(settingsPath)) {
+    const settings = JSON.parse(fs.readFileSync(settingsPath, 'utf8'));
+    if (settings.mcpServers && settings.mcpServers['rlhf-feedback-loop']) return false;
+    settings.mcpServers = settings.mcpServers || {};
+    settings.mcpServers['rlhf-feedback-loop'] = MCP_SERVER_ENTRY;
+    fs.writeFileSync(settingsPath, JSON.stringify(settings, null, 2) + '\n');
+    console.log(' Gemini: updated ~/.gemini/settings.json');
+    return true;
+  }
+  // Fallback: project-level .gemini/settings.json
+  return mergeMcpJson(path.join(CWD, '.gemini', 'settings.json'), 'Gemini');
+}
+
+function setupAmp() {
+  const skillDir = path.join(CWD, '.amp', 'skills', 'rlhf-feedback');
+  const destPath = path.join(skillDir, 'SKILL.md');
+  if (fs.existsSync(destPath)) return false;
+  const srcPath = path.join(PKG_ROOT, 'plugins', 'amp-skill', 'SKILL.md');
+  if (!fs.existsSync(srcPath)) return false;
+  fs.mkdirSync(skillDir, { recursive: true });
+  fs.copyFileSync(srcPath, destPath);
+  console.log(' Amp: installed .amp/skills/rlhf-feedback/SKILL.md');
+  return true;
+}
+
+function setupCursor() {
+  return mergeMcpJson(path.join(CWD, '.cursor', 'mcp.json'), 'Cursor');
+}
 
 function init() {
   const rlhfDir = path.join(CWD, '.rlhf');
 
-  // Create directory
   if (!fs.existsSync(rlhfDir)) {
     fs.mkdirSync(rlhfDir, { recursive: true });
     console.log('Created .rlhf/');
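`parseArgs`, added in the hunk above, only recognizes `--key=value` and bare `--flag` tokens. The sketch below restates the function from the diff and runs it on illustrative sample flags to show two consequences: a value may itself contain `=`, and a space-separated value such as `--tags testing` is not attached to its flag.

```javascript
'use strict';

// Restated from bin/cli.js as shown in this diff.
function parseArgs(argv) {
  const args = {};
  argv.forEach((arg) => {
    if (!arg.startsWith('--')) return;               // positional tokens are ignored
    const [key, ...rest] = arg.slice(2).split('=');
    args[key] = rest.length ? rest.join('=') : true; // rejoin so values keep their own '='
  });
  return args;
}

// Sample input is illustrative only.
const parsed = parseArgs(['--feedback=down', '--context=a=b', '--summary', '--tags', 'testing']);
console.log(parsed);
```

With this parser, `--tags testing` yields `tags: true` and silently drops `testing`, so callers need the `--tags=testing` form.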
@@ -27,130 +135,236 @@ function init() {
     console.log('.rlhf/ already exists — updating config');
   }
 
-  // Write config.json
   const config = {
-    version:
+    version: pkgVersion(),
     apiUrl: process.env.RLHF_API_URL || 'http://localhost:3000',
     logPath: '.rlhf/feedback-log.jsonl',
     memoryPath: '.rlhf/memory-log.jsonl',
     createdAt: new Date().toISOString(),
   };
 
-
-  fs.writeFileSync(configPath, JSON.stringify(config, null, 2) + '\n');
+  fs.writeFileSync(path.join(rlhfDir, 'config.json'), JSON.stringify(config, null, 2) + '\n');
   console.log('Wrote .rlhf/config.json');
 
-//
-
-/**
- * Standalone feedback capture script — created by npx rlhf-feedback-loop init
- * Full version: https://github.com/IgorGanapolsky/rlhf-feedback-loop
- *
- * Usage:
- * node .rlhf/capture-feedback.js --feedback=up --context="that worked great" --tags="testing"
- * node .rlhf/capture-feedback.js --feedback=down --context="missed edge case" --what-went-wrong="..." --what-to-change="..."
- */
+  // Always create .mcp.json (project-level MCP config used by Claude, Codex, Cursor)
+  mergeMcpJson(path.join(CWD, '.mcp.json'), 'MCP');
 
-
+  // Auto-detect and configure platform-specific locations
+  console.log('');
+  console.log('Detecting platforms...');
+  let configured = 0;
 
-const
-
-
+  const platforms = [
+    { name: 'Codex', detect: [() => whichExists('codex'), () => fs.existsSync(path.join(HOME, '.codex'))], setup: setupCodex },
+    { name: 'Gemini', detect: [() => whichExists('gemini'), () => fs.existsSync(path.join(HOME, '.gemini'))], setup: setupGemini },
+    { name: 'Amp', detect: [() => whichExists('amp'), () => fs.existsSync(path.join(HOME, '.amp'))], setup: setupAmp },
+    { name: 'Cursor', detect: [() => fs.existsSync(path.join(HOME, '.cursor', 'mcp.json')), () => fs.existsSync(path.join(CWD, '.cursor'))], setup: setupCursor },
+  ];
 
-const
-
-const
+  for (const p of platforms) {
+    if (detectPlatform(p.name, p.detect)) {
+      const didSetup = p.setup();
+      if (didSetup) configured++;
+      else console.log(` ${p.name}: already configured`);
+    }
+  }
 
-
-const
-
-
-
-args[key] = rest.length ? rest.join('=') : true;
-});
-return args;
-}
+  // ChatGPT — cannot be automated
+  const chatgptSpec = path.join(PKG_ROOT, 'adapters', 'chatgpt', 'openapi.yaml');
+  if (fs.existsSync(chatgptSpec)) {
+    console.log(` ChatGPT: import ${path.relative(CWD, chatgptSpec)} in GPT Builder > Actions`);
+  }
 
-
-const signal = args.feedback || args.signal;
+  if (configured === 0) console.log(' All detected platforms already configured.');
 
-
-
-
-
+  // .gitignore
+  const gitignorePath = path.join(CWD, '.gitignore');
+  if (fs.existsSync(gitignorePath)) {
+    const gitignore = fs.readFileSync(gitignorePath, 'utf8');
+    const entries = ['.rlhf/feedback-log.jsonl', '.rlhf/memory-log.jsonl'];
+    const missing = entries.filter((e) => !gitignore.includes(e));
+    if (missing.length > 0) {
+      fs.appendFileSync(gitignorePath, '\n# RLHF local feedback data\n' + missing.join('\n') + '\n');
+      console.log('Updated .gitignore');
+    }
+  }
+
+  console.log('');
+  console.log(`rlhf-feedback-loop v${pkgVersion()} initialized.`);
+  console.log('Run: npx rlhf-feedback-loop help');
 }
 
-
+function capture() {
+  const args = parseArgs(process.argv.slice(3));
 
-
-
-
-
-
-
-
-tags: args.tags ? args.tags.split(',').map((t) => t.trim()) : [],
-timestamp: new Date().toISOString(),
-hostname: os.hostname(),
-};
+  // Delegate to the full engine
+  const { captureFeedback, analyzeFeedback, feedbackSummary, writePreventionRules } = require(path.join(PKG_ROOT, 'scripts', 'feedback-loop'));
+
+  if (args.stats) {
+    console.log(JSON.stringify(analyzeFeedback(), null, 2));
+    return;
+  }
 
-
-
+  if (args.summary) {
+    console.log(feedbackSummary(Number(args.recent || 20)));
+    return;
+  }
 
-//
-const
-
+  // Normalize signal with fuzzy matching (uses the full engine's normalize)
+  const captureScript = require(path.join(PKG_ROOT, '.claude', 'scripts', 'feedback', 'capture-feedback.js'));
+  // The capture-feedback.js runs as main when required directly, so we call via subprocess
+  const scriptArgs = process.argv.slice(3).join(' ');
+  try {
+    const output = execSync(
+      `node "${path.join(PKG_ROOT, '.claude', 'scripts', 'feedback', 'capture-feedback.js')}" ${scriptArgs}`,
+      { encoding: 'utf8', stdio: 'pipe', cwd: CWD }
+    );
+    process.stdout.write(output);
+  } catch (err) {
+    process.stderr.write(err.stderr || err.stdout || err.message);
+    process.exit(err.status || 1);
+  }
+}
 
-
-
-console.log(
-
+function stats() {
+  const { analyzeFeedback } = require(path.join(PKG_ROOT, 'scripts', 'feedback-loop'));
+  console.log(JSON.stringify(analyzeFeedback(), null, 2));
+}
+
+function summary() {
+  const args = parseArgs(process.argv.slice(3));
+  const { feedbackSummary } = require(path.join(PKG_ROOT, 'scripts', 'feedback-loop'));
+  console.log(feedbackSummary(Number(args.recent || 20)));
+}
 
-
-fs.writeFileSync(scriptPath, captureScript);
-// Make executable
+function exportDpo() {
   try {
-
-
-
+    const output = execSync(
+      `node "${path.join(PKG_ROOT, 'scripts', 'export-dpo-pairs.js')}"`,
+      { encoding: 'utf8', stdio: 'pipe', cwd: CWD }
+    );
+    process.stdout.write(output);
+  } catch (err) {
+    process.stderr.write(err.stderr || err.stdout || err.message);
+    process.exit(err.status || 1);
   }
-
+}
 
-
-const
-
-
-
-
-
-
-
-
+function rules() {
+  const args = parseArgs(process.argv.slice(3));
+  const { writePreventionRules } = require(path.join(PKG_ROOT, 'scripts', 'feedback-loop'));
+  const outPath = args.output || path.join(CWD, '.rlhf', 'prevention-rules.md');
+  const result = writePreventionRules(outPath, Number(args.min || 2));
+  console.log(`Wrote prevention rules to ${result.path}`);
+}
+
+function selfHeal() {
+  try {
+    const output = execSync(
+      `node "${path.join(PKG_ROOT, 'scripts', 'self-healing-check.js')}" && node "${path.join(PKG_ROOT, 'scripts', 'self-heal.js')}"`,
+      { encoding: 'utf8', stdio: 'inherit', cwd: CWD }
+    );
+  } catch (err) {
+    process.exit(err.status || 1);
   }
+}
 
-
-
-
-
-
+function prove() {
+  const args = parseArgs(process.argv.slice(3));
+  const target = args.target || 'adapters';
+  const script = path.join(PKG_ROOT, 'scripts', `prove-${target}.js`);
+  if (!fs.existsSync(script)) {
+    console.error(`Unknown proof target: ${target}`);
+    console.error('Available: adapters, automation, attribution, lancedb, data-quality, intelligence, loop-closure, training-export');
+    process.exit(1);
+  }
+  try {
+    execSync(`node "${script}"`, { encoding: 'utf8', stdio: 'inherit', cwd: CWD });
+  } catch (err) {
+    process.exit(err.status || 1);
|
|
285
|
+
}
|
|
286
|
+
}
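Review note on `prove()` above: the script path is built directly from the user-supplied `--target` value and only checked with `fs.existsSync`. A minimal sketch of validating the target against a fixed set first, assuming the eight names printed in the error message are the complete list; `PROOF_TARGETS` and `resolveProofScript` are hypothetical names, not part of the package:

```javascript
const path = require('node:path');

// Assumption: these are the targets listed in the CLI's own error message.
const PROOF_TARGETS = new Set([
  'adapters', 'automation', 'attribution', 'lancedb',
  'data-quality', 'intelligence', 'loop-closure', 'training-export',
]);

// Hypothetical helper: reject unknown targets before building any path,
// so values like "../../x" never reach path.join.
function resolveProofScript(pkgRoot, target = 'adapters') {
  if (!PROOF_TARGETS.has(target)) {
    throw new Error(`Unknown proof target: ${target}`);
  }
  return path.join(pkgRoot, 'scripts', `prove-${target}.js`);
}
```

Keeping the list in one place would also let the help text and the error message be generated from the same data.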
|
|
287
|
+
|
|
288
|
+
function serve() {
|
|
289
|
+
// Start MCP server over stdio — used by `claude mcp add`, `codex mcp add`, `gemini mcp add`
|
|
290
|
+
const mcpServer = path.join(PKG_ROOT, 'adapters', 'mcp', 'server-stdio.js');
|
|
291
|
+
require(mcpServer);
|
|
292
|
+
}
|
|
293
|
+
|
|
294
|
+
function startApi() {
|
|
295
|
+
const serverPath = path.join(PKG_ROOT, 'src', 'api', 'server.js');
|
|
296
|
+
try {
|
|
297
|
+
execSync(`node "${serverPath}"`, { stdio: 'inherit', cwd: CWD });
|
|
298
|
+
} catch (err) {
|
|
299
|
+
process.exit(err.status || 1);
|
|
300
|
+
}
|
|
136
301
|
}
|
|
137
302
|
|
|
138
303
|
function help() {
|
|
139
|
-
|
|
304
|
+
const v = pkgVersion();
|
|
305
|
+
console.log(`rlhf-feedback-loop v${v}`);
|
|
140
306
|
console.log('');
|
|
141
307
|
console.log('Commands:');
|
|
142
|
-
console.log(' init
|
|
143
|
-
console.log('
|
|
308
|
+
console.log(' init Scaffold .rlhf/ config + MCP server in current project');
|
|
309
|
+
console.log(' serve Start MCP server (stdio) — for claude/codex/gemini mcp add');
|
|
310
|
+
console.log(' capture [flags] Capture feedback (--feedback=up|down --context="..." --tags="...")');
|
|
311
|
+
console.log(' stats Show feedback analytics');
|
|
312
|
+
console.log(' summary Human-readable feedback summary');
|
|
313
|
+
console.log(' export-dpo Export DPO training pairs (prompt/chosen/rejected JSONL)');
|
|
314
|
+
console.log(' rules Generate prevention rules from repeated failures');
|
|
315
|
+
console.log(' self-heal Run self-healing check and auto-fix');
|
|
316
|
+
console.log(' prove [--target=X] Run proof harness (adapters|automation|attribution|lancedb|...)');
|
|
317
|
+
console.log(' start-api Start the RLHF HTTPS API server');
|
|
318
|
+
console.log(' help Show this help message');
|
|
144
319
|
console.log('');
|
|
145
320
|
console.log('Examples:');
|
|
146
321
|
console.log(' npx rlhf-feedback-loop init');
|
|
147
|
-
console.log('
|
|
322
|
+
console.log(' npx rlhf-feedback-loop capture --feedback=up --context="all tests pass"');
|
|
323
|
+
console.log(' npx rlhf-feedback-loop capture --feedback=down --context="broke prod" --what-went-wrong="no tests"');
|
|
324
|
+
console.log(' npx rlhf-feedback-loop export-dpo');
|
|
325
|
+
console.log(' npx rlhf-feedback-loop stats');
|
|
326
|
+
console.log('');
|
|
327
|
+
console.log('MCP install (one command per platform):');
|
|
328
|
+
console.log(' claude mcp add rlhf -- npx -y rlhf-feedback-loop serve');
|
|
329
|
+
console.log(' codex mcp add rlhf -- npx -y rlhf-feedback-loop serve');
|
|
330
|
+
console.log(' gemini mcp add rlhf -- npx -y rlhf-feedback-loop serve');
|
|
148
331
|
}
|
|
149
332
|
|
|
150
333
|
switch (COMMAND) {
|
|
151
334
|
case 'init':
|
|
152
335
|
init();
|
|
153
336
|
break;
|
|
337
|
+
case 'serve':
|
|
338
|
+
case 'mcp':
|
|
339
|
+
serve();
|
|
340
|
+
break;
|
|
341
|
+
case 'capture':
|
|
342
|
+
case 'feedback':
|
|
343
|
+
capture();
|
|
344
|
+
break;
|
|
345
|
+
case 'stats':
|
|
346
|
+
stats();
|
|
347
|
+
break;
|
|
348
|
+
case 'summary':
|
|
349
|
+
summary();
|
|
350
|
+
break;
|
|
351
|
+
case 'export-dpo':
|
|
352
|
+
case 'dpo':
|
|
353
|
+
exportDpo();
|
|
354
|
+
break;
|
|
355
|
+
case 'rules':
|
|
356
|
+
rules();
|
|
357
|
+
break;
|
|
358
|
+
case 'self-heal':
|
|
359
|
+
selfHeal();
|
|
360
|
+
break;
|
|
361
|
+
case 'prove':
|
|
362
|
+
prove();
|
|
363
|
+
break;
|
|
364
|
+
case 'start-api':
|
|
365
|
+
case 'serve':
|
|
366
|
+
startApi();
|
|
367
|
+
break;
|
|
154
368
|
case 'help':
|
|
155
369
|
case '--help':
|
|
156
370
|
case '-h':
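Review note on the `switch` above: the label `case 'serve':` appears twice, once before `serve()` and once alongside `'start-api'`. JavaScript evaluates case clauses in order and runs the first match, so the second label can never be reached and `serve` always starts the MCP server. A small sketch of that behaviour; `route` is a hypothetical stand-in that returns names instead of calling the real handlers:

```javascript
// Hypothetical stand-in for the CLI dispatch, reduced to the two clauses
// that share the 'serve' label.
function route(command) {
  switch (command) {
    case 'serve':
    case 'mcp':
      return 'serve';
    case 'start-api':
    case 'serve': // duplicate label: the first 'serve' clause always wins
      return 'startApi';
    default:
      return 'help';
  }
}
```

Here `route('serve')` returns `'serve'` and `route('start-api')` returns `'startApi'`, so only the `start-api` spelling reaches the API server.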
|
package/config/mcp-allowlists.json
CHANGED
|
@@ -2,6 +2,7 @@
|
|
|
2
2
|
"version": 1,
|
|
3
3
|
"profiles": {
|
|
4
4
|
"default": [
|
|
5
|
+
"recall",
|
|
5
6
|
"capture_feedback",
|
|
6
7
|
"feedback_summary",
|
|
7
8
|
"feedback_stats",
|
|
@@ -14,6 +15,7 @@
|
|
|
14
15
|
"context_provenance"
|
|
15
16
|
],
|
|
16
17
|
"readonly": [
|
|
18
|
+
"recall",
|
|
17
19
|
"feedback_summary",
|
|
18
20
|
"feedback_stats",
|
|
19
21
|
"list_intents",
|
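The hunk above adds `recall` to both the `default` and `readonly` profiles. A minimal sketch of how a per-profile allowlist like this could be consulted, assuming a simple membership check; `isToolAllowed` is a hypothetical helper, the package's actual policy code may differ, and the object below copies only the entries visible in this diff:

```javascript
// Partial copy of the profiles shown in the hunk; the real file has more entries.
const allowlists = {
  version: 1,
  profiles: {
    default: ['recall', 'capture_feedback', 'feedback_summary', 'feedback_stats'],
    readonly: ['recall', 'feedback_summary', 'feedback_stats', 'list_intents'],
  },
};

// Hypothetical helper: a tool is allowed only if the profile exists
// and lists the tool by name.
function isToolAllowed(config, profile, tool) {
  const allowed = config.profiles[profile];
  return Array.isArray(allowed) && allowed.includes(tool);
}
```

Under this reading, `recall` becomes callable from a read-only session while `capture_feedback` stays restricted to `default`.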
package/package.json
CHANGED
|
@@ -1,7 +1,15 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "rlhf-feedback-loop",
|
|
3
|
-
"version": "0.
|
|
4
|
-
"description": "
|
|
3
|
+
"version": "0.6.0",
|
|
4
|
+
"description": "Make your AI agent learn from mistakes. Capture thumbs up/down feedback, block repeated failures, export DPO training data. Works with ChatGPT, Claude, Codex, Gemini, Amp.",
|
|
5
|
+
"homepage": "https://github.com/IgorGanapolsky/rlhf-feedback-loop#readme",
|
|
6
|
+
"repository": {
|
|
7
|
+
"type": "git",
|
|
8
|
+
"url": "https://github.com/IgorGanapolsky/rlhf-feedback-loop.git"
|
|
9
|
+
},
|
|
10
|
+
"bugs": {
|
|
11
|
+
"url": "https://github.com/IgorGanapolsky/rlhf-feedback-loop/issues"
|
|
12
|
+
},
|
|
5
13
|
"main": "scripts/feedback-loop.js",
|
|
6
14
|
"bin": {
|
|
7
15
|
"rlhf-feedback-loop": "./bin/cli.js"
|
|
@@ -25,8 +33,8 @@
|
|
|
25
33
|
"test:schema": "node scripts/feedback-schema.js --test",
|
|
26
34
|
"test:loop": "node scripts/feedback-loop.js --test",
|
|
27
35
|
"test:dpo": "node scripts/export-dpo-pairs.js --test",
|
|
28
|
-
"test:api": "node --test tests/api-server.test.js tests/api-auth-config.test.js tests/mcp-server.test.js tests/adapters.test.js tests/openapi-parity.test.js tests/budget-guard.test.js tests/contextfs.test.js tests/mcp-policy.test.js tests/subagent-profiles.test.js tests/intent-router.test.js tests/rubric-engine.test.js tests/self-healing-check.test.js tests/self-heal.test.js tests/feedback-schema.test.js tests/thompson-sampling.test.js tests/feedback-sequences.test.js tests/diversity-tracking.test.js tests/vector-store.test.js tests/feedback-attribution.test.js tests/hybrid-feedback-context.test.js tests/loop-closure.test.js tests/code-reasoning.test.js",
|
|
29
|
-
"test:proof": "node --test tests/prove-adapters.test.js tests/prove-automation.test.js",
|
|
36
|
+
"test:api": "node --test tests/api-server.test.js tests/api-auth-config.test.js tests/mcp-server.test.js tests/adapters.test.js tests/openapi-parity.test.js tests/budget-guard.test.js tests/contextfs.test.js tests/mcp-policy.test.js tests/subagent-profiles.test.js tests/intent-router.test.js tests/rubric-engine.test.js tests/self-healing-check.test.js tests/self-heal.test.js tests/feedback-schema.test.js tests/thompson-sampling.test.js tests/feedback-sequences.test.js tests/diversity-tracking.test.js tests/vector-store.test.js tests/feedback-attribution.test.js tests/hybrid-feedback-context.test.js tests/loop-closure.test.js tests/code-reasoning.test.js tests/feedback-loop.test.js tests/feedback-inbox-read.test.js tests/feedback-to-memory.test.js",
|
|
37
|
+
"test:proof": "node --test tests/prove-adapters.test.js tests/prove-automation.test.js tests/prove-attribution.test.js tests/prove-lancedb.test.js tests/prove-data-quality.test.js tests/prove-intelligence.test.js tests/prove-loop-closure.test.js tests/prove-subway-upgrades.test.js tests/prove-training-export.test.js",
|
|
30
38
|
"test:rlaif": "node --test tests/rlaif-self-audit.test.js tests/dpo-optimizer.test.js tests/meta-policy.test.js",
|
|
31
39
|
"test:attribution": "node --test tests/feedback-attribution.test.js tests/hybrid-feedback-context.test.js",
|
|
32
40
|
"test:quality": "node --test tests/validate-feedback.test.js",
|
|
@@ -79,13 +87,23 @@
|
|
|
79
87
|
"claude",
|
|
80
88
|
"codex",
|
|
81
89
|
"gemini",
|
|
90
|
+
"chatgpt",
|
|
91
|
+
"amp",
|
|
92
|
+
"mcp",
|
|
93
|
+
"model-context-protocol",
|
|
82
94
|
"agent-evaluation",
|
|
83
|
-
"prompt-engineering"
|
|
95
|
+
"prompt-engineering",
|
|
96
|
+
"context-engineering",
|
|
97
|
+
"ai-safety",
|
|
98
|
+
"machine-learning",
|
|
99
|
+
"openapi",
|
|
100
|
+
"developer-tools"
|
|
84
101
|
],
|
|
85
102
|
"license": "MIT",
|
|
86
103
|
"dependencies": {
|
|
87
104
|
"@huggingface/transformers": "^3.8.1",
|
|
88
105
|
"@lancedb/lancedb": "^0.26.2",
|
|
89
|
-
"apache-arrow": "^18.1.0"
|
|
106
|
+
"apache-arrow": "^18.1.0",
|
|
107
|
+
"stripe": "^20.4.0"
|
|
90
108
|
}
|
|
91
109
|
}
|