devin-memento 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- devin_memento-0.1.0/PKG-INFO +309 -0
- devin_memento-0.1.0/README.md +294 -0
- devin_memento-0.1.0/devin_memento.egg-info/PKG-INFO +309 -0
- devin_memento-0.1.0/devin_memento.egg-info/SOURCES.txt +11 -0
- devin_memento-0.1.0/devin_memento.egg-info/dependency_links.txt +1 -0
- devin_memento-0.1.0/devin_memento.egg-info/entry_points.txt +3 -0
- devin_memento-0.1.0/devin_memento.egg-info/top_level.txt +3 -0
- devin_memento-0.1.0/harvest_devin.py +530 -0
- devin_memento-0.1.0/judge.py +129 -0
- devin_memento-0.1.0/mcp_server.py +442 -0
- devin_memento-0.1.0/pyproject.toml +37 -0
- devin_memento-0.1.0/setup.cfg +4 -0
- devin_memento-0.1.0/tests/test_memento.py +421 -0
|
@@ -0,0 +1,309 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: devin-memento
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: Memento — a Devin MCP server that gives Devin a nightly sleep cycle to self-improve its SKILL.md.
|
|
5
|
+
Author: xerxes-y
|
|
6
|
+
License: MIT
|
|
7
|
+
Project-URL: Homepage, https://github.com/xerxes-y/memento
|
|
8
|
+
Project-URL: Repository, https://github.com/xerxes-y/memento
|
|
9
|
+
Keywords: devin,mcp,skillopt,agent,skill-optimization
|
|
10
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
11
|
+
Classifier: Programming Language :: Python :: 3
|
|
12
|
+
Classifier: Topic :: Software Development :: Libraries
|
|
13
|
+
Requires-Python: >=3.10
|
|
14
|
+
Description-Content-Type: text/markdown
|
|
15
|
+
|
|
16
|
+
# memento
|
|
17
|
+
|
|
18
|
+
**Memento** integration for **Devin** (Cognition).
|
|
19
|
+
|
|
20
|
+
Gives Devin a nightly *sleep cycle*: reviews past sessions, mines recurring
|
|
21
|
+
patterns, proposes bounded edits to a long-term `SKILL.md`, and gates every
|
|
22
|
+
change with a held-out validation score — so only improvements that actually
|
|
23
|
+
make Devin better *at your work* get adopted.
|
|
24
|
+
|
|
25
|
+
> Built on [microsoft/SkillOpt](https://github.com/microsoft/SkillOpt).
|
|
26
|
+
|
|
27
|
+
---
|
|
28
|
+
|
|
29
|
+
## How it works
|
|
30
|
+
|
|
31
|
+
Devin does not write conversation transcripts to disk in a format
|
|
32
|
+
the sleep engine understands. `harvest_devin.py` bridges this by converting
|
|
33
|
+
every locally available source into Claude Code-compatible JSONL transcripts:
|
|
34
|
+
|
|
35
|
+
| Source | Where | What it contributes |
|
|
36
|
+
|---|---|---|
|
|
37
|
+
| **Devin transcripts** | `~/.local/share/devin/cli/transcripts/*.json` | Native ATIF-v1.7 sessions — real user↔agent turns |
|
|
38
|
+
| **Memories** | `~/.agentmemory/standalone.json` | Memories saved via memento's built-in `memory_save` tool (or the [agentmemory MCP server](https://github.com/rohitg00/agentmemory) if you run it) |
|
|
39
|
+
| **Skill files** | `.devin/skills/*/SKILL.md` | Skill trigger patterns and expected behavior |
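As a rough illustration of the conversion described above, the sketch below flattens one transcript-like JSON object into JSONL records. The input field names (`session_id`, `steps`, `source`, `message`) and the output record shape are assumptions made for this example; they are not taken from the ATIF-v1.7 format or from `harvest_devin.py`.

```python
import json


def transcript_to_jsonl(transcript: dict) -> str:
    """Flatten a transcript-like dict into one JSON object per line.

    Hypothetical input shape (an assumption, not the real ATIF schema):
        {"session_id": "...", "steps": [{"source": "user" | "agent", "message": "..."}]}
    """
    lines = []
    for step in transcript.get("steps", []):
        text = (step.get("message") or "").strip()
        if not text:
            continue  # skip turns that carry no text
        role = "user" if step.get("source") == "user" else "assistant"
        record = {
            "type": role,
            "sessionId": transcript.get("session_id", "unknown"),
            "message": {"role": role, "content": text},
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


sample = {
    "session_id": "demo-1",
    "steps": [
        {"source": "user", "message": "Fix the failing OrderService test"},
        {"source": "agent", "message": "Patched the null check; tests pass."},
        {"source": "agent", "message": ""},
    ],
}
print(transcript_to_jsonl(sample))
```

Writing one record per line keeps the output appendable, which suits a harvester that may run repeatedly over a growing set of sessions.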
|
|
40
|
+
|
|
41
|
+
Memory is **built in** — `memory_save`/`memory_recall` write the same
|
|
42
|
+
`standalone.json` the harvester reads, so no separate memory MCP is required (it
|
|
43
|
+
stays compatible with [agentmemory](https://github.com/rohitg00/agentmemory) if
|
|
44
|
+
you already use it).
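A minimal sketch of how a file-backed memory store of this kind could behave, assuming the JSON file holds a plain list of `{"title", "content"}` entries. That layout is a guess for illustration; the real `standalone.json` schema may differ, and the two functions below only borrow the tool names.

```python
import json
import tempfile
from pathlib import Path


def memory_save(store: Path, title: str, content: str) -> None:
    """Append one memory to a JSON file holding a list of entries (assumed layout)."""
    entries = json.loads(store.read_text(encoding="utf-8")) if store.exists() else []
    entries.append({"title": title, "content": content})
    store.parent.mkdir(parents=True, exist_ok=True)
    store.write_text(json.dumps(entries, indent=2), encoding="utf-8")


def memory_recall(store: Path, query: str = "", limit: int = 10) -> list:
    """Return up to `limit` entries whose title or content contains `query`."""
    if not store.exists():
        return []
    entries = json.loads(store.read_text(encoding="utf-8"))
    needle = query.lower()
    hits = [e for e in entries
            if needle in e["title"].lower() or needle in e["content"].lower()]
    return hits[:limit]


# Use a throwaway directory so the example never touches a real memory file.
store = Path(tempfile.mkdtemp()) / "standalone.json"
memory_save(store, "Build command", "Use mvn test for OrderService")
memory_save(store, "Style", "Prefer small commits")
print(memory_recall(store, query="mvn"))
```

Because both functions read and write the same file, anything saved this way is visible to any other reader of that path, which is the property the paragraph above relies on.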
|
|
45
|
+
Workspaces are **auto-detected** from the Devin registry (nothing to configure):
|
|
46
|
+
- Devin: `~/.config/Devin/User/workspaceStorage/*/workspace.json`
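Auto-detection over that glob pattern might look roughly like the sketch below. It assumes each `workspace.json` stores the project location as a `file://` URI under a `folder` key; that key name is an assumption for this example and is not confirmed by the source.

```python
import json
import tempfile
from pathlib import Path
from urllib.parse import unquote, urlparse


def detect_workspaces(storage_root: Path) -> list:
    """Collect workspace folders from `<storage_root>/*/workspace.json`.

    Assumes each file looks like {"folder": "file:///abs/path"} (hypothetical key).
    """
    found = []
    for meta in sorted(storage_root.glob("*/workspace.json")):
        try:
            data = json.loads(meta.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed entry: ignore it
        uri = data.get("folder", "") if isinstance(data, dict) else ""
        if uri.startswith("file://"):
            path = unquote(urlparse(uri).path)  # decode %20 and similar escapes
            if path not in found:
                found.append(path)
    return found


# Build a fake registry in a temp dir instead of reading the real one.
root = Path(tempfile.mkdtemp())
(root / "abc123").mkdir()
(root / "abc123" / "workspace.json").write_text('{"folder": "file:///home/me/my%20repo"}')
(root / "broken").mkdir()
(root / "broken" / "workspace.json").write_text("not json")
print(detect_workspaces(root))
```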
|
|
47
|
+
|
|
48
|
+
After `memento_adopt` the evolved skill is synced to
|
|
49
|
+
`.devin/skills/memento-learned/SKILL.md` automatically.
|
|
50
|
+
|
|
51
|
+
---
|
|
52
|
+
|
|
53
|
+
## Install
|
|
54
|
+
|
|
55
|
+
**Requirements:** Python ≥ 3.10, Git, Devin CLI.
|
|
56
|
+
|
|
57
|
+
```bash
|
|
58
|
+
git clone https://github.com/xerxes-y/memento.git
|
|
59
|
+
cd memento
|
|
60
|
+
bash install.sh
|
|
61
|
+
```
|
|
62
|
+
|
|
63
|
+
`install.sh` will:
|
|
64
|
+
1. Reuse an existing [microsoft/SkillOpt](https://github.com/microsoft/SkillOpt) checkout, or clone it to `<project-dir>/../SkillOpt` (or to the path given by `--skillopt-dir`)
|
|
65
|
+
2. Install `skillopt_sleep` (editable) into your Python environment
|
|
66
|
+
3. Create `~/.memento/` (runtime data dir)
|
|
67
|
+
4. Seed `memento-learned/SKILL.md` into every detected Devin workspace (`.devin/skills/`)
|
|
68
|
+
5. Auto-register with **Devin CLI** MCP (`devin mcp add memento`) if the Devin CLI is on PATH
|
|
69
|
+
|
|
70
|
+
### Devin post-install
|
|
71
|
+
|
|
72
|
+
MCP registration is automatic if the Devin CLI is installed.
|
|
73
|
+
Optionally copy `devin-rules.snippet.md` to `.devin/rules/memento.md` in your workspace so Devin knows to offer the sleep tools.
|
|
74
|
+
|
|
75
|
+
### Windows
|
|
76
|
+
|
|
77
|
+
The runtime (`mcp_server.py` + `harvest_devin.py`) is cross-platform and
|
|
78
|
+
auto-detects Devin data under `%LOCALAPPDATA%\devin\cli\transcripts` — no extra flags needed.
|
|
79
|
+
|
|
80
|
+
`install.sh` is bash, so run it from **Git Bash** or **WSL**, or wire it up
|
|
81
|
+
manually: add the snippet from `mcp-config.example.json` to your Devin MCP config
|
|
82
|
+
(use `python` instead of `python3` and absolute Windows paths in `args`/`env`).
|
|
83
|
+
|
|
84
|
+
### Manual config
|
|
85
|
+
|
|
86
|
+
**Devin** — run once in a terminal:
|
|
87
|
+
|
|
88
|
+
```bash
|
|
89
|
+
devin mcp add memento \
|
|
90
|
+
--env "MEMENTO_ENGINE_REPO=<project-dir>/../SkillOpt" \
|
|
91
|
+
--env "MEMENTO_HOME=$HOME/.memento" \
|
|
92
|
+
-- python3 <project-dir>/mcp_server.py
|
|
93
|
+
```
|
|
94
|
+
|
|
95
|
+
---
|
|
96
|
+
|
|
97
|
+
## Add to Devin as an MCP extension (`uvx`, one line)
|
|
98
|
+
|
|
99
|
+
memento is published to PyPI as **[`devin-memento`](https://pypi.org/project/devin-memento/)**
|
|
100
|
+
with a `devin-memento` console entrypoint, so it runs as a self-contained package
|
|
101
|
+
with no clone or path wiring — ideal for Devin's **custom MCP** UI
|
|
102
|
+
(*Settings → Connections → MCP servers → Add a custom MCP → STDIO*) or the
|
|
103
|
+
`devin mcp add` CLI.
|
|
104
|
+
|
|
105
|
+
**STDIO config (Devin custom MCP):**
|
|
106
|
+
|
|
107
|
+
| Field | Value |
|
|
108
|
+
|---|---|
|
|
109
|
+
| Command | `uvx` |
|
|
110
|
+
| Args | `["devin-memento"]` |
|
|
111
|
+
| Env | `MEMENTO_ENGINE_REPO`, `MEMENTO_HOME` |
|
|
112
|
+
|
|
113
|
+
Or via the CLI:
|
|
114
|
+
|
|
115
|
+
```bash
|
|
116
|
+
devin mcp add memento \
|
|
117
|
+
--env "MEMENTO_ENGINE_REPO=$HOME/.local/share/SkillOpt" \
|
|
118
|
+
--env "MEMENTO_HOME=$HOME/.memento" \
|
|
119
|
+
-- uvx devin-memento
|
|
120
|
+
```
|
|
121
|
+
|
|
122
|
+
To run the unreleased `main` instead of the PyPI release, swap the args for
|
|
123
|
+
`["--from", "git+https://github.com/xerxes-y/memento", "devin-memento"]`.
|
|
124
|
+
|
|
125
|
+
Maintainers cut a release with:
|
|
126
|
+
|
|
127
|
+
```bash
|
|
128
|
+
python3 -m build && python3 -m twine upload dist/*
|
|
129
|
+
```
|
|
130
|
+
|
|
131
|
+
> The optimization engine (`skillopt_sleep`) is loaded at runtime from
|
|
132
|
+
> `MEMENTO_ENGINE_REPO` (a local SkillOpt clone), so it works inside the isolated
|
|
133
|
+
> `uvx` env without being on PyPI. Point `MEMENTO_ENGINE_REPO` at a clone (or run
|
|
134
|
+
> `install.sh` once to create one).
|
|
135
|
+
|
|
136
|
+
---
|
|
137
|
+
|
|
138
|
+
## Use
|
|
139
|
+
|
|
140
|
+
Ask Devin:
|
|
141
|
+
|
|
142
|
+
> *"run the sleep cycle"*, *"what did the last sleep propose?"*, *"adopt it"*
|
|
143
|
+
|
|
144
|
+
Or call tools directly:
|
|
145
|
+
|
|
146
|
+
| Tool | What it does |
|
|
147
|
+
|---|---|
|
|
148
|
+
| `memento_auto` | **fully automatic**: runs a cycle, auto-adopts the proposal if it passes the validation gate, and returns the `SKILL.md` diff report |
|
|
149
|
+
| `memento_status` | nights run so far + latest staged proposal |
|
|
150
|
+
| `memento_dry_run` | preview cycle — no staging, no changes |
|
|
151
|
+
| `memento_run` | full cycle; stages a proposal for your review |
|
|
152
|
+
| `memento_adopt` | apply the staged proposal; syncs skill to workspace |
|
|
153
|
+
| `memento_harvest` | debug: list the recurring tasks mined |
|
|
154
|
+
| `memory_save` | persist a memory (`title` + `content`) to the built-in store |
|
|
155
|
+
| `memory_recall` | list/search saved memories (optional `query`, `limit`) |
|
|
156
|
+
|
|
157
|
+
Each tool accepts:
|
|
158
|
+
|
|
159
|
+
| Argument | Values | Default |
|
|
160
|
+
|---|---|---|
|
|
161
|
+
| `project` | abs path | cwd |
|
|
162
|
+
| `backend` | `mock` / `claude` / `codex` | `mock` |
|
|
163
|
+
| `scope` | `invoked` / `all` | `invoked` |
|
|
164
|
+
|
|
165
|
+
`mock` is free (no API calls). For real LLM optimization:
|
|
166
|
+
- `backend: "claude"` → set `ANTHROPIC_API_KEY`
|
|
167
|
+
- `backend: "codex"` → set `OPENAI_API_KEY`
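For callers that talk to the server directly, the arguments above travel inside a JSON-RPC request. The sketch below builds one, assuming the server follows the standard MCP `tools/call` request shape; the helper name `tool_call` and the request id are illustrative.

```python
import json


def tool_call(request_id: int, name: str, **arguments) -> str:
    """Serialize an MCP `tools/call` request as one line of JSON-RPC 2.0."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# A preview cycle on the free mock backend, using the documented argument names.
line = tool_call(3, "memento_dry_run",
                 project="/path/to/repo", backend="mock", scope="invoked")
print(line)
```

Each request is a single line, so it can be piped to a stdio server after the `initialize` message.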
|
|
168
|
+
|
|
169
|
+
---
|
|
170
|
+
|
|
171
|
+
## Run it fully automatically
|
|
172
|
+
|
|
173
|
+
`memento_auto` runs a cycle **and** adopts the result in one step, gated by the
|
|
174
|
+
engine's held-out validation (plus an optional `MEMENTO_AUTO_ADOPT_MIN_SCORE`
|
|
175
|
+
floor), then returns a before/after `SKILL.md` diff. Ask Devin *"auto-evolve the
|
|
176
|
+
skill"*, or schedule it to run unattended.
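The two-stage gate described above can be sketched as a small decision function. This is an illustration of the stated rule, not the code in `mcp_server.py`; in particular, treating an unparseable score as failing the floor is a conservative assumption of this sketch.

```python
from typing import Optional


def should_adopt(engine_accepted: bool, score: Optional[float],
                 min_score: Optional[str]) -> bool:
    """Decide whether an automatic run may adopt its proposal.

    engine_accepted: outcome of the engine's own held-out validation gate.
    score:           parsed validation score, or None if it could not be parsed.
    min_score:       raw MEMENTO_AUTO_ADOPT_MIN_SCORE value, or None/"" when unset.
    """
    if not engine_accepted:
        return False  # the engine's gate always applies
    if not min_score:
        return True   # no extra floor configured
    if score is None:
        return False  # assumption: an unknown score does not clear the floor
    return score >= float(min_score)


print(should_adopt(True, 0.82, None))    # floor unset, engine gate passed
print(should_adopt(True, 0.62, "0.7"))   # below the optional floor
print(should_adopt(False, 0.95, "0.7"))  # engine gate rejected the proposal
```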
|
|
177
|
+
|
|
178
|
+
**macOS (launchd) — nightly at 02:00:**
|
|
179
|
+
|
|
180
|
+
```bash
|
|
181
|
+
bash install.sh --schedule # uses first detected workspace
|
|
182
|
+
bash install.sh --schedule --schedule-time 03:30 --schedule-project /path/to/repo
|
|
183
|
+
```
|
|
184
|
+
|
|
185
|
+
This writes `~/Library/LaunchAgents/com.memento.plist` and loads it; logs
|
|
186
|
+
go to `~/.memento/memento-auto.log`. Remove with
|
|
187
|
+
`launchctl unload <plist> && rm <plist>`.
|
|
188
|
+
|
|
189
|
+
**Linux / cron** — point a cron entry at the standalone runner:
|
|
190
|
+
|
|
191
|
+
```cron
|
|
192
|
+
0 2 * * * python3 /path/to/mcp_server.py --auto --project /path/to/repo --backend mock
|
|
193
|
+
```
|
|
194
|
+
|
|
195
|
+
---
|
|
196
|
+
|
|
197
|
+
## Environment variables
|
|
198
|
+
|
|
199
|
+
| Variable | Default | Purpose |
|
|
200
|
+
|---|---|---|
|
|
201
|
+
| `MEMENTO_ENGINE_REPO` | `~/.local/share/SkillOpt` | Path to the SkillOpt repo |
|
|
202
|
+
| `MEMENTO_HOME` | `~/.memento` | Runtime data dir |
|
|
203
|
+
| `MEMENTO_WORKSPACES` | auto-detected | Colon-separated workspace paths |
|
|
204
|
+
| `MEMENTO_MANAGED_SKILL` | `memento-learned` | Skill name to evolve |
|
|
205
|
+
| `MEMENTO_MEMORY_PATH` | `~/.agentmemory/standalone.json` | Where `memory_save`/`memory_recall` store memories |
|
|
206
|
+
| `MEMENTO_AUTO_ADOPT_MIN_SCORE` | unset | Optional floor for `memento_auto`; skip adopt if the parsed validation score is below it (the engine's own gate still applies) |
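A short sketch of how these variables and their documented defaults could be resolved in Python. The function name `load_config` and the dictionary keys are invented for this example; only the variable names and defaults come from the table.

```python
import os
from pathlib import Path
from typing import Mapping


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve the documented variables, falling back to the documented defaults."""
    home = Path.home()
    raw_workspaces = env.get("MEMENTO_WORKSPACES", "")
    floor = env.get("MEMENTO_AUTO_ADOPT_MIN_SCORE")
    return {
        "engine_repo": Path(env.get("MEMENTO_ENGINE_REPO", home / ".local/share/SkillOpt")).expanduser(),
        "home": Path(env.get("MEMENTO_HOME", home / ".memento")).expanduser(),
        # Colon-separated list; None means "fall back to auto-detection".
        "workspaces": [p for p in raw_workspaces.split(":") if p] or None,
        "managed_skill": env.get("MEMENTO_MANAGED_SKILL", "memento-learned"),
        "memory_path": Path(env.get("MEMENTO_MEMORY_PATH", home / ".agentmemory/standalone.json")).expanduser(),
        # Unset or empty means no extra floor for memento_auto.
        "auto_adopt_min_score": float(floor) if floor else None,
    }


cfg = load_config({"MEMENTO_WORKSPACES": "/work/a:/work/b",
                   "MEMENTO_AUTO_ADOPT_MIN_SCORE": "0.7"})
print(cfg["workspaces"], cfg["auto_adopt_min_score"], cfg["managed_skill"])
```

Passing the environment in as a mapping keeps the function easy to exercise without mutating `os.environ`.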
|
|
207
|
+
|
|
208
|
+
---
|
|
209
|
+
|
|
210
|
+
## Verify (no Devin session needed)
|
|
211
|
+
|
|
212
|
+
Run the test suite (stdlib-only, no pytest required):
|
|
213
|
+
|
|
214
|
+
```bash
|
|
215
|
+
python3 -m unittest discover -s tests -v
|
|
216
|
+
```
|
|
217
|
+
|
|
218
|
+
It covers the harvest helpers, the Devin ATIF transcript path, the judge, the MCP
|
|
219
|
+
protocol, and the **microsoft/SkillOpt engine command contract**. The one
|
|
220
|
+
integration test that runs the real engine is skipped automatically unless
|
|
221
|
+
`skillopt_sleep` is installed (via `install.sh`).
|
|
222
|
+
|
|
223
|
+
Or smoke-test the MCP server's JSON-RPC directly:
|
|
224
|
+
|
|
225
|
+
```bash
|
|
226
|
+
export MEMENTO_ENGINE_REPO="$HOME/.local/share/SkillOpt"  # exported so python3, not only printf, sees it
|
|
227
|
+
printf '%s\n' \
|
|
228
|
+
'{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' \
|
|
229
|
+
'{"jsonrpc":"2.0","id":2,"method":"tools/list"}' \
|
|
230
|
+
| python3 mcp_server.py
|
|
231
|
+
```
|
|
232
|
+
|
|
233
|
+
---
|
|
234
|
+
|
|
235
|
+
## Project structure
|
|
236
|
+
|
|
237
|
+
```
|
|
238
|
+
memento/
|
|
239
|
+
├── mcp_server.py MCP server (stdlib-only, stdio) — Devin
|
|
240
|
+
├── harvest_devin.py Transcript generator (Devin ATIF-v1.7 + agentmemory + skills)
|
|
241
|
+
├── judge.py Reference judge — scores a reply against a rubric (validation gate)
|
|
242
|
+
├── fixtures/
|
|
243
|
+
│ └── devin_sample.json Sample ATIF transcript for offline testing
|
|
244
|
+
├── tests/
|
|
245
|
+
│ └── test_memento.py Test suite (harvest, Devin path, judge, MCP, engine contract)
|
|
246
|
+
├── blog-memento.html Walk-through / use-case blog (PO · QA · Developer)
|
|
247
|
+
├── mcp-config.example.json Devin MCP config snippet
|
|
248
|
+
├── devin-rules.snippet.md Copy to .devin/rules/memento.md
|
|
249
|
+
├── seed_skill/
|
|
250
|
+
│ └── SKILL.md Initial skill seed (replaced by memento_adopt)
|
|
251
|
+
├── install.sh One-shot installer (Devin auto-detected)
|
|
252
|
+
├── pyproject.toml Packaging: `devin-memento` console entrypoint (uvx/pip)
|
|
253
|
+
└── README.md
|
|
254
|
+
```
|
|
255
|
+
|
|
256
|
+
---
|
|
257
|
+
|
|
258
|
+
## Outcomes & the validation gate
|
|
259
|
+
|
|
260
|
+
SkillOpt only improves a skill **where tasks recur and have a checkable
|
|
261
|
+
correctness signal**. A bare transcript has neither, so `harvest_devin.py`
|
|
262
|
+
enriches Devin trajectories with two things and writes them to
|
|
263
|
+
`<data-dir>/outcomes.jsonl`:
|
|
264
|
+
|
|
265
|
+
- **`taskKey`** — a stable `<lang>:<intent>:<target>` grouping key (e.g.
|
|
266
|
+
`java:fix:orderservice`) so repeats of the same task collapse into one
|
|
267
|
+
recurring task the gate can replay.
|
|
268
|
+
- **an outcome envelope** — the checkable signal:
|
|
269
|
+
- **hard signal** when the agent recorded a test/build result:
|
|
270
|
+
`{"success": true, "verifier": "tests", "evidence": "BUILD SUCCESS",
|
|
271
|
+
"reference": {"repro": "rtk mvn test -Dtest=OrderServiceTest"}}`
|
|
272
|
+
- **deferred (judge)** when no hard signal exists:
|
|
273
|
+
`{"success": null, "verifier": "judge", "rubric": [...]}` — a rubric is
|
|
274
|
+
derived from the task so [`judge.py`](judge.py) (or the engine) can score the
|
|
275
|
+
replay instead.
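The two enrichments can be sketched as follows. The normalization rule in `make_task_key` and the way `taskKey` and the envelope are combined into one record are assumptions for illustration; the envelope fields mirror the examples above.

```python
import json
import re
from typing import Optional


def make_task_key(lang: str, intent: str, target: str) -> str:
    """Build a three-part grouping key, normalized to lowercase alphanumerics."""
    parts = [re.sub(r"[^a-z0-9]+", "", p.lower()) for p in (lang, intent, target)]
    return ":".join(parts)


def make_outcome(success: Optional[bool], evidence: str = "",
                 repro: str = "", rubric: Optional[list] = None) -> dict:
    """Return a hard-signal envelope when a result is known, else a deferred one."""
    if success is not None:
        return {"success": success, "verifier": "tests",
                "evidence": evidence, "reference": {"repro": repro}}
    return {"success": None, "verifier": "judge", "rubric": list(rubric or [])}


# One line of a hypothetical outcomes.jsonl (record layout is an assumption).
record = {
    "taskKey": make_task_key("Java", "fix", "OrderService"),
    "outcome": make_outcome(True, "BUILD SUCCESS", "rtk mvn test -Dtest=OrderServiceTest"),
}
print(json.dumps(record))
print(json.dumps(make_outcome(None, rubric=["Addresses OrderService"])))
```

Normalizing case and punctuation is what lets differently worded repeats of the same task land on the same key.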
|
|
276
|
+
|
|
277
|
+
Score a reply against a rubric:
|
|
278
|
+
|
|
279
|
+
```bash
|
|
280
|
+
echo "<candidate reply>" | python3 judge.py --rubric-inline '["Addresses OrderService", "Resolves the reported defect without introducing new errors"]'
|
|
281
|
+
# → 0.5
|
|
282
|
+
```
|
|
283
|
+
|
|
284
|
+
`judge.py` defaults to an offline keyword-coverage heuristic (no API key).
|
|
285
|
+
Set `MEMENTO_JUDGE=claude` (+ `ANTHROPIC_API_KEY`) for an LLM judge.
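To make "keyword coverage" concrete, here is one plausible reading of such a heuristic: a rubric item counts as satisfied when at least half of its longer words appear in the reply, and the score is the fraction of satisfied items. The threshold, the word-length filter, and the function name are assumptions of this sketch and may not match what `judge.py` does.

```python
import re


def keyword_coverage(reply: str, rubric: list) -> float:
    """Fraction of rubric items with at least half of their keywords in the reply."""
    if not rubric:
        return 0.0
    reply_words = set(re.findall(r"[a-z0-9]+", reply.lower()))
    satisfied = 0
    for item in rubric:
        # Keep words longer than three characters as a crude stop-word filter.
        keywords = {w for w in re.findall(r"[a-z0-9]+", item.lower()) if len(w) > 3}
        if keywords and len(keywords & reply_words) / len(keywords) >= 0.5:
            satisfied += 1
    return satisfied / len(rubric)


rubric = ["Addresses OrderService",
          "Resolves the reported defect without introducing new errors"]
reply = "The fix addresses the OrderService null check."
print(keyword_coverage(reply, rubric))
```

A heuristic like this needs no API key and is deterministic, at the cost of rewarding replies that merely echo the rubric's vocabulary.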
|
|
286
|
+
|
|
287
|
+
> **Reality check:** the hard-signal path only fires if Devin actually
|
|
288
|
+
> records test or build results in its transcripts. If it doesn't, every task
|
|
289
|
+
> falls to the `judge` branch — point `--devin-transcripts` at a real transcript
|
|
290
|
+
> dir and inspect `outcomes.jsonl` to find out which case you're in.
|
|
291
|
+
|
|
292
|
+
Try it on the bundled fixture:
|
|
293
|
+
|
|
294
|
+
```bash
|
|
295
|
+
python3 harvest_devin.py --devin-transcripts fixtures --out-dir /tmp/memento-test
|
|
296
|
+
cat /tmp/memento-test/outcomes.jsonl
|
|
297
|
+
```
|
|
298
|
+
|
|
299
|
+
---
|
|
300
|
+
|
|
301
|
+
## Contributing / upstream
|
|
302
|
+
|
|
303
|
+
This plugin is being contributed back to
|
|
304
|
+
[microsoft/SkillOpt](https://github.com/microsoft/SkillOpt) as
|
|
305
|
+
`plugins/devin/`. Bug reports and improvements are welcome here or upstream.
|
|
306
|
+
|
|
307
|
+
## License
|
|
308
|
+
|
|
309
|
+
MIT — same as microsoft/SkillOpt.
|
|
@@ -0,0 +1,294 @@
|
|
|
1
|
+
# memento
|
|
2
|
+
|
|
3
|
+
**Memento** integration for **Devin** (Cognition).
|
|
4
|
+
|
|
5
|
+
Gives Devin a nightly *sleep cycle*: reviews past sessions, mines recurring
|
|
6
|
+
patterns, proposes bounded edits to a long-term `SKILL.md`, and gates every
|
|
7
|
+
change with a held-out validation score — so only improvements that actually
|
|
8
|
+
make Devin better *at your work* get adopted.
|
|
9
|
+
|
|
10
|
+
> Built on [microsoft/SkillOpt](https://github.com/microsoft/SkillOpt).
|
|
11
|
+
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
## How it works
|
|
15
|
+
|
|
16
|
+
Devin does not write conversation transcripts to disk in a format
|
|
17
|
+
the sleep engine understands. `harvest_devin.py` bridges this by converting
|
|
18
|
+
every locally available source into Claude Code-compatible JSONL transcripts:
|
|
19
|
+
|
|
20
|
+
| Source | Where | What it contributes |
|
|
21
|
+
|---|---|---|
|
|
22
|
+
| **Devin transcripts** | `~/.local/share/devin/cli/transcripts/*.json` | Native ATIF-v1.7 sessions — real user↔agent turns |
|
|
23
|
+
| **Memories** | `~/.agentmemory/standalone.json` | Memories saved via memento's built-in `memory_save` tool (or the [agentmemory MCP server](https://github.com/rohitg00/agentmemory) if you run it) |
|
|
24
|
+
| **Skill files** | `.devin/skills/*/SKILL.md` | Skill trigger patterns and expected behavior |
|
|
25
|
+
|
|
26
|
+
Memory is **built in** — `memory_save`/`memory_recall` write the same
|
|
27
|
+
`standalone.json` the harvester reads, so no separate memory MCP is required (it
|
|
28
|
+
stays compatible with [agentmemory](https://github.com/rohitg00/agentmemory) if
|
|
29
|
+
you already use it).
|
|
30
|
+
Workspaces are **auto-detected** from the Devin registry (nothing to configure):
|
|
31
|
+
- Devin: `~/.config/Devin/User/workspaceStorage/*/workspace.json`
|
|
32
|
+
|
|
33
|
+
After `memento_adopt` the evolved skill is synced to
|
|
34
|
+
`.devin/skills/memento-learned/SKILL.md` automatically.
|
|
35
|
+
|
|
36
|
+
---
|
|
37
|
+
|
|
38
|
+
## Install
|
|
39
|
+
|
|
40
|
+
**Requirements:** Python ≥ 3.10, Git, Devin CLI.
|
|
41
|
+
|
|
42
|
+
```bash
|
|
43
|
+
git clone https://github.com/xerxes-y/memento.git
|
|
44
|
+
cd memento
|
|
45
|
+
bash install.sh
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
`install.sh` will:
|
|
49
|
+
1. Reuse an existing [microsoft/SkillOpt](https://github.com/microsoft/SkillOpt) checkout, or clone it to `<project-dir>/../SkillOpt` (or to the path given by `--skillopt-dir`)
|
|
50
|
+
2. Install `skillopt_sleep` (editable) into your Python environment
|
|
51
|
+
3. Create `~/.memento/` (runtime data dir)
|
|
52
|
+
4. Seed `memento-learned/SKILL.md` into every detected Devin workspace (`.devin/skills/`)
|
|
53
|
+
5. Auto-register with **Devin CLI** MCP (`devin mcp add memento`) if the Devin CLI is on PATH
|
|
54
|
+
|
|
55
|
+
### Devin post-install
|
|
56
|
+
|
|
57
|
+
MCP registration is automatic if the Devin CLI is installed.
|
|
58
|
+
Optionally copy `devin-rules.snippet.md` to `.devin/rules/memento.md` in your workspace so Devin knows to offer the sleep tools.
|
|
59
|
+
|
|
60
|
+
### Windows
|
|
61
|
+
|
|
62
|
+
The runtime (`mcp_server.py` + `harvest_devin.py`) is cross-platform and
|
|
63
|
+
auto-detects Devin data under `%LOCALAPPDATA%\devin\cli\transcripts` — no extra flags needed.
|
|
64
|
+
|
|
65
|
+
`install.sh` is bash, so run it from **Git Bash** or **WSL**, or wire it up
|
|
66
|
+
manually: add the snippet from `mcp-config.example.json` to your Devin MCP config
|
|
67
|
+
(use `python` instead of `python3` and absolute Windows paths in `args`/`env`).
|
|
68
|
+
|
|
69
|
+
### Manual config
|
|
70
|
+
|
|
71
|
+
**Devin** — run once in a terminal:
|
|
72
|
+
|
|
73
|
+
```bash
|
|
74
|
+
devin mcp add memento \
|
|
75
|
+
--env "MEMENTO_ENGINE_REPO=<project-dir>/../SkillOpt" \
|
|
76
|
+
--env "MEMENTO_HOME=$HOME/.memento" \
|
|
77
|
+
-- python3 <project-dir>/mcp_server.py
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
---
|
|
81
|
+
|
|
82
|
+
## Add to Devin as an MCP extension (`uvx`, one line)
|
|
83
|
+
|
|
84
|
+
memento is published to PyPI as **[`devin-memento`](https://pypi.org/project/devin-memento/)**
|
|
85
|
+
with a `devin-memento` console entrypoint, so it runs as a self-contained package
|
|
86
|
+
with no clone or path wiring — ideal for Devin's **custom MCP** UI
|
|
87
|
+
(*Settings → Connections → MCP servers → Add a custom MCP → STDIO*) or the
|
|
88
|
+
`devin mcp add` CLI.
|
|
89
|
+
|
|
90
|
+
**STDIO config (Devin custom MCP):**
|
|
91
|
+
|
|
92
|
+
| Field | Value |
|
|
93
|
+
|---|---|
|
|
94
|
+
| Command | `uvx` |
|
|
95
|
+
| Args | `["devin-memento"]` |
|
|
96
|
+
| Env | `MEMENTO_ENGINE_REPO`, `MEMENTO_HOME` |
|
|
97
|
+
|
|
98
|
+
Or via the CLI:
|
|
99
|
+
|
|
100
|
+
```bash
|
|
101
|
+
devin mcp add memento \
|
|
102
|
+
--env "MEMENTO_ENGINE_REPO=$HOME/.local/share/SkillOpt" \
|
|
103
|
+
--env "MEMENTO_HOME=$HOME/.memento" \
|
|
104
|
+
-- uvx devin-memento
|
|
105
|
+
```
|
|
106
|
+
|
|
107
|
+
To run the unreleased `main` instead of the PyPI release, swap the args for
|
|
108
|
+
`["--from", "git+https://github.com/xerxes-y/memento", "devin-memento"]`.
|
|
109
|
+
|
|
110
|
+
Maintainers cut a release with:
|
|
111
|
+
|
|
112
|
+
```bash
|
|
113
|
+
python3 -m build && python3 -m twine upload dist/*
|
|
114
|
+
```
|
|
115
|
+
|
|
116
|
+
> The optimization engine (`skillopt_sleep`) is loaded at runtime from
|
|
117
|
+
> `MEMENTO_ENGINE_REPO` (a local SkillOpt clone), so it works inside the isolated
|
|
118
|
+
> `uvx` env without being on PyPI. Point `MEMENTO_ENGINE_REPO` at a clone (or run
|
|
119
|
+
> `install.sh` once to create one).
|
|
120
|
+
|
|
121
|
+
---
|
|
122
|
+
|
|
123
|
+
## Use
|
|
124
|
+
|
|
125
|
+
Ask Devin:
|
|
126
|
+
|
|
127
|
+
> *"run the sleep cycle"*, *"what did the last sleep propose?"*, *"adopt it"*
|
|
128
|
+
|
|
129
|
+
Or call tools directly:
|
|
130
|
+
|
|
131
|
+
| Tool | What it does |
|
|
132
|
+
|---|---|
|
|
133
|
+
| `memento_auto` | **fully automatic**: runs a cycle, auto-adopts the proposal if it passes the validation gate, and returns the `SKILL.md` diff report |
|
|
134
|
+
| `memento_status` | nights run so far + latest staged proposal |
|
|
135
|
+
| `memento_dry_run` | preview cycle — no staging, no changes |
|
|
136
|
+
| `memento_run` | full cycle; stages a proposal for your review |
|
|
137
|
+
| `memento_adopt` | apply the staged proposal; syncs skill to workspace |
|
|
138
|
+
| `memento_harvest` | debug: list the recurring tasks mined |
|
|
139
|
+
| `memory_save` | persist a memory (`title` + `content`) to the built-in store |
|
|
140
|
+
| `memory_recall` | list/search saved memories (optional `query`, `limit`) |
|
|
141
|
+
|
|
142
|
+
Each tool accepts:
|
|
143
|
+
|
|
144
|
+
| Argument | Values | Default |
|
|
145
|
+
|---|---|---|
|
|
146
|
+
| `project` | abs path | cwd |
|
|
147
|
+
| `backend` | `mock` / `claude` / `codex` | `mock` |
|
|
148
|
+
| `scope` | `invoked` / `all` | `invoked` |
|
|
149
|
+
|
|
150
|
+
`mock` is free (no API calls). For real LLM optimization:
|
|
151
|
+
- `backend: "claude"` → set `ANTHROPIC_API_KEY`
|
|
152
|
+
- `backend: "codex"` → set `OPENAI_API_KEY`
|
|
153
|
+
|
|
154
|
+
---
|
|
155
|
+
|
|
156
|
+
## Run it fully automatically
|
|
157
|
+
|
|
158
|
+
`memento_auto` runs a cycle **and** adopts the result in one step, gated by the
|
|
159
|
+
engine's held-out validation (plus an optional `MEMENTO_AUTO_ADOPT_MIN_SCORE`
|
|
160
|
+
floor), then returns a before/after `SKILL.md` diff. Ask Devin *"auto-evolve the
|
|
161
|
+
skill"*, or schedule it to run unattended.
|
|
162
|
+
|
|
163
|
+
**macOS (launchd) — nightly at 02:00:**
|
|
164
|
+
|
|
165
|
+
```bash
|
|
166
|
+
bash install.sh --schedule # uses first detected workspace
|
|
167
|
+
bash install.sh --schedule --schedule-time 03:30 --schedule-project /path/to/repo
|
|
168
|
+
```
|
|
169
|
+
|
|
170
|
+
This writes `~/Library/LaunchAgents/com.memento.plist` and loads it; logs
|
|
171
|
+
go to `~/.memento/memento-auto.log`. Remove with
|
|
172
|
+
`launchctl unload <plist> && rm <plist>`.
|
|
173
|
+
|
|
174
|
+
**Linux / cron** — point a cron entry at the standalone runner:
|
|
175
|
+
|
|
176
|
+
```cron
|
|
177
|
+
0 2 * * * python3 /path/to/mcp_server.py --auto --project /path/to/repo --backend mock
|
|
178
|
+
```
|
|
179
|
+
|
|
180
|
+
---
|
|
181
|
+
|
|
182
|
+
## Environment variables
|
|
183
|
+
|
|
184
|
+
| Variable | Default | Purpose |
|
|
185
|
+
|---|---|---|
|
|
186
|
+
| `MEMENTO_ENGINE_REPO` | `~/.local/share/SkillOpt` | Path to the SkillOpt repo |
|
|
187
|
+
| `MEMENTO_HOME` | `~/.memento` | Runtime data dir |
|
|
188
|
+
| `MEMENTO_WORKSPACES` | auto-detected | Colon-separated workspace paths |
|
|
189
|
+
| `MEMENTO_MANAGED_SKILL` | `memento-learned` | Skill name to evolve |
|
|
190
|
+
| `MEMENTO_MEMORY_PATH` | `~/.agentmemory/standalone.json` | Where `memory_save`/`memory_recall` store memories |
|
|
191
|
+
| `MEMENTO_AUTO_ADOPT_MIN_SCORE` | unset | Optional floor for `memento_auto`; skip adopt if the parsed validation score is below it (the engine's own gate still applies) |
|
|
192
|
+
|
|
193
|
+
---
|
|
194
|
+
|
|
195
|
+
## Verify (no Devin session needed)
|
|
196
|
+
|
|
197
|
+
Run the test suite (stdlib-only, no pytest required):
|
|
198
|
+
|
|
199
|
+
```bash
|
|
200
|
+
python3 -m unittest discover -s tests -v
|
|
201
|
+
```
|
|
202
|
+
|
|
203
|
+
It covers the harvest helpers, the Devin ATIF transcript path, the judge, the MCP
|
|
204
|
+
protocol, and the **microsoft/SkillOpt engine command contract**. The one
|
|
205
|
+
integration test that runs the real engine is skipped automatically unless
|
|
206
|
+
`skillopt_sleep` is installed (via `install.sh`).
|
|
207
|
+
|
|
208
|
+
Or smoke-test the MCP server's JSON-RPC directly:
|
|
209
|
+
|
|
210
|
+
```bash
|
|
211
|
+
export MEMENTO_ENGINE_REPO="$HOME/.local/share/SkillOpt"  # exported so python3, not only printf, sees it
|
|
212
|
+
printf '%s\n' \
|
|
213
|
+
'{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' \
|
|
214
|
+
'{"jsonrpc":"2.0","id":2,"method":"tools/list"}' \
|
|
215
|
+
| python3 mcp_server.py
|
|
216
|
+
```
|
|
217
|
+
|
|
218
|
+
---
|
|
219
|
+
|
|
220
|
+
## Project structure
|
|
221
|
+
|
|
222
|
+
```
|
|
223
|
+
memento/
|
|
224
|
+
├── mcp_server.py MCP server (stdlib-only, stdio) — Devin
|
|
225
|
+
├── harvest_devin.py Transcript generator (Devin ATIF-v1.7 + agentmemory + skills)
|
|
226
|
+
├── judge.py Reference judge — scores a reply against a rubric (validation gate)
|
|
227
|
+
├── fixtures/
|
|
228
|
+
│ └── devin_sample.json Sample ATIF transcript for offline testing
|
|
229
|
+
├── tests/
|
|
230
|
+
│ └── test_memento.py Test suite (harvest, Devin path, judge, MCP, engine contract)
|
|
231
|
+
├── blog-memento.html Walk-through / use-case blog (PO · QA · Developer)
|
|
232
|
+
├── mcp-config.example.json Devin MCP config snippet
|
|
233
|
+
├── devin-rules.snippet.md Copy to .devin/rules/memento.md
|
|
234
|
+
├── seed_skill/
|
|
235
|
+
│ └── SKILL.md Initial skill seed (replaced by memento_adopt)
|
|
236
|
+
├── install.sh One-shot installer (Devin auto-detected)
|
|
237
|
+
├── pyproject.toml Packaging: `devin-memento` console entrypoint (uvx/pip)
|
|
238
|
+
└── README.md
|
|
239
|
+
```
|
|
240
|
+
|
|
241
|
+
---
|
|
242
|
+
|
|
243
|
+
## Outcomes & the validation gate
|
|
244
|
+
|
|
245
|
+
SkillOpt only improves a skill **where tasks recur and have a checkable
|
|
246
|
+
correctness signal**. A bare transcript has neither, so `harvest_devin.py`
|
|
247
|
+
enriches Devin trajectories with two things and writes them to
|
|
248
|
+
`<data-dir>/outcomes.jsonl`:
|
|
249
|
+
|
|
250
|
+
- **`taskKey`** — a stable `<lang>:<intent>:<target>` grouping key (e.g.
|
|
251
|
+
`java:fix:orderservice`) so repeats of the same task collapse into one
|
|
252
|
+
recurring task the gate can replay.
|
|
253
|
+
- **an outcome envelope** — the checkable signal:
|
|
254
|
+
- **hard signal** when the agent recorded a test/build result:
|
|
255
|
+
`{"success": true, "verifier": "tests", "evidence": "BUILD SUCCESS",
|
|
256
|
+
"reference": {"repro": "rtk mvn test -Dtest=OrderServiceTest"}}`
|
|
257
|
+
- **deferred (judge)** when no hard signal exists:
|
|
258
|
+
`{"success": null, "verifier": "judge", "rubric": [...]}` — a rubric is
|
|
259
|
+
derived from the task so [`judge.py`](judge.py) (or the engine) can score the
|
|
260
|
+
replay instead.
|
|
261
|
+
|
|
262
|
+
Score a reply against a rubric:
|
|
263
|
+
|
|
264
|
+
```bash
|
|
265
|
+
echo "<candidate reply>" | python3 judge.py --rubric-inline '["Addresses OrderService", "Resolves the reported defect without introducing new errors"]'
|
|
266
|
+
# → 0.5
|
|
267
|
+
```
|
|
268
|
+
|
|
269
|
+
`judge.py` defaults to an offline keyword-coverage heuristic (no API key).
|
|
270
|
+
Set `MEMENTO_JUDGE=claude` (+ `ANTHROPIC_API_KEY`) for an LLM judge.
|
|
271
|
+
|
|
272
|
+
> **Reality check:** the hard-signal path only fires if Devin actually
|
|
273
|
+
> records test or build results in its transcripts. If it doesn't, every task
|
|
274
|
+
> falls to the `judge` branch — point `--devin-transcripts` at a real transcript
|
|
275
|
+
> dir and inspect `outcomes.jsonl` to find out which case you're in.
|
|
276
|
+
|
|
277
|
+
Try it on the bundled fixture:
|
|
278
|
+
|
|
279
|
+
```bash
|
|
280
|
+
python3 harvest_devin.py --devin-transcripts fixtures --out-dir /tmp/memento-test
|
|
281
|
+
cat /tmp/memento-test/outcomes.jsonl
|
|
282
|
+
```
|
|
283
|
+
|
|
284
|
+
---
|
|
285
|
+
|
|
286
|
+
## Contributing / upstream
|
|
287
|
+
|
|
288
|
+
This plugin is being contributed back to
|
|
289
|
+
[microsoft/SkillOpt](https://github.com/microsoft/SkillOpt) as
|
|
290
|
+
`plugins/devin/`. Bug reports and improvements are welcome here or upstream.
|
|
291
|
+
|
|
292
|
+
## License
|
|
293
|
+
|
|
294
|
+
MIT — same as microsoft/SkillOpt.
|