devin-memento 0.1.0__tar.gz

@@ -0,0 +1,309 @@
1
+ Metadata-Version: 2.4
2
+ Name: devin-memento
3
+ Version: 0.1.0
4
+ Summary: Memento — a Devin MCP server that gives Devin a nightly sleep cycle to self-improve its SKILL.md.
5
+ Author: xerxes-y
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/xerxes-y/memento
8
+ Project-URL: Repository, https://github.com/xerxes-y/memento
9
+ Keywords: devin,mcp,skillopt,agent,skill-optimization
10
+ Classifier: License :: OSI Approved :: MIT License
11
+ Classifier: Programming Language :: Python :: 3
12
+ Classifier: Topic :: Software Development :: Libraries
13
+ Requires-Python: >=3.10
14
+ Description-Content-Type: text/markdown
15
+
16
+ # memento
17
+
18
+ **Memento** integration for **Devin** (Cognition).
19
+
20
+ Gives Devin a nightly *sleep cycle*: reviews past sessions, mines recurring
21
+ patterns, proposes bounded edits to a long-term `SKILL.md`, and gates every
22
+ change with a held-out validation score — so only improvements that actually
23
+ make Devin better *at your work* get adopted.
24
+
25
+ > Built on [microsoft/SkillOpt](https://github.com/microsoft/SkillOpt).
26
+
27
+ ---
28
+
29
+ ## How it works
30
+
31
+ Devin does not write conversation transcripts to disk in a format
32
+ the sleep engine understands. `harvest_devin.py` bridges this by converting
33
+ every locally available source into Claude Code-compatible JSONL transcripts:
34
+
35
+ | Source | Where | What it contributes |
36
+ |---|---|---|
37
+ | **Devin transcripts** | `~/.local/share/devin/cli/transcripts/*.json` | Native ATIF-v1.7 sessions — real user↔agent turns |
38
+ | **Memories** | `~/.agentmemory/standalone.json` | Memories saved via memento's built-in `memory_save` tool (or the [agentmemory MCP server](https://github.com/rohitg00/agentmemory) if you run it) |
39
+ | **Skill files** | `.devin/skills/*/SKILL.md` | Skill trigger patterns and expected behavior |
40
+
41
+ Memory is **built in** — `memory_save`/`memory_recall` write the same
42
+ `standalone.json` the harvester reads, so no separate memory MCP is required (it
43
+ stays compatible with [agentmemory](https://github.com/rohitg00/agentmemory) if
44
+ you already use it).
45
+ Workspaces are **auto-detected** from the Devin registry (nothing to configure):
46
+ - Devin: `~/.config/Devin/User/workspaceStorage/*/workspace.json`
47
+
48
+ After `memento_adopt` the evolved skill is synced to
49
+ `.devin/skills/memento-learned/SKILL.md` automatically.
50
+
51
+ ---
52
+
53
+ ## Install
54
+
55
+ **Requirements:** Python ≥ 3.10, Git, Devin CLI.
56
+
57
+ ```bash
58
+ git clone https://github.com/xerxes-y/memento.git
59
+ cd memento
60
+ bash install.sh
61
+ ```
62
+
63
+ `install.sh` will:
64
+ 1. Use or clone [microsoft/SkillOpt](https://github.com/microsoft/SkillOpt) to `<project-dir>/../SkillOpt` (or `--skillopt-dir`)
65
+ 2. Install `skillopt_sleep` (editable) into your Python environment
66
+ 3. Create `~/.memento/` (runtime data dir)
67
+ 4. Seed `memento-learned/SKILL.md` into every detected Devin workspace (`.devin/skills/`)
68
+ 5. Auto-register with **Devin CLI** MCP (`devin mcp add memento`) if the Devin CLI is on PATH
69
+
70
+ ### Devin post-install
71
+
72
+ MCP registration is automatic if the Devin CLI is installed.
73
+ Optionally copy `devin-rules.snippet.md` to `.devin/rules/memento.md` in your workspace so Devin knows to offer the sleep tools.
74
+
75
+ ### Windows
76
+
77
+ The runtime (`mcp_server.py` + `harvest_devin.py`) is cross-platform and
78
+ auto-detects Devin data under `%LOCALAPPDATA%\devin\cli\transcripts` — no extra flags needed.
79
+
80
+ `install.sh` is bash, so run it from **Git Bash** or **WSL**, or wire it up
81
+ manually: add the snippet from `mcp-config.example.json` to your Devin MCP config
82
+ (use `python` instead of `python3` and absolute Windows paths in `args`/`env`).
83
+
84
+ ### Manual config
85
+
86
+ **Devin** — run once in a terminal:
87
+
88
+ ```bash
89
+ devin mcp add memento \
90
+ --env "MEMENTO_ENGINE_REPO=<project-dir>/../SkillOpt" \
91
+ --env "MEMENTO_HOME=$HOME/.memento" \
92
+ -- python3 <project-dir>/mcp_server.py
93
+ ```
94
+
95
+ ---
96
+
97
+ ## Add to Devin as an MCP extension (`uvx`, one line)
98
+
99
+ memento is published to PyPI as **[`devin-memento`](https://pypi.org/project/devin-memento/)**
100
+ with a `devin-memento` console entrypoint, so it runs as a self-contained package
101
+ with no clone or path wiring — ideal for Devin's **custom MCP** UI
102
+ (*Settings → Connections → MCP servers → Add a custom MCP → STDIO*) or the
103
+ `devin mcp add` CLI.
104
+
105
+ **STDIO config (Devin custom MCP):**
106
+
107
+ | Field | Value |
108
+ |---|---|
109
+ | Command | `uvx` |
110
+ | Args | `["devin-memento"]` |
111
+ | Env | `MEMENTO_ENGINE_REPO`, `MEMENTO_HOME` |
112
+
113
+ Or via the CLI:
114
+
115
+ ```bash
116
+ devin mcp add memento \
117
+ --env "MEMENTO_ENGINE_REPO=$HOME/.local/share/SkillOpt" \
118
+ --env "MEMENTO_HOME=$HOME/.memento" \
119
+ -- uvx devin-memento
120
+ ```
121
+
122
+ To run the unreleased `main` instead of the PyPI release, swap the args for
123
+ `["--from", "git+https://github.com/xerxes-y/memento", "devin-memento"]`.
124
+
125
+ Maintainers cut a release with:
126
+
127
+ ```bash
128
+ python3 -m build && python3 -m twine upload dist/*
129
+ ```
130
+
131
+ > The optimization engine (`skillopt_sleep`) is loaded at runtime from
132
+ > `MEMENTO_ENGINE_REPO` (a local SkillOpt clone), so it works inside the isolated
133
+ > `uvx` env without being on PyPI. Point `MEMENTO_ENGINE_REPO` at a clone (or run
134
+ > `install.sh` once to create one).
135
+
136
+ ---
137
+
138
+ ## Use
139
+
140
+ Ask Devin:
141
+
142
+ > *"run the sleep cycle"*, *"what did the last sleep propose?"*, *"adopt it"*
143
+
144
+ Or call tools directly:
145
+
146
+ | Tool | What it does |
147
+ |---|---|
148
+ | `memento_auto` | **fully automatic** — run + auto-adopt above the validation gate, returns the SKILL.md diff report |
149
+ | `memento_status` | nights run so far + latest staged proposal |
150
+ | `memento_dry_run` | preview cycle — no staging, no changes |
151
+ | `memento_run` | full cycle; stages a proposal for your review |
152
+ | `memento_adopt` | apply the staged proposal; syncs skill to workspace |
153
+ | `memento_harvest` | debug: list the recurring tasks mined |
154
+ | `memory_save` | persist a memory (`title` + `content`) to the built-in store |
155
+ | `memory_recall` | list/search saved memories (optional `query`, `limit`) |
156
+
157
+ Each tool accepts:
158
+
159
+ | Argument | Values | Default |
160
+ |---|---|---|
161
+ | `project` | abs path | cwd |
162
+ | `backend` | `mock` / `claude` / `codex` | `mock` |
163
+ | `scope` | `invoked` / `all` | `invoked` |
164
+
165
+ `mock` is free (no API calls). For real LLM optimization:
166
+ - `backend: "claude"` → set `ANTHROPIC_API_KEY`
167
+ - `backend: "codex"` → set `OPENAI_API_KEY`
168
+
169
+ ---
170
+
171
+ ## Run it fully automatically
172
+
173
+ `memento_auto` runs a cycle **and** adopts the result in one step, gated by the
174
+ engine's held-out validation (plus an optional `MEMENTO_AUTO_ADOPT_MIN_SCORE`
175
+ floor), then returns a before/after `SKILL.md` diff. Ask Devin *"auto-evolve the
176
+ skill"*, or schedule it to run unattended.
177
+
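The two-stage gate described above (the engine's own held-out validation, then the optional `MEMENTO_AUTO_ADOPT_MIN_SCORE` floor) can be pictured with a minimal sketch. The function name, its parameters, and the choice to reject an unparsed score when a floor is set are assumptions made for illustration; they are not taken from `mcp_server.py`.

```python
import os
from typing import Mapping, Optional


def should_auto_adopt(engine_accepted: bool,
                      validation_score: Optional[float],
                      env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical gate: engine verdict first, then the optional score floor."""
    # The engine's own held-out validation always applies.
    if not engine_accepted:
        return False

    # The floor is optional; an unset or empty variable means "no extra floor".
    raw_floor = env.get("MEMENTO_AUTO_ADOPT_MIN_SCORE", "").strip()
    if not raw_floor:
        return True

    # Assumption: if a floor is configured but no score could be parsed,
    # the safe choice is to skip adoption.
    if validation_score is None:
        return False
    return validation_score >= float(raw_floor)
```

With this shape, leaving the variable unset keeps the behaviour identical to the engine's gate alone, and setting it can only make adoption stricter.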
178
+ **macOS (launchd) — nightly at 02:00:**
179
+
180
+ ```bash
181
+ bash install.sh --schedule # uses first detected workspace
182
+ bash install.sh --schedule --schedule-time 03:30 --schedule-project /path/to/repo
183
+ ```
184
+
185
+ This writes `~/Library/LaunchAgents/com.memento.plist` and loads it; logs
186
+ go to `~/.memento/memento-auto.log`. Remove with
187
+ `launchctl unload <plist> && rm <plist>`.
188
+
189
+ **Linux / cron** — point a cron entry at the standalone runner:
190
+
191
+ ```cron
192
+ 0 2 * * * python3 /path/to/mcp_server.py --auto --project /path/to/repo --backend mock
193
+ ```
194
+
195
+ ---
196
+
197
+ ## Environment variables
198
+
199
+ | Variable | Default | Purpose |
200
+ |---|---|---|
201
+ | `MEMENTO_ENGINE_REPO` | `~/.local/share/SkillOpt` | Path to the SkillOpt repo |
202
+ | `MEMENTO_HOME` | `~/.memento` | Runtime data dir |
203
+ | `MEMENTO_WORKSPACES` | auto-detected | Colon-separated workspace paths |
204
+ | `MEMENTO_MANAGED_SKILL` | `memento-learned` | Skill name to evolve |
205
+ | `MEMENTO_MEMORY_PATH` | `~/.agentmemory/standalone.json` | Where `memory_save`/`memory_recall` store memories |
206
+ | `MEMENTO_AUTO_ADOPT_MIN_SCORE` | unset | Optional floor for `memento_auto`; skip adopt if the parsed validation score is below it (the engine's own gate still applies) |
207
+
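As a rough illustration of how the defaults in the table could be resolved, the sketch below reads the variables from a mapping and falls back to the documented defaults. `MementoConfig` and `load_config` are hypothetical names, and treating an unset `MEMENTO_WORKSPACES` as `None` (meaning "auto-detect") is an assumption; the real server may resolve these differently.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Mapping, Optional


@dataclass
class MementoConfig:
    engine_repo: Path
    home: Path
    workspaces: Optional[List[Path]]  # None stands for "auto-detect"
    managed_skill: str
    memory_path: Path


def load_config(env: Mapping[str, str]) -> MementoConfig:
    """Resolve the documented variables, using the table's defaults when unset."""
    def path_var(name: str, default: str) -> Path:
        return Path(env.get(name) or default).expanduser()

    # Colon-separated list, as documented; empty entries are ignored.
    raw_workspaces = env.get("MEMENTO_WORKSPACES", "")
    workspaces = [Path(p).expanduser() for p in raw_workspaces.split(":") if p]

    return MementoConfig(
        engine_repo=path_var("MEMENTO_ENGINE_REPO", "~/.local/share/SkillOpt"),
        home=path_var("MEMENTO_HOME", "~/.memento"),
        workspaces=workspaces or None,
        managed_skill=env.get("MEMENTO_MANAGED_SKILL") or "memento-learned",
        memory_path=path_var("MEMENTO_MEMORY_PATH", "~/.agentmemory/standalone.json"),
    )
```

Calling `load_config(os.environ)` would give the effective settings for the current shell.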
208
+ ---
209
+
210
+ ## Verify (no Devin session needed)
211
+
212
+ Run the test suite (stdlib-only, no pytest required):
213
+
214
+ ```bash
215
+ python3 -m unittest discover -s tests -v
216
+ ```
217
+
218
+ It covers the harvest helpers, the Devin ATIF transcript path, the judge, the MCP
219
+ protocol, and the **microsoft/SkillOpt engine command contract**. The one
220
+ integration test that runs the real engine is skipped automatically unless
221
+ `skillopt_sleep` is installed (via `install.sh`).
222
+
223
+ Or smoke-test the MCP server's JSON-RPC directly:
224
+
225
+ ```bash
226
+ printf '%s\n' \
226
+ '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' \
227
+ '{"jsonrpc":"2.0","id":2,"method":"tools/list"}' \
228
+ | MEMENTO_ENGINE_REPO=~/.local/share/SkillOpt \
229
+ python3 mcp_server.py
231
+ ```
232
+
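The same smoke test can be driven from Python, which makes the replies easier to inspect. This is a sketch under two assumptions: that the server reads one JSON-RPC message per line on stdin, as the shell example suggests, and that it writes one JSON object per line on stdout. The helper names are hypothetical.

```python
import json
import subprocess
import sys
from typing import Any, Dict, List


def build_requests() -> List[Dict[str, Any]]:
    """The two messages used in the shell smoke test."""
    return [
        {"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {}},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    ]


def encode(requests: List[Dict[str, Any]]) -> str:
    """Serialize as newline-delimited JSON, one message per line."""
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in requests)


def smoke_test(server_path: str = "mcp_server.py") -> List[Dict[str, Any]]:
    """Pipe the requests into the server and parse each non-empty stdout line."""
    proc = subprocess.run(
        [sys.executable, server_path],
        input=encode(build_requests()),
        capture_output=True,
        text=True,
        timeout=30,
    )
    # Assumption: each reply is a single JSON object on its own line.
    return [json.loads(line) for line in proc.stdout.splitlines() if line.strip()]
```

Run `smoke_test()` from the repository root; a reply with `"id": 2` should list the tools from the Use section.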
233
+ ---
234
+
235
+ ## Project structure
236
+
237
+ ```
238
+ memento/
239
+ ├── mcp_server.py MCP server (stdlib-only, stdio) — Devin
240
+ ├── harvest_devin.py Transcript generator (Devin ATIF-v1.7 + agentmemory + skills)
241
+ ├── judge.py Reference judge — scores a reply against a rubric (validation gate)
242
+ ├── fixtures/
243
+ │ └── devin_sample.json Sample ATIF transcript for offline testing
244
+ ├── tests/
245
+ │ └── test_memento.py Test suite (harvest, Devin path, judge, MCP, engine contract)
246
+ ├── blog-memento.html Walk-through / use-case blog (PO · QA · Developer)
247
+ ├── mcp-config.example.json Devin MCP config snippet
248
+ ├── devin-rules.snippet.md Copy to .devin/rules/memento.md
249
+ ├── seed_skill/
250
+ │ └── SKILL.md Initial skill seed (replaced by memento_adopt)
251
+ ├── install.sh One-shot installer (Devin auto-detected)
252
+ ├── pyproject.toml Packaging — `devin-memento` console entrypoint (uvx/pip)
253
+ └── README.md
254
+ ```
255
+
256
+ ---
257
+
258
+ ## Outcomes & the validation gate
259
+
260
+ SkillOpt only improves a skill **where tasks recur and have a checkable
261
+ correctness signal**. A bare transcript has neither, so `harvest_devin.py`
262
+ enriches Devin trajectories with two things and writes them to
263
+ `<data-dir>/outcomes.jsonl`:
264
+
265
+ - **`taskKey`** — a stable `<lang>:<intent>:<target>` grouping key (e.g.
266
+ `java:fix:orderservice`) so repeats of the same task collapse into one
267
+ recurring task the gate can replay.
268
+ - **an outcome envelope** — the checkable signal:
269
+ - **hard signal** when the agent recorded a test/build result:
270
+ `{"success": true, "verifier": "tests", "evidence": "BUILD SUCCESS",
271
+ "reference": {"repro": "rtk mvn test -Dtest=OrderServiceTest"}}`
272
+ - **deferred (judge)** when no hard signal exists:
273
+ `{"success": null, "verifier": "judge", "rubric": [...]}` — a rubric is
274
+ derived from the task so [`judge.py`](judge.py) (or the engine) can score the
275
+ replay instead.
276
+
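To make the two enrichment fields concrete, here is a minimal sketch that builds a `taskKey` in the `<lang>:<intent>:<target>` shape and tells the two envelope variants apart. The helper names and the lower-casing rule are assumptions inferred from the `java:fix:orderservice` example; `harvest_devin.py` may derive the key differently.

```python
import json
from typing import Any, Dict


def make_task_key(lang: str, intent: str, target: str) -> str:
    """Hypothetical key builder: lower-cased parts joined with ':'."""
    return ":".join(part.strip().lower() for part in (lang, intent, target))


def classify_outcome(outcome: Dict[str, Any]) -> str:
    """Return 'deferred' for judge envelopes, 'hard' for recorded results."""
    if outcome.get("success") is None and outcome.get("verifier") == "judge":
        return "deferred"
    return "hard"


# The two envelope shapes quoted above, parsed from JSON.
hard = json.loads(
    '{"success": true, "verifier": "tests", "evidence": "BUILD SUCCESS",'
    ' "reference": {"repro": "rtk mvn test -Dtest=OrderServiceTest"}}'
)
deferred = json.loads('{"success": null, "verifier": "judge", "rubric": []}')
```

Because repeats of a task share one key, a dictionary keyed by `make_task_key(...)` is enough to collapse them into a single recurring task.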
277
+ Score a reply against a rubric:
278
+
279
+ ```bash
280
+ echo "<candidate reply>" | python3 judge.py --rubric-inline '["Addresses OrderService", "Resolves the reported defect without introducing new errors"]'
281
+ # → 0.5
282
+ ```
283
+
284
+ `judge.py` defaults to an offline keyword-coverage heuristic (no API key).
285
+ Set `MEMENTO_JUDGE=claude` (+ `ANTHROPIC_API_KEY`) for an LLM judge.
286
+
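For intuition about what a keyword-coverage heuristic does, the sketch below scores a reply by the fraction of each rubric item's keywords that appear in it, then averages over the items. This is an illustrative stand-in, not the algorithm in `judge.py`; the stopword list, the tokenizer, and the averaging rule are all assumptions.

```python
import re
from typing import List, Set

# Assumption: a small stopword list so filler words do not count as keywords.
_STOPWORDS = {"a", "an", "and", "of", "the", "to", "without"}


def keywords(text: str) -> Set[str]:
    """Lower-cased alphanumeric tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in _STOPWORDS}


def coverage_score(reply: str, rubric: List[str]) -> float:
    """Mean, over rubric items, of the share of item keywords found in the reply."""
    if not rubric:
        return 0.0
    reply_words = keywords(reply)
    item_scores = []
    for item in rubric:
        item_words = keywords(item)
        if not item_words:
            item_scores.append(0.0)
            continue
        item_scores.append(len(item_words & reply_words) / len(item_words))
    return sum(item_scores) / len(item_scores)
```

A heuristic like this rewards replies that mention the rubric's terms, which is cheap and offline but cannot tell whether the defect was actually fixed; that is the gap the LLM judge is meant to close.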
287
+ > **Reality check:** the hard-signal path only fires if Devin actually
288
+ > records test or build results in its transcripts. If it doesn't, every task
289
+ > falls to the `judge` branch — point `--devin-transcripts` at a real transcript
290
+ > dir and inspect `outcomes.jsonl` to find out which case you're in.
291
+
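One quick way to see which case applies is to tally the `verifier` field across `outcomes.jsonl`. The sketch below assumes each line is a JSON object carrying the envelope either at the top level or under an `"outcome"` key; that layout is a guess for illustration, so adjust the lookup to match the file you actually get.

```python
import json
from collections import Counter
from typing import Iterable


def count_verifiers(lines: Iterable[str]) -> Counter:
    """Tally the verifier recorded on each JSONL line."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumption: the envelope is top-level or nested under "outcome".
        envelope = record.get("outcome", record)
        if not isinstance(envelope, dict):
            envelope = record
        counts[envelope.get("verifier", "unknown")] += 1
    return counts
```

For example, `count_verifiers(open("/tmp/memento-test/outcomes.jsonl"))` returning only `judge` entries would mean no hard signals were found in the transcripts.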
292
+ Try it on the bundled fixture:
293
+
294
+ ```bash
295
+ python3 harvest_devin.py --devin-transcripts fixtures --out-dir /tmp/memento-test
296
+ cat /tmp/memento-test/outcomes.jsonl
297
+ ```
298
+
299
+ ---
300
+
301
+ ## Contributing / upstream
302
+
303
+ This plugin is being contributed back to
304
+ [microsoft/SkillOpt](https://github.com/microsoft/SkillOpt) as
305
+ `plugins/devin/`. Bug reports and improvements welcome here or upstream.
306
+
307
+ ## License
308
+
309
+ MIT — same as microsoft/SkillOpt.
@@ -0,0 +1,294 @@
1
+ # memento
2
+
3
+ **Memento** integration for **Devin** (Cognition).
4
+
5
+ Gives Devin a nightly *sleep cycle*: reviews past sessions, mines recurring
6
+ patterns, proposes bounded edits to a long-term `SKILL.md`, and gates every
7
+ change with a held-out validation score — so only improvements that actually
8
+ make Devin better *at your work* get adopted.
9
+
10
+ > Built on [microsoft/SkillOpt](https://github.com/microsoft/SkillOpt).
11
+
12
+ ---
13
+
14
+ ## How it works
15
+
16
+ Devin does not write conversation transcripts to disk in a format
17
+ the sleep engine understands. `harvest_devin.py` bridges this by converting
18
+ every locally available source into Claude Code-compatible JSONL transcripts:
19
+
20
+ | Source | Where | What it contributes |
21
+ |---|---|---|
22
+ | **Devin transcripts** | `~/.local/share/devin/cli/transcripts/*.json` | Native ATIF-v1.7 sessions — real user↔agent turns |
23
+ | **Memories** | `~/.agentmemory/standalone.json` | Memories saved via memento's built-in `memory_save` tool (or the [agentmemory MCP server](https://github.com/rohitg00/agentmemory) if you run it) |
24
+ | **Skill files** | `.devin/skills/*/SKILL.md` | Skill trigger patterns and expected behavior |
25
+
26
+ Memory is **built in** — `memory_save`/`memory_recall` write the same
27
+ `standalone.json` the harvester reads, so no separate memory MCP is required (it
28
+ stays compatible with [agentmemory](https://github.com/rohitg00/agentmemory) if
29
+ you already use it).
30
+ Workspaces are **auto-detected** from the Devin registry (nothing to configure):
31
+ - Devin: `~/.config/Devin/User/workspaceStorage/*/workspace.json`
32
+
33
+ After `memento_adopt` the evolved skill is synced to
34
+ `.devin/skills/memento-learned/SKILL.md` automatically.
35
+
36
+ ---
37
+
38
+ ## Install
39
+
40
+ **Requirements:** Python ≥ 3.10, Git, Devin CLI.
41
+
42
+ ```bash
43
+ git clone https://github.com/xerxes-y/memento.git
44
+ cd memento
45
+ bash install.sh
46
+ ```
47
+
48
+ `install.sh` will:
49
+ 1. Use or clone [microsoft/SkillOpt](https://github.com/microsoft/SkillOpt) to `<project-dir>/../SkillOpt` (or `--skillopt-dir`)
50
+ 2. Install `skillopt_sleep` (editable) into your Python environment
51
+ 3. Create `~/.memento/` (runtime data dir)
52
+ 4. Seed `memento-learned/SKILL.md` into every detected Devin workspace (`.devin/skills/`)
53
+ 5. Auto-register with **Devin CLI** MCP (`devin mcp add memento`) if the Devin CLI is on PATH
54
+
55
+ ### Devin post-install
56
+
57
+ MCP registration is automatic if the Devin CLI is installed.
58
+ Optionally copy `devin-rules.snippet.md` to `.devin/rules/memento.md` in your workspace so Devin knows to offer the sleep tools.
59
+
60
+ ### Windows
61
+
62
+ The runtime (`mcp_server.py` + `harvest_devin.py`) is cross-platform and
63
+ auto-detects Devin data under `%LOCALAPPDATA%\devin\cli\transcripts` — no extra flags needed.
64
+
65
+ `install.sh` is bash, so run it from **Git Bash** or **WSL**, or wire it up
66
+ manually: add the snippet from `mcp-config.example.json` to your Devin MCP config
67
+ (use `python` instead of `python3` and absolute Windows paths in `args`/`env`).
68
+
69
+ ### Manual config
70
+
71
+ **Devin** — run once in a terminal:
72
+
73
+ ```bash
74
+ devin mcp add memento \
75
+ --env "MEMENTO_ENGINE_REPO=<project-dir>/../SkillOpt" \
76
+ --env "MEMENTO_HOME=$HOME/.memento" \
77
+ -- python3 <project-dir>/mcp_server.py
78
+ ```
79
+
80
+ ---
81
+
82
+ ## Add to Devin as an MCP extension (`uvx`, one line)
83
+
84
+ memento is published to PyPI as **[`devin-memento`](https://pypi.org/project/devin-memento/)**
85
+ with a `devin-memento` console entrypoint, so it runs as a self-contained package
86
+ with no clone or path wiring — ideal for Devin's **custom MCP** UI
87
+ (*Settings → Connections → MCP servers → Add a custom MCP → STDIO*) or the
88
+ `devin mcp add` CLI.
89
+
90
+ **STDIO config (Devin custom MCP):**
91
+
92
+ | Field | Value |
93
+ |---|---|
94
+ | Command | `uvx` |
95
+ | Args | `["devin-memento"]` |
96
+ | Env | `MEMENTO_ENGINE_REPO`, `MEMENTO_HOME` |
97
+
98
+ Or via the CLI:
99
+
100
+ ```bash
101
+ devin mcp add memento \
102
+ --env "MEMENTO_ENGINE_REPO=$HOME/.local/share/SkillOpt" \
103
+ --env "MEMENTO_HOME=$HOME/.memento" \
104
+ -- uvx devin-memento
105
+ ```
106
+
107
+ To run the unreleased `main` instead of the PyPI release, swap the args for
108
+ `["--from", "git+https://github.com/xerxes-y/memento", "devin-memento"]`.
109
+
110
+ Maintainers cut a release with:
111
+
112
+ ```bash
113
+ python3 -m build && python3 -m twine upload dist/*
114
+ ```
115
+
116
+ > The optimization engine (`skillopt_sleep`) is loaded at runtime from
117
+ > `MEMENTO_ENGINE_REPO` (a local SkillOpt clone), so it works inside the isolated
118
+ > `uvx` env without being on PyPI. Point `MEMENTO_ENGINE_REPO` at a clone (or run
119
+ > `install.sh` once to create one).
120
+
121
+ ---
122
+
123
+ ## Use
124
+
125
+ Ask Devin:
126
+
127
+ > *"run the sleep cycle"*, *"what did the last sleep propose?"*, *"adopt it"*
128
+
129
+ Or call tools directly:
130
+
131
+ | Tool | What it does |
132
+ |---|---|
133
+ | `memento_auto` | **fully automatic** — run + auto-adopt above the validation gate, returns the SKILL.md diff report |
134
+ | `memento_status` | nights run so far + latest staged proposal |
135
+ | `memento_dry_run` | preview cycle — no staging, no changes |
136
+ | `memento_run` | full cycle; stages a proposal for your review |
137
+ | `memento_adopt` | apply the staged proposal; syncs skill to workspace |
138
+ | `memento_harvest` | debug: list the recurring tasks mined |
139
+ | `memory_save` | persist a memory (`title` + `content`) to the built-in store |
140
+ | `memory_recall` | list/search saved memories (optional `query`, `limit`) |
141
+
142
+ Each tool accepts:
143
+
144
+ | Argument | Values | Default |
145
+ |---|---|---|
146
+ | `project` | abs path | cwd |
147
+ | `backend` | `mock` / `claude` / `codex` | `mock` |
148
+ | `scope` | `invoked` / `all` | `invoked` |
149
+
150
+ `mock` is free (no API calls). For real LLM optimization:
151
+ - `backend: "claude"` → set `ANTHROPIC_API_KEY`
152
+ - `backend: "codex"` → set `OPENAI_API_KEY`
153
+
154
+ ---
155
+
156
+ ## Run it fully automatically
157
+
158
+ `memento_auto` runs a cycle **and** adopts the result in one step, gated by the
159
+ engine's held-out validation (plus an optional `MEMENTO_AUTO_ADOPT_MIN_SCORE`
160
+ floor), then returns a before/after `SKILL.md` diff. Ask Devin *"auto-evolve the
161
+ skill"*, or schedule it to run unattended.
162
+
163
+ **macOS (launchd) — nightly at 02:00:**
164
+
165
+ ```bash
166
+ bash install.sh --schedule # uses first detected workspace
167
+ bash install.sh --schedule --schedule-time 03:30 --schedule-project /path/to/repo
168
+ ```
169
+
170
+ This writes `~/Library/LaunchAgents/com.memento.plist` and loads it; logs
171
+ go to `~/.memento/memento-auto.log`. Remove with
172
+ `launchctl unload <plist> && rm <plist>`.
173
+
174
+ **Linux / cron** — point a cron entry at the standalone runner:
175
+
176
+ ```cron
177
+ 0 2 * * * python3 /path/to/mcp_server.py --auto --project /path/to/repo --backend mock
178
+ ```
179
+
180
+ ---
181
+
182
+ ## Environment variables
183
+
184
+ | Variable | Default | Purpose |
185
+ |---|---|---|
186
+ | `MEMENTO_ENGINE_REPO` | `~/.local/share/SkillOpt` | Path to the SkillOpt repo |
187
+ | `MEMENTO_HOME` | `~/.memento` | Runtime data dir |
188
+ | `MEMENTO_WORKSPACES` | auto-detected | Colon-separated workspace paths |
189
+ | `MEMENTO_MANAGED_SKILL` | `memento-learned` | Skill name to evolve |
190
+ | `MEMENTO_MEMORY_PATH` | `~/.agentmemory/standalone.json` | Where `memory_save`/`memory_recall` store memories |
191
+ | `MEMENTO_AUTO_ADOPT_MIN_SCORE` | unset | Optional floor for `memento_auto`; skip adopt if the parsed validation score is below it (the engine's own gate still applies) |
192
+
193
+ ---
194
+
195
+ ## Verify (no Devin session needed)
196
+
197
+ Run the test suite (stdlib-only, no pytest required):
198
+
199
+ ```bash
200
+ python3 -m unittest discover -s tests -v
201
+ ```
202
+
203
+ It covers the harvest helpers, the Devin ATIF transcript path, the judge, the MCP
204
+ protocol, and the **microsoft/SkillOpt engine command contract**. The one
205
+ integration test that runs the real engine is skipped automatically unless
206
+ `skillopt_sleep` is installed (via `install.sh`).
207
+
208
+ Or smoke-test the MCP server's JSON-RPC directly:
209
+
210
+ ```bash
211
+ printf '%s\n' \
211
+ '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' \
212
+ '{"jsonrpc":"2.0","id":2,"method":"tools/list"}' \
213
+ | MEMENTO_ENGINE_REPO=~/.local/share/SkillOpt \
214
+ python3 mcp_server.py
216
+ ```
217
+
218
+ ---
219
+
220
+ ## Project structure
221
+
222
+ ```
223
+ memento/
224
+ ├── mcp_server.py MCP server (stdlib-only, stdio) — Devin
225
+ ├── harvest_devin.py Transcript generator (Devin ATIF-v1.7 + agentmemory + skills)
226
+ ├── judge.py Reference judge — scores a reply against a rubric (validation gate)
227
+ ├── fixtures/
228
+ │ └── devin_sample.json Sample ATIF transcript for offline testing
229
+ ├── tests/
230
+ │ └── test_memento.py Test suite (harvest, Devin path, judge, MCP, engine contract)
231
+ ├── blog-memento.html Walk-through / use-case blog (PO · QA · Developer)
232
+ ├── mcp-config.example.json Devin MCP config snippet
233
+ ├── devin-rules.snippet.md Copy to .devin/rules/memento.md
234
+ ├── seed_skill/
235
+ │ └── SKILL.md Initial skill seed (replaced by memento_adopt)
236
+ ├── install.sh One-shot installer (Devin auto-detected)
237
+ ├── pyproject.toml Packaging — `devin-memento` console entrypoint (uvx/pip)
238
+ └── README.md
239
+ ```
240
+
241
+ ---
242
+
243
+ ## Outcomes & the validation gate
244
+
245
+ SkillOpt only improves a skill **where tasks recur and have a checkable
246
+ correctness signal**. A bare transcript has neither, so `harvest_devin.py`
247
+ enriches Devin trajectories with two things and writes them to
248
+ `<data-dir>/outcomes.jsonl`:
249
+
250
+ - **`taskKey`** — a stable `<lang>:<intent>:<target>` grouping key (e.g.
251
+ `java:fix:orderservice`) so repeats of the same task collapse into one
252
+ recurring task the gate can replay.
253
+ - **an outcome envelope** — the checkable signal:
254
+ - **hard signal** when the agent recorded a test/build result:
255
+ `{"success": true, "verifier": "tests", "evidence": "BUILD SUCCESS",
256
+ "reference": {"repro": "rtk mvn test -Dtest=OrderServiceTest"}}`
257
+ - **deferred (judge)** when no hard signal exists:
258
+ `{"success": null, "verifier": "judge", "rubric": [...]}` — a rubric is
259
+ derived from the task so [`judge.py`](judge.py) (or the engine) can score the
260
+ replay instead.
261
+
262
+ Score a reply against a rubric:
263
+
264
+ ```bash
265
+ echo "<candidate reply>" | python3 judge.py --rubric-inline '["Addresses OrderService", "Resolves the reported defect without introducing new errors"]'
266
+ # → 0.5
267
+ ```
268
+
269
+ `judge.py` defaults to an offline keyword-coverage heuristic (no API key).
270
+ Set `MEMENTO_JUDGE=claude` (+ `ANTHROPIC_API_KEY`) for an LLM judge.
271
+
272
+ > **Reality check:** the hard-signal path only fires if Devin actually
273
+ > records test or build results in its transcripts. If it doesn't, every task
274
+ > falls to the `judge` branch — point `--devin-transcripts` at a real transcript
275
+ > dir and inspect `outcomes.jsonl` to find out which case you're in.
276
+
277
+ Try it on the bundled fixture:
278
+
279
+ ```bash
280
+ python3 harvest_devin.py --devin-transcripts fixtures --out-dir /tmp/memento-test
281
+ cat /tmp/memento-test/outcomes.jsonl
282
+ ```
283
+
284
+ ---
285
+
286
+ ## Contributing / upstream
287
+
288
+ This plugin is being contributed back to
289
+ [microsoft/SkillOpt](https://github.com/microsoft/SkillOpt) as
290
+ `plugins/devin/`. Bug reports and improvements welcome here or upstream.
291
+
292
+ ## License
293
+
294
+ MIT — same as microsoft/SkillOpt.