get-claudia 1.59.1 → 1.60.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +66 -0
- package/README.md +25 -6
- package/memory-daemon/claudia_memory/database.py +4 -1
- package/memory-daemon/claudia_memory/services/remember.py +11 -3
- package/package.json +1 -1
- package/template-v2/.claude/manifest.json +41 -31
- package/template-v2/.claude/skills/README.md +48 -0
- package/template-v2/.claude/skills/auto-research/SKILL.md +187 -0
- package/template-v2/.claude/skills/auto-research/references/program-template.md +72 -0
- package/template-v2/.claude/skills/auto-research/references/safety-rules.md +110 -0
- package/template-v2/.claude/skills/brain/SKILL.md +1 -1
- package/template-v2/.claude/skills/brain-monitor/SKILL.md +1 -1
- package/template-v2/.claude/skills/capability-suggester.md +1 -1
- package/template-v2/.claude/skills/capture-meeting/SKILL.md +1 -1
- package/template-v2/.claude/skills/close-the-loop/SKILL.md +139 -0
- package/template-v2/.claude/skills/connector-discovery.md +1 -1
- package/template-v2/.claude/skills/diagnose/SKILL.md +1 -1
- package/template-v2/.claude/skills/draft-reply/SKILL.md +1 -1
- package/template-v2/.claude/skills/file-document/SKILL.md +1 -1
- package/template-v2/.claude/skills/follow-up-draft/SKILL.md +1 -1
- package/template-v2/.claude/skills/growth-check/SKILL.md +1 -1
- package/template-v2/.claude/skills/hire-agent.md +1 -1
- package/template-v2/.claude/skills/inbox-check/SKILL.md +1 -1
- package/template-v2/.claude/skills/ingest-sources/SKILL.md +1 -1
- package/template-v2/.claude/skills/judgment-awareness.md +1 -1
- package/template-v2/.claude/skills/manuscript-fact-check/SKILL.md +111 -0
- package/template-v2/.claude/skills/map-connections/SKILL.md +1 -1
- package/template-v2/.claude/skills/meditate/SKILL.md +1 -1
- package/template-v2/.claude/skills/meeting-prep/SKILL.md +1 -1
- package/template-v2/.claude/skills/memory-audit/SKILL.md +1 -1
- package/template-v2/.claude/skills/memory-health/SKILL.md +1 -1
- package/template-v2/.claude/skills/morning-brief/SKILL.md +1 -1
- package/template-v2/.claude/skills/new-person/SKILL.md +1 -1
- package/template-v2/.claude/skills/pattern-recognizer.md +1 -1
- package/template-v2/.claude/skills/relationship-tracker.md +1 -1
- package/template-v2/.claude/skills/risk-surfacer.md +1 -1
- package/template-v2/.claude/skills/skill-router/SKILL.md +148 -0
- package/template-v2/.claude/skills/skill-router/references/overlap-clusters.md +141 -0
- package/template-v2/.claude/skills/summarize-doc/SKILL.md +1 -1
- package/template-v2/.claude/skills/vault-awareness.md +18 -1
- package/template-v2/.claude/skills/weekly-review/SKILL.md +1 -1
- package/template-v2/.claude/skills/what-am-i-missing/SKILL.md +1 -1
- package/template-v2/.claude/skills/wiki/SKILL.md +187 -0
- package/template-v2/.claude/skills/wiki/references/citations-and-contradictions.md +99 -0
- package/template-v2/.claude/skills/wiki/references/page-template.md +84 -0
package/CHANGELOG.md
CHANGED
|
@@ -2,6 +2,72 @@
|
|
|
2
2
|
|
|
3
3
|
All notable changes to Claudia will be documented in this file.
|
|
4
4
|
|
|
5
|
+
## 1.60.1 (2026-05-22)
|
|
6
|
+
|
|
7
|
+
### Two bug fixes from the fixing-phase pass
|
|
8
|
+
|
|
9
|
+
Both shipped together as a patch release. No new features; no surface-area change. Upgrade with `npx get-claudia .` from any existing install.
|
|
10
|
+
|
|
11
|
+
### Fixed
|
|
12
|
+
|
|
13
|
+
- **Memory daemon: WAL checkpoint no longer blocks concurrent readers** (#66, thanks @tilthnco). `database.py` now uses `PRAGMA wal_checkpoint(PASSIVE)` on every connection instead of `TRUNCATE`. `TRUNCATE` takes an exclusive lock, which deadlocks against concurrent WAL readers like Litestream and causes the MCP server to time out after 30s on startup. `PASSIVE` yields cleanly if a reader holds the WAL lock. Compaction behavior is unchanged; only the locking mode is relaxed. Crash safety is unaffected since WAL mode (not checkpointing) is the durability mechanism.
|
|
14
|
+
- **Memory daemon: organisations from session summaries no longer misclassified as `person`** (Proposal #51, sub-tranche B2). Two stale defaults were bypassing the smart type-inference path: `RememberService.end_session()` hard-defaulted missing `type` fields to the literal string `"person"` (short-circuiting inference inside `remember_entity()`), and `remember_entity()` itself was still calling the legacy local `_infer_entity_type()` (returns `"person"` fallback, no `"AI"`-suffix rule) instead of the smart inferencer at `entities.infer_entity_type` (returns `"concept"` fallback, handles `"Markup AI"`, `".ai"`, etc.). Both call sites now route through the smart inferencer, matching `_find_or_create_entity()` which already did. The literal repro from 2026-05-13 (Matt Blumberg, Markup AI) now classifies Markup AI as `organization`. See `docs/proposals/08-smarter-memory-writes.md` for the original proposal; `tests/test_entity_resolution.py::TestEndSessionInfersEntityType` pins the fix (3 new tests, 25/25 in the file pass).
|
|
15
|
+
|
|
16
|
+
### Changed
|
|
17
|
+
|
|
18
|
+
- 27 skill descriptions in `template-v2/.claude/skills/` now end with a "See also: …" pointer to adjacent skills (from #61). Affected clusters: outbound message composition, memory introspection, memory visualization, reflective cadences, meeting lifecycle, risks and blind spots, people and relationships, patterns and judgment, inbound document processing. Skill names, trigger phrases, and behaviors are unchanged; descriptions are extended additively to nudge skill routing toward the canonical pick when a request straddles two skills.
|
|
19
|
+
|
|
20
|
+
### Documentation
|
|
21
|
+
|
|
22
|
+
- Expanded `template-v2/.claude/skills/README.md` with four new sections (from #61): "Writing a good skill description", "The see also convention", "Disambiguation notes", and "Proactive vs contextual: when to make a skill auto-fire". Each section grounds itself in existing-catalog exemplars.
|
|
23
|
+
|
|
24
|
+
## 1.60.0 (2026-05-15)
|
|
25
|
+
|
|
26
|
+
### Three new skills inspired by Karpathy's recent work, adapted to Claudia's principles
|
|
27
|
+
|
|
28
|
+
This release adds the wiki layer (new default vault projection, inspired by Karpathy's [llm-wiki](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)), the auto-research skill (workspace-scoped iteration loop, inspired by Karpathy's [autoresearch](https://github.com/karpathy/autoresearch)), and the skill router (discovery and disambiguation meta-skill for the catalog).
|
|
29
|
+
|
|
30
|
+
### Wiki layer (new default vault projection)
|
|
31
|
+
|
|
32
|
+
The wiki is the third tier of Claudia's memory: raw memories live in the database, derived signals (entities, reflections, patterns) live in the daemon, and **synthesized topic pages** now live in the user's vault at `~/.claudia/vault/Wiki/`. Each page is written by Claudia from raw memories, cites every load-bearing claim with `[mem:NNN]`, flags contradictions at the top, and grows incrementally with use.
|
|
33
|
+
|
|
34
|
+
- New `wiki` skill at `template-v2/.claude/skills/wiki/SKILL.md` with two reference docs (page template, citations/contradictions discipline). The skill defines when to write pages, the page structure, the citation format, and the read workflow (consult the wiki page first, fall back to memory query if stale).
|
|
35
|
+
- Vault default for new installs is now wiki, not PARA. New installs write synthesized pages at `~/.claudia/vault/Wiki/` instead of mechanical entity dumps.
|
|
36
|
+
- `vault-awareness` skill updated with a "Wiki vs PARA" section at the top and a detection rule, with a pointer to the new wiki skill for specifics.
|
|
37
|
+
|
|
38
|
+
**Compatibility:** Existing PARA vaults are preserved untouched. Users with vaults from v1.42 to v1.59 keep their existing structure. Detection rule: if `~/.claudia/vault/Active/` or `Relationships/` exists without `Wiki/`, treat as PARA. The mechanical dump continues to function for them. A future `claudia-memory --migrate-to-wiki` CLI will copy PARA aside and rebuild wiki pages from raw memories. Not in this release; PARA users stay on PARA until they explicitly migrate.
|
|
39
|
+
|
|
40
|
+
### Auto-research skill (workspace-scoped iteration loop)
|
|
41
|
+
|
|
42
|
+
Workspace-scoped hill-climbing loop for iterating on local artifacts. Adapted to Claudia's safety principles (workspace-only edits, no external actions during the loop, bounded budget, baseline-score gate, per-iteration revertability).
|
|
43
|
+
|
|
44
|
+
- New `auto-research` skill at `template-v2/.claude/skills/auto-research/SKILL.md` with two reference docs: `references/program-template.md` for the governance file each run uses, and `references/safety-rules.md` documenting the 8 mandatory safety rules.
|
|
45
|
+
- Workspace at `~/.claudia/auto-research/<task-id>/`. Each run gets its own directory with a fresh git repo, an immutable copy of the original artifact, a working copy that gets edited, a `results.tsv` history, and per-iteration snapshots.
|
|
46
|
+
- Loop: read state, propose one specific change, implement, score against rubric, ratchet if better OR revert if worse, report to user as one line per iteration.
|
|
47
|
+
- Default budget: 20 iterations. Plateau detection: stops early if 5 consecutive iterations show no improvement.
|
|
48
|
+
- The user's original file is **never** modified during the loop. Hand-off at the end requires explicit user confirmation of destination.
|
|
49
|
+
- Refuse-to-start conditions are explicit: external-action artifacts, sensitive content, no clear evaluator, already-good baseline, or cases where bold structural change is the actual need (not iteration).
|
|
50
|
+
- Karpathy named the conservatism ceiling himself: RLHF-trained iteration is "cagy and scared." Iterations tend toward safe edits, not bold reframing. The skill surfaces this honestly when the loop plateaus, so the user knows when to stop iterating and start a fresh draft instead.
|
|
51
|
+
|
|
52
|
+
### Skill router (discovery + disambiguation)
|
|
53
|
+
|
|
54
|
+
A meta-skill that helps both users and Claudia navigate the ~42-skill catalog. Addresses two long-standing problems: users not knowing what's available, and Claudia firing the wrong skill when a request straddles two.
|
|
55
|
+
|
|
56
|
+
- New `skill-router` skill at `template-v2/.claude/skills/skill-router/SKILL.md` with `references/overlap-clusters.md` documenting all 10 known overlap clusters with canonical picks and disambiguation patterns.
|
|
57
|
+
- **Discovery surface.** `/skills` returns a categorized list (daily flow, reviews, pipeline, knowledge, drafting, setup). `/skills <topic>` filters by keyword.
|
|
58
|
+
- **Disambiguation surface.** When a request matches 2+ skills, Claudia names the candidates briefly and proceeds with the canonical one. Pattern: "Sounds like X or Y. I'll do X. Say so if you wanted Y."
|
|
59
|
+
|
|
60
|
+
### Not yet in this release
|
|
61
|
+
- Automatic wiki-refresh queue (mark entities as "dirty" on ingest, batch refresh later).
|
|
62
|
+
- Daemon-side MCP tools for wiki page metadata.
|
|
63
|
+
- Shell-command evaluators for auto-research (today: rubric-based scoring only).
|
|
64
|
+
- Token-spend and wall-clock budgets for auto-research (today: iteration count only).
|
|
65
|
+
- Telemetry for skill router.
|
|
66
|
+
|
|
67
|
+
No CLI surface change, no MCP tool change, no database schema change. Existing skill names and trigger phrases unchanged. Existing PARA vaults preserved.
|
|
68
|
+
|
|
69
|
+
---
|
|
70
|
+
|
|
5
71
|
## 1.59.1 (2026-05-15)
|
|
6
72
|
|
|
7
73
|
### Docs uplift
|
package/README.md
CHANGED
|
@@ -79,6 +79,20 @@ You make a promise in a meeting. Nobody tracks it. You promise a deliverable on
|
|
|
79
79
|
<p>End a session with <code>/meditate</code> and she extracts what she learned: your preferences, patterns, and judgment calls. Next session, she's sharper.</p>
|
|
80
80
|
</td>
|
|
81
81
|
</tr>
|
|
82
|
+
<tr>
|
|
83
|
+
<td width="33%" align="center">
|
|
84
|
+
<h3>📚 Writes Her Own Wiki</h3>
|
|
85
|
+
<p>Every active person, project, and organization gets a synthesized page in your Obsidian vault. Each fact cites its source memory. Contradictions get flagged at the top. <em>New in v1.60</em>.</p>
|
|
86
|
+
</td>
|
|
87
|
+
<td width="33%" align="center">
|
|
88
|
+
<h3>🔁 Iterates Until It's Right</h3>
|
|
89
|
+
<p>Ask her to <code>/auto-research</code> a draft and she runs a bounded loop against a rubric you name. Keeps the iterations she likes, reverts the ones she doesn't. Your original file never moves until you say so. <em>New in v1.60</em>.</p>
|
|
90
|
+
</td>
|
|
91
|
+
<td width="33%" align="center">
|
|
92
|
+
<h3>🧭 Routes the Right Skill</h3>
|
|
93
|
+
<p>Type <code>/skills</code> to see what's available. Ambiguous request? She names the options and proceeds with the canonical one, so you never get the wrong tool by accident. <em>New in v1.60</em>.</p>
|
|
94
|
+
</td>
|
|
95
|
+
</tr>
|
|
82
96
|
</table>
|
|
83
97
|
|
|
84
98
|
---
|
|
@@ -103,7 +117,7 @@ You make a promise in a meeting. Nobody tracks it. You promise a deliverable on
|
|
|
103
117
|
</td>
|
|
104
118
|
<td width="50%" align="center">
|
|
105
119
|
<h3>📓 Vault</h3>
|
|
106
|
-
<p>Memory
|
|
120
|
+
<p>Memory projects to an Obsidian vault. New installs default to the <strong>wiki layout</strong>: synthesized topic pages per active entity, each fact citing its source memory, contradictions flagged at the top. Existing PARA users keep their layout untouched. Plain markdown you own forever.</p>
|
|
107
121
|
</td>
|
|
108
122
|
</tr>
|
|
109
123
|
</table>
|
|
@@ -264,9 +278,12 @@ Claudia detects your work style and generates structure that fits:
|
|
|
264
278
|
| `/meditate` | End-of-session reflection: extracts learnings, judgment, patterns |
|
|
265
279
|
| `/deep-context [topic]` | Full-context deep analysis |
|
|
266
280
|
| `/memory-audit` | See everything Claudia knows, with source chains |
|
|
281
|
+
| `/wiki` | Write or update a synthesized topic page in your vault |
|
|
282
|
+
| `/auto-research` | Iterate a draft against a rubric until it scores well |
|
|
283
|
+
| `/skills` | Discover all available skills, grouped by purpose |
|
|
267
284
|
|
|
268
285
|
<details>
|
|
269
|
-
<summary><strong>All commands (
|
|
286
|
+
<summary><strong>All commands (45 skills)</strong></summary>
|
|
270
287
|
|
|
271
288
|
| Command | What It Does |
|
|
272
289
|
|---------|--------------|
|
|
@@ -322,17 +339,19 @@ This generates a one-click URL to enable all required Google APIs and walks you
|
|
|
322
339
|
|
|
323
340
|
### Obsidian Vault
|
|
324
341
|
|
|
325
|
-
Memory
|
|
342
|
+
Memory projects to an Obsidian vault at `~/.claudia/vault/`. New installs default to the **wiki layout**: synthesized topic pages at `~/.claudia/vault/Wiki/`, one per active person, project, or organization. Each page is written by Claudia from your raw memories, cites every load-bearing claim with `[mem:NNN]`, and flags contradictions at the top. Obsidian's graph view connects them via `[[wikilinks]]`.
|
|
343
|
+
|
|
344
|
+
Existing installs from v1.59 and earlier keep their PARA-organized vault (`Active/`, `Relationships/`, `Reference/`, `Archive/`) untouched. SQLite remains the source of truth; the vault is a projection you can browse, search, and read.
|
|
326
345
|
|
|
327
346
|
---
|
|
328
347
|
|
|
329
348
|
## How It Works
|
|
330
349
|
|
|
331
|
-
**
|
|
350
|
+
**45 skills · 33 MCP tools · 500+ tests**
|
|
332
351
|
|
|
333
352
|
Claudia has two layers:
|
|
334
353
|
|
|
335
|
-
**Template layer** (markdown) defines who she is.
|
|
354
|
+
**Template layer** (markdown) defines who she is. 45 skills, rules, and identity files that Claude reads on startup. Skills range from proactive behaviors (commitment detection, pattern recognition, judgment awareness) to user-invocable workflows (`/morning-brief`, `/research`, `/meditate`, `/wiki`, `/auto-research`). Workspace templates let you spin up new projects with `/new-workspace [name]`. A built-in `skill-router` skill helps you discover what's available and disambiguates when a request straddles two skills.
|
|
336
355
|
|
|
337
356
|
**Memory system** (Python) defines what she remembers. Two daemon modes share the same SQLite database:
|
|
338
357
|
|
|
@@ -393,7 +412,7 @@ For full architecture diagrams, see [ARCHITECTURE.md](ARCHITECTURE.md).
|
|
|
393
412
|
|
|
394
413
|
- **Fully local.** Memory, embeddings, cognitive tools run on your machine. No external APIs for storage.
|
|
395
414
|
- **No external actions without approval.** Every email, calendar event, external action requires your explicit "yes."
|
|
396
|
-
- **Your data in two formats.** SQLite database (`~/.claudia/memory/`) for fast semantic search, plus
|
|
415
|
+
- **Your data in two formats.** SQLite database (`~/.claudia/memory/`) for fast semantic search, plus an Obsidian vault for reading and graph navigation. Two independent copies you own forever.
|
|
397
416
|
- **Delete anything, anytime.** Full control over your data. No lock-in, no cloud dependency.
|
|
398
417
|
|
|
399
418
|
---
|
|
@@ -116,7 +116,10 @@ class Database:
|
|
|
116
116
|
conn.execute("PRAGMA synchronous = NORMAL")
|
|
117
117
|
conn.execute("PRAGMA foreign_keys = ON")
|
|
118
118
|
# Recover any uncommitted WAL writes from a previous crashed daemon
|
|
119
|
-
|
|
119
|
+
# Use PASSIVE (not TRUNCATE) so concurrent readers (e.g. Litestream)
|
|
120
|
+
# don't cause a 30-second timeout. TRUNCATE blocks all readers;
|
|
121
|
+
# PASSIVE yields cleanly if a reader holds the WAL lock.
|
|
122
|
+
conn.execute("PRAGMA wal_checkpoint(PASSIVE)")
|
|
120
123
|
|
|
121
124
|
# Load sqlite-vec for vector search
|
|
122
125
|
if not load_sqlite_vec(conn):
|
|
@@ -436,9 +436,14 @@ class RememberService:
|
|
|
436
436
|
Returns:
|
|
437
437
|
Entity ID
|
|
438
438
|
"""
|
|
439
|
-
# Infer type from name keywords when no type is specified
|
|
439
|
+
# Infer type from name keywords when no type is specified.
|
|
440
|
+
# Use the smart inferencer (entities.infer_entity_type) so callers like
|
|
441
|
+
# end_session() benefit from the "AI"-suffix rule and concept fallback,
|
|
442
|
+
# matching _find_or_create_entity at line ~1871. The local legacy
|
|
443
|
+
# _infer_entity_type is preserved for direct unit-test imports.
|
|
444
|
+
# See Proposal #51 / docs/proposals/08-smarter-memory-writes.md.
|
|
440
445
|
if not entity_type or not entity_type.strip():
|
|
441
|
-
entity_type =
|
|
446
|
+
entity_type = _smart_infer_entity_type(name)
|
|
442
447
|
|
|
443
448
|
# Run deterministic guards
|
|
444
449
|
existing_names = [
|
|
@@ -1440,9 +1445,12 @@ class RememberService:
|
|
|
1440
1445
|
# 5. Store entities
|
|
1441
1446
|
if entities:
|
|
1442
1447
|
for entity in entities:
|
|
1448
|
+
# Default to empty string (not "person") so remember_entity()
|
|
1449
|
+
# routes through _infer_entity_type() when type is missing.
|
|
1450
|
+
# See Proposal #51 / test_entity_resolution.py for context.
|
|
1443
1451
|
entity_id = self.remember_entity(
|
|
1444
1452
|
name=entity["name"],
|
|
1445
|
-
entity_type=entity.get("type", "
|
|
1453
|
+
entity_type=entity.get("type", ""),
|
|
1446
1454
|
description=entity.get("description"),
|
|
1447
1455
|
aliases=entity.get("aliases"),
|
|
1448
1456
|
)
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
|
-
"version": "1.
|
|
3
|
-
"generated": "2026-05-
|
|
2
|
+
"version": "1.60.1",
|
|
3
|
+
"generated": "2026-05-23T01:39:21.155Z",
|
|
4
4
|
"algorithm": "sha256",
|
|
5
5
|
"files": {
|
|
6
6
|
".claude/rules/claudia-principles.md": "939e9720421628e7f2e4c8dfbaa4aeb9c1e18e8c6a5379cd6b772a6835b812e5",
|
|
@@ -9,7 +9,7 @@
|
|
|
9
9
|
".claude/rules/memory-commitment.md": "49eee330b56c6ca0b5f1e01550931c4eef3dcb3249e3d0e2380de3e8dbfe31a8",
|
|
10
10
|
".claude/rules/shell-compatibility.md": "565977bc04e269b3ce7d8a7963173df4f44bb9634692ea76abb1b64f6a67513e",
|
|
11
11
|
".claude/rules/trust-north-star.md": "0188b17c26b791cf597ce975bb75d40f543aa3d6b84b7edb1309b78530b3d43f",
|
|
12
|
-
".claude/skills/README.md": "
|
|
12
|
+
".claude/skills/README.md": "22311215317c61dcb67975af02bdd1d7164578713ff9faf6fbed8aec14256bcd",
|
|
13
13
|
".claude/skills/agent-dispatcher.md": "b48fc5283f0a88d1ebbc692b30b6c326fef012d4168dcf721a94e3c954b76b90",
|
|
14
14
|
".claude/skills/archetypes/_base-structure.md": "c0d8df77c07aa48cd9ba9c5ba3eddb533fcea790c7f98440fa3d31405fd5c75d",
|
|
15
15
|
".claude/skills/archetypes/consultant.md": "1e0ccbf89115f92a1fa0c86b48edcce22668c4bb828ba4268376fcd6b5c05680",
|
|
@@ -17,57 +17,67 @@
|
|
|
17
17
|
".claude/skills/archetypes/executive.md": "0d43c5c2c6315f5a4d6d36150778b55229cc922e0491a80af54824d87ac52e82",
|
|
18
18
|
".claude/skills/archetypes/founder.md": "a5bf3f22439c7c5f554c13966899a4af4de80f90c8f6a25eb62be1c68de6d54f",
|
|
19
19
|
".claude/skills/archetypes/solo.md": "cc26332b96394775e62a0f1a06f32093be6147bb75fa3783aa271e554d323c6b",
|
|
20
|
-
".claude/skills/
|
|
21
|
-
".claude/skills/
|
|
22
|
-
".claude/skills/
|
|
23
|
-
".claude/skills/
|
|
20
|
+
".claude/skills/auto-research/SKILL.md": "1a54e5d82f4e87c78a54e1331ff2d32aaff12c2712058e21173aa900a57b01c3",
|
|
21
|
+
".claude/skills/auto-research/references/program-template.md": "850457de96bde5ad65d24f1018509da94dd31fec9ec45c15087031f470a28dbc",
|
|
22
|
+
".claude/skills/auto-research/references/safety-rules.md": "ae036270e86949ed0f9bface582f6038276adafd3f40a29aa733698111a26683",
|
|
23
|
+
".claude/skills/brain/SKILL.md": "b87dd93c85923c676292bea67f7a0580ee71eea2bea05426ea6eda951a140d5c",
|
|
24
|
+
".claude/skills/brain-monitor/SKILL.md": "6e9ce1e6e40bc612a5126d84d743c1fd7ebc596f60575873533479fdcae5b608",
|
|
25
|
+
".claude/skills/capability-suggester.md": "06235c8ab062d356cf9b6c18c7aa44ea667e7694983a46e8ae0a9fe32e734944",
|
|
26
|
+
".claude/skills/capture-meeting/SKILL.md": "815e7587fcd925323c379a6c57e458b3484d034338c1237378641d142c126175",
|
|
24
27
|
".claude/skills/capture-meeting/evals/basic.yaml": "47c4eb1479aeda8067bc877cebfe8688c9b2bb3ce09297e42375cdf558f412e3",
|
|
25
28
|
".claude/skills/client-health/SKILL.md": "fed639e6f2c4433cab6a6b4b59776985492dda16be3bd1e843c3e94175936655",
|
|
29
|
+
".claude/skills/close-the-loop/SKILL.md": "835af2b0bde5583d1dffa07f261a1ec50e87f99eb39694c00cb7965ccce21942",
|
|
26
30
|
".claude/skills/commitment-detector.md": "66f9e919349d1880eb12636cec6f583be1b3fbe3b28f6180ec732d3c284af3b6",
|
|
27
|
-
".claude/skills/connector-discovery.md": "
|
|
31
|
+
".claude/skills/connector-discovery.md": "a6659f6ed8a182f968cdc9b4a342de9015d5579859310b77ff230ce73397cdc5",
|
|
28
32
|
".claude/skills/databases/SKILL.md": "b2ce4ae30aafa2487dabd7befd067976f0092fcef31dea7601e1683ceca0ef50",
|
|
29
33
|
".claude/skills/deep-context/SKILL.md": "2d21f93e33191c226f46909a66d21c64c84df2ac261082adef78cd68093d7b07",
|
|
30
|
-
".claude/skills/diagnose/SKILL.md": "
|
|
34
|
+
".claude/skills/diagnose/SKILL.md": "2658936671b5752b2e033177b6e57e783afcabf09556e199d88fc48f313295ad",
|
|
31
35
|
".claude/skills/diagnose/evals/basic.yaml": "7fa4e3a255f14d7dc883f59f8fd1d796a62db92d62a543ef87b5dda92a1c0c0b",
|
|
32
36
|
".claude/skills/diagnose/references/common-issues.md": "f80b13d4b70e656151a4cb9db8c1f042378d1f6cb9f5ba8c550741d6bf34917c",
|
|
33
|
-
".claude/skills/draft-reply/SKILL.md": "
|
|
37
|
+
".claude/skills/draft-reply/SKILL.md": "4658b3de3952b518d589af1a2ab21940ca31b477c68d2e03b05e20c3a11642de",
|
|
34
38
|
".claude/skills/feedback/SKILL.md": "a8b5a1e82eca177461d11c87839dbf933e358fc3214d5f1975dd2d4fff9f3be7",
|
|
35
|
-
".claude/skills/file-document/SKILL.md": "
|
|
39
|
+
".claude/skills/file-document/SKILL.md": "2ac7d37f59281511477692d61320e176b304fa65eedbd464b2dc203d9b4dd6f7",
|
|
36
40
|
".claude/skills/financial-snapshot/SKILL.md": "33ddabb7d60024eadd8e6772229d8b17db3c3fa02d1289c80330c74aa86ae00a",
|
|
37
41
|
".claude/skills/fix-duplicates/SKILL.md": "bd3d6aae0bce09bc844ba93f76abb4a97ff0286c6a4da29cd369763864fd4a86",
|
|
38
|
-
".claude/skills/follow-up-draft/SKILL.md": "
|
|
39
|
-
".claude/skills/growth-check/SKILL.md": "
|
|
40
|
-
".claude/skills/hire-agent.md": "
|
|
41
|
-
".claude/skills/inbox-check/SKILL.md": "
|
|
42
|
-
".claude/skills/ingest-sources/SKILL.md": "
|
|
42
|
+
".claude/skills/follow-up-draft/SKILL.md": "49212946cb7b2db39cecd435ab6fcb7cb26cc49666306084d27676543be67281",
|
|
43
|
+
".claude/skills/growth-check/SKILL.md": "a3ed3f2427f3639f81d16b1a2adfa268bafed795d2b0d02621bc6be318784c24",
|
|
44
|
+
".claude/skills/hire-agent.md": "2d60d241592db7d5f30a635d11164802f907b02616fb25170f3f867d5235f067",
|
|
45
|
+
".claude/skills/inbox-check/SKILL.md": "e2cddc91f72ffc0443d105385d2e8b464f86404a4825d308bd03a68e4ee43726",
|
|
46
|
+
".claude/skills/ingest-sources/SKILL.md": "938b93d299649db156ad19e9a7bd5d09e950f53bc2eaf8a85386f9fdbca350d0",
|
|
43
47
|
".claude/skills/ingest-sources/references/extraction-patterns.md": "28d6dc604c4a11eacfe4523e705ad430ba8fe26632bafcccaef656e9618e5aea",
|
|
44
|
-
".claude/skills/judgment-awareness.md": "
|
|
45
|
-
".claude/skills/
|
|
46
|
-
".claude/skills/
|
|
48
|
+
".claude/skills/judgment-awareness.md": "0077129a87f91d295515108f3f9a68ca6874c6e38fc7f2a73a462704fa00ba2d",
|
|
49
|
+
".claude/skills/manuscript-fact-check/SKILL.md": "3dbdeea105740e42b1ff9d5e7245dd37418401b36655169b5a5ab2a4a8d6a383",
|
|
50
|
+
".claude/skills/map-connections/SKILL.md": "e8133d5e3de36e32e3858a1ee3918a577855e77ade022933097f87a0100cc9a3",
|
|
51
|
+
".claude/skills/meditate/SKILL.md": "1d814f0c52f4309708669139ba4b637bf05a228e2381950ee3851c6145e25cc7",
|
|
47
52
|
".claude/skills/meditate/evals/basic.yaml": "daba441b2fd9d1d4afddcff6eaa9673884198b1d0f85d9d06edbe0738012291a",
|
|
48
|
-
".claude/skills/meeting-prep/SKILL.md": "
|
|
49
|
-
".claude/skills/memory-audit/SKILL.md": "
|
|
50
|
-
".claude/skills/memory-health/SKILL.md": "
|
|
53
|
+
".claude/skills/meeting-prep/SKILL.md": "cb0b2afc41052a0652e7de30d09e71ab489162fbb09c648e3c95cec2d438ba63",
|
|
54
|
+
".claude/skills/memory-audit/SKILL.md": "7daed2ec128ea70171b8e3ffaf4d933ca69db2637b1487f69724e3428c04663a",
|
|
55
|
+
".claude/skills/memory-health/SKILL.md": "ef4683fe8441e36d8ac7a931a96657c67de2ad3a6e2ca58266b48d74ce167a7b",
|
|
51
56
|
".claude/skills/memory-manager.md": "62edf1ed340687da9d8896dc4a21934b39b12c8269ea7a8de1beb9d25fd894e6",
|
|
52
|
-
".claude/skills/morning-brief/SKILL.md": "
|
|
57
|
+
".claude/skills/morning-brief/SKILL.md": "7e97e0657dccd0c0bcd3ffb783c2e7d0dd7582bf3490162cc8c1e987a49ea5dc",
|
|
53
58
|
".claude/skills/morning-brief/evals/basic.yaml": "537de9adcde134161de9ffb9602b8c920bf1464bf224d99ad6f95f854d8480b7",
|
|
54
|
-
".claude/skills/new-person/SKILL.md": "
|
|
59
|
+
".claude/skills/new-person/SKILL.md": "41b98b2bac95a76e2e58ba4c695d171b1e068910436741baa534d3db2b00ca9c",
|
|
55
60
|
".claude/skills/new-person/evals/basic.yaml": "f3c09a37c05d420e67520d8132a46356fed81575dcfb817cc97caf47f230392b",
|
|
56
61
|
".claude/skills/new-workspace/SKILL.md": "c3f838b44a016b87445c66555398b666970a156288320efb183bc8cb8af2f0df",
|
|
57
62
|
".claude/skills/new-workspace/references/workspace-templates.md": "6b36e961f112d46f32fdcf19cdacf3b05b4246201be315eee79659d606a7e243",
|
|
58
63
|
".claude/skills/onboarding.md": "72fe133672f61929de0cfc7ff77699eb8856ca1e283b13acd3becffff75fca55",
|
|
59
|
-
".claude/skills/pattern-recognizer.md": "
|
|
64
|
+
".claude/skills/pattern-recognizer.md": "6928735f17cfcd793483dba2c68d346b11450f086a4bcc24e72e9b348270534b",
|
|
60
65
|
".claude/skills/pipeline-review/SKILL.md": "7891ceb3b6f4fb19e5255ae71c5894b6da8a9c566bc317e566bd4bc282ea1a6f",
|
|
61
|
-
".claude/skills/relationship-tracker.md": "
|
|
66
|
+
".claude/skills/relationship-tracker.md": "ef351780adadf3866a545b527450727bfb49907d1e675be670f7efc3acb4e776",
|
|
62
67
|
".claude/skills/research/SKILL.md": "0d317e6350b043ee774916b923957f61dbc174e3caafc0187198fa236a2fd28d",
|
|
63
68
|
".claude/skills/research/references/source-evaluation.md": "64614b7eff83468d7ff76dd640252579f23e69e760969672e9aebe5ceb00f695",
|
|
64
|
-
".claude/skills/risk-surfacer.md": "
|
|
69
|
+
".claude/skills/risk-surfacer.md": "b01e6bc7a22df414a7ee7a5e4a108088b5d049646723cc7513fac27a3ba4856c",
|
|
65
70
|
".claude/skills/skill-index.json": "043c86f1b07f28bb54db80117437f60fb3d79ad0d18549fdc2c74921b996a56f",
|
|
71
|
+
".claude/skills/skill-router/SKILL.md": "12fc45aebc94d3ead65843cd336a88797a40856e82908cda47256c9cbb4cb60f",
|
|
72
|
+
".claude/skills/skill-router/references/overlap-clusters.md": "bc819b4892cbdf9919db392825230f4cd4eec546cd7f0c9de6c29a2468ab489a",
|
|
66
73
|
".claude/skills/structure-generator.md": "dbe70841ab60599a632687aaa3c3652361483df2fe854640c56982b60622eb19",
|
|
67
|
-
".claude/skills/summarize-doc/SKILL.md": "
|
|
68
|
-
".claude/skills/vault-awareness.md": "
|
|
69
|
-
".claude/skills/weekly-review/SKILL.md": "
|
|
70
|
-
".claude/skills/what-am-i-missing/SKILL.md": "
|
|
74
|
+
".claude/skills/summarize-doc/SKILL.md": "cd73acc34541c9971ce7248df87d8b26d572d0cf0b14275076f2fb321d32af3a",
|
|
75
|
+
".claude/skills/vault-awareness.md": "da4b056ea9b83ecbfb1a7ffd9318caf98c1fb38bb8334bde3b8975b8467f576b",
|
|
76
|
+
".claude/skills/weekly-review/SKILL.md": "2a9e5697499cb2980f2954a55a53e496f734c673d64e900ee1affed771a93ec1",
|
|
77
|
+
".claude/skills/what-am-i-missing/SKILL.md": "8557f893869c66e3cf0401b68a604c57f1edf8dcb065d469e7502c9af06c8dfe",
|
|
78
|
+
".claude/skills/wiki/SKILL.md": "e45e8db6fce1d774e36c93be27e5d9153b3ae474c48db478a4d7ff2b138e42e8",
|
|
79
|
+
".claude/skills/wiki/references/citations-and-contradictions.md": "1b16301020fdb5e4840035ca22425ebb03202694e88eb494105ff39e22e03263",
|
|
80
|
+
".claude/skills/wiki/references/page-template.md": "b19382ee9366b1e7f8c9f230881ac3c45982b59ffee0d82fe64b103862ba4745",
|
|
71
81
|
"CLAUDE.md": "736c5d95f1477ea6ef1c00d3006eeb4b369eb2bd5f3a926e4d78c42f857fa881"
|
|
72
82
|
}
|
|
73
83
|
}
|
|
@@ -248,6 +248,54 @@ Include:
|
|
|
248
248
|
- **Output Format** - Expected output structure
|
|
249
249
|
- **Judgment Points** - Where to ask for confirmation
|
|
250
250
|
|
|
251
|
+
## Writing a good skill description
|
|
252
|
+
|
|
253
|
+
The `description:` field in a skill's frontmatter is what Claude Code uses to decide when to trigger the skill. A vague description fires inconsistently; a tight description fires reliably. Three patterns from the existing catalog earn their keep:
|
|
254
|
+
|
|
255
|
+
**Verb + object + outcome + trigger phrases.** Lead with what the skill does, then list the exact phrases users say. From `capture-meeting`:
|
|
256
|
+
|
|
257
|
+
> Process meeting notes or transcript to extract decisions, commitments, and insights. Use when user shares transcript or says "capture this meeting", "here are my notes from the call".
|
|
258
|
+
|
|
259
|
+
**Quantify the work.** Numerical bounds disambiguate the skill from neighbors. From `deep-context`:
|
|
260
|
+
|
|
261
|
+
> Full-context deep analysis for meeting prep, relationship analysis, or strategic planning. Pulls up to 180 memories across multiple dimensions for comprehensive synthesis.
|
|
262
|
+
|
|
263
|
+
**State the proactive trigger threshold.** For proactive skills, name the count that fires them. From `pattern-recognizer`:
|
|
264
|
+
|
|
265
|
+
> Activates when the same topic, frustration, or behavior appears across 3+ interactions.
|
|
266
|
+
|
|
267
|
+
Avoid descriptions that read as aspirational ("helps you think strategically"), circular ("manages memory operations"), or multi-purpose without a clear trigger.
|
|
268
|
+
|
|
269
|
+
## The "see also" convention
|
|
270
|
+
|
|
271
|
+
When two skills could plausibly fire on the same situation, both should point at each other in their descriptions so a user who picked the less-canonical name finds the alternative. Pattern: append a final line to the `description:` field of the form:
|
|
272
|
+
|
|
273
|
+
> See also: `<skill-name>` for <what differs>.
|
|
274
|
+
|
|
275
|
+
Common overlap clusters in the current catalog:
|
|
276
|
+
|
|
277
|
+
- **Outbound messages**: `draft-reply` (general) ↔ `follow-up-draft` (post-meeting)
|
|
278
|
+
- **Memory**: `memory-audit` (content) ↔ `memory-health` (system) ↔ `diagnose` (connectivity)
|
|
279
|
+
- **Visualization**: `brain` (3D web) ↔ `brain-monitor` (terminal)
|
|
280
|
+
- **Reflective cadences**: `morning-brief` (daily) → `weekly-review` (weekly) → `growth-check` (monthly+) → `meditate` (per session)
|
|
281
|
+
- **Meeting lifecycle**: `meeting-prep` (before) → `capture-meeting` (during/after notes) → `follow-up-draft` (after, outbound)
|
|
282
|
+
- **Inbound processing**: `ingest-sources` (multi-doc) ↔ `file-document` (single) ↔ `capture-meeting` (meeting only) ↔ `summarize-doc` (no filing)
|
|
283
|
+
|
|
284
|
+
## Disambiguation notes
|
|
285
|
+
|
|
286
|
+
- `connector-discovery` is about connecting external services (Gmail, Calendar, Slack). It is not about people or human connections. For people, see `relationship-tracker`, `new-person`, and `map-connections`.
|
|
287
|
+
- `pattern-recognizer` notices what's recurring. `judgment-awareness` applies the user's decision rules to a recurring situation. `capability-suggester` proposes a new command when the same task is repeated by hand. Three skills, three different stages of "we keep doing this."
|
|
288
|
+
|
|
289
|
+
## Proactive vs contextual: when to make a skill auto-fire
|
|
290
|
+
|
|
291
|
+
A skill with `invocation: proactive` runs without the user asking. This is powerful but expensive: every proactive skill loaded into context is a tax on every conversation. Use proactive only when:
|
|
292
|
+
|
|
293
|
+
1. The trigger condition is **specific enough that false positives are rare** (e.g., commitment-detector watches for explicit promise phrasing, not vague intentions).
|
|
294
|
+
2. The action is **observational**, not mutative (surface a pattern, not send an email).
|
|
295
|
+
3. The output is **brief**, ideally one line, deferring detail to a follow-up question.
|
|
296
|
+
|
|
297
|
+
Contextual skills (no `invocation` set, or `invocation: contextual`) are cheaper. They wait for the user to invoke them by name or phrase. Default to contextual; promote to proactive only after observing repeated manual invocations of the same skill.
|
|
298
|
+
|
|
251
299
|
## Archetype Templates
|
|
252
300
|
|
|
253
301
|
The `archetypes/` folder contains structure and command templates for each user type:
|
|
@@ -0,0 +1,187 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: auto-research
|
|
3
|
+
description: Iteratively improve a local artifact (draft, document, page) by running a hill-climbing loop. The user names the artifact, the evaluator, and the budget. Claudia edits the artifact, scores it, keeps it if better or reverts if worse, repeats. Use when user says "iterate on this", "loop on this until it's better", "run experiments on this draft", "auto-research this", "make this better and don't stop until it's good". Workspace-scoped, never touches live files, no external actions during the loop.
|
|
4
|
+
effort-level: high
|
|
5
|
+
invocation: contextual
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# Auto-Research
|
|
9
|
+
|
|
10
|
+
A hill-climber for any local artifact with a rubric. Inspired by [Karpathy's autoresearch pattern](https://github.com/karpathy/autoresearch), adapted to Claudia's safety principles.
|
|
11
|
+
|
|
12
|
+
## Mental model
|
|
13
|
+
|
|
14
|
+
Three primitives, one governance file:
|
|
15
|
+
|
|
16
|
+
1. **The artifact** — what gets iterated on (a draft email, a brief, a wiki page, a one-pager). Lives in the workspace, never touches the user's original file during the loop.
|
|
17
|
+
2. **The evaluator** — a scalar rubric Claudia scores against. Plain English. Must produce a number for the original artifact before the loop starts.
|
|
18
|
+
3. **The budget** — iteration count (default 20). Loop stops when budget is exhausted or score plateaus.
|
|
19
|
+
4. **The program** (governance file) — what the user wants, what's off-limits, what counts as "better." Claudia helps the user write this if they don't volunteer it.
|
|
20
|
+
|
|
21
|
+
Loop: read program + artifact + results history → propose one specific change → implement → score → if better, ratchet (commit); if worse, revert (git reset). Report each iteration as a one-line update. Repeat until budget is hit.
|
|
22
|
+
|
|
23
|
+
## When to invoke this skill
|
|
24
|
+
|
|
25
|
+
User explicitly asks:
|
|
26
|
+
- "Iterate on this until it's [adjective]"
|
|
27
|
+
- "Loop on this draft"
|
|
28
|
+
- "Run experiments on this"
|
|
29
|
+
- "Auto-research this"
|
|
30
|
+
- "Hill-climb this against [criterion]"
|
|
31
|
+
- "Make this better, don't stop until it's good"
|
|
32
|
+
|
|
33
|
+
User implicitly invokes when:
|
|
34
|
+
- They've shared an artifact and a quality bar in the same message
|
|
35
|
+
- They've tried 2-3 manual revisions of the same draft in the same session and are still unsatisfied
|
|
36
|
+
|
|
37
|
+
## When NOT to invoke
|
|
38
|
+
|
|
39
|
+
Refuse to start the loop if:
|
|
40
|
+
|
|
41
|
+
1. **External-action artifact.** The artifact is, or directly drives, an external action (an email about to be sent, a Slack message in the compose window, a calendar invite). Auto-research on these is too easy to misuse. Hand-iterate with the user instead.
|
|
42
|
+
2. **No clear evaluator.** The user can't articulate what "better" means even after one pass of helping. Without an evaluator, the loop hill-climbs the wrong hill.
|
|
43
|
+
3. **Sensitive content.** Medical, deeply personal, legal-defensible artifacts. Iteration risks introducing fabricated detail that looks plausible. Decline and explain.
|
|
44
|
+
4. **The artifact is already good.** If you score the baseline and it's above the user's stated threshold, tell them and offer one quick polish instead of N iterations.
|
|
45
|
+
5. **Bold structural change is the actual need.** Karpathy's own limitation: RLHF-trained iteration is "cagy and scared." Iterations tend toward safe edits, not bold reframings. If the user wants a fundamentally different angle, iteration will not get them there. Suggest a fresh draft instead.
|
|
46
|
+
|
|
47
|
+
## Workspace layout
|
|
48
|
+
|
|
49
|
+
Each run gets its own workspace at `~/.claudia/auto-research/<task-id>/`:
|
|
50
|
+
|
|
51
|
+
```
|
|
52
|
+
~/.claudia/auto-research/<task-id>/
|
|
53
|
+
├── program.md the brief (user-authored with Claudia's help)
|
|
54
|
+
├── artifact.md (or .txt, the working copy that gets edited)
|
|
55
|
+
├── original.md immutable copy of the input, for reference and diff
|
|
56
|
+
├── results.tsv one row per iteration: timestamp, score, kept/reverted, change-summary
|
|
57
|
+
├── best.md symlink (or copy on Windows) to the highest-scoring version
|
|
58
|
+
└── iterations/
|
|
59
|
+
├── 01/artifact.md
|
|
60
|
+
├── 02/artifact.md
|
|
61
|
+
└── ...
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
The task-id is the slugified user phrase plus a short timestamp: `iterate-board-update-20260515-1430`.
|
|
65
|
+
|
|
66
|
+
**Critical:** The loop edits `artifact.md` inside the workspace. The user's original file (wherever it lives in their file system) is **NEVER** modified during the loop. At the end, Claudia asks the user where to put `best.md`; the user decides.
|
|
67
|
+
|
|
68
|
+
## program.md template
|
|
69
|
+
|
|
70
|
+
```markdown
|
|
71
|
+
# Program for: <one-line description of the task>
|
|
72
|
+
|
|
73
|
+
## Goal
|
|
74
|
+
|
|
75
|
+
(1-3 sentences. What is the end state Claudia is iterating toward?)
|
|
76
|
+
|
|
77
|
+
## Evaluator (rubric)
|
|
78
|
+
|
|
79
|
+
Each iteration is scored 0-10 on each dimension. Total score = sum. Higher is better.
|
|
80
|
+
|
|
81
|
+
| Dimension | What scores high | What scores low |
|
|
82
|
+
|-----------|------------------|-----------------|
|
|
83
|
+
| (dimension 1) | (what makes a 10) | (what makes a 0) |
|
|
84
|
+
| (dimension 2) | ... | ... |
|
|
85
|
+
| (dimension 3) | ... | ... |
|
|
86
|
+
|
|
87
|
+
## Hard constraints (do NOT violate)
|
|
88
|
+
|
|
89
|
+
- Length cap: stay under N words.
|
|
90
|
+
- Must contain: specific phrase or fact.
|
|
91
|
+
- Must NOT contain: forbidden phrasings, names, claims.
|
|
92
|
+
- Tone: must match the user's prior emails to <recipient> (paste examples in references/).
|
|
93
|
+
- (etc.)
|
|
94
|
+
|
|
95
|
+
## Budget
|
|
96
|
+
|
|
97
|
+
- Max iterations: 20 (default)
|
|
98
|
+
- OR stop when score plateaus (no improvement for 5 iterations in a row)
|
|
99
|
+
|
|
100
|
+
## Out of scope
|
|
101
|
+
|
|
102
|
+
- (anything Claudia might be tempted to do that isn't the goal)
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
Claudia's first action when invoked: read the artifact, draft a program.md based on what the user said, present it for confirmation, then start the loop.
|
|
106
|
+
|
|
107
|
+
## The loop (Claudia's internal workflow)
|
|
108
|
+
|
|
109
|
+
For each iteration N:
|
|
110
|
+
|
|
111
|
+
1. **Read the state.** Read `program.md`, `artifact.md`, last 3 rows of `results.tsv`.
|
|
112
|
+
2. **Propose one change.** One specific edit, justified in one sentence. Not a rewrite. Not a refactor. One change.
|
|
113
|
+
3. **Implement.** Apply the edit to `artifact.md`. Save.
|
|
114
|
+
4. **Score.** Score the new `artifact.md` against the rubric in `program.md`. Sum the dimensions. Round to one decimal.
|
|
115
|
+
5. **Decide.** If new score > best score so far: ratchet. Copy `artifact.md` to `iterations/NN/`, update `best.md` to point at the new winner, append a row to `results.tsv` marked `kept`. If new score <= best: revert. Copy `iterations/(best_iter)/artifact.md` back to `artifact.md`, append a row marked `reverted`.
|
|
116
|
+
6. **Report.** One line to the user: `iter N: score 7.4 (kept), change: tightened lede to one sentence`.
|
|
117
|
+
7. **Check stop conditions.** If budget exhausted: stop. If score plateaued (no improvement for 5 iterations): stop. Otherwise: next iteration.
|
|
118
|
+
|
|
119
|
+
## Safety rules (mandatory, follow without exception)
|
|
120
|
+
|
|
121
|
+
1. **Workspace-only edits.** The loop never writes to a file outside `~/.claudia/auto-research/<task-id>/`. The user's original file at `/Users/.../wherever.md` is untouched until the user explicitly approves the final hand-off.
|
|
122
|
+
2. **No external actions during the loop.** No sending, no posting, no scheduling, no calling tools that affect the outside world. The loop is a closed system.
|
|
123
|
+
3. **Baseline-score gate.** Before iteration 1, score the original artifact and write it to `results.tsv` as `iter 0: score X (baseline)`. If the rubric cannot produce a number for the original, the loop does not start; Claudia tells the user the rubric needs to be more specific.
|
|
124
|
+
4. **Bounded budget.** Default 20 iterations. User can set higher (50 max) or lower. Plateau detection (no improvement for 5 in a row) stops early. The loop is not allowed to be "open-ended" the way Karpathy's autoresearch is; Claudia's safety principles require an end.
|
|
125
|
+
5. **Per-iteration revertability.** Each iteration is a git commit inside the workspace. Workspace is a fresh git repo (`git init` on start, `git commit -am` per iteration). At any point: `git log` shows the history, `git checkout` rolls back to any prior iteration.
|
|
126
|
+
6. **User interrupt at any time.** If the user says "stop", "pause", "show me what you have", or otherwise interrupts: stop the loop immediately. The workspace persists. The best version so far is in `best.md`. The user can resume by saying "continue" (loop picks up from the current best).
|
|
127
|
+
7. **Honest about the conservatism ceiling.** Karpathy named this: RLHF-trained iteration tends toward safe, local-optimum edits. If the loop plateaus and the user clearly wanted bold reframing, say so: "I've plateaued at score 7.2. The iterations have been incremental polishing. If you wanted a fundamentally different angle, this loop won't get there; a fresh draft might."
|
|
128
|
+
|
|
129
|
+
## Hand-off at the end of the loop
|
|
130
|
+
|
|
131
|
+
When the loop stops (budget, plateau, or user interrupt):
|
|
132
|
+
|
|
133
|
+
1. Read `best.md`.
|
|
134
|
+
2. Read `results.tsv` and summarize: started at score X, ended at score Y, took N iterations, here are the three biggest jumps (which iterations and what they changed).
|
|
135
|
+
3. Show `best.md` content to the user.
|
|
136
|
+
4. Ask: "Want me to put this back at the original location (`<path>`), save it to a new location, or just leave it in the workspace?"
|
|
137
|
+
5. **Only after explicit user confirmation**, copy `best.md` to the destination they name. The user's original file is touched here for the first time and only here.
|
|
138
|
+
6. Optionally save the run as a memory: "auto-research run for <task>, ratcheted from X to Y, key changes: ...". Useful if the user wants to learn from past runs.
|
|
139
|
+
|
|
140
|
+
## What this skill is NOT
|
|
141
|
+
|
|
142
|
+
- **Not a draft generator.** It improves an artifact that already exists. If the user wants a draft from scratch, use `draft-reply`, `follow-up-draft`, `summarize-doc`, or just write it conversationally first.
|
|
143
|
+
- **Not for bold reframing.** It's a hill-climber. It finds local optima. For "completely rethink this from a different angle", iteration is the wrong tool.
|
|
144
|
+
- **Not autonomous in the Karpathy "NEVER STOP" sense.** Claudia's safety principles require bounded loops and user-in-loop hand-off. The loop runs without per-iteration approval, but it stops, and external actions stay with the user.
|
|
145
|
+
|
|
146
|
+
## Examples of good rubrics
|
|
147
|
+
|
|
148
|
+
For "iterate on this board update":
|
|
149
|
+
|
|
150
|
+
| Dimension | 10 | 0 |
|
|
151
|
+
|-----------|----|----|
|
|
152
|
+
| Lede strength | Opens with the one number or decision that matters | Opens with throat-clearing or backstory |
|
|
153
|
+
| Brevity | Reads in under 90 seconds | Goes over 5 minutes |
|
|
154
|
+
| Ask clarity | The ask, if any, is one sentence | Asks are buried or multi-step |
|
|
155
|
+
| Tone | Matches my prior board updates (samples in references/) | Sounds AI-generated |
|
|
156
|
+
|
|
157
|
+
For "iterate on this proposal":
|
|
158
|
+
|
|
159
|
+
| Dimension | 10 | 0 |
|
|
160
|
+
|-----------|----|----|
|
|
161
|
+
| Problem framing | Names the client's actual pain in their words | Generic problem framing |
|
|
162
|
+
| Differentiation | One sentence on why I'm the right fit | No clear differentiator |
|
|
163
|
+
| Scope clarity | Phases are explicit, no scope creep | Vague deliverables |
|
|
164
|
+
| Price anchoring | Price appears after value, not before | Price leads or is hidden |
|
|
165
|
+
|
|
166
|
+
Specific. Numerical. Tied to the user's actual standards.
|
|
167
|
+
|
|
168
|
+
## Examples of bad rubrics
|
|
169
|
+
|
|
170
|
+
"Make it better" — no signal.
|
|
171
|
+
"More professional" — directionally OK but no scoring criteria.
|
|
172
|
+
"Like Bezos would write" — aspirational but Claudia can't score "Bezos-ness" reliably.
|
|
173
|
+
|
|
174
|
+
When the user gives a bad rubric, help them turn it into a good one before starting the loop.
|
|
175
|
+
|
|
176
|
+
## See also
|
|
177
|
+
|
|
178
|
+
- `wiki` for iterating wiki pages specifically (compress, sharpen, restructure)
|
|
179
|
+
- `draft-reply`, `follow-up-draft` for one-shot draft generation
|
|
180
|
+
- `summarize-doc` for one-pass summarization
|
|
181
|
+
|
|
182
|
+
## Open questions for future versions
|
|
183
|
+
|
|
184
|
+
- Shell-command evaluators (run `node test.js my-draft.md`, take the number it prints) are deferred. Today, Claudia scores against the rubric in `program.md` directly.
|
|
185
|
+
- Token-spend budget and wall-clock budget are deferred. Today, only iteration count is supported.
|
|
186
|
+
- Parallel branches (try 3 different changes in parallel, keep the winner) are deferred. Today, sequential only.
|
|
187
|
+
- Wiki-page iteration as a first-class use case (apply auto-research to a wiki page until it's < N words while citing all sources) is deferred. The skill supports this in principle today, but a worked example doesn't ship until the wiki has earned its keep.
|