get-claudia 1.61.0 → 1.62.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +26 -1
- package/bin/index.js +17 -9
- package/bin/shell-init.js +28 -8
- package/memory-daemon/claudia_memory/loops/__init__.py +9 -0
- package/memory-daemon/claudia_memory/loops/status.py +85 -0
- package/package.json +1 -1
- package/template-v2/.claude/agents/loop-checker.md +51 -0
- package/template-v2/.claude/manifest.json +8 -4
- package/template-v2/.claude/skills/_loop/README.md +33 -0
- package/template-v2/.claude/skills/_loop/checker.md +64 -0
- package/template-v2/.claude/skills/_loop/maker.md +39 -0
- package/template-v2/.claude/skills/_loop/repair.md +85 -0
- package/template-v2/.claude/skills/auto-research/SKILL.md +47 -9
- package/template-v2/.claude/skills/meditate/SKILL.md +19 -0
- package/template-v2/.claude/hooks/__pycache__/post-tool-capture.cpython-313.pyc +0 -0
- package/template-v2/.claude/hooks/__pycache__/session-health-check.cpython-313.pyc +0 -0
- package/template-v2/.claude/hooks/__pycache__/user-prompt-capture.cpython-313.pyc +0 -0
package/CHANGELOG.md
CHANGED
|
@@ -2,6 +2,32 @@
|
|
|
2
2
|
|
|
3
3
|
All notable changes to Claudia will be documented in this file.
|
|
4
4
|
|
|
5
|
+
## 1.62.0 (2026-06-13)
|
|
6
|
+
|
|
7
|
+
### Loop engineering foundation (Proposal 11, Phases 1-2)
|
|
8
|
+
|
|
9
|
+
The first two phases of the Autonomy & Personalization Layer: the Maker-Checker pattern on the existing `auto-research` loop, a bounded self-repair sub-loop, the shared loop infrastructure later loops reuse, and a `meditate` feed that reviews how the loops themselves performed. See `docs/proposals/11-autonomy-personalization-layer.md`.
|
|
10
|
+
|
|
11
|
+
### Added
|
|
12
|
+
|
|
13
|
+
- **Independent Checker for `auto-research`** (Proposal 11, E2). The iteration loop no longer grades its own work. Each iteration now dispatches a new `loop-checker` agent (Haiku, adversarial brief) that scores the artifact against the rubric independently; the Checker's score, not Claudia's, drives the keep/revert decision. When the Checker's score and Claudia's self-score diverge past threshold, the iteration is flagged `contested` and surfaced in the end-of-run summary. New files: `template-v2/.claude/agents/loop-checker.md` and `template-v2/.claude/skills/_loop/` (`maker.md`, `checker.md`, `README.md`).
|
|
14
|
+
- **Loop status-file schema** (Proposal 11, E1). `docs/loop-status-schema.md` defines a standard Markdown-body + YAML-frontmatter control plane (`last_input`, `maker_proposal`, `checker_verdict`, `verified`, `next_action`, and friends) plus the exit-condition standard every loop must declare up front. `auto-research` now writes `research_status.md` in this format each iteration.
|
|
15
|
+
- **Atomic status-file helper** (Proposal 11, E1/B3). New daemon module `claudia_memory/loops/status.py` (`write_status` / `read_status`) writes a status file through a same-directory temp file plus `os.replace`, so an interrupted write never leaves a partial file at the canonical path. Six tests in `memory-daemon/tests/test_loop_status.py` cover round-trip, native-type preservation, parent-dir creation, temp cleanup, and crash-safety.
|
|
16
|
+
- **Self-repair sub-loop** (Proposal 11, E3). When a loop stalls (the Checker fails 3 times running or scores regress), `auto-research` now enters a bounded self-repair pass (`template-v2/.claude/skills/_loop/repair.md`): it diagnoses whether the rubric or a brief is the real problem, proposes one fix, and proves it on the exact failing input before adopting it. Capped at 2 repair attempts; confirmed repairs leave a regression fixture under `~/.claudia/loops/regressions/`; edits to shipped briefs always go to a human approval gate.
|
|
17
|
+
- **`meditate` reviews loop-harness performance** (Proposal 11, E4). End-of-session reflection now reads recent loop status files, summarizes how Claudia's own Maker-Checker loops did (contested iterations, stuck loops, recurring Checker findings), and can propose a harness improvement. Any such proposal passes through the Checker before it is written, and edits to shipped briefs are presented as a diff for approval.
|
|
18
|
+
|
|
19
|
+
## 1.61.1 (2026-05-23)
|
|
20
|
+
|
|
21
|
+
### `claudia` command works from anywhere, and repairs itself
|
|
22
|
+
|
|
23
|
+
A follow-up to 1.61.0's shell command. The installer was storing a relative path, so `claudia` only worked from the install's parent directory. This release fixes the stored path and makes the command self-healing, so the error cannot recur.
|
|
24
|
+
|
|
25
|
+
### Fixed
|
|
26
|
+
|
|
27
|
+
- **`claudia` command failing with "Claudia home directory not found: claudia".** The installer wrote a *relative* path (`claudia`) into `~/.claudia/claudia-home` instead of the absolute install path, because the shell-helper step was handed `targetDir` (relative) rather than the resolved `targetPath`. The `claudia` command then only worked when run from the install's parent directory and failed everywhere else. The installer now always stores the absolute path (via `path.resolve`, which also fixes a separate bug where passing an absolute install path as an argument produced a doubled path like `/home/you/home/you/claudia`).
|
|
28
|
+
- **`claudia` command now self-heals a bad `claudia-home`.** The shell function resolves a relative stored value against `$HOME` (never the current directory), falls back to the default `~/claudia` install when the stored path is missing or stale, and rewrites `claudia-home` with the corrected absolute path so the error cannot recur. Existing installs that already have a bad value are repaired the first time `claudia` runs after upgrading.
|
|
29
|
+
- **`--skip-memory` installs now get the `claudia` command too.** Previously the shell-helper step was skipped entirely on the `--skip-memory` path, leaving those users with no `claudia`/`update-claudia` command. The shell helper is independent of the memory system and now installs on every path.
|
|
30
|
+
|
|
5
31
|
## 1.61.0 (2026-05-22)
|
|
6
32
|
|
|
7
33
|
### Launch from anywhere, upgrade from anywhere
|
|
@@ -19,7 +45,6 @@ This release adds a `claudia` shell command (and `update-claudia`) so you no lon
|
|
|
19
45
|
|
|
20
46
|
- **ASCII greeting logo: row 1 (hair top) realigned over face** (in `template-v2/CLAUDE.md`). The top hair row was indented 6 leading spaces while the face row below it starts at column 0, so the hair appeared shifted ~4 characters to the right. Now sits correctly above the face. Existing installs see the fix on next upgrade.
|
|
21
47
|
- **Installer no longer treats flag-looking arguments as install targets.** Previously, `npx get-claudia --help` (or any unrecognized `--flag`) was interpreted as the install path and would create a `./--help/` directory with templates copied into it. Now: `--help`/`-h` prints a proper usage message and exits, `--version`/`-V` prints the version, and any other leading-dash argument prints `Error: unrecognized argument: <flag>` and exits 1 without touching the filesystem.
|
|
22
|
-
|
|
23
48
|
## 1.60.1 (2026-05-22)
|
|
24
49
|
|
|
25
50
|
### Two bug fixes from the fixing-phase pass
|
package/bin/index.js
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
#!/usr/bin/env node
|
|
2
2
|
|
|
3
3
|
import { existsSync, mkdirSync, cpSync, readdirSync, readFileSync, writeFileSync, statSync, renameSync, unlinkSync, copyFileSync } from 'fs';
|
|
4
|
-
import { join, dirname } from 'path';
|
|
4
|
+
import { join, dirname, resolve } from 'path';
|
|
5
5
|
import { fileURLToPath } from 'url';
|
|
6
6
|
import { spawn, execFileSync } from 'child_process';
|
|
7
7
|
import { homedir } from 'os';
|
|
@@ -828,7 +828,9 @@ async function main() {
|
|
|
828
828
|
// Support "." or "upgrade" for current directory
|
|
829
829
|
const isCurrentDir = arg === '.' || arg === 'upgrade';
|
|
830
830
|
const targetDir = isCurrentDir ? '.' : (arg || 'claudia');
|
|
831
|
-
|
|
831
|
+
// resolve() (not join) so an absolute arg is honored as-is and the result is
|
|
832
|
+
// always absolute (this value is what gets written into ~/.claudia/claudia-home).
|
|
833
|
+
const targetPath = isCurrentDir ? process.cwd() : resolve(process.cwd(), targetDir);
|
|
832
834
|
|
|
833
835
|
// Check if directory already exists with Claudia files
|
|
834
836
|
let isUpgrade = false;
|
|
@@ -969,11 +971,14 @@ async function main() {
|
|
|
969
971
|
}
|
|
970
972
|
renderer.render();
|
|
971
973
|
|
|
972
|
-
//
|
|
973
|
-
|
|
974
|
-
|
|
975
|
-
renderer
|
|
976
|
-
|
|
974
|
+
// The `claudia` shell command is independent of the memory system, so install
|
|
975
|
+
// it here too, otherwise --skip-memory users get no `claudia` command.
|
|
976
|
+
runShellStep(renderer, targetPath, () => {
|
|
977
|
+
runVaultStep(renderer, () => {
|
|
978
|
+
renderer.stopSpinner();
|
|
979
|
+
renderer.render();
|
|
980
|
+
showCompletion(targetDir, isCurrentDir, false, undefined, isUpgrade);
|
|
981
|
+
});
|
|
977
982
|
});
|
|
978
983
|
return;
|
|
979
984
|
}
|
|
@@ -1563,8 +1568,11 @@ async function main() {
|
|
|
1563
1568
|
|
|
1564
1569
|
renderer.stopSpinner();
|
|
1565
1570
|
|
|
1566
|
-
// Shell helper step, then vault, then completion
|
|
1567
|
-
|
|
1571
|
+
// Shell helper step, then vault, then completion.
|
|
1572
|
+
// Pass the absolute targetPath (not the relative targetDir): this value lands
|
|
1573
|
+
// in ~/.claudia/claudia-home, and a relative value there breaks `claudia` from
|
|
1574
|
+
// any directory other than the install's parent.
|
|
1575
|
+
runShellStep(renderer, targetPath, () => {
|
|
1568
1576
|
runVaultStep(renderer, () => {
|
|
1569
1577
|
renderer.render();
|
|
1570
1578
|
showDbScanResults(dbScan);
|
package/bin/shell-init.js
CHANGED
|
@@ -28,17 +28,37 @@ export const SHELL_INIT_CONTENT = `# Claudia shell helpers — sourced from your
|
|
|
28
28
|
|
|
29
29
|
_claudia_home() {
|
|
30
30
|
local home_file="$HOME/.claudia/claudia-home"
|
|
31
|
-
|
|
32
|
-
|
|
33
|
-
|
|
31
|
+
local dir=""
|
|
32
|
+
[ -f "$home_file" ] && dir="$(cat "$home_file" 2>/dev/null)"
|
|
33
|
+
|
|
34
|
+
# A relative value (e.g. an older install that stored "claudia") is anchored to
|
|
35
|
+
# $HOME, never the current directory, so \`claudia\` works from anywhere.
|
|
36
|
+
case "$dir" in
|
|
37
|
+
/*) ;; # already absolute
|
|
38
|
+
"") ;; # empty -> handled by recovery below
|
|
39
|
+
*) dir="$HOME/$dir" ;; # relative -> resolve under $HOME
|
|
40
|
+
esac
|
|
41
|
+
|
|
42
|
+
# Recover from a missing or stale path by falling back to the default install
|
|
43
|
+
# location when it looks like a real Claudia install.
|
|
44
|
+
if [ -z "$dir" ] || [ ! -d "$dir" ]; then
|
|
45
|
+
if [ -d "$HOME/claudia/.claude" ] || [ -f "$HOME/claudia/CLAUDE.md" ]; then
|
|
46
|
+
dir="$HOME/claudia"
|
|
47
|
+
fi
|
|
34
48
|
fi
|
|
35
|
-
|
|
36
|
-
dir
|
|
37
|
-
|
|
38
|
-
echo "
|
|
39
|
-
echo "Fix by editing $home_file" >&2
|
|
49
|
+
|
|
50
|
+
if [ -z "$dir" ] || [ ! -d "$dir" ]; then
|
|
51
|
+
echo "Claudia install not found. Run: npx get-claudia ." >&2
|
|
52
|
+
[ -f "$home_file" ] && echo "(or set the correct path in $home_file)" >&2
|
|
40
53
|
return 1
|
|
41
54
|
fi
|
|
55
|
+
|
|
56
|
+
# Self-heal: persist the corrected absolute path so the error never recurs.
|
|
57
|
+
if [ "$(cat "$home_file" 2>/dev/null)" != "$dir" ]; then
|
|
58
|
+
mkdir -p "$HOME/.claudia" 2>/dev/null
|
|
59
|
+
printf '%s\\n' "$dir" > "$home_file" 2>/dev/null || true
|
|
60
|
+
fi
|
|
61
|
+
|
|
42
62
|
printf '%s' "$dir"
|
|
43
63
|
}
|
|
44
64
|
|
|
@@ -0,0 +1,9 @@
|
|
|
1
|
+
"""Loop-engineering helpers (Proposal 11).
|
|
2
|
+
|
|
3
|
+
Shared infrastructure for Maker-Checker loops: atomic, human-readable status
|
|
4
|
+
files that act as the control plane for both skill-level and daemon-level loops.
|
|
5
|
+
"""
|
|
6
|
+
|
|
7
|
+
from claudia_memory.loops.status import read_status, write_status
|
|
8
|
+
|
|
9
|
+
__all__ = ["read_status", "write_status"]
|
|
@@ -0,0 +1,85 @@
|
|
|
1
|
+
"""Atomic status-file helper for loop engineering (Proposal 11, E1/B3).
|
|
2
|
+
|
|
3
|
+
A status file is Markdown with a YAML frontmatter block. The frontmatter carries
|
|
4
|
+
the structured control fields (loop_id, verified, checker_verdict, next_action,
|
|
5
|
+
...) and the body carries the human-readable narrative:
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
loop_id: consolidation
|
|
9
|
+
verified: true
|
|
10
|
+
next_action: none
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
# Loop status: consolidation
|
|
14
|
+
|
|
15
|
+
Last run consolidated 12 memories; all invariants held.
|
|
16
|
+
|
|
17
|
+
Writes go through a temp file in the same directory followed by ``os.replace``,
|
|
18
|
+
so an interrupted write never leaves a partially-written file at the canonical
|
|
19
|
+
path. A reader sees either the complete previous file or the complete new one.
|
|
20
|
+
"""
|
|
21
|
+
|
|
22
|
+
from __future__ import annotations
|
|
23
|
+
|
|
24
|
+
import os
|
|
25
|
+
from pathlib import Path
|
|
26
|
+
from typing import Any
|
|
27
|
+
|
|
28
|
+
import yaml
|
|
29
|
+
|
|
30
|
+
_DELIM = "---"
|
|
31
|
+
|
|
32
|
+
|
|
33
|
+
def write_status(path: str | os.PathLike, fields: dict[str, Any], body: str = "") -> None:
|
|
34
|
+
"""Atomically write a status file at ``path``.
|
|
35
|
+
|
|
36
|
+
Serializes ``fields`` as YAML frontmatter and appends ``body`` as the
|
|
37
|
+
Markdown body. Creates missing parent directories. If the final rename
|
|
38
|
+
fails, the canonical path is left untouched and the temp file is removed.
|
|
39
|
+
"""
|
|
40
|
+
target = Path(path)
|
|
41
|
+
target.parent.mkdir(parents=True, exist_ok=True)
|
|
42
|
+
|
|
43
|
+
frontmatter = yaml.safe_dump(fields, sort_keys=False, default_flow_style=False).strip()
|
|
44
|
+
content = f"{_DELIM}\n{frontmatter}\n{_DELIM}\n\n{body}"
|
|
45
|
+
|
|
46
|
+
# Temp file lives in the SAME directory as the target so that os.replace is
|
|
47
|
+
# a same-filesystem rename (atomic), not a cross-device copy.
|
|
48
|
+
tmp = target.with_name(f".{target.name}.tmp")
|
|
49
|
+
try:
|
|
50
|
+
with open(tmp, "w", encoding="utf-8") as fh:
|
|
51
|
+
fh.write(content)
|
|
52
|
+
fh.flush()
|
|
53
|
+
os.fsync(fh.fileno())
|
|
54
|
+
os.replace(tmp, target)
|
|
55
|
+
except BaseException:
|
|
56
|
+
# Any failure (including the rename) must leave the canonical path
|
|
57
|
+
# intact and must not litter the directory with a temp file.
|
|
58
|
+
try:
|
|
59
|
+
os.unlink(tmp)
|
|
60
|
+
except OSError:
|
|
61
|
+
pass
|
|
62
|
+
raise
|
|
63
|
+
|
|
64
|
+
|
|
65
|
+
def read_status(path: str | os.PathLike) -> tuple[dict[str, Any], str]:
|
|
66
|
+
"""Read a status file, returning ``(fields, body)``.
|
|
67
|
+
|
|
68
|
+
``fields`` is the parsed frontmatter (empty dict if there is none); ``body``
|
|
69
|
+
is the Markdown body with the single separating blank line removed.
|
|
70
|
+
"""
|
|
71
|
+
text = Path(path).read_text(encoding="utf-8")
|
|
72
|
+
return _split_frontmatter(text)
|
|
73
|
+
|
|
74
|
+
|
|
75
|
+
def _split_frontmatter(text: str) -> tuple[dict[str, Any], str]:
|
|
76
|
+
if not text.startswith(_DELIM):
|
|
77
|
+
return {}, text
|
|
78
|
+
rest = text[len(_DELIM):].lstrip("\n")
|
|
79
|
+
end = rest.find(f"\n{_DELIM}")
|
|
80
|
+
if end == -1:
|
|
81
|
+
return {}, text
|
|
82
|
+
raw = rest[:end]
|
|
83
|
+
body = rest[end + 1 + len(_DELIM):].lstrip("\n")
|
|
84
|
+
fields = yaml.safe_load(raw) or {}
|
|
85
|
+
return fields, body
|
package/package.json
CHANGED
|
@@ -0,0 +1,51 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: loop-checker
|
|
3
|
+
description: Independently scores a loop iteration's output against a rubric and returns a structured verdict. Adversarial by design: finds faults, does not confirm. Used by auto-research and future Maker-Checker loops.
|
|
4
|
+
model: haiku
|
|
5
|
+
dispatch-category: verification
|
|
6
|
+
dispatch-tier: task
|
|
7
|
+
auto-dispatch: false
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
# Loop Checker
|
|
11
|
+
|
|
12
|
+
You are Claudia's Loop Checker. When a loop produces an iteration, Claudia
|
|
13
|
+
dispatches you to score it independently of whoever produced it. You are the
|
|
14
|
+
"Checker" half of the Maker-Checker pattern (Proposal 11).
|
|
15
|
+
|
|
16
|
+
Your brief is the Checker role brief at `.claude/skills/_loop/checker.md`. Follow
|
|
17
|
+
it exactly. The short version:
|
|
18
|
+
|
|
19
|
+
- You receive a `rubric`, an `artifact` to score, and the `proposed_change` the
|
|
20
|
+
Maker made (with the Maker's expected effect, if any).
|
|
21
|
+
- Score the artifact against each rubric dimension. Sum to a total.
|
|
22
|
+
- Hunt for faults: unmet constraints, regressions the change introduced,
|
|
23
|
+
dimensions the Maker over-claimed.
|
|
24
|
+
- Decide `verified`. When uncertain, default to `verified: false`: a false pass
|
|
25
|
+
costs more than one more iteration.
|
|
26
|
+
- Do not be swayed by the Maker's self-score. Score independently.
|
|
27
|
+
|
|
28
|
+
## Output
|
|
29
|
+
|
|
30
|
+
Return exactly the verdict JSON defined in the Checker role brief:
|
|
31
|
+
|
|
32
|
+
```json
|
|
33
|
+
{
|
|
34
|
+
"verified": false,
|
|
35
|
+
"score": 7.2,
|
|
36
|
+
"max_score": 10,
|
|
37
|
+
"issues": [
|
|
38
|
+
{"severity": "major", "issue": "the ask is buried in paragraph 3", "where": "body"}
|
|
39
|
+
],
|
|
40
|
+
"rationale": "One or two sentences, grounded in the rubric.",
|
|
41
|
+
"hard_constraint_violated": false
|
|
42
|
+
}
|
|
43
|
+
```
|
|
44
|
+
|
|
45
|
+
## Constraints
|
|
46
|
+
|
|
47
|
+
- Do NOT edit the artifact. You score and report; the Maker edits.
|
|
48
|
+
- Do NOT decide keep or revert. You return a verdict; the loop applies the rule.
|
|
49
|
+
- Do NOT soften findings to be agreeable. Disagreement with the Maker is the
|
|
50
|
+
entire reason you exist as a separate agent.
|
|
51
|
+
- Be concrete: every issue names a severity, what is wrong, and where.
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
|
-
"version": "1.
|
|
3
|
-
"generated": "2026-
|
|
2
|
+
"version": "1.62.0",
|
|
3
|
+
"generated": "2026-06-14T02:37:03.060Z",
|
|
4
4
|
"algorithm": "sha256",
|
|
5
5
|
"files": {
|
|
6
6
|
".claude/rules/claudia-principles.md": "939e9720421628e7f2e4c8dfbaa4aeb9c1e18e8c6a5379cd6b772a6835b812e5",
|
|
@@ -10,6 +10,10 @@
|
|
|
10
10
|
".claude/rules/shell-compatibility.md": "565977bc04e269b3ce7d8a7963173df4f44bb9634692ea76abb1b64f6a67513e",
|
|
11
11
|
".claude/rules/trust-north-star.md": "0188b17c26b791cf597ce975bb75d40f543aa3d6b84b7edb1309b78530b3d43f",
|
|
12
12
|
".claude/skills/README.md": "22311215317c61dcb67975af02bdd1d7164578713ff9faf6fbed8aec14256bcd",
|
|
13
|
+
".claude/skills/_loop/README.md": "13d44b9d2f88be477a71773a1dbbe8a3270bdda39a16785beaeff92abaeb019e",
|
|
14
|
+
".claude/skills/_loop/checker.md": "dcec7e38330244973020050bb95e76853ff5f05ebf4ca82b6b442d2eb26f7832",
|
|
15
|
+
".claude/skills/_loop/maker.md": "eadc6c2c59f3e43f4b029c3d40ce3af29cd746da219c132da0888d31a0743e01",
|
|
16
|
+
".claude/skills/_loop/repair.md": "675704a1aa1eb8463dac6270bd02ea34099674a9012e1b65924ef6923b384d5b",
|
|
13
17
|
".claude/skills/agent-dispatcher.md": "b48fc5283f0a88d1ebbc692b30b6c326fef012d4168dcf721a94e3c954b76b90",
|
|
14
18
|
".claude/skills/archetypes/_base-structure.md": "c0d8df77c07aa48cd9ba9c5ba3eddb533fcea790c7f98440fa3d31405fd5c75d",
|
|
15
19
|
".claude/skills/archetypes/consultant.md": "1e0ccbf89115f92a1fa0c86b48edcce22668c4bb828ba4268376fcd6b5c05680",
|
|
@@ -17,7 +21,7 @@
|
|
|
17
21
|
".claude/skills/archetypes/executive.md": "0d43c5c2c6315f5a4d6d36150778b55229cc922e0491a80af54824d87ac52e82",
|
|
18
22
|
".claude/skills/archetypes/founder.md": "a5bf3f22439c7c5f554c13966899a4af4de80f90c8f6a25eb62be1c68de6d54f",
|
|
19
23
|
".claude/skills/archetypes/solo.md": "cc26332b96394775e62a0f1a06f32093be6147bb75fa3783aa271e554d323c6b",
|
|
20
|
-
".claude/skills/auto-research/SKILL.md": "
|
|
24
|
+
".claude/skills/auto-research/SKILL.md": "2f7f8a7d0731813921094b0117026693314f0f59e8d9b52637bd3e775a2e95f8",
|
|
21
25
|
".claude/skills/auto-research/references/program-template.md": "850457de96bde5ad65d24f1018509da94dd31fec9ec45c15087031f470a28dbc",
|
|
22
26
|
".claude/skills/auto-research/references/safety-rules.md": "ae036270e86949ed0f9bface582f6038276adafd3f40a29aa733698111a26683",
|
|
23
27
|
".claude/skills/brain/SKILL.md": "b87dd93c85923c676292bea67f7a0580ee71eea2bea05426ea6eda951a140d5c",
|
|
@@ -46,7 +50,7 @@
|
|
|
46
50
|
".claude/skills/ingest-sources/references/extraction-patterns.md": "28d6dc604c4a11eacfe4523e705ad430ba8fe26632bafcccaef656e9618e5aea",
|
|
47
51
|
".claude/skills/judgment-awareness.md": "0077129a87f91d295515108f3f9a68ca6874c6e38fc7f2a73a462704fa00ba2d",
|
|
48
52
|
".claude/skills/map-connections/SKILL.md": "e8133d5e3de36e32e3858a1ee3918a577855e77ade022933097f87a0100cc9a3",
|
|
49
|
-
".claude/skills/meditate/SKILL.md": "
|
|
53
|
+
".claude/skills/meditate/SKILL.md": "1d4eab90fdedd652f75750df1a1c3be885fbeefb89d5d99979324e395eb6fd83",
|
|
50
54
|
".claude/skills/meditate/evals/basic.yaml": "daba441b2fd9d1d4afddcff6eaa9673884198b1d0f85d9d06edbe0738012291a",
|
|
51
55
|
".claude/skills/meeting-prep/SKILL.md": "cb0b2afc41052a0652e7de30d09e71ab489162fbb09c648e3c95cec2d438ba63",
|
|
52
56
|
".claude/skills/memory-audit/SKILL.md": "7daed2ec128ea70171b8e3ffaf4d933ca69db2637b1487f69724e3428c04663a",
|
|
@@ -0,0 +1,33 @@
|
|
|
1
|
+
# _loop: shared loop-engineering resources
|
|
2
|
+
|
|
3
|
+
This directory is **not a skill**. It has no `SKILL.md` and is not listed in
|
|
4
|
+
`skill-index.json`, so the skill router never loads it directly. It holds the
|
|
5
|
+
versioned Maker and Checker prompt templates that skill-level loops reuse, so the
|
|
6
|
+
Maker-Checker pattern is defined once and shared.
|
|
7
|
+
|
|
8
|
+
Consumers today:
|
|
9
|
+
|
|
10
|
+
- `auto-research` (Proposal 11, E2) uses `checker.md` to score each iteration
|
|
11
|
+
independently of the Maker.
|
|
12
|
+
- `build-team` (Proposal 11, E6, not yet built) will reuse both templates.
|
|
13
|
+
|
|
14
|
+
The underscore prefix marks this as an internal resource, not a user-facing
|
|
15
|
+
capability. See `docs/loop-status-schema.md` for the status-file contract these
|
|
16
|
+
loops write, and `docs/proposals/11-autonomy-personalization-layer.md` for the
|
|
17
|
+
design.
|
|
18
|
+
|
|
19
|
+
## Files
|
|
20
|
+
|
|
21
|
+
| File | Role |
|
|
22
|
+
|------|------|
|
|
23
|
+
| `maker.md` | The producer prompt: makes one bounded change per iteration. |
|
|
24
|
+
| `checker.md` | The independent verifier prompt: scores adversarially, returns a structured verdict. |
|
|
25
|
+
| `repair.md` | The self-repair sub-loop: when a loop stalls, diagnose and fix the harness, proven on the exact failing input. |
|
|
26
|
+
|
|
27
|
+
## Why Maker and Checker are separate
|
|
28
|
+
|
|
29
|
+
A single agent grading its own work has a self-justification bias: it tends to
|
|
30
|
+
rate what it just produced as good. Splitting the roles, and giving the Checker
|
|
31
|
+
an adversarial brief (find faults, do not confirm), makes a passing verdict
|
|
32
|
+
actually mean something. The Checker runs on a cheaper model tier (Haiku) so the
|
|
33
|
+
verification pass is fast and inexpensive.
|
|
@@ -0,0 +1,64 @@
|
|
|
1
|
+
# Checker role brief (Proposal 11, E1/B2)
|
|
2
|
+
|
|
3
|
+
The Checker verifies work the Maker produced. It is dispatched as a separate
|
|
4
|
+
agent (`loop-checker`, Haiku) so it reasons independently of the Maker and costs
|
|
5
|
+
little to run. Its brief is adversarial: **find faults, do not confirm.** A
|
|
6
|
+
passing verdict from a Checker told to look for problems is worth something; a
|
|
7
|
+
rubber stamp is not.
|
|
8
|
+
|
|
9
|
+
## What the Checker receives
|
|
10
|
+
|
|
11
|
+
The dispatching skill passes:
|
|
12
|
+
|
|
13
|
+
- `rubric`: the scoring criteria (from the loop's program/rubric file).
|
|
14
|
+
- `artifact`: the current output to score (the edited draft, the proposed team,
|
|
15
|
+
the consolidation result).
|
|
16
|
+
- `proposed_change`: the one change the Maker made this iteration, and the
|
|
17
|
+
Maker's expected effect (the self-score), if any.
|
|
18
|
+
|
|
19
|
+
## How the Checker scores
|
|
20
|
+
|
|
21
|
+
1. Score the artifact against each rubric dimension on its stated scale. Sum to a
|
|
22
|
+
total `score`.
|
|
23
|
+
2. Actively hunt for what is wrong or weak: unmet constraints, regressions the
|
|
24
|
+
change introduced, dimensions the Maker over-claimed, anything the rubric
|
|
25
|
+
penalizes.
|
|
26
|
+
3. Decide `verified`: did this artifact meet the rubric's bar (and not violate
|
|
27
|
+
any hard constraint)? When uncertain, default to `verified: false`. The cost
|
|
28
|
+
of a false pass is higher than the cost of one more iteration.
|
|
29
|
+
4. Do not be swayed by the Maker's self-score. Score independently first, then
|
|
30
|
+
you may note divergence.
|
|
31
|
+
|
|
32
|
+
## Output contract (return exactly this JSON)
|
|
33
|
+
|
|
34
|
+
```json
|
|
35
|
+
{
|
|
36
|
+
"verified": false,
|
|
37
|
+
"score": 7.2,
|
|
38
|
+
"max_score": 10,
|
|
39
|
+
"issues": [
|
|
40
|
+
{"severity": "major", "issue": "the ask is buried in paragraph 3", "where": "body"},
|
|
41
|
+
{"severity": "minor", "issue": "two sentences exceed the length cap", "where": "closing"}
|
|
42
|
+
],
|
|
43
|
+
"rationale": "Lede improved, but the rubric weights ask-clarity heavily and the ask is still not in the opening.",
|
|
44
|
+
"hard_constraint_violated": false
|
|
45
|
+
}
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
- `verified` (bool): does the artifact clear the bar this iteration?
|
|
49
|
+
- `score` (float): total across rubric dimensions.
|
|
50
|
+
- `max_score` (float): the maximum the rubric allows, so the caller can normalize.
|
|
51
|
+
- `issues` (array): concrete, located faults. Empty only when there genuinely are
|
|
52
|
+
none. Each has `severity` (`major` | `minor`), `issue`, and `where`.
|
|
53
|
+
- `rationale` (string): one or two sentences. Why this verdict, grounded in the
|
|
54
|
+
rubric.
|
|
55
|
+
- `hard_constraint_violated` (bool): true if any non-negotiable constraint failed
|
|
56
|
+
(which forces `verified: false` regardless of score).
|
|
57
|
+
|
|
58
|
+
## What the Checker does not do
|
|
59
|
+
|
|
60
|
+
- It does not edit the artifact. It scores and reports; the Maker edits.
|
|
61
|
+
- It does not decide keep/revert. It returns a verdict; the loop applies the
|
|
62
|
+
rule (keep if the score beats the running best and no hard constraint failed).
|
|
63
|
+
- It does not soften findings to be agreeable. Disagreement with the Maker is the
|
|
64
|
+
whole point.
|
|
@@ -0,0 +1,39 @@
|
|
|
1
|
+
# Maker role brief (Proposal 11, E1/B2)
|
|
2
|
+
|
|
3
|
+
The Maker produces work. In a Claudia loop, the Maker is Claudia herself (the
|
|
4
|
+
main model), not a dispatched subagent. This brief defines how the Maker behaves
|
|
5
|
+
inside any Maker-Checker loop, so the producer role is consistent across
|
|
6
|
+
`auto-research`, `build-team`, and future loops.
|
|
7
|
+
|
|
8
|
+
## Contract
|
|
9
|
+
|
|
10
|
+
Each iteration, the Maker does exactly one thing:
|
|
11
|
+
|
|
12
|
+
1. **Read the state.** The program/rubric, the current artifact, and the last
|
|
13
|
+
few rows of history (what was tried, what was kept, what was reverted).
|
|
14
|
+
2. **Propose ONE change.** One specific, bounded edit, justified in a single
|
|
15
|
+
sentence. Not a rewrite. Not a refactor. Not three changes bundled together.
|
|
16
|
+
One lever, pulled deliberately, so the Checker can attribute the score delta
|
|
17
|
+
to it.
|
|
18
|
+
3. **Apply it** to the working copy (never the user's original file during the
|
|
19
|
+
loop).
|
|
20
|
+
4. **Hand off to the Checker.** The Maker does not score its own work for the
|
|
21
|
+
keep/revert decision. It may state an expected effect ("this should raise the
|
|
22
|
+
lede dimension"), which becomes the Maker self-score the Checker's verdict is
|
|
23
|
+
compared against, but the Checker's score governs.
|
|
24
|
+
|
|
25
|
+
## Discipline
|
|
26
|
+
|
|
27
|
+
- One change per iteration. Bundling changes destroys the signal.
|
|
28
|
+
- Justify in one sentence. If the justification needs a paragraph, the change is
|
|
29
|
+
too big; split it.
|
|
30
|
+
- Stay inside the bounds declared up front (see the exit-condition standard in
|
|
31
|
+
`docs/loop-status-schema.md`).
|
|
32
|
+
- Honest self-assessment. Do not inflate the expected effect to pre-empt the
|
|
33
|
+
Checker. The point of the split is that the Checker is not you.
|
|
34
|
+
|
|
35
|
+
## Why the Maker does not grade itself
|
|
36
|
+
|
|
37
|
+
A model that grades its own output rates it generously: it just argued itself
|
|
38
|
+
into producing exactly that. The keep/revert decision therefore belongs to an
|
|
39
|
+
independent Checker (see `checker.md`). The Maker proposes; the Checker disposes.
|
|
@@ -0,0 +1,85 @@
|
|
|
1
|
+
# Self-repair brief (Proposal 11, E3)
|
|
2
|
+
|
|
3
|
+
When a Maker-Checker loop keeps failing, the problem may not be the artifact: it
|
|
4
|
+
may be the harness (the rubric, the Maker brief, or the Checker brief). The
|
|
5
|
+
self-repair sub-loop diagnoses that, proposes a fix, and proves the fix on the
|
|
6
|
+
exact input that failed before adopting it. It is itself bounded, and it never
|
|
7
|
+
auto-edits shipped briefs or the user's live files.
|
|
8
|
+
|
|
9
|
+
This brief is shared. `auto-research` uses it today; future loops reuse it.
|
|
10
|
+
|
|
11
|
+
## B1: When self-repair triggers
|
|
12
|
+
|
|
13
|
+
Enter the self-repair sub-loop when any of these holds during a loop:
|
|
14
|
+
|
|
15
|
+
- The Checker returns `verified: false` for **3 iterations in a row**.
|
|
16
|
+
- The Checker's `score` **regresses** across 3 consecutive iterations (each lower
|
|
17
|
+
than the last).
|
|
18
|
+
- An iteration raises an **exception** the loop cannot attribute to the artifact
|
|
19
|
+
(for example, the Checker returns malformed output twice).
|
|
20
|
+
|
|
21
|
+
These are signals that hill-climbing has stalled for a structural reason, not
|
|
22
|
+
that one more edit will help.
|
|
23
|
+
|
|
24
|
+
## B2: The repair sub-loop
|
|
25
|
+
|
|
26
|
+
Self-repair is itself a small Maker-Checker loop, run on the harness instead of
|
|
27
|
+
the artifact:
|
|
28
|
+
|
|
29
|
+
1. **Diagnose (Maker).** Read the last few status rows and the Checker's `issues`.
|
|
30
|
+
State, in one or two sentences, the single most likely harness cause: an
|
|
31
|
+
ambiguous rubric dimension, a Maker brief that permits bundled changes, a
|
|
32
|
+
Checker brief that scores the wrong thing, a missing hard constraint.
|
|
33
|
+
2. **Propose one fix (Maker).** One concrete edit to exactly one harness piece
|
|
34
|
+
(the rubric in `program.md`, `maker.md`, or `checker.md`). Not a rewrite. One
|
|
35
|
+
lever.
|
|
36
|
+
3. **Validate on the exact failing input (Checker).** Re-run the failing
|
|
37
|
+
iteration's input through the loop **with the proposed fix applied in a scratch
|
|
38
|
+
copy**, and dispatch the `loop-checker`. The fix is adopted only if that exact
|
|
39
|
+
previously-failing case now reaches `verified: true` (or scores materially
|
|
40
|
+
higher) and no previously-passing regression fixture breaks.
|
|
41
|
+
4. **Adopt or discard.** If it passes, apply the fix (see B4 for where).
|
|
42
|
+
If it does not, discard it and either try one more diagnosis (within the B4
|
|
43
|
+
bound) or stop and hand the loop back to the user with the diagnosis.
|
|
44
|
+
|
|
45
|
+
The non-negotiable rule is step 3: a fix is never adopted on a hunch. It must be
|
|
46
|
+
proven on the input that exposed the problem.
|
|
47
|
+
|
|
48
|
+
## B3: Regression capture
|
|
49
|
+
|
|
50
|
+
Every confirmed repair leaves a regression behind so the same failure cannot
|
|
51
|
+
silently return:
|
|
52
|
+
|
|
53
|
+
- Save the failing input plus the verdict it should now produce as a fixture
|
|
54
|
+
under `~/.claudia/loops/regressions/<loop-id>/<short-slug>.md` (a status file in
|
|
55
|
+
the standard schema, with the expected `verified`/`score`).
|
|
56
|
+
- On future runs of that loop, replay the loop-id's regression fixtures through
|
|
57
|
+
the Checker first. If any regression breaks, the loop refuses to start and
|
|
58
|
+
reports which fix regressed.
|
|
59
|
+
|
|
60
|
+
Shipped-skill regressions (ones that should travel with the template, not just
|
|
61
|
+
the user's machine) are proposed to the user for inclusion as in-repo test
|
|
62
|
+
fixtures instead.
|
|
63
|
+
|
|
64
|
+
## B4: Bound and safety
|
|
65
|
+
|
|
66
|
+
Self-repair obeys the same bounded-autonomy rules as every loop:
|
|
67
|
+
|
|
68
|
+
- **Bounded.** At most **2 repair attempts** per loop run. After that, stop and
|
|
69
|
+
hand back to the user with the diagnosis. Self-repair never loops forever.
|
|
70
|
+
- **Workspace-only for artifacts.** The repair sub-loop validates in a scratch
|
|
71
|
+
copy and never touches the user's original file.
|
|
72
|
+
- **Never auto-edit shipped briefs.** A fix to `maker.md` or `checker.md` (files
|
|
73
|
+
that ship in the template) is **proposed as a diff for the user to approve**,
|
|
74
|
+
never written silently. A fix to a run-local `program.md` rubric may be applied
|
|
75
|
+
within the run, since that rubric is the user's own per-run governance file.
|
|
76
|
+
- **Logged.** Each repair attempt appends to the loop's status file: what was
|
|
77
|
+
diagnosed, what was changed, and whether the exact failing input passed after
|
|
78
|
+
the change.
|
|
79
|
+
|
|
80
|
+
## What self-repair is not
|
|
81
|
+
|
|
82
|
+
- It is not a license to rewrite the harness whenever a loop is hard. It triggers
|
|
83
|
+
only on the B1 signals, and it changes one thing at a time.
|
|
84
|
+
- It is not autonomous prompt-editing of shipped files. Those changes always end
|
|
85
|
+
at a human approval gate.
|
|
@@ -14,11 +14,11 @@ A hill-climber for any local artifact with a rubric. Inspired by [Karpathy's aut
|
|
|
14
14
|
Three primitives, one governance file:
|
|
15
15
|
|
|
16
16
|
1. **The artifact** — what gets iterated on (a draft email, a brief, a wiki page, a one-pager). Lives in the workspace, never touches the user's original file during the loop.
|
|
17
|
-
2. **The evaluator** — a scalar rubric
|
|
17
|
+
2. **The evaluator** — a scalar rubric. Plain English. An independent Checker (the `loop-checker` agent, Haiku) scores each iteration against it, not Claudia herself. The rubric must produce a number for the original artifact before the loop starts.
|
|
18
18
|
3. **The budget** — iteration count (default 20). Loop stops when budget is exhausted or score plateaus.
|
|
19
19
|
4. **The program** (governance file) — what the user wants, what's off-limits, what counts as "better." Claudia helps the user write this if they don't volunteer it.
|
|
20
20
|
|
|
21
|
-
Loop: read program + artifact + results history
|
|
21
|
+
Loop: read program + artifact + results history, Claudia (the Maker) proposes one specific change, implements it, then dispatches the `loop-checker` (the Checker) to score it independently. If the Checker's score beats the running best, ratchet (commit); if worse, revert (git reset). Write the status file. Report each iteration as a one-line update. Repeat until budget is hit.
|
|
22
22
|
|
|
23
23
|
## When to invoke this skill
|
|
24
24
|
|
|
@@ -53,7 +53,8 @@ Each run gets its own workspace at `~/.claudia/auto-research/<task-id>/`:
|
|
|
53
53
|
├── program.md the brief (user-authored with Claudia's help)
|
|
54
54
|
├── artifact.md (or .txt, the working copy that gets edited)
|
|
55
55
|
├── original.md immutable copy of the input, for reference and diff
|
|
56
|
-
├── results.tsv one row per iteration: timestamp, score, kept/reverted, change-summary
|
|
56
|
+
├── results.tsv one row per iteration: timestamp, score, kept/reverted/contested, change-summary
|
|
57
|
+
├── research_status.md loop control plane (see docs/loop-status-schema.md): iteration, verified, score, checker_verdict, next_action
|
|
57
58
|
├── best.md symlink (or copy on Windows) to the highest-scoring version
|
|
58
59
|
└── iterations/
|
|
59
60
|
├── 01/artifact.md
|
|
@@ -111,16 +112,50 @@ For each iteration N:
|
|
|
111
112
|
1. **Read the state.** Read `program.md`, `artifact.md`, last 3 rows of `results.tsv`.
|
|
112
113
|
2. **Propose one change.** One specific edit, justified in one sentence. Not a rewrite. Not a refactor. One change.
|
|
113
114
|
3. **Implement.** Apply the edit to `artifact.md`. Save.
|
|
114
|
-
4. **Score.**
|
|
115
|
-
5. **Decide.** If
|
|
116
|
-
6. **
|
|
117
|
-
7. **
|
|
115
|
+
4. **Score (independent Checker).** Dispatch the `loop-checker` agent (Haiku) with the rubric from `program.md`, the new `artifact.md`, and the one change you just made plus your expected effect (your Maker self-score). The Checker returns a verdict JSON: `verified`, `score`, `max_score`, `issues`, `rationale`, `hard_constraint_violated`. You do NOT score it yourself; the Checker's `score` governs the decision. (See `.claude/skills/_loop/checker.md`.)
|
|
116
|
+
5. **Decide.** If the Checker's `score` > best score so far and `hard_constraint_violated` is false: ratchet. Copy `artifact.md` to `iterations/NN/`, update `best.md` to point at the new winner, append a row to `results.tsv` marked `kept`. Otherwise: revert. Copy `iterations/(best_iter)/artifact.md` back to `artifact.md`, append a row marked `reverted`.
|
|
117
|
+
6. **Flag divergence.** Compare the Checker's `score` against your Maker self-score. If they diverge by more than 1.5 points on a 10-point scale, mark the iteration `contested` in `results.tsv`. The Checker's verdict still governs; the flag just surfaces the disagreement in the end-of-run summary.
|
|
118
|
+
7. **Write the status file.** Atomically update `research_status.md` with `iteration`, `verified`, `score`, `budget_remaining`, `last_input`, `maker_proposal`, `checker_verdict` (the Checker's rationale), and `next_action`. Write to a temp sibling and rename; never edit it in place. (See `docs/loop-status-schema.md`.)
|
|
119
|
+
8. **Report.** One line to the user: `iter N: score 7.4 (kept), change: tightened lede to one sentence`. Append `[contested]` when flagged.
|
|
120
|
+
9. **Check stop conditions and self-repair.** If the Checker has returned `verified: false` 3 times in a row, or its score has regressed for 3 straight iterations, enter the self-repair sub-loop (see `.claude/skills/_loop/repair.md`) before giving up: it diagnoses whether the rubric or a brief is the real problem, proposes one fix, and proves it on the exact failing input. If budget exhausted: stop. If score plateaued (no improvement for 5 iterations): stop. Otherwise: next iteration.
|
|
121
|
+
|
|
122
|
+
## The independent Checker (Maker-Checker)
|
|
123
|
+
|
|
124
|
+
This loop separates the producer from the verifier (Proposal 11, E2). Claudia is
|
|
125
|
+
the **Maker**: she reads the state and proposes one change. The **Checker** is a
|
|
126
|
+
separate agent (`loop-checker`, Haiku) that scores the result independently. The
|
|
127
|
+
two never collapse into one model grading its own work, which is the bias the
|
|
128
|
+
split exists to remove.
|
|
129
|
+
|
|
130
|
+
- The Checker's `score`, not Claudia's, drives keep/revert. Claudia's expected
|
|
131
|
+
effect is only a self-score, used to detect disagreement.
|
|
132
|
+
- The Checker is adversarial by brief: it hunts for faults and defaults to
|
|
133
|
+
`verified: false` when uncertain. A passing verdict therefore means something.
|
|
134
|
+
- A `hard_constraint_violated: true` verdict forces a revert regardless of score.
|
|
135
|
+
|
|
136
|
+
**Contested iterations.** When the Checker's score and Claudia's self-score
|
|
137
|
+
diverge by more than 1.5 points (10-point scale), the iteration is marked
|
|
138
|
+
`contested`. The Checker still governs the decision; the flag makes the
|
|
139
|
+
disagreement visible. The end-of-run summary reports how many iterations were
|
|
140
|
+
contested, which is a useful signal that the rubric is ambiguous or that Claudia
|
|
141
|
+
was over-optimistic.
|
|
142
|
+
|
|
143
|
+
**Cost.** The Checker runs on Haiku, so the verification pass is cheap and fast.
|
|
144
|
+
With the default 20-iteration budget, that is at most 21 Checker dispatches
|
|
145
|
+
(including the baseline), each scoped to a single small artifact.
|
|
146
|
+
|
|
147
|
+
**Self-repair.** If the loop stalls (the Checker fails 3 times running, or scores
|
|
148
|
+
regress), `auto-research` enters a bounded self-repair sub-loop that treats the
|
|
149
|
+
harness, not the artifact, as the suspect: it diagnoses the rubric or a brief,
|
|
150
|
+
proposes one fix, and proves it on the exact failing input before adopting it.
|
|
151
|
+
See `.claude/skills/_loop/repair.md`. It is capped at 2 repair attempts and never
|
|
152
|
+
auto-edits shipped briefs (those changes go to a human approval gate).
|
|
118
153
|
|
|
119
154
|
## Safety rules (mandatory, follow without exception)
|
|
120
155
|
|
|
121
156
|
1. **Workspace-only edits.** The loop never writes to a file outside `~/.claudia/auto-research/<task-id>/`. The user's original file at `/Users/.../wherever.md` is untouched until the user explicitly approves the final hand-off.
|
|
122
157
|
2. **No external actions during the loop.** No sending, no posting, no scheduling, no calling tools that affect the outside world. The loop is a closed system.
|
|
123
|
-
3. **Baseline-score gate.** Before iteration 1, score the original artifact and write it to `results.tsv` as `iter 0: score X (baseline)
|
|
158
|
+
3. **Baseline-score gate.** Before iteration 1, dispatch the `loop-checker` to score the original artifact and write it to `results.tsv` as `iter 0: score X (baseline)` and to `research_status.md` as iteration 0. If the Checker cannot produce a number for the original, the loop does not start; Claudia tells the user the rubric needs to be more specific.
|
|
124
159
|
4. **Bounded budget.** Default 20 iterations. User can set higher (50 max) or lower. Plateau detection (no improvement for 5 in a row) stops early. The loop is not allowed to be "open-ended" the way Karpathy's autoresearch is; Claudia's safety principles require an end.
|
|
125
160
|
5. **Per-iteration revertability.** Each iteration is a git commit inside the workspace. Workspace is a fresh git repo (`git init` on start, `git commit -am` per iteration). At any point: `git log` shows the history, `git checkout` rolls back to any prior iteration.
|
|
126
161
|
6. **User interrupt at any time.** If the user says "stop", "pause", "show me what you have", or otherwise interrupts: stop the loop immediately. The workspace persists. The best version so far is in `best.md`. The user can resume by saying "continue" (loop picks up from the current best).
|
|
@@ -178,10 +213,13 @@ When the user gives a bad rubric, help them turn it into a good one before start
|
|
|
178
213
|
- `wiki` for iterating wiki pages specifically (compress, sharpen, restructure)
|
|
179
214
|
- `draft-reply`, `follow-up-draft` for one-shot draft generation
|
|
180
215
|
- `summarize-doc` for one-pass summarization
|
|
216
|
+
- `.claude/skills/_loop/` for the shared Maker and Checker role briefs
|
|
217
|
+
- `.claude/agents/loop-checker.md` for the Checker agent definition
|
|
218
|
+
- `docs/loop-status-schema.md` for the `research_status.md` control-plane format
|
|
181
219
|
|
|
182
220
|
## Open questions for future versions
|
|
183
221
|
|
|
184
|
-
- Shell-command evaluators (run `node test.js my-draft.md`, take the number it prints) are deferred. Today,
|
|
222
|
+
- Shell-command evaluators (run `node test.js my-draft.md`, take the number it prints) are deferred. Today, the `loop-checker` agent scores against the rubric in `program.md`.
|
|
185
223
|
- Token-spend budget and wall-clock budget are deferred. Today, only iteration count is supported.
|
|
186
224
|
- Parallel branches (try 3 different changes in parallel, keep the winner) are deferred. Today, sequential only.
|
|
187
225
|
- Wiki-page iteration as a first-class use case (apply auto-research to a wiki page until it's < N words while citing all sources) is deferred. The skill supports this in principle today, but a worked example doesn't ship until the wiki has earned its keep.
|
|
@@ -42,6 +42,7 @@ Silently retrieve:
|
|
|
42
42
|
|
|
43
43
|
- Call the `memory_reflections` MCP tool to see what already exists
|
|
44
44
|
- Call the `memory_session_context` MCP tool for recent context (if available)
|
|
45
|
+
- Read recent loop status files (`~/.claudia/loops/*_status.md`, and any `research_status.md` from `auto-research` runs this session) to see how Claudia's own Maker-Checker loops performed: which iterations were `contested`, which loops never reached `verified`, where budgets ran out. This feeds the harness review in Step 2b.
|
|
45
46
|
|
|
46
47
|
### Step 2: Generate Reflections
|
|
47
48
|
|
|
@@ -78,6 +79,18 @@ Review the session and identify 1-3 reflections. Ask yourself:
|
|
|
78
79
|
|
|
79
80
|
**Quality over quantity.** One genuine insight beats three generic observations.
|
|
80
81
|
|
|
82
|
+
### Step 2b: Loop harness review (only if loops ran this session)
|
|
83
|
+
|
|
84
|
+
If any Maker-Checker loop ran this session (`auto-research`, or a wrapped daemon job), review how the harness itself performed, separately from reflections about the user. Read the signal from the status files gathered in Step 1, not from memory:
|
|
85
|
+
|
|
86
|
+
- **Contested iterations.** Did the Maker (Claudia) and the Checker disagree often? Frequent `contested` flags usually mean the rubric is ambiguous or Claudia was over-optimistic, not that the Checker is wrong.
|
|
87
|
+
- **Stuck loops.** Did a loop burn its whole budget without reaching `verified`? That points at the rubric, the Maker brief, or the Checker brief needing sharpening.
|
|
88
|
+
- **Recurring failure shape.** Across runs, is the same kind of issue showing up in the Checker's `issues` list?
|
|
89
|
+
|
|
90
|
+
If a clear, repeatable harness problem emerges, draft a **harness-improvement proposal**: a concrete edit to a rubric, the Maker brief (`.claude/skills/_loop/maker.md`), the Checker brief (`.claude/skills/_loop/checker.md`), or a new judgment rule about how to run loops. Do not apply it yet. Harness proposals go through the Checker gate in Step 5.
|
|
91
|
+
|
|
92
|
+
Most sessions produce no harness proposal. Raise one only when the evidence is in the status files.
|
|
93
|
+
|
|
81
94
|
### Step 3: Present for Approval
|
|
82
95
|
|
|
83
96
|
Format reflections clearly and ask for approval:
|
|
@@ -171,6 +184,12 @@ Call the `memory_end_session` MCP tool with:
|
|
|
171
184
|
5. Set `source` to `meditate/YYYY-MM-DD` (today's date)
|
|
172
185
|
6. Write the updated file
|
|
173
186
|
|
|
187
|
+
**Harness proposals go through the Checker before anything is written** (Proposal 11, E4). If Step 2b produced a harness-improvement proposal (a rubric edit, a Maker or Checker brief edit, or a loop-running judgment rule), validate it first:
|
|
188
|
+
|
|
189
|
+
1. Dispatch the `loop-checker` agent with the proposal as the artifact and a one-line rubric: "does this change actually fix the observed failure without breaking the brief's contract?"
|
|
190
|
+
2. Write the proposal only if the Checker returns `verified: true`. A proposal the Checker rejects is surfaced to the user as an open question, not saved.
|
|
191
|
+
3. Edits to shipped briefs (`_loop/maker.md`, `_loop/checker.md`) are never auto-applied. Present them as a diff for the user to approve, like any other change to Claudia's own files. Loop-running judgment rules go into `context/judgment.yaml` like any other rule, after passing the Checker.
|
|
192
|
+
|
|
174
193
|
Confirm storage with both reflections and rules:
|
|
175
194
|
"Got it. I've saved [N] reflection(s) and added [M] judgment rule(s) to your judgment file. I'll apply them going forward. See you next time."
|
|
176
195
|
|
|
Binary file
|
|
Binary file
|