get-claudia 1.61.0 → 1.62.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,32 @@
2
2
 
3
3
  All notable changes to Claudia will be documented in this file.
4
4
 
5
+ ## 1.62.0 (2026-06-13)
6
+
7
+ ### Loop engineering foundation (Proposal 11, Phases 1-2)
8
+
9
+ The first two phases of the Autonomy & Personalization Layer: the Maker-Checker pattern on the existing `auto-research` loop, a bounded self-repair sub-loop, the shared loop infrastructure later loops reuse, and a `meditate` feed that reviews how the loops themselves performed. See `docs/proposals/11-autonomy-personalization-layer.md`.
10
+
11
+ ### Added
12
+
13
+ - **Independent Checker for `auto-research`** (Proposal 11, E2). The iteration loop no longer grades its own work. Each iteration now dispatches a new `loop-checker` agent (Haiku, adversarial brief) that scores the artifact against the rubric independently; the Checker's score, not Claudia's, drives the keep/revert decision. When the Checker's score and Claudia's self-score diverge past threshold, the iteration is flagged `contested` and surfaced in the end-of-run summary. New files: `template-v2/.claude/agents/loop-checker.md` and `template-v2/.claude/skills/_loop/` (`maker.md`, `checker.md`, `README.md`).
14
+ - **Loop status-file schema** (Proposal 11, E1). `docs/loop-status-schema.md` defines a standard Markdown-body + YAML-frontmatter control plane (`last_input`, `maker_proposal`, `checker_verdict`, `verified`, `next_action`, and friends) plus the exit-condition standard every loop must declare up front. `auto-research` now writes `research_status.md` in this format each iteration.
15
+ - **Atomic status-file helper** (Proposal 11, E1/B3). New daemon module `claudia_memory/loops/status.py` (`write_status` / `read_status`) writes a status file through a same-directory temp file plus `os.replace`, so an interrupted write never leaves a partial file at the canonical path. Six tests in `memory-daemon/tests/test_loop_status.py` cover round-trip, native-type preservation, parent-dir creation, temp cleanup, and crash-safety.
16
+ - **Self-repair sub-loop** (Proposal 11, E3). When a loop stalls (the Checker fails 3 times running or scores regress), `auto-research` now enters a bounded self-repair pass (`template-v2/.claude/skills/_loop/repair.md`): it diagnoses whether the rubric or a brief is the real problem, proposes one fix, and proves it on the exact failing input before adopting it. Capped at 2 repair attempts; confirmed repairs leave a regression fixture under `~/.claudia/loops/regressions/`; edits to shipped briefs always go to a human approval gate.
17
+ - **`meditate` reviews loop-harness performance** (Proposal 11, E4). End-of-session reflection now reads recent loop status files, summarizes how Claudia's own Maker-Checker loops did (contested iterations, stuck loops, recurring Checker findings), and can propose a harness improvement. Any such proposal passes through the Checker before it is written, and edits to shipped briefs are presented as a diff for approval.
18
+
19
+ ## 1.61.1 (2026-05-23)
20
+
21
+ ### `claudia` command works from anywhere, and repairs itself
22
+
23
+ A follow-up to 1.61.0's shell command. The installer was storing a relative path, so `claudia` only worked from the install's parent directory. This release fixes the stored path and makes the command self-healing, so the error cannot recur.
24
+
25
+ ### Fixed
26
+
27
+ - **`claudia` command failing with "Claudia home directory not found: claudia".** The installer wrote a *relative* path (`claudia`) into `~/.claudia/claudia-home` instead of the absolute install path, because the shell-helper step was handed `targetDir` (relative) rather than the resolved `targetPath`. The `claudia` command then only worked when run from the install's parent directory and failed everywhere else. The installer now always stores the absolute path (via `path.resolve`, which also fixes a separate bug where passing an absolute install path as an argument produced a doubled path like `/home/you/home/you/claudia`).
28
+ - **`claudia` command now self-heals a bad `claudia-home`.** The shell function resolves a relative stored value against `$HOME` (never the current directory), falls back to the default `~/claudia` install when the stored path is missing or stale, and rewrites `claudia-home` with the corrected absolute path so the error cannot recur. Existing installs that already have a bad value are repaired the first time `claudia` runs after upgrading.
29
+ - **`--skip-memory` installs now get the `claudia` command too.** Previously the shell-helper step was skipped entirely on the `--skip-memory` path, leaving those users with no `claudia`/`update-claudia` command. The shell helper is independent of the memory system and now installs on every path.
30
+
5
31
  ## 1.61.0 (2026-05-22)
6
32
 
7
33
  ### Launch from anywhere, upgrade from anywhere
@@ -19,7 +45,6 @@ This release adds a `claudia` shell command (and `update-claudia`) so you no lon
19
45
 
20
46
  - **ASCII greeting logo: row 1 (hair top) realigned over face** (in `template-v2/CLAUDE.md`). The top hair row was indented 6 leading spaces while the face row below it starts at column 0, so the hair appeared shifted ~4 characters to the right. Now sits correctly above the face. Existing installs see the fix on next upgrade.
21
47
  - **Installer no longer treats flag-looking arguments as install targets.** Previously, `npx get-claudia --help` (or any unrecognized `--flag`) was interpreted as the install path and would create a `./--help/` directory with templates copied into it. Now: `--help`/`-h` prints a proper usage message and exits, `--version`/`-V` prints the version, and any other leading-dash argument prints `Error: unrecognized argument: <flag>` and exits 1 without touching the filesystem.
22
-
23
48
  ## 1.60.1 (2026-05-22)
24
49
 
25
50
  ### Two bug fixes from the fixing-phase pass
package/bin/index.js CHANGED
@@ -1,7 +1,7 @@
1
1
  #!/usr/bin/env node
2
2
 
3
3
  import { existsSync, mkdirSync, cpSync, readdirSync, readFileSync, writeFileSync, statSync, renameSync, unlinkSync, copyFileSync } from 'fs';
4
- import { join, dirname } from 'path';
4
+ import { join, dirname, resolve } from 'path';
5
5
  import { fileURLToPath } from 'url';
6
6
  import { spawn, execFileSync } from 'child_process';
7
7
  import { homedir } from 'os';
@@ -828,7 +828,9 @@ async function main() {
828
828
  // Support "." or "upgrade" for current directory
829
829
  const isCurrentDir = arg === '.' || arg === 'upgrade';
830
830
  const targetDir = isCurrentDir ? '.' : (arg || 'claudia');
831
- const targetPath = isCurrentDir ? process.cwd() : join(process.cwd(), targetDir);
831
+ // resolve() (not join) so an absolute arg is honored as-is and the result is
832
+ // always absolute (this value is what gets written into ~/.claudia/claudia-home).
833
+ const targetPath = isCurrentDir ? process.cwd() : resolve(process.cwd(), targetDir);
832
834
 
833
835
  // Check if directory already exists with Claudia files
834
836
  let isUpgrade = false;
@@ -969,11 +971,14 @@ async function main() {
969
971
  }
970
972
  renderer.render();
971
973
 
972
- // Only run vault step
973
- runVaultStep(renderer, () => {
974
- renderer.stopSpinner();
975
- renderer.render();
976
- showCompletion(targetDir, isCurrentDir, false, undefined, isUpgrade);
974
+ // The `claudia` shell command is independent of the memory system, so install
975
+ // it here too, otherwise --skip-memory users get no `claudia` command.
976
+ runShellStep(renderer, targetPath, () => {
977
+ runVaultStep(renderer, () => {
978
+ renderer.stopSpinner();
979
+ renderer.render();
980
+ showCompletion(targetDir, isCurrentDir, false, undefined, isUpgrade);
981
+ });
977
982
  });
978
983
  return;
979
984
  }
@@ -1563,8 +1568,11 @@ async function main() {
1563
1568
 
1564
1569
  renderer.stopSpinner();
1565
1570
 
1566
- // Shell helper step, then vault, then completion
1567
- runShellStep(renderer, targetDir, () => {
1571
+ // Shell helper step, then vault, then completion.
1572
+ // Pass the absolute targetPath (not the relative targetDir): this value lands
1573
+ // in ~/.claudia/claudia-home, and a relative value there breaks `claudia` from
1574
+ // any directory other than the install's parent.
1575
+ runShellStep(renderer, targetPath, () => {
1568
1576
  runVaultStep(renderer, () => {
1569
1577
  renderer.render();
1570
1578
  showDbScanResults(dbScan);
package/bin/shell-init.js CHANGED
@@ -28,17 +28,37 @@ export const SHELL_INIT_CONTENT = `# Claudia shell helpers — sourced from your
28
28
 
29
29
  _claudia_home() {
30
30
  local home_file="$HOME/.claudia/claudia-home"
31
- if [ ! -f "$home_file" ]; then
32
- echo "Claudia home not configured. Run: npx get-claudia ." >&2
33
- return 1
31
+ local dir=""
32
+ [ -f "$home_file" ] && dir="$(cat "$home_file" 2>/dev/null)"
33
+
34
+ # A relative value (e.g. an older install that stored "claudia") is anchored to
35
+ # $HOME, never the current directory, so \`claudia\` works from anywhere.
36
+ case "$dir" in
37
+ /*) ;; # already absolute
38
+ "") ;; # empty -> handled by recovery below
39
+ *) dir="$HOME/$dir" ;; # relative -> resolve under $HOME
40
+ esac
41
+
42
+ # Recover from a missing or stale path by falling back to the default install
43
+ # location when it looks like a real Claudia install.
44
+ if [ -z "$dir" ] || [ ! -d "$dir" ]; then
45
+ if [ -d "$HOME/claudia/.claude" ] || [ -f "$HOME/claudia/CLAUDE.md" ]; then
46
+ dir="$HOME/claudia"
47
+ fi
34
48
  fi
35
- local dir
36
- dir="$(cat "$home_file")"
37
- if [ ! -d "$dir" ]; then
38
- echo "Claudia home directory not found: $dir" >&2
39
- echo "Fix by editing $home_file" >&2
49
+
50
+ if [ -z "$dir" ] || [ ! -d "$dir" ]; then
51
+ echo "Claudia install not found. Run: npx get-claudia ." >&2
52
+ [ -f "$home_file" ] && echo "(or set the correct path in $home_file)" >&2
40
53
  return 1
41
54
  fi
55
+
56
+ # Self-heal: persist the corrected absolute path so the error never recurs.
57
+ if [ "$(cat "$home_file" 2>/dev/null)" != "$dir" ]; then
58
+ mkdir -p "$HOME/.claudia" 2>/dev/null
59
+ printf '%s\\n' "$dir" > "$home_file" 2>/dev/null || true
60
+ fi
61
+
42
62
  printf '%s' "$dir"
43
63
  }
44
64
 
@@ -0,0 +1,9 @@
1
+ """Loop-engineering helpers (Proposal 11).
2
+
3
+ Shared infrastructure for Maker-Checker loops: atomic, human-readable status
4
+ files that act as the control plane for both skill-level and daemon-level loops.
5
+ """
6
+
7
+ from claudia_memory.loops.status import read_status, write_status
8
+
9
+ __all__ = ["read_status", "write_status"]
@@ -0,0 +1,85 @@
1
+ """Atomic status-file helper for loop engineering (Proposal 11, E1/B3).
2
+
3
+ A status file is Markdown with a YAML frontmatter block. The frontmatter carries
4
+ the structured control fields (loop_id, verified, checker_verdict, next_action,
5
+ ...) and the body carries the human-readable narrative:
6
+
7
+ ---
8
+ loop_id: consolidation
9
+ verified: true
10
+ next_action: none
11
+ ---
12
+
13
+ # Loop status: consolidation
14
+
15
+ Last run consolidated 12 memories; all invariants held.
16
+
17
+ Writes go through a temp file in the same directory followed by ``os.replace``,
18
+ so an interrupted write never leaves a partially-written file at the canonical
19
+ path. A reader sees either the complete previous file or the complete new one.
20
+ """
21
+
22
+ from __future__ import annotations
23
+
24
+ import os
25
+ from pathlib import Path
26
+ from typing import Any
27
+
28
+ import yaml
29
+
30
+ _DELIM = "---"
31
+
32
+
33
+ def write_status(path: str | os.PathLike, fields: dict[str, Any], body: str = "") -> None:
34
+ """Atomically write a status file at ``path``.
35
+
36
+ Serializes ``fields`` as YAML frontmatter and appends ``body`` as the
37
+ Markdown body. Creates missing parent directories. If the final rename
38
+ fails, the canonical path is left untouched and the temp file is removed.
39
+ """
40
+ target = Path(path)
41
+ target.parent.mkdir(parents=True, exist_ok=True)
42
+
43
+ frontmatter = yaml.safe_dump(fields, sort_keys=False, default_flow_style=False).strip()
44
+ content = f"{_DELIM}\n{frontmatter}\n{_DELIM}\n\n{body}"
45
+
46
+ # Temp file lives in the SAME directory as the target so that os.replace is
47
+ # a same-filesystem rename (atomic), not a cross-device copy.
48
+ tmp = target.with_name(f".{target.name}.tmp")
49
+ try:
50
+ with open(tmp, "w", encoding="utf-8") as fh:
51
+ fh.write(content)
52
+ fh.flush()
53
+ os.fsync(fh.fileno())
54
+ os.replace(tmp, target)
55
+ except BaseException:
56
+ # Any failure (including the rename) must leave the canonical path
57
+ # intact and must not litter the directory with a temp file.
58
+ try:
59
+ os.unlink(tmp)
60
+ except OSError:
61
+ pass
62
+ raise
63
+
64
+
65
+ def read_status(path: str | os.PathLike) -> tuple[dict[str, Any], str]:
66
+ """Read a status file, returning ``(fields, body)``.
67
+
68
+ ``fields`` is the parsed frontmatter (empty dict if there is none); ``body``
69
+ is the Markdown body with the single separating blank line removed.
70
+ """
71
+ text = Path(path).read_text(encoding="utf-8")
72
+ return _split_frontmatter(text)
73
+
74
+
75
+ def _split_frontmatter(text: str) -> tuple[dict[str, Any], str]:
76
+ if not text.startswith(_DELIM):
77
+ return {}, text
78
+ rest = text[len(_DELIM):].lstrip("\n")
79
+ end = rest.find(f"\n{_DELIM}")
80
+ if end == -1:
81
+ return {}, text
82
+ raw = rest[:end]
83
+ body = rest[end + 1 + len(_DELIM):].lstrip("\n")
84
+ fields = yaml.safe_load(raw) or {}
85
+ return fields, body
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "get-claudia",
3
- "version": "1.61.0",
3
+ "version": "1.62.0",
4
4
  "description": "An AI assistant who learns how you work.",
5
5
  "keywords": [
6
6
  "claudia",
@@ -0,0 +1,51 @@
1
+ ---
2
+ name: loop-checker
3
+ description: Independently scores a loop iteration's output against a rubric and returns a structured verdict. Adversarial by design: finds faults, does not confirm. Used by auto-research and future Maker-Checker loops.
4
+ model: haiku
5
+ dispatch-category: verification
6
+ dispatch-tier: task
7
+ auto-dispatch: false
8
+ ---
9
+
10
+ # Loop Checker
11
+
12
+ You are Claudia's Loop Checker. When a loop produces an iteration, Claudia
13
+ dispatches you to score it independently of whoever produced it. You are the
14
+ "Checker" half of the Maker-Checker pattern (Proposal 11).
15
+
16
+ Your brief is the Checker role brief at `.claude/skills/_loop/checker.md`. Follow
17
+ it exactly. The short version:
18
+
19
+ - You receive a `rubric`, an `artifact` to score, and the `proposed_change` the
20
+ Maker made (with the Maker's expected effect, if any).
21
+ - Score the artifact against each rubric dimension. Sum to a total.
22
+ - Hunt for faults: unmet constraints, regressions the change introduced,
23
+ dimensions the Maker over-claimed.
24
+ - Decide `verified`. When uncertain, default to `verified: false`: a false pass
25
+ costs more than one more iteration.
26
+ - Do not be swayed by the Maker's self-score. Score independently.
27
+
28
+ ## Output
29
+
30
+ Return exactly the verdict JSON defined in the Checker role brief:
31
+
32
+ ```json
33
+ {
34
+ "verified": false,
35
+ "score": 7.2,
36
+ "max_score": 10,
37
+ "issues": [
38
+ {"severity": "major", "issue": "the ask is buried in paragraph 3", "where": "body"}
39
+ ],
40
+ "rationale": "One or two sentences, grounded in the rubric.",
41
+ "hard_constraint_violated": false
42
+ }
43
+ ```
44
+
45
+ ## Constraints
46
+
47
+ - Do NOT edit the artifact. You score and report; the Maker edits.
48
+ - Do NOT decide keep or revert. You return a verdict; the loop applies the rule.
49
+ - Do NOT soften findings to be agreeable. Disagreement with the Maker is the
50
+ entire reason you exist as a separate agent.
51
+ - Be concrete: every issue names a severity, what is wrong, and where.
@@ -1,6 +1,6 @@
1
1
  {
2
- "version": "1.61.0",
3
- "generated": "2026-05-23T02:25:12.290Z",
2
+ "version": "1.62.0",
3
+ "generated": "2026-06-14T02:37:03.060Z",
4
4
  "algorithm": "sha256",
5
5
  "files": {
6
6
  ".claude/rules/claudia-principles.md": "939e9720421628e7f2e4c8dfbaa4aeb9c1e18e8c6a5379cd6b772a6835b812e5",
@@ -10,6 +10,10 @@
10
10
  ".claude/rules/shell-compatibility.md": "565977bc04e269b3ce7d8a7963173df4f44bb9634692ea76abb1b64f6a67513e",
11
11
  ".claude/rules/trust-north-star.md": "0188b17c26b791cf597ce975bb75d40f543aa3d6b84b7edb1309b78530b3d43f",
12
12
  ".claude/skills/README.md": "22311215317c61dcb67975af02bdd1d7164578713ff9faf6fbed8aec14256bcd",
13
+ ".claude/skills/_loop/README.md": "13d44b9d2f88be477a71773a1dbbe8a3270bdda39a16785beaeff92abaeb019e",
14
+ ".claude/skills/_loop/checker.md": "dcec7e38330244973020050bb95e76853ff5f05ebf4ca82b6b442d2eb26f7832",
15
+ ".claude/skills/_loop/maker.md": "eadc6c2c59f3e43f4b029c3d40ce3af29cd746da219c132da0888d31a0743e01",
16
+ ".claude/skills/_loop/repair.md": "675704a1aa1eb8463dac6270bd02ea34099674a9012e1b65924ef6923b384d5b",
13
17
  ".claude/skills/agent-dispatcher.md": "b48fc5283f0a88d1ebbc692b30b6c326fef012d4168dcf721a94e3c954b76b90",
14
18
  ".claude/skills/archetypes/_base-structure.md": "c0d8df77c07aa48cd9ba9c5ba3eddb533fcea790c7f98440fa3d31405fd5c75d",
15
19
  ".claude/skills/archetypes/consultant.md": "1e0ccbf89115f92a1fa0c86b48edcce22668c4bb828ba4268376fcd6b5c05680",
@@ -17,7 +21,7 @@
17
21
  ".claude/skills/archetypes/executive.md": "0d43c5c2c6315f5a4d6d36150778b55229cc922e0491a80af54824d87ac52e82",
18
22
  ".claude/skills/archetypes/founder.md": "a5bf3f22439c7c5f554c13966899a4af4de80f90c8f6a25eb62be1c68de6d54f",
19
23
  ".claude/skills/archetypes/solo.md": "cc26332b96394775e62a0f1a06f32093be6147bb75fa3783aa271e554d323c6b",
20
- ".claude/skills/auto-research/SKILL.md": "1a54e5d82f4e87c78a54e1331ff2d32aaff12c2712058e21173aa900a57b01c3",
24
+ ".claude/skills/auto-research/SKILL.md": "2f7f8a7d0731813921094b0117026693314f0f59e8d9b52637bd3e775a2e95f8",
21
25
  ".claude/skills/auto-research/references/program-template.md": "850457de96bde5ad65d24f1018509da94dd31fec9ec45c15087031f470a28dbc",
22
26
  ".claude/skills/auto-research/references/safety-rules.md": "ae036270e86949ed0f9bface582f6038276adafd3f40a29aa733698111a26683",
23
27
  ".claude/skills/brain/SKILL.md": "b87dd93c85923c676292bea67f7a0580ee71eea2bea05426ea6eda951a140d5c",
@@ -46,7 +50,7 @@
46
50
  ".claude/skills/ingest-sources/references/extraction-patterns.md": "28d6dc604c4a11eacfe4523e705ad430ba8fe26632bafcccaef656e9618e5aea",
47
51
  ".claude/skills/judgment-awareness.md": "0077129a87f91d295515108f3f9a68ca6874c6e38fc7f2a73a462704fa00ba2d",
48
52
  ".claude/skills/map-connections/SKILL.md": "e8133d5e3de36e32e3858a1ee3918a577855e77ade022933097f87a0100cc9a3",
49
- ".claude/skills/meditate/SKILL.md": "1d814f0c52f4309708669139ba4b637bf05a228e2381950ee3851c6145e25cc7",
53
+ ".claude/skills/meditate/SKILL.md": "1d4eab90fdedd652f75750df1a1c3be885fbeefb89d5d99979324e395eb6fd83",
50
54
  ".claude/skills/meditate/evals/basic.yaml": "daba441b2fd9d1d4afddcff6eaa9673884198b1d0f85d9d06edbe0738012291a",
51
55
  ".claude/skills/meeting-prep/SKILL.md": "cb0b2afc41052a0652e7de30d09e71ab489162fbb09c648e3c95cec2d438ba63",
52
56
  ".claude/skills/memory-audit/SKILL.md": "7daed2ec128ea70171b8e3ffaf4d933ca69db2637b1487f69724e3428c04663a",
@@ -0,0 +1,33 @@
1
+ # _loop: shared loop-engineering resources
2
+
3
+ This directory is **not a skill**. It has no `SKILL.md` and is not listed in
4
+ `skill-index.json`, so the skill router never loads it directly. It holds the
5
+ versioned Maker and Checker prompt templates that skill-level loops reuse, so the
6
+ Maker-Checker pattern is defined once and shared.
7
+
8
+ Consumers today:
9
+
10
+ - `auto-research` (Proposal 11, E2) uses `checker.md` to score each iteration
11
+ independently of the Maker.
12
+ - `build-team` (Proposal 11, E6, not yet built) will reuse both templates.
13
+
14
+ The underscore prefix marks this as an internal resource, not a user-facing
15
+ capability. See `docs/loop-status-schema.md` for the status-file contract these
16
+ loops write, and `docs/proposals/11-autonomy-personalization-layer.md` for the
17
+ design.
18
+
19
+ ## Files
20
+
21
+ | File | Role |
22
+ |------|------|
23
+ | `maker.md` | The producer prompt: makes one bounded change per iteration. |
24
+ | `checker.md` | The independent verifier prompt: scores adversarially, returns a structured verdict. |
25
+ | `repair.md` | The self-repair sub-loop: when a loop stalls, diagnose and fix the harness, proven on the exact failing input. |
26
+
27
+ ## Why Maker and Checker are separate
28
+
29
+ A single agent grading its own work has a self-justification bias: it tends to
30
+ rate what it just produced as good. Splitting the roles, and giving the Checker
31
+ an adversarial brief (find faults, do not confirm), makes a passing verdict
32
+ actually mean something. The Checker runs on a cheaper model tier (Haiku) so the
33
+ verification pass is fast and inexpensive.
@@ -0,0 +1,64 @@
1
+ # Checker role brief (Proposal 11, E1/B2)
2
+
3
+ The Checker verifies work the Maker produced. It is dispatched as a separate
4
+ agent (`loop-checker`, Haiku) so it reasons independently of the Maker and costs
5
+ little to run. Its brief is adversarial: **find faults, do not confirm.** A
6
+ passing verdict from a Checker told to look for problems is worth something; a
7
+ rubber stamp is not.
8
+
9
+ ## What the Checker receives
10
+
11
+ The dispatching skill passes:
12
+
13
+ - `rubric`: the scoring criteria (from the loop's program/rubric file).
14
+ - `artifact`: the current output to score (the edited draft, the proposed team,
15
+ the consolidation result).
16
+ - `proposed_change`: the one change the Maker made this iteration, and the
17
+ Maker's expected effect (the self-score), if any.
18
+
19
+ ## How the Checker scores
20
+
21
+ 1. Score the artifact against each rubric dimension on its stated scale. Sum to a
22
+ total `score`.
23
+ 2. Actively hunt for what is wrong or weak: unmet constraints, regressions the
24
+ change introduced, dimensions the Maker over-claimed, anything the rubric
25
+ penalizes.
26
+ 3. Decide `verified`: did this artifact meet the rubric's bar (and not violate
27
+ any hard constraint)? When uncertain, default to `verified: false`. The cost
28
+ of a false pass is higher than the cost of one more iteration.
29
+ 4. Do not be swayed by the Maker's self-score. Score independently first, then
30
+ you may note divergence.
31
+
32
+ ## Output contract (return exactly this JSON)
33
+
34
+ ```json
35
+ {
36
+ "verified": false,
37
+ "score": 7.2,
38
+ "max_score": 10,
39
+ "issues": [
40
+ {"severity": "major", "issue": "the ask is buried in paragraph 3", "where": "body"},
41
+ {"severity": "minor", "issue": "two sentences exceed the length cap", "where": "closing"}
42
+ ],
43
+ "rationale": "Lede improved, but the rubric weights ask-clarity heavily and the ask is still not in the opening.",
44
+ "hard_constraint_violated": false
45
+ }
46
+ ```
47
+
48
+ - `verified` (bool): does the artifact clear the bar this iteration?
49
+ - `score` (float): total across rubric dimensions.
50
+ - `max_score` (float): the maximum the rubric allows, so the caller can normalize.
51
+ - `issues` (array): concrete, located faults. Empty only when there genuinely are
52
+ none. Each has `severity` (`major` | `minor`), `issue`, and `where`.
53
+ - `rationale` (string): one or two sentences. Why this verdict, grounded in the
54
+ rubric.
55
+ - `hard_constraint_violated` (bool): true if any non-negotiable constraint failed
56
+ (which forces `verified: false` regardless of score).
57
+
58
+ ## What the Checker does not do
59
+
60
+ - It does not edit the artifact. It scores and reports; the Maker edits.
61
+ - It does not decide keep/revert. It returns a verdict; the loop applies the
62
+ rule (keep if the score beats the running best and no hard constraint failed).
63
+ - It does not soften findings to be agreeable. Disagreement with the Maker is the
64
+ whole point.
@@ -0,0 +1,39 @@
1
+ # Maker role brief (Proposal 11, E1/B2)
2
+
3
+ The Maker produces work. In a Claudia loop, the Maker is Claudia herself (the
4
+ main model), not a dispatched subagent. This brief defines how the Maker behaves
5
+ inside any Maker-Checker loop, so the producer role is consistent across
6
+ `auto-research`, `build-team`, and future loops.
7
+
8
+ ## Contract
9
+
10
+ Each iteration, the Maker does exactly one thing:
11
+
12
+ 1. **Read the state.** The program/rubric, the current artifact, and the last
13
+ few rows of history (what was tried, what was kept, what was reverted).
14
+ 2. **Propose ONE change.** One specific, bounded edit, justified in a single
15
+ sentence. Not a rewrite. Not a refactor. Not three changes bundled together.
16
+ One lever, pulled deliberately, so the Checker can attribute the score delta
17
+ to it.
18
+ 3. **Apply it** to the working copy (never the user's original file during the
19
+ loop).
20
+ 4. **Hand off to the Checker.** The Maker does not score its own work for the
21
+ keep/revert decision. It may state an expected effect ("this should raise the
22
+ lede dimension"), which becomes the Maker self-score the Checker's verdict is
23
+ compared against, but the Checker's score governs.
24
+
25
+ ## Discipline
26
+
27
+ - One change per iteration. Bundling changes destroys the signal.
28
+ - Justify in one sentence. If the justification needs a paragraph, the change is
29
+ too big; split it.
30
+ - Stay inside the bounds declared up front (see the exit-condition standard in
31
+ `docs/loop-status-schema.md`).
32
+ - Honest self-assessment. Do not inflate the expected effect to pre-empt the
33
+ Checker. The point of the split is that the Checker is not you.
34
+
35
+ ## Why the Maker does not grade itself
36
+
37
+ A model that grades its own output rates it generously: it just argued itself
38
+ into producing exactly that. The keep/revert decision therefore belongs to an
39
+ independent Checker (see `checker.md`). The Maker proposes; the Checker disposes.
@@ -0,0 +1,85 @@
1
+ # Self-repair brief (Proposal 11, E3)
2
+
3
+ When a Maker-Checker loop keeps failing, the problem may not be the artifact: it
4
+ may be the harness (the rubric, the Maker brief, or the Checker brief). The
5
+ self-repair sub-loop diagnoses that, proposes a fix, and proves the fix on the
6
+ exact input that failed before adopting it. It is itself bounded, and it never
7
+ auto-edits shipped briefs or the user's live files.
8
+
9
+ This brief is shared. `auto-research` uses it today; future loops reuse it.
10
+
11
+ ## B1: When self-repair triggers
12
+
13
+ Enter the self-repair sub-loop when any of these holds during a loop:
14
+
15
+ - The Checker returns `verified: false` for **3 iterations in a row**.
16
+ - The Checker's `score` **regresses** across 3 consecutive iterations (each lower
17
+ than the last).
18
+ - An iteration raises an **exception** the loop cannot attribute to the artifact
19
+ (for example, the Checker returns malformed output twice).
20
+
21
+ These are signals that hill-climbing has stalled for a structural reason, not
22
+ that one more edit will help.
23
+
24
+ ## B2: The repair sub-loop
25
+
26
+ Self-repair is itself a small Maker-Checker loop, run on the harness instead of
27
+ the artifact:
28
+
29
+ 1. **Diagnose (Maker).** Read the last few status rows and the Checker's `issues`.
30
+ State, in one or two sentences, the single most likely harness cause: an
31
+ ambiguous rubric dimension, a Maker brief that permits bundled changes, a
32
+ Checker brief that scores the wrong thing, a missing hard constraint.
33
+ 2. **Propose one fix (Maker).** One concrete edit to exactly one harness piece
34
+ (the rubric in `program.md`, `maker.md`, or `checker.md`). Not a rewrite. One
35
+ lever.
36
+ 3. **Validate on the exact failing input (Checker).** Re-run the failing
37
+ iteration's input through the loop **with the proposed fix applied in a scratch
38
+ copy**, and dispatch the `loop-checker`. The fix is adopted only if that exact
39
+ previously-failing case now reaches `verified: true` (or scores materially
40
+ higher) and no previously-passing regression fixture breaks.
41
+ 4. **Adopt or discard.** If it passes, apply the fix (see B4 for where).
42
+ If it does not, discard it and either try one more diagnosis (within the B4
43
+ bound) or stop and hand the loop back to the user with the diagnosis.
44
+
45
+ The non-negotiable rule is step 3: a fix is never adopted on a hunch. It must be
46
+ proven on the input that exposed the problem.
47
+
48
+ ## B3: Regression capture
49
+
50
+ Every confirmed repair leaves a regression behind so the same failure cannot
51
+ silently return:
52
+
53
+ - Save the failing input plus the verdict it should now produce as a fixture
54
+ under `~/.claudia/loops/regressions/<loop-id>/<short-slug>.md` (a status file in
55
+ the standard schema, with the expected `verified`/`score`).
56
+ - On future runs of that loop, replay the loop-id's regression fixtures through
57
+ the Checker first. If any regression breaks, the loop refuses to start and
58
+ reports which fix regressed.
59
+
60
+ Shipped-skill regressions (ones that should travel with the template, not just
61
+ the user's machine) are proposed to the user for inclusion as in-repo test
62
+ fixtures instead.
63
+
64
+ ## B4: Bound and safety
65
+
66
+ Self-repair obeys the same bounded-autonomy rules as every loop:
67
+
68
+ - **Bounded.** At most **2 repair attempts** per loop run. After that, stop and
69
+ hand back to the user with the diagnosis. Self-repair never loops forever.
70
+ - **Workspace-only for artifacts.** The repair sub-loop validates in a scratch
71
+ copy and never touches the user's original file.
72
+ - **Never auto-edit shipped briefs.** A fix to `maker.md` or `checker.md` (files
73
+ that ship in the template) is **proposed as a diff for the user to approve**,
74
+ never written silently. A fix to a run-local `program.md` rubric may be applied
75
+ within the run, since that rubric is the user's own per-run governance file.
76
+ - **Logged.** Each repair attempt appends to the loop's status file: what was
77
+ diagnosed, what was changed, and whether the exact failing input passed after
78
+ the change.
79
+
80
+ ## What self-repair is not
81
+
82
+ - It is not a license to rewrite the harness whenever a loop is hard. It triggers
83
+ only on the B1 signals, and it changes one thing at a time.
84
+ - It is not autonomous prompt-editing of shipped files. Those changes always end
85
+ at a human approval gate.
@@ -14,11 +14,11 @@ A hill-climber for any local artifact with a rubric. Inspired by [Karpathy's aut
14
14
  Three primitives, one governance file:
15
15
 
16
16
  1. **The artifact** — what gets iterated on (a draft email, a brief, a wiki page, a one-pager). Lives in the workspace, never touches the user's original file during the loop.
17
- 2. **The evaluator** — a scalar rubric Claudia scores against. Plain English. Must produce a number for the original artifact before the loop starts.
17
+ 2. **The evaluator** — a scalar rubric. Plain English. An independent Checker (the `loop-checker` agent, Haiku) scores each iteration against it, not Claudia herself. The rubric must produce a number for the original artifact before the loop starts.
18
18
  3. **The budget** — iteration count (default 20). Loop stops when budget is exhausted or score plateaus.
19
19
  4. **The program** (governance file) — what the user wants, what's off-limits, what counts as "better." Claudia helps the user write this if they don't volunteer it.
20
20
 
21
- Loop: read program + artifact + results history propose one specific change implement score if better, ratchet (commit); if worse, revert (git reset). Report each iteration as a one-line update. Repeat until budget is hit.
21
+ Loop: read program + artifact + results history, Claudia (the Maker) proposes one specific change, implements it, then dispatches the `loop-checker` (the Checker) to score it independently. If the Checker's score beats the running best, ratchet (commit); if worse, revert (git reset). Write the status file. Report each iteration as a one-line update. Repeat until budget is hit.
22
22
 
23
23
  ## When to invoke this skill
24
24
 
@@ -53,7 +53,8 @@ Each run gets its own workspace at `~/.claudia/auto-research/<task-id>/`:
53
53
  ├── program.md the brief (user-authored with Claudia's help)
54
54
  ├── artifact.md (or .txt, the working copy that gets edited)
55
55
  ├── original.md immutable copy of the input, for reference and diff
56
- ├── results.tsv one row per iteration: timestamp, score, kept/reverted, change-summary
56
+ ├── results.tsv one row per iteration: timestamp, score, kept/reverted/contested, change-summary
57
+ ├── research_status.md loop control plane (see docs/loop-status-schema.md): iteration, verified, score, checker_verdict, next_action
57
58
  ├── best.md symlink (or copy on Windows) to the highest-scoring version
58
59
  └── iterations/
59
60
  ├── 01/artifact.md
@@ -111,16 +112,50 @@ For each iteration N:
111
112
  1. **Read the state.** Read `program.md`, `artifact.md`, last 3 rows of `results.tsv`.
112
113
  2. **Propose one change.** One specific edit, justified in one sentence. Not a rewrite. Not a refactor. One change.
113
114
  3. **Implement.** Apply the edit to `artifact.md`. Save.
114
- 4. **Score.** Score the new `artifact.md` against the rubric in `program.md`. Sum the dimensions. Round to one decimal.
115
- 5. **Decide.** If new score > best score so far: ratchet. Copy `artifact.md` to `iterations/NN/`, update `best.md` to point at the new winner, append a row to `results.tsv` marked `kept`. If new score <= best: revert. Copy `iterations/(best_iter)/artifact.md` back to `artifact.md`, append a row marked `reverted`.
116
- 6. **Report.** One line to the user: `iter N: score 7.4 (kept), change: tightened lede to one sentence`.
117
- 7. **Check stop conditions.** If budget exhausted: stop. If score plateaued (no improvement for 5 iterations): stop. Otherwise: next iteration.
115
+ 4. **Score (independent Checker).** Dispatch the `loop-checker` agent (Haiku) with the rubric from `program.md`, the new `artifact.md`, and the one change you just made plus your expected effect (your Maker self-score). The Checker returns a verdict JSON: `verified`, `score`, `max_score`, `issues`, `rationale`, `hard_constraint_violated`. You do NOT score it yourself; the Checker's `score` governs the decision. (See `.claude/skills/_loop/checker.md`.)
116
+ 5. **Decide.** If the Checker's `score` > best score so far and `hard_constraint_violated` is false: ratchet. Copy `artifact.md` to `iterations/NN/`, update `best.md` to point at the new winner, append a row to `results.tsv` marked `kept`. Otherwise: revert. Copy `iterations/(best_iter)/artifact.md` back to `artifact.md`, append a row marked `reverted`.
117
+ 6. **Flag divergence.** Compare the Checker's `score` against your Maker self-score. If they diverge by more than 1.5 points on a 10-point scale, mark the iteration `contested` in `results.tsv`. The Checker's verdict still governs; the flag just surfaces the disagreement in the end-of-run summary.
118
+ 7. **Write the status file.** Atomically update `research_status.md` with `iteration`, `verified`, `score`, `budget_remaining`, `last_input`, `maker_proposal`, `checker_verdict` (the Checker's rationale), and `next_action`. Write to a temp sibling and rename; never edit it in place. (See `docs/loop-status-schema.md`.)
119
+ 8. **Report.** One line to the user: `iter N: score 7.4 (kept), change: tightened lede to one sentence`. Append `[contested]` when flagged.
120
+ 9. **Check stop conditions and self-repair.** If the Checker has returned `verified: false` 3 times in a row, or its score has regressed for 3 straight iterations, enter the self-repair sub-loop (see `.claude/skills/_loop/repair.md`) before giving up: it diagnoses whether the rubric or a brief is the real problem, proposes one fix, and proves it on the exact failing input. If budget exhausted: stop. If score plateaued (no improvement for 5 iterations): stop. Otherwise: next iteration.
121
+
122
+ ## The independent Checker (Maker-Checker)
123
+
124
+ This loop separates the producer from the verifier (Proposal 11, E2). Claudia is
125
+ the **Maker**: she reads the state and proposes one change. The **Checker** is a
126
+ separate agent (`loop-checker`, Haiku) that scores the result independently. The
127
+ two never collapse into one model grading its own work, which is the bias the
128
+ split exists to remove.
129
+
130
+ - The Checker's `score`, not Claudia's, drives keep/revert. Claudia's expected
131
+ effect is only a self-score, used to detect disagreement.
132
+ - The Checker is adversarial by brief: it hunts for faults and defaults to
133
+ `verified: false` when uncertain. A passing verdict therefore means something.
134
+ - A `hard_constraint_violated: true` verdict forces a revert regardless of score.
135
+
136
+ **Contested iterations.** When the Checker's score and Claudia's self-score
137
+ diverge by more than 1.5 points (10-point scale), the iteration is marked
138
+ `contested`. The Checker still governs the decision; the flag makes the
139
+ disagreement visible. The end-of-run summary reports how many iterations were
140
+ contested, which is a useful signal that the rubric is ambiguous or that Claudia
141
+ was over-optimistic.
142
+
143
+ **Cost.** The Checker runs on Haiku, so the verification pass is cheap and fast.
144
+ With the default 20-iteration budget, that is at most 21 Checker dispatches
145
+ (including the baseline), each scoped to a single small artifact.
146
+
147
+ **Self-repair.** If the loop stalls (the Checker fails 3 times running, or scores
148
+ regress), `auto-research` enters a bounded self-repair sub-loop that treats the
149
+ harness, not the artifact, as the suspect: it diagnoses the rubric or a brief,
150
+ proposes one fix, and proves it on the exact failing input before adopting it.
151
+ See `.claude/skills/_loop/repair.md`. It is capped at 2 repair attempts and never
152
+ auto-edits shipped briefs (those changes go to a human approval gate).
118
153
 
119
154
  ## Safety rules (mandatory, follow without exception)
120
155
 
121
156
  1. **Workspace-only edits.** The loop never writes to a file outside `~/.claudia/auto-research/<task-id>/`. The user's original file at `/Users/.../wherever.md` is untouched until the user explicitly approves the final hand-off.
122
157
  2. **No external actions during the loop.** No sending, no posting, no scheduling, no calling tools that affect the outside world. The loop is a closed system.
123
- 3. **Baseline-score gate.** Before iteration 1, score the original artifact and write it to `results.tsv` as `iter 0: score X (baseline)`. If the rubric cannot produce a number for the original, the loop does not start; Claudia tells the user the rubric needs to be more specific.
158
+ 3. **Baseline-score gate.** Before iteration 1, dispatch the `loop-checker` to score the original artifact and write it to `results.tsv` as `iter 0: score X (baseline)` and to `research_status.md` as iteration 0. If the Checker cannot produce a number for the original, the loop does not start; Claudia tells the user the rubric needs to be more specific.
124
159
  4. **Bounded budget.** Default 20 iterations. User can set higher (50 max) or lower. Plateau detection (no improvement for 5 in a row) stops early. The loop is not allowed to be "open-ended" the way Karpathy's autoresearch is; Claudia's safety principles require an end.
125
160
  5. **Per-iteration revertability.** Each iteration is a git commit inside the workspace. Workspace is a fresh git repo (`git init` on start, `git commit -am` per iteration). At any point: `git log` shows the history, `git checkout` rolls back to any prior iteration.
126
161
  6. **User interrupt at any time.** If the user says "stop", "pause", "show me what you have", or otherwise interrupts: stop the loop immediately. The workspace persists. The best version so far is in `best.md`. The user can resume by saying "continue" (loop picks up from the current best).
@@ -178,10 +213,13 @@ When the user gives a bad rubric, help them turn it into a good one before start
178
213
  - `wiki` for iterating wiki pages specifically (compress, sharpen, restructure)
179
214
  - `draft-reply`, `follow-up-draft` for one-shot draft generation
180
215
  - `summarize-doc` for one-pass summarization
216
+ - `.claude/skills/_loop/` for the shared Maker and Checker role briefs
217
+ - `.claude/agents/loop-checker.md` for the Checker agent definition
218
+ - `docs/loop-status-schema.md` for the `research_status.md` control-plane format
181
219
 
182
220
  ## Open questions for future versions
183
221
 
184
- - Shell-command evaluators (run `node test.js my-draft.md`, take the number it prints) are deferred. Today, Claudia scores against the rubric in `program.md` directly.
222
+ - Shell-command evaluators (run `node test.js my-draft.md`, take the number it prints) are deferred. Today, the `loop-checker` agent scores against the rubric in `program.md`.
185
223
  - Token-spend budget and wall-clock budget are deferred. Today, only iteration count is supported.
186
224
  - Parallel branches (try 3 different changes in parallel, keep the winner) are deferred. Today, sequential only.
187
225
  - Wiki-page iteration as a first-class use case (apply auto-research to a wiki page until it's < N words while citing all sources) is deferred. The skill supports this in principle today, but a worked example doesn't ship until the wiki has earned its keep.
@@ -42,6 +42,7 @@ Silently retrieve:
42
42
 
43
43
  - Call the `memory_reflections` MCP tool to see what already exists
44
44
  - Call the `memory_session_context` MCP tool for recent context (if available)
45
+ - Read recent loop status files (`~/.claudia/loops/*_status.md`, and any `research_status.md` from `auto-research` runs this session) to see how Claudia's own Maker-Checker loops performed: which iterations were `contested`, which loops never reached `verified`, where budgets ran out. This feeds the harness review in Step 2b.
45
46
 
46
47
  ### Step 2: Generate Reflections
47
48
 
@@ -78,6 +79,18 @@ Review the session and identify 1-3 reflections. Ask yourself:
78
79
 
79
80
  **Quality over quantity.** One genuine insight beats three generic observations.
80
81
 
82
+ ### Step 2b: Loop harness review (only if loops ran this session)
83
+
84
+ If any Maker-Checker loop ran this session (`auto-research`, or a wrapped daemon job), review how the harness itself performed, separately from reflections about the user. Read the signal from the status files gathered in Step 1, not from memory:
85
+
86
+ - **Contested iterations.** Did the Maker (Claudia) and the Checker disagree often? Frequent `contested` flags usually mean the rubric is ambiguous or Claudia was over-optimistic, not that the Checker is wrong.
87
+ - **Stuck loops.** Did a loop burn its whole budget without reaching `verified`? That points at the rubric, the Maker brief, or the Checker brief needing sharpening.
88
+ - **Recurring failure shape.** Across runs, is the same kind of issue showing up in the Checker's `issues` list?
89
+
90
+ If a clear, repeatable harness problem emerges, draft a **harness-improvement proposal**: a concrete edit to a rubric, the Maker brief (`.claude/skills/_loop/maker.md`), the Checker brief (`.claude/skills/_loop/checker.md`), or a new judgment rule about how to run loops. Do not apply it yet. Harness proposals go through the Checker gate in Step 5.
91
+
92
+ Most sessions produce no harness proposal. Raise one only when the evidence is in the status files.
93
+
81
94
  ### Step 3: Present for Approval
82
95
 
83
96
  Format reflections clearly and ask for approval:
@@ -171,6 +184,12 @@ Call the `memory_end_session` MCP tool with:
171
184
  5. Set `source` to `meditate/YYYY-MM-DD` (today's date)
172
185
  6. Write the updated file
173
186
 
187
+ **Harness proposals go through the Checker before anything is written** (Proposal 11, E4). If Step 2b produced a harness-improvement proposal (a rubric edit, a Maker or Checker brief edit, or a loop-running judgment rule), validate it first:
188
+
189
+ 1. Dispatch the `loop-checker` agent with the proposal as the artifact and a one-line rubric: "does this change actually fix the observed failure without breaking the brief's contract?"
190
+ 2. Write the proposal only if the Checker returns `verified: true`. A proposal the Checker rejects is surfaced to the user as an open question, not saved.
191
+ 3. Edits to shipped briefs (`_loop/maker.md`, `_loop/checker.md`) are never auto-applied. Present them as a diff for the user to approve, like any other change to Claudia's own files. Loop-running judgment rules go into `context/judgment.yaml` like any other rule, after passing the Checker.
192
+
174
193
  Confirm storage with both reflections and rules:
175
194
  "Got it. I've saved [N] reflection(s) and added [M] judgment rule(s) to your judgment file. I'll apply them going forward. See you next time."
176
195