get-claudia 1.61.1 → 1.63.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,27 @@
2
2
 
3
3
  All notable changes to Claudia will be documented in this file.
4
4
 
5
+ ## 1.63.0 (2026-06-13)
6
+
7
+ ### Added
8
+
9
+ - **`/build-team` skill** (Proposal 11, E6). A user-invoked counterpart to `hire-agent`: it reads the user's profile, judgment rules, and real task history, then proposes a minimal, tailored agent team in one pass. The proposal runs through the independent `loop-checker` (scored against goal alignment, right-sizing, and a "progressive, not overwhelming" hard constraint, bounded to 2 revisions), is written to a `team_status.md` control file, and is gated on explicit user approval. On approval it scaffolds agent definitions and writes `.bak` siblings for any modified file so the change is reversible. Reuses the existing roster first and never auto-spawns agents or takes external actions. New files: `template-v2/.claude/skills/build-team/SKILL.md` (+ `skill-index.json` entry).
10
+ - **Proactive team review** (Proposal 11, E7). `hire-agent` now distinguishes "add one agent" from "the whole team has drifted." When it detects an archetype shift, a new recurring task class the roster does not cover, or two-plus unused agents, it gently suggests reviewing the team as a unit and routes to `/build-team` for the validated, approval-gated, reversible change. One suggestion at a time; never auto-applied; a review can shrink a team as readily as grow it.
11
+
12
+ ## 1.62.0 (2026-06-13)
13
+
14
+ ### Loop engineering foundation (Proposal 11, Phases 1-2)
15
+
16
+ The first two phases of the Autonomy & Personalization Layer: the Maker-Checker pattern on the existing `auto-research` loop, a bounded self-repair sub-loop, the shared loop infrastructure later loops reuse, and a `meditate` feed that reviews how the loops themselves performed. See `docs/proposals/11-autonomy-personalization-layer.md`.
17
+
18
+ ### Added
19
+
20
+ - **Independent Checker for `auto-research`** (Proposal 11, E2). The iteration loop no longer grades its own work. Each iteration now dispatches a new `loop-checker` agent (Haiku, adversarial brief) that scores the artifact against the rubric independently; the Checker's score, not Claudia's, drives the keep/revert decision. When the Checker's score and Claudia's self-score diverge past threshold, the iteration is flagged `contested` and surfaced in the end-of-run summary. New files: `template-v2/.claude/agents/loop-checker.md` and `template-v2/.claude/skills/_loop/` (`maker.md`, `checker.md`, `README.md`).
21
+ - **Loop status-file schema** (Proposal 11, E1). `docs/loop-status-schema.md` defines a standard Markdown-body + YAML-frontmatter control plane (`last_input`, `maker_proposal`, `checker_verdict`, `verified`, `next_action`, and friends) plus the exit-condition standard every loop must declare up front. `auto-research` now writes `research_status.md` in this format each iteration.
22
+ - **Atomic status-file helper** (Proposal 11, E1/B3). New daemon module `claudia_memory/loops/status.py` (`write_status` / `read_status`) writes a status file through a same-directory temp file plus `os.replace`, so an interrupted write never leaves a partial file at the canonical path. Six tests in `memory-daemon/tests/test_loop_status.py` cover round-trip, native-type preservation, parent-dir creation, temp cleanup, and crash-safety.
23
+ - **Self-repair sub-loop** (Proposal 11, E3). When a loop stalls (the Checker fails 3 times running or scores regress), `auto-research` now enters a bounded self-repair pass (`template-v2/.claude/skills/_loop/repair.md`): it diagnoses whether the rubric or a brief is the real problem, proposes one fix, and proves it on the exact failing input before adopting it. Capped at 2 repair attempts; confirmed repairs leave a regression fixture under `~/.claudia/loops/regressions/`; edits to shipped briefs always go to a human approval gate.
24
+ - **`meditate` reviews loop-harness performance** (Proposal 11, E4). End-of-session reflection now reads recent loop status files, summarizes how Claudia's own Maker-Checker loops did (contested iterations, stuck loops, recurring Checker findings), and can propose a harness improvement. Any such proposal passes through the Checker before it is written, and edits to shipped briefs are presented as a diff for approval.
25
+
5
26
  ## 1.61.1 (2026-05-23)
6
27
 
7
28
  ### `claudia` command works from anywhere, and repairs itself
@@ -31,7 +52,6 @@ This release adds a `claudia` shell command (and `update-claudia`) so you no lon
31
52
 
32
53
  - **ASCII greeting logo: row 1 (hair top) realigned over face** (in `template-v2/CLAUDE.md`). The top hair row was indented 6 leading spaces while the face row below it starts at column 0, so the hair appeared shifted ~4 characters to the right. Now sits correctly above the face. Existing installs see the fix on next upgrade.
33
54
  - **Installer no longer treats flag-looking arguments as install targets.** Previously, `npx get-claudia --help` (or any unrecognized `--flag`) was interpreted as the install path and would create a `./--help/` directory with templates copied into it. Now: `--help`/`-h` prints a proper usage message and exits, `--version`/`-V` prints the version, and any other leading-dash argument prints `Error: unrecognized argument: <flag>` and exits 1 without touching the filesystem.
34
-
35
55
  ## 1.60.1 (2026-05-22)
36
56
 
37
57
  ### Two bug fixes from the fixing-phase pass
@@ -0,0 +1,9 @@
1
+ """Loop-engineering helpers (Proposal 11).
2
+
3
+ Shared infrastructure for Maker-Checker loops: atomic, human-readable status
4
+ files that act as the control plane for both skill-level and daemon-level loops.
5
+ """
6
+
7
+ from claudia_memory.loops.status import read_status, write_status
8
+
9
+ __all__ = ["read_status", "write_status"]
@@ -0,0 +1,85 @@
1
+ """Atomic status-file helper for loop engineering (Proposal 11, E1/B3).
2
+
3
+ A status file is Markdown with a YAML frontmatter block. The frontmatter carries
4
+ the structured control fields (loop_id, verified, checker_verdict, next_action,
5
+ ...) and the body carries the human-readable narrative:
6
+
7
+ ---
8
+ loop_id: consolidation
9
+ verified: true
10
+ next_action: none
11
+ ---
12
+
13
+ # Loop status: consolidation
14
+
15
+ Last run consolidated 12 memories; all invariants held.
16
+
17
+ Writes go through a temp file in the same directory followed by ``os.replace``,
18
+ so an interrupted write never leaves a partially-written file at the canonical
19
+ path. A reader sees either the complete previous file or the complete new one.
20
+ """
21
+
22
+ from __future__ import annotations
23
+
24
+ import os
25
+ from pathlib import Path
26
+ from typing import Any
27
+
28
+ import yaml
29
+
30
+ _DELIM = "---"
31
+
32
+
33
+ def write_status(path: str | os.PathLike, fields: dict[str, Any], body: str = "") -> None:
34
+ """Atomically write a status file at ``path``.
35
+
36
+ Serializes ``fields`` as YAML frontmatter and appends ``body`` as the
37
+ Markdown body. Creates missing parent directories. If the final rename
38
+ fails, the canonical path is left untouched and the temp file is removed.
39
+ """
40
+ target = Path(path)
41
+ target.parent.mkdir(parents=True, exist_ok=True)
42
+
43
+ frontmatter = yaml.safe_dump(fields, sort_keys=False, default_flow_style=False).strip()
44
+ content = f"{_DELIM}\n{frontmatter}\n{_DELIM}\n\n{body}"
45
+
46
+ # Temp file lives in the SAME directory as the target so that os.replace is
47
+ # a same-filesystem rename (atomic), not a cross-device copy.
48
+ tmp = target.with_name(f".{target.name}.tmp")
49
+ try:
50
+ with open(tmp, "w", encoding="utf-8") as fh:
51
+ fh.write(content)
52
+ fh.flush()
53
+ os.fsync(fh.fileno())
54
+ os.replace(tmp, target)
55
+ except BaseException:
56
+ # Any failure (including the rename) must leave the canonical path
57
+ # intact and must not litter the directory with a temp file.
58
+ try:
59
+ os.unlink(tmp)
60
+ except OSError:
61
+ pass
62
+ raise
63
+
64
+
65
+ def read_status(path: str | os.PathLike) -> tuple[dict[str, Any], str]:
66
+ """Read a status file, returning ``(fields, body)``.
67
+
68
+ ``fields`` is the parsed frontmatter (empty dict if there is none); ``body``
69
+ is the Markdown body with the single separating blank line removed.
70
+ """
71
+ text = Path(path).read_text(encoding="utf-8")
72
+ return _split_frontmatter(text)
73
+
74
+
75
+ def _split_frontmatter(text: str) -> tuple[dict[str, Any], str]:
76
+ if not text.startswith(_DELIM):
77
+ return {}, text
78
+ rest = text[len(_DELIM):].lstrip("\n")
79
+ end = rest.find(f"\n{_DELIM}")
80
+ if end == -1:
81
+ return {}, text
82
+ raw = rest[:end]
83
+ body = rest[end + 1 + len(_DELIM):].lstrip("\n")
84
+ fields = yaml.safe_load(raw) or {}
85
+ return fields, body
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "get-claudia",
3
- "version": "1.61.1",
3
+ "version": "1.63.0",
4
4
  "description": "An AI assistant who learns how you work.",
5
5
  "keywords": [
6
6
  "claudia",
@@ -0,0 +1,51 @@
1
+ ---
2
+ name: loop-checker
3
+ description: Independently scores a loop iteration's output against a rubric and returns a structured verdict. Adversarial by design: finds faults, does not confirm. Used by auto-research and future Maker-Checker loops.
4
+ model: haiku
5
+ dispatch-category: verification
6
+ dispatch-tier: task
7
+ auto-dispatch: false
8
+ ---
9
+
10
+ # Loop Checker
11
+
12
+ You are Claudia's Loop Checker. When a loop produces an iteration, Claudia
13
+ dispatches you to score it independently of whoever produced it. You are the
14
+ "Checker" half of the Maker-Checker pattern (Proposal 11).
15
+
16
+ Your brief is the Checker role brief at `.claude/skills/_loop/checker.md`. Follow
17
+ it exactly. The short version:
18
+
19
+ - You receive a `rubric`, an `artifact` to score, and the `proposed_change` the
20
+ Maker made (with the Maker's expected effect, if any).
21
+ - Score the artifact against each rubric dimension. Sum to a total.
22
+ - Hunt for faults: unmet constraints, regressions the change introduced,
23
+ dimensions the Maker over-claimed.
24
+ - Decide `verified`. When uncertain, default to `verified: false`: a false pass
25
+ costs more than one more iteration.
26
+ - Do not be swayed by the Maker's self-score. Score independently.
27
+
28
+ ## Output
29
+
30
+ Return exactly the verdict JSON defined in the Checker role brief:
31
+
32
+ ```json
33
+ {
34
+ "verified": false,
35
+ "score": 7.2,
36
+ "max_score": 10,
37
+ "issues": [
38
+ {"severity": "major", "issue": "the ask is buried in paragraph 3", "where": "body"}
39
+ ],
40
+ "rationale": "One or two sentences, grounded in the rubric.",
41
+ "hard_constraint_violated": false
42
+ }
43
+ ```
44
+
45
+ ## Constraints
46
+
47
+ - Do NOT edit the artifact. You score and report; the Maker edits.
48
+ - Do NOT decide keep or revert. You return a verdict; the loop applies the rule.
49
+ - Do NOT soften findings to be agreeable. Disagreement with the Maker is the
50
+ entire reason you exist as a separate agent.
51
+ - Be concrete: every issue names a severity, what is wrong, and where.
@@ -1,6 +1,6 @@
1
1
  {
2
- "version": "1.61.1",
3
- "generated": "2026-05-23T14:34:46.165Z",
2
+ "version": "1.63.0",
3
+ "generated": "2026-06-14T02:49:59.295Z",
4
4
  "algorithm": "sha256",
5
5
  "files": {
6
6
  ".claude/rules/claudia-principles.md": "939e9720421628e7f2e4c8dfbaa4aeb9c1e18e8c6a5379cd6b772a6835b812e5",
@@ -10,6 +10,10 @@
10
10
  ".claude/rules/shell-compatibility.md": "565977bc04e269b3ce7d8a7963173df4f44bb9634692ea76abb1b64f6a67513e",
11
11
  ".claude/rules/trust-north-star.md": "0188b17c26b791cf597ce975bb75d40f543aa3d6b84b7edb1309b78530b3d43f",
12
12
  ".claude/skills/README.md": "22311215317c61dcb67975af02bdd1d7164578713ff9faf6fbed8aec14256bcd",
13
+ ".claude/skills/_loop/README.md": "13d44b9d2f88be477a71773a1dbbe8a3270bdda39a16785beaeff92abaeb019e",
14
+ ".claude/skills/_loop/checker.md": "dcec7e38330244973020050bb95e76853ff5f05ebf4ca82b6b442d2eb26f7832",
15
+ ".claude/skills/_loop/maker.md": "eadc6c2c59f3e43f4b029c3d40ce3af29cd746da219c132da0888d31a0743e01",
16
+ ".claude/skills/_loop/repair.md": "675704a1aa1eb8463dac6270bd02ea34099674a9012e1b65924ef6923b384d5b",
13
17
  ".claude/skills/agent-dispatcher.md": "b48fc5283f0a88d1ebbc692b30b6c326fef012d4168dcf721a94e3c954b76b90",
14
18
  ".claude/skills/archetypes/_base-structure.md": "c0d8df77c07aa48cd9ba9c5ba3eddb533fcea790c7f98440fa3d31405fd5c75d",
15
19
  ".claude/skills/archetypes/consultant.md": "1e0ccbf89115f92a1fa0c86b48edcce22668c4bb828ba4268376fcd6b5c05680",
@@ -17,11 +21,12 @@
17
21
  ".claude/skills/archetypes/executive.md": "0d43c5c2c6315f5a4d6d36150778b55229cc922e0491a80af54824d87ac52e82",
18
22
  ".claude/skills/archetypes/founder.md": "a5bf3f22439c7c5f554c13966899a4af4de80f90c8f6a25eb62be1c68de6d54f",
19
23
  ".claude/skills/archetypes/solo.md": "cc26332b96394775e62a0f1a06f32093be6147bb75fa3783aa271e554d323c6b",
20
- ".claude/skills/auto-research/SKILL.md": "1a54e5d82f4e87c78a54e1331ff2d32aaff12c2712058e21173aa900a57b01c3",
24
+ ".claude/skills/auto-research/SKILL.md": "2f7f8a7d0731813921094b0117026693314f0f59e8d9b52637bd3e775a2e95f8",
21
25
  ".claude/skills/auto-research/references/program-template.md": "850457de96bde5ad65d24f1018509da94dd31fec9ec45c15087031f470a28dbc",
22
26
  ".claude/skills/auto-research/references/safety-rules.md": "ae036270e86949ed0f9bface582f6038276adafd3f40a29aa733698111a26683",
23
27
  ".claude/skills/brain/SKILL.md": "b87dd93c85923c676292bea67f7a0580ee71eea2bea05426ea6eda951a140d5c",
24
28
  ".claude/skills/brain-monitor/SKILL.md": "6e9ce1e6e40bc612a5126d84d743c1fd7ebc596f60575873533479fdcae5b608",
29
+ ".claude/skills/build-team/SKILL.md": "38a33c8f3e84d0b27f4a4a6ac10cc8318e684b0611a9052a1dd47acbd26874d9",
25
30
  ".claude/skills/capability-suggester.md": "06235c8ab062d356cf9b6c18c7aa44ea667e7694983a46e8ae0a9fe32e734944",
26
31
  ".claude/skills/capture-meeting/SKILL.md": "815e7587fcd925323c379a6c57e458b3484d034338c1237378641d142c126175",
27
32
  ".claude/skills/capture-meeting/evals/basic.yaml": "47c4eb1479aeda8067bc877cebfe8688c9b2bb3ce09297e42375cdf558f412e3",
@@ -40,13 +45,13 @@
40
45
  ".claude/skills/fix-duplicates/SKILL.md": "bd3d6aae0bce09bc844ba93f76abb4a97ff0286c6a4da29cd369763864fd4a86",
41
46
  ".claude/skills/follow-up-draft/SKILL.md": "49212946cb7b2db39cecd435ab6fcb7cb26cc49666306084d27676543be67281",
42
47
  ".claude/skills/growth-check/SKILL.md": "a3ed3f2427f3639f81d16b1a2adfa268bafed795d2b0d02621bc6be318784c24",
43
- ".claude/skills/hire-agent.md": "2d60d241592db7d5f30a635d11164802f907b02616fb25170f3f867d5235f067",
48
+ ".claude/skills/hire-agent.md": "8f46632d6c9b02563a41c00c811e1bcd365614bb5e64d770482f3536fa30e6fe",
44
49
  ".claude/skills/inbox-check/SKILL.md": "e2cddc91f72ffc0443d105385d2e8b464f86404a4825d308bd03a68e4ee43726",
45
50
  ".claude/skills/ingest-sources/SKILL.md": "938b93d299649db156ad19e9a7bd5d09e950f53bc2eaf8a85386f9fdbca350d0",
46
51
  ".claude/skills/ingest-sources/references/extraction-patterns.md": "28d6dc604c4a11eacfe4523e705ad430ba8fe26632bafcccaef656e9618e5aea",
47
52
  ".claude/skills/judgment-awareness.md": "0077129a87f91d295515108f3f9a68ca6874c6e38fc7f2a73a462704fa00ba2d",
48
53
  ".claude/skills/map-connections/SKILL.md": "e8133d5e3de36e32e3858a1ee3918a577855e77ade022933097f87a0100cc9a3",
49
- ".claude/skills/meditate/SKILL.md": "1d814f0c52f4309708669139ba4b637bf05a228e2381950ee3851c6145e25cc7",
54
+ ".claude/skills/meditate/SKILL.md": "1d4eab90fdedd652f75750df1a1c3be885fbeefb89d5d99979324e395eb6fd83",
50
55
  ".claude/skills/meditate/evals/basic.yaml": "daba441b2fd9d1d4afddcff6eaa9673884198b1d0f85d9d06edbe0738012291a",
51
56
  ".claude/skills/meeting-prep/SKILL.md": "cb0b2afc41052a0652e7de30d09e71ab489162fbb09c648e3c95cec2d438ba63",
52
57
  ".claude/skills/memory-audit/SKILL.md": "7daed2ec128ea70171b8e3ffaf4d933ca69db2637b1487f69724e3428c04663a",
@@ -65,7 +70,7 @@
65
70
  ".claude/skills/research/SKILL.md": "0d317e6350b043ee774916b923957f61dbc174e3caafc0187198fa236a2fd28d",
66
71
  ".claude/skills/research/references/source-evaluation.md": "64614b7eff83468d7ff76dd640252579f23e69e760969672e9aebe5ceb00f695",
67
72
  ".claude/skills/risk-surfacer.md": "b01e6bc7a22df414a7ee7a5e4a108088b5d049646723cc7513fac27a3ba4856c",
68
- ".claude/skills/skill-index.json": "043c86f1b07f28bb54db80117437f60fb3d79ad0d18549fdc2c74921b996a56f",
73
+ ".claude/skills/skill-index.json": "576ebd98fe5f597a45871366d286cd6765d6bdabefe49633c2d00514518c54bc",
69
74
  ".claude/skills/skill-router/SKILL.md": "12fc45aebc94d3ead65843cd336a88797a40856e82908cda47256c9cbb4cb60f",
70
75
  ".claude/skills/skill-router/references/overlap-clusters.md": "bc819b4892cbdf9919db392825230f4cd4eec546cd7f0c9de6c29a2468ab489a",
71
76
  ".claude/skills/structure-generator.md": "dbe70841ab60599a632687aaa3c3652361483df2fe854640c56982b60622eb19",
@@ -0,0 +1,33 @@
1
+ # _loop: shared loop-engineering resources
2
+
3
+ This directory is **not a skill**. It has no `SKILL.md` and is not listed in
4
+ `skill-index.json`, so the skill router never loads it directly. It holds the
5
+ versioned Maker and Checker prompt templates that skill-level loops reuse, so the
6
+ Maker-Checker pattern is defined once and shared.
7
+
8
+ Consumers today:
9
+
10
+ - `auto-research` (Proposal 11, E2) uses `checker.md` to score each iteration
11
+ independently of the Maker.
12
+ - `build-team` (Proposal 11, E6, not yet built) will reuse both templates.
13
+
14
+ The underscore prefix marks this as an internal resource, not a user-facing
15
+ capability. See `docs/loop-status-schema.md` for the status-file contract these
16
+ loops write, and `docs/proposals/11-autonomy-personalization-layer.md` for the
17
+ design.
18
+
19
+ ## Files
20
+
21
+ | File | Role |
22
+ |------|------|
23
+ | `maker.md` | The producer prompt: makes one bounded change per iteration. |
24
+ | `checker.md` | The independent verifier prompt: scores adversarially, returns a structured verdict. |
25
+ | `repair.md` | The self-repair sub-loop: when a loop stalls, diagnose and fix the harness, proven on the exact failing input. |
26
+
27
+ ## Why Maker and Checker are separate
28
+
29
+ A single agent grading its own work has a self-justification bias: it tends to
30
+ rate what it just produced as good. Splitting the roles, and giving the Checker
31
+ an adversarial brief (find faults, do not confirm), makes a passing verdict
32
+ actually mean something. The Checker runs on a cheaper model tier (Haiku) so the
33
+ verification pass is fast and inexpensive.
@@ -0,0 +1,64 @@
1
+ # Checker role brief (Proposal 11, E1/B2)
2
+
3
+ The Checker verifies work the Maker produced. It is dispatched as a separate
4
+ agent (`loop-checker`, Haiku) so it reasons independently of the Maker and costs
5
+ little to run. Its brief is adversarial: **find faults, do not confirm.** A
6
+ passing verdict from a Checker told to look for problems is worth something; a
7
+ rubber stamp is not.
8
+
9
+ ## What the Checker receives
10
+
11
+ The dispatching skill passes:
12
+
13
+ - `rubric`: the scoring criteria (from the loop's program/rubric file).
14
+ - `artifact`: the current output to score (the edited draft, the proposed team,
15
+ the consolidation result).
16
+ - `proposed_change`: the one change the Maker made this iteration, and the
17
+ Maker's expected effect (the self-score), if any.
18
+
19
+ ## How the Checker scores
20
+
21
+ 1. Score the artifact against each rubric dimension on its stated scale. Sum to a
22
+ total `score`.
23
+ 2. Actively hunt for what is wrong or weak: unmet constraints, regressions the
24
+ change introduced, dimensions the Maker over-claimed, anything the rubric
25
+ penalizes.
26
+ 3. Decide `verified`: did this artifact meet the rubric's bar (and not violate
27
+ any hard constraint)? When uncertain, default to `verified: false`. The cost
28
+ of a false pass is higher than the cost of one more iteration.
29
+ 4. Do not be swayed by the Maker's self-score. Score independently first, then
30
+ you may note divergence.
31
+
32
+ ## Output contract (return exactly this JSON)
33
+
34
+ ```json
35
+ {
36
+ "verified": false,
37
+ "score": 7.2,
38
+ "max_score": 10,
39
+ "issues": [
40
+ {"severity": "major", "issue": "the ask is buried in paragraph 3", "where": "body"},
41
+ {"severity": "minor", "issue": "two sentences exceed the length cap", "where": "closing"}
42
+ ],
43
+ "rationale": "Lede improved, but the rubric weights ask-clarity heavily and the ask is still not in the opening.",
44
+ "hard_constraint_violated": false
45
+ }
46
+ ```
47
+
48
+ - `verified` (bool): does the artifact clear the bar this iteration?
49
+ - `score` (float): total across rubric dimensions.
50
+ - `max_score` (float): the maximum the rubric allows, so the caller can normalize.
51
+ - `issues` (array): concrete, located faults. Empty only when there genuinely are
52
+ none. Each has `severity` (`major` | `minor`), `issue`, and `where`.
53
+ - `rationale` (string): one or two sentences. Why this verdict, grounded in the
54
+ rubric.
55
+ - `hard_constraint_violated` (bool): true if any non-negotiable constraint failed
56
+ (which forces `verified: false` regardless of score).
57
+
58
+ ## What the Checker does not do
59
+
60
+ - It does not edit the artifact. It scores and reports; the Maker edits.
61
+ - It does not decide keep/revert. It returns a verdict; the loop applies the
62
+ rule (keep if the score beats the running best and no hard constraint failed).
63
+ - It does not soften findings to be agreeable. Disagreement with the Maker is the
64
+ whole point.
@@ -0,0 +1,39 @@
1
+ # Maker role brief (Proposal 11, E1/B2)
2
+
3
+ The Maker produces work. In a Claudia loop, the Maker is Claudia herself (the
4
+ main model), not a dispatched subagent. This brief defines how the Maker behaves
5
+ inside any Maker-Checker loop, so the producer role is consistent across
6
+ `auto-research`, `build-team`, and future loops.
7
+
8
+ ## Contract
9
+
10
+ Each iteration, the Maker does exactly one thing:
11
+
12
+ 1. **Read the state.** The program/rubric, the current artifact, and the last
13
+ few rows of history (what was tried, what was kept, what was reverted).
14
+ 2. **Propose ONE change.** One specific, bounded edit, justified in a single
15
+ sentence. Not a rewrite. Not a refactor. Not three changes bundled together.
16
+ One lever, pulled deliberately, so the Checker can attribute the score delta
17
+ to it.
18
+ 3. **Apply it** to the working copy (never the user's original file during the
19
+ loop).
20
+ 4. **Hand off to the Checker.** The Maker does not score its own work for the
21
+ keep/revert decision. It may state an expected effect ("this should raise the
22
+ lede dimension"), which becomes the Maker self-score the Checker's verdict is
23
+ compared against, but the Checker's score governs.
24
+
25
+ ## Discipline
26
+
27
+ - One change per iteration. Bundling changes destroys the signal.
28
+ - Justify in one sentence. If the justification needs a paragraph, the change is
29
+ too big; split it.
30
+ - Stay inside the bounds declared up front (see the exit-condition standard in
31
+ `docs/loop-status-schema.md`).
32
+ - Honest self-assessment. Do not inflate the expected effect to pre-empt the
33
+ Checker. The point of the split is that the Checker is not you.
34
+
35
+ ## Why the Maker does not grade itself
36
+
37
+ A model that grades its own output rates it generously: it just argued itself
38
+ into producing exactly that. The keep/revert decision therefore belongs to an
39
+ independent Checker (see `checker.md`). The Maker proposes; the Checker disposes.
@@ -0,0 +1,85 @@
1
+ # Self-repair brief (Proposal 11, E3)
2
+
3
+ When a Maker-Checker loop keeps failing, the problem may not be the artifact: it
4
+ may be the harness (the rubric, the Maker brief, or the Checker brief). The
5
+ self-repair sub-loop diagnoses that, proposes a fix, and proves the fix on the
6
+ exact input that failed before adopting it. It is itself bounded, and it never
7
+ auto-edits shipped briefs or the user's live files.
8
+
9
+ This brief is shared. `auto-research` uses it today; future loops reuse it.
10
+
11
+ ## B1: When self-repair triggers
12
+
13
+ Enter the self-repair sub-loop when any of these holds during a loop:
14
+
15
+ - The Checker returns `verified: false` for **3 iterations in a row**.
16
+ - The Checker's `score` **regresses** across 3 consecutive iterations (each lower
17
+ than the last).
18
+ - An iteration raises an **exception** the loop cannot attribute to the artifact
19
+ (for example, the Checker returns malformed output twice).
20
+
21
+ These are signals that hill-climbing has stalled for a structural reason, not
22
+ that one more edit will help.
23
+
24
+ ## B2: The repair sub-loop
25
+
26
+ Self-repair is itself a small Maker-Checker loop, run on the harness instead of
27
+ the artifact:
28
+
29
+ 1. **Diagnose (Maker).** Read the last few status rows and the Checker's `issues`.
30
+ State, in one or two sentences, the single most likely harness cause: an
31
+ ambiguous rubric dimension, a Maker brief that permits bundled changes, a
32
+ Checker brief that scores the wrong thing, a missing hard constraint.
33
+ 2. **Propose one fix (Maker).** One concrete edit to exactly one harness piece
34
+ (the rubric in `program.md`, `maker.md`, or `checker.md`). Not a rewrite. One
35
+ lever.
36
+ 3. **Validate on the exact failing input (Checker).** Re-run the failing
37
+ iteration's input through the loop **with the proposed fix applied in a scratch
38
+ copy**, and dispatch the `loop-checker`. The fix is adopted only if that exact
39
+ previously-failing case now reaches `verified: true` (or scores materially
40
+ higher) and no previously-passing regression fixture breaks.
41
+ 4. **Adopt or discard.** If it passes, apply the fix (see B4 for where).
42
+ If it does not, discard it and either try one more diagnosis (within the B4
43
+ bound) or stop and hand the loop back to the user with the diagnosis.
44
+
45
+ The non-negotiable rule is step 3: a fix is never adopted on a hunch. It must be
46
+ proven on the input that exposed the problem.
47
+
48
+ ## B3: Regression capture
49
+
50
+ Every confirmed repair leaves a regression behind so the same failure cannot
51
+ silently return:
52
+
53
+ - Save the failing input plus the verdict it should now produce as a fixture
54
+ under `~/.claudia/loops/regressions/<loop-id>/<short-slug>.md` (a status file in
55
+ the standard schema, with the expected `verified`/`score`).
56
+ - On future runs of that loop, replay the loop-id's regression fixtures through
57
+ the Checker first. If any regression breaks, the loop refuses to start and
58
+ reports which fix regressed.
59
+
60
+ Shipped-skill regressions (ones that should travel with the template, not just
61
+ the user's machine) are proposed to the user for inclusion as in-repo test
62
+ fixtures instead.
63
+
64
+ ## B4: Bound and safety
65
+
66
+ Self-repair obeys the same bounded-autonomy rules as every loop:
67
+
68
+ - **Bounded.** At most **2 repair attempts** per loop run. After that, stop and
69
+ hand back to the user with the diagnosis. Self-repair never loops forever.
70
+ - **Workspace-only for artifacts.** The repair sub-loop validates in a scratch
71
+ copy and never touches the user's original file.
72
+ - **Never auto-edit shipped briefs.** A fix to `maker.md` or `checker.md` (files
73
+ that ship in the template) is **proposed as a diff for the user to approve**,
74
+ never written silently. A fix to a run-local `program.md` rubric may be applied
75
+ within the run, since that rubric is the user's own per-run governance file.
76
+ - **Logged.** Each repair attempt appends to the loop's status file: what was
77
+ diagnosed, what was changed, and whether the exact failing input passed after
78
+ the change.
79
+
80
+ ## What self-repair is not
81
+
82
+ - It is not a license to rewrite the harness whenever a loop is hard. It triggers
83
+ only on the B1 signals, and it changes one thing at a time.
84
+ - It is not autonomous prompt-editing of shipped files. Those changes always end
85
+ at a human approval gate.
@@ -14,11 +14,11 @@ A hill-climber for any local artifact with a rubric. Inspired by [Karpathy's aut
14
14
  Three primitives, one governance file:
15
15
 
16
16
  1. **The artifact** — what gets iterated on (a draft email, a brief, a wiki page, a one-pager). Lives in the workspace, never touches the user's original file during the loop.
17
- 2. **The evaluator** — a scalar rubric Claudia scores against. Plain English. Must produce a number for the original artifact before the loop starts.
17
+ 2. **The evaluator** — a scalar rubric. Plain English. An independent Checker (the `loop-checker` agent, Haiku) scores each iteration against it, not Claudia herself. The rubric must produce a number for the original artifact before the loop starts.
18
18
  3. **The budget** — iteration count (default 20). Loop stops when budget is exhausted or score plateaus.
19
19
  4. **The program** (governance file) — what the user wants, what's off-limits, what counts as "better." Claudia helps the user write this if they don't volunteer it.
20
20
 
21
- Loop: read program + artifact + results history propose one specific change implement score if better, ratchet (commit); if worse, revert (git reset). Report each iteration as a one-line update. Repeat until budget is hit.
21
+ Loop: read program + artifact + results history, Claudia (the Maker) proposes one specific change, implements it, then dispatches the `loop-checker` (the Checker) to score it independently. If the Checker's score beats the running best, ratchet (commit); if worse, revert (git reset). Write the status file. Report each iteration as a one-line update. Repeat until budget is hit.
22
22
 
23
23
  ## When to invoke this skill
24
24
 
@@ -53,7 +53,8 @@ Each run gets its own workspace at `~/.claudia/auto-research/<task-id>/`:
53
53
  ├── program.md the brief (user-authored with Claudia's help)
54
54
  ├── artifact.md (or .txt, the working copy that gets edited)
55
55
  ├── original.md immutable copy of the input, for reference and diff
56
- ├── results.tsv one row per iteration: timestamp, score, kept/reverted, change-summary
56
+ ├── results.tsv one row per iteration: timestamp, score, kept/reverted/contested, change-summary
57
+ ├── research_status.md loop control plane (see docs/loop-status-schema.md): iteration, verified, score, checker_verdict, next_action
57
58
  ├── best.md symlink (or copy on Windows) to the highest-scoring version
58
59
  └── iterations/
59
60
  ├── 01/artifact.md
@@ -111,16 +112,50 @@ For each iteration N:
111
112
  1. **Read the state.** Read `program.md`, `artifact.md`, last 3 rows of `results.tsv`.
112
113
  2. **Propose one change.** One specific edit, justified in one sentence. Not a rewrite. Not a refactor. One change.
113
114
  3. **Implement.** Apply the edit to `artifact.md`. Save.
114
- 4. **Score.** Score the new `artifact.md` against the rubric in `program.md`. Sum the dimensions. Round to one decimal.
115
- 5. **Decide.** If new score > best score so far: ratchet. Copy `artifact.md` to `iterations/NN/`, update `best.md` to point at the new winner, append a row to `results.tsv` marked `kept`. If new score <= best: revert. Copy `iterations/(best_iter)/artifact.md` back to `artifact.md`, append a row marked `reverted`.
116
- 6. **Report.** One line to the user: `iter N: score 7.4 (kept), change: tightened lede to one sentence`.
117
- 7. **Check stop conditions.** If budget exhausted: stop. If score plateaued (no improvement for 5 iterations): stop. Otherwise: next iteration.
115
+ 4. **Score (independent Checker).** Dispatch the `loop-checker` agent (Haiku) with the rubric from `program.md`, the new `artifact.md`, and the one change you just made plus your expected effect (your Maker self-score). The Checker returns a verdict JSON: `verified`, `score`, `max_score`, `issues`, `rationale`, `hard_constraint_violated`. You do NOT score it yourself; the Checker's `score` governs the decision. (See `.claude/skills/_loop/checker.md`.)
116
+ 5. **Decide.** If the Checker's `score` > best score so far and `hard_constraint_violated` is false: ratchet. Copy `artifact.md` to `iterations/NN/`, update `best.md` to point at the new winner, append a row to `results.tsv` marked `kept`. Otherwise: revert. Copy `iterations/(best_iter)/artifact.md` back to `artifact.md`, append a row marked `reverted`.
117
+ 6. **Flag divergence.** Compare the Checker's `score` against your Maker self-score. If they diverge by more than 1.5 points on a 10-point scale, mark the iteration `contested` in `results.tsv`. The Checker's verdict still governs; the flag just surfaces the disagreement in the end-of-run summary.
118
+ 7. **Write the status file.** Atomically update `research_status.md` with `iteration`, `verified`, `score`, `budget_remaining`, `last_input`, `maker_proposal`, `checker_verdict` (the Checker's rationale), and `next_action`. Write to a temp sibling and rename; never edit it in place. (See `docs/loop-status-schema.md`.)
119
+ 8. **Report.** One line to the user: `iter N: score 7.4 (kept), change: tightened lede to one sentence`. Append `[contested]` when flagged.
120
+ 9. **Check stop conditions and self-repair.** If the Checker has returned `verified: false` 3 times in a row, or its score has regressed for 3 straight iterations, enter the self-repair sub-loop (see `.claude/skills/_loop/repair.md`) before giving up: it diagnoses whether the rubric or a brief is the real problem, proposes one fix, and proves it on the exact failing input. If budget exhausted: stop. If score plateaued (no improvement for 5 iterations): stop. Otherwise: next iteration.
121
+
122
+ ## The independent Checker (Maker-Checker)
123
+
124
+ This loop separates the producer from the verifier (Proposal 11, E2). Claudia is
125
+ the **Maker**: she reads the state and proposes one change. The **Checker** is a
126
+ separate agent (`loop-checker`, Haiku) that scores the result independently. The
127
+ two never collapse into one model grading its own work, which is the bias the
128
+ split exists to remove.
129
+
130
+ - The Checker's `score`, not Claudia's, drives keep/revert. Claudia's expected
131
+ effect is only a self-score, used to detect disagreement.
132
+ - The Checker is adversarial by brief: it hunts for faults and defaults to
133
+ `verified: false` when uncertain. A passing verdict therefore means something.
134
+ - A `hard_constraint_violated: true` verdict forces a revert regardless of score.
135
+
136
+ **Contested iterations.** When the Checker's score and Claudia's self-score
137
+ diverge by more than 1.5 points (10-point scale), the iteration is marked
138
+ `contested`. The Checker still governs the decision; the flag makes the
139
+ disagreement visible. The end-of-run summary reports how many iterations were
140
+ contested, which is a useful signal that the rubric is ambiguous or that Claudia
141
+ was over-optimistic.
142
+
143
+ **Cost.** The Checker runs on Haiku, so the verification pass is cheap and fast.
144
+ With the default 20-iteration budget, that is at most 21 Checker dispatches
145
+ (including the baseline), each scoped to a single small artifact.
146
+
147
+ **Self-repair.** If the loop stalls (the Checker fails 3 times running, or scores
148
+ regress), `auto-research` enters a bounded self-repair sub-loop that treats the
149
+ harness, not the artifact, as the suspect: it diagnoses the rubric or a brief,
150
+ proposes one fix, and proves it on the exact failing input before adopting it.
151
+ See `.claude/skills/_loop/repair.md`. It is capped at 2 repair attempts and never
152
+ auto-edits shipped briefs (those changes go to a human approval gate).
118
153
 
119
154
  ## Safety rules (mandatory, follow without exception)
120
155
 
121
156
  1. **Workspace-only edits.** The loop never writes to a file outside `~/.claudia/auto-research/<task-id>/`. The user's original file at `/Users/.../wherever.md` is untouched until the user explicitly approves the final hand-off.
122
157
  2. **No external actions during the loop.** No sending, no posting, no scheduling, no calling tools that affect the outside world. The loop is a closed system.
123
- 3. **Baseline-score gate.** Before iteration 1, score the original artifact and write it to `results.tsv` as `iter 0: score X (baseline)`. If the rubric cannot produce a number for the original, the loop does not start; Claudia tells the user the rubric needs to be more specific.
158
+ 3. **Baseline-score gate.** Before iteration 1, dispatch the `loop-checker` to score the original artifact and write it to `results.tsv` as `iter 0: score X (baseline)` and to `research_status.md` as iteration 0. If the Checker cannot produce a number for the original, the loop does not start; Claudia tells the user the rubric needs to be more specific.
124
159
  4. **Bounded budget.** Default 20 iterations. User can set higher (50 max) or lower. Plateau detection (no improvement for 5 in a row) stops early. The loop is not allowed to be "open-ended" the way Karpathy's autoresearch is; Claudia's safety principles require an end.
125
160
  5. **Per-iteration revertability.** Each iteration is a git commit inside the workspace. Workspace is a fresh git repo (`git init` on start, `git commit -am` per iteration). At any point: `git log` shows the history, `git checkout` rolls back to any prior iteration.
126
161
  6. **User interrupt at any time.** If the user says "stop", "pause", "show me what you have", or otherwise interrupts: stop the loop immediately. The workspace persists. The best version so far is in `best.md`. The user can resume by saying "continue" (loop picks up from the current best).
@@ -178,10 +213,13 @@ When the user gives a bad rubric, help them turn it into a good one before start
178
213
  - `wiki` for iterating wiki pages specifically (compress, sharpen, restructure)
179
214
  - `draft-reply`, `follow-up-draft` for one-shot draft generation
180
215
  - `summarize-doc` for one-pass summarization
216
+ - `.claude/skills/_loop/` for the shared Maker and Checker role briefs
217
+ - `.claude/agents/loop-checker.md` for the Checker agent definition
218
+ - `docs/loop-status-schema.md` for the `research_status.md` control-plane format
181
219
 
182
220
  ## Open questions for future versions
183
221
 
184
- - Shell-command evaluators (run `node test.js my-draft.md`, take the number it prints) are deferred. Today, Claudia scores against the rubric in `program.md` directly.
222
+ - Shell-command evaluators (run `node test.js my-draft.md`, take the number it prints) are deferred. Today, the `loop-checker` agent scores against the rubric in `program.md`.
185
223
  - Token-spend budget and wall-clock budget are deferred. Today, only iteration count is supported.
186
224
  - Parallel branches (try 3 different changes in parallel, keep the winner) are deferred. Today, sequential only.
187
225
  - Wiki-page iteration as a first-class use case (apply auto-research to a wiki page until it's < N words while citing all sources) is deferred. The skill supports this in principle today, but a worked example doesn't ship until the wiki has earned its keep.
@@ -0,0 +1,167 @@
1
+ ---
2
+ name: build-team
3
+ description: Propose a personalized team of specialized agents based on the user's profile, goals, and how they actually work. Runs the proposal through an independent Checker, gates on the user's approval, and applies with rollback. Use when the user says "build my team", "set up my agents", "what agents should I have", "design my team", "/build-team", or asks Claudia to tailor her team to their work. See also: `hire-agent` for proactive single-agent suggestions; `structure-generator` for folder scaffolding; `capability-suggester` for command-level additions.
4
+ argument-hint: "[optional: focus area, e.g. 'around client work']"
5
+ invocation: explicit
6
+ effort-level: high
7
+ ---
8
+
9
+ # Build Team
10
+
11
+ The user-invoked counterpart to `hire-agent`. Where `hire-agent` suggests one
12
+ agent reactively when it spots a repeated pattern, `build-team` looks at the
13
+ whole picture (the user's profile, judgment rules, and real task history) and
14
+ proposes a tailored team in one pass. It is a Maker-Checker flow (Proposal 11,
15
+ E6): Claudia proposes, an independent Checker validates, the user approves, and
16
+ only then is anything written.
17
+
18
+ ## What this is, and is not
19
+
20
+ - It **proposes and, on approval, scaffolds** agent definitions. It never
21
+ auto-spawns agents or takes external actions.
22
+ - It **reuses the existing roster first** (the agents in `.claude/agents/`) and
23
+ adds a new role only when the user's work clearly justifies one.
24
+ - It **starts minimal.** A small team that covers the real work beats a sprawling
25
+ org chart the user will never use. Growth comes later, from `hire-agent` and
26
+ the proactive team-update suggestions (Proposal 11, E7).
27
+ - It does **not** duplicate `agent-dispatcher` routing logic, and it does not
28
+ change how Claudia delegates. It only proposes which agents exist.
29
+
30
+ ## When to fire
31
+
32
+ Trigger phrases: "build my team", "set up my agents", "what agents should I
33
+ have", "design my team", "/build-team", "tailor your team to my work".
34
+
35
+ Also the landing spot when a proactive suggestion (from `capability-suggester`
36
+ or `hire-agent`) escalates from "add one agent" to "let's set up your whole
37
+ team".
38
+
39
+ Do NOT fire for: adding a single agent for one observed pattern (use
40
+ `hire-agent`), folder/structure changes (use `structure-generator`), or
41
+ command/workflow additions (use `capability-suggester`).
42
+
43
+ ## The flow
44
+
45
+ ### Step 1: Read the profile
46
+
47
+ Gather, silently:
48
+
49
+ - `context/me.md`: role, archetype, priorities, the shape of a typical week.
50
+ - `context/judgment.yaml` (if it exists): priorities, delegation preferences,
51
+ surfacing rules. A user who said "just auto-process transcripts" wants a
52
+ processing agent; a user who said "always run things by me" wants a smaller,
53
+ more ask-first team.
54
+ - Recent task patterns: what the user has actually asked Claudia to do
55
+ repeatedly (lean on the same signals `hire-agent` uses, plus `memory_recall`
56
+ for recurring work). The team should map to real work, not a generic template.
57
+
58
+ If `context/me.md` does not exist, the user has not onboarded. Route to
59
+ onboarding first; do not propose a team into a vacuum.
60
+
61
+ ### Step 2: Maker proposes the team
62
+
63
+ Claudia (the Maker) drafts a team:
64
+
65
+ 1. Start from the **minimal seed for the user's archetype** (see the seed table
66
+ below).
67
+ 2. Adjust to the user's actual work: drop a seed role they will not use, add a
68
+ role a recurring task clearly needs.
69
+ 3. For each role, write a one-line **rationale tied to the user's real work**
70
+ ("You process client transcripts most weeks, so a Document Processor handles
71
+ the extraction while I keep the judgment"). A role with no concrete rationale
72
+ does not belong in the proposal.
73
+ 4. Every new (non-roster) role must pass the `hire-agent` candidate test:
74
+ compute-heavy, judgment-light, structured in and out, repeatable. If it needs
75
+ relationship context or would take external actions, it is not an agent.
76
+
77
+ ### Step 3: The Checker validates (bounded)
78
+
79
+ Dispatch the `loop-checker` agent with the team proposal as the `artifact` and
80
+ the team rubric below. The Checker scores independently and returns the standard
81
+ verdict JSON (`verified`, `score`, `issues`, `hard_constraint_violated`).
82
+
83
+ If the Checker returns `verified: false` (most often: too large, or a role that
84
+ needs judgment), the Maker **revises once** based on the issues (usually by
85
+ trimming) and re-checks. Cap at **2 revisions**. This is the same bounded
86
+ Maker-Checker discipline every loop follows; the team proposal is not exempt.
87
+
88
+ **The team rubric:**
89
+
90
+ | Dimension | 10 | 0 |
91
+ |-----------|----|----|
92
+ | Goal alignment | Every role maps to a stated priority or a recurring task in the profile | Roles are generic, not tied to this user |
93
+ | Right-sized | The minimum team that covers the work (progressive, not overwhelming) | Sprawling; roles the user will not use |
94
+ | Judgment-safe | Every role is compute-heavy and judgment-light | A role would take external actions or need relationship judgment |
95
+ | Reuse-first | Uses existing roster agents where they fit; new roles only when justified | Invents roles that duplicate existing agents |
96
+ | Tier-correct | Haiku for structured extraction, Sonnet for multi-turn research | Mismatched model tiers |
97
+
98
+ **Hard constraints** (any one forces `verified: false`): more than 5 roles in a
99
+ first team, or any role that would require relationship judgment or take external
100
+ actions.
101
+
102
+ ### Step 4: Status file + approval gate
103
+
104
+ Write `team_status.md` in the standard schema (`docs/loop-status-schema.md`):
105
+ `last_input` (one-line profile summary), `maker_proposal` (the team), `score`,
106
+ `checker_verdict`, `verified`, `next_action: await user approval`. Write it to
107
+ the workspace, atomically (temp sibling then rename).
108
+
109
+ Then present the validated proposal to the user: each role, its model tier, and
110
+ its one-line rationale, plus what is reused vs new. **Write nothing to
111
+ `.claude/agents/` or `.claude/skills/` yet.** This is a Human-Approved action
112
+ (see `claudia-principles.md`). Ask plainly: "Want me to set this team up?"
113
+
114
+ ### Step 5: Apply and rollback (only after explicit approval)
115
+
116
+ On a clear yes:
117
+
118
+ 1. For each **new** role, scaffold `.claude/agents/<name>.md` using the agent
119
+ definition format from `hire-agent` (frontmatter: `name`, `description`,
120
+ `model`, `dispatch-category`, `dispatch-tier`, `auto-dispatch`; body: role,
121
+ job, output format, constraints).
122
+ 2. For any **existing** file you modify (for example, adding the team to
123
+ `.claude/agents/README.md`), write a `.bak` sibling first, the same
124
+ mechanism the installer upgrade flow uses. New files need no `.bak`.
125
+ 3. Report exactly what was created or changed, and how to undo it: "Rollback by
126
+ restoring the `.bak` files and deleting the new agent files, or just say
127
+ 'undo the team setup' and I will."
128
+ 4. Do not edit `agent-dispatcher.md` routing here. The new agents exist; routing
129
+ them is the dispatcher's job and can be tuned separately.
130
+
131
+ Rollback on request: restore each `.bak`, delete each newly created agent file,
132
+ and report what was reverted.
133
+
134
+ ## Minimal seed teams (Proposal 11, D5)
135
+
136
+ Starting points, not prescriptions. Every seed gets pruned and adjusted to the
137
+ user's actual work in Step 2. All roles are drawn from the existing roster.
138
+
139
+ | Archetype | Minimal seed |
140
+ |-----------|-------------|
141
+ | Consultant / Advisor | Document Archivist (intake), Document Processor (extract from client docs and transcripts), Research Scout (client and market research) |
142
+ | Executive / Manager | Document Processor (reports and decks), Schedule Analyst (calendar patterns), Research Scout |
143
+ | Founder / Entrepreneur | Research Scout (market and competitor), Document Processor (investor and product docs), Document Archivist |
144
+ | Solo Professional | Document Archivist, Document Processor |
145
+ | Content Creator | Research Scout (topic research), Document Processor (transcript and draft extraction), Canvas Generator (visual) |
146
+
147
+ Two or three roles is a healthy first team. Five is the ceiling, not the target.
148
+
149
+ ## Safety
150
+
151
+ - **Approval gate is non-negotiable.** Nothing is written to `agents/` or
152
+ `skills/` without an explicit yes (Step 4).
153
+ - **Reversible.** `.bak` siblings for any modified file; new files are
154
+ deletable; "undo the team setup" restores the prior state.
155
+ - **Judgment stays human.** Agents handle processing. Relationship judgment,
156
+ strategy, and external actions never get delegated to a generated agent.
157
+ - **Minimal by default.** When the Checker and the Maker disagree on size, the
158
+ smaller team wins. The user can always add more later.
159
+
160
+ ## See also
161
+
162
+ - `hire-agent` for proactive, single-agent suggestions from one observed pattern.
163
+ - `structure-generator` for the archetype folder structures this composes with.
164
+ - `capability-suggester` for command and workflow additions (not agents).
165
+ - `.claude/agents/README.md` for the current roster and the two-tier model.
166
+ - `.claude/skills/_loop/checker.md` for the Checker brief the validation uses.
167
+ - `docs/loop-status-schema.md` for the `team_status.md` format.
@@ -147,3 +147,49 @@ After creating a new agent, I monitor:
147
147
  If an agent isn't being used or consistently needs my intervention, I might suggest retiring it:
148
148
 
149
149
  "The LinkedIn Processor hasn't been used in 3 weeks. Want me to remove it to keep things simple?"
150
+
151
+ ## Proactive team review (Proposal 11, E7)
152
+
153
+ Single-agent suggestions (above) cover incremental growth. Sometimes the whole
154
+ team has drifted from how the user now works, and the right move is to review the
155
+ team as a unit, not bolt on one more agent.
156
+
157
+ ### Drift triggers
158
+
159
+ I raise a team review (not just a single hire) when I notice:
160
+
161
+ - **Archetype shift.** The user's `context/me.md` archetype has changed, or their
162
+ described work no longer matches it (a consultant who is now mostly building a
163
+ product).
164
+ - **A new recurring task class.** A whole category of repeated work has appeared
165
+ that the current roster does not cover, not just one task type.
166
+ - **Roster drift.** Two or more agents have gone unused for weeks while the user
167
+ keeps doing a kind of work by hand.
168
+
169
+ These are about the shape of the team, not a single gap. One missing agent is a
170
+ `hire-agent` suggestion; a team that no longer fits is a team review.
171
+
172
+ ### Suggest a diff, then route to /build-team
173
+
174
+ When a drift trigger fires, I suggest the change as a **team diff**, gently and
175
+ as one suggestion:
176
+
177
+ ```
178
+ "The way you work has shifted toward [X] over the last while. Your current team
179
+ was set up for [Y]. Want me to review the whole team? Roughly, I'd add [role],
180
+ retire [unused role], and keep the rest."
181
+ ```
182
+
183
+ If the user says yes, I route to `/build-team`, which does the real work: it reads
184
+ the current profile, proposes the adjusted team, validates it through the Checker,
185
+ shows it for approval, and applies it with `.bak` rollback. I do not re-implement
186
+ the proposal, validation, or apply logic here. This skill only notices the drift
187
+ and offers the review; `build-team` owns the change.
188
+
189
+ ### Discipline
190
+
191
+ - One suggestion, not a flood. If the user declines, I drop it and do not re-raise
192
+ until a new, distinct drift signal appears.
193
+ - Never auto-apply. A team review is always proposed, never performed silently.
194
+ - Minimal still wins. A review can shrink a team as readily as grow it; an unused
195
+ agent is friction, not value.
@@ -42,6 +42,7 @@ Silently retrieve:
42
42
 
43
43
  - Call the `memory_reflections` MCP tool to see what already exists
44
44
  - Call the `memory_session_context` MCP tool for recent context (if available)
45
+ - Read recent loop status files (`~/.claudia/loops/*_status.md`, and any `research_status.md` from `auto-research` runs this session) to see how Claudia's own Maker-Checker loops performed: which iterations were `contested`, which loops never reached `verified`, where budgets ran out. This feeds the harness review in Step 2b.
45
46
 
46
47
  ### Step 2: Generate Reflections
47
48
 
@@ -78,6 +79,18 @@ Review the session and identify 1-3 reflections. Ask yourself:
78
79
 
79
80
  **Quality over quantity.** One genuine insight beats three generic observations.
80
81
 
82
+ ### Step 2b: Loop harness review (only if loops ran this session)
83
+
84
+ If any Maker-Checker loop ran this session (`auto-research`, or a wrapped daemon job), review how the harness itself performed, separately from reflections about the user. Read the signal from the status files gathered in Step 1, not from memory:
85
+
86
+ - **Contested iterations.** Did the Maker (Claudia) and the Checker disagree often? Frequent `contested` flags usually mean the rubric is ambiguous or Claudia was over-optimistic, not that the Checker is wrong.
87
+ - **Stuck loops.** Did a loop burn its whole budget without reaching `verified`? That points at the rubric, the Maker brief, or the Checker brief needing sharpening.
88
+ - **Recurring failure shape.** Across runs, is the same kind of issue showing up in the Checker's `issues` list?
89
+
90
+ If a clear, repeatable harness problem emerges, draft a **harness-improvement proposal**: a concrete edit to a rubric, the Maker brief (`.claude/skills/_loop/maker.md`), the Checker brief (`.claude/skills/_loop/checker.md`), or a new judgment rule about how to run loops. Do not apply it yet. Harness proposals go through the Checker gate in Step 5.
91
+
92
+ Most sessions produce no harness proposal. Raise one only when the evidence is in the status files.
93
+
81
94
  ### Step 3: Present for Approval
82
95
 
83
96
  Format reflections clearly and ask for approval:
@@ -171,6 +184,12 @@ Call the `memory_end_session` MCP tool with:
171
184
  5. Set `source` to `meditate/YYYY-MM-DD` (today's date)
172
185
  6. Write the updated file
173
186
 
187
+ **Harness proposals go through the Checker before anything is written** (Proposal 11, E4). If Step 2b produced a harness-improvement proposal (a rubric edit, a Maker or Checker brief edit, or a loop-running judgment rule), validate it first:
188
+
189
+ 1. Dispatch the `loop-checker` agent with the proposal as the artifact and a one-line rubric: "does this change actually fix the observed failure without breaking the brief's contract?"
190
+ 2. Write the proposal only if the Checker returns `verified: true`. A proposal the Checker rejects is surfaced to the user as an open question, not saved.
191
+ 3. Edits to shipped briefs (`_loop/maker.md`, `_loop/checker.md`) are never auto-applied. Present them as a diff for the user to approve, like any other change to Claudia's own files. Loop-running judgment rules go into `context/judgment.yaml` like any other rule, after passing the Checker.
192
+
174
193
  Confirm storage with both reflections and rules:
175
194
  "Got it. I've saved [N] reflection(s) and added [M] judgment rule(s) to your judgment file. I'll apply them going forward. See you next time."
176
195
 
@@ -17,6 +17,21 @@
17
17
  "I have an idea for Claudia"
18
18
  ]
19
19
  },
20
+ {
21
+ "name": "build-team",
22
+ "path": "build-team/SKILL.md",
23
+ "description": "Propose a personalized team of specialized agents based on the user profile and real work, validated by an independent Checker, applied on approval with rollback.",
24
+ "invocation": "explicit",
25
+ "effort_level": "high",
26
+ "triggers": ["build my team", "set up my agents", "what agents should I have", "design my team", "/build-team"],
27
+ "examples": [
28
+ "build my team",
29
+ "what agents should I have for my work?",
30
+ "set up my agents",
31
+ "design a team around my client work",
32
+ "tailor your team to how I work"
33
+ ]
34
+ },
20
35
  {
21
36
  "name": "onboarding",
22
37
  "path": "onboarding.md",