PyPI - threadkeeper - Versions diffs - 0.4.1__tar.gz → 0.5.0__tar.gz - Mend

threadkeeper 0.4.1tar.gz → 0.5.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (101) hide show

{threadkeeper-0.4.1 → threadkeeper-0.5.0}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: threadkeeper
-Version: 0.4.1
+Version: 0.5.0
 Summary: Multi-agent shared brain across Claude Code/Desktop, Codex, Gemini, Copilot, VS Code. Cross-session memory, self-improving skill loops, inter-agent signaling — one local MCP server.
 Author: thread-keeper contributors
 License: MIT
@@ -77,10 +77,10 @@ make it more than a memory store:
   concurrent sessions signal each other across CLIs. Parent /
   children / sibling agents become a coordinated swarm, not isolated
   chats.
-- **Self-improving skill library** — four autonomous background loops
+- **Self-improving skill library** — five autonomous background loops
   (auto-review on thread close, shadow-review daemon, extract
-  harvester, weekly Curator) materialize class-level skills as the
-  agents work. Hermes Agent v0.12 pattern adapted to multi-CLI:
+  harvester, candidate-reviewer, weekly Curator) materialize
+  class-level skills as the agents work. Adapted to multi-CLI:
   SKILL.md is the primary write target and gets mirrored to every
   detected CLI's skills directory simultaneously
   (`~/.claude/skills/`, `~/.codex/skills/`, `~/.threadkeeper/skills/`),
@@ -189,64 +189,202 @@ refuses a new spawn that would exceed `THREADKEEPER_SPAWN_BUDGET_MB`
 (3 GB default). Slim children that need semantic search delegate to the
 parent via `search_via_parent` — no per-child copy of sentence-transformers.
-### Learning loop (hermes-style)
-Four loops materialize knowledge into Anthropic-style Skill files
-(`SKILL.md` under each detected CLI's skills directory — Claude's
-`~/.claude/skills/`, Codex's `~/.codex/skills/`, plus the canonical
-`~/.threadkeeper/skills/` mirror) with a CLI-agnostic
-`~/.threadkeeper/lessons.md` fallback for CLIs that don't auto-trigger
-on the Skill format (Gemini / Copilot / bare MCP clients):
-- **Auto-review on close_thread** — when a closed thread is rich
-  (≥5 notes, ≥2 insight/move), `close_thread` spawns a slim child with
-  `SKILL_REVIEW_PROMPT` + the thread's notes. The prompt is rubric-form
-  (Q1–Q5 yes/no) with explicit positive examples for incident-vs-rule
-  classification. The fork also receives a "recently active skills"
-  block so it prefers PATCHing existing umbrellas over creating new
-  ones (Hermes Agent v0.12's *active-update bias*). Child appends a
-  lesson via `lesson_append`, optionally mirrors to
-  `~/.claude/skills/<name>/SKILL.md`, then closes with
-  `mark_skill_materialized`. Opt in with `THREADKEEPER_AUTO_REVIEW=1`.
-- **Shadow-review daemon** — every `THREADKEEPER_SHADOW_REVIEW_INTERVAL_S`
-  seconds (default off; 15 min recommended), scans the diff of
-  `dialog_messages` since the last cursor across **all** CLIs. The
-  window filters internal review-child sessions (no self-pollution)
-  and strips adapter `[tool_result]` / `[tool_call]` noise — Hermes
-  v0.12's "clean context" rule. If ≥500 chars of meaningful signal
-  remain, spawns a slim observer child that decides on class-level
-  learning. Idempotent through `events.kind='shadow_review_pass'`.
-- **Extract daemon** — every `THREADKEEPER_EXTRACT_INTERVAL_S` seconds
-  (default off; 10 min recommended), scans recent `dialog_messages`
-  with heuristic matchers (locale-aware "I want / next time / always"
-  patterns, headers + insight markers, bullet regularities, paraphrase
-  clusters via cosine ≥ 0.80) and enqueues candidates in
-  `extract_candidates.status='pending'` for the agent to review via
-  `review_candidates()` / `accept_candidate()`. The same self-pollution
-  filter as shadow_review excludes internal review-child sessions.
-  Where shadow extracts CLASS-LEVEL durable rules, extract harvests
-  PER-INCIDENT decision-shaped utterances — sidesteps the empirical
-  problem that agents focused on their primary task don't call
-  `note()` / `verbatim_user()` on their own.
-- **Autonomous Curator** — every `THREADKEEPER_CURATOR_INTERVAL_S`
-  seconds (default off; 7 days recommended), spawns a slim child that
-  reviews the EXISTING `lessons.md` + `skill_usage` inventory and
-  writes `~/.threadkeeper/curator/REPORT-<isodate>.md` with KEEP /
-  PATCH / CONSOLIDATE / PRUNE recommendations. Pinned and
-  foreground-authored entries are marked `[PROTECTED]` in the
-  inventory so the curator never proposes destructive changes against
-  them. Phase 1 is advisory-only — user reviews the REPORT and
-  applies changes manually. Inspired by Hermes Agent v0.12's
-  `hermes curator` cron agent.
+### Learning loops
+Five loops turn raw agent dialog into a curated, multi-CLI-mirrored
+skill library — autonomously, without requiring agents to call
+`note()` / `verbatim_user()` / `close_thread()` on their own (audit
+shows agents focused on their primary task rarely do).
+**Pipeline at a glance:**
+```
+   every CLI's transcripts
+            │
+            ▼  (ingest, every 30s — always-on)
+   dialog_messages  ◄──────────────────────────────────────┐
+            │                                              │
+            ├────────► [1] auto_review on close_thread     │
+            │              (agent triggers — rare)         │
+            │                  │                           │
+            ├────────► [2] shadow_review daemon            │
+            │              (cron, every 15 min)            │
+            │                  │                           │
+            ├────────► [3] extract daemon                  │
+            │              (cron, every 10 min)            │
+            │                  │                           │
+            │              extract_candidates              │
+            │                  │                           │
+            │                  ▼                           │
+            │          [4] candidate_reviewer daemon       │
+            │              (cron, every 1 h) ──────────────┤
+            │                  │                           │
+            ▼                  ▼                           │
+         brief()    SKILL.md + lessons.md ─► skill_usage   │
+            │              │                  │            │
+            │              ▼                  ▼            │
+            │         (every detected CLI's   │            │
+            │          skills/ directory)     │            │
+            │              │                  │            │
+            │              └──────► [5] Curator daemon ───┘
+            │                          (cron, every 7d)
+            │                              │
+            │                              ▼
+            │                       REPORT-<date>.md
+            ▼
+   injected into every new session at SessionStart
+```
+**Each loop in one row:**
+| # | Loop | Default tick | Reads | Writes |
+|---|---|---|---|---|
+| 1 | auto_review on close_thread | on `close_thread()` for rich threads | the thread's notes | SKILL.md, lessons.md |
+| 2 | shadow_review daemon | every 15 min (env knob) | recent `dialog_messages` window | SKILL.md, lessons.md |
+| 3 | extract daemon | every 10 min (env knob) | recent `dialog_messages` window | `extract_candidates` pending queue |
+| 4 | candidate-reviewer daemon | every 1 h (env knob) | pending candidates queue | SKILL.md (create/patch) / notes / verbatim / reject |
+| 5 | Curator daemon | every 7 days (env knob) | every existing lesson + recently-touched skill | REPORT-`<date>`.md (advisory) or direct PATCH/PRUNE/CONSOLIDATE |
+All five write into the universal Skill format (`SKILL.md` under each
+detected CLI's skills directory — `~/.claude/skills/`,
+`~/.codex/skills/`, plus the canonical `~/.threadkeeper/skills/`
+mirror), with `~/.threadkeeper/lessons.md` as a CLI-agnostic fallback
+for clients without a native skills loader (Gemini, Copilot, bare
+MCP).
+#### 1. Auto-review on close_thread
+When a closed thread is rich (≥5 notes, ≥2 insight/move),
+`close_thread` spawns a slim child with `SKILL_REVIEW_PROMPT` + the
+thread's notes. The prompt is rubric-form (Q1–Q5 yes/no) with explicit
+positive examples for incident-vs-rule classification. The fork also
+receives a "recently active skills" block so it prefers PATCHing
+existing umbrellas over creating new ones (*active-update bias*).
+Child appends a lesson via `lesson_append`, optionally mirrors to
+`~/.claude/skills/<name>/SKILL.md`, then closes with
+`mark_skill_materialized`. Opt in with `THREADKEEPER_AUTO_REVIEW=1`.
+#### 2. Shadow-review daemon
+Every `THREADKEEPER_SHADOW_REVIEW_INTERVAL_S` seconds (default off,
+900 = 15 min recommended) scans the diff of `dialog_messages` since
+the last cursor **across all CLIs at once**. The window filters
+internal review-child sessions (no self-pollution) and strips adapter
+`[tool_result]` / `[tool_call]` noise (the "clean context" rule). If
+≥500 chars of meaningful signal remain, spawns a slim observer child
+that decides on class-level learning. Idempotent through
+`events.kind='shadow_review_pass'`.
+#### 3. Extract daemon
+Every `THREADKEEPER_EXTRACT_INTERVAL_S` seconds (default off, 600 =
+10 min recommended) scans recent `dialog_messages` with heuristic
+matchers: locale-aware "I want / next time / always" patterns,
+headers + insight markers, bullet regularities, and paraphrase
+clusters via cosine ≥ 0.80. Each match enqueues a row in
+`extract_candidates.status='pending'`. Same self-pollution filter as
+shadow_review (internal review-child sessions excluded) plus
+message-level noise filter (compaction summaries, SKILL.md
+injections, subagent role prompts, test-runner log dumps).
+Where shadow extracts CLASS-LEVEL durable rules, extract harvests
+PER-INCIDENT decision-shaped utterances. Heuristic, not LLM —
+findings get refined by loop 4.
+#### 4. Candidate-reviewer daemon
+Every `THREADKEEPER_CANDIDATE_REVIEW_INTERVAL_S` seconds (default off,
+3600 = 1 h recommended) consumes the pending queue extract built up.
+Spawns a slim LLM child that decides per candidate or per coherent
+cluster:
+- **SKILL.create** — class-level rule; merge 2-5 related candidates
+  into one skill (active-update bias prefers PATCH over CREATE)
+- **SKILL.patch** — refines a recently-active skill
+- **SKILL.write_file** — adds `references/<topic>.md` under an
+  existing umbrella
+- **NOTE** — per-incident decision (requires `thread_id`)
+- **VERBATIM** — user quote worth preserving in `brief()`
+- **REJECT** — false positive that slipped past extract's filters
+Hard limits: max 2 new skills per pass, `[PROTECTED]` (pinned +
+foreground-authored) skills off-limits. Closes the gap between
+heuristic harvest and SKILL.md materialization — previously pending
+candidates accumulated indefinitely waiting for an agent to call
+`accept_candidate()` manually.
+#### 5. Autonomous Curator
+Every `THREADKEEPER_CURATOR_INTERVAL_S` seconds (default off, 604800
+= 7 days recommended) spawns a slim child that reviews the EXISTING
+`lessons.md` + `skill_usage` inventory and writes
+`~/.threadkeeper/curator/REPORT-<isodate>.md` with KEEP / PATCH /
+CONSOLIDATE / PRUNE recommendations. Pinned and foreground-authored
+entries are marked `[PROTECTED]` in the inventory so the curator
+never proposes destructive changes against them.
+Phase 1 is advisory-only (REPORT only); flip
+`THREADKEEPER_CURATOR_DESTRUCTIVE=1` once trust builds to let the
+child apply its own recommendations directly.
+#### Honest take
+What works **without** agent cooperation (passive, opt-in via env):
+- Loop 2 (shadow), 3 (extract), 4 (candidate-reviewer), 5 (curator) —
+  all run from the parent process, never require `note()` or
+  `close_thread()` from the agent
+What depends on the agent **calling tools explicitly**:
+- Loop 1 (auto-review on close_thread) — only fires if the agent
+  closes threads, which the audit shows agents focused on coding
+  tasks rarely do
+- Manual `skill_record(outcome='wrong')` — strongest feedback signal
+  to the Curator, but agents need to remember to flag bad skills
+The whole point of having five loops (not one) is graceful
+degradation: even when agents don't actively contribute, loops 2-5
+keep the library growing from passive observation of the dialog
+stream.
 ### Dialectic user model
 A model of you, accumulated as you use the agent. `dialectic_claim`,
-`dialectic_evidence` (support / contradict / clarifying),
-`dialectic_synthesis`, `dialectic_supersede`. Honcho-inspired smoothed
-ratio `(s-c)/(s+c+3)` → low / medium / high / disputed confidence.
+`dialectic_evidence` (support / contradict),
+`dialectic_synthesis`, `dialectic_supersede`. Honcho-inspired
+**weighted, smoothed** ratio
+`(Σw_support − Σw_contradict) / (Σw_support + Σw_contradict + 3)`
+→ low / medium / high / disputed confidence.
 Grouped by domain (style, values, workflow, ...) in `brief()`.
+**Source-based evidence discount.** Each evidence row's effective weight
+is `base_weight × discount(WRITE_ORIGIN)`. Foreground (direct user / human
+signal) = 1.0. shadow_review / background_review / candidate_review /
+curator review-forks = 0.5. Structural defence against self-confirmation
+loops: a claim that surfaces in `brief()` and then gets "confirmed" by a
+review-fork reading the same dialog can't ride that internal evidence
+all the way to high confidence — internal evidence buys half as much.
+**Discrete tier on each claim** — `hypothesis → observed → validated`
+(plus `disputed`). Independent of the continuous confidence band; tier
+is the **action-gating** signal:
+- `validated` → agent applies by default (★ in brief)
+- `observed`  → agent references and may mention the assumption (· in brief)
+- `hypothesis` → active probe; surfaces in a separate `currently_testing`
+  block so the agent watches the next user moves through that lens
+Transitions are discrete events (`tier_promoted` / `tier_demoted` in the
+`events` table) with timestamps for an auditable trail of when each
+claim earned trust. Thresholds:
+- `hypothesis → observed`: `w_support ≥ 2.0` (claim has real backing)
+- `observed → validated`: `w_support ≥ 4.0` **and** no contradict in 14 days
+- `validated → observed`: any recent contradict (demote on user pushback)
+- any → `disputed`: `w_contradict > w_support`
+- `disputed → hypothesis`: support overtakes contradict (recovery path)
 ### i18n bundle
 All multilingual regex and prompt fragments live in
@@ -271,11 +409,13 @@ The most-used env knobs (full list in `threadkeeper/config.py`):
 | `THREADKEEPER_SHADOW_REVIEW_WINDOW_S` | 900 | sliding window for shadow scan (s) |
 | `THREADKEEPER_EXTRACT_INTERVAL_S` | 0 (off) | extract daemon tick (s); 600 = 10 min recommended |
 | `THREADKEEPER_EXTRACT_WINDOW_MIN` | 30 | sliding dialog window per extract pass (min) |
+| `THREADKEEPER_CANDIDATE_REVIEW_INTERVAL_S` | 0 (off) | candidate-reviewer daemon tick (s); 3600 = 1h recommended |
+| `THREADKEEPER_CANDIDATE_REVIEW_MIN` | 3 | min pending candidates before reviewer engages |
 | `THREADKEEPER_CURATOR_INTERVAL_S` | 0 (off) | curator daemon tick (s); 604800 = 7d recommended |
 | `THREADKEEPER_CURATOR_MIN_LESSONS` | 3 | min lessons before curator engages |
 | `THREADKEEPER_CURATOR_DESTRUCTIVE` | "" (advisory) | when "1": curator child applies its own PATCH/PRUNE/CONSOLIDATE directly instead of writing advisory REPORT only |
 | `THREADKEEPER_SPAWN_BUDGET_MB` | 3072 | combined child RSS cap (MB); 0 disables |
-| `THREADKEEPER_INGEST_INTERVAL_S` | 30 | transcript ingest tick (s) |
+| `THREADKEEPER_INGEST_INTERVAL_S` | 3 | transcript ingest tick (s) |
 | `THREADKEEPER_NO_EMBEDDINGS` | "" | force-disable sentence-transformers |
 | `THREADKEEPER_SKILL_NUDGE_INTERVAL` | 10 | events between `skill_hint` nudges |
@@ -283,6 +423,73 @@ Persist them via `~/.claude/settings.json`'s `env` block (Claude Code) or
 the equivalent env section in each CLI's config. Hot-config reload is
 [tracked](https://github.com/po4erk91/thread-keeper/issues/2).
+### Per-loop agent dispatch
+By default every learning-loop spawn runs through the same CLI that
+hosts thread-keeper — Opus-session ⇒ Opus spawn, Codex-session ⇒
+Codex spawn, etc. Detection: process-tree walk at startup, cached for
+the server lifetime. The MCP tool `spawn_status()` shows the live
+resolution table.
+Override per role via `~/.threadkeeper/spawn.toml`:
+```toml
+[default]
+agent = "auto"     # "auto" = use active CLI (default)
+[loops]
+# Force specific roles to specific CLIs regardless of active host
+shadow_observer    = "claude"   # heaviest reasoning → keep on Claude
+curator            = "codex"    # weekly audit → Codex is fine
+candidate_reviewer = "auto"     # follow active CLI
+archivist          = "claude"   # close_thread auto-review
+extract            = "auto"     # this one is local (no spawn)
+[models]
+# Optional per-CLI model pin — overrides each CLI's own default
+claude = "opus"
+codex  = "gpt-5.4"
+gemini = "gemini-2.5-pro"
+```
+Or via env (highest priority, overrides the TOML):
+```bash
+export THREADKEEPER_SPAWN_DEFAULT=codex                 # global default
+export THREADKEEPER_SPAWN_LOOP_CURATOR=gemini           # per-role
+export THREADKEEPER_SPAWN_MODEL_CLAUDE=opus             # per-CLI model
+export THREADKEEPER_ACTIVE_CLI=claude                   # force detection
+```
+Adapters without headless support (Claude Desktop, VS Code) can't be
+spawn targets — `spawn_status()` reports them as "no adapter" and any
+override pointing at them falls back to the next priority level.
+---
+## Hygiene tools
+Two tools keep the memory tidy — both default to `dry_run=True`, run
+them with `dry_run=False` to apply:
+- **`consolidate()`** — dedup near-identical notes (intra-thread cosine
+  ≥ 0.95), deduplicate verbatim quotes, demote untouched-active threads
+  to `idle` after 30 days, release orphaned thread claims.
+- **`validate_threads()`** — heuristic triage of active threads with
+  four categories (first match wins per thread):
+  - `no_notes_old` — active with zero notes ≥ 7 days → close as abandoned.
+  - `shipped` — last note matches a shipped-marker regex (EN+RU:
+    shipped/fixed/works/passed/done/merged/закрыто/готово/сделано/…)
+    and has settled ≥ 3 days → close with the last move as outcome.
+  - `dropped_open_q` — last note is an `open_q` left unfollowed
+    ≥ 14 days → close as dropped.
+  - `stale_idle` — any active not touched in ≥ 30 days → demote to
+    `idle` (not closed — revives on next `note()`).
+  Idle threads are never touched. Tunable via `no_notes_days`,
+  `shipped_settle_days`, `drop_open_q_days`, `stale_days`, and
+  `shipped_markers` (comma-separated extra tokens).
 ---
 ## Storage
@@ -317,7 +524,7 @@ pip install -e '.[semantic,dev]'
 python -m pytest
 ```
-412 tests passing on Python 3.11 / 3.12 / 3.13 (1 skipped). CI runs
+495 tests passing on Python 3.11 / 3.12 / 3.13 (1 skipped). CI runs
 the suite on every push and PR.
 ---
@@ -342,11 +549,13 @@ threadkeeper/
 │   ├── gemini.py
 │   ├── copilot.py
 │   └── vscode.py
-└── tools/                # @mcp.tool entries — 83 of them
+└── tools/                # @mcp.tool entries — 89 of them
     ├── threads.py
     ├── peers.py
     ├── spawn.py
     ├── skills.py
+    ├── dialectic.py
+    ├── validate.py
     └── ...
 ```

threadkeeper 0.4.1__tar.gz → 0.5.0__tar.gz

threadkeeper 0.4.1tar.gz → 0.5.0tar.gz