PyPI - threadkeeper - Versions diffs - 0.12.0__tar.gz → 0.13.1__tar.gz - Mend

threadkeeper 0.12.0.tar.gz → 0.13.1.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (153)

{threadkeeper-0.12.0 → threadkeeper-0.13.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: threadkeeper
-Version: 0.12.0
+Version: 0.13.1
 Summary: Multi-agent shared brain across Claude Code/Desktop, Codex, Antigravity CLI, Gemini, Copilot, VS Code. Cross-session memory, self-improving skill loops, inter-agent signaling — one local MCP server.
 Author: thread-keeper contributors
 License: MIT
@@ -25,6 +25,7 @@ License-File: LICENSE
 Requires-Dist: mcp>=1.0.0
 Requires-Dist: pydantic>=2
 Requires-Dist: pydantic-settings>=2
+Requires-Dist: pyyaml>=6.0
 Provides-Extra: semantic
 Requires-Dist: fastembed>=0.3; extra == "semantic"
 Requires-Dist: numpy>=1.24.0; extra == "semantic"
@@ -221,7 +222,7 @@ tk-agent-status --cleanup-memory
 ```
 `apps/macos-agent-status/` contains a small macOS menu-bar app that polls this
-command every 5 seconds and shows every autonomous learning loop: enabled/off,
+command every 15 seconds and shows every autonomous learning loop: enabled/off,
 running/idle/ready, last pass, backlog, and active child RSS when that loop has
 spawned a worker. PyPI wheels and sdists also bundle the same Swift source under
 `threadkeeper/assets/macos-agent-status/`, so a normal `pipx`/`uv tool` install
@@ -239,17 +240,22 @@ memory button, self-restarts when its own RSS crosses
 notification permission, and sends a notification when a newly completed
 autonomous child task produces a useful result in `recent_results`; the first
 poll only marks existing results as seen, so old completions do not spam
-notifications. The header gear opens a separate Settings window for
+notifications. Status polling and cleanup commands run off the main actor, so
+opening the popover does not wait for `tk-agent-status --json`. The header gear
+opens a separate Settings window for
 `~/.threadkeeper/.env`: common knobs are grouped into guided controls, the raw
 `.env` remains editable for advanced values, three local presets can be saved
 and loaded, and Save & Restart writes the file then asks existing
 `threadkeeper.server` processes to exit so MCP hosts reconnect with the new
-configuration. Probe backlog is due objective
+configuration. Spawn CLI selectors collapse `agy` into canonical `antigravity`
+while keeping `gemini` as legacy, and model selectors use dropdowns with exact
+CLI model ids/labels instead of free-text fields. Probe backlog is due objective
 probes only, not every registered probe, so a healthy cooldown shows `0 due
 probes` instead of looking stuck. On macOS, `python -m threadkeeper.server`
-automatically installs and launches it on MCP startup, and restarts the app when
-the installed bundle has changed while an older menu-bar process is still
-running. Set
+automatically installs and launches it on MCP startup. The installed app records
+a source fingerprint, so package upgrades rebuild the helper even when an older
+bundle has a newer file timestamp, then restart any stale running menu-bar
+process. Set
 `THREADKEEPER_MENUBAR_AUTO_LAUNCH=0` to disable that behavior.
 ### Auto Update
@@ -633,11 +639,15 @@ keys are lowercased:
 # default agent for roles with no explicit pin ("" / unset = use the active CLI)
 THREADKEEPER_SPAWN__DEFAULT=claude
 # per-role CLI:  THREADKEEPER_SPAWN__LOOP__<ROLE>=<cli>
+# supported CLI keys: claude, codex, antigravity (agy executable), gemini (legacy), copilot
 THREADKEEPER_SPAWN__LOOP__SHADOW_OBSERVER=claude   # heaviest reasoning → keep on Claude
 THREADKEEPER_SPAWN__LOOP__CURATOR=codex            # weekly audit → Codex is fine
 THREADKEEPER_SPAWN__LOOP__CANDIDATE_REVIEWER=auto  # "auto" = follow active CLI
 # model pin per CLI or per role:  THREADKEEPER_SPAWN__MODEL__<KEY>=<model>
 THREADKEEPER_SPAWN__MODEL__CLAUDE=opus
+THREADKEEPER_SPAWN__MODEL__CODEX=gpt-5.5
+THREADKEEPER_SPAWN__MODEL__AGY="Gemini 3.1 Pro (High)"
+THREADKEEPER_SPAWN__MODEL__GEMINI=gemini-3.1-pro-preview
 THREADKEEPER_SPAWN__MODEL__DIALECTIC_VALIDATOR=opus
 ```
@@ -645,7 +655,9 @@ Resolution per role: `SPAWN__LOOP__<role>` → `SPAWN__DEFAULT` → active CLI
 `claude`; `"auto"` (or unset) defers to the active CLI. Real environment
 variables override the `.env`. Force host detection with
 `THREADKEEPER_ACTIVE_CLI=claude` (or `codex`, `antigravity`/`agy`,
-`gemini`, `copilot`). See `.env.example` for the full knob list.
+`gemini`, `copilot`). `agy` is normalized to `antigravity`; `gemini` remains a
+legacy Gemini CLI adapter for old installs/enterprise paths. See `.env.example`
+for the full knob list.
 Adapters without headless support (Claude Desktop, VS Code) can't be
 spawn targets — `spawn_status()` reports them as "no adapter" and any
@@ -745,12 +757,34 @@ unchanged.
 ## Verifying ingest across CLIs
 ```bash
-python scripts/tk_verify_ingest.py
+python scripts/tk_verify_ingest.py            # both checks below
+python scripts/tk_verify_ingest.py --contract # parse/ingest contract only
+python scripts/tk_verify_ingest.py --live      # production verdict only
+python scripts/tk_verify_ingest.py --live --json   # machine-readable
 ```
-Walks every installed CLI adapter, parses recent transcripts in an
-isolated tempdir DB, reports per-source message counts and any silent
-parse failures. Read-only with respect to live state.
+Two read-only checks:
+- **Contract test** (`--contract`) — walks every installed CLI adapter,
+  parses recent transcripts into an isolated tempdir DB, reports
+  per-source message counts and flags any adapter that parsed messages
+  but silently failed to persist them. Answers *"does the pipeline
+  work?"*
+- **Production verification** (`--live`) — reads the **live**
+  `dialog_messages` table read-only and scores the three acceptance
+  criteria from [roadmap issue #1](https://github.com/po4erk91/thread-keeper/issues/1):
+  (1) every targeted CLI *slot* has production rows, (2) shadow-review
+  sees more than one adapter in the same recent window, (3) the learning
+  loop has fired on non-Claude sessions. Emits a `PASS` / `PARTIAL` /
+  `FAIL` verdict. The four slots are `claude-code`, `codex`, `copilot`,
+  and `google` — where the Google slot is satisfied by *either* the
+  legacy `gemini` adapter or its successor Antigravity (`agy`), since
+  both live under `~/.gemini`.
+`--strict` makes the process exit non-zero unless the live verdict is
+`PASS`, so it can gate CI; `PARTIAL` (e.g. a box that doesn't run all
+four CLIs) is a valid real-world state and exits 0 by default. The
+reusable verdict logic lives in `threadkeeper/verify_ingest.py`.
 ---
@@ -776,6 +810,7 @@ threadkeeper/
 ├── db.py                 # SQLite schema + sqlite-vec loader
 ├── identity.py           # session, self-cid, daemon launchers
 ├── ingest.py             # adapter-driven transcript ingest
+├── verify_ingest.py      # cross-CLI production verification verdict
 ├── brief.py              # render_brief / render_context
 ├── shadow_review.py      # autonomous learning observer
 ├── i18n.py               # 10 locales of regex + prompt bundles
@@ -814,3 +849,5 @@ locale. Look for the `good-first-issue` label.
 ## License
 MIT — see [LICENSE](LICENSE).
+<!-- mcp-name: io.github.po4erk91/thread-keeper -->
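
Note on the `--strict` text in the hunk above: the exit-status rule it describes can be sketched as a small mapping from verdict to exit code. This is an illustrative sketch, not the code in `scripts/tk_verify_ingest.py`. The helper name `exit_code_for` is hypothetical, and the handling of `FAIL` without `--strict` is an assumption, because the README only states the default for `PARTIAL`.

```python
def exit_code_for(verdict: str, strict: bool = False) -> int:
    """Map a live verdict to a process exit status (hypothetical helper)."""
    if verdict == "PASS":
        return 0
    if strict:
        # --strict: anything other than PASS is non-zero, so CI can gate on it.
        return 1
    # Default mode: PARTIAL is described as a valid real-world state and exits 0.
    # ASSUMPTION: FAIL exits non-zero even without --strict; the README does not say.
    return 0 if verdict == "PARTIAL" else 1
```

Under these assumptions, a box that runs only three of the four CLIs would pass a default run and fail a `--strict` run.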

{threadkeeper-0.12.0 → threadkeeper-0.13.1}/README.md RENAMED

@@ -180,7 +180,7 @@ tk-agent-status --cleanup-memory
 ```
 `apps/macos-agent-status/` contains a small macOS menu-bar app that polls this
-command every 5 seconds and shows every autonomous learning loop: enabled/off,
+command every 15 seconds and shows every autonomous learning loop: enabled/off,
 running/idle/ready, last pass, backlog, and active child RSS when that loop has
 spawned a worker. PyPI wheels and sdists also bundle the same Swift source under
 `threadkeeper/assets/macos-agent-status/`, so a normal `pipx`/`uv tool` install
@@ -198,17 +198,22 @@ memory button, self-restarts when its own RSS crosses
 notification permission, and sends a notification when a newly completed
 autonomous child task produces a useful result in `recent_results`; the first
 poll only marks existing results as seen, so old completions do not spam
-notifications. The header gear opens a separate Settings window for
+notifications. Status polling and cleanup commands run off the main actor, so
+opening the popover does not wait for `tk-agent-status --json`. The header gear
+opens a separate Settings window for
 `~/.threadkeeper/.env`: common knobs are grouped into guided controls, the raw
 `.env` remains editable for advanced values, three local presets can be saved
 and loaded, and Save & Restart writes the file then asks existing
 `threadkeeper.server` processes to exit so MCP hosts reconnect with the new
-configuration. Probe backlog is due objective
+configuration. Spawn CLI selectors collapse `agy` into canonical `antigravity`
+while keeping `gemini` as legacy, and model selectors use dropdowns with exact
+CLI model ids/labels instead of free-text fields. Probe backlog is due objective
 probes only, not every registered probe, so a healthy cooldown shows `0 due
 probes` instead of looking stuck. On macOS, `python -m threadkeeper.server`
-automatically installs and launches it on MCP startup, and restarts the app when
-the installed bundle has changed while an older menu-bar process is still
-running. Set
+automatically installs and launches it on MCP startup. The installed app records
+a source fingerprint, so package upgrades rebuild the helper even when an older
+bundle has a newer file timestamp, then restart any stale running menu-bar
+process. Set
 `THREADKEEPER_MENUBAR_AUTO_LAUNCH=0` to disable that behavior.
 ### Auto Update
@@ -592,11 +597,15 @@ keys are lowercased:
 # default agent for roles with no explicit pin ("" / unset = use the active CLI)
 THREADKEEPER_SPAWN__DEFAULT=claude
 # per-role CLI:  THREADKEEPER_SPAWN__LOOP__<ROLE>=<cli>
+# supported CLI keys: claude, codex, antigravity (agy executable), gemini (legacy), copilot
 THREADKEEPER_SPAWN__LOOP__SHADOW_OBSERVER=claude   # heaviest reasoning → keep on Claude
 THREADKEEPER_SPAWN__LOOP__CURATOR=codex            # weekly audit → Codex is fine
 THREADKEEPER_SPAWN__LOOP__CANDIDATE_REVIEWER=auto  # "auto" = follow active CLI
 # model pin per CLI or per role:  THREADKEEPER_SPAWN__MODEL__<KEY>=<model>
 THREADKEEPER_SPAWN__MODEL__CLAUDE=opus
+THREADKEEPER_SPAWN__MODEL__CODEX=gpt-5.5
+THREADKEEPER_SPAWN__MODEL__AGY="Gemini 3.1 Pro (High)"
+THREADKEEPER_SPAWN__MODEL__GEMINI=gemini-3.1-pro-preview
 THREADKEEPER_SPAWN__MODEL__DIALECTIC_VALIDATOR=opus
 ```
@@ -604,7 +613,9 @@ Resolution per role: `SPAWN__LOOP__<role>` → `SPAWN__DEFAULT` → active CLI
 `claude`; `"auto"` (or unset) defers to the active CLI. Real environment
 variables override the `.env`. Force host detection with
 `THREADKEEPER_ACTIVE_CLI=claude` (or `codex`, `antigravity`/`agy`,
-`gemini`, `copilot`). See `.env.example` for the full knob list.
+`gemini`, `copilot`). `agy` is normalized to `antigravity`; `gemini` remains a
+legacy Gemini CLI adapter for old installs/enterprise paths. See `.env.example`
+for the full knob list.
 Adapters without headless support (Claude Desktop, VS Code) can't be
 spawn targets — `spawn_status()` reports them as "no adapter" and any
@@ -704,12 +715,34 @@ unchanged.
 ## Verifying ingest across CLIs
 ```bash
-python scripts/tk_verify_ingest.py
+python scripts/tk_verify_ingest.py            # both checks below
+python scripts/tk_verify_ingest.py --contract # parse/ingest contract only
+python scripts/tk_verify_ingest.py --live      # production verdict only
+python scripts/tk_verify_ingest.py --live --json   # machine-readable
 ```
-Walks every installed CLI adapter, parses recent transcripts in an
-isolated tempdir DB, reports per-source message counts and any silent
-parse failures. Read-only with respect to live state.
+Two read-only checks:
+- **Contract test** (`--contract`) — walks every installed CLI adapter,
+  parses recent transcripts into an isolated tempdir DB, reports
+  per-source message counts and flags any adapter that parsed messages
+  but silently failed to persist them. Answers *"does the pipeline
+  work?"*
+- **Production verification** (`--live`) — reads the **live**
+  `dialog_messages` table read-only and scores the three acceptance
+  criteria from [roadmap issue #1](https://github.com/po4erk91/thread-keeper/issues/1):
+  (1) every targeted CLI *slot* has production rows, (2) shadow-review
+  sees more than one adapter in the same recent window, (3) the learning
+  loop has fired on non-Claude sessions. Emits a `PASS` / `PARTIAL` /
+  `FAIL` verdict. The four slots are `claude-code`, `codex`, `copilot`,
+  and `google` — where the Google slot is satisfied by *either* the
+  legacy `gemini` adapter or its successor Antigravity (`agy`), since
+  both live under `~/.gemini`.
+`--strict` makes the process exit non-zero unless the live verdict is
+`PASS`, so it can gate CI; `PARTIAL` (e.g. a box that doesn't run all
+four CLIs) is a valid real-world state and exits 0 by default. The
+reusable verdict logic lives in `threadkeeper/verify_ingest.py`.
 ---
@@ -735,6 +768,7 @@ threadkeeper/
 ├── db.py                 # SQLite schema + sqlite-vec loader
 ├── identity.py           # session, self-cid, daemon launchers
 ├── ingest.py             # adapter-driven transcript ingest
+├── verify_ingest.py      # cross-CLI production verification verdict
 ├── brief.py              # render_brief / render_context
 ├── shadow_review.py      # autonomous learning observer
 ├── i18n.py               # 10 locales of regex + prompt bundles
@@ -773,3 +807,5 @@ locale. Look for the `good-first-issue` label.
 ## License
 MIT — see [LICENSE](LICENSE).
+<!-- mcp-name: io.github.po4erk91/thread-keeper -->
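
Note on the spawn routing text in the hunk above: the per-role resolution order (`SPAWN__LOOP__<role>`, then `SPAWN__DEFAULT`, then the active CLI) and the `agy` to `antigravity` normalization can be sketched as below. This is a minimal sketch, not threadkeeper's implementation. The names `normalize_cli` and `resolve_cli` are hypothetical, and treating a role pin of `auto` as "skip the default and follow the active CLI" is an assumption drawn from the `.env` comments.

```python
# CLI keys listed in the README hunk; "agy" is the executable alias for antigravity.
SUPPORTED_CLIS = {"claude", "codex", "antigravity", "gemini", "copilot"}
ALIASES = {"agy": "antigravity"}


def normalize_cli(name: str) -> str:
    """Lowercase a CLI key and collapse known aliases (hypothetical helper)."""
    key = (name or "").strip().lower()
    return ALIASES.get(key, key)


def resolve_cli(role: str, loop_pins: dict, default: str, active_cli: str) -> str:
    """Resolve which CLI a role spawns on: role pin, then default, then active CLI."""
    pin = normalize_cli(loop_pins.get(role, ""))
    if pin == "auto":
        # ASSUMPTION: an explicit "auto" pin follows the active CLI directly.
        return normalize_cli(active_cli)
    if pin:
        return pin
    fallback = normalize_cli(default)
    if fallback and fallback != "auto":
        return fallback
    # Unset or "auto" default defers to the active CLI.
    return normalize_cli(active_cli)
```

For example, `resolve_cli("curator", {"curator": "codex"}, "claude", "copilot")` picks the role pin, while an empty pin table falls back to the default.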

{threadkeeper-0.12.0 → threadkeeper-0.13.1}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "threadkeeper"
-version = "0.12.0"
+version = "0.13.1"
 description = "Multi-agent shared brain across Claude Code/Desktop, Codex, Antigravity CLI, Gemini, Copilot, VS Code. Cross-session memory, self-improving skill loops, inter-agent signaling — one local MCP server."
 requires-python = ">=3.11"
 authors = [{ name = "thread-keeper contributors" }]
@@ -29,6 +29,7 @@ dependencies = [
     "mcp>=1.0.0",
     "pydantic>=2",
     "pydantic-settings>=2",
+    "pyyaml>=6.0",
 ]
 [project.optional-dependencies]

{threadkeeper-0.12.0 → threadkeeper-0.13.1}/tests/test_menubar_app.py RENAMED

@@ -46,6 +46,8 @@ def test_menubar_status_item_uses_idle_chip_and_running_gears():
     assert 'button.title = ""' in swift
     assert 'button.title = " TK' not in swift
     assert 'return "TK ' not in swift
+    assert "statusPollInterval: TimeInterval = 15.0" in swift
+    assert "Timer.scheduledTimer(withTimeInterval: statusPollInterval" in swift
     assert "Timer(timeInterval: gearSpinInterval" in swift
     assert "gearFrameStepDegrees = 17.0" in swift
     assert "largeGearDiameter: CGFloat = 12.0" in swift
@@ -59,6 +61,9 @@ def test_menubar_status_item_uses_idle_chip_and_running_gears():
     assert "store.snapshot.runningCount > 0" not in swift
     assert "button.image = gearFrames" in swift
     assert "TimelineView" not in swift
+    assert "refreshInFlight" in swift
+    assert "Task.detached(priority: .utility)" in swift
+    assert "nonisolated private static func runStatusCommand" in swift
     assert "store.openEnvSettings()" in swift
     assert '.help("Settings")' in swift
     assert '.help("Refresh")' not in swift
@@ -67,6 +72,19 @@ def test_menubar_status_item_uses_idle_chip_and_running_gears():
     assert '.help("Clean memory")' in swift
+def test_menubar_popover_shows_before_status_refresh():
+    repo = Path(__file__).resolve().parents[1]
+    swift = (
+        repo / "apps" / "macos-agent-status" / "ThreadKeeperAgentStatus.swift"
+    ).read_text(encoding="utf-8")
+    start = swift.index("@objc private func togglePopover")
+    end = swift.index("    private func updateStatusButton", start)
+    body = swift[start:end]
+    assert body.index("popover.show(") < body.index("store.refresh()")
 def test_menubar_env_settings_window_edits_env_and_presets():
     repo = Path(__file__).resolve().parents[1]
     swift = (
@@ -81,6 +99,22 @@ def test_menubar_env_settings_window_edits_env_and_presets():
     assert "(1...3).map" in swift
     assert "EnvPresetCard" in swift
     assert "mergeEnvText(raw:" in swift
+    assert "EnvSettingsTab" in swift
+    assert "case .raw:" in swift
+    assert "saveRaw(restart:" in swift
+    assert ".onChange(of: envStore.rawEnvText)" not in swift
+    assert "syncRawEditsIntoForm" not in swift
+    assert 'ChoiceOption("antigravity", label: "antigravity (agy)")' in swift
+    assert 'ChoiceOption("agy")' not in swift
+    assert 'ChoiceOption("gemini", label: "gemini (legacy)")' in swift
+    assert "antigravityModelChoices" in swift
+    assert "geminiLegacyModelChoices" in swift
+    assert '"Gemini 3.1 Pro (High)"' in swift
+    assert '"Gemini 3.5 Flash (Medium)"' in swift
+    assert '"gemini-3.1-pro-preview"' in swift
+    assert '"gemini-3.1-pro"' not in swift
+    assert "THREADKEEPER_SPAWN__MODEL__CODEX" in swift
+    assert "THREADKEEPER_SPAWN__MODEL__GEMINI" in swift
     assert "THREADKEEPER_DISABLE_BG_DAEMONS" in swift
     assert "THREADKEEPER_EVOLVE_APPLY_INTERVAL_S" in swift
     assert "THREADKEEPER_SPAWN__MODEL__EVOLVE_APPLIER" in swift
@@ -96,6 +130,35 @@ def test_menubar_source_falls_back_to_packaged_assets(fresh_mp, tmp_path, monkey
     assert menubar_app._source_dir() == menubar_app._package_source_dir()
+def test_app_current_requires_matching_source_fingerprint(tmp_path):
+    import threadkeeper.menubar_app as menubar_app
+    src = tmp_path / "source"
+    src.mkdir()
+    for name in menubar_app.SOURCE_FILES:
+        (src / name).write_text(f"{name}\n", encoding="utf-8")
+    app = tmp_path / menubar_app.APP_BUNDLE
+    binary = app / "Contents" / "MacOS" / menubar_app.APP_NAME
+    plist = app / "Contents" / "Info.plist"
+    binary.parent.mkdir(parents=True)
+    plist.parent.mkdir(parents=True, exist_ok=True)
+    binary.write_text("old binary\n", encoding="utf-8")
+    plist.write_text("<plist></plist>\n", encoding="utf-8")
+    assert menubar_app._app_is_current(src, app) is False
+    marker = menubar_app._source_fingerprint_path(app)
+    marker.parent.mkdir(parents=True)
+    marker.write_text(menubar_app._source_fingerprint(src) + "\n", encoding="utf-8")
+    assert menubar_app._app_is_current(src, app) is True
+    (src / "ThreadKeeperAgentStatus.swift").write_text("// changed\n", encoding="utf-8")
+    assert menubar_app._app_is_current(src, app) is False
 def test_install_app_builds_from_task_log_scratch_without_executable_bit(
     fresh_mp,
     tmp_path,
@@ -139,6 +202,10 @@ def test_install_app_builds_from_task_log_scratch_without_executable_bit(
     assert calls[0][2] == task_logs / "menubar-build" / "source"
     assert (installed / "Contents" / "Info.plist").exists()
     assert (installed / "Contents" / "MacOS" / menubar_app.APP_NAME).exists()
+    marker = menubar_app._source_fingerprint_path(installed)
+    assert marker.read_text(encoding="utf-8").strip() == menubar_app._source_fingerprint(
+        src
+    )
     assert not (src / "build").exists()
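
Note on the fingerprint tests above: they check that the installed app counts as current only when a recorded source fingerprint matches the source files, which avoids trusting file timestamps. One way such a check could work is sketched below. This is an assumption-laden sketch, not the code in `threadkeeper/menubar_app.py`: the use of SHA-256, the function names `source_fingerprint` and `app_is_current`, and the explicit `names` and `marker` parameters are all hypothetical.

```python
import hashlib
from pathlib import Path


def source_fingerprint(src: Path, names: list[str]) -> str:
    """Hash file names and contents in a stable order (hypothetical helper)."""
    digest = hashlib.sha256()
    for name in sorted(names):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")  # separator so name/content boundaries stay unambiguous
        digest.update((src / name).read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def app_is_current(src: Path, marker: Path, names: list[str]) -> bool:
    """Return True only if the recorded fingerprint matches the current sources."""
    if not marker.exists():
        # No marker means the bundle predates fingerprinting: treat it as stale.
        return False
    recorded = marker.read_text(encoding="utf-8").strip()
    return recorded == source_fingerprint(src, names)
```

Because the comparison is on content rather than modification time, an older bundle with a newer timestamp would still be detected as stale in this sketch.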

{threadkeeper-0.12.0 → threadkeeper-0.13.1}/tests/test_spawn_config.py RENAMED

@@ -114,12 +114,12 @@ def test_resolve_model_from_dotenv(tmp_path, monkeypatch):
     envf = tmp_path / "tk.env"
     envf.write_text(
         "THREADKEEPER_SPAWN__MODEL__CODEX=gpt-5.4\n"
-        "THREADKEEPER_SPAWN__MODEL__AGY=gemini-3.1-pro\n"
+        'THREADKEEPER_SPAWN__MODEL__AGY="Gemini 3.1 Pro (High)"\n'
         "THREADKEEPER_SPAWN__MODEL__GEMINI=gemini-2.5-pro\n"
     )
     sc = _reset(monkeypatch, tmp_path, env_file=str(envf))
     assert sc.resolve_model("codex") == "gpt-5.4"
-    assert sc.resolve_model("antigravity") == "gemini-3.1-pro"
+    assert sc.resolve_model("antigravity") == "Gemini 3.1 Pro (High)"
     assert sc.resolve_model("gemini") == "gemini-2.5-pro"
     assert sc.resolve_model("claude") == ""  # no entry
@@ -216,12 +216,12 @@ def test_antigravity_spawn_argv_uses_p_flag(tmp_path, monkeypatch):
     for name in [m for m in list(sys.modules) if m.startswith("threadkeeper")]:
         del sys.modules[name]
     from threadkeeper.adapters.antigravity import ADAPTER
-    argv = ADAPTER.spawn_argv("hello", model="gemini-3.1-pro")
+    argv = ADAPTER.spawn_argv("hello", model="Gemini 3.1 Pro (High)")
     if argv is None:
         pytest.skip("agy binary not installed in test env")
     assert "-p" in argv
     assert "--model" in argv
-    assert "gemini-3.1-pro" in argv
+    assert "Gemini 3.1 Pro (High)" in argv
 def test_gemini_spawn_argv_uses_p_flag(tmp_path, monkeypatch):
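
Note on the quoted model label in the hunks above: a value such as `"Gemini 3.1 Pro (High)"` contains spaces and parentheses, so the `.env` reader has to strip the surrounding quotes and keep the rest intact. The sketch below shows one minimal way to parse such a line. It is not how threadkeeper loads its `.env`; the function `parse_env_line` is hypothetical and ignores escapes, `export` prefixes, and multi-line values.

```python
from typing import Optional, Tuple


def parse_env_line(line: str) -> Optional[Tuple[str, str]]:
    """Parse one KEY=VALUE line; return None for blanks and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#") or "=" not in stripped:
        return None
    key, _, value = stripped.partition("=")
    value = value.strip()
    if value[:1] in ('"', "'"):
        # Quoted value: keep everything up to the matching closing quote.
        closing = value.find(value[0], 1)
        if closing != -1:
            return key.strip(), value[1:closing]
    # Unquoted value: drop a trailing inline comment introduced by " #".
    return key.strip(), value.split(" #", 1)[0].strip()
```

Passing the parsed label to a child process as a single argv element, rather than through a shell string, keeps the spaces and parentheses from being reinterpreted.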

threadkeeper-0.13.1/tests/test_verify_ingest.py ADDED

@@ -0,0 +1,162 @@
+"""Tests for the cross-CLI production verification harness (issue #1).
+The verdict logic is pure, so most of this exercises ``evaluate_coverage``
+and ``evaluate_verdict`` directly. One test drives the read-only SQL layer
+against an in-memory sqlite so the live-DB query path is covered without a
+real ~/.threadkeeper store.
+"""
+from __future__ import annotations
+import sqlite3
+from threadkeeper.verify_ingest import (
+    CANONICAL_SLOTS,
+    collect_live_signals,
+    evaluate_coverage,
+    evaluate_verdict,
+    format_report,
+    slot_for_source,
+)
+def test_slot_mapping_groups_gemini_and_antigravity():
+    # Gemini legacy and Antigravity (agy) both satisfy the single Google slot.
+    assert slot_for_source("gemini") == "google"
+    assert slot_for_source("antigravity") == "google"
+    assert slot_for_source("claude-code") == "claude-code"
+    assert slot_for_source("vscode") is None  # not a canonical slot
+def test_coverage_status_verified_thin_absent():
+    cov = evaluate_coverage(
+        {"claude-code": 200, "codex": 50, "copilot": 2, "gemini": 0},
+        thin_threshold=5,
+    )
+    assert cov["claude-code"]["status"] == "verified"
+    assert cov["codex"]["status"] == "verified"
+    assert cov["copilot"]["status"] == "thin"   # 2 rows, below threshold
+    assert cov["google"]["status"] == "absent"  # gemini=0, no antigravity rows
+    # every canonical slot is represented even when no source mapped to it
+    assert set(cov) == set(CANONICAL_SLOTS)
+def test_coverage_antigravity_fills_google_slot():
+    cov = evaluate_coverage({"antigravity": 42}, thin_threshold=5)
+    assert cov["google"]["status"] == "verified"
+    assert cov["google"]["sources"] == {"antigravity": 42}
+def test_verdict_pass_when_all_criteria_met():
+    rep = evaluate_verdict(
+        source_counts={
+            "claude-code": 100, "codex": 100, "copilot": 100, "antigravity": 100,
+        },
+        window_sources=["claude-code", "codex", "antigravity"],
+        shadow_passes=10,
+    )
+    assert rep["verdict"] == "PASS"
+    assert rep["criteria"]["all_sources_present"]["pass"] is True
+    assert rep["criteria"]["cross_adapter_window"]["pass"] is True
+    assert rep["criteria"]["learning_loop_non_claude"]["pass"] is True
+def test_verdict_partial_three_of_four_slots():
+    # This is the real shape on a dev box: claude/codex/copilot present,
+    # google slot empty, but cross-adapter window + non-claude loop confirmed.
+    rep = evaluate_verdict(
+        source_counts={"claude-code": 200000, "codex": 11000, "copilot": 10},
+        window_sources=["claude-code", "codex"],
+        shadow_passes=2567,
+    )
+    assert rep["verdict"] == "PARTIAL"
+    assert rep["criteria"]["all_sources_present"]["pass"] is False
+    assert rep["criteria"]["all_sources_present"]["verified_slots"] == [
+        "claude-code", "codex", "copilot",
+    ]
+    assert rep["criteria"]["cross_adapter_window"]["pass"] is True
+    assert rep["criteria"]["learning_loop_non_claude"]["pass"] is True
+    assert "codex" in rep["criteria"]["learning_loop_non_claude"]["sources"]
+def test_verdict_fail_single_adapter_only():
+    # Only Claude Code has data and the window — not a cross-CLI demonstration.
+    rep = evaluate_verdict(
+        source_counts={"claude-code": 5000},
+        window_sources=["claude-code"],
+        shadow_passes=100,
+    )
+    assert rep["verdict"] == "FAIL"
+    assert rep["criteria"]["cross_adapter_window"]["pass"] is False
+    assert rep["criteria"]["learning_loop_non_claude"]["pass"] is False
+def test_verdict_fail_when_loop_never_ran():
+    rep = evaluate_verdict(
+        source_counts={"claude-code": 100, "codex": 100},
+        window_sources=["claude-code", "codex"],
+        shadow_passes=0,  # learning loop has never fired
+    )
+    # cross-adapter window passes, but loop criterion fails and only 2 slots
+    # verified → PARTIAL (loop is one signal, window is the other).
+    assert rep["verdict"] == "PARTIAL"
+    assert rep["criteria"]["learning_loop_non_claude"]["pass"] is False
+def _seed_live_db(conn: sqlite3.Connection) -> None:
+    conn.execute(
+        "CREATE TABLE dialog_messages (source TEXT, created_at INTEGER)"
+    )
+    conn.execute("CREATE TABLE events (kind TEXT)")
+    rows = [
+        ("claude-code", 1_000_000),
+        ("claude-code", 1_000_500),
+        ("codex", 1_000_600),   # interleaved with claude in the window
+        ("copilot", 100),       # ancient — outside the recent window
+    ]
+    conn.executemany(
+        "INSERT INTO dialog_messages (source, created_at) VALUES (?, ?)", rows
+    )
+    conn.executemany(
+        "INSERT INTO events (kind) VALUES (?)",
+        [("shadow_review_pass",)] * 3 + [("ingest_pass",)],
+    )
+    conn.commit()
+def test_collect_live_signals_reads_window_and_passes():
+    conn = sqlite3.connect(":memory:")
+    conn.row_factory = sqlite3.Row
+    _seed_live_db(conn)
+    sig = collect_live_signals(conn, window_hours=24)
+    assert sig["source_counts"] == {
+        "claude-code": 2, "codex": 1, "copilot": 1,
+    }
+    # newest is 1_000_600; copilot@100 is far outside a 24h window of it.
+    assert set(sig["window_sources"]) == {"claude-code", "codex"}
+    assert sig["shadow_passes"] == 3
+    assert sig["newest_ts"] == 1_000_600
+def test_collect_live_signals_tolerates_missing_events_table():
+    conn = sqlite3.connect(":memory:")
+    conn.row_factory = sqlite3.Row
+    conn.execute("CREATE TABLE dialog_messages (source TEXT, created_at INTEGER)")
+    conn.execute("INSERT INTO dialog_messages VALUES ('codex', 5)")
+    conn.commit()
+    sig = collect_live_signals(conn)
+    assert sig["shadow_passes"] == 0  # no events table → graceful 0
+def test_format_report_renders_verdict_and_slots():
+    rep = evaluate_verdict(
+        source_counts={"claude-code": 200000, "codex": 11000, "copilot": 10},
+        window_sources=["claude-code", "codex"],
+        shadow_passes=2567,
+    )
+    rep["db_path"] = "/tmp/x.sqlite"
+    rep["signals"] = {"window_hours": 24}
+    text = format_report(rep)
+    assert "VERDICT: PARTIAL" in text
+    assert "claude-code" in text
+    assert "learning_loop_non_claude" in text
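
Note on the slot and coverage tests above: the behaviour they pin down (Gemini and Antigravity sharing one `google` slot, and the `verified` / `thin` / `absent` statuses) can be reproduced with a small stand-in. The code below is reconstructed from the test assertions only and is not the contents of `threadkeeper/verify_ingest.py`. The `rows` key, the default `thin_threshold=5`, and treating a count equal to the threshold as `verified` are assumptions.

```python
# Slot names as listed in the README hunk; the source-to-slot table follows the tests.
CANONICAL_SLOTS = ("claude-code", "codex", "copilot", "google")
_SOURCE_TO_SLOT = {
    "claude-code": "claude-code",
    "codex": "codex",
    "copilot": "copilot",
    "gemini": "google",       # legacy Gemini CLI adapter
    "antigravity": "google",  # its successor (agy)
}


def slot_for_source(source: str):
    """Return the canonical slot for a source, or None if it has no slot."""
    return _SOURCE_TO_SLOT.get(source)


def evaluate_coverage(source_counts: dict, thin_threshold: int = 5) -> dict:
    """Aggregate per-source row counts into per-slot coverage statuses."""
    coverage = {
        slot: {"rows": 0, "sources": {}, "status": "absent"}
        for slot in CANONICAL_SLOTS
    }
    for source, count in source_counts.items():
        slot = slot_for_source(source)
        if slot is None:
            continue  # e.g. "vscode" is not a canonical slot
        coverage[slot]["rows"] += count
        coverage[slot]["sources"][source] = count
    for entry in coverage.values():
        if entry["rows"] > 0:
            # ASSUMPTION: a count equal to the threshold counts as verified.
            entry["status"] = "verified" if entry["rows"] >= thin_threshold else "thin"
    return coverage
```

With the inputs from `test_coverage_status_verified_thin_absent`, this stand-in yields `thin` for `copilot` (2 rows) and `absent` for `google` (0 rows).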

{threadkeeper-0.12.0 → threadkeeper-0.13.1}/threadkeeper/assets/macos-agent-status/README.md RENAMED

@@ -5,7 +5,7 @@ The status-bar item itself is AppKit `NSStatusItem`; the popover content is
 SwiftUI. That lets the app update the menu-bar image directly instead of relying
 on SwiftUI `MenuBarExtra` label animation.
-It polls `tk-agent-status --json` every 5 seconds and shows:
+It polls `tk-agent-status --json` every 15 seconds and shows:
 - an icon-only menu-bar status item, with loop counts in the popover and
   tooltip,
@@ -20,10 +20,14 @@ It polls `tk-agent-status --json` every 5 seconds and shows:
 - active spawned-child RSS when a loop has a worker running,
 - a Clean memory button that runs `tk-agent-status --cleanup-memory`,
 - a Settings gear that opens a separate `~/.threadkeeper/.env` editor with
-  guided controls, raw text editing, three saved presets, and Save & Restart,
+  guided controls, exact dropdowns for spawn CLI/model choices, raw text
+  editing, three saved presets, and Save & Restart,
 - macOS notifications for newly completed autonomous child tasks that produced
   a useful result.
+Status polling and cleanup commands run in the background, so opening the
+popover does not wait for `tk-agent-status --json`.
 The first poll primes the seen-result list, so the app does not notify for old
 completed tasks that existed before it started.
@@ -35,9 +39,10 @@ keeps the menu-bar helper from becoming the memory-pressure offender.
 On macOS, `python -m threadkeeper.server` installs and launches this app
 automatically when the MCP server starts. The startup hook is idempotent: it
-rebuilds only when the installed app is missing or older than the source,
-registers the LaunchAgent, and restarts the app when a rebuild or stale running
-process means the menu-bar process is still using older code.
+rebuilds when the installed app is missing or its recorded source fingerprint no
+longer matches the bundled/source Swift files, registers the LaunchAgent, and
+restarts the app when a rebuild or stale running process means the menu-bar
+process is still using older code.
 Disable automatic startup with:
@@ -55,7 +60,9 @@ The Settings gear edits `~/.threadkeeper/.env` by default, or the path in
 `THREADKEEPER_ENV_FILE` when the app was launched with that override. Save &
 Restart writes the file, runs the safe cleanup command, and sends TERM to
 running `threadkeeper.server` processes so MCP hosts reconnect with the new
-environment.
+environment. In the spawn routing controls, `antigravity` is the stored CLI
+value and `agy` is only the executable alias; `gemini` remains available as the
+legacy Gemini CLI adapter.
 ## Build
