cctally 1.28.0 → 1.30.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (42)
  1. package/CHANGELOG.md +30 -0
  2. package/bin/_cctally_cache.py +147 -59
  3. package/bin/_cctally_core.py +22 -49
  4. package/bin/_cctally_dashboard.py +239 -152
  5. package/bin/_cctally_db.py +211 -31
  6. package/bin/_cctally_milestones.py +126 -166
  7. package/bin/_cctally_record.py +161 -192
  8. package/bin/_lib_alert_axes.py +7 -4
  9. package/bin/_lib_conversation.py +59 -8
  10. package/bin/_lib_conversation_query.py +306 -52
  11. package/bin/_lib_jsonl.py +69 -50
  12. package/bin/cctally +5 -5
  13. package/dashboard/static/assets/index-4OxMhN7N.js +53 -0
  14. package/dashboard/static/assets/index-DEDO-eqP.css +1 -0
  15. package/dashboard/static/assets/newsreader-latin-400-italic-CEihAR-f.woff2 +0 -0
  16. package/dashboard/static/assets/newsreader-latin-400-italic-CNZoH1hn.woff +0 -0
  17. package/dashboard/static/assets/newsreader-latin-400-normal-BFBkh4jY.woff2 +0 -0
  18. package/dashboard/static/assets/newsreader-latin-400-normal-gRTjlS2D.woff +0 -0
  19. package/dashboard/static/assets/newsreader-latin-500-normal-B66TYsaK.woff2 +0 -0
  20. package/dashboard/static/assets/newsreader-latin-500-normal-DFwuUcdu.woff +0 -0
  21. package/dashboard/static/assets/newsreader-latin-600-normal-30OJ_TG_.woff2 +0 -0
  22. package/dashboard/static/assets/newsreader-latin-600-normal-DUnT2r2g.woff +0 -0
  23. package/dashboard/static/assets/newsreader-latin-ext-400-italic-BMTE_bNQ.woff2 +0 -0
  24. package/dashboard/static/assets/newsreader-latin-ext-400-italic-qdgKLcPG.woff +0 -0
  25. package/dashboard/static/assets/newsreader-latin-ext-400-normal-DYA1XoQK.woff +0 -0
  26. package/dashboard/static/assets/newsreader-latin-ext-400-normal-svq1FPys.woff2 +0 -0
  27. package/dashboard/static/assets/newsreader-latin-ext-500-normal-BNHmvKvI.woff2 +0 -0
  28. package/dashboard/static/assets/newsreader-latin-ext-500-normal-CZruMFou.woff +0 -0
  29. package/dashboard/static/assets/newsreader-latin-ext-600-normal-BXv5iMHi.woff2 +0 -0
  30. package/dashboard/static/assets/newsreader-latin-ext-600-normal-BrbfzHZ5.woff +0 -0
  31. package/dashboard/static/assets/newsreader-vietnamese-400-italic-QbB8kb5s.woff +0 -0
  32. package/dashboard/static/assets/newsreader-vietnamese-400-italic-bZegYFuM.woff2 +0 -0
  33. package/dashboard/static/assets/newsreader-vietnamese-400-normal-BekUZro8.woff +0 -0
  34. package/dashboard/static/assets/newsreader-vietnamese-400-normal-DdKr49mV.woff2 +0 -0
  35. package/dashboard/static/assets/newsreader-vietnamese-500-normal-BEAbKU8A.woff +0 -0
  36. package/dashboard/static/assets/newsreader-vietnamese-500-normal-CL6a8tp2.woff2 +0 -0
  37. package/dashboard/static/assets/newsreader-vietnamese-600-normal-CVAR0otO.woff +0 -0
  38. package/dashboard/static/assets/newsreader-vietnamese-600-normal-CaH84vfx.woff2 +0 -0
  39. package/dashboard/static/dashboard.html +2 -2
  40. package/package.json +1 -1
  41. package/dashboard/static/assets/index-Bj5ckRUE.css +0 -1
  42. package/dashboard/static/assets/index-Dw4G5FD9.js +0 -18
package/CHANGELOG.md CHANGED
@@ -5,6 +5,36 @@ based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
  ## [Unreleased]
 
+ ## [1.30.0] - 2026-06-09
+
+ ### Added
+ - **Keyboard navigation in the dashboard conversation reader.** With a conversation open, `j` / `k` move a focused-turn cursor between turns (scrolling each into view and auto-loading the next page when you reach the end), `[` / `]` collapse-all / expand-all the disclosure sections in the thread, and `g` jumps back to the top. The bindings are listed in the dashboard help overlay (`?`) under a new "Conversations" group and are inert while a modal or the rail search/filter input is active.
+ - **Syntax-highlighted code blocks and copy buttons in the conversation reader.** Fenced code in transcripts is now syntax-highlighted with a language label, and one-click copy buttons appear on code blocks, tool output, and message text.
+ - **Derived conversation titles.** Each conversation now shows a short derived title in the sidebar rail, the reader header, and full-text search results — and cross-session search now matches on that title too; the rail also groups conversations under date dividers.
+ - Jump-to-message now expands the owning collapsed subagent thread when the target message lands inside one, instead of silently scrolling to a hidden turn (#160).
+
+ ### Changed
+ - **The dashboard conversation viewer (introduced in 1.29.0) has been redesigned end-to-end.** Transcripts now render as serif prose (self-hosted Newsreader) on a readable ~68-character measure with higher-contrast text, laid out along a timeline spine with role-differentiated turns; assistant turns are walked in document order so each tool call renders paired with its result as an inline tool-I/O chip (chevron + preview) and stray orphan tool-result runs are collapsed; subagent sidechains render as weighted thread cards; system-command messages fold into an expandable pill; and inline-SVG icons replace the previous emoji throughout, with disclosure sections opening on a smooth animation and turns staggering in on first load behind a refined jump-flash (#161, #162, #164, #165, #168).
+ - Internal (no action needed on upgrade): a new cache migration (`003`) re-ingests existing conversation history id-aware so older transcripts pick up the tool-call linkage ids the paired tool-I/O view needs. The cache is fully re-derivable and the migration runs automatically on the next DB open.
+
+ ## [1.29.0] - 2026-06-08
+
+ ### Added
+ - **Dashboard conversation viewer.** New full-screen Conversations workspace in `cctally dashboard`: a cost-aware transcript reader (rendered markdown, per-turn cost, collapsible thinking/tool/sidechain detail) plus cross-session full-text search that jumps to the highlighted message. Loopback-only by default; LAN needs `dashboard.expose_transcripts`. Backend shipped earlier; this adds the front end.
+
+ ### Fixed
+ - **The local web dashboard server now tears down on a single SIGINT/SIGTERM unconditionally, closing a rare lost-wakeup race in its shutdown path.** `cctally dashboard` blocked its main thread on a `threading.Event.wait()` woken only by the SIGINT/SIGTERM handler calling `stop.set()` — and CPython can lose a single signal that races the entry into `Event.wait()` (the Python-level handler never runs, or `set()`'s `notify_all()` fires before the waiter registers), so ~0.04–0.07% of single signals failed to wake the loop and recovery needed a second signal (a timed-poll does *not* fix it: on the miss the flag is never set, so it polls forever). The wait now uses a self-pipe wakeup fd (`signal.set_wakeup_fd` + a `select` on the read end): CPython's C-level signal trampoline writes the signum to the pipe on every delivery — before and independent of the Python-level handler running — so the first signal always unblocks the loop. This was already mitigated in practice (interactive Ctrl-C sends more than one signal, a process manager escalates to SIGKILL, and the #153 harness fix already bounded test-server teardown), so there is no behavior change to the banner, browser-open, or clean-shutdown print paths and nothing to do on upgrade (#154).
+ - **`cctally db recover --db stats` no longer resets a recovered DB's schema `user_version` to 0 when a known migration is recorded only under its legacy unprefixed marker name.** When healing a version-ahead `stats.db`, the all-known-applied check that decides whether to fast-path straight to the known schema head compared each migration's canonical `NNN_`-prefixed name against the recorded markers without normalizing the three pre-framework legacy aliases (`five_hour_block_models_backfill_v1`, `five_hour_block_projects_backfill_v1`, `merge_5h_block_duplicates_v1`) to their canonical names, so a DB whose markers predate the framework rename was misread as having a missing migration and reverted to `user_version = 0`, forcing an unnecessary full migration re-walk on the next open instead of reconciling directly to head. The recover path now normalizes legacy aliases before the membership test (matching the alias-aware read in `db status`), so such a DB reconciles straight to the known head; cache.db (which has no legacy markers) is unaffected (#148).
+ - **Internal (test infra, no user-facing change): the golden-file test suite no longer hangs indefinitely when a backgrounded dashboard test server drops a single SIGTERM.** The server-spawning harnesses (`dashboard`, `conversation`, `settings-api`) tore their `cctally dashboard` test servers down with an unbounded `kill "$pid"; wait "$pid"` — and CPython can lose a single SIGTERM that races the server's main-thread `threading.Event.wait()` (woken only by its signal handler's `stop.set()`; ~0.04–0.07% of single signals are dropped and recovery needs a second signal), so on the rare miss the `wait` blocked forever and wedged the whole suite (observed once under #153 as a 30+ minute hang on a non-TTY/piped run; the foreground suite always completed `1395/0`). Teardown now routes through a shared `bin/_lib-kill-server.sh` helper that escalates SIGTERM → bounded grace poll → uncatchable SIGKILL → reap (a wedge emits a non-fatal WARN rather than hanging or failing), guaranteeing teardown regardless of the server's signal-handling state; a new `bin/cctally-kill-server-test` harness pins the behavior (#153).
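The #148 fix hinges on normalizing legacy marker names before the "every known migration is applied" membership test. A minimal Python sketch of that idea follows. Only the three legacy names come from the entry above; the canonical `NNN_`-prefixed targets and both function names are hypothetical placeholders, not cctally's actual code.

```python
from typing import Iterable

# Legacy (pre-framework) marker names from the changelog entry. The canonical
# "NNN_"-prefixed names they map to are HYPOTHETICAL placeholders.
LEGACY_ALIASES = {
    "five_hour_block_models_backfill_v1": "001_five_hour_block_models_backfill_v1",
    "five_hour_block_projects_backfill_v1": "002_five_hour_block_projects_backfill_v1",
    "merge_5h_block_duplicates_v1": "003_merge_5h_block_duplicates_v1",
}


def normalize_markers(recorded: Iterable[str]) -> set[str]:
    """Map every recorded marker to its canonical name (unknown names pass through)."""
    return {LEGACY_ALIASES.get(name, name) for name in recorded}


def all_known_applied(known: Iterable[str], recorded: Iterable[str]) -> bool:
    """True when every known migration is recorded under either name form."""
    applied = normalize_markers(recorded)
    return all(name in applied for name in known)
```

If `recorded` holds only the legacy spellings, a raw `name in set(recorded)` test reports the three canonical names as missing, while `all_known_applied` returns `True`. That false "missing migration" reading is the misclassification the entry describes.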
+
+ ### Changed
+ - Conversation viewer: subagent (sidechain) threads are now grouped by their originating agent file so parallel subagents render as separate collapsible threads (with task-prompt label, message count, and thread cost) instead of being fused by adjacency; threads nest under a parent message where a real cross-file link exists. Reader items expose a privacy-safe subagent key + parent link (never a raw filesystem path).
+ - **Internal performance (no user-facing change): the conversation-viewer search endpoint now dedups, pages, and counts entirely in SQL instead of materializing every match in Python.** `/api/conversation/search` (and the `_lib_conversation_query` FTS/LIKE kernels behind it) previously ran an unbounded `SELECT` that built a hit object — and, for FTS, a `snippet()` string; for LIKE, the full row `text` — for *every* corpus match, then deduped by `(session_id, uuid)` and sliced one page in Python, so latency and memory scaled with the number of matches rather than the page size. The match set is now deduped via a SQL window function (`ROW_NUMBER() OVER (PARTITION BY session_id, uuid …)`, keeping the same first-occurrence row as before), paged with `LIMIT/OFFSET`, and the exact post-dedup `total` is a separate `COUNT(*)` over `SELECT DISTINCT session_id, uuid`; snippet/text generation is deferred to a second query covering only the page's rowids — so Python never holds more than one page of hits/snippets regardless of corpus match count. The JSON response (`{query, mode, hits, total}`, deduped by `(session_id, uuid)`, cost-once) is byte-identical (the conversation-query unit suite and the `bin/cctally-conversation-test` search goldens are unchanged), so there is nothing to do on upgrade (#149).
+ - **Internal refactor (no user-facing change): the two per-vendor budget-milestone tables are now one vendor-tagged `budget_milestones` table.** The structurally-identical Claude and Codex budget-milestone tables (`budget_milestones` keyed on `week_start_at`, `codex_budget_milestones` keyed on `period_start_at`) are merged by a new stats migration `012_unify_budget_milestones_vendor` into a single `budget_milestones` with a `vendor` column (`'claude'`/`'codex'`), the renamed `period_start_at` key, and `UNIQUE(vendor, period_start_at, period, threshold)` — history, `alerted_at`, and `period` are preserved verbatim and the migration is idempotent / partial-state-safe (the Codex table is dropped). The `budget` and `codex_budget` desktop-alert axes stay two distinct axes but now share the one table (filtered `WHERE vendor=?`), and the parallel insert / dashboard envelope / reconcile-on-set / firing code collapses to a single vendor-parameterized path with the two `maybe_record_*` entry points kept as thin vendor adapters. Also folds in a dashboard fix: the Settings `POST /api/settings` budget-reconcile trigger now fires on a changed `period` (parity with the CLI `budget set --period` path). Alert ids, dashboard envelope bytes, and notification text are unchanged (no frontend bundle change), so there is nothing to do on upgrade — the merge runs automatically on the next DB open (#143).
+ - **The `0700` data-dir hardening now also covers a stats-first cold start.** The owner-only data-dir permission shipped in 1.28.0 was applied when `cache.db` was opened, but a cold start that opened `stats.db` first (e.g. `record-usage` on a fresh machine) materialized the directory at the default umask and left it that way until the next `cache.db` open. The `0700` chmod now lives in the shared `ensure_dirs()` primitive (best-effort, swallowing `OSError`) that every `stats.db` open runs through, with the `cache.db` open keeping its own chmod as a backstop — so the data dir is owner-only regardless of which database is touched first. Posture-only; no action needed on upgrade (#150).
+ - **Internal refactor (no user-facing change): the three local-dashboard conversation GET handlers (`/api/conversations`, `/api/conversation/<id>`, `/api/conversation/search`) now share one `_run_conversation_query` scaffold for the open-cache → run-query → close → 500-envelope lifecycle (previously triplicated), and the single-value query-string string parse routes through a new `_qs_str` helper (the string sibling of the existing `_qs_int`).** Status codes, JSON bodies, the `cache unavailable:` / `<type>: <msg>` 500 envelopes, and the reader's 404 are byte-identical — the conversation-endpoint, conversation-query, and dashboard golden suites are unchanged (a new test also pins the cache-open-failure 500 across all three routes). Purely a maintainability / de-duplication change; nothing to do on upgrade (#151).
+ - **Internal performance (no user-facing change): the cache sync now parses each changed session JSONL file once per sync instead of twice, and a `cache-sync --rebuild` / truncation re-ingest clears the conversation full-text search index without the per-row delete-trigger storm.** Cost rows and conversation message rows are now produced from a single fused pass over each changed file (previously the cost walk and the conversation walk each re-read and re-parsed the same byte range), and the rebuild/truncation full-clear drops the FTS sync triggers, truncates, then resets the index with one `'delete-all'` instead of firing an FTS shadow-write per row inside the held cache lock — on a large index (≈850k rows) the full-clear dropped from ~8.5s to ~0.3s of held-lock time. Output is byte-identical (cost totals, conversation rows, and the search index are unchanged; the reconcile and conversation-ingest suites stay green), so there is nothing to do on upgrade (#138).
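The #149 entry describes four steps. First, deduplicate in SQL with `ROW_NUMBER()`. Second, page with `LIMIT/OFFSET`. Third, count with a separate `COUNT(*)` over `SELECT DISTINCT`. Fourth, build display text only for the page's rows. The sketch below reproduces that shape against a simplified, hypothetical `messages` table using a `LIKE` match. The real kernels also have an FTS path that builds `snippet()` text, and `substr` stands in for that here. The table, its columns, and the "lowest id is the first occurrence" rule are assumptions. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical schema: messages(id, session_id, uuid, text).
PAGE_SQL = """
SELECT id, session_id, uuid FROM (
    SELECT id, session_id, uuid,
           ROW_NUMBER() OVER (PARTITION BY session_id, uuid ORDER BY id) AS rn
    FROM messages
    WHERE text LIKE ?
)
WHERE rn = 1          -- keep only the first occurrence of each (session_id, uuid)
ORDER BY id
LIMIT ? OFFSET ?
"""

TOTAL_SQL = """
SELECT COUNT(*) FROM (
    SELECT DISTINCT session_id, uuid FROM messages WHERE text LIKE ?
)
"""


def search_page(conn: sqlite3.Connection, pattern: str, limit: int, offset: int):
    """Return (hits, total) without materializing more than one page of rows."""
    page = conn.execute(PAGE_SQL, (pattern, limit, offset)).fetchall()
    (total,) = conn.execute(TOTAL_SQL, (pattern,)).fetchone()
    ids = [row_id for row_id, _, _ in page]
    previews = {}
    if ids:
        # Second query: build display text only for this page's row ids.
        marks = ",".join("?" * len(ids))
        previews = dict(conn.execute(
            f"SELECT id, substr(text, 1, 40) FROM messages WHERE id IN ({marks})",
            ids,
        ).fetchall())
    hits = [
        {"session_id": sid, "uuid": uuid, "preview": previews[row_id]}
        for row_id, sid, uuid in page
    ]
    return hits, total
```

`total` comes from its own `COUNT(*)`, so it stays exact after deduplication even though the page query stops after `limit` rows.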
+
  ## [1.28.0] - 2026-06-06
 
  ### Added
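The #154 entry in the 1.29.0 "Fixed" list replaces an `Event.wait()`, previously woken only by a Python-level handler, with a self-pipe registered through `signal.set_wakeup_fd` and watched with `select`. Below is a minimal, POSIX-only sketch of that pattern. It is not the dashboard's actual shutdown code, and `wait_for_shutdown_signal` is a hypothetical name. It must run on the main thread, because both `signal.signal` and `signal.set_wakeup_fd` require that.

```python
import os
import select
import signal


def wait_for_shutdown_signal(signals=(signal.SIGINT, signal.SIGTERM)) -> int:
    """Block until one of ``signals`` arrives and return its number (POSIX only).

    Hypothetical helper illustrating the self-pipe technique.
    """
    read_fd, write_fd = os.pipe()
    # set_wakeup_fd requires a non-blocking fd. The read end is only drained
    # after select() reports it readable, so non-blocking is safe there too.
    os.set_blocking(read_fd, False)
    os.set_blocking(write_fd, False)
    # Install a Python-level handler so CPython's C-level handler is active for
    # these signals (SIGTERM's default action would otherwise kill the process).
    previous = {sig: signal.signal(sig, lambda signum, frame: None) for sig in signals}
    old_fd = signal.set_wakeup_fd(write_fd)
    try:
        while True:
            ready, _, _ = select.select([read_fd], [], [])
            if read_fd in ready:
                data = os.read(read_fd, 64)
                if data:
                    # CPython writes each delivered signal number as one byte,
                    # before (and independent of) the Python handler running.
                    return data[0]
    finally:
        signal.set_wakeup_fd(old_fd)
        for sig, handler in previous.items():
            signal.signal(sig, handler)
        os.close(read_fd)
        os.close(write_fd)
```

The C-level handler writes the byte on every delivery. A signal that arrives just before `select()` is entered therefore still leaves the pipe readable, and the first signal always unblocks the wait. This is the property the entry contrasts with polling an `Event` on a timer.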
package/bin/_cctally_cache.py CHANGED
@@ -168,13 +168,15 @@ _parse_usage_entries = _lib_jsonl._parse_usage_entries
  _should_replace = _lib_jsonl._should_replace
 
  # Conversation-message parser kernel (Plan 1). Pure leaf (stdlib-only), so
- # it loads at module-load time alongside _lib_jsonl. ``sync_cache``'s second
- # seek-and-walk and the backfill walker both call ``_iter_message_rows``.
+ # it loads at module-load time alongside _lib_jsonl. Since #138 the per-file
+ # sync ingest goes through the fused ``_iter_sync_entries`` walker (which calls
+ # ``_lib_conversation.parse_message_row`` directly); ``_iter_message_rows`` is
+ # now used only by ``backfill_conversation_messages``.
  _lib_conversation = _load_lib("_lib_conversation")
  _iter_message_rows = _lib_conversation.iter_message_rows
 
- # Shared by sync_cache's second seek-and-walk AND backfill_conversation_messages
- # so the column list, placeholders, and tuple order live in ONE place — a column
+ # Shared by the fused per-file walk AND backfill_conversation_messages so the
+ # column list, placeholders, and tuple order live in ONE place — a column
  # add/reorder can't silently desync the two ingest paths (which would land
  # values in the wrong columns on whichever path was missed).
  _CONV_INSERT_SQL = (
@@ -195,6 +197,52 @@ def _conv_row_tuple(m, path_str):
      )
 
 
+ def _iter_sync_entries(fh, path_str):
+     """Fused single-pass sync walker (#138). Yields
+     ``(byte_offset, cost_or_None, msgrow_or_None)`` for each JSONL line from
+     ``fh``'s current position that produces a cost entry and/or a conversation
+     message row.
+
+     Each line is read once (readline()+tell()) and ``json.loads``-parsed ONCE,
+     then classified by both pure per-line parsers:
+
+     * ``cost_or_None`` is ``(UsageEntry, msg_id, req_id)`` when the line is a
+       billable assistant entry (``_lib_jsonl.parse_cost_entry``), else None.
+     * ``msgrow_or_None`` is a ``MessageRow`` when the line is a user/assistant
+       turn carrying a uuid (``_lib_conversation.parse_message_row``), else None.
+
+     The two are independent — a normal assistant line yields both. This replaces
+     the former cost walk + re-seek-and-walk over the identical byte span: with a
+     single walk the "identical span" invariant is structural (one stop point),
+     not a prose-enforced ``mrow.byte_offset >= final_offset`` runtime break. A
+     partial mid-write tail line (no trailing newline) rewinds the handle and
+     stops, so ``fh.tell()`` after the loop is the cost cursor's ``final_offset``
+     and the next sync re-reads the line once the newline lands.
+     """
+     while True:
+         offset = fh.tell()
+         line = fh.readline()
+         if not line:
+             return
+         if not line.endswith("\n"):
+             # Partial tail line — writer is mid-flight. Rewind so the next sync
+             # re-reads this line once the newline is in place (and so fh.tell()
+             # reports the cost cursor's stop, never past the partial).
+             fh.seek(offset)
+             return
+         stripped = line.strip()
+         if not stripped:
+             continue
+         try:
+             obj = json.loads(stripped)
+         except json.JSONDecodeError:
+             continue
+         cost = _lib_jsonl.parse_cost_entry(obj, path_str)
+         mrow = _lib_conversation.parse_message_row(obj, offset)
+         if cost is not None or mrow is not None:
+             yield offset, cost, mrow
+
+
  def _iter_claude_jsonl_files():
      """Yield every Claude transcript ``*.jsonl`` under each data dir's
      ``projects/`` tree. Shared by ``sync_cache`` and the conversation backfill
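`_iter_sync_entries` reads each line once, classifies it with two independent parsers, and rewinds over a partial tail line so the cursor never passes it. The self-contained copy of the loop below exercises that contract in isolation. `parse_cost` and `parse_message` are hypothetical, simplified stand-ins, not the real `_lib_jsonl.parse_cost_entry` / `_lib_conversation.parse_message_row`.

```python
import json


def parse_cost(obj):
    """Hypothetical stand-in cost classifier: assistant lines with a usage block."""
    if obj.get("type") == "assistant" and "usage" in obj:
        return obj["usage"]
    return None


def parse_message(obj, offset):
    """Hypothetical stand-in message classifier: user/assistant lines with a uuid."""
    if obj.get("type") in ("user", "assistant") and obj.get("uuid"):
        return (offset, obj["uuid"])
    return None


def iter_fused(fh):
    """Yield (offset, cost, message) from one parse per complete line."""
    while True:
        offset = fh.tell()
        line = fh.readline()
        if not line:
            return
        if not line.endswith("\n"):
            fh.seek(offset)  # partial tail line: leave it for the next pass
            return
        stripped = line.strip()
        if not stripped:
            continue
        try:
            obj = json.loads(stripped)
        except json.JSONDecodeError:
            continue  # malformed lines are skipped, as in the original loop
        cost, message = parse_cost(obj), parse_message(obj, offset)
        if cost is not None or message is not None:
            yield offset, cost, message
```

Driven by an `io.StringIO` whose last line has no newline yet, the first pass stops with `fh.tell()` at the start of that line. After the writer finishes the line, a second pass started from the saved cursor yields it exactly once.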
@@ -208,6 +256,10 @@ _cctally_db_sib = _load_lib("_cctally_db")
  add_column_if_missing = _cctally_db_sib.add_column_if_missing
  _run_pending_migrations = _cctally_db_sib._run_pending_migrations
  _CACHE_MIGRATIONS = _cctally_db_sib._CACHE_MIGRATIONS
+ # Storm-free conversation_messages + FTS full-clear (#138). Owns the trigger
+ # drop/recreate dance so the per-row delete trigger never fires O(rows) under
+ # the held lock on a rebuild / truncation escalation.
+ clear_conversation_messages = _cctally_db_sib.clear_conversation_messages
 
 
  # === BEGIN MOVED REGIONS ===
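`clear_conversation_messages` is imported from `_cctally_db` and is described as dropping the FTS sync triggers, truncating, and resetting the index with a single `'delete-all'`. Its body is not part of this diff, so the following is only a sketch of the technique under an assumed schema: an external-content FTS5 table `conversation_fts` over a simplified `conversation_messages(id, text)`, kept in sync by triggers named `conv_fts_ai` (hypothetical) and `conv_fts_ad`. It needs an SQLite build with FTS5, which most CPython distributions include.

```python
import sqlite3

# Hypothetical, simplified schema for illustration.
SCHEMA = """
CREATE TABLE conversation_messages (id INTEGER PRIMARY KEY, text TEXT);
CREATE VIRTUAL TABLE conversation_fts USING fts5(
    text, content='conversation_messages', content_rowid='id'
);
"""

# Sync triggers as separate statements so they can be re-created inside an
# open transaction (executescript() would COMMIT first).
TRIGGERS = (
    """CREATE TRIGGER conv_fts_ai AFTER INSERT ON conversation_messages BEGIN
           INSERT INTO conversation_fts(rowid, text) VALUES (new.id, new.text);
       END""",
    """CREATE TRIGGER conv_fts_ad AFTER DELETE ON conversation_messages BEGIN
           INSERT INTO conversation_fts(conversation_fts, rowid, text)
           VALUES ('delete', old.id, old.text);
       END""",
)


def clear_messages(conn: sqlite3.Connection) -> None:
    """Empty the table and its FTS index without firing conv_fts_ad per row."""
    conn.execute("BEGIN")  # the sqlite3 module only auto-begins before DML
    with conn:  # commit on success, roll back on any exception
        conn.execute("DROP TRIGGER IF EXISTS conv_fts_ai")
        conn.execute("DROP TRIGGER IF EXISTS conv_fts_ad")
        # With no triggers on the table, a bare DELETE can use SQLite's
        # truncate optimization instead of visiting each row.
        conn.execute("DELETE FROM conversation_messages")
        # One FTS5 command resets the whole index instead of O(rows) deletes.
        conn.execute("INSERT INTO conversation_fts(conversation_fts) VALUES ('delete-all')")
        for ddl in TRIGGERS:
            conn.execute(ddl)
```

The explicit `BEGIN` assumes no transaction is already open on the connection. The real helper runs inside `sync_cache`, which commits separately, so its transaction handling may differ.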
@@ -542,21 +594,30 @@ def sync_cache(
          # Plan 1: conversation_messages shares the cost path's lifecycle.
          # A rebuild re-derives the whole cache from on-disk JSONL, so the
          # message index is wiped here (inside the held lock) and the
-         # per-file second seek-and-walk repopulates it. The FTS delete
-         # trigger empties conversation_fts row-by-row in lockstep.
-         conn.execute("DELETE FROM conversation_messages")
+         # per-file fused walk repopulates it. clear_conversation_messages
+         # drops the FTS triggers, truncates, and clears the index via
+         # 'delete-all' so the per-row delete trigger never storms O(rows)
+         # under the lock (#138) — NOT a bare DELETE that fires conv_fts_ad
+         # per row.
+         clear_conversation_messages(conn)
          # Clear the walk-complete sentinel atomically with the wipe
          # (cctally-dev#93, D5/D2): a stale "complete" marker must never
          # survive a destructive rebuild. The end-of-loop write below
          # re-establishes it only after this rebuild's clean walk.
          conn.execute("DELETE FROM cache_meta WHERE key='claude_ingest_walk_complete'")
          # Issue #139: a rebuild walks every file from offset 0, so the
-         # per-file second seek-and-walk below repopulates the whole message
+         # per-file fused walk below repopulates the whole message
          # index — that satisfies any deferred existing-install backfill.
          # Drop the pending flag here so the post-rebuild sync does not also
          # run a redundant (idempotent but wasteful) offset-0 backfill pass.
          conn.execute(
              "DELETE FROM cache_meta WHERE key='conversation_backfill_pending'")
+         # Issue #164: a rebuild also clears + repopulates the message index
+         # id-aware via the normal offset-0 walk, so drop the 003 reingest
+         # flag too — the post-rebuild sync must not run a redundant
+         # (idempotent but wasteful) clear+backfill pass.
+         conn.execute(
+             "DELETE FROM cache_meta WHERE key='conversation_reingest_pending'")
          conn.commit()
          eprint("[cache-sync] rebuild: cleared Claude cached entries")
 
@@ -592,6 +653,36 @@ def sync_cache(
          )
          conn.commit()
 
+         # Issue #164: consume the deferred conversation_messages re-ingest.
+         # Cache migration 003 is flag-only — it sets
+         # ``conversation_reingest_pending`` rather than clearing inline
+         # (clearing in the handler would run WITHOUT this flock, racing a
+         # concurrent sync, and would empty the reader on stats-only /
+         # eager-migration opens or ``dashboard --no-sync``). The destructive
+         # clear + id-aware offset-0 re-derive live here, UNDER the held
+         # flock. Distinct from 002's backfill-without-clear: 003 is
+         # clear-then-backfill, re-deriving the WHOLE index id-aware so
+         # existing history pairs tool_use<->tool_result. The clear is
+         # storm-free (#138); the offset-0 backfill walks every JSONL from 0;
+         # the flag is dropped LAST so a crash mid-walk re-runs cleanly on the
+         # next sync. Never on the rebuild path (which already wipes +
+         # repopulates the index id-aware via the normal walk).
+         try:
+             _reingest = conn.execute(
+                 "SELECT 1 FROM cache_meta "
+                 "WHERE key='conversation_reingest_pending'"
+             ).fetchone() is not None
+         except sqlite3.OperationalError:
+             _reingest = False
+         if _reingest:
+             clear_conversation_messages(conn)
+             backfill_conversation_messages(conn)
+             conn.execute(
+                 "DELETE FROM cache_meta "
+                 "WHERE key='conversation_reingest_pending'"
+             )
+             conn.commit()
+
          paths: list[pathlib.Path] = list(_iter_claude_jsonl_files())
          stats.files_total = len(paths)
 
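The #164 block splits the work. Migration 003 only sets `conversation_reingest_pending`, and the sync later consumes that flag under the held lock, dropping it last. The sketch below shows this flag-then-consume pattern. `mark_reingest_pending`, `consume_reingest`, and the injected `clear` / `backfill` callables are hypothetical, and a plain `sqlite3` connection stands in for the real flock-guarded cache.

```python
import sqlite3
from typing import Callable

FLAG = "conversation_reingest_pending"
Step = Callable[[sqlite3.Connection], None]


def mark_reingest_pending(conn: sqlite3.Connection) -> None:
    """Migration side (hypothetical): record intent only, never clear data here."""
    conn.execute("INSERT OR IGNORE INTO cache_meta(key, value) VALUES (?, '1')", (FLAG,))
    conn.commit()


def consume_reingest(conn: sqlite3.Connection, clear: Step, backfill: Step) -> bool:
    """Sync side (run under the lock): clear, backfill, then drop the flag LAST."""
    try:
        pending = conn.execute(
            "SELECT 1 FROM cache_meta WHERE key = ?", (FLAG,)
        ).fetchone() is not None
    except sqlite3.OperationalError:  # e.g. cache_meta does not exist yet
        pending = False
    if not pending:
        return False
    clear(conn)
    backfill(conn)
    # Only reached if both steps succeeded. A crash above leaves the flag set,
    # so the next sync simply re-runs the idempotent clear + backfill.
    conn.execute("DELETE FROM cache_meta WHERE key = ?", (FLAG,))
    conn.commit()
    return True
```

Keeping the destructive step out of the migration means an open that never syncs (the stats-only and `--no-sync` cases the comment lists) leaves the reader's data untouched until a locked sync can redo it.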
@@ -693,10 +784,11 @@ def sync_cache(
              conn.execute("DELETE FROM session_entries")
              # Plan 1: truncation escalates to a full re-ingest of EVERY file,
              # so conversation_messages is wiped here (parallel to the
-             # session_entries full-reset) and the per-file second seek-and-walk
-             # repopulates it from offset 0. Mirrors the cost path's lifecycle;
-             # the FTS delete trigger empties conversation_fts in lockstep.
-             conn.execute("DELETE FROM conversation_messages")
+             # session_entries full-reset) and the per-file fused walk
+             # repopulates it from offset 0. Storm-free clear (#138): drop FTS
+             # triggers truncate → 'delete-all' recreate, so conv_fts_ad
+             # never fires O(rows) inside the held lock.
+             clear_conversation_messages(conn)
              # Clear the walk-complete sentinel atomically with the truncation
              # full-reset (cctally-dev#93, D5/D2): the cache is being wiped, so
              # any "complete" marker is now stale. The end-of-loop write below
@@ -772,54 +864,50 @@ def sync_cache(
              try:
                  with open(jp, "r", encoding="utf-8", errors="replace") as fh:
                      fh.seek(start_offset)
-                     for offset, entry, msg_id, req_id in _iter_jsonl_entries_with_offsets(fh, str(jp)):
-                         usage = entry.usage
-                         inp = int(usage.get("input_tokens", 0) or 0)
-                         out = int(usage.get("output_tokens", 0) or 0)
-                         cc = int(usage.get("cache_creation_input_tokens", 0) or 0)
-                         cr = int(usage.get("cache_read_input_tokens", 0) or 0)
-                         extras = {
-                             k: v for k, v in usage.items()
-                             if k not in (
-                                 "input_tokens", "output_tokens",
-                                 "cache_creation_input_tokens",
-                                 "cache_read_input_tokens",
-                             )
-                         }
-                         rows.append((
-                             path_str,
-                             offset,
-                             entry.timestamp.astimezone(dt.timezone.utc).isoformat(),
-                             entry.model,
-                             msg_id,
-                             req_id,
-                             inp, out, cc, cr,
-                             json.dumps(extras, sort_keys=True) if extras else None,
-                             entry.cost_usd,
-                         ))
-                     # ``final_offset`` is the cost walk's stop. Capture it into a
-                     # local int BEFORE the conversation walk below re-seeks the
-                     # handle — the value is what session_files.last_byte_offset
-                     # is written from, so it must reflect the COST walk's
-                     # position, never the conversation walk's. (#Plan1 Task 4
-                     # cursor-consistency invariant.)
+                     # Fused single-pass walk (#138): cost rows AND conversation
+                     # message rows come from ONE parse of each line. An assistant
+                     # line yields both; a user line yields only a message row.
+                     # This replaces the former cost walk + re-seek conversation
+                     # walk over the identical span — the "identical span"
+                     # invariant is now structural (a single stop point) rather
+                     # than a prose-enforced ``>= final_offset`` runtime break.
+                     for offset, cost, mrow in _iter_sync_entries(fh, path_str):
+                         if cost is not None:
+                             entry, msg_id, req_id = cost
+                             usage = entry.usage
+                             inp = int(usage.get("input_tokens", 0) or 0)
+                             out = int(usage.get("output_tokens", 0) or 0)
+                             cc = int(usage.get("cache_creation_input_tokens", 0) or 0)
+                             cr = int(usage.get("cache_read_input_tokens", 0) or 0)
+                             extras = {
+                                 k: v for k, v in usage.items()
+                                 if k not in (
+                                     "input_tokens", "output_tokens",
+                                     "cache_creation_input_tokens",
+                                     "cache_read_input_tokens",
+                                 )
+                             }
+                             rows.append((
+                                 path_str,
+                                 offset,
+                                 entry.timestamp.astimezone(dt.timezone.utc).isoformat(),
+                                 entry.model,
+                                 msg_id,
+                                 req_id,
+                                 inp, out, cc, cr,
+                                 json.dumps(extras, sort_keys=True) if extras else None,
+                                 entry.cost_usd,
+                             ))
+                         if mrow is not None:
+                             conv_rows.append(_conv_row_tuple(mrow, path_str))
+                     # ``final_offset`` is the single walk's stop — captured AFTER
+                     # the loop drains (or rewinds a partial mid-write tail line).
+                     # It is what session_files.last_byte_offset is written from,
+                     # so it must reflect the cost cursor's position; with the
+                     # fused walk there is exactly one stop point shared by the
+                     # cost and conversation rows (#138 / #Plan1 Task 4
+                     # cursor-consistency invariant).
                      final_offset = fh.tell()
-                     # --- conversation message ingest (Plan 1) ----------------
-                     # Second seek-and-walk over the SAME
-                     # [start_offset, final_offset] byte region as the
-                     # reconcile-guarded cost walk above, BEFORE the per-file
-                     # cursor advances. Independent of the cost walk: it touches
-                     # only conversation_messages, never session_entries or the
-                     # cost-row build. We re-seek to start_offset and stop the
-                     # moment a message row's byte_offset reaches the cost walk's
-                     # final_offset, so the two walks always cover the identical
-                     # span and a partial mid-write tail line (rewound by the
-                     # parser) is left for the next sync.
-                     fh.seek(start_offset)
-                     for mrow in _iter_message_rows(fh, path_str):
-                         if mrow.byte_offset >= final_offset:
-                             break
-                         conv_rows.append(_conv_row_tuple(mrow, path_str))
              except OSError as exc:
                  eprint(f"[cache] could not read {jp}: {exc}")
                  walk_clean = False # skipped a file without ingesting (D5a)
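Inside the fused loop, each cost entry's `usage` dict is split into four integer token columns plus a JSON blob of any remaining keys. Pulled out as a standalone helper, the transformation looks like the sketch below; the real code does this inline, and the name `split_usage` is hypothetical. `sort_keys=True` keeps the extras blob byte-stable regardless of key order in the transcript.

```python
import json
from typing import Any, Optional

TOKEN_KEYS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def split_usage(usage: dict[str, Any]) -> tuple[int, int, int, int, Optional[str]]:
    """Return (input, output, cache_creation, cache_read, extras_json).

    Hypothetical helper mirroring the inline logic in the fused loop.
    """
    # ``or 0`` maps explicit nulls (None) to 0, not just missing keys.
    inp, out, cc, cr = (int(usage.get(key, 0) or 0) for key in TOKEN_KEYS)
    extras = {k: v for k, v in usage.items() if k not in TOKEN_KEYS}
    extras_json = json.dumps(extras, sort_keys=True) if extras else None
    return inp, out, cc, cr, extras_json
```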
package/bin/_cctally_core.py CHANGED
@@ -436,6 +436,16 @@ def compute_week_bounds(anchor_dt: dt.datetime, week_start_name: str) -> tuple[d
  def ensure_dirs() -> None:
      APP_DIR.mkdir(parents=True, exist_ok=True)
      LOG_DIR.mkdir(parents=True, exist_ok=True)
+     # cache.db holds plaintext conversation prose at rest (Plan 2, spec §5), so
+     # the data dir must be 0700. Hardening it here in the shared primitive means
+     # a stats-first cold start — open_db() materializing APP_DIR before any
+     # cache.db open (e.g. record-usage) — is covered, not only the
+     # open_cache_db backstop (which keeps its own chmod). Best-effort and
+     # idempotent: swallow OSError + continue (issue #150).
+     try:
+         os.chmod(APP_DIR, 0o700)
+     except OSError as exc:
+         eprint(f"[core] could not chmod data dir 0700 ({exc}); continuing")
 
 
  # === Alerts validation cluster ======================================
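The `ensure_dirs()` change applies `0700` after `mkdir` and treats a failed `chmod` as non-fatal. The sketch below shows the same best-effort pattern on an arbitrary path. `ensure_private_dir` is a hypothetical name, and `print(..., file=sys.stderr)` stands in for cctally's `eprint`. The explicit `chmod` is needed because the `mode` argument of `mkdir` is combined with the process umask and has no effect on a directory that already exists. POSIX permissions are assumed.

```python
import os
import pathlib
import sys


def ensure_private_dir(path: pathlib.Path) -> None:
    """Create ``path`` if needed, then try to make it owner-only (0700).

    Hypothetical helper illustrating the best-effort hardening pattern.
    """
    path.mkdir(parents=True, exist_ok=True)
    try:
        # Idempotent; also tightens a dir created earlier at the umask default.
        os.chmod(path, 0o700)
    except OSError as exc:
        # Best-effort: a permission tweak must never block the command itself.
        print(f"could not chmod {path} 0700 ({exc}); continuing", file=sys.stderr)
```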
@@ -1261,25 +1271,28 @@ def open_db() -> sqlite3.Connection:
      # (set-then-dispatch invariant); NULL = "recorded without dispatch" (the
      # forward-only-from-set reconcile path) OR "not yet dispatched", never
      # "delivery failed".
-     # Schema owned by migration 011_budget_milestone_period_keys (the `period`
-     # column + the period-inclusive UNIQUE; see _cctally_db.py). The live CREATE
-     # below makes the new shape on fresh installs (dispatcher fast-stamps 011);
-     # pre-011 DBs trip the migration's rename-recreate-copy. `period` is the
-     # configured period noun at crossing ('calendar-week'|'calendar-month'|
-     # 'subscription-week'); NULL = pre-011 unknown.
+     # Unified vendor-tagged table (#143): one row per (vendor, period_start_at,
+     # period, threshold). `vendor` 'claude'|'codex'. `period_start_at` is the
+     # resolved period-window start instant (subscription-week OR calendar
+     # period-start). `period` is the configured period at crossing; NULL = pre-012
+     # unknown. Owned by migration 012_unify_budget_milestones_vendor (merge of the
+     # former budget_milestones + codex_budget_milestones). The Codex table is NO
+     # LONGER live-created here — migration 012 drops it and this CREATE must not
+     # resurrect it; migration 011 is hardened to skip it when absent (#143).
      conn.execute(
          """
          CREATE TABLE IF NOT EXISTS budget_milestones (
              id INTEGER PRIMARY KEY AUTOINCREMENT,
-             week_start_at TEXT NOT NULL,
-             period TEXT, -- configured period at crossing; NULL = pre-011 unknown (migration 011)
+             vendor TEXT NOT NULL,
+             period_start_at TEXT NOT NULL,
+             period TEXT,
              threshold INTEGER NOT NULL,
              budget_usd REAL NOT NULL,
              spent_usd REAL NOT NULL,
              consumption_pct REAL NOT NULL,
              crossed_at_utc TEXT NOT NULL,
              alerted_at TEXT,
-             UNIQUE(week_start_at, period, threshold)
+             UNIQUE(vendor, period_start_at, period, threshold)
          )
          """
      )
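With one vendor-tagged table, both alert axes can share a single insert path and filter their reads with `WHERE vendor=?`. The sketch below reuses the `CREATE TABLE` from the diff. The write helper, the read helper, and the two thin adapters (`record_claude_crossing` / `record_codex_crossing`) are hypothetical, loosely modeled on the changelog's "vendor-parameterized path with thin vendor adapters". They are not cctally's real `maybe_record_*` functions.

```python
import sqlite3

# Table shape taken from the CREATE TABLE in the diff.
DDL = """
CREATE TABLE IF NOT EXISTS budget_milestones (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    vendor TEXT NOT NULL,
    period_start_at TEXT NOT NULL,
    period TEXT,
    threshold INTEGER NOT NULL,
    budget_usd REAL NOT NULL,
    spent_usd REAL NOT NULL,
    consumption_pct REAL NOT NULL,
    crossed_at_utc TEXT NOT NULL,
    alerted_at TEXT,
    UNIQUE(vendor, period_start_at, period, threshold)
)
"""


def record_crossing(conn, vendor, period_start_at, period, threshold,
                    budget_usd, spent_usd, crossed_at_utc) -> bool:
    """Hypothetical shared path: one row per (vendor, period_start_at, period, threshold).

    Returns True only when a new row was written (INSERT OR IGNORE skips dupes).
    """
    before = conn.total_changes
    conn.execute(
        """INSERT OR IGNORE INTO budget_milestones
               (vendor, period_start_at, period, threshold, budget_usd,
                spent_usd, consumption_pct, crossed_at_utc)
           VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
        (vendor, period_start_at, period, threshold, budget_usd,
         spent_usd, 100.0 * spent_usd / budget_usd, crossed_at_utc),
    )
    return conn.total_changes > before


def crossed_thresholds(conn, vendor, period_start_at) -> list[int]:
    """Each alert axis reads only its own vendor's rows."""
    rows = conn.execute(
        "SELECT threshold FROM budget_milestones"
        " WHERE vendor = ? AND period_start_at = ? ORDER BY threshold",
        (vendor, period_start_at),
    ).fetchall()
    return [threshold for (threshold,) in rows]


# Thin per-vendor adapters over the one shared path (hypothetical names).
def record_claude_crossing(conn, **kw) -> bool:
    return record_crossing(conn, "claude", **kw)


def record_codex_crossing(conn, **kw) -> bool:
    return record_crossing(conn, "codex", **kw)
```

One caveat applies to any key like this. SQLite treats NULLs as distinct in `UNIQUE` constraints, so rows whose `period` is NULL (the "unknown" case) are not deduplicated against each other by the constraint alone.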
@@ -1352,46 +1365,6 @@ def open_db() -> sqlite3.Connection:
          """
      )
 
-     # ── codex_budget_milestones (per-vendor Codex budget crossings) ──────────
-     # Plain CREATE TABLE IF NOT EXISTS, NO migration handler / backfill — the
-     # same posture as `budget_milestones` / `projected_milestones` /
-     # `project_budget_milestones` (write-once, forward-only, framework-untracked;
-     # calendar-period-codex-budgets feature, spec §6). The dedup key is keyed on
-     # `period_start_at` — the resolved period-window START instant stored as the
-     # `isoformat(timespec="seconds")` `+00:00` offset form (NOT a `Z` suffix),
-     # e.g. calendar-month June → `2026-06-01T00:00:00+00:00` — NOT a subscription
-     # week:
-     # Codex has no Anthropic week, so the budget runs over a calendar period
-     # (calendar-week / calendar-month). Rolling to the next period yields a fresh
-     # `period_start_at` → fresh crossings under UNIQUE(period_start_at, period,
-     # threshold) (the budget-pattern reset handling — hence NO `reset_event_id`
-     # segment column). `budget_usd` snapshots the Codex target AT crossing so the
-     # dashboard renders "$210 of $200" from the ROW, not from live config that
-     # may have changed since (the Codex P0-4 lesson, baked into the sibling
-     # tables). `alerted_at` is stamped BEFORE the osascript Popen (set-then-
-     # dispatch invariant); NULL = "recorded without dispatch" (the forward-only-
-     # from-set reconcile path) OR "not yet dispatched", never "delivery failed".
-     # Schema owned by migration 011_budget_milestone_period_keys (the `period`
-     # column + the period-inclusive UNIQUE; see _cctally_db.py). `period` is the
-     # configured Codex period noun at crossing ('calendar-week'|'calendar-
-     # month'); NULL = pre-011 unknown.
-     conn.execute(
-         """
-         CREATE TABLE IF NOT EXISTS codex_budget_milestones (
-             id INTEGER PRIMARY KEY AUTOINCREMENT,
-             period_start_at TEXT NOT NULL, -- resolved period-window start instant (+00:00 offset form, NOT Z)
-             period TEXT, -- configured period at crossing; NULL = pre-011 unknown (migration 011)
-             threshold INTEGER NOT NULL,
-             budget_usd REAL NOT NULL, -- Codex target snapshotted AT crossing
-             spent_usd REAL NOT NULL,
-             consumption_pct REAL NOT NULL,
-             crossed_at_utc TEXT NOT NULL,
-             alerted_at TEXT,
-             UNIQUE(period_start_at, period, threshold)
-         )
-         """
-     )
-
      # Migration framework dispatcher. Replaces the prior inline gate stack
      # (has_blocks + _migration_done) with the framework's _run_pending_-
      # migrations entry point. See spec §2.3, §5.2 + the migration handlers