PyPI - TeLLMgramBot - Versions diffs - 3.12.0__tar.gz → 3.13.0__tar.gz - Mend

TeLLMgramBot 3.12.0__tar.gz → 3.13.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (26)

{tellmgrambot-3.12.0 → tellmgrambot-3.13.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: TeLLMgramBot
-Version: 3.12.0
+Version: 3.13.0
 Summary: LLM-powered Telegram bot (OpenAI + Anthropic)
 Home-page: https://github.com/Digital-Heresy/TeLLMgramBot
 Author: Digital Heresy
@@ -48,6 +48,9 @@ The basic goal of this project is to create a bridge between a Telegram Bot and
 * Token limits measure conversation length and determine when to prune oldest messages to stay within model limits.
   * The bot loads the user's full history across all chats up to 50% of the token budget. In private chats, shared group context fills the remaining budget, enabling the bot to reference group conversations from a private context.
   * This eliminates amnesia when switching between private and group chats.
+* Conversation archive preserves long-term context without consuming token budget.
+  * Older messages are automatically distilled into concise daily summaries (Tier 1), then progressively compressed into monthly digests (Tier 2). Raw messages are never deleted; archive rows surface seamlessly in search results and context loading.
+  * Configurable via `archive_days` (default 60 days before Tier 1 triggers; Tier 2 triggers at 2x this value).
 * Users can manage privacy via two commands:
   * `/forget` - In private chats, clears your full conversation and resets all active sessions. In group chats, removes only your messages and cleans up paired bot replies.
   * `/private` - Toggle private mode (private chats only). When ON, your messages in private chats are excluded from group conversation contexts, enabling selective privacy even in shared groups.
@@ -152,6 +155,7 @@ When the bot is triggered in a group and about to respond (not deferring to anot
    - `db_name`: Optional custom database filename without extension (e.g. `MyBot` creates `MyBot.db`); omit for default `conversations.db`. Use distinct names when running multiple bot instances in the same directory.
    - `token_limit`: Max tokens (optional; defaults to model's maximum)
    - `search_limit`: Max search results (optional; defaults to 30)
+   - `archive_days`: Days before messages are eligible for archival (optional; default 60, minimum 1). Older messages are distilled into daily summaries, then progressively compressed into monthly digests. Once archived, raw messages are no longer loaded into the LLM context; they remain reachable only through message search.
    - `tools`: Optional list of webhook and MCP tool definitions (admin-only, private chat only). See [docs/tools.md](docs/tools.md) for schema and examples.
 4. **Disable group privacy mode in BotFather:**
    ```

{tellmgrambot-3.12.0 → tellmgrambot-3.13.0}/README.md RENAMED

@@ -16,6 +16,9 @@ The basic goal of this project is to create a bridge between a Telegram Bot and
 * Token limits measure conversation length and determine when to prune oldest messages to stay within model limits.
   * The bot loads the user's full history across all chats up to 50% of the token budget. In private chats, shared group context fills the remaining budget, enabling the bot to reference group conversations from a private context.
   * This eliminates amnesia when switching between private and group chats.
+* Conversation archive preserves long-term context without consuming token budget.
+  * Older messages are automatically distilled into concise daily summaries (Tier 1), then progressively compressed into monthly digests (Tier 2). Raw messages are never deleted; archive rows surface seamlessly in search results and context loading.
+  * Configurable via `archive_days` (default 60 days before Tier 1 triggers; Tier 2 triggers at 2x this value).
 * Users can manage privacy via two commands:
   * `/forget` - In private chats, clears your full conversation and resets all active sessions. In group chats, removes only your messages and cleans up paired bot replies.
   * `/private` - Toggle private mode (private chats only). When ON, your messages in private chats are excluded from group conversation contexts, enabling selective privacy even in shared groups.
@@ -120,6 +123,7 @@ When the bot is triggered in a group and about to respond (not deferring to anot
    - `db_name`: Optional custom database filename without extension (e.g. `MyBot` creates `MyBot.db`); omit for default `conversations.db`. Use distinct names when running multiple bot instances in the same directory.
    - `token_limit`: Max tokens (optional; defaults to model's maximum)
    - `search_limit`: Max search results (optional; defaults to 30)
+   - `archive_days`: Days before messages are eligible for archival (optional; default 60, minimum 1). Older messages are distilled into daily summaries, then progressively compressed into monthly digests. Once archived, raw messages are no longer loaded into the LLM context; they remain reachable only through message search.
    - `tools`: Optional list of webhook and MCP tool definitions (admin-only, private chat only). See [docs/tools.md](docs/tools.md) for schema and examples.
 4. **Disable group privacy mode in BotFather:**
    ```
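
The timing described for `archive_days` can be sketched as follows. This is a minimal illustration, not the package's code: `tier_cutoffs` and `DEFAULT_ARCHIVE_DAYS` are hypothetical names, and the only facts taken from the README are the 60-day default and the 2x multiplier for Tier 2.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_ARCHIVE_DAYS = 60  # default stated in the README; the name is hypothetical


def tier_cutoffs(archive_days: int = DEFAULT_ARCHIVE_DAYS,
                 now: Optional[datetime] = None) -> dict:
    """Return the datetimes before which content is eligible for each tier."""
    now = now or datetime.now(timezone.utc)
    return {
        # Raw messages older than this become daily summaries (Tier 1).
        "tier1": now - timedelta(days=archive_days),
        # Daily summaries older than this become monthly digests (Tier 2).
        "tier2": now - timedelta(days=archive_days * 2),
    }


cutoffs = tier_cutoffs(60, datetime(2025, 6, 30, tzinfo=timezone.utc))
print(cutoffs["tier1"].date())  # 2025-05-01
print(cutoffs["tier2"].date())  # 2025-03-02
```

With the default of 60, content therefore needs to be roughly four months old before it is folded into a monthly digest.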

{tellmgrambot-3.12.0 → tellmgrambot-3.13.0}/TeLLMgramBot/TeLLMgramBot.py RENAMED

@@ -22,6 +22,8 @@ from .database import (
     delete_messages_for_chat,
     delete_private_messages_for_user,
     delete_bot_replies_for_user,
+    delete_archive_for_user,
+    delete_archive_for_chat,
     get_shared_group_chat_ids,
     message_id_exists,
     update_message_tg_id,
@@ -30,6 +32,7 @@ from .database import (
     upsert_user,
     wipe_all_data,
 )
+from .archive import run_archival
 from .initialize import (
     INIT_BOT_CONFIG,
     ApiKeyStatus,
@@ -61,11 +64,12 @@ _SEARCH_TOOL = {
     "name": "search_messages",
     "description": (
         "Search the full message history across the user's private chat and shared group chats. "
+        "Results include both raw messages and archived summaries of older content. "
         "Use whenever the user asks who said something, what someone said, or what was discussed. "
         "Always search before claiming a person has no message history -- do not assume from context alone. "
         "Run the search immediately when it would help answer the question -- do not ask the user for permission to search. "
         "All filters are optional -- omit them to retrieve recent messages broadly. "
-        "Results are ordered most-recent-first; to find the earliest message, look at the last result."
+        "Results are ordered most-recent-first by default; use ascending=true for oldest-first."
     ),
     "parameters": {
         "type": "object",
@@ -75,6 +79,7 @@ _SEARCH_TOOL = {
             "chat_query": {"type": "string", "description": "Name of the group chat to search within. Use the exact chat title if known. If multiple chats match, the search will return an ambiguity error asking you to clarify."},
             "date_from": {"type": "string", "description": "Start of time range as ISO datetime (YYYY-MM-DDTHH:MM). For a full day, use T00:00."},
             "date_to": {"type": "string", "description": "End of time range as ISO datetime (YYYY-MM-DDTHH:MM). For a full day, use T23:59."},
+            "ascending": {"type": "boolean", "description": "If true, results are ordered oldest-first. Use for queries like 'what was the first message about X?' or 'earliest mention of Y'."},
         },
         "required": [],
     },
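
To see why an explicit `ascending` flag helps when results are capped, consider this small in-memory sketch. `search` is a hypothetical stand-in, not the bot's database query; it only assumes that results are sorted by timestamp and then truncated to a limit.

```python
def search(rows: list[dict], limit: int, ascending: bool = False) -> list[dict]:
    """Sort rows by ISO timestamp, then keep at most `limit` of them."""
    ordered = sorted(rows, key=lambda r: r["timestamp"], reverse=not ascending)
    return ordered[:limit]


# Five messages on consecutive days; id 1 is the oldest.
rows = [{"id": i, "timestamp": f"2025-01-0{i}T12:00"} for i in range(1, 6)]

newest_first = search(rows, limit=2)
oldest_first = search(rows, limit=2, ascending=True)

print([r["id"] for r in newest_first])  # [5, 4]
print([r["id"] for r in oldest_first])  # [1, 2]
```

With the cap at 2, the earliest message (id 1) never appears in the most-recent-first result, so reading the last entry of that list would not find it.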
@@ -337,10 +342,13 @@ class TelegramBot:
             # Wipe bot replies linked to this user, then user's own rows across all chats,
             # then any remaining bot replies in the private chat (pre-migration rows).
             await delete_bot_replies_for_user(user_id)
+            await delete_archive_for_user(user_id)
             await delete_messages_for_user(user_id)
+            await delete_archive_for_chat(chat_id)
             await delete_messages_for_chat(chat_id)
         else:
             await delete_bot_replies_for_user(user_id)
+            await delete_archive_for_user(user_id)
             await delete_messages_for_user(user_id)
         # Evict only the Conversations that contain this user's data, not all sessions.
@@ -827,6 +835,7 @@ class TelegramBot:
                 args.get('date_from'),
                 args.get('date_to'),
                 self.llm['search_limit'],
+                bool(args.get('ascending', False)),
             )
             if isinstance(results, list):
                 for r in results:
@@ -837,6 +846,13 @@ class TelegramBot:
                             r['timestamp'] = format_dt(dt)
                         except ValueError:
                             pass
+                # Lazy archival trigger: cap hit may indicate more archivable content.
+                if len(results) == self.llm['search_limit']:
+                    try:
+                        loop = asyncio.get_running_loop()
+                        loop.create_task(run_archival(self.llm))
+                    except RuntimeError:
+                        pass
             return json.dumps(results)
         tool_def = self.webhook_defs.get(tool_call.name)
@@ -943,6 +959,7 @@ class TelegramBot:
         token_limit    = INIT_BOT_CONFIG['token_limit'],
         search_limit   = INIT_BOT_CONFIG['search_limit'],
         persona_temp   = INIT_BOT_CONFIG['persona_temp'],
+        archive_days   = INIT_BOT_CONFIG['archive_days'],
         persona_prompt = INIT_BOT_CONFIG['persona_prompt'],
         key_status: ApiKeyStatus | None = None,
         log_name: str = 'tellmgrambot',
@@ -964,6 +981,9 @@ class TelegramBot:
             persona_temp: LLM temperature (0.0-2.0). If None, defaults to 1.0.
             persona_prompt: System prompt defining the bot's behavior and personality.
             key_status: ApiKeyStatus object indicating available features. If None, calls init_structure().
+            archive_days: Days before messages are eligible for Tier 1 archival (default: 60).
+                          Must be an integer >= 1; invalid values log a warning and fall back to 60.
+                          Tier 2 compression triggers at archive_days * 2.
             webhook_schemas: Provider-compatible tool schema dicts for webhook tools (from build_tool_registry).
                              If None, no webhook tools are registered.
             webhook_defs: Resolved webhook tool definitions keyed by tool name (from build_tool_registry).
@@ -1033,17 +1053,21 @@ class TelegramBot:
         if persona_temp is not None and not (isinstance(persona_temp, (int, float)) and 0.0 <= persona_temp <= 2.0):
             logger.warning(f"Invalid persona_temp '{persona_temp}' (must be a decimal between 0.0 and 2.0), using default 1.0")
             persona_temp = None
+        if archive_days is not None and not (isinstance(archive_days, int) and archive_days >= 1):
+            logger.warning(f"Invalid archive_days '{archive_days}' (must be an integer >= 1), using default 60")
+            archive_days = None
         # Get our LLM spun up with defaults if not defined by user input
         # Tokens as integers measure the length of conversation messages
         self.llm = {
-            'prompt'      : persona_prompt,
-            'chat_model'  : chat_model,
-            'url_model'   : url_model,
-            'token_limit' : token_limit or TokenLimits(chat_model).max_tokens(),
-            'search_limit': search_limit or 30,
-            'temperature' : persona_temp or 1.0,
-            'top_p'       : 0.9
+            'prompt'       : persona_prompt,
+            'chat_model'   : chat_model,
+            'url_model'    : url_model,
+            'token_limit'  : token_limit or TokenLimits(chat_model).max_tokens(),
+            'search_limit' : search_limit or 30,
+            'temperature'  : persona_temp or 1.0,
+            'top_p'        : 0.9,
+            'archive_days' : archive_days if archive_days is not None else 60,
         }
         # Set a rounded-down integer to prune a lengthy conversation by 500 tokens
         # Note if the upper limit is below 500, the lower limit is set to 0
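
The `archive_days` check above can be exercised on its own. The sketch below is a slightly stricter variant and is an assumption rather than the package's behavior: because `bool` is a subclass of `int` in Python, `isinstance(True, int)` is true, so this version rejects booleans explicitly. `normalize_archive_days` is a hypothetical helper name.

```python
import logging

logger = logging.getLogger(__name__)
DEFAULT_ARCHIVE_DAYS = 60  # default stated in the docstring above


def normalize_archive_days(value) -> int:
    """Return value if it is an integer >= 1, otherwise the default."""
    if value is None:
        return DEFAULT_ARCHIVE_DAYS
    # bool passes isinstance(..., int), so exclude it explicitly.
    is_int = isinstance(value, int) and not isinstance(value, bool)
    if is_int and value >= 1:
        return value
    logger.warning(
        "Invalid archive_days %r (must be an integer >= 1), using default %d",
        value, DEFAULT_ARCHIVE_DAYS,
    )
    return DEFAULT_ARCHIVE_DAYS


for candidate in (None, 30, 0, "90", True):
    print(repr(candidate), "->", normalize_archive_days(candidate))
```

Whether the boolean case matters in practice depends on how the configuration file is parsed, which this diff does not show.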
@@ -1117,6 +1141,7 @@ class TelegramBot:
             token_limit     = config['token_limit'],
             search_limit    = config['search_limit'],
             persona_temp    = config['persona_temp'],
+            archive_days    = config['archive_days'],
             persona_prompt  = prompt,
             key_status      = key_status,
             log_name        = log_name,

tellmgrambot-3.13.0/TeLLMgramBot/archive.py ADDED

@@ -0,0 +1,321 @@
+"""
+Two-tier conversation archive for TeLLMgramBot.
+Tier 1: Key fact extraction - batches of old messages distilled into concise statements,
+        grouped by chat + day. Private chats produce single-user rows; group chats produce
+        multi-speaker rows with a participants JSON array of contributing user_ids.
+Tier 2: Episodic summarization - old Tier 1 rows compressed into thematic digests,
+        grouped by chat + month.
+Raw messages are never deleted; archived_at flags rows to skip during context loading.
+Search still hits raw rows regardless of archived_at.
+"""
+import json
+import logging
+import aiosqlite
+from .database import get_db_path
+from .providers.factory import get_provider
+from .utils import cutoff_iso, now_iso
+logger = logging.getLogger(__name__)
+_archival_running = False
+_TIER1_PROMPT = (
+    "Extract key facts from this conversation. "
+    "Ignore greetings, acknowledgments, and filler. "
+    "Return one concise factual statement per line, no numbering. "
+    "Only include meaningful, specific information. "
+    "Keep each statement under 20 words.\n\n"
+    "Conversation:\n{conversation}"
+)
+_TIER2_PROMPT = (
+    "Summarize these key facts into a concise thematic digest. "
+    "Group related facts together into 2-5 sentences. "
+    "Be specific, not generic. Do not use bullet points.\n\n"
+    "Key facts:\n{facts}"
+)
+async def run_archival(config: dict) -> None:
+    """
+    Run Tier 1 and Tier 2 archival passes. No-op if already running.
+    Called at bot startup and lazily when search_messages hits the result cap.
+    Args:
+        config: Dict with keys: chat_model, and optionally archive_days.
+    """
+    global _archival_running
+    if _archival_running:
+        return
+    _archival_running = True
+    try:
+        await _run_tier1(config)
+        await _run_tier2(config)
+    except Exception as e:
+        logger.error(f"Archival run failed: {e}", exc_info=True)
+    finally:
+        _archival_running = False
+def _get_model(config: dict) -> str:
+    """Return the chat_model from config, or empty string if not set."""
+    return config.get('chat_model', '')
+def _get_archive_days(config: dict) -> int:
+    """Return validated archive_days from config, falling back to 60 on invalid values."""
+    _ad = config.get('archive_days')
+    if _ad is not None:
+        try:
+            days = int(_ad)
+            if days >= 1:
+                return days
+        except (TypeError, ValueError):
+            pass
+        logger.warning(f"ARCHIVE: invalid archive_days '{_ad}', using default 60")
+    return 60
+def _fmt_ts(ts: str) -> str:
+    if not ts:
+        return ''
+    return ts[:16].replace('T', ' ') + ' UTC'
+async def _run_tier1(config: dict) -> None:
+    """
+    Extract key facts from messages older than archive_days into Tier 1 rows.
+    Groups old messages by chat and day, batches each group through the LLM with
+    _TIER1_PROMPT, stores the extracted facts as summary_archive rows, and flags
+    source rows with archived_at. Logs warnings on batch failures but continues
+    processing other batches.
+    Args:
+        config: Dict with keys: chat_model, and optionally archive_days.
+    """
+    model = _get_model(config)
+    if not model:
+        logger.warning("ARCHIVE: no model configured, skipping Tier 1")
+        return
+    after_days = _get_archive_days(config)
+    async with aiosqlite.connect(get_db_path()) as db:
+        cursor = await db.execute(
+            """
+            SELECT m.id, m.chat_id, m.user_id, m.role, m.content, m.created_at,
+                   u.first_name, u.username
+            FROM messages m
+            LEFT JOIN users u ON m.user_id = u.user_id
+            WHERE m.archived_at IS NULL AND m.is_private = 0
+              AND m.created_at < ?
+            ORDER BY m.chat_id, date(m.created_at), m.created_at ASC, m.id ASC
+            """,
+            (cutoff_iso(after_days),),
+        )
+        raw_rows = await cursor.fetchall()
+    if not raw_rows:
+        return
+    batches: dict[tuple, list] = {}
+    for row in raw_rows:
+        key = (row[1], row[5][:10])  # (chat_id, YYYY-MM-DD)
+        batches.setdefault(key, []).append(row)
+    provider = get_provider(model)
+    for (chat_id, day), batch in batches.items():
+        try:
+            await _process_tier1_batch(provider, model, chat_id, day, batch)
+        except Exception as e:
+            logger.warning(f"ARCHIVE: Tier 1 batch {chat_id}/{day} failed: {e}")
+async def _process_tier1_batch(provider, model: str, chat_id: int, day: str, rows: list) -> None:
+    """
+    Extract key facts from a single day's batch of messages via LLM.
+    Formats messages with speaker/timestamp annotations, calls the provider with
+    _TIER1_PROMPT, inserts the result as a summary_archive row (tier 1), and marks
+    source rows with archived_at. For private chats (chat_id > 0), stores as a
+    single-user row; for groups, extracts participant user_ids and stores as
+    multi-speaker row with participants JSON array.
+    Args:
+        provider: LLM provider instance (e.g., from get_provider).
+        model: Model name to pass to provider.complete().
+        chat_id: Telegram chat ID.
+        day: YYYY-MM-DD date string (used for logging).
+        rows: List of message tuples from the database query.
+    """
+    # rows: (id, chat_id, user_id, role, content, created_at, first_name, username)
+    lines = []
+    for row in rows:
+        if row[3] == 'assistant':
+            speaker = 'Assistant'
+        else:
+            first_name, username = row[6], row[7]
+            speaker = first_name or (f"@{username}" if username else f"User {row[2]}")
+        lines.append(f"[{speaker}, {_fmt_ts(row[5])}]: {row[4]}")
+    conversation = '\n'.join(lines)
+    messages = [{"role": "user", "content": _TIER1_PROMPT.format(conversation=conversation)}]
+    result = await provider.complete(model, messages)
+    if not isinstance(result, str) or not result.strip():
+        return
+    user_ids = sorted({row[2] for row in rows if row[3] == 'user'})
+    is_private = chat_id > 0
+    if is_private and len(user_ids) == 1:
+        archive_user_id = user_ids[0]
+        participants_json = None
+    else:
+        archive_user_id = None
+        participants_json = json.dumps(user_ids) if user_ids else None
+    covers_from = rows[0][5]
+    covers_to = rows[-1][5]
+    msg_ids = [row[0] for row in rows]
+    async with aiosqlite.connect(get_db_path()) as db:
+        await db.execute(
+            "INSERT INTO summary_archive "
+            "(chat_id, user_id, participants, tier, content, covers_from, covers_to) "
+            "VALUES (?, ?, ?, 1, ?, ?, ?)",
+            (chat_id, archive_user_id, participants_json, result.strip(), covers_from, covers_to),
+        )
+        placeholders = ','.join('?' * len(msg_ids))
+        await db.execute(
+            f"UPDATE messages SET archived_at = ? WHERE id IN ({placeholders})",
+            [now_iso()] + msg_ids,
+        )
+        await db.commit()
+    logger.info(
+        f"ARCHIVE: Tier 1 stored for chat {chat_id} day {day} "
+        f"({len(rows)} messages -> {len(result.splitlines())} facts)"
+    )
+async def _run_tier2(config: dict) -> None:
+    """
+    Compress old Tier 1 rows into Tier 2 (episodic) summaries.
+    Groups Tier 1 rows older than archive_days * 2 by chat and month, batches each
+    group through the LLM with _TIER2_PROMPT, stores the result as a summary_archive
+    row (tier 2), and flags source Tier 1 rows with archived_at. Logs warnings on
+    batch failures but continues processing other batches.
+    Args:
+        config: Dict with keys: chat_model, and optionally archive_days.
+    """
+    model = _get_model(config)
+    if not model:
+        return
+    after_days = _get_archive_days(config)
+    episode_days = after_days * 2
+    async with aiosqlite.connect(get_db_path()) as db:
+        cursor = await db.execute(
+            """
+            SELECT id, chat_id, user_id, participants, content, covers_from, covers_to
+            FROM summary_archive
+            WHERE tier = 1 AND archived_at IS NULL
+              AND covers_to < ?
+            ORDER BY chat_id, strftime('%Y-%m', covers_from), covers_from ASC
+            """,
+            (cutoff_iso(episode_days),),
+        )
+        tier1_rows = await cursor.fetchall()
+    if not tier1_rows:
+        return
+    batches: dict[tuple, list] = {}
+    for row in tier1_rows:
+        key = (row[1], row[5][:7])  # (chat_id, YYYY-MM)
+        batches.setdefault(key, []).append(row)
+    provider = get_provider(model)
+    for (chat_id, month), batch in batches.items():
+        try:
+            await _process_tier2_batch(provider, model, chat_id, month, batch)
+        except Exception as e:
+            logger.warning(f"ARCHIVE: Tier 2 batch {chat_id}/{month} failed: {e}")
+async def _process_tier2_batch(provider, model: str, chat_id: int, month: str, rows: list) -> None:
+    """
+    Compress a month's worth of Tier 1 summaries into a single thematic digest.
+    Concatenates Tier 1 facts, calls the provider with _TIER2_PROMPT, inserts the
+    result as a summary_archive row (tier 2), and marks source Tier 1 rows with
+    archived_at. Merges participants from all source rows into a single JSON array
+    for the Tier 2 row (unless all rows are attributed to a single user, in which
+    case stores as single-user row).
+    Args:
+        provider: LLM provider instance (e.g., from get_provider).
+        model: Model name to pass to provider.complete().
+        chat_id: Telegram chat ID.
+        month: YYYY-MM month string (used for logging).
+        rows: List of Tier 1 archive tuples from the database query.
+    """
+    # rows: (id, chat_id, user_id, participants, content, covers_from, covers_to)
+    facts = '\n'.join(row[4] for row in rows)
+    messages = [{"role": "user", "content": _TIER2_PROMPT.format(facts=facts)}]
+    result = await provider.complete(model, messages)
+    if not isinstance(result, str) or not result.strip():
+        return
+    all_user_ids: set[int] = set()
+    for row in rows:
+        if row[2] is not None:
+            all_user_ids.add(row[2])
+        if row[3]:
+            try:
+                all_user_ids.update(json.loads(row[3]))
+            except (json.JSONDecodeError, TypeError):
+                pass
+    unique_attributed = {row[2] for row in rows if row[2] is not None}
+    has_multi_speaker = any(row[3] for row in rows)
+    if len(unique_attributed) == 1 and not has_multi_speaker:
+        archive_user_id = next(iter(unique_attributed))
+        participants_json = None
+    else:
+        archive_user_id = None
+        participants_json = json.dumps(sorted(all_user_ids)) if all_user_ids else None
+    covers_from = rows[0][5]
+    covers_to = rows[-1][6]
+    source_ids = [row[0] for row in rows]
+    async with aiosqlite.connect(get_db_path()) as db:
+        await db.execute(
+            "INSERT INTO summary_archive "
+            "(chat_id, user_id, participants, tier, content, covers_from, covers_to) "
+            "VALUES (?, ?, ?, 2, ?, ?, ?)",
+            (chat_id, archive_user_id, participants_json, result.strip(), covers_from, covers_to),
+        )
+        placeholders = ','.join('?' * len(source_ids))
+        await db.execute(
+            f"UPDATE summary_archive SET archived_at = ? WHERE id IN ({placeholders})",
+            [now_iso()] + source_ids,
+        )
+        await db.commit()
+    logger.info(
+        f"ARCHIVE: Tier 2 stored for chat {chat_id} month {month} "
+        f"({len(rows)} Tier 1 rows -> 1 episode)"
+    )
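
The batching step used by both tiers can be illustrated without a database or an LLM. This is a simplified sketch: rows are plain dicts rather than the tuples returned by the queries above, and `group_rows` is a hypothetical helper; only the key shapes, `(chat_id, YYYY-MM-DD)` for Tier 1 and `(chat_id, YYYY-MM)` for Tier 2, come from the source.

```python
def group_rows(rows: list[dict], ts_field: str, prefix_len: int) -> dict[tuple, list[dict]]:
    """Group rows by (chat_id, leading characters of an ISO timestamp)."""
    batches: dict[tuple, list[dict]] = {}
    for row in rows:
        key = (row["chat_id"], row[ts_field][:prefix_len])
        batches.setdefault(key, []).append(row)
    return batches


messages = [
    {"chat_id": -100, "created_at": "2025-01-03T09:15:00", "content": "a"},
    {"chat_id": -100, "created_at": "2025-01-03T18:40:00", "content": "b"},
    {"chat_id": -100, "created_at": "2025-01-04T08:00:00", "content": "c"},
    {"chat_id": 42, "created_at": "2025-01-03T10:00:00", "content": "d"},
]

daily = group_rows(messages, "created_at", 10)   # Tier 1 style: per chat and day
monthly = group_rows(messages, "created_at", 7)  # Tier 2 style: per chat and month

print({k: len(v) for k, v in daily.items()})
# {(-100, '2025-01-03'): 2, (-100, '2025-01-04'): 1, (42, '2025-01-03'): 1}
print({k: len(v) for k, v in monthly.items()})
# {(-100, '2025-01'): 3, (42, '2025-01'): 1}
```

Each batch then corresponds to one LLM call and one `summary_archive` row, which is why a failure in one batch does not have to block the others.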
