
TeLLMgramBot 3.14.2.tar.gz → 3.15.0.tar.gz


Files changed (27)

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: TeLLMgramBot
-Version: 3.14.2
+Version: 3.15.0
 Summary: LLM-powered Telegram bot (OpenAI + Anthropic)
 Home-page: https://github.com/Digital-Heresy/TeLLMgramBot
 Author: Digital Heresy
@@ -19,6 +19,9 @@ Requires-Dist: tiktoken>=0.12
 Requires-Dist: python-telegram-bot>=20.8
 Requires-Dist: aiosqlite>=0.19
 Requires-Dist: tzdata>=2025.2
+Requires-Dist: pypdf>=6.0
+Requires-Dist: defusedxml>=0.7
+Requires-Dist: charset-normalizer>=3.0
 Dynamic: author
 Dynamic: author-email
 Dynamic: description
@@ -41,6 +44,10 @@ The basic goal of this project is to create a bridge between a Telegram Bot and
 * Pass URLs in [square brackets] and mention how the bot should interpret them.
   * Example: "What do you think of this article? [https://some_site/article]"
   * Uses a separate model (configurable via `url_model`) to handle larger URL content.
+* Share documents and text files for analysis and summarisation.
+  * Supported formats: PDF, plain-text files (.txt, .md, .rst, .csv, .json, etc.), HTML, and XML.
+  * The bot extracts and summarises content, with automatic encoding detection for non-UTF-8 files. Files over 20 MB are rejected.
+  * Can be disabled via `document_processing: false` in config.
 * Ask questions about message history across all your chats using natural language; the bot will search, attribute messages to speakers, and include messages from other bots.
   * Example: "Who said thanks for the breakdown?" or "What did George say about the project?" or "Show me the last few messages."
   * All search filters (speaker, chat, date) are optional. Results are ordered most-recent-first. Configure `search_limit` to control how many results to return (default: 30).
@@ -157,6 +164,7 @@ When the bot is triggered in a group and about to respond (not deferring to anot
    - `token_limit`: Max tokens (optional; defaults to model's maximum)
    - `search_limit`: Max search results (optional; defaults to 30)
    - `archive_days`: Days before messages are eligible for archival (optional; default 60, minimum 1). Older messages are distilled into daily summaries, then progressively compressed into monthly digests. Once archived their respective raw messages do not return to the LLM context any more, only when searching messages.
+   - `document_processing`: Optional bool (default: true). Set to false to disable document and text file summarisation.
    - `allow_local_webhooks`: Set to `true` to permit webhook/MCP URLs targeting loopback or link-local addresses (optional; default `false`). Useful when tools like Home Assistant run on the same host.
    - `tools`: Optional list of webhook and MCP tool definitions (admin-only, private chat only). See [docs/tools.md](docs/tools.md) for schema and examples.
 4. **Disable group privacy mode in BotFather:**
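
Reviewer note: the new description lists PDF, plain-text, HTML, and XML as supported formats. The sketch below shows one way such a file could be classified from its name and MIME type. The set contents are a subset of the constants added in message_handlers.py in this release; `classify_document` and its precedence order are assumptions for illustration, since the package's own dispatch logic is not visible in this diff.

```python
from pathlib import Path

# Subsets of the constants added in message_handlers.py (the real sets are longer).
PLAIN_TEXT_EXTENSIONS = frozenset({'.txt', '.md', '.rst', '.csv', '.json'})
HTML_MIMES = frozenset({'text/html', 'application/xhtml+xml'})
XML_MIMES = frozenset({'text/xml', 'application/xml'})


def classify_document(filename: str, mime_type: str) -> str:
    """Hypothetical helper: return 'pdf', 'html', 'xml', 'text', or 'unsupported'."""
    suffix = Path(filename).suffix.lower()
    # Drop MIME parameters such as "; charset=utf-8" before comparing.
    mime = mime_type.split(';')[0].strip().lower()
    if mime == 'application/pdf' or suffix == '.pdf':
        return 'pdf'
    if mime in HTML_MIMES or suffix in ('.html', '.htm'):
        return 'html'
    if mime in XML_MIMES or suffix == '.xml':
        return 'xml'
    if mime.startswith('text/') or suffix in PLAIN_TEXT_EXTENSIONS:
        return 'text'
    return 'unsupported'
```

Checking the more specific formats first keeps an HTML or XML file from falling through to the generic plain-text branch.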

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/README.md RENAMED

@@ -9,6 +9,10 @@ The basic goal of this project is to create a bridge between a Telegram Bot and
 * Pass URLs in [square brackets] and mention how the bot should interpret them.
   * Example: "What do you think of this article? [https://some_site/article]"
   * Uses a separate model (configurable via `url_model`) to handle larger URL content.
+* Share documents and text files for analysis and summarisation.
+  * Supported formats: PDF, plain-text files (.txt, .md, .rst, .csv, .json, etc.), HTML, and XML.
+  * The bot extracts and summarises content, with automatic encoding detection for non-UTF-8 files. Files over 20 MB are rejected.
+  * Can be disabled via `document_processing: false` in config.
 * Ask questions about message history across all your chats using natural language; the bot will search, attribute messages to speakers, and include messages from other bots.
   * Example: "Who said thanks for the breakdown?" or "What did George say about the project?" or "Show me the last few messages."
   * All search filters (speaker, chat, date) are optional. Results are ordered most-recent-first. Configure `search_limit` to control how many results to return (default: 30).
@@ -125,6 +129,7 @@ When the bot is triggered in a group and about to respond (not deferring to anot
    - `token_limit`: Max tokens (optional; defaults to model's maximum)
    - `search_limit`: Max search results (optional; defaults to 30)
    - `archive_days`: Days before messages are eligible for archival (optional; default 60, minimum 1). Older messages are distilled into daily summaries, then progressively compressed into monthly digests. Once archived their respective raw messages do not return to the LLM context any more, only when searching messages.
+   - `document_processing`: Optional bool (default: true). Set to false to disable document and text file summarisation.
    - `allow_local_webhooks`: Set to `true` to permit webhook/MCP URLs targeting loopback or link-local addresses (optional; default `false`). Useful when tools like Home Assistant run on the same host.
    - `tools`: Optional list of webhook and MCP tool definitions (admin-only, private chat only). See [docs/tools.md](docs/tools.md) for schema and examples.
 4. **Disable group privacy mode in BotFather:**
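
Reviewer note: the README states that documents over 20 MB are rejected and that processing can be switched off. In `tele_handle_document` (TeLLMgramBot.py in this diff) the checks run in a fixed order once a message is triggered: offline gate, then the `document_processing` flag, then file size. A minimal sketch of that ordering as a pure function follows; `document_gate` is a hypothetical name, and the message strings are copied from the diff.

```python
from typing import Optional

MAX_DOCUMENT_BYTES = 20_000_000  # threshold used in the diff (decimal megabytes)
MSG_OFFLINE = "I'd love to chat, but I am offline at the moment!"
MSG_DOC_PROCESSING_OFF = "Sorry, I can't process documents right now."
MSG_TOO_LARGE = "That file is too large for me to read - please keep it under 20 MB."


def document_gate(online: bool, enabled: bool, file_size: Optional[int]) -> Optional[str]:
    """Hypothetical helper: return a refusal message, or None to go ahead."""
    if not online:
        return MSG_OFFLINE
    if not enabled:
        return MSG_DOC_PROCESSING_OFF
    # A missing or zero file_size passes, matching the truthiness check in the diff.
    if file_size and file_size > MAX_DOCUMENT_BYTES:
        return MSG_TOO_LARGE
    return None
```

Note that the comparison is strict, so a file of exactly 20,000,000 bytes is accepted under this reading.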

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot/TeLLMgramBot.py RENAMED

@@ -40,7 +40,7 @@ from .initialize import (
     bind_log_identity,
     init_structure,
 )
-from .message_handlers import handle_greetings, handle_common_queries, handle_url_ask
+from .message_handlers import handle_greetings, handle_common_queries, handle_url_ask, handle_document_message
 from .models import TokenLimits
 from .tools import build_tool_registry, discover_mcp_tools, execute_mcp, execute_webhook
 from .providers.factory import get_provider
@@ -50,16 +50,18 @@ from .utils import exact_word_match, log_error
 logger = logging.getLogger(__name__)
 # Dialog copy - centralised so tests never hard-code these strings
-_MSG_ADMIN_ONLY        = "Sorry, I can't do that for you."
-_MSG_PROCESS_ERROR     = "Sorry, I couldn't process your message! Please contact my creator."
-_MSG_TOOL_RESULT_ERROR = "Sorry, I couldn't process the tool result."
-_MSG_NOT_YOUR_PROMPT   = "Sorry, this prompt is not for you!"
-_MSG_WIPE_PROMPT       = "ALL of my memories will be lost! Are you sure?"
-_MSG_WIPE_COMPLETE     = "Wipe complete. I hope you won't regret this..."
-_MSG_WIPE_CANCELLED    = "Wipe cancelled. Whew, you scared me for a moment!"
-_MSG_FORGET_PROMPT     = "Do you really want me to forget our memories together?"
-_MSG_FORGET_COMPLETE   = "Forget complete. Fresh start it is..."
-_MSG_FORGET_CANCELLED  = "Forget cancelled. Glad you changed your mind!"
+_MSG_ADMIN_ONLY         = "Sorry, I can't do that for you."
+_MSG_PROCESS_ERROR      = "Sorry, I couldn't process your message! Please contact my creator."
+_MSG_TOOL_RESULT_ERROR  = "Sorry, I couldn't process the tool result."
+_MSG_DOC_PROCESSING_OFF = "Sorry, I can't process documents right now."
+_MSG_OFFLINE            = "I'd love to chat, but I am offline at the moment!"
+_MSG_NOT_YOUR_PROMPT    = "Sorry, this prompt is not for you!"
+_MSG_WIPE_PROMPT        = "ALL of my memories will be lost! Are you sure?"
+_MSG_WIPE_COMPLETE      = "Wipe complete. I hope you won't regret this..."
+_MSG_WIPE_CANCELLED     = "Wipe cancelled. Whew, you scared me for a moment!"
+_MSG_FORGET_PROMPT      = "Do you really want me to forget our memories together?"
+_MSG_FORGET_COMPLETE    = "Forget complete. Fresh start it is..."
+_MSG_FORGET_CANCELLED   = "Forget cancelled. Glad you changed your mind!"
 _SEARCH_TOOL = {
     "name": "search_messages",
@@ -409,6 +411,44 @@ class TelegramBot:
                 "from group conversation contexts. Use /private off to disable."
             )
+    async def _get_or_load_conversation(
+        self, chat_id: int, chat_type: str, chat_title: str | None, user_id: int,
+    ) -> Conversation:
+        """
+        Get the Conversation for chat_id, creating it if new, and load/refresh the user's context.
+        Creates a new Conversation keyed by chat_id on first use, refreshes its "Current date
+        and time" line, then loads the user's cross-chat history on first appearance this
+        session (get_past_interaction) or checks for new messages since the last load
+        (refresh_user_context).
+        Args:
+            chat_id: Telegram chat ID.
+            chat_type: 'private', 'group', or 'supergroup'.
+            chat_title: Chat title, or None for private chats.
+            user_id: Telegram user ID triggering this message.
+        Returns:
+            The active Conversation for this chat.
+        """
+        if chat_id not in self.conversations:
+            self.conversations[chat_id] = Conversation(
+                chat_id, chat_type, self.llm['prompt'], self.llm['chat_model'], chat_title,
+            )
+        conv = self.conversations[chat_id]
+        conv.update_datetime()
+        token_budget = floor(self.llm['prune_threshold'] / 2)
+        if user_id not in conv._context_cursor:
+            # First appearance of this user in this session - load their cross-chat history.
+            # Pass bot_id so private chats can also load shared group context.
+            # If no history exists yet, the cursor stays unset so the next message retries.
+            await conv.get_past_interaction(token_budget, user_id, self.telegram['bot_id'])
+        else:
+            # Already loaded - check for new cross-chat messages since last load.
+            await conv.refresh_user_context(user_id, token_budget)
+        return conv
     async def tele_handle_response(self, text: str, msg: Message) -> tuple[str, int | None]:
         """
         Primary function for handling any response including Generative AI, ensuring:
@@ -448,7 +488,7 @@ class TelegramBot:
         """
         # Starting ensures we get some kind of user account details for logging
         if not self._online:
-            return "I'd love to chat, but I am offline at the moment!", None
+            return _MSG_OFFLINE, None
         # Extract identity and context from the message
         user_id    = msg.from_user.id
@@ -461,31 +501,7 @@ class TelegramBot:
         identity = f"@{username}" if username else ' '.join(filter(None, [first_name, last_name]))
         logger.info(f"User {user_id} ({identity}) in {chat_type} Chat {chat_id}{f' ({chat_title})' if chat_title else ''}")
-        # For a new session, create a Conversation keyed by chat_id
-        if chat_id not in self.conversations:
-            self.conversations[chat_id] = Conversation(
-                chat_id,
-                chat_type,
-                self.llm['prompt'],
-                self.llm['chat_model'],
-                chat_title,
-            )
-        conv = self.conversations[chat_id]
-        # Refresh datetime on every message
-        conv.update_datetime()
-        token_budget = floor(self.llm['prune_threshold'] / 2)
-        if user_id not in conv._context_cursor:
-            # First appearance of this user in this session - load their cross-chat history.
-            # Pass bot_id so private chats can also load shared group context.
-            # If no history exists yet, the cursor stays unset so the next message retries.
-            await conv.get_past_interaction(token_budget, user_id, self.telegram['bot_id'])
-        else:
-            # Already loaded - check for new cross-chat messages since last load.
-            await conv.refresh_user_context(user_id, token_budget)
+        conv = await self._get_or_load_conversation(chat_id, chat_type, chat_title, user_id)
         # Surface the replied-to message into context before adding the triggering message.
         await self._surface_replied_to_message(msg, conv)
@@ -695,29 +711,32 @@ class TelegramBot:
             )
             await prune_bot_messages(msg.chat.id)
-    def _exclusive_foreign_mention(self, msg: Message) -> str | None:
+    def _exclusive_foreign_mention(self, msg: Message, caption: bool = False) -> str | None:
         """
         Return the first foreign @mention if all @mention entities are exclusively foreign.
         Used in the reply-to-bot path: when the user threads Kowi's message but addresses
         a different account via @mention, return that account's username so the caller can
         detect a redirect. Returns None when we are also @mentioned (co-mention),
-        when there are no @mention entities, or when msg.entities is absent.
+        when there are no @mention entities, or when the relevant entities list is absent.
         Args:
             msg: The incoming Telegram Message object.
+            caption: True to read msg.caption_entities (document captions) instead of
+                msg.entities (text messages).
         Returns:
             The first foreign @username string (with leading @) if all @mentions are
             foreign, or None if we are co-mentioned or no @mention entities exist.
         """
-        if not msg.entities:
+        entities = msg.caption_entities if caption else msg.entities
+        if not entities:
             return None
         our_username = self.telegram['username'].lower()
         first_foreign = None
-        for entity in msg.entities:
+        for entity in entities:
             if entity.type == MessageEntity.MENTION:
-                mentioned = msg.parse_entity(entity)
+                mentioned = msg.parse_caption_entity(entity) if caption else msg.parse_entity(entity)
                 if mentioned.lstrip('@').lower() == our_username:
                     return None
                 if first_foreign is None:
@@ -747,6 +766,90 @@ class TelegramBot:
         except (TelegramError, AttributeError):
             await msg.reply_text("Got it!")
+    async def _resolve_group_trigger(
+        self, msg: Message, text: str, context: ContextTypes.DEFAULT_TYPE, is_caption: bool = False,
+    ) -> str | None:
+        """
+        Apply group/supergroup trigger rules shared by text messages and document captions.
+        Checked in this order:
+          1. Exclusive foreign mention in a reply-to-foreign-bot thread - yields unconditionally,
+             taking precedence over @username, nickname/initials, and reply-to-bot signals.
+          2. @username mention - strips the mention from text; responds even if a foreign bot
+             is also @mentioned.
+          3. Nickname or initials mention - engages unconditionally.
+          4. Reply-to-bot - weaker signal; yields if exclusively addressed to another account
+             via @mention.
+          5. None of the above - not triggered.
+        Sends a read receipt via _send_read_receipt() before returning text for any triggered path.
+        Args:
+            msg: The Telegram Message in the group/supergroup chat.
+            text: msg.text or msg.caption - the content to evaluate for trigger words.
+            context: The Telegram context, passed through to _send_read_receipt().
+            is_caption: True when text is a document caption, so @mention entities are
+                read from msg.caption_entities instead of msg.entities.
+        Returns:
+            The text to process (username mention stripped if matched), or None if untriggered.
+        """
+        is_reply_to_bot = (
+            msg.reply_to_message is not None and
+            msg.reply_to_message.from_user is not None and
+            msg.reply_to_message.from_user.id == self.telegram['bot_id']
+        )
+        is_reply_to_foreign_bot = (
+            msg.reply_to_message is not None and
+            msg.reply_to_message.from_user is not None and
+            msg.reply_to_message.from_user.is_bot and
+            msg.reply_to_message.from_user.id != self.telegram['bot_id']
+        )
+        if is_reply_to_foreign_bot and self._exclusive_foreign_mention(msg, is_caption):
+            return None
+        if exact_word_match(self.telegram['username'], text):
+            pattern = r'@?\b' + re.escape(self.telegram['username']) + r'\b'
+            text = re.sub(pattern, '', text).strip()
+        elif (
+            exact_word_match(self.telegram['nickname'], text) or
+            exact_word_match(self.telegram['initials'], text)
+        ):
+            pass
+        elif is_reply_to_bot:
+            if self._exclusive_foreign_mention(msg, is_caption):
+                return None
+        else:
+            return None
+        await self._send_read_receipt(msg, context)
+        return text
+    async def _send_chunked_reply(
+        self, msg: Message, text: str, conv: Conversation | None = None, assistant_db_id: int | None = None,
+    ) -> None:
+        """
+        Split text into Telegram-sized chunks and send them sequentially.
+        Tracks each sent chunk's Telegram message ID in conv._loaded_message_ids (if conv is
+        given) and persists the last chunk's ID to assistant_db_id via update_message_tg_id
+        (if given), so cross-session tier 2 dedup can find the bot's own reply. A small pause
+        between sends reduces risk of Telegram rate limiting.
+        Args:
+            msg: The Telegram Message to reply to.
+            text: The full response text to chunk and send.
+            conv: The active Conversation, if any, for in-session message ID tracking.
+            assistant_db_id: The DB row id of the stored assistant message, if any.
+        """
+        chunk_length = MessageLimit.MAX_TEXT_LENGTH - 1
+        chunks = [text[i:i+chunk_length] for i in range(0, len(text), chunk_length)]
+        for chunk in chunks:
+            sent = await msg.reply_text(chunk)
+            if sent:
+                if conv:
+                    conv._loaded_message_ids.add(sent.message_id)
+                if assistant_db_id:
+                    await update_message_tg_id(assistant_db_id, sent.message_id)
+            await asyncio.sleep(0.12)  # small pause to reduce risk of rate limiting
     async def tele_handle_message(self, update: Update, context: ContextTypes.DEFAULT_TYPE):
         """
         Route incoming Telegram messages to appropriate handlers based on chat type and trigger conditions.
@@ -804,66 +907,114 @@ class TelegramBot:
         response = _MSG_PROCESS_ERROR
         assistant_db_id = None
         if chat.type == 'supergroup' or chat.type == 'group':
-            is_reply_to_bot = (
-                msg.reply_to_message is not None and
-                msg.reply_to_message.from_user is not None and
-                msg.reply_to_message.from_user.id == self.telegram['bot_id']
-            )
-            is_reply_to_foreign_bot = (
-                msg.reply_to_message is not None and
-                msg.reply_to_message.from_user is not None and
-                msg.reply_to_message.from_user.is_bot and
-                msg.reply_to_message.from_user.id != self.telegram['bot_id']
-            )
-            # In a foreign-bot reply thread, an exclusive @mention of another account
-            # takes absolute precedence over any nickname/initials match in the text.
-            if is_reply_to_foreign_bot and self._exclusive_foreign_mention(msg):
-                return
-            if exact_word_match(self.telegram['username'], msg.text):
-                # Explicit @username mention: strongest signal - respond even if another
-                # bot is also @mentioned (both may be intentionally addressed).
-                pattern = r'@?\b' + re.escape(self.telegram['username']) + r'\b'
-                new_text = re.sub(pattern, '', msg.text).strip()
-                await self._send_read_receipt(msg, context)
-                response, assistant_db_id = await self.tele_handle_response(new_text, msg)
-            elif (
-                exact_word_match(self.telegram['nickname'], msg.text) or
-                exact_word_match(self.telegram['initials'], msg.text)
-            ):
-                # Nickname/initials: always engage - no reliable way to distinguish
-                # our name as addressee vs topic from text position alone.
-                await self._send_read_receipt(msg, context)
-                response, assistant_db_id = await self.tele_handle_response(msg.text, msg)
-            elif is_reply_to_bot:
-                # Reply-to-bot: weaker signal - yield silently if the message is
-                # exclusively addressed to a foreign account via @mention.
-                if self._exclusive_foreign_mention(msg):
-                    return
-                await self._send_read_receipt(msg, context)
-                response, assistant_db_id = await self.tele_handle_response(msg.text, msg)
-            else:
+            triggered_text = await self._resolve_group_trigger(msg, msg.text, context)
+            if triggered_text is None:
                 return
+            response, assistant_db_id = await self.tele_handle_response(triggered_text, msg)
         elif chat.type == 'private':
             response, assistant_db_id = await self.tele_handle_response(msg.text, msg)
         else:
             return
-        # Split into smaller chunks since Telegram messages have a maximum text length (likely 4096)
-        chunk_length = MessageLimit.MAX_TEXT_LENGTH - 1
-        chunks = [response[i:i+chunk_length] for i in range(0, len(response), chunk_length)]
+        # Persist this chunk's Telegram message ID on the assistant row. Each call overwrites
+        # the previous, so only the last chunk's ID is retained for cross-session tier 2 dedup.
+        # Tier 1 covers all chunks in-session via _loaded_message_ids.
         conv = self.conversations.get(msg.chat.id)
-        for chunk in chunks:
-            sent = await msg.reply_text(chunk)
-            if sent:
-                if conv:
-                    conv._loaded_message_ids.add(sent.message_id)
-                # Persist this chunk's Telegram message ID on the assistant row.
-                # Each call overwrites the previous, so only the last chunk's ID is
-                # retained for cross-session tier 2 dedup. Tier 1 covers all chunks
-                # in-session via _loaded_message_ids.
-                if assistant_db_id:
-                    await update_message_tg_id(assistant_db_id, sent.message_id)
-            await asyncio.sleep(0.12)  # small pause to reduce risk of rate limiting
+        await self._send_chunked_reply(msg, response, conv, assistant_db_id)
+    async def tele_handle_document(self, update: Update, context: ContextTypes.DEFAULT_TYPE):
+        """
+        Route Telegram document messages through the document summarisation pipeline.
+        Group trigger conditions (caption @mention, nickname/initials match, or reply-to-bot)
+        are resolved via the shared _resolve_group_trigger() also used by tele_handle_message,
+        including the exclusive-foreign-mention yield on reply-to-bot threads. Silently ignores
+        documents in channels, edited messages, and in groups/supergroups where no trigger
+        condition matched. Once triggered, respects the same global online/offline gate as
+        tele_handle_response() (set via /start, /stop) - replies with the offline message
+        rather than processing while offline. When document_processing is disabled in config,
+        replies with a friendly message instead of processing - but only when the message was
+        otherwise triggered (private chat, or a matched group trigger); untriggered group
+        documents still yield silently regardless of the flag. Files over 20 MB receive a
+        friendly error before download.
+        The user message stored in DB is '[Document: filename] caption'; document
+        bytes are never persisted. Respects is_private for cross-chat context isolation.
+        Args:
+            update: The Telegram Update containing the document message.
+            context: The Telegram context for downloading files and sending replies.
+        """
+        validated = await self.tele_validate(update)
+        if not validated:
+            return
+        (msg, chat, user) = validated
+        caption = msg.caption or ''
+        if chat.type in ('group', 'supergroup'):
+            triggered_caption = await self._resolve_group_trigger(msg, caption, context, is_caption=True)
+            if triggered_caption is None:
+                return
+            caption = triggered_caption
+        elif chat.type != 'private':
+            return
+        if not self._online:
+            await msg.reply_text(_MSG_OFFLINE)
+            return
+        if not self.llm.get('document_processing', True):
+            await msg.reply_text(_MSG_DOC_PROCESSING_OFF)
+            return
+        if msg.document.file_size and msg.document.file_size > 20_000_000:
+            await msg.reply_text("That file is too large for me to read - please keep it under 20 MB.")
+            return
+        chat_id   = chat.id
+        chat_type = chat.type
+        user_id   = user.id
+        username   = user.username
+        first_name = user.first_name or ''
+        last_name  = user.last_name or ''
+        conv = await self._get_or_load_conversation(chat_id, chat_type, chat.title, user_id)
+        user_private_mode = await get_private_mode(user_id)
+        is_private = (chat_type == 'private') and user_private_mode
+        filename  = msg.document.file_name or 'document'
+        user_text = f"[Document: {filename}] {caption}".strip() if caption else f"[Document: {filename}]"
+        user_msg_id = await conv.add_user_message(
+            user_text, user_id, username, first_name, last_name, is_private, msg.message_id
+        )
+        await msg.reply_text("Sure, give me a moment to read that...")
+        doc_file   = await context.bot.get_file(msg.document.file_id)
+        file_bytes = bytes(await doc_file.download_as_bytearray())
+        mime_type  = msg.document.mime_type or ''
+        reply = await handle_document_message(
+            file_bytes, mime_type, filename, caption, self.llm['url_model'], conv.system_content
+        )
+        assistant_db_id = await conv.add_assistant_message(
+            reply,
+            self.telegram['bot_id'],
+            self.telegram['username'],
+            self.telegram['first_name'],
+            self.telegram['last_name'],
+            is_private,
+            user_msg_id,
+        )
+        token_count = await conv.get_message_token_count()
+        if token_count > self.llm['prune_threshold']:
+            await conv.prune_conversation(self.llm['prune_back_to'])
+        await self._send_chunked_reply(msg, reply, conv, assistant_db_id)
     async def tele_validate(self, update: Update) -> tuple[Message, Chat, User] | None:
         """
@@ -1097,8 +1248,9 @@ class TelegramBot:
         token_limit    = INIT_BOT_CONFIG['token_limit'],
         search_limit   = INIT_BOT_CONFIG['search_limit'],
         persona_temp   = INIT_BOT_CONFIG['persona_temp'],
-        archive_days   = INIT_BOT_CONFIG['archive_days'],
-        persona_prompt = INIT_BOT_CONFIG['persona_prompt'],
+        archive_days        = INIT_BOT_CONFIG['archive_days'],
+        document_processing = INIT_BOT_CONFIG['document_processing'],
+        persona_prompt      = INIT_BOT_CONFIG['persona_prompt'],
         key_status: ApiKeyStatus | None = None,
         instance_name: str | None = None,
         webhook_schemas: list | None = None,
@@ -1180,6 +1332,7 @@ class TelegramBot:
         self.telegram['app'].add_handler(CommandHandler('private', self.tele_private_command))
         self.telegram['app'].add_handler(MessageHandler(filters.COMMAND, self.tele_unknown_command))
         self.telegram['app'].add_handler(MessageHandler(filters.TEXT & ~filters.UpdateType.EDITED_MESSAGE, self.tele_handle_message))
+        self.telegram['app'].add_handler(MessageHandler(filters.Document.ALL & ~filters.UpdateType.EDITED_MESSAGE, self.tele_handle_document))
         self.telegram['app'].add_error_handler(self.tele_error)
         # Validate optional config values before storing; warn and fall back to defaults on bad input
@@ -1199,14 +1352,15 @@ class TelegramBot:
         # Get our LLM spun up with defaults if not defined by user input
         # Tokens as integers measure the length of conversation messages
         self.llm = {
-            'prompt'       : persona_prompt,
-            'chat_model'   : chat_model,
-            'url_model'    : url_model,
-            'token_limit'  : token_limit or TokenLimits(chat_model).max_tokens(),
-            'search_limit' : search_limit or 30,
-            'temperature'  : persona_temp or 1.0,
-            'top_p'        : 0.9,
-            'archive_days' : archive_days if archive_days is not None else 60,
+            'prompt'              : persona_prompt,
+            'chat_model'          : chat_model,
+            'url_model'           : url_model,
+            'token_limit'         : token_limit or TokenLimits(chat_model).max_tokens(),
+            'search_limit'        : search_limit or 30,
+            'temperature'         : persona_temp or 1.0,
+            'top_p'               : 0.9,
+            'archive_days'        : archive_days if archive_days is not None else 60,
+            'document_processing' : document_processing if document_processing is not None else True,
         }
         # Set a rounded-down integer to prune a lengthy conversation by 500 tokens
         # Note if the upper limit is below 500, the lower limit is set to 0
@@ -1291,8 +1445,9 @@ class TelegramBot:
             token_limit     = config['token_limit'],
             search_limit    = config['search_limit'],
             persona_temp    = config['persona_temp'],
-            archive_days    = config['archive_days'],
-            persona_prompt  = prompt,
+            archive_days        = config['archive_days'],
+            document_processing = config.get('document_processing'),
+            persona_prompt      = prompt,
             key_status      = key_status,
             instance_name   = config['instance_name'],
             webhook_schemas = webhook_schemas,

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot/initialize.py RENAMED

@@ -113,6 +113,7 @@ INIT_BOT_CONFIG = {
     'persona_temp': None,
     'archive_days': None,
     'allow_local_webhooks': None,
+    'document_processing': None,
     'persona_prompt': 'You are a generic test bot powered by a user-configured LLM.'
 }
@@ -123,6 +124,7 @@ INIT_BOT_CONFIG_COMMENTS = {
     'persona_temp': '# Optional, LLM temperature 0.0-2.0 (default: model\'s default)',
     'archive_days': '# Optional, days before messages are eligible for Tier 1 archival (default: 60, min: 1). Tier 2 triggers at 2x this value.',
     'allow_local_webhooks': '# Optional, set to true to permit webhook/MCP URLs targeting loopback or link-local addresses (default: false)',
+    'document_processing': '# Optional, set to false to disable document summarisation (default: true)',
 }
 # Append the framework-owned system appendix to the persona prompt.
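
Reviewer note: `document_processing` defaults to `None` here and is resolved in TeLLMgramBot.py with `document_processing if document_processing is not None else True`, rather than with `or` as `search_limit` is. The sketch below shows why that distinction matters for a boolean option; both function names are hypothetical.

```python
from typing import Optional


def resolve_flag(value: Optional[bool], default: bool = True) -> bool:
    """Hypothetical helper: fall back to default only when the value is unset (None)."""
    return value if value is not None else default


def resolve_flag_with_or(value: Optional[bool], default: bool = True) -> bool:
    """Counter-example: `or` treats an explicit False as unset."""
    return value or default


print(resolve_flag(False))          # explicit False is respected
print(resolve_flag_with_or(False))  # explicit False is silently replaced by the default
```

With a default of true, the `or` form would make it impossible to disable the feature from config.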

tellmgrambot-3.15.0/TeLLMgramBot/message_handlers.py ADDED

@@ -0,0 +1,316 @@
+# Handles incoming messages and URLs unique for TeLLMgramBot
+import io
+import logging
+import re
+from pathlib import Path
+from typing import Optional
+from charset_normalizer import from_bytes as _cn_from_bytes
+import defusedxml.ElementTree as _defusedxml_ET
+import pypdf
+from .utils import log_error
+from .models import TokenLimits
+from .web_utils import (
+    fetch_url,
+    strip_html_markup,
+    InvalidURLException,
+    InsecureURLException,
+    SusURLException,
+)
+from .providers.factory import get_provider
+logger = logging.getLogger(__name__)
+_URL_ANALYSIS_TEMPLATE = (
+    "## URL Analysis\n"
+    "The user has provided a URL to perform some level of analysis. You will infer "
+    "the nature of the analysis from the user's query.\n\n"
+    "The contents of the URL mentioned have already been harvested and cleansed. "
+    "Note the URL contents will likely have sections of text that are less relevant "
+    "to the user's question (headers, footers, menus, ads, etc.). You will need to "
+    "ignore those sections of text and focus on the main content of the page.\n\n"
+    "The contents of the URL are shown below:\n"
+    "BEGIN URL CONTENTS\n"
+    "{content}\n"
+    "END URL CONTENTS\n"
+)
+_DOCUMENT_ANALYSIS_TEMPLATE = (
+    "## Document Analysis\n"
+    "The user has shared a document for analysis. Infer the nature of the analysis "
+    "from the user's caption or question. If no specific question is provided, "
+    "summarise the document's main content and key points.\n\n"
+    "The document contents are shown below:\n"
+    "BEGIN DOCUMENT CONTENTS\n"
+    "{content}\n"
+    "END DOCUMENT CONTENTS\n"
+)
+_PLAIN_TEXT_EXTENSIONS = frozenset({
+    '.txt', '.md', '.rst', '.csv', '.tsv', '.json', '.jsonl', '.xml',
+    '.html', '.htm', '.yaml', '.yml', '.toml', '.ini', '.cfg', '.conf',
+    '.log', '.py', '.js', '.ts', '.sh', '.bash', '.rb', '.go', '.rs',
+    '.java', '.c', '.cpp', '.h', '.cs', '.php', '.sql', '.r', '.tex',
+})
+_HTML_MIMES = frozenset({'text/html', 'application/xhtml+xml'})
+_XML_MIMES  = frozenset({'text/xml', 'application/xml'})
+_PDF_PAGE_CAP = 100
+def handle_greetings(text: str) -> Optional[str]:
+    """
+    Respond quickly with single-word greetings like these examples:
+    - ' hello ' -> 'Hello!'
+    - 'Hey...?' -> 'Hey!'
+    - 'SUP?!?!' -> 'Sup!'
+    """
+    greetings = {'Hello', 'Hi', 'Hey', 'Heya', 'Sup', 'Yo'}
+    word = re.sub(r'[^\w]', '', text.title().strip())
+    if word in greetings:
+        return f"{word}!"
+    return None
+def handle_common_queries(text: str) -> Optional[str]:
+    """
+    Send messages for assistant bot to respond quickly with some example phrases:
+    - ' How you doing ' -> 'How YOU doin?'
+    - 'What's up!' -> 'Wassup?'
+    """
+    phrase = re.sub(r'[^\w]', '', text.lower().strip())
+    if phrase.startswith('howyoudoin'):
+        return 'How YOU doin?'
+    elif phrase == 'wassup' or phrase == 'whatup' or phrase == 'whatsup':
+        return 'Wassup?'
+    return None
+def _decode_bytes(raw: bytes) -> tuple:
+    """
+    Decode raw bytes to a string via UTF-8 -> charset-normalizer -> Latin-1 chain.
+    Returns:
+        Tuple of (decoded_text, encoding). encoding is '' for UTF-8 (no annotation needed).
+    """
+    try:
+        return raw.decode('utf-8'), ''
+    except UnicodeDecodeError:
+        pass
+    result = _cn_from_bytes(raw).best()
+    if result is not None:
+        return str(result), result.encoding
+    return raw.decode('latin-1'), 'ISO-8859-1'
+def _extract_document_text(file_bytes: bytes, mime_type: str, filename: str) -> tuple:
+    """
+    Extract plain text from document bytes, routing by MIME type and file extension.
+    PDF text is extracted via pypdf (capped at _PDF_PAGE_CAP pages, strict=False).
+    HTML content has tags stripped via strip_html_markup. XML is safely parsed via
+    defusedxml to extract text nodes without XXE risk; falls back to plain-text
+    decode if the XML is malformed. All other plain-text types are decoded using
+    a UTF-8 -> charset-normalizer -> Latin-1 chain; non-UTF-8 files prepend a
+    [File encoding: ...] annotation so the LLM has context.
+    Args:
+        file_bytes: Raw document bytes downloaded from Telegram.
+        mime_type: MIME type reported by Telegram (may be empty string).
+        filename: Original filename used for extension-based routing (may be empty).
+    Returns:
+        Tuple of (text, error). On success text is the extracted content and error
+        is None. On failure text is None and error is the user-facing response string.
+    """
+    mime = (mime_type or '').lower()
+    ext  = Path(filename).suffix.lower() if filename else ''
+    is_pdf      = mime == 'application/pdf' or ext == '.pdf'
+    is_html     = ext in ('.html', '.htm') or mime in _HTML_MIMES
+    is_xml      = ext == '.xml' or mime in _XML_MIMES
+    is_plain    = mime.startswith('text/') or ext in _PLAIN_TEXT_EXTENSIONS
+    if is_pdf:
+        try:
+            reader = pypdf.PdfReader(io.BytesIO(file_bytes), strict=False)
+            text = '\n'.join(
+                page.extract_text() or '' for page in reader.pages[:_PDF_PAGE_CAP]
+            )
+            if not text.strip():
+                return None, "This PDF appears to be image-only; I can't read the text in it."
+            return text, None
+        except Exception as e:
+            log_error(e, 'PDF')
+            return None, "Something went wrong while reading that PDF. Please try again."
+    if is_html:
+        raw_text, _ = _decode_bytes(file_bytes)
+        return strip_html_markup(raw_text), None
+    if is_xml:
+        try:
+            root = _defusedxml_ET.fromstring(file_bytes)
+            xml_text = ' '.join(root.itertext()).strip()
+            return xml_text or _decode_bytes(file_bytes)[0], None
+        except Exception:
+            text, encoding = _decode_bytes(file_bytes)
+            if encoding:
+                text = f"[File encoding: {encoding}]\n{text}"
+            return text, None
+    if is_plain:
+        text, encoding = _decode_bytes(file_bytes)
+        if encoding:
+            text = f"[File encoding: {encoding}]\n{text}"
+        return text, None
+    return None, "I can only read plain text and PDF files right now."
+async def summarise_text(
+    content: str,
+    question: str,
+    model: str,
+    template: str,
+    prompt: str = '',
+) -> str:
+    """
+    Token-prune content, apply template, and complete via the LLM.
+    Prunes content so the fully composed system message (prompt + template with content
+    substituted) fits within the model's token budget (max_tokens - 500), then calls the LLM.
+    Token counting is measured against the composed message at every pruning step, not just
+    the raw content, so the budget guarantee matches what is actually sent to the provider -
+    a large template or persona prompt is accounted for, not just the content itself. The
+    template must contain a {content} placeholder.
+    Args:
+        content: Text content to summarise (URL body or document text).
+        question: The user's message or caption; used as the LLM user turn.
+        model: LLM model name.
+        template: System prompt template with a {content} placeholder.
+        prompt: Bot persona prompt prepended to the composed system message.
+    Returns:
+        LLM response string. Appends a truncation note when content was pruned.
+        Returns a user-friendly error string on LLM failure.
+    """
+    def _compose(c: str) -> str:
+        system = template.replace('{content}', c)
+        return f"{prompt}\n\n{system}" if prompt else system
+    working_content = content
+    messages = [
+        {"role": "system", "content": _compose(working_content)},
+        {"role": "user",   "content": question},
+    ]
+    lengthy = False
+    pruned_tail = ''
+    try:
+        token_model = TokenLimits(model)
+        token_count = await token_model.num_tokens_from_messages(messages)
+        token_limit = token_model.max_tokens() - 500
+        if token_count > token_limit:
+            lengthy = True
+            while token_count > token_limit and working_content:
+                head, _, _ = working_content.rpartition(' ')
+                # No space left to split on (minified JSON, base64, one long token) - halve
+                # the string instead so each iteration still guarantees progress toward 0.
+                working_content = head if head else working_content[:len(working_content) // 2]
+                # Re-measure the full composed system message (template + prompt), not just
+                # the raw content, so the token budget matches what is actually sent.
+                messages[0]["content"] = _compose(working_content)
+                token_count = await token_model.num_tokens_from_messages(messages)
+            pruned_tail = working_content[-50:]
+        response = await get_provider(model).complete(model, messages)
+    except Exception as e:
+        log_error(e, model)
+        return "Something went wrong while processing the content. Please try again later."
+    if lengthy:
+        response += (
+            "\n\n*NOTE*: The content was too long and needed to be pruned for my summary."
+            f" If the text after \"{pruned_tail}\" is crucial, insert the rest for me."
+        )
+    return response
+async def handle_url_ask(text: str, model: str = 'gpt-4o', prompt: str = '') -> Optional[str]:
+    """
+    Process URL content in an LLM to provide a summary.
+    Extracts URLs wrapped in square brackets [], validates them, checks for safety via VirusTotal,
+    fetches content, and summarizes via an LLM specified by model name. The bot's persona prompt
+    is prepended to the URL analysis system message so responses match the bot's personality.
+    Args:
+        text: The message text potentially containing a URL in [square brackets].
+        model: The LLM model to use for URL summarization (default: 'gpt-4o').
+        prompt: Bot persona prompt prepended to the URL analysis system message.
+    Returns:
+        A summary string if a URL was found and processed successfully, an error
+        message string if processing failed, or None if no URL detected in text.
+    """
+    url_match = re.search(r'\[http(s)?://\S+]', text.strip())
+    if url_match:
+        url = url_match.group()[1:-1]
+        try:
+            url_content = strip_html_markup(await fetch_url(url))
+            return await summarise_text(url_content, text, model, _URL_ANALYSIS_TEMPLATE, prompt)
+        except InvalidURLException as e:
+            log_error(e, 'URL')
+            return "The URL you provided appears to be invalid. Could you please check it and try again?"
+        except InsecureURLException:
+            return (
+                "The URL you provided is not secure. Could you please try another URL, "
+                "or just pasting the relevant content here?"
+            )
+        except SusURLException:
+            return (
+                "The URL you provided is potentially unsafe, based on my internal scans. "
+                "You can check the safety of URLS using this site: "
+                "https://www.virustotal.com/gui/home/url"
+            )
+        except Exception as e:
+            log_error(e, 'URL')
+            return f"Something went wrong while fetching the URL: {e}"
+    return None
+async def handle_document_message(
+    file_bytes: bytes,
+    mime_type: str,
+    filename: str,
+    caption: str,
+    model: str,
+    prompt: str = '',
+) -> str:
+    """
+    Extract text from a document and summarise it via the LLM.
+    Routes by MIME type and file extension via _extract_document_text(), then
+    feeds the extracted text through summarise_text() with the document template.
+    Logs only the filename, MIME type, and file size - never file content.
+    Args:
+        file_bytes: Raw document bytes downloaded from Telegram.
+        mime_type: MIME type reported by Telegram (may be empty string).
+        filename: Original filename for extension-based routing.
+        caption: User's caption or question; used as the LLM user turn.
+        model: LLM model to use for summarisation.
+        prompt: Bot persona prompt prepended to the system message.
+    Returns:
+        LLM response string, or a user-facing error message string.
+    """
+    logger.info(
+        "Document: name=%s mime=%s size=%d",
+        filename or 'unknown', mime_type or 'unknown', len(file_bytes),
+    )
+    text, error = _extract_document_text(file_bytes, mime_type, filename)
+    if error:
+        return error
+    question = caption or 'Please summarise this document.'
+    return await summarise_text(text, question, model, _DOCUMENT_ANALYSIS_TEMPLATE, prompt)
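
The decoding fallback in `_decode_bytes` and the `[File encoding: ...]` annotation can be tried in isolation. The sketch below is a reduced version that runs on the standard library alone: it skips the charset-normalizer step from the released code and goes straight from UTF-8 to Latin-1. The names `decode_bytes` and `annotate` are illustrative, not part of the package.

```python
from typing import Tuple


def decode_bytes(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding); encoding is '' when the bytes are valid UTF-8."""
    try:
        return raw.decode('utf-8'), ''
    except UnicodeDecodeError:
        pass
    # The released code tries charset-normalizer here; this sketch skips that step.
    # Latin-1 maps every byte to a code point, so this decode cannot fail.
    return raw.decode('latin-1'), 'ISO-8859-1'


def annotate(text: str, encoding: str) -> str:
    """Prefix non-UTF-8 text with the encoding note used in the diff."""
    return f"[File encoding: {encoding}]\n{text}" if encoding else text


utf8_result = annotate(*decode_bytes('café'.encode('utf-8')))
legacy_result = annotate(*decode_bytes('café'.encode('latin-1')))
```

Because the final Latin-1 step accepts any byte sequence, the chain always returns text. The trade-off is that a file in some other legacy encoding may come out as mojibake when no better guess is available.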

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot/tools.py RENAMED

@@ -330,7 +330,7 @@ async def discover_mcp_tools(
         raw_headers = entry.get('headers') or {}
         if not isinstance(raw_headers, dict):
-            logger.warning(f"MCP server '{server_url}': 'headers' must be a dict; treating as empty.")
+            logger.warning(f"MCP server '{log_url}': 'headers' must be a dict; treating as empty.")
             raw_headers = {}
         expanded_headers = {}
         disabled = False
@@ -431,7 +431,7 @@ async def discover_mcp_tools(
             all_registered.add(tool_name)
             server_count += 1
-        logger.info(f"MCP server '{server_url}': registered {server_count} tool(s).")
+        logger.info(f"MCP server '{log_url}': registered {server_count} tool(s).")
     return schemas, defs

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: TeLLMgramBot
-Version: 3.14.2
+Version: 3.15.0
 Summary: LLM-powered Telegram bot (OpenAI + Anthropic)
 Home-page: https://github.com/Digital-Heresy/TeLLMgramBot
 Author: Digital Heresy
@@ -19,6 +19,9 @@ Requires-Dist: tiktoken>=0.12
 Requires-Dist: python-telegram-bot>=20.8
 Requires-Dist: aiosqlite>=0.19
 Requires-Dist: tzdata>=2025.2
+Requires-Dist: pypdf>=6.0
+Requires-Dist: defusedxml>=0.7
+Requires-Dist: charset-normalizer>=3.0
 Dynamic: author
 Dynamic: author-email
 Dynamic: description
@@ -41,6 +44,10 @@ The basic goal of this project is to create a bridge between a Telegram Bot and
 * Pass URLs in [square brackets] and mention how the bot should interpret them.
   * Example: "What do you think of this article? [https://some_site/article]"
   * Uses a separate model (configurable via `url_model`) to handle larger URL content.
+* Share documents and text files for analysis and summarisation.
+  * Supported formats: PDF, plain-text files (.txt, .md, .rst, .csv, .json, etc.), HTML, and XML.
+  * The bot extracts and summarises content, with automatic encoding detection for non-UTF-8 files. Files over 20 MB are rejected.
+  * Can be disabled via `document_processing: false` in config.
 * Ask questions about message history across all your chats using natural language; the bot will search, attribute messages to speakers, and include messages from other bots.
   * Example: "Who said thanks for the breakdown?" or "What did George say about the project?" or "Show me the last few messages."
   * All search filters (speaker, chat, date) are optional. Results are ordered most-recent-first. Configure `search_limit` to control how many results to return (default: 30).
@@ -157,6 +164,7 @@ When the bot is triggered in a group and about to respond (not deferring to anot
    - `token_limit`: Max tokens (optional; defaults to model's maximum)
    - `search_limit`: Max search results (optional; defaults to 30)
    - `archive_days`: Days before messages are eligible for archival (optional; default 60, minimum 1). Older messages are distilled into daily summaries, then progressively compressed into monthly digests. Once archived their respective raw messages do not return to the LLM context any more, only when searching messages.
+   - `document_processing`: Optional bool (default: true). Set to false to disable document and text file summarisation.
    - `allow_local_webhooks`: Set to `true` to permit webhook/MCP URLs targeting loopback or link-local addresses (optional; default `false`). Useful when tools like Home Assistant run on the same host.
    - `tools`: Optional list of webhook and MCP tool definitions (admin-only, private chat only). See [docs/tools.md](docs/tools.md) for schema and examples.
 4. **Disable group privacy mode in BotFather:**

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot.egg-info/requires.txt RENAMED

@@ -8,3 +8,6 @@ tiktoken>=0.12
 python-telegram-bot>=20.8
 aiosqlite>=0.19
 tzdata>=2025.2
+pypdf>=6.0
+defusedxml>=0.7
+charset-normalizer>=3.0

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/setup.py RENAMED

@@ -5,7 +5,7 @@ with open("README.md", "r") as fh:
 setup(
     name='TeLLMgramBot',
-    version='3.14.2',
+    version='3.15.0',
     packages=find_packages(),
     license='MIT',
     author='Digital Heresy',
@@ -24,7 +24,10 @@ setup(
         'tiktoken>=0.12',
         'python-telegram-bot>=20.8',
         'aiosqlite>=0.19',
-        'tzdata>=2025.2'
+        'tzdata>=2025.2',
+        'pypdf>=6.0',
+        'defusedxml>=0.7',
+        'charset-normalizer>=3.0',
     ],
     python_requires='>=3.10'
 )

tellmgrambot-3.14.2/TeLLMgramBot/message_handlers.py DELETED

@@ -1,153 +0,0 @@
-# Handles incoming messages and URLs unique for TeLLMgramBot
-import re
-from typing import Optional
-import validators
-from .utils import log_error
-from .models import TokenLimits
-from .web_utils import (
-    fetch_url,
-    strip_html_markup,
-    InvalidURLException,
-    InsecureURLException,
-    SusURLException,
-)
-from .providers.factory import get_provider
-_URL_ANALYSIS_TEMPLATE = (
-    "## URL Analysis\n"
-    "The user has provided a URL to perform some level of analysis. You will infer "
-    "the nature of the analysis from the user's query.\n\n"
-    "The contents of the URL mentioned have already been harvested and cleansed. "
-    "Note the URL contents will likely have sections of text that are less relevant "
-    "to the user's question (headers, footers, menus, ads, etc.). You will need to "
-    "ignore those sections of text and focus on the main content of the page.\n\n"
-    "The contents of the URL are shown below:\n"
-    "BEGIN URL CONTENTS\n"
-    "{url_content}\n"
-    "END URL CONTENTS\n"
-)
-def handle_greetings(text: str) -> Optional[str]:
-    """
-    Respond quickly with single-word greetings like these examples:
-    - ' hello ' -> 'Hello!'
-    - 'Hey...?' -> 'Hey!'
-    - 'SUP?!?!' -> 'Sup!'
-    """
-    greetings = {'Hello', 'Hi', 'Hey', 'Heya', 'Sup', 'Yo'}
-    word = re.sub(r'[^\w]', '', text.title().strip())
-    if word in greetings:
-        return f"{word}!"
-    return None
-def handle_common_queries(text: str) -> Optional[str]:
-    """
-    Send messages for assistant bot to respond quickly with some example phrases:
-    - ' How you doing ' -> 'How YOU doin?'
-    - 'What's up!' -> 'Wassup?'
-    """
-    phrase = re.sub(r'[^\w]', '', text.lower().strip())
-    if phrase.startswith('howyoudoin'):
-        return 'How YOU doin?'
-    elif phrase == 'wassup' or phrase == 'whatup' or phrase == 'whatsup':
-        return 'Wassup?'
-    return None
-async def handle_url_ask(text: str, model: str = 'gpt-4o', prompt: str = '') -> Optional[str]:
-    """
-    Process URL content in an LLM to provide a summary.
-    Extracts URLs wrapped in square brackets [], validates them, checks for
-    safety via VirusTotal, fetches content, and summarizes via an LLM specified
-    by model name. The bot's persona prompt is prepended to the URL analysis
-    system message so responses match the bot's personality.
-    Args:
-        text: The message text potentially containing a URL in [square brackets].
-        model: The LLM model to use for URL summarization (default: 'gpt-4o').
-        prompt: Bot persona prompt prepended to the URL analysis system message.
-    Returns:
-        A summary string if a URL was found and processed successfully, an error
-        message string if processing failed, or None if no URL detected in text.
-    Raises:
-        No exceptions are raised; all errors are caught and logged, returning
-        user-friendly error messages instead.
-    """
-    url_match = re.search(r'\[http(s)?://\S+]', text.strip())
-    if url_match:
-        # Extract the URL from the message, but not the square brackets
-        url = url_match.group()[1:-1]
-        # Fetch the URL content
-        try:
-            # The function strips the HTML markup and ensures the URL is valid and safe
-            url_content = strip_html_markup(await fetch_url(url))
-            # Check if the URL is valid real quick
-            if not validators.url(url):
-                raise InvalidURLException(f"Invalid URL parsed by message_handlers.handle_url_ask(): {url}")
-            # Build messages:
-            # 1. URL content to be added into the system prompt template
-            # 2. User message requesting URL in [square brackets]
-            messages = [
-                {"role": "system", "content": url_content},
-                {"role": "user", "content": text}
-            ]
-            # Consider the maximum amount of tokens a LLM can support.
-            # If the URL content is too big, we need to prune it down to a reasonable size.
-            # Let's also reserve 500 tokens for prompt and response.
-            lengthy_url = False
-            pruned_tail = ''
-            token_model = TokenLimits(model)
-            token_count = await token_model.num_tokens_from_messages(messages)
-            token_limit = token_model.max_tokens() - 500
-            if token_count > token_limit:
-                lengthy_url = True
-                while token_count > token_limit:
-                    # Remove every last word until the token limit is satisfied
-                    messages[0]["content"] = messages[0]["content"].rsplit(' ', 1)[0]
-                    token_count = await token_model.num_tokens_from_messages(messages)
-                # Show the last 50 characters of the pruned URL content
-                pruned_tail = messages[0]["content"][-50:]
-            # Build system message: bot persona (if any) + URL analysis template with content
-            url_system = _URL_ANALYSIS_TEMPLATE.replace('{url_content}', messages[0]["content"])
-            messages[0]["content"] = f"{prompt}\n\n{url_system}" if prompt else url_system
-            # Call the LLM for the response that summarizes URL content
-            try:
-                response = await get_provider(model).complete(model, messages)
-            except Exception as e:
-                log_error(e, f"{model} URL")
-                return "Something went wrong while fetching the URL. Please try again later."
-            # If the URL content was too long, let the user know
-            if lengthy_url:
-                response += ("\n\n"
-                    "*NOTE*: The URL content was too long and needed to be pruned for my summary."
-                    f" If the text after \"{pruned_tail}\" is crucial, insert the rest for me."
-                )
-            return response
-        except InvalidURLException as e:
-            log_error(e, 'URL')
-            return "The URL you provided appears to be invalid. Could you please check it and try again?"
-        except InsecureURLException:
-            return ("The URL you provided is not secure. Could you please try another URL, or just pasting the "
-                    "relevant content here?")
-        except SusURLException:
-            return ("The URL you provided is potentially unsafe, based on my internal scans. You can check the safety "
-                    "of URLS using this site: https://www.virustotal.com/gui/home/url")
-        except Exception as e:
-            log_error(e, 'URL')
-            return f"Something went wrong while fetching the URL: {e}"
-    return None
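
Comparing the deleted pruning loop with the one added in 3.15.0 shows a behavioural difference. The old step, `rsplit(' ', 1)[0]`, returns its input unchanged when the content has no spaces, so the old `while token_count > token_limit` loop appears unable to make progress on such input. The new step falls back to halving the string. The sketch below isolates both steps. The function names are illustrative, and `len` stands in for the real token counter purely as an assumption for the demo.

```python
def prune_step_old(content: str) -> str:
    """3.14.2 step: drop the last space-separated word."""
    return content.rsplit(' ', 1)[0]


def prune_step_new(content: str) -> str:
    """3.15.0 step: drop the last word, or halve when there is no space."""
    head, _, _ = content.rpartition(' ')
    return head if head else content[:len(content) // 2]


def prune(content: str, limit: int, count=len) -> str:
    """Shrink content until count(content) <= limit; count stands in for a token counter."""
    while count(content) > limit and content:
        content = prune_step_new(content)
    return content


blob = 'x' * 64  # no spaces, e.g. minified JSON or base64
stuck = prune_step_old(blob) == blob  # old step makes no progress
shrunk = prune_step_new(blob)         # new step halves the string
```

The extra `and content` guard in the loop matters as well: if even an empty body exceeds the budget because the template and persona prompt are too large, the loop still ends once the content reaches the empty string.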

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/LICENSE RENAMED

File without changes

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot/__init__.py RENAMED

File without changes

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot/archive.py RENAMED

File without changes

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot/conversation.py RENAMED

File without changes

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot/database.py RENAMED

File without changes

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot/models.py RENAMED

File without changes

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot/providers/__init__.py RENAMED

File without changes

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot/providers/anthropic_provider.py RENAMED

File without changes

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot/providers/base.py RENAMED

File without changes

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot/providers/factory.py RENAMED

File without changes

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot/providers/openai_provider.py RENAMED

File without changes

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot/utils.py RENAMED

File without changes

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot/web_utils.py RENAMED

File without changes

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot.egg-info/SOURCES.txt RENAMED

File without changes

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot.egg-info/dependency_links.txt RENAMED

File without changes

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/TeLLMgramBot.egg-info/top_level.txt RENAMED

File without changes

{tellmgrambot-3.14.2 → tellmgrambot-3.15.0}/setup.cfg RENAMED

File without changes
