PyPI - khoj - Versions diffs - 1.20.3__py3-none-any.whl → 1.20.5.dev10__py3-none-any.whl - Mend

khoj 1.20.3__py3-none-any.whl → 1.20.5.dev10__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (78)

khoj/processor/conversation/prompts.py CHANGED

@@ -208,10 +208,12 @@ Construct search queries to retrieve relevant information to answer the user's q
 - Add as much context from the previous questions and answers as required into your search queries.
 - Break messages into multiple search queries when required to retrieve the relevant information.
 - Add date filters to your search queries from questions and answers when required to retrieve the relevant information.
+- When asked a meta, vague or random questions, search for a variety of broad topics to answer the user's question.
 - Share relevant search queries as a JSON list of strings. Do not say anything else.
 Current Date: {day_of_week}, {current_date}
 User's Location: {location}
+{username}
 Examples:
 Q: How was my trip to Cambodia?
@@ -238,6 +240,9 @@ Khoj: ["What kind of plants do I have?", "What issues do my plants have?"]
 Q: Who all did I meet here yesterday?
 Khoj: ["Met in {location} on {yesterday_date} dt>='{yesterday_date}' dt<'{current_date}'"]
+Q: Share some random, interesting experiences from this month
+Khoj: ["Exciting travel adventures from {current_month}", "Fun social events dt>='{current_month}-01' dt<'{current_date}'", "Intense emotional experiences in {current_month}"]
 Chat History:
 {chat_history}
 What searches will you perform to answer the following question, using the chat history as reference? Respond only with relevant search queries as a valid JSON list of strings.
@@ -254,10 +259,12 @@ Construct search queries to retrieve relevant information to answer the user's q
 - Add as much context from the previous questions and answers as required into your search queries.
 - Break messages into multiple search queries when required to retrieve the relevant information.
 - Add date filters to your search queries from questions and answers when required to retrieve the relevant information.
+- When asked a meta, vague or random questions, search for a variety of broad topics to answer the user's question.
 What searches will you perform to answer the users question? Respond with search queries as list of strings in a JSON object.
 Current Date: {day_of_week}, {current_date}
 User's Location: {location}
+{username}
 Q: How was my trip to Cambodia?
 Khoj: {{"queries": ["How was my trip to Cambodia?"]}}
@@ -279,6 +286,10 @@ Q: How many tennis balls fit in the back of a 2002 Honda Civic?
 Khoj: {{"queries": ["What is the size of a tennis ball?", "What is the trunk size of a 2002 Honda Civic?"]}}
 A: 1085 tennis balls will fit in the trunk of a Honda Civic
+Q: Share some random, interesting experiences from this month
+Khoj: {{"queries": ["Exciting travel adventures from {current_month}", "Fun social events dt>='{current_month}-01' dt<'{current_date}'", "Intense emotional experiences in {current_month}"]}}
+A: You had a great time at the local beach with your friends, attended a music concert and had a deep conversation with your friend, Khalid.
 Q: Is Bob older than Tom?
 Khoj: {{"queries": ["When was Bob born?", "What is Tom's age?"]}}
 A: Yes, Bob is older than Tom. As Bob was born on 1984-01-01 and Tom is 30 years old.
@@ -305,11 +316,13 @@ Construct search queries to retrieve relevant information to answer the user's q
 - Add as much context from the previous questions and answers as required into your search queries.
 - Break messages into multiple search queries when required to retrieve the relevant information.
 - Add date filters to your search queries from questions and answers when required to retrieve the relevant information.
+- When asked a meta, vague or random questions, search for a variety of broad topics to answer the user's question.
 What searches will you perform to answer the users question? Respond with a JSON object with the key "queries" mapping to a list of searches you would perform on the user's knowledge base. Just return the queries and nothing else.
 Current Date: {day_of_week}, {current_date}
 User's Location: {location}
+{username}
 Here are some examples of how you can construct search queries to answer the user's question:
@@ -328,6 +341,11 @@ A: I can help you live healthier and happier across work and personal life
 User: Who all did I meet here yesterday?
 Assistant: {{"queries": ["Met in {location} on {yesterday_date} dt>='{yesterday_date}' dt<'{current_date}'"]}}
 A: Yesterday's note mentions your visit to your local beach with Ram and Shyam.
+User: Share some random, interesting experiences from this month
+Assistant: {{"queries": ["Exciting travel adventures from {current_month}", "Fun social events dt>='{current_month}-01' dt<'{current_date}'", "Intense emotional experiences in {current_month}"]}}
+A: You had a great time at the local beach with your friends, attended a music concert and had a deep conversation with your friend, Khalid.
 """.strip()
 )
@@ -525,6 +543,7 @@ Which webpages will you need to read to answer the user's question?
 Provide web page links as a list of strings in a JSON object.
 Current Date: {current_date}
 User's Location: {location}
+{username}
 Here are some examples:
 History:
@@ -571,6 +590,7 @@ What Google searches, if any, will you need to perform to answer the user's ques
 Provide search queries as a list of strings in a JSON object. Do not wrap the json in a codeblock.
 Current Date: {current_date}
 User's Location: {location}
+{username}
 Here are some examples:
 History:

khoj/processor/embeddings.py CHANGED

@@ -95,11 +95,13 @@ class CrossEncoderModel:
         model_name: str = "mixedbread-ai/mxbai-rerank-xsmall-v1",
         cross_encoder_inference_endpoint: str = None,
         cross_encoder_inference_endpoint_api_key: str = None,
+        model_kwargs: dict = {},
     ):
         self.model_name = model_name
-        self.cross_encoder_model = CrossEncoder(model_name=self.model_name, device=get_device())
         self.inference_endpoint = cross_encoder_inference_endpoint
         self.api_key = cross_encoder_inference_endpoint_api_key
+        self.model_kwargs = merge_dicts(model_kwargs, {"device": get_device()})
+        self.cross_encoder_model = CrossEncoder(model_name=self.model_name, **self.model_kwargs)
     def inference_server_enabled(self) -> bool:
         return self.api_key is not None and self.inference_endpoint is not None
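
Note on the hunk above: `model_kwargs: dict = {}` is a mutable default argument. The diff only reads from it, so it appears safe as written, but a `None` default sidesteps the shared-object pitfall altogether. A minimal sketch of the same merge pattern follows. `merge_dicts_sketch` and `Reranker` are hypothetical stand-ins, and the assumption that the first argument takes priority over the second is not confirmed by this diff.

```python
from typing import Optional


def merge_dicts_sketch(priority: dict, default: dict) -> dict:
    """Hypothetical stand-in for khoj's merge_dicts.

    Assumption: keys in `priority` win over keys in `default`.
    """
    merged = dict(default)
    merged.update(priority)
    return merged


class Reranker:
    """Hypothetical class mirroring only the kwargs handling in the hunk."""

    def __init__(self, model_kwargs: Optional[dict] = None):
        # A None default avoids sharing one dict object across calls.
        # "cpu" stands in for whatever get_device() would return.
        self.model_kwargs = merge_dicts_sketch(model_kwargs or {}, {"device": "cpu"})


default_reranker = Reranker()
custom_reranker = Reranker({"device": "cuda", "trust_remote_code": True})
print(default_reranker.model_kwargs)
print(custom_reranker.model_kwargs)
```

Under that assumption, a caller-supplied `device` overrides the detected one, and any other keyword arguments pass through to the model constructor unchanged.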

khoj/processor/tools/online_search.py CHANGED

@@ -10,6 +10,7 @@ import aiohttp
 from bs4 import BeautifulSoup
 from markdownify import markdownify
+from khoj.database.models import KhojUser
 from khoj.routers.helpers import (
     ChatEvent,
     extract_relevant_info,
@@ -51,6 +52,7 @@ async def search_online(
     query: str,
     conversation_history: dict,
     location: LocationData,
+    user: KhojUser,
     send_status_func: Optional[Callable] = None,
     custom_filters: List[str] = [],
 ):
@@ -61,7 +63,7 @@ async def search_online(
         return
     # Breakdown the query into subqueries to get the correct answer
-    subqueries = await generate_online_subqueries(query, conversation_history, location)
+    subqueries = await generate_online_subqueries(query, conversation_history, location, user)
     response_dict = {}
     if subqueries:
@@ -126,14 +128,18 @@ async def search_with_google(query: str) -> Tuple[str, Dict[str, List[Dict]]]:
 async def read_webpages(
-    query: str, conversation_history: dict, location: LocationData, send_status_func: Optional[Callable] = None
+    query: str,
+    conversation_history: dict,
+    location: LocationData,
+    user: KhojUser,
+    send_status_func: Optional[Callable] = None,
 ):
     "Infer web pages to read from the query and extract relevant information from them"
     logger.info(f"Inferring web pages to read")
     if send_status_func:
         async for event in send_status_func(f"**Inferring web pages to read**"):
             yield {ChatEvent.STATUS: event}
-    urls = await infer_webpage_urls(query, conversation_history, location)
+    urls = await infer_webpage_urls(query, conversation_history, location, user)
     logger.info(f"Reading web pages at: {urls}")
     if send_status_func:

khoj/routers/api.py CHANGED

@@ -388,6 +388,7 @@ async def extract_references_and_questions(
                 conversation_log=meta_log,
                 should_extract_questions=True,
                 location_data=location_data,
+                user=user,
                 max_prompt_size=conversation_config.max_prompt_size,
             )
         elif conversation_config.model_type == ChatModelOptions.ModelType.OPENAI:
@@ -402,7 +403,7 @@ async def extract_references_and_questions(
                 api_base_url=base_url,
                 conversation_log=meta_log,
                 location_data=location_data,
-                max_tokens=conversation_config.max_prompt_size,
+                user=user,
             )
         elif conversation_config.model_type == ChatModelOptions.ModelType.ANTHROPIC:
             api_key = conversation_config.openai_config.api_key
@@ -413,6 +414,7 @@ async def extract_references_and_questions(
                 api_key=api_key,
                 conversation_log=meta_log,
                 location_data=location_data,
+                user=user,
             )
     # Collate search results as context for GPT
@@ -545,15 +547,19 @@ async def post_automation(
     if not subject:
         subject = await acreate_title_from_query(q)
+    title = f"Automation: {subject}"
     # Create new Conversation Session associated with this new task
-    conversation = await ConversationAdapters.acreate_conversation_session(user, request.user.client_app)
+    conversation = await ConversationAdapters.acreate_conversation_session(user, request.user.client_app, title=title)
-    calling_url = request.url.replace(query=f"{request.url.query}&conversation_id={conversation.id}")
+    calling_url = request.url.replace(query=f"{request.url.query}")
     # Schedule automation with query_to_run, timezone, subject directly provided by user
     try:
         # Use the query to run as the scheduling request if the scheduling request is unset
-        automation = await schedule_automation(query_to_run, subject, crontime, timezone, q, user, calling_url)
+        automation = await schedule_automation(
+            query_to_run, subject, crontime, timezone, q, user, calling_url, conversation.id
+        )
     except Exception as e:
         logger.error(f"Error creating automation {q} for {user.email}: {e}", exc_info=True)
         return Response(
@@ -649,6 +655,16 @@ def edit_job(
     automation_metadata["query_to_run"] = query_to_run
     automation_metadata["subject"] = subject.strip()
     automation_metadata["crontime"] = crontime
+    conversation_id = automation_metadata.get("conversation_id")
+    if not conversation_id:
+        title = f"Automation: {subject}"
+        # Create new Conversation Session associated with this new task
+        conversation = ConversationAdapters.create_conversation_session(user, request.user.client_app, title=title)
+        conversation_id = conversation.id
+        automation_metadata["conversation_id"] = conversation_id
     # Modify automation with updated query, subject
     automation.modify(
@@ -659,6 +675,7 @@ def edit_job(
             "scheduling_request": q,
             "user": user,
             "calling_url": request.url,
+            "conversation_id": conversation_id,
         },
     )

khoj/routers/api_chat.py CHANGED

@@ -146,7 +146,7 @@ async def sendfeedback(request: Request, data: FeedbackData):
 @api_chat.post("/speech")
-@requires(["authenticated", "premium"])
+@requires(["authenticated"])
 async def text_to_speech(
     request: Request,
     common: CommonQueryParams,
@@ -792,7 +792,7 @@ async def chat(
         if ConversationCommand.Online in conversation_commands:
             try:
                 async for result in search_online(
-                    defiltered_query, meta_log, location, partial(send_event, ChatEvent.STATUS), custom_filters
+                    defiltered_query, meta_log, location, user, partial(send_event, ChatEvent.STATUS), custom_filters
                 ):
                     if isinstance(result, dict) and ChatEvent.STATUS in result:
                         yield result[ChatEvent.STATUS]
@@ -809,7 +809,7 @@ async def chat(
         if ConversationCommand.Webpage in conversation_commands:
             try:
                 async for result in read_webpages(
-                    defiltered_query, meta_log, location, partial(send_event, ChatEvent.STATUS)
+                    defiltered_query, meta_log, location, user, partial(send_event, ChatEvent.STATUS)
                 ):
                     if isinstance(result, dict) and ChatEvent.STATUS in result:
                         yield result[ChatEvent.STATUS]

khoj/routers/email.py CHANGED

@@ -117,7 +117,8 @@ def send_task_email(name, email, query, result, subject, is_image=False):
     template = env.get_template("task.html")
     if is_image:
-        result = f"![{subject}]({result})"
+        image = result.get("image")
+        result = f"![{subject}]({image})"
     html_result = markdown_it.MarkdownIt().render(result)
     html_content = template.render(name=name, subject=subject, query=query, result=html_result)

khoj/routers/helpers.py CHANGED

@@ -340,11 +340,14 @@ async def aget_relevant_output_modes(query: str, conversation_history: dict, is_
         return ConversationCommand.Text
-async def infer_webpage_urls(q: str, conversation_history: dict, location_data: LocationData) -> List[str]:
+async def infer_webpage_urls(
+    q: str, conversation_history: dict, location_data: LocationData, user: KhojUser
+) -> List[str]:
     """
     Infer webpage links from the given query
     """
     location = f"{location_data.city}, {location_data.region}, {location_data.country}" if location_data else "Unknown"
+    username = prompts.user_name.format(name=user.get_full_name()) if user.get_full_name() else ""
     chat_history = construct_chat_history(conversation_history)
     utc_date = datetime.utcnow().strftime("%Y-%m-%d")
@@ -353,6 +356,7 @@ async def infer_webpage_urls(q: str, conversation_history: dict, location_data:
         query=q,
         chat_history=chat_history,
         location=location,
+        username=username,
     )
     with timer("Chat actor: Infer webpage urls to read", logger):
@@ -370,11 +374,14 @@ async def infer_webpage_urls(q: str, conversation_history: dict, location_data:
         raise ValueError(f"Invalid list of urls: {response}")
-async def generate_online_subqueries(q: str, conversation_history: dict, location_data: LocationData) -> List[str]:
+async def generate_online_subqueries(
+    q: str, conversation_history: dict, location_data: LocationData, user: KhojUser
+) -> List[str]:
     """
     Generate subqueries from the given query
     """
     location = f"{location_data.city}, {location_data.region}, {location_data.country}" if location_data else "Unknown"
+    username = prompts.user_name.format(name=user.get_full_name()) if user.get_full_name() else ""
     chat_history = construct_chat_history(conversation_history)
     utc_date = datetime.utcnow().strftime("%Y-%m-%d")
@@ -383,6 +390,7 @@ async def generate_online_subqueries(q: str, conversation_history: dict, locatio
         query=q,
         chat_history=chat_history,
         location=location,
+        username=username,
     )
     with timer("Chat actor: Generate online search subqueries", logger):
@@ -1074,7 +1082,13 @@ def should_notify(original_query: str, executed_query: str, ai_response: str) ->
 def scheduled_chat(
-    query_to_run: str, scheduling_request: str, subject: str, user: KhojUser, calling_url: URL, job_id: str = None
+    query_to_run: str,
+    scheduling_request: str,
+    subject: str,
+    user: KhojUser,
+    calling_url: URL,
+    job_id: str = None,
+    conversation_id: int = None,
 ):
     logger.info(f"Processing scheduled_chat: {query_to_run}")
     if job_id:
@@ -1101,6 +1115,10 @@ def scheduled_chat(
     # Replace the original scheduling query with the scheduled query
     query_dict["q"] = [query_to_run]
+    # Replace the original conversation_id with the conversation_id
+    if conversation_id:
+        query_dict["conversation_id"] = [conversation_id]
     # Construct the URL to call the chat API with the scheduled query string
     encoded_query = urlencode(query_dict, doseq=True)
     url = f"{scheme}://{calling_url.netloc}/api/chat?{encoded_query}"
@@ -1130,7 +1148,9 @@ def scheduled_chat(
     if raw_response.headers.get("Content-Type") == "application/json":
         response_map = raw_response.json()
         ai_response = response_map.get("response") or response_map.get("image")
-        is_image = response_map.get("image") is not None
+        is_image = False
+        if type(ai_response) == dict:
+            is_image = ai_response.get("image") is not None
     else:
         ai_response = raw_response.text
@@ -1142,9 +1162,11 @@ def scheduled_chat(
             return raw_response
-async def create_automation(q: str, timezone: str, user: KhojUser, calling_url: URL, meta_log: dict = {}):
+async def create_automation(
+    q: str, timezone: str, user: KhojUser, calling_url: URL, meta_log: dict = {}, conversation_id: int = None
+):
     crontime, query_to_run, subject = await schedule_query(q, meta_log)
-    job = await schedule_automation(query_to_run, subject, crontime, timezone, q, user, calling_url)
+    job = await schedule_automation(query_to_run, subject, crontime, timezone, q, user, calling_url, conversation_id)
     return job, crontime, query_to_run, subject
@@ -1156,6 +1178,7 @@ async def schedule_automation(
     scheduling_request: str,
     user: KhojUser,
     calling_url: URL,
+    conversation_id: int,
 ):
     # Disable minute level automation recurrence
     minute_value = crontime.split(" ")[0]
@@ -1173,6 +1196,7 @@ async def schedule_automation(
             "scheduling_request": scheduling_request,
             "subject": subject,
             "crontime": crontime,
+            "conversation_id": conversation_id,
         }
     )
     query_id = hashlib.md5(f"{query_to_run}_{crontime}".encode("utf-8")).hexdigest()
@@ -1191,6 +1215,7 @@ async def schedule_automation(
             "user": user,
             "calling_url": calling_url,
             "job_id": job_id,
+            "conversation_id": conversation_id,
         },
         id=job_id,
         name=job_metadata,
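
Note on the `scheduled_chat` hunks above: `conversation_id` is no longer appended to `calling_url` in `post_automation`. Instead it is passed as a job argument and written into `query_dict` before `urlencode(query_dict, doseq=True)` runs. A minimal sketch of that step, using only the standard library, is below. `build_chat_query` is a hypothetical helper, and parsing the original query string with `parse_qs` is an assumption, since the diff does not show how `query_dict` is built.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode


def build_chat_query(
    original_query: str, query_to_run: str, conversation_id: Optional[int] = None
) -> str:
    """Hypothetical helper mirroring the query rewriting in scheduled_chat."""
    # Assumption: the original query string is parsed into a dict of lists.
    query_dict = parse_qs(original_query)
    # Replace the original scheduling query with the scheduled query.
    query_dict["q"] = [query_to_run]
    # Only set conversation_id when one was stored with the job.
    if conversation_id:
        query_dict["conversation_id"] = [str(conversation_id)]
    # doseq=True expands each list value into repeated key=value pairs.
    return urlencode(query_dict, doseq=True)


print(build_chat_query("q=remind+me+daily&n=5", "weather today", 42))
# q=weather+today&n=5&conversation_id=42
```

Jobs scheduled before this change carry no `conversation_id`, which is why the parameter defaults to `None` and the key is set conditionally.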

khoj/search_filter/file_filter.py CHANGED

@@ -11,7 +11,8 @@ logger = logging.getLogger(__name__)
 class FileFilter(BaseFilter):
-    file_filter_regex = r'file:"(.+?)" ?'
+    file_filter_regex = r'(?<!-)file:"(.+?)" ?'
+    excluded_file_filter_regex = r'-file:"(.+?)" ?'
     def __init__(self, entry_key="file"):
         self.entry_key = entry_key
@@ -20,7 +21,9 @@ class FileFilter(BaseFilter):
     def get_filter_terms(self, query: str) -> List[str]:
         "Get all filter terms in query"
-        return [f"{self.convert_to_regex(term)}" for term in re.findall(self.file_filter_regex, query)]
+        required_files = [f"{required_file}" for required_file in re.findall(self.file_filter_regex, query)]
+        excluded_files = [f"-{excluded_file}" for excluded_file in re.findall(self.excluded_file_filter_regex, query)]
+        return required_files + excluded_files
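
Note on the change above: the negative lookbehind `(?<!-)` keeps the include pattern from also matching `-file:"..."`, so a term lands in exactly one of the two lists. A minimal sketch using the two regexes from this hunk is below. The free function `get_filter_terms_sketch` and the sample query are illustrative, not part of the package.

```python
import re
from typing import List

# Patterns copied from the hunk above.
file_filter_regex = r'(?<!-)file:"(.+?)" ?'
excluded_file_filter_regex = r'-file:"(.+?)" ?'


def get_filter_terms_sketch(query: str) -> List[str]:
    """Hypothetical free-function version of FileFilter.get_filter_terms."""
    # Included files are returned as-is; the lookbehind skips "-file:" terms.
    required_files = re.findall(file_filter_regex, query)
    # Excluded files are returned with a leading "-" marker.
    excluded_files = [f"-{name}" for name in re.findall(excluded_file_filter_regex, query)]
    return required_files + excluded_files


print(get_filter_terms_sketch('trip notes file:"journal.org" -file:"todo.md"'))
# ['journal.org', '-todo.md']
```

Note that the new `get_filter_terms` returns raw terms and no longer passes them through `convert_to_regex`, unlike the removed line.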
     def convert_to_regex(self, file_filter: str) -> str:
         "Convert file filter to regex"

{khoj-1.20.3.dist-info → khoj-1.20.5.dev10.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.3
 Name: khoj
-Version: 1.20.3
+Version: 1.20.5.dev10
 Summary: Your Second Brain
 Project-URL: Homepage, https://khoj.dev
 Project-URL: Documentation, https://docs.khoj.dev
