PyPI - kash-shell - Versions diffs - 0.3.25__py3-none-any.whl → 0.3.27__py3-none-any.whl - Mend

kash-shell 0.3.25__py3-none-any.whl → 0.3.27__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (46)

kash/actions/__init__.py +51 -6
kash/actions/core/minify_html.py +2 -2
kash/commands/base/general_commands.py +4 -2
kash/commands/help/assistant_commands.py +4 -3
kash/commands/help/welcome.py +1 -1
kash/config/colors.py +7 -3
kash/config/logger.py +4 -0
kash/config/text_styles.py +1 -0
kash/config/unified_live.py +249 -0
kash/docs/markdown/assistant_instructions_template.md +3 -3
kash/docs/markdown/topics/a1_what_is_kash.md +22 -20
kash/docs/markdown/topics/a2_installation.md +10 -10
kash/docs/markdown/topics/a3_getting_started.md +8 -8
kash/docs/markdown/topics/a4_elements.md +3 -3
kash/docs/markdown/topics/a5_tips_for_use_with_other_tools.md +12 -12
kash/docs/markdown/topics/b0_philosophy_of_kash.md +17 -17
kash/docs/markdown/topics/b1_kash_overview.md +7 -7
kash/docs/markdown/topics/b2_workspace_and_file_formats.md +1 -1
kash/docs/markdown/topics/b3_modern_shell_tool_recommendations.md +1 -1
kash/docs/markdown/topics/b4_faq.md +7 -7
kash/docs/markdown/welcome.md +1 -1
kash/embeddings/embeddings.py +110 -39
kash/embeddings/text_similarity.py +2 -2
kash/exec/shell_callable_action.py +4 -3
kash/help/help_embeddings.py +5 -2
kash/mcp/mcp_server_sse.py +0 -5
kash/model/graph_model.py +2 -0
kash/model/items_model.py +4 -4
kash/shell/output/shell_output.py +2 -2
kash/shell/shell_main.py +64 -6
kash/shell/version.py +18 -2
kash/utils/file_utils/csv_utils.py +105 -0
kash/utils/rich_custom/multitask_status.py +19 -5
kash/web_gen/templates/base_styles.css.jinja +384 -31
kash/web_gen/templates/base_webpage.html.jinja +43 -0
kash/web_gen/templates/components/toc_styles.css.jinja +25 -4
kash/web_gen/templates/components/tooltip_styles.css.jinja +2 -0
kash/web_gen/templates/content_styles.css.jinja +23 -9
kash/web_gen/templates/item_view.html.jinja +12 -4
kash/web_gen/templates/simple_webpage.html.jinja +2 -2
kash/xonsh_custom/custom_shell.py +6 -6
{kash_shell-0.3.25.dist-info → kash_shell-0.3.27.dist-info}/METADATA +59 -56
{kash_shell-0.3.25.dist-info → kash_shell-0.3.27.dist-info}/RECORD +46 -44
{kash_shell-0.3.25.dist-info → kash_shell-0.3.27.dist-info}/WHEEL +0 -0
{kash_shell-0.3.25.dist-info → kash_shell-0.3.27.dist-info}/entry_points.txt +0 -0
{kash_shell-0.3.25.dist-info → kash_shell-0.3.27.dist-info}/licenses/LICENSE +0 -0

kash/docs/markdown/topics/a3_getting_started.md CHANGED

@@ -15,11 +15,11 @@ Type `help` for the full documentation.
 The simplest way to illustrate how to use kash is by example.
 You can go through the commands below a few at a time, trying each one.
-This is a "real" example that uses ffmpeg and a few other libraries.
-So to get it to work you must install not just the main shell but the kash "media kit"
+This is a “real” example that uses ffmpeg and a few other libraries.
+So to get it to work you must install not just the main shell but the kash “media kit”
 with extra dependencies.
 This is discussed in [the installation instructions](#installation-steps).
-If you don't have these already installed, you can add these tools:
+If you don’t have these already installed, you can add these tools:
 Then run `kash` to start.
@@ -170,8 +170,8 @@ All of these steps are just actions.
 ### Creating a New Workspace
-Although you don't always need one, a *workspace* is very helpful for any real work in
-kash. It's just a directory of files, plus a `.kash/` directory with various logs and
+Although you don’t always need one, a *workspace* is very helpful for any real work in
+kash. It’s just a directory of files, plus a `.kash/` directory with various logs and
 metadata.
 Note the `.kash/cache` directory contains all the downloaded videos and media you
@@ -192,7 +192,7 @@ By default, when you are not using the shell inside a workspace directory, or wh
 run kash the first time, it uses the default *global workspace*.
 Once you create a workspace, you can `cd` into that workspace and that will become the
-current workspace. (If you're familiar with how the `git` command-line works in
+current workspace. (If you’re familiar with how the `git` command-line works in
 conjunction with the `.git/` directory, this behavior is very similar.)
 To start a new workspace, run a command like
@@ -230,7 +230,7 @@ A few of the most important commands for managing files and work are these:
 - `workspace` shows or selects or creates a new workspace.
   Initially you work in the default global workspace (typically at `~/Kash/workspace`)
-  but for more real work you'll want to create a workspace, which is a directory to hold
+  but for more real work you’ll want to create a workspace, which is a directory to hold
   the files you are working with.
 - `select` shows or sets selections, which are the set of files the next command will
@@ -244,7 +244,7 @@ A few of the most important commands for managing files and work are these:
 - `logs` to see full logs (typically more detailed than what you see in the console).
-- `history` to see recent commands you've run.
+- `history` to see recent commands you’ve run.
 - `import_item` to add a resource such as a URL or a file to your local workspace.

kash/docs/markdown/topics/a4_elements.md CHANGED

@@ -2,7 +2,7 @@
 ### What is Included?
-I've tried to build independently useful pieces that fit together in a simple way:
+I’ve tried to build independently useful pieces that fit together in a simple way:
 - The kash **action framework**:
@@ -75,9 +75,9 @@ I've tried to build independently useful pieces that fit together in a simple wa
     OSC 8 links
   - Sadly, we may have mind-boggling AI tools, but Terminals are still incredibly
-    archaic and don't support these features well (more on this below) but I have a new
+    archaic and don’t support these features well (more on this below) but I have a new
     terminal, Kerm, that shows these as tooltips and makes every command clickable
-    (please contact me if you'd like an early developer preview, as I'd love feedback)
+    (please contact me if you’d like an early developer preview, as I’d love feedback)
 ## Tools Used by Kash

kash/docs/markdown/topics/a5_tips_for_use_with_other_tools.md CHANGED

@@ -17,14 +17,14 @@ I tried half a dozen different popular terminals on Mac
 [Hyper](https://hyper.is/)). Unfortunately, none offer really good support right out of
 the box, but I encourage you to try
-✨**Would you be willing to help test something new?** If you've made it this far and are
+✨**Would you be willing to help test something new?** If you’ve made it this far and are
 still reading, I have a request.
-So alongside kash, I've begun to build a new terminal app, **Kerm**, that has the
+So alongside kash, I’ve begun to build a new terminal app, **Kerm**, that has the
 features we would want in a modern command line, such as clickable links and commands,
 tooltips, and image support.
 Kash also takes advantage of this support by embedding OSC 8 links.
 It is *so* much nicer to use.
-I'd like feedback so please [message me](https://twitter.com/ojoshe) if you'd like to
+I’d like feedback so please [message me](https://twitter.com/ojoshe) if you’d like to
 try it out an early dev version!
 ### Choosing an Editor
@@ -34,7 +34,7 @@ Kash respects the `EDITOR` environment variable if you use the `edit` command.
 ### Using on macOS
-Kash calls `open` to open some files, so in general, it's convenient to make sure your
+Kash calls `open` to open some files, so in general, it’s convenient to make sure your
 preferred editor is set up for `.yml` and `.md` files.
 For convenience, a reminder on how to do this:
@@ -42,7 +42,7 @@ For convenience, a reminder on how to do this:
 - In Finder, pick a `.md` or `.yml` file and hit Cmd-I (or right-click and select Get
   Info).
-- Select the editor, such as Cursor or VSCode or Obsidian, and click the "Change All…"
+- Select the editor, such as Cursor or VSCode or Obsidian, and click the “Change All…”
   button to have it apply to all files with that extension.
 - Repeat with each file type.
@@ -61,23 +61,23 @@ out of the box to edit workspace files in Markdown, HTML, and YAML in kash works
 Kash uses Markdown files with YAML frontmatter, which is fully compatible with
 [Obsidian](https://obsidian.md/). Some notes:
-- In Obsidian's preferences, under Editor, turn on "Strict line breaks".
+- In Obsidian’s preferences, under Editor, turn on “Strict line breaks”.
-- This makes the line breaks in kash's normalized Markdown output work well in Obsidian.
+- This makes the line breaks in kash’s normalized Markdown output work well in Obsidian.
 - Some kash files also contain HTML in Markdown.
-  This works fine, but note that only the current line's HTML is shown in Obsidian.
+  This works fine, but note that only the current line’s HTML is shown in Obsidian.
 - Install the [Front Matter Title
   plugin](https://github.com/snezhig/obsidian-front-matter-title):
-  - Go to settings, enable community plugins, search for "Front Matter Title" and
+  - Go to settings, enable community plugins, search for “Front Matter Title” and
     install.
-  - Under "Installed Plugins," adjust the settings to enable "Replace shown title in
-    file explorer," "Replace shown title in graph," etc.
+  - Under “Installed Plugins,” adjust the settings to enable “Replace shown title in
+    file explorer,” “Replace shown title in graph,” etc.
-  - You probably want to keep the "Replace titles in header of leaves" off so you can
+  - You probably want to keep the “Replace titles in header of leaves” off so you can
     still see original filenames if needed.
   - Now titles are easy to read for all kash notes.

kash/docs/markdown/topics/b0_philosophy_of_kash.md CHANGED

@@ -3,14 +3,14 @@
 > “*Civilization advances by extending the number of important operations which we can
 > perform without thinking about them.*” —Alfred North Whitehead
-Here is a bit more motivation for experimenting with kash, why I think it's potentially
+Here is a bit more motivation for experimenting with kash, why I think it’s potentially
 so useful, and some design principles.
 (You may skip ahead to the next section if you just want a more concrete overview!)
-### Why Apps Can't Solve All Your Problems
+### Why Apps Can’t Solve All Your Problems
 AI has radically changed the way we use software.
-With LLMs and other generative AI models, we've seen big improvements in two areas:
+With LLMs and other generative AI models, we’ve seen big improvements in two areas:
 1. Powerful general-purpose new AI tools (ChatGPT, Perplexity, etc.)
@@ -18,20 +18,20 @@ With LLMs and other generative AI models, we've seen big improvements in two are
    want to solve, like Notion, Figma, Descript, etc.
 While we have these powerful cloud apps, we all know numerous situations where our
-problems aren't easily solved or automated with single tool like ChatGPT, Notion, Google
+problems aren’t easily solved or automated with single tool like ChatGPT, Notion, Google
 Docs, Slack, Excel, and Zapier.
 If you want to use any of the newest AI models and APIs for something not supported by
 an existing tool, you generally have to design and build it yourself—in Python and/or a
 full-stack web app.
-It's true tools like GitHub Copilot, Claude Code, and Cursor can help anyone write code
+It’s true tools like GitHub Copilot, Claude Code, and Cursor can help anyone write code
 much faster. But even if you have a tool like this, building polished apps that are good
 enough people will pay them takes time, and many good product ideas never get built.
-And the curse of [Conway's Law](https://en.wikipedia.org/wiki/Conway%27s_law) means many
-companies won't add specific features you want, or at best are likely to do it slowly.
+And the curse of [Conway’s Law](https://en.wikipedia.org/wiki/Conway%27s_law) means many
+companies won’t add specific features you want, or at best are likely to do it slowly.
-In short, in spite of AI tools accelerating software, certain things don't change: we
+In short, in spite of AI tools accelerating software, certain things don’t change: we
 are waiting for developers, product managers, designers, and entrepreneurs to design and
 ship solutions for us.
@@ -58,9 +58,9 @@ Command-line shells generally still suffer from three big issues:
 - A text-based interface many find confusing or ugly
 - No easy, “native” support for modern tools, apps, and APIs (especially LLMs—and using
-  `curl` to call OpenAI APIs doesn't count!)
+  `curl` to call OpenAI APIs doesn’t count!)
-Even worse, command lines haven't gotten much better.
+Even worse, command lines haven’t gotten much better.
 Few companies make money shipping new command-line tooling.
 (In the last few years this has slowly starting to change with tools like nushell, fish,
 and Warp.)
@@ -73,7 +73,7 @@ developer, a designer, or an enterpreneur building a product.
 Any tool that lets you solve complex problems yourself, without waiting for engineers
 and designers, can radically improve your productivity.
-I think it's a good time to revisit this idea.
+I think it’s a good time to revisit this idea.
 In a post-LLM world, it should be possible to do more things without so much time and
 effort spent (even with the help of LLMs) on coding and UI/UX design.
@@ -84,7 +84,7 @@ to see how well it works.
 ### The Goals of Kash
-Kash is an experimental attempt at building the tool I've wanted for a long time, using
+Kash is an experimental attempt at building the tool I’ve wanted for a long time, using
 a command line as a starting point, and with an initial focus on content-related tasks.
 That brings us to the goals behind building a new, AI-native shell.
@@ -99,17 +99,17 @@ That brings us to the goals behind building a new, AI-native shell.
 - **Make complex tasks possible:** Highly complex tasks and workflows should be easy to
   assemble (and rerun if they need to be automated) by adding new primitive actions and
   combining primitive actions into more complex workflows.
-  You shouldn't need to be a programmer to use any task—but any task should be
+  You shouldn’t need to be a programmer to use any task—but any task should be
   extensible with arbitrary code (written by you and an LLM) when needed.
 - **Augment human skills and judgement:** Many AI agent efforts aim for pure automation.
   But even with powerful LLMs and tools, full automation is rare.
-  Invariably, the best results come from human review wherever it's needed—experimenting
+  Invariably, the best results come from human review wherever it’s needed—experimenting
   with different models and prompts, looking at what works, focusing expert human
   attention in the right places.
   The most flexible tools augment, not replace, your ability to review and manipulate
   information. It should help both very technical users, like developers, as well as less
-  technical but sophisticated users who aren't traditional programmers.
+  technical but sophisticated users who aren’t traditional programmers.
 - **Accelerate discovery of the workflows that work best:** We have so many powerful
   APIs, models, libraries, and tools now—but the real bottleneck is in discovering and
@@ -125,7 +125,7 @@ That brings us to the goals behind building a new, AI-native shell.
 A better command line like a first step toward an item-based information operating
 system—an alternate, more flexible UX and information architecture for knowledge
-workflows. My hope is that kash becomes the tool you need when you don't know what tool
+workflows. My hope is that kash becomes the tool you need when you don’t know what tool
 you need.
 ### Design Principles
@@ -155,7 +155,7 @@ Key design choices:
    transition)
 7. **Maintain context in workspaces** (keep files organized by project or effort in a
-   folder that can be persisted, won't get lost, and includes content, metadata,
+   folder that can be persisted, won’t get lost, and includes content, metadata,
    actions, settings, selections, caches, history, etc.)
 8. **Maintain metadata on files** (so you always know where each piece of content comes

kash/docs/markdown/topics/b1_kash_overview.md CHANGED

@@ -8,7 +8,7 @@ extensibility of a modern command line interface.
 The philosophy behind kash is similar to Unix shell tools: simple commands that can be
 combined flexibly in powerful ways.
-It operates on "items" such as URLs, files, or Markdown notes within a workspace
+It operates on “items” such as URLs, files, or Markdown notes within a workspace
 directory.
 This command-line is also AI enabled.
@@ -29,7 +29,7 @@ intuitive than old Unix commands.
 ### MCP Support
 If the idea of having lots of commands runnable by an LLM sounds to you a little like
-MCP, you're right. Any action in kash can also be an MCP tool!
+MCP, you’re right. Any action in kash can also be an MCP tool!
 You can connect Claude Desktop or Cursor or other MCP clients to kash and use any kash
 action as a tool. However, unlike the complexity of writing a new MCP server, the idea
@@ -41,22 +41,22 @@ Anyone, including kash itself, can write new actions.
 You write a simple Python function, add a decorator, and it becomes an action you can
 use in your shell.
-Finally, getting really useful things to work still takes effort, so I've also added a
+Finally, getting really useful things to work still takes effort, so I’ve also added a
 number of little libraries to help with this.
 ### Supporting Complex Tasks
-Because it's really just a set of Python libraries, kash is more capable than a typical
+Because it’s really just a set of Python libraries, kash is more capable than a typical
 shell. It is starting to become a sort of AI-friendly scripting framework as well.
 Inputs and outputs of commands are stored as files, so you can easily chain commands
 together and inspect intermediate results.
 When possible, actions are nondestructive and idempotent—that is, they will either
-create new files or simply skip an operation if it's already complete.
+create new files or simply skip an operation if it’s already complete.
 So it can work a bit like a Makefile: suppose you run a command like `transcribe` on a
-video. If you've already run that command on the same YouTube URL, kash knows it and can
+video. If you’ve already run that command on the same YouTube URL, kash knows it and can
 recognize the downloaded video and transcribed text is already present in your current
 workspace.
@@ -95,7 +95,7 @@ original document), the sources are listed in a `derived_from` array within the
 This means actions can find citations or other data on the provenance of a given piece
 of information.
-This might sound a little complex, but it's quite simple in practice.
+This might sound a little complex, but it’s quite simple in practice.
 All the metadata is in a standard format,
 [Frontmatter Format](https://github.com/jlevy/frontmatter-format), and the information
 is compatible with other apps and pretty self explanatory.

kash/docs/markdown/topics/b2_workspace_and_file_formats.md CHANGED

@@ -2,7 +2,7 @@
 A kash workspace is simply a directory of files.
 The goal is for a workspace to be easy to use not just with kash but with other editors
-or tools, so it's possible to edit, share, or commit files to version control.
+or tools, so it’s possible to edit, share, or commit files to version control.
 It makes sense to devote a workspace to a single topic, project, or area of research.
 File formats and conventions:

kash/docs/markdown/topics/b3_modern_shell_tool_recommendations.md CHANGED

@@ -2,7 +2,7 @@
 Many of us (myself included) have long believed in sticking with tried-and-true bash and
 the classic command-line tools.
-While it's still wise to know these tools, we've in recent years seen many new tools
+While it’s still wise to know these tools, we’ve in recent years seen many new tools
 emerge that are more powerful, modern, and cross-platform.
 When using kash it makes sense to use these.

kash/docs/markdown/topics/b4_faq.md CHANGED

@@ -17,7 +17,7 @@ Anyone, including kash itself, can write new actions easily.
 The philosophy behind kash is similar to Unix shell tools: simple commands that can be
 combined in flexible and powerful ways.
-It operates on "items" such as URLs, files, or Markdown notes within a workspace
+It operates on “items” such as URLs, files, or Markdown notes within a workspace
 directory.
 For more detailed information, you can run `help` to get background and a list of
@@ -42,7 +42,7 @@ questions.
 ### How does kash accept both shell and assistant requests to the LLM with natural language?
 By default, if a command is valid shell or Python, kash will treat it as a shell
-command, using xonsh's conventions.
+command, using xonsh’s conventions.
 Commands that begin with a `?` are automatically considered assistant requests.
@@ -136,9 +136,9 @@ fit kash commands and actions, reading metadata on items, etc.
 ### Can kash replace my regular shell?
-While kash doesn't aim to completely replace all uses of the shell—for example, that's
+While kash doesn’t aim to completely replace all uses of the shell—for example, that’s
 hard to do in general for remote use, and people have many constraints, customizations,
-and preferences—I've found it's highly useful for a lot of situations.
+and preferences—I’ve found it’s highly useful for a lot of situations.
 It is starting to replace bash or zsh for day-to-day local use on my laptop.
 Kash basically wraps xonsh, so you have almost all the functionality of xonsh and Python
@@ -154,18 +154,18 @@ Any command you type on the command-line in kash is a command.
 Some commands are basic, built-in commands.
 The idea is there are relatively few of these, and they do important primitive things
 like `select` (select or show selections), `show` (show an item), `files` (list
-files—kash's better version of `ls`), `workspace` (shows information about the current
+files—kash’s better version of `ls`), `workspace` (shows information about the current
 workspace), or `logs` (shows the detailed logs for the current workspace).
 In Python, built-in commands are defined by simple functions.
 But most commands are defined as an *action*. Actions are invoked just like any other
-command but have a standard structure: they are assumed to perform an "action" on a set
+command but have a standard structure: they are assumed to perform an “action” on a set
 of items (files of known types) and then save those items, all within an existing
 workspace. Actions are defined as a subclass of `Action` in Python.
 ### Does nvm (Node version manager) work in kash?
-It's hard to get nvm to work well in xonsh, but try [fnm](https://github.com/Schniz/fnm)
+It’s hard to get nvm to work well in xonsh, but try [fnm](https://github.com/Schniz/fnm)
 instead! It works just as well and kash supports `fnm` automatically so it auto-detects
 and uses fnm to switch or install Node versions for directories with Node projects (i.e.
 there is an `.nvmrc`, `.node-version`, or `package.json` file).

kash/docs/markdown/welcome.md CHANGED

@@ -6,7 +6,7 @@ You may simply ask a question and the kash assistant will help you.
 Press **space** (or type **?**), then write your question or request.
 Use `logs` for detailed logs.
-*I'd love to hear from you with issues, bugs, and ideas.
+*I’d love to hear from you with issues, bugs, and ideas.
 Discuss at github.com/jlevy/kash or contact me github.com/jlevy or x.com/ojoshe (DMs
 open).*

kash/embeddings/embeddings.py CHANGED

@@ -1,16 +1,18 @@
 from __future__ import annotations
 import ast
+import json
 from collections.abc import Iterable
 from pathlib import Path
-from typing import TYPE_CHECKING, TypeAlias
+from typing import TYPE_CHECKING, Any, TypeAlias
+import pandas as pd
 from pydantic.dataclasses import dataclass
 from strif import abbrev_list
 from kash.config.logger import get_logger
 from kash.llm_utils.init_litellm import init_litellm
-from kash.llm_utils.llms import DEFAULT_EMBEDDING_MODEL
+from kash.llm_utils.llms import DEFAULT_EMBEDDING_MODEL, EmbeddingModel
 if TYPE_CHECKING:
     from pandas import DataFrame
@@ -18,15 +20,26 @@ if TYPE_CHECKING:
 log = get_logger(__name__)
-BATCH_SIZE = 1024
+BATCH_SIZE: int = 1024
 Key: TypeAlias = str
-KeyVal: TypeAlias = tuple[Key, str]
-"""
-A key-value pair where the key is a unique identifier (such as the path)
-and the value is the text to embed.
-"""
+@dataclass(frozen=True)
+class EmbValue:
+    emb_text: str
+    data: dict[str, Any] | None = None
+@dataclass(frozen=True)
+class KeyVal:
+    """
+    A key-value pair where the key is a unique identifier (such as the path)
+    and the value is the text to embed and any additional data.
+    """
+    key: Key
+    value: EmbValue
 @dataclass
@@ -36,39 +49,45 @@ class Embeddings:
     small texts, the text itself).
     """
-    data: dict[Key, tuple[str, list[float]]]
-    """Mapping of key to text and embedding."""
+    data: dict[Key, tuple[EmbValue, list[float]]]
+    """Mapping of key to EmbValue and embedding."""
-    def as_iterable(self) -> Iterable[tuple[Key, str, list[float]]]:
-        return ((key, text, emb) for key, (text, emb) in self.data.items())
+    def as_iterable(self) -> Iterable[tuple[Key, EmbValue, list[float]]]:
+        return ((key, emb_value, emb) for key, (emb_value, emb) in self.data.items())
     def as_df(self) -> DataFrame:
         from pandas import DataFrame
-        keys, texts, embeddings = zip(
-            *[(key, text, emb) for key, (text, emb) in self.data.items()], strict=False
-        )
+        if not self.data:
+            return DataFrame({"key": [], "text": [], "data": [], "embedding": []})
+        items = [(key, emb_value, emb) for key, (emb_value, emb) in self.data.items()]
+        keys, emb_values, embeddings = zip(*items, strict=False)
         return DataFrame(
             {
-                "key": keys,
-                "text": texts,
-                "embedding": embeddings,
+                "key": list(keys),
+                "text": [ev.emb_text for ev in emb_values],
+                "data": [ev.data for ev in emb_values],
+                "embedding": list(embeddings),
             }
         )
-    def __getitem__(self, key: Key) -> tuple[str, list[float]]:
+    def __getitem__(self, key: Key) -> tuple[EmbValue, list[float]]:
         if key in self.data:
             return self.data[key]
         else:
             raise KeyError(f"Key '{key}' not found in embeddings")
     @classmethod
-    def embed(cls, keyvals: list[KeyVal], model=DEFAULT_EMBEDDING_MODEL) -> Embeddings:
+    def embed(
+        cls, keyvals: list[KeyVal], model: EmbeddingModel = DEFAULT_EMBEDDING_MODEL
+    ) -> Embeddings:
         from litellm import embedding
         init_litellm()
-        data = {}
+        data: dict[Key, tuple[EmbValue, list[float]]] = {}
         log.info(
             "Embedding %d texts (model %s, batch size %s)…",
             len(keyvals),
@@ -76,21 +95,23 @@ class Embeddings:
             BATCH_SIZE,
         )
         for batch_start in range(0, len(keyvals), BATCH_SIZE):
-            batch_end = batch_start + BATCH_SIZE
-            batch = keyvals[batch_start:batch_end]
-            keys = [kv[0] for kv in batch]
-            texts = [kv[1] for kv in batch]
+            batch_end: int = batch_start + BATCH_SIZE
+            batch: list[KeyVal] = keyvals[batch_start:batch_end]
+            keys: list[Key] = [kv.key for kv in batch]
+            texts: list[str] = [kv.value.emb_text for kv in batch]
             response = embedding(model=model.litellm_name, input=texts)
             if not response.data:
                 raise ValueError("No embedding response data")
-            batch_embeddings = [e["embedding"] for e in response.data]
+            batch_embeddings: list[list[float]] = [e["embedding"] for e in response.data]
             data.update(
                 {
-                    key: (text, emb)
-                    for key, text, emb in zip(keys, texts, batch_embeddings, strict=False)
+                    key: (emb_value, emb)
+                    for key, emb_value, emb in zip(
+                        keys, [kv.value for kv in batch], batch_embeddings, strict=False
+                    )
                 }
             )
@@ -110,32 +131,82 @@ class Embeddings:
     def read_from_csv(cls, path: Path) -> Embeddings:
         import pandas as pd
-        df = pd.read_csv(path)
+        df: pd.DataFrame = pd.read_csv(path)
         df["embedding"] = df["embedding"].apply(ast.literal_eval)
-        data = {row["key"]: (row["text"], row["embedding"]) for _, row in df.iterrows()}
-        return cls(data=data)  # pyright: ignore
+        # Handle missing data column just in case.
+        if "data" in df.columns:
+            df["data"] = df["data"].apply(lambda x: ast.literal_eval(x) if pd.notna(x) else None)
+        else:
+            df["data"] = None
+        data: dict[Key, tuple[EmbValue, list[float]]] = {}
+        for _, row in df.iterrows():
+            key = str(row["key"])
+            text = str(row["text"])
+            embedding = list(row["embedding"])
+            # Type-safe handling of data column
+            raw_data = row["data"] if "data" in df.columns else None
+            data_value: dict[str, Any] | None = (
+                raw_data if isinstance(raw_data, dict) or raw_data is None else None
+            )
+            data[key] = (
+                EmbValue(emb_text=text, data=data_value),
+                embedding,
+            )
+        return cls(data=data)
     def to_npz(self, path: Path) -> None:
         """Save embeddings in numpy's compressed format."""
         import numpy as np
         keys: list[Key] = list(self.data.keys())
-        texts: list[str] = [self.data[k][0] for k in keys]
+        texts: list[str] = [self.data[k][0].emb_text for k in keys]
+        # Serialize data as JSON strings
+        data_strings: list[str] = [
+            json.dumps(self.data[k][0].data) if self.data[k][0].data is not None else ""
+            for k in keys
+        ]
         embeddings = np.array([self.data[k][1] for k in keys])
-        np.savez_compressed(path, keys=keys, texts=texts, embeddings=embeddings)
+        np.savez_compressed(
+            path,
+            keys=keys,
+            texts=texts,
+            data=data_strings,
+            embeddings=embeddings,
+        )
     @classmethod
     def read_from_npz(cls, path: Path) -> Embeddings:
         """Load embeddings from numpy's compressed format."""
         import numpy as np
-        with np.load(path) as data:
-            loaded_data = {
-                k: (t, e.tolist())
-                for k, t, e in zip(data["keys"], data["texts"], data["embeddings"], strict=False)
-            }
+        with np.load(path) as npz_data:
+            if "data" in npz_data.files:
+                data_array = npz_data["data"]
+            else:
+                # No data column, so no data.
+                data_array = None
+            loaded_data: dict[Key, tuple[EmbValue, list[float]]] = {}
+            for i, (k, t, e) in enumerate(
+                zip(
+                    npz_data["keys"],
+                    npz_data["texts"],
+                    npz_data["embeddings"],
+                    strict=False,
+                )
+            ):
+                data_str = data_array[i] if data_array is not None else ""
+                loaded_data[k] = (
+                    EmbValue(emb_text=t, data=json.loads(data_str) if data_str else None),
+                    e.tolist(),
+                )
         return cls(data=loaded_data)
     def __str__(self) -> str:
-        dims = -1 if len(self.data) == 0 else len(next(iter(self.data))[1])
+        dims: int = -1 if len(self.data) == 0 else len(next(iter(self.data.values()))[1])
         return f"Embeddings({len(self.data)} items, {dims} dimensions)"
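
Reviewer note on the hunk above: `KeyVal` changes from a plain `(key, text)` tuple to a frozen dataclass wrapping an `EmbValue`, and `to_npz` / `read_from_npz` now store the optional `data` dict as a JSON string, with an empty string standing in for `None`. The sketch below illustrates that shape and the round-trip. It is not the package's code: it assumes stdlib `dataclasses` in place of `pydantic.dataclasses`, leaves numpy out, and the helpers `upgrade_keyval`, `encode_data`, and `decode_data` are hypothetical names introduced only for illustration.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any


# Stand-ins for the classes added in the diff (stdlib dataclasses, not pydantic).
@dataclass(frozen=True)
class EmbValue:
    emb_text: str
    data: dict[str, Any] | None = None


@dataclass(frozen=True)
class KeyVal:
    key: str
    value: EmbValue


def upgrade_keyval(old: tuple[str, str]) -> KeyVal:
    """Hypothetical helper: wrap a 0.3.25-style (key, text) tuple in the new type."""
    key, text = old
    return KeyVal(key=key, value=EmbValue(emb_text=text))


def encode_data(value: EmbValue) -> str:
    """Same rule as the new to_npz: JSON text, or "" when there is no data."""
    return json.dumps(value.data) if value.data is not None else ""


def decode_data(data_str: str) -> dict[str, Any] | None:
    """Same rule as the new read_from_npz: an empty string decodes to None."""
    return json.loads(data_str) if data_str else None


if __name__ == "__main__":
    kv = upgrade_keyval(("docs/a.md", "some text to embed"))
    print(kv.value.emb_text, kv.value.data)

    with_data = EmbValue(emb_text="t", data={"title": "A"})
    print(decode_data(encode_data(with_data)))
```

Callers that previously built `("key", "text")` tuples or indexed `kv[0]` / `kv[1]` would need to switch to `KeyVal(key=..., value=EmbValue(...))` and attribute access. Under this encoding an empty dict still round-trips as `{}`, because its JSON form `"{}"` is a non-empty string.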

kash/embeddings/text_similarity.py CHANGED

@@ -52,8 +52,8 @@ def rank_by_relatedness(
     query_embedding = response.data[0]["embedding"]
     scored_strings = [
-        (key, text, relatedness_fn(query_embedding, emb))
-        for key, text, emb in embeddings.as_iterable()
+        (key, emb_value.emb_text, relatedness_fn(query_embedding, emb))
+        for key, emb_value, emb in embeddings.as_iterable()
     ]
     scored_strings.sort(key=lambda x: x[2], reverse=True)
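
Reviewer note on the hunk above: `as_iterable()` now yields `(key, EmbValue, embedding)` triples, so the ranking code reads the text via `emb_value.emb_text`. A minimal standalone sketch of that pattern follows. The cosine similarity function is an assumption chosen for illustration (the diff does not show what `relatedness_fn` defaults to), and `rank` and the stdlib `EmbValue` stand-in are hypothetical names, not the package's API.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Any


# Stand-in for the EmbValue class added in embeddings.py.
@dataclass(frozen=True)
class EmbValue:
    emb_text: str
    data: dict[str, Any] | None = None


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Assumed relatedness function; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(
    query_embedding: list[float],
    items: dict[str, tuple[EmbValue, list[float]]],
) -> list[tuple[str, str, float]]:
    """Score each item against the query and sort by descending score."""
    scored = [
        (key, emb_value.emb_text, cosine_similarity(query_embedding, emb))
        for key, (emb_value, emb) in items.items()
    ]
    scored.sort(key=lambda x: x[2], reverse=True)
    return scored


if __name__ == "__main__":
    items = {
        "a": (EmbValue("alpha"), [1.0, 0.0]),
        "b": (EmbValue("beta", {"source": "example"}), [0.0, 1.0]),
    }
    for key, text, score in rank([0.0, 2.0], items):
        print(key, text, round(score, 3))
```

The result tuples keep the old `(key, text, score)` shape, which matches the diff: only the unpacking inside the comprehension changes, so downstream consumers of the ranked list are unaffected.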
