contextpilot 0.3.1__tar.gz → 0.3.2__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- contextpilot-0.3.2/PKG-INFO +239 -0
- contextpilot-0.3.2/README.md +198 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/__init__.py +4 -1
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/context_index/index_construction.py +81 -9
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/server/http_client.py +31 -17
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/server/http_server.py +2 -1
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/server/live_index.py +324 -55
- contextpilot-0.3.2/contextpilot.egg-info/PKG-INFO +239 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/pyproject.toml +1 -1
- contextpilot-0.3.1/PKG-INFO +0 -137
- contextpilot-0.3.1/README.md +0 -96
- contextpilot-0.3.1/contextpilot.egg-info/PKG-INFO +0 -137
- {contextpilot-0.3.1 → contextpilot-0.3.2}/LICENSE +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/context_index/__init__.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/context_index/compute_distance_cpu.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/context_index/compute_distance_gpu.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/context_index/tree_nodes.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/context_ordering/__init__.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/context_ordering/inter_scheduler.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/context_ordering/intra_ordering.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/pipeline/__init__.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/pipeline/components.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/pipeline/multi_turn.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/pipeline/rag_pipeline.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/retriever/__init__.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/retriever/bm25.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/retriever/faiss_embedding.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/retriever/mem0_retriever.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/retriever/pageindex_retriever.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/server/__init__.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/server/conversation_tracker.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/server/eviction_heap.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/server/metadata.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/utils/__init__.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/utils/eval_metrics.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/utils/prompt_generator.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot/utils/tools.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot.egg-info/SOURCES.txt +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot.egg-info/dependency_links.txt +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot.egg-info/requires.txt +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/contextpilot.egg-info/top_level.txt +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/requirements.txt +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/setup.cfg +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/tests/test_context_index.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/tests/test_context_ordering.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/tests/test_cpu_distances.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/tests/test_deduplication.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/tests/test_gpu_distance_performance.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/tests/test_gpu_distances.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/tests/test_group_prefix_sharing.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/tests/test_incremental_build.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/tests/test_live_index.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/tests/test_mem0_integration.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/tests/test_multi_turn.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/tests/test_multi_turn_e2e.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/tests/test_pageindex_integration.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/tests/test_performance.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/tests/test_pipeline.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/tests/test_server_integration.py +0 -0
- {contextpilot-0.3.1 → contextpilot-0.3.2}/tests/test_utils.py +0 -0
|
@@ -0,0 +1,239 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: contextpilot
|
|
3
|
+
Version: 0.3.2
|
|
4
|
+
Summary: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse
|
|
5
|
+
Author: Yinsicheng Jiang, Chivier Humber
|
|
6
|
+
License: Apache-2.0
|
|
7
|
+
Project-URL: Homepage, https://github.com/SecretSettler/ContextPilot
|
|
8
|
+
Project-URL: Repository, https://github.com/SecretSettler/ContextPilot
|
|
9
|
+
Project-URL: Issues, https://github.com/SecretSettler/ContextPilot/issues
|
|
10
|
+
Keywords: rag,llm,context-reuse,kv-cache,retrieval-augmented-generation
|
|
11
|
+
Classifier: Development Status :: 4 - Beta
|
|
12
|
+
Classifier: Intended Audience :: Developers
|
|
13
|
+
Classifier: Intended Audience :: Science/Research
|
|
14
|
+
Classifier: License :: OSI Approved :: Apache Software License
|
|
15
|
+
Classifier: Programming Language :: Python :: 3
|
|
16
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
17
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
18
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
19
|
+
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
|
|
20
|
+
Requires-Python: >=3.10
|
|
21
|
+
Description-Content-Type: text/markdown
|
|
22
|
+
License-File: LICENSE
|
|
23
|
+
Requires-Dist: datasets
|
|
24
|
+
Requires-Dist: transformers
|
|
25
|
+
Requires-Dist: elasticsearch==8.18.1
|
|
26
|
+
Requires-Dist: aiohttp
|
|
27
|
+
Requires-Dist: ujson
|
|
28
|
+
Requires-Dist: scipy
|
|
29
|
+
Requires-Dist: fastapi[all]
|
|
30
|
+
Requires-Dist: cupy-cuda12x
|
|
31
|
+
Requires-Dist: pytest
|
|
32
|
+
Provides-Extra: dev
|
|
33
|
+
Requires-Dist: black; extra == "dev"
|
|
34
|
+
Requires-Dist: bumpver; extra == "dev"
|
|
35
|
+
Requires-Dist: isort; extra == "dev"
|
|
36
|
+
Requires-Dist: pip-tools; extra == "dev"
|
|
37
|
+
Requires-Dist: pytest; extra == "dev"
|
|
38
|
+
Requires-Dist: pytest-cov; extra == "dev"
|
|
39
|
+
Requires-Dist: ipython; extra == "dev"
|
|
40
|
+
Dynamic: license-file
|
|
41
|
+
|
|
42
|
+
<div align="center">
|
|
43
|
+
<img src="assets/about.png" alt="ContextPilot Logo" width="800"/>
|
|
44
|
+
|
|
45
|
+
<h1><strong>ContextPilot: Efficient Long Context Inference with Context Reuse</strong></h1>
|
|
46
|
+
|
|
47
|
+
[](https://www.python.org/)
|
|
48
|
+
[](https://pypi.org/project/contextpilot/)
|
|
49
|
+
[](LICENSE)
|
|
50
|
+
|
|
51
|
+
</div>
|
|
52
|
+
|
|
53
|
+
--------------------------------------------------------------------------------
|
|
54
|
+
|
|
55
|
+
| [**Documentation**](docs/README.md) | [**Examples**](examples/) | [**Benchmarks**](docs/reference/benchmarks.md) |
|
|
56
|
+
|
|
57
|
+
## News
|
|
58
|
+
|
|
59
|
+
- [2026/02] ContextPilot v0.3.2 released, supporting [PageIndex](https://github.com/VectifyAI/PageIndex) and [Mem0](https://github.com/mem0ai/mem0).
|
|
60
|
+
- [2026/01] ContextPilot has been accepted to MLSys 2026 🎉! See you in Bellevue, WA, USA.
|
|
61
|
+
- [2025/12] ContextPilot v0.2.0 released.
|
|
62
|
+
|
|
63
|
+
## About
|
|
64
|
+
|
|
65
|
+
ContextPilot is a fast optimization system on the context engineering layer for agentic workloads:
|
|
66
|
+
1. **High Throughput & Cache Hit Ratio**: Boosting prefill throughput and prefix cache hit ratio with intelligent context reuse.
|
|
67
|
+
2. **Strong Compatibility**: Strong compatibility with existing popular RAG libraries (PageIndex), Agentic memory layer (Mem0), KV cache optimization engine (LMCache), and Inference engines (vLLM and SGLang).
|
|
68
|
+
3. **Negligible Accuracy Loss**: Achieving significant performance improvements with minimal to no accuracy degradation across various benchmarks.
|
|
69
|
+
4. **Widely Tested**: Tested with a wide range of RAG and Agentic AI applications.
|
|
70
|
+
|
|
71
|
+
## Target Workloads
|
|
72
|
+
|
|
73
|
+
1. **Trending Topic QA** — Search and generation for breaking news and hot topics beyond model knowledge
|
|
74
|
+
2. **Closed-Domain Long-Context QA** — QA over specialized corpora (novels, financial reports, legal documents) with retrieval or in-context search
|
|
75
|
+
3. **Large-Batch Long-Context Execution** — High-throughput inference where many requests share overlapping contexts; ContextPilot maximizes prefix reuse regardless of the search method
|
|
76
|
+
4. **Multi-Turn Conversations with Long-Term Memory** — Persistent context reuse across turns (e.g. [Mem0](https://github.com/mem0ai/mem0))
|
|
77
|
+
|
|
78
|
+
## Benchmark and Performance
|
|
79
|
+
|
|
80
|
+
### System Performance
|
|
81
|
+
|
|
82
|
+
<div align="center">
|
|
83
|
+
<img src="assets/deepseek_r1_results.png" alt="Benchmark Results" width="600"/>
|
|
84
|
+
</div>
|
|
85
|
+
|
|
86
|
+
ContextPilot (Stateless) on DeepSeek-R1 maintains accuracy compared to SGLang, achieving 64.68% vs 64.15% F1 on MultihopRAG and 41.08% vs 40.20% F1 on NarrativeQA.
|
|
87
|
+
|
|
88
|
+
### Accuracy on MT-RAG Benchmark (Online Scheduling)
|
|
89
|
+
|
|
90
|
+
<div align="center">
|
|
91
|
+
|
|
92
|
+
| Method | Qwen3-4B | Llama3.1-8B | Qwen3-30B-A3B |
|
|
93
|
+
|--------|----------|-------------|-----------|
|
|
94
|
+
| LMCache | 62.56 | **68.46** | 75.12 |
|
|
95
|
+
| CacheBlend | 50.33 | 56.52 | X |
|
|
96
|
+
| RadixCache | 62.56 | **68.46** | 75.12 |
|
|
97
|
+
| **ContextPilot** | **64.27** | 68.12 | **75.81** |
|
|
98
|
+
|
|
99
|
+
</div>
|
|
100
|
+
|
|
101
|
+
ContextPilot delivers **4-13x** improvements in cache hit rates and **1.5-3.5x** reductions in prefill latency for large-batch RAG workloads, while maintaining or improving accuracy.
|
|
102
|
+
|
|
103
|
+
**Furthermore**, ContextPilot has been tested to reduce input token costs by around **36%** with GPT-5.2.
|
|
104
|
+
|
|
105
|
+
See [Benchmarks](docs/reference/benchmarks.md) in the documentation for GPU vs CPU performance analysis and detailed benchmark methodology.
|
|
106
|
+
|
|
107
|
+
## Getting Started
|
|
108
|
+
|
|
109
|
+
### Installation
|
|
110
|
+
|
|
111
|
+
**Requirements:** Python >= 3.10
|
|
112
|
+
|
|
113
|
+
```bash
|
|
114
|
+
pip install contextpilot
|
|
115
|
+
```
|
|
116
|
+
|
|
117
|
+
Or install from source:
|
|
118
|
+
```bash
|
|
119
|
+
git clone https://github.com/Edinburgh-AgenticAI/ContextPilot.git
|
|
120
|
+
cd ContextPilot
|
|
121
|
+
pip install -e .
|
|
122
|
+
```
|
|
123
|
+
|
|
124
|
+
More [detailed installation instructions](docs/getting_started/installation.md) are available in the docs.
|
|
125
|
+
|
|
126
|
+
### Quick Start
|
|
127
|
+
|
|
128
|
+
**Offline / Online Stateless** — build index & schedule in one shot:
|
|
129
|
+
|
|
130
|
+
```python
|
|
131
|
+
from openai import OpenAI
|
|
132
|
+
import contextpilot as cp
|
|
133
|
+
|
|
134
|
+
client = OpenAI(base_url="http://localhost:30000/v1", api_key="...") # Your inference engine URL and API key
|
|
135
|
+
|
|
136
|
+
queries = ["What is AI?", "Explain neural networks", "What is deep learning?"]
|
|
137
|
+
all_contexts = [
|
|
138
|
+
["Doc about AI", "Doc about ML", "Doc about computing"],
|
|
139
|
+
["Doc about neural nets", "Doc about deep learning"],
|
|
140
|
+
["Doc about ML", "Doc about AI", "Doc about deep learning basics"],
|
|
141
|
+
]
|
|
142
|
+
|
|
143
|
+
# Build index and schedule for prefix sharing
|
|
144
|
+
index = cp.build_context_index(all_contexts, use_gpu=False)
|
|
145
|
+
reordered, _, order, _ = cp.InterContextScheduler().schedule_contexts(index)
|
|
146
|
+
|
|
147
|
+
# Send in optimized order — shared prefixes hit KV cache
|
|
148
|
+
for ctx, orig_idx in zip(reordered, order):
|
|
149
|
+
docs_section = "\n".join(f"[{i+1}] {doc}" for i, doc in enumerate(ctx))
|
|
150
|
+
# Importance ranking restores original retrieval order for the model
|
|
151
|
+
importance_ranking = ">".join(
|
|
152
|
+
str(ctx.index(doc) + 1) for doc in all_contexts[orig_idx] if doc in ctx
|
|
153
|
+
)
|
|
154
|
+
response = client.chat.completions.create(
|
|
155
|
+
model="Qwen/Qwen3-4B",
|
|
156
|
+
messages=[
|
|
157
|
+
{"role": "system", "content": (
|
|
158
|
+
f"Answer the question based on the provided documents.\n\n"
|
|
159
|
+
f"<documents>\n{docs_section}\n</documents>\n\n"
|
|
160
|
+
f"Read the documents in this importance ranking: {importance_ranking}\n"
|
|
161
|
+
f"Prioritize information from higher-ranked documents."
|
|
162
|
+
)},
|
|
163
|
+
{"role": "user", "content": queries[orig_idx]},
|
|
164
|
+
],
|
|
165
|
+
)
|
|
166
|
+
print(f"Q: {queries[orig_idx]}\nA: {response.choices[0].message.content}\n")
|
|
167
|
+
```
|
|
168
|
+
|
|
169
|
+
> For online stateless scheduling via HTTP server, see the [online usage guide](docs/guides/online_usage.md).
|
|
170
|
+
|
|
171
|
+
**Stateful** — `LiveContextIndex` tracks cached state:
|
|
172
|
+
|
|
173
|
+
```python
|
|
174
|
+
from openai import OpenAI
|
|
175
|
+
import contextpilot as cp
|
|
176
|
+
|
|
177
|
+
client = OpenAI(base_url="http://localhost:30000/v1", api_key="...")
|
|
178
|
+
live = cp.LiveContextIndex(use_gpu=False)
|
|
179
|
+
|
|
180
|
+
# Simulate multi-turn: each turn has batch_size=1
|
|
181
|
+
turns = [
|
|
182
|
+
{
|
|
183
|
+
"query": "What is AI?",
|
|
184
|
+
"contexts": [["Doc about AI", "Doc about ML", "Doc about computing"]],
|
|
185
|
+
},
|
|
186
|
+
{
|
|
187
|
+
"query": "Compare supervised and unsupervised learning",
|
|
188
|
+
# 2 of 3 docs overlap with Turn 1 ("Doc about AI", "Doc about ML"), different order + 1 new doc
|
|
189
|
+
"contexts": [["Doc about ML", "Doc about clustering", "Doc about AI"]],
|
|
190
|
+
},
|
|
191
|
+
]
|
|
192
|
+
|
|
193
|
+
for turn_idx, turn in enumerate(turns):
|
|
194
|
+
contexts = turn["contexts"]
|
|
195
|
+
query = turn["query"]
|
|
196
|
+
|
|
197
|
+
# build_incremental handles both cold start and incremental turns
|
|
198
|
+
result = live.build_incremental(contexts)
|
|
199
|
+
reordered = result['reordered_contexts']
|
|
200
|
+
# Turn 2: reordered to ["Doc about AI", "Doc about ML", "Doc about clustering"]
|
|
201
|
+
# ^— shared prefix from Turn 1 —^ ^— new doc appended
|
|
202
|
+
|
|
203
|
+
ctx = reordered[0]
|
|
204
|
+
docs_section = "\n".join(f"[{i+1}] {doc}" for i, doc in enumerate(ctx))
|
|
205
|
+
importance_ranking = ">".join(
|
|
206
|
+
str(ctx.index(doc) + 1) for doc in contexts[0] if doc in ctx
|
|
207
|
+
)
|
|
208
|
+
response = client.chat.completions.create(
|
|
209
|
+
model="Qwen/Qwen3-4B",
|
|
210
|
+
messages=[
|
|
211
|
+
{"role": "system", "content": (
|
|
212
|
+
f"Answer the question based on the provided documents.\n\n"
|
|
213
|
+
f"<documents>\n{docs_section}\n</documents>\n\n"
|
|
214
|
+
f"Read the documents in this importance ranking: {importance_ranking}\n"
|
|
215
|
+
f"Prioritize information from higher-ranked documents."
|
|
216
|
+
)},
|
|
217
|
+
{"role": "user", "content": query},
|
|
218
|
+
],
|
|
219
|
+
)
|
|
220
|
+
print(f"[Turn {turn_idx+1}] Q: {query}")
|
|
221
|
+
print(f"A: {response.choices[0].message.content}\n")
|
|
222
|
+
```
|
|
223
|
+
|
|
224
|
+
> **Note:** Stateful mode works without eviction sync — `LiveContextIndex` tracks the previous ordering and reorders new contexts to maximize prefix cache hits. For production deployments with limited storage size where the KV cache may evict entries, install the [SGLang eviction patch](docs/guides/online_usage.md#sglang-integration) to keep the index in sync. See the [online usage guide](docs/guides/online_usage.md) for HTTP server setup.
|
|
225
|
+
|
|
226
|
+
## Documentation
|
|
227
|
+
|
|
228
|
+
Check out the ContextPilot [documentation](docs/README.md) for comprehensive guides.
|
|
229
|
+
|
|
230
|
+
## Examples
|
|
231
|
+
|
|
232
|
+
Go hands-on with our [examples](examples/), demonstrating how to address different use cases with ContextPilot.
|
|
233
|
+
|
|
234
|
+
## Contributing
|
|
235
|
+
|
|
236
|
+
We welcome and value all contributions! Please feel free to submit issues and pull requests.
|
|
237
|
+
|
|
238
|
+
## Citation
|
|
239
|
+
We will include the paper citation soon!
|
|
@@ -0,0 +1,198 @@
|
|
|
1
|
+
<div align="center">
|
|
2
|
+
<img src="assets/about.png" alt="ContextPilot Logo" width="800"/>
|
|
3
|
+
|
|
4
|
+
<h1><strong>ContextPilot: Efficient Long Context Inference with Context Reuse</strong></h1>
|
|
5
|
+
|
|
6
|
+
[](https://www.python.org/)
|
|
7
|
+
[](https://pypi.org/project/contextpilot/)
|
|
8
|
+
[](LICENSE)
|
|
9
|
+
|
|
10
|
+
</div>
|
|
11
|
+
|
|
12
|
+
--------------------------------------------------------------------------------
|
|
13
|
+
|
|
14
|
+
| [**Documentation**](docs/README.md) | [**Examples**](examples/) | [**Benchmarks**](docs/reference/benchmarks.md) |
|
|
15
|
+
|
|
16
|
+
## News
|
|
17
|
+
|
|
18
|
+
- [2026/02] ContextPilot v0.3.2 released, supporting [PageIndex](https://github.com/VectifyAI/PageIndex) and [Mem0](https://github.com/mem0ai/mem0).
|
|
19
|
+
- [2026/01] ContextPilot has been accepted to MLSys 2026 🎉! See you in Bellevue, WA, USA.
|
|
20
|
+
- [2025/12] ContextPilot v0.2.0 released.
|
|
21
|
+
|
|
22
|
+
## About
|
|
23
|
+
|
|
24
|
+
ContextPilot is a fast optimization system on the context engineering layer for agentic workloads:
|
|
25
|
+
1. **High Throughput & Cache Hit Ratio**: Boosting prefill throughput and prefix cache hit ratio with intelligent context reuse.
|
|
26
|
+
2. **Strong Compatibility**: Strong compatibility with existing popular RAG libraries (PageIndex), Agentic memory layer (Mem0), KV cache optimization engine (LMCache), and Inference engines (vLLM and SGLang).
|
|
27
|
+
3. **Negligible Accuracy Loss**: Achieving significant performance improvements with minimal to no accuracy degradation across various benchmarks.
|
|
28
|
+
4. **Widely Tested**: Tested with a wide range of RAG and Agentic AI applications.
|
|
29
|
+
|
|
30
|
+
## Target Workloads
|
|
31
|
+
|
|
32
|
+
1. **Trending Topic QA** — Search and generation for breaking news and hot topics beyond model knowledge
|
|
33
|
+
2. **Closed-Domain Long-Context QA** — QA over specialized corpora (novels, financial reports, legal documents) with retrieval or in-context search
|
|
34
|
+
3. **Large-Batch Long-Context Execution** — High-throughput inference where many requests share overlapping contexts; ContextPilot maximizes prefix reuse regardless of the search method
|
|
35
|
+
4. **Multi-Turn Conversations with Long-Term Memory** — Persistent context reuse across turns (e.g. [Mem0](https://github.com/mem0ai/mem0))
|
|
36
|
+
|
|
37
|
+
## Benchmark and Performance
|
|
38
|
+
|
|
39
|
+
### System Performance
|
|
40
|
+
|
|
41
|
+
<div align="center">
|
|
42
|
+
<img src="assets/deepseek_r1_results.png" alt="Benchmark Results" width="600"/>
|
|
43
|
+
</div>
|
|
44
|
+
|
|
45
|
+
ContextPilot (Stateless) on DeepSeek-R1 maintains accuracy compared to SGLang, achieving 64.68% vs 64.15% F1 on MultihopRAG and 41.08% vs 40.20% F1 on NarrativeQA.
|
|
46
|
+
|
|
47
|
+
### Accuracy on MT-RAG Benchmark (Online Scheduling)
|
|
48
|
+
|
|
49
|
+
<div align="center">
|
|
50
|
+
|
|
51
|
+
| Method | Qwen3-4B | Llama3.1-8B | Qwen3-30B-A3B |
|
|
52
|
+
|--------|----------|-------------|-----------|
|
|
53
|
+
| LMCache | 62.56 | **68.46** | 75.12 |
|
|
54
|
+
| CacheBlend | 50.33 | 56.52 | X |
|
|
55
|
+
| RadixCache | 62.56 | **68.46** | 75.12 |
|
|
56
|
+
| **ContextPilot** | **64.27** | 68.12 | **75.81** |
|
|
57
|
+
|
|
58
|
+
</div>
|
|
59
|
+
|
|
60
|
+
ContextPilot delivers **4-13x** improvements in cache hit rates and **1.5-3.5x** reductions in prefill latency for large-batch RAG workloads, while maintaining or improving accuracy.
|
|
61
|
+
|
|
62
|
+
**Furthermore**, ContextPilot has been tested to reduce input token costs by around **36%** with GPT-5.2.
|
|
63
|
+
|
|
64
|
+
See [Benchmarks](docs/reference/benchmarks.md) in the documentation for GPU vs CPU performance analysis and detailed benchmark methodology.
|
|
65
|
+
|
|
66
|
+
## Getting Started
|
|
67
|
+
|
|
68
|
+
### Installation
|
|
69
|
+
|
|
70
|
+
**Requirements:** Python >= 3.10
|
|
71
|
+
|
|
72
|
+
```bash
|
|
73
|
+
pip install contextpilot
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
Or install from source:
|
|
77
|
+
```bash
|
|
78
|
+
git clone https://github.com/Edinburgh-AgenticAI/ContextPilot.git
|
|
79
|
+
cd ContextPilot
|
|
80
|
+
pip install -e .
|
|
81
|
+
```
|
|
82
|
+
|
|
83
|
+
More [detailed installation instructions](docs/getting_started/installation.md) are available in the docs.
|
|
84
|
+
|
|
85
|
+
### Quick Start
|
|
86
|
+
|
|
87
|
+
**Offline / Online Stateless** — build index & schedule in one shot:
|
|
88
|
+
|
|
89
|
+
```python
|
|
90
|
+
from openai import OpenAI
|
|
91
|
+
import contextpilot as cp
|
|
92
|
+
|
|
93
|
+
client = OpenAI(base_url="http://localhost:30000/v1", api_key="...") # Your inference engine URL and API key
|
|
94
|
+
|
|
95
|
+
queries = ["What is AI?", "Explain neural networks", "What is deep learning?"]
|
|
96
|
+
all_contexts = [
|
|
97
|
+
["Doc about AI", "Doc about ML", "Doc about computing"],
|
|
98
|
+
["Doc about neural nets", "Doc about deep learning"],
|
|
99
|
+
["Doc about ML", "Doc about AI", "Doc about deep learning basics"],
|
|
100
|
+
]
|
|
101
|
+
|
|
102
|
+
# Build index and schedule for prefix sharing
|
|
103
|
+
index = cp.build_context_index(all_contexts, use_gpu=False)
|
|
104
|
+
reordered, _, order, _ = cp.InterContextScheduler().schedule_contexts(index)
|
|
105
|
+
|
|
106
|
+
# Send in optimized order — shared prefixes hit KV cache
|
|
107
|
+
for ctx, orig_idx in zip(reordered, order):
|
|
108
|
+
docs_section = "\n".join(f"[{i+1}] {doc}" for i, doc in enumerate(ctx))
|
|
109
|
+
# Importance ranking restores original retrieval order for the model
|
|
110
|
+
importance_ranking = ">".join(
|
|
111
|
+
str(ctx.index(doc) + 1) for doc in all_contexts[orig_idx] if doc in ctx
|
|
112
|
+
)
|
|
113
|
+
response = client.chat.completions.create(
|
|
114
|
+
model="Qwen/Qwen3-4B",
|
|
115
|
+
messages=[
|
|
116
|
+
{"role": "system", "content": (
|
|
117
|
+
f"Answer the question based on the provided documents.\n\n"
|
|
118
|
+
f"<documents>\n{docs_section}\n</documents>\n\n"
|
|
119
|
+
f"Read the documents in this importance ranking: {importance_ranking}\n"
|
|
120
|
+
f"Prioritize information from higher-ranked documents."
|
|
121
|
+
)},
|
|
122
|
+
{"role": "user", "content": queries[orig_idx]},
|
|
123
|
+
],
|
|
124
|
+
)
|
|
125
|
+
print(f"Q: {queries[orig_idx]}\nA: {response.choices[0].message.content}\n")
|
|
126
|
+
```
|
|
127
|
+
|
|
128
|
+
> For online stateless scheduling via HTTP server, see the [online usage guide](docs/guides/online_usage.md).
|
|
129
|
+
|
|
130
|
+
**Stateful** — `LiveContextIndex` tracks cached state:
|
|
131
|
+
|
|
132
|
+
```python
|
|
133
|
+
from openai import OpenAI
|
|
134
|
+
import contextpilot as cp
|
|
135
|
+
|
|
136
|
+
client = OpenAI(base_url="http://localhost:30000/v1", api_key="...")
|
|
137
|
+
live = cp.LiveContextIndex(use_gpu=False)
|
|
138
|
+
|
|
139
|
+
# Simulate multi-turn: each turn has batch_size=1
|
|
140
|
+
turns = [
|
|
141
|
+
{
|
|
142
|
+
"query": "What is AI?",
|
|
143
|
+
"contexts": [["Doc about AI", "Doc about ML", "Doc about computing"]],
|
|
144
|
+
},
|
|
145
|
+
{
|
|
146
|
+
"query": "Compare supervised and unsupervised learning",
|
|
147
|
+
# 2 of 3 docs overlap with Turn 1 ("Doc about AI", "Doc about ML"), different order + 1 new doc
|
|
148
|
+
"contexts": [["Doc about ML", "Doc about clustering", "Doc about AI"]],
|
|
149
|
+
},
|
|
150
|
+
]
|
|
151
|
+
|
|
152
|
+
for turn_idx, turn in enumerate(turns):
|
|
153
|
+
contexts = turn["contexts"]
|
|
154
|
+
query = turn["query"]
|
|
155
|
+
|
|
156
|
+
# build_incremental handles both cold start and incremental turns
|
|
157
|
+
result = live.build_incremental(contexts)
|
|
158
|
+
reordered = result['reordered_contexts']
|
|
159
|
+
# Turn 2: reordered to ["Doc about AI", "Doc about ML", "Doc about clustering"]
|
|
160
|
+
# ^— shared prefix from Turn 1 —^ ^— new doc appended
|
|
161
|
+
|
|
162
|
+
ctx = reordered[0]
|
|
163
|
+
docs_section = "\n".join(f"[{i+1}] {doc}" for i, doc in enumerate(ctx))
|
|
164
|
+
importance_ranking = ">".join(
|
|
165
|
+
str(ctx.index(doc) + 1) for doc in contexts[0] if doc in ctx
|
|
166
|
+
)
|
|
167
|
+
response = client.chat.completions.create(
|
|
168
|
+
model="Qwen/Qwen3-4B",
|
|
169
|
+
messages=[
|
|
170
|
+
{"role": "system", "content": (
|
|
171
|
+
f"Answer the question based on the provided documents.\n\n"
|
|
172
|
+
f"<documents>\n{docs_section}\n</documents>\n\n"
|
|
173
|
+
f"Read the documents in this importance ranking: {importance_ranking}\n"
|
|
174
|
+
f"Prioritize information from higher-ranked documents."
|
|
175
|
+
)},
|
|
176
|
+
{"role": "user", "content": query},
|
|
177
|
+
],
|
|
178
|
+
)
|
|
179
|
+
print(f"[Turn {turn_idx+1}] Q: {query}")
|
|
180
|
+
print(f"A: {response.choices[0].message.content}\n")
|
|
181
|
+
```
|
|
182
|
+
|
|
183
|
+
> **Note:** Stateful mode works without eviction sync — `LiveContextIndex` tracks the previous ordering and reorders new contexts to maximize prefix cache hits. For production deployments with limited storage size where the KV cache may evict entries, install the [SGLang eviction patch](docs/guides/online_usage.md#sglang-integration) to keep the index in sync. See the [online usage guide](docs/guides/online_usage.md) for HTTP server setup.
|
|
184
|
+
|
|
185
|
+
## Documentation
|
|
186
|
+
|
|
187
|
+
Check out the ContextPilot [documentation](docs/README.md) for comprehensive guides.
|
|
188
|
+
|
|
189
|
+
## Examples
|
|
190
|
+
|
|
191
|
+
Go hands-on with our [examples](examples/), demonstrating how to address different use cases with ContextPilot.
|
|
192
|
+
|
|
193
|
+
## Contributing
|
|
194
|
+
|
|
195
|
+
We welcome and value all contributions! Please feel free to submit issues and pull requests.
|
|
196
|
+
|
|
197
|
+
## Citation
|
|
198
|
+
We will include the paper citation soon!
|
|
@@ -38,6 +38,8 @@ from .context_ordering import (
|
|
|
38
38
|
InterContextScheduler,
|
|
39
39
|
)
|
|
40
40
|
|
|
41
|
+
from .server.live_index import LiveContextIndex
|
|
42
|
+
|
|
41
43
|
from .retriever import (
|
|
42
44
|
BM25Retriever,
|
|
43
45
|
FAISSRetriever,
|
|
@@ -47,7 +49,7 @@ from .retriever import (
|
|
|
47
49
|
MEM0_AVAILABLE,
|
|
48
50
|
)
|
|
49
51
|
|
|
50
|
-
__version__ = "0.3.
|
|
52
|
+
__version__ = "0.3.2"
|
|
51
53
|
|
|
52
54
|
__all__ = [
|
|
53
55
|
# High-level pipeline API
|
|
@@ -63,6 +65,7 @@ __all__ = [
|
|
|
63
65
|
'build_context_index',
|
|
64
66
|
'IntraContextOrderer',
|
|
65
67
|
'InterContextScheduler',
|
|
68
|
+
'LiveContextIndex',
|
|
66
69
|
|
|
67
70
|
# Retrievers
|
|
68
71
|
'BM25Retriever',
|
|
@@ -96,6 +96,12 @@ class ContextIndex:
|
|
|
96
96
|
self.node_manager = NodeManager()
|
|
97
97
|
self.context_orderer = IntraContextOrderer()
|
|
98
98
|
|
|
99
|
+
# String-to-int mapping (auto-populated when string inputs are given)
|
|
100
|
+
self._str_to_id: dict = {}
|
|
101
|
+
self._id_to_str: dict = {}
|
|
102
|
+
self._next_str_id: int = 0
|
|
103
|
+
self._is_string_input: bool = False
|
|
104
|
+
|
|
99
105
|
if self.use_gpu:
|
|
100
106
|
print("Using GPU for distance computation")
|
|
101
107
|
else:
|
|
@@ -104,16 +110,48 @@ class ContextIndex:
|
|
|
104
110
|
else:
|
|
105
111
|
print("Using CPU for distance computation")
|
|
106
112
|
|
|
107
|
-
def
|
|
113
|
+
def _convert_to_int(self, contexts):
|
|
114
|
+
"""Convert string contexts to integer IDs if needed."""
|
|
115
|
+
if not contexts or not contexts[0]:
|
|
116
|
+
return contexts
|
|
117
|
+
if isinstance(contexts[0][0], str):
|
|
118
|
+
self._is_string_input = True
|
|
119
|
+
converted = []
|
|
120
|
+
for ctx in contexts:
|
|
121
|
+
converted_ctx = []
|
|
122
|
+
for item in ctx:
|
|
123
|
+
sid = self._str_to_id.get(item)
|
|
124
|
+
if sid is None:
|
|
125
|
+
sid = self._next_str_id
|
|
126
|
+
self._str_to_id[item] = sid
|
|
127
|
+
self._id_to_str[sid] = item
|
|
128
|
+
self._next_str_id += 1
|
|
129
|
+
converted_ctx.append(sid)
|
|
130
|
+
converted.append(converted_ctx)
|
|
131
|
+
return converted
|
|
132
|
+
return contexts
|
|
133
|
+
|
|
134
|
+
def _convert_to_str(self, contexts):
|
|
135
|
+
"""Convert integer contexts back to strings if input was strings."""
|
|
136
|
+
if not self._is_string_input or not contexts:
|
|
137
|
+
return contexts
|
|
138
|
+
# Skip if already converted (e.g. from fit_transform output)
|
|
139
|
+
if contexts[0] and isinstance(contexts[0][0], str):
|
|
140
|
+
return contexts
|
|
141
|
+
return [[self._id_to_str[i] for i in ctx] for ctx in contexts]
|
|
142
|
+
|
|
143
|
+
def fit_transform(self, contexts) -> IndexResult:
|
|
108
144
|
"""
|
|
109
145
|
Perform clustering and return results.
|
|
110
146
|
|
|
111
147
|
Args:
|
|
112
|
-
contexts: List of contexts, where each
|
|
148
|
+
contexts: List of contexts, where each context is a list of chunk IDs (int) or strings.
|
|
149
|
+
String inputs are automatically converted to integer IDs.
|
|
113
150
|
|
|
114
151
|
Returns:
|
|
115
152
|
IndexResult object containing clustering results
|
|
116
153
|
"""
|
|
154
|
+
contexts = self._convert_to_int(contexts)
|
|
117
155
|
n = len(contexts)
|
|
118
156
|
|
|
119
157
|
if n < 2:
|
|
@@ -194,14 +232,41 @@ class ContextIndex:
|
|
|
194
232
|
)
|
|
195
233
|
|
|
196
234
|
def _handle_single_prompt(self, contexts: List[List[int]]) -> IndexResult:
|
|
197
|
-
"""Handle case with less than 2 contexts.
|
|
235
|
+
"""Handle case with less than 2 contexts.
|
|
236
|
+
|
|
237
|
+
Always creates an empty root node above the leaf(s) so that
|
|
238
|
+
leaf.is_root is never True. This prevents the root-exclusion
|
|
239
|
+
guard in build_incremental from skipping legitimate matches.
|
|
240
|
+
"""
|
|
198
241
|
for i, prompt in enumerate(contexts):
|
|
199
|
-
self.node_manager.create_leaf_node(i, prompt)
|
|
242
|
+
node = self.node_manager.create_leaf_node(i, prompt)
|
|
243
|
+
# ClusterNode.__init__ sets doc_ids = sorted(content).
|
|
244
|
+
# Override to preserve the original context order so that
|
|
245
|
+
# build_incremental can use it as a correct prefix for Turn 2.
|
|
246
|
+
node.doc_ids = list(prompt)
|
|
247
|
+
|
|
248
|
+
# Wrap leaf node(s) under an empty root so that no leaf is the root.
|
|
249
|
+
# This mirrors the virtual-root logic in update_search_paths for forests,
|
|
250
|
+
# but applies it even for a single leaf.
|
|
251
|
+
leaf_ids = list(self.node_manager.unique_nodes.keys())
|
|
252
|
+
virtual_root_id = max(leaf_ids) + 1 if leaf_ids else 0
|
|
253
|
+
virtual_root = ClusterNode(
|
|
254
|
+
node_id=virtual_root_id,
|
|
255
|
+
content=set(),
|
|
256
|
+
original_indices=set(),
|
|
257
|
+
distance=0.0,
|
|
258
|
+
children=leaf_ids,
|
|
259
|
+
parent=None,
|
|
260
|
+
frequency=sum(self.node_manager.unique_nodes[nid].frequency for nid in leaf_ids)
|
|
261
|
+
)
|
|
262
|
+
self.node_manager.unique_nodes[virtual_root_id] = virtual_root
|
|
263
|
+
for nid in leaf_ids:
|
|
264
|
+
self.node_manager.unique_nodes[nid].parent = virtual_root_id
|
|
200
265
|
|
|
201
|
-
# Update search paths
|
|
266
|
+
# Update search paths (now a proper rooted tree)
|
|
202
267
|
self.node_manager.update_search_paths()
|
|
203
268
|
|
|
204
|
-
# For single context, extract search paths
|
|
269
|
+
# For single context, extract search paths
|
|
205
270
|
search_paths = self.context_orderer.extract_search_paths(
|
|
206
271
|
self.node_manager.unique_nodes, len(contexts)
|
|
207
272
|
)
|
|
@@ -233,7 +298,7 @@ class ContextIndex:
|
|
|
233
298
|
|
|
234
299
|
|
|
235
300
|
# Convenience function for backward compatibility
|
|
236
|
-
def build_context_index(contexts
|
|
301
|
+
def build_context_index(contexts,
|
|
237
302
|
linkage_method: str = "average",
|
|
238
303
|
use_gpu: bool = True,
|
|
239
304
|
alpha: float = 0.005,
|
|
@@ -243,7 +308,7 @@ def build_context_index(contexts: List[List[int]],
|
|
|
243
308
|
Convenience function for building a context index.
|
|
244
309
|
|
|
245
310
|
Args:
|
|
246
|
-
contexts: List of contexts, where each
|
|
311
|
+
contexts: List of contexts, where each context is a list of chunk IDs (int) or strings
|
|
247
312
|
linkage_method: Linkage method for hierarchical clustering
|
|
248
313
|
use_gpu: Whether to use GPU for distance computation
|
|
249
314
|
alpha: Weight for position term in distance calculation
|
|
@@ -260,4 +325,11 @@ def build_context_index(contexts: List[List[int]],
|
|
|
260
325
|
num_workers=num_workers,
|
|
261
326
|
batch_size=batch_size
|
|
262
327
|
)
|
|
263
|
-
|
|
328
|
+
result = indexer.fit_transform(contexts)
|
|
329
|
+
# Convert back to strings at the API boundary if input was strings
|
|
330
|
+
if indexer._is_string_input:
|
|
331
|
+
result.reordered_contexts = indexer._convert_to_str(result.reordered_contexts)
|
|
332
|
+
result.original_contexts = indexer._convert_to_str(result.original_contexts)
|
|
333
|
+
result.reordered_prompts = result.reordered_contexts
|
|
334
|
+
result.original_prompts = result.original_contexts
|
|
335
|
+
return result
|
|
@@ -28,11 +28,11 @@ class ContextPilotIndexClient:
|
|
|
28
28
|
Example usage in SGLang:
|
|
29
29
|
# In scheduler initialization:
|
|
30
30
|
self.contextpilot_client = ContextPilotIndexClient("http://localhost:8765")
|
|
31
|
-
|
|
32
|
-
# In eviction
|
|
33
|
-
def
|
|
34
|
-
|
|
35
|
-
self.contextpilot_client.evict(
|
|
31
|
+
|
|
32
|
+
# In eviction callback:
|
|
33
|
+
def on_cache_evict(self, evicted_request_ids):
|
|
34
|
+
# Sync eviction with ContextPilot index
|
|
35
|
+
self.contextpilot_client.evict(evicted_request_ids)
|
|
36
36
|
"""
|
|
37
37
|
|
|
38
38
|
def __init__(
|
|
@@ -92,19 +92,23 @@ class ContextPilotIndexClient:
|
|
|
92
92
|
logger.warning(f"ContextPilot index request failed: {e}")
|
|
93
93
|
return None
|
|
94
94
|
|
|
95
|
-
def evict(self,
|
|
95
|
+
def evict(self, request_ids: List[str]) -> Optional[Dict[str, Any]]:
|
|
96
96
|
"""
|
|
97
|
-
Evict
|
|
98
|
-
|
|
97
|
+
Evict requests from the index.
|
|
98
|
+
|
|
99
99
|
THIS IS THE MAIN METHOD THAT SGLANG SHOULD CALL FOR EVICTION SYNC.
|
|
100
|
-
|
|
100
|
+
|
|
101
101
|
Args:
|
|
102
|
-
|
|
103
|
-
|
|
102
|
+
request_ids: List of request IDs to evict (from SGLang's cache eviction)
|
|
103
|
+
|
|
104
104
|
Returns:
|
|
105
|
-
Dictionary with eviction results
|
|
105
|
+
Dictionary with eviction results:
|
|
106
|
+
- removed_count: Number of requests successfully removed
|
|
107
|
+
- not_found: List of request IDs that were not in the index
|
|
108
|
+
- conversations_cleared: Number of conversation chains cleared
|
|
109
|
+
Returns None if request failed
|
|
106
110
|
"""
|
|
107
|
-
return self._post("/evict", {"
|
|
111
|
+
return self._post("/evict", {"request_ids": request_ids})
|
|
108
112
|
|
|
109
113
|
def search(
|
|
110
114
|
self,
|
|
@@ -327,16 +331,26 @@ class ContextPilotIndexClient:
|
|
|
327
331
|
|
|
328
332
|
# Convenience functions for simple usage
|
|
329
333
|
|
|
330
|
-
def
|
|
334
|
+
def evict_requests(
|
|
335
|
+
request_ids: List[str],
|
|
336
|
+
server_url: str = "http://localhost:8765"
|
|
337
|
+
) -> Optional[Dict[str, Any]]:
|
|
331
338
|
"""
|
|
332
|
-
Simple function to evict
|
|
333
|
-
|
|
339
|
+
Simple function to evict requests from the index.
|
|
340
|
+
|
|
334
341
|
For one-off calls without maintaining a client instance.
|
|
342
|
+
|
|
343
|
+
Args:
|
|
344
|
+
request_ids: List of request IDs to evict
|
|
345
|
+
server_url: ContextPilot server URL
|
|
346
|
+
|
|
347
|
+
Returns:
|
|
348
|
+
Dictionary with removed_count, not_found, conversations_cleared
|
|
335
349
|
"""
|
|
336
350
|
try:
|
|
337
351
|
response = requests.post(
|
|
338
352
|
f"{server_url}/evict",
|
|
339
|
-
json={"
|
|
353
|
+
json={"request_ids": request_ids},
|
|
340
354
|
timeout=1.0
|
|
341
355
|
)
|
|
342
356
|
response.raise_for_status()
|