bit-ttt-engine 0.6.1-cp310-cp310-win_amd64.whl → 0.7.0-cp310-cp310-win_amd64.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- bit_ttt_engine-0.7.0.dist-info/METADATA +136 -0
- bit_ttt_engine-0.7.0.dist-info/RECORD +14 -0
- bit_ttt_engine-0.7.0.dist-info/entry_points.txt +2 -0
- bit_ttt_engine-0.7.0.dist-info/licenses/LICENSE +21 -0
- cortex_rust/__init__.py +21 -1
- cortex_rust/__main__.py +4 -0
- cortex_rust/__pycache__/__init__.cpython-310.pyc +0 -0
- cortex_rust/chat.py +196 -0
- cortex_rust/cli.py +381 -0
- cortex_rust/cortex_rust.cp310-win_amd64.pyd +0 -0
- cortex_rust/engine.py +253 -0
- cortex_rust/server.py +493 -0
- bit_ttt_engine-0.6.1.dist-info/METADATA +0 -92
- bit_ttt_engine-0.6.1.dist-info/RECORD +0 -8
- cortex_rust/__init__.pyi +0 -100
- cortex_rust/py.typed +0 -0
- /bit_ttt_engine-0.6.1.dist-info/licenses/LICENSE → /LICENSE +0 -0
- {bit_ttt_engine-0.6.1.dist-info → bit_ttt_engine-0.7.0.dist-info}/WHEEL +0 -0
bit_ttt_engine-0.7.0.dist-info/METADATA ADDED

@@ -0,0 +1,136 @@
+Metadata-Version: 2.4
+Name: bit-ttt-engine
+Version: 0.7.0
+Classifier: Development Status :: 4 - Beta
+Classifier: Programming Language :: Rust
+Classifier: Programming Language :: Python :: Implementation :: CPython
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Classifier: Operating System :: Microsoft :: Windows
+Requires-Dist: tokenizers>=0.19
+Requires-Dist: huggingface-hub>=0.20
+Requires-Dist: bit-ttt-engine[server] ; extra == 'all'
+Requires-Dist: fastapi>=0.100 ; extra == 'server'
+Requires-Dist: uvicorn>=0.20 ; extra == 'server'
+Requires-Dist: sse-starlette>=1.6 ; extra == 'server'
+Provides-Extra: all
+Provides-Extra: server
+License-File: LICENSE
+Summary: Fast local LLM inference with TTT (Test-Time Training) and LoRA — the model that learns while it runs
+Keywords: llm,inference,ttt,lora,gguf,quantization,cuda
+Home-Page: https://github.com/imonoonoko/Bit-TTT-Engine
+Author: imonoonoko
+License: MIT
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
+
+# 🧠 Bit-TTT-Engine
+
+[](https://pypi.org/project/bit-ttt-engine/)
+[](LICENSE)
+[](https://www.rust-lang.org/)
+
+**Fast local LLM inference that learns while it runs.**
+
+- 🏎️ **47+ tok/s** on RTX 4060 Ti (7B Q4_K_M)
+- 🧠 **TTT** (Test-Time Training) — adapts during inference (world's first!)
+- 🎨 **LoRA** — fine-tune with one flag
+- 📦 **5 models** — Llama-2/3, Gemma-2, Qwen2.5, Mistral
+- 🔌 **OpenAI-compatible API** — drop-in replacement
+
+## 🚀 Quick Start
+
+```bash
+pip install bit-ttt-engine
+```
+
+```python
+import cortex_rust
+
+# Load any GGUF model (auto-downloads from HuggingFace!)
+model = cortex_rust.load("user/model-GGUF")
+
+# Chat
+response = model.chat([
+    {"role": "user", "content": "Hello!"}
+])
+print(response)
+
+# Stream
+for token in model.chat_stream([
+    {"role": "user", "content": "Tell me a story"}
+]):
+    print(token, end="", flush=True)
+```
+
+## 🖥️ CLI
+
+```bash
+# Interactive chat
+bit-ttt chat model.gguf
+
+# Generate text
+bit-ttt generate model.gguf -p "Once upon a time"
+
+# OpenAI-compatible API server
+bit-ttt serve model.gguf --port 8000
+
+# With LoRA + Q8 KV cache
+bit-ttt chat model.gguf --lora adapter.bin --q8-cache
+```
+
+## 🧠 TTT — Test-Time Training
+
+**The model learns while it generates.** No other local LLM does this.
+
+```python
+model = cortex_rust.load("model.gguf")
+model.enable_ttt(True)
+
+# Each conversation makes the model smarter
+response = model.chat([{"role": "user", "content": "My name is Alice"}])
+# Next time, it remembers context better!
+```
+
+## ⚡ Performance
+
+| Model | Speed | VRAM |
+|-------|-------|------|
+| Llama-2 7B (Q4_K_M) | 47.8 tok/s | ~5 GB |
+| Llama-3 8B (Q4_K_M) | 36.8 tok/s | ~6 GB |
+| Mistral 7B (Q4_K_M) | 40.8 tok/s | ~5 GB |
+| Qwen2.5 1.5B (Q4_K_M) | 70.4 tok/s | ~2 GB |
+
+With `--q8-cache`: **82% VRAM reduction** for KV cache.
+
+## 🔌 OpenAI-Compatible API
+
+```bash
+bit-ttt serve model.gguf --port 8000
+```
+
+```python
+from openai import OpenAI
+client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
+response = client.chat.completions.create(
+    model="default",
+    messages=[{"role": "user", "content": "Hi!"}],
+    stream=True,
+)
+```
+
+## 📖 Links
+
+- [GitHub](https://github.com/imonoonoko/Bit-TTT-Engine)
+- [Documentation](https://github.com/imonoonoko/Bit-TTT-Engine#readme)
+
+## 💖 License
+
+MIT License
+
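
The streaming example in the README above creates the request with `stream=True` but never consumes it. Here is a minimal sketch of reading the stream with the same client, assuming the bit-ttt server emits standard OpenAI chat-completion chunks over SSE (the `server` extra pulls in sse-starlette); the endpoint and model name are taken from the README.

```python
# Sketch only: assumes standard OpenAI-style chat.completion.chunk objects
# from the local bit-ttt server started with `bit-ttt serve model.gguf --port 8000`.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

stream = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Hi!"}],
    stream=True,
)

# Each chunk carries a delta with the next piece of text; content can be None
# on the final chunk, so guard before printing.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```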

bit_ttt_engine-0.7.0.dist-info/RECORD ADDED

@@ -0,0 +1,14 @@
+LICENSE,sha256=JJLZ3h6-sbZqpBjH8srqgZ40NaAKhVqh2oXOM6E2Mak,1088
+bit_ttt_engine-0.7.0.dist-info\METADATA,sha256=DcIuKoAPVwcTyCrC93zMP8HDeP6N_268XJuc6_4B8x0,3895
+bit_ttt_engine-0.7.0.dist-info\WHEEL,sha256=GCQ19ZBvayuBQJpz6xNbc8p6I5GMQcns9k4vQBQ8VH8,97
+bit_ttt_engine-0.7.0.dist-info\entry_points.txt,sha256=UDj3hFWWNAkFeEunLb_bbBXiZiR2fTdJ_3SYHGxSm8Y,47
+bit_ttt_engine-0.7.0.dist-info\licenses\LICENSE,sha256=JJLZ3h6-sbZqpBjH8srqgZ40NaAKhVqh2oXOM6E2Mak,1088
+cortex_rust\__init__.py,sha256=Ljn3niikl34z9Aa0H6C2mRH1-LFQUyhjHBJ-kqBevbU,847
+cortex_rust\__main__.py,sha256=2VfobbcvxxN5ez4AlEC12IhkhfXKLilTdfhH9A9maXw,98
+cortex_rust\__pycache__\__init__.cpython-310.pyc,sha256=buwvbY3RJ7QEIWPNXGFqEBWThA6f44RNsFpATpQc2kg,765
+cortex_rust\chat.py,sha256=Hvk3JPA2CurvTrVGEeL0kuUU-uFxFWLRSQWGvPkQKAo,6727
+cortex_rust\cli.py,sha256=U4K-4MT9HvVYPGnZv_C5j_8l3-EoD4kcCGIyOQHosbQ,13318
+cortex_rust\cortex_rust.cp310-win_amd64.pyd,sha256=pdHNyK7KYKwSnzPEahJm-Jc6y_TJHHMeosqozZlDXjc,26248704
+cortex_rust\engine.py,sha256=E5K23bonOov45gBMLEogZsvy2LeaX7kkMF3qTPzxYhs,8576
+cortex_rust\server.py,sha256=Pna2HPxMI_lzjUmG0wHhbp1-49Cd0T97H5LxXeLkvBU,16167
+bit_ttt_engine-0.7.0.dist-info\RECORD,,
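
Each RECORD entry above follows the standard wheel format `path,sha256=<digest>,<size>`, where the digest is the unpadded URL-safe base64 encoding of the file's SHA-256. A minimal verification sketch follows; the install location is an assumption, not something the diff states.

```python
# Hypothetical check: recompute the wheel-style RECORD digest and size for one
# installed file and compare against the corresponding RECORD line.
import base64
import hashlib
import sysconfig
from pathlib import Path
from typing import Tuple


def record_digest(path: Path) -> Tuple[str, int]:
    data = path.read_bytes()
    digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest()).rstrip(b"=")
    return f"sha256={digest.decode()}", len(data)


# Assumed install location: the environment's site-packages directory.
site_packages = Path(sysconfig.get_paths()["purelib"])
print(record_digest(site_packages / "cortex_rust" / "chat.py"))
```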

bit_ttt_engine-0.7.0.dist-info/licenses/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 imonoonoko
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

cortex_rust/__init__.py CHANGED

@@ -1,5 +1,25 @@
+# Auto-add CUDA DLL path on Windows
+import os
+import sys
+
+if sys.platform == "win32":
+    cuda_paths = [
+        os.environ.get("CUDA_PATH", ""),
+        r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4",
+        r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3",
+        r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0",
+        r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8",
+    ]
+    for cuda_path in cuda_paths:
+        bin_path = os.path.join(cuda_path, "bin")
+        if os.path.isdir(bin_path):
+            os.add_dll_directory(bin_path)
+            break
+
 from .cortex_rust import *
+from .chat import format_chat, format_simple, detect_template, list_templates
+from .engine import load, Model
 
 __doc__ = cortex_rust.__doc__
 if hasattr(cortex_rust, "__all__"):
-    __all__ = cortex_rust.__all__
+    __all__ = cortex_rust.__all__
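
Because the new `__init__.py` consults `CUDA_PATH` before falling back to the default toolkit directories, a CUDA install outside `C:\Program Files` can be made visible simply by setting that variable before the first import. A minimal sketch, with a placeholder install path:

```python
# Point the loader at a non-default CUDA toolkit (placeholder path, assumption).
# __init__.py checks CUDA_PATH first, so set it before importing cortex_rust.
import os

os.environ["CUDA_PATH"] = r"D:\toolkits\cuda-12.4"

import cortex_rust  # DLL directory is registered during this import

model = cortex_rust.load("model.gguf")
```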

cortex_rust/__main__.py ADDED

Binary file

cortex_rust/chat.py ADDED

@@ -0,0 +1,196 @@
+"""Chat template support for various LLM architectures.
+
+Provides format_chat() to convert messages into model-specific prompt strings.
+Supports Llama-3, Llama-2, Gemma-2, Qwen/ChatML, and generic formats.
+
+Usage:
+    from cortex_rust.chat import format_chat, detect_template
+
+    messages = [
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Hello!"},
+    ]
+    prompt = format_chat(messages, template="llama3")
+    # Or auto-detect from model path:
+    template = detect_template("Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
+    prompt = format_chat(messages, template=template)
+"""
+
+from typing import List, Dict, Optional
+
+# ============================================================================
+# Template Definitions
+# ============================================================================
+
+TEMPLATES = {
+    "llama3": {
+        "bos": "<|begin_of_text|>",
+        "system_start": "<|start_header_id|>system<|end_header_id|>\n\n",
+        "system_end": "<|eot_id|>",
+        "user_start": "<|start_header_id|>user<|end_header_id|>\n\n",
+        "user_end": "<|eot_id|>",
+        "assistant_start": "<|start_header_id|>assistant<|end_header_id|>\n\n",
+        "assistant_end": "<|eot_id|>",
+        "default_system": "You are a helpful assistant.",
+    },
+    "llama2": {
+        "bos": "<s>",
+        "system_start": "<<SYS>>\n",
+        "system_end": "\n<</SYS>>\n\n",
+        "user_start": "[INST] ",
+        "user_end": " [/INST]",
+        "assistant_start": " ",
+        "assistant_end": " </s>",
+        "default_system": "You are a helpful, respectful and honest assistant.",
+        # Llama-2 embeds system inside first [INST]
+        "system_inside_user": True,
+    },
+    "gemma2": {
+        "bos": "<bos>",
+        "system_start": "",  # Gemma-2 has no system role
+        "system_end": "",
+        "user_start": "<start_of_turn>user\n",
+        "user_end": "<end_of_turn>\n",
+        "assistant_start": "<start_of_turn>model\n",
+        "assistant_end": "<end_of_turn>\n",
+        "default_system": None,  # No system support
+    },
+    "chatml": {
+        # Used by Qwen, Mistral-Instruct, etc.
+        "bos": "",
+        "system_start": "<|im_start|>system\n",
+        "system_end": "<|im_end|>\n",
+        "user_start": "<|im_start|>user\n",
+        "user_end": "<|im_end|>\n",
+        "assistant_start": "<|im_start|>assistant\n",
+        "assistant_end": "<|im_end|>\n",
+        "default_system": "You are a helpful assistant.",
+    },
+}
+
+# Model name patterns → template mapping
+_DETECTION_PATTERNS = [
+    ("llama-3", "llama3"),
+    ("llama3", "llama3"),
+    ("meta-llama-3", "llama3"),
+    ("llama-2", "llama2"),
+    ("llama2", "llama2"),
+    ("gemma-2", "gemma2"),
+    ("gemma2", "gemma2"),
+    ("qwen", "chatml"),
+    ("mistral", "chatml"),
+    ("yi-", "chatml"),
+]
+
+
+def detect_template(model_path: str) -> str:
+    """Auto-detect chat template from model filename.
+
+    Args:
+        model_path: Path to model file (e.g., "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
+
+    Returns:
+        Template name (e.g., "llama3", "chatml"). Falls back to "chatml" if unknown.
+    """
+    name = model_path.lower().replace("\\", "/").split("/")[-1]
+    for pattern, template in _DETECTION_PATTERNS:
+        if pattern in name:
+            return template
+    return "chatml"  # Safe default
+
+
+def list_templates() -> List[str]:
+    """List all available template names."""
+    return list(TEMPLATES.keys())
+
+
+def format_chat(
+    messages: List[Dict[str, str]],
+    template: str = "chatml",
+    add_generation_prompt: bool = True,
+) -> str:
+    """Format chat messages into a model-specific prompt string.
+
+    Args:
+        messages: List of {"role": "system"|"user"|"assistant", "content": "..."}
+        template: Template name ("llama3", "llama2", "gemma2", "chatml")
+        add_generation_prompt: If True, append assistant start token at the end
+
+    Returns:
+        Formatted prompt string ready for model.generate()
+
+    Example:
+        >>> messages = [{"role": "user", "content": "Hello!"}]
+        >>> format_chat(messages, template="llama3")
+        '<|begin_of_text|><|start_header_id|>user<|end_header_id|>\\n\\nHello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\\n\\n'
+    """
+    if template not in TEMPLATES:
+        raise ValueError(f"Unknown template '{template}'. Available: {list_templates()}")
+
+    tmpl = TEMPLATES[template]
+    parts = [tmpl["bos"]]
+
+    system_inside_user = tmpl.get("system_inside_user", False)
+    system_content = None
+
+    for msg in messages:
+        role = msg["role"]
+        content = msg["content"]
+
+        if role == "system":
+            if system_inside_user:
+                # Llama-2 style: save system for embedding in first user message
+                system_content = content
+            elif tmpl["system_start"]:  # Skip if no system support (Gemma-2)
+                parts.append(tmpl["system_start"])
+                parts.append(content)
+                parts.append(tmpl["system_end"])
+
+        elif role == "user":
+            parts.append(tmpl["user_start"])
+            if system_inside_user and system_content is not None:
+                # Llama-2: embed system before user content
+                parts.append(tmpl["system_start"])
+                parts.append(system_content)
+                parts.append(tmpl["system_end"])
+                system_content = None  # Only first user message
+            parts.append(content)
+            parts.append(tmpl["user_end"])
+
+        elif role == "assistant":
+            parts.append(tmpl["assistant_start"])
+            parts.append(content)
+            parts.append(tmpl["assistant_end"])
+
+    if add_generation_prompt:
+        parts.append(tmpl["assistant_start"])
+
+    return "".join(parts)
+
+
+def format_simple(
+    user_message: str,
+    system_message: Optional[str] = None,
+    template: str = "chatml",
+) -> str:
+    """Convenience: format a single user message (with optional system prompt).
+
+    Args:
+        user_message: The user's message
+        system_message: Optional system prompt (uses template default if None)
+        template: Template name
+
+    Returns:
+        Formatted prompt string
+    """
+    messages = []
+
+    tmpl = TEMPLATES.get(template, TEMPLATES["chatml"])
+    if system_message is not None:
+        messages.append({"role": "system", "content": system_message})
+    elif tmpl.get("default_system"):
+        messages.append({"role": "system", "content": tmpl["default_system"]})
+
+    messages.append({"role": "user", "content": user_message})
+
+    return format_chat(messages, template=template)
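
The new module sits in front of the engine: detect the template from the model filename, render the messages, then hand the resulting prompt string to the model. A minimal sketch, assuming `Model.generate()` takes a prompt string as the chat.py docstring implies; the exact `generate()` signature lives in the new `engine.py`, which is not shown in this excerpt.

```python
# Sketch of how chat.py feeds the engine. generate()'s parameters beyond the
# prompt string are assumptions, since engine.py is not included above.
import cortex_rust
from cortex_rust.chat import detect_template, format_chat

model_path = "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf"
template = detect_template(model_path)  # -> "llama3"

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = format_chat(messages, template=template)

model = cortex_rust.load(model_path)
print(model.generate(prompt))
```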