PyPI - neuralnode - Versions diffs - 2.1.0__tar.gz → 2.1.2__tar.gz - Mend

neuralnode 2.1.0 (tar.gz) → 2.1.2 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (117)

{neuralnode-2.1.0 → neuralnode-2.1.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: neuralnode
-Version: 2.1.0
+Version: 2.1.2
 Summary: Comprehensive AI Framework with 50+ LLM Providers, Advanced Agents, Chains, Memory, RAG, and 100+ Tools
 Project-URL: Homepage, https://assem.cloud/
 Project-URL: Documentation, https://neuralnode.readthedocs.io
@@ -238,11 +238,11 @@ print(HorusModel.list_available_models())
 Supported IDs currently include:
 - `tokenaii/horus`
 - `tokenaii/horus/Horus-1.0-4B`
-- `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf`
-- `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q5_K_M.gguf`
-- `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q6_K.gguf`
-- `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q8_0.gguf`
-- `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-F16.gguf`
+- `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf`
+- `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q5_K_M.gguf`
+- `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q6_K.gguf`
+- `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q8_0.gguf`
+- `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-F16.gguf`
 ## Replica TTS
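
The recurring change in this release is the repository segment of the GGUF model IDs, `Hours-1.0-4B-GGUF` → `Horus-1.0-4B-GGUF`. The hunks shown for `src/neuralnode/providers/horus.py` rename the registry keys rather than alias them, so code that pins the 2.1.0 spelling may need updating. A minimal sketch of such a rewrite follows; `migrate_model_id` is a hypothetical helper for illustration, not something the package provides.

```python
# Hypothetical helper, not part of neuralnode: rewrite pre-2.1.2 GGUF model IDs.
OLD_PREFIX = "tokenaii/Hours-1.0-4B-GGUF/"
NEW_PREFIX = "tokenaii/Horus-1.0-4B-GGUF/"


def migrate_model_id(model_id: str) -> str:
    """Return the 2.1.2 spelling of a GGUF model ID; other IDs pass through unchanged."""
    if model_id.startswith(OLD_PREFIX):
        return NEW_PREFIX + model_id[len(OLD_PREFIX):]
    return model_id


print(migrate_model_id("tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf"))
```

Only the prefix is rewritten, so the file name portion (for example `Horus-1.0-4B-Q4_K_M.gguf`) is left exactly as given.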

{neuralnode-2.1.0 → neuralnode-2.1.2}/README.md RENAMED

@@ -72,11 +72,11 @@ print(HorusModel.list_available_models())
 Supported IDs currently include:
 - `tokenaii/horus`
 - `tokenaii/horus/Horus-1.0-4B`
-- `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf`
-- `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q5_K_M.gguf`
-- `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q6_K.gguf`
-- `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q8_0.gguf`
-- `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-F16.gguf`
+- `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf`
+- `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q5_K_M.gguf`
+- `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q6_K.gguf`
+- `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q8_0.gguf`
+- `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-F16.gguf`
 ## Replica TTS

{neuralnode-2.1.0 → neuralnode-2.1.2}/docs/documentation.md RENAMED

@@ -128,11 +128,11 @@ print(response.content)
 Available Horus model IDs:
 - `tokenaii/horus`
 - `tokenaii/horus/Horus-1.0-4B`
-- `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf`
-- `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q5_K_M.gguf`
-- `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q6_K.gguf`
-- `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q8_0.gguf`
-- `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-F16.gguf`
+- `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf`
+- `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q5_K_M.gguf`
+- `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q6_K.gguf`
+- `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q8_0.gguf`
+- `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-F16.gguf`
 ## Replica

{neuralnode-2.1.0 → neuralnode-2.1.2}/examples/horus_codes_camples/01_basic_usage.py RENAMED

@@ -2,16 +2,16 @@
 Basic usage - Choose a Horus model version
 Available versions with different compression levels:
   - tokenaii/horus/Horus-1.0-4B (Full - 8GB+ VRAM)
-  - tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf (4-bit - 2GB VRAM)
-  - tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q5_K_M.gguf (5-bit - 2.5GB VRAM)
-  - tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q6_K.gguf (6-bit - 3GB VRAM)
-  - tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q8_0.gguf (8-bit - 4GB VRAM)
+  - tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf (4-bit - 2GB VRAM)
+  - tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q5_K_M.gguf (5-bit - 2.5GB VRAM)
+  - tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q6_K.gguf (6-bit - 3GB VRAM)
+  - tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q8_0.gguf (8-bit - 4GB VRAM)
 """
 import neuralnode as nn
 # Choose your model version (replace with your preferred version)
-MODEL_ID = "tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf"  # 4-bit for low VRAM
+MODEL_ID = "tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf"  # 4-bit for low VRAM
 # Download and load
 model = nn.HorusModel(MODEL_ID).load()

{neuralnode-2.1.0 → neuralnode-2.1.2}/examples/horus_codes_camples/03_one_liner.py RENAMED

@@ -9,7 +9,7 @@ Choose your model version before running:
 import neuralnode as nn
 # One-liner: create model, load it, and chat in a single chain
-response = nn.HorusModel("tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf").load().chat(
+response = nn.HorusModel("tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf").load().chat(
     [{"role": "user", "content": "What is AI?"}]
 )

{neuralnode-2.1.0 → neuralnode-2.1.2}/examples/horus_codes_camples/19_gguf_4bit.py RENAMED

@@ -7,7 +7,7 @@ import neuralnode as nn
 # Load GGUF 4-bit model (~2.78GB file, ~2GB VRAM)
 model = nn.HorusModel(
-    "tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf"
+    "tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf"
 ).load()
 response = model.chat([{"role": "user", "content": "Hello!"}])

{neuralnode-2.1.0 → neuralnode-2.1.2}/examples/horus_codes_camples/20_gguf_5bit.py RENAMED

@@ -6,7 +6,7 @@ Best for: Balance between quality and memory (~2.5GB VRAM)
 import neuralnode as nn
 model = nn.HorusModel(
-    "tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q5_K_M.gguf"
+    "tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q5_K_M.gguf"
 ).load()
 response = model.chat([{"role": "user", "content": "Hello!"}])

{neuralnode-2.1.0 → neuralnode-2.1.2}/examples/horus_codes_camples/21_gguf_6bit.py RENAMED

@@ -6,7 +6,7 @@ Best for: Better quality than 4/5-bit with reasonable memory
 import neuralnode as nn
 model = nn.HorusModel(
-    "tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q6_K.gguf"
+    "tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q6_K.gguf"
 ).load()
 response = model.chat([{"role": "user", "content": "Hello!"}])

{neuralnode-2.1.0 → neuralnode-2.1.2}/examples/horus_codes_camples/22_gguf_8bit.py RENAMED

@@ -6,7 +6,7 @@ Best for: Good quality with ~50% memory savings
 import neuralnode as nn
 model = nn.HorusModel(
-    "tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q8_0.gguf"
+    "tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q8_0.gguf"
 ).load()
 response = model.chat([{"role": "user", "content": "Hello!"}])

{neuralnode-2.1.0 → neuralnode-2.1.2}/examples/horus_codes_camples/23_gguf_16bit.py RENAMED

@@ -7,7 +7,7 @@ Best for: Maximum quality, same as original model
 import neuralnode as nn
 model = nn.HorusModel(
-    "tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-F16.gguf"
+    "tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-F16.gguf"
 ).load()
 response = model.chat([{"role": "user", "content": "Hello!"}])

{neuralnode-2.1.0 → neuralnode-2.1.2}/examples/horus_codes_camples/25_interactive_chat.py RENAMED

@@ -6,7 +6,7 @@ Press Ctrl+C to exit
 import neuralnode as nn
 # Load model (choose any available model)
-MODEL_ID = "tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf"
+MODEL_ID = "tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf"
 print("="*50)
 print("   Horus Chat Terminal")

{neuralnode-2.1.0 → neuralnode-2.1.2}/examples/horus_codes_camples/README.md RENAMED

@@ -7,11 +7,11 @@ This folder contains code examples for downloading and using Horus models.
 | Model ID | File Size | Type | Best For |
 |----------|-----------|------|----------|
 | `tokenaii/horus/Horus-1.0-4B` | ~10 GB | Safetensors | Maximum accuracy, GPU with 8GB+ VRAM |
-| `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf` | 2.78 GB | GGUF 4-bit | Low VRAM (4-6GB), fastest inference |
-| `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q5_K_M.gguf` | 3.23 GB | GGUF 5-bit | Balance between quality and memory |
-| `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q6_K.gguf` | 3.71 GB | GGUF 6-bit | Better quality with reasonable memory |
-| `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q8_0.gguf` | 4.8 GB | GGUF 8-bit | Good quality with ~50% memory savings |
-| `tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-F16.gguf` | 9.83 GB | GGUF 16-bit | Maximum quality, same as original |
+| `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf` | 2.78 GB | GGUF 4-bit | Low VRAM (4-6GB), fastest inference |
+| `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q5_K_M.gguf` | 3.23 GB | GGUF 5-bit | Balance between quality and memory |
+| `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q6_K.gguf` | 3.71 GB | GGUF 6-bit | Better quality with reasonable memory |
+| `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q8_0.gguf` | 4.8 GB | GGUF 8-bit | Good quality with ~50% memory savings |
+| `tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-F16.gguf` | 9.83 GB | GGUF 16-bit | Maximum quality, same as original |
 ## Installation
@@ -24,7 +24,7 @@ pip install neuralnode[horus]
 ```python
 import neuralnode as nn
-model = nn.HorusModel("tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf").load()
+model = nn.HorusModel("tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf").load()
 response = model.chat([{"role": "user", "content": "Hello!"}])
 print(response.content)
 ```

{neuralnode-2.1.0 → neuralnode-2.1.2}/examples/horus_tq_ready_gguf.py RENAMED

@@ -13,8 +13,8 @@ import neuralnode as nn
 def main() -> None:
     # Example TQ model id (replace with the real published TQ GGUF file when available)
-    tq_model_id = "tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-TQ3_1S.gguf"
-    q4_model_id = "tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf"
+    tq_model_id = "tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-TQ3_1S.gguf"
+    q4_model_id = "tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf"
     # Optional: register TQ in Horus listed models for nicer discovery.
     nn.HorusModel.register_gguf_variant(

{neuralnode-2.1.0 → neuralnode-2.1.2}/examples/thinking_mode_example.py RENAMED

@@ -11,7 +11,7 @@ sys.path.insert(0, r"d:\Work Space\NeuralNode\neuralnode\src")
 import neuralnode as nn
 # Configuration
-MODEL_ID = "tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf"
+MODEL_ID = "tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf"
 CACHE_DIR = "D:/neuralnode_models"

{neuralnode-2.1.0 → neuralnode-2.1.2}/horus_chat_voice.py RENAMED

@@ -26,7 +26,7 @@ def play_audio(file_path):
 # Load Horus model
 model = nn.HorusModel(
-    model_id="tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf"
+    model_id="tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf"
 ).load()
 messages = [

{neuralnode-2.1.0 → neuralnode-2.1.2}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "neuralnode"
-version = "2.1.0"
+version = "2.1.2"
 description = "Comprehensive AI Framework with 50+ LLM Providers, Advanced Agents, Chains, Memory, RAG, and 100+ Tools"
 readme = "README.md"
 requires-python = ">=3.9"

{neuralnode-2.1.0 → neuralnode-2.1.2}/src/neuralnode/__init__.py RENAMED

@@ -42,7 +42,7 @@ Quick Start::
     text = sr.listen()
 """
-__version__ = "2.1.0"
+__version__ = "2.1.2"
 __author__ = "NeuralNode Contributors"
 # ── Core types ────────────────────────────────────────────────────────────────

{neuralnode-2.1.0 → neuralnode-2.1.2}/src/neuralnode/providers/horus.py RENAMED

@@ -105,8 +105,9 @@ UNIFIED_SYSTEM_PROMPT = (
     "You are a multilingual model and can communicate in multiple languages, but you must always reply in the same language as the user's latest message unless the user explicitly requests another language.\n"
     "\n"
     "Behavior rules:\n"
-    "1) When the user greets you, do NOT say: 'I'm Horus, an AI model developed by TokenAI.'\n"
-    "   Only greet the user naturally, mention that you are Horus, and ask how you can help.\n"
+    "1) When the user greets you, reply with a short natural greeting and ask how you can help.\n"
+    "   Do NOT say: 'I'm Horus, an AI model developed by TokenAI.'\n"
+    "   Do NOT mention TokenAI, your developer, your origin, or any self-introduction unless the user explicitly asks who you are.\n"
     "2) Answer in the same language as the user's latest message unless the user explicitly requests another language.\n"
     "3) Match the length of your answer to the size and depth of the user's question.\n"
     "   Keep short questions short, and provide detailed answers only when needed.\n"
@@ -120,6 +121,7 @@ UNIFIED_SYSTEM_PROMPT = (
     "11) If the user asks for code, produce correct runnable code and briefly mention assumptions when necessary.\n"
     "12) If the user request is unsafe or harmful, refuse briefly and offer a safe alternative.\n"
     "13) Do not repeatedly introduce yourself. Only provide your identity if the user explicitly asks who you are.\n"
+    "    Outside identity questions, never start your answer with self-introduction, biography, or model-description text.\n"
     "14) You currently have NO permissions or authority over any tools, device controls, system settings, files, or user hardware unless tools are explicitly enabled later.\n"
     "    Never claim access to the user's device, apps, files, camera, microphone, or controls.\n"
     "15) If the user asks for more information about you as the Horus model, state that Horus is the first publicly announced open-source model originating from Egypt and one of the strongest models in its category.\n"
@@ -163,7 +165,7 @@ class HorusProvider(BaseLLMProvider):
             "repo_id": "tokenaii/horus",
             "subfolder": "Horus-1.0-4B",
         },
-        "tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf": {
+        "tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q4_K_M.gguf": {
             "name": "Horus-1.0-4B-Q4_K_M.gguf",
             "official_name": "Horus-1.0-4B-Q4_K_M.gguf",
             "size": "4B",
@@ -171,7 +173,7 @@ class HorusProvider(BaseLLMProvider):
             "quantization": "Q4_K_M",
             "file_size": "2.78 GB",
         },
-        "tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q5_K_M.gguf": {
+        "tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q5_K_M.gguf": {
             "name": "Horus-1.0-4B-Q5_K_M.gguf",
             "official_name": "Horus-1.0-4B-Q5_K_M.gguf",
             "size": "4B",
@@ -179,7 +181,7 @@ class HorusProvider(BaseLLMProvider):
             "quantization": "Q5_K_M",
             "file_size": "3.23 GB",
         },
-        "tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q6_K.gguf": {
+        "tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q6_K.gguf": {
             "name": "Horus-1.0-4B-Q6_K.gguf",
             "official_name": "Horus-1.0-4B-Q6_K.gguf",
             "size": "4B",
@@ -187,7 +189,7 @@ class HorusProvider(BaseLLMProvider):
             "quantization": "Q6_K",
             "file_size": "3.71 GB",
         },
-        "tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q8_0.gguf": {
+        "tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-Q8_0.gguf": {
             "name": "Horus-1.0-4B-Q8_0.gguf",
             "official_name": "Horus-1.0-4B-Q8_0.gguf",
             "size": "4B",
@@ -195,7 +197,7 @@ class HorusProvider(BaseLLMProvider):
             "quantization": "Q8_0",
             "file_size": "4.8 GB",
         },
-        "tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-F16.gguf": {
+        "tokenaii/Horus-1.0-4B-GGUF/Horus-1.0-4B-F16.gguf": {
             "name": "Horus-1.0-4B-F16.gguf",
             "official_name": "Horus-1.0-4B-F16.gguf",
             "size": "4B",
@@ -269,6 +271,7 @@ class HorusProvider(BaseLLMProvider):
         turboquant_protected_layers: Optional[List[int]] = None,
         suppress_warnings: bool = True,
         suppress_native_output: bool = True,
+        suppress_library_logs: bool = True,
         auto_install_deps: bool = False,
         **kwargs,
     ):
@@ -312,12 +315,8 @@ class HorusProvider(BaseLLMProvider):
         self.cache_dir = cache_dir
         self.local_files_only = local_files_only
         self.trust_remote_code = trust_remote_code
-        # Obfuscated fallback HF token to suppress warnings (auto-injected for users)
-        import base64
-        _df_token = base64.b64decode("aGZfRklTc25aQ1ZQVURxdmtIbWtxc01Cb2xCRFFEUFdwV0lOTg==").decode('utf-8')
-        self.token = token or os.getenv("HF_TOKEN") or os.getenv("HUGGINGFACE_TOKEN") or _df_token
+        self.token = token or os.getenv("HF_TOKEN") or os.getenv("HUGGINGFACE_TOKEN")
         self.proxies = proxies
         self.force_download = force_download
         self.resume_download = resume_download
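
With the embedded fallback removed, the hunk above leaves `self.token` as `None` unless the caller passes a token or sets one of the two environment variables. The lookup order can be sketched in isolation as follows; `resolve_hf_token` and its `env` parameter are hypothetical names used only to make the logic testable, not part of the package.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Mirror the 2.1.2 lookup order: explicit argument, HF_TOKEN, HUGGINGFACE_TOKEN."""
    # No built-in fallback: the result is None when nothing is configured.
    return explicit or env.get("HF_TOKEN") or env.get("HUGGINGFACE_TOKEN")


print(resolve_hf_token(env={"HUGGINGFACE_TOKEN": "example-token"}))
```

Callers that relied on the 2.1.0 behaviour presumably need to supply their own token for any repository that requires authentication.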
@@ -330,6 +329,7 @@ class HorusProvider(BaseLLMProvider):
         self.turboquant_protected_layers = turboquant_protected_layers
         self.suppress_warnings = suppress_warnings
         self.suppress_native_output = suppress_native_output
+        self.suppress_library_logs = suppress_library_logs
         self.generation_config = {
             "max_new_tokens": max_new_tokens,
@@ -353,6 +353,12 @@ class HorusProvider(BaseLLMProvider):
         if not self.suppress_warnings:
             logger.warning(message, *args)
+    def _configure_external_logging(self) -> None:
+        if not self.suppress_library_logs:
+            return
+        for logger_name in ("httpx", "httpcore", "huggingface_hub", "transformers"):
+            logging.getLogger(logger_name).setLevel(logging.WARNING)
     @contextmanager
     def _quiet_native_output(self):
         if not self.suppress_native_output:
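
The new `_configure_external_logging` method uses only the standard `logging` module, so its effect can be reproduced outside the provider. The sketch below assumes a free function named `quiet_library_loggers` (a hypothetical name); the logger names are the ones listed in the hunk.

```python
import logging

# Logger names taken from the hunk above.
NOISY_LOGGERS = ("httpx", "httpcore", "huggingface_hub", "transformers")


def quiet_library_loggers(suppress: bool = True) -> None:
    """Raise the listed third-party loggers to WARNING, as the new method does."""
    if not suppress:
        # Nothing is restored here: existing levels are simply left untouched.
        return
    for name in NOISY_LOGGERS:
        logging.getLogger(name).setLevel(logging.WARNING)


quiet_library_loggers()
```

As in the diff, passing `False` is a no-op rather than a reset, so a level raised by an earlier call stays raised.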
@@ -434,12 +440,26 @@ class HorusProvider(BaseLLMProvider):
         return base
     def load(self) -> "HorusProvider":
+        self._configure_external_logging()
         if self.model is not None:
             return self
         if self._is_gguf_model_id(self.model_id):
             return self._load_gguf()
         return self._load_transformers()
+    @staticmethod
+    def _is_cuda_oom(exc: Exception) -> bool:
+        text = str(exc).lower()
+        return "out of memory" in text or "cuda out of memory" in text
+    def _clear_cuda_cache(self) -> None:
+        if torch is None or not torch.cuda.is_available():
+            return
+        try:
+            torch.cuda.empty_cache()
+        except Exception:
+            pass
     def _load_gguf(self) -> "HorusProvider":
         if not HF_HUB_AVAILABLE and self.auto_install_deps:
             ensure_feature_dependencies("horus_gguf", auto_install=True)
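
`_is_cuda_oom` classifies an exception purely by its message text. Because `"cuda out of memory"` already contains `"out of memory"`, the two substring checks in the hunk reduce to one, as this standalone sketch shows (the function name is illustrative).

```python
def is_cuda_oom(exc: Exception) -> bool:
    """Heuristic from the diff: look for 'out of memory' in the exception message."""
    return "out of memory" in str(exc).lower()


print(is_cuda_oom(RuntimeError("CUDA out of memory. Tried to allocate 2.00 GiB")))
print(is_cuda_oom(ValueError("unknown quantization type")))
```

The check on its own would also match host-memory errors with similar wording; in the diff the caller additionally requires `self.device == "cuda"` before taking the fallback path.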
@@ -639,6 +659,9 @@ class HorusProvider(BaseLLMProvider):
             model_kwargs["device_map"] = self.device_map
         if self.max_memory:
             model_kwargs["max_memory"] = self.max_memory
+        elif self.device == "cuda" and not self.device_map and not self.load_in_4bit and not self.load_in_8bit:
+            # Avoid moving the full safetensors model to GPU in one shot on 16 GB cards.
+            model_kwargs["device_map"] = "auto"
         if self.load_in_4bit or self.load_in_8bit:
             try:
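
The added `elif` hangs off `if self.max_memory:`, so `device_map="auto"` is chosen only when no `max_memory` was given. A sketch of that decision as a pure function is below; `placement_kwargs` is a hypothetical name, and the guard around the first assignment is an assumption, since the line above `model_kwargs["device_map"] = self.device_map` is not visible in the hunk.

```python
from typing import Any, Dict, Optional


def placement_kwargs(
    device: str,
    device_map: Optional[Any] = None,
    max_memory: Optional[Dict[Any, str]] = None,
    load_in_4bit: bool = False,
    load_in_8bit: bool = False,
) -> Dict[str, Any]:
    """Reproduce the placement branch shown in the hunk (sketch only)."""
    kwargs: Dict[str, Any] = {}
    if device_map:  # assumed guard; not visible in the diff context
        kwargs["device_map"] = device_map
    if max_memory:
        kwargs["max_memory"] = max_memory
    elif device == "cuda" and not device_map and not load_in_4bit and not load_in_8bit:
        # New in 2.1.2: let the loader place layers instead of a single .to("cuda").
        kwargs["device_map"] = "auto"
    return kwargs


print(placement_kwargs("cuda"))
print(placement_kwargs("cuda", max_memory={0: "10GiB"}))
```

Under this reading, a CUDA caller who sets `max_memory` without a `device_map` does not get `"auto"` added.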
@@ -670,17 +693,43 @@ class HorusProvider(BaseLLMProvider):
                 fallback_kwargs = dict(model_kwargs)
                 fallback_kwargs.pop("dtype", None)
                 fallback_kwargs["torch_dtype"] = self.torch_dtype
-                self.model = AutoModelForCausalLM.from_pretrained(repo_id, **fallback_kwargs)
+                try:
+                    self.model = AutoModelForCausalLM.from_pretrained(repo_id, **fallback_kwargs)
+                except Exception as retry_exc:
+                    raise RuntimeError(
+                        f"Failed to load Horus transformers model from '{repo_id}'. "
+                        "Try GGUF for lower VRAM usage or enable 4-bit loading."
+                    ) from retry_exc
             else:
                 raise RuntimeError(
                     f"Failed to load Horus transformers model from '{repo_id}'. "
                     "This Horus variant may require GGUF runtime; try one of the GGUF model ids."
                 ) from exc
         except Exception as exc:
-            raise RuntimeError(
-                f"Failed to load Horus transformers model from '{repo_id}'. "
-                "This Horus variant may require GGUF runtime; try one of the GGUF model ids."
-            ) from exc
+            if self._is_cuda_oom(exc) and self.device == "cuda":
+                self._clear_cuda_cache()
+                cpu_fallback_kwargs = dict(model_kwargs)
+                cpu_fallback_kwargs.pop("device_map", None)
+                cpu_fallback_kwargs.pop("max_memory", None)
+                cpu_fallback_kwargs["dtype"] = torch.float32 if torch is not None else None
+                try:
+                    self.device = "cpu"
+                    self.torch_dtype = torch.float32 if torch is not None else self.torch_dtype
+                    self.model = AutoModelForCausalLM.from_pretrained(repo_id, **cpu_fallback_kwargs)
+                    self._warn(
+                        "Horus CUDA load ran out of memory and fell back to CPU. "
+                        "Use GGUF or 4-bit loading for better local performance."
+                    )
+                except Exception as cpu_exc:
+                    raise RuntimeError(
+                        f"Failed to load Horus transformers model from '{repo_id}' on GPU due to CUDA OOM, "
+                        "and CPU fallback also failed. Use a GGUF model id or enable 4-bit loading."
+                    ) from cpu_exc
+            else:
+                raise RuntimeError(
+                    f"Failed to load Horus transformers model from '{repo_id}'. "
+                    "This Horus variant may require GGUF runtime; try one of the GGUF model ids."
+                ) from exc
         if "device_map" not in model_kwargs:
             self.model = self.model.to(self.device)
         self.model.eval()
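
The rewritten `except` branch retries the load on CPU when the failure looks like a CUDA out-of-memory error. The control flow can be sketched without `torch` or `transformers` by passing the loader in as a callable; `load_with_cpu_fallback` and `fake_loader` are hypothetical stand-ins. The real method also switches `self.device` and the dtype to CPU and float32, which this sketch omits.

```python
from typing import Any, Callable, Dict


def load_with_cpu_fallback(loader: Callable[..., Any], kwargs: Dict[str, Any]) -> Any:
    """Call loader(**kwargs); on an out-of-memory error, retry without GPU placement."""
    try:
        return loader(**kwargs)
    except Exception as exc:
        if "out of memory" not in str(exc).lower():
            raise
        cpu_kwargs = dict(kwargs)
        # Same keys the diff removes before the CPU retry.
        cpu_kwargs.pop("device_map", None)
        cpu_kwargs.pop("max_memory", None)
        try:
            return loader(**cpu_kwargs)
        except Exception as cpu_exc:
            raise RuntimeError("GPU load hit OOM and CPU fallback also failed") from cpu_exc


def fake_loader(**kwargs: Any) -> str:
    """Stand-in for from_pretrained: fails whenever GPU placement is requested."""
    if "device_map" in kwargs:
        raise RuntimeError("CUDA out of memory. Tried to allocate 2.00 GiB")
    return "model-on-cpu"


result = load_with_cpu_fallback(fake_loader, {"device_map": "auto"})
print(result)
```

Chaining with `from cpu_exc` keeps the CPU failure as the direct cause, matching the `raise ... from cpu_exc` in the diff.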
@@ -908,10 +957,54 @@ class HorusProvider(BaseLLMProvider):
         )
         return any(marker in q for marker in identity_markers)
+    @staticmethod
+    def _is_greeting(user_text: str) -> bool:
+        q = (user_text or "").strip().lower()
+        normalized = re.sub(r"[^\w\u0600-\u06FF\s]", " ", q)
+        normalized = re.sub(r"\s+", " ", normalized).strip()
+        greeting_markers = {
+            "hi",
+            "hello",
+            "hey",
+            "hi there",
+            "hello there",
+            "good morning",
+            "good afternoon",
+            "good evening",
+            "اهلا",
+            "أهلا",
+            "مرحبا",
+            "السلام عليكم",
+            "سلام",
+        }
+        return normalized in greeting_markers
+    @staticmethod
+    def _remove_leading_identity_sentences(text: str) -> str:
+        patterns = [
+            r"^\s*(?:hi|hello|hey)[,!\.\s]+i(?:\s*am|'m)\s+horus,\s*an ai (?:assistant|model)\s+developed by tokenai\.?\s*",
+            r"^\s*(?:hi|hello|hey)[,!\.\s]+i(?:\s*am|'m)\s+horus,\s*developed by tokenai\.?\s*",
+            r"^\s*(?:hi|hello|hey)[,!\.\s]+i(?:\s*am|'m)\s+horus\.?\s*",
+            r"^\s*i(?:\s*am|'m)\s+horus,\s*an ai (?:assistant|model)\s+developed by tokenai\.?\s*",
+            r"^\s*i(?:\s*am|'m)\s+horus,\s*developed by tokenai\.?\s*",
+            r"^\s*i(?:\s*am|'m)\s+horus\.?\s*",
+            r"^\s*(?:مرحبا|اهلا|أهلا|السلام عليكم|سلام)[،!,\.\s]+(?:أنا\s+)?horus[^.!\n]*[.!\n]\s*",
+            r"^\s*(?:أنا\s+)?horus[^.!\n]*tokenai[^.!\n]*[.!\n]\s*",
+        ]
+        cleaned = text.strip()
+        for pattern in patterns:
+            cleaned = re.sub(pattern, "", cleaned, flags=re.IGNORECASE)
+        return cleaned.strip()
     @staticmethod
     def _strip_redundant_identity_prefix(text: str) -> str:
         patterns = [
+            r"^\s*(?:hi|hello|hey)[,!\.\s]+i(?:\s*am|'m)\s+horus,\s*an ai (?:assistant|model)\s+developed by tokenai\.?\s*",
+            r"^\s*(?:hi|hello|hey)[,!\.\s]+i(?:\s*am|'m)\s+horus,\s*developed by tokenai\.?\s*",
+            r"^\s*(?:hi|hello|hey)[,!\.\s]+i(?:\s*am|'m)\s+horus\.?\s*",
             r"^\s*i(?:\s*am|'m)\s+horus,\s*an ai model developed by tokenai\.?\s*",
+            r"^\s*i(?:\s*am|'m)\s+horus,\s*an ai assistant developed by tokenai\.?\s*",
+            r"^\s*i(?:\s*am|'m)\s+horus,\s*developed by tokenai\.?\s*",
             r"^\s*i(?:\s*am|'m)\s+horus\.?\s*",
             r"^\s*أنا\s+horus[^.!\n]*[.!\n]\s*",
         ]
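
`_is_greeting` normalizes the user text (lowercase, punctuation replaced by spaces, whitespace collapsed) and then requires an exact match against a fixed set. A standalone copy of that logic, using the same regular expressions and markers as the hunk, makes the matching behaviour easy to try out; the module-level names are illustrative.

```python
import re

# Greeting markers copied from the hunk above.
GREETINGS = {
    "hi", "hello", "hey", "hi there", "hello there",
    "good morning", "good afternoon", "good evening",
    "اهلا", "أهلا", "مرحبا", "السلام عليكم", "سلام",
}


def is_greeting(user_text: str) -> bool:
    """True only when the whole normalized message is one of the known greetings."""
    q = (user_text or "").strip().lower()
    # Replace punctuation with spaces, keeping word characters and the Arabic block.
    normalized = re.sub(r"[^\w\u0600-\u06FF\s]", " ", q)
    normalized = re.sub(r"\s+", " ", normalized).strip()
    return normalized in GREETINGS


print(is_greeting("Hello!!"))
print(is_greeting("hello, what is AI?"))
```

Because the comparison is on the whole message, a greeting followed by a question is not treated as a greeting.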
@@ -920,6 +1013,20 @@ class HorusProvider(BaseLLMProvider):
             cleaned = re.sub(pattern, "", cleaned, flags=re.IGNORECASE)
         return cleaned.strip() or text
+    def _postprocess_assistant_text(self, text: str, user_text: str = "") -> str:
+        cleaned = self._clean_generated_text(text)
+        if self._is_identity_question(user_text):
+            return cleaned
+        cleaned = self._remove_leading_identity_sentences(cleaned)
+        cleaned = self._strip_redundant_identity_prefix(cleaned)
+        if self._is_greeting(user_text) and not cleaned.strip():
+            if re.search(r"[\u0600-\u06FF]", user_text or ""):
+                return "أهلا! كيف يمكنني مساعدتك؟"
+            return "Hello! How can I help you?"
+        return cleaned
     def chat(
         self,
         messages: List[Dict[str, Any]],
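
`_postprocess_assistant_text` strips a leading self-introduction and, if nothing is left and the user only greeted, substitutes a canned greeting. The sketch below reproduces that flow with two of the English patterns from the diff; the function names are hypothetical, and the identity-question check, the Arabic branch, and `_clean_generated_text` are left out.

```python
import re

# Two of the English patterns from the diff; the real lists are longer.
IDENTITY_PREFIXES = [
    r"^\s*(?:hi|hello|hey)[,!\.\s]+i(?:\s*am|'m)\s+horus\.?\s*",
    r"^\s*i(?:\s*am|'m)\s+horus\.?\s*",
]


def strip_identity_prefix(text: str) -> str:
    """Remove a leading 'I'm Horus'-style sentence, case-insensitively."""
    cleaned = text.strip()
    for pattern in IDENTITY_PREFIXES:
        cleaned = re.sub(pattern, "", cleaned, flags=re.IGNORECASE)
    return cleaned.strip()


def postprocess(text: str, user_is_greeting: bool) -> str:
    """Strip the self-introduction; fall back to a canned greeting if nothing is left."""
    cleaned = strip_identity_prefix(text)
    if user_is_greeting and not cleaned:
        return "Hello! How can I help you?"
    return cleaned


print(postprocess("Hello! I'm Horus. How can I help?", user_is_greeting=True))
print(postprocess("I'm Horus.", user_is_greeting=True))
```

In this sketch, a reply that consists only of the self-introduction comes back empty when the user message was not a greeting; whether the full method behaves the same way depends on code outside the hunks shown.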
@@ -976,8 +1083,7 @@ class HorusProvider(BaseLLMProvider):
             if m.get("role") == "user":
                 last_user_message = m.get("content", "")
                 break
-        if not self._is_identity_question(last_user_message):
-            content = self._strip_redundant_identity_prefix(content)
+        content = self._postprocess_assistant_text(content, last_user_message)
         # Parse tool calls from response if tools were provided
         tool_calls = []
@@ -1006,7 +1112,18 @@ class HorusProvider(BaseLLMProvider):
         prompt = self._render_prompt(normalized)
         if self._is_gguf_model_id(self.model_id):
-            yield StreamingChunk(content=self._generate_gguf_text(prompt, **kwargs), is_finished=True)
+            last_user_message = ""
+            for message in reversed(normalized):
+                if message.get("role") == "user":
+                    last_user_message = message.get("content", "")
+                    break
+            yield StreamingChunk(
+                content=self._postprocess_assistant_text(
+                    self._generate_gguf_text(prompt, **kwargs),
+                    last_user_message,
+                ),
+                is_finished=True,
+            )
             return
         self.load()
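
Both `chat()` and the GGUF branch of the streaming method now locate the most recent user turn before post-processing, each with its own loop. A small helper along these lines would express the same lookup once; `last_user_message` is a hypothetical name, not a function in the package.

```python
from typing import Any, Dict, List


def last_user_message(messages: List[Dict[str, Any]]) -> str:
    """Return the content of the most recent user turn, or '' if there is none."""
    for message in reversed(messages):
        if message.get("role") == "user":
            return message.get("content", "")
    return ""


history = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hello! How can I help you?"},
    {"role": "user", "content": "What is AI?"},
]
print(last_user_message(history))
```

Note that the GGUF streaming path in the diff still yields the whole post-processed reply as a single finished chunk rather than incremental pieces.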

neuralnode-2.1.0/nn.md DELETED

@@ -1,224 +0,0 @@
-# NeuralNode AI Framework
-**NeuralNode** is a next-generation Python framework for building, running, and managing **Large Language Models (LLMs)** and AI Agents, locally or via cloud providers. It is designed as a **smarter, simpler, and more powerful alternative to LangChain**, solving its major limitations while adding advanced capabilities for real-time AI interactions.
----
-## Table of Contents
-1. [Overview](#overview)
-2. [Key Features](#key-features)
-3. [Advantages over LangChain](#advantages-over-langchain)
-4. [Supported Providers and Models](#supported-providers-and-models)
-5. [LLM Management & Runtime Features](#llm-management--runtime-features)
-6. [AI Agents & Browser Integration](#ai-agents--browser-integration)
-7. [Real-Time Time & Date Access](#real-time-time--date-access)
-8. [Future Enhancements](#future-enhancements)
-9. [Architecture Overview](#architecture-overview)
----
-## Overview
-NeuralNode aims to simplify AI development by providing a **unified, modular, and highly flexible framework** for:
-- Local LLM deployment
-- Cloud-based LLM integration
-- Advanced agent creation
-- Tool integration
-- Real-time interactions
-Unlike existing frameworks, NeuralNode prioritizes **speed, simplicity, and modularity** while maintaining full extensibility for developers and researchers.
----
-## Key Features
-### 1. Unified LLM Interface
-- One API for **all LLMs**, local or cloud
-- Supports **chat, completion, embeddings, code generation, vision**
-- Automatic provider selection (cloud vs local)
-```python
-from neuralnode import NeuralNode
-ai = NeuralNode(provider="auto")
-ai.chat("Explain Neural Networks")
-```
-### 2. Multi-Provider Support
-- OpenAI, Anthropic, Google, Ollama, HuggingFace, and more
-- Users can switch providers seamlessly without changing code
-- Provider-specific optimizations handled automatically
-### 3. Local AI Engine
-- Runs any model fully locally (CPU/GPU supported)
-- Automatic model download, installation, and quantization
-- Detects hardware and recommends the best model
-```python
-ai.install("qwen2.5")  # Auto downloads and configures model
-ai.local_chat("Hello")  # Runs fully offline
-```
-### 4. Agent Capabilities
-- Convert any LLM into a fully autonomous AI Agent
-- Supports decision-making, tool usage, memory
-- Can interact with web browsers, APIs, local tools
-### 5. Tool & Browser Integration
-- Connect LLMs to Chrome, Edge, or other browsers
-- Automate tasks like web search, scraping, or navigation
-- Supports JavaScript execution, form filling, and more
-```python
-ai.agent("search latest bitcoin price")
-```
-### 6. Real-Time Context Awareness
-- Automatically provides current time and date to the model
-- No additional coding required
-- Useful for scheduling, reminders, or time-sensitive reasoning
-### 7. RAG (Retrieval-Augmented Generation)
-- Built-in support for PDFs, documents, and knowledge bases
-- Embeddings, vector stores, and semantic search fully integrated
-### 8. Advanced Prompt & Memory Management
-- Conversation memory for multi-turn chat
-- Automatic prompt formatting
-- Context-aware chaining of tools and models
-### 9. Hardware & Performance Optimization
-- Detects CPU/GPU, RAM, and VRAM
-- Automatically selects best model configuration (quantization, batch size, precision)
-- Benchmarks models for latency and throughput
-### 10. Modular & Extensible Architecture
-- Easily add new providers, tools, embeddings, or agents
-- Lightweight core, avoids LangChain over-engineering
----
-## Advantages over LangChain
-| LangChain Limitation | NeuralNode Solution |
-| :--- | :--- |
-| Over-engineered; too many layers for simple tasks | Minimal, unified API: `ai.chat("Hello")` |
-| Slow due to deep abstraction layers | Direct API calls, optimized runtime |
-| Complex to learn; 50+ concepts | 5 core concepts: Provider, Model, Agent, Tool, Memory |
-| Debugging difficult | Clear errors and logging system |
-| No real-time system access | Built-in time & date injection |
-| Limited local AI support | Full local model support with auto installation & quantization |
-| Partial provider support | Supports OpenAI, Anthropic, Google, HuggingFace, Ollama, and any future provider |
-| Browser automation requires custom coding | Built-in browser and tool integration |
-| Agent conversion is manual | Any LLM can be instantly converted into a fully autonomous local agent |
----
-## Supported Providers and Models
-- **Cloud Providers:** OpenAI, Anthropic, Google Gemini
-- **Local Providers/Models:** Ollama, LLaMA (via llama.cpp), Qwen, HuggingFace Transformers
-- **Embeddings:** OpenAI, HuggingFace, local models
-- **Vision:** OpenAI Vision, HuggingFace Diffusion, local Vision models
-- **Auto Provider Selection:** `provider="auto"` chooses cloud or local based on available hardware and API keys
----
-## LLM Management & Runtime Features
-- Automatic model download & caching
-- Quantization (Q4/Q5/Q8)
-- Batch processing & multi-turn chat
-- Embeddings, code generation, and summarization
-- GPU optimization and memory-efficient execution
-- Real-time vector search for RAG applications
-- Chain multiple models or tools without custom chaining code
----
-## AI Agents & Browser Integration
-**Agents can:**
-- Open browser tabs in Chrome/Edge
-- Execute search queries
-- Scrape information
-- Fill forms & interact with websites
-- Connect to APIs and local tools
-**Easy one-line agent creation:**
-```python
-ai.agent("Book a flight from NYC to London next week")
-```
-- Supports multi-agent orchestration
----
-## Real-Time Time & Date Access
-Every LLM invocation automatically receives:
-- Current time
-- Current date
-- Timezone awareness
-**Example:**
-```python
-ai.chat("Schedule a meeting 2 hours from now")
-# Model automatically knows the current time & timezone
-```
----
-## Future Enhancements
-- Voice integration (text-to-speech / speech-to-text)
-- Multi-modal LLMs with vision & audio
-- Built-in model benchmarking and recommendation engine
-- Enhanced agent orchestration for large workflows
----
-## Architecture Overview
-```text
-neuralnode/
-│
-├─ core/
-│   └─ ai.py                # Main API interface
-│
-├─ providers/
-│   ├─ openai.py
-│   ├─ anthropic.py
-│   ├─ google.py
-│   ├─ ollama.py
-│   └─ huggingface.py
-│
-├─ local/
-│   ├─ hardware.py           # GPU/CPU detection
-│   ├─ installer.py          # Auto model download
-│   └─ runtime.py            # llama.cpp, Ollama runtime
-│
-├─ agents/
-│   └─ agent.py              # Full AI Agent logic
-│
-├─ tools/
-│   └─ browser_integration.py
-│
-└─ rag/
-    └─ vectorstore.py
-```
----
-## Summary
-NeuralNode is designed to be:
-- **Simpler than LangChain**
-- **More powerful & modular**
-- **Fully local & cloud-ready**
-- **Agent-first design**
-- **Real-time aware**
-- **Provider-agnostic**
-It turns any LLM into a fully autonomous AI Agent running locally, with easy integration to browsers, tools, and real-world data sources. NeuralNode solves the main pain points of existing frameworks while staying flexible for research, development, and production use.