lemonade-sdk 8.0.6__tar.gz → 8.1.1__tar.gz

This diff represents the content of publicly available package versions as released to a supported registry. It is provided for informational purposes only and reflects the changes between the package versions as they appear in that registry.

This version of lemonade-sdk has been flagged as potentially problematic.

Files changed (78)
  1. {lemonade_sdk-8.0.6/src/lemonade_sdk.egg-info → lemonade_sdk-8.1.1}/PKG-INFO +74 -24
  2. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/README.md +47 -6
  3. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/setup.py +28 -19
  4. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/common/inference_engines.py +62 -77
  5. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/common/network.py +18 -1
  6. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/common/system_info.py +61 -44
  7. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/llamacpp/bench.py +3 -1
  8. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/llamacpp/load.py +13 -4
  9. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/llamacpp/utils.py +229 -61
  10. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/oga/load.py +239 -112
  11. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/oga/utils.py +19 -7
  12. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/server/llamacpp.py +30 -53
  13. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/server/serve.py +64 -123
  14. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/server/static/styles.css +208 -6
  15. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/server/static/webapp.html +510 -71
  16. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/server/tray.py +4 -2
  17. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/server/utils/thread.py +2 -4
  18. lemonade_sdk-8.1.1/src/lemonade/version.py +1 -0
  19. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade_install/install.py +90 -86
  20. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1/src/lemonade_sdk.egg-info}/PKG-INFO +74 -24
  21. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade_sdk.egg-info/requires.txt +22 -8
  22. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade_server/cli.py +79 -26
  23. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade_server/model_manager.py +4 -3
  24. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade_server/pydantic_models.py +1 -4
  25. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade_server/server_models.json +60 -11
  26. lemonade_sdk-8.0.6/src/lemonade/version.py +0 -1
  27. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/LICENSE +0 -0
  28. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/NOTICE.md +0 -0
  29. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/setup.cfg +0 -0
  30. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/__init__.py +0 -0
  31. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/api.py +0 -0
  32. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/cache.py +0 -0
  33. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/cli.py +0 -0
  34. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/common/__init__.py +0 -0
  35. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/common/build.py +0 -0
  36. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/common/cli_helpers.py +0 -0
  37. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/common/exceptions.py +0 -0
  38. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/common/filesystem.py +0 -0
  39. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/common/printing.py +0 -0
  40. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/common/status.py +0 -0
  41. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/common/test_helpers.py +0 -0
  42. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/profilers/__init__.py +0 -0
  43. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/profilers/memory_tracker.py +0 -0
  44. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/profilers/profiler.py +0 -0
  45. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/sequence.py +0 -0
  46. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/state.py +0 -0
  47. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/__init__.py +0 -0
  48. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/accuracy.py +0 -0
  49. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/adapter.py +0 -0
  50. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/bench.py +0 -0
  51. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/huggingface/bench.py +0 -0
  52. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/huggingface/load.py +0 -0
  53. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/huggingface/utils.py +0 -0
  54. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/humaneval.py +0 -0
  55. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/management_tools.py +0 -0
  56. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/mmlu.py +0 -0
  57. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/oga/__init__.py +0 -0
  58. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/oga/bench.py +0 -0
  59. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/perplexity.py +0 -0
  60. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/prompt.py +0 -0
  61. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/quark/__init__.py +0 -0
  62. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/quark/quark_load.py +0 -0
  63. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/quark/quark_quantize.py +0 -0
  64. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/report/__init__.py +0 -0
  65. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/report/llm_report.py +0 -0
  66. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/report/table.py +0 -0
  67. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/server/__init__.py +0 -0
  68. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/server/static/favicon.ico +0 -0
  69. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/server/tool_calls.py +0 -0
  70. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/server/utils/port.py +0 -0
  71. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/server/utils/system_tray.py +0 -0
  72. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/server/webapp.py +0 -0
  73. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade/tools/tool.py +0 -0
  74. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade_install/__init__.py +0 -0
  75. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade_sdk.egg-info/SOURCES.txt +0 -0
  76. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade_sdk.egg-info/dependency_links.txt +0 -0
  77. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade_sdk.egg-info/entry_points.txt +0 -0
  78. {lemonade_sdk-8.0.6 → lemonade_sdk-8.1.1}/src/lemonade_sdk.egg-info/top_level.txt +0 -0
PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: lemonade-sdk
-Version: 8.0.6
+Version: 8.1.1
 Summary: Lemonade SDK: Your LLM Aide for Validation and Deployment
 Author-email: lemonade@amd.com
 Requires-Python: >=3.10, <3.13
@@ -22,16 +22,16 @@ Requires-Dist: pytz
 Requires-Dist: zstandard
 Requires-Dist: fastapi
 Requires-Dist: uvicorn[standard]
-Requires-Dist: openai>=1.81.0
+Requires-Dist: openai<1.97.1,>=1.81.0
 Requires-Dist: transformers<=4.53.2
 Requires-Dist: jinja2
 Requires-Dist: tabulate
 Requires-Dist: sentencepiece
-Requires-Dist: huggingface-hub==0.33.0
-Provides-Extra: oga-hybrid
-Requires-Dist: onnx==1.16.1; extra == "oga-hybrid"
-Requires-Dist: numpy==1.26.4; extra == "oga-hybrid"
-Requires-Dist: protobuf>=6.30.1; extra == "oga-hybrid"
+Requires-Dist: huggingface-hub[hf_xet]==0.33.0
+Requires-Dist: python-dotenv
+Provides-Extra: oga-ryzenai
+Requires-Dist: onnxruntime-genai-directml-ryzenai==0.7.0.2; extra == "oga-ryzenai"
+Requires-Dist: protobuf>=6.30.1; extra == "oga-ryzenai"
 Provides-Extra: oga-cpu
 Requires-Dist: onnxruntime-genai==0.8.2; extra == "oga-cpu"
 Requires-Dist: onnxruntime>=1.22.0; extra == "oga-cpu"
@@ -41,16 +41,35 @@ Requires-Dist: accelerate; extra == "dev"
 Requires-Dist: datasets; extra == "dev"
 Requires-Dist: pandas>=1.5.3; extra == "dev"
 Requires-Dist: matplotlib; extra == "dev"
+Requires-Dist: model-generate==1.5.0; (platform_system == "Windows" and python_version == "3.10") and extra == "dev"
 Requires-Dist: human-eval-windows==1.0.4; extra == "dev"
 Requires-Dist: lm-eval[api]; extra == "dev"
+Provides-Extra: oga-hybrid
+Requires-Dist: lemonade-sdk[oga-ryzenai]; extra == "oga-hybrid"
+Provides-Extra: oga-unified
+Requires-Dist: lemonade-sdk[oga-ryzenai]; extra == "oga-unified"
 Provides-Extra: oga-hybrid-minimal
-Requires-Dist: lemonade-sdk[oga-hybrid]; extra == "oga-hybrid-minimal"
+Requires-Dist: lemonade-sdk[oga-ryzenai]; extra == "oga-hybrid-minimal"
 Provides-Extra: oga-cpu-minimal
 Requires-Dist: lemonade-sdk[oga-cpu]; extra == "oga-cpu-minimal"
+Provides-Extra: oga-npu-minimal
+Requires-Dist: lemonade-sdk[oga-ryzenai]; extra == "oga-npu-minimal"
 Provides-Extra: llm
 Requires-Dist: lemonade-sdk[dev]; extra == "llm"
 Provides-Extra: llm-oga-cpu
 Requires-Dist: lemonade-sdk[dev,oga-cpu]; extra == "llm-oga-cpu"
+Provides-Extra: llm-oga-npu
+Requires-Dist: onnx==1.16.0; extra == "llm-oga-npu"
+Requires-Dist: onnxruntime==1.18.0; extra == "llm-oga-npu"
+Requires-Dist: numpy==1.26.4; extra == "llm-oga-npu"
+Requires-Dist: protobuf>=6.30.1; extra == "llm-oga-npu"
+Requires-Dist: lemonade-sdk[dev]; extra == "llm-oga-npu"
+Provides-Extra: llm-oga-hybrid
+Requires-Dist: onnx==1.16.1; extra == "llm-oga-hybrid"
+Requires-Dist: numpy==1.26.4; extra == "llm-oga-hybrid"
+Requires-Dist: protobuf>=6.30.1; extra == "llm-oga-hybrid"
+Provides-Extra: llm-oga-unified
+Requires-Dist: lemonade-sdk[dev,llm-oga-hybrid]; extra == "llm-oga-unified"
 Provides-Extra: llm-oga-igpu
 Requires-Dist: onnxruntime-genai-directml==0.6.0; extra == "llm-oga-igpu"
 Requires-Dist: onnxruntime-directml<1.22.0,>=1.19.0; extra == "llm-oga-igpu"
@@ -61,16 +80,6 @@ Requires-Dist: onnxruntime-genai-cuda==0.8.2; extra == "llm-oga-cuda"
 Requires-Dist: onnxruntime-gpu>=1.22.0; extra == "llm-oga-cuda"
 Requires-Dist: transformers<=4.51.3; extra == "llm-oga-cuda"
 Requires-Dist: lemonade-sdk[dev]; extra == "llm-oga-cuda"
-Provides-Extra: llm-oga-npu
-Requires-Dist: onnx==1.16.0; extra == "llm-oga-npu"
-Requires-Dist: onnxruntime==1.18.0; extra == "llm-oga-npu"
-Requires-Dist: numpy==1.26.4; extra == "llm-oga-npu"
-Requires-Dist: protobuf>=6.30.1; extra == "llm-oga-npu"
-Requires-Dist: lemonade-sdk[dev]; extra == "llm-oga-npu"
-Provides-Extra: llm-oga-hybrid
-Requires-Dist: lemonade-sdk[dev,oga-hybrid]; extra == "llm-oga-hybrid"
-Provides-Extra: llm-oga-unified
-Requires-Dist: lemonade-sdk[llm-oga-hybrid]; extra == "llm-oga-unified"
 Dynamic: author-email
 Dynamic: description
 Dynamic: description-content-type
@@ -129,7 +138,9 @@ Dynamic: summary
 <a href="https://discord.gg/5xXzkMu8Zk">Discord</a>
 </h3>
 
-Lemonade makes it easy to run Large Language Models (LLMs) on your PC. Our focus is using the best tools, such as neural processing units (NPUs) and Vulkan GPU acceleration, to maximize LLM speed and responsiveness.
+Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPUs.
+
+Startups such as [Styrk AI](https://styrk.ai/styrk-ai-and-amd-guardrails-for-your-on-device-ai-revolution/), research teams like [Hazy Research at Stanford](https://www.amd.com/en/developer/resources/technical-articles/2025/minions--on-device-and-cloud-language-model-collaboration-on-ryz.html), and large companies like [AMD](https://www.amd.com/en/developer/resources/technical-articles/unlocking-a-wave-of-llm-apps-on-ryzen-ai-through-lemonade-server.html) use Lemonade to run LLMs.
 
 ## Getting Started
 
@@ -148,7 +159,7 @@ Lemonade makes it easy to run Large Language Models (LLMs) on your PC. Our focus
 </p>
 
 > [!TIP]
-> Want your app featured here? Let's do it! Shoot us a message on [Discord](https://discord.gg/5xXzkMu8Zk), [create an issue](https://github.com/lemonade-sdk/lemonade/issues), or email lemonade@amd.com.
+> Want your app featured here? Let's do it! Shoot us a message on [Discord](https://discord.gg/5xXzkMu8Zk), [create an issue](https://github.com/lemonade-sdk/lemonade/issues), or [email](lemonade@amd.com).
 
 ## Using the CLI
 
@@ -170,11 +181,14 @@ To check all models available, use the `list` command:
 lemonade-server list
 ```
 
-> Note: If you installed from source, use the `lemonade-server-dev` command instead.
+> **Note**: If you installed from source, use the `lemonade-server-dev` command instead.
+
+> **Tip**: You can use `--llamacpp vulkan/rocm` to select a backend when running GGUF models.
+
 
 ## Model Library
 
-Lemonade supports both GGUF and ONNX models as detailed in the [Supported Configuration](#supported-configurations) section. A list of all built-in models is available [here](https://lemonade-server.ai/docs/server/models/).
+Lemonade supports both GGUF and ONNX models as detailed in the [Supported Configuration](#supported-configurations) section. A list of all built-in models is available [here](https://lemonade-server.ai/docs/server/server_models/).
 
 You can also import custom GGUF and ONNX models from Hugging Face by using our [Model Manager](http://localhost:8000/#model-management) (requires server to be running).
 <p align="center">
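The `--llamacpp` backend selector mentioned in the tip above is new in this release, but the diff shows no invocation of it. As a sketch only, assuming the flag hangs off the `serve` subcommand (the subcommand name is an assumption; the flag and its `vulkan`/`rocm` values come straight from the tip):

```bash
# Sketch: `serve` is an assumed subcommand name; --llamacpp and its
# vulkan/rocm values are taken from the tip in the diff above.
lemonade-server serve --llamacpp vulkan   # GGUF models on the Vulkan backend
lemonade-server serve --llamacpp rocm     # GGUF models on the ROCm backend
```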
@@ -212,7 +226,7 @@ Lemonade supports the following configurations, while also making it easy to swi
 <tr>
 <td><strong>🎮 GPU</strong></td>
 <td align="center">—</td>
-<td align="center">Vulkan: All platforms<br><small>Focus:<br/>Ryzen™ AI 7000/8000/300<br/>Radeon™ 7000/9000</small></td>
+<td align="center">Vulkan: All platforms<br>ROCm: Selected AMD platforms*</td>
 <td align="center">—</td>
 <td align="center">✅</td>
 <td align="center">✅</td>
@@ -228,6 +242,38 @@
 </tbody>
 </table>
 
+<details>
+<summary><small><i>* See supported AMD ROCm platforms</i></small></summary>
+
+<br>
+
+<table>
+<thead>
+<tr>
+<th>Architecture</th>
+<th>Platform Support</th>
+<th>GPU Models</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><b>gfx1151</b> (STX Halo)</td>
+<td>Windows, Ubuntu</td>
+<td>Ryzen AI MAX+ Pro 395</td>
+</tr>
+<tr>
+<td><b>gfx120X</b> (RDNA4)</td>
+<td>Windows only</td>
+<td>Radeon AI PRO R9700, RX 9070 XT/GRE/9070, RX 9060 XT</td>
+</tr>
+<tr>
+<td><b>gfx110X</b> (RDNA3)</td>
+<td>Windows, Ubuntu</td>
+<td>Radeon PRO W7900/W7800/W7700/V710, RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT</td>
+</tr>
+</tbody>
+</table>
+</details>
 
 ## Integrate Lemonade Server with Your Application
 
@@ -263,7 +309,7 @@ completion = client.chat.completions.create(
 print(completion.choices[0].message.content)
 ```
 
-For more detailed integration instructions, see the [Integration Guide](./server_integration.md).
+For more detailed integration instructions, see the [Integration Guide](./docs/server/server_integration.md).
 
 ## Beyond an LLM Server
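The hunk above shows only the tail of the README's OpenAI-client example. For orientation, a self-contained version of such a call could look like the sketch below; port 8000 matches the Model Manager link earlier in this diff, while the `/api/v1` route, the API key, and the model name are illustrative assumptions rather than values taken from the diff:

```python
from openai import OpenAI

# Port 8000 matches the Model Manager URL above; the /api/v1 route, key,
# and model name are assumptions for illustration.
client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="lemonade")

completion = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",  # hypothetical; check `lemonade-server list`
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```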
 
@@ -272,6 +318,10 @@ The [Lemonade SDK](./docs/README.md) also include the following components:
 - 🐍 **[Lemonade API](./docs/lemonade_api.md)**: High-level Python API to directly integrate Lemonade LLMs into Python applications.
 - 🖥️ **[Lemonade CLI](./docs/dev_cli/README.md)**: The `lemonade` CLI lets you mix-and-match LLMs (ONNX, GGUF, SafeTensors) with prompting templates, accuracy testing, performance benchmarking, and memory profiling to characterize your models on your hardware.
 
+## FAQ
+
+To read our frequently asked questions, see our [FAQ Guide](./docs/faq.md)
+
 
 ## Contributing
 
 We are actively seeking collaborators from across the industry. If you would like to contribute to this project, please check out our [contribution guide](./docs/contribute.md).
README.md

@@ -47,7 +47,9 @@
 <a href="https://discord.gg/5xXzkMu8Zk">Discord</a>
 </h3>
 
-Lemonade makes it easy to run Large Language Models (LLMs) on your PC. Our focus is using the best tools, such as neural processing units (NPUs) and Vulkan GPU acceleration, to maximize LLM speed and responsiveness.
+Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPUs.
+
+Startups such as [Styrk AI](https://styrk.ai/styrk-ai-and-amd-guardrails-for-your-on-device-ai-revolution/), research teams like [Hazy Research at Stanford](https://www.amd.com/en/developer/resources/technical-articles/2025/minions--on-device-and-cloud-language-model-collaboration-on-ryz.html), and large companies like [AMD](https://www.amd.com/en/developer/resources/technical-articles/unlocking-a-wave-of-llm-apps-on-ryzen-ai-through-lemonade-server.html) use Lemonade to run LLMs.
 
 ## Getting Started
 
@@ -66,7 +68,7 @@ Lemonade makes it easy to run Large Language Models (LLMs) on your PC. Our focus
 </p>
 
 > [!TIP]
-> Want your app featured here? Let's do it! Shoot us a message on [Discord](https://discord.gg/5xXzkMu8Zk), [create an issue](https://github.com/lemonade-sdk/lemonade/issues), or email lemonade@amd.com.
+> Want your app featured here? Let's do it! Shoot us a message on [Discord](https://discord.gg/5xXzkMu8Zk), [create an issue](https://github.com/lemonade-sdk/lemonade/issues), or [email](lemonade@amd.com).
 
 ## Using the CLI
 
@@ -88,11 +90,14 @@ To check all models available, use the `list` command:
 lemonade-server list
 ```
 
-> Note: If you installed from source, use the `lemonade-server-dev` command instead.
+> **Note**: If you installed from source, use the `lemonade-server-dev` command instead.
+
+> **Tip**: You can use `--llamacpp vulkan/rocm` to select a backend when running GGUF models.
+
 
 ## Model Library
 
-Lemonade supports both GGUF and ONNX models as detailed in the [Supported Configuration](#supported-configurations) section. A list of all built-in models is available [here](https://lemonade-server.ai/docs/server/models/).
+Lemonade supports both GGUF and ONNX models as detailed in the [Supported Configuration](#supported-configurations) section. A list of all built-in models is available [here](https://lemonade-server.ai/docs/server/server_models/).
 
 You can also import custom GGUF and ONNX models from Hugging Face by using our [Model Manager](http://localhost:8000/#model-management) (requires server to be running).
 <p align="center">
@@ -130,7 +135,7 @@ Lemonade supports the following configurations, while also making it easy to swi
 <tr>
 <td><strong>🎮 GPU</strong></td>
 <td align="center">—</td>
-<td align="center">Vulkan: All platforms<br><small>Focus:<br/>Ryzen™ AI 7000/8000/300<br/>Radeon™ 7000/9000</small></td>
+<td align="center">Vulkan: All platforms<br>ROCm: Selected AMD platforms*</td>
 <td align="center">—</td>
 <td align="center">✅</td>
 <td align="center">✅</td>
@@ -146,6 +151,38 @@
 </tbody>
 </table>
 
+<details>
+<summary><small><i>* See supported AMD ROCm platforms</i></small></summary>
+
+<br>
+
+<table>
+<thead>
+<tr>
+<th>Architecture</th>
+<th>Platform Support</th>
+<th>GPU Models</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><b>gfx1151</b> (STX Halo)</td>
+<td>Windows, Ubuntu</td>
+<td>Ryzen AI MAX+ Pro 395</td>
+</tr>
+<tr>
+<td><b>gfx120X</b> (RDNA4)</td>
+<td>Windows only</td>
+<td>Radeon AI PRO R9700, RX 9070 XT/GRE/9070, RX 9060 XT</td>
+</tr>
+<tr>
+<td><b>gfx110X</b> (RDNA3)</td>
+<td>Windows, Ubuntu</td>
+<td>Radeon PRO W7900/W7800/W7700/V710, RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT</td>
+</tr>
+</tbody>
+</table>
+</details>
 
 ## Integrate Lemonade Server with Your Application
 
@@ -181,7 +218,7 @@ completion = client.chat.completions.create(
 print(completion.choices[0].message.content)
 ```
 
-For more detailed integration instructions, see the [Integration Guide](./server_integration.md).
+For more detailed integration instructions, see the [Integration Guide](./docs/server/server_integration.md).
 
 ## Beyond an LLM Server
 
@@ -190,6 +227,10 @@ The [Lemonade SDK](./docs/README.md) also include the following components:
 - 🐍 **[Lemonade API](./docs/lemonade_api.md)**: High-level Python API to directly integrate Lemonade LLMs into Python applications.
 - 🖥️ **[Lemonade CLI](./docs/dev_cli/README.md)**: The `lemonade` CLI lets you mix-and-match LLMs (ONNX, GGUF, SafeTensors) with prompting templates, accuracy testing, performance benchmarking, and memory profiling to characterize your models on your hardware.
 
+## FAQ
+
+To read our frequently asked questions, see our [FAQ Guide](./docs/faq.md)
+
 
 ## Contributing
 
 We are actively seeking collaborators from across the industry. If you would like to contribute to this project, please check out our [contribution guide](./docs/contribute.md).
setup.py

@@ -44,21 +44,20 @@ setup(
         "zstandard",
         "fastapi",
         "uvicorn[standard]",
-        "openai>=1.81.0",
+        "openai>=1.81.0,<1.97.1",
         "transformers<=4.53.2",
         "jinja2",
         "tabulate",
         "sentencepiece",
-        "huggingface-hub==0.33.0",
+        "huggingface-hub[hf_xet]==0.33.0",
+        "python-dotenv",
     ],
     extras_require={
         # The non-dev extras are meant to deploy specific backends into end-user
         # applications, without including developer-focused tools
-        "oga-hybrid": [
-            # Note: `lemonade-install --ryzenai hybrid` is necessary
-            # to complete installation
-            "onnx==1.16.1",
-            "numpy==1.26.4",
+        # Primary NPU extra using unified PyPI package
+        "oga-ryzenai": [
+            "onnxruntime-genai-directml-ryzenai==0.7.0.2",
            "protobuf>=6.30.1",
         ],
         "oga-cpu": [
@@ -75,17 +74,38 @@ setup(
            "datasets",
            "pandas>=1.5.3",
            "matplotlib",
+           "model-generate==1.5.0; platform_system=='Windows' and python_version=='3.10'",
            # Install human-eval from a forked repo with Windows support until the
            # PR (https://github.com/openai/human-eval/pull/53) is merged
            "human-eval-windows==1.0.4",
            "lm-eval[api]",
         ],
         # Keep backwards compatibility for old extras names
-        "oga-hybrid-minimal": ["lemonade-sdk[oga-hybrid]"],
+        "oga-hybrid": ["lemonade-sdk[oga-ryzenai]"],
+        "oga-unified": ["lemonade-sdk[oga-ryzenai]"],
+        "oga-hybrid-minimal": ["lemonade-sdk[oga-ryzenai]"],
         "oga-cpu-minimal": ["lemonade-sdk[oga-cpu]"],
+        "oga-npu-minimal": ["lemonade-sdk[oga-ryzenai]"],
         "llm": ["lemonade-sdk[dev]"],
         "llm-oga-cpu": ["lemonade-sdk[dev,oga-cpu]"],
         # The following extras are deprecated and/or not commonly used
+        "llm-oga-npu": [
+            "onnx==1.16.0",
+            # NPU requires specific onnxruntime version for Ryzen AI compatibility
+            # This may conflict with other OGA extras that require >=1.22.0
+            "onnxruntime==1.18.0",
+            "numpy==1.26.4",
+            "protobuf>=6.30.1",
+            "lemonade-sdk[dev]",
+        ],
+        "llm-oga-hybrid": [
+            # Note: `lemonade-install --ryzenai hybrid` is necessary
+            # to complete installation for RAI 1.4.0.
+            "onnx==1.16.1",
+            "numpy==1.26.4",
+            "protobuf>=6.30.1",
+        ],
+        "llm-oga-unified": ["lemonade-sdk[dev, llm-oga-hybrid]"],
         "llm-oga-igpu": [
            "onnxruntime-genai-directml==0.6.0",
            "onnxruntime-directml>=1.19.0,<1.22.0",
@@ -98,17 +118,6 @@ setup(
            "transformers<=4.51.3",
            "lemonade-sdk[dev]",
         ],
-        "llm-oga-npu": [
-            "onnx==1.16.0",
-            # NPU requires specific onnxruntime version for Ryzen AI compatibility
-            # This may conflict with other OGA extras that require >=1.22.0
-            "onnxruntime==1.18.0",
-            "numpy==1.26.4",
-            "protobuf>=6.30.1",
-            "lemonade-sdk[dev]",
-        ],
-        "llm-oga-hybrid": ["lemonade-sdk[dev,oga-hybrid]"],
-        "llm-oga-unified": ["lemonade-sdk[llm-oga-hybrid]"],
     },
     classifiers=[],
     entry_points={
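The net effect of this extras reshuffle is that `oga-ryzenai` becomes the single NPU extra, with the old names kept as aliases. Based purely on the mapping above, the install commands resolve like this:

```bash
# Primary extra: pulls onnxruntime-genai-directml-ryzenai==0.7.0.2 and protobuf
pip install "lemonade-sdk[oga-ryzenai]"

# Legacy names now resolve to the same oga-ryzenai package set
pip install "lemonade-sdk[oga-hybrid]"
pip install "lemonade-sdk[oga-npu-minimal]"
```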
src/lemonade/common/inference_engines.py

@@ -2,7 +2,6 @@ import os
 import sys
 import importlib.util
 import importlib.metadata
-import platform
 import subprocess
 from abc import ABC, abstractmethod
 from typing import Dict, Optional
@@ -19,7 +18,9 @@ class InferenceEngineDetector:
         self.llamacpp_detector = LlamaCppDetector()
         self.transformers_detector = TransformersDetector()
 
-    def detect_engines_for_device(self, device_type: str) -> Dict[str, Dict]:
+    def detect_engines_for_device(
+        self, device_type: str, device_name: str
+    ) -> Dict[str, Dict]:
         """
         Detect all available inference engines for a specific device type.
 
@@ -36,10 +37,19 @@
         if oga_info:
             engines["oga"] = oga_info
 
-        # Detect llama.cpp availability
-        llamacpp_info = self.llamacpp_detector.detect_for_device(device_type)
+        # Detect llama.cpp vulkan availability
+        llamacpp_info = self.llamacpp_detector.detect_for_device(
+            device_type, device_name, "vulkan"
+        )
+        if llamacpp_info:
+            engines["llamacpp-vulkan"] = llamacpp_info
+
+        # Detect llama.cpp rocm availability
+        llamacpp_info = self.llamacpp_detector.detect_for_device(
+            device_type, device_name, "rocm"
+        )
         if llamacpp_info:
-            engines["llamacpp"] = llamacpp_info
+            engines["llamacpp-rocm"] = llamacpp_info
 
         # Detect Transformers availability
         transformers_info = self.transformers_detector.detect_for_device(device_type)
@@ -206,57 +216,40 @@
     Detector for llama.cpp.
     """
 
-    def detect_for_device(self, device_type: str) -> Optional[Dict]:
+    def detect_for_device(
+        self, device_type: str, device_name: str, backend: str
+    ) -> Optional[Dict]:
         """
         Detect llama.cpp availability for specific device.
         """
         try:
-            # Map device types to llama.cpp backends
-            device_backend_map = {
-                "cpu": "cpu",
-                "amd_igpu": "vulkan",
-                "amd_dgpu": "vulkan",
-            }
 
-            if device_type not in device_backend_map:
+            if device_type not in ["cpu", "amd_igpu", "amd_dgpu"]:
                 return None
 
-            backend = device_backend_map[device_type]
-            is_installed = self.is_installed()
-
-            # Check requirements based on backend
-            if backend == "vulkan":
-                vulkan_available = self._check_vulkan_support()
-                if not vulkan_available:
-                    return {"available": False, "error": "Vulkan not available"}
-
-                # Vulkan is available
-                if is_installed:
-                    result = {
-                        "available": True,
-                        "version": self._get_llamacpp_version(),
-                        "backend": backend,
-                    }
-                    return result
-                else:
-                    return {
-                        "available": False,
-                        "error": "llama.cpp binaries not installed",
-                    }
-            else:
-                # CPU backend
-                if is_installed:
-                    result = {
-                        "available": True,
-                        "version": self._get_llamacpp_version(),
-                        "backend": backend,
-                    }
-                    return result
-                else:
-                    return {
-                        "available": False,
-                        "error": "llama.cpp binaries not installed",
-                    }
+            # Check if the device is supported by the backend
+            if device_type == "cpu":
+                device_supported = True
+            elif device_type == "amd_igpu" or device_type == "amd_dgpu":
+                if backend == "vulkan":
+                    device_supported = self._check_vulkan_support()
+                elif backend == "rocm":
+                    device_supported = self._check_rocm_support(device_name.lower())
+            if not device_supported:
+                return {"available": False, "error": f"{backend} not available"}
+
+            is_installed = self.is_installed(backend)
+            if not is_installed:
+                return {
+                    "available": False,
+                    "error": f"{backend} binaries not installed",
+                }
+
+            return {
+                "available": True,
+                "version": self._get_llamacpp_version(backend),
+                "backend": backend,
+            }
 
         except (ImportError, OSError, subprocess.SubprocessError) as e:
             return {
@@ -264,35 +257,17 @@
                 "error": f"llama.cpp detection failed: {str(e)}",
             }
 
-    def is_installed(self) -> bool:
+    def is_installed(self, backend: str) -> bool:
         """
-        Check if llama.cpp binaries are available.
+        Check if llama.cpp binaries are available for any backend.
         """
+        from lemonade.tools.llamacpp.utils import get_llama_server_exe_path
 
-        # Check lemonade-managed binary locations
         try:
-
-            # Check lemonade server directory
-            server_base_dir = os.path.join(
-                os.path.dirname(sys.executable), "llama_server"
-            )
-
-            if platform.system().lower() == "windows":
-                server_exe_path = os.path.join(server_base_dir, "llama-server.exe")
-            else:
-                # Check both build/bin and root directory locations
-                build_bin_path = os.path.join(
-                    server_base_dir, "build", "bin", "llama-server"
-                )
-                root_path = os.path.join(server_base_dir, "llama-server")
-                server_exe_path = (
-                    build_bin_path if os.path.exists(build_bin_path) else root_path
-                )
-
+            server_exe_path = get_llama_server_exe_path(backend)
             if os.path.exists(server_exe_path):
                 return True
-
-        except (ImportError, OSError):
+        except (ImportError, OSError, ValueError):
            pass
 
         return False
@@ -334,13 +309,22 @@
         except OSError:
             return False
 
-    def _get_llamacpp_version(self) -> str:
+    def _check_rocm_support(self, device_name: str) -> bool:
+        """
+        Check if ROCM is available for GPU acceleration.
+        """
+        from lemonade.tools.llamacpp.utils import identify_rocm_arch_from_name
+
+        return identify_rocm_arch_from_name(device_name) is not None
+
+    def _get_llamacpp_version(self, backend: str) -> str:
         """
-        Get llama.cpp version from lemonade's managed installation.
+        Get llama.cpp version from lemonade's managed installation for specific backend.
         """
         try:
+            # Use backend-specific path - same logic as get_llama_folder_path in utils.py
             server_base_dir = os.path.join(
-                os.path.dirname(sys.executable), "llama_server"
+                os.path.dirname(sys.executable), backend, "llama_server"
             )
             version_file = os.path.join(server_base_dir, "version.txt")
 
@@ -401,15 +385,16 @@
         )
 
 
-def detect_inference_engines(device_type: str) -> Dict[str, Dict]:
+def detect_inference_engines(device_type: str, device_name: str) -> Dict[str, Dict]:
     """
     Helper function to detect inference engines for a device type.
 
     Args:
         device_type: "cpu", "amd_igpu", "amd_dgpu", or "npu"
+        device_name: device name
 
     Returns:
         dict: Engine availability information.
     """
     detector = InferenceEngineDetector()
-    return detector.detect_engines_for_device(device_type)
+    return detector.detect_engines_for_device(device_type, device_name)
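Callers of this module must now pass a device name along with the device type, and they get backend-qualified engine keys back. A minimal sketch of the new call shape, using a made-up GPU name (for the ROCm check the name is lowercased and matched against the known ROCm architectures):

```python
from lemonade.common.inference_engines import detect_inference_engines

# "Radeon RX 7900 XTX" is an illustrative device name, not a required value.
engines = detect_inference_engines("amd_dgpu", "Radeon RX 7900 XTX")

# 8.1.1 reports "llamacpp-vulkan" and "llamacpp-rocm" keys instead of one "llamacpp".
rocm = engines.get("llamacpp-rocm")
if rocm and rocm["available"]:
    print(f"llama.cpp ROCm {rocm['version']} is ready")
elif rocm:
    print(f"ROCm backend unavailable: {rocm['error']}")
```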
src/lemonade/common/network.py

@@ -1,7 +1,7 @@
 import os
 from typing import Optional
 import socket
-from huggingface_hub import model_info
+from huggingface_hub import model_info, snapshot_download
 
 
 def is_offline():
@@ -48,3 +48,20 @@ def get_base_model(checkpoint: str) -> Optional[str]:
     except Exception:  # pylint: disable=broad-except
         pass
     return None
+
+
+def custom_snapshot_download(repo_id, **kwargs):
+    """
+    Custom snapshot download with retry logic for Windows symlink privilege errors.
+    """
+    for attempt in range(2):
+        try:
+            return snapshot_download(repo_id=repo_id, **kwargs)
+        except OSError as e:
+            if (
+                hasattr(e, "winerror")
+                and e.winerror == 1314  # pylint: disable=no-member
+                and attempt < 1
+            ):
+                continue
+            raise
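WinError 1314 is `ERROR_PRIVILEGE_NOT_HELD`, which Windows raises when a process tries to create a symlink without the required privilege, so the helper retries the download once before re-raising. All keyword arguments are forwarded to `snapshot_download` unchanged; a minimal usage sketch (the repo id and target directory are illustrative):

```python
from lemonade.common.network import custom_snapshot_download

# Illustrative repo id and local_dir; kwargs pass straight through to
# huggingface_hub.snapshot_download.
path = custom_snapshot_download(
    "Qwen/Qwen2.5-0.5B-Instruct",
    local_dir="./models/qwen",
)
print(path)  # local folder containing the downloaded snapshot
```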