PyPI - lemonade-sdk - Versions diffs - 8.1.0__tar.gz → 8.1.1__tar.gz - Mend

lemonade-sdk 8.1.0 (tar.gz) → 8.1.1 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of lemonade-sdk might be problematic.

Files changed (78)

{lemonade_sdk-8.1.0/src/lemonade_sdk.egg-info → lemonade_sdk-8.1.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: lemonade-sdk
-Version: 8.1.0
+Version: 8.1.1
 Summary: Lemonade SDK: Your LLM Aide for Validation and Deployment
 Author-email: lemonade@amd.com
 Requires-Python: >=3.10, <3.13
@@ -27,7 +27,8 @@ Requires-Dist: transformers<=4.53.2
 Requires-Dist: jinja2
 Requires-Dist: tabulate
 Requires-Dist: sentencepiece
-Requires-Dist: huggingface-hub==0.33.0
+Requires-Dist: huggingface-hub[hf_xet]==0.33.0
+Requires-Dist: python-dotenv
 Provides-Extra: oga-ryzenai
 Requires-Dist: onnxruntime-genai-directml-ryzenai==0.7.0.2; extra == "oga-ryzenai"
 Requires-Dist: protobuf>=6.30.1; extra == "oga-ryzenai"
@@ -40,6 +41,7 @@ Requires-Dist: accelerate; extra == "dev"
 Requires-Dist: datasets; extra == "dev"
 Requires-Dist: pandas>=1.5.3; extra == "dev"
 Requires-Dist: matplotlib; extra == "dev"
+Requires-Dist: model-generate==1.5.0; (platform_system == "Windows" and python_version == "3.10") and extra == "dev"
 Requires-Dist: human-eval-windows==1.0.4; extra == "dev"
 Requires-Dist: lm-eval[api]; extra == "dev"
 Provides-Extra: oga-hybrid
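PKG-INFO is an RFC 822-style header block, so the standard library's `email.parser` can read it. The sketch below pulls every `Requires-Dist` value out of a few header lines copied from the 8.1.1 metadata above and splits off any environment marker. The helper names `requires_dist` and `split_marker` are illustrative, not part of lemonade-sdk.

```python
from __future__ import annotations

from email.parser import Parser

# A subset of the 8.1.1 PKG-INFO headers shown in the diff above.
PKG_INFO_8_1_1 = """\
Metadata-Version: 2.4
Name: lemonade-sdk
Version: 8.1.1
Requires-Dist: sentencepiece
Requires-Dist: huggingface-hub[hf_xet]==0.33.0
Requires-Dist: python-dotenv
Requires-Dist: model-generate==1.5.0; (platform_system == "Windows" and python_version == "3.10") and extra == "dev"
"""


def requires_dist(pkg_info_text: str) -> list[str]:
    """Return every Requires-Dist header value from a PKG-INFO document."""
    msg = Parser().parsestr(pkg_info_text)
    return msg.get_all("Requires-Dist") or []


def split_marker(requirement: str) -> tuple[str, str | None]:
    """Split 'name==ver; marker' into (requirement, marker), marker None if absent."""
    spec, sep, marker = requirement.partition(";")
    return spec.strip(), (marker.strip() if sep else None)


if __name__ == "__main__":
    for req in requires_dist(PKG_INFO_8_1_1):
        print(split_marker(req))
```

For an installed copy, `importlib.metadata.requires("lemonade-sdk")` returns the same list of requirement strings without reading the file by hand.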
@@ -136,7 +138,9 @@ Dynamic: summary
   <a href="https://discord.gg/5xXzkMu8Zk">Discord</a>
 </h3>
-Lemonade makes it easy to run Large Language Models (LLMs) on your PC. Our focus is using the best tools, such as neural processing units (NPUs) and Vulkan GPU acceleration, to maximize LLM speed and responsiveness.
+Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPUs.
+Startups such as [Styrk AI](https://styrk.ai/styrk-ai-and-amd-guardrails-for-your-on-device-ai-revolution/), research teams like [Hazy Research at Stanford](https://www.amd.com/en/developer/resources/technical-articles/2025/minions--on-device-and-cloud-language-model-collaboration-on-ryz.html), and large companies like [AMD](https://www.amd.com/en/developer/resources/technical-articles/unlocking-a-wave-of-llm-apps-on-ryzen-ai-through-lemonade-server.html) use Lemonade to run LLMs.
 ## Getting Started
@@ -155,7 +159,7 @@ Lemonade makes it easy to run Large Language Models (LLMs) on your PC. Our focus
 </p>
 > [!TIP]
-> Want your app featured here? Let's do it! Shoot us a message on [Discord](https://discord.gg/5xXzkMu8Zk), [create an issue](https://github.com/lemonade-sdk/lemonade/issues), or email lemonade@amd.com.
+> Want your app featured here? Let's do it! Shoot us a message on [Discord](https://discord.gg/5xXzkMu8Zk), [create an issue](https://github.com/lemonade-sdk/lemonade/issues), or [email](lemonade@amd.com).
 ## Using the CLI
@@ -177,7 +181,10 @@ To check all models available, use the `list` command:
 lemonade-server list
 ```
-> Note: If you installed from source, use the `lemonade-server-dev` command instead.
+> **Note**:  If you installed from source, use the `lemonade-server-dev` command instead.
+> **Tip**: You can use `--llamacpp vulkan/rocm` to select a backend when running GGUF models.
 ## Model Library
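The new CLI tip says `--llamacpp vulkan/rocm` selects the llama.cpp backend for GGUF models, and the note says source installs use `lemonade-server-dev`. A minimal sketch of assembling such a command from Python follows; it assumes the `run` subcommand accepts `--llamacpp` as the tip implies, and `build_run_command` plus the model name `Some-Model-GGUF` are hypothetical placeholders.

```python
import shlex

# The two backend values named in the README tip; anything else is rejected here.
LLAMACPP_BACKENDS = ("vulkan", "rocm")


def build_run_command(model: str, llamacpp: str = "vulkan",
                      from_source: bool = False) -> list[str]:
    """Build an argv list for running a GGUF model on a chosen llama.cpp backend.

    Hypothetical helper: assumes `run` accepts `--llamacpp <backend>`.
    """
    if llamacpp not in LLAMACPP_BACKENDS:
        raise ValueError(f"unsupported llama.cpp backend: {llamacpp!r}")
    # Source installs expose the dev entry point instead of lemonade-server.
    exe = "lemonade-server-dev" if from_source else "lemonade-server"
    return [exe, "run", model, "--llamacpp", llamacpp]


if __name__ == "__main__":
    # Placeholder model name; pick a real one from `lemonade-server list`.
    print(shlex.join(build_run_command("Some-Model-GGUF", "rocm")))
```

Returning an argv list rather than a single string keeps it safe to pass to `subprocess.run` without a shell.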
@@ -219,7 +226,7 @@ Lemonade supports the following configurations, while also making it easy to swi
     <tr>
       <td><strong>🎮 GPU</strong></td>
       <td align="center">—</td>
-      <td align="center">Vulkan: All platforms<br><small>Focus:<br/>Ryzen™ AI 7000/8000/300<br/>Radeon™ 7000/9000</small></td>
+      <td align="center">Vulkan: All platforms<br>ROCm: Selected AMD platforms*</td>
       <td align="center">—</td>
       <td align="center">✅</td>
       <td align="center">✅</td>
@@ -235,6 +242,38 @@ Lemonade supports the following configurations, while also making it easy to swi
   </tbody>
 </table>
+<details>
+<summary><small><i>* See supported AMD ROCm platforms</i></small></summary>
+<br>
+<table>
+  <thead>
+    <tr>
+      <th>Architecture</th>
+      <th>Platform Support</th>
+      <th>GPU Models</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td><b>gfx1151</b> (STX Halo)</td>
+      <td>Windows, Ubuntu</td>
+      <td>Ryzen AI MAX+ Pro 395</td>
+    </tr>
+    <tr>
+      <td><b>gfx120X</b> (RDNA4)</td>
+      <td>Windows only</td>
+      <td>Radeon AI PRO R9700, RX 9070 XT/GRE/9070, RX 9060 XT</td>
+    </tr>
+    <tr>
+      <td><b>gfx110X</b> (RDNA3)</td>
+      <td>Windows, Ubuntu</td>
+      <td>Radeon PRO W7900/W7800/W7700/V710, RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT</td>
+    </tr>
+  </tbody>
+</table>
+</details>
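The ROCm table maps GPU models to an architecture family and a platform list. A table-driven lookup can be sketched directly from it. This is a hypothetical illustration, not lemonade's `identify_rocm_arch_from_name`, whose matching rules are not shown in this diff; the substring patterns below come only from the model names in the table.

```python
from __future__ import annotations

# Built only from the "supported AMD ROCm platforms" table above.
ROCM_ARCHES = {
    "gfx1151": {
        "platforms": {"Windows", "Ubuntu"},
        "models": ["ryzen ai max+ pro 395"],
    },
    "gfx120X": {
        "platforms": {"Windows"},
        "models": ["radeon ai pro r9700", "rx 9070", "rx 9060 xt"],
    },
    "gfx110X": {
        "platforms": {"Windows", "Ubuntu"},
        "models": ["radeon pro w7900", "radeon pro w7800", "radeon pro w7700",
                   "radeon pro v710", "rx 7900", "rx 7800 xt", "rx 7700 xt"],
    },
}


def rocm_arch_for(device_name: str) -> str | None:
    """Return the table's architecture for a GPU name, or None if it is unlisted."""
    name = device_name.lower()
    for arch, info in ROCM_ARCHES.items():
        if any(model in name for model in info["models"]):
            return arch
    return None


def rocm_supported(device_name: str, platform_name: str) -> bool:
    """True if the GPU is listed and the table marks the platform as supported."""
    arch = rocm_arch_for(device_name)
    return arch is not None and platform_name in ROCM_ARCHES[arch]["platforms"]
```

Keeping the table as data means a new architecture row is a dictionary entry rather than another `if` branch.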
 ## Integrate Lemonade Server with Your Application

{lemonade_sdk-8.1.0 → lemonade_sdk-8.1.1}/README.md RENAMED

@@ -47,7 +47,9 @@
   <a href="https://discord.gg/5xXzkMu8Zk">Discord</a>
 </h3>
-Lemonade makes it easy to run Large Language Models (LLMs) on your PC. Our focus is using the best tools, such as neural processing units (NPUs) and Vulkan GPU acceleration, to maximize LLM speed and responsiveness.
+Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPUs.
+Startups such as [Styrk AI](https://styrk.ai/styrk-ai-and-amd-guardrails-for-your-on-device-ai-revolution/), research teams like [Hazy Research at Stanford](https://www.amd.com/en/developer/resources/technical-articles/2025/minions--on-device-and-cloud-language-model-collaboration-on-ryz.html), and large companies like [AMD](https://www.amd.com/en/developer/resources/technical-articles/unlocking-a-wave-of-llm-apps-on-ryzen-ai-through-lemonade-server.html) use Lemonade to run LLMs.
 ## Getting Started
@@ -66,7 +68,7 @@ Lemonade makes it easy to run Large Language Models (LLMs) on your PC. Our focus
 </p>
 > [!TIP]
-> Want your app featured here? Let's do it! Shoot us a message on [Discord](https://discord.gg/5xXzkMu8Zk), [create an issue](https://github.com/lemonade-sdk/lemonade/issues), or email lemonade@amd.com.
+> Want your app featured here? Let's do it! Shoot us a message on [Discord](https://discord.gg/5xXzkMu8Zk), [create an issue](https://github.com/lemonade-sdk/lemonade/issues), or [email](lemonade@amd.com).
 ## Using the CLI
@@ -88,7 +90,10 @@ To check all models available, use the `list` command:
 lemonade-server list
 ```
-> Note: If you installed from source, use the `lemonade-server-dev` command instead.
+> **Note**:  If you installed from source, use the `lemonade-server-dev` command instead.
+> **Tip**: You can use `--llamacpp vulkan/rocm` to select a backend when running GGUF models.
 ## Model Library
@@ -130,7 +135,7 @@ Lemonade supports the following configurations, while also making it easy to swi
     <tr>
       <td><strong>🎮 GPU</strong></td>
       <td align="center">—</td>
-      <td align="center">Vulkan: All platforms<br><small>Focus:<br/>Ryzen™ AI 7000/8000/300<br/>Radeon™ 7000/9000</small></td>
+      <td align="center">Vulkan: All platforms<br>ROCm: Selected AMD platforms*</td>
       <td align="center">—</td>
       <td align="center">✅</td>
       <td align="center">✅</td>
@@ -146,6 +151,38 @@ Lemonade supports the following configurations, while also making it easy to swi
   </tbody>
 </table>
+<details>
+<summary><small><i>* See supported AMD ROCm platforms</i></small></summary>
+<br>
+<table>
+  <thead>
+    <tr>
+      <th>Architecture</th>
+      <th>Platform Support</th>
+      <th>GPU Models</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td><b>gfx1151</b> (STX Halo)</td>
+      <td>Windows, Ubuntu</td>
+      <td>Ryzen AI MAX+ Pro 395</td>
+    </tr>
+    <tr>
+      <td><b>gfx120X</b> (RDNA4)</td>
+      <td>Windows only</td>
+      <td>Radeon AI PRO R9700, RX 9070 XT/GRE/9070, RX 9060 XT</td>
+    </tr>
+    <tr>
+      <td><b>gfx110X</b> (RDNA3)</td>
+      <td>Windows, Ubuntu</td>
+      <td>Radeon PRO W7900/W7800/W7700/V710, RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT</td>
+    </tr>
+  </tbody>
+</table>
+</details>
 ## Integrate Lemonade Server with Your Application

{lemonade_sdk-8.1.0 → lemonade_sdk-8.1.1}/setup.py RENAMED

@@ -49,7 +49,8 @@ setup(
         "jinja2",
         "tabulate",
         "sentencepiece",
-        "huggingface-hub==0.33.0",
+        "huggingface-hub[hf_xet]==0.33.0",
+        "python-dotenv",
     ],
     extras_require={
         # The non-dev extras are meant to deploy specific backends into end-user
@@ -73,6 +74,7 @@ setup(
             "datasets",
             "pandas>=1.5.3",
             "matplotlib",
+            "model-generate==1.5.0; platform_system=='Windows' and python_version=='3.10'",
             # Install human-eval from a forked repo with Windows support until the
             # PR (https://github.com/openai/human-eval/pull/53) is merged
             "human-eval-windows==1.0.4",
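The new `dev` requirement carries the PEP 508 marker `platform_system=='Windows' and python_version=='3.10'`, so `model-generate` is only installed in that one environment (PKG-INFO additionally ANDs `extra == "dev"`). pip evaluates markers itself; the sketch below only mirrors this particular marker with the standard library, using the PEP 508 definitions (`platform_system` is `platform.system()`, `python_version` is `major.minor`). The function name is hypothetical.

```python
from __future__ import annotations

import platform
import sys


def model_generate_marker(system: str | None = None,
                          python_version: str | None = None) -> bool:
    """Mirror: platform_system == 'Windows' and python_version == '3.10'.

    Defaults describe the running interpreter; pass values to test other targets.
    """
    if system is None:
        system = platform.system()
    if python_version is None:
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return system == "Windows" and python_version == "3.10"


if __name__ == "__main__":
    print("model-generate would be installed:", model_generate_marker())
```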

{lemonade_sdk-8.1.0 → lemonade_sdk-8.1.1}/src/lemonade/common/inference_engines.py RENAMED

@@ -2,7 +2,6 @@ import os
 import sys
 import importlib.util
 import importlib.metadata
-import platform
 import subprocess
 from abc import ABC, abstractmethod
 from typing import Dict, Optional
@@ -19,7 +18,9 @@ class InferenceEngineDetector:
         self.llamacpp_detector = LlamaCppDetector()
         self.transformers_detector = TransformersDetector()
-    def detect_engines_for_device(self, device_type: str) -> Dict[str, Dict]:
+    def detect_engines_for_device(
+        self, device_type: str, device_name: str
+    ) -> Dict[str, Dict]:
         """
         Detect all available inference engines for a specific device type.
@@ -36,10 +37,19 @@ class InferenceEngineDetector:
         if oga_info:
             engines["oga"] = oga_info
-        # Detect llama.cpp availability
-        llamacpp_info = self.llamacpp_detector.detect_for_device(device_type)
+        # Detect llama.cpp vulkan availability
+        llamacpp_info = self.llamacpp_detector.detect_for_device(
+            device_type, device_name, "vulkan"
+        )
+        if llamacpp_info:
+            engines["llamacpp-vulkan"] = llamacpp_info
+        # Detect llama.cpp rocm availability
+        llamacpp_info = self.llamacpp_detector.detect_for_device(
+            device_type, device_name, "rocm"
+        )
         if llamacpp_info:
-            engines["llamacpp"] = llamacpp_info
+            engines["llamacpp-rocm"] = llamacpp_info
         # Detect Transformers availability
         transformers_info = self.transformers_detector.detect_for_device(device_type)
@@ -206,57 +216,40 @@ class LlamaCppDetector(BaseEngineDetector):
     Detector for llama.cpp.
     """
-    def detect_for_device(self, device_type: str) -> Optional[Dict]:
+    def detect_for_device(
+        self, device_type: str, device_name: str, backend: str
+    ) -> Optional[Dict]:
         """
         Detect llama.cpp availability for specific device.
         """
         try:
-            # Map device types to llama.cpp backends
-            device_backend_map = {
-                "cpu": "cpu",
-                "amd_igpu": "vulkan",
-                "amd_dgpu": "vulkan",
-            }
-            if device_type not in device_backend_map:
+            if device_type not in ["cpu", "amd_igpu", "amd_dgpu"]:
                 return None
-            backend = device_backend_map[device_type]
-            is_installed = self.is_installed()
-            # Check requirements based on backend
-            if backend == "vulkan":
-                vulkan_available = self._check_vulkan_support()
-                if not vulkan_available:
-                    return {"available": False, "error": "Vulkan not available"}
-                # Vulkan is available
-                if is_installed:
-                    result = {
-                        "available": True,
-                        "version": self._get_llamacpp_version(),
-                        "backend": backend,
-                    }
-                    return result
-                else:
-                    return {
-                        "available": False,
-                        "error": "llama.cpp binaries not installed",
-                    }
-            else:
-                # CPU backend
-                if is_installed:
-                    result = {
-                        "available": True,
-                        "version": self._get_llamacpp_version(),
-                        "backend": backend,
-                    }
-                    return result
-                else:
-                    return {
-                        "available": False,
-                        "error": "llama.cpp binaries not installed",
-                    }
+            # Check if the device is supported by the backend
+            if device_type == "cpu":
+                device_supported = True
+            elif device_type == "amd_igpu" or device_type == "amd_dgpu":
+                if backend == "vulkan":
+                    device_supported = self._check_vulkan_support()
+                elif backend == "rocm":
+                    device_supported = self._check_rocm_support(device_name.lower())
+            if not device_supported:
+                return {"available": False, "error": f"{backend} not available"}
+            is_installed = self.is_installed(backend)
+            if not is_installed:
+                return {
+                    "available": False,
+                    "error": f"{backend} binaries not installed",
+                }
+            return {
+                "available": True,
+                "version": self._get_llamacpp_version(backend),
+                "backend": backend,
+            }
         except (ImportError, OSError, subprocess.SubprocessError) as e:
             return {
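The rewritten `detect_for_device` checks device support for the requested backend first, then whether that backend's binaries are installed. The sketch below reproduces that order with injected callables standing in for `_check_vulkan_support`, `_check_rocm_support`, and `is_installed`; the version lookup is omitted and `detect_llamacpp` is a hypothetical name. One deliberate difference: the diff assigns `device_supported` only inside the `vulkan`/`rocm` branches, so a GPU device with any other backend string would raise `UnboundLocalError`, which the `except` clause does not catch. The sketch initializes it to `False` instead.

```python
from __future__ import annotations

from typing import Callable, Optional

GPU_TYPES = ("amd_igpu", "amd_dgpu")


def detect_llamacpp(
    device_type: str,
    device_name: str,
    backend: str,
    vulkan_ok: Callable[[], bool],
    rocm_ok: Callable[[str], bool],
    installed: Callable[[str], bool],
) -> Optional[dict]:
    """Device check first, then install check, as in the 8.1.1 hunk."""
    if device_type != "cpu" and device_type not in GPU_TYPES:
        return None
    device_supported = False  # unknown backends report "not available"
    if device_type == "cpu":
        device_supported = True
    elif backend == "vulkan":
        device_supported = vulkan_ok()
    elif backend == "rocm":
        device_supported = rocm_ok(device_name.lower())
    if not device_supported:
        return {"available": False, "error": f"{backend} not available"}
    if not installed(backend):
        return {"available": False, "error": f"{backend} binaries not installed"}
    return {"available": True, "backend": backend}
```

Injecting the checks keeps the decision logic testable without a GPU or any llama.cpp binaries on disk.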
@@ -264,35 +257,17 @@ class LlamaCppDetector(BaseEngineDetector):
                 "error": f"llama.cpp detection failed: {str(e)}",
             }
-    def is_installed(self) -> bool:
+    def is_installed(self, backend: str) -> bool:
         """
-        Check if llama.cpp binaries are available.
+        Check if llama.cpp binaries are available for any backend.
         """
+        from lemonade.tools.llamacpp.utils import get_llama_server_exe_path
-        # Check lemonade-managed binary locations
         try:
-            # Check lemonade server directory
-            server_base_dir = os.path.join(
-                os.path.dirname(sys.executable), "llama_server"
-            )
-            if platform.system().lower() == "windows":
-                server_exe_path = os.path.join(server_base_dir, "llama-server.exe")
-            else:
-                # Check both build/bin and root directory locations
-                build_bin_path = os.path.join(
-                    server_base_dir, "build", "bin", "llama-server"
-                )
-                root_path = os.path.join(server_base_dir, "llama-server")
-                server_exe_path = (
-                    build_bin_path if os.path.exists(build_bin_path) else root_path
-                )
+            server_exe_path = get_llama_server_exe_path(backend)
             if os.path.exists(server_exe_path):
                 return True
-        except (ImportError, OSError):
+        except (ImportError, OSError, ValueError):
             pass
         return False
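`is_installed` now delegates path resolution to `get_llama_server_exe_path(backend)` and adds `ValueError` to the caught exceptions, which suggests the resolver can reject a backend it does not know. A minimal sketch of that pattern, assuming a resolver passed in as a callable (the real one's internals are not shown in this diff):

```python
import os
from typing import Callable


def is_installed(backend: str, exe_path_for: Callable[[str], str]) -> bool:
    """True if the resolved llama-server path exists on disk.

    exe_path_for stands in for get_llama_server_exe_path. As in 8.1.1, an
    unknown backend (ValueError) or a filesystem error counts as not installed.
    """
    try:
        return os.path.exists(exe_path_for(backend))
    except (ImportError, OSError, ValueError):
        return False
```

Centralizing the path logic in one resolver means the per-platform layout (for example `llama-server.exe` on Windows, which the removed code handled inline) lives in a single place.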
@@ -334,13 +309,22 @@ class LlamaCppDetector(BaseEngineDetector):
             except OSError:
                 return False
-    def _get_llamacpp_version(self) -> str:
+    def _check_rocm_support(self, device_name: str) -> bool:
+        """
+        Check if ROCM is available for GPU acceleration.
+        """
+        from lemonade.tools.llamacpp.utils import identify_rocm_arch_from_name
+        return identify_rocm_arch_from_name(device_name) is not None
+    def _get_llamacpp_version(self, backend: str) -> str:
         """
-        Get llama.cpp version from lemonade's managed installation.
+        Get llama.cpp version from lemonade's managed installation for specific backend.
         """
         try:
+            # Use backend-specific path - same logic as get_llama_folder_path in utils.py
             server_base_dir = os.path.join(
-                os.path.dirname(sys.executable), "llama_server"
+                os.path.dirname(sys.executable), backend, "llama_server"
             )
             version_file = os.path.join(server_base_dir, "version.txt")
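With this change each backend gets its own install folder next to the Python executable, so the version file moves from `<python dir>/llama_server/version.txt` to `<python dir>/<backend>/llama_server/version.txt`. A sketch of that layout follows; the `"unknown"` fallback string is an assumption, since the rest of `_get_llamacpp_version` is not shown here.

```python
import os
import sys


def llamacpp_version_file(backend: str, python_exe: str = sys.executable) -> str:
    """Path of version.txt for a backend-specific llama.cpp install."""
    server_base_dir = os.path.join(os.path.dirname(python_exe), backend, "llama_server")
    return os.path.join(server_base_dir, "version.txt")


def read_llamacpp_version(backend: str, python_exe: str = sys.executable) -> str:
    """Return the recorded version; 'unknown' (assumed fallback) if unreadable."""
    try:
        with open(llamacpp_version_file(backend, python_exe), encoding="utf-8") as f:
            return f.read().strip()
    except OSError:
        return "unknown"
```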
@@ -401,15 +385,16 @@ class TransformersDetector(BaseEngineDetector):
         )
-def detect_inference_engines(device_type: str) -> Dict[str, Dict]:
+def detect_inference_engines(device_type: str, device_name: str) -> Dict[str, Dict]:
     """
     Helper function to detect inference engines for a device type.
     Args:
         device_type: "cpu", "amd_igpu", "amd_dgpu", or "npu"
+        device_name: device name
     Returns:
         dict: Engine availability information.
     """
     detector = InferenceEngineDetector()
-    return detector.detect_engines_for_device(device_type)
+    return detector.detect_engines_for_device(device_type, device_name)
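Callers of `detect_inference_engines` now pass a device name as well (for example a hypothetical `detect_inference_engines("amd_dgpu", "AMD Radeon RX 7900 XTX")`), and the result reports llama.cpp under two keys, `llamacpp-vulkan` and `llamacpp-rocm`, instead of a single `llamacpp`. The sketch below shows one way a caller might read that dict; preferring ROCm when both are available is an illustrative assumption, not lemonade's policy.

```python
from __future__ import annotations

# Key names as registered by detect_engines_for_device in 8.1.1.
LLAMACPP_KEYS = {"vulkan": "llamacpp-vulkan", "rocm": "llamacpp-rocm"}


def available_llamacpp_backends(engines: dict[str, dict]) -> list[str]:
    """List llama.cpp backends whose detection result has available=True."""
    return [
        backend
        for backend, key in LLAMACPP_KEYS.items()
        # A key may be absent when detection returned None for the device.
        if engines.get(key, {}).get("available")
    ]


def pick_llamacpp_backend(engines: dict[str, dict], prefer: str = "rocm") -> str | None:
    """Return the preferred backend if available, else the first available, else None."""
    backends = available_llamacpp_backends(engines)
    if prefer in backends:
        return prefer
    return backends[0] if backends else None
```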
