speedy-utils 1.1.40__tar.gz → 1.1.43__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (147)
  1. speedy_utils-1.1.43/.githooks/pre-push +32 -0
  2. speedy_utils-1.1.43/.github/prompts/improveParallelErrorHandling.prompt.md +64 -0
  3. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/.gitignore +1 -0
  4. speedy_utils-1.1.43/AGENTS.md +32 -0
  5. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/PKG-INFO +5 -3
  6. speedy_utils-1.1.43/examples/llm_ray_example.py +73 -0
  7. speedy_utils-1.1.43/examples/test_parallel_gpu.py +61 -0
  8. speedy_utils-1.1.43/notebooks/parallel_gpu_pool.ipynb +89 -0
  9. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/pyproject.toml +7 -3
  10. speedy_utils-1.1.43/scripts/bug.py +34 -0
  11. speedy_utils-1.1.43/scripts/bug_simple.py +11 -0
  12. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/scripts/debug_import_time.py +80 -18
  13. speedy_utils-1.1.43/scripts/test.py +26 -0
  14. speedy_utils-1.1.43/scripts/test_both_backends.py +25 -0
  15. speedy_utils-1.1.43/scripts/test_error_handling.py +37 -0
  16. speedy_utils-1.1.43/scripts/test_locals.py +19 -0
  17. speedy_utils-1.1.43/scripts/test_ray_locals.py +11 -0
  18. speedy_utils-1.1.43/scripts/test_ray_mp.py +31 -0
  19. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/__init__.py +2 -0
  20. speedy_utils-1.1.43/src/llm_utils/llm_ray.py +370 -0
  21. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/lm/llm.py +36 -29
  22. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/speedy_utils/__init__.py +10 -0
  23. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/speedy_utils/common/utils_io.py +3 -1
  24. speedy_utils-1.1.43/src/speedy_utils/multi_worker/__init__.py +12 -0
  25. speedy_utils-1.1.43/src/speedy_utils/multi_worker/dataset_ray.py +303 -0
  26. speedy_utils-1.1.43/src/speedy_utils/multi_worker/parallel_gpu_pool.py +178 -0
  27. speedy_utils-1.1.43/src/speedy_utils/multi_worker/process.py +1302 -0
  28. speedy_utils-1.1.43/src/speedy_utils/multi_worker/progress.py +140 -0
  29. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/speedy_utils/multi_worker/thread.py +202 -42
  30. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/speedy_utils/scripts/mpython.py +49 -4
  31. speedy_utils-1.1.43/test_s3.py +34 -0
  32. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/tests/llm_utils/test_llm_mixins.py +22 -0
  33. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/uv.lock +5247 -2937
  34. speedy_utils-1.1.40/src/speedy_utils/multi_worker/process.py +0 -399
  35. speedy_utils-1.1.40/src/speedy_utils/scripts/__init__.py +0 -0
  36. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/.github/copilot-instructions.md +0 -0
  37. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/.github/skills/caching-utilities/SKILL.md +0 -0
  38. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/.github/skills/caching-utilities/examples/caching_example.py +0 -0
  39. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/.github/skills/io-utilities/SKILL.md +0 -0
  40. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/.github/skills/io-utilities/examples/io_example.py +0 -0
  41. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/.github/skills/llm-integration/SKILL.md +0 -0
  42. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/.github/skills/llm-integration/examples/llm_example.py +0 -0
  43. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/.github/skills/multi-threading-processing/SKILL.md +0 -0
  44. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/.github/skills/ray-distributed-computing/SKILL.md +0 -0
  45. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/.github/skills/skill-creation/SKILL.md +0 -0
  46. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/.github/skills/vision-utilities/SKILL.md +0 -0
  47. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/.github/skills/vision-utilities/examples/vision_example.py +0 -0
  48. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/.github/workflows/publish.yml +0 -0
  49. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/.pre-commit-config.yaml +0 -0
  50. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/README.md +0 -0
  51. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/bumpversion.sh +0 -0
  52. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/debug/debug_generate_response.py +0 -0
  53. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/debug/debug_n_param.py +0 -0
  54. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/debug/debug_n_structure.py +0 -0
  55. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/debug/integration_test.py +0 -0
  56. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/debug/test_decode_api.py +0 -0
  57. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/debug/test_endpoints.py +0 -0
  58. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/debug/test_generate.py +0 -0
  59. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/debug/test_generate_endpoint.py +0 -0
  60. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/docs/GENERATE_QUICKREF.md +0 -0
  61. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/docs/IMPLEMENTATION.md +0 -0
  62. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/docs/QUICKSTART.md +0 -0
  63. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/docs/TOKENIZATION.md +0 -0
  64. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/docs/TOKENIZATION_IMPLEMENTATION.md +0 -0
  65. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/docs/zero_copy_sharing.md +0 -0
  66. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/examples/generate_example.py +0 -0
  67. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/examples/pytorch_large_model.py +0 -0
  68. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/examples/shared_kwargs_example.py +0 -0
  69. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/examples/temperature_range_example.py +0 -0
  70. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/examples/test_share_ray.py +0 -0
  71. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/examples/tokenization_example.py +0 -0
  72. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/examples/vision_utils_example.py +0 -0
  73. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/experiments/exp1/dockerfile +0 -0
  74. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/experiments/exp1/run_in_docker.sh +0 -0
  75. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/experiments/exp1/test.png +0 -0
  76. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/experiments/test_read_image.py +0 -0
  77. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/notebooks/README.ipynb +0 -0
  78. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/notebooks/llm_utils/llm_as_a_judge.ipynb +0 -0
  79. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/notebooks/ray_tutorial.ipynb +0 -0
  80. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/notebooks/test_multi_thread.ipynb +0 -0
  81. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/ruff.toml +0 -0
  82. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/scripts/deploy.sh +0 -0
  83. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/scripts/imports.sh +0 -0
  84. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/scripts/test_import_time_vision.py +0 -0
  85. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/setup.cfg +0 -0
  86. {speedy_utils-1.1.40/src/datasets → speedy_utils-1.1.43/src/datasets_utils}/convert_to_arrow.py +0 -0
  87. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/chat_format/__init__.py +0 -0
  88. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/chat_format/display.py +0 -0
  89. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/chat_format/transform.py +0 -0
  90. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/chat_format/utils.py +0 -0
  91. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/group_messages.py +0 -0
  92. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/lm/__init__.py +0 -0
  93. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/lm/async_lm/__init__.py +0 -0
  94. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/lm/async_lm/_utils.py +0 -0
  95. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/lm/async_lm/async_llm_task.py +0 -0
  96. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/lm/async_lm/async_lm.py +0 -0
  97. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/lm/async_lm/async_lm_base.py +0 -0
  98. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/lm/async_lm/lm_specific.py +0 -0
  99. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/lm/base_prompt_builder.py +0 -0
  100. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/lm/llm_signature.py +0 -0
  101. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/lm/lm_base.py +0 -0
  102. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/lm/mixins.py +0 -0
  103. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/lm/openai_memoize.py +0 -0
  104. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/lm/signature.py +0 -0
  105. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/lm/utils.py +0 -0
  106. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/scripts/README.md +0 -0
  107. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/scripts/fast_vllm.py +0 -0
  108. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/scripts/vllm_load_balancer.py +0 -0
  109. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/scripts/vllm_serve.py +0 -0
  110. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/vector_cache/__init__.py +0 -0
  111. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/vector_cache/cli.py +0 -0
  112. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/vector_cache/core.py +0 -0
  113. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/vector_cache/types.py +0 -0
  114. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/llm_utils/vector_cache/utils.py +0 -0
  115. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/speedy_utils/__imports.py +0 -0
  116. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/speedy_utils/common/__init__.py +0 -0
  117. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/speedy_utils/common/clock.py +0 -0
  118. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/speedy_utils/common/function_decorator.py +0 -0
  119. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/speedy_utils/common/logger.py +0 -0
  120. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/speedy_utils/common/notebook_utils.py +0 -0
  121. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/speedy_utils/common/patcher.py +0 -0
  122. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/speedy_utils/common/report_manager.py +0 -0
  123. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/speedy_utils/common/utils_cache.py +0 -0
  124. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/speedy_utils/common/utils_misc.py +0 -0
  125. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/speedy_utils/common/utils_print.py +0 -0
  126. {speedy_utils-1.1.40/src/speedy_utils/multi_worker → speedy_utils-1.1.43/src/speedy_utils/scripts}/__init__.py +0 -0
  127. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/speedy_utils/scripts/openapi_client_codegen.py +0 -0
  128. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/vision_utils/README.md +0 -0
  129. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/vision_utils/__init__.py +0 -0
  130. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/vision_utils/io_utils.py +0 -0
  131. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/src/vision_utils/plot.py +0 -0
  132. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/tests/import_all.py +0 -0
  133. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/tests/import_time_report.py +0 -0
  134. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/tests/integration_test.py +0 -0
  135. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/tests/sample_objects.py +0 -0
  136. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/tests/test.py +0 -0
  137. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/tests/test_logger.py +0 -0
  138. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/tests/test_logger_format.py +0 -0
  139. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/tests/test_memoize_typing.py +0 -0
  140. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/tests/test_mpython.py +0 -0
  141. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/tests/test_multithread_error_trace.py +0 -0
  142. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/tests/test_process.py +0 -0
  143. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/tests/test_process_update.py +0 -0
  144. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/tests/test_pytorch_sharing.py +0 -0
  145. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/tests/test_shared_kwargs.py +0 -0
  146. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/tests/test_thread.py +0 -0
  147. {speedy_utils-1.1.40 → speedy_utils-1.1.43}/tests/test_tokenization.py +0 -0
@@ -0,0 +1,32 @@
+ #!/usr/bin/env bash
+ set -euo pipefail
+
+ if [[ "${SKIP_TAG_HOOK:-}" == "1" ]]; then
+   exit 0
+ fi
+
+ remote_name="${1:-}"
+ remote_url="${2:-}"
+
+ # Only tag for GitHub remotes.
+ if [[ -z "$remote_url" || "$remote_url" != *"github.com"* ]]; then
+   exit 0
+ fi
+
+ today="$(date +%y%m%d)"
+
+ max_suffix="$(git tag -l "${today}.*" | awk -F. 'NF==2 && $1 ~ /^[0-9]{6}$/ && $2 ~ /^[0-9]+$/ {print $2}' | sort -n | tail -n 1)"
+ if [[ -z "${max_suffix}" ]]; then
+   next_suffix=1
+ else
+   next_suffix=$((max_suffix + 1))
+ fi
+
+ tag="${today}.${next_suffix}"
+
+ if git rev-parse -q --verify "refs/tags/${tag}" >/dev/null; then
+   exit 0
+ fi
+
+ git tag "${tag}"
+ SKIP_TAG_HOOK=1 git push "${remote_name}" "refs/tags/${tag}"
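
The new hook auto-creates a date-based tag (`YYMMDD.N`, with `N` counting up within a day) on each push to a GitHub remote. For readers less fluent in the `awk` pipeline above, a rough Python rendering of the next-tag computation might look like the following — a hypothetical helper for illustration only; the shipped hook is the bash script above:

```python
# Hypothetical Python rendering of the hook's next-tag logic, for
# illustration only; the actual hook is the bash script in this diff.
import datetime
import re
import subprocess


def next_daily_tag() -> str:
    today = datetime.date.today().strftime("%y%m%d")  # e.g. "240515"
    # List today's tags, mirroring `git tag -l "${today}.*"`.
    tags = subprocess.run(
        ["git", "tag", "-l", f"{today}.*"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    # Keep numeric suffixes (the awk filter) and take the largest.
    suffixes = [
        int(m.group(1))
        for t in tags
        if (m := re.fullmatch(rf"{today}\.(\d+)", t))
    ]
    return f"{today}.{max(suffixes, default=0) + 1}"
```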
@@ -0,0 +1,64 @@
+ ---
+ name: improveParallelErrorHandling
+ description: Enhance error tracebacks in parallel execution with rich formatting and context
+ argument-hint: the parallel execution function and backend type
+ ---
+
+ Improve error handling for the specified parallel execution function to provide clean, user-focused tracebacks similar to direct function calls.
+
+ ## Requirements
+
+ 1. **Filter Internal Frames**: Remove framework/library internal frames from tracebacks, showing only user code
+ 2. **Add Context Lines**: Display 3 lines before and after each error location with line numbers
+ 3. **Include Caller Frame**: Show where the parallel execution function was called, not just where the error occurred
+ 4. **Rich Formatting**: Use rich library's Panel/formatting for clean, readable output
+ 5. **Suppress Noise**: Set environment variables or flags to suppress verbose framework error logs
+
+ ## Implementation Steps
+
+ 1. **Capture Caller Context**: Use `inspect.currentframe().f_back` to capture where the parallel function was called (filename, line number, function name)
+
+ 2. **Wrap Error Handling**: Catch framework-specific exceptions (e.g., `RayTaskError`, thread exceptions) in the execution loop
+
+ 3. **Parse/Extract Original Exception**: Get the underlying user exception from the framework wrapper
+    - Extract exception type, message, and traceback information
+    - Parse from string representation if traceback objects aren't preserved
+
+ 4. **Filter Frames**: Skip frames matching internal paths:
+    - Framework internals (e.g., `ray/_private`, `concurrent/futures`)
+    - Library worker implementations (e.g., `speedy_utils/multi_worker`)
+    - Site-packages for the framework
+
+ 5. **Format with Context**:
+    - For each user frame, show: `filepath:lineno in function_name`
+    - Use `linecache.getline()` to retrieve surrounding lines
+    - Highlight the error line with `❱` marker
+    - Number all lines (e.g., ` 4 │ code here` or ` 5 ❱ error here`)
+
+ 6. **Display Caller Frame First**: Show where the parallel function was invoked before showing the actual error location
+
+ 7. **Clean Exit**: Flush output streams before exiting to ensure traceback displays
+
+ ## Example Output Format
+
+ ```
+ ╭─────────────── Traceback (most recent call last) ───────────────╮
+ │ /path/to/user/script.py:42 in main │
+ │ │
+ │ 40 │ data = load_data() │
+ │ 41 │ # Process in parallel │
+ │ 42 ❱ results = multi_process(process_item, data, workers=8) │
+ │ 43 │ │
+ │ │
+ │ /path/to/user/module.py:15 in process_item │
+ │ │
+ │ 12 │ def process_item(item): │
+ │ 13 │ value = item['key'] │
+ │ 14 │ denominator = value - 100 │
+ │ 15 ❱ return 1 / denominator │
+ │ 16 │ │
+ ╰──────────────────────────────────────────────────────────────────╯
+ ZeroDivisionError: division by zero
+ ```
+
+ Apply these improvements to the specified parallel execution function, ensuring error messages are as clear as direct function calls while maintaining all performance benefits of parallel execution.
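
The steps in this prompt amount to: walk the traceback, drop internal frames, and re-render the survivors with source context. A minimal sketch of that core loop using only the stdlib — the `_INTERNAL_MARKERS` tuple and helper name below are illustrative assumptions, not speedy_utils' actual internals:

```python
# Illustrative sketch only: the marker paths and helper name are assumptions,
# not the actual speedy_utils implementation.
import linecache
import traceback

_INTERNAL_MARKERS = ("ray/_private", "concurrent/futures", "speedy_utils/multi_worker")


def format_user_traceback(exc: BaseException, context: int = 3) -> str:
    lines: list[str] = []
    for frame in traceback.extract_tb(exc.__traceback__):
        # Skip framework/library internals, keeping only user code (step 4).
        if any(marker in frame.filename for marker in _INTERNAL_MARKERS):
            continue
        lines.append(f"{frame.filename}:{frame.lineno} in {frame.name}")
        # Show `context` lines around the error location (steps 2 and 5).
        start = max(frame.lineno - context, 1)
        for lineno in range(start, frame.lineno + context + 1):
            src = linecache.getline(frame.filename, lineno).rstrip("\n")
            if not src and lineno > frame.lineno:
                break  # Past end of file.
            marker = "❱" if lineno == frame.lineno else "│"
            lines.append(f"  {lineno:4d} {marker} {src}")
    lines.append(f"{type(exc).__name__}: {exc}")
    return "\n".join(lines)
```

Wrapping the resulting string in a rich `Panel` and printing the caller frame captured via `inspect.currentframe().f_back` first would cover the remaining steps.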
@@ -122,3 +122,4 @@ dmypy.json
  .copilot
  .vscode/settings.json
  .codegen
+ edu_results.json
@@ -0,0 +1,32 @@
+ # Repository Guidelines
+
+ ## Project Structure & Module Organization
+
+ - `src/` contains `speedy_utils`, `llm_utils`, and `vision_utils` packages.
+ - `tests/` holds automated tests; `examples/` and `notebooks/` are usage references.
+ - `scripts/` and `experiments/` are for tooling and experiments; keep changes scoped.
+ - `docs/` contains documentation assets.
+ - `pyproject.toml`, `ruff.toml`, and `bumpversion.sh` define tooling and release helpers.
+
+ ## Build, Test, and Development Commands
+
+ - `pip install -e .` installs the package in editable mode.
+ - `uv pip install -e .` is a drop-in alternative if you use uv.
+ - `python -m pytest` or `pytest tests` runs the test suite.
+ - `ruff check .` runs lint rules; `ruff format .` formats code.
+
+ ## Coding Style & Naming Conventions
+
+ - Formatting is aligned with Black-style settings (88 char lines) and Ruff rules in `ruff.toml`.
+ - Use `snake_case` for Python modules and functions; class names follow `CamelCase`.
+ - Keep public APIs exported from `src/*/__init__.py` small and intentional.
+
+ ## Testing Guidelines
+
+ - Tests live in `tests/` and should be named `test_*.py`.
+ - Prefer pytest-style assertions and keep fixtures near the tests that use them.
+
+ ## Commit & Pull Request Guidelines
+
+ - Recent history includes informal messages; prefer concise, descriptive imperatives (e.g., `add cache backend`).
+ - PRs should include test results and note any new dependencies or optional extras.
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: speedy-utils
- Version: 1.1.40
+ Version: 1.1.43
  Summary: Fast and easy-to-use package for data science
  Project-URL: Homepage, https://github.com/anhvth/speedy
  Project-URL: Repository, https://github.com/anhvth/speedy
@@ -17,7 +17,7 @@ Classifier: Programming Language :: Python :: 3.11
  Classifier: Programming Language :: Python :: 3.12
  Classifier: Programming Language :: Python :: 3.13
  Classifier: Programming Language :: Python :: 3.14
- Requires-Python: >=3.8
+ Requires-Python: >=3.9
  Requires-Dist: aiohttp
  Requires-Dist: bump2version
  Requires-Dist: cachetools
@@ -39,13 +39,15 @@ Requires-Dist: pydantic
  Requires-Dist: pytest
  Requires-Dist: ray
  Requires-Dist: requests
+ Requires-Dist: rich>=14.3.1
  Requires-Dist: ruff
  Requires-Dist: scikit-learn
  Requires-Dist: tabulate
  Requires-Dist: tqdm
  Requires-Dist: xxhash
  Provides-Extra: ray
- Requires-Dist: ray>=2.49.1; (python_version >= '3.9') and extra == 'ray'
+ Requires-Dist: ray[data,llm]>=2.40.0; extra == 'ray'
+ Requires-Dist: vllm>=0.6.3; extra == 'ray'
  Description-Content-Type: text/markdown

  # Speedy Utils
@@ -0,0 +1,73 @@
+ """
+ Example: Using LLMRay for distributed offline batch inference.
+
+ This demonstrates how to process large batches of OpenAI-style messages
+ across multiple GPUs in a Ray cluster with automatic data parallelism.
+
+ Key concepts:
+ - dp (data parallel): Number of model replicas
+ - tp (tensor parallel): GPUs per replica
+ - Total GPUs = dp * tp
+ """
+ from llm_utils import LLMRay
+ from speedy_utils import dump_json_or_pickle
+
+ # --- Example 1: Simple batch generation ---
+ print('=== Example 1: Simple batch generation ===')
+
+ # Create LLMRay instance
+ # - dp=4: 4 model replicas (workers)
+ # - tp=2: each replica uses 2 GPUs
+ # - Total: 8 GPUs used
+ # - If cluster has 16 GPUs across 2 nodes, Ray will distribute automatically
+ llm = LLMRay(
+     model_name='Qwen/Qwen3-0.6B',
+     dp=4,
+     tp=2,
+     sampling_params={'temperature': 0.7, 'max_tokens': 128},
+ )
+
+ # Prepare messages (OpenAI format: list of message lists)
+ messages_list = [
+     [{'role': 'user', 'content': 'What is artificial intelligence?'}],
+     [{'role': 'user', 'content': 'Explain quantum computing in simple terms.'}],
+     [{'role': 'user', 'content': 'Write a haiku about programming.'}],
+     [{'role': 'user', 'content': 'What are the benefits of distributed computing?'}],
+ ] + [[{'role': 'user', 'content': f'Summarize document {i}'}] for i in range(20)]
+
+ # Generate responses (automatically distributed across all workers)
+ results = llm.generate(messages_list)
+
+ # Save results
+ dump_json_or_pickle(results, 'llm_ray_results.json')
+
+ print(f'\nProcessed {len(results)} messages')
+ print(f'\nSample result:\n{results[0]}')
+
+
+ # --- Example 2: Multi-turn conversation ---
+ print('\n=== Example 2: Multi-turn conversation ===')
+
+ # Multi-turn conversations with system prompts
+ inputs = [
+     [
+         {'role': 'system', 'content': 'You are a creative writer.'},
+         {'role': 'user', 'content': 'Write a short story about a robot.'},
+     ],
+     [
+         {'role': 'system', 'content': 'You are a math tutor.'},
+         {'role': 'user', 'content': 'What is 2+2?'},
+         {'role': 'assistant', 'content': '2+2 equals 4.'},
+         {'role': 'user', 'content': 'What about 3+3?'},
+     ],
+ ]
+
+ # Process conversations
+ results = llm(inputs)  # Can also use __call__ syntax
+
+ for i, result in enumerate(results):
+     print(f'\nConversation {i + 1}:')
+     print(f'Generated: {result["generated_text"][:100]}...')
+
+
+ print('\n=== All examples completed! ===')
@@ -0,0 +1,61 @@
+ import time
+ import random
+ import ray
+ from vllm import LLM, SamplingParams
+ from speedy_utils.multi_worker.parallel_gpu_pool import RayWorkerBase, RayRunner
+ import os
+ ray.init(ignore_reinit_error=True)
+
+ # --- Define Your Worker ---
+ class MyEduWorker(RayWorkerBase):
+     def setup(self):
+         print(f"Worker {self.worker_id}: Loading vLLM Engine...")
+
+         # Initialize vLLM
+         # Note: Set gpu_memory_utilization based on how many workers share a GPU
+         self.model = LLM(
+             model="Qwen/Qwen3-0.6B",
+             gpu_memory_utilization=0.4,  # Adjust based on your GPU pool density
+             trust_remote_code=True,
+             enforce_eager=True,
+
+         )
+
+         # Set default sampling parameters
+         self.sampling_params = SamplingParams(
+             temperature=0.7,
+             top_p=0.9,
+             max_tokens=128
+         )
+
+     def process_one_item(self, item):
+         # 'item' is the prompt from your all_files list
+         prompt = f"Summarize this file metadata: {item}"
+
+         # vLLM offline generation
+         outputs = self.model.generate([prompt], self.sampling_params)
+
+         # Extract the generated text
+         generated_text = outputs[0].outputs[0].text
+
+         return {
+             "file": item,
+             "response": generated_text.strip(),
+             "worker_id": self.worker_id,
+             "gpu_idx": ray.get_runtime_context().get_assigned_resources().get("GPU", []),
+             "node_id": ray.get_runtime_context().node_id.hex(),
+             "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES", "")
+         }
+
+ # --- Run It ---
+ # Create fake data (prompts or filenames)
+ all_files = [f"document_id_{i}" for i in range(20)]
+
+ # Set test_mode=False if you want to use real GPUs
+ runner = RayRunner(test_mode=False, gpus_per_worker=2)
+ results = runner.run(
+     worker_class=MyEduWorker,
+     all_data=all_files
+ )
+ from speedy_utils import dump_json_or_pickle
+ dump_json_or_pickle(results, "edu_results.json")
@@ -0,0 +1,89 @@
+ {
+  "cells": [
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "de8205ba",
+    "metadata": {},
+    "outputs": [
+     {
+      "ename": "ConnectionError",
+      "evalue": "Could not find any running Ray instance. Please specify the one to connect to by setting `--address` flag or `RAY_ADDRESS` environment variable.",
+      "output_type": "error",
+      "traceback": [
+       "\u001b[31m---------------------------------------------------------------------------\u001b[39m",
+       "\u001b[31mConnectionError\u001b[39m Traceback (most recent call last)",
+       "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[1]\u001b[39m\u001b[32m, line 24\u001b[39m\n\u001b[32m 20\u001b[39m \u001b[38;5;66;03m# --- Run It ---\u001b[39;00m\n\u001b[32m 21\u001b[39m \u001b[38;5;66;03m# Create fake data\u001b[39;00m\n\u001b[32m 22\u001b[39m all_files = [\u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33mfile_\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mi\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m.pdf\u001b[39m\u001b[33m\"\u001b[39m \u001b[38;5;28;01mfor\u001b[39;00m i \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mrange\u001b[39m(\u001b[32m500\u001b[39m)]\n\u001b[32m---> \u001b[39m\u001b[32m24\u001b[39m \u001b[38;5;28;43;01mwith\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mGPUCluster\u001b[49m\u001b[43m(\u001b[49m\u001b[43mtest_mode\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43;01mFalse\u001b[39;49;00m\u001b[43m)\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mas\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mcluster\u001b[49m\u001b[43m:\u001b[49m\n\u001b[32m 25\u001b[39m \u001b[43m \u001b[49m\u001b[43mresults\u001b[49m\u001b[43m \u001b[49m\u001b[43m=\u001b[49m\u001b[43m \u001b[49m\u001b[43mcluster\u001b[49m\u001b[43m.\u001b[49m\u001b[43mrun\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m 26\u001b[39m \u001b[43m \u001b[49m\u001b[43mworker_class\u001b[49m\u001b[43m=\u001b[49m\u001b[43mMyEduWorker\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 27\u001b[39m \u001b[43m \u001b[49m\u001b[43mall_data\u001b[49m\u001b[43m=\u001b[49m\u001b[43mall_files\u001b[49m\n\u001b[32m 28\u001b[39m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 30\u001b[39m \u001b[38;5;28mprint\u001b[39m(\u001b[33m\"\u001b[39m\u001b[33mProcessing Complete!\u001b[39m\u001b[33m\"\u001b[39m)\n",
+       "\u001b[36mFile \u001b[39m\u001b[32m~/projects/speedy_utils/src/speedy_utils/multi_worker/parallel_gpu_pool.py:42\u001b[39m, in \u001b[36mGPUCluster.__enter__\u001b[39m\u001b[34m(self)\u001b[39m\n\u001b[32m 39\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\n\u001b[32m 41\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m ray.is_initialized():\n\u001b[32m---> \u001b[39m\u001b[32m42\u001b[39m \u001b[43mray\u001b[49m\u001b[43m.\u001b[49m\u001b[43minit\u001b[49m\u001b[43m(\u001b[49m\u001b[43maddress\u001b[49m\u001b[43m=\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mauto\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mignore_reinit_error\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m)\u001b[49m\n\u001b[32m 43\u001b[39m \u001b[38;5;28mself\u001b[39m.is_connected = \u001b[38;5;28;01mTrue\u001b[39;00m\n\u001b[32m 45\u001b[39m resources = ray.cluster_resources()\n",
+       "\u001b[36mFile \u001b[39m\u001b[32m/mnt/data/anhvth8/venvs/Megatron-Bridge-Host/lib/python3.12/site-packages/ray/_private/client_mode_hook.py:104\u001b[39m, in \u001b[36mclient_mode_hook.<locals>.wrapper\u001b[39m\u001b[34m(*args, **kwargs)\u001b[39m\n\u001b[32m 102\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m func.\u001b[34m__name__\u001b[39m != \u001b[33m\"\u001b[39m\u001b[33minit\u001b[39m\u001b[33m\"\u001b[39m \u001b[38;5;129;01mor\u001b[39;00m is_client_mode_enabled_by_default:\n\u001b[32m 103\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mgetattr\u001b[39m(ray, func.\u001b[34m__name__\u001b[39m)(*args, **kwargs)\n\u001b[32m--> \u001b[39m\u001b[32m104\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mfunc\u001b[49m\u001b[43m(\u001b[49m\u001b[43m*\u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n",
+       "\u001b[36mFile \u001b[39m\u001b[32m/mnt/data/anhvth8/venvs/Megatron-Bridge-Host/lib/python3.12/site-packages/ray/_private/worker.py:1818\u001b[39m, in \u001b[36minit\u001b[39m\u001b[34m(address, num_cpus, num_gpus, resources, labels, object_store_memory, local_mode, ignore_reinit_error, include_dashboard, dashboard_host, dashboard_port, job_config, configure_logging, logging_level, logging_format, logging_config, log_to_driver, namespace, runtime_env, enable_resource_isolation, system_reserved_cpu, system_reserved_memory, **kwargs)\u001b[39m\n\u001b[32m 1815\u001b[39m job_config.set_py_logging_config(logging_config)\n\u001b[32m 1817\u001b[39m redis_address, gcs_address = \u001b[38;5;28;01mNone\u001b[39;00m, \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[32m-> \u001b[39m\u001b[32m1818\u001b[39m bootstrap_address = \u001b[43mservices\u001b[49m\u001b[43m.\u001b[49m\u001b[43mcanonicalize_bootstrap_address\u001b[49m\u001b[43m(\u001b[49m\u001b[43maddress\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m_temp_dir\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 1819\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m bootstrap_address \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[32m 1820\u001b[39m gcs_address = bootstrap_address\n",
+       "\u001b[36mFile \u001b[39m\u001b[32m/mnt/data/anhvth8/venvs/Megatron-Bridge-Host/lib/python3.12/site-packages/ray/_private/services.py:532\u001b[39m, in \u001b[36mcanonicalize_bootstrap_address\u001b[39m\u001b[34m(addr, temp_dir)\u001b[39m\n\u001b[32m 521\u001b[39m \u001b[38;5;250m\u001b[39m\u001b[33;03m\"\"\"Canonicalizes Ray cluster bootstrap address to host:port.\u001b[39;00m\n\u001b[32m 522\u001b[39m \u001b[33;03mReads address from the environment if needed.\u001b[39;00m\n\u001b[32m 523\u001b[39m \n\u001b[32m (...)\u001b[39m\u001b[32m 529\u001b[39m \u001b[33;03m should start a local Ray instance.\u001b[39;00m\n\u001b[32m 530\u001b[39m \u001b[33;03m\"\"\"\u001b[39;00m\n\u001b[32m 531\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m addr \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mor\u001b[39;00m addr == \u001b[33m\"\u001b[39m\u001b[33mauto\u001b[39m\u001b[33m\"\u001b[39m:\n\u001b[32m--> \u001b[39m\u001b[32m532\u001b[39m addr = \u001b[43mget_ray_address_from_environment\u001b[49m\u001b[43m(\u001b[49m\u001b[43maddr\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mtemp_dir\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 533\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m addr \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mor\u001b[39;00m addr == \u001b[33m\"\u001b[39m\u001b[33mlocal\u001b[39m\u001b[33m\"\u001b[39m:\n\u001b[32m 534\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n",
+       "\u001b[36mFile \u001b[39m\u001b[32m/mnt/data/anhvth8/venvs/Megatron-Bridge-Host/lib/python3.12/site-packages/ray/_private/services.py:419\u001b[39m, in \u001b[36mget_ray_address_from_environment\u001b[39m\u001b[34m(addr, temp_dir)\u001b[39m\n\u001b[32m 417\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[32m 418\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[32m--> \u001b[39m\u001b[32m419\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mConnectionError\u001b[39;00m(\n\u001b[32m 420\u001b[39m \u001b[33m\"\u001b[39m\u001b[33mCould not find any running Ray instance. \u001b[39m\u001b[33m\"\u001b[39m\n\u001b[32m 421\u001b[39m \u001b[33m\"\u001b[39m\u001b[33mPlease specify the one to connect to by setting `--address` flag \u001b[39m\u001b[33m\"\u001b[39m\n\u001b[32m 422\u001b[39m \u001b[33m\"\u001b[39m\u001b[33mor `RAY_ADDRESS` environment variable.\u001b[39m\u001b[33m\"\u001b[39m\n\u001b[32m 423\u001b[39m )\n\u001b[32m 425\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m bootstrap_addr\n",
+       "\u001b[31mConnectionError\u001b[39m: Could not find any running Ray instance. Please specify the one to connect to by setting `--address` flag or `RAY_ADDRESS` environment variable."
+      ]
+     }
+    ],
+    "source": [
+     "import time\n",
+     "import random\n",
+     "# Import the class file we just created\n",
+     "from speedy_utils.multi_worker.parallel_gpu_pool import ParallelGPUPool, GPUCluster\n",
+     "\n",
+     "# --- Define Your Worker ---\n",
+     "class MyEduWorker(ParallelGPUPool):\n",
+     "    def setup(self):\n",
+     "        # Load your heavy model here\n",
+     "        print(f\"Worker {self.worker_id}: Loading Model...\")\n",
+     "        time.sleep(1) # Simulate load\n",
+     "    \n",
+     "    def process_one_item(self, item):\n",
+     "        # Simulate GPU Work\n",
+     "        time.sleep(random.uniform(0.05, 0.2)) \n",
+     "        \n",
+     "        # Return whatever you want (filename, score, etc)\n",
+     "        return f\"{item}_DONE\"\n",
+     "\n",
+     "# --- Run It ---\n",
+     "# Create fake data\n",
+     "all_files = [f\"file_{i}.pdf\" for i in range(500)]\n",
+     "\n",
+     "cluster = GPUCluster(test_mode=False)\n",
+     "results = cluster.run(\n",
+     "    worker_class=MyEduWorker,\n",
+     "    all_data=all_files\n",
+     ")\n",
+     "\n",
+     "print(\"Processing Complete!\")"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "2b67e6b5",
+    "metadata": {},
+    "outputs": [],
+    "source": []
+   }
+  ],
+  "metadata": {
+   "kernelspec": {
+    "display_name": "Megatron-Bridge-Host (3.12.12)",
+    "language": "python",
+    "name": "python3"
+   },
+   "language_info": {
+    "codemirror_mode": {
+     "name": "ipython",
+     "version": 3
+    },
+    "file_extension": ".py",
+    "mimetype": "text/x-python",
+    "name": "python",
+    "nbconvert_exporter": "python",
+    "pygments_lexer": "ipython3",
+    "version": "3.12.12"
+   }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 5
+ }
@@ -1,11 +1,11 @@
  [project]
  name = "speedy-utils"
- version = "1.1.40"
+ version = "1.1.43"
  description = "Fast and easy-to-use package for data science"
  authors = [{ name = "AnhVTH", email = "anhvth.226@gmail.com" }]
  readme = "README.md"
  license = { text = "MIT" }
- requires-python = ">=3.8"
+ requires-python = ">=3.9"
  dependencies = [
      "numpy",
      "requests",
@@ -33,6 +33,7 @@ dependencies = [
      "ray",
      "aiohttp",
      "pytest",
+     "rich>=14.3.1",
  ]
  classifiers = [
      "Development Status :: 4 - Beta",
@@ -53,7 +54,10 @@ Homepage = "https://github.com/anhvth/speedy"
  Repository = "https://github.com/anhvth/speedy"

  [project.optional-dependencies]
- ray = ["ray>=2.49.1; python_version >= '3.9'"]
+ ray = [
+     "vllm>=0.6.3",
+     "ray[data,llm]>=2.40.0",
+ ]

  [project.scripts]
  mpython = "speedy_utils.scripts.mpython:main"
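
Note the shape change in the `ray` extra: it now pulls in `vllm` and the `ray[data,llm]` subpackages rather than plain `ray`, so the heavy GPU stack only arrives via `pip install "speedy-utils[ray]"`. Downstream code presumably wants an import guard when the extra is absent; a minimal sketch, where the error message wording is an assumption rather than anything shipped by the package:

```python
# Hypothetical guard for the optional `ray` extra; speedy_utils itself may
# handle missing optional dependencies differently.
try:
    import ray
    import vllm
except ImportError as exc:
    raise ImportError(
        "Distributed LLM features need the optional dependencies: "
        'install them with `pip install "speedy-utils[ray]"`.'
    ) from exc
```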
@@ -0,0 +1,34 @@
+ # type: ignore
+ from speedy_utils import multi_process, multi_thread
+
+
+ def do_something(x):
+     if x % 3 == 0:
+         raise ValueError(f'Error at index {x}')
+     return x * 2
+
+
+ inputs = range(10)
+
+
+ if __name__ == '__main__':
+     print('Testing error_handler="log" with mp backend:')
+     results = multi_process(
+         do_something,
+         inputs,
+         backend='mp',
+         error_handler='log',
+         max_error_files=5,
+     )
+     print(f'Results: {results}')
+     print()
+
+     # print('Testing error_handler="log" with multi_thread:')
+     # results = multi_thread(
+     #     do_something,
+     #     inputs,
+     #     error_handler='log',
+     #     max_error_files=5,
+     # )
+     # print(f'Results: {results}')
+
@@ -0,0 +1,11 @@
+ from speedy_utils import *
+
+
+ def do_something(x):
+     x = 10
+     y = 0
+     x/y
+
+
+ do_something(1)
+