lsdataset 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,88 @@
+ Metadata-Version: 2.4
+ Name: lsdataset
+ Version: 0.1.0
+ Summary: LeRobot-style dataset I/O (core); FastAPI / LDP capabilities are installed via extras.
+ License: Apache-2.0
+ Keywords: lerobot,robot,dataset,parquet,fastapi,robotics
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: Intended Audience :: Science/Research
+ Classifier: License :: OSI Approved :: Apache Software License
+ Classifier: Operating System :: OS Independent
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ Requires-Dist: pydantic>=2.7
+ Requires-Dist: numpy>=1.26
+ Requires-Dist: Pillow>=10.3
+ Requires-Dist: pandas
+ Requires-Dist: datasets==3.2.0
+ Requires-Dist: pyarrow==22.0.0
+ Requires-Dist: jsonlines>=4.0.0
+ Requires-Dist: av==14.2.0
+ Provides-Extra: ldp
+ Requires-Dist: requests==2.32.5; extra == "ldp"
+ Provides-Extra: server
+ Requires-Dist: fastapi>=0.111; extra == "server"
+ Requires-Dist: requests==2.32.5; extra == "server"
+ Provides-Extra: all
+ Requires-Dist: fastapi>=0.111; extra == "all"
+ Requires-Dist: requests==2.32.5; extra == "all"
+
+ # lsdataset
+
+ A Python package inside the monorepo (directory **`packages/lsdataset/`**). The PyPI project name and the **import name** are both **`lsdataset`** ([a PyPI search for `lsdataset`](https://pypi.org/search/?q=lsdataset) currently shows no project with that name, so the squatting risk is low).
+
+ Importing the package root (`import lsdataset`) does **not** preload the routes or FastAPI; import explicitly, e.g. `from lsdataset.api.routes import router` or `from lsdataset.core.manager import add_frame`.
+
+ Installation (from the repo root or the `backend` directory):
+
+ ```bash
+ # Core only: dataset read/write (LsRobotDataset, io, etc.)
+ pip install -e ../packages/lsdataset
+
+ # Matching the LCP backend: includes the FastAPI routes and the LDP request dependency
+ pip install -e "../packages/lsdataset[all]"
+ # Or, equivalently: pip install -e "../packages/lsdataset[server]"
+ ```
+
+ Extras: `[ldp]` adds only `requests`; `[server]` / `[all]` add `fastapi` + `requests` (`api.routes` depends on both).
+
+ **Single source of truth for versions shared with the LCP backend**: `datasets` / `pyarrow` / `jsonlines` / `av`, `fastapi`, `pydantic`, `numpy`, `Pillow`, `requests`, etc. are declared only in **`pyproject.toml`**; the first line of `backend/requirements.txt` installs this package in editable mode, so do not pin these packages again in the backend file (which would let the two copies drift apart).
+
+ The concurrency of the async image writer can be overridden via environment variables (default `4`):
+
+ - `LSDATASET_IMAGE_WRITER_PROCESSES`
+ - `LSDATASET_IMAGE_WRITER_THREADS`
+
+ ---
+
+ ## Publishing to public PyPI (checklist)
+
+ 1. **Name availability**: confirm on [pypi.org/search](https://pypi.org/search/?q=lsdataset) that **`lsdataset`** is still available; if it is ever taken, change `[project] name` in `pyproject.toml` (and rename the import package directory to match if necessary).
+ 2. **Account**: register on [pypi.org](https://pypi.org), enable **2FA**, and create an **API token** under Account settings (recommended scope: "entire account", or "project only" later).
+ 3. **Metadata**: confirm that **`license`**, **`classifiers`**, and **`version`** in `pyproject.toml` match your legal/open-source policy; if needed, add a **`LICENSE`** file at the package root that is consistent with the declaration.
+ 4. **Build** (in the `packages/lsdataset` directory):
+ ```bash
+ python -m pip install -U build twine
+ python -m build
+ twine check dist/*
+ ```
+ 5. **Try TestPyPI first**: register on [test.pypi.org](https://test.pypi.org) and create a token there as well, then:
+ ```bash
+ twine upload --repository testpypi dist/*
+ ```
+ Test-install from **TestPyPI only** or with **extra-index-url** (dependencies such as `datasets` may be missing on TestPyPI, so a test install usually also needs `https://pypi.org/simple` as a dependency index).
+ 6. **Production upload**:
+ ```bash
+ twine upload dist/*
+ ```
+ Or use PyPI **Trusted Publishing** (GitHub Actions OIDC, no long-lived token; see the PyPI docs).
+ 7. **Consuming from the LCP repo**: change the first line of `backend/requirements.txt` to, e.g.,
+ `lsdataset[all]==0.1.0`
+ (the version must match what is published on PyPI; for a new release, bump `version` in `pyproject.toml` and then tag).
+ 8. **Version and tag**: after each release, create a **git tag** in this repo (e.g. `lsdataset-v0.1.0`) for traceability from the monorepo and for writing the changelog.
@@ -0,0 +1,53 @@
+ # lsdataset
+
+ A Python package inside the monorepo (directory **`packages/lsdataset/`**). The PyPI project name and the **import name** are both **`lsdataset`** ([a PyPI search for `lsdataset`](https://pypi.org/search/?q=lsdataset) currently shows no project with that name, so the squatting risk is low).
+
+ Importing the package root (`import lsdataset`) does **not** preload the routes or FastAPI; import explicitly, e.g. `from lsdataset.api.routes import router` or `from lsdataset.core.manager import add_frame`.
+
+ Installation (from the repo root or the `backend` directory):
+
+ ```bash
+ # Core only: dataset read/write (LsRobotDataset, io, etc.)
+ pip install -e ../packages/lsdataset
+
+ # Matching the LCP backend: includes the FastAPI routes and the LDP request dependency
+ pip install -e "../packages/lsdataset[all]"
+ # Or, equivalently: pip install -e "../packages/lsdataset[server]"
+ ```
+
+ Extras: `[ldp]` adds only `requests`; `[server]` / `[all]` add `fastapi` + `requests` (`api.routes` depends on both).
+
+ **Single source of truth for versions shared with the LCP backend**: `datasets` / `pyarrow` / `jsonlines` / `av`, `fastapi`, `pydantic`, `numpy`, `Pillow`, `requests`, etc. are declared only in **`pyproject.toml`**; the first line of `backend/requirements.txt` installs this package in editable mode, so do not pin these packages again in the backend file (which would let the two copies drift apart).
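For illustration, assuming `backend/requirements.txt` sits one level below the repo root next to `packages/`, that editable first line might look like:

```text
-e ../packages/lsdataset[all]
```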
+
+ The concurrency of the async image writer can be overridden via environment variables (default `4`):
+
+ - `LSDATASET_IMAGE_WRITER_PROCESSES`
+ - `LSDATASET_IMAGE_WRITER_THREADS`
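As a sketch of what this override amounts to (the helper below is illustrative, not this package's actual code), reading one of these variables with a fallback to the default of `4` could look like:

```python
import os

def writer_concurrency(var: str, default: int = 4) -> int:
    """Read an integer concurrency override from the environment, else the default."""
    raw = os.environ.get(var, "").strip()
    return int(raw) if raw else default

os.environ["LSDATASET_IMAGE_WRITER_THREADS"] = "8"
print(writer_concurrency("LSDATASET_IMAGE_WRITER_THREADS"))   # 8
print(writer_concurrency("LSDATASET_IMAGE_WRITER_PROCESSES"))
```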
+
+ ---
+
+ ## Publishing to public PyPI (checklist)
+
+ 1. **Name availability**: confirm on [pypi.org/search](https://pypi.org/search/?q=lsdataset) that **`lsdataset`** is still available; if it is ever taken, change `[project] name` in `pyproject.toml` (and rename the import package directory to match if necessary).
+ 2. **Account**: register on [pypi.org](https://pypi.org), enable **2FA**, and create an **API token** under Account settings (recommended scope: "entire account", or "project only" later).
+ 3. **Metadata**: confirm that **`license`**, **`classifiers`**, and **`version`** in `pyproject.toml` match your legal/open-source policy; if needed, add a **`LICENSE`** file at the package root that is consistent with the declaration.
+ 4. **Build** (in the `packages/lsdataset` directory):
+ ```bash
+ python -m pip install -U build twine
+ python -m build
+ twine check dist/*
+ ```
+ 5. **Try TestPyPI first**: register on [test.pypi.org](https://test.pypi.org) and create a token there as well, then:
+ ```bash
+ twine upload --repository testpypi dist/*
+ ```
+ Test-install from **TestPyPI only** or with **extra-index-url** (dependencies such as `datasets` may be missing on TestPyPI, so a test install usually also needs `https://pypi.org/simple` as a dependency index).
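For example, a test install from TestPyPI with the main index as a dependency fallback (the package name and pinned version are assumed to match this release) might be:

```bash
pip install --index-url https://test.pypi.org/simple/ \
            --extra-index-url https://pypi.org/simple \
            "lsdataset[all]==0.1.0"
```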
+ 6. **Production upload**:
+ ```bash
+ twine upload dist/*
+ ```
+ Or use PyPI **Trusted Publishing** (GitHub Actions OIDC, no long-lived token; see the PyPI docs).
+ 7. **Consuming from the LCP repo**: change the first line of `backend/requirements.txt` to, e.g.,
+ `lsdataset[all]==0.1.0`
+ (the version must match what is published on PyPI; for a new release, bump `version` in `pyproject.toml` and then tag).
+ 8. **Version and tag**: after each release, create a **git tag** in this repo (e.g. `lsdataset-v0.1.0`) for traceability from the monorepo and for writing the changelog.
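The tagging step can be as simple as the following (the tag name comes from step 8; the remote name `origin` is an assumption):

```bash
git tag lsdataset-v0.1.0
git push origin lsdataset-v0.1.0
```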
@@ -0,0 +1,4 @@
+ """LeRobot-style dataset I/O; import the HTTP and recording entry points explicitly from submodules.
+
+ Example: ``from lsdataset.api.routes import router``, ``from lsdataset.core.manager import add_frame``.
+ """
@@ -0,0 +1 @@
+ """FastAPI routes for `/lsdataset/*` (see `routes.py`)."""
@@ -0,0 +1,233 @@
+ import asyncio
+ from concurrent.futures import ThreadPoolExecutor
+ from functools import wraps
+ from pathlib import Path
+ from typing import Callable, List, Optional
+
+ from fastapi import APIRouter, HTTPException, Query
+ from fastapi.responses import FileResponse
+
+ from ..core.manager import manager
+ from ..io.utils import get_disk_usage
+ from ..ldp.ldp_proxy import (
+     get_upload_progress,
+     ldp_login,
+     ldp_precheck_upload_datasets,
+     ldp_upload_datasets,
+ )
+ from ..logger import logger
+ from ..schemas.types import (
+     CommonOperationResp,
+     ConfigDatasetReq,
+     DatasetsListResp,
+     DeleteDatasetReq,
+     DeleteEpisodeReq,
+     LdpLoginReq,
+     LdpUploadPrecheckReq,
+     LdpUploadReq,
+     StartRecordingReq,
+     StopRecordingReq,
+ )
+
+ router = APIRouter(tags=["Datasets"])
+
+ # Thread pool executor for blocking operations (process-wide singleton; shut down in the host app's lifespan on exit)
+ _executor = ThreadPoolExecutor(max_workers=10, thread_name_prefix="lsdataset")
+
+
+ def shutdown_routes_executor(*, wait: bool = True, cancel_futures: bool = True) -> None:
+     """Shut down the ``ThreadPoolExecutor`` used by this module.
+
+     Call once during the host application's **lifespan shutdown phase** (or before process exit);
+     on CPython 3.9+ repeated calls to ``shutdown`` are a safe no-op.
+     ``cancel_futures=True`` cancels queued tasks that have not started yet, avoiding a long block on shutdown.
+     """
+     try:
+         _executor.shutdown(wait=wait, cancel_futures=cancel_futures)
+     except Exception as e:
+         logger.warning("shutdown_routes_executor: %s", e, exc_info=True)
+
+
+ async def run_in_executor(func, *args, **kwargs):
+     """Run a synchronous function in the thread pool."""
+     loop = asyncio.get_running_loop()
+     if kwargs:
+         return await loop.run_in_executor(_executor, lambda: func(*args, **kwargs))
+     else:
+         return await loop.run_in_executor(_executor, func, *args)
+
+
+ def catch_http_exceptions(err_info: str) -> Callable:
+     """Decorator: re-raise HTTPException unchanged; convert any other exception to HTTP 500 (async routes only)."""
+     def decorator(func: Callable) -> Callable:
+         @wraps(func)
+         async def wrapper(*args, **kwargs):
+             try:
+                 return await func(*args, **kwargs)
+             except HTTPException:
+                 raise
+             except Exception as e:
+                 logger.error("%s: %s", err_info, e, exc_info=True)
+                 raise HTTPException(status_code=500, detail=str(e))
+
+         return wrapper
+     return decorator
+
+
+ @router.get("/lsdataset/disk-usage")
+ @catch_http_exceptions("Error getting disk usage")
+ async def get_lsdataset_disk_usage():
+     """Get disk usage."""
+     disk_usage = await run_in_executor(get_disk_usage, Path("."))
+     return disk_usage
+
+
+ @router.get("/lsdataset/datasets-with-details", response_model=DatasetsListResp)
+ @catch_http_exceptions("Error listing datasets")
+ async def list_datasets_with_details(
+     dataset_type: str = Query(..., description="Dataset type"),
+     detail_keys: Optional[List[str]] = Query(None, description="Detail keys to include in the response")
+ ):
+     """List datasets of the given type."""
+     datasets = await run_in_executor(manager.list_datasets, dataset_type, detail_keys)
+     return DatasetsListResp(datasets=datasets)
+
+
+ @router.get("/lsdataset/view/{dataset_type}/{repo_id}")
+ @catch_http_exceptions("Error getting dataset info")
+ async def view_dataset(
+     dataset_type: str,
+     repo_id: str
+ ):
+     """Get info for the given dataset."""
+     info = await run_in_executor(manager.get_dataset_info, dataset_type, repo_id)
+     return info
+
+
+ @router.get("/lsdataset/view/{dataset_type}/{repo_id}/{episode_index}")
+ @catch_http_exceptions("Error getting episode data")
+ async def view_episode(
+     dataset_type: str,
+     repo_id: str,
+     episode_index: int
+ ):
+     """Get the visualization data for the given dataset episode."""
+     episode_data = await run_in_executor(manager.get_episode_data, dataset_type, repo_id, episode_index)
+     return episode_data
+
+
+ @router.get("/lsdataset/view/{dataset_type}/{repo_id}/{episode_index}/{video_key}.mp4")
+ @catch_http_exceptions("Error getting episode videos")
+ async def view_episode_video(
+     dataset_type: str,
+     repo_id: str,
+     episode_index: int,
+     video_key: str
+ ):
+     """Return the video file for the given episode and video key."""
+     video = await run_in_executor(manager.get_episode_video, dataset_type, repo_id, episode_index, video_key)
+     return FileResponse(video, media_type="video/mp4")
+
+
+ @router.get("/lsdataset/internal/config-initialized")
+ @catch_http_exceptions("Error checking config initialized")
+ async def internal_config_initialized() -> bool:
+     """Check whether the dataset config has been initialized."""
+     return manager.config_initialized
+
+
+ @router.post("/lsdataset/internal/config", response_model=CommonOperationResp)
+ @catch_http_exceptions("Error configuring dataset")
+ async def internal_config_dataset(config: ConfigDatasetReq) -> CommonOperationResp:
+     """Configure dataset parameters."""
+     await run_in_executor(manager.config_dataset, config)
+     return CommonOperationResp(ok=True, message="Dataset config updated")
+
+
+ @router.post("/lsdataset/internal/start_recording", response_model=CommonOperationResp)
+ @catch_http_exceptions("Error starting recording")
+ async def internal_start_recording(req: StartRecordingReq) -> CommonOperationResp:
+     """Start recording a dataset."""
+     await run_in_executor(manager.start_recording, req.dataset_type, req.repo_id)
+     return CommonOperationResp(ok=True, message="Dataset recording started")
+
+
+ @router.post("/lsdataset/internal/stop_recording", response_model=CommonOperationResp)
+ @catch_http_exceptions("Error stopping recording")
+ async def internal_stop_recording(req: StopRecordingReq) -> CommonOperationResp:
+     """Stop recording a dataset."""
+     await run_in_executor(manager.stop_recording, req.abort)
+     return CommonOperationResp(ok=True, message="Dataset recording stopped")
+
+
+ @router.post("/lsdataset/delete_episode", response_model=CommonOperationResp)
+ @catch_http_exceptions("Error deleting episode")
+ async def delete_episode(req: DeleteEpisodeReq) -> CommonOperationResp:
+     """Delete the given episode from a dataset."""
+     await run_in_executor(manager.delete_episode, req.dataset_type, req.repo_id, req.episode_index)
+     return CommonOperationResp(ok=True, message="Dataset episode deleted")
+
+
+ @router.post("/lsdataset/delete_dataset", response_model=CommonOperationResp)
+ @catch_http_exceptions("Error deleting dataset")
+ async def delete_dataset(req: DeleteDatasetReq) -> CommonOperationResp:
+     """Delete the given dataset."""
+     await run_in_executor(manager.delete_dataset, req.dataset_type, req.repo_id)
+     return CommonOperationResp(ok=True, message="Dataset deleted")
+
+
+ # LDP 代理接口(对接灵生数据平台)
180
+
181
+
182
+ @router.post("/lsdataset/ldp/login")
183
+ @catch_http_exceptions("Error proxying LDP login")
184
+ async def ldp_proxy_login(req: LdpLoginReq):
185
+ """代理调用 LDP 登录接口,返回 access_token 等"""
186
+ result = await run_in_executor(
187
+ ldp_login,
188
+ req.ldp_base_url,
189
+ req.phone,
190
+ req.password,
191
+ )
192
+ return result
193
+
194
+
195
+ @router.post("/lsdataset/ldp/upload")
196
+ @catch_http_exceptions("Error proxying LDP upload")
197
+ async def ldp_proxy_upload(req: LdpUploadReq):
198
+ """代理将 LCP 本地数据集上传到 LDP。采集设备默认从硬件(设备树/DMI)派生机器码,数据类型默认 lerobot。"""
199
+ dataset_ids = [{"dataset_type": d.dataset_type, "repo_id": d.repo_id} for d in req.dataset_ids]
200
+ results = await run_in_executor(
201
+ ldp_upload_datasets,
202
+ req.ldp_base_url,
203
+ req.access_token,
204
+ dataset_ids,
205
+ req.collection_device,
206
+ req.data_type,
207
+ req.source,
208
+ )
209
+ return {"results": results}
210
+
211
+
212
+ @router.post("/lsdataset/ldp/upload/precheck")
213
+ @catch_http_exceptions("Error proxying LDP upload precheck")
214
+ async def ldp_proxy_upload_precheck(req: LdpUploadPrecheckReq):
215
+ """上传前预检查:判断选中的数据集是否可能触发覆盖。"""
216
+ dataset_ids = [{"dataset_type": d.dataset_type, "repo_id": d.repo_id} for d in req.dataset_ids]
217
+ results = await run_in_executor(
218
+ ldp_precheck_upload_datasets,
219
+ req.ldp_base_url,
220
+ req.access_token,
221
+ dataset_ids,
222
+ req.collection_device,
223
+ req.data_type,
224
+ )
225
+ return {"results": results}
226
+
227
+
228
+ @router.get("/lsdataset/ldp/upload/progress")
229
+ async def ldp_proxy_upload_progress(dataset_name: str):
230
+ """
231
+ 查询指定数据集当前的上传进度(仅针对通过 /lsdataset/ldp/upload 发起的上传)。
232
+ """
233
+ return get_upload_progress(dataset_name)
@@ -0,0 +1 @@
+ """Dataset domain (`LsRobotDataset`) and process-wide `LsDatasetManager`."""