PyPI - hyperstudy - Versions diffs - 0.2.1__tar.gz → 0.2.2__tar.gz - Mend

hyperstudy 0.2.1.tar.gz → 0.2.2.tar.gz

Files changed (42)

{hyperstudy-0.2.1 → hyperstudy-0.2.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hyperstudy
-Version: 0.2.1
+Version: 0.2.2
 Summary: Python SDK for the HyperStudy experiment platform API
 Project-URL: Homepage, https://hyperstudy.io
 Project-URL: Documentation, https://docs.hyperstudy.io/developers/python-sdk

hyperstudy-0.2.2/docs/superpowers/specs/2026-04-10-recording-downloads-design.md ADDED

@@ -0,0 +1,128 @@
+# Recording Downloads via Python SDK
+## Problem
+The Python SDK's `get_recordings()` returns metadata only. Users need the actual audio/video files for offline analysis (ML models, manual review, archival). Currently they must manually extract `downloadUrl` from each record and fetch files themselves.
+## Decision: SDK-only, no backend changes
+The V3 API already returns signed GCS download URLs (7-day expiry) in the recording metadata. The SDK will fetch metadata and download files in the same call, so URL expiry is not a practical concern. This matches how the frontend downloads recordings.
+## API Surface
+### `download_recordings()` — Bulk download
+```python
+df = hs.download_recordings(
+    "exp_abc123",
+    output_dir="./data/recordings",
+    scope="experiment",           # "experiment" | "room" | "participant"
+    deployment_id=None,           # optional filter
+    room_id=None,                 # optional filter
+    recording_type=None,          # "audio" | "video" | None (both)
+    progress=True,                # tqdm progress bar
+    skip_existing=True,           # skip files already on disk with matching size
+)
+```
+**Returns**: `pandas.DataFrame` with all recording metadata columns plus:
+- `local_path` — absolute path to the downloaded file on disk
+- `download_status` — `"downloaded"`, `"skipped"`, or `"failed"`
+**Side effects**:
+- Writes media files to `output_dir`
+- Writes `recordings_metadata.csv` to `output_dir`
+### `download_recording()` — Single recording
+```python
+path = hs.download_recording(
+    recording,                    # dict from get_recordings(output="dict")
+    output_dir="./data/recordings",
+)
+```
+**Returns**: `pathlib.Path` to downloaded file.
+## Directory Structure
+```
+output_dir/
+  recordings_metadata.csv
+  user1_video_EG_abc123.mp4
+  user1_audio_EG_def456.webm
+  user2_video_EG_ghi789.mp4
+```
+**Filename pattern**: `{participantName}_{recordingType}_{recordingId}.{ext}`
+- `participantName`: from recording metadata, sanitized for filesystem safety
+- `recordingType`: `"video"` or `"audio"` from `metadata.type`
+- `recordingId`: egressId or recordingId
+- `ext`: from `format` field, falling back to `mp4` (video) or `webm` (audio)
+## Internal Design
+### Download flow (`download_recordings`)
+1. Call `self.get_recordings(scope_id, scope=scope, output="dict")` to get metadata
+2. Filter by `recording_type` if specified (via `metadata.type`)
+3. Create `output_dir` via `os.makedirs(exist_ok=True)`
+4. For each recording:
+   - Build filename using pattern above
+   - If `skip_existing=True` and file exists with size matching `fileSize` metadata, mark as `"skipped"`
+   - Otherwise, fetch from `downloadUrl` (fallback: `url`) using streaming HTTP GET
+   - Write to disk in 8KB chunks
+   - Mark as `"downloaded"` or `"failed"` (with warning logged)
+5. Build DataFrame from metadata, add `local_path` and `download_status` columns
+6. Write `recordings_metadata.csv` to `output_dir`
+7. Return DataFrame
+### Streaming downloads
+Use `requests.get(url, stream=True)` with chunked iteration to avoid loading large video files into memory. The SDK's existing `HttpTransport` handles JSON responses only, so file downloads use a standalone `requests.get()` — the signed GCS URLs don't need API key auth.
+### Error handling
+- Per-file failure tolerance: if one recording fails (404, timeout, network error), log a warning, set `download_status="failed"`, continue with remaining files
+- If the metadata API call itself fails, raise normally (same as `get_recordings()`)
+- Invalid/missing `downloadUrl`: set `download_status="failed"`, log warning
+### Skip-existing logic
+Compare `os.path.getsize(local_path)` against `fileSize` from metadata. If `fileSize` is `None` (metadata missing), fall back to checking file existence only (any existing file is considered complete).
+## File Layout
+| File | Change |
+|------|--------|
+| `src/hyperstudy/_downloads.py` | **New.** `build_filename()`, `download_file()` streaming helper |
+| `src/hyperstudy/client.py` | Add `download_recordings()` and `download_recording()` methods |
+| `tests/test_downloads.py` | **New.** Unit tests for filename building, skip logic, status tracking |
+| `tests/test_client.py` | Integration test: mock API + GCS, verify files + DataFrame |
+| `tests/fixtures/sparse_ratings_response.json` | Already exists (from prior work) |
+## Testing
+### Unit tests (`tests/test_downloads.py`)
+- `test_build_filename` — video, audio, missing fields, filesystem-unsafe characters
+- `test_build_filename_dedup` — duplicate names get numeric suffix
+- `test_skip_existing_matching_size` — file with correct size is skipped
+- `test_skip_existing_wrong_size` — file with wrong size is re-downloaded
+### Integration tests (`tests/test_client.py`)
+- `test_download_recordings` — mock API + GCS fetch, verify files on disk, CSV sidecar, DataFrame with `local_path` + `download_status`
+- `test_download_recordings_filter_type` — `recording_type="audio"` only downloads audio
+- `test_download_recording_single` — single recording download
+### Mocking strategy
+- V3 API: `responses` library (existing pattern)
+- GCS signed URL: also `responses` (it's just an HTTP GET to a URL)
+- File I/O: real writes to `pytest` `tmp_path`
+## No Backend Changes Required
+The existing V3 API endpoints return all necessary data:
+- `GET /api/v3/data/recordings/{scope}/{scopeId}` returns metadata with `downloadUrl`
+- Signed GCS URLs are valid for 7 days
+- SDK downloads immediately after fetching metadata, so expiry is not an issue
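
Note on the skip-existing rule in the spec above: the check compares the on-disk size with `fileSize` and falls back to plain existence when `fileSize` is missing. A minimal sketch of that rule follows. `should_skip` is a hypothetical helper written for illustration, not a function in the SDK, and the file names are made up.

```python
import tempfile
from pathlib import Path
from typing import Optional


def should_skip(dest: Path, expected_size: Optional[int]) -> bool:
    """Return True when an existing file can be treated as already downloaded.

    Hypothetical helper that mirrors the rule described in the spec.
    """
    if not dest.exists():
        return False
    if expected_size is None:
        # No fileSize in the metadata: existence alone counts as complete.
        return True
    return dest.stat().st_size == expected_size


with tempfile.TemporaryDirectory() as tmp:
    video = Path(tmp) / "Alice_video_EG_video_001.mp4"
    video.write_bytes(b"\x00" * 1024)

    print(should_skip(video, 1024))  # size matches
    print(should_skip(video, 2048))  # size differs, so the file is fetched again
    print(should_skip(video, None))  # no size metadata, existence is enough
    print(should_skip(Path(tmp) / "missing.mp4", 1024))  # nothing on disk yet
```

The fallback branch is the permissive one: without `fileSize`, a truncated file is indistinguishable from a complete one.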

{hyperstudy-0.2.1 → hyperstudy-0.2.2}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "hyperstudy"
-version = "0.2.1"
+version = "0.2.2"
 description = "Python SDK for the HyperStudy experiment platform API"
 readme = "README.md"
 license = "MIT"

{hyperstudy-0.2.1 → hyperstudy-0.2.2}/src/hyperstudy/__init__.py RENAMED

@@ -19,7 +19,7 @@ from .exceptions import (
     ValidationError,
 )
-__version__ = "0.2.0"
+__version__ = "0.2.2"
 __all__ = [
     "HyperStudy",

{hyperstudy-0.2.1 → hyperstudy-0.2.2}/src/hyperstudy/_dataframe.py RENAMED

@@ -6,6 +6,51 @@ from typing import Any
 import pandas as pd
+# Nested dict fields to flatten into top-level columns.
+# Mapping of {field_name: prefix} — sub-keys become ``{prefix}_{sub_key}``.
+FLATTEN_FIELDS: dict[str, str] = {
+    "sparseRatingData": "sparseRatingData",
+    "metadata": "metadata",
+}
+def _flatten_nested_dicts(
+    data: list[dict[str, Any]],
+    fields: dict[str, str] | None = None,
+) -> list[dict[str, Any]]:
+    """Promote sub-keys of nested dict fields to top-level keys.
+    For each *field* present in a record whose value is a ``dict``, every
+    sub-key is copied to ``{prefix}_{sub_key}``.  The original nested dict
+    is preserved for backward compatibility.
+    Records where the target field is ``None`` or missing are left
+    untouched — downstream DataFrame construction fills those columns
+    with ``NaN`` / ``null``.
+    """
+    if not data:
+        return data
+    fields = fields if fields is not None else FLATTEN_FIELDS
+    # Quick check on first record — skip work when no target fields exist.
+    sample = data[0]
+    targets = [f for f in fields if f in sample and isinstance(sample[f], dict)]
+    if not targets:
+        return data
+    out: list[dict[str, Any]] = []
+    for record in data:
+        record = dict(record)  # shallow copy to avoid mutating caller's data
+        for field in targets:
+            nested = record.get(field)
+            if isinstance(nested, dict):
+                prefix = fields[field]
+                for sub_key, sub_val in nested.items():
+                    record[f"{prefix}_{sub_key}"] = sub_val
+        out.append(record)
+    return out
 def _post_process(df: pd.DataFrame) -> pd.DataFrame:
     """Shared post-processing for pandas DataFrames.
@@ -32,6 +77,7 @@ def to_pandas(data: list[dict[str, Any]]) -> pd.DataFrame:
     """Convert API response data to a pandas DataFrame with post-processing."""
     if not data:
         return pd.DataFrame()
+    data = _flatten_nested_dicts(data)
     df = pd.DataFrame(data)
     return _post_process(df)
@@ -51,6 +97,7 @@ def to_polars(data: list[dict[str, Any]]):
     if not data:
         return pl.DataFrame()
+    data = _flatten_nested_dicts(data)
     df = pl.DataFrame(data)
     # Parse timestamps
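
Note on `_flatten_nested_dicts`: the hunk above picks the fields to flatten by inspecting only the first record. The sketch below restates that logic as a standalone function (`flatten_nested` is a hypothetical name; the real helper lives in `hyperstudy._dataframe`) to show what that implies for record order. Whether the API can actually return a first record with a null `metadata` is not stated in the diff, so treat the second case as an assumption.

```python
from __future__ import annotations

from typing import Any


def flatten_nested(
    data: list[dict[str, Any]], fields: dict[str, str]
) -> list[dict[str, Any]]:
    """Standalone restatement of the flattening logic shown in the diff."""
    if not data:
        return data
    sample = data[0]
    # Target fields are chosen from the first record only.
    targets = [f for f in fields if isinstance(sample.get(f), dict)]
    if not targets:
        return data
    out = []
    for record in data:
        record = dict(record)  # shallow copy, the caller's dicts stay untouched
        for field in targets:
            nested = record.get(field)
            if isinstance(nested, dict):
                for sub_key, sub_val in nested.items():
                    record[f"{fields[field]}_{sub_key}"] = sub_val
        out.append(record)
    return out


FIELDS = {"metadata": "metadata"}
rows = [
    {"ratingId": "r1", "metadata": {"question": "How engaging?"}},
    {"ratingId": "r2", "metadata": None},
]

flat = flatten_nested(rows, FIELDS)
print(flat[0]["metadata_question"])    # How engaging?
print("metadata_question" in flat[1])  # False

# Same rows, reversed: the first record now has metadata=None,
# so no field is selected and nothing is flattened.
reordered = flatten_nested(list(reversed(rows)), FIELDS)
print(any("metadata_question" in r for r in reordered))  # False
```

The original nested dict stays in place next to the promoted keys, which matches the backward-compatibility note in the docstring.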

hyperstudy-0.2.2/src/hyperstudy/_downloads.py ADDED

@@ -0,0 +1,50 @@
+"""Helpers for downloading recording files from signed URLs."""
+from __future__ import annotations
+import re
+from pathlib import Path
+from typing import Any
+import requests
+_CHUNK_SIZE = 65536  # 64 KB — good balance for large video files
+_UNSAFE_RE = re.compile(r"[^\w\-]")
+def get_download_url(recording: dict[str, Any]) -> str | None:
+    """Return the best download URL from a recording dict, or ``None``."""
+    return recording.get("downloadUrl") or recording.get("url") or None
+def build_filename(recording: dict[str, Any]) -> str:
+    """Build a filesystem-safe filename from recording metadata.
+    Pattern: ``{participantName}_{type}_{recordingId}.{ext}``
+    """
+    name = recording.get("participantName") or recording.get("participantId") or "unknown"
+    name = _UNSAFE_RE.sub("_", name)
+    meta = recording.get("metadata") or {}
+    rec_type = meta.get("type") or "recording"
+    rec_id = recording.get("recordingId") or recording.get("egressId") or "unknown"
+    fmt = recording.get("format")
+    if not fmt:
+        fmt = "webm" if rec_type == "audio" else "mp4"
+    return f"{name}_{rec_type}_{rec_id}.{fmt}"
+def download_file(url: str, dest: Path, timeout: int = 300) -> int:
+    """Stream-download *url* to *dest* and return bytes written."""
+    resp = requests.get(url, stream=True, timeout=timeout)
+    resp.raise_for_status()
+    written = 0
+    with open(dest, "wb") as fh:
+        for chunk in resp.iter_content(chunk_size=_CHUNK_SIZE):
+            fh.write(chunk)
+            written += len(chunk)
+    return written
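
Note on `build_filename`: the participant name is passed through the pattern `[^\w\-]`, which replaces every character that is not a word character or a hyphen with an underscore. The short sketch below applies the same pattern in isolation. `sanitize` is a hypothetical wrapper and the sample names are illustrative.

```python
import re

# Same pattern as _UNSAFE_RE in the diff above.
UNSAFE_RE = re.compile(r"[^\w\-]")


def sanitize(name: str) -> str:
    """Replace anything that is not a word character or '-' with '_'."""
    return UNSAFE_RE.sub("_", name)


print(sanitize("Alice O'Brien (test)"))  # Alice_O_Brien__test_
print(sanitize("../../etc/passwd"))      # ______etc_passwd
print(sanitize("Zoë-1"))                 # Zoë-1
```

Because `\w` is Unicode-aware for `str` patterns in Python 3, accented letters survive unchanged. In the diff, only the participant name goes through this substitution; the recording type, recording ID, and format are interpolated as received.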

{hyperstudy-0.2.1 → hyperstudy-0.2.2}/src/hyperstudy/client.py RENAMED

@@ -2,9 +2,14 @@
 from __future__ import annotations
+import warnings
+from pathlib import Path
 from typing import Any
+from tqdm.auto import tqdm
 from ._dataframe import to_pandas, to_polars
+from ._downloads import build_filename, download_file, get_download_url
 from ._http import HttpTransport
 from ._pagination import fetch_all_pages
 from ._types import Scope
@@ -466,6 +471,125 @@ class HyperStudy(ExperimentMixin):
             "consent": self.get_consent(participant_id, **common),
         }
+    # ------------------------------------------------------------------
+    # Recording downloads
+    # ------------------------------------------------------------------
+    def download_recording(
+        self,
+        recording: dict[str, Any],
+        output_dir: str = ".",
+    ) -> Path:
+        """Download a single recording file to disk.
+        Args:
+            recording: A recording dict (from ``get_recordings(output="dict")``).
+            output_dir: Directory to save the file.
+        Returns:
+            Path to the downloaded file.
+        """
+        url = get_download_url(recording)
+        if not url:
+            raise ValueError("Recording has no downloadUrl or url field")
+        dest_dir = Path(output_dir)
+        dest_dir.mkdir(parents=True, exist_ok=True)
+        filename = build_filename(recording)
+        dest = dest_dir / filename
+        download_file(url, dest)
+        return dest
+    def download_recordings(
+        self,
+        scope_id: str,
+        *,
+        output_dir: str,
+        scope: str = "experiment",
+        deployment_id: str | None = None,
+        room_id: str | None = None,
+        recording_type: str | None = None,
+        progress: bool = True,
+        skip_existing: bool = True,
+    ):
+        """Download recording files to disk.
+        Fetches recording metadata, downloads each file from its signed
+        URL, writes a ``recordings_metadata.csv`` sidecar, and returns a
+        DataFrame with a ``local_path`` column.
+        Args:
+            scope_id: Experiment, room, or participant ID.
+            output_dir: Directory to save files.
+            scope: ``"experiment"``, ``"room"``, or ``"participant"``.
+            deployment_id: Filter by deployment (experiment scope only).
+            room_id: Filter by room.
+            recording_type: ``"audio"``, ``"video"``, or ``None`` (both).
+            progress: Show progress bar.
+            skip_existing: Skip files already on disk with matching size.
+        Returns:
+            pandas DataFrame with recording metadata plus ``local_path``
+            and ``download_status`` columns.
+        """
+        recordings = self.get_recordings(
+            scope_id,
+            scope=scope,
+            deployment_id=deployment_id,
+            room_id=room_id,
+            output="dict",
+        )
+        if recording_type:
+            recordings = [
+                r for r in recordings
+                if (r.get("metadata") or {}).get("type") == recording_type
+            ]
+        dest_dir = Path(output_dir)
+        dest_dir.mkdir(parents=True, exist_ok=True)
+        local_paths: list[str | None] = []
+        statuses: list[str] = []
+        for rec in tqdm(recordings, desc="Downloading recordings", disable=not progress):
+            filename = build_filename(rec)
+            dest = dest_dir / filename
+            url = get_download_url(rec)
+            if not url:
+                local_paths.append(None)
+                statuses.append("failed")
+                warnings.warn(f"Recording {rec.get('recordingId')} has no download URL")
+                continue
+            if skip_existing and dest.exists():
+                expected_size = rec.get("fileSize")
+                if expected_size is None or dest.stat().st_size == expected_size:
+                    local_paths.append(str(dest.resolve()))
+                    statuses.append("skipped")
+                    continue
+            try:
+                download_file(url, dest)
+                local_paths.append(str(dest.resolve()))
+                statuses.append("downloaded")
+            except Exception as exc:
+                local_paths.append(None)
+                statuses.append("failed")
+                warnings.warn(
+                    f"Failed to download recording {rec.get('recordingId')}: {exc}"
+                )
+        df = to_pandas(recordings)
+        if not df.empty:
+            df["local_path"] = local_paths
+            df["download_status"] = statuses
+            df.to_csv(dest_dir / "recordings_metadata.csv", index=False)
+        return df
     # ------------------------------------------------------------------
     # Internal helpers
     # ------------------------------------------------------------------
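
Note on interrupted downloads: `download_file` writes straight to the destination path, and `download_recordings` treats any existing file as complete when `fileSize` is missing. An interrupted transfer could therefore leave a partial file that a later run skips. One possible hardening, which is not part of this release, is to write to a temporary sibling file and rename it only after the stream finishes. The sketch below is transport-agnostic and takes any iterable of byte chunks; `write_atomically` and the `.part` suffix are assumptions made for illustration.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def write_atomically(chunks: Iterable[bytes], dest: Path) -> int:
    """Write chunks to a sibling .part file, then rename it onto dest.

    Hypothetical helper: dest only appears once every chunk was written.
    """
    part = dest.with_name(dest.name + ".part")
    written = 0
    try:
        with open(part, "wb") as fh:
            for chunk in chunks:
                fh.write(chunk)
                written += len(chunk)
        os.replace(part, dest)  # rename over dest in a single step
    except BaseException:
        part.unlink(missing_ok=True)  # drop the partial file on any failure
        raise
    return written


def failing_stream():
    """Simulate a connection that drops after the first chunk."""
    yield b"partial"
    raise ConnectionError("simulated drop")


with tempfile.TemporaryDirectory() as tmp:
    dest = Path(tmp) / "rec.mp4"
    try:
        write_atomically(failing_stream(), dest)
    except ConnectionError:
        pass
    print(dest.exists())  # False

    print(write_atomically([b"video ", b"data"], dest))  # 10
    print(dest.read_bytes())  # b'video data'
```

With `requests`, the chunk iterable could be `resp.iter_content(chunk_size=...)` from a streaming response, as the existing helper already uses.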

{hyperstudy-0.2.1 → hyperstudy-0.2.2}/tests/conftest.py RENAMED

@@ -71,6 +71,16 @@ def deployment_sessions_response():
     return load_fixture("deployment_sessions_response.json")
+@pytest.fixture
+def sparse_ratings_response():
+    return load_fixture("sparse_ratings_response.json")
+@pytest.fixture
+def recordings_response():
+    return load_fixture("recordings_response.json")
 @pytest.fixture
 def warnings_response():
     return load_fixture("warnings_response.json")

hyperstudy-0.2.2/tests/fixtures/recordings_response.json ADDED

@@ -0,0 +1,71 @@
+{
+  "status": "success",
+  "metadata": {
+    "dataType": "recordings",
+    "scope": "experiment",
+    "scopeId": "exp_abc123",
+    "timestamp": "2024-06-15T10:00:00.000Z",
+    "query": {
+      "limit": 1000,
+      "offset": 0,
+      "sort": "startTime",
+      "order": "asc"
+    },
+    "pagination": {
+      "total": 2,
+      "returned": 2,
+      "hasMore": false,
+      "limit": 1000,
+      "offset": 0
+    },
+    "processing": {
+      "processingTimeMs": 35,
+      "enriched": true,
+      "version": "3.0.0"
+    }
+  },
+  "data": [
+    {
+      "recordingId": "EG_video_001",
+      "egressId": "EG_video_001",
+      "participantId": "user_1",
+      "participantName": "Alice",
+      "startTime": "2024-06-15T10:00:05.000Z",
+      "endTime": "2024-06-15T10:05:05.000Z",
+      "duration": 300000,
+      "videoOffset": 500,
+      "url": "https://storage.googleapis.com/bucket/recordings/video1.mp4",
+      "downloadUrl": "https://storage.googleapis.com/bucket/recordings/video1.mp4?X-Goog-Signature=abc",
+      "fileSize": 1024,
+      "format": "mp4",
+      "status": "complete",
+      "metadata": {
+        "type": "video",
+        "recordingType": "individual",
+        "roomName": "room_1",
+        "experimentId": "exp_abc123"
+      }
+    },
+    {
+      "recordingId": "EG_audio_002",
+      "egressId": "EG_audio_002",
+      "participantId": "user_1",
+      "participantName": "Alice",
+      "startTime": "2024-06-15T10:00:05.000Z",
+      "endTime": "2024-06-15T10:05:05.000Z",
+      "duration": 300000,
+      "videoOffset": 500,
+      "url": "https://storage.googleapis.com/bucket/recordings/audio1.webm",
+      "downloadUrl": "https://storage.googleapis.com/bucket/recordings/audio1.webm?X-Goog-Signature=def",
+      "fileSize": 512,
+      "format": "webm",
+      "status": "complete",
+      "metadata": {
+        "type": "audio",
+        "recordingType": "audio",
+        "roomName": "room_1",
+        "experimentId": "exp_abc123"
+      }
+    }
+  ]
+}

hyperstudy-0.2.2/tests/fixtures/sparse_ratings_response.json ADDED

@@ -0,0 +1,108 @@
+{
+  "status": "success",
+  "metadata": {
+    "dataType": "ratings",
+    "ratingType": "sparse",
+    "scope": "experiment",
+    "scopeId": "exp_abc123",
+    "timestamp": "2024-06-15T10:00:00.000Z",
+    "query": {
+      "startTime": null,
+      "endTime": null,
+      "limit": 1000,
+      "offset": 0,
+      "sort": "timestamp",
+      "order": "asc"
+    },
+    "pagination": {
+      "total": 2,
+      "returned": 2,
+      "hasMore": false,
+      "limit": 1000,
+      "offset": 0
+    },
+    "processing": {
+      "processingTimeMs": 58,
+      "enriched": true,
+      "version": "3.0.0"
+    }
+  },
+  "data": [
+    {
+      "ratingId": "rat_001",
+      "participantId": "user_1",
+      "timestamp": "2024-06-15T10:01:30.000Z",
+      "onset": 8500,
+      "rawOnset": 8520,
+      "clockOffsetApplied": 20,
+      "value": 72.5,
+      "rawValue": 58,
+      "scale": { "min": 0, "max": 80 },
+      "type": "sparse",
+      "stateId": "state_video_1",
+      "stimulusId": "video_abc",
+      "stimulusTime": 5000,
+      "responseTime": 2100,
+      "confidence": null,
+      "metadata": {
+        "question": "How engaging is this video?",
+        "dimension": "engagement",
+        "componentType": "vasrating",
+        "sampleIndex": 0
+      },
+      "ratingEndOnset": 10600,
+      "sparseRatingData": {
+        "videoId": "video_abc",
+        "pauseIndex": 0,
+        "videoRelativeTime": 5000,
+        "pauseTimestamp": 1718445690000,
+        "componentType": "vasrating",
+        "componentData": { "value": 58 },
+        "previousRatings": null,
+        "mediaPauseOnset": 8200,
+        "mediaResumeOnset": 10800,
+        "actualPauseDuration": 2600
+      },
+      "stateStartTime": "2024-06-15T10:00:00.000Z",
+      "stateDuration": 60000
+    },
+    {
+      "ratingId": "rat_002",
+      "participantId": "user_1",
+      "timestamp": "2024-06-15T10:02:45.000Z",
+      "onset": 25300,
+      "rawOnset": 25320,
+      "clockOffsetApplied": 20,
+      "value": 45.0,
+      "rawValue": 36,
+      "scale": { "min": 0, "max": 80 },
+      "type": "sparse",
+      "stateId": "state_video_1",
+      "stimulusId": "video_abc",
+      "stimulusTime": 20000,
+      "responseTime": 1800,
+      "confidence": null,
+      "metadata": {
+        "question": "How engaging is this video?",
+        "dimension": "engagement",
+        "componentType": "vasrating",
+        "sampleIndex": 1
+      },
+      "ratingEndOnset": 27100,
+      "sparseRatingData": {
+        "videoId": "video_abc",
+        "pauseIndex": 1,
+        "videoRelativeTime": 20000,
+        "pauseTimestamp": 1718445765000,
+        "componentType": "vasrating",
+        "componentData": { "value": 36 },
+        "previousRatings": { "video_abc": 58 },
+        "mediaPauseOnset": 25000,
+        "mediaResumeOnset": 27300,
+        "actualPauseDuration": 2300
+      },
+      "stateStartTime": "2024-06-15T10:00:00.000Z",
+      "stateDuration": 60000
+    }
+  ]
+}

{hyperstudy-0.2.1 → hyperstudy-0.2.2}/tests/test_client.py RENAMED

@@ -224,6 +224,24 @@ def test_get_ratings_sparse(api_key, events_response):
     assert isinstance(df, pd.DataFrame)
+@responses.activate
+def test_get_ratings_sparse_flattens_data(api_key, sparse_ratings_response):
+    """Sparse ratings DataFrame contains flattened sparseRatingData columns."""
+    responses.get(
+        f"{BASE_URL}/data/ratings/sparse/experiment/exp_abc123",
+        json=sparse_ratings_response,
+        status=200,
+    )
+    client = HyperStudy(api_key=api_key, base_url=BASE_URL)
+    df = client.get_ratings("exp_abc123", kind="sparse", limit=1000)
+    assert isinstance(df, pd.DataFrame)
+    assert "sparseRatingData_mediaPauseOnset" in df.columns
+    assert "sparseRatingData_mediaResumeOnset" in df.columns
+    assert "sparseRatingData_actualPauseDuration" in df.columns
+    assert "metadata_question" in df.columns
+    assert df["sparseRatingData_mediaPauseOnset"].iloc[0] == 8200
 @responses.activate
 def test_get_sync_with_aggregation(api_key, events_response):
     """get_sync passes aggregationWindow param."""
@@ -237,6 +255,128 @@ def test_get_sync_with_aggregation(api_key, events_response):
     assert "aggregationWindow=5000" in responses.calls[0].request.url
+# ------------------------------------------------------------------
+# download_recordings
+# ------------------------------------------------------------------
+@responses.activate
+def test_download_recordings(api_key, recordings_response, tmp_path):
+    """download_recordings writes files, CSV sidecar, and returns DataFrame."""
+    # Mock the metadata API
+    responses.get(
+        f"{BASE_URL}/data/recordings/experiment/exp_abc123",
+        json=recordings_response,
+        status=200,
+    )
+    # Mock the GCS signed URL downloads
+    responses.get(
+        recordings_response["data"][0]["downloadUrl"],
+        body=b"fake video bytes",
+        status=200,
+    )
+    responses.get(
+        recordings_response["data"][1]["downloadUrl"],
+        body=b"fake audio bytes",
+        status=200,
+    )
+    client = HyperStudy(api_key=api_key, base_url=BASE_URL)
+    df = client.download_recordings(
+        "exp_abc123", output_dir=str(tmp_path), progress=False
+    )
+    assert isinstance(df, pd.DataFrame)
+    assert len(df) == 2
+    assert "local_path" in df.columns
+    assert "download_status" in df.columns
+    assert list(df["download_status"]) == ["downloaded", "downloaded"]
+    # Files exist on disk
+    assert (tmp_path / "Alice_video_EG_video_001.mp4").exists()
+    assert (tmp_path / "Alice_audio_EG_audio_002.webm").exists()
+    assert (tmp_path / "Alice_video_EG_video_001.mp4").read_bytes() == b"fake video bytes"
+    # CSV sidecar written
+    assert (tmp_path / "recordings_metadata.csv").exists()
+@responses.activate
+def test_download_recordings_filter_type(api_key, recordings_response, tmp_path):
+    """recording_type filter limits downloads to matching type."""
+    responses.get(
+        f"{BASE_URL}/data/recordings/experiment/exp_abc123",
+        json=recordings_response,
+        status=200,
+    )
+    responses.get(
+        recordings_response["data"][1]["downloadUrl"],
+        body=b"audio bytes",
+        status=200,
+    )
+    client = HyperStudy(api_key=api_key, base_url=BASE_URL)
+    df = client.download_recordings(
+        "exp_abc123",
+        output_dir=str(tmp_path),
+        recording_type="audio",
+        progress=False,
+    )
+    assert len(df) == 1
+    assert (tmp_path / "Alice_audio_EG_audio_002.webm").exists()
+    assert not (tmp_path / "Alice_video_EG_video_001.mp4").exists()
+@responses.activate
+def test_download_recordings_skip_existing(api_key, recordings_response, tmp_path):
+    """Files with matching size are skipped."""
+    responses.get(
+        f"{BASE_URL}/data/recordings/experiment/exp_abc123",
+        json=recordings_response,
+        status=200,
+    )
+    # Pre-create the video file with the expected size (1024 bytes)
+    video_path = tmp_path / "Alice_video_EG_video_001.mp4"
+    video_path.write_bytes(b"\x00" * 1024)
+    # Only the audio file needs a mock download URL
+    responses.get(
+        recordings_response["data"][1]["downloadUrl"],
+        body=b"\x00" * 512,
+        status=200,
+    )
+    client = HyperStudy(api_key=api_key, base_url=BASE_URL)
+    df = client.download_recordings(
+        "exp_abc123", output_dir=str(tmp_path), progress=False
+    )
+    assert df["download_status"].iloc[0] == "skipped"
+    assert df["download_status"].iloc[1] == "downloaded"
+@responses.activate
+def test_download_recording_single(api_key, tmp_path):
+    """download_recording downloads a single file."""
+    url = "https://storage.example.com/rec.mp4"
+    responses.get(url, body=b"video data", status=200)
+    client = HyperStudy(api_key=api_key, base_url=BASE_URL)
+    rec = {
+        "recordingId": "EG_001",
+        "participantName": "Bob",
+        "downloadUrl": url,
+        "format": "mp4",
+        "metadata": {"type": "video"},
+    }
+    path = client.download_recording(rec, output_dir=str(tmp_path))
+    assert path.exists()
+    assert path.name == "Bob_video_EG_001.mp4"
+    assert path.read_bytes() == b"video data"
 # ------------------------------------------------------------------
 # get_all_data
 # ------------------------------------------------------------------

hyperstudy-0.2.2/tests/test_dataframe.py ADDED

@@ -0,0 +1,182 @@
+"""Tests for DataFrame conversion (pandas and polars)."""
+from __future__ import annotations
+import pandas as pd
+import pytest
+from hyperstudy._dataframe import _flatten_nested_dicts, to_pandas, to_polars
+SAMPLE_DATA = [
+    {
+        "id": "evt_001",
+        "onset": 1500,
+        "timestamp": "2024-06-15T10:00:01.500Z",
+        "category": "component",
+    },
+    {
+        "id": "evt_002",
+        "onset": 3200,
+        "timestamp": "2024-06-15T10:00:03.200Z",
+        "category": "component",
+    },
+]
+SPARSE_RATING_DATA = [
+    {
+        "ratingId": "rat_001",
+        "onset": 8500,
+        "timestamp": "2024-06-15T10:01:30.000Z",
+        "value": 72.5,
+        "type": "sparse",
+        "metadata": {
+            "question": "How engaging?",
+            "dimension": "engagement",
+            "componentType": "vasrating",
+        },
+        "sparseRatingData": {
+            "videoId": "video_abc",
+            "pauseIndex": 0,
+            "mediaPauseOnset": 8200,
+            "mediaResumeOnset": 10800,
+            "actualPauseDuration": 2600,
+            "componentData": {"value": 58},
+        },
+    },
+    {
+        "ratingId": "rat_002",
+        "onset": 25300,
+        "timestamp": "2024-06-15T10:02:45.000Z",
+        "value": 45.0,
+        "type": "sparse",
+        "metadata": {
+            "question": "How engaging?",
+            "dimension": "engagement",
+            "componentType": "vasrating",
+        },
+        "sparseRatingData": {
+            "videoId": "video_abc",
+            "pauseIndex": 1,
+            "mediaPauseOnset": 25000,
+            "mediaResumeOnset": 27300,
+            "actualPauseDuration": 2300,
+            "componentData": {"value": 36},
+        },
+    },
+]
+# ------------------------------------------------------------------
+# Pandas
+# ------------------------------------------------------------------
+def test_to_pandas_creates_dataframe():
+    df = to_pandas(SAMPLE_DATA)
+    assert isinstance(df, pd.DataFrame)
+    assert len(df) == 2
+def test_to_pandas_onset_sec():
+    df = to_pandas(SAMPLE_DATA)
+    assert "onset_sec" in df.columns
+    assert df["onset_sec"].iloc[0] == pytest.approx(1.5)
+    assert df["onset_sec"].iloc[1] == pytest.approx(3.2)
+def test_to_pandas_timestamp_parsed():
+    df = to_pandas(SAMPLE_DATA)
+    assert pd.api.types.is_datetime64_any_dtype(df["timestamp"])
+def test_to_pandas_empty():
+    df = to_pandas([])
+    assert isinstance(df, pd.DataFrame)
+    assert df.empty
+# ------------------------------------------------------------------
+# Polars
+# ------------------------------------------------------------------
+def test_to_polars_creates_dataframe():
+    polars = pytest.importorskip("polars")
+    df = to_polars(SAMPLE_DATA)
+    assert isinstance(df, polars.DataFrame)
+    assert len(df) == 2
+def test_to_polars_onset_sec():
+    pytest.importorskip("polars")
+    df = to_polars(SAMPLE_DATA)
+    assert "onset_sec" in df.columns
+    assert df["onset_sec"][0] == pytest.approx(1.5)
+def test_to_polars_empty():
+    polars = pytest.importorskip("polars")
+    df = to_polars([])
+    assert isinstance(df, polars.DataFrame)
+    assert len(df) == 0
+# ------------------------------------------------------------------
+# Nested dict flattening
+# ------------------------------------------------------------------
+def test_flatten_sparse_rating_data():
+    df = to_pandas(SPARSE_RATING_DATA)
+    assert "sparseRatingData_mediaPauseOnset" in df.columns
+    assert "sparseRatingData_mediaResumeOnset" in df.columns
+    assert "sparseRatingData_actualPauseDuration" in df.columns
+    assert "sparseRatingData_videoId" in df.columns
+    assert "sparseRatingData_pauseIndex" in df.columns
+    assert df["sparseRatingData_mediaPauseOnset"].iloc[0] == 8200
+    assert df["sparseRatingData_mediaPauseOnset"].iloc[1] == 25000
+def test_flatten_metadata():
+    df = to_pandas(SPARSE_RATING_DATA)
+    assert "metadata_question" in df.columns
+    assert "metadata_dimension" in df.columns
+    assert "metadata_componentType" in df.columns
+    assert df["metadata_question"].iloc[0] == "How engaging?"
+def test_flatten_preserves_original():
+    df = to_pandas(SPARSE_RATING_DATA)
+    assert "sparseRatingData" in df.columns
+    assert isinstance(df["sparseRatingData"].iloc[0], dict)
+    assert "metadata" in df.columns
+    assert isinstance(df["metadata"].iloc[0], dict)
+def test_flatten_handles_none():
+    data = [
+        {"ratingId": "r1", "onset": 100, "sparseRatingData": None, "metadata": None},
+    ]
+    df = to_pandas(data)
+    assert "sparseRatingData" in df.columns
+    # No flattened columns since the nested value is None, not a dict
+    assert "sparseRatingData_mediaPauseOnset" not in df.columns
+def test_flatten_no_target_fields():
+    """Data without any flatten-target fields passes through unchanged."""
+    result = _flatten_nested_dicts(SAMPLE_DATA)
+    assert result is SAMPLE_DATA  # same object — no copy needed
+def test_flatten_empty():
+    result = _flatten_nested_dicts([])
+    assert result == []
+def test_flatten_polars():
+    pytest.importorskip("polars")
+    df = to_polars(SPARSE_RATING_DATA)
+    assert "sparseRatingData_mediaPauseOnset" in df.columns
+    assert "metadata_question" in df.columns
+    assert df["sparseRatingData_mediaPauseOnset"][0] == 8200
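
The flattening tests above pin down a few behaviours: nested dicts are expanded into `<field>_<key>` columns, the original nested column is kept, `None` values are skipped, and input without any flatten targets is returned as the same object. The sketch below is one way those behaviours could be satisfied. It is not the package's `_flatten_nested_dicts`: the function name, the `FLATTEN_FIELDS` tuple, and the sample record are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any

# Assumption: these are the only fields treated as flatten targets.
FLATTEN_FIELDS = ("sparseRatingData", "metadata")


def flatten_nested_dicts_sketch(data: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical stand-in that mirrors the behaviour the tests describe."""
    has_target = any(
        isinstance(row.get(field), dict) for row in data for field in FLATTEN_FIELDS
    )
    if not has_target:
        # Nothing to flatten: hand back the same list object, no copy.
        return data

    flattened = []
    for row in data:
        new_row = dict(row)  # keep the original nested dict alongside the new keys
        for field in FLATTEN_FIELDS:
            nested = row.get(field)
            if isinstance(nested, dict):
                for key, value in nested.items():
                    # setdefault avoids overwriting an existing top-level key
                    new_row.setdefault(f"{field}_{key}", value)
        flattened.append(new_row)
    return flattened


# Hypothetical sample record, shaped like the data the tests reference.
sample = [
    {
        "ratingId": "r1",
        "sparseRatingData": {"mediaPauseOnset": 8200},
        "metadata": {"question": "How engaging?"},
    }
]
rows = flatten_nested_dicts_sketch(sample)
```

Returning the input unchanged when there is nothing to flatten matches the identity check in `test_flatten_no_target_fields` and avoids copying large result sets needlessly.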

hyperstudy-0.2.2/tests/test_downloads.py ADDED

@@ -0,0 +1,105 @@
+"""Tests for recording download helpers."""
+from __future__ import annotations
+import responses
+import pytest
+from hyperstudy._downloads import build_filename, download_file
+# ------------------------------------------------------------------
+# build_filename
+# ------------------------------------------------------------------
+VIDEO_RECORDING = {
+    "recordingId": "EG_video_001",
+    "participantName": "Alice",
+    "format": "mp4",
+    "metadata": {"type": "video"},
+}
+AUDIO_RECORDING = {
+    "recordingId": "EG_audio_002",
+    "participantName": "Alice",
+    "format": "webm",
+    "metadata": {"type": "audio"},
+}
+def test_build_filename_video():
+    assert build_filename(VIDEO_RECORDING) == "Alice_video_EG_video_001.mp4"
+def test_build_filename_audio():
+    assert build_filename(AUDIO_RECORDING) == "Alice_audio_EG_audio_002.webm"
+def test_build_filename_missing_fields():
+    rec = {"egressId": "EG_123"}
+    name = build_filename(rec)
+    assert name == "unknown_recording_EG_123.mp4"
+def test_build_filename_sanitizes_name():
+    rec = {
+        "recordingId": "EG_001",
+        "participantName": "Alice O'Brien (test)",
+        "format": "mp4",
+        "metadata": {"type": "video"},
+    }
+    name = build_filename(rec)
+    assert name == "Alice_O_Brien__test__video_EG_001.mp4"
+    # No special characters remain
+    assert "'" not in name
+    assert "(" not in name
+def test_build_filename_uses_participant_id_fallback():
+    rec = {
+        "recordingId": "EG_001",
+        "participantId": "user_42",
+        "format": "mp4",
+        "metadata": {"type": "video"},
+    }
+    assert build_filename(rec) == "user_42_video_EG_001.mp4"
+def test_build_filename_audio_default_format():
+    """Audio recording with no format field defaults to webm."""
+    rec = {
+        "recordingId": "EG_001",
+        "participantName": "Bob",
+        "metadata": {"type": "audio"},
+    }
+    assert build_filename(rec).endswith(".webm")
+# ------------------------------------------------------------------
+# download_file
+# ------------------------------------------------------------------
+@responses.activate
+def test_download_file(tmp_path):
+    url = "https://storage.example.com/file.mp4"
+    content = b"fake video content " * 100
+    responses.get(url, body=content, status=200)
+    dest = tmp_path / "output.mp4"
+    written = download_file(url, dest)
+    assert dest.exists()
+    assert dest.read_bytes() == content
+    assert written == len(content)
+@responses.activate
+def test_download_file_raises_on_error(tmp_path):
+    url = "https://storage.example.com/missing.mp4"
+    responses.get(url, status=404)
+    dest = tmp_path / "output.mp4"
+    with pytest.raises(Exception):
+        download_file(url, dest)
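
The `build_filename` tests above imply a naming scheme of `<participant>_<type>_<id>.<ext>` with fallbacks for each part. The sketch below is one implementation consistent with those assertions. It is not the code in `hyperstudy._downloads`: the function name, the regular expression, and the fallback order are assumptions inferred only from the expected strings in the tests.

```python
from __future__ import annotations

import re
from typing import Any


def build_filename_sketch(rec: dict[str, Any]) -> str:
    """Hypothetical stand-in that reproduces the filenames the tests expect."""
    # Participant label: name first, then id, then a fixed placeholder.
    name = rec.get("participantName") or rec.get("participantId") or "unknown"
    # Assumption: every character outside [A-Za-z0-9_-] becomes an underscore.
    safe_name = re.sub(r"[^A-Za-z0-9_-]", "_", str(name))

    # Recording type lives under metadata.type; fall back to "recording".
    rec_type = (rec.get("metadata") or {}).get("type") or "recording"

    # Identifier: recordingId first, then egressId.
    rec_id = rec.get("recordingId") or rec.get("egressId") or "unknown"

    # Extension: explicit format wins; audio defaults to webm, all else to mp4.
    default_ext = "webm" if rec_type == "audio" else "mp4"
    ext = rec.get("format") or default_ext

    return f"{safe_name}_{rec_type}_{rec_id}.{ext}"
```

Replacing characters one for one, rather than collapsing runs, is what produces the doubled underscores in `Alice_O_Brien__test__video_EG_001.mp4`.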

hyperstudy-0.2.1/tests/test_dataframe.py DELETED

@@ -1,78 +0,0 @@
-"""Tests for DataFrame conversion (pandas and polars)."""
-from __future__ import annotations
-import pandas as pd
-import pytest
-from hyperstudy._dataframe import to_pandas, to_polars
-SAMPLE_DATA = [
-    {
-        "id": "evt_001",
-        "onset": 1500,
-        "timestamp": "2024-06-15T10:00:01.500Z",
-        "category": "component",
-    },
-    {
-        "id": "evt_002",
-        "onset": 3200,
-        "timestamp": "2024-06-15T10:00:03.200Z",
-        "category": "component",
-    },
-]
-# ------------------------------------------------------------------
-# Pandas
-# ------------------------------------------------------------------
-def test_to_pandas_creates_dataframe():
-    df = to_pandas(SAMPLE_DATA)
-    assert isinstance(df, pd.DataFrame)
-    assert len(df) == 2
-def test_to_pandas_onset_sec():
-    df = to_pandas(SAMPLE_DATA)
-    assert "onset_sec" in df.columns
-    assert df["onset_sec"].iloc[0] == pytest.approx(1.5)
-    assert df["onset_sec"].iloc[1] == pytest.approx(3.2)
-def test_to_pandas_timestamp_parsed():
-    df = to_pandas(SAMPLE_DATA)
-    assert pd.api.types.is_datetime64_any_dtype(df["timestamp"])
-def test_to_pandas_empty():
-    df = to_pandas([])
-    assert isinstance(df, pd.DataFrame)
-    assert df.empty
-# ------------------------------------------------------------------
-# Polars
-# ------------------------------------------------------------------
-def test_to_polars_creates_dataframe():
-    polars = pytest.importorskip("polars")
-    df = to_polars(SAMPLE_DATA)
-    assert isinstance(df, polars.DataFrame)
-    assert len(df) == 2
-def test_to_polars_onset_sec():
-    pytest.importorskip("polars")
-    df = to_polars(SAMPLE_DATA)
-    assert "onset_sec" in df.columns
-    assert df["onset_sec"][0] == pytest.approx(1.5)
-def test_to_polars_empty():
-    polars = pytest.importorskip("polars")
-    df = to_polars([])
-    assert isinstance(df, polars.DataFrame)
-    assert len(df) == 0