protein-quest 0.4.0__tar.gz → 0.5.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.


Files changed (75)
  1. {protein_quest-0.4.0 → protein_quest-0.5.1}/.github/workflows/ci.yml +0 -9
  2. {protein_quest-0.4.0 → protein_quest-0.5.1}/.github/workflows/pages.yml +9 -1
  3. {protein_quest-0.4.0 → protein_quest-0.5.1}/.gitignore +14 -1
  4. {protein_quest-0.4.0 → protein_quest-0.5.1}/PKG-INFO +9 -1
  5. {protein_quest-0.4.0 → protein_quest-0.5.1}/README.md +7 -0
  6. {protein_quest-0.4.0 → protein_quest-0.5.1}/mkdocs.yml +3 -4
  7. {protein_quest-0.4.0 → protein_quest-0.5.1}/pyproject.toml +2 -0
  8. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/__version__.py +1 -1
  9. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/alphafold/fetch.py +34 -9
  10. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/cli.py +68 -25
  11. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/emdb.py +6 -3
  12. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/mcp_server.py +24 -2
  13. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/pdbe/fetch.py +6 -3
  14. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/uniprot.py +7 -3
  15. protein_quest-0.5.1/src/protein_quest/utils.py +511 -0
  16. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/cassettes/test_uniprot/test_search4interaction_partners.yaml +47 -49
  17. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/cassettes/test_uniprot/test_search4macromolecular_complexes.yaml +46 -46
  18. protein_quest-0.5.1/tests/test_utils.py +326 -0
  19. {protein_quest-0.4.0 → protein_quest-0.5.1}/uv.lock +19 -1
  20. protein_quest-0.4.0/docs/cli_doc_hook.py +0 -113
  21. protein_quest-0.4.0/src/protein_quest/utils.py +0 -167
  22. protein_quest-0.4.0/tests/test_utils.py +0 -31
  23. {protein_quest-0.4.0 → protein_quest-0.5.1}/.github/workflows/pypi-publish.yml +0 -0
  24. {protein_quest-0.4.0 → protein_quest-0.5.1}/.vscode/extensions.json +0 -0
  25. {protein_quest-0.4.0 → protein_quest-0.5.1}/CITATION.cff +0 -0
  26. {protein_quest-0.4.0 → protein_quest-0.5.1}/CODE_OF_CONDUCT.md +0 -0
  27. {protein_quest-0.4.0 → protein_quest-0.5.1}/CONTRIBUTING.md +0 -0
  28. {protein_quest-0.4.0 → protein_quest-0.5.1}/LICENSE +0 -0
  29. {protein_quest-0.4.0 → protein_quest-0.5.1}/docs/CONTRIBUTING.md +0 -0
  30. {protein_quest-0.4.0 → protein_quest-0.5.1}/docs/index.md +0 -0
  31. {protein_quest-0.4.0 → protein_quest-0.5.1}/docs/notebooks/.gitignore +0 -0
  32. {protein_quest-0.4.0 → protein_quest-0.5.1}/docs/notebooks/alphafold.ipynb +0 -0
  33. {protein_quest-0.4.0 → protein_quest-0.5.1}/docs/notebooks/index.md +0 -0
  34. {protein_quest-0.4.0 → protein_quest-0.5.1}/docs/notebooks/pdbe.ipynb +0 -0
  35. {protein_quest-0.4.0 → protein_quest-0.5.1}/docs/notebooks/uniprot.ipynb +0 -0
  36. {protein_quest-0.4.0 → protein_quest-0.5.1}/docs/protein-quest-mcp.png +0 -0
  37. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/__init__.py +0 -0
  38. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/alphafold/__init__.py +0 -0
  39. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/alphafold/confidence.py +0 -0
  40. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/alphafold/entry_summary.py +0 -0
  41. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/converter.py +0 -0
  42. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/filters.py +0 -0
  43. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/go.py +0 -0
  44. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/parallel.py +0 -0
  45. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/pdbe/__init__.py +0 -0
  46. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/pdbe/io.py +0 -0
  47. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/py.typed +0 -0
  48. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/ss.py +0 -0
  49. {protein_quest-0.4.0 → protein_quest-0.5.1}/src/protein_quest/taxonomy.py +0 -0
  50. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/alphafold/AF-A1YPR0-F1-model_v4.pdb +0 -0
  51. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/alphafold/cassettes/test_fetch/test_fetch_many.yaml +0 -0
  52. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/alphafold/test_confidence.py +0 -0
  53. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/alphafold/test_entry_summary.py +0 -0
  54. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/alphafold/test_fetch.py +0 -0
  55. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/cassettes/test_emdb/test_fetch.yaml +0 -0
  56. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/cassettes/test_go/test_search_gene_ontology_term.yaml +0 -0
  57. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/cassettes/test_taxonomy/test_search_taxon.yaml +0 -0
  58. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/cassettes/test_taxonomy/test_search_taxon_by_id.yaml +0 -0
  59. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/cassettes/test_uniprot/test_search4af.yaml +0 -0
  60. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/cassettes/test_uniprot/test_search4emdb.yaml +0 -0
  61. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/cassettes/test_uniprot/test_search4pdb.yaml +0 -0
  62. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/cassettes/test_uniprot/test_search4uniprot.yaml +0 -0
  63. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/fixtures/3JRS_B2A.cif.gz +0 -0
  64. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/pdbe/cassettes/test_fetch/test_fetch.yaml +0 -0
  65. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/pdbe/fixtures/2y29.cif +0 -0
  66. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/pdbe/test_fetch.py +0 -0
  67. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/pdbe/test_io.py +0 -0
  68. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/test_cli.py +0 -0
  69. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/test_converter.py +0 -0
  70. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/test_emdb.py +0 -0
  71. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/test_go.py +0 -0
  72. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/test_mcp.py +0 -0
  73. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/test_ss.py +0 -0
  74. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/test_taxonomy.py +0 -0
  75. {protein_quest-0.4.0 → protein_quest-0.5.1}/tests/test_uniprot.py +0 -0

.github/workflows/ci.yml

@@ -27,20 +27,11 @@ jobs:
       - name: Run tests
         run: |
           uv run pytest --cov --cov-report=xml
-          echo $? > pytest-exitcode
-        continue-on-error: true
-        # Always upload coverage, even if tests fail
       - name: Run codacy-coverage-reporter
         uses: codacy/codacy-coverage-reporter-action@v1.3.0
         with:
           project-token: ${{ secrets.CODACY_PROJECT_TOKEN }}
           coverage-reports: coverage.xml
-      - name: Fail job if pytest failed
-        run: |
-          if [ -f pytest-exitcode ] && [ "$(cat pytest-exitcode)" -ne 0 ]; then
-            echo "Pytest failed, failing job."
-            exit 1
-          fi
   build:
     name: build
     runs-on: ubuntu-latest

.github/workflows/pages.yml

@@ -5,6 +5,7 @@ on:
     branches:
       - main
   workflow_dispatch:
+  pull_request:

 permissions:
   contents: read
@@ -13,7 +14,7 @@ permissions:

 # Only have one deployment in progress at a time
 concurrency:
-  group: "pages"
+  group: pages
   cancel-in-progress: true

 jobs:
@@ -32,6 +33,10 @@ jobs:
       - name: Build MkDocs site
         run: |
           uv run mkdocs build
+        env:
+          # Force colored output from rich library
+          TTY_COMPATIBLE: '1'
+          TTY_INTERACTIVE: '0'

       - name: Upload artifact
         uses: actions/upload-pages-artifact@v3
@@ -42,6 +47,9 @@ jobs:
     # Add a dependency to the build job
     needs: build

+    # Only deploy on pushes to main or manual trigger of main branch
+    if: github.ref == 'refs/heads/main'
+
     # Grant GITHUB_TOKEN the permissions required to make a Pages deployment
     permissions:
       pages: write # to deploy to Pages

.gitignore

@@ -73,4 +73,17 @@ venv.bak/
 /docs/pdb_files/
 /docs/density_filtered/
 /site
-/mysession/
+/mysession/
+# Paths generated in README.md examples
+uniprot_accs.txt
+pdbe.csv
+alphafold.csv
+emdbs.csv
+interaction-partners-of-Q05471.txt
+complexes.csv
+downloads-af/
+downloads-emdb/
+downloads-pdbe/
+filtered/
+filtered-chains/
+filtered-ss/

PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: protein_quest
-Version: 0.4.0
+Version: 0.5.1
 Summary: Search/retrieve/filter proteins and protein structures
 Project-URL: Homepage, https://github.com/haddocking/protein-quest
 Project-URL: Issues, https://github.com/haddocking/protein-quest/issues
@@ -17,6 +17,7 @@ Requires-Dist: cattrs[orjson]>=24.1.3
 Requires-Dist: dask>=2025.5.1
 Requires-Dist: distributed>=2025.5.1
 Requires-Dist: gemmi>=0.7.3
+Requires-Dist: platformdirs>=4.3.8
 Requires-Dist: psutil>=7.0.0
 Requires-Dist: rich-argparse>=1.7.1
 Requires-Dist: rich>=14.0.0
@@ -47,6 +48,10 @@ It uses
 - [gemmi](https://project-gemmi.github.io/) to work with macromolecular models.
 - [dask-distributed](https://docs.dask.org/en/latest/) to compute in parallel.

+The package is used by
+
+- [protein-detective](https://github.com/haddocking/protein-detective)
+
 An example workflow:

 ```mermaid
@@ -94,6 +99,9 @@ The main entry point is the `protein-quest` command line tool which has multiple

 To use programmaticly, see the [Jupyter notebooks](https://www.bonvinlab.org/protein-quest/notebooks) and [API documentation](https://www.bonvinlab.org/protein-quest/autoapi/summary/).

+While downloading or copying files it uses a global cache (located at `~/.cache/protein-quest`) and hardlinks to save disk space and improve speed.
+This behavior can be customized with the `--no-cache`, `--cache-dir`, and `--copy-method` command line arguments.
+
 ### Search Uniprot accessions

 ```shell

README.md

@@ -17,6 +17,10 @@ It uses
 - [gemmi](https://project-gemmi.github.io/) to work with macromolecular models.
 - [dask-distributed](https://docs.dask.org/en/latest/) to compute in parallel.

+The package is used by
+
+- [protein-detective](https://github.com/haddocking/protein-detective)
+
 An example workflow:

 ```mermaid
@@ -64,6 +68,9 @@ The main entry point is the `protein-quest` command line tool which has multiple

 To use programmaticly, see the [Jupyter notebooks](https://www.bonvinlab.org/protein-quest/notebooks) and [API documentation](https://www.bonvinlab.org/protein-quest/autoapi/summary/).

+While downloading or copying files it uses a global cache (located at `~/.cache/protein-quest`) and hardlinks to save disk space and improve speed.
+This behavior can be customized with the `--no-cache`, `--cache-dir`, and `--copy-method` command line arguments.
+
 ### Search Uniprot accessions

 ```shell
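
The cache described in this README addition is implemented by the new src/protein_quest/utils.py (+511 lines), which this diff only shows through its call sites. Below is a minimal sketch of how those pieces appear to fit together; the class and method names (DirectoryCacher, PassthroughCacher, user_cache_root_dir, copy_from_cache, write_bytes) are taken from the hunks further down, while the exact defaults and return values are assumptions rather than documented API.

```python
# Sketch only: names come from the call sites in this diff; defaults and
# return values are assumptions, not documented API.
import asyncio
from pathlib import Path

from protein_quest.utils import DirectoryCacher, PassthroughCacher, user_cache_root_dir


async def main(use_cache: bool = True) -> None:
    # Mirrors _initialize_cacher() in cli.py: a real cache dir, or a no-op fallback.
    cacher = (
        DirectoryCacher(cache_dir=user_cache_root_dir(), copy_method="hardlink")
        if use_cache
        else PassthroughCacher()  # CLI equivalent: --no-cache
    )

    target = Path("downloads-af/Q05471.json")
    target.parent.mkdir(parents=True, exist_ok=True)

    # If a file with this name was cached by an earlier run, reuse it...
    cached = await cacher.copy_from_cache(target)
    if cached is None:
        # ...otherwise write through the cacher so later runs can reuse the bytes.
        await cacher.write_bytes(target, b"...freshly downloaded bytes...")


asyncio.run(main())
```

On the command line the same choice is exposed through the new `--no-cache`, `--cache-dir`, and `--copy-method` options added by `_add_cacher_arguments` in cli.py below.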

mkdocs.yml

@@ -3,10 +3,6 @@ site_url: https://bonvinlab.org/protein_quest
 repo_name: haddocking/protein-quest
 repo_url: https://github.com/haddocking/protein-quest
 watch: [mkdocs.yml, README.md, src/protein_quest]
-exclude_docs: |
-  cli_doc_hook.py
-hooks:
-  - docs/cli_doc_hook.py
 use_directory_urls: false
 theme:
   name: material
@@ -61,6 +57,9 @@ plugins:
       remove_tag_config:
         remove_input_tags:
           - hide_code
+  - mkdocs-rich-argparse:
+      module: protein_quest.cli
+      factory: make_parser

 markdown_extensions:
   # Use to render part of README as home

pyproject.toml

@@ -20,6 +20,7 @@ dependencies = [
     "sparqlwrapper>=2.0.0",
     "tqdm>=4.67.1",
     "yarl>=1.20.1",
+    "platformdirs>=4.3.8",
 ]

 [project.urls]
@@ -57,6 +58,7 @@ docs = [
     "mkdocs-autoapi>=0.4.1",
     "mkdocs-jupyter>=0.25.1",
     "mkdocs-material>=9.6.14",
+    "mkdocs-rich-argparse>=0.1.2",
     "mkdocstrings[python]>=0.29.1",
 ]
 docs-type = [

src/protein_quest/__version__.py

@@ -1,2 +1,2 @@
-__version__ = "0.4.0"
+__version__ = "0.5.1"
 """The version of the package."""

src/protein_quest/alphafold/fetch.py

@@ -14,7 +14,7 @@ from yarl import URL

 from protein_quest.alphafold.entry_summary import EntrySummary
 from protein_quest.converter import converter
-from protein_quest.utils import friendly_session, retrieve_files, run_async
+from protein_quest.utils import Cacher, PassthroughCacher, friendly_session, retrieve_files, run_async

 logger = logging.getLogger(__name__)

@@ -104,7 +104,7 @@ class AlphaFoldEntry:


 async def fetch_summary(
-    qualifier: str, session: RetryClient, semaphore: Semaphore, save_dir: Path | None
+    qualifier: str, session: RetryClient, semaphore: Semaphore, save_dir: Path | None, cacher: Cacher
 ) -> list[EntrySummary]:
     """Fetches a summary from the AlphaFold database for a given qualifier.

@@ -116,6 +116,7 @@ async def fetch_summary(
         save_dir: An optional directory to save the fetched summary as a JSON file.
             If set and summary exists then summary will be loaded from disk instead of being fetched from the API.
             If not set then the summary will not be saved to disk and will always be fetched from the API.
+        cacher: A cacher to use for caching the fetched summary. Only used if save_dir is not None.

     Returns:
         A list of EntrySummary objects representing the fetched summary.
@@ -124,6 +125,11 @@ async def fetch_summary(
     fn: AsyncPath | None = None
     if save_dir is not None:
         fn = AsyncPath(save_dir / f"{qualifier}.json")
+        cached_file = await cacher.copy_from_cache(Path(fn))
+        if cached_file is not None:
+            logger.debug(f"Using cached file {cached_file} for summary of {qualifier}.")
+            raw_data = await AsyncPath(cached_file).read_bytes()
+            return converter.loads(raw_data, list[EntrySummary])
         if await fn.exists():
             logger.debug(f"File {fn} already exists. Skipping download from {url}.")
             raw_data = await fn.read_bytes()
@@ -133,18 +139,23 @@
         raw_data = await response.content.read()
         if fn is not None:
             # TODO return fn and make it part of AlphaFoldEntry as summary_file prop
-            await fn.write_bytes(raw_data)
+            await cacher.write_bytes(Path(fn), raw_data)
         return converter.loads(raw_data, list[EntrySummary])


 async def fetch_summaries(
-    qualifiers: Iterable[str], save_dir: Path | None = None, max_parallel_downloads: int = 5
+    qualifiers: Iterable[str],
+    save_dir: Path | None = None,
+    max_parallel_downloads: int = 5,
+    cacher: Cacher | None = None,
 ) -> AsyncGenerator[EntrySummary]:
     semaphore = Semaphore(max_parallel_downloads)
     if save_dir is not None:
         save_dir.mkdir(parents=True, exist_ok=True)
+    if cacher is None:
+        cacher = PassthroughCacher()
     async with friendly_session() as session:
-        tasks = [fetch_summary(qualifier, session, semaphore, save_dir) for qualifier in qualifiers]
+        tasks = [fetch_summary(qualifier, session, semaphore, save_dir, cacher) for qualifier in qualifiers]
         summaries_per_qualifier: list[list[EntrySummary]] = await tqdm.gather(
             *tasks, desc="Fetching Alphafold summaries"
         )
@@ -154,7 +165,11 @@ async def fetch_summaries(


 async def fetch_many_async(
-    uniprot_accessions: Iterable[str], save_dir: Path, what: set[DownloadableFormat], max_parallel_downloads: int = 5
+    uniprot_accessions: Iterable[str],
+    save_dir: Path,
+    what: set[DownloadableFormat],
+    max_parallel_downloads: int = 5,
+    cacher: Cacher | None = None,
 ) -> AsyncGenerator[AlphaFoldEntry]:
     """Asynchronously fetches summaries and files from
     [AlphaFold Protein Structure Database](https://alphafold.ebi.ac.uk/).
@@ -164,15 +179,17 @@ async def fetch_many_async(
         save_dir: The directory to save the fetched files to.
         what: A set of formats to download.
         max_parallel_downloads: The maximum number of parallel downloads.
+        cacher: A cacher to use for caching the fetched files. Only used if summary is in what set.

     Yields:
         A dataclass containing the summary, pdb file, and pae file.
     """
     save_dir_for_summaries = save_dir if "summary" in what and save_dir is not None else None
+
     summaries = [
         s
         async for s in fetch_summaries(
-            uniprot_accessions, save_dir_for_summaries, max_parallel_downloads=max_parallel_downloads
+            uniprot_accessions, save_dir_for_summaries, max_parallel_downloads=max_parallel_downloads, cacher=cacher
         )
     ]

@@ -183,6 +200,7 @@ async def fetch_many_async(
         save_dir,
         desc="Downloading AlphaFold files",
         max_parallel_downloads=max_parallel_downloads,
+        cacher=cacher,
     )
     for summary in summaries:
         yield AlphaFoldEntry(
@@ -236,7 +254,11 @@ def files_to_download(what: set[DownloadableFormat], summaries: Iterable[EntrySu


 def fetch_many(
-    ids: Iterable[str], save_dir: Path, what: set[DownloadableFormat], max_parallel_downloads: int = 5
+    ids: Iterable[str],
+    save_dir: Path,
+    what: set[DownloadableFormat],
+    max_parallel_downloads: int = 5,
+    cacher: Cacher | None = None,
 ) -> list[AlphaFoldEntry]:
     """Synchronously fetches summaries and pdb and pae files from AlphaFold Protein Structure Database.

@@ -245,6 +267,7 @@ def fetch_many(
         save_dir: The directory to save the fetched files to.
         what: A set of formats to download.
         max_parallel_downloads: The maximum number of parallel downloads.
+        cacher: A cacher to use for caching the fetched files. Only used if summary is in what set.

     Returns:
         A list of AlphaFoldEntry dataclasses containing the summary, pdb file, and pae file.
@@ -253,7 +276,9 @@ def fetch_many(
     async def gather_entries():
         return [
            entry
-           async for entry in fetch_many_async(ids, save_dir, what, max_parallel_downloads=max_parallel_downloads)
+           async for entry in fetch_many_async(
+               ids, save_dir, what, max_parallel_downloads=max_parallel_downloads, cacher=cacher
+           )
        ]

     return run_async(gather_entries())
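
A hedged usage sketch of the widened fetch_many() signature shown above; the accessions, directory names, and format strings are taken from elsewhere in this diff, while the surrounding script is illustrative rather than documented behaviour.

```python
# Illustrative only: the fetch_many and DirectoryCacher signatures come from the
# hunks in this diff; accessions, paths, and printed output are examples.
from pathlib import Path

from protein_quest.alphafold.fetch import fetch_many
from protein_quest.utils import DirectoryCacher, user_cache_root_dir

save_dir = Path("downloads-af")
save_dir.mkdir(parents=True, exist_ok=True)

cacher = DirectoryCacher(cache_dir=user_cache_root_dir(), copy_method="hardlink")
entries = fetch_many(
    ["A1YPR0", "Q05471"],     # UniProt accessions
    save_dir,
    what={"summary", "cif"},  # same formats _handle_retrieve_alphafold defaults to
    max_parallel_downloads=5,
    cacher=cacher,            # leave as None to skip the shared cache
)
print(f"{sum(entry.nr_of_files() for entry in entries)} files for {len(entries)} entries")
```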

src/protein_quest/cli.py

@@ -43,7 +43,15 @@ from protein_quest.uniprot import (
     search4pdb,
     search4uniprot,
 )
-from protein_quest.utils import CopyMethod, copy_methods, copyfile
+from protein_quest.utils import (
+    Cacher,
+    CopyMethod,
+    DirectoryCacher,
+    PassthroughCacher,
+    copy_methods,
+    copyfile,
+    user_cache_root_dir,
+)

 logger = logging.getLogger(__name__)

@@ -312,6 +320,7 @@ def _add_retrieve_pdbe_parser(subparsers: argparse._SubParsersAction):
         default=5,
         help="Maximum number of parallel downloads",
     )
+    _add_cacher_arguments(parser)


 def _add_retrieve_alphafold_parser(subparsers: argparse._SubParsersAction):
@@ -342,6 +351,7 @@ def _add_retrieve_alphafold_parser(subparsers: argparse._SubParsersAction):
         default=5,
         help="Maximum number of parallel downloads",
     )
+    _add_cacher_arguments(parser)


 def _add_retrieve_emdb_parser(subparsers: argparse._SubParsersAction):
@@ -361,22 +371,7 @@ def _add_retrieve_emdb_parser(subparsers: argparse._SubParsersAction):
         help="CSV file with `emdb_id` column. Other columns are ignored. Use `-` for stdin.",
     )
     parser.add_argument("output_dir", type=Path, help="Directory to store downloaded EMDB volume files")
-
-
-def _add_copy_method_argument(parser: argparse.ArgumentParser):
-    """Add copy method argument to parser."""
-    default_copy_method = "symlink"
-    if os.name == "nt":
-        # On Windows you need developer mode or admin privileges to create symlinks
-        # so we default to copying files instead of symlinking
-        default_copy_method = "copy"
-    parser.add_argument(
-        "--copy-method",
-        type=str,
-        choices=copy_methods,
-        default=default_copy_method,
-        help="How to copy files when no changes are needed to output file.",
-    )
+    _add_cacher_arguments(parser)


 def _add_filter_confidence_parser(subparsers: argparse._SubParsersAction):
@@ -409,7 +404,7 @@ def _add_filter_confidence_parser(subparsers: argparse._SubParsersAction):
             In CSV format with `<input_file>,<residue_count>,<passed>,<output_file>` columns.
             Use `-` for stdout."""),
     )
-    _add_copy_method_argument(parser)
+    _add_copy_method_arguments(parser)


 def _add_filter_chain_parser(subparsers: argparse._SubParsersAction):
@@ -449,7 +444,7 @@ def _add_filter_chain_parser(subparsers: argparse._SubParsersAction):
             If not provided, will create a local cluster.
             If set to `sequential` will run tasks sequentially."""),
     )
-    _add_copy_method_argument(parser)
+    _add_copy_method_arguments(parser)


 def _add_filter_residue_parser(subparsers: argparse._SubParsersAction):
@@ -472,7 +467,6 @@ def _add_filter_residue_parser(subparsers: argparse._SubParsersAction):
     )
     parser.add_argument("--min-residues", type=int, default=0, help="Min residues in chain A")
     parser.add_argument("--max-residues", type=int, default=10_000_000, help="Max residues in chain A")
-    _add_copy_method_argument(parser)
     parser.add_argument(
         "--write-stats",
         type=argparse.FileType("w", encoding="UTF-8"),
@@ -481,6 +475,7 @@ def _add_filter_residue_parser(subparsers: argparse._SubParsersAction):
             In CSV format with `<input_file>,<residue_count>,<passed>,<output_file>` columns.
             Use `-` for stdout."""),
     )
+    _add_copy_method_arguments(parser)


 def _add_filter_ss_parser(subparsers: argparse._SubParsersAction):
@@ -507,7 +502,6 @@ def _add_filter_ss_parser(subparsers: argparse._SubParsersAction):
     parser.add_argument("--ratio-max-helix-residues", type=float, help="Max residues in helices (relative)")
     parser.add_argument("--ratio-min-sheet-residues", type=float, help="Min residues in sheets (relative)")
     parser.add_argument("--ratio-max-sheet-residues", type=float, help="Max residues in sheets (relative)")
-    _add_copy_method_argument(parser)
     parser.add_argument(
         "--write-stats",
         type=argparse.FileType("w", encoding="UTF-8"),
@@ -518,6 +512,7 @@ def _add_filter_ss_parser(subparsers: argparse._SubParsersAction):
             Use `-` for stdout.
             """),
     )
+    _add_copy_method_arguments(parser)


 def _add_search_subcommands(subparsers: argparse._SubParsersAction):
@@ -585,6 +580,38 @@ def _add_mcp_command(subparsers: argparse._SubParsersAction):
     parser.add_argument("--port", default=8000, type=int, help="Port to bind the server to")


+def _add_copy_method_arguments(parser):
+    parser.add_argument(
+        "--copy-method",
+        type=str,
+        choices=copy_methods,
+        default="hardlink",
+        help=dedent("""\
+            How to make target file be same file as source file.
+            By default uses hardlinks to save disk space.
+            Note that hardlinks only work within the same filesystem and are harder to track.
+            If you want to track cached files easily then use 'symlink'.
+            On Windows you need developer mode or admin privileges to create symlinks.
+            """),
+    )
+
+
+def _add_cacher_arguments(parser: argparse.ArgumentParser):
+    """Add cacher arguments to parser."""
+    parser.add_argument(
+        "--no-cache",
+        action="store_true",
+        help="Disable caching of files to central location.",
+    )
+    parser.add_argument(
+        "--cache-dir",
+        type=Path,
+        default=user_cache_root_dir(),
+        help="Directory to use as cache for files.",
+    )
+    _add_copy_method_arguments(parser)
+
+
 def make_parser() -> argparse.ArgumentParser:
     parser = argparse.ArgumentParser(
         description="Protein Quest CLI", prog="protein-quest", formatter_class=ArgumentDefaultsRichHelpFormatter
@@ -742,14 +769,26 @@ def _handle_search_complexes(args: argparse.Namespace):
     _write_complexes_csv(results, output_csv)


-def _handle_retrieve_pdbe(args):
+def _initialize_cacher(args: argparse.Namespace) -> Cacher:
+    if args.no_cache:
+        return PassthroughCacher()
+    return DirectoryCacher(
+        cache_dir=args.cache_dir,
+        copy_method=args.copy_method,
+    )
+
+
+def _handle_retrieve_pdbe(args: argparse.Namespace):
     pdbe_csv = args.pdbe_csv
     output_dir = args.output_dir
     max_parallel_downloads = args.max_parallel_downloads
+    cacher = _initialize_cacher(args)

     pdb_ids = _read_column_from_csv(pdbe_csv, "pdb_id")
     rprint(f"Retrieving {len(pdb_ids)} PDBe entries")
-    result = asyncio.run(pdbe_fetch.fetch(pdb_ids, output_dir, max_parallel_downloads=max_parallel_downloads))
+    result = asyncio.run(
+        pdbe_fetch.fetch(pdb_ids, output_dir, max_parallel_downloads=max_parallel_downloads, cacher=cacher)
+    )
     rprint(f"Retrieved {len(result)} PDBe entries")


@@ -758,6 +797,7 @@ def _handle_retrieve_alphafold(args):
     what_formats = args.what_formats
     alphafold_csv = args.alphafold_csv
     max_parallel_downloads = args.max_parallel_downloads
+    cacher = _initialize_cacher(args)

     if what_formats is None:
         what_formats = {"summary", "cif"}
@@ -767,7 +807,9 @@ def _handle_retrieve_alphafold(args):
     af_ids = _read_column_from_csv(alphafold_csv, "af_id")
     validated_what: set[DownloadableFormat] = structure(what_formats, set[DownloadableFormat])
     rprint(f"Retrieving {len(af_ids)} AlphaFold entries with formats {validated_what}")
-    afs = af_fetch(af_ids, download_dir, what=validated_what, max_parallel_downloads=max_parallel_downloads)
+    afs = af_fetch(
+        af_ids, download_dir, what=validated_what, max_parallel_downloads=max_parallel_downloads, cacher=cacher
+    )
     total_nr_files = sum(af.nr_of_files() for af in afs)
     rprint(f"Retrieved {total_nr_files} AlphaFold files and {len(afs)} summaries, written to {download_dir}")

@@ -775,10 +817,11 @@
 def _handle_retrieve_emdb(args):
     emdb_csv = args.emdb_csv
     output_dir = args.output_dir
+    cacher = _initialize_cacher(args)

     emdb_ids = _read_column_from_csv(emdb_csv, "emdb_id")
     rprint(f"Retrieving {len(emdb_ids)} EMDB entries")
-    result = asyncio.run(emdb_fetch(emdb_ids, output_dir))
+    result = asyncio.run(emdb_fetch(emdb_ids, output_dir, cacher=cacher))
     rprint(f"Retrieved {len(result)} EMDB entries")


src/protein_quest/emdb.py

@@ -3,7 +3,7 @@
 from collections.abc import Iterable, Mapping
 from pathlib import Path

-from protein_quest.utils import retrieve_files
+from protein_quest.utils import Cacher, retrieve_files


 def _map_id2volume_url(emdb_id: str) -> tuple[str, str]:
@@ -13,13 +13,16 @@ def _map_id2volume_url(emdb_id: str) -> tuple[str, str]:
     return url, fn


-async def fetch(emdb_ids: Iterable[str], save_dir: Path, max_parallel_downloads: int = 1) -> Mapping[str, Path]:
+async def fetch(
+    emdb_ids: Iterable[str], save_dir: Path, max_parallel_downloads: int = 1, cacher: Cacher | None = None
+) -> Mapping[str, Path]:
     """Fetches volume files from the EMDB database.

     Args:
         emdb_ids: A list of EMDB IDs to fetch.
         save_dir: The directory to save the downloaded files.
         max_parallel_downloads: The maximum number of parallel downloads.
+        cacher: An optional cacher to use for caching downloaded files.

     Returns:
         A mapping of EMDB IDs to their downloaded files.
@@ -30,5 +33,5 @@ async def fetch(emdb_ids: Iterable[str], save_dir: Path, max_parallel_downloads:

     # TODO show progress of each item
     # TODO handle failed downloads, by skipping them instead of raising an error
-    await retrieve_files(urls, save_dir, max_parallel_downloads, desc="Downloading EMDB volume files")
+    await retrieve_files(urls, save_dir, max_parallel_downloads, desc="Downloading EMDB volume files", cacher=cacher)
     return id2paths

src/protein_quest/mcp_server.py

@@ -32,6 +32,7 @@ Examples:

 """

+from collections.abc import Mapping
 from pathlib import Path
 from textwrap import dedent
 from typing import Annotated
@@ -89,7 +90,18 @@ def search_pdb(
     return search4pdb(uniprot_accs, limit=limit)


-mcp.tool(pdbe_fetch, name="fetch_pdbe_structures")
+@mcp.tool
+async def fetch_pdbe_structures(pdb_ids: set[str], save_dir: Path) -> Mapping[str, Path]:
+    """Fetch the PDBe structures for given PDB IDs.
+
+    Args:
+        pdb_ids: A set of PDB IDs.
+        save_dir: The directory to save the fetched files.
+
+    Returns:
+        A mapping of PDB ID to the path of the fetched structure file.
+    """
+    return await pdbe_fetch(pdb_ids, save_dir)


 @mcp.tool
@@ -163,7 +175,17 @@ def fetch_alphafold_structures(uniprot_accs: set[str], save_dir: Path) -> list[A
     return alphafold_fetch(uniprot_accs, save_dir, what)


-mcp.tool(emdb_fetch, name="fetch_emdb_volumes")
+@mcp.tool
+async def fetch_emdb_volumes(emdb_ids: set[str], save_dir: Path) -> Mapping[str, Path]:
+    """Fetch EMDB volumes for given EMDB IDs.
+
+    Args:
+        emdb_ids: A set of EMDB IDs.
+        save_dir: The directory to save the fetched files.
+    Returns:
+        A mapping of EMDB ID to the path of the fetched volume file.
+    """
+    return await emdb_fetch(emdb_ids=emdb_ids, save_dir=save_dir)


 @mcp.tool

src/protein_quest/pdbe/fetch.py

@@ -3,7 +3,7 @@
 from collections.abc import Iterable, Mapping
 from pathlib import Path

-from protein_quest.utils import retrieve_files, run_async
+from protein_quest.utils import Cacher, retrieve_files, run_async


 def _map_id_mmcif(pdb_id: str) -> tuple[str, str]:
@@ -28,13 +28,16 @@ def _map_id_mmcif(pdb_id: str) -> tuple[str, str]:
     return url, fn


-async def fetch(ids: Iterable[str], save_dir: Path, max_parallel_downloads: int = 5) -> Mapping[str, Path]:
+async def fetch(
+    ids: Iterable[str], save_dir: Path, max_parallel_downloads: int = 5, cacher: Cacher | None = None
+) -> Mapping[str, Path]:
     """Fetches mmCIF files from the PDBe database.

     Args:
         ids: A set of PDB IDs to fetch.
         save_dir: The directory to save the fetched mmCIF files to.
         max_parallel_downloads: The maximum number of parallel downloads.
+        cacher: An optional cacher to use for caching downloaded files.

     Returns:
         A dict of id and paths to the downloaded mmCIF files.
@@ -47,7 +50,7 @@ async def fetch(ids: Iterable[str], save_dir: Path, max_parallel_downloads: int
     urls = list(id2urls.values())
     id2paths = {pdb_id: save_dir / fn for pdb_id, (_, fn) in id2urls.items()}

-    await retrieve_files(urls, save_dir, max_parallel_downloads, desc="Downloading PDBe mmCIF files")
+    await retrieve_files(urls, save_dir, max_parallel_downloads, desc="Downloading PDBe mmCIF files", cacher=cacher)
     return id2paths

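
Both the EMDB and PDBe fetchers now follow the same pattern: an optional cacher that is threaded through to retrieve_files. A sketch of calling the PDBe fetcher directly, assuming only the signature shown above; the PDB IDs come from this release's test fixtures and the output directory is an example.

```python
# Sketch only: the fetch() signature comes from the hunk above; whether the
# library creates save_dir itself is not visible here, so it is created first.
import asyncio
from pathlib import Path

from protein_quest.pdbe.fetch import fetch as pdbe_fetch
from protein_quest.utils import DirectoryCacher, user_cache_root_dir


async def main() -> None:
    save_dir = Path("downloads-pdbe")
    save_dir.mkdir(parents=True, exist_ok=True)
    cacher = DirectoryCacher(cache_dir=user_cache_root_dir(), copy_method="symlink")
    id2path = await pdbe_fetch(
        ["2y29", "3jrs"],
        save_dir,
        max_parallel_downloads=5,
        cacher=cacher,  # cacher=None (the default) bypasses the shared cache
    )
    for pdb_id, path in id2path.items():
        print(pdb_id, "->", path)


asyncio.run(main())
```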

src/protein_quest/uniprot.py

@@ -525,7 +525,9 @@ def _build_complex_sparql_query(uniprot_accs: Iterable[str], limit: int) -> str:
         ?protein
         ?cp_db
         ?cp_comment
-        (GROUP_CONCAT(DISTINCT ?member; separator=",") AS ?complex_members)
+        (GROUP_CONCAT(
+            DISTINCT STRAFTER(STR(?member), "http://purl.uniprot.org/uniprot/"); separator=","
+        ) AS ?complex_members)
         (COUNT(DISTINCT ?member) AS ?member_count)
     WHERE {
         # Input UniProt accessions
@@ -550,7 +552,9 @@ def _build_complex_sparql_query(uniprot_accs: Iterable[str], limit: int) -> str:
     """
     select_clause = dedent("""\
         ?protein ?cp_db ?cp_comment
-        (GROUP_CONCAT(DISTINCT ?member; separator=",") AS ?complex_members)
+        (GROUP_CONCAT(
+            DISTINCT STRAFTER(STR(?member), "http://purl.uniprot.org/uniprot/"); separator=","
+        ) AS ?complex_members)
     """)
     where_clause = dedent("""
         # --- Complex Info ---
@@ -596,7 +600,7 @@ def _flatten_results_complex(raw_results) -> list[ComplexPortalEntry]:
         complex_id = raw_result["cp_db"]["value"].split("/")[-1]
         complex_url = f"https://www.ebi.ac.uk/complexportal/complex/{complex_id}"
         complex_title = raw_result.get("cp_comment", {}).get("value", "")
-        members = {m.split("/")[-1] for m in raw_result["complex_members"]["value"].split(",")}
+        members = set(raw_result["complex_members"]["value"].split(","))
         results.append(
             ComplexPortalEntry(
                 query_protein=query_protein,