PyPI - protein-quest - Versions diffs - 0.3.2__tar.gz → 0.5.0__tar.gz - Mend

protein-quest 0.3.2 tar.gz → 0.5.0 tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of protein-quest might be problematic.

Files changed (75)

{protein_quest-0.3.2 → protein_quest-0.5.0}/.github/workflows/ci.yml RENAMED

@@ -27,20 +27,11 @@ jobs:
       - name: Run tests
         run: |
           uv run pytest --cov --cov-report=xml
-          echo $? > pytest-exitcode
-        continue-on-error: true
-      # Always upload coverage, even if tests fail
       - name: Run codacy-coverage-reporter
         uses: codacy/codacy-coverage-reporter-action@v1.3.0
         with:
           project-token: ${{ secrets.CODACY_PROJECT_TOKEN }}
           coverage-reports: coverage.xml
-      - name: Fail job if pytest failed
-        run: |
-          if [ -f pytest-exitcode ] && [ "$(cat pytest-exitcode)" -ne 0 ]; then
-            echo "Pytest failed, failing job."
-            exit 1
-          fi
   build:
     name: build
     runs-on: ubuntu-latest

{protein_quest-0.3.2 → protein_quest-0.5.0}/.github/workflows/pages.yml RENAMED

@@ -5,6 +5,7 @@ on:
     branches:
       - main
   workflow_dispatch:
+  pull_request:
 permissions:
   contents: read
@@ -13,7 +14,7 @@ permissions:
 # Only have one deployment in progress at a time
 concurrency:
-  group: "pages"
+  group: pages
   cancel-in-progress: true
 jobs:
@@ -32,6 +33,10 @@ jobs:
       - name: Build MkDocs site
         run: |
           uv run mkdocs build
+        env:
+          # Force colored output from rich library
+          TTY_COMPATIBLE: '1'
+          TTY_INTERACTIVE: '0'
       - name: Upload artifact
         uses: actions/upload-pages-artifact@v3
@@ -42,6 +47,9 @@ jobs:
     # Add a dependency to the build job
     needs: build
+    # Only deploy on pushes to main or manual trigger of main branch
+    if: github.ref == 'refs/heads/main'
     # Grant GITHUB_TOKEN the permissions required to make a Pages deployment
     permissions:
       pages: write      # to deploy to Pages

{protein_quest-0.3.2 → protein_quest-0.5.0}/.gitignore RENAMED

@@ -73,4 +73,17 @@ venv.bak/
 /docs/pdb_files/
 /docs/density_filtered/
 /site
-/mysession/
+/mysession/
+# Paths generated in README.md examples
+uniprot_accs.txt
+pdbe.csv
+alphafold.csv
+emdbs.csv
+interaction-partners-of-Q05471.txt
+complexes.csv
+downloads-af/
+downloads-emdb/
+downloads-pdbe/
+filtered/
+filtered-chains/
+filtered-ss/

{protein_quest-0.3.2 → protein_quest-0.5.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: protein_quest
-Version: 0.3.2
+Version: 0.5.0
 Summary: Search/retrieve/filter proteins and protein structures
 Project-URL: Homepage, https://github.com/haddocking/protein-quest
 Project-URL: Issues, https://github.com/haddocking/protein-quest/issues
@@ -17,6 +17,7 @@ Requires-Dist: cattrs[orjson]>=24.1.3
 Requires-Dist: dask>=2025.5.1
 Requires-Dist: distributed>=2025.5.1
 Requires-Dist: gemmi>=0.7.3
+Requires-Dist: platformdirs>=4.3.8
 Requires-Dist: psutil>=7.0.0
 Requires-Dist: rich-argparse>=1.7.1
 Requires-Dist: rich>=14.0.0
@@ -47,6 +48,10 @@ It uses
 - [gemmi](https://project-gemmi.github.io/) to work with macromolecular models.
 - [dask-distributed](https://docs.dask.org/en/latest/) to compute in parallel.
+The package is used by
+- [protein-detective](https://github.com/haddocking/protein-detective)
 An example workflow:
 ```mermaid
@@ -56,12 +61,14 @@ graph TB;
     searchuniprot --> |uniprot_accessions|searchpdbe[/Search PDBe/]
     searchuniprot --> |uniprot_accessions|searchaf[/Search Alphafold/]
     searchuniprot -. uniprot_accessions .-> searchemdb[/Search EMDB/]
+    searchintactionpartners[/Search interaction partners/] -.-x |uniprot_accessions|searchuniprot
+    searchcomplexes[/Search complexes/]
     searchpdbe -->|pdb_ids|fetchpdbe[Retrieve PDBe]
     searchaf --> |uniprot_accessions|fetchad(Retrieve AlphaFold)
     searchemdb -. emdb_ids .->fetchemdb[Retrieve EMDB]
-    fetchpdbe -->|mmcif_files_with_uniprot_acc| chainfilter{{Filter on chain of uniprot}}
+    fetchpdbe -->|mmcif_files| chainfilter{{Filter on chain of uniprot}}
     chainfilter --> |mmcif_files| residuefilter{{Filter on chain length}}
-    fetchad -->|pdb_files| confidencefilter{{Filter out low confidence}}
+    fetchad -->|mmcif_files| confidencefilter{{Filter out low confidence}}
     confidencefilter --> |mmcif_files| ssfilter{{Filter on secondary structure}}
     residuefilter --> |mmcif_files| ssfilter
     classDef dashedBorder stroke-dasharray: 5 5;
@@ -69,6 +76,8 @@ graph TB;
     taxonomy:::dashedBorder
     searchemdb:::dashedBorder
     fetchemdb:::dashedBorder
+    searchintactionpartners:::dashedBorder
+    searchcomplexes:::dashedBorder
 ```
 (Dotted nodes and edges are side-quests.)
@@ -90,6 +99,9 @@ The main entry point is the `protein-quest` command line tool which has multiple
 To use programmaticly, see the [Jupyter notebooks](https://www.bonvinlab.org/protein-quest/notebooks) and [API documentation](https://www.bonvinlab.org/protein-quest/autoapi/summary/).
+While downloading or copying files it uses a global cache (located at `~/.cache/protein-quest`) and hardlinks to save disk space and improve speed.
+This behavior can be customized with the `--no-cache`, `--cache-dir`, and `--copy-method` command line arguments.
 ### Search Uniprot accessions
 ```shell
@@ -204,6 +216,32 @@ You can use following command to search for a Gene Ontology (GO) term.
 protein-quest search go --limit 5 --aspect cellular_component apoptosome -
 ```
+### Search for interaction partners
+Use https://www.ebi.ac.uk/complexportal to find interaction partners of given UniProt accession.
+```shell
+protein-quest search interaction-partners Q05471 interaction-partners-of-Q05471.txt
+```
+The `interaction-partners-of-Q05471.txt` file contains uniprot accessions (one per line).
+### Search for complexes
+Given Uniprot accessions search for macromolecular complexes at https://www.ebi.ac.uk/complexportal
+and return the complex entries and their members.
+```shell
+echo Q05471 | protein-quest search complexes - complexes.csv
+```
+The `complexes.csv` looks like
+```csv
+query_protein,complex_id,complex_url,complex_title,members
+Q05471,CPX-2122,https://www.ebi.ac.uk/complexportal/complex/CPX-2122,Swr1 chromatin remodelling complex,P31376;P35817;P38326;P53201;P53930;P60010;P80428;Q03388;Q03433;Q03940;Q05471;Q06707;Q12464;Q12509
+```
 ##  Model Context Protocol (MCP) server
 Protein quest can also help LLMs like Claude Sonnet 4 by providing a [set of tools](https://modelcontextprotocol.io/docs/learn/server-concepts#tools-ai-actions) for protein structures.
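The README paragraph added in this release says that downloads and copies go through a global cache and are hardlinked into place, and that `--no-cache`, `--cache-dir` and `--copy-method` change this behavior. The following is a minimal sketch of the hardlink-with-copy-fallback idea in plain Python, not protein-quest's implementation: the function names, keying cache entries by file name, and the `"hardlink"`/`"copy"` values standing in for `--copy-method` are assumptions made for illustration.

```python
import os
import shutil
from pathlib import Path

# Hypothetical helpers illustrating a hardlinking file cache (not protein-quest's API).


def place_from_cache(cache_dir: Path, target: Path, copy_method: str = "hardlink") -> bool:
    """Put the cached copy of `target` in place; return False on a cache miss.

    `copy_method` is an assumed stand-in for `--copy-method`:
    "hardlink" shares the data blocks, "copy" duplicates them.
    """
    if copy_method not in ("hardlink", "copy"):
        raise ValueError(f"unknown copy method: {copy_method!r}")
    cached = cache_dir / target.name  # assumption: cache entries keyed by file name
    if not cached.is_file():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    target.unlink(missing_ok=True)  # os.link refuses to overwrite an existing file
    if copy_method == "hardlink":
        try:
            os.link(cached, target)
            return True
        except OSError:
            pass  # e.g. cache and target on different filesystems: fall back to copying
    shutil.copyfile(cached, target)
    return True


def store_in_cache(cache_dir: Path, target: Path, data: bytes, copy_method: str = "hardlink") -> None:
    """Save freshly downloaded bytes in the cache, then link or copy them to `target`."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / target.name).write_bytes(data)
    place_from_cache(cache_dir, target, copy_method)
```

A hardlink gives the cached file and the target the same data blocks, so placing an already cached entry costs no extra disk space and no network traffic. Hardlinks cannot cross filesystems (`os.link` raises `OSError` in that case), which is why the sketch falls back to a plain copy.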

{protein_quest-0.3.2 → protein_quest-0.5.0}/README.md RENAMED

@@ -17,6 +17,10 @@ It uses
 - [gemmi](https://project-gemmi.github.io/) to work with macromolecular models.
 - [dask-distributed](https://docs.dask.org/en/latest/) to compute in parallel.
+The package is used by
+- [protein-detective](https://github.com/haddocking/protein-detective)
 An example workflow:
 ```mermaid
@@ -26,12 +30,14 @@ graph TB;
     searchuniprot --> |uniprot_accessions|searchpdbe[/Search PDBe/]
     searchuniprot --> |uniprot_accessions|searchaf[/Search Alphafold/]
     searchuniprot -. uniprot_accessions .-> searchemdb[/Search EMDB/]
+    searchintactionpartners[/Search interaction partners/] -.-x |uniprot_accessions|searchuniprot
+    searchcomplexes[/Search complexes/]
     searchpdbe -->|pdb_ids|fetchpdbe[Retrieve PDBe]
     searchaf --> |uniprot_accessions|fetchad(Retrieve AlphaFold)
     searchemdb -. emdb_ids .->fetchemdb[Retrieve EMDB]
-    fetchpdbe -->|mmcif_files_with_uniprot_acc| chainfilter{{Filter on chain of uniprot}}
+    fetchpdbe -->|mmcif_files| chainfilter{{Filter on chain of uniprot}}
     chainfilter --> |mmcif_files| residuefilter{{Filter on chain length}}
-    fetchad -->|pdb_files| confidencefilter{{Filter out low confidence}}
+    fetchad -->|mmcif_files| confidencefilter{{Filter out low confidence}}
     confidencefilter --> |mmcif_files| ssfilter{{Filter on secondary structure}}
     residuefilter --> |mmcif_files| ssfilter
     classDef dashedBorder stroke-dasharray: 5 5;
@@ -39,6 +45,8 @@ graph TB;
     taxonomy:::dashedBorder
     searchemdb:::dashedBorder
     fetchemdb:::dashedBorder
+    searchintactionpartners:::dashedBorder
+    searchcomplexes:::dashedBorder
 ```
 (Dotted nodes and edges are side-quests.)
@@ -60,6 +68,9 @@ The main entry point is the `protein-quest` command line tool which has multiple
 To use programmaticly, see the [Jupyter notebooks](https://www.bonvinlab.org/protein-quest/notebooks) and [API documentation](https://www.bonvinlab.org/protein-quest/autoapi/summary/).
+While downloading or copying files it uses a global cache (located at `~/.cache/protein-quest`) and hardlinks to save disk space and improve speed.
+This behavior can be customized with the `--no-cache`, `--cache-dir`, and `--copy-method` command line arguments.
 ### Search Uniprot accessions
 ```shell
@@ -174,6 +185,32 @@ You can use following command to search for a Gene Ontology (GO) term.
 protein-quest search go --limit 5 --aspect cellular_component apoptosome -
 ```
+### Search for interaction partners
+Use https://www.ebi.ac.uk/complexportal to find interaction partners of given UniProt accession.
+```shell
+protein-quest search interaction-partners Q05471 interaction-partners-of-Q05471.txt
+```
+The `interaction-partners-of-Q05471.txt` file contains uniprot accessions (one per line).
+### Search for complexes
+Given Uniprot accessions search for macromolecular complexes at https://www.ebi.ac.uk/complexportal
+and return the complex entries and their members.
+```shell
+echo Q05471 | protein-quest search complexes - complexes.csv
+```
+The `complexes.csv` looks like
+```csv
+query_protein,complex_id,complex_url,complex_title,members
+Q05471,CPX-2122,https://www.ebi.ac.uk/complexportal/complex/CPX-2122,Swr1 chromatin remodelling complex,P31376;P35817;P38326;P53201;P53930;P60010;P80428;Q03388;Q03433;Q03940;Q05471;Q06707;Q12464;Q12509
+```
 ##  Model Context Protocol (MCP) server
 Protein quest can also help LLMs like Claude Sonnet 4 by providing a [set of tools](https://modelcontextprotocol.io/docs/learn/server-concepts#tools-ai-actions) for protein structures.
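The README shows that `complexes.csv` holds one row per query protein and complex, with all member accessions packed into a single `;`-separated column (in the example row they appear in sorted order). Below is a hedged sketch of producing that layout with the standard `csv` module. `ComplexRow` and `write_complexes_csv` are hypothetical names whose fields are copied from the `ComplexPortalEntry` repr printed in the notebook, and sorting the members is an assumption based on the example row.

```python
from __future__ import annotations

import csv
from collections.abc import Iterable
from dataclasses import dataclass
from typing import TextIO


@dataclass(frozen=True)
class ComplexRow:
    """Hypothetical stand-in with the same fields as the notebook's ComplexPortalEntry."""

    query_protein: str
    complex_id: str
    complex_url: str
    complex_title: str
    members: frozenset[str]


FIELDNAMES = ["query_protein", "complex_id", "complex_url", "complex_title", "members"]


def write_complexes_csv(rows: Iterable[ComplexRow], out: TextIO) -> None:
    """Write one CSV row per (query protein, complex) pair, members joined with ';'."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {
                "query_protein": row.query_protein,
                "complex_id": row.complex_id,
                "complex_url": row.complex_url,
                "complex_title": row.complex_title,
                # sets are unordered, so sort to get a stable column value
                "members": ";".join(sorted(row.members)),
            }
        )
```

Packing the members into one column keeps the file at one row per complex; a consumer can recover the set with `row["members"].split(";")`.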

{protein_quest-0.3.2 → protein_quest-0.5.0}/docs/notebooks/uniprot.ipynb RENAMED

@@ -12,7 +12,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 23,
+   "execution_count": 1,
    "id": "85674583",
    "metadata": {},
    "outputs": [],
@@ -282,6 +282,99 @@
     "first_uniprot = next(iter(uniprot_accessions.items()))\n",
     "pprint(first_uniprot)"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e32a95f8",
+   "metadata": {},
+   "source": [
+    "## Find interaction partners for uniprot entries"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "d035c702",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from protein_quest.uniprot import search4interaction_partners, search4macromolecular_complexes"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "601c690a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Helicase SWR1 in yeast\n",
+    "uniprot_accession = \"Q05471\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "173c764d",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'Q12464': {'CPX-2122'},\n",
+       " 'P35817': {'CPX-2122'},\n",
+       " 'P80428': {'CPX-2122'},\n",
+       " 'Q12509': {'CPX-2122'},\n",
+       " 'Q03388': {'CPX-2122'},\n",
+       " 'P53201': {'CPX-2122'},\n",
+       " 'P53930': {'CPX-2122'},\n",
+       " 'P60010': {'CPX-2122'},\n",
+       " 'Q03433': {'CPX-2122'},\n",
+       " 'Q06707': {'CPX-2122'},\n",
+       " 'P38326': {'CPX-2122'},\n",
+       " 'P31376': {'CPX-2122'},\n",
+       " 'Q03940': {'CPX-2122'}}"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "partners = search4interaction_partners(uniprot_accession, limit=100)\n",
+    "partners"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a763b6f8",
+   "metadata": {},
+   "source": [
+    "To get more information about the complex you can search for the complexes themselves with:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "236050ea",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[ComplexPortalEntry(query_protein='Q05471', complex_id='CPX-2122', complex_url='https://www.ebi.ac.uk/complexportal/complex/CPX-2122', complex_title='Swr1 chromatin remodelling complex', members={'P35817', 'Q05471', 'Q12464', 'Q12509', 'Q06707', 'Q03433', 'P38326', 'P53201', 'Q03388', 'P53930', 'P80428', 'Q03940', 'P60010', 'P31376'})]"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "complexes = search4macromolecular_complexes([uniprot_accession])\n",
+    "complexes"
+   ]
   }
  ],
  "metadata": {
@@ -300,7 +393,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.13.2"
+   "version": "3.13.5"
   }
  },
  "nbformat": 4,

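The new notebook cells show that `search4interaction_partners` returns a mapping from each partner accession to the set of Complex Portal IDs it shares with the query (here every partner maps to `{'CPX-2122'}`). A small sketch, assuming only that dict shape, of two common follow-ups: grouping partners per complex, and writing one accession per line as the README describes for the `search interaction-partners` output file. Whether the CLI sorts the accessions is not stated, so the sorting here is a choice of the sketch.

```python
from __future__ import annotations

from collections import defaultdict


def partners_by_complex(partners: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert {partner_accession: {complex_id, ...}} into {complex_id: [partner_accession, ...]}."""
    grouped: defaultdict[str, set[str]] = defaultdict(set)
    for accession, complex_ids in partners.items():
        for complex_id in complex_ids:
            grouped[complex_id].add(accession)
    return {complex_id: sorted(accessions) for complex_id, accessions in sorted(grouped.items())}


def partners_as_lines(partners: dict[str, set[str]]) -> str:
    """Render the partner accessions one per line, sorted for reproducible output."""
    return "".join(f"{accession}\n" for accession in sorted(partners))
```

Applied to the notebook's `partners`, `partners_by_complex` would yield a single `'CPX-2122'` key listing all 13 partners. The query protein itself (Q05471) is not among them, whereas it does appear in the `members` of the complex entry.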
{protein_quest-0.3.2 → protein_quest-0.5.0}/mkdocs.yml RENAMED

@@ -3,10 +3,6 @@ site_url: https://bonvinlab.org/protein_quest
 repo_name: haddocking/protein-quest
 repo_url: https://github.com/haddocking/protein-quest
 watch: [mkdocs.yml, README.md, src/protein_quest]
-exclude_docs: |
-  cli_doc_hook.py
-hooks:
-  - docs/cli_doc_hook.py
 use_directory_urls: false
 theme:
   name: material
@@ -61,6 +57,9 @@ plugins:
         remove_tag_config:
             remove_input_tags:
                 - hide_code
+  - mkdocs-rich-argparse:
+      module: protein_quest.cli
+      factory: make_parser
 markdown_extensions:
   # Use to render part of README as home
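The `mkdocs-rich-argparse` plugin is configured with `module: protein_quest.cli` and `factory: make_parser`, which suggests it imports that module and calls `make_parser()` to obtain the parser whose help it renders; it appears to replace the removed `cli_doc_hook.py` hook. A factory is useful here because building a parser has no side effects, so documentation tooling can import it without running the CLI. Below is a trimmed, hypothetical sketch of such a factory using plain `argparse`, covering only the two commands added to the README in this release; the help texts and argument names are illustrative, not copied from protein-quest.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Build (but do not run) the command line parser, so doc tools can import it safely."""
    parser = argparse.ArgumentParser(
        prog="protein-quest",
        description="Search/retrieve/filter proteins and protein structures",
    )
    commands = parser.add_subparsers(dest="command", required=True)
    search = commands.add_parser("search", help="Search for proteins and related entries")
    search_commands = search.add_subparsers(dest="search_command", required=True)

    # Hypothetical argument names mirroring the README example commands
    partners = search_commands.add_parser(
        "interaction-partners", help="Find interaction partners of a UniProt accession"
    )
    partners.add_argument("uniprot_accession", help="UniProt accession, e.g. Q05471")
    partners.add_argument("output_file", help="Text file to write, one accession per line")

    complexes = search_commands.add_parser(
        "complexes", help="Find macromolecular complexes for UniProt accessions"
    )
    complexes.add_argument(
        "uniprot_accessions", help="File with one accession per line, or '-' for stdin"
    )
    complexes.add_argument("output_csv", help="CSV file to write")
    return parser
```

The real CLI depends on `rich-argparse`, so it presumably passes a formatter such as `rich_argparse.RichHelpFormatter` as `formatter_class`; the sketch omits that to stay within the standard library.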

{protein_quest-0.3.2 → protein_quest-0.5.0}/pyproject.toml RENAMED

@@ -20,6 +20,7 @@ dependencies = [
     "sparqlwrapper>=2.0.0",
     "tqdm>=4.67.1",
     "yarl>=1.20.1",
+    "platformdirs>=4.3.8",
 ]
 [project.urls]
@@ -52,10 +53,12 @@ dev = [
 ]
 docs = [
     "ipykernel>=6.29.5", # For notebook support in VS Code
+    "ipywidgets", # For tqdm support in notebooks
     "mkdocs>=1.6.1",
     "mkdocs-autoapi>=0.4.1",
     "mkdocs-jupyter>=0.25.1",
     "mkdocs-material>=9.6.14",
+    "mkdocs-rich-argparse>=0.1.2",
     "mkdocstrings[python]>=0.29.1",
 ]
 docs-type = [
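This release adds `platformdirs` as a runtime dependency, alongside the README note that the cache lives at `~/.cache/protein-quest`. On Linux, `platformdirs.user_cache_dir("protein-quest")` resolves to `$XDG_CACHE_HOME/protein-quest`, falling back to `~/.cache/protein-quest` when that variable is unset or blank. Assuming that is how the default location is chosen (the diff does not show the call site), the standard-library sketch below approximates the Linux rule only; `platformdirs` itself also covers macOS and Windows.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path


def default_cache_dir(app_name: str = "protein-quest", env: Mapping[str, str] | None = None) -> Path:
    """Approximate the Linux behavior of platformdirs.user_cache_dir(app_name)."""
    env = os.environ if env is None else env
    base = env.get("XDG_CACHE_HOME", "")
    if not base.strip():
        # XDG_CACHE_HOME unset or blank: use the conventional per-user cache directory
        base = os.path.expanduser("~/.cache")
    return Path(base) / app_name
```

Passing `env` explicitly keeps the function easy to test; by default it reads the real process environment.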

{protein_quest-0.3.2 → protein_quest-0.5.0}/src/protein_quest/__version__.py RENAMED

@@ -1,2 +1,2 @@
-__version__ = "0.3.2"
+__version__ = "0.5.0"
 """The version of the package."""

{protein_quest-0.3.2 → protein_quest-0.5.0}/src/protein_quest/alphafold/fetch.py RENAMED

@@ -14,7 +14,7 @@ from yarl import URL
 from protein_quest.alphafold.entry_summary import EntrySummary
 from protein_quest.converter import converter
-from protein_quest.utils import friendly_session, retrieve_files, run_async
+from protein_quest.utils import Cacher, PassthroughCacher, friendly_session, retrieve_files, run_async
 logger = logging.getLogger(__name__)
@@ -104,7 +104,7 @@ class AlphaFoldEntry:
 async def fetch_summary(
-    qualifier: str, session: RetryClient, semaphore: Semaphore, save_dir: Path | None
+    qualifier: str, session: RetryClient, semaphore: Semaphore, save_dir: Path | None, cacher: Cacher
 ) -> list[EntrySummary]:
     """Fetches a summary from the AlphaFold database for a given qualifier.
@@ -116,6 +116,7 @@ async def fetch_summary(
         save_dir: An optional directory to save the fetched summary as a JSON file.
             If set and summary exists then summary will be loaded from disk instead of being fetched from the API.
             If not set then the summary will not be saved to disk and will always be fetched from the API.
+        cacher: A cacher to use for caching the fetched summary. Only used if save_dir is not None.
     Returns:
         A list of EntrySummary objects representing the fetched summary.
@@ -124,6 +125,11 @@ async def fetch_summary(
     fn: AsyncPath | None = None
     if save_dir is not None:
         fn = AsyncPath(save_dir / f"{qualifier}.json")
+        cached_file = await cacher.copy_from_cache(Path(fn))
+        if cached_file is not None:
+            logger.debug(f"Using cached file {cached_file} for summary of {qualifier}.")
+            raw_data = await AsyncPath(cached_file).read_bytes()
+            return converter.loads(raw_data, list[EntrySummary])
         if await fn.exists():
             logger.debug(f"File {fn} already exists. Skipping download from {url}.")
             raw_data = await fn.read_bytes()
@@ -133,18 +139,23 @@ async def fetch_summary(
         raw_data = await response.content.read()
         if fn is not None:
             # TODO return fn and make it part of AlphaFoldEntry as summary_file prop
-            await fn.write_bytes(raw_data)
+            await cacher.write_bytes(Path(fn), raw_data)
         return converter.loads(raw_data, list[EntrySummary])
 async def fetch_summaries(
-    qualifiers: Iterable[str], save_dir: Path | None = None, max_parallel_downloads: int = 5
+    qualifiers: Iterable[str],
+    save_dir: Path | None = None,
+    max_parallel_downloads: int = 5,
+    cacher: Cacher | None = None,
 ) -> AsyncGenerator[EntrySummary]:
     semaphore = Semaphore(max_parallel_downloads)
     if save_dir is not None:
         save_dir.mkdir(parents=True, exist_ok=True)
+    if cacher is None:
+        cacher = PassthroughCacher()
     async with friendly_session() as session:
-        tasks = [fetch_summary(qualifier, session, semaphore, save_dir) for qualifier in qualifiers]
+        tasks = [fetch_summary(qualifier, session, semaphore, save_dir, cacher) for qualifier in qualifiers]
         summaries_per_qualifier: list[list[EntrySummary]] = await tqdm.gather(
             *tasks, desc="Fetching Alphafold summaries"
         )
@@ -154,7 +165,11 @@ async def fetch_summaries(
 async def fetch_many_async(
-    uniprot_accessions: Iterable[str], save_dir: Path, what: set[DownloadableFormat], max_parallel_downloads: int = 5
+    uniprot_accessions: Iterable[str],
+    save_dir: Path,
+    what: set[DownloadableFormat],
+    max_parallel_downloads: int = 5,
+    cacher: Cacher | None = None,
 ) -> AsyncGenerator[AlphaFoldEntry]:
     """Asynchronously fetches summaries and files from
     [AlphaFold Protein Structure Database](https://alphafold.ebi.ac.uk/).
@@ -164,15 +179,17 @@ async def fetch_many_async(
         save_dir: The directory to save the fetched files to.
         what: A set of formats to download.
         max_parallel_downloads: The maximum number of parallel downloads.
+        cacher: A cacher to use for caching the fetched files. Only used if summary is in what set.
     Yields:
         A dataclass containing the summary, pdb file, and pae file.
     """
     save_dir_for_summaries = save_dir if "summary" in what and save_dir is not None else None
     summaries = [
         s
         async for s in fetch_summaries(
-            uniprot_accessions, save_dir_for_summaries, max_parallel_downloads=max_parallel_downloads
+            uniprot_accessions, save_dir_for_summaries, max_parallel_downloads=max_parallel_downloads, cacher=cacher
         )
     ]
@@ -183,6 +200,7 @@ async def fetch_many_async(
         save_dir,
         desc="Downloading AlphaFold files",
         max_parallel_downloads=max_parallel_downloads,
+        cacher=cacher,
     )
     for summary in summaries:
         yield AlphaFoldEntry(
@@ -236,7 +254,11 @@ def files_to_download(what: set[DownloadableFormat], summaries: Iterable[EntrySu
 def fetch_many(
-    ids: Iterable[str], save_dir: Path, what: set[DownloadableFormat], max_parallel_downloads: int = 5
+    ids: Iterable[str],
+    save_dir: Path,
+    what: set[DownloadableFormat],
+    max_parallel_downloads: int = 5,
+    cacher: Cacher | None = None,
 ) -> list[AlphaFoldEntry]:
     """Synchronously fetches summaries and pdb and pae files from AlphaFold Protein Structure Database.
@@ -245,6 +267,7 @@ def fetch_many(
         save_dir: The directory to save the fetched files to.
         what: A set of formats to download.
         max_parallel_downloads: The maximum number of parallel downloads.
+        cacher: A cacher to use for caching the fetched files. Only used if summary is in what set.
     Returns:
         A list of AlphaFoldEntry dataclasses containing the summary, pdb file, and pae file.
@@ -253,7 +276,9 @@ def fetch_many(
     async def gather_entries():
         return [
             entry
-            async for entry in fetch_many_async(ids, save_dir, what, max_parallel_downloads=max_parallel_downloads)
+            async for entry in fetch_many_async(
+                ids, save_dir, what, max_parallel_downloads=max_parallel_downloads, cacher=cacher
+            )
         ]
     return run_async(gather_entries())
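In the new `fetch_summary`, the lookup order is: ask the cacher for a copy, then reuse a file already present in `save_dir`, and only then call the AlphaFold API, writing the response through `cacher.write_bytes`; `fetch_summaries` falls back to a `PassthroughCacher` when no cacher is given. The sketch below reproduces that order with a hypothetical `Cacher` protocol limited to the two methods the diff calls. The method signatures are inferred from the call sites, the `None` return type of `write_bytes` is a guess, and blocking `pathlib` I/O replaces the `AsyncPath` calls for brevity.

```python
from __future__ import annotations

from collections.abc import Awaitable, Callable
from pathlib import Path
from typing import Protocol


class Cacher(Protocol):
    """Hypothetical protocol covering only the two methods fetch_summary calls."""

    async def copy_from_cache(self, target: Path) -> Path | None: ...

    async def write_bytes(self, target: Path, content: bytes) -> None: ...


class PassthroughCacher:
    """Caches nothing: never reports a hit and writes straight to the target path."""

    async def copy_from_cache(self, target: Path) -> Path | None:
        return None

    async def write_bytes(self, target: Path, content: bytes) -> None:
        target.write_bytes(content)


async def load_or_fetch(
    target: Path, fetch: Callable[[], Awaitable[bytes]], cacher: Cacher | None = None
) -> bytes:
    """Same lookup order as fetch_summary: cache, then existing file, then network."""
    if cacher is None:
        cacher = PassthroughCacher()
    cached_file = await cacher.copy_from_cache(target)
    if cached_file is not None:
        return cached_file.read_bytes()
    if target.exists():
        return target.read_bytes()
    raw_data = await fetch()
    await cacher.write_bytes(target, raw_data)
    return raw_data
```

Defaulting to a pass-through object rather than `None` keeps the fetch path free of `if cacher` branches. Note that in the diff the cacher is only consulted for summaries when `save_dir` is set, since without a target path there is nothing to cache.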
