PyPI - XspecT - Versions diffs - 0.1.3__py3-none-any.whl → 0.2.0__py3-none-any.whl - Mend

XspecT-0.1.3-py3-none-any.whl → XspecT-0.2.0-py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


Files changed (58)

{XspecT-0.1.3.dist-info → XspecT-0.2.0.dist-info}/METADATA +23 -29
XspecT-0.2.0.dist-info/RECORD +30 -0
{XspecT-0.1.3.dist-info → XspecT-0.2.0.dist-info}/WHEEL +1 -1
xspect/definitions.py +42 -0
xspect/download_filters.py +11 -26
xspect/fastapi.py +101 -0
xspect/file_io.py +34 -103
xspect/main.py +70 -66
xspect/model_management.py +88 -0
xspect/models/__init__.py +0 -0
xspect/models/probabilistic_filter_model.py +277 -0
xspect/models/probabilistic_filter_svm_model.py +169 -0
xspect/models/probabilistic_single_filter_model.py +109 -0
xspect/models/result.py +148 -0
xspect/pipeline.py +201 -0
xspect/run.py +38 -0
xspect/train.py +304 -0
xspect/train_filter/create_svm.py +6 -183
xspect/train_filter/extract_and_concatenate.py +117 -121
xspect/train_filter/html_scrap.py +16 -28
xspect/train_filter/ncbi_api/download_assemblies.py +7 -8
xspect/train_filter/ncbi_api/ncbi_assembly_metadata.py +9 -17
xspect/train_filter/ncbi_api/ncbi_children_tree.py +3 -2
xspect/train_filter/ncbi_api/ncbi_taxon_metadata.py +7 -5
XspecT-0.1.3.dist-info/RECORD +0 -49
xspect/BF_v2.py +0 -637
xspect/Bootstrap.py +0 -29
xspect/Classifier.py +0 -142
xspect/OXA_Table.py +0 -53
xspect/WebApp.py +0 -724
xspect/XspecT_mini.py +0 -1363
xspect/XspecT_trainer.py +0 -611
xspect/map_kmers.py +0 -155
xspect/search_filter.py +0 -504
xspect/static/How-To.png +0 -0
xspect/static/Logo.png +0 -0
xspect/static/Logo2.png +0 -0
xspect/static/Workflow_AspecT.png +0 -0
xspect/static/Workflow_ClAssT.png +0 -0
xspect/static/js.js +0 -615
xspect/static/main.css +0 -280
xspect/templates/400.html +0 -64
xspect/templates/401.html +0 -62
xspect/templates/404.html +0 -62
xspect/templates/500.html +0 -62
xspect/templates/about.html +0 -544
xspect/templates/home.html +0 -51
xspect/templates/layoutabout.html +0 -87
xspect/templates/layouthome.html +0 -63
xspect/templates/layoutspecies.html +0 -468
xspect/templates/species.html +0 -33
xspect/train_filter/README_XspecT_Erweiterung.md +0 -119
xspect/train_filter/get_paths.py +0 -35
xspect/train_filter/interface_XspecT.py +0 -204
xspect/train_filter/k_mer_count.py +0 -162
{XspecT-0.1.3.dist-info → XspecT-0.2.0.dist-info}/LICENSE +0 -0
{XspecT-0.1.3.dist-info → XspecT-0.2.0.dist-info}/entry_points.txt +0 -0
{XspecT-0.1.3.dist-info → XspecT-0.2.0.dist-info}/top_level.txt +0 -0

{XspecT-0.1.3.dist-info → XspecT-0.2.0.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: XspecT
-Version: 0.1.3
+Version: 0.2.0
 Summary: Tool to monitor and characterize pathogens using Bloom filters.
 License: MIT License
@@ -32,25 +32,19 @@ Classifier: License :: OSI Approved :: MIT License
 Requires-Python: >=3.10
 Description-Content-Type: text/markdown
 License-File: LICENSE
-Requires-Dist: Flask
-Requires-Dist: Flask-WTF
-Requires-Dist: WTForms
-Requires-Dist: Werkzeug
 Requires-Dist: biopython
-Requires-Dist: bitarray
-Requires-Dist: mmh3
-Requires-Dist: numpy
-Requires-Dist: pandas
 Requires-Dist: requests
 Requires-Dist: scikit-learn
-Requires-Dist: Psutil
-Requires-Dist: Matplotlib
-Requires-Dist: Pympler
-Requires-Dist: H5py
 Requires-Dist: Bio
-Requires-Dist: wheel
 Requires-Dist: loguru
 Requires-Dist: click
+Requires-Dist: python-slugify
+Requires-Dist: cobs-reloaded
+Requires-Dist: rbloom
+Requires-Dist: xxhash
+Requires-Dist: fastapi
+Requires-Dist: uvicorn
+Requires-Dist: python-multipart
 Provides-Extra: docs
 Requires-Dist: sphinx ; extra == 'docs'
 Requires-Dist: furo ; extra == 'docs'
@@ -62,9 +56,13 @@ Requires-Dist: pytest ; extra == 'test'
 Requires-Dist: pytest-cov ; extra == 'test'
 # XspecT - Acinetobacter Species Assignment Tool
-<img src="/src/xspect/static/Logo.png" height="50%" width="50%">
+![Test](https://github.com/bionf/xspect2/actions/workflows/test.yml/badge.svg)
+[![linting: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen)](https://github.com/pylint-dev/pylint)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+<img src="/src/docs/img/logo.png" height="50%" width="50%">
 <!-- start intro -->
-XspecT is a Python-based tool to taxonomically classify sequence-reads (or assembled genomes) on the species and/or sub-type level using [Bloom Filters](https://en.wikipedia.org/wiki/Bloom_filter) and a [Support Vector Machine](https://en.wikipedia.org/wiki/Support-vector_machine). It also identifies existing [blaOxa-genes](https://en.wikipedia.org/wiki/Beta-lactamase#OXA_beta-lactamases_(class_D)) and provides a list of relevant research papers for further information.
+XspecT is a Python-based tool to taxonomically classify sequence-reads (or assembled genomes) on the species and/or sub-type level using [Bloom Filters] and a [Support Vector Machine]. It also identifies existing [blaOxa-genes] and provides a list of relevant research papers for further information.
 <br/><br/>
 XspecT utilizes the uniqueness of kmers and compares extracted kmers from the input-data to a reference database. Bloom Filter ensure a fast lookup in this process. For a final prediction the results are classified using a Support Vector Machine.
@@ -74,6 +72,10 @@ Local extensions of the reference database are supported.
 <br/>
 The tool is available as a web-based application and a smaller command line interface.
+[Bloom Filters]: https://en.wikipedia.org/wiki/Bloom_filter
+[Support Vector Machine]: https://en.wikipedia.org/wiki/Support-vector_machine
+[blaOxa-genes]: https://en.wikipedia.org/wiki/Beta-lactamase#OXA_beta-lactamases_(class_D)
 <!-- end intro -->
 <!-- start quickstart -->
@@ -82,11 +84,7 @@ To install Xspect, please download the lastest 64 bit Python version and install
 ```
 pip install xspect
 ```
-If you would like to train filters yourself, you need to install Jellyfish, which is used to count distinct k-meres in the assemblies. It can be installed using bioconda:
-```
-conda install -c bioconda jellyfish
-```
-On Apple Silicon, it is possible that this command installs an incorrect Jellyfish package. Please refer to the official [Jellyfish project](https://github.com/gmarcais/Jellyfish) for installation guidance.
+Please note that Apple Silicon is currently not supported.
 ## Usage
 ### Get the Bloomfilters
@@ -100,9 +98,9 @@ xspect train you-ncbi-genus-name
 ```
 ### How to run the web app
-Run the following command lines in a console, a browser window will open automatically after the application is fully loaded.
+To run the web app, install and run [XspecT Web](https://github.com/aromberg/xspect-web). Additionally, run XspecT in API mode:
 ```
-xspect web
+xspect api
 ```
 ### How to use the XspecT command line interface
@@ -110,13 +108,9 @@ Run xspect with the configuration you want to run it with as arguments.
 ```
 xspect classify your-genus path/to/your/input-set
 ```
-For further instructions on how to use the command line interface, execute:
+For further instructions on how to use the command line interface, please refer to the [documentation] or execute:
 ```
 xspect --help
 ```
+[documentation]: https://bionf.github.io/XspecT2/cli.html
 <!-- end quickstart -->
-## Input Data
-XspecT is able to use either raw sequence-reads (FASTQ-format .fq/.fastq) or already assembled genomes (FASTA-format .fasta/.fna). Using sequence-reads saves up the assembly process but high-quality reads with a low error-rate are needed (e.g. Illumina-reads).
-The amount of reads that will be used has to be set by the user when using sequence-reads. The minimum amount is 5000 reads for species classification and 500 reads for sub-type classification. The maximum number of reads is limited by the browser and is usually around ~8 million reads. Using more reads will lead to a increased runtime (xsec./1mio reads).

XspecT-0.2.0.dist-info/RECORD ADDED

@@ -0,0 +1,30 @@
+xspect/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
+xspect/definitions.py,sha256=gg6NvT8ypNzlnJvMMo3nHsyh8DHFFu41lOfnILkRDpE,1215
+xspect/download_filters.py,sha256=ByE7Oggx-AyJ02Wirk_wcJHNdRDrJMfjwhmUe5tgWbE,741
+xspect/fastapi.py,sha256=UuUr3eQUL0tCcB2d_ZKMToqreNLSNRKpCKK3-lwAzVo,3208
+xspect/file_io.py,sha256=zKhl6Fd9KZAYiD8YgIyje5TbDYk5lxMp1WUrNkGSBo8,2779
+xspect/main.py,sha256=rFoHKBC9UANlZh3TccZAJbOZ6023BnQaGEoPjjJjW0A,3572
+xspect/model_management.py,sha256=w0aqjLUoixCokyKTYrcN1vih5IoLYLJG9p8aeYdVc8Y,3560
+xspect/pipeline.py,sha256=h7duhVZ-hupwO_KQPstzFo8KMfMI2yleb9HmtTiMjic,7219
+xspect/run.py,sha256=OJ7pCFqva3AhIYklKjVnqWGooVRO7S3b56kIAy-xabY,1189
+xspect/train.py,sha256=khC1lldqfr4NvzLUiSJjSlh7DBG1ePielvQMiB29Hl8,10399
+xspect/models/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
+xspect/models/probabilistic_filter_model.py,sha256=ImyNRzR7jf2CBPGI65ItG0_eYmrQjo9soQYlsM0r-P0,9829
+xspect/models/probabilistic_filter_svm_model.py,sha256=9Q4SBAzgbqATpS2E3IoardPpBwqkyrYSnrMwh0zwSag,5420
+xspect/models/probabilistic_single_filter_model.py,sha256=nDAd_-_Ci2eH0KOJtf4wA-w63FMq9rGSR1LGiIA-gdw,3884
+xspect/models/result.py,sha256=vHUEFXvbFyB8WmasXp99IrztjwaxH1f9QMFiRUPe40Q,4824
+xspect/train_filter/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
+xspect/train_filter/create_svm.py,sha256=w6gq40yHINVfNzLhJfYFykUaNCwpU9AEDcbkUfis3DY,1504
+xspect/train_filter/extract_and_concatenate.py,sha256=lLrczGgfZi2vAGqxq8fcEmJi5pvqyK33JkB_ZoCNYG8,4840
+xspect/train_filter/html_scrap.py,sha256=76VV_ZbvD2I3IxRb62SiQwRPu2tr4fwn1HkfJQYaosM,3809
+xspect/train_filter/ncbi_api/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
+xspect/train_filter/ncbi_api/download_assemblies.py,sha256=MB_mxSjCTL05DqIt1WQem8AGU3PjtJnzPndeI9J-AOI,1285
+xspect/train_filter/ncbi_api/ncbi_assembly_metadata.py,sha256=puzDIws-yyBAEHwSAIYUM7g8FpLFmvOKh5xH1EsY8ZE,3830
+xspect/train_filter/ncbi_api/ncbi_children_tree.py,sha256=_8puOsnsKp5lsMV2gZY1ijkfD_BZKG9eXZCX09qph5E,1819
+xspect/train_filter/ncbi_api/ncbi_taxon_metadata.py,sha256=O6JDXC4E6AYaf7NPnb34eSJyZhMB8r--bjoVF_ZsEdA,1868
+XspecT-0.2.0.dist-info/LICENSE,sha256=bhBGDKIRUVwYIHGOGO5hshzuVHyqFJajvSOA3XXOLKI,1094
+XspecT-0.2.0.dist-info/METADATA,sha256=efT3SkWV55firuZJh1gHCN7061Fxda7teuFLeZHvJQ0,4826
+XspecT-0.2.0.dist-info/WHEEL,sha256=Mdi9PDNwEZptOjTlUcAth7XJDFtKrHYaQMPulZeBCiQ,91
+XspecT-0.2.0.dist-info/entry_points.txt,sha256=L7qliX3pIuwupQxpuOSsrBJCSHYPOPNEzH8KZKQGGUw,43
+XspecT-0.2.0.dist-info/top_level.txt,sha256=hdoa4cnBv6OVzpyhMmyxpJxEydH5n2lDciy8urc1paE,7
+XspecT-0.2.0.dist-info/RECORD,,

{XspecT-0.1.3.dist-info → XspecT-0.2.0.dist-info}/WHEEL RENAMED

@@ -1,5 +1,5 @@
 Wheel-Version: 1.0
-Generator: bdist_wheel (0.42.0)
+Generator: setuptools (73.0.1)
 Root-Is-Purelib: true
 Tag: py3-none-any

xspect/definitions.py ADDED

@@ -0,0 +1,42 @@
+"""This module contains definitions for the XspecT package."""
+from pathlib import Path
+from os import getcwd
+fasta_endings = ["fasta", "fna", "fa", "ffn", "frn"]
+fastq_endings = ["fastq", "fq"]
+def get_xspect_root_path():
+    """Return the root path for XspecT data."""
+    root_path = Path(getcwd()) / "xspect-data"
+    root_path.mkdir(exist_ok=True, parents=True)
+    return root_path
+def get_xspect_model_path():
+    """Return the path to the XspecT models."""
+    model_path = get_xspect_root_path() / "models"
+    model_path.mkdir(exist_ok=True, parents=True)
+    return model_path
+def get_xspect_tmp_path():
+    """Return the path to the XspecT temporary files."""
+    tmp_path = get_xspect_root_path() / "tmp"
+    tmp_path.mkdir(exist_ok=True, parents=True)
+    return tmp_path
+def get_xspect_upload_path():
+    """Return the path to the XspecT upload directory."""
+    upload_path = get_xspect_root_path() / "uploads"
+    upload_path.mkdir(exist_ok=True, parents=True)
+    return upload_path
+def get_xspect_runs_path():
+    """Return the path to the XspecT runs directory."""
+    runs_path = get_xspect_root_path() / "runs"
+    runs_path.mkdir(exist_ok=True, parents=True)
+    return runs_path
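
The five helpers added in definitions.py share one pattern: build a path below `xspect-data` in the current working directory, create it if missing, and return it. A minimal sketch of that pattern follows. It is not code from the package: `ensure_dir` is a hypothetical helper introduced here for illustration, and a temporary directory stands in for the working directory.

```python
import tempfile
from pathlib import Path


def ensure_dir(root: Path, *parts: str) -> Path:
    """Create root/parts if needed and return it (hypothetical helper)."""
    path = root.joinpath(*parts)
    path.mkdir(exist_ok=True, parents=True)
    return path


# A throwaway directory plays the role of os.getcwd() in the diff.
with tempfile.TemporaryDirectory() as cwd:
    data_root = ensure_dir(Path(cwd), "xspect-data")
    created = sorted(
        ensure_dir(data_root, name).name
        for name in ("models", "tmp", "uploads", "runs")
    )
    # A second call is harmless because mkdir is given exist_ok=True.
    ensure_dir(data_root, "models")
    layout_ok = all((data_root / name).is_dir() for name in created)

print(created)
```

Because the root is derived from the working directory, running the tool from a different directory would resolve to a different `xspect-data` folder.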

xspect/download_filters.py CHANGED

@@ -4,45 +4,30 @@ import os
 import shutil
 import requests
+from xspect.definitions import get_xspect_model_path, get_xspect_tmp_path
 def download_test_filters(url):
     """Download filters."""
-    if not os.path.exists("filter"):
-        os.makedirs("filter")
-    if not os.path.exists("Training_data"):
-        os.makedirs("Training_data")
+    download_path = get_xspect_tmp_path() / "models.zip"
+    extract_path = get_xspect_tmp_path() / "extracted_models"
     r = requests.get(url, allow_redirects=True, timeout=10)
-    with open("filter/filters.zip", "wb") as f:
+    with open(download_path, "wb") as f:
         f.write(r.content)
     shutil.unpack_archive(
-        "filter/filters.zip",
-        "filter/temp",
+        download_path,
+        extract_path,
         "zip",
     )
     shutil.copytree(
-        "filter/temp/filters/Training_data",
-        "Training_data",
+        extract_path,
+        get_xspect_model_path(),
         dirs_exist_ok=True,
     )
-    shutil.rmtree("filter/temp/filters/Training_data")
-    shutil.copytree(
-        "filter/temp/filters",
-        "filter",
-        dirs_exist_ok=True,
-    )
-    shutil.rmtree("filter/temp")
-    os.remove("filter/filters.zip")
-    saved_options = ["Salmonella"]
-    with open("saved_options.txt", "w") as f:
-        for item in saved_options:
-            f.write("%s\n" % item)
+    os.remove(download_path)
+    shutil.rmtree(extract_path)
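
The rewritten function reduces to four steps: save the archive in the tmp directory, unpack it into a scratch folder, merge that folder into the model directory, then delete both the archive and the scratch folder. The sketch below walks through the same steps without any network access. `install_models_from_zip` and the file name `example-species.json` are made up for illustration, and a locally created zip replaces the downloaded one.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_models_from_zip(archive: Path, tmp_dir: Path, model_dir: Path) -> None:
    """Unpack archive into a scratch folder, merge it into model_dir, clean up."""
    extract_path = tmp_dir / "extracted_models"
    shutil.unpack_archive(archive, extract_path, "zip")
    # dirs_exist_ok=True merges into an existing model directory.
    shutil.copytree(extract_path, model_dir, dirs_exist_ok=True)
    archive.unlink()
    shutil.rmtree(extract_path)


with tempfile.TemporaryDirectory() as root:
    root_path = Path(root)
    tmp_dir = root_path / "tmp"
    model_dir = root_path / "models"
    tmp_dir.mkdir()
    model_dir.mkdir()

    # Stand-in for the downloaded models.zip (contents are invented).
    archive = tmp_dir / "models.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("example-species.json", "{}")

    install_models_from_zip(archive, tmp_dir, model_dir)
    installed = sorted(p.name for p in model_dir.iterdir())
    leftovers = sorted(p.name for p in tmp_dir.iterdir())

print(installed, leftovers)
```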

xspect/fastapi.py ADDED

@@ -0,0 +1,101 @@
+"""FastAPI application for XspecT."""
+import datetime
+from pathlib import Path
+from shutil import copyfileobj
+from fastapi import FastAPI, UploadFile, BackgroundTasks
+from xspect.definitions import get_xspect_runs_path, get_xspect_upload_path
+from xspect.download_filters import download_test_filters
+import xspect.model_management as mm
+from xspect.models.result import StepType
+from xspect.pipeline import ModelExecution, Pipeline, PipelineStep
+from xspect.train import train_ncbi
+app = FastAPI()
+@app.get("/download-filters")
+def download_filters():
+    """Download filters."""
+    download_test_filters("https://xspect2.s3.eu-central-1.amazonaws.com/models.zip")
+@app.get("/classify")
+def classify(genus: str, file: str, meta: bool = False, step: int = 500):
+    """Classify uploaded sample."""
+    path = get_xspect_upload_path() / file
+    pipeline = Pipeline(genus + " classification", "Test Author", "test@example.com")
+    species_execution = ModelExecution(genus + "-species", sparse_sampling_step=step)
+    if meta:
+        species_filtering_step = PipelineStep(
+            StepType.FILTERING, genus, 0.7, species_execution
+        )
+        genus_execution = ModelExecution(genus + "-genus", sparse_sampling_step=step)
+        genus_execution.add_pipeline_step(species_filtering_step)
+        pipeline.add_pipeline_step(genus_execution)
+    else:
+        pipeline.add_pipeline_step(species_execution)
+    run = pipeline.run(Path(path))
+    time_str = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
+    save_path = get_xspect_runs_path() / f"run_{time_str}.json"
+    run.save(save_path)
+    return run.to_dict()
+@app.post("/train")
+def train(genus: str, background_tasks: BackgroundTasks, svm_steps: int = 1):
+    """Train NCBI model."""
+    background_tasks.add_task(train_ncbi, genus, svm_steps)
+    return {"message": "Training started."}
+@app.get("/list-models")
+def list_models():
+    """List available models."""
+    return mm.get_models()
+@app.get("/model-metadata")
+def get_model_metadata(model_slug: str):
+    """Get metadata of a model."""
+    return mm.get_model_metadata(model_slug)
+@app.post("/model-metadata")
+def post_model_metadata(model_slug: str, author: str, author_email: str):
+    """Update metadata of a model."""
+    try:
+        mm.update_model_metadata(model_slug, author, author_email)
+    except ValueError as e:
+        return {"error": str(e)}
+    return {"message": "Metadata updated."}
+@app.post("/model-display-name")
+def post_model_display_name(model_slug: str, filter_id: str, display_name: str):
+    """Update display name of a filter in a model."""
+    try:
+        mm.update_model_display_name(model_slug, filter_id, display_name)
+    except ValueError as e:
+        return {"error": str(e)}
+    return {"message": "Display name updated."}
+@app.post("/upload-file")
+def upload_file(file: UploadFile):
+    """Upload file to the server."""
+    upload_path = get_xspect_upload_path() / file.filename
+    if not upload_path.exists():
+        try:
+            with upload_path.open("wb") as buffer:
+                copyfileobj(file.file, buffer)
+        finally:
+            file.file.close()
+    return {"filename": file.filename}
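
The endpoints above declare plain typed parameters (`genus: str`, `file: str`, `meta: bool`, `step: int`), which FastAPI reads from the query string. A client could therefore build a `/classify` request URL as sketched below. The base URL is an assumption, since the diff does not show the host or port the API is served on, and the genus and file name are placeholders.

```python
from urllib.parse import urlencode

# Assumption: host and port are not shown in the diff.
BASE_URL = "http://127.0.0.1:8000"


def classify_url(genus: str, file: str, meta: bool = False, step: int = 500) -> str:
    """Build the GET URL for /classify (hypothetical client helper)."""
    query = urlencode(
        {
            "genus": genus,
            "file": file,  # name of a file previously sent to /upload-file
            "meta": str(meta).lower(),
            "step": step,
        }
    )
    return f"{BASE_URL}/classify?{query}"


url = classify_url("Acinetobacter", "sample.fasta")
print(url)
```

The `file` argument is only a name: the endpoint resolves it against the upload directory, so the sample would have to be uploaded through `/upload-file` first.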

xspect/file_io.py CHANGED

@@ -2,107 +2,11 @@
 File IO module.
 """
-from linecache import getline
 import os
 from pathlib import Path
 import zipfile
-from loguru import logger
-def check_folder_structure():
-    """Checks the folder structure and creates new folders if needed."""
-    # Create list of all folder paths.
-    root_path = Path(os.getcwd())
-    filter_path = root_path / "filter"
-    meta_path = root_path / "genus_metadata"
-    filter_folder_names = [
-        "array_sizes",
-        "Metagenomes",
-        "species_names",
-        "translation_dicts",
-    ]
-    folder_paths = [filter_path, meta_path]
-    for filter_folder_name in filter_folder_names:
-        filter_folder_path = filter_path / filter_folder_name
-        folder_paths.append(filter_folder_path)
-    # Check if folders exist. If not create them.
-    for folder_path in folder_paths:
-        if not os.path.isdir(folder_path):
-            os.mkdir(folder_path)
-def delete_non_fasta(files):
-    """Delete all non fasta files from the list and return the list without those file names.
-    :param files: List of file names.
-    :type files: list[str]
-    :return: List with only fasta files.
-    """
-    # All possible fasta file endings.
-    fasta_endings = ["fasta", "fna", "fa", "ffn", "frn"]
-    # Iterate through file list backwards and delete all non fasta files.
-    for i in range(len(files) - 1, -1, -1):
-        file = files[i].split(".")
-        if file[-1] in fasta_endings:
-            continue
-        else:
-            del files[i]
-    return files
-def get_accessions(file_names: list[str]) -> list[str]:
-    """Extract accessions from file names.
-    :param files: List of file names.
-    :type files: list[str]
-    :return: List of all accessions.
-    :rtype: list[str]
-    """
-    accessions = []
-    for idx, file in enumerate(file_names):
-        accessions.append(file.split("_"))
-        accessions[idx] = accessions[idx][0] + "_" + accessions[idx][1]
-    return accessions
-def get_file_paths(base_path: Path, file_names: list[str]) -> list[Path]:
-    """Make a list with the paths to the files.
-    :param base_path: Path of the parent directory.
-    :type base_path: Path
-    :param files: List of file names.
-    :type files: list[str]
-    :return: A list with all file paths.
-    :rtype: list[Path]
-    """
-    return [base_path / file for file in file_names]
-def get_species_names(file_paths: list[Path]):
-    """Extracts the species names.
-    :param file_paths: List with the file paths.
-    :type file_paths: list[Path]
-    :return: List with all species names.
-    """
-    names = list()
-    for path in file_paths:
-        header = getline(str(path), 1)
-        name = header.replace("\n", "").replace(">", "")
-        if not name.isdigit():
-            logger.error(
-                "The header of file: {path} does not contain a correct ID: {name}. The ID needs to be "
-                "just numbers"
-            )
-            logger.error("Aborting")
-            exit()
-        names.append(name)
-    return names
+from Bio import SeqIO
+from xspect.definitions import fasta_endings, fastq_endings
 def delete_zip_files(dir_path):
@@ -129,7 +33,7 @@ def extract_zip(zip_path, unzipped_path):
 def concatenate_meta(path: Path, genus: str):
-    """Concatenates all concatenated fasta files that are used to train bloomfilters to one fasta file.
+    """Concatenates all species files to one fasta file.
     :param path: Path to the directory with the concatenated fasta files.
     :type path: Path
@@ -137,20 +41,47 @@ def concatenate_meta(path: Path, genus: str):
     :type genus: str
     """
     files_path = path / "concatenate"
-    fasta_endings = ["fasta", "fna", "fa", "ffn", "frn"]
     meta_path = path / (genus + ".fasta")
     files = os.listdir(files_path)
-    with open(meta_path, "w") as meta_file:
+    with open(meta_path, "w", encoding="utf-8") as meta_file:
         # Write the header.
         meta_header = f">{genus} metagenome\n"
         meta_file.write(meta_header)
         # Open each concatenated species file and write the sequence in the meta file.
         for file in files:
-            file_ending = str(file).split(".")[-1]
+            file_ending = str(file).rsplit(".", maxsplit=1)[-1]
             if file_ending in fasta_endings:
-                with open((files_path / str(file)), "r") as species_file:
+                with open(
+                    (files_path / str(file)), "r", encoding="utf-8"
+                ) as species_file:
                     for line in species_file:
                         if line[0] != ">":
                             meta_file.write(line.replace("\n", ""))
+def get_record_iterator(file_path: Path):
+    """Returns a record iterator for a fasta or fastq file."""
+    if not isinstance(file_path, Path):
+        raise ValueError("Path must be a Path object")
+    if not file_path.exists():
+        raise ValueError("File does not exist")
+    if not file_path.is_file():
+        raise ValueError("Path must be a file")
+    if file_path.suffix[1:] in fasta_endings:
+        return SeqIO.parse(file_path, "fasta")
+    if file_path.suffix[1:] in fastq_endings:
+        return SeqIO.parse(file_path, "fastq")
+    raise ValueError("Invalid file format, must be a fasta or fastq file")
+def get_records_by_id(file: Path, ids: list[str]):
+    """Return records with the specified ids."""
+    records = get_record_iterator(file)
+    return [record for record in records if record.id in ids]
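
`get_record_iterator` picks the parser from the last suffix of the file name only. The sketch below isolates that dispatch so it can be tried without Biopython. `detect_format` is a hypothetical name, and the ending lists are copied from definitions.py in this diff.

```python
from pathlib import Path

# Copied from xspect/definitions.py as shown in the diff.
FASTA_ENDINGS = ["fasta", "fna", "fa", "ffn", "frn"]
FASTQ_ENDINGS = ["fastq", "fq"]


def detect_format(file_path: Path) -> str:
    """Return 'fasta' or 'fastq' based on the final suffix (hypothetical helper)."""
    ending = file_path.suffix[1:]  # ".fna" -> "fna"
    if ending in FASTA_ENDINGS:
        return "fasta"
    if ending in FASTQ_ENDINGS:
        return "fastq"
    raise ValueError("Invalid file format, must be a fasta or fastq file")


print(detect_format(Path("genome.fna")))
print(detect_format(Path("reads.fq")))

# Only the last suffix is inspected, so a compressed file is rejected.
try:
    detect_format(Path("genome.fasta.gz"))
    gz_rejected = False
except ValueError:
    gz_rejected = True
```

Under this logic a name such as `genome.fasta.gz` would have to be decompressed before it is passed in.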
