XspecT 0.5.2.tar.gz → 0.5.4.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (124)
  1. {xspect-0.5.2 → xspect-0.5.4}/.gitignore +11 -1
  2. {xspect-0.5.2/src/XspecT.egg-info → xspect-0.5.4}/PKG-INFO +1 -1
  3. xspect-0.5.4/docs/benchmark.md +34 -0
  4. {xspect-0.5.2 → xspect-0.5.4}/docs/cli.md +11 -3
  5. {xspect-0.5.2 → xspect-0.5.4}/docs/contributing.md +4 -1
  6. {xspect-0.5.2 → xspect-0.5.4}/mkdocs.yml +3 -1
  7. {xspect-0.5.2 → xspect-0.5.4}/pyproject.toml +1 -1
  8. xspect-0.5.4/scripts/benchmark/classify/main.nf +22 -0
  9. xspect-0.5.4/scripts/benchmark/environment.yml +7 -0
  10. xspect-0.5.4/scripts/benchmark/main.nf +473 -0
  11. xspect-0.5.4/scripts/benchmark/nextflow.config +7 -0
  12. xspect-0.5.4/scripts/benchmark-data/download_data.slurm +13 -0
  13. {xspect-0.5.2 → xspect-0.5.4/src/XspecT.egg-info}/PKG-INFO +1 -1
  14. {xspect-0.5.2 → xspect-0.5.4}/src/XspecT.egg-info/SOURCES.txt +6 -0
  15. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/classify.py +31 -8
  16. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/definitions.py +11 -10
  17. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/file_io.py +2 -1
  18. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/filter_sequences.py +20 -4
  19. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/main.py +66 -27
  20. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/mlst_feature/mlst_helper.py +15 -19
  21. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/mlst_feature/pub_mlst_handler.py +16 -19
  22. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/model_management.py +14 -17
  23. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/models/probabilistic_filter_mlst_model.py +11 -10
  24. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/models/probabilistic_filter_model.py +21 -5
  25. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/models/probabilistic_filter_svm_model.py +30 -15
  26. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/models/probabilistic_single_filter_model.py +9 -7
  27. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/models/result.py +20 -15
  28. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/ncbi.py +3 -2
  29. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/web.py +16 -5
  30. {xspect-0.5.2 → xspect-0.5.4}/tests/test_cli.py +41 -0
  31. {xspect-0.5.2 → xspect-0.5.4}/tests/test_model_management.py +11 -17
  32. {xspect-0.5.2 → xspect-0.5.4}/tests/test_web.py +12 -0
  33. {xspect-0.5.2 → xspect-0.5.4}/.github/workflows/black.yml +0 -0
  34. {xspect-0.5.2 → xspect-0.5.4}/.github/workflows/docs.yml +0 -0
  35. {xspect-0.5.2 → xspect-0.5.4}/.github/workflows/pylint.yml +0 -0
  36. {xspect-0.5.2 → xspect-0.5.4}/.github/workflows/pypi.yml +0 -0
  37. {xspect-0.5.2 → xspect-0.5.4}/.github/workflows/test.yml +0 -0
  38. {xspect-0.5.2 → xspect-0.5.4}/LICENSE +0 -0
  39. {xspect-0.5.2 → xspect-0.5.4}/README.md +0 -0
  40. {xspect-0.5.2 → xspect-0.5.4}/docs/index.md +0 -0
  41. {xspect-0.5.2 → xspect-0.5.4}/docs/quickstart.md +0 -0
  42. {xspect-0.5.2 → xspect-0.5.4}/docs/understanding.md +0 -0
  43. {xspect-0.5.2 → xspect-0.5.4}/docs/web.md +0 -0
  44. {xspect-0.5.2 → xspect-0.5.4}/setup.cfg +0 -0
  45. {xspect-0.5.2 → xspect-0.5.4}/src/XspecT.egg-info/dependency_links.txt +0 -0
  46. {xspect-0.5.2 → xspect-0.5.4}/src/XspecT.egg-info/entry_points.txt +0 -0
  47. {xspect-0.5.2 → xspect-0.5.4}/src/XspecT.egg-info/requires.txt +0 -0
  48. {xspect-0.5.2 → xspect-0.5.4}/src/XspecT.egg-info/top_level.txt +0 -0
  49. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/__init__.py +0 -0
  50. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/download_models.py +0 -0
  51. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/mlst_feature/__init__.py +0 -0
  52. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/models/__init__.py +0 -0
  53. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/train.py +0 -0
  54. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/.gitignore +0 -0
  55. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/README.md +0 -0
  56. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/components.json +0 -0
  57. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/dist/assets/index-Ceo58xui.css +0 -0
  58. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/dist/assets/index-Dt_UlbgE.js +0 -0
  59. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/dist/index.html +0 -0
  60. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/dist/vite.svg +0 -0
  61. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/eslint.config.js +0 -0
  62. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/index.html +0 -0
  63. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/package-lock.json +0 -0
  64. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/package.json +0 -0
  65. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/pnpm-lock.yaml +0 -0
  66. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/public/vite.svg +0 -0
  67. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/App.tsx +0 -0
  68. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/api.tsx +0 -0
  69. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/assets/react.svg +0 -0
  70. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/classification-form.tsx +0 -0
  71. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/classify.tsx +0 -0
  72. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/data-table.tsx +0 -0
  73. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/dropdown-checkboxes.tsx +0 -0
  74. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/dropdown-slider.tsx +0 -0
  75. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/filter-form.tsx +0 -0
  76. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/filter.tsx +0 -0
  77. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/filtering-result.tsx +0 -0
  78. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/header.tsx +0 -0
  79. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/landing.tsx +0 -0
  80. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/models-details.tsx +0 -0
  81. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/models.tsx +0 -0
  82. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/result-chart.tsx +0 -0
  83. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/result.tsx +0 -0
  84. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/spinner.tsx +0 -0
  85. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/ui/accordion.tsx +0 -0
  86. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/ui/button.tsx +0 -0
  87. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/ui/card.tsx +0 -0
  88. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/ui/chart.tsx +0 -0
  89. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/ui/command.tsx +0 -0
  90. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/ui/dialog.tsx +0 -0
  91. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/ui/dropdown-menu.tsx +0 -0
  92. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/ui/file-upload.tsx +0 -0
  93. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/ui/form.tsx +0 -0
  94. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/ui/input.tsx +0 -0
  95. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/ui/label.tsx +0 -0
  96. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/ui/navigation-menu.tsx +0 -0
  97. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/ui/popover.tsx +0 -0
  98. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/ui/select.tsx +0 -0
  99. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/ui/separator.tsx +0 -0
  100. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/ui/slider.tsx +0 -0
  101. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/ui/switch.tsx +0 -0
  102. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/ui/table.tsx +0 -0
  103. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/components/ui/tabs.tsx +0 -0
  104. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/index.css +0 -0
  105. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/lib/utils.ts +0 -0
  106. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/main.tsx +0 -0
  107. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/types.tsx +0 -0
  108. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/utils.tsx +0 -0
  109. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/src/vite-env.d.ts +0 -0
  110. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/tsconfig.app.json +0 -0
  111. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/tsconfig.json +0 -0
  112. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/tsconfig.node.json +0 -0
  113. {xspect-0.5.2 → xspect-0.5.4}/src/xspect/xspect-web/vite.config.ts +0 -0
  114. {xspect-0.5.2 → xspect-0.5.4}/tests/__init__.py +0 -0
  115. {xspect-0.5.2 → xspect-0.5.4}/tests/conftest.py +0 -0
  116. {xspect-0.5.2 → xspect-0.5.4}/tests/test_file_io.py +0 -0
  117. {xspect-0.5.2 → xspect-0.5.4}/tests/test_model_result.py +0 -0
  118. {xspect-0.5.2 → xspect-0.5.4}/tests/test_ncbi.py +0 -0
  119. {xspect-0.5.2 → xspect-0.5.4}/tests/test_probabilisitc_filter_mlst_model.py +0 -0
  120. {xspect-0.5.2 → xspect-0.5.4}/tests/test_probabilistic_filter_model.py +0 -0
  121. {xspect-0.5.2 → xspect-0.5.4}/tests/test_probabilistic_filter_svm_model.py +0 -0
  122. {xspect-0.5.2 → xspect-0.5.4}/tests/test_probabilistic_single_filter_model.py +0 -0
  123. {xspect-0.5.2 → xspect-0.5.4}/tests/test_pub_mlst_handler.py +0 -0
  124. {xspect-0.5.2 → xspect-0.5.4}/tests/test_train.py +0 -0
{xspect-0.5.2 → xspect-0.5.4}/.gitignore
@@ -177,4 +177,14 @@ out.png
  
  xspect-data/
  
- .devcontainer/
+ .devcontainer/
+
+ # Nextflow
+ .nextflow.log*
+ .nextflow/
+ work/
+ data/
+ results/
+
+ # Slurm
+ slurm-*
{xspect-0.5.2/src/XspecT.egg-info → xspect-0.5.4}/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: XspecT
- Version: 0.5.2
+ Version: 0.5.4
  Summary: Tool to monitor and characterize pathogens using Bloom filters.
  License: MIT License
  
xspect-0.5.4/docs/benchmark.md
@@ -0,0 +1,34 @@
+ # Benchmark
+
+ XspecT is a tool designed for fast and accurate species classification of genome assemblies and simulated reads. To evaluate its classification accuracy, we conducted a benchmark using a set of Acinetobacter genomes.
+
+ The benchmark was performed by first downloading all available Acinetobacter genomes from GenBank, filtered on a passed ("OK") taxonomy check status. Genomes assigned to strain IDs were remapped to their respective species IDs, after which genomes with species IDs not contained in XspecT's Acinetobacter model were removed. The remaining genomes were then used to classify both assemblies and simulated reads generated from them. Simulated reads were generated by first filtering for genomes that were not part of the training data and that were categorized as "complete" by NCBI. The reads were then simulated from the longest contig of each genome (assumed to be the chromosome) using a custom Python script. Up to three genomes were selected per species. 100,000 reads were simulated for each genome, with a read length of 100 bp and no simulated sequencing errors. The reads were then classified using XspecT, with predictions based on the maximum-scoring species.
+
+ ## Benchmark Results
+
+ The benchmark results show that XspecT achieves high classification accuracy, with an overall accuracy of 99.94% for whole genomes and 87.11% for simulated reads.
+
+ | Category        | Total     | Matches   | Mismatches | Match Rate | Mismatch Rate |
+ |-----------------|-----------|-----------|------------|------------|---------------|
+ | Assemblies      | 44,905    | 44,879    | 26         | 99.94%     | 0.06%         |
+ | Simulated reads | 9,000,000 | 7,839,877 | 1,160,123  | 87.11%     | 12.89%        |
+
+ ## Running the benchmark yourself
+
+ To benchmark XspecT performance yourself, you can use the Nextflow workflow provided in the `scripts/benchmark` directory. This workflow allows you to run XspecT on a set of samples and measure species classification accuracy on both genome assemblies and simulated reads.
+
+ Before you run the benchmark, you first need to download benchmarking data to the `data` directory, for example from NCBI. To do so, you can use the bash script in `scripts/benchmark-data`, which downloads the data using the [NCBI Datasets CLI](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/command-line-tools/download-and-install/); the CLI needs to be installed first. The script downloads all available Acinetobacter genomes, as well as taxonomic data.
+
+ To run the benchmark, install [Nextflow](https://www.nextflow.io/docs/latest/install.html) and run the following command:
+
+ ```bash
+ nextflow run scripts/benchmark
+ ```
+
+ This will execute the benchmark workflow, which classifies the assemblies, as well as reads generated from them, using XspecT. The results are saved in the `results` directory:
+
+ - `results/classifications.tsv` for the classifications of the assemblies
+ - `results/read_classifications.tsv` for the classifications of the simulated reads
+ - `results/confusion_matrix.png` for the confusion matrix of genome assembly classifications
+ - `results/mismatches_confusion_matrix.png` for a confusion matrix restricted to mismatched genome assembly classifications
+ - `results/stats.txt` for the statistics of the benchmark run
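Note (not part of the packaged docs): a full run classifies tens of thousands of assemblies, so it can be useful to lean on Nextflow's caching and stub support while iterating. A minimal sketch, assuming the workflow is invoked from the repository root as above:

```bash
# Resume a previous run, reusing cached results for tasks that already completed
nextflow run scripts/benchmark -resume

# Dry-run the workflow; processes that define a stub block (e.g. classifySample)
# execute their stub instead of the real command
nextflow run scripts/benchmark -stub-run
```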
{xspect-0.5.2 → xspect-0.5.4}/docs/cli.md
@@ -12,7 +12,7 @@ In general, XspecT commands will prompt you for parameters if they are not provi
  
  ## Model Management
  
- At its core, XspecT uses models to classify and filter samples. These models are based on kmer indices trained on publicly availabel genomes as well as, possibly, a support vector machine (SVM) classifier.
+ At its core, XspecT uses models to classify and filter samples. These models are based on kmer indices trained on publicly available genomes as well as, possibly, a support vector machine (SVM) classifier.
  
  To manage models, the `xspect models` command can be used. This command allows you to download, train, and view available models.
  
@@ -114,16 +114,24 @@ xspect classify species --sparse-sampling-step 10 Acinetobacter path
  
  This will only consider every 10th kmer in the sample.
  
+ ### Inclusion of display names
+ For better readability, the classification results show, by default, only the taxonomy ID of each species along with its corresponding score. To display the full names associated with each taxonomy ID, you can use the `--display-names` (or `-n`) option:
+
+ ```bash
+ xspect classify species --display-names Acinetobacter path
+ ```
+ The output will then be formatted as `Taxonomy_ID - Display_Name: Score` for each species.
+
  ### MLST Classification
  
  Samples can also be classified based on Multi-locus sequence type schemas. To MLST-classify a sample, run:
  
  ```bash
- xspect classify-mlst -p path
+ xspect classify mlst
  ```
  
  ## Filtering
- XspecT can also be used to filter samples based on their classification results. This is useful when analyzing metagenome samples, for example when looking at genomic bycatch.
+ XspecT can also be used to filter samples based on their classification results. This is useful when analyzing metagenomic samples, for example when looking at genomic bycatch.
  
  To filter samples, the command `xspect filter` can be used. This command will filter the samples based on the specified criteria.
  
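Note (not part of the packaged docs): the documented classification options can presumably be combined in a single call. A hedged sketch, assuming `--display-names` and `--sparse-sampling-step` can be mixed freely, with the same placeholder arguments used above:

```bash
# "Acinetobacter" is the model genus and "path" the input file or directory,
# as in the examples in cli.md; combinability of the options is an assumption
xspect classify species --display-names --sparse-sampling-step 10 Acinetobacter path
```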
{xspect-0.5.2 → xspect-0.5.4}/docs/contributing.md
@@ -20,11 +20,14 @@ Get started by cloning the repository:
  git clone https://github.com/BIONF/XspecT2.git
  ```
  
- You then need to build the web application using Vite. Navigate to the `xspect-web` directory and run the build command, which will also watch for changes:
+ You then need to build the web application using Vite. Navigate to the `xspect-web` directory, install dependencies, and run the build command, which will also watch for changes:
  ```bash
  cd XspecT2/src/xspect/xspect-web
  ```
  ```bash
+ npm i
+ ```
+ ```bash
  npx vite build --watch
  ```
  
{xspect-0.5.2 → xspect-0.5.4}/mkdocs.yml
@@ -15,5 +15,7 @@ nav:
  - Quickstart: quickstart.md
  - CLI: cli.md
  - "Web App": web.md
- - "Understanding XspecT": understanding.md
+ - "Understanding XspecT":
+     - understanding.md
+     - benchmark.md
  - Contributing: contributing.md
{xspect-0.5.2 → xspect-0.5.4}/pyproject.toml
@@ -1,6 +1,6 @@
  [project]
  name = "XspecT"
- version = "0.5.2"
+ version = "0.5.4"
  description = "Tool to monitor and characterize pathogens using Bloom filters."
  readme = {file = "README.md", content-type = "text/markdown"}
  license = {file = "LICENSE"}
xspect-0.5.4/scripts/benchmark/classify/main.nf
@@ -0,0 +1,22 @@
+ process classifySample {
+     conda "./scripts/benchmark/environment.yml"
+     cpus 4
+     memory '32 GB'
+
+     input:
+     path sample
+
+     output:
+     path "${sample.baseName}.json"
+
+     script:
+     """
+     xspect classify species -g Acinetobacter -i ${sample} -o ${sample.baseName}.json
+     """
+
+     stub:
+     """
+     mkdir -p results
+     touch results/${sample.baseName}.json
+     """
+ }
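For reference, the process above is a thin wrapper around a single CLI call, so the same classification can be run outside Nextflow. A sketch with a hypothetical input file name:

```bash
# GCA_000000000.1_example.fna is a placeholder assembly; the output JSON is
# named after the input's base name, mirroring the process above
xspect classify species -g Acinetobacter -i GCA_000000000.1_example.fna -o GCA_000000000.1_example.json
```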
xspect-0.5.4/scripts/benchmark/environment.yml
@@ -0,0 +1,7 @@
+ name: xspect-benchmark
+ channels:
+   - conda-forge
+ dependencies:
+   - pip
+   - pip:
+       - XspecT
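With `conda.enabled = true` in the workflow configuration (see `nextflow.config` below), Nextflow creates this environment automatically. Building it by hand is optional, e.g. for debugging a single step; a sketch:

```bash
# Create and activate the benchmark environment manually
# (the environment name comes from the YAML above)
conda env create -f scripts/benchmark/environment.yml
conda activate xspect-benchmark
```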
xspect-0.5.4/scripts/benchmark/main.nf
@@ -0,0 +1,473 @@
+ #!/usr/bin/env nextflow
+
+ include { classifySample as classifyAssembly } from './classify'
+ include { classifySample as classifyRead } from './classify'
+
+ process downloadModels {
+     conda "./scripts/benchmark/environment.yml"
+     cpus 2
+     memory '16 GB'
+
+     output:
+     path "species_model.json"
+
+     script:
+     """
+     if [ ! -f "$HOME/xspect-data/models/acinetobacter-species.json" ]; then
+         xspect models download
+     fi
+     cp "$HOME/xspect-data/models/acinetobacter-species.json" species_model.json
+     """
+ }
+
+ process getNameMapping {
+     conda "conda-forge::jq"
+     cpus 2
+     memory '16 GB'
+
+     input:
+     path species_model
+
+     output:
+     path "name_mapping.json"
+
+     script:
+     """
+     jq '.display_names | to_entries | map({key: .key, value: (.value | sub("Acinetobacter"; "A."))}) | from_entries' ${species_model} > name_mapping.json
+     """
+
+     stub:
+     """
+     touch name_mapping.json
+     """
+ }
+
+
+ process createAssemblyTable {
+     conda "conda-forge::ncbi-datasets-cli conda-forge::jq"
+     cpus 2
+     memory '16 GB'
+
+     input:
+     path genomes
+     path tax_report
+     path species_model
+
+     output:
+     path "assemblies.tsv"
+
+     when:
+     !file("assemblies.tsv").exists()
+
+
+     script:
+     """
+     inputfile="${genomes}/ncbi_dataset/data/assembly_data_report.jsonl"
+
+     dataformat tsv genome --inputfile \$inputfile --fields accession,assminfo-name,organism-tax-id,assminfo-level,ani-check-status > assemblies.tsv
+
+     # filter out assemblies with ANI check status other than "OK"
+     awk -F'\t' 'NR==1 || \$5 == "OK"' assemblies.tsv > assemblies_filtered.tsv
+     mv assemblies_filtered.tsv assemblies.tsv
+
+     # map taxonomic IDs to species IDs (taxonomic IDs might be strain IDs)
+     jq '
+       .reports
+       | map(select(.taxonomy.children != null))
+       | map({
+           species_id: .taxonomy.tax_id,
+           children: .taxonomy.children
+         })
+       | map(
+           . as \$entry
+           | \$entry.children
+           | map({ (tostring): \$entry.species_id })
+           | add
+         )
+       | add
+     ' ${tax_report} > tax_mapping.json
+
+     # add species IDs to assemblies.tsv
+     declare -A species_map
+     while IFS="=" read -r key val; do
+         species_map["\$key"]="\$val"
+     done < <(jq -r 'to_entries[] | "\\(.key)=\\(.value)"' tax_mapping.json)
+
+     {
+         IFS='\t' read -r -a header < assemblies.tsv
+         IFS='\t'; echo -e "\${header[*]}\tSpecies ID"
+
+         tail -n +2 assemblies.tsv | while IFS='\t' read -r acc name taxid level status; do
+             species_id="\${species_map[\$taxid]:-\$taxid}"
+             echo -e "\$acc\t\$name\t\$taxid\t\$level\t\$status\t\$species_id"
+         done
+     } > temp_assemblies.tsv
+     mv temp_assemblies.tsv assemblies.tsv
+
+     # filter out assemblies with species ID not in the species model
+     jq -r '.display_names | keys | .[]' ${species_model} > valid_species.txt
+     awk -F'\t' '
+         BEGIN {
+             while ((getline species < "valid_species.txt") > 0) {
+                 valid[species] = 1;
+             }
+             close("valid_species.txt");
+         }
+         NR==1 { print; next }
+         \$6 in valid { print }
+     ' assemblies.tsv > temp_assemblies.tsv
+     mv temp_assemblies.tsv assemblies.tsv
+     rm valid_species.txt
+     """
+
+     stub:
+     """
+     touch assemblies.tsv
+     """
+ }
+
+ process summarizeClassifications {
+     conda "jq"
+     cpus 2
+     memory '16 GB'
+     publishDir "results"
+
+     input:
+     path assemblies
+     path classifications
+
+     output:
+     path "classifications.tsv"
+
+     script:
+     """
+     cp ${assemblies} classifications.tsv
+
+     awk 'BEGIN {FS=OFS="\t"}
+         NR==1 {print \$0, "Prediction"}
+         NR>1 {print \$0, "unknown"}' classifications.tsv > temp_classifications.tsv
+     mv temp_classifications.tsv classifications.tsv
+
+     for json_file in ${classifications}; do
+         basename=\$(basename \$json_file .json)
+         accession=\$(echo \$basename | cut -d'_' -f1-2)
+         prediction=\$(jq '.["prediction"]' \$json_file | tr -d '"')
+
+         awk -v acc="\$accession" -v pred="\$prediction" 'BEGIN {FS=OFS="\t"}
+             NR==1 {print}
+             NR>1 && \$1 ~ acc {\$NF=pred; print}
+             NR>1 && \$1 !~ acc {print}' classifications.tsv > temp_classifications.tsv
+         mv temp_classifications.tsv classifications.tsv
+     done
+     """
+ }
+
+ process selectForReadGen {
+     conda "conda-forge::pandas"
+     cpus 2
+     memory '16 GB'
+
+     input:
+     path assemblies
+     path species_model
+
+     output:
+     path "selected_samples.tsv"
+
+     script:
+     """
+     #!/usr/bin/env python
+     import pandas as pd
+     import json
+
+     assemblies = pd.read_csv('${assemblies}', sep='\\t')
+
+     training_accessions = []
+     with open('${species_model}', 'r') as f:
+         species_model = json.load(f)
+         for id, accession in species_model["training_accessions"].items():
+             training_accessions.extend(accession)
+
+     assemblies = assemblies[assemblies['Assembly Level'] == 'Complete Genome']
+     assemblies = assemblies[~assemblies['Assembly Accession'].isin(training_accessions)]
+
+     # use up to three assemblies for each species
+     assemblies = assemblies.groupby('Species ID').head(3)
+
+     assemblies.to_csv('selected_samples.tsv', sep='\\t', index=False)
+     """
+ }
+
+ process generateReads {
+     conda "conda-forge::pandas conda-forge::biopython"
+     cpus 2
+     memory '16 GB'
+
+     input:
+     path sample
+
+     output:
+     path "${sample.baseName}_simulated.fq"
+
+     script:
+     """
+     #!/usr/bin/env python
+     import random
+     from Bio import SeqIO
+
+     read_length = 100
+     num_reads = 100000
+     seed = 42
+
+     random.seed(seed)
+     sequences = list(SeqIO.parse("${sample}", "fasta"))
+     chromosome_sequence = max(sequences, key=len)  # we assume the longest sequence is the chromosome
+
+     ch_rec_id = chromosome_sequence.id
+     ch_seq = chromosome_sequence.seq
+     ch_seqlen = len(chromosome_sequence.seq)
+     with open("${sample.baseName}_simulated.fq", "w") as f:
+         for i in range(num_reads):
+             start = random.randint(0, ch_seqlen - read_length)
+             read_seq = ch_seq[start:start + read_length]
+             f.write(f"@read_{i}_{ch_rec_id}_{start}-{start+read_length}\\n")
+             f.write(f"{read_seq}\\n")
+             f.write("+\\n")
+             f.write(f"{len(read_seq)*'~'}\\n")
+     """
+ }
+
+ process summarizeReadClassifications {
+     conda "conda-forge::jq"
+     cpus 2
+     memory '16 GB'
+     publishDir "results"
+
+     input:
+     path read_assemblies
+     path read_classifications
+
+     output:
+     path "read_classifications.tsv"
+
+     script:
+     """
+     echo -e "Assembly Accession\tRead\tPrediction\tSpecies ID" > read_classifications.tsv
+
+     for json_file in ${read_classifications}; do
+         basename=\$(basename \$json_file .json)
+         accession=\$(echo \$basename | cut -d'_' -f1-2)
+
+         # Get species ID from assemblies table
+         species_id=\$(awk -F'\t' -v acc="\$accession" '\$1 == acc {print \$6}' ${read_assemblies})
+
+         # Extract predictions from JSON and append to TSV
+         jq -r --arg acc "\$accession" --arg species "\$species_id" '
+             .scores
+             | to_entries[]
+             | select(.key != "total")
+             | "\\(.key)\\t\\(.value | to_entries | max_by(.value) | .key)"
+             | "\\(\$acc)\\t" + . + "\\t\\(\$species)"
+         ' "\$json_file" >> read_classifications.tsv
+     done
+     """
+ }
+
+ process calculateStats {
+     conda "conda-forge::pandas"
+     cpus 2
+     memory '16 GB'
+     publishDir "results"
+
+     input:
+     path assembly_classifications
+     path read_classifications
+
+     output:
+     path "stats.txt"
+
+     script:
+     """
+     #!/usr/bin/env python
+     import pandas as pd
+
+     df_assembly = pd.read_csv('${assembly_classifications}', sep='\\t')
+     df_assembly['Species ID'] = df_assembly['Species ID'].astype(str)
+     df_assembly['Prediction'] = df_assembly['Prediction'].astype(str)
+     assembly_matches = df_assembly.loc[df_assembly['Species ID'] == df_assembly['Prediction']]
+     assembly_mismatches = df_assembly.loc[df_assembly['Species ID'] != df_assembly['Prediction']]
+
+     df_read = pd.read_csv('${read_classifications}', sep='\\t')
+     df_read['Species ID'] = df_read['Species ID'].astype(str)
+     df_read['Prediction'] = df_read['Prediction'].astype(str)
+     read_matches = df_read.loc[df_read['Species ID'] == df_read['Prediction']]
+     read_mismatches = df_read.loc[df_read['Species ID'] != df_read['Prediction']]
+
+     with open('stats.txt', 'w') as f:
+         f.write(f"Assembly Total: {len(df_assembly)}\\n")
+         f.write(f"Assembly Matches: {len(assembly_matches)}\\n")
+         f.write(f"Assembly Mismatches: {len(assembly_mismatches)}\\n")
+         f.write(f"Assembly Match Rate: {len(assembly_matches) / len(df_assembly) * 100:.2f}%\\n")
+         f.write(f"Assembly Mismatch Rate: {len(assembly_mismatches) / len(df_assembly) * 100:.2f}%\\n")
+
+         f.write("\\n")
+
+         f.write(f"Read Total: {len(df_read)}\\n")
+         f.write(f"Read Matches: {len(read_matches)}\\n")
+         f.write(f"Read Mismatches: {len(read_mismatches)}\\n")
+         f.write(f"Read Match Rate: {len(read_matches) / len(df_read) * 100:.2f}%\\n")
+         f.write(f"Read Mismatch Rate: {len(read_mismatches) / len(df_read) * 100:.2f}%\\n")
+     """
+ }
+
+ process confusionMatrix {
+     conda "conda-forge::pandas conda-forge::scikit-learn conda-forge::numpy conda-forge::matplotlib"
+     cpus 2
+     memory '16 GB'
+     publishDir "results"
+
+     input:
+     path classifications
+     path name_mapping
+
+     output:
+     path "confusion_matrix.png"
+
+     script:
+     """
+     #!/usr/bin/env python
+     import pandas as pd
+     from sklearn.metrics import confusion_matrix
+     import matplotlib.pyplot as plt
+     import numpy as np
+     import json
+
+     df = pd.read_csv('${classifications}', sep='\\t')
+     y_true = df["Species ID"].astype(str)
+     y_pred = df["Prediction"].astype(str)
+
+     with open('${name_mapping}', 'r') as f:
+         name_mapping_dict = json.load(f)
+     labels = list(set(y_true) | set(y_pred))
+     labels = sorted(labels, key=lambda x: name_mapping_dict.get(x, x))
+     display_labels = [name_mapping_dict.get(label, label) for label in labels]
+
+     cm = confusion_matrix(y_true, y_pred, labels=labels)
+     cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
+
+     plt.figure(figsize=(30, 30))
+     plt.imshow(cm_normalized, interpolation='nearest', cmap=plt.cm.Blues)
+     plt.colorbar()
+
+     plt.xticks(ticks=np.arange(len(labels)), labels=display_labels, rotation=90, fontsize=12)
+     plt.yticks(ticks=np.arange(len(labels)), labels=display_labels, fontsize=12)
+
+     plt.title('Xspect Acinetobacter Confusion Matrix', fontsize=24)
+     plt.xlabel('Predicted Labels', fontsize=20)
+     plt.ylabel('True Labels', fontsize=20)
+
+     plt.savefig('confusion_matrix.png', dpi=300, bbox_inches='tight')
+     """
+ }
+
+ process mismatchConfusionMatrix {
+     conda "conda-forge::pandas conda-forge::scikit-learn conda-forge::numpy conda-forge::matplotlib"
+     cpus 2
+     memory '16 GB'
+     publishDir "results"
+
+     input:
+     path classifications
+     path name_mapping
+
+     output:
+     path "mismatches_confusion_matrix.png"
+
+     script:
+     """
+     #!/usr/bin/env python
+     import pandas as pd
+     from sklearn.metrics import confusion_matrix
+     import matplotlib.pyplot as plt
+     import numpy as np
+     import json
+
+
+     df = pd.read_csv('${classifications}', sep='\\t')
+     df["Species ID"] = df["Species ID"].astype(str)
+     df["Prediction"] = df["Prediction"].astype(str)
+     df_comparison_mismatch = df[df["Species ID"] != df["Prediction"]]
+
+     with open('${name_mapping}', 'r') as f:
+         name_mapping_dict = json.load(f)
+     y_true = df_comparison_mismatch["Species ID"]
+     y_pred = df_comparison_mismatch["Prediction"]
+
+     labels = list(set(y_true) | set(y_pred))
+     labels = sorted(labels, key=lambda x: name_mapping_dict.get(x, x))
+     display_labels = [name_mapping_dict.get(label, label) for label in labels]
+
+     cm = confusion_matrix(y_true, y_pred, labels=labels)
+
+     plt.figure(figsize=(30, 30))
+     plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
+     cbar = plt.colorbar()
+     cbar.ax.tick_params(labelsize=20)
+
+     plt.xticks(ticks=np.arange(len(labels)), labels=display_labels, rotation=90, fontsize=16)
+     plt.yticks(ticks=np.arange(len(labels)), labels=display_labels, fontsize=16)
+
+     thresh = cm.max() / 2.
+     for i in range(cm.shape[0]):
+         for j in range(cm.shape[1]):
+             plt.text(j, i, format(cm[i, j], 'd'),  # 'd' ensures integer formatting
+                      horizontalalignment="center",
+                      color="white" if cm[i, j] > thresh else "black",
+                      fontsize=14)
+
+     plt.title('Mismatches Confusion Matrix', fontsize=30)
+     plt.xlabel('Predicted Labels', fontsize=24)
+     plt.ylabel('True Labels', fontsize=24)
+
+     plt.savefig('mismatches_confusion_matrix.png', dpi=300, bbox_inches='tight')
+     """
+ }
+
+
+ workflow {
+     species_model = downloadModels()
+     name_mapping = getNameMapping(species_model)
+     genomes = file("data/genomes")
+     tax_report = file("data/aci_species.json")
+     assemblies = createAssemblyTable(genomes, tax_report, species_model)
+
+     // Whole genome assemblies
+     samples = Channel.fromPath("${genomes}/**/*.fna")
+         .flatten()
+     filtered_samples = assemblies
+         .splitCsv(header: true, sep: '\t')
+         .map { row -> row['Assembly Accession'] }
+         .cross(samples.map { sample ->
+             [sample.baseName.split('_')[0..1].join('_'), sample]
+         })
+         .map { it[1][1] }
+     classifications = classifyAssembly(filtered_samples)
+     summarizeClassifications(assemblies, classifications.collect())
+     confusionMatrix(summarizeClassifications.out, name_mapping)
+     mismatchConfusionMatrix(summarizeClassifications.out, name_mapping)
+
+     // Simulated reads
+     selectForReadGen(assemblies, species_model)
+     read_assemblies = selectForReadGen.out
+         .splitCsv(header: true, sep: '\t')
+         .map { row -> row['Assembly Accession'] }
+         .cross(samples.map { sample ->
+             [sample.baseName.split('_')[0..1].join('_'), sample]
+         })
+         .map { it[1][1] }
+     generateReads(read_assemblies)
+     read_classifications = classifyRead(generateReads.out)
+     summarizeReadClassifications(selectForReadGen.out, read_classifications.collect())
+
+     calculateStats(summarizeClassifications.out, summarizeReadClassifications.out)
+ }
xspect-0.5.4/scripts/benchmark/nextflow.config
@@ -0,0 +1,7 @@
+ process.executor = 'slurm'
+ executor.account = 'intern'
+ process.queue = 'all'
+ executor.perCpuMemAllocation = true
+
+
+ conda.enabled = true
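This configuration targets a Slurm cluster with an `intern` account. To try the workflow without Slurm, the executor can be overridden through an additional config file passed with `-c`; a sketch, assuming a local machine with sufficient resources:

```bash
# Hypothetical local override; -c layers an extra config file on top of nextflow.config
cat > local.config <<'EOF'
process.executor = 'local'
EOF
nextflow run scripts/benchmark -c local.config
```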
xspect-0.5.4/scripts/benchmark-data/download_data.slurm
@@ -0,0 +1,13 @@
+ #!/bin/bash
+ #SBATCH --partition=all
+ #SBATCH --account=intern
+ #SBATCH --cpus-per-task=4
+ #SBATCH --mem-per-cpu=8gb
+ #SBATCH --job-name="download_acinetobacter"
+
+ datasets download genome taxon 469 --filename acinetobacter_dataset.zip --assembly-source GenBank --assembly-version latest --exclude-atypical --dehydrated
+ unzip -o acinetobacter_dataset.zip -d genomes
+ datasets rehydrate --directory genomes
+ rm acinetobacter_dataset.zip
+
+ datasets summary taxonomy taxon 469 --rank species --children > aci_species.json
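The workflow reads `data/genomes` and `data/aci_species.json` (see `main.nf` above), so the download script is presumably submitted from inside the `data` directory; a sketch:

```bash
# Submit the download as a Slurm job from within the data directory,
# so genomes/ and aci_species.json end up where the workflow expects them
mkdir -p data && cd data
sbatch ../scripts/benchmark-data/download_data.slurm
```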
{xspect-0.5.2 → xspect-0.5.4/src/XspecT.egg-info}/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: XspecT
- Version: 0.5.2
+ Version: 0.5.4
  Summary: Tool to monitor and characterize pathogens using Bloom filters.
  License: MIT License
  
{xspect-0.5.2 → xspect-0.5.4}/src/XspecT.egg-info/SOURCES.txt
@@ -8,12 +8,18 @@ pyproject.toml
  .github/workflows/pylint.yml
  .github/workflows/pypi.yml
  .github/workflows/test.yml
+ docs/benchmark.md
  docs/cli.md
  docs/contributing.md
  docs/index.md
  docs/quickstart.md
  docs/understanding.md
  docs/web.md
+ scripts/benchmark/environment.yml
+ scripts/benchmark/main.nf
+ scripts/benchmark/nextflow.config
+ scripts/benchmark-data/download_data.slurm
+ scripts/benchmark/classify/main.nf
  src/XspecT.egg-info/PKG-INFO
  src/XspecT.egg-info/SOURCES.txt
  src/XspecT.egg-info/dependency_links.txt