XspecT 0.1.3__tar.gz → 0.2.2__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- xspect-0.2.2/.github/workflows/pylint.yml +13 -0
- xspect-0.2.2/.github/workflows/pypi.yml +61 -0
- {XspecT-0.1.3 → xspect-0.2.2}/.gitignore +5 -1
- {XspecT-0.1.3 → xspect-0.2.2}/PKG-INFO +23 -30
- xspect-0.2.2/README.md +59 -0
- xspect-0.2.2/docs/cli.md +114 -0
- xspect-0.2.2/docs/diagrams/probabilistic_filter_models.md +52 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/index.md +4 -0
- xspect-0.2.2/docs/input_data.md +4 -0
- xspect-0.2.2/docs/installation.md +35 -0
- xspect-0.2.2/docs/web.md +3 -0
- {XspecT-0.1.3 → xspect-0.2.2}/pyproject.toml +8 -15
- {XspecT-0.1.3 → xspect-0.2.2}/src/XspecT.egg-info/PKG-INFO +23 -30
- {XspecT-0.1.3 → xspect-0.2.2}/src/XspecT.egg-info/SOURCES.txt +25 -44
- {XspecT-0.1.3 → xspect-0.2.2}/src/XspecT.egg-info/requires.txt +7 -14
- xspect-0.2.2/src/xspect/definitions.py +42 -0
- xspect-0.2.2/src/xspect/download_filters.py +33 -0
- xspect-0.2.2/src/xspect/fastapi.py +105 -0
- xspect-0.2.2/src/xspect/file_io.py +87 -0
- xspect-0.2.2/src/xspect/main.py +123 -0
- xspect-0.2.2/src/xspect/model_management.py +88 -0
- xspect-0.2.2/src/xspect/models/probabilistic_filter_model.py +277 -0
- xspect-0.2.2/src/xspect/models/probabilistic_filter_svm_model.py +169 -0
- xspect-0.2.2/src/xspect/models/probabilistic_single_filter_model.py +109 -0
- xspect-0.2.2/src/xspect/models/result.py +148 -0
- xspect-0.2.2/src/xspect/pipeline.py +201 -0
- xspect-0.2.2/src/xspect/run.py +38 -0
- xspect-0.2.2/src/xspect/train.py +304 -0
- xspect-0.2.2/src/xspect/train_filter/create_svm.py +45 -0
- xspect-0.2.2/src/xspect/train_filter/extract_and_concatenate.py +124 -0
- {XspecT-0.1.3 → xspect-0.2.2}/src/xspect/train_filter/html_scrap.py +16 -28
- {XspecT-0.1.3 → xspect-0.2.2}/src/xspect/train_filter/ncbi_api/download_assemblies.py +7 -8
- {XspecT-0.1.3 → xspect-0.2.2}/src/xspect/train_filter/ncbi_api/ncbi_assembly_metadata.py +9 -17
- {XspecT-0.1.3 → xspect-0.2.2}/src/xspect/train_filter/ncbi_api/ncbi_children_tree.py +3 -2
- {XspecT-0.1.3 → xspect-0.2.2}/src/xspect/train_filter/ncbi_api/ncbi_taxon_metadata.py +7 -5
- xspect-0.2.2/tests/__init__.py +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/tests/conftest.py +54 -1
- xspect-0.2.2/tests/test_file_io.py +56 -0
- xspect-0.2.2/tests/test_model_management.py +45 -0
- xspect-0.2.2/tests/test_model_result.py +25 -0
- xspect-0.2.2/tests/test_pipeline.py +26 -0
- xspect-0.2.2/tests/test_probabilistic_filter_model.py +161 -0
- xspect-0.2.2/tests/test_probabilistic_filter_svm_model.py +94 -0
- xspect-0.2.2/tests/test_probabilistic_single_filter_model.py +56 -0
- xspect-0.2.2/tests/test_train.py +15 -0
- XspecT-0.1.3/README.md +0 -59
- XspecT-0.1.3/data/Results/Readme.txt +0 -54
- XspecT-0.1.3/data/Results/Strain-Typing/Assemblys as reference, all k-mers.csv +0 -3079
- XspecT-0.1.3/data/Results/Strain-Typing/Assemblys as reference, quick run.csv +0 -3241
- XspecT-0.1.3/data/Results/Strain-Typing/Core-Genome as Reference, quick run.csv +0 -3079
- XspecT-0.1.3/data/Results/Strain-Typing/Core-Genome as reference, all k-mers.csv +0 -3079
- XspecT-0.1.3/data/Results/Strain-Typing/Reads as Input, Assembly as Reference, 15% of the nucleotides changed.csv +0 -300
- XspecT-0.1.3/data/Results/Strain-Typing/Reads as Input, Assembly as Reference.csv +0 -300
- XspecT-0.1.3/data/Results/Strain-Typing/Reads as Input, Core-Genome as Reference, 15% of the nucleotides changed.csv +0 -300
- XspecT-0.1.3/data/Results/Strain-Typing/Reads as Input, Core-Genome as Reference.csv +0 -300
- XspecT-0.1.3/data/Results/Strain-Typing/StratifiedKFold.py +0 -495
- XspecT-0.1.3/data/Results/Strain-Typing/genomes_and_labels.csv +0 -2709
- XspecT-0.1.3/data/Results/XspecT_mini_csv/README +0 -2
- XspecT-0.1.3/src/xspect/BF_v2.py +0 -637
- XspecT-0.1.3/src/xspect/Bootstrap.py +0 -29
- XspecT-0.1.3/src/xspect/Classifier.py +0 -142
- XspecT-0.1.3/src/xspect/OXA_Table.py +0 -53
- XspecT-0.1.3/src/xspect/WebApp.py +0 -724
- XspecT-0.1.3/src/xspect/XspecT_mini.py +0 -1363
- XspecT-0.1.3/src/xspect/XspecT_trainer.py +0 -611
- XspecT-0.1.3/src/xspect/download_filters.py +0 -48
- XspecT-0.1.3/src/xspect/file_io.py +0 -156
- XspecT-0.1.3/src/xspect/main.py +0 -119
- XspecT-0.1.3/src/xspect/map_kmers.py +0 -155
- XspecT-0.1.3/src/xspect/search_filter.py +0 -504
- XspecT-0.1.3/src/xspect/static/How-To.png +0 -0
- XspecT-0.1.3/src/xspect/static/Logo.png +0 -0
- XspecT-0.1.3/src/xspect/static/Logo2.png +0 -0
- XspecT-0.1.3/src/xspect/static/Workflow_AspecT.png +0 -0
- XspecT-0.1.3/src/xspect/static/Workflow_ClAssT.png +0 -0
- XspecT-0.1.3/src/xspect/static/js.js +0 -615
- XspecT-0.1.3/src/xspect/static/main.css +0 -280
- XspecT-0.1.3/src/xspect/templates/400.html +0 -64
- XspecT-0.1.3/src/xspect/templates/401.html +0 -62
- XspecT-0.1.3/src/xspect/templates/404.html +0 -62
- XspecT-0.1.3/src/xspect/templates/500.html +0 -62
- XspecT-0.1.3/src/xspect/templates/about.html +0 -544
- XspecT-0.1.3/src/xspect/templates/home.html +0 -51
- XspecT-0.1.3/src/xspect/templates/layoutabout.html +0 -87
- XspecT-0.1.3/src/xspect/templates/layouthome.html +0 -63
- XspecT-0.1.3/src/xspect/templates/layoutspecies.html +0 -468
- XspecT-0.1.3/src/xspect/templates/species.html +0 -33
- XspecT-0.1.3/src/xspect/train_filter/README_XspecT_Erweiterung.md +0 -119
- XspecT-0.1.3/src/xspect/train_filter/create_svm.py +0 -222
- XspecT-0.1.3/src/xspect/train_filter/extract_and_concatenate.py +0 -128
- XspecT-0.1.3/src/xspect/train_filter/get_paths.py +0 -35
- XspecT-0.1.3/src/xspect/train_filter/interface_XspecT.py +0 -204
- XspecT-0.1.3/src/xspect/train_filter/k_mer_count.py +0 -162
- XspecT-0.1.3/tests/test_cli.py +0 -82
- XspecT-0.1.3/tests/test_file_io.py +0 -145
- XspecT-0.1.3/tests/test_flask.py +0 -151
- {XspecT-0.1.3 → xspect-0.2.2}/.github/workflows/black.yml +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/.github/workflows/docs.yml +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/.github/workflows/test.yml +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/LICENSE +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/About.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/AddFilter.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/AddSpecies1.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/AddSpecies2.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/BF.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/ClAssT_Ergebnis1.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/ClAssT_Ergebnis2.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/ClAssT_Ergebnis3.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/ClAssT_Hauptseite.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/CommandLine_Input.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/CommandLine_results.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/CommandLine_whole.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/How2Use.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/HowtouseAspecT.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/XspecT_Ergebnis1.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/XspecT_Ergebnis2.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/XspecT_Ergebnis3.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/XspecT_Ergebnis4.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/XspecT_Hauptseite.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/XspecT_Runtime.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/XspecT_Runtime_Oxa.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/XspecT_Startseite.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/change_pw.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/modify_vecs.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Instructions/pictures/secretkey.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/Makefile +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/conf.py +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/img/logo.png +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/make.bat +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/docs/quickstart.md +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/setup.cfg +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/src/XspecT.egg-info/dependency_links.txt +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/src/XspecT.egg-info/entry_points.txt +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/src/XspecT.egg-info/top_level.txt +0 -0
- {XspecT-0.1.3 → xspect-0.2.2}/src/xspect/__init__.py +0 -0
- {XspecT-0.1.3/src/xspect/train_filter → xspect-0.2.2/src/xspect/models}/__init__.py +0 -0
- {XspecT-0.1.3/src/xspect/train_filter/ncbi_api → xspect-0.2.2/src/xspect/train_filter}/__init__.py +0 -0
- {XspecT-0.1.3/tests → xspect-0.2.2/src/xspect/train_filter/ncbi_api}/__init__.py +0 -0

xspect-0.2.2/.github/workflows/pylint.yml
ADDED
@@ -0,0 +1,13 @@
+name: pylint
+on: [push]
+jobs:
+  black:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Pylint Code Linter
+        run: |
+          python -m pip install --upgrade pip
+          pip install .
+          pip install pylint
+          pylint --fail-under=9.0 src/xspect

xspect-0.2.2/.github/workflows/pypi.yml
ADDED
@@ -0,0 +1,61 @@
+name: Publish to PyPI
+
+on:
+  workflow_dispatch:
+  release:
+    types:
+      - published
+jobs:
+  release-build:
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.x"
+
+      - name: Build release distributions
+        run: |
+          python -m pip install build
+          python -m build
+
+      - name: Upload distributions
+        uses: actions/upload-artifact@v4
+        with:
+          name: release-dists
+          path: dist/
+
+
+  upload_testpypi:
+    needs: [release-build]
+    runs-on: ubuntu-latest
+    environment: testpypi
+    permissions:
+      id-token: write
+    if: github.event_name == 'release' && github.event.action == 'published' && github.event.release.prerelease
+    steps:
+      - uses: actions/download-artifact@v4
+        with:
+          name: release-dists
+          path: dist/
+
+      - uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          repository-url: https://test.pypi.org/legacy/
+
+  upload_pypi:
+    needs: [release-build]
+    runs-on: ubuntu-latest
+    environment: pypi
+    permissions:
+      id-token: write
+    if: github.event_name == 'release' && github.event.action == 'published' && !github.event.release.prerelease
+    steps:
+      - uses: actions/download-artifact@v4
+        with:
+          name: release-dists
+          path: dist/
+
+      - uses: pypa/gh-action-pypi-publish@release/v1
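
The two `if:` expressions in this workflow route a published GitHub release: a prerelease is uploaded to TestPyPI, a full release to PyPI, and a manual `workflow_dispatch` run only executes `release-build`. The Python sketch below restates that routing as a plain function so the cases can be compared side by side; `publish_target` is a hypothetical name and is not part of the workflow or of GitHub Actions.

```python
from typing import Optional


def publish_target(event_name: str, action: Optional[str], prerelease: bool) -> Optional[str]:
    """Mirror the `if:` expressions of the upload_testpypi and upload_pypi jobs.

    Hypothetical helper for illustration only. Returns the upload job that
    would run, or None when only the release-build job runs.
    """
    is_published_release = event_name == "release" and action == "published"
    if is_published_release and prerelease:
        return "upload_testpypi"  # uploads to https://test.pypi.org/legacy/
    if is_published_release and not prerelease:
        return "upload_pypi"  # uploads to the default PyPI index
    return None  # e.g. workflow_dispatch: distributions are built, not uploaded


if __name__ == "__main__":
    for event in [
        ("release", "published", True),
        ("release", "published", False),
        ("workflow_dispatch", None, False),
    ]:
        print(event, "->", publish_target(*event))
```

Both upload jobs also declare `needs: [release-build]`, so neither can start before the `release-dists` artifact exists.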

{XspecT-0.1.3 → xspect-0.2.2}/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: XspecT
-Version: 0.
+Version: 0.2.2
 Summary: Tool to monitor and characterize pathogens using Bloom filters.
 License: MIT License
 
@@ -32,26 +32,19 @@ Classifier: License :: OSI Approved :: MIT License
 Requires-Python: >=3.10
 Description-Content-Type: text/markdown
 License-File: LICENSE
-Requires-Dist: Flask
-Requires-Dist: Flask-WTF
-Requires-Dist: WTForms
-Requires-Dist: Werkzeug
 Requires-Dist: biopython
-Requires-Dist: bitarray
-Requires-Dist: mmh3
-Requires-Dist: numpy
-Requires-Dist: pandas
 Requires-Dist: requests
 Requires-Dist: scikit-learn
-Requires-Dist: Psutil
-Requires-Dist: Matplotlib
-Requires-Dist: Pympler
-Requires-Dist: H5py
 Requires-Dist: Bio
-Requires-Dist: wheel
 Requires-Dist: loguru
-Requires-Dist: Pympler
 Requires-Dist: click
+Requires-Dist: python-slugify
+Requires-Dist: cobs-reloaded
+Requires-Dist: rbloom
+Requires-Dist: xxhash
+Requires-Dist: fastapi
+Requires-Dist: uvicorn
+Requires-Dist: python-multipart
 Provides-Extra: docs
 Requires-Dist: sphinx; extra == "docs"
 Requires-Dist: furo; extra == "docs"
@@ -63,9 +56,13 @@ Requires-Dist: pytest; extra == "test"
 Requires-Dist: pytest-cov; extra == "test"
 
 # XspecT - Acinetobacter Species Assignment Tool
-
+
+[](https://github.com/pylint-dev/pylint)
+[](https://github.com/psf/black)
+
+<img src="/src/docs/img/logo.png" height="50%" width="50%">
 <!-- start intro -->
-XspecT is a Python-based tool to taxonomically classify sequence-reads (or assembled genomes) on the species and/or sub-type level using [Bloom Filters]
+XspecT is a Python-based tool to taxonomically classify sequence-reads (or assembled genomes) on the species and/or sub-type level using [Bloom Filters] and a [Support Vector Machine]. It also identifies existing [blaOxa-genes] and provides a list of relevant research papers for further information.
 <br/><br/>
 
 XspecT utilizes the uniqueness of kmers and compares extracted kmers from the input-data to a reference database. Bloom Filter ensure a fast lookup in this process. For a final prediction the results are classified using a Support Vector Machine.
@@ -75,6 +72,10 @@ Local extensions of the reference database are supported.
 <br/>
 
 The tool is available as a web-based application and a smaller command line interface.
+
+[Bloom Filters]: https://en.wikipedia.org/wiki/Bloom_filter
+[Support Vector Machine]: https://en.wikipedia.org/wiki/Support-vector_machine
+[blaOxa-genes]: https://en.wikipedia.org/wiki/Beta-lactamase#OXA_beta-lactamases_(class_D)
 <!-- end intro -->
 
 <!-- start quickstart -->
@@ -83,11 +84,7 @@ To install Xspect, please download the lastest 64 bit Python version and install
 ```
 pip install xspect
 ```
-
-```
-conda install -c bioconda jellyfish
-```
-On Apple Silicon, it is possible that this command installs an incorrect Jellyfish package. Please refer to the official [Jellyfish project](https://github.com/gmarcais/Jellyfish) for installation guidance.
+Please note that Apple Silicon is currently not supported.
 
 ## Usage
 ### Get the Bloomfilters
@@ -101,9 +98,9 @@ xspect train you-ncbi-genus-name
 ```
 
 ### How to run the web app
-
+To run the web app, install and run [XspecT Web](https://github.com/aromberg/xspect-web). Additionally, run XspecT in API mode:
 ```
-xspect
+xspect api
 ```
 
 ### How to use the XspecT command line interface
@@ -111,13 +108,9 @@ Run xspect with the configuration you want to run it with as arguments.
 ```
 xspect classify your-genus path/to/your/input-set
 ```
-For further instructions on how to use the command line interface, execute:
+For further instructions on how to use the command line interface, please refer to the [documentation] or execute:
 ```
 xspect --help
 ```
+[documentation]: https://bionf.github.io/XspecT2/cli.html
 <!-- end quickstart -->
-
-## Input Data
-XspecT is able to use either raw sequence-reads (FASTQ-format .fq/.fastq) or already assembled genomes (FASTA-format .fasta/.fna). Using sequence-reads saves up the assembly process but high-quality reads with a low error-rate are needed (e.g. Illumina-reads).
-
-The amount of reads that will be used has to be set by the user when using sequence-reads. The minimum amount is 5000 reads for species classification and 500 reads for sub-type classification. The maximum number of reads is limited by the browser and is usually around ~8 million reads. Using more reads will lead to a increased runtime (xsec./1mio reads).
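
PKG-INFO follows the RFC 822-style header layout of the Python core metadata format, so the dependency changes in this hunk can be read back with the standard library. A minimal sketch, assuming a trimmed copy of the new 0.2.2 header block; `runtime_requirements` is a hypothetical helper that drops entries guarded by an `extra ==` marker:

```python
from email.parser import HeaderParser

# Trimmed excerpt of the 0.2.2 PKG-INFO header block (not the complete file).
PKG_INFO = """\
Metadata-Version: 2.1
Name: XspecT
Version: 0.2.2
Requires-Python: >=3.10
Requires-Dist: biopython
Requires-Dist: click
Requires-Dist: rbloom
Requires-Dist: fastapi
Provides-Extra: docs
Requires-Dist: sphinx; extra == "docs"
"""


def runtime_requirements(pkg_info: str) -> list[str]:
    """Return the Requires-Dist entries that are not tied to an optional extra."""
    headers = HeaderParser().parsestr(pkg_info)
    requirements = headers.get_all("Requires-Dist") or []
    return [req for req in requirements if "extra ==" not in req]


if __name__ == "__main__":
    meta = HeaderParser().parsestr(PKG_INFO)
    print(meta["Name"], meta["Version"])
    print(runtime_requirements(PKG_INFO))
```

For an installed distribution, `importlib.metadata.requires()` returns the same `Requires-Dist` strings without parsing the file by hand.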

xspect-0.2.2/README.md
ADDED
@@ -0,0 +1,59 @@
+# XspecT - Acinetobacter Species Assignment Tool
+
+[](https://github.com/pylint-dev/pylint)
+[](https://github.com/psf/black)
+
+<img src="/src/docs/img/logo.png" height="50%" width="50%">
+<!-- start intro -->
+XspecT is a Python-based tool to taxonomically classify sequence-reads (or assembled genomes) on the species and/or sub-type level using [Bloom Filters] and a [Support Vector Machine]. It also identifies existing [blaOxa-genes] and provides a list of relevant research papers for further information.
+<br/><br/>
+
+XspecT utilizes the uniqueness of kmers and compares extracted kmers from the input-data to a reference database. Bloom Filter ensure a fast lookup in this process. For a final prediction the results are classified using a Support Vector Machine.
+<br/>
+
+Local extensions of the reference database are supported.
+<br/>
+
+The tool is available as a web-based application and a smaller command line interface.
+
+[Bloom Filters]: https://en.wikipedia.org/wiki/Bloom_filter
+[Support Vector Machine]: https://en.wikipedia.org/wiki/Support-vector_machine
+[blaOxa-genes]: https://en.wikipedia.org/wiki/Beta-lactamase#OXA_beta-lactamases_(class_D)
+<!-- end intro -->
+
+<!-- start quickstart -->
+## Installation
+To install Xspect, please download the lastest 64 bit Python version and install the package using pip:
+```
+pip install xspect
+```
+Please note that Apple Silicon is currently not supported.
+
+## Usage
+### Get the Bloomfilters
+To download basic pre-trained filters, you can use the built-in command:
+```
+xspect download-filters
+```
+Additional species filters can be trained using:
+```
+xspect train you-ncbi-genus-name
+```
+
+### How to run the web app
+To run the web app, install and run [XspecT Web](https://github.com/aromberg/xspect-web). Additionally, run XspecT in API mode:
+```
+xspect api
+```
+
+### How to use the XspecT command line interface
+Run xspect with the configuration you want to run it with as arguments.
+```
+xspect classify your-genus path/to/your/input-set
+```
+For further instructions on how to use the command line interface, please refer to the [documentation] or execute:
+```
+xspect --help
+```
+[documentation]: https://bionf.github.io/XspecT2/cli.html
+<!-- end quickstart -->
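
The README intro describes the core idea: k-mers are extracted from the input and looked up in per-species Bloom filters. The sketch below illustrates that lookup with a self-contained Bloom filter built on `hashlib.blake2b` and double hashing. It is not XspecT's implementation (the 0.2.2 dependency list points to `cobs-reloaded` and `rbloom` for that); the filter size, hash count and `k` are arbitrary illustration values, and all names are hypothetical.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive all bit positions from one 128-bit digest.
        digest = hashlib.blake2b(item.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # odd step size
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        # False positives are possible; false negatives are not.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def kmers(sequence: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of a sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def build_filter(reference: str, k: int) -> BloomFilter:
    """Insert all k-mers of a reference sequence into a new filter."""
    bloom = BloomFilter()
    for kmer in kmers(reference, k):
        bloom.add(kmer)
    return bloom


def kmer_score(read: str, bloom: BloomFilter, k: int) -> float:
    """Fraction of the read's k-mers that the filter reports as present."""
    read_kmers = list(kmers(read, k))
    if not read_kmers:
        return 0.0
    return sum(kmer in bloom for kmer in read_kmers) / len(read_kmers)
```

A read taken from the reference always scores 1.0 because a Bloom filter has no false negatives; an unrelated read scores near 0, and what remains comes from false positives, which become rarer as the bit array grows relative to the number of inserted k-mers.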

xspect-0.2.2/docs/cli.md
ADDED
@@ -0,0 +1,114 @@
+# How to use the CLI
+
+XspecT comes with a built-in command line interface (CLI), which enables quick classifications without the need to use the web interface. The command line interface can also be used to download and train filters.
+
+After installing XspecT, a list of available commands can be viewed by running:
+
+```bash
+xspect --help
+```
+
+## Filter downloads
+
+A basic set of pre-trained filters (Acinetobacter and Salonella) can be downloaded using the following command:
+
+```bash
+xspect download-filters
+```
+
+For the moment, it is not possible to specify exactly which filters should be downloaded.
+
+## Classification
+
+To classify samples, the command
+
+```bash
+xspect classify GENUS PATH
+```
+
+can be used, when `GENUS` refers to the NCBI genus name of your sample and `PATH` refers to the path to your sample *directory*. This command will classify the species of your sample within the given genus.
+
+The following options are available:
+
+```bash
+  -s, --species / --no-species  Species classification.
+  -i, --ic / --no-ic            IC strain typing.
+  -o, --oxa / --no-oxa          OXA gene family detection.
+  -m, --meta / --no-meta        Metagenome classification.
+  -c, --complete                Use every single k-mer as input for
+                                classification.
+  -s, --save                    Save results to csv file.
+  --help                        Show this message and exit.
+```
+
+### Species Classification
+
+Species classification is run by default, without the need for further parameters:
+```bash
+xspect classify Acinetobacter path
+```
+
+Species classification can be toggled using the `-s`/`--species` (`--no-species`) option. To run classification without species classification, the option `--no-species` can be used, for example when running a different analysis:
+
+```bash
+xspect classify --no-species -i Acinetobacter path
+```
+
+### IC Strain Typing
+
+To perform International Clonal (IC) type classification, the `-i`/`--ic` (`--no-ic`) option can be used:
+
+```bash
+xspect classify -i Acinetobacter path
+```
+
+Please note that IC strain typing is only available for Acinetobacter baumanii.
+
+### OXA Gene Detection
+
+OXA gene detection can be enabled using the `-o`/`--oxa` (`--no-oxa`) option.
+
+```bash
+xspect classify -o Acinetobacter path
+```
+
+### Metagenome Mode
+
+To analyze a sample in metagenome mode, the `-m`/`--meta` (`--no-meta`) option can be used:
+
+```bash
+xspect classify -m Acinetobacter path
+```
+
+Compared to normal XspecT modes, this mode first identifies reads belonging to the given genus and continues classification only with the resulting reads and is thus more suitable for metagenomic samples as the resulting runtime is decreased.
+
+## Filter Training
+
+<aside>
+⚠️ Depending on genome size and the amount of species, training can take time!
+
+</aside>
+
+In order to train filters, please first ensure [Jellyfish](https://github.com/gmarcais/Jellyfish) is installed.
+
+### NCBI-based filter training
+
+The easiest way to train new filters is to use data from NCBI, which is automatically downloaded and processed by XspecT.
+
+To train a filter with data from NCBI, run the following command:
+
+```bash
+xspect train your-ncbi-genus
+```
+
+`you-ncbi-genus` can be a genus name from NCBI or an NCBI taxonomy ID.
+
+### Custom data filter training
+
+XspecT filters can also be trained using custom data, which need to be provided as a folder for both filter and SVM training. The provided assembly files need to be in FASTA format and their names should be the species ID and the species name, for example `28901_enterica.fasta`. While the ID can be arbitrary, the standard is NCBI taxon IDs.
+
+The filters can then be trained using:
+
+```bash
+xspect train -bf-path directory/1 -svm-path directory/2
+```
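
The custom-training section of `docs/cli.md` expects FASTA files named `<species ID>_<species name>`, as in `28901_enterica.fasta`. A minimal sketch of how such a folder could be checked before running `xspect train`, assuming the FASTA suffixes listed in the Input Data docs (`.fasta`/`.fna`); `parse_training_file` and `collect_training_set` are hypothetical helpers, not XspecT functions.

```python
from pathlib import Path

# Assumption: training folders accept the same FASTA suffixes as classification input.
FASTA_SUFFIXES = {".fasta", ".fna"}


def parse_training_file(path: Path) -> tuple[str, str]:
    """Split '<species-id>_<species-name>.fasta' into (species_id, species_name)."""
    if path.suffix.lower() not in FASTA_SUFFIXES:
        raise ValueError(f"not a FASTA file: {path.name}")
    # Split on the first underscore only, so multi-word names stay intact.
    species_id, sep, species_name = path.stem.partition("_")
    if not sep or not species_id or not species_name:
        raise ValueError(f"expected '<id>_<name>{path.suffix}', got {path.name}")
    return species_id, species_name


def collect_training_set(directory: Path) -> dict[str, str]:
    """Map species IDs to species names for every FASTA file in a folder."""
    return dict(
        parse_training_file(entry)
        for entry in sorted(directory.iterdir())
        if entry.is_file() and entry.suffix.lower() in FASTA_SUFFIXES
    )
```

Splitting on the first underscore keeps a name such as `28901_salmonella_enterica.fasta` readable as the pair `("28901", "salmonella_enterica")`.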

xspect-0.2.2/docs/diagrams/probabilistic_filter_models.md
ADDED
@@ -0,0 +1,52 @@
+:::mermaid
+classDiagram
+    ProbabilisticFilterModel <|-- ProbabilisticFilterSVMModel
+    ProbabilisticFilterModel <|-- ProbabilisticSingleFilterModel
+    ProbabilisticFilterModel : String filter_display_name
+    ProbabilisticFilterModel : String author
+    ProbabilisticFilterModel : String author_email
+    ProbabilisticFilterModel : String filter_type
+    ProbabilisticFilterModel : Path base_path
+    ProbabilisticFilterModel : dict display_names
+    ProbabilisticFilterModel : int k
+    ProbabilisticFilterModel : float fpr
+    ProbabilisticFilterModel : int num_hashes
+    ProbabilisticFilterModel : dict filters
+    ProbabilisticFilterModel : dict non_distinct_kmer_counts
+    ProbabilisticFilterModel : __init__(k, filter_display_name, author, author_email, filter_type, base_path, fpr, num_hashes)
+    ProbabilisticFilterModel : __dict__()
+    ProbabilisticFilterModel : to_dict()
+    ProbabilisticFilterModel : slug()
+    ProbabilisticFilterModel : fit(dir_path, display_names)
+    ProbabilisticFilterModel: calculate_hits(sequence, filter_ids)
+    ProbabilisticFilterModel : predict(sequence_input, filter_ids)
+    ProbabilisticFilterModel : filter(sequences, threshold, filter_ids)
+    ProbabilisticFilterModel : save()
+    ProbabilisticFilterModel : load(path)
+    ProbabilisticFilterModel : _convert_cobs_result_to_dict(cobs_result)
+    ProbabilisticFilterModel : _count_kmers(sequence_input)
+    ProbabilisticFilterModel : _get_cobs_index_path()
+
+    class ProbabilisticFilterSVMModel {
+        SVM svm
+        String svm_kernel
+        float svm_c
+        __init__(..., kernel, c)
+        to_dict()
+        set_svm_params(kernel, c)
+        fit(dir_path, svm_path, display_names)
+        predict(sequence_input, filter_ids) return_by_display_names)
+        load(path)
+        _get_svm(id_keys)
+
+    }
+
+    class ProbabilisticSingleFilterModel {
+        Bloom filter
+        __init__(...)
+        fit(file_path, display_name)
+        calculate_hits(sequence)
+        load(path)
+        _generate_kmers(sequence)
+    }
+:::
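
The class diagram lists `calculate_hits(sequence, filter_ids)` and `predict(sequence_input, filter_ids)` on `ProbabilisticFilterModel` but does not show their logic. The sketch below is one plausible reading, not the XspecT code: exact Python sets stand in for the probabilistic filters, hits are counted per filter, normalised by the number of k-mers in the query, and the display name of the best-scoring filter is returned. `ToyFilterModel` and `fit_sequence` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFilterModel:
    """Hypothetical stand-in mirroring calculate_hits()/predict() from the diagram.

    Exact sets replace the Bloom/COBS filters, so this toy has no false
    positives; only the control flow is illustrated.
    """

    k: int
    display_names: dict[str, str] = field(default_factory=dict)
    filters: dict[str, set[str]] = field(default_factory=dict)

    def _kmers(self, sequence: str) -> list[str]:
        return [sequence[i:i + self.k] for i in range(len(sequence) - self.k + 1)]

    def fit_sequence(self, filter_id: str, sequence: str, display_name: str) -> None:
        """Store the k-mers of one reference sequence under a filter ID."""
        self.filters[filter_id] = set(self._kmers(sequence))
        self.display_names[filter_id] = display_name

    def calculate_hits(self, sequence: str, filter_ids=None) -> dict[str, int]:
        """Count, per filter, how many k-mers of the sequence it contains."""
        ids = filter_ids or list(self.filters)
        query = self._kmers(sequence)
        return {fid: sum(kmer in self.filters[fid] for kmer in query) for fid in ids}

    def predict(self, sequence: str, filter_ids=None) -> tuple[str, dict[str, float]]:
        """Normalise hits by the k-mer count and return the best display name."""
        hits = self.calculate_hits(sequence, filter_ids)
        total = max(len(self._kmers(sequence)), 1)
        scores = {fid: count / total for fid, count in hits.items()}
        best = max(scores, key=scores.get)
        return self.display_names[best], scores
```

For `ProbabilisticFilterSVMModel`, the diagram's `svm` attribute and the README's mention of a Support Vector Machine suggest that a score vector like this is passed to an SVM rather than decided by a plain maximum.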

xspect-0.2.2/docs/input_data.md
ADDED
@@ -0,0 +1,4 @@
+# Input Data
+XspecT is able to use either raw sequence-reads (FASTQ-format .fq/.fastq) or already assembled genomes (FASTA-format .fasta/.fna). Using sequence-reads saves up the assembly process but high-quality reads with a low error-rate are needed (e.g. Illumina-reads).
+
+The amount of reads that will be used has to be set by the user when using sequence-reads. The minimum amount is 5000 reads for species classification and 500 reads for sub-type classification. The maximum number of reads is limited by the browser and is usually around ~8 million reads. Using more reads will lead to a increased runtime (xsec./1mio reads).
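
The Input Data page sets minimums of 5000 reads for species classification and 500 reads for sub-type classification. A minimal sketch for checking a FASTQ file against those numbers, assuming the common four-lines-per-record FASTQ layout with no wrapped sequence lines; the mode keys `species` and `subtype` and all function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable

# Minimum read counts quoted in docs/input_data.md.
MIN_READS = {"species": 5000, "subtype": 500}


def count_fastq_records(lines: Iterable[str]) -> int:
    """Count FASTQ records, assuming exactly four non-empty lines per record."""
    non_empty = sum(1 for line in lines if line.strip())
    if non_empty % 4:
        raise ValueError("line count is not a multiple of 4; not a 4-line FASTQ")
    return non_empty // 4


def count_fastq_reads(path: Path) -> int:
    """Count the reads in a FASTQ file on disk."""
    with open(path, encoding="utf-8") as handle:
        return count_fastq_records(handle)


def enough_reads(n_reads: int, mode: str) -> bool:
    """Compare a read count with the documented minimum for a mode."""
    return n_reads >= MIN_READS[mode]
```

For FASTQ files with wrapped records, counting through Biopython's `Bio.SeqIO.parse(path, "fastq")` is more robust, and Biopython is already a dependency of the package.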

xspect-0.2.2/docs/installation.md
ADDED
@@ -0,0 +1,35 @@
+# Installation
+
+## Installation using PyPi
+
+The easiest way to install XspecT is to use PyPi on the latest version of Python:
+
+```bash
+pip install xspect
+```
+
+Check if the installation was successful:
+
+```bash
+xspect --version
+```
+
+## Manual Installation
+
+If you would like to manually install XspecT, clone the Github repository to a local working directory. You can now install XspecT by running:
+
+```bash
+pip install .
+```
+
+For development purposes, it is recommended to install the package in edit mode:
+
+```bash
+pip install -e '.[all]'
+```
+
+Check if the installation was successful:
+
+```bash
+xspect --version
+```
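
Besides `xspect --version`, the installed version can be checked from Python with `importlib.metadata`, which is useful in scripts that should fail early when XspecT is missing. A minimal sketch; it assumes the distribution name `XspecT` as declared in `pyproject.toml`, and `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "XspecT") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"XspecT {found} is installed" if found else "XspecT is not installed")
```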
xspect-0.2.2/docs/web.md
ADDED

{XspecT-0.1.3 → xspect-0.2.2}/pyproject.toml
@@ -1,31 +1,24 @@
 [project]
 name = "XspecT"
-version = "0.
+version = "0.2.2"
 description = "Tool to monitor and characterize pathogens using Bloom filters."
 readme = {file = "README.md", content-type = "text/markdown"}
 license = {file = "LICENSE"}
 requires-python = ">=3.10"
 dependencies = [
-    "Flask",
-    "Flask-WTF",
-    "WTForms",
-    "Werkzeug",
     "biopython",
-    "bitarray",
-    "mmh3",
-    "numpy",
-    "pandas",
     "requests",
     "scikit-learn",
-    "Psutil",
-    "Matplotlib",
-    "Pympler",
-    "H5py",
     "Bio",
-    "wheel",
     "loguru",
-    "Pympler",
     "click",
+    "python-slugify",
+    "cobs-reloaded",
+    "rbloom",
+    "xxhash",
+    "fastapi",
+    "uvicorn",
+    "python-multipart"
 ]
 classifiers = [
     "Intended Audience :: Developers",

{XspecT-0.1.3 → xspect-0.2.2}/src/XspecT.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: XspecT
-Version: 0.
+Version: 0.2.2
 Summary: Tool to monitor and characterize pathogens using Bloom filters.
 License: MIT License
 
@@ -32,26 +32,19 @@ Classifier: License :: OSI Approved :: MIT License
 Requires-Python: >=3.10
 Description-Content-Type: text/markdown
 License-File: LICENSE
-Requires-Dist: Flask
-Requires-Dist: Flask-WTF
-Requires-Dist: WTForms
-Requires-Dist: Werkzeug
 Requires-Dist: biopython
-Requires-Dist: bitarray
-Requires-Dist: mmh3
-Requires-Dist: numpy
-Requires-Dist: pandas
 Requires-Dist: requests
 Requires-Dist: scikit-learn
-Requires-Dist: Psutil
-Requires-Dist: Matplotlib
-Requires-Dist: Pympler
-Requires-Dist: H5py
 Requires-Dist: Bio
-Requires-Dist: wheel
 Requires-Dist: loguru
-Requires-Dist: Pympler
 Requires-Dist: click
+Requires-Dist: python-slugify
+Requires-Dist: cobs-reloaded
+Requires-Dist: rbloom
+Requires-Dist: xxhash
+Requires-Dist: fastapi
+Requires-Dist: uvicorn
+Requires-Dist: python-multipart
 Provides-Extra: docs
 Requires-Dist: sphinx; extra == "docs"
 Requires-Dist: furo; extra == "docs"
@@ -63,9 +56,13 @@ Requires-Dist: pytest; extra == "test"
 Requires-Dist: pytest-cov; extra == "test"
 
 # XspecT - Acinetobacter Species Assignment Tool
-
+
+[](https://github.com/pylint-dev/pylint)
+[](https://github.com/psf/black)
+
+<img src="/src/docs/img/logo.png" height="50%" width="50%">
 <!-- start intro -->
-XspecT is a Python-based tool to taxonomically classify sequence-reads (or assembled genomes) on the species and/or sub-type level using [Bloom Filters]
+XspecT is a Python-based tool to taxonomically classify sequence-reads (or assembled genomes) on the species and/or sub-type level using [Bloom Filters] and a [Support Vector Machine]. It also identifies existing [blaOxa-genes] and provides a list of relevant research papers for further information.
 <br/><br/>
 
 XspecT utilizes the uniqueness of kmers and compares extracted kmers from the input-data to a reference database. Bloom Filter ensure a fast lookup in this process. For a final prediction the results are classified using a Support Vector Machine.
@@ -75,6 +72,10 @@ Local extensions of the reference database are supported.
 <br/>
 
 The tool is available as a web-based application and a smaller command line interface.
+
+[Bloom Filters]: https://en.wikipedia.org/wiki/Bloom_filter
+[Support Vector Machine]: https://en.wikipedia.org/wiki/Support-vector_machine
+[blaOxa-genes]: https://en.wikipedia.org/wiki/Beta-lactamase#OXA_beta-lactamases_(class_D)
 <!-- end intro -->
 
 <!-- start quickstart -->
@@ -83,11 +84,7 @@ To install Xspect, please download the lastest 64 bit Python version and install
 ```
 pip install xspect
 ```
-
-```
-conda install -c bioconda jellyfish
-```
-On Apple Silicon, it is possible that this command installs an incorrect Jellyfish package. Please refer to the official [Jellyfish project](https://github.com/gmarcais/Jellyfish) for installation guidance.
+Please note that Apple Silicon is currently not supported.
 
 ## Usage
 ### Get the Bloomfilters
@@ -101,9 +98,9 @@ xspect train you-ncbi-genus-name
 ```
 
 ### How to run the web app
-
+To run the web app, install and run [XspecT Web](https://github.com/aromberg/xspect-web). Additionally, run XspecT in API mode:
 ```
-xspect
+xspect api
 ```
 
 ### How to use the XspecT command line interface
@@ -111,13 +108,9 @@ Run xspect with the configuration you want to run it with as arguments.
 ```
 xspect classify your-genus path/to/your/input-set
 ```
-For further instructions on how to use the command line interface, execute:
+For further instructions on how to use the command line interface, please refer to the [documentation] or execute:
 ```
 xspect --help
 ```
+[documentation]: https://bionf.github.io/XspecT2/cli.html
 <!-- end quickstart -->
-
-## Input Data
-XspecT is able to use either raw sequence-reads (FASTQ-format .fq/.fastq) or already assembled genomes (FASTA-format .fasta/.fna). Using sequence-reads saves up the assembly process but high-quality reads with a low error-rate are needed (e.g. Illumina-reads).
-
-The amount of reads that will be used has to be set by the user when using sequence-reads. The minimum amount is 5000 reads for species classification and 500 reads for sub-type classification. The maximum number of reads is limited by the browser and is usually around ~8 million reads. Using more reads will lead to a increased runtime (xsec./1mio reads).