XspecT 0.1.3__tar.gz → 0.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


This version of XspecT might be problematic.

Files changed (138)
  1. xspect-0.2.0/.github/workflows/pylint.yml +13 -0
  2. xspect-0.2.0/.github/workflows/pypi.yml +61 -0
  3. {XspecT-0.1.3 → xspect-0.2.0}/.gitignore +5 -1
  4. {XspecT-0.1.3 → xspect-0.2.0}/PKG-INFO +23 -30
  5. xspect-0.2.0/README.md +59 -0
  6. xspect-0.2.0/docs/cli.md +114 -0
  7. xspect-0.2.0/docs/diagrams/probabilistic_filter_models.md +52 -0
  8. {XspecT-0.1.3 → xspect-0.2.0}/docs/index.md +4 -0
  9. xspect-0.2.0/docs/input_data.md +4 -0
  10. xspect-0.2.0/docs/installation.md +35 -0
  11. xspect-0.2.0/docs/web.md +3 -0
  12. {XspecT-0.1.3 → xspect-0.2.0}/pyproject.toml +8 -15
  13. {XspecT-0.1.3 → xspect-0.2.0}/src/XspecT.egg-info/PKG-INFO +23 -30
  14. {XspecT-0.1.3 → xspect-0.2.0}/src/XspecT.egg-info/SOURCES.txt +25 -44
  15. {XspecT-0.1.3 → xspect-0.2.0}/src/XspecT.egg-info/requires.txt +7 -14
  16. xspect-0.2.0/src/xspect/definitions.py +42 -0
  17. xspect-0.2.0/src/xspect/download_filters.py +33 -0
  18. xspect-0.2.0/src/xspect/fastapi.py +101 -0
  19. xspect-0.2.0/src/xspect/file_io.py +87 -0
  20. xspect-0.2.0/src/xspect/main.py +123 -0
  21. xspect-0.2.0/src/xspect/model_management.py +88 -0
  22. xspect-0.2.0/src/xspect/models/probabilistic_filter_model.py +277 -0
  23. xspect-0.2.0/src/xspect/models/probabilistic_filter_svm_model.py +169 -0
  24. xspect-0.2.0/src/xspect/models/probabilistic_single_filter_model.py +109 -0
  25. xspect-0.2.0/src/xspect/models/result.py +148 -0
  26. xspect-0.2.0/src/xspect/pipeline.py +201 -0
  27. xspect-0.2.0/src/xspect/run.py +38 -0
  28. xspect-0.2.0/src/xspect/train.py +304 -0
  29. xspect-0.2.0/src/xspect/train_filter/create_svm.py +45 -0
  30. xspect-0.2.0/src/xspect/train_filter/extract_and_concatenate.py +124 -0
  31. {XspecT-0.1.3 → xspect-0.2.0}/src/xspect/train_filter/html_scrap.py +16 -28
  32. {XspecT-0.1.3 → xspect-0.2.0}/src/xspect/train_filter/ncbi_api/download_assemblies.py +7 -8
  33. {XspecT-0.1.3 → xspect-0.2.0}/src/xspect/train_filter/ncbi_api/ncbi_assembly_metadata.py +9 -17
  34. {XspecT-0.1.3 → xspect-0.2.0}/src/xspect/train_filter/ncbi_api/ncbi_children_tree.py +3 -2
  35. {XspecT-0.1.3 → xspect-0.2.0}/src/xspect/train_filter/ncbi_api/ncbi_taxon_metadata.py +7 -5
  36. xspect-0.2.0/tests/__init__.py +0 -0
  37. {XspecT-0.1.3 → xspect-0.2.0}/tests/conftest.py +54 -1
  38. xspect-0.2.0/tests/test_file_io.py +56 -0
  39. xspect-0.2.0/tests/test_model_management.py +45 -0
  40. xspect-0.2.0/tests/test_model_result.py +25 -0
  41. xspect-0.2.0/tests/test_pipeline.py +26 -0
  42. xspect-0.2.0/tests/test_probabilistic_filter_model.py +161 -0
  43. xspect-0.2.0/tests/test_probabilistic_filter_svm_model.py +94 -0
  44. xspect-0.2.0/tests/test_probabilistic_single_filter_model.py +56 -0
  45. xspect-0.2.0/tests/test_train.py +15 -0
  46. XspecT-0.1.3/README.md +0 -59
  47. XspecT-0.1.3/data/Results/Readme.txt +0 -54
  48. XspecT-0.1.3/data/Results/Strain-Typing/Assemblys as reference, all k-mers.csv +0 -3079
  49. XspecT-0.1.3/data/Results/Strain-Typing/Assemblys as reference, quick run.csv +0 -3241
  50. XspecT-0.1.3/data/Results/Strain-Typing/Core-Genome as Reference, quick run.csv +0 -3079
  51. XspecT-0.1.3/data/Results/Strain-Typing/Core-Genome as reference, all k-mers.csv +0 -3079
  52. XspecT-0.1.3/data/Results/Strain-Typing/Reads as Input, Assembly as Reference, 15% of the nucleotides changed.csv +0 -300
  53. XspecT-0.1.3/data/Results/Strain-Typing/Reads as Input, Assembly as Reference.csv +0 -300
  54. XspecT-0.1.3/data/Results/Strain-Typing/Reads as Input, Core-Genome as Reference, 15% of the nucleotides changed.csv +0 -300
  55. XspecT-0.1.3/data/Results/Strain-Typing/Reads as Input, Core-Genome as Reference.csv +0 -300
  56. XspecT-0.1.3/data/Results/Strain-Typing/StratifiedKFold.py +0 -495
  57. XspecT-0.1.3/data/Results/Strain-Typing/genomes_and_labels.csv +0 -2709
  58. XspecT-0.1.3/data/Results/XspecT_mini_csv/README +0 -2
  59. XspecT-0.1.3/src/xspect/BF_v2.py +0 -637
  60. XspecT-0.1.3/src/xspect/Bootstrap.py +0 -29
  61. XspecT-0.1.3/src/xspect/Classifier.py +0 -142
  62. XspecT-0.1.3/src/xspect/OXA_Table.py +0 -53
  63. XspecT-0.1.3/src/xspect/WebApp.py +0 -724
  64. XspecT-0.1.3/src/xspect/XspecT_mini.py +0 -1363
  65. XspecT-0.1.3/src/xspect/XspecT_trainer.py +0 -611
  66. XspecT-0.1.3/src/xspect/download_filters.py +0 -48
  67. XspecT-0.1.3/src/xspect/file_io.py +0 -156
  68. XspecT-0.1.3/src/xspect/main.py +0 -119
  69. XspecT-0.1.3/src/xspect/map_kmers.py +0 -155
  70. XspecT-0.1.3/src/xspect/search_filter.py +0 -504
  71. XspecT-0.1.3/src/xspect/static/How-To.png +0 -0
  72. XspecT-0.1.3/src/xspect/static/Logo.png +0 -0
  73. XspecT-0.1.3/src/xspect/static/Logo2.png +0 -0
  74. XspecT-0.1.3/src/xspect/static/Workflow_AspecT.png +0 -0
  75. XspecT-0.1.3/src/xspect/static/Workflow_ClAssT.png +0 -0
  76. XspecT-0.1.3/src/xspect/static/js.js +0 -615
  77. XspecT-0.1.3/src/xspect/static/main.css +0 -280
  78. XspecT-0.1.3/src/xspect/templates/400.html +0 -64
  79. XspecT-0.1.3/src/xspect/templates/401.html +0 -62
  80. XspecT-0.1.3/src/xspect/templates/404.html +0 -62
  81. XspecT-0.1.3/src/xspect/templates/500.html +0 -62
  82. XspecT-0.1.3/src/xspect/templates/about.html +0 -544
  83. XspecT-0.1.3/src/xspect/templates/home.html +0 -51
  84. XspecT-0.1.3/src/xspect/templates/layoutabout.html +0 -87
  85. XspecT-0.1.3/src/xspect/templates/layouthome.html +0 -63
  86. XspecT-0.1.3/src/xspect/templates/layoutspecies.html +0 -468
  87. XspecT-0.1.3/src/xspect/templates/species.html +0 -33
  88. XspecT-0.1.3/src/xspect/train_filter/README_XspecT_Erweiterung.md +0 -119
  89. XspecT-0.1.3/src/xspect/train_filter/create_svm.py +0 -222
  90. XspecT-0.1.3/src/xspect/train_filter/extract_and_concatenate.py +0 -128
  91. XspecT-0.1.3/src/xspect/train_filter/get_paths.py +0 -35
  92. XspecT-0.1.3/src/xspect/train_filter/interface_XspecT.py +0 -204
  93. XspecT-0.1.3/src/xspect/train_filter/k_mer_count.py +0 -162
  94. XspecT-0.1.3/tests/test_cli.py +0 -82
  95. XspecT-0.1.3/tests/test_file_io.py +0 -145
  96. XspecT-0.1.3/tests/test_flask.py +0 -151
  97. {XspecT-0.1.3 → xspect-0.2.0}/.github/workflows/black.yml +0 -0
  98. {XspecT-0.1.3 → xspect-0.2.0}/.github/workflows/docs.yml +0 -0
  99. {XspecT-0.1.3 → xspect-0.2.0}/.github/workflows/test.yml +0 -0
  100. {XspecT-0.1.3 → xspect-0.2.0}/LICENSE +0 -0
  101. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/About.png +0 -0
  102. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/AddFilter.png +0 -0
  103. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/AddSpecies1.png +0 -0
  104. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/AddSpecies2.png +0 -0
  105. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/BF.png +0 -0
  106. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/ClAssT_Ergebnis1.png +0 -0
  107. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/ClAssT_Ergebnis2.png +0 -0
  108. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/ClAssT_Ergebnis3.png +0 -0
  109. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/ClAssT_Hauptseite.png +0 -0
  110. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/CommandLine_Input.png +0 -0
  111. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/CommandLine_results.png +0 -0
  112. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/CommandLine_whole.png +0 -0
  113. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/How2Use.png +0 -0
  114. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/HowtouseAspecT.png +0 -0
  115. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/XspecT_Ergebnis1.png +0 -0
  116. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/XspecT_Ergebnis2.png +0 -0
  117. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/XspecT_Ergebnis3.png +0 -0
  118. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/XspecT_Ergebnis4.png +0 -0
  119. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/XspecT_Hauptseite.png +0 -0
  120. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/XspecT_Runtime.png +0 -0
  121. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/XspecT_Runtime_Oxa.png +0 -0
  122. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/XspecT_Startseite.png +0 -0
  123. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/change_pw.png +0 -0
  124. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/modify_vecs.png +0 -0
  125. {XspecT-0.1.3 → xspect-0.2.0}/docs/Instructions/pictures/secretkey.png +0 -0
  126. {XspecT-0.1.3 → xspect-0.2.0}/docs/Makefile +0 -0
  127. {XspecT-0.1.3 → xspect-0.2.0}/docs/conf.py +0 -0
  128. {XspecT-0.1.3 → xspect-0.2.0}/docs/img/logo.png +0 -0
  129. {XspecT-0.1.3 → xspect-0.2.0}/docs/make.bat +0 -0
  130. {XspecT-0.1.3 → xspect-0.2.0}/docs/quickstart.md +0 -0
  131. {XspecT-0.1.3 → xspect-0.2.0}/setup.cfg +0 -0
  132. {XspecT-0.1.3 → xspect-0.2.0}/src/XspecT.egg-info/dependency_links.txt +0 -0
  133. {XspecT-0.1.3 → xspect-0.2.0}/src/XspecT.egg-info/entry_points.txt +0 -0
  134. {XspecT-0.1.3 → xspect-0.2.0}/src/XspecT.egg-info/top_level.txt +0 -0
  135. {XspecT-0.1.3 → xspect-0.2.0}/src/xspect/__init__.py +0 -0
  136. {XspecT-0.1.3/src/xspect/train_filter → xspect-0.2.0/src/xspect/models}/__init__.py +0 -0
  137. {XspecT-0.1.3/src/xspect/train_filter/ncbi_api → xspect-0.2.0/src/xspect/train_filter}/__init__.py +0 -0
  138. {XspecT-0.1.3/tests → xspect-0.2.0/src/xspect/train_filter/ncbi_api}/__init__.py +0 -0
@@ -0,0 +1,13 @@
+ name: pylint
+ on: [push]
+ jobs:
+   black:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - name: Pylint Code Linter
+         run: |
+           python -m pip install --upgrade pip
+           pip install .
+           pip install pylint
+           pylint --fail-under=9.0 src/xspect
@@ -0,0 +1,61 @@
+ name: Publish to PyPI
+
+ on:
+   workflow_dispatch:
+   release:
+     types:
+       - published
+ jobs:
+   release-build:
+     runs-on: ubuntu-latest
+
+     steps:
+       - uses: actions/checkout@v4
+
+       - uses: actions/setup-python@v5
+         with:
+           python-version: "3.x"
+
+       - name: Build release distributions
+         run: |
+           python -m pip install build
+           python -m build
+
+       - name: Upload distributions
+         uses: actions/upload-artifact@v4
+         with:
+           name: release-dists
+           path: dist/
+
+
+   upload_testpypi:
+     needs: [release-build]
+     runs-on: ubuntu-latest
+     environment: testpypi
+     permissions:
+       id-token: write
+     if: github.event_name == 'release' && github.event.action == 'published' && github.event.release.prerelease
+     steps:
+       - uses: actions/download-artifact@v4
+         with:
+           name: release-dists
+           path: dist/
+
+       - uses: pypa/gh-action-pypi-publish@release/v1
+         with:
+           repository-url: https://test.pypi.org/legacy/
+
+   upload_pypi:
+     needs: [release-build]
+     runs-on: ubuntu-latest
+     environment: pypi
+     permissions:
+       id-token: write
+     if: github.event_name == 'release' && github.event.action == 'published' && !github.event.release.prerelease
+     steps:
+       - uses: actions/download-artifact@v4
+         with:
+           name: release-dists
+           path: dist/
+
+       - uses: pypa/gh-action-pypi-publish@release/v1
@@ -174,4 +174,8 @@ saved_options.txt

  files/
  out.gv
- out.png
+ out.png
+
+ xspect-data/
+
+ .devcontainer/
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: XspecT
- Version: 0.1.3
+ Version: 0.2.0
  Summary: Tool to monitor and characterize pathogens using Bloom filters.
  License: MIT License

@@ -32,26 +32,19 @@ Classifier: License :: OSI Approved :: MIT License
  Requires-Python: >=3.10
  Description-Content-Type: text/markdown
  License-File: LICENSE
- Requires-Dist: Flask
- Requires-Dist: Flask-WTF
- Requires-Dist: WTForms
- Requires-Dist: Werkzeug
  Requires-Dist: biopython
- Requires-Dist: bitarray
- Requires-Dist: mmh3
- Requires-Dist: numpy
- Requires-Dist: pandas
  Requires-Dist: requests
  Requires-Dist: scikit-learn
- Requires-Dist: Psutil
- Requires-Dist: Matplotlib
- Requires-Dist: Pympler
- Requires-Dist: H5py
  Requires-Dist: Bio
- Requires-Dist: wheel
  Requires-Dist: loguru
- Requires-Dist: Pympler
  Requires-Dist: click
+ Requires-Dist: python-slugify
+ Requires-Dist: cobs-reloaded
+ Requires-Dist: rbloom
+ Requires-Dist: xxhash
+ Requires-Dist: fastapi
+ Requires-Dist: uvicorn
+ Requires-Dist: python-multipart
  Provides-Extra: docs
  Requires-Dist: sphinx; extra == "docs"
  Requires-Dist: furo; extra == "docs"
@@ -63,9 +56,13 @@ Requires-Dist: pytest; extra == "test"
  Requires-Dist: pytest-cov; extra == "test"

  # XspecT - Acinetobacter Species Assignment Tool
- <img src="/src/xspect/static/Logo.png" height="50%" width="50%">
+ ![Test](https://github.com/bionf/xspect2/actions/workflows/test.yml/badge.svg)
+ [![linting: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen)](https://github.com/pylint-dev/pylint)
+ [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+
+ <img src="/src/docs/img/logo.png" height="50%" width="50%">
  <!-- start intro -->
- XspecT is a Python-based tool to taxonomically classify sequence-reads (or assembled genomes) on the species and/or sub-type level using [Bloom Filters](https://en.wikipedia.org/wiki/Bloom_filter) and a [Support Vector Machine](https://en.wikipedia.org/wiki/Support-vector_machine). It also identifies existing [blaOxa-genes](https://en.wikipedia.org/wiki/Beta-lactamase#OXA_beta-lactamases_(class_D)) and provides a list of relevant research papers for further information.
+ XspecT is a Python-based tool to taxonomically classify sequence-reads (or assembled genomes) on the species and/or sub-type level using [Bloom Filters] and a [Support Vector Machine]. It also identifies existing [blaOxa-genes] and provides a list of relevant research papers for further information.
  <br/><br/>

  XspecT utilizes the uniqueness of kmers and compares extracted kmers from the input-data to a reference database. Bloom Filter ensure a fast lookup in this process. For a final prediction the results are classified using a Support Vector Machine.
@@ -75,6 +72,10 @@ Local extensions of the reference database are supported.
  <br/>

  The tool is available as a web-based application and a smaller command line interface.
+
+ [Bloom Filters]: https://en.wikipedia.org/wiki/Bloom_filter
+ [Support Vector Machine]: https://en.wikipedia.org/wiki/Support-vector_machine
+ [blaOxa-genes]: https://en.wikipedia.org/wiki/Beta-lactamase#OXA_beta-lactamases_(class_D)
  <!-- end intro -->

  <!-- start quickstart -->
@@ -83,11 +84,7 @@ To install Xspect, please download the lastest 64 bit Python version and install
  ```
  pip install xspect
  ```
- If you would like to train filters yourself, you need to install Jellyfish, which is used to count distinct k-meres in the assemblies. It can be installed using bioconda:
- ```
- conda install -c bioconda jellyfish
- ```
- On Apple Silicon, it is possible that this command installs an incorrect Jellyfish package. Please refer to the official [Jellyfish project](https://github.com/gmarcais/Jellyfish) for installation guidance.
+ Please note that Apple Silicon is currently not supported.

  ## Usage
  ### Get the Bloomfilters
@@ -101,9 +98,9 @@ xspect train you-ncbi-genus-name
  ```

  ### How to run the web app
- Run the following command lines in a console, a browser window will open automatically after the application is fully loaded.
+ To run the web app, install and run [XspecT Web](https://github.com/aromberg/xspect-web). Additionally, run XspecT in API mode:
  ```
- xspect web
+ xspect api
  ```

  ### How to use the XspecT command line interface
@@ -111,13 +108,9 @@ Run xspect with the configuration you want to run it with as arguments.
  ```
  xspect classify your-genus path/to/your/input-set
  ```
- For further instructions on how to use the command line interface, execute:
+ For further instructions on how to use the command line interface, please refer to the [documentation] or execute:
  ```
  xspect --help
  ```
+ [documentation]: https://bionf.github.io/XspecT2/cli.html
  <!-- end quickstart -->
-
- ## Input Data
- XspecT is able to use either raw sequence-reads (FASTQ-format .fq/.fastq) or already assembled genomes (FASTA-format .fasta/.fna). Using sequence-reads saves up the assembly process but high-quality reads with a low error-rate are needed (e.g. Illumina-reads).
-
- The amount of reads that will be used has to be set by the user when using sequence-reads. The minimum amount is 5000 reads for species classification and 500 reads for sub-type classification. The maximum number of reads is limited by the browser and is usually around ~8 million reads. Using more reads will lead to a increased runtime (xsec./1mio reads).
xspect-0.2.0/README.md ADDED
@@ -0,0 +1,59 @@
+ # XspecT - Acinetobacter Species Assignment Tool
+ ![Test](https://github.com/bionf/xspect2/actions/workflows/test.yml/badge.svg)
+ [![linting: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen)](https://github.com/pylint-dev/pylint)
+ [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+
+ <img src="/src/docs/img/logo.png" height="50%" width="50%">
+ <!-- start intro -->
+ XspecT is a Python-based tool to taxonomically classify sequence-reads (or assembled genomes) on the species and/or sub-type level using [Bloom Filters] and a [Support Vector Machine]. It also identifies existing [blaOxa-genes] and provides a list of relevant research papers for further information.
+ <br/><br/>
+
+ XspecT utilizes the uniqueness of kmers and compares extracted kmers from the input-data to a reference database. Bloom Filter ensure a fast lookup in this process. For a final prediction the results are classified using a Support Vector Machine.
+ <br/>
+
+ Local extensions of the reference database are supported.
+ <br/>
+
+ The tool is available as a web-based application and a smaller command line interface.
+
+ [Bloom Filters]: https://en.wikipedia.org/wiki/Bloom_filter
+ [Support Vector Machine]: https://en.wikipedia.org/wiki/Support-vector_machine
+ [blaOxa-genes]: https://en.wikipedia.org/wiki/Beta-lactamase#OXA_beta-lactamases_(class_D)
+ <!-- end intro -->
+
+ <!-- start quickstart -->
+ ## Installation
+ To install Xspect, please download the lastest 64 bit Python version and install the package using pip:
+ ```
+ pip install xspect
+ ```
+ Please note that Apple Silicon is currently not supported.
+
+ ## Usage
+ ### Get the Bloomfilters
+ To download basic pre-trained filters, you can use the built-in command:
+ ```
+ xspect download-filters
+ ```
+ Additional species filters can be trained using:
+ ```
+ xspect train you-ncbi-genus-name
+ ```
+
+ ### How to run the web app
+ To run the web app, install and run [XspecT Web](https://github.com/aromberg/xspect-web). Additionally, run XspecT in API mode:
+ ```
+ xspect api
+ ```
+
+ ### How to use the XspecT command line interface
+ Run xspect with the configuration you want to run it with as arguments.
+ ```
+ xspect classify your-genus path/to/your/input-set
+ ```
+ For further instructions on how to use the command line interface, please refer to the [documentation] or execute:
+ ```
+ xspect --help
+ ```
+ [documentation]: https://bionf.github.io/XspecT2/cli.html
+ <!-- end quickstart -->
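Reviewer note: the README added above describes the core idea as extracting k-mers from the input and looking them up in per-species reference filters. The following is a minimal Python sketch of that idea only, not XspecT's code. It uses plain `set` objects as a stand-in for Bloom filters (so it has no false positives), and the names `generate_kmers`, `build_reference`, `count_hits`, the value `k=4`, and the toy sequences are all illustrative assumptions.

```python
from collections import Counter


def generate_kmers(sequence: str, k: int):
    """Yield every overlapping substring of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i : i + k]


def build_reference(sequences: dict, k: int) -> dict:
    """Map each label to the set of k-mers found in its reference sequence.

    A set is an exact stand-in for a Bloom filter here (assumption).
    """
    return {name: set(generate_kmers(seq, k)) for name, seq in sequences.items()}


def count_hits(read: str, reference: dict, k: int) -> Counter:
    """Count, per label, how many k-mers of the read occur in its reference."""
    hits = Counter()
    for kmer in generate_kmers(read, k):
        for name, kmers in reference.items():
            if kmer in kmers:
                hits[name] += 1
    return hits


# Toy data, purely illustrative.
reference = build_reference(
    {"species_a": "ACGTACGTGG", "species_b": "TTTTGGGGCC"}, k=4
)
hits = count_hits("ACGTAC", reference, k=4)
```

In the tool itself the per-label hit counts are then passed to a Support Vector Machine for the final prediction, according to the README; that step is not sketched here.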
@@ -0,0 +1,114 @@
+ # How to use the CLI
+
+ XspecT comes with a built-in command line interface (CLI), which enables quick classifications without the need to use the web interface. The command line interface can also be used to download and train filters.
+
+ After installing XspecT, a list of available commands can be viewed by running:
+
+ ```bash
+ xspect --help
+ ```
+
+ ## Filter downloads
+
+ A basic set of pre-trained filters (Acinetobacter and Salonella) can be downloaded using the following command:
+
+ ```bash
+ xspect download-filters
+ ```
+
+ For the moment, it is not possible to specify exactly which filters should be downloaded.
+
+ ## Classification
+
+ To classify samples, the command
+
+ ```bash
+ xspect classify GENUS PATH
+ ```
+
+ can be used, when `GENUS` refers to the NCBI genus name of your sample and `PATH` refers to the path to your sample *directory*. This command will classify the species of your sample within the given genus.
+
+ The following options are available:
+
+ ```bash
+   -s, --species / --no-species  Species classification.
+   -i, --ic / --no-ic            IC strain typing.
+   -o, --oxa / --no-oxa          OXA gene family detection.
+   -m, --meta / --no-meta        Metagenome classification.
+   -c, --complete                Use every single k-mer as input for
+                                 classification.
+   -s, --save                    Save results to csv file.
+   --help                        Show this message and exit.
+ ```
+
+ ### Species Classification
+
+ Species classification is run by default, without the need for further parameters:
+ ```bash
+ xspect classify Acinetobacter path
+ ```
+
+ Species classification can be toggled using the `-s`/`--species` (`--no-species`) option. To run classification without species classification, the option `--no-species` can be used, for example when running a different analysis:
+
+ ```bash
+ xspect classify --no-species -i Acinetobacter path
+ ```
+
+ ### IC Strain Typing
+
+ To perform International Clonal (IC) type classification, the `-i`/`--ic` (`--no-ic`) option can be used:
+
+ ```bash
+ xspect classify -i Acinetobacter path
+ ```
+
+ Please note that IC strain typing is only available for Acinetobacter baumanii.
+
+ ### OXA Gene Detection
+
+ OXA gene detection can be enabled using the `-o`/`--oxa` (`--no-oxa`) option.
+
+ ```bash
+ xspect classify -o Acinetobacter path
+ ```
+
+ ### Metagenome Mode
+
+ To analyze a sample in metagenome mode, the `-m`/`--meta` (`--no-meta`) option can be used:
+
+ ```bash
+ xspect classify -m Acinetobacter path
+ ```
+
+ Compared to normal XspecT modes, this mode first identifies reads belonging to the given genus and continues classification only with the resulting reads and is thus more suitable for metagenomic samples as the resulting runtime is decreased.
+
+ ## Filter Training
+
+ <aside>
+ ⚠️ Depending on genome size and the amount of species, training can take time!
+
+ </aside>
+
+ In order to train filters, please first ensure [Jellyfish](https://github.com/gmarcais/Jellyfish) is installed.
+
+ ### NCBI-based filter training
+
+ The easiest way to train new filters is to use data from NCBI, which is automatically downloaded and processed by XspecT.
+
+ To train a filter with data from NCBI, run the following command:
+
+ ```bash
+ xspect train your-ncbi-genus
+ ```
+
+ `you-ncbi-genus` can be a genus name from NCBI or an NCBI taxonomy ID.
+
+ ### Custom data filter training
+
+ XspecT filters can also be trained using custom data, which need to be provided as a folder for both filter and SVM training. The provided assembly files need to be in FASTA format and their names should be the species ID and the species name, for example `28901_enterica.fasta`. While the ID can be arbitrary, the standard is NCBI taxon IDs.
+
+ The filters can then be trained using:
+
+ ```bash
+ xspect train -bf-path directory/1 -svm-path directory/2
+ ```
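Reviewer note: the custom-training section above asks for assembly files named by species ID and species name, for example `28901_enterica.fasta`. The sketch below shows one way such names could be validated before training; it is an illustration, not part of XspecT. The function name `parse_training_filename`, the accepted suffix set (`.fasta`, `.fna`), and the choice to split on the first underscore are assumptions.

```python
from pathlib import Path

# Assumed set of accepted FASTA suffixes; the documentation only shows ".fasta".
FASTA_SUFFIXES = {".fasta", ".fna"}


def parse_training_filename(path: str) -> tuple:
    """Split '<species-id>_<species-name>.fasta' into (species_id, species_name)."""
    file_path = Path(path)
    if file_path.suffix.lower() not in FASTA_SUFFIXES:
        raise ValueError(f"not a FASTA file: {file_path.name}")
    # Split on the first underscore only, so names may contain underscores.
    species_id, separator, species_name = file_path.stem.partition("_")
    if not separator or not species_id or not species_name:
        raise ValueError(f"expected '<id>_<name>' in file name: {file_path.name}")
    return species_id, species_name


example = parse_training_filename("28901_enterica.fasta")
```

Because the documentation states that the ID can be arbitrary, the sketch does not require it to be numeric.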
@@ -0,0 +1,52 @@
+ :::mermaid
+ classDiagram
+ ProbabilisticFilterModel <|-- ProbabilisticFilterSVMModel
+ ProbabilisticFilterModel <|-- ProbabilisticSingleFilterModel
+ ProbabilisticFilterModel : String filter_display_name
+ ProbabilisticFilterModel : String author
+ ProbabilisticFilterModel : String author_email
+ ProbabilisticFilterModel : String filter_type
+ ProbabilisticFilterModel : Path base_path
+ ProbabilisticFilterModel : dict display_names
+ ProbabilisticFilterModel : int k
+ ProbabilisticFilterModel : float fpr
+ ProbabilisticFilterModel : int num_hashes
+ ProbabilisticFilterModel : dict filters
+ ProbabilisticFilterModel : dict non_distinct_kmer_counts
+ ProbabilisticFilterModel : __init__(k, filter_display_name, author, author_email, filter_type, base_path, fpr, num_hashes)
+ ProbabilisticFilterModel : __dict__()
+ ProbabilisticFilterModel : to_dict()
+ ProbabilisticFilterModel : slug()
+ ProbabilisticFilterModel : fit(dir_path, display_names)
+ ProbabilisticFilterModel: calculate_hits(sequence, filter_ids)
+ ProbabilisticFilterModel : predict(sequence_input, filter_ids)
+ ProbabilisticFilterModel : filter(sequences, threshold, filter_ids)
+ ProbabilisticFilterModel : save()
+ ProbabilisticFilterModel : load(path)
+ ProbabilisticFilterModel : _convert_cobs_result_to_dict(cobs_result)
+ ProbabilisticFilterModel : _count_kmers(sequence_input)
+ ProbabilisticFilterModel : _get_cobs_index_path()
+
+ class ProbabilisticFilterSVMModel {
+ SVM svm
+ String svm_kernel
+ float svm_c
+ __init__(..., kernel, c)
+ to_dict()
+ set_svm_params(kernel, c)
+ fit(dir_path, svm_path, display_names)
+ predict(sequence_input, filter_ids) return_by_display_names)
+ load(path)
+ _get_svm(id_keys)
+
+ }
+
+ class ProbabilisticSingleFilterModel {
+ Bloom filter
+ __init__(...)
+ fit(file_path, display_name)
+ calculate_hits(sequence)
+ load(path)
+ _generate_kmers(sequence)
+ }
+ :::
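Reviewer note: the class diagram above shows a base model with shared metadata and a subclass that adds SVM parameters. The skeleton below is a rough Python illustration of that inheritance shape using dataclasses; it is not the code shipped in `src/xspect/models/`. The class names ending in `Sketch`, all default values, and the way `slug()` is composed are assumptions, and a stdlib regex stands in for a slug library.

```python
import re
from dataclasses import asdict, dataclass, field


@dataclass
class FilterModelSketch:
    """Shared metadata, loosely following the base class in the diagram."""

    k: int
    filter_display_name: str
    author: str
    author_email: str
    filter_type: str
    fpr: float = 0.01  # assumed default
    num_hashes: int = 7  # assumed default
    display_names: dict = field(default_factory=dict)

    def slug(self) -> str:
        # Assumed composition: display name plus filter type, lowercased,
        # with runs of non-alphanumeric characters collapsed to "-".
        text = f"{self.filter_display_name}-{self.filter_type}".lower()
        return re.sub(r"[^a-z0-9]+", "-", text).strip("-")

    def to_dict(self) -> dict:
        return asdict(self)


@dataclass
class FilterSVMModelSketch(FilterModelSketch):
    """Adds SVM parameters on top of the shared metadata."""

    svm_kernel: str = "rbf"  # assumed default
    svm_c: float = 1.0  # assumed default

    def set_svm_params(self, kernel: str, c: float) -> None:
        self.svm_kernel = kernel
        self.svm_c = c


model = FilterSVMModelSketch(
    k=21,
    filter_display_name="Acinetobacter",
    author="Jane Doe",
    author_email="jane@example.org",
    filter_type="Species",
)
model.set_svm_params("linear", 0.5)
```

Calling `to_dict()` on the subclass includes the SVM fields because `asdict` walks all dataclass fields, which mirrors the diagram's overridden `to_dict()` in spirit only.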
@@ -3,6 +3,10 @@
  :hidden:

  quickstart
+ installation
+ input_data
+ cli
+ web
  :::

  :::{image} img/logo.png
@@ -0,0 +1,4 @@
+ # Input Data
+ XspecT is able to use either raw sequence-reads (FASTQ-format .fq/.fastq) or already assembled genomes (FASTA-format .fasta/.fna). Using sequence-reads saves up the assembly process but high-quality reads with a low error-rate are needed (e.g. Illumina-reads).
+
+ The amount of reads that will be used has to be set by the user when using sequence-reads. The minimum amount is 5000 reads for species classification and 500 reads for sub-type classification. The maximum number of reads is limited by the browser and is usually around ~8 million reads. Using more reads will lead to a increased runtime (xsec./1mio reads).
@@ -0,0 +1,35 @@
+ # Installation
+
+ ## Installation using PyPi
+
+ The easiest way to install XspecT is to use PyPi on the latest version of Python:
+
+ ```bash
+ pip install xspect
+ ```
+
+ Check if the installation was successful:
+
+ ```bash
+ xspect --version
+ ```
+
+ ## Manual Installation
+
+ If you would like to manually install XspecT, clone the Github repository to a local working directory. You can now install XspecT by running:
+
+ ```bash
+ pip install .
+ ```
+
+ For development purposes, it is recommended to install the package in edit mode:
+
+ ```bash
+ pip install -e '.[all]'
+ ```
+
+ Check if the installation was successful:
+
+ ```bash
+ xspect --version
+ ```
@@ -0,0 +1,3 @@
+ # How to use the Web app
+
+ Coming soon...
@@ -1,31 +1,24 @@
  [project]
  name = "XspecT"
- version = "0.1.3"
+ version = "0.2.0"
  description = "Tool to monitor and characterize pathogens using Bloom filters."
  readme = {file = "README.md", content-type = "text/markdown"}
  license = {file = "LICENSE"}
  requires-python = ">=3.10"
  dependencies = [
- "Flask",
- "Flask-WTF",
- "WTForms",
- "Werkzeug",
  "biopython",
- "bitarray",
- "mmh3",
- "numpy",
- "pandas",
  "requests",
  "scikit-learn",
- "Psutil",
- "Matplotlib",
- "Pympler",
- "H5py",
  "Bio",
- "wheel",
  "loguru",
- "Pympler",
  "click",
+ "python-slugify",
+ "cobs-reloaded",
+ "rbloom",
+ "xxhash",
+ "fastapi",
+ "uvicorn",
+ "python-multipart"
  ]
  classifiers = [
  "Intended Audience :: Developers",
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: XspecT
- Version: 0.1.3
+ Version: 0.2.0
  Summary: Tool to monitor and characterize pathogens using Bloom filters.
  License: MIT License

@@ -32,26 +32,19 @@ Classifier: License :: OSI Approved :: MIT License
  Requires-Python: >=3.10
  Description-Content-Type: text/markdown
  License-File: LICENSE
- Requires-Dist: Flask
- Requires-Dist: Flask-WTF
- Requires-Dist: WTForms
- Requires-Dist: Werkzeug
  Requires-Dist: biopython
- Requires-Dist: bitarray
- Requires-Dist: mmh3
- Requires-Dist: numpy
- Requires-Dist: pandas
  Requires-Dist: requests
  Requires-Dist: scikit-learn
- Requires-Dist: Psutil
- Requires-Dist: Matplotlib
- Requires-Dist: Pympler
- Requires-Dist: H5py
  Requires-Dist: Bio
- Requires-Dist: wheel
  Requires-Dist: loguru
- Requires-Dist: Pympler
  Requires-Dist: click
+ Requires-Dist: python-slugify
+ Requires-Dist: cobs-reloaded
+ Requires-Dist: rbloom
+ Requires-Dist: xxhash
+ Requires-Dist: fastapi
+ Requires-Dist: uvicorn
+ Requires-Dist: python-multipart
  Provides-Extra: docs
  Requires-Dist: sphinx; extra == "docs"
  Requires-Dist: furo; extra == "docs"
@@ -63,9 +56,13 @@ Requires-Dist: pytest; extra == "test"
  Requires-Dist: pytest-cov; extra == "test"

  # XspecT - Acinetobacter Species Assignment Tool
- <img src="/src/xspect/static/Logo.png" height="50%" width="50%">
+ ![Test](https://github.com/bionf/xspect2/actions/workflows/test.yml/badge.svg)
+ [![linting: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen)](https://github.com/pylint-dev/pylint)
+ [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+
+ <img src="/src/docs/img/logo.png" height="50%" width="50%">
  <!-- start intro -->
- XspecT is a Python-based tool to taxonomically classify sequence-reads (or assembled genomes) on the species and/or sub-type level using [Bloom Filters](https://en.wikipedia.org/wiki/Bloom_filter) and a [Support Vector Machine](https://en.wikipedia.org/wiki/Support-vector_machine). It also identifies existing [blaOxa-genes](https://en.wikipedia.org/wiki/Beta-lactamase#OXA_beta-lactamases_(class_D)) and provides a list of relevant research papers for further information.
+ XspecT is a Python-based tool to taxonomically classify sequence-reads (or assembled genomes) on the species and/or sub-type level using [Bloom Filters] and a [Support Vector Machine]. It also identifies existing [blaOxa-genes] and provides a list of relevant research papers for further information.
  <br/><br/>

  XspecT utilizes the uniqueness of kmers and compares extracted kmers from the input-data to a reference database. Bloom Filter ensure a fast lookup in this process. For a final prediction the results are classified using a Support Vector Machine.
@@ -75,6 +72,10 @@ Local extensions of the reference database are supported.
  <br/>

  The tool is available as a web-based application and a smaller command line interface.
+
+ [Bloom Filters]: https://en.wikipedia.org/wiki/Bloom_filter
+ [Support Vector Machine]: https://en.wikipedia.org/wiki/Support-vector_machine
+ [blaOxa-genes]: https://en.wikipedia.org/wiki/Beta-lactamase#OXA_beta-lactamases_(class_D)
  <!-- end intro -->

  <!-- start quickstart -->
@@ -83,11 +84,7 @@ To install Xspect, please download the lastest 64 bit Python version and install
  ```
  pip install xspect
  ```
- If you would like to train filters yourself, you need to install Jellyfish, which is used to count distinct k-meres in the assemblies. It can be installed using bioconda:
- ```
- conda install -c bioconda jellyfish
- ```
- On Apple Silicon, it is possible that this command installs an incorrect Jellyfish package. Please refer to the official [Jellyfish project](https://github.com/gmarcais/Jellyfish) for installation guidance.
+ Please note that Apple Silicon is currently not supported.

  ## Usage
  ### Get the Bloomfilters
@@ -101,9 +98,9 @@ xspect train you-ncbi-genus-name
  ```

  ### How to run the web app
- Run the following command lines in a console, a browser window will open automatically after the application is fully loaded.
+ To run the web app, install and run [XspecT Web](https://github.com/aromberg/xspect-web). Additionally, run XspecT in API mode:
  ```
- xspect web
+ xspect api
  ```

  ### How to use the XspecT command line interface
@@ -111,13 +108,9 @@ Run xspect with the configuration you want to run it with as arguments.
  ```
  xspect classify your-genus path/to/your/input-set
  ```
- For further instructions on how to use the command line interface, execute:
+ For further instructions on how to use the command line interface, please refer to the [documentation] or execute:
  ```
  xspect --help
  ```
+ [documentation]: https://bionf.github.io/XspecT2/cli.html
  <!-- end quickstart -->
-
- ## Input Data
- XspecT is able to use either raw sequence-reads (FASTQ-format .fq/.fastq) or already assembled genomes (FASTA-format .fasta/.fna). Using sequence-reads saves up the assembly process but high-quality reads with a low error-rate are needed (e.g. Illumina-reads).
-
- The amount of reads that will be used has to be set by the user when using sequence-reads. The minimum amount is 5000 reads for species classification and 500 reads for sub-type classification. The maximum number of reads is limited by the browser and is usually around ~8 million reads. Using more reads will lead to a increased runtime (xsec./1mio reads).