XspecT 0.1.0__py3-none-any.whl → 0.1.3__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
- {XspecT-0.1.0.dist-info → XspecT-0.1.3.dist-info}/METADATA +1 -1
- XspecT-0.1.3.dist-info/RECORD +49 -0
- xspect/BF_v2.py +31 -42
- xspect/WebApp.py +15 -28
- xspect/XspecT_mini.py +15 -29
- xspect/main.py +1 -1
- xspect/search_filter.py +8 -8
- xspect/static/How-To.png +0 -0
- xspect/static/Logo.png +0 -0
- xspect/static/Logo2.png +0 -0
- xspect/static/Workflow_AspecT.png +0 -0
- xspect/static/Workflow_ClAssT.png +0 -0
- xspect/static/js.js +615 -0
- xspect/static/main.css +280 -0
- xspect/templates/400.html +64 -0
- xspect/templates/401.html +62 -0
- xspect/templates/404.html +62 -0
- xspect/templates/500.html +62 -0
- xspect/templates/about.html +544 -0
- xspect/templates/home.html +51 -0
- xspect/templates/layoutabout.html +87 -0
- xspect/templates/layouthome.html +63 -0
- xspect/templates/layoutspecies.html +468 -0
- xspect/templates/species.html +33 -0
- xspect/train_filter/README_XspecT_Erweiterung.md +119 -0
- xspect/train_filter/create_svm.py +1 -1
- XspecT-0.1.0.dist-info/RECORD +0 -31
- {XspecT-0.1.0.dist-info → XspecT-0.1.3.dist-info}/LICENSE +0 -0
- {XspecT-0.1.0.dist-info → XspecT-0.1.3.dist-info}/WHEEL +0 -0
- {XspecT-0.1.0.dist-info → XspecT-0.1.3.dist-info}/entry_points.txt +0 -0
- {XspecT-0.1.0.dist-info → XspecT-0.1.3.dist-info}/top_level.txt +0 -0
|
@@ -0,0 +1,119 @@
|
|
|
1
|
+
# XspecT-Erweiterung
|
|
2
|
+
|
|
3
|
+
Extends XspecT so that new filters for a genus can be trained automatically. Its main
|
|
4
|
+
script is XspecT_trainer.py. The rest of the scripts are inside the Python module
|
|
5
|
+
train_filter.
|
|
6
|
+
|
|
7
|
+
## Training new filters
|
|
8
|
+
|
|
9
|
+
XspecT_trainer.py uses command line arguments. The examples for using XspecT_trainer.py
|
|
10
|
+
use Salmonella, since this genus has only two defined species in the NCBI
|
|
11
|
+
databases.
|
|
12
|
+
|
|
13
|
+
### Jellyfish
|
|
14
|
+
|
|
15
|
+
The program jellyfish is used to count distinct k-mers in the assemblies. For
|
|
16
|
+
XspecT_trainer.py to work, jellyfish needs to be installed. It can be installed using bioconda:
|
|
17
|
+
|
|
18
|
+
`
|
|
19
|
+
conda install -c bioconda jellyfish
|
|
20
|
+
`
|
|
21
|
+
|
|
22
|
+
### Training examples
|
|
23
|
+
|
|
24
|
+
New filters with assemblies from NCBI RefSeq can be trained with the following line. The
|
|
25
|
+
Python libraries listed in [requirements.txt](..%2Frequirements.txt) need to be installed.
|
|
26
|
+
|
|
27
|
+
`
|
|
28
|
+
python XspecT_trainer.py Salmonella 1
|
|
29
|
+
`
|
|
30
|
+
|
|
31
|
+
Training filters with custom data can be done using the following line.
|
|
32
|
+
|
|
33
|
+
`
|
|
34
|
+
python XspecT_trainer.py Salmonella 2 -bf /path/to/concate_assemblies -svm
|
|
35
|
+
/path/to/assemblies
|
|
36
|
+
`
|
|
37
|
+
|
|
38
|
+
All command-line arguments are listed by running the following line.
|
|
39
|
+
|
|
40
|
+
`
|
|
41
|
+
python XspecT_trainer.py -h
|
|
42
|
+
`
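The command lines above suggest an interface with two positional arguments (genus and a numeric mode) plus the options `-bf` and `-svm`. As a rough illustration only, such an interface could be declared with `argparse` as sketched below. The destination names `bf_path` and `svm_path`, the help texts, and the rule that mode 2 requires both paths are assumptions made for this sketch; the real parser in XspecT_trainer.py is not shown in this diff and may differ.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the example command lines in the README."""
    parser = argparse.ArgumentParser(
        prog="XspecT_trainer.py",
        description="Train new filters for a genus (illustrative sketch).",
    )
    parser.add_argument("genus", help="Genus name, e.g. Salmonella")
    # Mode 1: assemblies from NCBI RefSeq; mode 2: custom data (per the README examples).
    parser.add_argument("mode", type=int, choices=[1, 2], help="Training mode")
    # Assumed option semantics, inferred from the example paths only.
    parser.add_argument("-bf", dest="bf_path", default=None,
                        help="Directory with concatenated assemblies")
    parser.add_argument("-svm", dest="svm_path", default=None,
                        help="Directory with assemblies for SVM training")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: custom-data mode needs both input directories.
    if args.mode == 2 and (args.bf_path is None or args.svm_path is None):
        parser.error("mode 2 requires both -bf and -svm")
    return args
```

For example, `parse_args(["Salmonella", "2", "-bf", "/path/to/concate_assemblies", "-svm", "/path/to/assemblies"])` yields a namespace with `genus`, `mode`, `bf_path`, and `svm_path` set.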
|
|
43
|
+
|
|
44
|
+
# Explanation of the scripts
|
|
45
|
+
|
|
46
|
+
## backup_filter.py
|
|
47
|
+
|
|
48
|
+
Creates a backup of all files needed for the species assignment by XspecT for a specific
|
|
49
|
+
genus. The backup is made when new filters are created for a genus that
|
|
50
|
+
already has trained filters.
|
|
51
|
+
|
|
52
|
+
## create_svm.py
|
|
53
|
+
|
|
54
|
+
Downloads the needed assemblies and trains a support vector machine for the genus.
|
|
55
|
+
|
|
56
|
+
## extract_and_concatenate.py
|
|
57
|
+
|
|
58
|
+
Unzips the downloaded assemblies. Concatenates assemblies per species that will be used
|
|
59
|
+
to train the Bloom filters.
|
|
60
|
+
|
|
61
|
+
## get_paths.py
|
|
62
|
+
|
|
63
|
+
Functions that get specific paths.
|
|
64
|
+
|
|
65
|
+
## html_scrap.py
|
|
66
|
+
|
|
67
|
+
Updates a list of all NCBI RefSeq assembly accessions that have a taxonomy check result
|
|
68
|
+
of OK. The taxonomy check from NCBI RefSeq uses the ANI (average nucleotide
|
|
69
|
+
identity) to compute a result.
|
|
70
|
+
|
|
71
|
+
## interface_XspecT.py
|
|
72
|
+
|
|
73
|
+
Mostly functions that train new Bloom filters automatically. The functions were
|
|
74
|
+
originally written for XspecT in a non-automatic way and were updated.
|
|
75
|
+
|
|
76
|
+
## k_mer_count.py
|
|
77
|
+
|
|
78
|
+
Uses jellyfish to count distinct k-mers in every concatenated assembly. The highest
|
|
79
|
+
count will be used to compute the size of the Bloom filters.
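The README does not say which formula turns the highest distinct k-mer count into a filter size. One common choice, shown here only as an assumption, is the textbook Bloom filter sizing rule m = -n ln(p) / (ln 2)^2 for n items and a target false-positive rate p. The function names, the per-species counts, and the 1% rate are illustrative and are not taken from k_mer_count.py.

```python
import math
from typing import Dict


def bloom_filter_bits(n_items: int, false_positive_rate: float) -> int:
    """Textbook Bloom filter size in bits for n_items and a target error rate."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be in (0, 1)")
    return math.ceil(-n_items * math.log(false_positive_rate) / (math.log(2) ** 2))


def optimal_hash_count(n_bits: int, n_items: int) -> int:
    """Number of hash functions that minimises the false-positive rate."""
    return max(1, round(n_bits / n_items * math.log(2)))


def size_for_genus(distinct_kmers: Dict[str, int], false_positive_rate: float) -> int:
    """Size every filter for the species with the most distinct k-mers."""
    return bloom_filter_bits(max(distinct_kmers.values()), false_positive_rate)


# Made-up counts, purely for illustration.
counts = {"species_a": 1_000_000, "species_b": 750_000}
bits = size_for_genus(counts, 0.01)
hashes = optimal_hash_count(bits, max(counts.values()))
```

Sizing all filters by the largest count keeps the false-positive rate at or below the target for every species, at the cost of some unused space in the smaller ones.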
|
|
80
|
+
|
|
81
|
+
## ncbi_api
|
|
82
|
+
|
|
83
|
+
A module which makes requests to the NCBI Datasets API.
|
|
84
|
+
|
|
85
|
+
### download_assemblies.py
|
|
86
|
+
|
|
87
|
+
The specific function that downloads assemblies from NCBI RefSeq using NCBI
|
|
88
|
+
datasets.
|
|
89
|
+
|
|
90
|
+
### ncbi_assembly_metadata.py
|
|
91
|
+
|
|
92
|
+
Takes a dictionary with species and their taxon ID and asks NCBI for assemblies of
|
|
93
|
+
the species. Saves the collected accessions of the found and selected assemblies.
|
|
94
|
+
|
|
95
|
+
### ncbi_children_tree.py
|
|
96
|
+
|
|
97
|
+
Takes the name or ID of a genus and returns a list of all its species.
|
|
98
|
+
|
|
99
|
+
### ncbi_taxon_metadata.py
|
|
100
|
+
|
|
101
|
+
Takes a list of taxa and collects metadata such as their scientific names and ranks.
|
|
102
|
+
|
|
103
|
+
|
|
104
|
+
|
|
105
|
+
|
|
106
|
+
|
|
107
|
+
|
|
108
|
+
|
|
109
|
+
|
|
110
|
+
|
|
111
|
+
|
|
112
|
+
|
|
113
|
+
|
|
114
|
+
|
|
115
|
+
|
|
116
|
+
|
|
117
|
+
|
|
118
|
+
|
|
119
|
+
|
|
@@ -127,7 +127,7 @@ def perform_lookup(bloomfilter, files, file_paths, accessions, names, spacing):
|
|
|
127
127
|
# Dominik: changed sample size to var
|
|
128
128
|
for j in range(0, len(sequence.seq) - BF.k, spacing):
|
|
129
129
|
BF.number_of_kmeres += 1
|
|
130
|
-
BF.
|
|
130
|
+
BF.lookup_canonical(str(sequence.seq[j : j + BF.k]))
|
|
131
131
|
|
|
132
132
|
score = BF.get_score()
|
|
133
133
|
score = [str(x) for x in score]
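The added line calls `BF.lookup_canonical` on each k-mer of the sampled sequence, but the method body is not part of this hunk. Assuming "canonical" has its usual meaning (the lexicographically smaller of a k-mer and its reverse complement), the sampling loop can be sketched as follows. The helper names are hypothetical, and the loop bounds deliberately mirror the `range(0, len(sequence.seq) - BF.k, spacing)` call shown above.

```python
from typing import Iterator

# Translation table for DNA complements (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    kmer = kmer.upper()
    return min(kmer, reverse_complement(kmer))


def sampled_canonical_kmers(sequence: str, k: int, spacing: int = 1) -> Iterator[str]:
    """Yield the canonical form of every `spacing`-th k-mer.

    The bounds copy the hunk above, so the window starting at
    len(sequence) - k is never visited.
    """
    for j in range(0, len(sequence) - k, spacing):
        yield canonical(sequence[j : j + k])
```

Looking up canonical k-mers means a read and its reverse-complement strand hit the same filter entries, so the orientation of the input sequence does not matter.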
|
XspecT-0.1.0.dist-info/RECORD
DELETED
|
@@ -1,31 +0,0 @@
|
|
|
1
|
-
xspect/BF_v2.py,sha256=r6aeUFCy0nKuXvP-v6qnpq24ZphzuaxfvTKDwfyhJKg,26068
|
|
2
|
-
xspect/Bootstrap.py,sha256=AYyEBo3MoOnPqhPAHe726mX8L9NuXDa5SATxZKLMv3s,830
|
|
3
|
-
xspect/Classifier.py,sha256=BgqpZiMYi2maaccTzJcgH2tjrtDH-U7COc7E4t4cQt8,3602
|
|
4
|
-
xspect/OXA_Table.py,sha256=1GxsyxMpUEgQirY0nJHtR3jl61DoPZh2Rb9L0VdMxD4,1632
|
|
5
|
-
xspect/WebApp.py,sha256=eo1EJOMjW5grCZyvX5g1J4ppwyZb_M9lYGCNuJidM0Q,25224
|
|
6
|
-
xspect/XspecT_mini.py,sha256=t_4OlhzLytRXkM0ig9lo0Szfm2QgJhls52TScUxFN1s,55411
|
|
7
|
-
xspect/XspecT_trainer.py,sha256=6Gj2mltyVyM8Rsh5EU8tSCGMG7niYBLfId664zYaVXI,21703
|
|
8
|
-
xspect/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
|
|
9
|
-
xspect/download_filters.py,sha256=wSyX-IucjuKIEcVx-E0ClsA0XL0DI1FgMlO2UULgaXc,1048
|
|
10
|
-
xspect/file_io.py,sha256=IWae7xxAt-EmyEbxo0nDSe3RJHmLkQT5jNS2Z3qLKdg,4807
|
|
11
|
-
xspect/main.py,sha256=bF7ntgy_gR0ZNIB9JVxtXb-a6o0Lt0__tI_zzj03B24,2977
|
|
12
|
-
xspect/map_kmers.py,sha256=63iTQS_GZZBK2DxjEs5xoI4KgfpZOntCKul06rrgi5w,6000
|
|
13
|
-
xspect/search_filter.py,sha256=EZkM2917cjy4Q0zQDC9bJ0S-dyD-MBBmJqrAHQ1P260,17190
|
|
14
|
-
xspect/train_filter/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
|
|
15
|
-
xspect/train_filter/create_svm.py,sha256=E1QwBeUtAlOlKf6QKfmRtKaz_6idv7M8Hb-jbNb_wGk,6820
|
|
16
|
-
xspect/train_filter/extract_and_concatenate.py,sha256=kXGqCrOk3TbOkKLJV8nKC6nL8Zg0TWKDCJu2gq8K_cw,5239
|
|
17
|
-
xspect/train_filter/get_paths.py,sha256=JXPbv_Fx5BKHZQ4bkSIGU7yj5zjkmhsI0Z6U4nU0gug,941
|
|
18
|
-
xspect/train_filter/html_scrap.py,sha256=iQXREhG37SNUx7gHoP8eqayMEIH00QLFMTNmIMogb_M,3799
|
|
19
|
-
xspect/train_filter/interface_XspecT.py,sha256=HVCwVHqtvJ1EA9u6GByeKCve-6sADK5AceB5itPV62k,6735
|
|
20
|
-
xspect/train_filter/k_mer_count.py,sha256=0yHCxzsOH8LhO6tD35O7BjWodfE5lJDKWYzzcCrr0JE,5226
|
|
21
|
-
xspect/train_filter/ncbi_api/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
|
|
22
|
-
xspect/train_filter/ncbi_api/download_assemblies.py,sha256=iX1qK8R6p2b3RiHPfqVsLp-dV_7iZZv0AxY1xQ-Ad48,1171
|
|
23
|
-
xspect/train_filter/ncbi_api/ncbi_assembly_metadata.py,sha256=RhHvxKiQ8HJgoSb6njYEgO_vPioBqEMPvT3lE2lHXp0,3766
|
|
24
|
-
xspect/train_filter/ncbi_api/ncbi_children_tree.py,sha256=pmzg6-fDGLinNSXNbBRv0v62lRgHxW4aXZ0uV1TJhOE,1793
|
|
25
|
-
xspect/train_filter/ncbi_api/ncbi_taxon_metadata.py,sha256=uhBBGffgL4mcJpyp9KxVyOGUh8FxUTAI4xKzoLDav_Y,1577
|
|
26
|
-
XspecT-0.1.0.dist-info/LICENSE,sha256=bhBGDKIRUVwYIHGOGO5hshzuVHyqFJajvSOA3XXOLKI,1094
|
|
27
|
-
XspecT-0.1.0.dist-info/METADATA,sha256=z0sd9RECNiNoQPrLPDzHf-VmgWI4B1qDnvy1a8X2kuQ,5475
|
|
28
|
-
XspecT-0.1.0.dist-info/WHEEL,sha256=oiQVh_5PnQM0E3gPdiz09WCNmwiHDMaGer_elqB3coM,92
|
|
29
|
-
XspecT-0.1.0.dist-info/entry_points.txt,sha256=L7qliX3pIuwupQxpuOSsrBJCSHYPOPNEzH8KZKQGGUw,43
|
|
30
|
-
XspecT-0.1.0.dist-info/top_level.txt,sha256=hdoa4cnBv6OVzpyhMmyxpJxEydH5n2lDciy8urc1paE,7
|
|
31
|
-
XspecT-0.1.0.dist-info/RECORD,,
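Each RECORD line above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe Base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed. A minimal sketch of how such an entry can be recomputed to check a file against RECORD (the function name `record_entry` is made up for this example):

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build a wheel RECORD line for a file with the given contents."""
    digest = hashlib.sha256(data).digest()
    # URL-safe Base64 without '=' padding, as used in wheel RECORD files.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


# The empty __init__.py files listed above all share the hash of zero bytes.
empty_init = record_entry("xspect/__init__.py", b"")
```

This explains why the three `__init__.py` entries in the listing carry the same digest and a size of 0.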
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|