XspecT 0.1.0__py3-none-any.whl → 0.1.3__py3-none-any.whl

@@ -0,0 +1,119 @@
1
+ # XspecT Extension
2
+
3
+ Extends XspecT so that new filters for a genus can be trained automatically. Its main
4
+ script is XspecT_trainer.py. The remaining scripts are in the Python module
5
+ train_filter.
6
+
7
+ ## Training new filters
8
+
9
+ XspecT_trainer.py is controlled through command line arguments. The examples below
10
+ use Salmonella because this genus has only two defined species in the NCBI
11
+ databases.
12
+
13
+ ### Jellyfish
14
+
15
+ The program jellyfish is used to count distinct k-mers in the assemblies. Jellyfish
16
+ must be installed for XspecT_trainer.py to work. It can be installed using bioconda:
17
+
18
+ `
19
+ conda install -c bioconda jellyfish
20
+ `
21
+
22
+ ### Training examples
23
+
24
+ Filters can be trained on assemblies from NCBI RefSeq with the following command. The
25
+ Python libraries listed in [requirements.txt](..%2Frequirements.txt) must be installed first.
26
+
27
+ `
28
+ python XspecT_trainer.py Salmonella 1
29
+ `
30
+
31
+ Training filters with custom data can be done using the following line.
32
+
33
+ `
34
+ python XspecT_trainer.py Salmonella 2 -bf /path/to/concate_assemblies -svm
35
+ /path/to/assemblies
36
+ `
37
+
38
+ The following command prints an explanation of all command line arguments.
39
+
40
+ `
41
+ python XspecT_trainer.py -h
42
+ `
43
+
44
+ # Explanation of the scripts
45
+
46
+ ## backup_filter.py
47
+
48
+ Creates a backup of all files that XspecT needs for species assignment within a specific
49
+ genus. The backup is made when new filters are created for a genus that
50
+ already has trained filters.
51
+
52
+ ## create_svm.py
53
+
54
+ Downloads the required assemblies and trains a support vector machine (SVM) for the genus.
55
+
56
+ ## extract_and_concatenate.py
57
+
58
+ Unzips the downloaded assemblies and concatenates them per species. The concatenated
59
+ assemblies are used to train the Bloom filters.
60
+
61
+ ## get_paths.py
62
+
63
+ Functions that get specific paths.
64
+
65
+ ## html_scrap.py
66
+
67
+ Updates a list of all NCBI RefSeq assembly accessions that have a taxonomy check result
68
+ of OK. The NCBI RefSeq taxonomy check uses the average nucleotide identity (ANI)
69
+ to compute its result.
70
+
71
+ ## interface_XspecT.py
72
+
73
+ Mostly functions that train new Bloom filters automatically. The functions were
74
+ originally written for XspecT to be run manually and were updated for automatic use.
75
+
76
+ ## k_mer_count.py
77
+
78
+ Uses jellyfish to count the distinct k-mers in every concatenated assembly. The highest
79
+ count is used to compute the size of the Bloom filters.
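
The README does not say how the Bloom filter size is derived from the highest k-mer count. As a rough sketch only, the standard sizing formula m = -n ln(p) / (ln 2)^2 is assumed here; the function names, the species counts, and the 1% false-positive rate are hypothetical and are not taken from XspecT.

```python
import math


def bloom_filter_size(n_items: int, false_positive_rate: float = 0.01) -> int:
    """Number of bits for n_items at the given false-positive rate.

    Uses the textbook formula m = -n * ln(p) / (ln 2)^2. This is an
    assumption; XspecT's own sizing rule is not shown in the diff.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    return math.ceil(-n_items * math.log(false_positive_rate) / math.log(2) ** 2)


def optimal_hash_count(size_bits: int, n_items: int) -> int:
    """Number of hash functions that minimises false positives: (m / n) * ln 2."""
    return max(1, round(size_bits / n_items * math.log(2)))


# Hypothetical distinct k-mer counts, one per concatenated species assembly.
kmer_counts = {"species_a": 4_800_000, "species_b": 5_100_000}

# Size every filter for the largest count so that all species share one size.
largest_count = max(kmer_counts.values())
size_bits = bloom_filter_size(largest_count)
hash_count = optimal_hash_count(size_bits, largest_count)
```

Sizing all filters from the largest count keeps them the same length, which is one plausible reason for using the highest count rather than a per-species value.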
80
+
81
+ ## ncbi_api
82
+
83
+ A module which makes requests to the NCBI Datasets API.
84
+
85
+ ### download_assemblies.py
86
+
87
+ Contains the function that downloads assemblies from NCBI RefSeq using NCBI
88
+ Datasets.
89
+
90
+ ### ncbi_assembly_metadata.py
91
+
92
+ Takes a dictionary of species and their taxon IDs and queries NCBI for assemblies of
93
+ each species. Saves the accessions of the assemblies that were found and selected.
94
+
95
+ ### ncbi_children_tree.py
96
+
97
+ Takes the name or ID of a genus and returns a list of all its species.
98
+
99
+ ### ncbi_taxon_metadata.py
100
+
101
+ Takes a list of taxa and collects metadata such as their scientific name and rank.
102
+
103
+
104
+
105
+
106
+
107
+
108
+
109
+
110
+
111
+
112
+
113
+
114
+
115
+
116
+
117
+
118
+
119
+
@@ -127,7 +127,7 @@ def perform_lookup(bloomfilter, files, file_paths, accessions, names, spacing):
127
127
  # Dominik: changed sample size to var
128
128
  for j in range(0, len(sequence.seq) - BF.k, spacing):
129
129
      BF.number_of_kmeres += 1
130
-     BF.lookup(str(sequence.seq[j : j + BF.k]))
130
+     BF.lookup_canonical(str(sequence.seq[j : j + BF.k]))
131
131
 
132
132
  score = BF.get_score()
133
133
  score = [str(x) for x in score]
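
The hunk above replaces `BF.lookup` with `BF.lookup_canonical`, but the diff does not show how `lookup_canonical` is implemented. The sketch below assumes the usual meaning of a canonical k-mer, namely the lexicographically smaller of a k-mer and its reverse complement, so that a read matches regardless of which strand it came from. All function names here are hypothetical helpers, not part of XspecT.

```python
# Translation table for DNA complements (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    kmer = kmer.upper()
    return min(kmer, reverse_complement(kmer))


def sampled_canonical_kmers(seq: str, k: int, spacing: int):
    """Yield canonical k-mers using the same loop bounds as the hunk above."""
    # range(0, len(seq) - k, spacing) mirrors the diff; note that it stops
    # before the final k-mer that starts at position len(seq) - k.
    for j in range(0, len(seq) - k, spacing):
        yield canonical(seq[j : j + k])


# Both strands of the same site map to one key, so a single lookup suffices.
same_key = canonical("TTTG") == canonical("CAAA")
```

Under this assumption the filter has to be trained on canonical k-mers as well; otherwise a canonical lookup would miss k-mers that were stored in their original orientation.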
@@ -1,31 +0,0 @@
1
- xspect/BF_v2.py,sha256=r6aeUFCy0nKuXvP-v6qnpq24ZphzuaxfvTKDwfyhJKg,26068
2
- xspect/Bootstrap.py,sha256=AYyEBo3MoOnPqhPAHe726mX8L9NuXDa5SATxZKLMv3s,830
3
- xspect/Classifier.py,sha256=BgqpZiMYi2maaccTzJcgH2tjrtDH-U7COc7E4t4cQt8,3602
4
- xspect/OXA_Table.py,sha256=1GxsyxMpUEgQirY0nJHtR3jl61DoPZh2Rb9L0VdMxD4,1632
5
- xspect/WebApp.py,sha256=eo1EJOMjW5grCZyvX5g1J4ppwyZb_M9lYGCNuJidM0Q,25224
6
- xspect/XspecT_mini.py,sha256=t_4OlhzLytRXkM0ig9lo0Szfm2QgJhls52TScUxFN1s,55411
7
- xspect/XspecT_trainer.py,sha256=6Gj2mltyVyM8Rsh5EU8tSCGMG7niYBLfId664zYaVXI,21703
8
- xspect/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
9
- xspect/download_filters.py,sha256=wSyX-IucjuKIEcVx-E0ClsA0XL0DI1FgMlO2UULgaXc,1048
10
- xspect/file_io.py,sha256=IWae7xxAt-EmyEbxo0nDSe3RJHmLkQT5jNS2Z3qLKdg,4807
11
- xspect/main.py,sha256=bF7ntgy_gR0ZNIB9JVxtXb-a6o0Lt0__tI_zzj03B24,2977
12
- xspect/map_kmers.py,sha256=63iTQS_GZZBK2DxjEs5xoI4KgfpZOntCKul06rrgi5w,6000
13
- xspect/search_filter.py,sha256=EZkM2917cjy4Q0zQDC9bJ0S-dyD-MBBmJqrAHQ1P260,17190
14
- xspect/train_filter/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
15
- xspect/train_filter/create_svm.py,sha256=E1QwBeUtAlOlKf6QKfmRtKaz_6idv7M8Hb-jbNb_wGk,6820
16
- xspect/train_filter/extract_and_concatenate.py,sha256=kXGqCrOk3TbOkKLJV8nKC6nL8Zg0TWKDCJu2gq8K_cw,5239
17
- xspect/train_filter/get_paths.py,sha256=JXPbv_Fx5BKHZQ4bkSIGU7yj5zjkmhsI0Z6U4nU0gug,941
18
- xspect/train_filter/html_scrap.py,sha256=iQXREhG37SNUx7gHoP8eqayMEIH00QLFMTNmIMogb_M,3799
19
- xspect/train_filter/interface_XspecT.py,sha256=HVCwVHqtvJ1EA9u6GByeKCve-6sADK5AceB5itPV62k,6735
20
- xspect/train_filter/k_mer_count.py,sha256=0yHCxzsOH8LhO6tD35O7BjWodfE5lJDKWYzzcCrr0JE,5226
21
- xspect/train_filter/ncbi_api/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
22
- xspect/train_filter/ncbi_api/download_assemblies.py,sha256=iX1qK8R6p2b3RiHPfqVsLp-dV_7iZZv0AxY1xQ-Ad48,1171
23
- xspect/train_filter/ncbi_api/ncbi_assembly_metadata.py,sha256=RhHvxKiQ8HJgoSb6njYEgO_vPioBqEMPvT3lE2lHXp0,3766
24
- xspect/train_filter/ncbi_api/ncbi_children_tree.py,sha256=pmzg6-fDGLinNSXNbBRv0v62lRgHxW4aXZ0uV1TJhOE,1793
25
- xspect/train_filter/ncbi_api/ncbi_taxon_metadata.py,sha256=uhBBGffgL4mcJpyp9KxVyOGUh8FxUTAI4xKzoLDav_Y,1577
26
- XspecT-0.1.0.dist-info/LICENSE,sha256=bhBGDKIRUVwYIHGOGO5hshzuVHyqFJajvSOA3XXOLKI,1094
27
- XspecT-0.1.0.dist-info/METADATA,sha256=z0sd9RECNiNoQPrLPDzHf-VmgWI4B1qDnvy1a8X2kuQ,5475
28
- XspecT-0.1.0.dist-info/WHEEL,sha256=oiQVh_5PnQM0E3gPdiz09WCNmwiHDMaGer_elqB3coM,92
29
- XspecT-0.1.0.dist-info/entry_points.txt,sha256=L7qliX3pIuwupQxpuOSsrBJCSHYPOPNEzH8KZKQGGUw,43
30
- XspecT-0.1.0.dist-info/top_level.txt,sha256=hdoa4cnBv6OVzpyhMmyxpJxEydH5n2lDciy8urc1paE,7
31
- XspecT-0.1.0.dist-info/RECORD,,
File without changes