kssdtree 2.0.1__tar.gz → 2.0.3__tar.gz
- {kssdtree-2.0.1/kssdtree.egg-info → kssdtree-2.0.3}/PKG-INFO +6 -3
- kssdtree-2.0.3/README.md +8 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/command_set.c +0 -1
- {kssdtree-2.0.1 → kssdtree-2.0.3/kssdtree.egg-info}/PKG-INFO +6 -3
- {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdtree.egg-info/requires.txt +1 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdtree.py +107 -94
- {kssdtree-2.0.1 → kssdtree-2.0.3}/mman.c +0 -1
- {kssdtree-2.0.1 → kssdtree-2.0.3}/pykssd.c +35 -5
- {kssdtree-2.0.1 → kssdtree-2.0.3}/setup.py +10 -27
- {kssdtree-2.0.1 → kssdtree-2.0.3}/toolutils.py +6 -1
- kssdtree-2.0.1/README.md +0 -12
- {kssdtree-2.0.1 → kssdtree-2.0.3}/MANIFEST.in +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/align.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/buildtree.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/bytescale.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/cluster.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/co2mco.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/command_composite.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/command_dist.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/command_dist_wrapper.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/command_shuffle.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/distancemat.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/dnj.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/bytescale.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/dnj.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/filebuff.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/hclust.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/matrix.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/mman.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/nj.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/nwck.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/pherror.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/phy.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/qseqs.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/str.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/threader.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/tmp.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/vector.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/filebuff.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/global_basic.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/hclust.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/iseq2comem.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/co2mco.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/command_composite.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/command_dist.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/command_dist_wrapper.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/command_set.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/command_shuffle.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/global_basic.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/iseq2comem.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/mman.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/mytime.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdtree.egg-info/SOURCES.txt +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdtree.egg-info/dependency_links.txt +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdtree.egg-info/not-zip-safe +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdtree.egg-info/top_level.txt +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/matrix.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/mytime.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/nj.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/njheaders/align.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/njheaders/buildtree.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/njheaders/cluster.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/njheaders/distancemat.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/njheaders/sequence.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/njheaders/tree.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/njheaders/util.h +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/nwck.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/pherror.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/phy.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/pydnj.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/pynj.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/qseqs.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/sequence.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/setup.cfg +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/str.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/tmp.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/tree.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/util.c +0 -0
- {kssdtree-2.0.1 → kssdtree-2.0.3}/vector.c +0 -0
|
@@ -1,8 +1,11 @@
|
|
|
1
|
-
Metadata-Version:
|
|
1
|
+
Metadata-Version: 1.1
|
|
2
2
|
Name: kssdtree
|
|
3
|
-
Version: 2.0.
|
|
3
|
+
Version: 2.0.3
|
|
4
4
|
Summary: Kssdtree is a versatile Python package for phylogenetic analysis. It also provides one-stop tree construction and visualization. It can handle DNA sequences of both fasta or fastq format, whether gzipped or not.
|
|
5
5
|
Home-page: https://github.com/yhlink/kssdtree
|
|
6
|
-
Download-URL: https://pypi.org/project/kssdtree
|
|
7
6
|
Author: Hang Yang
|
|
8
7
|
Author-email: yhlink1207@gmail.com
|
|
8
|
+
License: UNKNOWN
|
|
9
|
+
Download-URL: https://pypi.org/project/kssdtree
|
|
10
|
+
Description: UNKNOWN
|
|
11
|
+
Platform: UNKNOWN
|
kssdtree-2.0.3/README.md
ADDED
|
@@ -0,0 +1,8 @@
|
|
|
1
|
+
Kssdtree is a versatile Python package for phylogenetic analysis, offering three distinct pipelines: the Routine Pipeline, the Reference Subtraction Pipeline, and the GTDB-based Phylogenetic Placement Pipeline.
|
|
2
|
+
|
|
3
|
+
Routine Pipeline: A general-purpose tool for phylogenetic analysis of user genomic data.
|
|
4
|
+
Reference Subtraction Pipeline: Designed for intra-species phylogenomic analysis.
|
|
5
|
+
GTDB-based Phylogenetic Placement Pipeline: Facilitates the search for similar genomes in the Genome Taxonomy Database (GTDB), conducting phylogenetic analysis alongside these genomes and positioning the input genomes within the entire prokaryotic tree of life.
|
|
6
|
+
Kssdtree also provides one-stop tree construction and visualization. It can handle DNA sequences in both fasta and fastq formats, whether gzipped or not. Additionally, Kssdtree is compatible with multiple platforms (Linux, MacOS, and Windows) and can be run using Jupyter notebooks.
|
|
7
|
+
|
|
8
|
+
More usages about Kssdtree, please see Kssdtree documentation (https://kssdtree.readthedocs.io/en/latest).
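The three pipelines named in this README correspond to the branch conditions of `quick()` visible in the `kssdtree.py` hunks of this diff. A minimal sketch of that selection logic follows. The function name `select_pipeline` and the returned labels are hypothetical and only illustrate the conditions; they are not part of the package.

```python
def select_pipeline(reference=None, database=None):
    """Return a label for the pipeline that quick() would enter, or None.

    The labels are illustrative names, not identifiers used by kssdtree.
    """
    if reference is None and database is None:
        return 'routine'
    if reference is None and database == 'gtdbr214':
        return 'gtdb_placement'
    if reference is not None and database is None:
        return 'reference_subtraction'
    # Any other combination reaches the "Pipeline error" branch in the diff.
    return None


if __name__ == '__main__':
    print(select_pipeline())
    print(select_pipeline(database='gtdbr214'))
    print(select_pipeline(reference='ref_dir'))
```

In the diff, the routine and reference subtraction branches reject a non-zero integer `N`, while the GTDB branch requires a positive `N` and `shuf_file` set to `'L3K9.shuf'`.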
|
|
@@ -1,8 +1,11 @@
|
|
|
1
|
-
Metadata-Version:
|
|
1
|
+
Metadata-Version: 1.1
|
|
2
2
|
Name: kssdtree
|
|
3
|
-
Version: 2.0.
|
|
3
|
+
Version: 2.0.3
|
|
4
4
|
Summary: Kssdtree is a versatile Python package for phylogenetic analysis. It also provides one-stop tree construction and visualization. It can handle DNA sequences of both fasta or fastq format, whether gzipped or not.
|
|
5
5
|
Home-page: https://github.com/yhlink/kssdtree
|
|
6
|
-
Download-URL: https://pypi.org/project/kssdtree
|
|
7
6
|
Author: Hang Yang
|
|
8
7
|
Author-email: yhlink1207@gmail.com
|
|
8
|
+
License: UNKNOWN
|
|
9
|
+
Download-URL: https://pypi.org/project/kssdtree
|
|
10
|
+
Description: UNKNOWN
|
|
11
|
+
Platform: UNKNOWN
|
|
@@ -28,23 +28,18 @@ def sketch(shuf_file=None, genome_files=None, output=None, set_opt=None):
|
|
|
28
28
|
if not os.path.exists(shuf_file):
|
|
29
29
|
if shuf_file in ['L3K9.shuf', './L3K9.shuf', 'L3K10.shuf', './L3K10.shuf']:
|
|
30
30
|
print('Downloading...', shuf_file)
|
|
31
|
-
import http.client
|
|
32
|
-
http.client.HTTPConnection._http_vsn = 10
|
|
33
|
-
http.client.HTTPConnection._http_vsn_str = 'HTTP/1.0'
|
|
34
31
|
if shuf_file == 'L3K9.shuf' or shuf_file == './L3K9.shuf':
|
|
35
32
|
url = 'https://zenodo.org/records/12699159/files/L3K9.shuf?download=1'
|
|
36
33
|
else:
|
|
37
34
|
url = 'https://zenodo.org/records/12699159/files/L3K10.shuf?download=1'
|
|
38
|
-
|
|
39
|
-
response = requests.get(url, stream=True)
|
|
40
|
-
with open(shuf_file, 'wb') as
|
|
41
|
-
for chunk in response.iter_content(chunk_size=
|
|
42
|
-
|
|
43
|
-
file.write(chunk)
|
|
35
|
+
headers = {'Accept-Encoding': 'gzip, deflate'}
|
|
36
|
+
response = requests.get(url, headers=headers, stream=True)
|
|
37
|
+
with open(shuf_file, 'wb') as f:
|
|
38
|
+
for chunk in response.iter_content(chunk_size=8192):
|
|
39
|
+
f.write(chunk)
|
|
44
40
|
end_time = time.time()
|
|
45
|
-
if end_time - start_time >
|
|
46
|
-
print(
|
|
47
|
-
"Network timeout, please manually download from github (https://github.com/yhlink/kssdtree/tree/master/shuffle_file)")
|
|
41
|
+
if end_time - start_time > 200:
|
|
42
|
+
print("Network timeout, please manually download from https://zenodo.org/records/12699159")
|
|
48
43
|
return False
|
|
49
44
|
print('Download finished: ', shuf_file)
|
|
50
45
|
elif shuf_file in ['L2K8.shuf', 'L2K9.shuf', 'L3K11.shuf', './L2K8.shuf', './L2K9.shuf', './L3K11.shuf']:
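In the download hunk above, the elapsed time is compared with 200 seconds only after the chunk loop has finished, so the check reports a slow download but cannot interrupt one. A minimal sketch of checking the limit inside the loop is shown below. `write_chunks_with_deadline` and its parameters are hypothetical names, and `chunks` stands for any iterable of bytes, for example the result of `response.iter_content(chunk_size=8192)`.

```python
import os
import tempfile
import time


def write_chunks_with_deadline(chunks, dest, limit_seconds=200, clock=time.monotonic):
    """Write byte chunks to dest; return False once the time limit is exceeded.

    Hypothetical helper, not part of kssdtree.
    """
    start = clock()
    with open(dest, 'wb') as f:
        for chunk in chunks:
            # Check the deadline for every chunk instead of once at the end.
            if clock() - start > limit_seconds:
                return False
            f.write(chunk)
    return True


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'demo.shuf')
    ok = write_chunks_with_deadline([b'abc', b'def'], path)
    print(ok, os.path.getsize(path))
```

On failure this sketch leaves a partial file behind; a caller would probably want to delete it before asking the user to download manually.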
|
|
@@ -64,9 +59,9 @@ def sketch(shuf_file=None, genome_files=None, output=None, set_opt=None):
|
|
|
64
59
|
print('Sketching...')
|
|
65
60
|
start = time.time()
|
|
66
61
|
if set_opt:
|
|
67
|
-
kssd.dist_dispatch(shuf_file, genome_files, output, 1, 0, 0)
|
|
62
|
+
kssd.dist_dispatch(shuf_file, genome_files, output, 1, 0, 0, '')
|
|
68
63
|
else:
|
|
69
|
-
kssd.dist_dispatch(shuf_file, genome_files, output, 0, 0, 0)
|
|
64
|
+
kssd.dist_dispatch(shuf_file, genome_files, output, 0, 0, 0, '')
|
|
70
65
|
end = time.time()
|
|
71
66
|
print('Sketch spend time:%.2fs' % (end - start))
|
|
72
67
|
print('Sketch finished!')
|
|
@@ -76,7 +71,7 @@ def sketch(shuf_file=None, genome_files=None, output=None, set_opt=None):
|
|
|
76
71
|
return False
|
|
77
72
|
|
|
78
73
|
|
|
79
|
-
def dist(genome_sketch=None, output=None, flag=None):
|
|
74
|
+
def dist(genome_sketch=None, output=None, metric = None, flag=None):
|
|
80
75
|
if genome_sketch is not None and output is not None:
|
|
81
76
|
if not os.path.exists(genome_sketch):
|
|
82
77
|
print('No such file or directory: ', genome_sketch)
|
|
@@ -86,6 +81,9 @@ def dist(genome_sketch=None, output=None, flag=None):
|
|
|
86
81
|
# return False
|
|
87
82
|
if flag is None:
|
|
88
83
|
flag = 0
|
|
84
|
+
if metric is None:
|
|
85
|
+
metric = 'mash'
|
|
86
|
+
|
|
89
87
|
print('Disting...')
|
|
90
88
|
start = time.time()
|
|
91
89
|
if '/' in output:
|
|
@@ -97,11 +95,15 @@ def dist(genome_sketch=None, output=None, flag=None):
|
|
|
97
95
|
else:
|
|
98
96
|
output_name = output
|
|
99
97
|
if output_name.endswith(".phy") or output_name.endswith(".phylip"):
|
|
100
|
-
|
|
101
|
-
|
|
102
|
-
|
|
103
|
-
|
|
104
|
-
|
|
98
|
+
if metric not in ['mash', 'aaf']:
|
|
99
|
+
print('Metric type error, only supports mash or aaf distance')
|
|
100
|
+
return False
|
|
101
|
+
else:
|
|
102
|
+
kssd.dist_dispatch(genome_sketch, output, genome_sketch, 2, 0, flag, metric)
|
|
103
|
+
end = time.time()
|
|
104
|
+
print('Dist spend time:%.2fs' % (end - start))
|
|
105
|
+
print('Dist finished!')
|
|
106
|
+
return True
|
|
105
107
|
else:
|
|
106
108
|
print('Output type error, only supports .phylip (.phy) format:', output_name)
|
|
107
109
|
return False
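The `dist()` hunks above add a `metric` argument that defaults to `'mash'` and accepts only `'mash'` or `'aaf'`; the `pykssd.c` hunk later in this diff maps the string to an integer with `strcmp`. A small sketch of both steps in Python follows. `resolve_metric` and `metric_code` are hypothetical helper names used only for illustration.

```python
ALLOWED_METRICS = ('mash', 'aaf')


def resolve_metric(metric=None):
    """Return the metric name to use, or None if it is unsupported."""
    if metric is None:
        metric = 'mash'  # default applied by dist() in 2.0.3
    if metric not in ALLOWED_METRICS:
        print('Metric type error, only supports mash or aaf distance')
        return None
    return metric


def metric_code(metric):
    """Mirror the C-side mapping in the diff: 'mash' -> 0, anything else -> 1."""
    return 0 if metric == 'mash' else 1


if __name__ == '__main__':
    print(resolve_metric(), metric_code('mash'))
    print(resolve_metric('aaf'), metric_code('aaf'))
```

Because the C side treats every string other than `"mash"` as metric 1, the Python-side membership test is the only place where a misspelled metric name is rejected.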
|
|
@@ -207,7 +209,12 @@ def visualize(newick=None, taxonomy=None, mode=None):
|
|
|
207
209
|
return False
|
|
208
210
|
if mode is None:
|
|
209
211
|
mode = 'r'
|
|
210
|
-
|
|
212
|
+
if taxonomy is not None and mode == 'c':
|
|
213
|
+
print('Warning: this pipeline only support 'r' (rectangle) mode !!!')
|
|
214
|
+
mode = 'r'
|
|
215
|
+
toolutils.view_tree(newick, taxonomy, mode=mode)
|
|
216
|
+
else:
|
|
217
|
+
toolutils.view_tree(newick, taxonomy, mode=mode)
|
|
211
218
|
else:
|
|
212
219
|
print('Args error!!!')
|
|
213
220
|
return False
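One detail of the warning added in the `visualize()` hunk above is worth checking. As rendered in this diff, the message is delimited by single quotes and also contains `'r'`. Python parses that as two adjacent literals, an ordinary string followed by a raw string whose prefix is the letter `r`, and concatenates them, so the `r` itself does not appear in the output. The sketch below reproduces this and shows a double-quoted alternative. `normalize_mode` is a hypothetical helper that mirrors the fallback logic of the hunk; it is not a kssdtree function.

```python
# As rendered in the hunk: parsed as '...support ' followed by r' (rectangle) mode !!!'
as_written = 'Warning: this pipeline only support 'r' (rectangle) mode !!!'

# Double quotes on the outside keep the inner 'r' as literal text.
intended = "Warning: this pipeline only support 'r' (rectangle) mode !!!"


def normalize_mode(taxonomy, mode=None):
    """Default to 'r', and fall back to 'r' when a taxonomy is combined with 'c'."""
    if mode is None:
        mode = 'r'
    if taxonomy is not None and mode == 'c':
        print(intended)
        mode = 'r'
    return mode


if __name__ == '__main__':
    print(as_written)
    print(normalize_mode('taxonomy.txt', 'c'))
```

Since both branches in the hunk end with the same `toolutils.view_tree(newick, taxonomy, mode=mode)` call, normalizing `mode` first and calling `view_tree` once would behave the same way.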
|
|
@@ -280,6 +287,9 @@ def subtract(ref_sketch=None, genome_sketch=None, output=None, flag=None):
|
|
|
280
287
|
def quick(shuf_file=None, genome_files=None, output=None, reference=None, database=None, method='nj', mode='r', N=0):
|
|
281
288
|
if reference is None and database is None:
|
|
282
289
|
if shuf_file is not None and genome_files is not None and output is not None:
|
|
290
|
+
if toolutils.is_positive_integer(N) or toolutils.is_negative_integer(N):
|
|
291
|
+
print("N must = 0 !!!")
|
|
292
|
+
return False
|
|
283
293
|
timeStamp = int(time.mktime(time.localtime(time.time())))
|
|
284
294
|
temp_sketch = toolutils.rs() + '_sketch_' + str(timeStamp)
|
|
285
295
|
temp_phy = toolutils.rs() + '_temp.phy'
|
|
@@ -325,7 +335,7 @@ def quick(shuf_file=None, genome_files=None, output=None, reference=None, databa
|
|
|
325
335
|
elif reference is None and database == 'gtdbr214':
|
|
326
336
|
if shuf_file is not None and genome_files is not None and output is not None:
|
|
327
337
|
if not toolutils.is_positive_integer(N):
|
|
328
|
-
print("N must >0 !!!")
|
|
338
|
+
print("N must > 0 !!!")
|
|
329
339
|
return False
|
|
330
340
|
if shuf_file != 'L3K9.shuf':
|
|
331
341
|
print("shuf_file must be set to 'L3K9.shuf'")
|
|
@@ -354,83 +364,86 @@ def quick(shuf_file=None, genome_files=None, output=None, reference=None, databa
|
|
|
354
364
|
return False
|
|
355
365
|
elif reference is not None and database is None:
|
|
356
366
|
if shuf_file is not None and genome_files is not None and output is not None and method in ['nj', 'dnj']:
|
|
357
|
-
if
|
|
358
|
-
|
|
359
|
-
|
|
360
|
-
|
|
361
|
-
|
|
362
|
-
|
|
363
|
-
|
|
364
|
-
print('genome_files is a folder containing at least two . fasta or .fastq files, not a file!!!')
|
|
367
|
+
if toolutils.is_positive_integer(N) or toolutils.is_negative_integer(N):
|
|
368
|
+
print("N must = 0 !!!")
|
|
369
|
+
return False
|
|
370
|
+
if not toolutils.allowed_file(genome_files):
|
|
371
|
+
num = toolutils.get_file_num(genome_files)
|
|
372
|
+
if num == 1:
|
|
373
|
+
print('genome_files is a folder containing at least two . fasta or .fastq files!!!')
|
|
365
374
|
return False
|
|
366
|
-
|
|
367
|
-
|
|
368
|
-
|
|
369
|
-
|
|
370
|
-
|
|
371
|
-
|
|
372
|
-
|
|
373
|
-
|
|
374
|
-
|
|
375
|
-
|
|
376
|
-
|
|
377
|
-
else:
|
|
375
|
+
else:
|
|
376
|
+
print('genome_files is a folder containing at least two . fasta or .fastq files, not a file!!!')
|
|
377
|
+
return False
|
|
378
|
+
timeStamp = int(time.mktime(time.localtime(time.time())))
|
|
379
|
+
temp_reference_sketch = toolutils.rs() + '_ref_sketch_' + str(timeStamp)
|
|
380
|
+
temp_genomes_sketch = toolutils.rs() + '_sketch_' + str(timeStamp)
|
|
381
|
+
if not toolutils.allowed_file(reference):
|
|
382
|
+
# cur_path = os.getcwd()
|
|
383
|
+
# ref_path = os.path.join(cur_path, reference)
|
|
384
|
+
num = toolutils.get_file_num(reference)
|
|
385
|
+
if num == 1:
|
|
378
386
|
temp_union_sketch = temp_reference_sketch
|
|
379
|
-
temp_subtract_sketch = toolutils.rs() + '_subtract_sketch_' + str(timeStamp)
|
|
380
|
-
temp_phy = toolutils.rs() + '_temp.phy'
|
|
381
|
-
print('Step1...')
|
|
382
|
-
s1 = sketch(shuf_file=shuf_file, genome_files=reference, output=temp_reference_sketch, set_opt=True)
|
|
383
|
-
if not s1:
|
|
384
|
-
return False
|
|
385
|
-
s2 = sketch(shuf_file=shuf_file, genome_files=genome_files, output=temp_genomes_sketch, set_opt=True)
|
|
386
|
-
if not s2:
|
|
387
|
-
return False
|
|
388
|
-
print('Step2...')
|
|
389
|
-
s3 = union(ref_sketch=temp_reference_sketch, output=temp_union_sketch)
|
|
390
|
-
if not s3:
|
|
391
|
-
return False
|
|
392
|
-
s4 = subtract(ref_sketch=temp_union_sketch, genome_sketch=temp_genomes_sketch,
|
|
393
|
-
output=temp_subtract_sketch, flag=1)
|
|
394
|
-
if not s4:
|
|
395
|
-
return False
|
|
396
|
-
print('Step3...')
|
|
397
|
-
if method == 'nj':
|
|
398
|
-
s5 = dist(genome_sketch=temp_subtract_sketch, output=temp_phy,
|
|
399
|
-
flag=0)
|
|
400
387
|
else:
|
|
401
|
-
|
|
402
|
-
flag=1)
|
|
403
|
-
if not s5:
|
|
404
|
-
return False
|
|
405
|
-
print('Step4...')
|
|
406
|
-
s6 = build(phylip=temp_phy, output=output, method=method)
|
|
407
|
-
if not s6:
|
|
408
|
-
return False
|
|
409
|
-
print('Step5...')
|
|
410
|
-
print('Tree visualization finished!')
|
|
411
|
-
visualize(newick=output, mode=mode)
|
|
412
|
-
if platform.system() == 'Linux':
|
|
413
|
-
current_directory = os.getcwd()
|
|
414
|
-
temp_dir1 = os.path.join(current_directory, temp_reference_sketch)
|
|
415
|
-
temp_dir2 = os.path.join(current_directory, temp_genomes_sketch)
|
|
416
|
-
temp_dir3 = os.path.join(current_directory, temp_union_sketch)
|
|
417
|
-
temp_dir4 = os.path.join(current_directory, temp_subtract_sketch)
|
|
418
|
-
temp_dir5 = os.path.join(current_directory, 'distout')
|
|
419
|
-
if os.path.exists(temp_dir1):
|
|
420
|
-
shutil.rmtree(temp_dir1)
|
|
421
|
-
if os.path.exists(temp_dir2):
|
|
422
|
-
shutil.rmtree(temp_dir2)
|
|
423
|
-
if os.path.exists(temp_dir3):
|
|
424
|
-
shutil.rmtree(temp_dir3)
|
|
425
|
-
if os.path.exists(temp_dir4):
|
|
426
|
-
shutil.rmtree(temp_dir4)
|
|
427
|
-
if os.path.exists(temp_dir5):
|
|
428
|
-
shutil.rmtree(temp_dir5)
|
|
429
|
-
if os.path.exists(temp_phy):
|
|
430
|
-
os.remove(temp_phy)
|
|
388
|
+
temp_union_sketch = toolutils.rs() + '_ref_union_sketch_' + str(timeStamp)
|
|
431
389
|
else:
|
|
432
|
-
|
|
390
|
+
temp_union_sketch = temp_reference_sketch
|
|
391
|
+
temp_subtract_sketch = toolutils.rs() + '_subtract_sketch_' + str(timeStamp)
|
|
392
|
+
temp_phy = toolutils.rs() + '_temp.phy'
|
|
393
|
+
print('Step1...')
|
|
394
|
+
s1 = sketch(shuf_file=shuf_file, genome_files=reference, output=temp_reference_sketch, set_opt=True)
|
|
395
|
+
if not s1:
|
|
396
|
+
return False
|
|
397
|
+
s2 = sketch(shuf_file=shuf_file, genome_files=genome_files, output=temp_genomes_sketch, set_opt=True)
|
|
398
|
+
if not s2:
|
|
399
|
+
return False
|
|
400
|
+
print('Step2...')
|
|
401
|
+
s3 = union(ref_sketch=temp_reference_sketch, output=temp_union_sketch)
|
|
402
|
+
if not s3:
|
|
403
|
+
return False
|
|
404
|
+
s4 = subtract(ref_sketch=temp_union_sketch, genome_sketch=temp_genomes_sketch,
|
|
405
|
+
output=temp_subtract_sketch, flag=1)
|
|
406
|
+
if not s4:
|
|
407
|
+
return False
|
|
408
|
+
print('Step3...')
|
|
409
|
+
if method == 'nj':
|
|
410
|
+
s5 = dist(genome_sketch=temp_subtract_sketch, output=temp_phy,
|
|
411
|
+
flag=0)
|
|
412
|
+
else:
|
|
413
|
+
s5 = dist(genome_sketch=temp_subtract_sketch, output=temp_phy,
|
|
414
|
+
flag=1)
|
|
415
|
+
if not s5:
|
|
416
|
+
return False
|
|
417
|
+
print('Step4...')
|
|
418
|
+
s6 = build(phylip=temp_phy, output=output, method=method)
|
|
419
|
+
if not s6:
|
|
433
420
|
return False
|
|
421
|
+
print('Step5...')
|
|
422
|
+
print('Tree visualization finished!')
|
|
423
|
+
visualize(newick=output, mode=mode)
|
|
424
|
+
if platform.system() == 'Linux':
|
|
425
|
+
current_directory = os.getcwd()
|
|
426
|
+
temp_dir1 = os.path.join(current_directory, temp_reference_sketch)
|
|
427
|
+
temp_dir2 = os.path.join(current_directory, temp_genomes_sketch)
|
|
428
|
+
temp_dir3 = os.path.join(current_directory, temp_union_sketch)
|
|
429
|
+
temp_dir4 = os.path.join(current_directory, temp_subtract_sketch)
|
|
430
|
+
temp_dir5 = os.path.join(current_directory, 'distout')
|
|
431
|
+
if os.path.exists(temp_dir1):
|
|
432
|
+
shutil.rmtree(temp_dir1)
|
|
433
|
+
if os.path.exists(temp_dir2):
|
|
434
|
+
shutil.rmtree(temp_dir2)
|
|
435
|
+
if os.path.exists(temp_dir3):
|
|
436
|
+
shutil.rmtree(temp_dir3)
|
|
437
|
+
if os.path.exists(temp_dir4):
|
|
438
|
+
shutil.rmtree(temp_dir4)
|
|
439
|
+
if os.path.exists(temp_dir5):
|
|
440
|
+
shutil.rmtree(temp_dir5)
|
|
441
|
+
if os.path.exists(temp_phy):
|
|
442
|
+
os.remove(temp_phy)
|
|
443
|
+
else:
|
|
444
|
+
print('Args error, please see https://kssdtree.readthedocs.io/en/latest!!!')
|
|
445
|
+
return False
|
|
446
|
+
|
|
434
447
|
else:
|
|
435
448
|
print('Pipeline error, please see https://kssdtree.readthedocs.io/en/latest!!!')
|
|
436
449
|
return False
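The cleanup at the end of the reference subtraction branch above repeats the same existence check and removal for five directories and one file. The same idea can be sketched as a loop. `remove_temp_paths` is a hypothetical helper, and unlike the original it is not restricted to `platform.system() == 'Linux'`.

```python
import os
import shutil
import tempfile


def remove_temp_paths(paths):
    """Delete each existing directory or file in paths; return what was removed."""
    removed = []
    for path in paths:
        if os.path.isdir(path):
            shutil.rmtree(path)      # directories such as the temporary sketches
            removed.append(path)
        elif os.path.exists(path):
            os.remove(path)          # plain files such as the temporary .phy
            removed.append(path)
    return removed


if __name__ == '__main__':
    base = tempfile.mkdtemp()
    sketch_dir = os.path.join(base, 'demo_sketch')
    os.mkdir(sketch_dir)
    phy = os.path.join(base, 'demo_temp.phy')
    with open(phy, 'w') as f:
        f.write('0\n')
    print(remove_temp_paths([sketch_dir, phy, os.path.join(base, 'distout')]))
```

Paths that do not exist are skipped silently, which matches the `os.path.exists` guards in the diff.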
|
|
@@ -121,12 +121,18 @@ int create_matrix(char *input_name, char *output_name, int flag) {
|
|
|
121
121
|
for (int j = 0; j <= i; j++)
|
|
122
122
|
distances[i][j] = 0.0;
|
|
123
123
|
}
|
|
124
|
-
} else {
|
|
124
|
+
} else if (flag == 1) {
|
|
125
125
|
for (int i = 0; i < num_seqs; i++) {
|
|
126
126
|
distances[i] = malloc(i * sizeof(double));
|
|
127
127
|
for (int j = 0; j < i; j++)
|
|
128
128
|
distances[i][j] = 0.0;
|
|
129
129
|
}
|
|
130
|
+
} else {
|
|
131
|
+
for (int i = 0; i < num_seqs; i++) {
|
|
132
|
+
distances[i] = malloc(num_seqs * sizeof(double));
|
|
133
|
+
for (int j = 0; j < num_seqs; j++)
|
|
134
|
+
distances[i][j] = 0.0;
|
|
135
|
+
}
|
|
130
136
|
}
|
|
131
137
|
rewind(fp);
|
|
132
138
|
fgets(line, PATHLEN, fp);
|
|
@@ -144,7 +150,7 @@ int create_matrix(char *input_name, char *output_name, int flag) {
|
|
|
144
150
|
j += 1;
|
|
145
151
|
}
|
|
146
152
|
}
|
|
147
|
-
} else {
|
|
153
|
+
} else if (flag == 1) {
|
|
148
154
|
while (fgets(line, PATHLEN, fp) != NULL) {
|
|
149
155
|
sscanf(line, "%*s %*s %*s %*s %lf", &distance);
|
|
150
156
|
if (j < i) {
|
|
@@ -156,6 +162,16 @@ int create_matrix(char *input_name, char *output_name, int flag) {
|
|
|
156
162
|
j += 1;
|
|
157
163
|
}
|
|
158
164
|
}
|
|
165
|
+
} else {
|
|
166
|
+
while (fgets(line, PATHLEN, fp) != NULL) {
|
|
167
|
+
sscanf(line, "%*s %*s %*s %*s %lf", &distance);
|
|
168
|
+
distances[i][j] = distance;
|
|
169
|
+
i += 1;
|
|
170
|
+
if (i == num_seqs && j < num_seqs) {
|
|
171
|
+
i = 0;
|
|
172
|
+
j += 1;
|
|
173
|
+
}
|
|
174
|
+
}
|
|
159
175
|
}
|
|
160
176
|
|
|
161
177
|
for (int i = 0; i < num_seqs; i++) {
|
|
@@ -187,7 +203,7 @@ int create_matrix(char *input_name, char *output_name, int flag) {
|
|
|
187
203
|
}
|
|
188
204
|
fprintf(fo, "\n");
|
|
189
205
|
}
|
|
190
|
-
} else {
|
|
206
|
+
} else if (flag == 1) {
|
|
191
207
|
for (int i = 0; i < num_seqs; i++) {
|
|
192
208
|
fprintf(fo, "%s\t", seq_names[i]);
|
|
193
209
|
for (int j = 0; j < num_seqs; j++) {
|
|
@@ -197,6 +213,14 @@ int create_matrix(char *input_name, char *output_name, int flag) {
|
|
|
197
213
|
}
|
|
198
214
|
fprintf(fo, "\n");
|
|
199
215
|
}
|
|
216
|
+
} else {
|
|
217
|
+
for (int i = 0; i < num_seqs; i++) {
|
|
218
|
+
fprintf(fo, "%s\t", seq_names[i]);
|
|
219
|
+
for (int j = 0; j < num_seqs; j++) {
|
|
220
|
+
fprintf(fo, "%.6f\t", distances[i][j]);
|
|
221
|
+
}
|
|
222
|
+
fprintf(fo, "\n");
|
|
223
|
+
}
|
|
200
224
|
}
|
|
201
225
|
for (int i = 0; i < num_seqs; i++) {
|
|
202
226
|
free(distances[i]);
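The new final `else` branches in `create_matrix` above handle a `flag` other than 0 or 1 by allocating a full `num_seqs` by `num_seqs` matrix and filling it from consecutive input lines with the row index advancing first. A Python sketch of that fill order follows. `fill_full_matrix` is a hypothetical name, `values` stands in for the fifth field that `sscanf` extracts from each line, and the starting indices are assumed to be 0 because their initialisation lies outside the hunk.

```python
def fill_full_matrix(values, num_seqs):
    """Fill an n x n matrix column by column from a flat sequence of distances."""
    distances = [[0.0] * num_seqs for _ in range(num_seqs)]
    i = j = 0  # assumed starting indices, not shown in the diff
    for distance in values:
        distances[i][j] = distance
        i += 1
        # After the last row of a column, move to the top of the next column.
        if i == num_seqs and j < num_seqs:
            i = 0
            j += 1
    return distances


if __name__ == '__main__':
    for row in fill_full_matrix([0.0, 0.1, 0.1, 0.0], 2):
        print(row)
```

The sketch, like the C code as shown, expects exactly `num_seqs * num_seqs` values; in Python a longer input raises `IndexError`.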
|
|
@@ -235,8 +259,9 @@ static PyObject *py_dist_dispatch(PyObject *self, PyObject *args) {
|
|
|
235
259
|
char *str3;
|
|
236
260
|
int flag1;
|
|
237
261
|
int flag2;
|
|
262
|
+
char *str4;
|
|
238
263
|
int N;
|
|
239
|
-
if (!PyArg_ParseTuple(args, "
|
|
264
|
+
if (!PyArg_ParseTuple(args, "sssiiis", &str1, &str2, &str3, &flag1, &N, &flag2, &str4)) {
|
|
240
265
|
return NULL;
|
|
241
266
|
}
|
|
242
267
|
if (flag1 == 0) {
|
|
@@ -394,6 +419,11 @@ static PyObject *py_dist_dispatch(PyObject *self, PyObject *args) {
|
|
|
394
419
|
dist_opt_val3.num_remaining_args = 1;
|
|
395
420
|
dist_opt_val3.remaining_args = &str3;
|
|
396
421
|
dist_opt_val3.num_neigb = N;
|
|
422
|
+
if (strcmp(str4, "mash") == 0) {
|
|
423
|
+
dist_opt_val3.metric = 0;
|
|
424
|
+
} else {
|
|
425
|
+
dist_opt_val3.metric = 1;
|
|
426
|
+
}
|
|
397
427
|
#ifdef _OPENMP
|
|
398
428
|
if(dist_opt_val3.p == 0)
|
|
399
429
|
dist_opt_val3.p = omp_get_num_procs();
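The format string `"sssiiis"` in the `py_dist_dispatch` hunk above means the function now expects exactly seven positional arguments: three strings, three integers, and a final string. That is why both call sites in `kssdtree.py` gained a seventh argument. The sketch below is a loose Python approximation of that check, written only to show the argument order. `check_dispatch_args` is a hypothetical helper and does not reproduce every conversion rule of `PyArg_ParseTuple`.

```python
FORMAT = 'sssiiis'  # format string from the 2.0.3 hunk
NAMES = ('str1', 'str2', 'str3', 'flag1', 'N', 'flag2', 'str4')


def check_dispatch_args(args):
    """Return True if args matches FORMAT; raise TypeError otherwise."""
    if len(args) != len(FORMAT):
        raise TypeError('expected %d arguments, got %d' % (len(FORMAT), len(args)))
    for code, name, value in zip(FORMAT, NAMES, args):
        expected = str if code == 's' else int
        if not isinstance(value, expected):
            raise TypeError('%s must be %s' % (name, expected.__name__))
    return True


if __name__ == '__main__':
    # Shape of the sketch() call in 2.0.3: empty string as the metric slot.
    print(check_dispatch_args(('L3K9.shuf', 'genomes', 'out', 1, 0, 0, '')))
    # Shape of the dist() call in 2.0.3: flag1 is 2 and the metric is passed last.
    print(check_dispatch_args(('sketch', 'out.phy', 'sketch', 2, 0, 0, 'mash')))
```

A call with the six-argument shape used in 2.0.1 would fail this check, just as it would raise `TypeError` against the new C format string.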
|
|
@@ -628,4 +658,4 @@ static struct PyModuleDef kssdmodule = {
|
|
|
628
658
|
|
|
629
659
|
PyMODINIT_FUNC PyInit_kssd(void) {
|
|
630
660
|
return PyModule_Create(&kssdmodule);
|
|
631
|
-
}
|
|
661
|
+
}
|
|
@@ -2,14 +2,7 @@ import sys
|
|
|
2
2
|
from setuptools import setup, Extension, find_packages
|
|
3
3
|
from os import environ
|
|
4
4
|
import os
|
|
5
|
-
import platform
|
|
6
5
|
|
|
7
|
-
|
|
8
|
-
# def get_gcc_version():
|
|
9
|
-
# gcc_version = subprocess.check_output(['gcc', '--version']).decode('utf-8')
|
|
10
|
-
# version_line = gcc_version.split('\n', 1)[0]
|
|
11
|
-
# version_str = version_line.split()[-1]
|
|
12
|
-
# return version_str
|
|
13
6
|
extra_compile_args = []
|
|
14
7
|
extra_link_args = []
|
|
15
8
|
if 'darwin' in sys.platform:
|
|
@@ -26,12 +19,12 @@ if 'darwin' in sys.platform:
|
|
|
26
19
|
gcc_path = "/opt/homebrew/bin/gcc-11"
|
|
27
20
|
elif 'gcc-12' == gcc_version:
|
|
28
21
|
gcc_path = "/opt/homebrew/bin/gcc-12"
|
|
22
|
+
elif 'gcc-13' == gcc_version:
|
|
23
|
+
gcc_path = "/opt/homebrew/bin/gcc-13"
|
|
29
24
|
elif 'gcc-14' == gcc_version:
|
|
30
25
|
gcc_path = "/opt/homebrew/bin/gcc-14"
|
|
31
|
-
elif 'gcc-15' == gcc_version:
|
|
32
|
-
gcc_path = "/opt/homebrew/bin/gcc-15"
|
|
33
26
|
else:
|
|
34
|
-
gcc_path = "
|
|
27
|
+
gcc_path = ""
|
|
35
28
|
extra_compile_args = ['-fopenmp']
|
|
36
29
|
extra_link_args = ['-fopenmp']
|
|
37
30
|
os.environ["CC"] = gcc_path
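The `setup.py` hunk above selects a Homebrew compiler path through a chain of `elif` comparisons; 2.0.3 adds a `gcc-13` branch, drops the `gcc-15` branch, and falls back to an empty string. The same mapping can be sketched as a lookup. `homebrew_gcc_path` is a hypothetical helper, and its default `supported` tuple lists only the versions visible in this hunk; earlier branches of the chain are outside the diff context.

```python
def homebrew_gcc_path(gcc_version,
                      supported=('gcc-11', 'gcc-12', 'gcc-13', 'gcc-14'),
                      prefix='/opt/homebrew/bin'):
    """Return the Homebrew path for a supported gcc name, or '' otherwise."""
    if gcc_version in supported:
        return prefix + '/' + gcc_version
    return ''  # same fallback value as the 2.0.3 else branch


if __name__ == '__main__':
    print(homebrew_gcc_path('gcc-13'))
    print(repr(homebrew_gcc_path('gcc-15')))
```

Adding a new compiler version then means extending one tuple rather than adding another `elif` pair.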
|
|
@@ -85,25 +78,16 @@ include_dirs1 = ['kssdheaders']
|
|
|
85
78
|
include_dirs2 = ['njheaders']
|
|
86
79
|
include_dirs3 = ['dnjheaders']
|
|
87
80
|
|
|
88
|
-
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
|
|
93
|
-
|
|
94
|
-
'ete3',
|
|
95
|
-
'requests'
|
|
96
|
-
]
|
|
97
|
-
else:
|
|
98
|
-
require_pakages = [
|
|
99
|
-
'pyqt5',
|
|
100
|
-
'ete3',
|
|
101
|
-
'requests'
|
|
102
|
-
]
|
|
81
|
+
require_pakages = [
|
|
82
|
+
'pyqt5',
|
|
83
|
+
'ete3',
|
|
84
|
+
'requests',
|
|
85
|
+
'pandas'
|
|
86
|
+
]
|
|
103
87
|
|
|
104
88
|
setup(
|
|
105
89
|
name='kssdtree',
|
|
106
|
-
version='2.0.
|
|
90
|
+
version='2.0.3',
|
|
107
91
|
author='Hang Yang',
|
|
108
92
|
author_email='yhlink1207@gmail.com',
|
|
109
93
|
description="Kssdtree is a versatile Python package for phylogenetic analysis. It also provides one-stop tree construction and visualization. It can handle DNA sequences of both fasta or fastq format, whether gzipped or not. ",
|
|
@@ -126,4 +110,3 @@ setup(
|
|
|
126
110
|
include_package_data=True
|
|
127
111
|
)
|
|
128
112
|
|
|
129
|
-
|
|
@@ -8,7 +8,7 @@ import pandas as pd
|
|
|
8
8
|
import string
|
|
9
9
|
|
|
10
10
|
def allowed_file(filename):
|
|
11
|
-
allowed_extensions = ['.fa', '.fa.gz', '.fasta', '.fasta.gz', '.fna', '.fna.gz', '.fastq', '.fastq.gz']
|
|
11
|
+
allowed_extensions = ['.fa', '.fa.gz', '.fasta', '.fasta.gz', '.fna', '.fna.gz', '.fastq', '.fastq.gz', '.fq', '.fq.gz']
|
|
12
12
|
return any(filename.endswith(ext) for ext in allowed_extensions)
|
|
13
13
|
|
|
14
14
|
def rs():
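The hunk above extends `allowed_file` to accept `.fq` and `.fq.gz`. An equivalent formulation, sketched here as an alternative rather than as the package's code, relies on `str.endswith` accepting a tuple of suffixes. The constant name `ALLOWED_EXTENSIONS` is an assumption introduced for the example.

```python
ALLOWED_EXTENSIONS = ('.fa', '.fa.gz', '.fasta', '.fasta.gz', '.fna', '.fna.gz',
                      '.fastq', '.fastq.gz', '.fq', '.fq.gz')


def allowed_file(filename):
    # str.endswith accepts a tuple, so no explicit loop over extensions is needed.
    return filename.endswith(ALLOWED_EXTENSIONS)


if __name__ == '__main__':
    for name in ('reads.fq.gz', 'genome.fna', 'notes.txt'):
        print(name, allowed_file(name))
```

As in the original list-based version, the comparison is case-sensitive, so a name such as `READS.FQ` is not accepted.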
|
|
@@ -24,6 +24,11 @@ def is_positive_integer(num):
|
|
|
24
24
|
else:
|
|
25
25
|
return False
|
|
26
26
|
|
|
27
|
+
def is_negative_integer(num):
|
|
28
|
+
if isinstance(num, int) and num < 0:
|
|
29
|
+
return True
|
|
30
|
+
else:
|
|
31
|
+
return False
|
|
27
32
|
|
|
28
33
|
def randomcolor():
|
|
29
34
|
colorArr = ['1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F']
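The new `is_negative_integer` helper above is combined with `is_positive_integer` in `quick()` to print `N must = 0 !!!`. The sketch below shows how the two checks interact. Only the tail of `is_positive_integer` is visible in this diff, so its body here is an assumption that mirrors the negative variant, and `n_is_rejected` is a hypothetical name for the combined condition.

```python
def is_positive_integer(num):
    # Assumed body, mirroring is_negative_integer; only its tail appears in the diff.
    return isinstance(num, int) and num > 0


def is_negative_integer(num):
    return isinstance(num, int) and num < 0


def n_is_rejected(N):
    """Condition under which quick() prints 'N must = 0 !!!' in 2.0.3."""
    return is_positive_integer(N) or is_negative_integer(N)


if __name__ == '__main__':
    for value in (0, 5, -1, 1.5, '3'):
        print(repr(value), n_is_rejected(value))
```

Under this reading, non-integer values such as `1.5` or `'3'` pass both checks and are therefore not rejected, which may or may not be the intended behaviour.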
|
kssdtree-2.0.1/README.md
DELETED
|
@@ -1,12 +0,0 @@
|
|
|
1
|
-
# Kssdree: an interactive Python package for phylogenetic analysis based on sketching technique
|
|
2
|
-
|
|
3
|
-
Kssdtree is a versatile Python package for phylogenetic analysis, including three different pipelines: routine pipeline, reference subtraction pipeline and phylogenetic placement pipeline. The routine pipeline serves as a versatile tool for general-purpose phylogenetic analysis of users' genomic data. The reference subtraction pipeline designs for population-level phylogenetic analysis. The phylogenetic placement pipeline facilitates the search for similar genomes in the Genome Taxonomy Database (GTDB). It conducts phylogenetic analysis alongside these similar genomes and positions the input genomes within the entire prokaryotic tree of life.
|
|
4
|
-
|
|
5
|
-
It also provides one-stop tree construction and visualization. It can handle DNA sequences of both fasta or fastq format, whether gzipped or not. Kssdtree can run on multiple platforms (Linux, Windows, and MacOS) with Jupyter notebooks.
|
|
6
|
-
|
|
7
|
-
More usages about Kssdtree, please see Kssdtree documentation (https://kssdtree.readthedocs.io/en/latest).
|
|
8
|
-
|
|
9
|
-
|
|
10
|
-
|
|
11
|
-
|
|
12
|
-
|
|
File without changes
|
|