kssdtree 2.0.1__tar.gz → 2.0.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (79)
  1. {kssdtree-2.0.1/kssdtree.egg-info → kssdtree-2.0.3}/PKG-INFO +6 -3
  2. kssdtree-2.0.3/README.md +8 -0
  3. {kssdtree-2.0.1 → kssdtree-2.0.3}/command_set.c +0 -1
  4. {kssdtree-2.0.1 → kssdtree-2.0.3/kssdtree.egg-info}/PKG-INFO +6 -3
  5. {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdtree.egg-info/requires.txt +1 -0
  6. {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdtree.py +107 -94
  7. {kssdtree-2.0.1 → kssdtree-2.0.3}/mman.c +0 -1
  8. {kssdtree-2.0.1 → kssdtree-2.0.3}/pykssd.c +35 -5
  9. {kssdtree-2.0.1 → kssdtree-2.0.3}/setup.py +10 -27
  10. {kssdtree-2.0.1 → kssdtree-2.0.3}/toolutils.py +6 -1
  11. kssdtree-2.0.1/README.md +0 -12
  12. {kssdtree-2.0.1 → kssdtree-2.0.3}/MANIFEST.in +0 -0
  13. {kssdtree-2.0.1 → kssdtree-2.0.3}/align.c +0 -0
  14. {kssdtree-2.0.1 → kssdtree-2.0.3}/buildtree.c +0 -0
  15. {kssdtree-2.0.1 → kssdtree-2.0.3}/bytescale.c +0 -0
  16. {kssdtree-2.0.1 → kssdtree-2.0.3}/cluster.c +0 -0
  17. {kssdtree-2.0.1 → kssdtree-2.0.3}/co2mco.c +0 -0
  18. {kssdtree-2.0.1 → kssdtree-2.0.3}/command_composite.c +0 -0
  19. {kssdtree-2.0.1 → kssdtree-2.0.3}/command_dist.c +0 -0
  20. {kssdtree-2.0.1 → kssdtree-2.0.3}/command_dist_wrapper.c +0 -0
  21. {kssdtree-2.0.1 → kssdtree-2.0.3}/command_shuffle.c +0 -0
  22. {kssdtree-2.0.1 → kssdtree-2.0.3}/distancemat.c +0 -0
  23. {kssdtree-2.0.1 → kssdtree-2.0.3}/dnj.c +0 -0
  24. {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/bytescale.h +0 -0
  25. {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/dnj.h +0 -0
  26. {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/filebuff.h +0 -0
  27. {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/hclust.h +0 -0
  28. {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/matrix.h +0 -0
  29. {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/mman.h +0 -0
  30. {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/nj.h +0 -0
  31. {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/nwck.h +0 -0
  32. {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/pherror.h +0 -0
  33. {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/phy.h +0 -0
  34. {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/qseqs.h +0 -0
  35. {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/str.h +0 -0
  36. {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/threader.h +0 -0
  37. {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/tmp.h +0 -0
  38. {kssdtree-2.0.1 → kssdtree-2.0.3}/dnjheaders/vector.h +0 -0
  39. {kssdtree-2.0.1 → kssdtree-2.0.3}/filebuff.c +0 -0
  40. {kssdtree-2.0.1 → kssdtree-2.0.3}/global_basic.c +0 -0
  41. {kssdtree-2.0.1 → kssdtree-2.0.3}/hclust.c +0 -0
  42. {kssdtree-2.0.1 → kssdtree-2.0.3}/iseq2comem.c +0 -0
  43. {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/co2mco.h +0 -0
  44. {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/command_composite.h +0 -0
  45. {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/command_dist.h +0 -0
  46. {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/command_dist_wrapper.h +0 -0
  47. {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/command_set.h +0 -0
  48. {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/command_shuffle.h +0 -0
  49. {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/global_basic.h +0 -0
  50. {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/iseq2comem.h +0 -0
  51. {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/mman.h +0 -0
  52. {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdheaders/mytime.h +0 -0
  53. {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdtree.egg-info/SOURCES.txt +0 -0
  54. {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdtree.egg-info/dependency_links.txt +0 -0
  55. {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdtree.egg-info/not-zip-safe +0 -0
  56. {kssdtree-2.0.1 → kssdtree-2.0.3}/kssdtree.egg-info/top_level.txt +0 -0
  57. {kssdtree-2.0.1 → kssdtree-2.0.3}/matrix.c +0 -0
  58. {kssdtree-2.0.1 → kssdtree-2.0.3}/mytime.c +0 -0
  59. {kssdtree-2.0.1 → kssdtree-2.0.3}/nj.c +0 -0
  60. {kssdtree-2.0.1 → kssdtree-2.0.3}/njheaders/align.h +0 -0
  61. {kssdtree-2.0.1 → kssdtree-2.0.3}/njheaders/buildtree.h +0 -0
  62. {kssdtree-2.0.1 → kssdtree-2.0.3}/njheaders/cluster.h +0 -0
  63. {kssdtree-2.0.1 → kssdtree-2.0.3}/njheaders/distancemat.h +0 -0
  64. {kssdtree-2.0.1 → kssdtree-2.0.3}/njheaders/sequence.h +0 -0
  65. {kssdtree-2.0.1 → kssdtree-2.0.3}/njheaders/tree.h +0 -0
  66. {kssdtree-2.0.1 → kssdtree-2.0.3}/njheaders/util.h +0 -0
  67. {kssdtree-2.0.1 → kssdtree-2.0.3}/nwck.c +0 -0
  68. {kssdtree-2.0.1 → kssdtree-2.0.3}/pherror.c +0 -0
  69. {kssdtree-2.0.1 → kssdtree-2.0.3}/phy.c +0 -0
  70. {kssdtree-2.0.1 → kssdtree-2.0.3}/pydnj.c +0 -0
  71. {kssdtree-2.0.1 → kssdtree-2.0.3}/pynj.c +0 -0
  72. {kssdtree-2.0.1 → kssdtree-2.0.3}/qseqs.c +0 -0
  73. {kssdtree-2.0.1 → kssdtree-2.0.3}/sequence.c +0 -0
  74. {kssdtree-2.0.1 → kssdtree-2.0.3}/setup.cfg +0 -0
  75. {kssdtree-2.0.1 → kssdtree-2.0.3}/str.c +0 -0
  76. {kssdtree-2.0.1 → kssdtree-2.0.3}/tmp.c +0 -0
  77. {kssdtree-2.0.1 → kssdtree-2.0.3}/tree.c +0 -0
  78. {kssdtree-2.0.1 → kssdtree-2.0.3}/util.c +0 -0
  79. {kssdtree-2.0.1 → kssdtree-2.0.3}/vector.c +0 -0
@@ -1,8 +1,11 @@
- Metadata-Version: 2.1
+ Metadata-Version: 1.1
  Name: kssdtree
- Version: 2.0.1
+ Version: 2.0.3
  Summary: Kssdtree is a versatile Python package for phylogenetic analysis. It also provides one-stop tree construction and visualization. It can handle DNA sequences of both fasta or fastq format, whether gzipped or not.
  Home-page: https://github.com/yhlink/kssdtree
- Download-URL: https://pypi.org/project/kssdtree
  Author: Hang Yang
  Author-email: yhlink1207@gmail.com
+ License: UNKNOWN
+ Download-URL: https://pypi.org/project/kssdtree
+ Description: UNKNOWN
+ Platform: UNKNOWN
@@ -0,0 +1,8 @@
+ Kssdtree is a versatile Python package for phylogenetic analysis, offering three distinct pipelines: the Routine Pipeline, the Reference Subtraction Pipeline, and the GTDB-based Phylogenetic Placement Pipeline.
+
+ Routine Pipeline: A general-purpose tool for phylogenetic analysis of user genomic data.
+ Reference Subtraction Pipeline: Designed for intra-species phylogenomic analysis.
+ GTDB-based Phylogenetic Placement Pipeline: Facilitates the search for similar genomes in the Genome Taxonomy Database (GTDB), conducting phylogenetic analysis alongside these genomes and positioning the input genomes within the entire prokaryotic tree of life.
+ Kssdtree also provides one-stop tree construction and visualization. It can handle DNA sequences in both fasta and fastq formats, whether gzipped or not. Additionally, Kssdtree is compatible with multiple platforms (Linux, MacOS, and Windows) and can be run using Jupyter notebooks.
+
+ More usages about Kssdtree, please see Kssdtree documentation (https://kssdtree.readthedocs.io/en/latest).
@@ -25,7 +25,6 @@
  #include <errno.h>
  #include <math.h>
  #include <unistd.h>
-
  const char skch_prefix[]="combco";
  const char idx_prefix[]="combco.index";
  const char pan_prefix[]="pan";
@@ -1,8 +1,11 @@
- Metadata-Version: 2.1
+ Metadata-Version: 1.1
  Name: kssdtree
- Version: 2.0.1
+ Version: 2.0.3
  Summary: Kssdtree is a versatile Python package for phylogenetic analysis. It also provides one-stop tree construction and visualization. It can handle DNA sequences of both fasta or fastq format, whether gzipped or not.
  Home-page: https://github.com/yhlink/kssdtree
- Download-URL: https://pypi.org/project/kssdtree
  Author: Hang Yang
  Author-email: yhlink1207@gmail.com
+ License: UNKNOWN
+ Download-URL: https://pypi.org/project/kssdtree
+ Description: UNKNOWN
+ Platform: UNKNOWN
@@ -1,3 +1,4 @@
  pyqt5
  ete3
  requests
+ pandas
@@ -28,23 +28,18 @@ def sketch(shuf_file=None, genome_files=None, output=None, set_opt=None):
  if not os.path.exists(shuf_file):
  if shuf_file in ['L3K9.shuf', './L3K9.shuf', 'L3K10.shuf', './L3K10.shuf']:
  print('Downloading...', shuf_file)
- import http.client
- http.client.HTTPConnection._http_vsn = 10
- http.client.HTTPConnection._http_vsn_str = 'HTTP/1.0'
  if shuf_file == 'L3K9.shuf' or shuf_file == './L3K9.shuf':
  url = 'https://zenodo.org/records/12699159/files/L3K9.shuf?download=1'
  else:
  url = 'https://zenodo.org/records/12699159/files/L3K10.shuf?download=1'
- start_time = time.time()
- response = requests.get(url, stream=True)
- with open(shuf_file, 'wb') as file:
- for chunk in response.iter_content(chunk_size=1024):
- if chunk:
- file.write(chunk)
+ headers = {'Accept-Encoding': 'gzip, deflate'}
+ response = requests.get(url, headers=headers, stream=True)
+ with open(shuf_file, 'wb') as f:
+ for chunk in response.iter_content(chunk_size=8192):
+ f.write(chunk)
  end_time = time.time()
- if end_time - start_time > 120:
- print(
- "Network timeout, please manually download from github (https://github.com/yhlink/kssdtree/tree/master/shuffle_file)")
+ if end_time - start_time > 200:
+ print("Network timeout, please manually download from https://zenodo.org/records/12699159")
  return False
  print('Download finished: ', shuf_file)
  elif shuf_file in ['L2K8.shuf', 'L2K9.shuf', 'L3K11.shuf', './L2K8.shuf', './L2K9.shuf', './L3K11.shuf']:
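Review note on the hunk above: in the lines shown, the `start_time = time.time()` assignment is removed while the new timeout check still reads `start_time`. Unless it is assigned somewhere outside the visible context, that comparison would raise a `NameError`. The following is a minimal sketch, not the package's code, of writing streamed chunks to disk while measuring elapsed time. `save_chunks` and `limit_seconds` are hypothetical names, and any iterable of bytes (for example `response.iter_content(chunk_size=8192)` from requests) can stand in for `chunks`.

```python
import time


def save_chunks(chunks, path, limit_seconds=200):
    """Write an iterable of byte chunks to `path` (hypothetical helper).

    Returns False if the transfer took longer than `limit_seconds`,
    mirroring the after-the-fact check in the hunk above.
    """
    start_time = time.monotonic()  # taken before the transfer starts
    with open(path, 'wb') as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
    elapsed = time.monotonic() - start_time
    return elapsed <= limit_seconds
```

Because the check runs only after the loop finishes, it reports a slow download rather than interrupting one; a real network timeout would have to be passed to the HTTP call itself.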
@@ -64,9 +59,9 @@ def sketch(shuf_file=None, genome_files=None, output=None, set_opt=None):
  print('Sketching...')
  start = time.time()
  if set_opt:
- kssd.dist_dispatch(shuf_file, genome_files, output, 1, 0, 0)
+ kssd.dist_dispatch(shuf_file, genome_files, output, 1, 0, 0, '')
  else:
- kssd.dist_dispatch(shuf_file, genome_files, output, 0, 0, 0)
+ kssd.dist_dispatch(shuf_file, genome_files, output, 0, 0, 0, '')
  end = time.time()
  print('Sketch spend time:%.2fs' % (end - start))
  print('Sketch finished!')
@@ -76,7 +71,7 @@ def sketch(shuf_file=None, genome_files=None, output=None, set_opt=None):
  return False
  
  
- def dist(genome_sketch=None, output=None, flag=None):
+ def dist(genome_sketch=None, output=None, metric = None, flag=None):
  if genome_sketch is not None and output is not None:
  if not os.path.exists(genome_sketch):
  print('No such file or directory: ', genome_sketch)
@@ -86,6 +81,9 @@ def dist(genome_sketch=None, output=None, flag=None):
  # return False
  if flag is None:
  flag = 0
+ if metric is None:
+ metric = 'mash'
+
  print('Disting...')
  start = time.time()
  if '/' in output:
@@ -97,11 +95,15 @@ def dist(genome_sketch=None, output=None, flag=None):
  else:
  output_name = output
  if output_name.endswith(".phy") or output_name.endswith(".phylip"):
- kssd.dist_dispatch(genome_sketch, output, genome_sketch, 2, 0, flag)
- end = time.time()
- print('Dist spend time:%.2fs' % (end - start))
- print('Dist finished!')
- return True
+ if metric not in ['mash', 'aaf']:
+ print('Metric type error, only supports mash or aaf distance')
+ return False
+ else:
+ kssd.dist_dispatch(genome_sketch, output, genome_sketch, 2, 0, flag, metric)
+ end = time.time()
+ print('Dist spend time:%.2fs' % (end - start))
+ print('Dist finished!')
+ return True
  else:
  print('Output type error, only supports .phylip (.phy) format:', output_name)
  return False
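Review note on the `dist()` changes: the new `metric` parameter defaults to `'mash'` and only `'mash'` or `'aaf'` is accepted. Since `metric` is inserted before `flag` in the signature, a caller passing `flag` as the third positional argument would now bind it to `metric`, so keyword arguments look safer. The sketch below isolates the default-and-validate step; `resolve_metric` and `SUPPORTED_METRICS` are hypothetical names, and raising `ValueError` is a substitution for the package's print-and-return-`False` style.

```python
SUPPORTED_METRICS = ('mash', 'aaf')


def resolve_metric(metric=None):
    """Return the metric to use, defaulting to 'mash' as the diff does."""
    if metric is None:
        return 'mash'
    if metric not in SUPPORTED_METRICS:
        # The package prints a message and returns False here instead.
        raise ValueError("metric must be 'mash' or 'aaf', got %r" % (metric,))
    return metric
```

Validating before any timing or file handling starts, as sketched here, would also report a bad metric regardless of the output file extension.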
@@ -207,7 +209,12 @@ def visualize(newick=None, taxonomy=None, mode=None):
  return False
  if mode is None:
  mode = 'r'
- toolutils.view_tree(newick, taxonomy, mode=mode)
+ if taxonomy is not None and mode == 'c':
+ print('Warning: this pipeline only support 'r' (rectangle) mode !!!')
+ mode = 'r'
+ toolutils.view_tree(newick, taxonomy, mode=mode)
+ else:
+ toolutils.view_tree(newick, taxonomy, mode=mode)
  else:
  print('Args error!!!')
  return False
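Review note on the added warning line: it nests single quotes inside a single-quoted string. Python reads `'... support 'r' (rectangle) mode !!!'` as a plain string followed by a raw-string literal `r' (rectangle) mode !!!'`, and adjacent literals are concatenated, so the statement runs but the printed text loses the `r`. A small demonstration follows, with a double-quoted alternative as one possible fix (not the author's).

```python
# The expression exactly as it appears in the added line.
as_written = 'Warning: this pipeline only support 'r' (rectangle) mode !!!'

# One possible fix: use double quotes on the outside.
intended = "Warning: this pipeline only support 'r' (rectangle) mode !!!"

print(as_written)
print(intended)
```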
@@ -280,6 +287,9 @@ def subtract(ref_sketch=None, genome_sketch=None, output=None, flag=None):
  def quick(shuf_file=None, genome_files=None, output=None, reference=None, database=None, method='nj', mode='r', N=0):
  if reference is None and database is None:
  if shuf_file is not None and genome_files is not None and output is not None:
+ if toolutils.is_positive_integer(N) or toolutils.is_negative_integer(N):
+ print("N must = 0 !!!")
+ return False
  timeStamp = int(time.mktime(time.localtime(time.time())))
  temp_sketch = toolutils.rs() + '_sketch_' + str(timeStamp)
  temp_phy = toolutils.rs() + '_temp.phy'
@@ -325,7 +335,7 @@ def quick(shuf_file=None, genome_files=None, output=None, reference=None, databa
  elif reference is None and database == 'gtdbr214':
  if shuf_file is not None and genome_files is not None and output is not None:
  if not toolutils.is_positive_integer(N):
- print("N must >0 !!!")
+ print("N must > 0 !!!")
  return False
  if shuf_file != 'L3K9.shuf':
  print("shuf_file must be set to 'L3K9.shuf'")
@@ -354,83 +364,86 @@ def quick(shuf_file=None, genome_files=None, output=None, reference=None, databa
  return False
  elif reference is not None and database is None:
  if shuf_file is not None and genome_files is not None and output is not None and method in ['nj', 'dnj']:
- if shuf_file is not None and genome_files is not None and output is not None and method in ['nj', 'dnj']:
- if not toolutils.allowed_file(genome_files):
- num = toolutils.get_file_num(genome_files)
- if num == 1:
- print('genome_files is a folder containing at least two . fasta or .fastq files!!!')
- return False
- else:
- print('genome_files is a folder containing at least two . fasta or .fastq files, not a file!!!')
+ if toolutils.is_positive_integer(N) or toolutils.is_negative_integer(N):
+ print("N must = 0 !!!")
+ return False
+ if not toolutils.allowed_file(genome_files):
+ num = toolutils.get_file_num(genome_files)
+ if num == 1:
+ print('genome_files is a folder containing at least two . fasta or .fastq files!!!')
  return False
- timeStamp = int(time.mktime(time.localtime(time.time())))
- temp_reference_sketch = toolutils.rs() + '_ref_sketch_' + str(timeStamp)
- temp_genomes_sketch = toolutils.rs() + '_sketch_' + str(timeStamp)
- if not toolutils.allowed_file(reference):
- # cur_path = os.getcwd()
- # ref_path = os.path.join(cur_path, reference)
- num = toolutils.get_file_num(reference)
- if num == 1:
- temp_union_sketch = temp_reference_sketch
- else:
- temp_union_sketch = toolutils.rs() + '_ref_union_sketch_' + str(timeStamp)
- else:
+ else:
+ print('genome_files is a folder containing at least two . fasta or .fastq files, not a file!!!')
+ return False
+ timeStamp = int(time.mktime(time.localtime(time.time())))
+ temp_reference_sketch = toolutils.rs() + '_ref_sketch_' + str(timeStamp)
+ temp_genomes_sketch = toolutils.rs() + '_sketch_' + str(timeStamp)
+ if not toolutils.allowed_file(reference):
+ # cur_path = os.getcwd()
+ # ref_path = os.path.join(cur_path, reference)
+ num = toolutils.get_file_num(reference)
+ if num == 1:
  temp_union_sketch = temp_reference_sketch
- temp_subtract_sketch = toolutils.rs() + '_subtract_sketch_' + str(timeStamp)
- temp_phy = toolutils.rs() + '_temp.phy'
- print('Step1...')
- s1 = sketch(shuf_file=shuf_file, genome_files=reference, output=temp_reference_sketch, set_opt=True)
- if not s1:
- return False
- s2 = sketch(shuf_file=shuf_file, genome_files=genome_files, output=temp_genomes_sketch, set_opt=True)
- if not s2:
- return False
- print('Step2...')
- s3 = union(ref_sketch=temp_reference_sketch, output=temp_union_sketch)
- if not s3:
- return False
- s4 = subtract(ref_sketch=temp_union_sketch, genome_sketch=temp_genomes_sketch,
- output=temp_subtract_sketch, flag=1)
- if not s4:
- return False
- print('Step3...')
- if method == 'nj':
- s5 = dist(genome_sketch=temp_subtract_sketch, output=temp_phy,
- flag=0)
  else:
- s5 = dist(genome_sketch=temp_subtract_sketch, output=temp_phy,
- flag=1)
- if not s5:
- return False
- print('Step4...')
- s6 = build(phylip=temp_phy, output=output, method=method)
- if not s6:
- return False
- print('Step5...')
- print('Tree visualization finished!')
- visualize(newick=output, mode=mode)
- if platform.system() == 'Linux':
- current_directory = os.getcwd()
- temp_dir1 = os.path.join(current_directory, temp_reference_sketch)
- temp_dir2 = os.path.join(current_directory, temp_genomes_sketch)
- temp_dir3 = os.path.join(current_directory, temp_union_sketch)
- temp_dir4 = os.path.join(current_directory, temp_subtract_sketch)
- temp_dir5 = os.path.join(current_directory, 'distout')
- if os.path.exists(temp_dir1):
- shutil.rmtree(temp_dir1)
- if os.path.exists(temp_dir2):
- shutil.rmtree(temp_dir2)
- if os.path.exists(temp_dir3):
- shutil.rmtree(temp_dir3)
- if os.path.exists(temp_dir4):
- shutil.rmtree(temp_dir4)
- if os.path.exists(temp_dir5):
- shutil.rmtree(temp_dir5)
- if os.path.exists(temp_phy):
- os.remove(temp_phy)
+ temp_union_sketch = toolutils.rs() + '_ref_union_sketch_' + str(timeStamp)
  else:
- print('Args error, please see https://kssdtree.readthedocs.io/en/latest!!!')
+ temp_union_sketch = temp_reference_sketch
+ temp_subtract_sketch = toolutils.rs() + '_subtract_sketch_' + str(timeStamp)
+ temp_phy = toolutils.rs() + '_temp.phy'
+ print('Step1...')
+ s1 = sketch(shuf_file=shuf_file, genome_files=reference, output=temp_reference_sketch, set_opt=True)
+ if not s1:
+ return False
+ s2 = sketch(shuf_file=shuf_file, genome_files=genome_files, output=temp_genomes_sketch, set_opt=True)
+ if not s2:
+ return False
+ print('Step2...')
+ s3 = union(ref_sketch=temp_reference_sketch, output=temp_union_sketch)
+ if not s3:
+ return False
+ s4 = subtract(ref_sketch=temp_union_sketch, genome_sketch=temp_genomes_sketch,
+ output=temp_subtract_sketch, flag=1)
+ if not s4:
+ return False
+ print('Step3...')
+ if method == 'nj':
+ s5 = dist(genome_sketch=temp_subtract_sketch, output=temp_phy,
+ flag=0)
+ else:
+ s5 = dist(genome_sketch=temp_subtract_sketch, output=temp_phy,
+ flag=1)
+ if not s5:
+ return False
+ print('Step4...')
+ s6 = build(phylip=temp_phy, output=output, method=method)
+ if not s6:
  return False
+ print('Step5...')
+ print('Tree visualization finished!')
+ visualize(newick=output, mode=mode)
+ if platform.system() == 'Linux':
+ current_directory = os.getcwd()
+ temp_dir1 = os.path.join(current_directory, temp_reference_sketch)
+ temp_dir2 = os.path.join(current_directory, temp_genomes_sketch)
+ temp_dir3 = os.path.join(current_directory, temp_union_sketch)
+ temp_dir4 = os.path.join(current_directory, temp_subtract_sketch)
+ temp_dir5 = os.path.join(current_directory, 'distout')
+ if os.path.exists(temp_dir1):
+ shutil.rmtree(temp_dir1)
+ if os.path.exists(temp_dir2):
+ shutil.rmtree(temp_dir2)
+ if os.path.exists(temp_dir3):
+ shutil.rmtree(temp_dir3)
+ if os.path.exists(temp_dir4):
+ shutil.rmtree(temp_dir4)
+ if os.path.exists(temp_dir5):
+ shutil.rmtree(temp_dir5)
+ if os.path.exists(temp_phy):
+ os.remove(temp_phy)
+ else:
+ print('Args error, please see https://kssdtree.readthedocs.io/en/latest!!!')
+ return False
+
  else:
  print('Pipeline error, please see https://kssdtree.readthedocs.io/en/latest!!!')
  return False
@@ -1,4 +1,3 @@
-
  #ifdef _WIN32
  #include <windows.h>
  #include <io.h>
@@ -121,12 +121,18 @@ int create_matrix(char *input_name, char *output_name, int flag) {
  for (int j = 0; j <= i; j++)
  distances[i][j] = 0.0;
  }
- } else {
+ } else if (flag == 1) {
  for (int i = 0; i < num_seqs; i++) {
  distances[i] = malloc(i * sizeof(double));
  for (int j = 0; j < i; j++)
  distances[i][j] = 0.0;
  }
+ } else {
+ for (int i = 0; i < num_seqs; i++) {
+ distances[i] = malloc(num_seqs * sizeof(double));
+ for (int j = 0; j < num_seqs; j++)
+ distances[i][j] = 0.0;
+ }
  }
  rewind(fp);
  fgets(line, PATHLEN, fp);
@@ -144,7 +150,7 @@ int create_matrix(char *input_name, char *output_name, int flag) {
  j += 1;
  }
  }
- } else {
+ } else if (flag == 1) {
  while (fgets(line, PATHLEN, fp) != NULL) {
  sscanf(line, "%*s %*s %*s %*s %lf", &distance);
  if (j < i) {
@@ -156,6 +162,16 @@ int create_matrix(char *input_name, char *output_name, int flag) {
  j += 1;
  }
  }
+ } else {
+ while (fgets(line, PATHLEN, fp) != NULL) {
+ sscanf(line, "%*s %*s %*s %*s %lf", &distance);
+ distances[i][j] = distance;
+ i += 1;
+ if (i == num_seqs && j < num_seqs) {
+ i = 0;
+ j += 1;
+ }
+ }
  }
  
  for (int i = 0; i < num_seqs; i++) {
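Review note on the new `else` branch of `create_matrix`: it allocates a full `num_seqs` by `num_seqs` matrix and fills it one column at a time, advancing the row index `i` for each record and moving to the next column `j` when `i` reaches `num_seqs`. The Python sketch below reproduces that fill order for illustration only. `fill_full_matrix` is a hypothetical name, and the explicit overflow check is an addition that the C code as shown does not have.

```python
def fill_full_matrix(values, num_seqs):
    """Fill a square matrix column by column, as the new C branch does."""
    matrix = [[0.0] * num_seqs for _ in range(num_seqs)]
    i = j = 0
    for distance in values:
        if j >= num_seqs:
            # Added guard: the C loop would write past the matrix here.
            raise ValueError('more than num_seqs * num_seqs distances supplied')
        matrix[i][j] = distance
        i += 1
        if i == num_seqs:
            i = 0
            j += 1
    return matrix
```

For two sequences and the records `0.0, 0.1, 0.2, 0.0`, the second record lands in row 1, column 0 and the third in row 0, column 1, which is why a non-symmetric input produces a non-symmetric matrix.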
@@ -187,7 +203,7 @@ int create_matrix(char *input_name, char *output_name, int flag) {
  }
  fprintf(fo, "\n");
  }
- } else {
+ } else if (flag == 1) {
  for (int i = 0; i < num_seqs; i++) {
  fprintf(fo, "%s\t", seq_names[i]);
  for (int j = 0; j < num_seqs; j++) {
@@ -197,6 +213,14 @@ int create_matrix(char *input_name, char *output_name, int flag) {
  }
  fprintf(fo, "\n");
  }
+ } else {
+ for (int i = 0; i < num_seqs; i++) {
+ fprintf(fo, "%s\t", seq_names[i]);
+ for (int j = 0; j < num_seqs; j++) {
+ fprintf(fo, "%.6f\t", distances[i][j]);
+ }
+ fprintf(fo, "\n");
+ }
  }
  for (int i = 0; i < num_seqs; i++) {
  free(distances[i]);
@@ -235,8 +259,9 @@ static PyObject *py_dist_dispatch(PyObject *self, PyObject *args) {
  char *str3;
  int flag1;
  int flag2;
+ char *str4;
  int N;
- if (!PyArg_ParseTuple(args, "sssiii", &str1, &str2, &str3, &flag1, &N, &flag2)) {
+ if (!PyArg_ParseTuple(args, "sssiiis", &str1, &str2, &str3, &flag1, &N, &flag2, &str4)) {
  return NULL;
  }
  if (flag1 == 0) {
@@ -394,6 +419,11 @@ static PyObject *py_dist_dispatch(PyObject *self, PyObject *args) {
  dist_opt_val3.num_remaining_args = 1;
  dist_opt_val3.remaining_args = &str3;
  dist_opt_val3.num_neigb = N;
+ if (strcmp(str4, "mash") == 0) {
+ dist_opt_val3.metric = 0;
+ } else {
+ dist_opt_val3.metric = 1;
+ }
  #ifdef _OPENMP
  if(dist_opt_val3.p == 0)
  dist_opt_val3.p = omp_get_num_procs();
@@ -628,4 +658,4 @@ static struct PyModuleDef kssdmodule = {
  
  PyMODINIT_FUNC PyInit_kssd(void) {
  return PyModule_Create(&kssdmodule);
- }
+ }
@@ -2,14 +2,7 @@ import sys
  from setuptools import setup, Extension, find_packages
  from os import environ
  import os
- import platform
  
-
- # def get_gcc_version():
- # gcc_version = subprocess.check_output(['gcc', '--version']).decode('utf-8')
- # version_line = gcc_version.split('\n', 1)[0]
- # version_str = version_line.split()[-1]
- # return version_str
  extra_compile_args = []
  extra_link_args = []
  if 'darwin' in sys.platform:
@@ -26,12 +19,12 @@ if 'darwin' in sys.platform:
  gcc_path = "/opt/homebrew/bin/gcc-11"
  elif 'gcc-12' == gcc_version:
  gcc_path = "/opt/homebrew/bin/gcc-12"
+ elif 'gcc-13' == gcc_version:
+ gcc_path = "/opt/homebrew/bin/gcc-13"
  elif 'gcc-14' == gcc_version:
  gcc_path = "/opt/homebrew/bin/gcc-14"
- elif 'gcc-15' == gcc_version:
- gcc_path = "/opt/homebrew/bin/gcc-15"
  else:
- gcc_path = "/opt/homebrew/bin/gcc-13"
+ gcc_path = ""
  extra_compile_args = ['-fopenmp']
  extra_link_args = ['-fopenmp']
  os.environ["CC"] = gcc_path
@@ -85,25 +78,16 @@ include_dirs1 = ['kssdheaders']
  include_dirs2 = ['njheaders']
  include_dirs3 = ['dnjheaders']
  
- if 'darwin' in sys.platform:
- if platform.machine() == 'arm64':
- require_pakages = []
- else:
- require_pakages = [
- 'pyqt5',
- 'ete3',
- 'requests'
- ]
- else:
- require_pakages = [
- 'pyqt5',
- 'ete3',
- 'requests'
- ]
+ require_pakages = [
+ 'pyqt5',
+ 'ete3',
+ 'requests',
+ 'pandas'
+ ]
  
  setup(
  name='kssdtree',
- version='2.0.1',
+ version='2.0.3',
  author='Hang Yang',
  author_email='yhlink1207@gmail.com',
  description="Kssdtree is a versatile Python package for phylogenetic analysis. It also provides one-stop tree construction and visualization. It can handle DNA sequences of both fasta or fastq format, whether gzipped or not. ",
@@ -126,4 +110,3 @@ setup(
  include_package_data=True
  )
  
-
@@ -8,7 +8,7 @@ import pandas as pd
  import string
  
  def allowed_file(filename):
- allowed_extensions = ['.fa', '.fa.gz', '.fasta', '.fasta.gz', '.fna', '.fna.gz', '.fastq', '.fastq.gz']
+ allowed_extensions = ['.fa', '.fa.gz', '.fasta', '.fasta.gz', '.fna', '.fna.gz', '.fastq', '.fastq.gz', '.fq', '.fq.gz']
  return any(filename.endswith(ext) for ext in allowed_extensions)
  
  def rs():
@@ -24,6 +24,11 @@ def is_positive_integer(num):
  else:
  return False
  
+ def is_negative_integer(num):
+ if isinstance(num, int) and num < 0:
+ return True
+ else:
+ return False
  
  def randomcolor():
  colorArr = ['1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F']
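Review note on the `N must = 0` guard that uses `is_negative_integer`: `quick()` now rejects `N` when it is a positive or negative integer, which leaves non-integer values such as `1.5` unrejected. The sketch below illustrates that behavior under one assumption: only the tail of `is_positive_integer` is visible in the diff, so its body here is assumed to mirror `is_negative_integer`. `n_is_rejected` and `n_is_zero` are hypothetical names, and `n_is_zero` is a stricter alternative rather than the package's check.

```python
def is_positive_integer(num):
    # Assumed to mirror is_negative_integer; only its tail appears in the diff.
    return isinstance(num, int) and num > 0


def is_negative_integer(num):
    return isinstance(num, int) and num < 0


def n_is_rejected(n):
    """The guard added to quick(): reject N when it is a non-zero int."""
    return is_positive_integer(n) or is_negative_integer(n)


def n_is_zero(n):
    """Stricter hypothetical alternative: accept only the integer 0."""
    return isinstance(n, int) and not isinstance(n, bool) and n == 0
```

Under these definitions `n_is_rejected(1.5)` is `False`, so a float would pass the guard, whereas `n_is_zero(1.5)` is also `False` and would let the caller reject it.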
kssdtree-2.0.1/README.md DELETED
@@ -1,12 +0,0 @@
- # Kssdree: an interactive Python package for phylogenetic analysis based on sketching technique
-
- Kssdtree is a versatile Python package for phylogenetic analysis, including three different pipelines: routine pipeline, reference subtraction pipeline and phylogenetic placement pipeline. The routine pipeline serves as a versatile tool for general-purpose phylogenetic analysis of users' genomic data. The reference subtraction pipeline designs for population-level phylogenetic analysis. The phylogenetic placement pipeline facilitates the search for similar genomes in the Genome Taxonomy Database (GTDB). It conducts phylogenetic analysis alongside these similar genomes and positions the input genomes within the entire prokaryotic tree of life.
-
- It also provides one-stop tree construction and visualization. It can handle DNA sequences of both fasta or fastq format, whether gzipped or not. Kssdtree can run on multiple platforms (Linux, Windows, and MacOS) with Jupyter notebooks.
-
- More usages about Kssdtree, please see Kssdtree documentation (https://kssdtree.readthedocs.io/en/latest).
-
-
-
-
-
File without changes