PyamilySeq 1.1.2__tar.gz → 1.2.0__tar.gz

Files changed (28)
  1. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/PKG-INFO +6 -6
  2. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/README.md +5 -5
  3. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/setup.cfg +5 -1
  4. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq/Seq_Combiner.py +2 -2
  5. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq/Seq_Extractor.py +6 -1
  6. pyamilyseq-1.2.0/src/PyamilySeq/constants.py +2 -0
  7. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq/utils.py +1 -3
  8. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq.egg-info/PKG-INFO +6 -6
  9. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq.egg-info/entry_points.txt +3 -0
  10. pyamilyseq-1.1.2/src/PyamilySeq/constants.py +0 -2
  11. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/LICENSE +0 -0
  12. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/pyproject.toml +0 -0
  13. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq/Cluster_Compare.py +0 -0
  14. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq/Cluster_Summary.py +0 -0
  15. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq/Group_Extractor.py +0 -0
  16. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq/Group_Sizes.py +0 -0
  17. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq/Group_Splitter.py +0 -0
  18. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq/PyamilySeq.py +0 -0
  19. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq/PyamilySeq_Genus.py +0 -0
  20. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq/PyamilySeq_Species.py +0 -0
  21. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq/Seq_Finder.py +0 -0
  22. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq/__init__.py +0 -0
  23. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq/clusterings.py +0 -0
  24. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq/config.py +0 -0
  25. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq.egg-info/SOURCES.txt +0 -0
  26. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq.egg-info/dependency_links.txt +0 -0
  27. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq.egg-info/requires.txt +0 -0
  28. {pyamilyseq-1.1.2 → pyamilyseq-1.2.0}/src/PyamilySeq.egg-info/top_level.txt +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: PyamilySeq
- Version: 1.1.2
+ Version: 1.2.0
  Summary: PyamilySeq - A a tool to investigate sequence-based gene groups identified by clustering methods such as CD-HIT, DIAMOND, BLAST or MMseqs2.
  Home-page: https://github.com/NickJD/PyamilySeq
  Author: Nicholas Dimonaco
@@ -46,7 +46,7 @@ To update to the newest version add '-U' to end of the pip install command.
  ```commandline
  usage: PyamilySeq.py [-h] {Full,Partial} ...
 
- PyamilySeq v1.1.2: A tool for gene clustering and analysis.
+ PyamilySeq v1.2.0: A tool for gene clustering and analysis.
 
  positional arguments:
  {Full,Partial} Choose a mode: 'Full' or 'Partial'.
@@ -76,7 +76,7 @@ Escherichia_coli_110957|ENSB_TIZS9kbTvShDvyX Escherichia_coli_110957|ENSB_TIZS9k
  ```
  ### Example output:
  ```
- Running PyamilySeq v1.1.2
+ Running PyamilySeq v1.2.0
  Calculating Groups
  Number of Genomes: 10
  Gene Groups
@@ -221,7 +221,7 @@ Seq-Combiner -input_dir .../test_data/genomes -name_split_gff .gff3 -output_dir
  ```
  usage: Seq_Combiner.py [-h] -input_dir INPUT_DIR -input_type {separate,combined,fasta} [-name_split_gff NAME_SPLIT_GFF] [-name_split_fasta NAME_SPLIT_FASTA] -output_dir OUTPUT_DIR -output_name OUTPUT_FILE [-gene_ident GENE_IDENT] [-translate] [-v]
 
- PyamilySeq v1.1.2: Seq-Combiner - A tool to extract sequences from GFF/FASTA files and prepare them for PyamilySeq.
+ PyamilySeq v1.2.0: Seq-Combiner - A tool to extract sequences from GFF/FASTA files and prepare them for PyamilySeq.
 
  options:
  -h, --help show this help message and exit
@@ -264,7 +264,7 @@ usage: Group_Splitter.py [-h] -input_fasta INPUT_FASTA -sequence_type {AA,DNA}
  [-M CLUSTERING_MEMORY] [-no_delete_temp_files]
  [-verbose] [-v]
 
- PyamilySeq v1.1.2: Group-Splitter - A tool to split multi-copy gene groups
+ PyamilySeq v1.2.0: Group-Splitter - A tool to split multi-copy gene groups
  identified by PyamilySeq.
 
  options:
@@ -317,7 +317,7 @@ Cluster-Summary -genome_num 10 -input_clstr .../test_data/species/E-coli/E-coli_
  usage: Cluster_Summary.py [-h] -input_clstr INPUT_CLSTR -output OUTPUT -genome_num GENOME_NUM
  [-output_dir OUTPUT_DIR] [-verbose] [-v]
 
- PyamilySeq v1.1.2: Cluster-Summary - A tool to summarise CD-HIT clustering files.
+ PyamilySeq v1.2.0: Cluster-Summary - A tool to summarise CD-HIT clustering files.
 
  options:
  -h, --help show this help message and exit
@@ -29,7 +29,7 @@ To update to the newest version add '-U' to end of the pip install command.
  ```commandline
  usage: PyamilySeq.py [-h] {Full,Partial} ...
 
- PyamilySeq v1.1.2: A tool for gene clustering and analysis.
+ PyamilySeq v1.2.0: A tool for gene clustering and analysis.
 
  positional arguments:
  {Full,Partial} Choose a mode: 'Full' or 'Partial'.
@@ -59,7 +59,7 @@ Escherichia_coli_110957|ENSB_TIZS9kbTvShDvyX Escherichia_coli_110957|ENSB_TIZS9k
  ```
  ### Example output:
  ```
- Running PyamilySeq v1.1.2
+ Running PyamilySeq v1.2.0
  Calculating Groups
  Number of Genomes: 10
  Gene Groups
@@ -204,7 +204,7 @@ Seq-Combiner -input_dir .../test_data/genomes -name_split_gff .gff3 -output_dir
  ```
  usage: Seq_Combiner.py [-h] -input_dir INPUT_DIR -input_type {separate,combined,fasta} [-name_split_gff NAME_SPLIT_GFF] [-name_split_fasta NAME_SPLIT_FASTA] -output_dir OUTPUT_DIR -output_name OUTPUT_FILE [-gene_ident GENE_IDENT] [-translate] [-v]
 
- PyamilySeq v1.1.2: Seq-Combiner - A tool to extract sequences from GFF/FASTA files and prepare them for PyamilySeq.
+ PyamilySeq v1.2.0: Seq-Combiner - A tool to extract sequences from GFF/FASTA files and prepare them for PyamilySeq.
 
  options:
  -h, --help show this help message and exit
@@ -247,7 +247,7 @@ usage: Group_Splitter.py [-h] -input_fasta INPUT_FASTA -sequence_type {AA,DNA}
  [-M CLUSTERING_MEMORY] [-no_delete_temp_files]
  [-verbose] [-v]
 
- PyamilySeq v1.1.2: Group-Splitter - A tool to split multi-copy gene groups
+ PyamilySeq v1.2.0: Group-Splitter - A tool to split multi-copy gene groups
  identified by PyamilySeq.
 
  options:
@@ -300,7 +300,7 @@ Cluster-Summary -genome_num 10 -input_clstr .../test_data/species/E-coli/E-coli_
  usage: Cluster_Summary.py [-h] -input_clstr INPUT_CLSTR -output OUTPUT -genome_num GENOME_NUM
  [-output_dir OUTPUT_DIR] [-verbose] [-v]
 
- PyamilySeq v1.1.2: Cluster-Summary - A tool to summarise CD-HIT clustering files.
+ PyamilySeq v1.2.0: Cluster-Summary - A tool to summarise CD-HIT clustering files.
 
  options:
  -h, --help show this help message and exit
@@ -1,6 +1,6 @@
  [metadata]
  name = PyamilySeq
- version = v1.1.2
+ version = v1.2.0
  license_files = LICENSE
  author = Nicholas Dimonaco
  author_email = nicholas@dimonaco.co.uk
@@ -43,6 +43,10 @@ console_scripts =
  seq-finder = PyamilySeq.Seq_Finder:main
  Seq-Extractor = PyamilySeq.Seq_Extractor:main
  seq-extractor = PyamilySeq.Seq_Extractor:main
+
+ compute-singletrees-rf = aux_tools.RF.Compute_SingleTree_RFs:main
+ compare-rf = aux_tools.RF.compare_RF:main
+ compare-contree-singletrees = aux_tools.RF.compare_contree_singletrees:main
 
  [egg_info]
  tag_build =
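Each added `console_scripts` line follows the `command = package.module:function` form. As a rough illustration of how such a line decomposes, the sketch below splits the three new entries into their parts. The helper name `parse_console_script` is hypothetical and is not part of PyamilySeq or setuptools; only the three entry strings come from the diff.

```python
def parse_console_script(line: str) -> tuple[str, str, str]:
    """Split 'command = package.module:function' into (command, module, function)."""
    command, _, target = line.partition("=")
    module, _, func = target.strip().partition(":")
    return command.strip(), module.strip(), func.strip()


# The three entries added to setup.cfg in 1.2.0, copied from the hunk above.
new_scripts = [
    "compute-singletrees-rf = aux_tools.RF.Compute_SingleTree_RFs:main",
    "compare-rf = aux_tools.RF.compare_RF:main",
    "compare-contree-singletrees = aux_tools.RF.compare_contree_singletrees:main",
]

for line in new_scripts:
    command, module, func = parse_console_script(line)
    # Show which module attribute each command name points at.
    print(f"{command}: from {module} import {func}")
```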
@@ -59,7 +59,7 @@ def main():
  exit(1)
  if options.input_type == 'fasta' and options.name_split_fasta is None:
  print("Please provide a substring to split the filename and extract the genome name.")
- exit
+ exit(1)
 
  output_path = os.path.abspath(options.output_dir)
  if not os.path.exists(output_path):
@@ -77,7 +77,7 @@ def main():
  elif options.input_type == 'combined':
  read_combined_files(options.input_dir, options.name_split_gff, options.gene_ident, combined_out_file, options.translate, True)
  elif options.input_type == 'fasta':
- read_fasta_files(options.input_dir, options.name_split_fasta, combined_out_file, options.translate)
+ read_fasta_files(options.input_dir, options.name_split_fasta, combined_out_file, options.translate, True)
 
  if __name__ == "__main__":
  main()
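The `exit` to `exit(1)` change in the first Seq_Combiner.py hunk matters because a bare name is only an expression: it is evaluated and discarded, so the program keeps running past the error message. A minimal sketch of the difference follows. It uses `sys.exit` in place of the site-provided `exit` builtin, which is a substitution made here for portability, and both function names are hypothetical.

```python
import sys


def bare_name():
    # Referencing the function without calling it does nothing,
    # which is the situation on the removed `- exit` line.
    sys.exit
    return "still running"


def real_call():
    try:
        # Calling it raises SystemExit carrying the given status code.
        sys.exit(1)
    except SystemExit as err:
        return err.code


print(bare_name())  # still running
print(real_call())  # 1
```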
@@ -9,8 +9,13 @@ def find_gene_ids_in_csv(csv_file, group_name):
  cells = line.strip().split(',')
  if cells[0].replace('"','') == group_name:
  # Collect gene IDs from column 14 onward
+ # for cell in cells[14:]:
+ # gene_ids.extend(cell.strip().replace('"','').split()) # Splitting by spaces if there are multiple IDs in a cell break
  for cell in cells[14:]:
- gene_ids.extend(cell.strip().replace('"','').split()) # Splitting by spaces if there are multiple IDs in a cell break
+ for gene in cell.strip().replace('"', '').split(';'):
+ if gene:
+ gene_ids.append(gene)
+
  return gene_ids
 
  def extract_sequences(fasta_file, gene_ids):
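The Seq_Extractor.py hunk switches from splitting each cell on whitespace to splitting on `;` and skipping empty strings. The sketch below reproduces only that inner loop on a made-up row, to show how the two approaches differ when a cell holds several IDs. The function name `collect_gene_ids`, its `first_gene_column` parameter and the sample IDs are hypothetical; the real function reads rows from a CSV file.

```python
def collect_gene_ids(cells, first_gene_column=14):
    """Gather gene IDs from a split CSV row, one or more IDs per cell."""
    gene_ids = []
    for cell in cells[first_gene_column:]:
        # 1.2.0 behaviour: IDs within a cell are separated by ';'.
        for gene in cell.strip().replace('"', '').split(';'):
            if gene:  # skip empty cells and trailing separators
                gene_ids.append(gene)
    return gene_ids


# Made-up row: group name, 13 unused columns, then gene ID cells.
row = ['"group_1"'] + [''] * 13 + ['"g1_a;g1_b"', '', '"g2_a"']

print(collect_gene_ids(row))  # ['g1_a', 'g1_b', 'g2_a']

# 1.1.2 behaviour for comparison: whitespace splitting keeps the pair fused.
print('g1_a;g1_b'.split())  # ['g1_a;g1_b']
```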
@@ -0,0 +1,2 @@
+ PyamilySeq_Version = 'v1.2.0'
+
@@ -7,7 +7,6 @@ from tempfile import NamedTemporaryFile
  import sys
  import re
  import math
- #from config import config_params
 
  ####
  # Placeholder for the distance function
@@ -15,11 +14,10 @@ levenshtein_distance_cal = None
  # Check for Levenshtein library once
  try:
  import Levenshtein as LV
- # Assign the optimized function
+ # Assign the optimised function
  def levenshtein_distance_calc(seq1, seq2):
  return LV.distance(seq1, seq2)
  except (ModuleNotFoundError, ImportError):
- #if config_params.verbose == True: - Not implemented yet
  print("Levenshtein package not installed - Will fallback to slower Python implementation.")
  # Fallback implementation
  def levenshtein_distance_calc(seq1, seq2):
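The utils.py hunk above is cut off before the body of the fallback function. As a sketch of the same optional-import pattern, the block below pairs the `Levenshtein` fast path with a plain dynamic-programming edit distance. The fallback body here is an illustration written for this note and is not the package's own implementation.

```python
try:
    import Levenshtein as LV  # optional accelerated package

    def levenshtein_distance_calc(seq1, seq2):
        return LV.distance(seq1, seq2)

except ImportError:  # ModuleNotFoundError is a subclass of ImportError

    def levenshtein_distance_calc(seq1, seq2):
        # Two-row dynamic programming: `previous` holds the distances
        # between the first i-1 characters of seq1 and every prefix of seq2.
        previous = list(range(len(seq2) + 1))
        for i, a in enumerate(seq1, start=1):
            current = [i]
            for j, b in enumerate(seq2, start=1):
                current.append(min(
                    previous[j] + 1,             # deletion
                    current[j - 1] + 1,          # insertion
                    previous[j - 1] + (a != b),  # substitution (free on match)
                ))
            previous = current
        return previous[-1]


print(levenshtein_distance_calc("kitten", "sitting"))  # 3
```

Either branch defines the same function name, so callers do not need to know which implementation is active.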
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: PyamilySeq
- Version: 1.1.2
+ Version: 1.2.0
  Summary: PyamilySeq - A a tool to investigate sequence-based gene groups identified by clustering methods such as CD-HIT, DIAMOND, BLAST or MMseqs2.
  Home-page: https://github.com/NickJD/PyamilySeq
  Author: Nicholas Dimonaco
@@ -46,7 +46,7 @@ To update to the newest version add '-U' to end of the pip install command.
  ```commandline
  usage: PyamilySeq.py [-h] {Full,Partial} ...
 
- PyamilySeq v1.1.2: A tool for gene clustering and analysis.
+ PyamilySeq v1.2.0: A tool for gene clustering and analysis.
 
  positional arguments:
  {Full,Partial} Choose a mode: 'Full' or 'Partial'.
@@ -76,7 +76,7 @@ Escherichia_coli_110957|ENSB_TIZS9kbTvShDvyX Escherichia_coli_110957|ENSB_TIZS9k
  ```
  ### Example output:
  ```
- Running PyamilySeq v1.1.2
+ Running PyamilySeq v1.2.0
  Calculating Groups
  Number of Genomes: 10
  Gene Groups
@@ -221,7 +221,7 @@ Seq-Combiner -input_dir .../test_data/genomes -name_split_gff .gff3 -output_dir
  ```
  usage: Seq_Combiner.py [-h] -input_dir INPUT_DIR -input_type {separate,combined,fasta} [-name_split_gff NAME_SPLIT_GFF] [-name_split_fasta NAME_SPLIT_FASTA] -output_dir OUTPUT_DIR -output_name OUTPUT_FILE [-gene_ident GENE_IDENT] [-translate] [-v]
 
- PyamilySeq v1.1.2: Seq-Combiner - A tool to extract sequences from GFF/FASTA files and prepare them for PyamilySeq.
+ PyamilySeq v1.2.0: Seq-Combiner - A tool to extract sequences from GFF/FASTA files and prepare them for PyamilySeq.
 
  options:
  -h, --help show this help message and exit
@@ -264,7 +264,7 @@ usage: Group_Splitter.py [-h] -input_fasta INPUT_FASTA -sequence_type {AA,DNA}
  [-M CLUSTERING_MEMORY] [-no_delete_temp_files]
  [-verbose] [-v]
 
- PyamilySeq v1.1.2: Group-Splitter - A tool to split multi-copy gene groups
+ PyamilySeq v1.2.0: Group-Splitter - A tool to split multi-copy gene groups
  identified by PyamilySeq.
 
  options:
@@ -317,7 +317,7 @@ Cluster-Summary -genome_num 10 -input_clstr .../test_data/species/E-coli/E-coli_
  usage: Cluster_Summary.py [-h] -input_clstr INPUT_CLSTR -output OUTPUT -genome_num GENOME_NUM
  [-output_dir OUTPUT_DIR] [-verbose] [-v]
 
- PyamilySeq v1.1.2: Cluster-Summary - A tool to summarise CD-HIT clustering files.
+ PyamilySeq v1.2.0: Cluster-Summary - A tool to summarise CD-HIT clustering files.
 
  options:
  -h, --help show this help message and exit
@@ -8,6 +8,9 @@ Seq-Extractor = PyamilySeq.Seq_Extractor:main
  Seq-Finder = PyamilySeq.Seq_Finder:main
  cluster-extractor = PyamilySeq.Cluster_Extractor:main
  cluster-summary = PyamilySeq.Cluster_Summary:main
+ compare-contree-singletrees = aux_tools.RF.compare_contree_singletrees:main
+ compare-rf = aux_tools.RF.compare_RF:main
+ compute-singletrees-rf = aux_tools.RF.Compute_SingleTree_RFs:main
  group-splitter = PyamilySeq.Group_Splitter:main
  pyamilyseq = PyamilySeq.PyamilySeq:main
  seq-combiner = PyamilySeq.Seq_Combiner:main
@@ -1,2 +0,0 @@
- PyamilySeq_Version = 'v1.1.2'
-
File without changes
File without changes