PyamilySeq 0.0.1__tar.gz → 0.2.0__tar.gz

@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: PyamilySeq
- Version: 0.0.1
+ Version: 0.2.0
  Summary: PyamilySeq - A a tool to look for sequence-based gene families identified by clustering methods such as CD-HIT, DIAMOND, BLAST or MMseqs2.
  Home-page: https://github.com/NickJD/PyamilySeq
  Author: Nicholas Dimonaco
@@ -12,7 +12,6 @@ Classifier: Operating System :: OS Independent
  Requires-Python: >=3.6
  Description-Content-Type: text/markdown
  License-File: LICENSE
- Requires-Dist: numpy
 
  # PyamilySeq
  PyamilySeq (Family Seek) is a Python tool for clustering gene sequences into families based on sequence similarity identified by tools such as CD-HIT, DIAMOND or MMseqs2.
@@ -32,7 +31,37 @@ PyamilySeq requires Python 3.6 or higher. Install dependencies using pip:
  pip install PyamilySeq
  ```
 
- ## Usage
+ ## Usage - Menu
+ ```
+ PyamilySeq_Species.py -h
+ usage: PyamilySeq_Species.py [-h] -c CLUSTERS -f {CD-HIT,CSV,TSV} [-w WRITE_FAMILIES] [-fasta FASTA] [-rc RECLUSTERED] [-st SEQUENCE_TAG]
+ [-groups CORE_GROUPS] [-gpa GENE_PRESENCE_ABSENCE_OUT] [-verbose {True,False}] [-v]
+
+ PyamilySeq v0.2.0: PyamilySeq Run Parameters.
+
+ Required Arguments:
+ -c CLUSTERS Clustering output file from CD-HIT, TSV or CSV Edge List
+ -f {CD-HIT,CSV,TSV} Which format to use (CD-HIT or Comma/Tab Separated Edge-List (such as MMseqs2 tsv output))
+
+ Output Parameters:
+ -w WRITE_FAMILIES Default - No output: Output sequences of identified families (provide levels at which to output "-w 99 95" - Must provide
+ FASTA file with -fasta
+ -fasta FASTA FASTA file to use in conjunction with "-w"
+
+ Optional Arguments:
+ -rc RECLUSTERED Clustering output file from secondary round of clustering
+ -st SEQUENCE_TAG Default - "StORF": Unique identifier to be used to distinguish the second of two rounds of clustered sequences
+ -groups CORE_GROUPS Default - ('99,95,90,80,15'): Gene family groups to use
+ -gpa GENE_PRESENCE_ABSENCE_OUT
+ Default - False: If selected, a Roary formatted gene_presence_absence.csv will be created - Required for Coinfinder and other
+ downstream tools
+
+ Misc:
+ -verbose {True,False}
+ Default - False: Print out runtime messages
+ -v Default - False: Print out version number and exit
+
+ ```
 
  ### Clustering Analysis
 
@@ -59,6 +88,7 @@ Replace `reclustered_file` with the path to the file containing additional seque
  PyamilySeq generates various outputs, including:
 
  - **Gene Presence-Absence File**: This CSV file details the presence and absence of genes across genomes.
+ - **FASTA Files for Each Gene Family**:
 
  ## Gene Family Groups
 
@@ -16,7 +16,37 @@ PyamilySeq requires Python 3.6 or higher. Install dependencies using pip:
  pip install PyamilySeq
  ```
 
- ## Usage
+ ## Usage - Menu
+ ```
+ PyamilySeq_Species.py -h
+ usage: PyamilySeq_Species.py [-h] -c CLUSTERS -f {CD-HIT,CSV,TSV} [-w WRITE_FAMILIES] [-fasta FASTA] [-rc RECLUSTERED] [-st SEQUENCE_TAG]
+ [-groups CORE_GROUPS] [-gpa GENE_PRESENCE_ABSENCE_OUT] [-verbose {True,False}] [-v]
+
+ PyamilySeq v0.2.0: PyamilySeq Run Parameters.
+
+ Required Arguments:
+ -c CLUSTERS Clustering output file from CD-HIT, TSV or CSV Edge List
+ -f {CD-HIT,CSV,TSV} Which format to use (CD-HIT or Comma/Tab Separated Edge-List (such as MMseqs2 tsv output))
+
+ Output Parameters:
+ -w WRITE_FAMILIES Default - No output: Output sequences of identified families (provide levels at which to output "-w 99 95" - Must provide
+ FASTA file with -fasta
+ -fasta FASTA FASTA file to use in conjunction with "-w"
+
+ Optional Arguments:
+ -rc RECLUSTERED Clustering output file from secondary round of clustering
+ -st SEQUENCE_TAG Default - "StORF": Unique identifier to be used to distinguish the second of two rounds of clustered sequences
+ -groups CORE_GROUPS Default - ('99,95,90,80,15'): Gene family groups to use
+ -gpa GENE_PRESENCE_ABSENCE_OUT
+ Default - False: If selected, a Roary formatted gene_presence_absence.csv will be created - Required for Coinfinder and other
+ downstream tools
+
+ Misc:
+ -verbose {True,False}
+ Default - False: Print out runtime messages
+ -v Default - False: Print out version number and exit
+
+ ```
 
  ### Clustering Analysis
 
@@ -43,6 +73,7 @@ Replace `reclustered_file` with the path to the file containing additional seque
  PyamilySeq generates various outputs, including:
 
  - **Gene Presence-Absence File**: This CSV file details the presence and absence of genes across genomes.
+ - **FASTA Files for Each Gene Family**:
 
  ## Gene Family Groups
 
@@ -1,6 +1,6 @@
  [metadata]
  name = PyamilySeq
- version = v0.0.1
+ version = v0.2.0
  author = Nicholas Dimonaco
  author_email = nicholas@dimonaco.co.uk
  description = PyamilySeq - A a tool to look for sequence-based gene families identified by clustering methods such as CD-HIT, DIAMOND, BLAST or MMseqs2.
@@ -20,7 +20,6 @@ package_dir =
  packages = find:
  python_requires = >=3.6
  install_requires =
- numpy
 
  [options.packages.find]
  where = src