DomFun 0.1.0 → 0.1.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 87063a5c0ddf9a988af77b9aaa73e2c7c676f2ad
4
- data.tar.gz: 725fcefa7c7f464526410e1dcc7f50f6a19adf8f
3
+ metadata.gz: ab8f445b1ecc8e416c559fb87b46c853e2cc74e1
4
+ data.tar.gz: 7d28a92576ffc5f4816c0062b4702f6195d2e178
5
5
  SHA512:
6
- metadata.gz: 13b5059cf7978c2c8fe8583da5118e46725287f5d90658e7f2574ded9eebf8988d61203fe8c576371ded87f93c64d5ef7de18523c1db72ea0180dde8b9f5b881
7
- data.tar.gz: 47a817f6bdffa4efce7624199125afa94ea30d4cdc8e0b2d13f48dd5c6cf25af4e0af0c344988ea3f0ae14a8f8d043168c266d8c72f303267ca45344e315d9aa
6
+ metadata.gz: 88cbe093ee831669f5bc6037bfafa4d94e1865ab032fc613ae62550b46811d502a43ec7f1c632a81bb622d36923cafc493c400d7ecf98c572434461ad8731dd0
7
+ data.tar.gz: 9820f00ac5e9d63b88449bc8a0e47b5c97c7e6541791d24522b0a97ba64363d61df1343f4acf04d65a7036ccd9cfc8529ac5c672cf69b7e3f2eec6994812f736
data/DomFun.gemspec CHANGED
@@ -9,9 +9,9 @@ Gem::Specification.new do |spec|
9
9
  spec.authors = ["Elena Rojano, Pedro Seoane"]
10
10
  spec.email = ["elenarojano@uma.es, seoanezonjic@hotmail.com"]
11
11
 
12
- spec.summary = %q{Tool to predict protein functions based on domains-FunSys associations.}
13
- spec.description = %q{From associations calculated between protein domains and functional systems (FunSys), DomFun can predict the functions of proteins looking up domains and the FunSys that have been associated with. The system is validated using data from CAFA.}
14
- spec.homepage = "https://github.com/ElenaRojano/DomFun"
12
+ spec.summary = %q{Tool to predict protein functions using domains-FunSys associations.}
13
+ spec.description = %q{Proteins function predictor trained with associations between protein domains (classified in CATH) and functional systems (GO, KEGG, Reactome).}
14
+ spec.homepage = "https://bitbucket.org/elenarojano/domfun"
15
15
  spec.license = "MIT"
16
16
 
17
17
  # Prevent pushing this gem to RubyGems.org. To allow pushes either set the 'allowed_push_host'
@@ -41,4 +41,4 @@ Gem::Specification.new do |spec|
41
41
  spec.add_development_dependency "rspec", "~> 3.0"
42
42
 
43
43
  spec.add_dependency "NetAnalyzer", "~> 0.1.5"
44
- end
44
+ end
data/README.md CHANGED
@@ -1,8 +1,6 @@
1
1
  # DomFun
2
2
 
3
- Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/DomFun`. To experiment with that code, run `bin/console` for an interactive prompt.
4
-
5
- TODO: Delete this and the text above, and describe your gem
3
+ DomFun is a system that assigns functions to proteins of unknown function using a systemic approach: instead of the protein sequence, it considers the protein's domains and their associations with functional systems. It uses associations calculated between protein domains and functional annotations as its training dataset, and predicts functions for proteins (given as UniProt identifiers) by finding their domains and checking whether those domains have been associated with functional annotations (GO molecular functions and biological processes, KEGG and Reactome pathway terms).
6
4
 
7
5
  ## Installation
8
6
 
@@ -22,7 +20,98 @@ Or install it yourself as:
22
20
 
23
21
  ## Usage
24
22
 
25
- TODO: Write usage instructions here
23
+ Before using DomFun, you need to calculate associations between protein domains and functional annotations. To do so, use the NetAnalyzer Ruby gem (https://rubygems.org/gems/NetAnalyzer), choosing the association index that best fits your data. Once calculated, these associations are used to train DomFun and to predict the functions of a set of proteins of unknown function (given as UniProt identifiers).
24
+
25
+ The steps performed in the study to develop DomFun are as follows:
26
+
27
+ 1. Generate relationships between domains, proteins and FunSys.
28
+ 2. Generate networks and calculate associations between domains and FunSys.
29
+ 3. Train DomFun with domain-FunSys associations.
30
+
31
+ ### STEP 1. Generate domains-proteins-FunSys relationships ###
32
+ To generate the domain-FunSys associations used to train DomFun, it is necessary to have, on the one hand, proteins (preferably UniProt identifiers) with functional annotations (GO, KEGG or Reactome, for example) and, on the other hand, proteins with their functional domains classified (CATH or SCOP, for example). If the domains-proteins-FunSys relationships have already been established, this procedure is not necessary.
33
+
34
+ Before calculating the associations between protein domains and functional annotations/systems (abbreviated FunSys), it is necessary to construct the relationships between domains, proteins and FunSys:
35
+ *Note: DomFun was developed using the human proteins dataset; however it can be used in any species if protein identifiers, their domains and functional annotations are available.*
36
+ ```
37
+ add_protein_functional_families.rb -a proteinAnnotationsFile -d cathDomainsFile -p activeAnnots
38
+ ```
39
+ Where:
40
+ * -a proteinAnnotationsFile = Protein annotations file
41
+ * -d cathDomainsFile = Domains file
42
+ * -p activeAnnots = List with the names of the annotations used (KEGG, GO, Reactome...)
43
+
44
+ ### STEP 2. Create tripartite network and association calculation ###
45
+
46
+ These relations contain the information to generate the tripartite networks to be analysed. They will be added in a folder "networks/*\_networks" where * is the name of the domains classification type (for example, superfamilies or FunFams), and the filename will be "network\_", followed by the annotation name. These networks contain two different types of layers: domain-protein and protein-FunSys, connecting domains with FunSys through protein nodes. These networks are generated with merge_pairs.rb.
47
+ *Note: add_protein_functional_families.rb output can be modified in the code if only a set of annotations and protein domains classification is provided.*
48
+
49
+ ```
50
+ merge_pairs.rb -i domProtFunSysRelations -k domainTypeID -o tripartiteNetwork -n numberOfFiles -m minNumConns
51
+ ```
52
+ Where:
53
+ * -i domProtFunSysRelations = Input file with domains-proteins-FunSys relationships.
54
+ * -k domainTypeID = Domain type identifier. For example, if FunFams from CATH are used, the 'ff' identifier must be provided.
55
+ * -o tripartiteNetwork = Tripartite network output file.
56
+ * -n numberOfFiles = Number of files to output (if k-cross performed).
57
+ * -m minNumConns = Minimum number of proteins supporting a relation between domain and FunSys.
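
The `-m minNumConns` filter described above can be pictured with a minimal Ruby sketch. This is not the code of `merge_pairs.rb`: the triple layout, the identifiers and the variable names are assumptions made only for illustration. It counts the distinct proteins that support each domain-FunSys pair and keeps the pairs that reach the minimum.

```ruby
# Hypothetical [domain, protein, FunSys] relations (not real DomFun data).
relations = [
  ['ff_1', 'P1', 'GO:0000001'],
  ['ff_1', 'P2', 'GO:0000001'],
  ['ff_1', 'P3', 'GO:0000002'],
  ['ff_2', 'P3', 'GO:0000002'],
  ['ff_2', 'P4', 'GO:0000002']
]
min_num_conns = 2 # assumed meaning of -m: minimum number of supporting proteins

# Collect the proteins that connect each domain-FunSys pair.
support = Hash.new { |hash, key| hash[key] = [] }
relations.each { |domain, protein, funsys| support[[domain, funsys]] << protein }

# Keep only the pairs supported by enough distinct proteins.
kept = support.select { |_pair, proteins| proteins.uniq.size >= min_num_conns }

kept.each { |(domain, funsys), proteins| puts [domain, funsys, proteins.uniq.size].join("\t") }
```

In this toy input the pair `ff_1`/`GO:0000002` is supported by a single protein, so it is dropped.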
58
+
59
+ This *tripartiteNetwork* will contain the relations to be analysed by NetAnalyzer. This tool will return the list of domains associated with FunSys and the association values calculated depending on the index selected.
60
+
61
+ ```
62
+ NetAnalyzer.rb -i tripartiteNetwork -l layers -m assocMeth -U projection -a assocVals
63
+ ```
64
+ Where:
65
+ * -i tripartiteNetwork = Tripartite network to calculate associations between their nodes.
66
+ * -l layers = Layer names of the network and their identifier. For example, a tripartite network with domains classified in FunFams (ff), annotations in GO terms and proteins will be set as: "domains,ff;annotations,GO:;protID,[A-Za-z0-9]". Please separate between names and identifiers with commas, and between layers with semicolons.
67
+ * -m assocMeth = Association method (Jaccard, Pearson correlation coefficient or Hypergeometric index, amongst others). Set only one method by execution. For more information about association methods available in NetAnalyzer, please go to its documentation.
68
+ * -U projection = Layers projection to calculate associations. In this case, we use proteins as nexus to calculate the association between domains and FunSys in common. For this, it should be established as "annotations,domains;protID". Please separate the nexus node (proteins) with semicolon, and nodes to associate (domains with FunSys) with comma.
69
+ * -a assocVals = Associations output file.
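
To make the projection idea concrete, here is a small Ruby sketch of one of the association methods mentioned above, the Jaccard index, computed between domains and FunSys terms through the proteins they share. It does not call NetAnalyzer and does not claim to reproduce its implementation; the edge lists, identifiers and helper names are hypothetical.

```ruby
require 'set'

# Hypothetical edges of a tiny tripartite network.
domain_protein = [
  ['ff_1', 'P1'], ['ff_1', 'P2'], ['ff_1', 'P3'],
  ['ff_2', 'P3'], ['ff_2', 'P4']
]
protein_funsys = [
  ['P1', 'GO:0000001'], ['P2', 'GO:0000001'],
  ['P3', 'GO:0000002'], ['P4', 'GO:0000002']
]

# Group the proteins attached to each node of a layer.
def proteins_by_node(pairs, node_idx, prot_idx)
  pairs.each_with_object(Hash.new { |hash, key| hash[key] = Set.new }) do |pair, acc|
    acc[pair[node_idx]] << pair[prot_idx]
  end
end

# Jaccard index: shared proteins divided by proteins linked to either node.
def jaccard(set_a, set_b)
  union = (set_a | set_b).size
  union.zero? ? 0.0 : (set_a & set_b).size.to_f / union
end

domains = proteins_by_node(domain_protein, 0, 1)
funsys = proteins_by_node(protein_funsys, 1, 0)

# Score every domain-FunSys pair that shares at least one protein.
associations = {}
domains.each do |domain, domain_prots|
  funsys.each do |term, term_prots|
    score = jaccard(domain_prots, term_prots)
    associations[[domain, term]] = score if score > 0
  end
end

associations.each { |(domain, term), score| puts [domain, term, score.round(3)].join("\t") }
```

Proteins act only as the nexus here: they decide which domain-FunSys pairs are scored, but they do not appear in the output.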
70
+
71
+ ### STEP 3. Training DomFun with domain-FunSys associations and predicting proteins function ###
72
+ DomFun must be trained with associations between protein domains and FunSys to perform predictions. Depending on the association index chosen in NetAnalyzer, the association values may need to be standardized before training DomFun. With the hypergeometric index, standardization is not necessary; for the rest of the methods it is mandatory. To standardize, use standardize_scores.R.
73
+
74
+ *Note: please OMIT this step if the hypergeometric index was used to calculate association values (they are directly transformed into P-values).*
75
+
76
+ ```
77
+ standardize_scores.R -d assocVals -e threshold -o assocValsStd -s col > outZscore
78
+ ```
79
+
80
+ Where:
81
+ * -d assocVals = Domain-FunSys association values file to standardize.
82
+ * -e threshold = Association value to use as a threshold.
83
+ * -o assocValsStd = Association values standardized output file.
84
+ * -s col > outZscore = Z-score to use as a filter (if necessary).
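
As a rough illustration of what standardizing association values can mean, the following Ruby sketch converts raw scores to z-scores (value minus mean, divided by the sample standard deviation). The actual procedure in `standardize_scores.R` is not described here, so both the formula and the example values are assumptions.

```ruby
# Hypothetical raw association values for three domain-FunSys pairs.
raw = {
  ['ff_1', 'GO:0000001'] => 1.0,
  ['ff_1', 'GO:0000002'] => 2.0,
  ['ff_2', 'GO:0000002'] => 3.0
}

# Z-score of each value, using the sample standard deviation (n - 1).
def z_scores(values)
  return values.map { 0.0 } if values.size < 2 # not enough data to standardize

  mean = values.sum / values.size.to_f
  variance = values.sum { |v| (v - mean)**2 } / (values.size - 1)
  sd = Math.sqrt(variance)
  values.map { |v| sd.zero? ? 0.0 : (v - mean) / sd }
end

standardized = raw.keys.zip(z_scores(raw.values)).to_h

standardized.each { |(domain, term), z| puts [domain, term, z.round(3)].join("\t") }
```

A threshold on the resulting z-scores could then play the filtering role that the options above describe.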
85
+
86
+ DomFun is trained with the associations (standardized or not) and then predicts functional annotations for a list of proteins (UniProt identifiers).
87
+
88
+ ```
89
+ domains_to_function_predictor.rb -a assocVals -u -f cathDomainsFile -p proteinIDsList -c domainTypeID -T assocValThresh -i combMeth -t combScorThresh -o predictionResults
90
+ ```
91
+ Where:
92
+ * -a assocVals = File with associations to train DomFun.
93
+ * -u multipleProts = Set if multiple proteins are given to predict.
94
+ * -f cathDomainsFile = Domains file.
95
+ * -p proteinIDsList = List of protein identifiers.
96
+ * -c domainTypeID = Domain type identifier. For example, if FunFams from CATH are used, the 'ff' identifier must be provided.
97
+ * -T assocValThresh = Threshold to filter association values.
98
+ * -i combMeth = Method to combine association values. Please select between "fisher" and "stouffer".
99
+ * -t combScorThresh = Threshold to filter scores once combined.
100
+ * -o predictionResults = Output file with prediction results.
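
The "fisher" option for `-i combMeth` refers to a standard way of combining P-values. The Ruby sketch below shows Fisher's method in general terms, assuming each domain of a protein contributes one P-value for a given FunSys term. It is not taken from `domains_to_function_predictor.rb`, and the function names are hypothetical.

```ruby
# Survival function of a chi-square distribution with 2k degrees of freedom.
# For an even number of degrees of freedom it has a closed form:
#   P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
def chi2_sf_even_df(x, k)
  half = x / 2.0
  term = 1.0
  sum = 1.0
  (1...k).each do |i|
    term *= half / i
    sum += term
  end
  Math.exp(-half) * sum
end

# Fisher's method: -2 * sum(ln p) follows a chi-square distribution with
# 2n degrees of freedom when the n P-values are independent.
def fisher_combine(p_values)
  safe = p_values.map { |p| p.clamp(Float::MIN, 1.0) } # avoid log(0)
  statistic = -2.0 * safe.sum { |p| Math.log(p) }
  chi2_sf_even_df(statistic, safe.size)
end

# Hypothetical P-values of two domains associated with the same FunSys term.
combined = fisher_combine([0.05, 0.05])
puts combined
```

With a single P-value the combined score equals that P-value, which is a quick sanity check for the formula.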
101
+
102
+ The DomFun predictions output file (predictionResults) contains a table with the proteins predicted, the domains found in them, the FunSys with which these domains were associated and the combined score calculated for each prediction.
103
+
104
+ The rest of the scripts included in the `bin/` folder were used to perform data transformation, validate results and plot graphs. They are described as follows:
105
+ * `association_metrics_average.rb`: Calculate average between different association files for results validation.
106
+ * `generate_CAFA2_dataset.rb`: Parse data from CAFA 2 validation system.
107
+ * `generate_CAFA2_tripartite_network.rb`: Generate tripartite networks from CAFA 2 dataset.
108
+ * `generate_cafa_control.rb`: Generate proteins control file from CAFA 2.
109
+ * `get_kegg_pathways.R`: Download KEGG pathways identifiers.
110
+ * `lines.R`: Program to plot different data distributions.
111
+ * `normalize_combined_scores.rb`: Script to normalize combined scores.
112
+ * `prepare_cafa_network.rb`: Generate a tripartite network with GO associations for CAFA 2 testing.
113
+ * `translate_kegg_genes2pathways.rb`: Use to translate KEGG identifiers from geneIDs to KEGG pathway IDs (from UniProt downloads).
114
+ * `validate_ProtFunSys_predictions.rb`: System to validate predictions when training DomFun with UniProt proteins.
26
115
 
27
116
  ## Development
28
117
 
@@ -32,8 +121,9 @@ To install this gem onto your local machine, run `bundle exec rake install`. To
32
121
 
33
122
  ## Contributing
34
123
 
35
- Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/DomFun.
124
+ Bug reports and pull requests are welcome on BitBucket at https://bitbucket.org/elenarojano/domfun
36
125
 
37
126
  ## License
38
127
 
39
128
  The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
129
+
@@ -1,3 +1,3 @@
1
1
  module DomFun
2
- VERSION = "0.1.0"
2
+ VERSION = "0.1.1"
3
3
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: DomFun
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.1.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Elena Rojano, Pedro Seoane
@@ -66,9 +66,8 @@ dependencies:
66
66
  - - "~>"
67
67
  - !ruby/object:Gem::Version
68
68
  version: 0.1.5
69
- description: From associations calculated between protein domains and functional systems
70
- (FunSys), DomFun can predict the functions of proteins looking up domains and the
71
- FunSys that have been associated with. The system is validated using data from CAFA.
69
+ description: Proteins function predictor trained with associations between protein
70
+ domains (classified in CATH) and functional systems (GO, KEGG, Reactome).
72
71
  email:
73
72
  - elenarojano@uma.es, seoanezonjic@hotmail.com
74
73
  executables: []
@@ -101,7 +100,7 @@ files:
101
100
  - lib/DomFun.rb
102
101
  - lib/DomFun/generalMethods.rb
103
102
  - lib/DomFun/version.rb
104
- homepage: https://github.com/ElenaRojano/DomFun
103
+ homepage: https://bitbucket.org/elenarojano/domfun
105
104
  licenses:
106
105
  - MIT
107
106
  metadata: {}
@@ -124,5 +123,5 @@ rubyforge_project:
124
123
  rubygems_version: 2.6.14
125
124
  signing_key:
126
125
  specification_version: 4
127
- summary: Tool to predict protein functions based on domains-FunSys associations.
126
+ summary: Tool to predict protein functions using domains-FunSys associations.
128
127
  test_files: []