macroape 3.3.8 → 4.0.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+ metadata.gz: 3037e164fd1b1c23bf40a9fca1d1da39737934a5
+ data.tar.gz: 4d4482d00ce0c76cbb47fe9d9eff53b48c1d2741
+ SHA512:
+ metadata.gz: cc4176fe2b1d2f7b5bf4835b3612d316797f378dbf036bf0bba3135effe1cf9cbba1037ef87c7cffc24d0d2ef8d2ebf16fe5d585780445a1492fa795df44cb48
+ data.tar.gz: d86983eeb148235e1470dbfa7f674cf0e395129bd1836e34256b42bba06525f0881961cd8c6ca0e5e09df520786a8735e1ec2e10e76c4e59b8cfddb38b151685
data/README.md CHANGED
@@ -17,7 +17,7 @@ Or install it yourself as:
  $ gem install macroape

  ## Usage
- For more information read manual at https://docs.google.com/document/pub?id=1_jsxhMNzMzy4d2d_byAd3n6Szg5gEcqG_Sf7w9tEqWw (not last version but comprehensive description of approach)
+ For more information read manual at https://docs.google.com/document/pub?id=1_jsxhMNzMzy4d2d_byAd3n6Szg5gEcqG_Sf7w9tEqWw

  ## Basic usage as a command-line tool
  MacroAPE have 7 command line tools:
@@ -31,7 +31,7 @@ Or install it yourself as:
  * eval_alignment \<first PWM file\> \<second PWM file\> \<shift of second matrix\> \<orientation of second matrix(direct|revcomp)\>

  ### Tools for looking through collection for the motifs most similar to a query motif
- * preprocess_collection \<folder with motif files\> [-o \<collection output file\>]
+ * preprocess_collection \<folder with motif files\> \<collection output file\>
  * scan_collection \<query PWM file\> \<collection file\>

  ### Tool for finding mutual alignment of several motifs relative to first(leader) motif. It's designed to use with sequence_logo to draw logos of clusters
data/TODO.txt CHANGED
@@ -17,6 +17,8 @@ ToDo:
  8)(TODO: for theoretically consistency, while making small inconsistences to old calculations)
  When we work with strong threshold, we round matrix up(in order to overrate threshold comparing to real thus taking underrated pvalue) and take upper bound of discrete-thresholds fork.
  When we are estimating lower bound of threshold (weak threshold) we take lower bound of fork of discrete thresholds. But we should ALSO (not done yet) take matrix discreted down! This'd allow us give exact answer on a question in which range real threshold should lay with given P-value, now we correctly estimate only lower bound of threshold(upper bound of P-value)
+ 9) (may be) Option to specify predefined query motif threshold in scan_collection
+ 10) Fix Readme!

  Specs and tests:
  create spec on use of MaxHashSize, MaxHashSizeDouble
@@ -1,12 +1,12 @@
- require 'docopt'
  require_relative '../../macroape'
+ require 'shellwords'

  module Macroape
  module CLI
  module AlignMotifs

  def self.main(argv)
- doc = <<-DOCOPT.strip_doc
+ doc = <<-EOS.strip_doc
  Align motifs tool.
  It takes motifs and builds alignment of each motif to the first (leader) motif.

@@ -16,38 +16,78 @@ module Macroape
  pwm_file_3 shift_3 orientation_3

  Usage:
- align_motifs [options] <pm-files>...
+ #{run_tool_cmd} [options] <leader pm> <rest pm files>...
+ or
+ ls rest_pms/*.pm | #{run_tool_cmd} [options] <leader pm>

  Options:
- -h --help Show this screen.
- --pcm Use PCMs instead of PWMs as input
- DOCOPT
+ [-p <P-value>]
+ [-d <discretization level>]
+ [--pcm] - treat the input file as Position Count Matrix. PCM-to-PWM transformation to be done internally.
+ [--boundary lower|upper] Upper boundary (default) means that the obtained P-value is greater than or equal to the requested P-value
+ [-b <background probabilities] ACGT - 4 numbers, comma-delimited(spaces not allowed), sum should be equal to 1, like 0.25,0.24,0.26,0.25
+ EOS

- options = Docopt::docopt(doc, argv: argv)
+ if argv.empty? || ['-h', '--h', '-help', '--help'].any?{|help_option| argv.include?(help_option)}
+ STDERR.puts doc
+ exit
+ end

- data_model = options['--pcm'] ? Bioinform::PCM : Bioinform::PWM
- motif_files = options['<pm-files>']
- leader = motif_files.first
- background = [1,1,1,1]
+ leader_background = [1,1,1,1]
+ rest_motifs_background = [1,1,1,1]
  discretization = 1
  pvalue = 0.0005
+ max_hash_size = 10000000
+ max_pair_hash_size = 10000
+ pvalue_boundary = :upper
+
+ data_model = argv.delete('--pcm') ? Bioinform::PCM : Bioinform::PWM
+
+ while argv.first && argv.first.start_with?('-')
+ case argv.shift
+ when '-p'
+ pvalue = argv.shift.to_f
+ when '-d'
+ discretization = argv.shift.to_f
+ when '--max-hash-size'
+ max_hash_size = argv.shift.to_i
+ when '--max-2d-hash-size'
+ max_pair_hash_size = argv.shift.to_i
+ when '-b'
+ rest_motifs_background = leader_background = argv.shift.split(',').map(&:to_f)
+ when '-b1'
+ leader_background = argv.shift.split(',').map(&:to_f)
+ when '-b2'
+ rest_motifs_background = argv.shift.split(',').map(&:to_f)
+ when '--boundary'
+ pvalue_boundary = argv.shift.to_sym
+ raise 'boundary should be either lower or upper' unless pvalue_boundary == :lower || pvalue_boundary == :upper
+ end
+ end

- shifts = {leader => [0,:direct]}
- pwm_first = data_model.new(File.read(leader)).to_pwm
- pwm_first.set_parameters(background: background).discrete!(discretization)
- motif_files[1..-1].each do |motif_name|
+ leader_pwm_file = argv.shift
+ rest_pwms_file = argv
+ rest_pwms_file += $stdin.read.shellsplit unless $stdin.tty?
+ rest_pwms_file.reject!{|filename| File.expand_path(filename) == File.expand_path(leader_pwm_file)}
+
+ shifts = []
+ shifts << [leader_pwm_file, 0, :direct]
+ pwm_first = data_model.new(File.read(leader_pwm_file)).to_pwm
+ pwm_first.set_parameters(background: leader_background, max_hash_size: max_hash_size).discrete!(discretization)
+
+ rest_pwms_file.each do |motif_name|
  pwm_second = data_model.new(File.read(motif_name)).to_pwm
- pwm_second.set_parameters(background: background).discrete!(discretization)
- info = Macroape::PWMCompare.new(pwm_first, pwm_second).jaccard_by_pvalue(pvalue)
- shifts[motif_name] = [info[:shift], info[:orientation]]
+ pwm_second.set_parameters(background: rest_motifs_background, max_hash_size: max_hash_size).discrete!(discretization)
+ cmp = Macroape::PWMCompare.new(pwm_first, pwm_second).set_parameters(max_pair_hash_size: max_pair_hash_size)
+ info = cmp.jaccard_by_pvalue(pvalue)
+ shifts << [motif_name, info[:shift], info[:orientation]]
  end

- shifts.each do |motif_name, (shift,orientation)|
+ shifts.each do |motif_name, shift,orientation|
  puts "#{motif_name}\t#{shift}\t#{orientation}"
  end
-
- rescue Docopt::Exit => e
- puts e.message
+ rescue => err
+ STDERR.puts "\n#{err}\n#{err.backtrace.first(5).join("\n")}\n\nUse --help option for help\n\n#{doc}"
  end

  end
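The change above lets the non-leader motif names arrive either as arguments or through standard input, and then drops any name that resolves to the leader file. A minimal sketch of that step in isolation follows. The helper name `collect_rest_motifs` and its parameters are hypothetical and not part of macroape; only `String#shellsplit` (from the standard `shellwords` library) and `File.expand_path` are taken from the diff.

```ruby
require 'shellwords'

# Hypothetical helper (not in macroape): merge motif names given as arguments
# with names piped via stdin, then drop every entry that points at the leader.
def collect_rest_motifs(leader, argv_names, stdin_text = nil)
  names = argv_names.dup
  # shellsplit breaks the text on whitespace (newlines included) and honours
  # shell-style quoting, so one-name-per-line `ls` output works too.
  names += stdin_text.shellsplit if stdin_text
  # Compare expanded paths so 'a.pwm' and './a.pwm' count as the same file.
  names.reject { |name| File.expand_path(name) == File.expand_path(leader) }
end

rest = collect_rest_motifs('KLF4_f2.pwm', [], "KLF3_f1.pwm KLF4_f2.pwm\nSP1_f1_revcomp.pwm\n")
puts rest.inspect
```

Comparing expanded paths rather than raw strings is what makes the duplicate-leader case harmless when the leader also appears in the piped list under a different spelling.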
@@ -1,4 +1,4 @@
  module Macroape
- VERSION = "3.3.8"
+ VERSION = "4.0.0"
  STANDALONE = false
  end
data/macroape.gemspec CHANGED
@@ -4,7 +4,7 @@ require File.expand_path('../lib/macroape/version', __FILE__)
  Gem::Specification.new do |gem|
  gem.authors = ["Ilya Vorontsov"]
  gem.email = ["prijutme4ty@gmail.com"]
- gem.description = %q{Macroape is an abbreviation for MAtrix CompaRisOn by Approximate P-value Estimation. It's a bioinformatic tool for evaluating similarity measure and best alignment between a pair of Position Weight Matrices(PWM), finding thresholds by P-values and inside out and even searching a collection of motifs for the most similar ones. Used approach and application described in manual at https://docs.google.com/document/pub?id=1_jsxhMNzMzy4d2d_byAd3n6Szg5gEcqG_Sf7w9tEqWw}
+ gem.description = %q{Macroape is an abbreviation for MAtrix CompaRisOn by Approximate P-value Estimation. It's a bioinformatic tool for evaluating similarity measure and best alignment between a pair of Position Weight Matrices(PWM), finding thresholds by P-values and vice versa and even searching a collection of motifs for the most similar ones. Used approach and application described in manual at https://docs.google.com/document/pub?id=1_jsxhMNzMzy4d2d_byAd3n6Szg5gEcqG_Sf7w9tEqWw}
  gem.summary = %q{PWM comparison tool using MACROAPE approach}
  gem.homepage = "http://autosome.ru/macroape/"

 
@@ -15,6 +15,5 @@ Gem::Specification.new do |gem|
  gem.require_paths = ["lib"]
  gem.version = Macroape::VERSION

- gem.add_dependency('bioinform', '= 0.1.9')
- gem.add_dependency('docopt', '= 0.5.0')
+ gem.add_dependency('bioinform', '~> 0.1.10')
  end
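The bioinform dependency moves from an exact pin (`= 0.1.9`) to a pessimistic constraint (`~> 0.1.10`). As a rough illustration of what the two requirement strings accept, the following sketch checks a few version numbers with RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The candidate version list is made up for the example and does not claim that those bioinform releases exist.

```ruby
require 'rubygems'

old_req = Gem::Requirement.new('= 0.1.9')    # exact pin used in 3.3.8
new_req = Gem::Requirement.new('~> 0.1.10')  # means >= 0.1.10 and < 0.2

# Illustrative version numbers only (assumed, not real release data).
results = %w[0.1.9 0.1.10 0.1.15 0.2.0].map do |v|
  version = Gem::Version.new(v)
  [v, old_req.satisfied_by?(version), new_req.satisfied_by?(version)]
end

results.each do |v, old_ok, new_ok|
  puts "#{v}: old=#{old_ok} new=#{new_ok}"
end
```

With `~>` on a three-part version, only the last component may grow, so patch releases are picked up automatically while a 0.2.x release would still be rejected.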
@@ -21,4 +21,17 @@ class TestAlignmotifs < Test::Unit::TestCase
  %w[SP1_f1_revcomp.pcm -1 revcomp]],
  Helpers.align_motifs_output('--pcm KLF4_f2.pcm KLF3_f1.pcm SP1_f1_revcomp.pcm')
  end
+ def test_names_from_stdin
+ assert_equal [%w[KLF4_f2.pwm 0 direct],
+ %w[KLF3_f1.pwm -4 direct],
+ %w[SP1_f1_revcomp.pwm -1 revcomp]],
+ Helpers.provide_stdin('KLF3_f1.pwm SP1_f1_revcomp.pwm'){ Helpers.align_motifs_output('KLF4_f2.pwm') }
+ end
+ def test_names_from_stdin_duplicate_leader
+ assert_equal [%w[KLF4_f2.pwm 0 direct],
+ %w[KLF3_f1.pwm -4 direct],
+ %w[SP1_f1_revcomp.pwm -1 revcomp]],
+ Helpers.provide_stdin('KLF3_f1.pwm KLF4_f2.pwm SP1_f1_revcomp.pwm'){ Helpers.align_motifs_output('KLF4_f2.pwm') }
+ end
+
  end
metadata CHANGED
@@ -1,52 +1,33 @@
  --- !ruby/object:Gem::Specification
  name: macroape
  version: !ruby/object:Gem::Version
- version: 3.3.8
- prerelease:
+ version: 4.0.0
  platform: ruby
  authors:
  - Ilya Vorontsov
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2012-12-07 00:00:00.000000000 Z
+ date: 2013-04-29 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bioinform
  requirement: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - '='
+ - - ~>
  - !ruby/object:Gem::Version
- version: 0.1.9
+ version: 0.1.10
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - '='
+ - - ~>
  - !ruby/object:Gem::Version
- version: 0.1.9
- - !ruby/object:Gem::Dependency
- name: docopt
- requirement: !ruby/object:Gem::Requirement
- none: false
- requirements:
- - - '='
- - !ruby/object:Gem::Version
- version: 0.5.0
- type: :runtime
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- none: false
- requirements:
- - - '='
- - !ruby/object:Gem::Version
- version: 0.5.0
+ version: 0.1.10
  description: Macroape is an abbreviation for MAtrix CompaRisOn by Approximate P-value
  Estimation. It's a bioinformatic tool for evaluating similarity measure and best
  alignment between a pair of Position Weight Matrices(PWM), finding thresholds by
- P-values and inside out and even searching a collection of motifs for the most similar
+ P-values and vice versa and even searching a collection of motifs for the most similar
  ones. Used approach and application described in manual at https://docs.google.com/document/pub?id=1_jsxhMNzMzy4d2d_byAd3n6Szg5gEcqG_Sf7w9tEqWw
  email:
  - prijutme4ty@gmail.com
@@ -130,27 +111,26 @@ files:
  - test/test_helper.rb
  homepage: http://autosome.ru/macroape/
  licenses: []
+ metadata: {}
  post_install_message:
  rdoc_options: []
  require_paths:
  - lib
  required_ruby_version: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  required_rubygems_version: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 1.8.24
+ rubygems_version: 2.0.3
  signing_key:
- specification_version: 3
+ specification_version: 4
  summary: PWM comparison tool using MACROAPE approach
  test_files:
  - spec/count_distribution_spec.rb
@@ -190,3 +170,4 @@ test_files:
  - test/preprocess_collection_test.rb
  - test/scan_collection_test.rb
  - test/test_helper.rb
+ has_rdoc: