genfrag 0.0.0.1 → 0.0.0.2

data/History.txt CHANGED
@@ -1,4 +1,9 @@
1
- == 0.1.0 / 2009-02-04
1
+ == 0.0.0.2 / 2009-03-16
2
2
 
3
3
  * 1 major enhancement
4
- - initialize
4
+ - fixes
5
+
6
+ == 0.0.0.1 / 2009-02-26
7
+
8
+ * 1 major enhancement
9
+ - init
data/README.rdoc CHANGED
@@ -1,12 +1,12 @@
1
- Genfrag version 0.0.0.1
1
+ Genfrag version 0.0.0.2
2
2
  by Pjotr Prins and Trevor Wennblom
3
3
  http://genfrag.rubyforge.org
4
- (the "Rough Draught" release)
4
+ http://rubyforge.org/projects/genfrag/
5
5
 
6
6
 
7
7
  == DESCRIPTION:
8
8
 
9
- This is a development release. Few features are functional at this time.
9
+ This is a development release. Some features are functional at this time.
10
10
 
11
11
  Genfrag allows for rapid in-silico searching of fragments cut by
12
12
  different restriction enzymes in large nucleic acid databases,
@@ -33,6 +33,307 @@ This works
33
33
  * sudo gem install genfrag
34
34
 
35
35
 
36
+ == EXAMPLES:
37
+
38
+ === Index command
39
+
40
+ === Search command
41
+
42
+ ==== Example 1
43
+
44
+ Return all sequences from the file 'example.fasta.tdf' that are referenced by the index 'example.fasta_bstyi_msei_index.tdf'
45
+
46
+ genfrag search -f example.fasta --re5 BstYI --re3 MseI -v
47
+
48
+ Only one entry from output is shown below.
49
+
50
+ ---
51
+ - sequence
52
+ gattgcaacaatcgctttggaggatgtaattgtgcaattggccaatgcacaaatcgacaatgtccttgttttgctgctaatcgtgaatgcgatccagatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttttaattggggtgcatttacatgggactctcttaaaaagaatgagtatctcggagaatatactggagaactgatcactcatgatgaagctaatgagcgtgggagaatagaagatcggattggttcttcctacctctttaccttgaatgatca
53
+ - sequence size
54
+ 380
55
+ - fragment - primary strand
56
+ gatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttt..
57
+ - fragment - complement strand
58
+ ....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttattatttttctaagagtaacctttcagactacaagtacctaagtaccaaaat
59
+ - fragment with adapters - primary strand
60
+ gatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttt..
61
+ - fragment with adapters - complement strand
62
+ ....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttattatttttctaagagtaacctttcagactacaagtacctaagtaccaaaat
63
+
64
+ The first cut is made using RE5 (restriction enzyme with first match in reference to 5') BstYI. BstYI has the cut pattern
65
+ 5' - r^g a t c y - 3'
66
+ 3' - y c t a g^r - 5'
67
+
68
+ The first 96bp of the sequence are removed when BstYI makes its cut, starting the strand fragment. The primary strand
69
+ fragment begins with 'gatctttgtc'. Four bases are lost from the complement strand due to the cut pattern of BstYI, so 'gatc'
70
+ from the primary strand has no hydrogen bonds with the complement strand. These missing nucleotides are represented with a period
71
+ ('.').
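The 96bp offset can be checked by hand. The following is a minimal Ruby sketch, not Genfrag's own implementation: it assumes the BstYI site can be written as the plain regular expression /[ag]gatc[ct]/ (with 'r' expanded to a or g and 'y' to c or t), and it uses only the first 110 bases of the example sequence. The method name bstyi_cut_offset and the variable seq_prefix are hypothetical.

```ruby
# Return how many bases lie before the first BstYI cut on the primary strand,
# or nil when the site does not occur.
# Assumption: the site r^gatcy is modelled as the regex /[ag]gatc[ct]/.
def bstyi_cut_offset(seq)
  site_start = seq =~ /[ag]gatc[ct]/   # 0-based index of the 'r' base
  return nil if site_start.nil?
  site_start + 1                       # the cut falls just after the 'r'
end

# First 110 bases of the example sequence from the output above.
seq_prefix = 'gattgcaacaatcgctttggaggatgtaattgtgcaattggccaatgcac' \
             'aaatcgacaatgtccttgttttgctgctaatcgtgaatgcgatccagatc' \
             'tttgtcggag'

cut = bstyi_cut_offset(seq_prefix)
puts "bases removed before the cut: #{cut}"
puts "fragment starts with: #{seq_prefix[cut, 10]}"
```

A regular expression is used here only because it keeps the sketch free of dependencies; it handles the primary strand alone and ignores the staggered cut on the complement strand.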
72
+
73
+ The second cut is made using RE3 (restriction enzyme with first match in reference to 3') MseI. MseI has the cut pattern
74
+ 5' - t^t a a - 3'
75
+ 3' - a a t^t - 5'
76
+
77
+ This leaves a final fragment of 136bp. The way MseI cuts will leave the complement strand two nucleotides longer than the primary
78
+ strand. This is represented on the primary strand with two periods.
79
+
80
+
81
+ ==== Example 2
82
+
83
+ This demonstrates using an adapter.
84
+
85
+ genfrag search -f example.fasta --re5 BstYI --re3 MseI -v --adapter5 t
86
+
87
+ ---
88
+ - sequence
89
+ gattgcaacaatcgctttggaggatgtaattgtgcaattggccaatgcacaaatcgacaatgtccttgttttgctgctaatcgtgaatgcgatccagatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttttaattggggtgcatttacatgggactctcttaaaaagaatgagtatctcggagaatatactggagaactgatcactcatgatgaagctaatgagcgtgggagaatagaagatcggattggttcttcctacctctttaccttgaatgatca
90
+ - sequence size
91
+ 380
92
+ - fragment - primary strand
93
+ gatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttt..
94
+ - fragment - complement strand
95
+ ....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttattatttttctaagagtaacctttcagactacaagtacctaagtaccaaaat
96
+ - fragment with adapters - primary strand
97
+ +++++ttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttt..
98
+ - fragment with adapters - complement strand
99
+ ....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttattatttttctaagagtaacctttcagactacaagtacctaagtaccaaaat
100
+
101
+ The adapter can be considered an extension to the restriction enzyme. When searching for a specified adapter, anything that
102
+ the restriction enzyme would need to make its match is first ignored before comparing the adapter to the sequence.
103
+
104
+ It was shown previously that BstYI has the cut pattern
105
+ 5' - r^g a t c y - 3'
106
+ 3' - y c t a g^r - 5'
107
+
108
+ The 'y' symbol indicates a nucleotide of 't' or 'c'.[Footnote 1] Adapter5 is defined as the nucleotide 't' in this example.
109
+ 5 nucleotides from the restriction enzyme are matched ('gatct') as indicated by the plus ('+') symbols, then the 1 nucleotide
110
+ from the adapter is matched ('t').
111
+
112
+ Note that in this current version of Genfrag only the primary strand has the plus symbols applied. In a future version
113
+ the complement strand would have a plus symbol in place of the initial 'a'.
114
+
115
+
116
+ ==== Example 3
117
+
118
+ The previous example with a longer adapter.
119
+
120
+ genfrag search -f example.fasta --re5 BstYI --re3 MseI -v --adapter5 ttgtcg
121
+
122
+ ---
123
+ - sequence
124
+ gattgcaacaatcgctttggaggatgtaattgtgcaattggccaatgcacaaatcgacaatgtccttgttttgctgctaatcgtgaatgcgatccagatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttttaattggggtgcatttacatgggactctcttaaaaagaatgagtatctcggagaatatactggagaactgatcactcatgatgaagctaatgagcgtgggagaatagaagatcggattggttcttcctacctctttaccttgaatgatca
125
+ - sequence size
126
+ 380
127
+ - fragment - primary strand
128
+ gatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttt..
129
+ - fragment - complement strand
130
+ ....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttattatttttctaagagtaacctttcagactacaagtacctaagtaccaaaat
131
+ - fragment with adapters - primary strand
132
+ +++++ttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttt..
133
+ - fragment with adapters - complement strand
134
+ ....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttattatttttctaagagtaacctttcagactacaagtacctaagtaccaaaat
135
+
136
+
137
+ ==== Example 4
138
+
139
+ This demonstrates Adapter3.
140
+
141
+ genfrag search -f example.fasta --re5 BstYI --re3 MseI -v --adapter3 aacca
142
+
143
+ ---
144
+ - sequence
145
+ gattgcaacaatcgctttggaggatgtaattgtgcaattggccaatgcacaaatcgacaatgtccttgttttgctgctaatcgtgaatgcgatccagatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttttaattggggtgcatttacatgggactctcttaaaaagaatgagtatctcggagaatatactggagaactgatcactcatgatgaagctaatgagcgtgggagaatagaagatcggattggttcttcctacctctttaccttgaatgatca
146
+ - sequence size
147
+ 380
148
+ - fragment - primary strand
149
+ gatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttt..
150
+ - fragment - complement strand
151
+ ....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttattatttttctaagagtaacctttcagactacaagtacctaagtaccaaaat
152
+ - fragment with adapters - primary strand
153
+ gatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggtt+..
154
+ - fragment with adapters - complement strand
155
+ ....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttattatttttctaagagtaacctttcagactacaagtacctaagtaccaaaat
156
+
157
+ It was shown previously that MseI has the cut pattern
158
+ 5' - t^t a a - 3'
159
+ 3' - a a t^t - 5'
160
+
161
+ Looking at the primary strand fragment, the final remaining nucleotide that was also used by the restriction enzyme to
162
+ match is 't'. When the Adapter3 filter is applied, the 't' from the restriction enzyme match is replaced with a plus symbol.
163
+
164
+ An end of the primary strand is
165
+ 5' - atggattcatggtt+.. - 3'
166
+
167
+ If that end is reversed and complemented, 'aacca' is the initial five nucleotides that match.
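The reverse-complement step can be reproduced in plain Ruby. This is a sketch under stated assumptions: the input is lowercase and contains only a, c, g and t, and reverse_complement is a hypothetical helper rather than part of Genfrag (the search command in this diff uses BioRuby's Bio::Sequence::NA for the same purpose). It takes the five bases just before the '+' on the primary strand and recovers the value passed to --adapter3.

```ruby
# Reverse complement of a lowercase DNA string (a, c, g, t only).
# Hypothetical helper for illustration; ambiguity codes are not handled.
def reverse_complement(dna)
  dna.reverse.tr('acgt', 'tgca')
end

primary_end = 'tggtt'   # the five bases just before the '+' on the primary strand
puts reverse_complement(primary_end)
```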
168
+
169
+ Note that in this current version of Genfrag only the primary strand has the plus symbols applied. In a future version
170
+ the complement strand would have a plus symbol in place of the final 'aat'.
171
+
172
+
173
+ ==== Example 5
174
+
175
+ The previous example with Adapter3 using alternate notation.
176
+
177
+ genfrag search -f example.fasta --re5 BstYI --re3 MseI -v --adapter3 _tggtt
178
+
179
+ ---
180
+ - sequence
181
+ gattgcaacaatcgctttggaggatgtaattgtgcaattggccaatgcacaaatcgacaatgtccttgttttgctgctaatcgtgaatgcgatccagatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttttaattggggtgcatttacatgggactctcttaaaaagaatgagtatctcggagaatatactggagaactgatcactcatgatgaagctaatgagcgtgggagaatagaagatcggattggttcttcctacctctttaccttgaatgatca
182
+ - sequence size
183
+ 380
184
+ - fragment - primary strand
185
+ gatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttt..
186
+ - fragment - complement strand
187
+ ....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttattatttttctaagagtaacctttcagactacaagtacctaagtaccaaaat
188
+ - fragment with adapters - primary strand
189
+ gatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggtt+..
190
+ - fragment with adapters - complement strand
191
+ ....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttattatttttctaagagtaacctttcagactacaagtacctaagtaccaaaat
192
+
193
+ If Adapter3 is supplied with an initial underscore ('_') in the sequence, the sequence is matched directly without a
194
+ reverse complement.
195
+
196
+
197
+ ==== Example 6
198
+
199
+ Using two adapters together.
200
+
201
+ genfrag search -f example.fasta --re5 BstYI --re3 MseI -v --adapter5 ttgtcg --adapter3 aacca
202
+
203
+ ---
204
+ - fragment with adapters - primary strand
205
+ +++++ttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggtt+..
206
+ - fragment with adapters - complement strand
207
+ ....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttattatttttctaagagtaacctttcagactacaagtacctaagtaccaaaat
208
+
209
+ Note that in this current version of Genfrag only the primary strand has the plus symbols applied. In a future version
210
+ the complement strand would have a plus symbol in place of the initial 'a' and the final 'aat'.
211
+
212
+
213
+ ==== Example 7
214
+
215
+ Using an adapter and specifying an adapter sequence.
216
+
217
+ You may have particular adapter sequences that you have used. These can be specified with 'adapter5-sequence' or 'adapter3-sequence'.
218
+ Note that 'adapter3-sequence' will be reversed when applied to the primary strand. Any change to the sequence caused
219
+ by the adapter sequence will be noted with an equals ('=') symbol.
220
+
221
+ genfrag search -f example.fasta --re5 BstYI --re3 MseI -v --adapter5 ttgtcg --adapter3 aacca --adapter5-sequence NXNXNXNX --adapter3-sequence NZNZNZNZ
222
+
223
+ ---
224
+ - sequence
225
+ gattgcaacaatcgctttggaggatgtaattgtgcaattggccaatgcacaaatcgacaatgtccttgttttgctgctaatcgtgaatgcgatccagatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttttaattggggtgcatttacatgggactctcttaaaaagaatgagtatctcggagaatatactggagaactgatcactcatgatgaagctaatgagcgtgggagaatagaagatcggattggttcttcctacctctttaccttgaatgatca
226
+ - sequence size
227
+ 380
228
+ - fragment - primary strand
229
+ gatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttt..
230
+ - fragment - complement strand
231
+ ....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttattatttttctaagagtaacctttcagactacaagtacctaagtaccaaaat
232
+ - fragment with adapters - primary strand
233
+ NXNXNXNXttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttZNZNZNZN
234
+ - fragment with adapters - complement strand
235
+ ===....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttattatttttctaagagtaacctttcagactacaagtacctaagtaccaaaat=====
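The '=' padding in this output follows from the preserve_or_add helper added to the search command in this release. The sketch below restates that logic as a standalone method for the 5' end only. The name pad_with_adapter and the 12-position strand excerpts are hypothetical, and lead_in (the number of leading primary-strand positions consumed by the enzyme match) is taken to be 5, judging by the five plus symbols in the earlier examples.

```ruby
# Standalone restatement of the padding logic, 5' end only.
# size:        length of the adapter sequence
# lead_in:     leading primary-strand positions used by the enzyme match
# adapter_seq: adapter sequence, or nil to fill with '?' placeholders
def pad_with_adapter(size, lead_in, adapter_seq, primary, complement)
  adapter_seq = '?' * size if adapter_seq.nil? || adapter_seq.empty?
  if lead_in >= size
    # Adapter fits inside the lead-in: pad the primary strand with '='.
    [('=' * (lead_in - size)) + adapter_seq + primary[lead_in..-1], complement]
  else
    # Adapter is longer than the lead-in: pad the complement strand instead.
    [adapter_seq + primary[lead_in..-1], ('=' * (size - lead_in)) + complement]
  end
end

primary    = 'gatctttgtcgg'   # first 12 positions of the primary fragment
complement = '....aaacagcc'   # first 12 positions of the complement fragment

p pad_with_adapter(8, 5, 'NXNXNXNX', primary, complement)   # long adapter
p pad_with_adapter(1, 5, 'X', primary, complement)          # short adapter
```

With an 8-base adapter the complement strand gains three '=' characters, and with a 1-base adapter the primary strand gains four, which is the pattern visible in the 5' ends of Examples 7 and 8.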
236
+
237
+
238
+ ==== Example 8
239
+
240
+ The previous example but with short adapter sequences.
241
+
242
+ genfrag search -f example.fasta --re5 BstYI --re3 MseI --adapter5 ttgtcg --adapter3 aacca --adapter5-sequence X --adapter3-sequence Z
243
+
244
+ ---
245
+ - fragment with adapters - primary strand
246
+ ====XttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttZ==
247
+ - fragment with adapters - complement strand
248
+ ....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttattatttttctaagagtaacctttcagactacaagtacctaagtaccaaaat
249
+
250
+
251
+ ==== Example 9
252
+
253
+ Using an adapter and specifying an adapter size. The size can be specified with 'adapter5-size' or 'adapter3-size'.
254
+
255
+ You may know the size of your adapter, but not have a particular sequence in mind. The unknown nucleotides will be represented
256
+ with a question mark character ('?').
257
+
258
+ genfrag search -f example.fasta --re5 BstYI --re3 MseI --adapter5 ttgtcg --adapter3 aacca --adapter5-size 6 --adapter3-size 8
259
+
260
+ ---
261
+ - fragment with adapters - primary strand
262
+ ????????ttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggtt??????
263
+ - fragment with adapters - complement strand
264
+ ===....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttattatttttctaagagtaacctttcagactacaagtacctaagtaccaaaat===
265
+
266
+
267
+ ==== Example 10
268
+
269
+ The previous example but with short adapter sizes.
270
+
271
+ genfrag search -f example.fasta --re5 BstYI --re3 MseI --adapter5 ttgtcg --adapter3 aacca --adapter5-size 1 --adapter3-size 2
272
+
273
+ ---
274
+ - fragment with adapters - primary strand
275
+ ====?ttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggtt??=
276
+ - fragment with adapters - complement strand
277
+ ....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttattatttttctaagagtaacctttcagactacaagtacctaagtaccaaaat
278
+
279
+
280
+ ==== Example 11
281
+
282
+ An exact fragment length can be searched for with the 'seqsize' argument.
283
+
284
+ genfrag search -f example.fasta --re5 BstYI --re3 MseI -s 136
285
+
286
+ ---
287
+ - fragment with adapters - primary strand
288
+ gatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttt..
289
+ - fragment with adapters - complement strand
290
+ ....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttattatttttctaagagtaacctttcagactacaagtacctaagtaccaaaat
291
+
292
+
293
+ ==== Example 12
294
+
295
+ The previous example with multiple fragment result sizes accepted. Different sizes can be separated by commas.
296
+
297
+ genfrag search -f example.fasta --re5 BstYI --re3 MseI -s 136,166
298
+
299
+ ---
300
+ - fragment with adapters - primary strand
301
+ gatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaattcctccttcaaaccaataaaaagattctcattggaaagtctgatgttcatggatggggtgcatttacatgggactctct..
302
+ - fragment with adapters - complement strand
303
+ ....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttaaggaggaagtttggttatttttctaagagtaacctttcagactacaagtacctaccccacgtaaatgtaccctgagagaat
304
+ ---
305
+ - fragment with adapters - primary strand
306
+ gatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttt..
307
+ - fragment with adapters - complement strand
308
+ ....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttattatttttctaagagtaacctttcagactacaagtacctaagtaccaaaat
309
+
310
+
311
+ ==== Example 13
312
+
313
+ The previous example with a sequence size range accepted. Since you may only have an approximate idea of the fragment size
314
+ you are expecting, you may give a range by using a plus symbol ('+') to indicate a tolerance around a size (for example, '144+10' accepts sizes 134 through 154).
315
+
316
+ genfrag search -f example.fasta --re5 BstYI --re3 MseI -s 144+10,166
317
+
318
+ ---
319
+ - fragment with adapters - primary strand
320
+ gatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaattcctccttcaaaccaataaaaagattctcattggaaagtctgatgttcatggatggggtgcatttacatgggactctct..
321
+ - fragment with adapters - complement strand
322
+ ....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttaaggaggaagtttggttatttttctaagagtaacctttcagactacaagtacctaccccacgtaaatgtaccctgagagaat
323
+ ---
324
+ - fragment with adapters - primary strand
325
+ gatctttgtcggagttgtcctcttagctgtggagatggcactcttggtgagacaccagtgcaaatccaatgcaagaacatgcaataataaaaagattctcattggaaagtctgatgttcatggattcatggttt..
326
+ - fragment with adapters - complement strand
327
+ ....aaacagcctcaacaggagaatcgacacctctaccgtgagaaccactctgtggtcacgtttaggttacgttcttgtacgttattatttttctaagagtaacctttcagactacaagtacctaagtaccaaaat
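The size filter used in Examples 11 to 13 can be sketched in a few lines of Ruby. This follows the seqsize handling added to the search command in this release, but it is a simplified standalone version: parse_seqsize and size_accepted? are hypothetical names, and the input is assumed to be the comma-separated value already split into strings.

```ruby
# Parse a --seqsize style list such as ['144+10', '166'] into exact sizes
# and tolerance ranges. 'CENTER+TOL' becomes (CENTER - TOL)..(CENTER + TOL).
def parse_seqsize(specs)
  parsed = { ranges: [], ints: [] }
  specs.each do |spec|
    if spec.include?('+')
      center, tolerance = spec.split('+').map(&:to_i)
      parsed[:ranges] << ((center - tolerance)..(center + tolerance))
    else
      parsed[:ints] << spec.to_i
    end
  end
  parsed
end

# True when a fragment size matches any exact size or falls in any range.
def size_accepted?(parsed, size)
  parsed[:ints].include?(size) || parsed[:ranges].any? { |r| r.include?(size) }
end

wanted = parse_seqsize('144+10,166'.split(','))
puts size_accepted?(wanted, 136)   # inside 134..154
puts size_accepted?(wanted, 166)   # exact size
puts size_accepted?(wanted, 120)   # outside both
```

Keeping exact sizes and ranges in separate lists mirrors the structure used in the diff and makes the membership test a simple two-step check.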
328
+
329
+
330
+ ==== Footnotes
331
+ [1]:
332
+ require 'rubygems'
333
+ require 'bio'
334
+ puts Bio::Sequence::NA.new('y').to_re # => (?-mix:[tcy])
335
+
336
+
36
337
  == LICENSE:
37
338
 
38
339
  Copyright (c) 2009 Pjotr Prins and Trevor Wennblom
data/lib/genfrag.rb CHANGED
@@ -2,7 +2,7 @@
2
2
  module Genfrag
3
3
 
4
4
  # :stopdoc:
5
- VERSION = '0.0.0.1'
5
+ VERSION = '0.0.0.2'
6
6
  LIBPATH = ::File.expand_path(::File.dirname(__FILE__)) + ::File::SEPARATOR
7
7
  PATH = ::File.dirname(LIBPATH) + ::File::SEPARATOR
8
8
  # :startdoc:
data/lib/genfrag/app.rb CHANGED
@@ -50,7 +50,9 @@ class App
50
50
  @out.puts "Genfrag #{::Genfrag::VERSION}"
51
51
  nil
52
52
  else
53
- raise "Unknown command #{cmd_str.inspect}"
53
+ @err.puts "Unknown command #{cmd_str.inspect}"
54
+ help
55
+ nil
54
56
  end
55
57
 
56
58
  cmd.cli_run args if cmd
@@ -89,9 +89,9 @@ class Command
89
89
  lambda { |value|
90
90
  options[:filefasta] = value
91
91
  }],
92
- :size => ['-s', '--size SIZE', Array, '',
92
+ :seqsize => ['-s', '--seqsize SIZE', Array, '',
93
93
  lambda { |value|
94
- options[:size] = value
94
+ options[:seqsize] = value
95
95
  }],
96
96
 
97
97
  :adapter5 => ['-y', '--adapter5 ADAPTER', String, '',
@@ -45,7 +45,7 @@ class SearchCommand < Command
45
45
  ary = [:verbose, :quiet, :tracktime, :indir, :outdir, :sqlite, :re5, :re3,
46
46
  :filelookup, :filefasta, :fileadapters, :adapter5_sequence, :adapter3_sequence,
47
47
  :adapter5_size, :adapter3_size, :named_adapter5, :named_adapter3,
48
- :adapter5, :adapter3
48
+ :adapter5, :adapter3, :seqsize
49
49
  ]
50
50
  ary.each { |a| opts.on(*std_opts[a]) }
51
51
 
@@ -54,10 +54,11 @@ class SearchCommand < Command
54
54
  opts.on( '-h', '--help', 'show this message' ) { @out.puts opts; exit }
55
55
 
56
56
  opts.separator ' Examples:'
57
+ opts.separator ' genfrag search -f example.fasta --re5 BstYI --re3 MseI -v'
57
58
  opts.separator ' genfrag search -f example.fasta --re5 BstYI --re3 MseI --adapter5 tt'
58
59
  opts.separator ' genfrag search -f example.fasta --re5 BstYI --re3 MseI --add 26 --adapter5 ct --adapter3 aa --size 190,215'
59
60
  opts.separator ' genfrag search -f example.fasta --re5 BstYI --re3 MseI --adapter5-size 11 --adapter5 tt --adapter3-size 15 --size 168'
60
- opts.separator ' genfrag search -f example.fasta --re5 BstYI --re3 MseI --adapter5-sequence GACTGCGTAGTGATC --adapter5 tt --adapter3-size 15 --size 168'
61
+ opts.separator ' genfrag search -f example.fasta --re5 BstYI --re3 MseI --adapter5-sequence GACTGCGTAGTGATC --adapter5 tt --size 168'
61
62
  opts.separator ' genfrag search -f example.fasta --re5 BstYI --re3 MseI --adapter5-size 11 --adapter5 ct --adapter3-size 15 --adapter3 aa --size 190,215'
62
63
  opts.separator ' genfrag search -f example.fasta --re5 BstYI --re3 MseI --add 26 --named-adapter5 BstYI-T4 --named-adapter3 MseI-21 --size 190,215'
63
64
  opts
@@ -122,7 +123,7 @@ class SearchCommand < Command
122
123
  @ops.sqlite ||= false
123
124
  @ops.re5 ||= nil
124
125
  @ops.re3 ||= nil
125
- @ops.size ||= [0]
126
+ @ops.seqsize ||= [0]
126
127
  @ops.adapter5_size ||= nil
127
128
  @ops.adapter3_size ||= nil
128
129
  @ops.adapter5 ||= nil
@@ -158,6 +159,37 @@ END
158
159
  raise ArgumentError, "Must specify --fileadapters when using a named_adapter"
159
160
  end
160
161
 
162
+ if @ops.adapter5_size and @ops.adapter5_sequence
163
+ raise ArgumentError, '--adapter5-sequence and --adapter5-size both supplied, may only have one'
164
+ end
165
+ if @ops.adapter3_size and @ops.adapter3_sequence
166
+ raise ArgumentError, '--adapter3-sequence and --adapter3-size both supplied, may only have one'
167
+ end
168
+
169
+ if (@ops.adapter5_sequence or @ops.adapter5_size) and !@ops.adapter5
170
+ raise ArgumentError, '--adapter5 missing in presence of --adapter5-sequence or --adapter5-size'
171
+ end
172
+ if (@ops.adapter3_sequence or @ops.adapter3_size) and !@ops.adapter3
173
+ raise ArgumentError, '--adapter3 missing in presence of --adapter3-sequence or --adapter3-size'
174
+ end
175
+
176
+ if [@ops.seqsize].flatten == [0] or [@ops.seqsize].flatten == [nil] or [@ops.seqsize].flatten == ['0']
177
+ @ops.seqsize = nil
178
+ else
179
+ h = {:ranges => [], :ints => []}
180
+ @ops.seqsize.flatten.each do |s|
181
+ if s.include?('+')
182
+ a = s.split('+')
183
+ c = a[0].to_i
184
+ r = a[1].to_i
185
+ h[:ranges] << (c-r..c+r)
186
+ else
187
+ h[:ints] << s.to_i
188
+ end
189
+ end
190
+ @ops.seqsize = h
191
+ end
192
+
161
193
  if processed_adapters
162
194
  adapter_setup_1(processed_adapters)
163
195
  else
@@ -171,13 +203,6 @@ END
171
203
  seq3 = Bio::Sequence::NA.new(@adapters[:adapter3_specificity][1..-1]).downcase
172
204
  @adapters[:adapter3_specificity] = seq3.complement.to_s
173
205
  end
174
-
175
- if @ops.adapter5_size and @ops.adapter5_sequence and (@ops.adapter5_size != @adapters[:adapter5_size])
176
- raise ArgumentError, "--adapter5-sequence and --adapter5-size both supplied"
177
- end
178
- if @ops.adapter3_size and @ops.adapter3_sequence and (@ops.adapter3_size != @adapters[:adapter3_size])
179
- raise ArgumentError, "--adapter3-sequence and --adapter3-size both supplied"
180
- end
181
206
 
182
207
  @trim = calculate_trim_for_nucleotides(@re5_ds, @re3_ds)
183
208
 
@@ -185,11 +210,10 @@ END
185
210
  # Start calculations
186
211
  #
187
212
  left_trim, right_trim = calculate_left_and_right_trims(@trim)
188
-
189
- matching_fragments = find_matching_fragments(@sizes, left_trim, right_trim)
213
+
190
214
  results = []
191
215
 
192
- matching_fragments.each do |hit|
216
+ @sizes.values.each do |hit|
193
217
  hit.each do |entry|
194
218
  seq = @sequences[entry[:fasta_id]][:sequence]
195
219
  raw_frag = seq[entry[:offset]..(entry[:offset]+entry[:raw_size]-1)]
@@ -199,7 +223,7 @@ END
199
223
  p = primary_frag.dup
200
224
  c = complement_frag.dup
201
225
 
202
- # note the next two if-statements at this lever chain together with 'p' and 'c'
226
+ # note the next two if-statements at this level chain together with 'p' and 'c'
203
227
  if @adapters[:adapter5_specificity]
204
228
  p, c = matches_adapter(5, p, c, raw_frag, @trim)
205
229
  next if !p # next if returned false -- no match
@@ -212,8 +236,23 @@ END
212
236
 
213
237
  primary_frag_with_adapters = p
214
238
  complement_frag_with_adapters = c
239
+
240
+ if @ops.seqsize
241
+ primary_frag_with_adapters_size = primary_frag_with_adapters.size
242
+ good = false
243
+ if @ops.seqsize[:ints].include?(primary_frag_with_adapters_size)
244
+ good = true
245
+ else
246
+ @ops.seqsize[:ranges].each do |range|
247
+ good = true if range.include?(primary_frag_with_adapters_size)
248
+ break if good
249
+ end
250
+ end
251
+ # next if fragment size not in range
252
+ next if !good
253
+ end
215
254
 
216
- results << {:raw_frag => raw_frag, :primary_frag => primary_frag, :primary_frag_with_adapters => primary_frag_with_adapters, :complement_frag => complement_frag, :complement_frag_with_adapters => complement_frag_with_adapters, :entry => entry, :seq => seq} # FIXME
255
+ results << {:raw_frag => raw_frag, :primary_frag => primary_frag, :primary_frag_with_adapters => primary_frag_with_adapters, :complement_frag => complement_frag, :complement_frag_with_adapters => complement_frag_with_adapters, :entry => entry, :seq => seq}
217
256
  end
218
257
  end
219
258
 
@@ -26,6 +26,7 @@ class SearchCommand < Command
26
26
  primary_frag =~ /(\.*)/
27
27
  dots_on_primary = $1.size
28
28
  lead_in = tail.size + dots_on_primary
29
+
29
30
  return false if primary_frag[ lead_in .. -1 ].tr('.', '') !~ /^#{adapter_specificity}/i
30
31
 
31
32
  elsif five_or_three == 3
@@ -54,70 +55,41 @@ class SearchCommand < Command
54
55
  raise "First argument to matches_adapter must be a '5' or a '3'. Received: #{five_or_three.inspect}"
55
56
  end
56
57
 
57
- #return false if raw_frag[ [trim_primary, trim_complement].max .. -1 ] !~ /^#{adapter_specificity}/i
58
-
59
- #overhang = [trim_primary, trim_complement].max - [trim_primary, trim_complement].min
60
-
61
- #lead_in = overhang
62
-
63
58
  if adapter_sequence
64
- raise 'FIXME - not functional yet'
65
-
66
- # if lead_in >= adapter_sequence.size
67
- # # need to preserve dots on primary string
68
- # new_primary_frag = ('.' * (lead_in - adapter_sequence.size)) + adapter_sequence + primary_frag[ lead_in .. -1 ]
69
- # new_complement_frag = complement_frag
70
- # else
71
- # # need to add dots to beginning of complement string
72
- # new_primary_frag = adapter_sequence + primary_frag[ lead_in .. -1 ]
73
- # new_complement_frag = ('.' * (adapter_sequence.size - lead_in) ) + complement_frag
74
- # end
75
-
59
+ # adapter-sequence supplied
60
+ new_primary_frag, new_complement_frag = preserve_or_add(adapter_sequence.size, lead_in, adapter_sequence, primary_frag, complement_frag)
76
61
  elsif adapter_size
77
- raise 'FIXME - not functional yet'
78
-
79
- # # only the size and the specificity of the adapter has been provided
80
- # size_of_specificity = adapter_specificity.size
81
- # size_of_sequence = adapter_size - size_of_specificity
82
- # if lead_in >= size_of_sequence
83
- # # need to preserve dots on primary string
84
- # new_primary_frag = primary_frag[ 0 .. (lead_in - 1) ].upcase + primary_frag[ lead_in .. -1 ]
85
- # new_complement_frag = complement_frag
86
- # else
87
- # # need to add dots to beginning of complement string
88
- # new_primary_frag = ('+' * (size_of_sequence - lead_in) ) + primary_frag[ 0 .. (lead_in - 1) ].upcase + primary_frag[ lead_in .. -1 ]
89
- # new_complement_frag = ('.' * (size_of_sequence - lead_in) ) + complement_frag
90
- # end
91
-
62
+ # adapter-size supplied
63
+ new_primary_frag, new_complement_frag = preserve_or_add(adapter_size, lead_in, adapter_sequence, primary_frag, complement_frag)
92
64
  else
93
65
  # only the specificity has been provided
94
66
  new_primary_frag = ('.' * dots_on_primary) + ('+' * tail.size) + primary_frag[ lead_in .. -1 ]
95
67
  new_complement_frag = complement_frag
96
-
97
68
  end
98
69
 
99
70
  if five_or_three == 3
100
- new_primary_frag.reverse!
101
- new_complement_frag.reverse!
71
+ return [new_primary_frag.reverse, new_complement_frag.reverse]
72
+ else
73
+ return [new_primary_frag, new_complement_frag]
102
74
  end
103
-
104
- return [new_primary_frag, new_complement_frag]
105
75
  end
106
76
 
107
77
 
78
+ =begin
108
79
  # Find the fragments that match the search parameters
109
80
  #
110
81
  def find_matching_fragments(sizes, left, right)
111
82
  hits=[]
83
+
112
84
  s = (@adapters[:adapter5_size] or 0) + (@adapters[:adapter3_size] or 0)
113
85
 
114
- if [@ops.size].flatten == [0] or [@ops.size].flatten == [nil] or [@ops.size].flatten == ["0"]
86
+ if [@ops.seqsize].flatten == [0] or [@ops.seqsize].flatten == [nil] or [@ops.seqsize].flatten == ['0']
115
87
  sizes.each do |raw_size, info|
116
88
  hits << info
117
89
  end
118
90
 
119
91
  else
120
- [@ops.size].flatten.each do |seek_size|
92
+ [@ops.seqsize].flatten.each do |seek_size|
121
93
  seek_size = seek_size.to_i
122
94
  sizes.each do |raw_size, info|
123
95
  frag_size = raw_size - left[:trim_from_both] - right[:trim_from_both]
@@ -130,6 +102,7 @@ class SearchCommand < Command
130
102
 
131
103
  return hits
132
104
  end
105
+ =end
133
106
 
134
107
  def right_tail_of(s)
135
108
  # 'PpiI' => "n n n n n n^n n n n n n n g a a c n n n n n c t c n n n n n n n n n n n n n^n"
@@ -157,6 +130,23 @@ class SearchCommand < Command
157
130
  end
158
131
 
159
132
  end
133
+
134
+ def preserve_or_add(size, lead_in, adapter_sequence, primary_frag, complement_frag)
135
+ if adapter_sequence.nil? or adapter_sequence.empty?
136
+ adapter_sequence = '?' * size
137
+ end
138
+
139
+ if lead_in >= size
140
+ # need to preserve dots on primary string
141
+ p = ('=' * (lead_in - size)) + adapter_sequence + primary_frag[ lead_in .. -1 ]
142
+ c = complement_frag
143
+ else
144
+ # need to add dots to beginning of complement string
145
+ p = adapter_sequence + primary_frag[ lead_in .. -1 ]
146
+ c = ('=' * (size - lead_in) ) + complement_frag
147
+ end
148
+ [p,c]
149
+ end
160
150
 
161
151
  end # class SearchCommand
162
152
  end # class App
@@ -0,0 +1,24 @@
1
+
2
+ require File.expand_path(
3
+ File.join(File.dirname(__FILE__), %w[.. .. .. spec_helper]))
4
+
5
+ # --------------------------------------------------------------------------
6
+ describe "Genfrag::App::PredictorCommand" do
7
+
8
+ it "should receive a resultset from the search command"
9
+
10
+ it "should receive what predictor method to use"
11
+
12
+ describe "test adjusted sizes with a resultset" do
13
+
14
+ it "should calculate a p-value for every match"
15
+
16
+ it "should return p-values and flags for outlier values"
17
+
18
+ it "should optionally return the classifiers"
19
+
20
+ end
21
+
22
+ end
23
+
24
+ # EOF
@@ -58,9 +58,8 @@ describe Genfrag::App do
  end
 
  it 'should report an error for unrecognized commands' do
- lambda {@app.cli_run %w[bad_func]}.should raise_error(SystemExit)
- @err.readline.should == 'ERROR: While executing genfrag ... (RuntimeError)'
- @err.readline.should == ' Unknown command "bad_func"'
+ @app.cli_run %w[bad_func]
+ @err.readline.should == 'Unknown command "bad_func"'
  end
 
  it 'should report a version number' do
data/spec/genfrag_spec.rb CHANGED
@@ -8,7 +8,7 @@ describe Genfrag do
  @app = Genfrag
  end
 
- it "finds things releative to 'root'" do
+ it "finds things relative to 'root'" do
  Genfrag.path(%w[lib genfrag debug]).
  should == File.join(@root_dir, %w[lib genfrag debug])
  end
data/tasks/rdoc.rake CHANGED
@@ -19,10 +19,11 @@ namespace :doc do
  end
  rd.rdoc_files.push(*files)
 
- title = "#{PROJ.name}-#{PROJ.version} Documentation"
-
+ name = PROJ.name
  rf_name = PROJ.rubyforge.name
- title = "#{rf_name}'s " + title if rf_name.valid? and rf_name != title
+
+ title = "#{name}-#{PROJ.version} Documentation"
+ title = "#{rf_name}'s " + title if rf_name.valid? and rf_name != name
 
  rd.options << "-t #{title}"
  rd.options.concat(rdoc.opts)
data/tasks/setup.rb CHANGED
@@ -6,7 +6,7 @@ require 'fileutils'
  require 'ostruct'
  require 'find'
 
- class OpenStruct; undef :gem; end
+ class OpenStruct; undef :gem if defined? :gem; end
 
  # TODO: make my own openstruct type object that includes descriptions
  # TODO: use the descriptions to output help on the available bones options
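Reviewer note: the new guard `undef :gem if defined? :gem` in the hunk above applies `defined?` to a symbol literal, which Ruby always reports as `"expression"`, so the condition cannot be false. The sketch below shows that behaviour and one possible way to test for the method itself. `undef_if_present` and `Widget` are hypothetical names used only for this illustration; this is not presented as the fix the upstream project adopted.

```ruby
# `defined?` on a symbol literal tests the literal, not a method of that name.
LITERAL_CHECK = defined?(:gem)            # always "expression"
MISSING_CHECK = defined?(:no_such_method) # also "expression"

# Hypothetical helper: undefine a method only when the class really has it,
# checking public/protected and private methods separately.
def undef_if_present(klass, name)
  present = klass.method_defined?(name) || klass.private_method_defined?(name)
  klass.send(:undef_method, name) if present
  present
end

# Hypothetical class standing in for OpenStruct in this demonstration.
class Widget
  def gem
    :shadowed
  end
end

REMOVED = undef_if_present(Widget, :gem)
SKIPPED = undef_if_present(Widget, :totally_missing_method_xyz)
```

Checking with `method_defined?` and `private_method_defined?` keeps the guard meaningful: the undefinition is skipped when there is nothing to remove instead of raising a `NameError`.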
@@ -124,9 +124,7 @@ import(*rakefiles)
  %w(lib ext).each {|dir| PROJ.libs << dir if test ?d, dir}
 
  # Setup some constants
- WIN32 = %r/djgpp|(cyg|ms|bcc)win|mingw/ =~ RUBY_PLATFORM unless defined? WIN32
-
- DEV_NULL = WIN32 ? 'NUL:' : '/dev/null'
+ DEV_NULL = File.exist?('/dev/null') ? '/dev/null' : 'NUL:'
 
  def quiet( &block )
  io = [STDOUT.dup, STDERR.dup]
@@ -139,21 +137,15 @@ ensure
  $stdout, $stderr = STDOUT, STDERR
  end
 
- DIFF = if WIN32 then 'diff.exe'
- else
- if quiet {system "gdiff", __FILE__, __FILE__} then 'gdiff'
- else 'diff' end
- end unless defined? DIFF
-
- SUDO = if WIN32 then ''
- else
- if quiet {system 'which sudo'} then 'sudo'
- else '' end
- end
-
- RCOV = WIN32 ? 'rcov.bat' : 'rcov'
- RDOC = WIN32 ? 'rdoc.bat' : 'rdoc'
- GEM = WIN32 ? 'gem.bat' : 'gem'
+ DIFF = if system("gdiff '#{__FILE__}' '#{__FILE__}' > #{DEV_NULL} 2>&1") then 'gdiff'
+ else 'diff' end unless defined? DIFF
+
+ SUDO = if system("which sudo > #{DEV_NULL} 2>&1") then 'sudo'
+ else '' end unless defined? SUDO
+
+ RCOV = "#{RUBY} -S rcov"
+ RDOC = "#{RUBY} -S rdoc"
+ GEM = "#{RUBY} -S gem"
 
  %w(rcov spec/rake/spectask rubyforge bones facets/ansicode).each do |lib|
  begin
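Reviewer note: the setup.rb hunks above replace the `WIN32` platform regexp with two runtime probes: pick the null device by checking whether `/dev/null` exists, then test for a tool by running it with its output discarded. A minimal sketch of that pattern follows. `DEV_NULL_PATH`, `RUBY_BIN`, `GEM_COMMAND` and `command_succeeds?` are hypothetical names, and `RbConfig.ruby` is used here as a stand-in for the `RUBY` constant the diff relies on, whose definition is not shown in this section.

```ruby
require 'rbconfig'

# Choose the null device by probing the filesystem rather than the platform name.
DEV_NULL_PATH = File.exist?('/dev/null') ? '/dev/null' : 'NUL:'

# Hypothetical helper: run a shell command with stdout and stderr discarded
# and report whether it exited successfully.
def command_succeeds?(command, sink = DEV_NULL_PATH)
  system("#{command} > #{sink} 2>&1") == true
end

# Quoted path to the running interpreter (assumption: stands in for RUBY).
RUBY_BIN = %("#{RbConfig.ruby}")

# Invoking a tool through the interpreter's -S switch avoids hard-coding
# platform-specific wrappers such as gem.bat.
GEM_COMMAND = "#{RUBY_BIN} -S gem"

PROBE_OK   = command_succeeds?(%(#{RUBY_BIN} -e "exit 0"))
PROBE_FAIL = command_succeeds?(%(#{RUBY_BIN} -e "exit 1"))
```

A probe such as `command_succeeds?('which sudo')` mirrors how the new `SUDO` constant is derived. Since Ruby 1.9.3 the standard library also provides `File::NULL`, which could replace the manual `/dev/null` check on interpreters that have it.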
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: genfrag
  version: !ruby/object:Gem::Version
- version: 0.0.0.1
+ version: 0.0.0.2
  platform: ruby
  authors:
  - Pjotr Prins and Trevor Wennblom
@@ -9,7 +9,7 @@ autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2009-02-26 00:00:00 -06:00
+ date: 2009-03-16 00:00:00 -05:00
  default_executable:
  dependencies:
  - !ruby/object:Gem::Dependency
@@ -40,9 +40,9 @@ dependencies:
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 2.4.0
+ version: 2.4.2
  version:
- description: This is a development release. Few features are functional at this time. Genfrag allows for rapid in-silico searching of fragments cut by different restriction enzymes in large nucleotide acid databases, followed by matching specificity adapters which allow a further data reduction when looking for differential expression of genes and markers.
+ description: This is a development release. Some features are functional at this time. Genfrag allows for rapid in-silico searching of fragments cut by different restriction enzymes in large nucleotide acid databases, followed by matching specificity adapters which allow a further data reduction when looking for differential expression of genes and markers.
  email: trevor@corevx.com
  executables:
  - genfrag
@@ -82,6 +82,7 @@ files:
  - spec/genfrag/app/command_spec.rb
  - spec/genfrag/app/index_command_spec.rb
  - spec/genfrag/app/search_command/match_spec.rb
+ - spec/genfrag/app/search_command/predictor_spec.rb
  - spec/genfrag/app/search_command/process_file_spec.rb
  - spec/genfrag/app/search_command/trim_spec.rb
  - spec/genfrag/app/search_command_spec.rb