genomer-plugin-summary 0.0.3 → 0.0.4

data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.0.3
1
+ 0.0.4
@@ -0,0 +1,347 @@
1
+ Feature: Producing a summary of the scaffold contigs
2
+ In order to have an overview of the contigs in a scaffold
3
+ A user can use the "contigs" command
4
+ to generate a tabular output of the scaffold contigs
5
+
6
+ @disable-bundler
7
+ Scenario: An empty scaffold
8
+ Given I create a new genomer project
9
+ And I write to "assembly/scaffold.yml" with:
10
+ """
11
+ ---
12
+ -
13
+ unresolved:
14
+ length: 50
15
+ """
16
+ And I write to "assembly/sequence.fna" with:
17
+ """
18
+ >contig0001
19
+ ATGC
20
+ """
21
+ When I run `genomer summary contigs`
22
+ Then the exit status should be 0
23
+ And the output should contain:
24
+ """
25
+ +--------+------------+------------+------------+----------+--------+
26
+ | Scaffold Contigs |
27
+ +--------+------------+------------+------------+----------+--------+
28
+ | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
29
+ +--------+------------+------------+------------+----------+--------+
30
+ +--------+------------+------------+------------+----------+--------+
31
+ | All | 0 | 0 | 0 | 0.00 | 0.00 |
32
+ +--------+------------+------------+------------+----------+--------+
33
+ """
34
+
35
+ Scenario: A scaffold with a single contig
36
+ Given I create a new genomer project
37
+ And I write to "assembly/scaffold.yml" with:
38
+ """
39
+ ---
40
+ -
41
+ sequence:
42
+ source: contig0001
43
+ """
44
+ And I write to "assembly/sequence.fna" with:
45
+ """
46
+ >contig0001
47
+ ATGC
48
+ """
49
+ When I run `genomer summary contigs`
50
+ Then the exit status should be 0
51
+ And the output should contain:
52
+ """
53
+ +--------+------------+------------+------------+----------+--------+
54
+ | Scaffold Contigs |
55
+ +--------+------------+------------+------------+----------+--------+
56
+ | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
57
+ +--------+------------+------------+------------+----------+--------+
58
+ | 1 | 1 | 4 | 4 | 100.00 | 50.00 |
59
+ +--------+------------+------------+------------+----------+--------+
60
+ | All | 1 | 4 | 4 | 100.00 | 50.00 |
61
+ +--------+------------+------------+------------+----------+--------+
62
+ """
63
+
64
+ Scenario: A scaffold with two different contigs
65
+ Given I create a new genomer project
66
+ And I write to "assembly/scaffold.yml" with:
67
+ """
68
+ ---
69
+ -
70
+ sequence:
71
+ source: contig0001
72
+ -
73
+ sequence:
74
+ source: contig0002
75
+ """
76
+ And I write to "assembly/sequence.fna" with:
77
+ """
78
+ >contig0001
79
+ ATGCGC
80
+ >contig0002
81
+ ATATGC
82
+ """
83
+ When I run `genomer summary contigs`
84
+ Then the exit status should be 0
85
+ And the output should contain:
86
+ """
87
+ +--------+------------+------------+------------+----------+--------+
88
+ | Scaffold Contigs |
89
+ +--------+------------+------------+------------+----------+--------+
90
+ | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
91
+ +--------+------------+------------+------------+----------+--------+
92
+ | 1 | 1 | 12 | 12 | 100.00 | 50.00 |
93
+ +--------+------------+------------+------------+----------+--------+
94
+ | All | 1 | 12 | 12 | 100.00 | 50.00 |
95
+ +--------+------------+------------+------------+----------+--------+
96
+ """
97
+
98
+ Scenario: A scaffold with two repeated contigs
99
+ Given I create a new genomer project
100
+ And I write to "assembly/scaffold.yml" with:
101
+ """
102
+ ---
103
+ -
104
+ sequence:
105
+ source: contig0001
106
+ -
107
+ sequence:
108
+ source: contig0001
109
+ """
110
+ And I write to "assembly/sequence.fna" with:
111
+ """
112
+ >contig0001
113
+ ATGCGC
114
+ """
115
+ When I run `genomer summary contigs`
116
+ Then the exit status should be 0
117
+ And the output should contain:
118
+ """
119
+ +--------+------------+------------+------------+----------+--------+
120
+ | Scaffold Contigs |
121
+ +--------+------------+------------+------------+----------+--------+
122
+ | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
123
+ +--------+------------+------------+------------+----------+--------+
124
+ | 1 | 1 | 12 | 12 | 100.00 | 66.67 |
125
+ +--------+------------+------------+------------+----------+--------+
126
+ | All | 1 | 12 | 12 | 100.00 | 66.67 |
127
+ +--------+------------+------------+------------+----------+--------+
128
+ """
129
+
130
+ Scenario: A scaffold with two contigs separated by a gap
131
+ Given I create a new genomer project
132
+ And I write to "assembly/scaffold.yml" with:
133
+ """
134
+ ---
135
+ -
136
+ sequence:
137
+ source: contig0001
138
+ -
139
+ unresolved:
140
+ length: 8
141
+ -
142
+ sequence:
143
+ source: contig0002
144
+ """
145
+ And I write to "assembly/sequence.fna" with:
146
+ """
147
+ >contig0001
148
+ ATGCGC
149
+ >contig0002
150
+ ATATGC
151
+ """
152
+ When I run `genomer summary contigs`
153
+ Then the exit status should be 0
154
+ And the output should contain:
155
+ """
156
+ +--------+------------+------------+------------+----------+--------+
157
+ | Scaffold Contigs |
158
+ +--------+------------+------------+------------+----------+--------+
159
+ | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
160
+ +--------+------------+------------+------------+----------+--------+
161
+ | 1 | 1 | 6 | 6 | 30.00 | 66.67 |
162
+ | 2 | 15 | 20 | 6 | 30.00 | 33.33 |
163
+ +--------+------------+------------+------------+----------+--------+
164
+ | All | 1 | 20 | 12 | 60.00 | 50.00 |
165
+ +--------+------------+------------+------------+----------+--------+
166
+ """
167
+
168
+ Scenario: A scaffold with two contigs and a gap at the start
169
+ Given I create a new genomer project
170
+ And I write to "assembly/scaffold.yml" with:
171
+ """
172
+ ---
173
+ -
174
+ unresolved:
175
+ length: 8
176
+ -
177
+ sequence:
178
+ source: contig0001
179
+ -
180
+ sequence:
181
+ source: contig0002
182
+ """
183
+ And I write to "assembly/sequence.fna" with:
184
+ """
185
+ >contig0001
186
+ ATGCGC
187
+ >contig0002
188
+ ATATGC
189
+ """
190
+ When I run `genomer summary contigs`
191
+ Then the exit status should be 0
192
+ And the output should contain:
193
+ """
194
+ +--------+------------+------------+------------+----------+--------+
195
+ | Scaffold Contigs |
196
+ +--------+------------+------------+------------+----------+--------+
197
+ | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
198
+ +--------+------------+------------+------------+----------+--------+
199
+ | 1 | 9 | 20 | 12 | 60.00 | 50.00 |
200
+ +--------+------------+------------+------------+----------+--------+
201
+ | All | 9 | 20 | 12 | 60.00 | 50.00 |
202
+ +--------+------------+------------+------------+----------+--------+
203
+ """
204
+
205
+ Scenario: A scaffold with two contigs and a gap at the end
206
+ Given I create a new genomer project
207
+ And I write to "assembly/scaffold.yml" with:
208
+ """
209
+ ---
210
+ -
211
+ sequence:
212
+ source: contig0001
213
+ -
214
+ sequence:
215
+ source: contig0002
216
+ -
217
+ unresolved:
218
+ length: 8
219
+ """
220
+ And I write to "assembly/sequence.fna" with:
221
+ """
222
+ >contig0001
223
+ ATGCGC
224
+ >contig0002
225
+ ATATGC
226
+ """
227
+ When I run `genomer summary contigs`
228
+ Then the exit status should be 0
229
+ And the output should contain:
230
+ """
231
+ +--------+------------+------------+------------+----------+--------+
232
+ | Scaffold Contigs |
233
+ +--------+------------+------------+------------+----------+--------+
234
+ | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
235
+ +--------+------------+------------+------------+----------+--------+
236
+ | 1 | 1 | 12 | 12 | 60.00 | 50.00 |
237
+ +--------+------------+------------+------------+----------+--------+
238
+ | All | 1 | 12 | 12 | 60.00 | 50.00 |
239
+ +--------+------------+------------+------------+----------+--------+
240
+ """
241
+
242
+ Scenario: A scaffold with two contigs containing internal gaps
243
+ Given I create a new genomer project
244
+ And I write to "assembly/scaffold.yml" with:
245
+ """
246
+ ---
247
+ -
248
+ sequence:
249
+ source: contig0001
250
+ -
251
+ sequence:
252
+ source: contig0002
253
+ """
254
+ And I write to "assembly/sequence.fna" with:
255
+ """
256
+ >contig0001
257
+ ATATNNNNGCGC
258
+ >contig0002
259
+ ATATNNNNGCGC
260
+ """
261
+ When I run `genomer summary contigs`
262
+ Then the exit status should be 0
263
+ And the output should contain:
264
+ """
265
+ +--------+------------+------------+------------+----------+--------+
266
+ | Scaffold Contigs |
267
+ +--------+------------+------------+------------+----------+--------+
268
+ | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
269
+ +--------+------------+------------+------------+----------+--------+
270
+ | 1 | 1 | 4 | 4 | 16.67 | 0.00 |
271
+ | 2 | 9 | 16 | 8 | 33.33 | 50.00 |
272
+ | 3 | 21 | 24 | 4 | 16.67 | 100.00 |
273
+ +--------+------------+------------+------------+----------+--------+
274
+ | All | 1 | 24 | 16 | 66.67 | 50.00 |
275
+ +--------+------------+------------+------------+----------+--------+
276
+ """
277
+
278
+ Scenario: A scaffold with two contigs containing internal gaps separated by a gap
279
+ Given I create a new genomer project
280
+ And I write to "assembly/scaffold.yml" with:
281
+ """
282
+ ---
283
+ -
284
+ sequence:
285
+ source: contig0001
286
+ -
287
+ unresolved:
288
+ length: 6
289
+ -
290
+ sequence:
291
+ source: contig0002
292
+ """
293
+ And I write to "assembly/sequence.fna" with:
294
+ """
295
+ >contig0001
296
+ ATATNNNNGCGC
297
+ >contig0002
298
+ ATATNNNNGCGC
299
+ """
300
+ When I run `genomer summary contigs`
301
+ Then the exit status should be 0
302
+ And the output should contain:
303
+ """
304
+ +--------+------------+------------+------------+----------+--------+
305
+ | Scaffold Contigs |
306
+ +--------+------------+------------+------------+----------+--------+
307
+ | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
308
+ +--------+------------+------------+------------+----------+--------+
309
+ | 1 | 1 | 4 | 4 | 13.33 | 0.00 |
310
+ | 2 | 9 | 12 | 4 | 13.33 | 100.00 |
311
+ | 3 | 19 | 22 | 4 | 13.33 | 0.00 |
312
+ | 4 | 27 | 30 | 4 | 13.33 | 100.00 |
313
+ +--------+------------+------------+------------+----------+--------+
314
+ | All | 1 | 30 | 16 | 53.33 | 50.00 |
315
+ +--------+------------+------------+------------+----------+--------+
316
+ """
317
+
318
+ Scenario: Generating CSV output
319
+ Given I create a new genomer project
320
+ And I write to "assembly/scaffold.yml" with:
321
+ """
322
+ ---
323
+ -
324
+ sequence:
325
+ source: contig0001
326
+ -
327
+ sequence:
328
+ source: contig0002
329
+ -
330
+ unresolved:
331
+ length: 8
332
+ """
333
+ And I write to "assembly/sequence.fna" with:
334
+ """
335
+ >contig0001
336
+ ATGCGC
337
+ >contig0002
338
+ ATATGC
339
+ """
340
+ When I run `genomer summary contigs --output=csv`
341
+ Then the exit status should be 0
342
+ And the output should contain:
343
+ """
344
+ contig,start_bp,end_bp,size_bp,size_%,gc_%
345
+ 1,1,12,12,60.00,50.00
346
+ all,1,12,12,60.00,50.00
347
+ """
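The scenarios above imply that a "contig" is a maximal run of non-N bases in the assembled scaffold, reported with 1-based coordinates, a size as a percentage of the whole scaffold, and a GC percentage over the contig's own bases. The following is a minimal Ruby sketch of that calculation, inferred only from the expected tables. The `contig_rows` helper and its hash keys are hypothetical and are not taken from the plugin's source.

```ruby
# Hypothetical helper (not the plugin's implementation): summarise the
# contigs, i.e. runs of non-N bases, found in an assembled scaffold string.
def contig_rows(scaffold)
  total   = scaffold.length
  # Percentages are returned as two-decimal strings; a zero denominator gives 0.00.
  percent = ->(part, whole) { format("%.2f", whole.zero? ? 0.0 : 100.0 * part / whole) }

  raw = []
  scaffold.scan(/[^Nn]+/) do |contig|
    start = Regexp.last_match.begin(0) + 1 # convert 0-based offset to 1-based
    raw << { start: start, stop: start + contig.length - 1,
             size: contig.length, gc: contig.count("GCgc") }
  end

  rows = raw.each_with_index.map do |r, i|
    { contig: i + 1, start: r[:start], stop: r[:stop], size: r[:size],
      size_pct: percent.(r[:size], total), gc_pct: percent.(r[:gc], r[:size]) }
  end

  # The "All" row spans the first contig start to the last contig end and
  # aggregates sizes and GC counts; an all-gap scaffold reports zeros.
  size = raw.sum { |r| r[:size] }
  gc   = raw.sum { |r| r[:gc] }
  rows << { contig: "All",
            start: raw.empty? ? 0 : raw.first[:start],
            stop:  raw.empty? ? 0 : raw.last[:stop],
            size: size,
            size_pct: percent.(size, total),
            gc_pct: percent.(gc, size) }
end

# Two contigs separated by an 8 bp unresolved region, as in the scenario above.
contig_rows("ATGCGC" + "N" * 8 + "ATATGC").each { |row| puts row.values.join(",") }
```

Because contigs are found in the assembled scaffold rather than per source sequence, two adjacent sequences with no gap between them are reported as a single contig, which matches the "two different contigs" scenario.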
@@ -237,3 +237,37 @@ Feature: Producing a summary of the scaffold gaps
237
237
  | 2 | 5 | 11 | 15 | unresolved |
238
238
  +----------+----------+----------+----------+--------------+
239
239
  """
240
+
241
+ @disable-bundler
242
+ Scenario: Generating CSV output
243
+ Given I create a new genomer project
244
+ And I write to "assembly/scaffold.yml" with:
245
+ """
246
+ ---
247
+ -
248
+ sequence:
249
+ source: "contig00001"
250
+ inserts:
251
+ -
252
+ source: "insert_1"
253
+ open: 4
254
+ close: 5
255
+ -
256
+ unresolved:
257
+ length: 5
258
+ """
259
+ And I write to "assembly/sequence.fna" with:
260
+ """
261
+ >contig00001
262
+ ATGNNNATG
263
+ >insert_1
264
+ AAA
265
+ """
266
+ When I run `genomer summary gaps --output=csv`
267
+ Then the exit status should be 0
268
+ And the output should contain:
269
+ """
270
+ number,length,start,end,type
271
+ 1,1,7,7,contig
272
+ 2,5,11,15,unresolved
273
+ """
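The expected CSV in this scenario can be reproduced by hand, which shows how the gap coordinates arise. The Ruby sketch below assumes that `open` and `close` are 1-based inclusive positions replaced by the insert sequence, and that a gap inside a sequence is typed `contig` while an `unresolved` entry is typed `unresolved`. Both assumptions are inferred from the expected output only, and the helper names are hypothetical.

```ruby
# Hypothetical helpers (not the plugin's implementation).

# Replace the 1-based inclusive region open..close with the insert sequence.
def apply_insert(sequence, insert, open, close)
  result = sequence.dup
  result[(open - 1)..(close - 1)] = insert
  result
end

# Walk the scaffold entries in order, recording every gap with 1-based
# scaffold coordinates. Entries are hashes with either :sequence (a string)
# or :unresolved (a length in bp).
def gap_rows(entries)
  rows   = []
  offset = 0
  entries.each do |entry|
    if entry[:unresolved]
      length = entry[:unresolved]
      rows << [length, offset + 1, offset + length, "unresolved"]
      offset += length
    else
      seq = entry[:sequence]
      seq.scan(/[Nn]+/) do |gap|
        start = offset + Regexp.last_match.begin(0) + 1
        rows << [gap.length, start, start + gap.length - 1, "contig"]
      end
      offset += seq.length
    end
  end
  # Prefix each row with its 1-based gap number.
  rows.each_with_index.map { |row, i| [i + 1, *row] }
end

# Render the rows using the header from the scenario above.
def gaps_csv(rows)
  (["number,length,start,end,type"] + rows.map { |row| row.join(",") }).join("\n")
end

filled = apply_insert("ATGNNNATG", "AAA", 4, 5) # the insert fills 2 of the 3 Ns
puts gaps_csv(gap_rows([{ sequence: filled }, { unresolved: 5 }]))
```

Under these assumptions the insert leaves a single N at position 7 of the 10 bp sequence, so the unresolved region occupies positions 11 to 15.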
@@ -0,0 +1,213 @@
1
+ Feature: Producing a summary of the genome
2
+ In order to have an overview of the genome
3
+ A user can use the "genome" command
4
+ to generate a tabular output of the genome
5
+
6
+ Scenario: A scaffold with a single sequence
7
+ Given I create a new genomer project
8
+ And I write to "assembly/scaffold.yml" with:
9
+ """
10
+ ---
11
+ -
12
+ sequence:
13
+ source: contig0001
14
+ """
15
+ And I write to "assembly/sequence.fna" with:
16
+ """
17
+ >contig0001
18
+ ATGC
19
+ """
20
+ When I run `genomer summary genome`
21
+ Then the exit status should be 0
22
+ And the output should contain:
23
+ """
24
+ +----------------+-----------+
25
+ | Scaffold |
26
+ +----------------+-----------+
27
+ | Sequences (#) | 1 |
28
+ | Contigs (#) | 1 |
29
+ | Gaps (#) | 0 |
30
+ +----------------+-----------+
31
+ | Size (bp) | 4 |
32
+ | Sequences (bp) | 4 |
33
+ | Contigs (bp) | 4 |
34
+ | Gaps (bp) | 0 |
35
+ +----------------+-----------+
36
+ | G+C (%) | 50.00 |
37
+ | Sequences (%) | 100.00 |
38
+ | Contigs (%) | 100.00 |
39
+ | Gaps (%) | 0.00 |
40
+ +----------------+-----------+
41
+
42
+ """
43
+
44
+ Scenario: A scaffold with two sequences
45
+ Given I create a new genomer project
46
+ And I write to "assembly/scaffold.yml" with:
47
+ """
48
+ ---
49
+ -
50
+ sequence:
51
+ source: contig0001
52
+ -
53
+ sequence:
54
+ source: contig0002
55
+ """
56
+ And I write to "assembly/sequence.fna" with:
57
+ """
58
+ >contig0001
59
+ ATGC
60
+ >contig0002
61
+ GGGC
62
+ """
63
+ When I run `genomer summary genome`
64
+ Then the exit status should be 0
65
+ And the output should contain:
66
+ """
67
+ +----------------+-----------+
68
+ | Scaffold |
69
+ +----------------+-----------+
70
+ | Sequences (#) | 2 |
71
+ | Contigs (#) | 1 |
72
+ | Gaps (#) | 0 |
73
+ +----------------+-----------+
74
+ | Size (bp) | 8 |
75
+ | Sequences (bp) | 8 |
76
+ | Contigs (bp) | 8 |
77
+ | Gaps (bp) | 0 |
78
+ +----------------+-----------+
79
+ | G+C (%) | 75.00 |
80
+ | Sequences (%) | 100.00 |
81
+ | Contigs (%) | 100.00 |
82
+ | Gaps (%) | 0.00 |
83
+ +----------------+-----------+
84
+
85
+ """
86
+
87
+ Scenario: A scaffold with two sequences and a gap
88
+ Given I create a new genomer project
89
+ And I write to "assembly/scaffold.yml" with:
90
+ """
91
+ ---
92
+ -
93
+ sequence:
94
+ source: contig0001
95
+ -
96
+ unresolved:
97
+ length: 5
98
+ -
99
+ sequence:
100
+ source: contig0002
101
+ """
102
+ And I write to "assembly/sequence.fna" with:
103
+ """
104
+ >contig0001
105
+ ATGC
106
+ >contig0002
107
+ GGGC
108
+ """
109
+ When I run `genomer summary genome`
110
+ Then the exit status should be 0
111
+ And the output should contain:
112
+ """
113
+ +----------------+-----------+
114
+ | Scaffold |
115
+ +----------------+-----------+
116
+ | Sequences (#) | 2 |
117
+ | Contigs (#) | 2 |
118
+ | Gaps (#) | 1 |
119
+ +----------------+-----------+
120
+ | Size (bp) | 13 |
121
+ | Sequences (bp) | 8 |
122
+ | Contigs (bp) | 8 |
123
+ | Gaps (bp) | 5 |
124
+ +----------------+-----------+
125
+ | G+C (%) | 75.00 |
126
+ | Sequences (%) | 61.54 |
127
+ | Contigs (%) | 61.54 |
128
+ | Gaps (%) | 38.46 |
129
+ +----------------+-----------+
130
+
131
+ """
132
+
133
+ Scenario: A scaffold with two sequences containing gaps
134
+ Given I create a new genomer project
135
+ And I write to "assembly/scaffold.yml" with:
136
+ """
137
+ ---
138
+ -
139
+ sequence:
140
+ source: contig0001
141
+ -
142
+ sequence:
143
+ source: contig0002
144
+ """
145
+ And I write to "assembly/sequence.fna" with:
146
+ """
147
+ >contig0001
148
+ AAANNNGGG
149
+ >contig0002
150
+ AAANNNGGG
151
+ """
152
+ When I run `genomer summary genome`
153
+ Then the exit status should be 0
154
+ And the output should contain:
155
+ """
156
+ +----------------+-----------+
157
+ | Scaffold |
158
+ +----------------+-----------+
159
+ | Sequences (#) | 2 |
160
+ | Contigs (#) | 3 |
161
+ | Gaps (#) | 2 |
162
+ +----------------+-----------+
163
+ | Size (bp) | 18 |
164
+ | Sequences (bp) | 18 |
165
+ | Contigs (bp) | 12 |
166
+ | Gaps (bp) | 6 |
167
+ +----------------+-----------+
168
+ | G+C (%) | 50.00 |
169
+ | Sequences (%) | 100.00 |
170
+ | Contigs (%) | 66.67 |
171
+ | Gaps (%) | 33.33 |
172
+ +----------------+-----------+
173
+
174
+ """
175
+
176
+ Scenario: Generating CSV output
177
+ Given I create a new genomer project
178
+ And I write to "assembly/scaffold.yml" with:
179
+ """
180
+ ---
181
+ -
182
+ sequence:
183
+ source: contig0001
184
+ -
185
+ unresolved:
186
+ length: 5
187
+ -
188
+ sequence:
189
+ source: contig0002
190
+ """
191
+ And I write to "assembly/sequence.fna" with:
192
+ """
193
+ >contig0001
194
+ ATGC
195
+ >contig0002
196
+ GGGC
197
+ """
198
+ When I run `genomer summary genome --output=csv`
199
+ Then the exit status should be 0
200
+ And the output should contain:
201
+ """
202
+ sequences_#,2
203
+ contigs_#,2
204
+ gaps_#,1
205
+ size_bp,13
206
+ sequences_bp,8
207
+ contigs_bp,8
208
+ gaps_bp,5
209
+ g+c_%,75.00
210
+ sequences_%,61.54
211
+ contigs_%,61.54
212
+ gaps_%,38.46
213
+ """
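The genome summary figures in these scenarios can be derived from the assembled scaffold alone. The Ruby sketch below is an illustration under stated assumptions, not the plugin's code: `genome_summary` is a hypothetical helper, entries are hashes with either `:sequence` or `:unresolved`, and G+C is computed over contig (non-N) bases, which is the only reading consistent with the 50.00 value in the "sequences containing gaps" scenario.

```ruby
# Hypothetical helper (not the plugin's implementation): compute the genome
# summary values, keyed by the CSV field names used in the scenario above.
def genome_summary(entries)
  sequences = entries.map { |e| e[:sequence] }.compact
  # Unresolved regions are represented as runs of N in the scaffold.
  scaffold  = entries.map { |e| e[:sequence] || "N" * e[:unresolved] }.join
  contigs   = scaffold.scan(/[^Nn]+/)
  gaps      = scaffold.scan(/[Nn]+/)

  size         = scaffold.length
  sequences_bp = sequences.sum(&:length)
  contigs_bp   = contigs.sum(&:length)
  gaps_bp      = gaps.sum(&:length)
  pct = ->(part, whole) { format("%.2f", whole.zero? ? 0.0 : 100.0 * part / whole) }

  {
    "sequences_#"  => sequences.length,
    "contigs_#"    => contigs.length,
    "gaps_#"       => gaps.length,
    "size_bp"      => size,
    "sequences_bp" => sequences_bp,
    "contigs_bp"   => contigs_bp,
    "gaps_bp"      => gaps_bp,
    "g+c_%"        => pct.(scaffold.count("GCgc"), contigs_bp), # GC over non-N bases
    "sequences_%"  => pct.(sequences_bp, size),
    "contigs_%"    => pct.(contigs_bp, size),
    "gaps_%"       => pct.(gaps_bp, size)
  }
end

summary = genome_summary([{ sequence: "ATGC" }, { unresolved: 5 }, { sequence: "GGGC" }])
puts summary.map { |key, value| "#{key},#{value}" }.join("\n")
```

Sequence totals count every base of the source sequences, including internal Ns, while contig totals exclude them. That is why the two figures differ only in the scenario whose sequences contain gaps.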