genomer-plugin-summary 0.0.3 → 0.0.4
- data/VERSION +1 -1
- data/features/contigs.feature +347 -0
- data/features/gaps.feature +34 -0
- data/features/genome.feature +213 -0
- data/features/sequences.feature +39 -8
- data/lib/genomer-plugin-summary/contigs.rb +63 -0
- data/lib/genomer-plugin-summary/enumerators.rb +81 -0
- data/lib/genomer-plugin-summary/format.rb +87 -0
- data/lib/genomer-plugin-summary/gaps.rb +25 -33
- data/lib/genomer-plugin-summary/genome.rb +51 -0
- data/lib/genomer-plugin-summary/metrics.rb +23 -9
- data/lib/genomer-plugin-summary/sequences.rb +44 -70
- data/spec/genomer-plugin-summary_spec/contigs_spec.rb +211 -0
- data/spec/genomer-plugin-summary_spec/enumerators_spec.rb +383 -0
- data/spec/genomer-plugin-summary_spec/format_spec.rb +285 -0
- data/spec/genomer-plugin-summary_spec/gaps_spec.rb +32 -7
- data/spec/genomer-plugin-summary_spec/{scaffold_spec.rb → genome_spec.rb} +26 -7
- data/spec/genomer-plugin-summary_spec/metrics_spec.rb +64 -0
- data/spec/genomer-plugin-summary_spec/sequences_spec.rb +52 -85
- data/spec/spec_helper.rb +1 -1
- metadata +20 -9
- data/features/scaffold.feature +0 -122
- data/lib/genomer-plugin-summary/scaffold.rb +0 -56
data/VERSION
CHANGED
@@ -1 +1 @@
-0.0.3
+0.0.4
data/features/contigs.feature
ADDED
@@ -0,0 +1,347 @@
+Feature: Producing a summary of the scaffold contigs
+In order to have an overview of the contigs in a scaffold
+A user can use the "contigs" command
+to generate the a tabular output of the scaffold contigs
+
+@disable-bundler
+Scenario: An empty scaffold
+Given I create a new genomer project
+And I write to "assembly/scaffold.yml" with:
+"""
+---
+-
+unresolved:
+length: 50
+"""
+And I write to "assembly/sequence.fna" with:
+"""
+>contig0001
+ATGC
+"""
+When I run `genomer summary contigs`
+Then the exit status should be 0
+And the output should contain:
+"""
++--------+------------+------------+------------+----------+--------+
+| Scaffold Contigs |
++--------+------------+------------+------------+----------+--------+
+| Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
++--------+------------+------------+------------+----------+--------+
++--------+------------+------------+------------+----------+--------+
+| All | 0 | 0 | 0 | 0.00 | 0.00 |
++--------+------------+------------+------------+----------+--------+
+"""
+
+Scenario: A scaffold with a single contig
+Given I create a new genomer project
+And I write to "assembly/scaffold.yml" with:
+"""
+---
+-
+sequence:
+source: contig0001
+"""
+And I write to "assembly/sequence.fna" with:
+"""
+>contig0001
+ATGC
+"""
+When I run `genomer summary contigs`
+Then the exit status should be 0
+And the output should contain:
+"""
++--------+------------+------------+------------+----------+--------+
+| Scaffold Contigs |
++--------+------------+------------+------------+----------+--------+
+| Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
++--------+------------+------------+------------+----------+--------+
+| 1 | 1 | 4 | 4 | 100.00 | 50.00 |
++--------+------------+------------+------------+----------+--------+
+| All | 1 | 4 | 4 | 100.00 | 50.00 |
++--------+------------+------------+------------+----------+--------+
+"""
+
+Scenario: A scaffold with a two different contigs
+Given I create a new genomer project
+And I write to "assembly/scaffold.yml" with:
+"""
+---
+-
+sequence:
+source: contig0001
+-
+sequence:
+source: contig0002
+"""
+And I write to "assembly/sequence.fna" with:
+"""
+>contig0001
+ATGCGC
+>contig0002
+ATATGC
+"""
+When I run `genomer summary contigs`
+Then the exit status should be 0
+And the output should contain:
+"""
++--------+------------+------------+------------+----------+--------+
+| Scaffold Contigs |
++--------+------------+------------+------------+----------+--------+
+| Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
++--------+------------+------------+------------+----------+--------+
+| 1 | 1 | 12 | 12 | 100.00 | 50.00 |
++--------+------------+------------+------------+----------+--------+
+| All | 1 | 12 | 12 | 100.00 | 50.00 |
++--------+------------+------------+------------+----------+--------+
+"""
+
+Scenario: A scaffold with a two repeated contigs
+Given I create a new genomer project
+And I write to "assembly/scaffold.yml" with:
+"""
+---
+-
+sequence:
+source: contig0001
+-
+sequence:
+source: contig0001
+"""
+And I write to "assembly/sequence.fna" with:
+"""
+>contig0001
+ATGCGC
+"""
+When I run `genomer summary contigs`
+Then the exit status should be 0
+And the output should contain:
+"""
++--------+------------+------------+------------+----------+--------+
+| Scaffold Contigs |
++--------+------------+------------+------------+----------+--------+
+| Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
++--------+------------+------------+------------+----------+--------+
+| 1 | 1 | 12 | 12 | 100.00 | 66.67 |
++--------+------------+------------+------------+----------+--------+
+| All | 1 | 12 | 12 | 100.00 | 66.67 |
++--------+------------+------------+------------+----------+--------+
+"""
+
+Scenario: A scaffold with a two contigs separated by a gap
+Given I create a new genomer project
+And I write to "assembly/scaffold.yml" with:
+"""
+---
+-
+sequence:
+source: contig0001
+-
+unresolved:
+length: 8
+-
+sequence:
+source: contig0002
+"""
+And I write to "assembly/sequence.fna" with:
+"""
+>contig0001
+ATGCGC
+>contig0002
+ATATGC
+"""
+When I run `genomer summary contigs`
+Then the exit status should be 0
+And the output should contain:
+"""
++--------+------------+------------+------------+----------+--------+
+| Scaffold Contigs |
++--------+------------+------------+------------+----------+--------+
+| Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
++--------+------------+------------+------------+----------+--------+
+| 1 | 1 | 6 | 6 | 30.00 | 66.67 |
+| 2 | 15 | 20 | 6 | 30.00 | 33.33 |
++--------+------------+------------+------------+----------+--------+
+| All | 1 | 20 | 12 | 60.00 | 50.00 |
++--------+------------+------------+------------+----------+--------+
+"""
+
+Scenario: A scaffold with a two contigs and a gap at the start
+Given I create a new genomer project
+And I write to "assembly/scaffold.yml" with:
+"""
+---
+-
+unresolved:
+length: 8
+-
+sequence:
+source: contig0001
+-
+sequence:
+source: contig0002
+"""
+And I write to "assembly/sequence.fna" with:
+"""
+>contig0001
+ATGCGC
+>contig0002
+ATATGC
+"""
+When I run `genomer summary contigs`
+Then the exit status should be 0
+And the output should contain:
+"""
++--------+------------+------------+------------+----------+--------+
+| Scaffold Contigs |
++--------+------------+------------+------------+----------+--------+
+| Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
++--------+------------+------------+------------+----------+--------+
+| 1 | 9 | 20 | 12 | 60.00 | 50.00 |
++--------+------------+------------+------------+----------+--------+
+| All | 9 | 20 | 12 | 60.00 | 50.00 |
++--------+------------+------------+------------+----------+--------+
+"""
+
+Scenario: A scaffold with a two contigs and a gap at the end
+Given I create a new genomer project
+And I write to "assembly/scaffold.yml" with:
+"""
+---
+-
+sequence:
+source: contig0001
+-
+sequence:
+source: contig0002
+-
+unresolved:
+length: 8
+"""
+And I write to "assembly/sequence.fna" with:
+"""
+>contig0001
+ATGCGC
+>contig0002
+ATATGC
+"""
+When I run `genomer summary contigs`
+Then the exit status should be 0
+And the output should contain:
+"""
++--------+------------+------------+------------+----------+--------+
+| Scaffold Contigs |
++--------+------------+------------+------------+----------+--------+
+| Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
++--------+------------+------------+------------+----------+--------+
+| 1 | 1 | 12 | 12 | 60.00 | 50.00 |
++--------+------------+------------+------------+----------+--------+
+| All | 1 | 12 | 12 | 60.00 | 50.00 |
++--------+------------+------------+------------+----------+--------+
+"""
+
+Scenario: A scaffold with two contigs containing internal gaps
+Given I create a new genomer project
+And I write to "assembly/scaffold.yml" with:
+"""
+---
+-
+sequence:
+source: contig0001
+-
+sequence:
+source: contig0002
+"""
+And I write to "assembly/sequence.fna" with:
+"""
+>contig0001
+ATATNNNNGCGC
+>contig0002
+ATATNNNNGCGC
+"""
+When I run `genomer summary contigs`
+Then the exit status should be 0
+And the output should contain:
+"""
++--------+------------+------------+------------+----------+--------+
+| Scaffold Contigs |
++--------+------------+------------+------------+----------+--------+
+| Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
++--------+------------+------------+------------+----------+--------+
+| 1 | 1 | 4 | 4 | 16.67 | 0.00 |
+| 2 | 9 | 16 | 8 | 33.33 | 50.00 |
+| 3 | 21 | 24 | 4 | 16.67 | 100.00 |
++--------+------------+------------+------------+----------+--------+
+| All | 1 | 24 | 16 | 66.67 | 50.00 |
++--------+------------+------------+------------+----------+--------+
+"""
+
+Scenario: A scaffold with two contigs containing internal gaps separated by a gap
+Given I create a new genomer project
+And I write to "assembly/scaffold.yml" with:
+"""
+---
+-
+sequence:
+source: contig0001
+-
+unresolved:
+length: 6
+-
+sequence:
+source: contig0002
+"""
+And I write to "assembly/sequence.fna" with:
+"""
+>contig0001
+ATATNNNNGCGC
+>contig0002
+ATATNNNNGCGC
+"""
+When I run `genomer summary contigs`
+Then the exit status should be 0
+And the output should contain:
+"""
++--------+------------+------------+------------+----------+--------+
+| Scaffold Contigs |
++--------+------------+------------+------------+----------+--------+
+| Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
++--------+------------+------------+------------+----------+--------+
+| 1 | 1 | 4 | 4 | 13.33 | 0.00 |
+| 2 | 9 | 12 | 4 | 13.33 | 100.00 |
+| 3 | 19 | 22 | 4 | 13.33 | 0.00 |
+| 4 | 27 | 30 | 4 | 13.33 | 100.00 |
++--------+------------+------------+------------+----------+--------+
+| All | 1 | 30 | 16 | 53.33 | 50.00 |
++--------+------------+------------+------------+----------+--------+
+"""
+
+Scenario: Generating CSV output
+Given I create a new genomer project
+And I write to "assembly/scaffold.yml" with:
+"""
+---
+-
+sequence:
+source: contig0001
+-
+sequence:
+source: contig0002
+-
+unresolved:
+length: 8
+"""
+And I write to "assembly/sequence.fna" with:
+"""
+>contig0001
+ATGCGC
+>contig0002
+ATATGC
+"""
+When I run `genomer summary contigs --output=csv`
+Then the exit status should be 0
+And the output should contain:
+"""
+contig,start_bp,end_bp,size_bp,size_%,gc_%
+1,1,12,12,60.00,50.00
+all,1,12,12,60.00,50.00
+"""
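The contig scenarios above imply a simple rule: each run of non-N bases in the assembled scaffold is one contig, positions are 1-based and inclusive, Size (%) is relative to the full scaffold length, and GC (%) is relative to the contig itself. The following is a minimal sketch of that rule only. `ContigRow` and `contig_rows` are hypothetical names chosen for illustration and are not taken from the plugin's `contigs.rb`, which this diff does not show.

```ruby
# Hypothetical sketch; names are illustrative and not the plugin's API.
ContigRow = Struct.new(:number, :start, :stop, :size, :size_pct, :gc_pct)

def contig_rows(scaffold)
  rows = []
  pos = 0
  # Each run of non-N characters is treated as one contig.
  while (m = /[^Nn]+/.match(scaffold, pos))
    seq = m[0]
    rows << ContigRow.new(
      rows.size + 1,
      m.begin(0) + 1,                                    # 1-based start
      m.end(0),                                          # 1-based inclusive end
      seq.length,
      (100.0 * seq.length / scaffold.length).round(2),   # share of whole scaffold
      (100.0 * seq.count('GCgc') / seq.length).round(2)  # GC share of this contig
    )
    pos = m.end(0)
  end
  rows
end

# Input modelled on the "two contigs separated by a gap" scenario.
scaffold = 'ATGCGC' + ('N' * 8) + 'ATATGC'
contig_rows(scaffold).each do |r|
  puts format('%d,%d,%d,%d,%.2f,%.2f',
              r.number, r.start, r.stop, r.size, r.size_pct, r.gc_pct)
end
```

With that input the sketch yields two rows whose values match the per-contig rows expected by the scenario (1 to 6 and 15 to 20, each 30.00 % of the scaffold).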
data/features/gaps.feature
CHANGED
@@ -237,3 +237,37 @@ Feature: Producing a summary of the scaffold gaps
 | 2 | 5 | 11 | 15 | unresolved |
 +----------+----------+----------+----------+--------------+
 """
+
+@disable-bundler
+Scenario: Generating CSV output
+Given I create a new genomer project
+And I write to "assembly/scaffold.yml" with:
+"""
+---
+-
+sequence:
+source: "contig00001"
+inserts:
+-
+source: "insert_1"
+open: 4
+close: 5
+-
+unresolved:
+length: 5
+"""
+And I write to "assembly/sequence.fna" with:
+"""
+>contig00001
+ATGNNNATG
+>insert_1
+AAA
+"""
+When I run `genomer summary gaps --output=csv`
+Then the exit status should be 0
+And the output should contain:
+"""
+number,length,start,end,type
+1,1,7,7,contig
+2,5,11,15,unresolved
+"""
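The `--output=csv` scenarios in this diff suggest that CSV column names are derived from the table headings by lowercasing, dropping parentheses, and joining words with underscores (for example `Size (bp)` becomes `size_bp`). The sketch below only illustrates that inferred rule. `csv_header` is a hypothetical helper, and the actual logic in `format.rb` is not shown here and may differ.

```ruby
# Hypothetical helper: derive a CSV column name from a table heading.
# The rule is inferred from the feature expectations in this diff, not
# taken from the plugin's format.rb.
def csv_header(label)
  label.downcase.gsub(/[()]/, '').strip.gsub(/\s+/, '_')
end

# Headings as they appear in the contigs table scenarios.
headings = ['Contig', 'Start (bp)', 'End (bp)', 'Size (bp)', 'Size (%)', 'GC (%)']
puts headings.map { |h| csv_header(h) }.join(',')
```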
data/features/genome.feature
ADDED
@@ -0,0 +1,213 @@
+Feature: Producing a summary of the genome
+In order to have an overview of the genome
+A user can use the "genome" command
+to generate the a tabular output of the genome
+
+Scenario: A scaffold with a single sequence
+Given I create a new genomer project
+And I write to "assembly/scaffold.yml" with:
+"""
+---
+-
+sequence:
+source: contig0001
+"""
+And I write to "assembly/sequence.fna" with:
+"""
+>contig0001
+ATGC
+"""
+When I run `genomer summary genome`
+Then the exit status should be 0
+And the output should contain:
+"""
++----------------+-----------+
+| Scaffold |
++----------------+-----------+
+| Sequences (#) | 1 |
+| Contigs (#) | 1 |
+| Gaps (#) | 0 |
++----------------+-----------+
+| Size (bp) | 4 |
+| Sequences (bp) | 4 |
+| Contigs (bp) | 4 |
+| Gaps (bp) | 0 |
++----------------+-----------+
+| G+C (%) | 50.00 |
+| Sequences (%) | 100.00 |
+| Contigs (%) | 100.00 |
+| Gaps (%) | 0.00 |
++----------------+-----------+
+
+"""
+
+Scenario: A scaffold with a two sequences
+Given I create a new genomer project
+And I write to "assembly/scaffold.yml" with:
+"""
+---
+-
+sequence:
+source: contig0001
+-
+sequence:
+source: contig0002
+"""
+And I write to "assembly/sequence.fna" with:
+"""
+>contig0001
+ATGC
+>contig0002
+GGGC
+"""
+When I run `genomer summary genome`
+Then the exit status should be 0
+And the output should contain:
+"""
++----------------+-----------+
+| Scaffold |
++----------------+-----------+
+| Sequences (#) | 2 |
+| Contigs (#) | 1 |
+| Gaps (#) | 0 |
++----------------+-----------+
+| Size (bp) | 8 |
+| Sequences (bp) | 8 |
+| Contigs (bp) | 8 |
+| Gaps (bp) | 0 |
++----------------+-----------+
+| G+C (%) | 75.00 |
+| Sequences (%) | 100.00 |
+| Contigs (%) | 100.00 |
+| Gaps (%) | 0.00 |
++----------------+-----------+
+
+"""
+
+Scenario: A scaffold with a two sequences and a gap
+Given I create a new genomer project
+And I write to "assembly/scaffold.yml" with:
+"""
+---
+-
+sequence:
+source: contig0001
+-
+unresolved:
+length: 5
+-
+sequence:
+source: contig0002
+"""
+And I write to "assembly/sequence.fna" with:
+"""
+>contig0001
+ATGC
+>contig0002
+GGGC
+"""
+When I run `genomer summary genome`
+Then the exit status should be 0
+And the output should contain:
+"""
++----------------+-----------+
+| Scaffold |
++----------------+-----------+
+| Sequences (#) | 2 |
+| Contigs (#) | 2 |
+| Gaps (#) | 1 |
++----------------+-----------+
+| Size (bp) | 13 |
+| Sequences (bp) | 8 |
+| Contigs (bp) | 8 |
+| Gaps (bp) | 5 |
++----------------+-----------+
+| G+C (%) | 75.00 |
+| Sequences (%) | 61.54 |
+| Contigs (%) | 61.54 |
+| Gaps (%) | 38.46 |
++----------------+-----------+
+
+"""
+
+Scenario: A scaffold with a two sequences containing gaps
+Given I create a new genomer project
+And I write to "assembly/scaffold.yml" with:
+"""
+---
+-
+sequence:
+source: contig0001
+-
+sequence:
+source: contig0002
+"""
+And I write to "assembly/sequence.fna" with:
+"""
+>contig0001
+AAANNNGGG
+>contig0002
+AAANNNGGG
+"""
+When I run `genomer summary genome`
+Then the exit status should be 0
+And the output should contain:
+"""
++----------------+-----------+
+| Scaffold |
++----------------+-----------+
+| Sequences (#) | 2 |
+| Contigs (#) | 3 |
+| Gaps (#) | 2 |
++----------------+-----------+
+| Size (bp) | 18 |
+| Sequences (bp) | 18 |
+| Contigs (bp) | 12 |
+| Gaps (bp) | 6 |
++----------------+-----------+
+| G+C (%) | 50.00 |
+| Sequences (%) | 100.00 |
+| Contigs (%) | 66.67 |
+| Gaps (%) | 33.33 |
++----------------+-----------+
+
+"""
+
+Scenario: Generating CSV output
+Given I create a new genomer project
+And I write to "assembly/scaffold.yml" with:
+"""
+---
+-
+sequence:
+source: contig0001
+-
+unresolved:
+length: 5
+-
+sequence:
+source: contig0002
+"""
+And I write to "assembly/sequence.fna" with:
+"""
+>contig0001
+ATGC
+>contig0002
+GGGC
+"""
+When I run `genomer summary genome --output=csv`
+Then the exit status should be 0
+And the output should contain:
+"""
+sequences_#,2
+contigs_#,2
+gaps_#,1
+size_bp,13
+sequences_bp,8
+contigs_bp,8
+gaps_bp,5
+g+c_%,75.00
+sequences_%,61.54
+contigs_%,61.54
+gaps_%,38.46
+"""
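The genome scenarios distinguish three quantities: sequences (scaffold entries backed by a FASTA record, counted with any internal N characters), contigs (runs of non-N bases in the assembled scaffold), and gaps (runs of N, whether internal or from `unresolved` entries). A minimal sketch of the arithmetic these expectations imply follows. `genome_summary` and its `[type, value]` entry format are hypothetical stand-ins for illustration, not the plugin's `genome.rb` or the genomer scaffold API.

```ruby
# Hypothetical sketch; the entry format and method name are assumptions.
# entries: array of [:sequence, "ACGT..."] or [:unresolved, length] pairs.
def genome_summary(entries)
  scaffold = entries.map { |type, value| type == :sequence ? value : 'N' * value }.join
  sequences = entries.select { |type, _| type == :sequence }.map { |_, seq| seq }

  contigs = scaffold.scan(/[^Nn]+/)  # runs of resolved bases
  gaps    = scaffold.scan(/[Nn]+/)   # runs of unresolved bases

  size        = scaffold.length
  sequence_bp = sequences.sum(&:length)
  contig_bp   = contigs.sum(&:length)
  gap_bp      = gaps.sum(&:length)

  # Percentages are rounded to two places; an empty denominator gives 0.0.
  pct = lambda { |part, whole| whole.zero? ? 0.0 : (100.0 * part / whole).round(2) }

  {
    sequences: sequences.size, contigs: contigs.size, gaps: gaps.size,
    size_bp: size, sequences_bp: sequence_bp, contigs_bp: contig_bp, gaps_bp: gap_bp,
    gc_pct: pct.call(contigs.join.count('GCgc'), contig_bp),  # GC over resolved bases only
    sequences_pct: pct.call(sequence_bp, size),
    contigs_pct: pct.call(contig_bp, size),
    gaps_pct: pct.call(gap_bp, size)
  }
end

# Input modelled on the "two sequences and a gap" scenario.
summary = genome_summary([[:sequence, 'ATGC'], [:unresolved, 5], [:sequence, 'GGGC']])
summary.each { |name, value| puts "#{name},#{value}" }
```

Note that G+C (%) is computed over contig bases only, which is what the "two sequences containing gaps" scenario requires to reach 50.00 rather than 33.33.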