genomer-plugin-summary 0.0.3 → 0.0.4
- data/VERSION +1 -1
- data/features/contigs.feature +347 -0
- data/features/gaps.feature +34 -0
- data/features/genome.feature +213 -0
- data/features/sequences.feature +39 -8
- data/lib/genomer-plugin-summary/contigs.rb +63 -0
- data/lib/genomer-plugin-summary/enumerators.rb +81 -0
- data/lib/genomer-plugin-summary/format.rb +87 -0
- data/lib/genomer-plugin-summary/gaps.rb +25 -33
- data/lib/genomer-plugin-summary/genome.rb +51 -0
- data/lib/genomer-plugin-summary/metrics.rb +23 -9
- data/lib/genomer-plugin-summary/sequences.rb +44 -70
- data/spec/genomer-plugin-summary_spec/contigs_spec.rb +211 -0
- data/spec/genomer-plugin-summary_spec/enumerators_spec.rb +383 -0
- data/spec/genomer-plugin-summary_spec/format_spec.rb +285 -0
- data/spec/genomer-plugin-summary_spec/gaps_spec.rb +32 -7
- data/spec/genomer-plugin-summary_spec/{scaffold_spec.rb → genome_spec.rb} +26 -7
- data/spec/genomer-plugin-summary_spec/metrics_spec.rb +64 -0
- data/spec/genomer-plugin-summary_spec/sequences_spec.rb +52 -85
- data/spec/spec_helper.rb +1 -1
- metadata +20 -9
- data/features/scaffold.feature +0 -122
- data/lib/genomer-plugin-summary/scaffold.rb +0 -56
data/VERSION
CHANGED
@@ -1 +1 @@
-0.0.3
+0.0.4
data/features/contigs.feature
ADDED
@@ -0,0 +1,347 @@
+Feature: Producing a summary of the scaffold contigs
+  In order to have an overview of the contigs in a scaffold
+  A user can use the "contigs" command
+  to generate the a tabular output of the scaffold contigs
+
+  @disable-bundler
+  Scenario: An empty scaffold
+    Given I create a new genomer project
+    And I write to "assembly/scaffold.yml" with:
+      """
+      ---
+      -
+        unresolved:
+          length: 50
+      """
+    And I write to "assembly/sequence.fna" with:
+      """
+      >contig0001
+      ATGC
+      """
+    When I run `genomer summary contigs`
+    Then the exit status should be 0
+    And the output should contain:
+      """
+      +--------+------------+------------+------------+----------+--------+
+      | Scaffold Contigs |
+      +--------+------------+------------+------------+----------+--------+
+      | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
+      +--------+------------+------------+------------+----------+--------+
+      +--------+------------+------------+------------+----------+--------+
+      | All | 0 | 0 | 0 | 0.00 | 0.00 |
+      +--------+------------+------------+------------+----------+--------+
+      """
+
+  Scenario: A scaffold with a single contig
+    Given I create a new genomer project
+    And I write to "assembly/scaffold.yml" with:
+      """
+      ---
+      -
+        sequence:
+          source: contig0001
+      """
+    And I write to "assembly/sequence.fna" with:
+      """
+      >contig0001
+      ATGC
+      """
+    When I run `genomer summary contigs`
+    Then the exit status should be 0
+    And the output should contain:
+      """
+      +--------+------------+------------+------------+----------+--------+
+      | Scaffold Contigs |
+      +--------+------------+------------+------------+----------+--------+
+      | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
+      +--------+------------+------------+------------+----------+--------+
+      | 1 | 1 | 4 | 4 | 100.00 | 50.00 |
+      +--------+------------+------------+------------+----------+--------+
+      | All | 1 | 4 | 4 | 100.00 | 50.00 |
+      +--------+------------+------------+------------+----------+--------+
+      """
+
+  Scenario: A scaffold with a two different contigs
+    Given I create a new genomer project
+    And I write to "assembly/scaffold.yml" with:
+      """
+      ---
+      -
+        sequence:
+          source: contig0001
+      -
+        sequence:
+          source: contig0002
+      """
+    And I write to "assembly/sequence.fna" with:
+      """
+      >contig0001
+      ATGCGC
+      >contig0002
+      ATATGC
+      """
+    When I run `genomer summary contigs`
+    Then the exit status should be 0
+    And the output should contain:
+      """
+      +--------+------------+------------+------------+----------+--------+
+      | Scaffold Contigs |
+      +--------+------------+------------+------------+----------+--------+
+      | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
+      +--------+------------+------------+------------+----------+--------+
+      | 1 | 1 | 12 | 12 | 100.00 | 50.00 |
+      +--------+------------+------------+------------+----------+--------+
+      | All | 1 | 12 | 12 | 100.00 | 50.00 |
+      +--------+------------+------------+------------+----------+--------+
+      """
+
+  Scenario: A scaffold with a two repeated contigs
+    Given I create a new genomer project
+    And I write to "assembly/scaffold.yml" with:
+      """
+      ---
+      -
+        sequence:
+          source: contig0001
+      -
+        sequence:
+          source: contig0001
+      """
+    And I write to "assembly/sequence.fna" with:
+      """
+      >contig0001
+      ATGCGC
+      """
+    When I run `genomer summary contigs`
+    Then the exit status should be 0
+    And the output should contain:
+      """
+      +--------+------------+------------+------------+----------+--------+
+      | Scaffold Contigs |
+      +--------+------------+------------+------------+----------+--------+
+      | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
+      +--------+------------+------------+------------+----------+--------+
+      | 1 | 1 | 12 | 12 | 100.00 | 66.67 |
+      +--------+------------+------------+------------+----------+--------+
+      | All | 1 | 12 | 12 | 100.00 | 66.67 |
+      +--------+------------+------------+------------+----------+--------+
+      """
+
+  Scenario: A scaffold with a two contigs separated by a gap
+    Given I create a new genomer project
+    And I write to "assembly/scaffold.yml" with:
+      """
+      ---
+      -
+        sequence:
+          source: contig0001
+      -
+        unresolved:
+          length: 8
+      -
+        sequence:
+          source: contig0002
+      """
+    And I write to "assembly/sequence.fna" with:
+      """
+      >contig0001
+      ATGCGC
+      >contig0002
+      ATATGC
+      """
+    When I run `genomer summary contigs`
+    Then the exit status should be 0
+    And the output should contain:
+      """
+      +--------+------------+------------+------------+----------+--------+
+      | Scaffold Contigs |
+      +--------+------------+------------+------------+----------+--------+
+      | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
+      +--------+------------+------------+------------+----------+--------+
+      | 1 | 1 | 6 | 6 | 30.00 | 66.67 |
+      | 2 | 15 | 20 | 6 | 30.00 | 33.33 |
+      +--------+------------+------------+------------+----------+--------+
+      | All | 1 | 20 | 12 | 60.00 | 50.00 |
+      +--------+------------+------------+------------+----------+--------+
+      """
+
+  Scenario: A scaffold with a two contigs and a gap at the start
+    Given I create a new genomer project
+    And I write to "assembly/scaffold.yml" with:
+      """
+      ---
+      -
+        unresolved:
+          length: 8
+      -
+        sequence:
+          source: contig0001
+      -
+        sequence:
+          source: contig0002
+      """
+    And I write to "assembly/sequence.fna" with:
+      """
+      >contig0001
+      ATGCGC
+      >contig0002
+      ATATGC
+      """
+    When I run `genomer summary contigs`
+    Then the exit status should be 0
+    And the output should contain:
+      """
+      +--------+------------+------------+------------+----------+--------+
+      | Scaffold Contigs |
+      +--------+------------+------------+------------+----------+--------+
+      | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
+      +--------+------------+------------+------------+----------+--------+
+      | 1 | 9 | 20 | 12 | 60.00 | 50.00 |
+      +--------+------------+------------+------------+----------+--------+
+      | All | 9 | 20 | 12 | 60.00 | 50.00 |
+      +--------+------------+------------+------------+----------+--------+
+      """
+
+  Scenario: A scaffold with a two contigs and a gap at the end
+    Given I create a new genomer project
+    And I write to "assembly/scaffold.yml" with:
+      """
+      ---
+      -
+        sequence:
+          source: contig0001
+      -
+        sequence:
+          source: contig0002
+      -
+        unresolved:
+          length: 8
+      """
+    And I write to "assembly/sequence.fna" with:
+      """
+      >contig0001
+      ATGCGC
+      >contig0002
+      ATATGC
+      """
+    When I run `genomer summary contigs`
+    Then the exit status should be 0
+    And the output should contain:
+      """
+      +--------+------------+------------+------------+----------+--------+
+      | Scaffold Contigs |
+      +--------+------------+------------+------------+----------+--------+
+      | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
+      +--------+------------+------------+------------+----------+--------+
+      | 1 | 1 | 12 | 12 | 60.00 | 50.00 |
+      +--------+------------+------------+------------+----------+--------+
+      | All | 1 | 12 | 12 | 60.00 | 50.00 |
+      +--------+------------+------------+------------+----------+--------+
+      """
+
+  Scenario: A scaffold with two contigs containing internal gaps
+    Given I create a new genomer project
+    And I write to "assembly/scaffold.yml" with:
+      """
+      ---
+      -
+        sequence:
+          source: contig0001
+      -
+        sequence:
+          source: contig0002
+      """
+    And I write to "assembly/sequence.fna" with:
+      """
+      >contig0001
+      ATATNNNNGCGC
+      >contig0002
+      ATATNNNNGCGC
+      """
+    When I run `genomer summary contigs`
+    Then the exit status should be 0
+    And the output should contain:
+      """
+      +--------+------------+------------+------------+----------+--------+
+      | Scaffold Contigs |
+      +--------+------------+------------+------------+----------+--------+
+      | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
+      +--------+------------+------------+------------+----------+--------+
+      | 1 | 1 | 4 | 4 | 16.67 | 0.00 |
+      | 2 | 9 | 16 | 8 | 33.33 | 50.00 |
+      | 3 | 21 | 24 | 4 | 16.67 | 100.00 |
+      +--------+------------+------------+------------+----------+--------+
+      | All | 1 | 24 | 16 | 66.67 | 50.00 |
+      +--------+------------+------------+------------+----------+--------+
+      """
+
+  Scenario: A scaffold with two contigs containing internal gaps separated by a gap
+    Given I create a new genomer project
+    And I write to "assembly/scaffold.yml" with:
+      """
+      ---
+      -
+        sequence:
+          source: contig0001
+      -
+        unresolved:
+          length: 6
+      -
+        sequence:
+          source: contig0002
+      """
+    And I write to "assembly/sequence.fna" with:
+      """
+      >contig0001
+      ATATNNNNGCGC
+      >contig0002
+      ATATNNNNGCGC
+      """
+    When I run `genomer summary contigs`
+    Then the exit status should be 0
+    And the output should contain:
+      """
+      +--------+------------+------------+------------+----------+--------+
+      | Scaffold Contigs |
+      +--------+------------+------------+------------+----------+--------+
+      | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
+      +--------+------------+------------+------------+----------+--------+
+      | 1 | 1 | 4 | 4 | 13.33 | 0.00 |
+      | 2 | 9 | 12 | 4 | 13.33 | 100.00 |
+      | 3 | 19 | 22 | 4 | 13.33 | 0.00 |
+      | 4 | 27 | 30 | 4 | 13.33 | 100.00 |
+      +--------+------------+------------+------------+----------+--------+
+      | All | 1 | 30 | 16 | 53.33 | 50.00 |
+      +--------+------------+------------+------------+----------+--------+
+      """
+
+  Scenario: Generating CSV output
+    Given I create a new genomer project
+    And I write to "assembly/scaffold.yml" with:
+      """
+      ---
+      -
+        sequence:
+          source: contig0001
+      -
+        sequence:
+          source: contig0002
+      -
+        unresolved:
+          length: 8
+      """
+    And I write to "assembly/sequence.fna" with:
+      """
+      >contig0001
+      ATGCGC
+      >contig0002
+      ATATGC
+      """
+    When I run `genomer summary contigs --output=csv`
+    Then the exit status should be 0
+    And the output should contain:
+      """
+      contig,start_bp,end_bp,size_bp,size_%,gc_%
+      1,1,12,12,60.00,50.00
+      all,1,12,12,60.00,50.00
+      """
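The contigs scenarios suggest that a contig is a maximal run of non-N bases in the assembled scaffold, reported with 1-based coordinates and with sizes taken as a percentage of the full scaffold length. The Ruby sketch below illustrates that reading only. `contig_rows` and its hash keys are hypothetical names and are not taken from the gem's `contigs.rb`.

```ruby
# Illustrative sketch, not the gem's implementation.
# Assumption: a contig is a maximal run of non-N characters in the scaffold.
def contig_rows(scaffold)
  total = scaffold.length
  rows  = []
  scaffold.scan(/[^Nn]+/) do |bases|
    first = Regexp.last_match.begin(0) + 1 # convert 0-based offset to 1-based
    gc    = bases.count('GCgc')            # number of G or C characters
    rows << {
      contig:   rows.length + 1,
      start_bp: first,
      end_bp:   first + bases.length - 1,
      size_bp:  bases.length,
      size_pct: (100.0 * bases.length / total).round(2),
      gc_pct:   (100.0 * gc / bases.length).round(2)
    }
  end
  rows
end
```

Under these assumptions, `contig_rows('ATATNNNNGCGC' + 'N' * 6 + 'ATATNNNNGCGC')` gives four rows at 1-4, 9-12, 19-22 and 27-30, each 13.33% of the 30 bp scaffold, which agrees with the scenario for two contigs containing internal gaps separated by a gap.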
data/features/gaps.feature
CHANGED
@@ -237,3 +237,37 @@ Feature: Producing a summary of the scaffold gaps
       | 2 | 5 | 11 | 15 | unresolved |
       +----------+----------+----------+----------+--------------+
       """
+
+  @disable-bundler
+  Scenario: Generating CSV output
+    Given I create a new genomer project
+    And I write to "assembly/scaffold.yml" with:
+      """
+      ---
+      -
+        sequence:
+          source: "contig00001"
+          inserts:
+          -
+            source: "insert_1"
+            open: 4
+            close: 5
+      -
+        unresolved:
+          length: 5
+      """
+    And I write to "assembly/sequence.fna" with:
+      """
+      >contig00001
+      ATGNNNATG
+      >insert_1
+      AAA
+      """
+    When I run `genomer summary gaps --output=csv`
+    Then the exit status should be 0
+    And the output should contain:
+      """
+      number,length,start,end,type
+      1,1,7,7,contig
+      2,5,11,15,unresolved
+      """
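The CSV gaps scenario can be read as: every run of N characters is a gap, labelled `contig` when it lies inside a sequence entry and `unresolved` when it comes from an unresolved scaffold entry. The Ruby sketch below follows that reading. The `[kind, bases]` input shape and the names `gap_rows` and `gaps_csv` are assumptions made for illustration, and the example input assumes the insert replaces positions 4 to 5 of `ATGNNNATG`, giving `ATGAAANATG`.

```ruby
# Illustrative sketch, not the gem's implementation.
# Assumption: `entries` is an ordered list of [kind, bases] pairs, where
# kind is :sequence or :unresolved and bases is the resolved sequence.
def gap_rows(entries)
  rows   = []
  offset = 0 # bases already consumed by earlier scaffold entries
  entries.each do |kind, bases|
    bases.scan(/[Nn]+/) do |run|
      first = offset + Regexp.last_match.begin(0) + 1 # 1-based start
      type  = kind == :unresolved ? 'unresolved' : 'contig'
      rows << [rows.length + 1, run.length, first, first + run.length - 1, type]
    end
    offset += bases.length
  end
  rows
end

# Render the rows with the header used in the CSV scenario.
def gaps_csv(rows)
  (['number,length,start,end,type'] + rows.map { |r| r.join(',') }).join("\n")
end
```

With `[[:sequence, 'ATGAAANATG'], [:unresolved, 'NNNNN']]` as input, this sketch produces the same three CSV lines as the scenario expects. It treats a gap that touches an entry boundary as two separate gaps, which the scenarios here do not exercise.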
data/features/genome.feature
ADDED
@@ -0,0 +1,213 @@
+Feature: Producing a summary of the genome
+  In order to have an overview of the genome
+  A user can use the "genome" command
+  to generate the a tabular output of the genome
+
+  Scenario: A scaffold with a single sequence
+    Given I create a new genomer project
+    And I write to "assembly/scaffold.yml" with:
+      """
+      ---
+      -
+        sequence:
+          source: contig0001
+      """
+    And I write to "assembly/sequence.fna" with:
+      """
+      >contig0001
+      ATGC
+      """
+    When I run `genomer summary genome`
+    Then the exit status should be 0
+    And the output should contain:
+      """
+      +----------------+-----------+
+      | Scaffold |
+      +----------------+-----------+
+      | Sequences (#) | 1 |
+      | Contigs (#) | 1 |
+      | Gaps (#) | 0 |
+      +----------------+-----------+
+      | Size (bp) | 4 |
+      | Sequences (bp) | 4 |
+      | Contigs (bp) | 4 |
+      | Gaps (bp) | 0 |
+      +----------------+-----------+
+      | G+C (%) | 50.00 |
+      | Sequences (%) | 100.00 |
+      | Contigs (%) | 100.00 |
+      | Gaps (%) | 0.00 |
+      +----------------+-----------+
+
+      """
+
+  Scenario: A scaffold with a two sequences
+    Given I create a new genomer project
+    And I write to "assembly/scaffold.yml" with:
+      """
+      ---
+      -
+        sequence:
+          source: contig0001
+      -
+        sequence:
+          source: contig0002
+      """
+    And I write to "assembly/sequence.fna" with:
+      """
+      >contig0001
+      ATGC
+      >contig0002
+      GGGC
+      """
+    When I run `genomer summary genome`
+    Then the exit status should be 0
+    And the output should contain:
+      """
+      +----------------+-----------+
+      | Scaffold |
+      +----------------+-----------+
+      | Sequences (#) | 2 |
+      | Contigs (#) | 1 |
+      | Gaps (#) | 0 |
+      +----------------+-----------+
+      | Size (bp) | 8 |
+      | Sequences (bp) | 8 |
+      | Contigs (bp) | 8 |
+      | Gaps (bp) | 0 |
+      +----------------+-----------+
+      | G+C (%) | 75.00 |
+      | Sequences (%) | 100.00 |
+      | Contigs (%) | 100.00 |
+      | Gaps (%) | 0.00 |
+      +----------------+-----------+
+
+      """
+
+  Scenario: A scaffold with a two sequences and a gap
+    Given I create a new genomer project
+    And I write to "assembly/scaffold.yml" with:
+      """
+      ---
+      -
+        sequence:
+          source: contig0001
+      -
+        unresolved:
+          length: 5
+      -
+        sequence:
+          source: contig0002
+      """
+    And I write to "assembly/sequence.fna" with:
+      """
+      >contig0001
+      ATGC
+      >contig0002
+      GGGC
+      """
+    When I run `genomer summary genome`
+    Then the exit status should be 0
+    And the output should contain:
+      """
+      +----------------+-----------+
+      | Scaffold |
+      +----------------+-----------+
+      | Sequences (#) | 2 |
+      | Contigs (#) | 2 |
+      | Gaps (#) | 1 |
+      +----------------+-----------+
+      | Size (bp) | 13 |
+      | Sequences (bp) | 8 |
+      | Contigs (bp) | 8 |
+      | Gaps (bp) | 5 |
+      +----------------+-----------+
+      | G+C (%) | 75.00 |
+      | Sequences (%) | 61.54 |
+      | Contigs (%) | 61.54 |
+      | Gaps (%) | 38.46 |
+      +----------------+-----------+
+
+      """
+
+  Scenario: A scaffold with a two sequences containing gaps
+    Given I create a new genomer project
+    And I write to "assembly/scaffold.yml" with:
+      """
+      ---
+      -
+        sequence:
+          source: contig0001
+      -
+        sequence:
+          source: contig0002
+      """
+    And I write to "assembly/sequence.fna" with:
+      """
+      >contig0001
+      AAANNNGGG
+      >contig0002
+      AAANNNGGG
+      """
+    When I run `genomer summary genome`
+    Then the exit status should be 0
+    And the output should contain:
+      """
+      +----------------+-----------+
+      | Scaffold |
+      +----------------+-----------+
+      | Sequences (#) | 2 |
+      | Contigs (#) | 3 |
+      | Gaps (#) | 2 |
+      +----------------+-----------+
+      | Size (bp) | 18 |
+      | Sequences (bp) | 18 |
+      | Contigs (bp) | 12 |
+      | Gaps (bp) | 6 |
+      +----------------+-----------+
+      | G+C (%) | 50.00 |
+      | Sequences (%) | 100.00 |
+      | Contigs (%) | 66.67 |
+      | Gaps (%) | 33.33 |
+      +----------------+-----------+
+
+      """
+
+  Scenario: Generating CSV output
+    Given I create a new genomer project
+    And I write to "assembly/scaffold.yml" with:
+      """
+      ---
+      -
+        sequence:
+          source: contig0001
+      -
+        unresolved:
+          length: 5
+      -
+        sequence:
+          source: contig0002
+      """
+    And I write to "assembly/sequence.fna" with:
+      """
+      >contig0001
+      ATGC
+      >contig0002
+      GGGC
+      """
+    When I run `genomer summary genome --output=csv`
+    Then the exit status should be 0
+    And the output should contain:
+      """
+      sequences_#,2
+      contigs_#,2
+      gaps_#,1
+      size_bp,13
+      sequences_bp,8
+      contigs_bp,8
+      gaps_bp,5
+      g+c_%,75.00
+      sequences_%,61.54
+      contigs_%,61.54
+      gaps_%,38.46
+      """
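The genome scenarios distinguish sequences (scaffold entries backed by a source), contigs (runs of non-N bases across the whole scaffold) and gaps (runs of N), and they appear to compute G+C over contig bases only. The Ruby sketch below reproduces the expected figures under those assumptions. `genome_summary` and the `[kind, bases]` input shape are hypothetical and are not taken from the gem's `genome.rb`. The result keys reuse the CSV field names shown in the scenario.

```ruby
# Illustrative sketch, not the gem's implementation.
# Assumption: `entries` is an ordered list of [kind, bases] pairs, where
# kind is :sequence or :unresolved and bases is the resolved sequence.
def genome_summary(entries)
  scaffold  = entries.map { |_, bases| bases }.join
  sequences = entries.select { |kind, _| kind == :sequence }.map { |_, b| b }
  contigs   = scaffold.scan(/[^Nn]+/) # maximal runs of called bases
  gaps      = scaffold.scan(/[Nn]+/)  # maximal runs of N

  size      = scaffold.length
  seq_bp    = sequences.sum(&:length)
  contig_bp = contigs.sum(&:length)
  gap_bp    = gaps.sum(&:length)
  gc        = contigs.join.count('GCgc')

  # Percentage helper that avoids dividing by zero for empty input.
  pct = ->(part, whole) { whole.zero? ? 0.0 : (100.0 * part / whole).round(2) }

  {
    'sequences_#'  => sequences.length,
    'contigs_#'    => contigs.length,
    'gaps_#'       => gaps.length,
    'size_bp'      => size,
    'sequences_bp' => seq_bp,
    'contigs_bp'   => contig_bp,
    'gaps_bp'      => gap_bp,
    'g+c_%'        => pct.call(gc, contig_bp),
    'sequences_%'  => pct.call(seq_bp, size),
    'contigs_%'    => pct.call(contig_bp, size),
    'gaps_%'       => pct.call(gap_bp, size)
  }
end
```

For `[[:sequence, 'ATGC'], [:unresolved, 'NNNNN'], [:sequence, 'GGGC']]` the sketch reports a 13 bp scaffold with 2 contigs, 1 gap, 75.0% G+C and 61.54% contig coverage, in line with the CSV scenario. Two adjacent sequences with no N between them count as a single contig, as in the scenario with two sequences.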