genomer-plugin-summary 0.0.3 → 0.0.4

data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.0.3
1
+ 0.0.4
@@ -0,0 +1,347 @@
1
+ Feature: Producing a summary of the scaffold contigs
2
+ In order to have an overview of the contigs in a scaffold
3
+ A user can use the "contigs" command
4
+ to generate a tabular output of the scaffold contigs
5
+
6
+ @disable-bundler
7
+ Scenario: An empty scaffold
8
+ Given I create a new genomer project
9
+ And I write to "assembly/scaffold.yml" with:
10
+ """
11
+ ---
12
+ -
13
+ unresolved:
14
+ length: 50
15
+ """
16
+ And I write to "assembly/sequence.fna" with:
17
+ """
18
+ >contig0001
19
+ ATGC
20
+ """
21
+ When I run `genomer summary contigs`
22
+ Then the exit status should be 0
23
+ And the output should contain:
24
+ """
25
+ +--------+------------+------------+------------+----------+--------+
26
+ | Scaffold Contigs |
27
+ +--------+------------+------------+------------+----------+--------+
28
+ | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
29
+ +--------+------------+------------+------------+----------+--------+
30
+ +--------+------------+------------+------------+----------+--------+
31
+ | All | 0 | 0 | 0 | 0.00 | 0.00 |
32
+ +--------+------------+------------+------------+----------+--------+
33
+ """
34
+
35
+ Scenario: A scaffold with a single contig
36
+ Given I create a new genomer project
37
+ And I write to "assembly/scaffold.yml" with:
38
+ """
39
+ ---
40
+ -
41
+ sequence:
42
+ source: contig0001
43
+ """
44
+ And I write to "assembly/sequence.fna" with:
45
+ """
46
+ >contig0001
47
+ ATGC
48
+ """
49
+ When I run `genomer summary contigs`
50
+ Then the exit status should be 0
51
+ And the output should contain:
52
+ """
53
+ +--------+------------+------------+------------+----------+--------+
54
+ | Scaffold Contigs |
55
+ +--------+------------+------------+------------+----------+--------+
56
+ | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
57
+ +--------+------------+------------+------------+----------+--------+
58
+ | 1 | 1 | 4 | 4 | 100.00 | 50.00 |
59
+ +--------+------------+------------+------------+----------+--------+
60
+ | All | 1 | 4 | 4 | 100.00 | 50.00 |
61
+ +--------+------------+------------+------------+----------+--------+
62
+ """
63
+
64
+ Scenario: A scaffold with two different contigs
65
+ Given I create a new genomer project
66
+ And I write to "assembly/scaffold.yml" with:
67
+ """
68
+ ---
69
+ -
70
+ sequence:
71
+ source: contig0001
72
+ -
73
+ sequence:
74
+ source: contig0002
75
+ """
76
+ And I write to "assembly/sequence.fna" with:
77
+ """
78
+ >contig0001
79
+ ATGCGC
80
+ >contig0002
81
+ ATATGC
82
+ """
83
+ When I run `genomer summary contigs`
84
+ Then the exit status should be 0
85
+ And the output should contain:
86
+ """
87
+ +--------+------------+------------+------------+----------+--------+
88
+ | Scaffold Contigs |
89
+ +--------+------------+------------+------------+----------+--------+
90
+ | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
91
+ +--------+------------+------------+------------+----------+--------+
92
+ | 1 | 1 | 12 | 12 | 100.00 | 50.00 |
93
+ +--------+------------+------------+------------+----------+--------+
94
+ | All | 1 | 12 | 12 | 100.00 | 50.00 |
95
+ +--------+------------+------------+------------+----------+--------+
96
+ """
97
+
98
+ Scenario: A scaffold with two repeated contigs
99
+ Given I create a new genomer project
100
+ And I write to "assembly/scaffold.yml" with:
101
+ """
102
+ ---
103
+ -
104
+ sequence:
105
+ source: contig0001
106
+ -
107
+ sequence:
108
+ source: contig0001
109
+ """
110
+ And I write to "assembly/sequence.fna" with:
111
+ """
112
+ >contig0001
113
+ ATGCGC
114
+ """
115
+ When I run `genomer summary contigs`
116
+ Then the exit status should be 0
117
+ And the output should contain:
118
+ """
119
+ +--------+------------+------------+------------+----------+--------+
120
+ | Scaffold Contigs |
121
+ +--------+------------+------------+------------+----------+--------+
122
+ | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
123
+ +--------+------------+------------+------------+----------+--------+
124
+ | 1 | 1 | 12 | 12 | 100.00 | 66.67 |
125
+ +--------+------------+------------+------------+----------+--------+
126
+ | All | 1 | 12 | 12 | 100.00 | 66.67 |
127
+ +--------+------------+------------+------------+----------+--------+
128
+ """
129
+
130
+ Scenario: A scaffold with two contigs separated by a gap
131
+ Given I create a new genomer project
132
+ And I write to "assembly/scaffold.yml" with:
133
+ """
134
+ ---
135
+ -
136
+ sequence:
137
+ source: contig0001
138
+ -
139
+ unresolved:
140
+ length: 8
141
+ -
142
+ sequence:
143
+ source: contig0002
144
+ """
145
+ And I write to "assembly/sequence.fna" with:
146
+ """
147
+ >contig0001
148
+ ATGCGC
149
+ >contig0002
150
+ ATATGC
151
+ """
152
+ When I run `genomer summary contigs`
153
+ Then the exit status should be 0
154
+ And the output should contain:
155
+ """
156
+ +--------+------------+------------+------------+----------+--------+
157
+ | Scaffold Contigs |
158
+ +--------+------------+------------+------------+----------+--------+
159
+ | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
160
+ +--------+------------+------------+------------+----------+--------+
161
+ | 1 | 1 | 6 | 6 | 30.00 | 66.67 |
162
+ | 2 | 15 | 20 | 6 | 30.00 | 33.33 |
163
+ +--------+------------+------------+------------+----------+--------+
164
+ | All | 1 | 20 | 12 | 60.00 | 50.00 |
165
+ +--------+------------+------------+------------+----------+--------+
166
+ """
167
+
168
+ Scenario: A scaffold with two contigs and a gap at the start
169
+ Given I create a new genomer project
170
+ And I write to "assembly/scaffold.yml" with:
171
+ """
172
+ ---
173
+ -
174
+ unresolved:
175
+ length: 8
176
+ -
177
+ sequence:
178
+ source: contig0001
179
+ -
180
+ sequence:
181
+ source: contig0002
182
+ """
183
+ And I write to "assembly/sequence.fna" with:
184
+ """
185
+ >contig0001
186
+ ATGCGC
187
+ >contig0002
188
+ ATATGC
189
+ """
190
+ When I run `genomer summary contigs`
191
+ Then the exit status should be 0
192
+ And the output should contain:
193
+ """
194
+ +--------+------------+------------+------------+----------+--------+
195
+ | Scaffold Contigs |
196
+ +--------+------------+------------+------------+----------+--------+
197
+ | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
198
+ +--------+------------+------------+------------+----------+--------+
199
+ | 1 | 9 | 20 | 12 | 60.00 | 50.00 |
200
+ +--------+------------+------------+------------+----------+--------+
201
+ | All | 9 | 20 | 12 | 60.00 | 50.00 |
202
+ +--------+------------+------------+------------+----------+--------+
203
+ """
204
+
205
+ Scenario: A scaffold with two contigs and a gap at the end
206
+ Given I create a new genomer project
207
+ And I write to "assembly/scaffold.yml" with:
208
+ """
209
+ ---
210
+ -
211
+ sequence:
212
+ source: contig0001
213
+ -
214
+ sequence:
215
+ source: contig0002
216
+ -
217
+ unresolved:
218
+ length: 8
219
+ """
220
+ And I write to "assembly/sequence.fna" with:
221
+ """
222
+ >contig0001
223
+ ATGCGC
224
+ >contig0002
225
+ ATATGC
226
+ """
227
+ When I run `genomer summary contigs`
228
+ Then the exit status should be 0
229
+ And the output should contain:
230
+ """
231
+ +--------+------------+------------+------------+----------+--------+
232
+ | Scaffold Contigs |
233
+ +--------+------------+------------+------------+----------+--------+
234
+ | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
235
+ +--------+------------+------------+------------+----------+--------+
236
+ | 1 | 1 | 12 | 12 | 60.00 | 50.00 |
237
+ +--------+------------+------------+------------+----------+--------+
238
+ | All | 1 | 12 | 12 | 60.00 | 50.00 |
239
+ +--------+------------+------------+------------+----------+--------+
240
+ """
241
+
242
+ Scenario: A scaffold with two contigs containing internal gaps
243
+ Given I create a new genomer project
244
+ And I write to "assembly/scaffold.yml" with:
245
+ """
246
+ ---
247
+ -
248
+ sequence:
249
+ source: contig0001
250
+ -
251
+ sequence:
252
+ source: contig0002
253
+ """
254
+ And I write to "assembly/sequence.fna" with:
255
+ """
256
+ >contig0001
257
+ ATATNNNNGCGC
258
+ >contig0002
259
+ ATATNNNNGCGC
260
+ """
261
+ When I run `genomer summary contigs`
262
+ Then the exit status should be 0
263
+ And the output should contain:
264
+ """
265
+ +--------+------------+------------+------------+----------+--------+
266
+ | Scaffold Contigs |
267
+ +--------+------------+------------+------------+----------+--------+
268
+ | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
269
+ +--------+------------+------------+------------+----------+--------+
270
+ | 1 | 1 | 4 | 4 | 16.67 | 0.00 |
271
+ | 2 | 9 | 16 | 8 | 33.33 | 50.00 |
272
+ | 3 | 21 | 24 | 4 | 16.67 | 100.00 |
273
+ +--------+------------+------------+------------+----------+--------+
274
+ | All | 1 | 24 | 16 | 66.67 | 50.00 |
275
+ +--------+------------+------------+------------+----------+--------+
276
+ """
277
+
278
+ Scenario: A scaffold with two contigs containing internal gaps separated by a gap
279
+ Given I create a new genomer project
280
+ And I write to "assembly/scaffold.yml" with:
281
+ """
282
+ ---
283
+ -
284
+ sequence:
285
+ source: contig0001
286
+ -
287
+ unresolved:
288
+ length: 6
289
+ -
290
+ sequence:
291
+ source: contig0002
292
+ """
293
+ And I write to "assembly/sequence.fna" with:
294
+ """
295
+ >contig0001
296
+ ATATNNNNGCGC
297
+ >contig0002
298
+ ATATNNNNGCGC
299
+ """
300
+ When I run `genomer summary contigs`
301
+ Then the exit status should be 0
302
+ And the output should contain:
303
+ """
304
+ +--------+------------+------------+------------+----------+--------+
305
+ | Scaffold Contigs |
306
+ +--------+------------+------------+------------+----------+--------+
307
+ | Contig | Start (bp) | End (bp) | Size (bp) | Size (%) | GC (%) |
308
+ +--------+------------+------------+------------+----------+--------+
309
+ | 1 | 1 | 4 | 4 | 13.33 | 0.00 |
310
+ | 2 | 9 | 12 | 4 | 13.33 | 100.00 |
311
+ | 3 | 19 | 22 | 4 | 13.33 | 0.00 |
312
+ | 4 | 27 | 30 | 4 | 13.33 | 100.00 |
313
+ +--------+------------+------------+------------+----------+--------+
314
+ | All | 1 | 30 | 16 | 53.33 | 50.00 |
315
+ +--------+------------+------------+------------+----------+--------+
316
+ """
317
+
318
+ Scenario: Generating CSV output
319
+ Given I create a new genomer project
320
+ And I write to "assembly/scaffold.yml" with:
321
+ """
322
+ ---
323
+ -
324
+ sequence:
325
+ source: contig0001
326
+ -
327
+ sequence:
328
+ source: contig0002
329
+ -
330
+ unresolved:
331
+ length: 8
332
+ """
333
+ And I write to "assembly/sequence.fna" with:
334
+ """
335
+ >contig0001
336
+ ATGCGC
337
+ >contig0002
338
+ ATATGC
339
+ """
340
+ When I run `genomer summary contigs --output=csv`
341
+ Then the exit status should be 0
342
+ And the output should contain:
343
+ """
344
+ contig,start_bp,end_bp,size_bp,size_%,gc_%
345
+ 1,1,12,12,60.00,50.00
346
+ all,1,12,12,60.00,50.00
347
+ """
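
Note on the expected tables above: the scenarios imply that a "contig" is any run of non-N bases in the assembled scaffold, that coordinates are 1-based and inclusive, that Size (%) is relative to the full scaffold length, and that GC (%) is relative to the contig's own bases. The following is a minimal Python sketch of that arithmetic, inferred only from the expected outputs; it is not the plugin's implementation, and `percent`, `contig_rows`, and `total_row` are hypothetical names.

```python
import re


def percent(part, whole):
    """Format part/whole as a percentage with two decimals (0.00 when whole is 0)."""
    return "%.2f" % (100.0 * part / whole if whole else 0.0)


def gc_count(bases):
    """Count G and C bases, ignoring case."""
    return sum(bases.upper().count(base) for base in "GC")


def contig_rows(scaffold):
    """Describe each run of non-N bases in the scaffold (assumed contig definition)."""
    rows = []
    for number, match in enumerate(re.finditer(r"[^Nn]+", scaffold), start=1):
        bases = match.group()
        rows.append([
            str(number),
            str(match.start() + 1),  # 1-based start
            str(match.end()),        # 1-based inclusive end
            str(len(bases)),
            percent(len(bases), len(scaffold)),
            percent(gc_count(bases), len(bases)),
        ])
    return rows


def total_row(scaffold):
    """Summarise all contigs; every field is zero when the scaffold has no contigs."""
    contigs = list(re.finditer(r"[^Nn]+", scaffold))
    size = sum(len(m.group()) for m in contigs)
    gc = sum(gc_count(m.group()) for m in contigs)
    start = contigs[0].start() + 1 if contigs else 0
    end = contigs[-1].end() if contigs else 0
    return ["all", str(start), str(end), str(size),
            percent(size, len(scaffold)), percent(gc, size)]


# Two sequences with internal gaps, separated by a 6 bp unresolved region.
scaffold = "ATATNNNNGCGC" + "N" * 6 + "ATATNNNNGCGC"
for row in contig_rows(scaffold) + [total_row(scaffold)]:
    print(",".join(row))
```

Under these assumptions the sketch reproduces the figures of the internal-gaps scenarios and the all-zero totals row of the empty-scaffold scenario.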
@@ -237,3 +237,37 @@ Feature: Producing a summary of the scaffold gaps
237
237
  | 2 | 5 | 11 | 15 | unresolved |
238
238
  +----------+----------+----------+----------+--------------+
239
239
  """
240
+
241
+ @disable-bundler
242
+ Scenario: Generating CSV output
243
+ Given I create a new genomer project
244
+ And I write to "assembly/scaffold.yml" with:
245
+ """
246
+ ---
247
+ -
248
+ sequence:
249
+ source: "contig00001"
250
+ inserts:
251
+ -
252
+ source: "insert_1"
253
+ open: 4
254
+ close: 5
255
+ -
256
+ unresolved:
257
+ length: 5
258
+ """
259
+ And I write to "assembly/sequence.fna" with:
260
+ """
261
+ >contig00001
262
+ ATGNNNATG
263
+ >insert_1
264
+ AAA
265
+ """
266
+ When I run `genomer summary gaps --output=csv`
267
+ Then the exit status should be 0
268
+ And the output should contain:
269
+ """
270
+ number,length,start,end,type
271
+ 1,1,7,7,contig
272
+ 2,5,11,15,unresolved
273
+ """
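
Note on the expected CSV above: the figures are consistent with the insert replacing positions 4 to 5 of `contig00001` (1-based, inclusive) with `AAA`, leaving a single N at position 7, followed by the 5 bp unresolved region at positions 11 to 15. The Python sketch below works through that reading. The insert semantics are an assumption inferred from the expected output, and `apply_inserts`, `gap_rows`, and the dictionary layout are hypothetical, not genomer's API.

```python
import re


def apply_inserts(sequence, inserts):
    """Replace the 1-based inclusive range open..close with each insert's bases.

    Inserts are applied right to left so earlier coordinates stay valid.
    """
    for ins in sorted(inserts, key=lambda i: i["open"], reverse=True):
        sequence = sequence[:ins["open"] - 1] + ins["bases"] + sequence[ins["close"]:]
    return sequence


def gap_rows(entries):
    """Return (number, length, start, end, type) for every gap in the scaffold."""
    rows, offset = [], 0
    for entry in entries:
        if entry["kind"] == "unresolved":
            length = entry["length"]
            rows.append((length, offset + 1, offset + length, "unresolved"))
            offset += length
        else:
            bases = apply_inserts(entry["bases"], entry.get("inserts", []))
            # Runs of N left inside a sequence are reported with type "contig".
            for match in re.finditer(r"[Nn]+", bases):
                rows.append((match.end() - match.start(),
                             offset + match.start() + 1,
                             offset + match.end(),
                             "contig"))
            offset += len(bases)
    return [(number,) + row for number, row in enumerate(rows, start=1)]


entries = [
    {"kind": "sequence", "bases": "ATGNNNATG",
     "inserts": [{"open": 4, "close": 5, "bases": "AAA"}]},
    {"kind": "unresolved", "length": 5},
]
print("number,length,start,end,type")
for row in gap_rows(entries):
    print(",".join(str(field) for field in row))
```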
@@ -0,0 +1,213 @@
1
+ Feature: Producing a summary of the genome
2
+ In order to have an overview of the genome
3
+ A user can use the "genome" command
4
+ to generate a tabular output of the genome
5
+
6
+ Scenario: A scaffold with a single sequence
7
+ Given I create a new genomer project
8
+ And I write to "assembly/scaffold.yml" with:
9
+ """
10
+ ---
11
+ -
12
+ sequence:
13
+ source: contig0001
14
+ """
15
+ And I write to "assembly/sequence.fna" with:
16
+ """
17
+ >contig0001
18
+ ATGC
19
+ """
20
+ When I run `genomer summary genome`
21
+ Then the exit status should be 0
22
+ And the output should contain:
23
+ """
24
+ +----------------+-----------+
25
+ | Scaffold |
26
+ +----------------+-----------+
27
+ | Sequences (#) | 1 |
28
+ | Contigs (#) | 1 |
29
+ | Gaps (#) | 0 |
30
+ +----------------+-----------+
31
+ | Size (bp) | 4 |
32
+ | Sequences (bp) | 4 |
33
+ | Contigs (bp) | 4 |
34
+ | Gaps (bp) | 0 |
35
+ +----------------+-----------+
36
+ | G+C (%) | 50.00 |
37
+ | Sequences (%) | 100.00 |
38
+ | Contigs (%) | 100.00 |
39
+ | Gaps (%) | 0.00 |
40
+ +----------------+-----------+
41
+
42
+ """
43
+
44
+ Scenario: A scaffold with two sequences
45
+ Given I create a new genomer project
46
+ And I write to "assembly/scaffold.yml" with:
47
+ """
48
+ ---
49
+ -
50
+ sequence:
51
+ source: contig0001
52
+ -
53
+ sequence:
54
+ source: contig0002
55
+ """
56
+ And I write to "assembly/sequence.fna" with:
57
+ """
58
+ >contig0001
59
+ ATGC
60
+ >contig0002
61
+ GGGC
62
+ """
63
+ When I run `genomer summary genome`
64
+ Then the exit status should be 0
65
+ And the output should contain:
66
+ """
67
+ +----------------+-----------+
68
+ | Scaffold |
69
+ +----------------+-----------+
70
+ | Sequences (#) | 2 |
71
+ | Contigs (#) | 1 |
72
+ | Gaps (#) | 0 |
73
+ +----------------+-----------+
74
+ | Size (bp) | 8 |
75
+ | Sequences (bp) | 8 |
76
+ | Contigs (bp) | 8 |
77
+ | Gaps (bp) | 0 |
78
+ +----------------+-----------+
79
+ | G+C (%) | 75.00 |
80
+ | Sequences (%) | 100.00 |
81
+ | Contigs (%) | 100.00 |
82
+ | Gaps (%) | 0.00 |
83
+ +----------------+-----------+
84
+
85
+ """
86
+
87
+ Scenario: A scaffold with two sequences and a gap
88
+ Given I create a new genomer project
89
+ And I write to "assembly/scaffold.yml" with:
90
+ """
91
+ ---
92
+ -
93
+ sequence:
94
+ source: contig0001
95
+ -
96
+ unresolved:
97
+ length: 5
98
+ -
99
+ sequence:
100
+ source: contig0002
101
+ """
102
+ And I write to "assembly/sequence.fna" with:
103
+ """
104
+ >contig0001
105
+ ATGC
106
+ >contig0002
107
+ GGGC
108
+ """
109
+ When I run `genomer summary genome`
110
+ Then the exit status should be 0
111
+ And the output should contain:
112
+ """
113
+ +----------------+-----------+
114
+ | Scaffold |
115
+ +----------------+-----------+
116
+ | Sequences (#) | 2 |
117
+ | Contigs (#) | 2 |
118
+ | Gaps (#) | 1 |
119
+ +----------------+-----------+
120
+ | Size (bp) | 13 |
121
+ | Sequences (bp) | 8 |
122
+ | Contigs (bp) | 8 |
123
+ | Gaps (bp) | 5 |
124
+ +----------------+-----------+
125
+ | G+C (%) | 75.00 |
126
+ | Sequences (%) | 61.54 |
127
+ | Contigs (%) | 61.54 |
128
+ | Gaps (%) | 38.46 |
129
+ +----------------+-----------+
130
+
131
+ """
132
+
133
+ Scenario: A scaffold with two sequences containing gaps
134
+ Given I create a new genomer project
135
+ And I write to "assembly/scaffold.yml" with:
136
+ """
137
+ ---
138
+ -
139
+ sequence:
140
+ source: contig0001
141
+ -
142
+ sequence:
143
+ source: contig0002
144
+ """
145
+ And I write to "assembly/sequence.fna" with:
146
+ """
147
+ >contig0001
148
+ AAANNNGGG
149
+ >contig0002
150
+ AAANNNGGG
151
+ """
152
+ When I run `genomer summary genome`
153
+ Then the exit status should be 0
154
+ And the output should contain:
155
+ """
156
+ +----------------+-----------+
157
+ | Scaffold |
158
+ +----------------+-----------+
159
+ | Sequences (#) | 2 |
160
+ | Contigs (#) | 3 |
161
+ | Gaps (#) | 2 |
162
+ +----------------+-----------+
163
+ | Size (bp) | 18 |
164
+ | Sequences (bp) | 18 |
165
+ | Contigs (bp) | 12 |
166
+ | Gaps (bp) | 6 |
167
+ +----------------+-----------+
168
+ | G+C (%) | 50.00 |
169
+ | Sequences (%) | 100.00 |
170
+ | Contigs (%) | 66.67 |
171
+ | Gaps (%) | 33.33 |
172
+ +----------------+-----------+
173
+
174
+ """
175
+
176
+ Scenario: Generating CSV output
177
+ Given I create a new genomer project
178
+ And I write to "assembly/scaffold.yml" with:
179
+ """
180
+ ---
181
+ -
182
+ sequence:
183
+ source: contig0001
184
+ -
185
+ unresolved:
186
+ length: 5
187
+ -
188
+ sequence:
189
+ source: contig0002
190
+ """
191
+ And I write to "assembly/sequence.fna" with:
192
+ """
193
+ >contig0001
194
+ ATGC
195
+ >contig0002
196
+ GGGC
197
+ """
198
+ When I run `genomer summary genome --output=csv`
199
+ Then the exit status should be 0
200
+ And the output should contain:
201
+ """
202
+ sequences_#,2
203
+ contigs_#,2
204
+ gaps_#,1
205
+ size_bp,13
206
+ sequences_bp,8
207
+ contigs_bp,8
208
+ gaps_bp,5
209
+ g+c_%,75.00
210
+ sequences_%,61.54
211
+ contigs_%,61.54
212
+ gaps_%,38.46
213
+ """
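
Note on the expected genome summaries above: the numbers are consistent with counting scaffold entries as "sequences", runs of non-N bases as "contigs", and runs of N (whether unresolved regions or Ns inside a sequence) as "gaps", with G+C (%) taken over contig bases only. The Python sketch below reproduces that arithmetic. It is inferred from the expected outputs rather than taken from the plugin, and `genome_summary` and its tuple-based input are hypothetical.

```python
import re


def percent(part, whole):
    """Format part/whole as a percentage with two decimals (0.00 when whole is 0)."""
    return "%.2f" % (100.0 * part / whole if whole else 0.0)


def genome_summary(entries):
    """entries: list of ("sequence", bases) or ("unresolved", length) tuples."""
    sequences = [value for kind, value in entries if kind == "sequence"]
    # Unresolved regions are modelled as runs of N in the assembled scaffold.
    scaffold = "".join(value if kind == "sequence" else "N" * value
                       for kind, value in entries)
    contigs = re.findall(r"[^Nn]+", scaffold)
    gaps = re.findall(r"[Nn]+", scaffold)

    size = len(scaffold)
    sequences_bp = sum(len(s) for s in sequences)
    contigs_bp = sum(len(c) for c in contigs)
    gaps_bp = sum(len(g) for g in gaps)
    gc = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)

    return [
        ("sequences_#", str(len(sequences))),
        ("contigs_#", str(len(contigs))),
        ("gaps_#", str(len(gaps))),
        ("size_bp", str(size)),
        ("sequences_bp", str(sequences_bp)),
        ("contigs_bp", str(contigs_bp)),
        ("gaps_bp", str(gaps_bp)),
        ("g+c_%", percent(gc, contigs_bp)),
        ("sequences_%", percent(sequences_bp, size)),
        ("contigs_%", percent(contigs_bp, size)),
        ("gaps_%", percent(gaps_bp, size)),
    ]


entries = [("sequence", "ATGC"), ("unresolved", 5), ("sequence", "GGGC")]
for key, value in genome_summary(entries):
    print("%s,%s" % (key, value))
```

Note that two adjacent sequences with no N between them count as a single contig under this reading, which matches the "two sequences" scenario (2 sequences, 1 contig).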