genomer-plugin-validate 0.0.1

Files changed (43)
  1. data/.gitignore +4 -0
  2. data/Gemfile +4 -0
  3. data/Rakefile +9 -0
  4. data/VERSION +1 -0
  5. data/features/annotations/bad-product-field.feature +91 -0
  6. data/features/annotations/command-line-interface.feature +19 -0
  7. data/features/annotations/duplicate_id.feature +144 -0
  8. data/features/annotations/identical_locations.feature +74 -0
  9. data/features/annotations/incorrect-attributes.feature +135 -0
  10. data/features/annotations/missing_attributes.feature +75 -0
  11. data/features/annotations/name.feature +40 -0
  12. data/features/command-line-interface.feature +36 -0
  13. data/features/support/env.rb +13 -0
  14. data/genomer-plugin-validate.gemspec +28 -0
  15. data/lib/extensions/string.rb +12 -0
  16. data/lib/genomer-plugin-validate.rb +33 -0
  17. data/lib/genomer-plugin-validate/group.rb +17 -0
  18. data/lib/genomer-plugin-validate/group/annotations.rb +20 -0
  19. data/lib/genomer-plugin-validate/validator.rb +27 -0
  20. data/lib/genomer-plugin-validate/validator/bad_product_field.rb +45 -0
  21. data/lib/genomer-plugin-validate/validator/duplicate_coordinates.rb +12 -0
  22. data/lib/genomer-plugin-validate/validator/duplicate_id.rb +11 -0
  23. data/lib/genomer-plugin-validate/validator/gff3_attributes.rb +16 -0
  24. data/lib/genomer-plugin-validate/validator/missing_id.rb +13 -0
  25. data/lib/genomer-plugin-validate/validator/no_name_or_product.rb +13 -0
  26. data/lib/genomer-plugin-validate/validator/uppercase_name.rb +13 -0
  27. data/lib/genomer-plugin-validate/validator/view_attributes.rb +16 -0
  28. data/man/genomer-validate.ronn +100 -0
  29. data/spec/genomer-plugin-validate/group/annotations_spec.rb +18 -0
  30. data/spec/genomer-plugin-validate/group_spec.rb +24 -0
  31. data/spec/genomer-plugin-validate/validator/bad_product_field_spec.rb +93 -0
  32. data/spec/genomer-plugin-validate/validator/duplicate_coordinates_spec.rb +24 -0
  33. data/spec/genomer-plugin-validate/validator/duplicate_id_spec.rb +34 -0
  34. data/spec/genomer-plugin-validate/validator/gff_attributes_spec.rb +32 -0
  35. data/spec/genomer-plugin-validate/validator/missing_id_spec.rb +27 -0
  36. data/spec/genomer-plugin-validate/validator/no_name_or_product_spec.rb +28 -0
  37. data/spec/genomer-plugin-validate/validator/uppercase_name_spec.rb +22 -0
  38. data/spec/genomer-plugin-validate/validator/view_attributes_spec.rb +31 -0
  39. data/spec/genomer-plugin-validate/validator_spec.rb +107 -0
  40. data/spec/genomer-plugin-validate_spec.rb +92 -0
  41. data/spec/spec_helper.rb +35 -0
  42. data/spec/validator_run_matcher.rb +25 -0
  43. metadata +244 -0
data/.gitignore ADDED
@@ -0,0 +1,4 @@
+ *.gem
+ .bundle
+ Gemfile.lock
+ pkg/*
data/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source "http://rubygems.org"
+
+ # Specify your gem's dependencies in genomer-plugin-validate.gemspec
+ gemspec
data/Rakefile ADDED
@@ -0,0 +1,9 @@
+ require "bundler/gem_tasks"
+
+ require 'rspec/core'
+ require 'rspec/core/rake_task'
+ RSpec::Core::RakeTask.new(:spec) do |spec|
+   spec.pattern = FileList['spec/**/*_spec.rb']
+ end
+
+ task :default => :spec
data/VERSION ADDED
@@ -0,0 +1 @@
+ 0.0.1
data/features/annotations/bad-product-field.feature ADDED
@@ -0,0 +1,91 @@
+ Feature: Validating annotation files for bad product fields
+   In order to submit genome annotations
+   A user can use the "annotation" command to find bad product fields
+   to ensure that their annotation file contains valid product information
+
+   Scenario Outline: Validating an annotations file with bad product fields
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAA
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 3 4 . + 1 ID=gene1;product=<product>
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should contain:
+       """
+       Bad product field for 'gene1:' <correction>
+       """
+     Examples:
+       | product              | correction                                       |
+       | hypothetical thing   | start with 'putative' instead of 'hypothetical.' |
+       | Hypothetical thing   | start with 'putative' instead of 'hypothetical.' |
+       | something like       | products ending with 'like' are not allowed.     |
+       | something like.      | products ending with 'like' are not allowed.     |
+       | something-like       | products ending with 'like' are not allowed.     |
+       | something Like       | products ending with 'like' are not allowed.     |
+       | something domain     | products ending with 'domain' are not allowed.   |
+       | something-domain     | products ending with 'domain' are not allowed.   |
+       | something domain.    | products ending with 'domain' are not allowed.   |
+       | something Domain     | products ending with 'domain' are not allowed.   |
+       | something-related    | products ending with 'related' are not allowed.  |
+       | something related    | products ending with 'related' are not allowed.  |
+       | something related.   | products ending with 'related' are not allowed.  |
+       | something Related    | products ending with 'related' are not allowed.  |
+       | something n-term     | 'N-terminal' or variations are not allowed.      |
+       | something N-term     | 'N-terminal' or variations are not allowed.      |
+       | something N-terminal | 'N-terminal' or variations are not allowed.      |
+       | something n-terminal | 'N-terminal' or variations are not allowed.      |
+       | SOMETHING            | all caps product fields are not allowed.         |
+       | SOMETHING PROTEIN    | all caps product fields are not allowed.         |
+       | SOMETHING-PROTEIN    | all caps product fields are not allowed.         |
+
+   Scenario Outline: Validating an annotations file with acceptable product fields
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAA
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 3 4 . + 1 ID=gene1;product=<product>
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should not contain "gene1"
+     Examples:
+       | product                   |
+       | hypothetical protein      |
+       | hypothetical protein.     |
+       | Hypothetical protein      |
+       | Hypothetical protein.     |
+       | transcription termination |
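The two example tables above amount to a small rule set. The gem's own validator (listed as `validator/bad_product_field.rb` in the file list) is not shown in this part of the diff, so the following is only an independent sketch of rules that would satisfy every row of both tables. The names `PRODUCT_RULES` and `product_problem` are hypothetical and are not taken from the gem.

```ruby
# Hypothetical rule table, derived only from the feature's example rows.
# Each entry pairs a pattern with the correction message the feature expects.
PRODUCT_RULES = [
  # "hypothetical ..." is rejected unless the whole field is "hypothetical protein".
  [/\Ahypothetical(?! protein\.?\z)/i, "start with 'putative' instead of 'hypothetical.'"],
  [/like\.?\z/i,    "products ending with 'like' are not allowed."],
  [/domain\.?\z/i,  "products ending with 'domain' are not allowed."],
  [/related\.?\z/i, "products ending with 'related' are not allowed."],
  [/\bn-term/i,     "'N-terminal' or variations are not allowed."],
  # At least one uppercase letter and no lowercase letters anywhere.
  [/\A[^a-z]*[A-Z][^a-z]*\z/, "all caps product fields are not allowed."]
].freeze

# Returns the first matching correction message, or nil for an acceptable field.
def product_problem(product)
  rule = PRODUCT_RULES.find { |pattern, _message| product =~ pattern }
  rule && rule.last
end
```

Under these assumed rules, `product_problem("transcription termination")` returns `nil`, because the "n-term" pattern requires the hyphen, while `product_problem("something N-term")` returns the 'N-terminal' message.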
data/features/annotations/command-line-interface.feature ADDED
@@ -0,0 +1,19 @@
+ Feature: The validate annotations command line interface
+   In order to generate correct genome annotation files
+   A user can use the "validator" plugin at the command line
+   to validate their annotations
+
+   @disable-bundler
+   Scenario: Running with just the 'validate' command
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate`
+     Then the exit status should be 0
+     And the output should contain:
+       """
+       annotations Validate GFF3 annotations file
+       """
data/features/annotations/duplicate_id.feature ADDED
@@ -0,0 +1,144 @@
+ Feature: Validating annotation files for duplicate IDs
+   In order to submit genome annotations
+   A user can use the "annotation" command to detect duplicate IDs
+   to ensure that their annotation file contains no errors
+
+   @disable-bundler
+   Scenario: No duplicate IDs
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 1 3 . + 1 ID=gene1
+       contig1 . gene 4 6 . + 1 ID=gene2
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should not contain:
+       """
+       Duplicate ID
+       """
+
+   @disable-bundler
+   Scenario: Two duplicate IDs
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 1 3 . + 1 ID=gene1
+       contig1 . gene 4 6 . + 1 ID=gene1
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should contain:
+       """
+       Duplicate ID 'gene1'
+       """
+
+   @disable-bundler
+   Scenario: Multiple duplicate IDs
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 1 3 . + 1 ID=gene1;Name=abc
+       contig1 . gene 4 6 . + 1 ID=gene1;Name=abc
+       contig1 . gene 7 9 . + 1 ID=gene2;Name=abc
+       contig1 . gene 10 12 . + 1 ID=gene2;Name=abc
+       contig1 . gene 13 15 . + 1 ID=gene3;Name=abc
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should not contain "gene3"
+     And the output should contain:
+       """
+       Duplicate ID 'gene1'
+       Duplicate ID 'gene2'
+       """
+
+   @disable-bundler
+   Scenario: Two duplicate IDs and annotations with missing IDs
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 1 3 . + 1 ID=gene1
+       contig1 . gene 4 6 . + 1 ID=gene1
+       contig1 . gene 7 9 . + 1
+       contig1 . gene 10 12 . + 1
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should contain:
+       """
+       Duplicate ID 'gene1'
+       """
+     And the output should not contain:
+       """
+       Duplicate ID ''
+       """
+
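The four scenarios above pin down two behaviours: each repeated ID is reported once, and annotations without an ID are never grouped together as a duplicate of the empty string. A minimal sketch of that logic follows. It is not the gem's `validator/duplicate_id.rb`, and the method name `duplicate_id_messages` and the plain-array input are assumptions made for illustration.

```ruby
# Hypothetical helper: `ids` holds one entry per annotation,
# with nil standing in for an annotation that has no ID attribute.
def duplicate_id_messages(ids)
  counts = Hash.new(0)
  # compact drops the nil entries so missing IDs are never counted.
  ids.compact.each { |id| counts[id] += 1 }
  counts.select { |_id, n| n > 1 }.keys.sort.map { |id| "Duplicate ID '#{id}'" }
end
```

For the last scenario, `duplicate_id_messages(["gene1", "gene1", nil, nil])` returns `["Duplicate ID 'gene1'"]`.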
data/features/annotations/identical_locations.feature ADDED
@@ -0,0 +1,74 @@
+ Feature: Validating annotation files for identical locations
+   In order to submit genome annotations
+   A user can use the "annotation" command to detect identical locations
+   to ensure that their annotation file contains no errors
+
+   @disable-bundler
+   Scenario: Validating an annotations file with two identical locations
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 1 3 . + 1 ID=gene1
+       contig1 . gene 1 3 . + 1 ID=gene2
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should contain:
+       """
+       Identical locations for 'gene1', 'gene2'
+
+       """
+
+   @disable-bundler
+   Scenario: Validating an annotations file with two sets of identical locations
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 1 3 . + 1 ID=gene1
+       contig1 . gene 1 3 . + 1 ID=gene2
+       contig1 . gene 4 6 . + 1 ID=gene3
+       contig1 . gene 4 6 . + 1 ID=gene4
+       contig1 . gene 7 9 . + 1 ID=gene5
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should contain:
+       """
+       Identical locations for 'gene1', 'gene2'
+       Identical locations for 'gene3', 'gene4'
+
+       """
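The expected output in these scenarios suggests grouping annotations by their coordinates and reporting every group with more than one member. The sketch below does that with `group_by`. The `Annotation` struct, the method name `identical_location_messages`, and the choice to compare only start and end positions are assumptions for illustration; the gem's `validator/duplicate_coordinates.rb` may compare other fields as well.

```ruby
# Hypothetical stand-in for a parsed GFF3 record.
Annotation = Struct.new(:id, :start, :stop)

# Groups annotations sharing the same [start, stop] pair and emits one
# message per group of two or more, in order of first appearance.
def identical_location_messages(annotations)
  annotations
    .group_by { |a| [a.start, a.stop] }
    .values
    .select { |group| group.length > 1 }
    .map { |group| "Identical locations for " + group.map { |a| "'#{a.id}'" }.join(", ") }
end
```

With the five genes from the second scenario as input, this yields the two messages for `gene1`/`gene2` and `gene3`/`gene4`, and nothing for `gene5`.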
data/features/annotations/incorrect-attributes.feature ADDED
@@ -0,0 +1,135 @@
+ Feature: Validating annotation files for incorrect attributes
+   In order to submit genome annotations
+   A user can use the "annotation" command to detect incorrect attributes
+   to ensure that their annotation file contains no errors
+
+   Scenario Outline: Validating an annotations file with known GFF attributes
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCCAAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 3 4 . + 1 ID=gene1;Name=gene1;<attribute>=abc
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should not contain "gene1"
+     Examples:
+       | attribute     |
+       | Alias         |
+       | Parent        |
+       | Target        |
+       | Gap           |
+       | Derives_from  |
+       | Note          |
+       | Dbxref        |
+       | Ontology_term |
+       | Is_circular   |
+
+   Scenario Outline: Validating an annotations file with known genomer-plugin-view attributes
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCCAAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 3 4 . + 1 ID=gene1;Name=gene1;<attribute>=abc
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should not contain "gene1"
+     Examples:
+       | attribute    |
+       | product      |
+       | function     |
+       | ec_number    |
+       | feature_type |
+
+   Scenario: Validating an annotations file with unknown GFF3 attributes
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCCAAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 3 4 . + 1 ID=gene1;Name=abc;Unknown_term=something
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should contain:
+       """
+       Illegal GFF3 attribute 'Unknown_term' for 'gene1'
+       """
+
+   Scenario: Validating an annotations file with unknown genomer-plugin-view attributes
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCCAAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 3 4 . + 1 ID=gene1;Name=abc;unknown_lowercase_term=something
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations --validate_for_view`
+     Then the exit status should be 0
+     And the output should contain:
+       """
+       Illegal view attribute 'unknown_lowercase_term' for 'gene1'
+       """
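Taken together, the four scenarios above imply a two-tier check. Attribute keys beginning with an uppercase letter must be one of the names reserved by the GFF3 specification, while lowercase keys are only checked when `--validate_for_view` is passed. The sketch below encodes that reading. The constant and method names are hypothetical, and the lowercase list is copied from the feature's example table, so it may not match the full set accepted by genomer-plugin-view or by the gem's own `gff3_attributes.rb` and `view_attributes.rb` validators.

```ruby
# Attribute names reserved by the GFF3 specification.
GFF3_RESERVED = %w[ID Name Alias Parent Target Gap Derives_from
                   Note Dbxref Ontology_term Is_circular].freeze

# Assumption: taken from the feature's examples table, possibly incomplete.
VIEW_ATTRIBUTES = %w[product function ec_number feature_type].freeze

# Hypothetical helper: returns one message per illegal attribute key.
# Lowercase keys are only inspected when validate_for_view is true.
def attribute_messages(id, keys, validate_for_view = false)
  keys.map do |key|
    if key =~ /\A[A-Z]/
      "Illegal GFF3 attribute '#{key}' for '#{id}'" unless GFF3_RESERVED.include?(key)
    elsif validate_for_view
      "Illegal view attribute '#{key}' for '#{id}'" unless VIEW_ATTRIBUTES.include?(key)
    end
  end.compact
end
```

Keeping the view check behind a flag mirrors the scenarios: an unknown lowercase key passes a plain `genomer validate annotations` run and is only reported in view mode.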