genomer-plugin-validate 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (43)
  1. data/.gitignore +4 -0
  2. data/Gemfile +4 -0
  3. data/Rakefile +9 -0
  4. data/VERSION +1 -0
  5. data/features/annotations/bad-product-field.feature +91 -0
  6. data/features/annotations/command-line-interface.feature +19 -0
  7. data/features/annotations/duplicate_id.feature +144 -0
  8. data/features/annotations/identical_locations.feature +74 -0
  9. data/features/annotations/incorrect-attributes.feature +135 -0
  10. data/features/annotations/missing_attributes.feature +75 -0
  11. data/features/annotations/name.feature +40 -0
  12. data/features/command-line-interface.feature +36 -0
  13. data/features/support/env.rb +13 -0
  14. data/genomer-plugin-validate.gemspec +28 -0
  15. data/lib/extensions/string.rb +12 -0
  16. data/lib/genomer-plugin-validate.rb +33 -0
  17. data/lib/genomer-plugin-validate/group.rb +17 -0
  18. data/lib/genomer-plugin-validate/group/annotations.rb +20 -0
  19. data/lib/genomer-plugin-validate/validator.rb +27 -0
  20. data/lib/genomer-plugin-validate/validator/bad_product_field.rb +45 -0
  21. data/lib/genomer-plugin-validate/validator/duplicate_coordinates.rb +12 -0
  22. data/lib/genomer-plugin-validate/validator/duplicate_id.rb +11 -0
  23. data/lib/genomer-plugin-validate/validator/gff3_attributes.rb +16 -0
  24. data/lib/genomer-plugin-validate/validator/missing_id.rb +13 -0
  25. data/lib/genomer-plugin-validate/validator/no_name_or_product.rb +13 -0
  26. data/lib/genomer-plugin-validate/validator/uppercase_name.rb +13 -0
  27. data/lib/genomer-plugin-validate/validator/view_attributes.rb +16 -0
  28. data/man/genomer-validate.ronn +100 -0
  29. data/spec/genomer-plugin-validate/group/annotations_spec.rb +18 -0
  30. data/spec/genomer-plugin-validate/group_spec.rb +24 -0
  31. data/spec/genomer-plugin-validate/validator/bad_product_field_spec.rb +93 -0
  32. data/spec/genomer-plugin-validate/validator/duplicate_coordinates_spec.rb +24 -0
  33. data/spec/genomer-plugin-validate/validator/duplicate_id_spec.rb +34 -0
  34. data/spec/genomer-plugin-validate/validator/gff_attributes_spec.rb +32 -0
  35. data/spec/genomer-plugin-validate/validator/missing_id_spec.rb +27 -0
  36. data/spec/genomer-plugin-validate/validator/no_name_or_product_spec.rb +28 -0
  37. data/spec/genomer-plugin-validate/validator/uppercase_name_spec.rb +22 -0
  38. data/spec/genomer-plugin-validate/validator/view_attributes_spec.rb +31 -0
  39. data/spec/genomer-plugin-validate/validator_spec.rb +107 -0
  40. data/spec/genomer-plugin-validate_spec.rb +92 -0
  41. data/spec/spec_helper.rb +35 -0
  42. data/spec/validator_run_matcher.rb +25 -0
  43. metadata +244 -0
data/.gitignore ADDED
@@ -0,0 +1,4 @@
+ *.gem
+ .bundle
+ Gemfile.lock
+ pkg/*
data/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source "http://rubygems.org"
+
+ # Specify your gem's dependencies in genomer-plugin-validate.gemspec
+ gemspec
data/Rakefile ADDED
@@ -0,0 +1,9 @@
+ require "bundler/gem_tasks"
+
+ require 'rspec/core'
+ require 'rspec/core/rake_task'
+ RSpec::Core::RakeTask.new(:spec) do |spec|
+   spec.pattern = FileList['spec/**/*_spec.rb']
+ end
+
+ task :default => :spec
data/VERSION ADDED
@@ -0,0 +1 @@
+ 0.0.1
data/features/annotations/bad-product-field.feature ADDED
@@ -0,0 +1,91 @@
+ Feature: Validating annotation files for bad product fields
+   In order to submit genome annotations
+   A user can use the "annotation" command to find bad product fields
+   to ensure that their annotation file contains valid product information
+
+   Scenario Outline: Validating an annotations file with bad product fields
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAA
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 3 4 . + 1 ID=gene1;product=<product>
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should contain:
+       """
+       Bad product field for 'gene1:' <correction>
+       """
+     Examples:
+       | product              | correction                                       |
+       | hypothetical thing   | start with 'putative' instead of 'hypothetical.' |
+       | Hypothetical thing   | start with 'putative' instead of 'hypothetical.' |
+       | something like       | products ending with 'like' are not allowed.     |
+       | something like.      | products ending with 'like' are not allowed.     |
+       | something-like       | products ending with 'like' are not allowed.     |
+       | something Like       | products ending with 'like' are not allowed.     |
+       | something domain     | products ending with 'domain' are not allowed.   |
+       | something-domain     | products ending with 'domain' are not allowed.   |
+       | something domain.    | products ending with 'domain' are not allowed.   |
+       | something Domain     | products ending with 'domain' are not allowed.   |
+       | something-related    | products ending with 'related' are not allowed.  |
+       | something related    | products ending with 'related' are not allowed.  |
+       | something related.   | products ending with 'related' are not allowed.  |
+       | something Related    | products ending with 'related' are not allowed.  |
+       | something n-term     | 'N-terminal' or variations are not allowed.      |
+       | something N-term     | 'N-terminal' or variations are not allowed.      |
+       | something N-terminal | 'N-terminal' or variations are not allowed.      |
+       | something n-terminal | 'N-terminal' or variations are not allowed.      |
+       | SOMETHING            | all caps product fields are not allowed.         |
+       | SOMETHING PROTEIN    | all caps product fields are not allowed.         |
+       | SOMETHING-PROTEIN    | all caps product fields are not allowed.         |
+
+   Scenario Outline: Validating an annotations file with acceptable product fields
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAA
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 3 4 . + 1 ID=gene1;product=<product>
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should not contain "gene1"
+     Examples:
+       | product                   |
+       | hypothetical protein      |
+       | hypothetical protein.     |
+       | Hypothetical protein      |
+       | Hypothetical protein.     |
+       | transcription termination |
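The Examples tables above effectively specify the product-field rules. The following is a minimal Ruby sketch of checks that would satisfy them. It is inferred from the feature file only, not taken from `validator/bad_product_field.rb` (which is not shown in this section), and the method names `product_problems` and `product_report` are hypothetical.

```ruby
# Hypothetical sketch: rules inferred from the Examples tables in
# bad-product-field.feature, not copied from validator/bad_product_field.rb.
def product_problems(product)
  problems = []
  # Ignore one trailing period, so "something like." is treated as "something like".
  core = product.strip.sub(/\.\z/, '')

  # Anything starting with "hypothetical" other than "hypothetical protein".
  if core =~ /\Ahypothetical\b/i && core !~ /\Ahypothetical protein\z/i
    problems << "start with 'putative' instead of 'hypothetical.'"
  end

  # Products ending in " like", "-like", " domain", "-related", and so on.
  %w(like domain related).each do |word|
    if core =~ /[\s-]#{word}\z/i
      problems << "products ending with '#{word}' are not allowed."
    end
  end

  # "n-term", "N-term", "N-terminal", but not "termination" (no hyphen).
  problems << "'N-terminal' or variations are not allowed." if core =~ /\bn-term/i

  # All caps: at least one uppercase letter and no lowercase letters.
  if core =~ /[A-Z]/ && core !~ /[a-z]/
    problems << "all caps product fields are not allowed."
  end

  problems
end

# Formats each problem the way the feature's expected output does.
def product_report(id, product)
  product_problems(product).map { |msg| "Bad product field for '#{id}:' #{msg}" }
end
```

With these rules, `product_report("gene1", "something like")` would yield the single line the first Scenario Outline expects. Each acceptable product in the second table, including "transcription termination", would produce no messages.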
data/features/annotations/command-line-interface.feature ADDED
@@ -0,0 +1,19 @@
+ Feature: The validate annotations command line interface
+   In order to generate correct genome annotation files
+   A user can use the "validator" plugin at the command line
+   to validate their annotations
+
+   @disable-bundler
+   Scenario: Running with just the 'validate' command
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate`
+     Then the exit status should be 0
+     And the output should contain:
+       """
+       annotations Validate GFF3 annotations file
+       """
data/features/annotations/duplicate_id.feature ADDED
@@ -0,0 +1,144 @@
+ Feature: Validating annotation files for duplicate IDs
+   In order to submit genome annotations
+   A user can use the "annotation" command to detect duplicate IDs
+   to ensure that their annotation file contains no errors
+
+   @disable-bundler
+   Scenario: No duplicate IDs
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 1 3 . + 1 ID=gene1
+       contig1 . gene 4 6 . + 1 ID=gene2
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should not contain:
+       """
+       Duplicate ID
+       """
+
+   @disable-bundler
+   Scenario: Two duplicate IDs
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 1 3 . + 1 ID=gene1
+       contig1 . gene 4 6 . + 1 ID=gene1
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should contain:
+       """
+       Duplicate ID 'gene1'
+       """
+
+   @disable-bundler
+   Scenario: Multiple duplicate IDs
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 1 3 . + 1 ID=gene1;Name=abc
+       contig1 . gene 4 6 . + 1 ID=gene1;Name=abc
+       contig1 . gene 7 9 . + 1 ID=gene2;Name=abc
+       contig1 . gene 10 12 . + 1 ID=gene2;Name=abc
+       contig1 . gene 13 15 . + 1 ID=gene3;Name=abc
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should not contain "gene3"
+     And the output should contain:
+       """
+       Duplicate ID 'gene1'
+       Duplicate ID 'gene2'
+       """
+
+   @disable-bundler
+   Scenario: Two duplicate IDs and a annotations with missing IDs
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 1 3 . + 1 ID=gene1
+       contig1 . gene 4 6 . + 1 ID=gene1
+       contig1 . gene 7 9 . + 1
+       contig1 . gene 10 12 . + 1
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should contain:
+       """
+       Duplicate ID 'gene1'
+       """
+     And the output should not contain:
+       """
+       Duplicate ID ''
+       """
+
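The duplicate-ID scenarios describe a simple rule. Any `ID` that appears on more than one feature is reported once, and features with no `ID` at all are skipped rather than being reported as `Duplicate ID ''`. Below is a minimal Ruby sketch of that rule. It assumes tab-separated GFF3 lines and is not the gem's `validator/duplicate_id.rb`. Both method names are hypothetical.

```ruby
# Hypothetical helper: parse column 9 of a tab-separated GFF3 line into a Hash.
def gff3_attributes(line)
  column = line.chomp.split("\t")[8]
  return {} if column.nil? || column.empty?
  column.split(';').each_with_object({}) do |pair, attrs|
    key, value = pair.split('=', 2)
    attrs[key] = value
  end
end

# Hypothetical sketch of the check described by duplicate_id.feature.
def duplicate_id_messages(lines)
  # Skip the ##gff-version directive and blank lines.
  features = lines.reject { |line| line.start_with?('#') || line.strip.empty? }
  # Features without an ID are dropped, so "Duplicate ID ''" is never reported.
  ids = features.map { |line| gff3_attributes(line)['ID'] }.compact
  # Keep each ID seen more than once, in order of first appearance.
  repeated = ids.group_by { |id| id }.select { |_, group| group.size > 1 }.keys
  repeated.map { |id| "Duplicate ID '#{id}'" }
end
```

For the "Multiple duplicate IDs" input, this sketch would report `gene1` and then `gene2` and never mention `gene3`. Because Ruby hashes preserve insertion order, `group_by` keeps that order stable.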
data/features/annotations/identical_locations.feature ADDED
@@ -0,0 +1,74 @@
+ Feature: Validating annotation files for identical locations
+   In order to submit genome annotations
+   A user can use the "annotation" command to detect identical locations
+   to ensure that their annotation file contains no errors
+
+   @disable-bundler
+   Scenario: Validating an annotations file with two identical locations
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 1 3 . + 1 ID=gene1
+       contig1 . gene 1 3 . + 1 ID=gene2
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should contain:
+       """
+       Identical locations for 'gene1', 'gene2'
+
+       """
+
+   @disable-bundler
+   Scenario: Validating an annotations file with two sets of identical locations
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 1 3 . + 1 ID=gene1
+       contig1 . gene 1 3 . + 1 ID=gene2
+       contig1 . gene 4 6 . + 1 ID=gene3
+       contig1 . gene 4 6 . + 1 ID=gene4
+       contig1 . gene 7 9 . + 1 ID=gene5
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should contain:
+       """
+       Identical locations for 'gene1', 'gene2'
+       Identical locations for 'gene3', 'gene4'
+
+       """
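These scenarios group features that share a location and report the IDs of each group on a single line. Below is a minimal Ruby sketch of that grouping. It assumes that "identical" means the same sequence ID, start, and end (GFF3 columns 1, 4, and 5) and that the input is tab-separated. Whether the gem's `validator/duplicate_coordinates.rb` also compares strand or type is not shown here, and the method name is hypothetical.

```ruby
# Hypothetical sketch of the check described by identical_locations.feature.
# Assumption: "identical" means the same seqid, start and end (columns 1, 4, 5).
def identical_location_messages(lines)
  features = lines.reject { |line| line.start_with?('#') || line.strip.empty? }
  rows = features.map { |line| line.chomp.split("\t") }
  groups = rows.group_by { |cols| [cols[0], cols[3].to_i, cols[4].to_i] }
  groups.values.select { |group| group.size > 1 }.map { |group|
    # Pull the ID out of column 9, e.g. "ID=gene1;Name=abc" gives "gene1".
    ids = group.map { |cols| cols[8].to_s[/(?:\A|;)ID=([^;]*)/, 1] }
    "Identical locations for " + ids.map { |id| "'#{id}'" }.join(', ')
  }
end
```

On the second scenario's annotations, this sketch would produce one line for `gene1`/`gene2` and one for `gene3`/`gene4`, and it would leave `gene5` unreported because no other feature shares its location.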
data/features/annotations/incorrect-attributes.feature ADDED
@@ -0,0 +1,135 @@
+ Feature: Validating annotation files for incorrect attributes
+   In order to submit genome annotations
+   A user can use the "annotation" command to detect incorrect attributes
+   to ensure that their annotation file contains no errors
+
+   Scenario Outline: Validating an annotations file with known GFF attributes
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCCAAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 3 4 . + 1 ID=gene1;Name=gene1;<attribute>=abc
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should not contain "gene1"
+     Examples:
+       | attribute     |
+       | Alias         |
+       | Parent        |
+       | Target        |
+       | Gap           |
+       | Derives_from  |
+       | Note          |
+       | Dbxref        |
+       | Ontology_term |
+       | Is_circular   |
+
+   Scenario Outline: Validating an annotations file with known genomer-plugin-view attributes
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCCAAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 3 4 . + 1 ID=gene1;Name=gene1;<attribute>=abc
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should not contain "gene1"
+     Examples:
+       | attribute    |
+       | product      |
+       | function     |
+       | ec_number    |
+       | feature_type |
+
+   Scenario: Validating an annotations file with unknown GFF3 attributes
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCCAAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 3 4 . + 1 ID=gene1;Name=abc;Unknown_term=something
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations`
+     Then the exit status should be 0
+     And the output should contain:
+       """
+       Illegal GFF3 attribute 'Unknown_term' for 'gene1'
+       """
+
+   Scenario: Validating an annotations file with unknown genomer-plugin-view attributes
+     Given I successfully run `genomer init project`
+     And I cd to "project"
+     And I write to "assembly/scaffold.yml" with:
+       """
+       ---
+       - sequence:
+           source: contig1
+       """
+     And I write to "assembly/sequence.fna" with:
+       """
+       >contig1
+       AAAAATTTTTGGGGGCCCCCAAAAATTTTTGGGGGCCCCC
+       """
+     And I write to "assembly/annotations.gff" with:
+       """
+       ##gff-version 3
+       contig1 . gene 3 4 . + 1 ID=gene1;Name=abc;unknown_lowercase_term=something
+       """
+     And I append to "Gemfile" with:
+       """
+       gem 'genomer-plugin-validate', :path => '../../../'
+       """
+     When I run `genomer validate annotations --validate_for_view`
+     Then the exit status should be 0
+     And the output should contain:
+       """
+       Illegal view attribute 'unknown_lowercase_term' for 'gene1'
+       """
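These scenarios line up with a GFF3 convention. Attribute names that begin with an uppercase letter are reserved by the specification, and lowercase names are free for application use. The Ruby sketch below is built on that split. Uppercase keys must come from the reserved list accepted above, plus `ID` and `Name`. Lowercase keys are checked against the genomer-plugin-view list only when view validation is requested. The constant names, the method name, and the `validate_for_view:` keyword are hypothetical and only mirror the `--validate_for_view` flag. The gem's `gff3_attributes.rb` and `view_attributes.rb` may be organised differently.

```ruby
# Reserved GFF3 attributes: the ones the feature file accepts, plus ID and Name.
GFF3_ATTRIBUTES = %w(ID Name Alias Parent Target Gap Derives_from Note
                     Dbxref Ontology_term Is_circular).freeze
# Lowercase attributes the feature file accepts for genomer-plugin-view.
VIEW_ATTRIBUTES = %w(product function ec_number feature_type).freeze

# Hypothetical sketch: `attributes` is the parsed column-9 Hash of one feature.
def attribute_messages(attributes, validate_for_view: false)
  id = attributes['ID']
  attributes.keys.each_with_object([]) do |key, messages|
    if key =~ /\A[A-Z]/
      # Uppercase names are reserved by GFF3, so only the known ones are legal.
      unless GFF3_ATTRIBUTES.include?(key)
        messages << "Illegal GFF3 attribute '#{key}' for '#{id}'"
      end
    elsif validate_for_view && !VIEW_ATTRIBUTES.include?(key)
      # Lowercase names are only checked when validating for the view plugin.
      messages << "Illegal view attribute '#{key}' for '#{id}'"
    end
  end
end
```

Because lowercase keys pass unless view validation is enabled, the "known genomer-plugin-view attributes" outline, which runs without the flag, stays silent under this sketch, just as the feature expects.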