REDItools3 3.3__tar.gz → 3.4__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (30)
  1. reditools3-3.4/PKG-INFO +99 -0
  2. reditools3-3.4/README.md +75 -0
  3. reditools3-3.4/REDItools3.egg-info/PKG-INFO +99 -0
  4. {reditools3-3.3 → reditools3-3.4}/REDItools3.egg-info/SOURCES.txt +1 -0
  5. {reditools3-3.3 → reditools3-3.4}/pyproject.toml +1 -1
  6. {reditools3-3.3 → reditools3-3.4}/reditools/__main__.py +6 -2
  7. {reditools3-3.3 → reditools3-3.4}/reditools/alignment_file.py +1 -2
  8. {reditools3-3.3 → reditools3-3.4}/reditools/analyze.py +13 -2
  9. reditools3-3.4/reditools/annotate.py +126 -0
  10. {reditools3-3.3 → reditools3-3.4}/reditools/reditools.py +26 -20
  11. {reditools3-3.3 → reditools3-3.4}/reditools/rtchecks.py +0 -41
  12. reditools3-3.3/PKG-INFO +0 -36
  13. reditools3-3.3/README.md +0 -13
  14. reditools3-3.3/REDItools3.egg-info/PKG-INFO +0 -36
  15. {reditools3-3.3 → reditools3-3.4}/LICENSE +0 -0
  16. {reditools3-3.3 → reditools3-3.4}/REDItools3.egg-info/dependency_links.txt +0 -0
  17. {reditools3-3.3 → reditools3-3.4}/REDItools3.egg-info/requires.txt +0 -0
  18. {reditools3-3.3 → reditools3-3.4}/REDItools3.egg-info/top_level.txt +0 -0
  19. {reditools3-3.3 → reditools3-3.4}/reditools/__init__.py +0 -0
  20. {reditools3-3.3 → reditools3-3.4}/reditools/alignment_manager.py +0 -0
  21. {reditools3-3.3 → reditools3-3.4}/reditools/compiled_position.py +0 -0
  22. {reditools3-3.3 → reditools3-3.4}/reditools/compiled_reads.py +0 -0
  23. {reditools3-3.3 → reditools3-3.4}/reditools/fasta_file.py +0 -0
  24. {reditools3-3.3 → reditools3-3.4}/reditools/file_utils.py +0 -0
  25. {reditools3-3.3 → reditools3-3.4}/reditools/homopolymerics.py +0 -0
  26. {reditools3-3.3 → reditools3-3.4}/reditools/index.py +0 -0
  27. {reditools3-3.3 → reditools3-3.4}/reditools/logger.py +0 -0
  28. {reditools3-3.3 → reditools3-3.4}/reditools/region.py +0 -0
  29. {reditools3-3.3 → reditools3-3.4}/reditools/utils.py +0 -0
  30. {reditools3-3.3 → reditools3-3.4}/setup.cfg +0 -0
@@ -0,0 +1,99 @@
+ Metadata-Version: 2.4
+ Name: REDItools3
+ Version: 3.4
+ Author: Ernesto Picardi
+ Author-email: Adam Handen <adam.handen@gmail.com>
+ Project-URL: homepage, https://github.com/BioinfoUNIBA/REDItools3
+ Project-URL: repository, https://github.com/BioinfoUNIBA/REDItools3
+ Project-URL: issues, https://github.com/BioinfoUNIBA/REDItools3/issues
+ Keywords: bioinformatics,RNA,RNA-editing
+ Classifier: Development Status :: 5 - Production/Stable
+ Classifier: Intended Audience :: Developers
+ Classifier: Intended Audience :: Science/Research
+ Classifier: License :: OSI Approved :: GNU General Public License (GPL)
+ Classifier: Operating System :: MacOS :: MacOS X
+ Classifier: Operating System :: Unix
+ Classifier: Programming Language :: Python :: 3.7
+ Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
+ Requires-Python: >=3.7
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: pysam>=0.22.0
+ Requires-Dist: sortedcontainers>=2.4.0
+ Dynamic: license-file
+
+ # REDItools3
+ A new REDItools implementation to speed-up the RNA editing profiling in massive RNAseq data
+
+ # Installation
+ Install from PyPi.
+ `pip install REDItools3`
+
+ Use the whl file under the dist directory.
+ `pip install dist/reditools-0.1-py3-none-any.whl`
+
+ # Usage
+ Once installed, reditools can be run from the commandline.
+ `python -m reditools`
+
+ ## Tools
+
+ ### analyze
+ This is the core reditools function: detecting editing events from one or more BAM files.
+
+ The output is a tab separated table with these columns:
+ | Field | Description |
+ | --- | --- |
+ | Region | Chromosome or contig |
+ | Position | Position in the region |
+ | Reference | Base from the reference sequence |
+ | Strand | DNA strand (+, -, or \*) |
+ | Coverage-q30 | How many reads had a quality of at least 30 |
+ | MeanQ | Mean read quality |
+ | BaseCount[A,C,G,T] | Total count of each base found |
+ | AllSubs | All the detected substitutions |
+ | Frequency | Ratio of non-reference bases to reference bases |
+ | gCoverage-q30 | Genomic Coverage-q30 (see `annotate`) |
+ | gMeanQ | Genomic MeanQ (see `annotate`) |
+ | gBaseCount[A,C,G,T] | Genomic BaseCount (see `annotate`) |
+ | gAllSubs | Genomic variants (see `annotate`) |
+ | gFrequency | Genomic variant frequency (see `annotate`) |
+
+ The last 5 columns will always be blank (`-`). They are reserved for output
+ from the `annotate` tool.
+
+ ### annotate
+ Annotate RNA editing output with variant detection from genomic data.
+
+ `annotate` takes two reditools output files and fills in the last five columns
+ of the first file with positional matches from the second.
+
+ For example, this RNA file:
+ ```
+ Region Position Reference Strand Coverage-q30 MeanQ BaseCount[A,C,G,T] AllSubs Frequency gCoverage-q30 gMeanQ gBaseCount[A,C,G,T] gAllSubs gFrequency
+ chr1 1115715 C * 2 38.00 [0, 2, 0, 0] - 0.00 - - - - -
+ chr1 1115716 A * 2 38.00 [2, 0, 0, 0] - 0.00 - - - - -
+ ```
+
+ With this DNA file:
+ ```
+ Region Position Reference Strand Coverage-q30 MeanQ BaseCount[A,C,G,T] AllSubs Frequency gCoverage-q30 gMeanQ gBaseCount[A,C,G,T] gAllSubs gFrequency
+ chr1 1115716 A * 2 38.00 [2, 0, 0, 0] - 0.00 - - - - -
+ chr1 1115717 C * 2 38.00 [0, 2, 0, 0] - 0.00 - - - - -
+ ```
+
+ Produces:
+ ```
+ Region Position Reference Strand Coverage-q30 MeanQ BaseCount[A,C,G,T] AllSubs Frequency gCoverage-q30 gMeanQ gBaseCount[A,C,G,T] gAllSubs gFrequency
+ chr1 1115715 C * 2 38.00 [0, 2, 0, 0] - 0.00 - - - - -
+ chr1 1115716 A * 2 38.00 [2, 0, 0, 0] - 0.00 2 38.00 [2, 0, 0, 0] - 0.00
+ ```
+
+ ### find-repeats
+ Identify repetitive elements in a FASTQ file.
+
+ ### index
+ Compute RNA editing index from reditools `analyze` output
+ ([PMID: 31636457](https://pubmed.ncbi.nlm.nih.gov/31636457/)).
+ The `index` tool computes the editing indices for all possible variants, not
+ just A-to-I (listed as A-G in the output).
@@ -0,0 +1,75 @@
+ # REDItools3
+ A new REDItools implementation to speed-up the RNA editing profiling in massive RNAseq data
+
+ # Installation
+ Install from PyPi.
+ `pip install REDItools3`
+
+ Use the whl file under the dist directory.
+ `pip install dist/reditools-0.1-py3-none-any.whl`
+
+ # Usage
+ Once installed, reditools can be run from the commandline.
+ `python -m reditools`
+
+ ## Tools
+
+ ### analyze
+ This is the core reditools function: detecting editing events from one or more BAM files.
+
+ The output is a tab separated table with these columns:
+ | Field | Description |
+ | --- | --- |
+ | Region | Chromosome or contig |
+ | Position | Position in the region |
+ | Reference | Base from the reference sequence |
+ | Strand | DNA strand (+, -, or \*) |
+ | Coverage-q30 | How many reads had a quality of at least 30 |
+ | MeanQ | Mean read quality |
+ | BaseCount[A,C,G,T] | Total count of each base found |
+ | AllSubs | All the detected substitutions |
+ | Frequency | Ratio of non-reference bases to reference bases |
+ | gCoverage-q30 | Genomic Coverage-q30 (see `annotate`) |
+ | gMeanQ | Genomic MeanQ (see `annotate`) |
+ | gBaseCount[A,C,G,T] | Genomic BaseCount (see `annotate`) |
+ | gAllSubs | Genomic variants (see `annotate`) |
+ | gFrequency | Genomic variant frequency (see `annotate`) |
+
+ The last 5 columns will always be blank (`-`). They are reserved for output
+ from the `annotate` tool.
+
+ ### annotate
+ Annotate RNA editing output with variant detection from genomic data.
+
+ `annotate` takes two reditools output files and fills in the last five columns
+ of the first file with positional matches from the second.
+
+ For example, this RNA file:
+ ```
+ Region Position Reference Strand Coverage-q30 MeanQ BaseCount[A,C,G,T] AllSubs Frequency gCoverage-q30 gMeanQ gBaseCount[A,C,G,T] gAllSubs gFrequency
+ chr1 1115715 C * 2 38.00 [0, 2, 0, 0] - 0.00 - - - - -
+ chr1 1115716 A * 2 38.00 [2, 0, 0, 0] - 0.00 - - - - -
+ ```
+
+ With this DNA file:
+ ```
+ Region Position Reference Strand Coverage-q30 MeanQ BaseCount[A,C,G,T] AllSubs Frequency gCoverage-q30 gMeanQ gBaseCount[A,C,G,T] gAllSubs gFrequency
+ chr1 1115716 A * 2 38.00 [2, 0, 0, 0] - 0.00 - - - - -
+ chr1 1115717 C * 2 38.00 [0, 2, 0, 0] - 0.00 - - - - -
+ ```
+
+ Produces:
+ ```
+ Region Position Reference Strand Coverage-q30 MeanQ BaseCount[A,C,G,T] AllSubs Frequency gCoverage-q30 gMeanQ gBaseCount[A,C,G,T] gAllSubs gFrequency
+ chr1 1115715 C * 2 38.00 [0, 2, 0, 0] - 0.00 - - - - -
+ chr1 1115716 A * 2 38.00 [2, 0, 0, 0] - 0.00 2 38.00 [2, 0, 0, 0] - 0.00
+ ```
+
+ ### find-repeats
+ Identify repetitive elements in a FASTQ file.
+
+ ### index
+ Compute RNA editing index from reditools `analyze` output
+ ([PMID: 31636457](https://pubmed.ncbi.nlm.nih.gov/31636457/)).
+ The `index` tool computes the editing indices for all possible variants, not
+ just A-to-I (listed as A-G in the output).
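The column layout described in the README above can be read back with the standard `csv` module. The following is a minimal sketch, not part of REDItools3: `parse_base_counts` and `read_table` are hypothetical helper names, and it assumes the file is tab-separated with the header shown in the README and that the base-count columns hold a bracketed list of four integers.

```python
import ast
import csv
import io

HEADER = [
    'Region', 'Position', 'Reference', 'Strand', 'Coverage-q30', 'MeanQ',
    'BaseCount[A,C,G,T]', 'AllSubs', 'Frequency', 'gCoverage-q30', 'gMeanQ',
    'gBaseCount[A,C,G,T]', 'gAllSubs', 'gFrequency',
]


def parse_base_counts(field):
    """Turn '[2, 0, 0, 0]' into a per-base dict; the blank marker '-' gives None."""
    if field == '-':
        return None
    return dict(zip('ACGT', ast.literal_eval(field)))


def read_table(stream):
    """Yield one dict per data row, with both base-count columns parsed."""
    for row in csv.DictReader(stream, delimiter='\t'):
        for column in ('BaseCount[A,C,G,T]', 'gBaseCount[A,C,G,T]'):
            row[column] = parse_base_counts(row[column])
        yield row


# Example input built from the annotated row shown in the README (assumed tab-separated).
example = '\n'.join([
    '\t'.join(HEADER),
    '\t'.join(['chr1', '1115716', 'A', '*', '2', '38.00', '[2, 0, 0, 0]',
               '-', '0.00', '2', '38.00', '[2, 0, 0, 0]', '-', '0.00']),
]) + '\n'
rows = list(read_table(io.StringIO(example)))
```

Rows that `annotate` could not match keep `-` in the last five columns, which this sketch maps to `None` for the genomic base counts.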
@@ -0,0 +1,99 @@
+ Metadata-Version: 2.4
+ Name: REDItools3
+ Version: 3.4
+ Author: Ernesto Picardi
+ Author-email: Adam Handen <adam.handen@gmail.com>
+ Project-URL: homepage, https://github.com/BioinfoUNIBA/REDItools3
+ Project-URL: repository, https://github.com/BioinfoUNIBA/REDItools3
+ Project-URL: issues, https://github.com/BioinfoUNIBA/REDItools3/issues
+ Keywords: bioinformatics,RNA,RNA-editing
+ Classifier: Development Status :: 5 - Production/Stable
+ Classifier: Intended Audience :: Developers
+ Classifier: Intended Audience :: Science/Research
+ Classifier: License :: OSI Approved :: GNU General Public License (GPL)
+ Classifier: Operating System :: MacOS :: MacOS X
+ Classifier: Operating System :: Unix
+ Classifier: Programming Language :: Python :: 3.7
+ Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
+ Requires-Python: >=3.7
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: pysam>=0.22.0
+ Requires-Dist: sortedcontainers>=2.4.0
+ Dynamic: license-file
+
+ # REDItools3
+ A new REDItools implementation to speed-up the RNA editing profiling in massive RNAseq data
+
+ # Installation
+ Install from PyPi.
+ `pip install REDItools3`
+
+ Use the whl file under the dist directory.
+ `pip install dist/reditools-0.1-py3-none-any.whl`
+
+ # Usage
+ Once installed, reditools can be run from the commandline.
+ `python -m reditools`
+
+ ## Tools
+
+ ### analyze
+ This is the core reditools function: detecting editing events from one or more BAM files.
+
+ The output is a tab separated table with these columns:
+ | Field | Description |
+ | --- | --- |
+ | Region | Chromosome or contig |
+ | Position | Position in the region |
+ | Reference | Base from the reference sequence |
+ | Strand | DNA strand (+, -, or \*) |
+ | Coverage-q30 | How many reads had a quality of at least 30 |
+ | MeanQ | Mean read quality |
+ | BaseCount[A,C,G,T] | Total count of each base found |
+ | AllSubs | All the detected substitutions |
+ | Frequency | Ratio of non-reference bases to reference bases |
+ | gCoverage-q30 | Genomic Coverage-q30 (see `annotate`) |
+ | gMeanQ | Genomic MeanQ (see `annotate`) |
+ | gBaseCount[A,C,G,T] | Genomic BaseCount (see `annotate`) |
+ | gAllSubs | Genomic variants (see `annotate`) |
+ | gFrequency | Genomic variant frequency (see `annotate`) |
+
+ The last 5 columns will always be blank (`-`). They are reserved for output
+ from the `annotate` tool.
+
+ ### annotate
+ Annotate RNA editing output with variant detection from genomic data.
+
+ `annotate` takes two reditools output files and fills in the last five columns
+ of the first file with positional matches from the second.
+
+ For example, this RNA file:
+ ```
+ Region Position Reference Strand Coverage-q30 MeanQ BaseCount[A,C,G,T] AllSubs Frequency gCoverage-q30 gMeanQ gBaseCount[A,C,G,T] gAllSubs gFrequency
+ chr1 1115715 C * 2 38.00 [0, 2, 0, 0] - 0.00 - - - - -
+ chr1 1115716 A * 2 38.00 [2, 0, 0, 0] - 0.00 - - - - -
+ ```
+
+ With this DNA file:
+ ```
+ Region Position Reference Strand Coverage-q30 MeanQ BaseCount[A,C,G,T] AllSubs Frequency gCoverage-q30 gMeanQ gBaseCount[A,C,G,T] gAllSubs gFrequency
+ chr1 1115716 A * 2 38.00 [2, 0, 0, 0] - 0.00 - - - - -
+ chr1 1115717 C * 2 38.00 [0, 2, 0, 0] - 0.00 - - - - -
+ ```
+
+ Produces:
+ ```
+ Region Position Reference Strand Coverage-q30 MeanQ BaseCount[A,C,G,T] AllSubs Frequency gCoverage-q30 gMeanQ gBaseCount[A,C,G,T] gAllSubs gFrequency
+ chr1 1115715 C * 2 38.00 [0, 2, 0, 0] - 0.00 - - - - -
+ chr1 1115716 A * 2 38.00 [2, 0, 0, 0] - 0.00 2 38.00 [2, 0, 0, 0] - 0.00
+ ```
+
+ ### find-repeats
+ Identify repetitive elements in a FASTQ file.
+
+ ### index
+ Compute RNA editing index from reditools `analyze` output
+ ([PMID: 31636457](https://pubmed.ncbi.nlm.nih.gov/31636457/)).
+ The `index` tool computes the editing indices for all possible variants, not
+ just A-to-I (listed as A-G in the output).
@@ -11,6 +11,7 @@ reditools/__main__.py
  reditools/alignment_file.py
  reditools/alignment_manager.py
  reditools/analyze.py
+ reditools/annotate.py
  reditools/compiled_position.py
  reditools/compiled_reads.py
  reditools/fasta_file.py
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
  [project]
  name = "REDItools3"
- version = "v3.3"
+ version = "v3.4"
  authors = [
      { name="Adam Handen", email="adam.handen@gmail.com" },
      { name="Ernesto Picardi" },
@@ -2,12 +2,12 @@
 
  import sys
 
- from reditools import analyze, homopolymerics, index
+ from reditools import analyze, homopolymerics, index, annotate
 
 
  def usage():
      """Print program usage."""
-     print("""usage: reditools {analyze,find-repeats,index}
+     print("""usage: reditools {analyze,find-repeats,index,annotate}
 
  REDItools3
 
@@ -18,6 +18,8 @@ Run Modes:
 
  index Calculate editing indices from the output of `analyze`
  mode.
+
+ annotate Annotate REDItools RNA output with DNA output
  """)
 
 
@@ -31,6 +33,8 @@ if __name__ == '__main__':
                  homopolymerics.main()
              case 'index':
                  index.main()
+             case 'annotate':
+                 annotate.main()
              case _:
                  usage()
      else:
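The dispatcher above picks a run mode from the first command-line argument and falls back to the usage text otherwise. A minimal sketch of the same idea, using a dictionary lookup in place of `match` (which needs Python 3.10 or later). The handlers are hypothetical stand-ins for the modules' `main()` functions, and `run_mode` is not a REDItools3 function.

```python
def run_mode(argv, handlers, usage):
    """Call the handler named by argv[1]; fall back to usage() otherwise."""
    if len(argv) > 1:
        return handlers.get(argv[1], usage)()
    return usage()


def show_usage():
    # Stand-in for the real usage() printer.
    return 'usage'


# Stand-in handlers; the real program calls e.g. annotate.main().
handlers = {
    'analyze': lambda: 'analyze',
    'find-repeats': lambda: 'find-repeats',
    'index': lambda: 'index',
    'annotate': lambda: 'annotate',
}
```

With this layout, adding a run mode such as `annotate` is a single new dictionary entry.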
@@ -118,9 +118,8 @@ class RTAlignmentFile(PysamAlignmentFile):
      # 141: NOT_MAPPED
      # 512: QC_FAIL
      # 256: IS_SECONDARY
-     # 2048: IS_SUPPLEMENTARY
      # 1024: IS_DUPLICATE
-     _flags_to_toss = {77, 141, 512, 256, 2048, 1024}
+     _flags_to_toss = {77, 141, 512, 256, 1024}
      _paired_flags_to_keep = {99, 147, 83, 163}
 
      def _check_quality(self, read):
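The numbers in these sets are SAM FLAG values. A minimal sketch that decodes a FLAG into its named bits, following the bit definitions in the SAM specification. `decode_flag` is a hypothetical helper, not part of REDItools3, and how `RTAlignmentFile` compares a read against `_flags_to_toss` is not visible in this hunk.

```python
# Bit values as defined for the FLAG field in the SAM specification.
SAM_FLAG_BITS = [
    (0x1, 'PAIRED'),
    (0x2, 'PROPER_PAIR'),
    (0x4, 'UNMAPPED'),
    (0x8, 'MATE_UNMAPPED'),
    (0x10, 'REVERSE'),
    (0x20, 'MATE_REVERSE'),
    (0x40, 'READ1'),
    (0x80, 'READ2'),
    (0x100, 'SECONDARY'),
    (0x200, 'QC_FAIL'),
    (0x400, 'DUPLICATE'),
    (0x800, 'SUPPLEMENTARY'),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]
```

Here `decode_flag(2048)` yields only the supplementary-alignment bit, the value that 3.4 drops from `_flags_to_toss`.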
@@ -88,6 +88,9 @@ def setup_rtools(options): # noqa:WPS213,WPS231
          options.splicing_span,
      )
 
+     if options.variants:
+         rtools.specific_edits = [_.upper() for _ in options.variants]
+
      if options.bed_file:
          regions = file_utils.read_bed_file(options.bed_file)
          rtools.target_positions = regions
@@ -173,7 +176,7 @@ def write_results(rtools, sam_manager, file_name, region, output_format):
                  rt_result.per_base_depth,
                  ' '.join(sorted(variants)) if variants else '-',
                  f'{rt_result.edit_ratio:.2f}',
-                 '\t'.join(['-' for _ in range(5)]),
+                 '-', '-', '-', '-', '-',
              ])
          return stream.name
 
@@ -345,7 +348,7 @@ def parse_options(): # noqa:WPS213
          '-me',
          '--min-edits',
          type=int,
-         default=0, # noqa:WPS432
+         default=1,
          help='The minimum number of editing events (per position). ' +
          'Positions with fewer than -me edits will be discarded.',
      )
@@ -426,6 +429,14 @@ def parse_options(): # noqa:WPS213
          help='Run in debug mode.',
          action='store_true',
      )
+     parser.add_argument(
+         '-v',
+         '--variants',
+         nargs='*',
+         default=['CT', 'AG'],
+         help='Which editing events to report. Edits should be two characters, '
+         'separated by spaces. Use "all" to report all variants.',
+     )
 
      return parser.parse_args()
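The new `--variants` option combines `nargs='*'` with a list default, and `setup_rtools` upper-cases whatever was supplied. A minimal sketch of that behaviour in isolation. `build_parser` and `requested_edits` are hypothetical names, and only the `-v` argument is reproduced.

```python
import argparse


def build_parser():
    """Parser with only the --variants option from the hunk above."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '-v',
        '--variants',
        nargs='*',
        default=['CT', 'AG'],
    )
    return parser


def requested_edits(argv):
    """Return the upper-cased edit list for a given argument vector."""
    options = build_parser().parse_args(argv)
    return [edit.upper() for edit in options.variants]
```

Passing `-v` with no values produces an empty list, so the `if options.variants:` guard in `setup_rtools` would skip the assignment in that case.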
 
@@ -0,0 +1,126 @@
+ import argparse
+ from reditools import file_utils
+ import csv
+ import sys
+
+
+ class RTAnnotater:
+     def __init__(self, rna_file, dna_file):
+         self.rna_file = rna_file
+         self.dna_file = dna_file
+         self.contig_order = self._load_contig_order()
+
+     def _load_contig_order(self):
+         contigs = {}
+         idx = 1
+         with file_utils.open_stream(self.rna_file, 'r') as stream:
+             reader = csv.reader(stream, delimiter='\t')
+             last_contig = next(reader)[0]
+             contigs[last_contig] = 0
+             for row in reader:
+                 if row[0] == last_contig:
+                     continue
+                 contigs[row[0]] = idx
+                 idx += 1
+                 last_contig = row[0]
+         return contigs
+
+     def _cmp_position(self, rna_contig, rna_pos, dna_contig, dna_pos):
+         rna_contig = self.contig_order[rna_contig]
+         dna_contig = self.contig_order.get(dna_contig, len(self.contig_order))
+         if rna_contig < dna_contig:
+             return -1
+         if rna_contig > dna_contig:
+             return 1
+         rna_pos = int(rna_pos)
+         dna_pos = int(dna_pos)
+         if rna_pos < dna_pos:
+             return -1
+         if rna_pos > dna_pos:
+             return 1
+         return 0
+
+     def _annotate_row(self, rna_row, dna_row):
+         rna_row['gCoverage-q30'] = dna_row['Coverage-q30']
+         rna_row['gMeanQ'] = dna_row['MeanQ']
+         rna_row['gBaseCount[A,C,G,T]'] = dna_row['BaseCount[A,C,G,T]']
+         rna_row['gAllSubs'] = dna_row['AllSubs']
+         rna_row['gFrequency'] = dna_row['Frequency']
+         return rna_row
+
+     def _compare_files(self):
+         with file_utils.open_stream(self.rna_file, 'r') as rna_stream, \
+                 file_utils.open_stream(self.dna_file, 'r') as dna_stream:
+             rna_reader = csv.DictReader(rna_stream, delimiter='\t')
+             dna_reader = csv.DictReader(dna_stream, delimiter='\t')
+
+             rna_entry = next(rna_reader, None)
+             dna_entry = next(dna_reader, None)
+
+             while rna_entry is not None:
+                 if dna_entry is None:
+                     yield rna_entry
+                     rna_entry = next(rna_reader, None)
+                     continue
+                 cmp = self._cmp_position(
+                     rna_entry['Region'],
+                     rna_entry['Position'],
+                     dna_entry['Region'],
+                     dna_entry['Position'])
+                 if cmp == 0:
+                     yield self._annotate_row(rna_entry, dna_entry)
+                     rna_entry = next(rna_reader, None)
+                     dna_entry = next(dna_reader, None)
+                 elif cmp > 0:
+                     dna_entry = next(dna_reader, None)
+                 else:
+                     yield rna_entry
+                     rna_entry = next(rna_reader, None)
+
+     def annotate(self, stream):
+         writer = csv.DictWriter(stream, delimiter='\t', fieldnames=[
+             'Region',
+             'Position',
+             'Reference',
+             'Strand',
+             'Coverage-q30',
+             'MeanQ',
+             'BaseCount[A,C,G,T]',
+             'AllSubs',
+             'Frequency',
+             'gCoverage-q30',
+             'gMeanQ',
+             'gBaseCount[A,C,G,T]',
+             'gAllSubs',
+             'gFrequency'])
+         writer.writeheader()
+         writer.writerows(self._compare_files())
+
+
+ def parse_options():
+     """
+     Parse commandline options for REDItools.
+
+     Returns:
+         namespace: commandline args
+     """
+     parser = argparse.ArgumentParser(
+         prog='reditools annotate',
+         description='Annotates RNA REDItools output with DNA output.',
+         formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+     )
+     parser.add_argument(
+         'rna_file',
+         help='The REDItools output from RNA data',
+     )
+     parser.add_argument(
+         'dna_file',
+         help='The REDItools output from corresponding DNA data',
+     )
+     return parser.parse_args()
+
+
+ def main():
+     options = parse_options()
+     x = RTAnnotater(options.rna_file, options.dna_file)
+     x.annotate(sys.stdout)
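`_compare_files` walks both inputs in a single pass, advancing whichever side is behind, which appears to assume that the two files list positions in the same order. A simplified sketch of that merge on plain tuples, ignoring contigs. `merge_annotate` and the `(position, value)` layout are illustrative only, not REDItools3 API.

```python
def merge_annotate(rna_rows, dna_rows):
    """Yield (position, rna_value, dna_value_or_None) for every RNA row.

    Both inputs are lists of (position, value) tuples sorted by position.
    """
    dna_iter = iter(dna_rows)
    dna = next(dna_iter, None)
    for pos, value in rna_rows:
        # Skip DNA rows that lie before the current RNA position.
        while dna is not None and dna[0] < pos:
            dna = next(dna_iter, None)
        if dna is not None and dna[0] == pos:
            # Positional match: attach the DNA value and consume it.
            yield pos, value, dna[1]
            dna = next(dna_iter, None)
        else:
            # No DNA row at this position: emit the RNA row unannotated.
            yield pos, value, None


# Positions taken from the README example for `annotate`.
rna = [(1115715, 'C'), (1115716, 'A')]
dna = [(1115716, 'A'), (1115717, 'C')]
result = list(merge_annotate(rna, dna))
```

As in the README example, only position 1115716 picks up a DNA value; the DNA-only position 1115717 produces no output row.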
@@ -134,7 +134,7 @@ class REDItools(object):
          self._include_refs = None
 
      @property
-     def includ_refs(self):
+     def include_refs(self):
          """
          Genome reference bases to report on.
 
@@ -149,22 +149,29 @@ class REDItools(object):
          Specific edit events to report.
 
          Returns:
-             iterable
+             set
          """
          return self._specific_edits
 
      @specific_edits.setter
-     def specific_edits(self, alts):
-         function_a = self._rtqc.check_specific_edits
-         function_b = self._rtqc.check_ref
-         self._specific_edits = set(alts)
-         self._include_refs = [_[0] for _ in alts]
-         if self._include_refs:
-             self._rtqc.add(function_a)
-             self._rtqc.add(function_b)
-         else:
-             self._rtqc.discard(function_a)
-             self._rtqc.discard(function_b)
+     def specific_edits(self, edits):
+         if edits == ["ALL"]:
+             edits = []
+         for alt in edits:
+             if not self._verify_alt(alt):
+                 raise Exception(
+                     f'Specific edit "{alt}" is not valid. ' +
+                     'Edits must be two character strings of ATCG.')
+         self._specific_edits = set(edits)
+
+     def _verify_alt(self, alt):
+         if not isinstance(alt, str):
+             return False
+         if len(alt) != 2:
+             return False
+         if alt[0] not in 'ATCG' or alt[1] not in 'ATCG':
+             return False
+         return True
 
      @property
      def splice_positions(self):
@@ -389,13 +396,12 @@ class REDItools(object):
              if column is None:
                  self.log(Logger.debug_level, 'Bad column - skipping')
                  continue
-             if self._specific_edits:
-                 if not self._specific_edits & set(column.variants):
-                     self.log(
-                         Logger.debug_level,
-                         'Requested edits not found - skipping',
-                     )
-                     continue
+             if self._specific_edits and not self._specific_edits & set(column.variants):
+                 self.log(
+                     Logger.debug_level,
+                     'Requested edits not found - skipping',
+                 )
+                 continue
              self.log(
                  Logger.debug_level,
                  'Yielding output for {} reads',
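The setter and the column filter above express two rules: an edit must be a two-character string over `ATCG` (with `ALL` meaning no filter), and a column is kept when any requested edit is among its variants. A minimal standalone sketch of both rules, assuming each of the two characters is checked on its own. `normalize_edits` and `keep_column` are hypothetical names, and `ValueError` is used here in place of the bare `Exception` in the hunk.

```python
VALID_BASES = set('ATCG')


def normalize_edits(edits):
    """Validate requested edits; ['ALL'] means no filtering (empty set)."""
    if edits == ['ALL']:
        return set()
    for edit in edits:
        is_valid = (
            isinstance(edit, str)
            and len(edit) == 2
            and set(edit) <= VALID_BASES  # both characters must be A, T, C or G
        )
        if not is_valid:
            raise ValueError(f'Specific edit "{edit}" is not valid.')
    return set(edits)


def keep_column(requested, variants):
    """Keep a column when no filter is set or a requested edit is present."""
    return not requested or bool(requested & set(variants))
```

An empty `requested` set keeps every column, which matches the short-circuit in the combined `if` condition of the hunk.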
@@ -218,26 +218,6 @@ class RTChecks(object):
              return False
          return True
 
-     def check_ref(self, bases, rtools):
-         """
-         Check if the reference base is of interest.
-
-         Parameters:
-             bases (CompiledPosition): Data for analysis
-             rtools (REDItools): Object running the analysis
-
-         Returns:
-             (bool): True if reference base was specified
-         """
-         if bases.ref not in rtools.include_refs:
-             rtools.log(
-                 Logger.debug_level,
-                 'DISCARD COLUMN base "{}" not listed for reporting',
-                 bases.ref,
-             )
-             return False
-         return True
-
      def check_exclusions(self, bases, rtools):
          """
          Check if the bases object is in an excluded position.
@@ -254,27 +234,6 @@ class RTChecks(object):
              return False
          return True
 
-     def check_specific_edits(self, bases, rtools):
-         """
-         Check whether specified edits are present.
-
-         Parameters:
-             bases (CompiledPosition): Data for analysis
-             rtools (REDItools): Object running the analysis
-
-         Returns:
-             (bool): True if the edit was specified
-         """
-         for ref, alt in rtools.specific_edits:
-             if not bases[ref] or not bases[alt]:
-                 rtools.log(
-                     Logger.debug_level,
-                     'DISCARD COLUMN edit "{}" not specified for output',
-                     ref + alt,
-                 )
-                 return False
-         return True
-
      def check_max_alts(self, bases, rtools):
          """
          Check that there are no more than a max number of alts.
reditools3-3.3/PKG-INFO DELETED
@@ -1,36 +0,0 @@
- Metadata-Version: 2.2
- Name: REDItools3
- Version: 3.3
- Author: Ernesto Picardi
- Author-email: Adam Handen <adam.handen@gmail.com>
- Project-URL: homepage, https://github.com/BioinfoUNIBA/REDItools3
- Project-URL: repository, https://github.com/BioinfoUNIBA/REDItools3
- Project-URL: issues, https://github.com/BioinfoUNIBA/REDItools3/issues
- Keywords: bioinformatics,RNA,RNA-editing
- Classifier: Development Status :: 5 - Production/Stable
- Classifier: Intended Audience :: Developers
- Classifier: Intended Audience :: Science/Research
- Classifier: License :: OSI Approved :: GNU General Public License (GPL)
- Classifier: Operating System :: MacOS :: MacOS X
- Classifier: Operating System :: Unix
- Classifier: Programming Language :: Python :: 3.7
- Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
- Requires-Python: >=3.7
- Description-Content-Type: text/markdown
- License-File: LICENSE
- Requires-Dist: pysam>=0.22.0
- Requires-Dist: sortedcontainers>=2.4.0
-
- # REDItools3
- A new REDItools implementation to speed-up the RNA editing profiling in massive RNAseq data
-
- # Installation
- Install from PyPi.
- `pip install REDItools3`
-
- Use the whl file under the dist directory.
- `pip install dist/reditools-0.1-py3-none-any.whl`
-
- # Usage
- Once installed, reditools can be run from the commandline.
- `python -m reditools`
reditools3-3.3/README.md DELETED
@@ -1,13 +0,0 @@
- # REDItools3
- A new REDItools implementation to speed-up the RNA editing profiling in massive RNAseq data
-
- # Installation
- Install from PyPi.
- `pip install REDItools3`
-
- Use the whl file under the dist directory.
- `pip install dist/reditools-0.1-py3-none-any.whl`
-
- # Usage
- Once installed, reditools can be run from the commandline.
- `python -m reditools`
@@ -1,36 +0,0 @@
- Metadata-Version: 2.2
- Name: REDItools3
- Version: 3.3
- Author: Ernesto Picardi
- Author-email: Adam Handen <adam.handen@gmail.com>
- Project-URL: homepage, https://github.com/BioinfoUNIBA/REDItools3
- Project-URL: repository, https://github.com/BioinfoUNIBA/REDItools3
- Project-URL: issues, https://github.com/BioinfoUNIBA/REDItools3/issues
- Keywords: bioinformatics,RNA,RNA-editing
- Classifier: Development Status :: 5 - Production/Stable
- Classifier: Intended Audience :: Developers
- Classifier: Intended Audience :: Science/Research
- Classifier: License :: OSI Approved :: GNU General Public License (GPL)
- Classifier: Operating System :: MacOS :: MacOS X
- Classifier: Operating System :: Unix
- Classifier: Programming Language :: Python :: 3.7
- Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
- Requires-Python: >=3.7
- Description-Content-Type: text/markdown
- License-File: LICENSE
- Requires-Dist: pysam>=0.22.0
- Requires-Dist: sortedcontainers>=2.4.0
-
- # REDItools3
- A new REDItools implementation to speed-up the RNA editing profiling in massive RNAseq data
-
- # Installation
- Install from PyPi.
- `pip install REDItools3`
-
- Use the whl file under the dist directory.
- `pip install dist/reditools-0.1-py3-none-any.whl`
-
- # Usage
- Once installed, reditools can be run from the commandline.
- `python -m reditools`
File without changes