biosyntax 0.1.0
- checksums.yaml +7 -0
- data/LICENSE.md +674 -0
- data/README.md +154 -0
- data/ext/biosyntax/biosyntax.c +620 -0
- data/ext/biosyntax/biosyntax.h +413 -0
- data/ext/biosyntax/biosyntax_ext.c +380 -0
- data/ext/biosyntax/extconf.rb +13 -0
- data/lib/biosyntax/version.rb +5 -0
- data/lib/biosyntax.rb +538 -0
- metadata +47 -0
data/README.md
ADDED
@@ -0,0 +1,154 @@
# ruby-biosyntax

[CI](https://github.com/kojix2/ruby-biosyntax/actions/workflows/ci.yml)
[Lines of code](https://tokei.kojix2.net/github/kojix2/ruby-biosyntax)

:dna: [bioSyntax](https://github.com/bioSyntax/bioSyntax) - Syntax highlighting for biological data formats - for Ruby.

Powered by [libbiosyntax](https://github.com/kojix2/libbiosyntax).

## Installation

```sh
gem install biosyntax
```

## ANSI coloring

```ruby
require "biosyntax"

hl = BioSyntax.fastq

File.foreach("reads.fastq", chomp: false) do |line|
  print hl.colorize(line)
end
```

`colorize` returns a string with ANSI SGR escape sequences using the built-in
`libbiosyntax` colors.
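
For readers unfamiliar with SGR sequences, the following is a minimal pure-Ruby sketch of what such a colored string looks like and how the sequences can be stripped again. The helper names `sgr` and `strip_sgr` and the color codes are illustrative assumptions; they are not part of the gem, and the real palette is chosen by `libbiosyntax`.

```ruby
# Illustrative only: these helpers and color codes are assumptions for the
# sketch, not the gem's API or libbiosyntax's palette.
SGR_RESET = "\e[0m"

# Wrap text in an SGR "select graphic rendition" sequence followed by a reset.
def sgr(text, code)
  "\e[#{code}m#{text}#{SGR_RESET}"
end

# Remove SGR sequences, e.g. before writing colored output to a log file.
def strip_sgr(text)
  text.gsub(/\e\[[0-9;]*m/, "")
end

colored = sgr("ACGT", 32) + sgr("N", 31)
puts colored            # shown in color on an ANSI-capable terminal
puts strip_sgr(colored) # plain text again
```

A script that prints colored lines may want to check `$stdout.tty?` first and fall back to the plain line when output is redirected.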

`Highlighter` is stateful. Reuse one highlighter for one input stream,
especially for FASTQ and WIG. Use `reset` before starting another stream with
the same object.

```ruby
hl = BioSyntax.fastq
# process one file...
hl.reset
# process another file...
```
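
To illustrate why per-stream state matters for a format like FASTQ, here is a toy sketch that tracks the position inside a four-line record. `ToyFastqTracker` is a hypothetical stand-in written for this example, not the gem's implementation, and it assumes unwrapped four-line records.

```ruby
# Toy stand-in, NOT the gem's implementation: tracks the position inside a
# four-line FASTQ record so that a quality line starting with "@" is not
# mistaken for a header.
class ToyFastqTracker
  ROLES = %i[header sequence separator quality].freeze

  def initialize
    reset
  end

  # Forget the current position; call this before a new input stream.
  def reset
    @index = 0
  end

  # Return the role of this line and advance to the next position.
  def classify(_line)
    role = ROLES[@index]
    @index = (@index + 1) % ROLES.size
    role
  end
end

tracker = ToyFastqTracker.new
record = ["@read1\n", "ACGT\n", "+\n", "@II!\n"]
p record.map { |line| tracker.classify(line) }
# => [:header, :sequence, :separator, :quality]

tracker.classify("@read2\n")   # a stream that ends mid-record
tracker.reset                  # without this, the next line would be :sequence
p tracker.classify("@read3\n") # => :header
```

The quality line `@II!` looks like a header when viewed in isolation, which is why a line-by-line classifier needs to remember where it is in the record.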

## Highlight spans

```ruby
require "biosyntax"

hl = BioSyntax.vcf
line = "chr1\t42\trs1\tA\tT\t99\tPASS\tDP=10;AF=0.5\n"

spans = hl.highlight(line)

spans.each do |span|
  puts [span.start, span.end, span.kind.name, span.scope].join("\t")
end
```

A span uses byte offsets into the input line:

```ruby
span.start  # byte offset at the start of the highlighted range
span.end    # byte offset just after the highlighted range
span.length # byte length
span.kind   # BioSyntax::Kind
span.scope  # e.g. "biosyntax.chrom"
```
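
Because the offsets are in bytes, slicing the line by character index can give the wrong text when the line contains multi-byte characters. The sketch below uses plain integers as hypothetical span offsets (no gem objects involved) and compares `String#byteslice` with character-based indexing.

```ruby
line = "chr1\t42\tµ-note\n" # "µ" occupies two bytes in UTF-8

# Hypothetical offsets, as a span's `start` and `end` might report them.
span_start = 8
span_end   = 10

by_bytes = line.byteslice(span_start, span_end - span_start)
by_chars = line[span_start...span_end]

p by_bytes # => "µ"
p by_chars # => "µ-"
```

For pure-ASCII input the two forms agree, so the difference only shows up with non-ASCII text such as free-form annotation fields.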

## Formats and metadata

Create highlighters with `BioSyntax.<format>` or `BioSyntax[format]`.
Format names that contain a hyphen use an underscore in the factory method
name, so `fasta-nt` becomes `BioSyntax.fasta_nt`.

```ruby
BioSyntax.vcf
BioSyntax.fastq
BioSyntax.fasta_nt
BioSyntax[:"fasta-nt"]
BioSyntax["bam"] # canonical format is :sam
```
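
The naming rule for factory methods can be expressed in one line of Ruby. `factory_method_name` is a hypothetical helper written for this illustration; the gem is not known to expose it.

```ruby
# Hypothetical helper, not part of the gem: derive the factory-method name
# that corresponds to a format name by replacing hyphens with underscores.
def factory_method_name(format_name)
  format_name.to_s.tr("-", "_").to_sym
end

p factory_method_name("fasta-nt") # => :fasta_nt
p factory_method_name(:vcf)       # => :vcf
```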

Useful metadata:

```ruby
BioSyntax::FORMAT_NAMES # array of canonical format names
BioSyntax::FORMATS      # { name => BioSyntax::Format }
BioSyntax::KIND_NAMES   # array of kind names
BioSyntax::KINDS        # { name => BioSyntax::Kind }
BioSyntax::SCOPES       # { scope => [BioSyntax::Kind, ...] }

BioSyntax::Format::VCF
BioSyntax::Kind::CHROM

BioSyntax.format_supported?(:vcf)  # true
BioSyntax.format_name(:bam)        # :sam
BioSyntax.guess_format("a.vcf.gz") # :vcf
```

The metadata is generated from `libbiosyntax` at load time. The Ruby side does
not maintain a separate hand-written table of formats or kinds.

## Examples

This gem does not install a CLI. See `examples/` for small scripts:

```sh
ruby examples/bcat.rb sample.vcf
ruby examples/bcat.rb -l fastq reads.fastq
ruby examples/bcat.rb -l
ruby examples/inspect_spans.rb sample.vcf
```

`bcat.rb` guesses the format from the file name when possible. Use `-l` /
`--language` to pass a format explicitly. Calling `-l` without an argument
prints the supported format names.
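
The source of `bcat.rb` is not shown here, so the following is only a sketch of how an option with this behavior could be parsed using Ruby's standard `OptionParser`. The method `parse_cli` and the list `TOY_FORMATS` are assumptions made up for the example; the real script may be written differently.

```ruby
require "optparse"

# Hypothetical format list for the sketch; the real names come from the gem.
TOY_FORMATS = %w[fasta fastq sam vcf].freeze

# Parse arguments in the style described above. Returns [options, files].
def parse_cli(argv)
  options = { language: nil, list: false }
  parser = OptionParser.new do |opts|
    # The brackets mark the argument as optional, so a bare "-l" yields nil
    # while "-l fastq" yields "fastq".
    opts.on("-l", "--language [FORMAT]", "Format name; list formats if omitted") do |fmt|
      if fmt.nil?
        options[:list] = true
      else
        options[:language] = fmt
      end
    end
  end
  files = parser.parse(argv) # non-destructive; returns the non-option arguments
  [options, files]
end

options, files = parse_cli(%w[-l fastq reads.fastq])
p options[:language], files # "fastq" and ["reads.fastq"]

options, = parse_cli(%w[-l])
puts TOY_FORMATS if options[:list]
```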

## Development tasks

```sh
bundle exec rake -T
bundle exec rake test
bundle exec rake build
bundle exec yard doc
```

The native extension is built with `rake-compiler`. Temporary build products
are written under `tmp/`, and the compiled extension is copied to
`lib/biosyntax/`.

## Updating vendored libbiosyntax

This gem vendors the C source of `libbiosyntax` and builds it into the Ruby extension.
It does not require a system `libbiosyntax` shared library.
The vendored C source lives under:

```text
ext/biosyntax/biosyntax.c
ext/biosyntax/biosyntax.h
```

When `libbiosyntax` is updated, refresh the vendored files and run the test suite:

```sh
bundle exec rake update:libbiosyntax
bundle exec rake
```

## License

`biosyntax` vendors `libbiosyntax`, which is licensed under the GNU General
Public License version 3 only. This gem is therefore distributed under
`GPL-3.0-only`. See `LICENSE.md`.

This project is inspired by the original bioSyntax project:
<https://github.com/bioSyntax/bioSyntax>