biosyntax 0.1.0

data/README.md ADDED
@@ -0,0 +1,154 @@
+ # ruby-biosyntax
+
+ [![CI](https://github.com/kojix2/ruby-biosyntax/actions/workflows/ci.yml/badge.svg)](https://github.com/kojix2/ruby-biosyntax/actions/workflows/ci.yml)
+ [![Lines of Code](https://img.shields.io/endpoint?url=https%3A%2F%2Ftokei.kojix2.net%2Fbadge%2Fgithub%2Fkojix2%2Fruby-biosyntax%2Flines)](https://tokei.kojix2.net/github/kojix2/ruby-biosyntax)
+
+ :dna: [bioSyntax](https://github.com/bioSyntax/bioSyntax) - Syntax highlighting for biological data formats - for Ruby.
+
+ Powered by [libbiosyntax](https://github.com/kojix2/libbiosyntax).
+
+ ## Installation
+
+ ```sh
+ gem install biosyntax
+ ```
+
+ ## ANSI coloring
+
+ ```ruby
+ require "biosyntax"
+
+ hl = BioSyntax.fastq
+
+ File.foreach("reads.fastq", chomp: false) do |line|
+   print hl.colorize(line)
+ end
+ ```
+
+ `colorize` returns a string with ANSI SGR escape sequences using the built-in
+ `libbiosyntax` colors.
+
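For readers unfamiliar with SGR sequences, the sketch below shows what such a colored string looks like and how the escapes can be stripped again, for example before writing output to a log file. It does not call `biosyntax`. The color code and the `strip_sgr` helper are assumptions chosen for illustration, not the colors `libbiosyntax` actually emits.

```ruby
# Hypothetical helper, not part of biosyntax: removes ANSI SGR sequences,
# which have the form ESC [ <numbers separated by ;> m.
SGR_PATTERN = /\e\[[0-9;]*m/

def strip_sgr(text)
  text.gsub(SGR_PATTERN, "")
end

# Hand-written example of a colored string: bold green "ACGT", then a reset.
colored = "\e[1;32mACGT\e[0m\n"

# Only the escape sequences are removed; the text and newline are kept.
plain = strip_sgr(colored)
print plain
```

A pattern like this only targets SGR sequences (those ending in `m`); other terminal control sequences would pass through unchanged.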
+ A `Highlighter` is stateful. Use one highlighter per input stream,
+ especially for FASTQ and WIG, and call `reset` before reusing the same
+ object for another stream.
+
+ ```ruby
+ hl = BioSyntax.fastq
+ # process one file...
+ hl.reset
+ # process another file...
+ ```
+
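To illustrate why state matters for a line-oriented format, here is a toy classifier for strict four-line FASTQ records. It is not how `biosyntax` or `libbiosyntax` is implemented; `ToyFastqClassifier` and its methods are hypothetical names. The point is that a quality line may begin with `@`, the same character that marks a header, so a line cannot always be classified on its own.

```ruby
# Toy illustration only; assumes strict four-line FASTQ records.
class ToyFastqClassifier
  LINE_KINDS = %i[header sequence separator quality].freeze

  def initialize
    reset
  end

  # Forget the position inside the current record.
  def reset
    @index = 0
  end

  # Classify a line by its position in the record, then advance.
  def classify(_line)
    kind = LINE_KINDS[@index]
    @index = (@index + 1) % LINE_KINDS.size
    kind
  end
end

classifier = ToyFastqClassifier.new
record = ["@read1\n", "ACGT\n", "+\n", "@III\n"]

# The last line starts with "@" but is classified by position as :quality.
kinds = record.map { |line| classifier.classify(line) }
p kinds
```

If a stream ends in the middle of a record, the next stream would start at the wrong position unless `reset` is called first, which mirrors the advice above.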
+ ## Highlight spans
+
+ ```ruby
+ require "biosyntax"
+
+ hl = BioSyntax.vcf
+ line = "chr1\t42\trs1\tA\tT\t99\tPASS\tDP=10;AF=0.5\n"
+
+ spans = hl.highlight(line)
+
+ spans.each do |span|
+   puts [span.start, span.end, span.kind.name, span.scope].join("\t")
+ end
+ ```
+
+ A span uses byte offsets into the input line:
+
+ ```ruby
+ span.start  # byte offset at the start of the highlighted range
+ span.end    # byte offset just after the highlighted range
+ span.length # byte length
+ span.kind   # BioSyntax::Kind
+ span.scope  # e.g. "biosyntax.chrom"
+ ```
+
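Because the offsets are in bytes, extracting the highlighted text is safest with `String#byteslice` rather than character indexing. The sketch below uses a hand-made line and hand-picked offsets, not real `biosyntax` output, to show how the two diverge once a multi-byte character appears.

```ruby
# "\u00E9" (e with acute accent) occupies two bytes in UTF-8, so byte and
# character offsets differ for everything that follows it.
line = "caf\u00E9\tchr1\n"

# Hand-picked byte offsets covering "chr1" (assumed values for illustration).
start_byte = 6
end_byte = 10
length = end_byte - start_byte

by_bytes = line.byteslice(start_byte, length) # slices by byte offset
by_chars = line[start_byte, length]           # slices by character index

p by_bytes
p by_chars
```

For pure-ASCII input the two calls agree, which is why the difference is easy to miss in testing.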
+ ## Formats and metadata
+
+ Create highlighters with `BioSyntax.<format>` or `BioSyntax[format]`.
+ For factory methods, hyphens in a format name become underscores
+ (`fasta-nt` becomes `fasta_nt`).
+
+ ```ruby
+ BioSyntax.vcf
+ BioSyntax.fastq
+ BioSyntax.fasta_nt
+ BioSyntax[:"fasta-nt"]
+ BioSyntax["bam"] # canonical format is :sam
+ ```
+
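The naming rules above can be pictured with a small standalone sketch. The helpers `toy_canonical_name` and `toy_factory_method` and the `TOY_ALIASES` table are hypothetical; only the `bam` to `sam` alias and the `fasta-nt` / `fasta_nt` spelling come from this README, and the gem itself may resolve names differently.

```ruby
# Hypothetical alias table; only the bam -> sam entry is taken from the README.
TOY_ALIASES = { "bam" => "sam" }.freeze

# Normalize a String or Symbol to a canonical, hyphenated format symbol.
def toy_canonical_name(name)
  key = name.to_s.tr("_", "-")
  TOY_ALIASES.fetch(key, key).to_sym
end

# Derive the factory-method spelling (underscores) from any accepted name.
def toy_factory_method(name)
  toy_canonical_name(name).to_s.tr("-", "_").to_sym
end

p toy_canonical_name(:fasta_nt)
p toy_canonical_name("bam")
p toy_factory_method("fasta-nt")
```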
+ Useful metadata:
+
+ ```ruby
+ BioSyntax::FORMAT_NAMES # array of canonical format names
+ BioSyntax::FORMATS      # { name => BioSyntax::Format }
+ BioSyntax::KIND_NAMES   # array of kind names
+ BioSyntax::KINDS        # { name => BioSyntax::Kind }
+ BioSyntax::SCOPES       # { scope => [BioSyntax::Kind, ...] }
+
+ BioSyntax::Format::VCF
+ BioSyntax::Kind::CHROM
+
+ BioSyntax.format_supported?(:vcf)  # true
+ BioSyntax.format_name(:bam)        # :sam
+ BioSyntax.guess_format("a.vcf.gz") # :vcf
+ ```
+
+ The metadata is generated from `libbiosyntax` at load time. The Ruby side does
+ not maintain a separate hand-written table of formats or kinds.
+
+ ## Examples
+
+ This gem does not install a CLI. See `examples/` for small scripts:
+
+ ```sh
+ ruby examples/bcat.rb sample.vcf
+ ruby examples/bcat.rb -l fastq reads.fastq
+ ruby examples/bcat.rb -l
+ ruby examples/inspect_spans.rb sample.vcf
+ ```
+
+ `bcat.rb` guesses the format from the file name when possible. Use `-l` /
+ `--language` to pass a format explicitly. Passing `-l` without an argument
+ prints the supported format names.
+
+ ## Development tasks
+
+ ```sh
+ bundle exec rake -T
+ bundle exec rake test
+ bundle exec rake build
+ bundle exec yard doc
+ ```
+
+ The native extension is built with `rake-compiler`. Temporary build products
+ are written under `tmp/`, and the compiled extension is copied to
+ `lib/biosyntax/`.
+
+ ## Updating vendored libbiosyntax
+
+ This gem vendors the C source of `libbiosyntax` and builds it into the Ruby extension.
+ It does not require a system `libbiosyntax` shared library.
+ The vendored C source lives under:
+
+ ```text
+ ext/biosyntax/biosyntax.c
+ ext/biosyntax/biosyntax.h
+ ```
+
+ When `libbiosyntax` is updated, refresh the vendored files and run the test suite:
+
+ ```sh
+ bundle exec rake update:libbiosyntax
+ bundle exec rake
+ ```
+
+ ## License
+
+ `biosyntax` vendors `libbiosyntax`, which is licensed under the GNU General
+ Public License version 3 only. This gem is therefore distributed under
+ `GPL-3.0-only`. See `LICENSE.md`.
+
+ This project is inspired by the original bioSyntax project:
+ <https://github.com/bioSyntax/bioSyntax>