awaaz 0.1.0 → 0.2.0
- checksums.yaml +4 -4
- data/.rubocop.yml +5 -1
- data/.ruby-version +1 -1
- data/CHANGELOG.md +13 -2
- data/GLOSSARY.md +7 -1
- data/README.md +6 -3
- data/TODOS.md +1 -2
- data/lib/awaaz/config.rb +22 -0
- data/lib/awaaz/decoders/base_decoder.rb +5 -5
- data/lib/awaaz/decoders/decode.rb +2 -2
- data/lib/awaaz/features.rb +533 -0
- data/lib/awaaz/properties.rb +37 -0
- data/lib/awaaz/utils/resample.rb +19 -18
- data/lib/awaaz/utils/sound_config.rb +10 -1
- data/lib/awaaz/utils/soundread.rb +81 -105
- data/lib/awaaz/utils/utils.rb +1 -0
- data/lib/awaaz/version.rb +1 -1
- data/lib/awaaz.rb +10 -3
- metadata +23 -4
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 539ed57cbb902bb7b86939c6c9ccde9265cc8b28c4fee725359acdf918c9b2ff
|
4
|
+
data.tar.gz: ba30315c102903f622eead61d5a847d9c9b93e3d7a7cca790ebff995e909d014
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 3a4fb5f88de016a035d06660deabc2472aaa423400cdf72acb7eb9440f23e8ab15606e74e23758ce812539e3878236b590580c9665614494631beba0e71d7dc3
|
7
|
+
data.tar.gz: b5ed50514a4c1a9be9c3008cbaa2661aaa1d69d6ba33dc3906db202ee10da20741e7cea4d488f27ce7029bfd40552edf4464f7debbf4bc37e47dd52cd4c4bcf2
|
data/.rubocop.yml
CHANGED
@@ -1,5 +1,5 @@
|
|
1
1
|
AllCops:
|
2
|
-
TargetRubyVersion: 3.
|
2
|
+
TargetRubyVersion: 3.0
|
3
3
|
NewCops: enable
|
4
4
|
|
5
5
|
Style/StringLiterals:
|
@@ -11,5 +11,9 @@ Style/StringLiteralsInInterpolation:
|
|
11
11
|
Metrics/MethodLength:
|
12
12
|
Max: 20
|
13
13
|
|
14
|
+
Style/NumericPredicate:
|
15
|
+
Enabled: false
|
14
16
|
|
17
|
+
Metrics/ModuleLength:
|
18
|
+
Enabled: false # Temporary
|
15
19
|
|
data/.ruby-version
CHANGED
@@ -1 +1 @@
|
|
1
|
-
3.
|
1
|
+
3.0.0
|
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,16 @@
|
|
1
|
-
## [
|
1
|
+
## [Released]
|
2
2
|
|
3
|
-
## [0.1.0] - 2025-
|
3
|
+
## [0.1.0] - 2025-08-12
|
4
4
|
|
5
5
|
- Initial release
|
6
|
+
- Ability to decode `.wav` and `.mp3`.
|
7
|
+
|
8
|
+
## [0.2.0] - 2025-08-26
|
9
|
+
|
10
|
+
- Introduced new features for audio analysis:
|
11
|
+
- RMS (Root Mean Square)
|
12
|
+
- Zero Crossing Rate
|
13
|
+
- Spectral Centroid
|
14
|
+
- Spectral Bandwidth
|
15
|
+
- Spectral Rolloff
|
16
|
+
- Spectral Flatness
|
data/GLOSSARY.md
CHANGED
@@ -1,3 +1,9 @@
|
|
1
1
|
# Terms and Definitions for Audio Processing
|
2
2
|
|
3
|
-
- **PCM (Pulse Code Modulation):** A method to convert analog audio signals into digital form by sampling the signal's
|
3
|
+
- **PCM (Pulse Code Modulation):** A method to convert analog audio signals into digital form by sampling the signal's amplitude at regular intervals.
|
4
|
+
- **RMS (Root Mean Square):** Measures the signal's average power, or loudness, over time.
|
5
|
+
- **Spectral Bandwidth:** Measures how widely the frequencies spread around the spectral centroid. Low bandwidth means the energy is concentrated near the centroid, as in a flute note. High bandwidth indicates a noisy, broadband sound, such as a distorted guitar.
|
6
|
+
- **Spectral Centroid:** The 'center of mass' of the spectrum. Intuitively, a lower centroid means a bassier, more muffled sound, while a higher centroid indicates bright, sharp, tinny audio.
|
7
|
+
- **Spectral Flatness:** Indicates how noise-like the audio is. High flatness (~1) indicates a flat, white-noise-like spectrum. A low value (~0) indicates a harmonic signal or pure tone.
|
8
|
+
- **Spectral Rolloff:** The frequency below which a given percentage of the total spectral energy is contained. Low rolloff means more energy is concentrated in lower frequencies, as with drums, bass, and male voices. High rolloff means significant energy in high frequencies, as with female voices and hissing sounds.
|
9
|
+
- **ZCR (Zero Crossing Rate):** Counts how often the signal changes sign, from positive to negative or vice versa. High ZCR suggests noisy, sharp, or high-pitched audio; low ZCR suggests smooth, steady, or low-pitched audio.
|
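The spectral terms in the glossary above can be made concrete with a small worked example. The following is a minimal plain-Ruby sketch, not the gem's implementation: it assumes a precomputed magnitude spectrum held in ordinary arrays rather than Numo arrays, and the names `centroid`, `bandwidth`, `rolloff`, and `flatness`, as well as the toy spectrum, are made up for illustration.

```ruby
# Toy magnitude spectrum: frequency bin centres (Hz) and magnitudes.
# These values are invented for illustration only.
FREQS = [0.0, 100.0, 200.0, 300.0, 400.0].freeze
MAGS  = [0.0, 1.0, 2.0, 1.0, 0.0].freeze

# Magnitude-weighted mean frequency (the "center of mass").
def centroid(freqs, mags)
  total = mags.sum
  return 0.0 if total.zero?

  freqs.zip(mags).sum { |f, m| f * m } / total
end

# Magnitude-weighted spread of the frequencies around the centroid.
def bandwidth(freqs, mags, power: 2)
  total = mags.sum
  return 0.0 if total.zero?

  c = centroid(freqs, mags)
  spread = freqs.zip(mags).sum { |f, m| m * ((f - c).abs**power) } / total
  spread**(1.0 / power)
end

# Lowest frequency at which the cumulative magnitude reaches
# `threshold` times the total magnitude.
def rolloff(freqs, mags, threshold: 0.85)
  total = mags.sum
  return 0.0 if total.zero?

  running = 0.0
  freqs.zip(mags).each do |f, m|
    running += m
    return f if running >= threshold * total
  end
  freqs.last
end

# Geometric mean divided by arithmetic mean of the power spectrum.
# `amin` floors each value so that log(0) never occurs.
def flatness(mags, amin: 1e-10, power: 2)
  powers = mags.map { |m| [m**power, amin].max }
  geometric = Math.exp(powers.sum { |p| Math.log(p) } / powers.size)
  geometric / (powers.sum / powers.size.to_f)
end

puts centroid(FREQS, MAGS)
puts bandwidth(FREQS, MAGS)
puts rolloff(FREQS, MAGS)
puts flatness([1.0, 1.0, 1.0, 1.0])
puts flatness([0.0, 0.0, 1.0, 0.0])
```

A perfectly flat spectrum gives a flatness of 1, while a single-peak spectrum gives a value very close to 0, which matches the glossary's description of noise-like versus tonal audio.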
data/README.md
CHANGED
@@ -52,11 +52,14 @@ gem install awaaz
|
|
52
52
|
```ruby
|
53
53
|
# To decode the audio file
|
54
54
|
samples, sample_rate = Awaaz.load("path/to/audio_file")
|
55
|
-
|
56
|
-
# To decode the audio file using specified decoder
|
57
|
-
samples, sample_rate = Awaaz.load("path/to/audio_file", decoder: :sox)
|
58
55
|
```
|
59
56
|
|
57
|
+
## Documentation
|
58
|
+
|
59
|
+
[Documentation](https://www.rubydoc.info/github/SadMadLad/awaaz)
|
60
|
+
|
61
|
+
Check out [this demo](https://github.com/SadMadLad/awaaz-demo) for a better idea of the gem's use cases.
|
62
|
+
|
60
63
|
## Development
|
61
64
|
|
62
65
|
After checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
|
data/TODOS.md
CHANGED
data/lib/awaaz/config.rb
CHANGED
@@ -76,6 +76,28 @@ module Awaaz
|
|
76
76
|
@available_decoders.nil? || @available_decoders.empty?
|
77
77
|
end
|
78
78
|
|
79
|
+
##
|
80
|
+
# Checks if there is at least one decoder capable of handling WAV files.
|
81
|
+
#
|
82
|
+
# Currently, `ffmpeg` and `sox` are considered capable of decoding WAV files.
|
83
|
+
#
|
84
|
+
# @return [Boolean] `true` if either `ffmpeg` or `sox` is available, otherwise `false`.
|
85
|
+
#
|
86
|
+
def decoders_for_wav?
|
87
|
+
ffmpeg? || sox?
|
88
|
+
end
|
89
|
+
|
90
|
+
##
|
91
|
+
# Checks if there are no decoders available for handling WAV files.
|
92
|
+
#
|
93
|
+
# This is the logical negation of {#decoders_for_wav?}.
|
94
|
+
#
|
95
|
+
# @return [Boolean] `true` if neither `ffmpeg` nor `sox` is available, otherwise `false`.
|
96
|
+
#
|
97
|
+
def no_decoders_for_wav?
|
98
|
+
!decoders_for_wav?
|
99
|
+
end
|
100
|
+
|
79
101
|
private
|
80
102
|
|
81
103
|
##
|
data/lib/awaaz/decoders/base_decoder.rb
CHANGED
@@ -41,9 +41,9 @@ module Awaaz
|
|
41
41
|
set_available_options
|
42
42
|
|
43
43
|
# @param filename [String] Path to the audio file to decode.
|
44
|
-
def initialize(filename, **)
|
44
|
+
def initialize(filename, **options)
|
45
45
|
@filename = filename
|
46
|
-
@options = Utils::SoundConfig.new(available_options, **)
|
46
|
+
@options = Utils::SoundConfig.new(available_options, **options)
|
47
47
|
end
|
48
48
|
|
49
49
|
# Loads audio data.
|
@@ -72,7 +72,7 @@ module Awaaz
|
|
72
72
|
# - number of channels
|
73
73
|
# - sample rate
|
74
74
|
def soundread
|
75
|
-
Utils::Soundread.new(@filename).read
|
75
|
+
Utils::Soundread.new(@filename, output_rate: sample_rate, sampling_option: resampling_option).read
|
76
76
|
end
|
77
77
|
|
78
78
|
# Processes the decoded audio samples by reshaping and optionally converting to mono.
|
@@ -83,7 +83,7 @@ module Awaaz
|
|
83
83
|
# @return [Array<(Numo::DFloat, Integer)>] Processed samples and the sample rate.
|
84
84
|
def process(input_samples, channels, sample_rate)
|
85
85
|
input_samples = input_samples.reshape(channels, input_samples.size / channels)
|
86
|
-
input_samples = input_samples.mean(0) if mono?
|
86
|
+
input_samples = input_samples.mean(0).reshape(1, input_samples.shape[1]) if mono?
|
87
87
|
|
88
88
|
[input_samples, sample_rate]
|
89
89
|
end
|
@@ -107,7 +107,7 @@ module Awaaz
|
|
107
107
|
# Delegates option accessors to the {Utils::SoundConfig} instance.
|
108
108
|
%i[
|
109
109
|
sample_rate num_channels decoder_option mono mono?
|
110
|
-
stereo? amplification_factor soundread?
|
110
|
+
stereo? amplification_factor soundread? resampling_option
|
111
111
|
].each do |option_key|
|
112
112
|
define_method(option_key) { @options.public_send(option_key) }
|
113
113
|
end
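The change to `process` above keeps the mono downmix two-dimensional (`[1, samples]`) instead of collapsing it to one dimension, presumably so that code reading `shape[1]` behaves the same for mono and stereo. The sketch below illustrates the idea with plain nested arrays in place of Numo arrays; `downmix_to_mono` is a hypothetical helper, not part of the gem.

```ruby
# Downmix a [channels][samples] nested array to mono while keeping a
# two-dimensional [1][samples] layout.
# `downmix_to_mono` is a hypothetical helper used only for illustration.
def downmix_to_mono(channels)
  count = channels.size.to_f
  # Average the samples that share a time index across all channels.
  mono = channels.transpose.map { |frame| frame.sum / count }
  [mono] # wrap so the result is still indexed as [channel][sample]
end

stereo = [[0.2, 0.4, -0.6], [0.0, 0.2, -0.2]]
mono = downmix_to_mono(stereo)

puts mono.size       # number of channels after the downmix
puts mono.first.size # number of samples per channel
```

Because the outer dimension survives, downstream code can keep indexing the result as `[channel][sample]` without a special case for mono input.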
|
data/lib/awaaz/decoders/decode.rb
CHANGED
@@ -25,7 +25,7 @@ module Awaaz
|
|
25
25
|
# @param filename [String] the path to the audio file
|
26
26
|
# @raise [ArgumentError] if the MIME type is not supported
|
27
27
|
# @return [Object] the result of decoding, as returned by the decoder class
|
28
|
-
def load(filename)
|
28
|
+
def load(filename, ...)
|
29
29
|
fm = FileMagic.new(FileMagic::MAGIC_MIME_TYPE)
|
30
30
|
mime_type = fm.file(filename)
|
31
31
|
|
@@ -35,7 +35,7 @@ module Awaaz
|
|
35
35
|
end
|
36
36
|
|
37
37
|
decoding_class = DECODER_MAP[mime_type]
|
38
|
-
decoding_class.load(filename)
|
38
|
+
decoding_class.load(filename, ...)
|
39
39
|
end
|
40
40
|
end
|
41
41
|
end
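The two changed lines above switch `load` to Ruby's argument forwarding syntax (`...`), so any options given after `filename` reach the decoder class unchanged. A minimal sketch of the pattern follows, assuming Ruby 3.0 or newer; `FakeDecoder` and `load_audio` are hypothetical stand-ins, not names from the gem.

```ruby
# Hypothetical stand-in for a decoder class; not part of awaaz.
class FakeDecoder
  def self.load(filename, **options)
    [filename, options]
  end
end

# Hypothetical dispatcher mirroring the shape of the change: everything
# after `filename` is forwarded untouched via `...`.
def load_audio(filename, ...)
  FakeDecoder.load(filename, ...)
end

result = load_audio("clip.wav", mono: true, sample_rate: 16_000)
p result
```

The forwarding method never names the options itself, so new decoder options do not require a signature change at the dispatch layer.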
|
data/lib/awaaz/features.rb
ADDED
@@ -0,0 +1,533 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
module Awaaz
|
4
|
+
# Audio Features
|
5
|
+
module Features
|
6
|
+
##
|
7
|
+
# Calculates the total number of frames for a given signal length, frame size, and hop length.
|
8
|
+
#
|
9
|
+
# @param signal_length [Integer] Number of samples in the signal.
|
10
|
+
# @param frame_size [Integer] Size of each analysis frame (in samples).
|
11
|
+
# @param hop_length [Integer] Step size between consecutive frames (in samples).
|
12
|
+
#
|
13
|
+
# @return [Integer] The total number of frames.
|
14
|
+
#
|
15
|
+
def total_frames(signal_length, frame_size, hop_length)
|
16
|
+
((signal_length - frame_size) / hop_length.to_f).ceil + 1
|
17
|
+
end
|
18
|
+
|
19
|
+
##
|
20
|
+
# Computes how many samples are needed to right-pad a signal so
|
21
|
+
# that its length perfectly fits the given frame and hop size.
|
22
|
+
#
|
23
|
+
# @param signal_length [Integer] Number of samples in the signal.
|
24
|
+
# @param frame_size [Integer] Size of each analysis frame (in samples).
|
25
|
+
# @param hop_length [Integer] Step size between consecutive frames (in samples).
|
26
|
+
#
|
27
|
+
# @return [Integer] Number of padding samples required.
|
28
|
+
#
|
29
|
+
def pad_amount(signal_length, frame_size, hop_length)
|
30
|
+
frames = total_frames(signal_length, frame_size, hop_length)
|
31
|
+
padded_length = ((frames - 1) * hop_length) + frame_size
|
32
|
+
padded_length - signal_length
|
33
|
+
end
|
34
|
+
|
35
|
+
##
|
36
|
+
# Pads an array with zeros (or a specified value) along a given axis.
|
37
|
+
#
|
38
|
+
# @param array [Numo::NArray] The input array (e.g., shape [channels, samples]).
|
39
|
+
# @param pad_count [Integer] Number of padding elements to add.
|
40
|
+
# @param axis [Integer] Axis along which to pad (default: 1 for time axis).
|
41
|
+
# @param with [Numeric] Value to pad with (default: 0).
|
42
|
+
#
|
43
|
+
# @return [Numo::NArray] The padded array.
|
44
|
+
#
|
45
|
+
def pad_right(array, pad_count, axis: 1, with: 0)
|
46
|
+
channels_count = array.shape.first
|
47
|
+
padded_array = Numo::SFloat.new(channels_count, pad_count).fill(with)
|
48
|
+
|
49
|
+
array.concatenate(padded_array, axis: axis)
|
50
|
+
end
|
51
|
+
|
52
|
+
##
|
53
|
+
# Builds a list of sample index ranges for each analysis frame.
|
54
|
+
#
|
55
|
+
# @param signal_length [Integer] Number of samples in the (possibly padded) signal.
|
56
|
+
# @param frame_size [Integer] Size of each frame (in samples).
|
57
|
+
# @param hop_length [Integer] Step size between consecutive frames (in samples).
|
58
|
+
#
|
59
|
+
# @return [Array<Range>] An array where each element is the sample index range for one frame.
|
60
|
+
#
|
61
|
+
def build_ranges(signal_length, frame_size, hop_length)
|
62
|
+
ranges = []
|
63
|
+
start = 0
|
64
|
+
while start + frame_size <= signal_length
|
65
|
+
ranges << (start...(start + frame_size))
|
66
|
+
start += hop_length
|
67
|
+
end
|
68
|
+
ranges
|
69
|
+
end
|
70
|
+
|
71
|
+
##
|
72
|
+
# Pads the signal (if necessary) and returns the padded array along with frame index ranges.
|
73
|
+
#
|
74
|
+
# @param array [Numo::NArray] A 2D array where shape is [channels, samples].
|
75
|
+
# @param frame_size [Integer] Size of each frame (in samples).
|
76
|
+
# @param hop_length [Integer] Step size between consecutive frames (in samples).
|
77
|
+
#
|
78
|
+
# @raise [ArgumentError] If hop length is less than 1.
|
79
|
+
#
|
80
|
+
# @return [Array<(Numo::NArray, Array<Range>)>]
|
81
|
+
# - padded signal array
|
82
|
+
# - array of frame index ranges
|
83
|
+
#
|
84
|
+
def frame_ranges(array, frame_size: 2048, hop_length: 512)
|
85
|
+
raise ArgumentError, "Hop Length can't be less than 1" if hop_length < 1
|
86
|
+
|
87
|
+
amount = pad_amount(array.shape[1], frame_size, hop_length)
|
88
|
+
array = pad_right(array, amount) if amount.positive?
|
89
|
+
|
90
|
+
[array, build_ranges(array.shape[1], frame_size, hop_length)]
|
91
|
+
end
|
92
|
+
|
93
|
+
##
|
94
|
+
# Calculates the RMS (Root Mean Square) energy for each frame in the given audio.
|
95
|
+
#
|
96
|
+
# @param samples [Numo::NArray] A 2D array of shape [channels, samples].
|
97
|
+
# @param frame_size [Integer] Size of each analysis frame (in samples).
|
98
|
+
# @param hop_length [Integer] Step size between consecutive frames (in samples).
|
99
|
+
#
|
100
|
+
# @return [Numo::SFloat] A 2D array of RMS values with shape [channels, frames].
|
101
|
+
#
|
102
|
+
def rms(samples, frame_size: 2048, hop_length: 512)
|
103
|
+
samples, frame_groups = frame_ranges(samples, frame_size: frame_size, hop_length: hop_length)
|
104
|
+
|
105
|
+
means = Numo::SFloat.zeros(samples.shape[0], frame_groups.length)
|
106
|
+
frame_groups.each_with_index do |frame_range, idx|
|
107
|
+
means[true, idx] = samples[true, frame_range].rms(axis: 1)
|
108
|
+
end
|
109
|
+
|
110
|
+
means
|
111
|
+
end
|
112
|
+
|
113
|
+
##
|
114
|
+
# Calculates the overall RMS for an entire signal without framing.
|
115
|
+
#
|
116
|
+
# @param samples [Numo::NArray] A 2D or 1D array of samples.
|
117
|
+
#
|
118
|
+
# @return [Float] RMS value for the entire signal.
|
119
|
+
#
|
120
|
+
def rms_overall(samples)
|
121
|
+
samples.rms
|
122
|
+
end
|
123
|
+
|
124
|
+
# Calculates the zero-crossing rate (ZCR) of an audio signal frame-by-frame.
|
125
|
+
#
|
126
|
+
# The zero-crossing rate is the proportion of consecutive samples in a frame
|
127
|
+
# where the signal changes sign (positive to negative or vice versa).
|
128
|
+
# It is often used as a simple feature in speech/music analysis.
|
129
|
+
#
|
130
|
+
# @param samples [Numo::NArray] 2D array of audio samples.
|
131
|
+
# Shape: [n_channels, n_samples].
|
132
|
+
# @param frame_size [Integer] Size of each analysis frame in samples. Default: 2048.
|
133
|
+
# @param hop_length [Integer] Step size between successive frames in samples. Default: 512.
|
134
|
+
# @return [Numo::SFloat] 2D array of zero-crossing rates per frame for each channel.
|
135
|
+
# Shape: [n_channels, n_frames].
|
136
|
+
#
|
137
|
+
# @example
|
138
|
+
# # Stereo signal: 2 channels, 44100 samples
|
139
|
+
# zcr_values = zcr(samples, frame_size: 2048, hop_length: 512)
|
140
|
+
# puts zcr_values.shape # => [2, n_frames]
|
141
|
+
#
|
142
|
+
def zcr(samples, frame_size: 2048, hop_length: 512)
|
143
|
+
framed_samples, frame_groups = frame_ranges(samples, frame_size: frame_size, hop_length: hop_length)
|
144
|
+
|
145
|
+
n_channels = framed_samples.shape[0]
|
146
|
+
zcrs = Numo::SFloat.zeros(n_channels, frame_groups.length)
|
147
|
+
|
148
|
+
frame_groups.each_with_index do |frame_range, idx|
|
149
|
+
zcrs[true, idx] = zcr_for_frame(framed_samples[true, frame_range], frame_size)
|
150
|
+
end
|
151
|
+
|
152
|
+
zcrs
|
153
|
+
end
|
154
|
+
|
155
|
+
# Calculates the zero-crossing rate for a single frame of audio.
|
156
|
+
#
|
157
|
+
# @param frame [Numo::NArray] 2D array containing audio samples for a single frame.
|
158
|
+
# Shape: [n_channels, frame_size].
|
159
|
+
# @param frame_size [Integer] Number of samples in the frame.
|
160
|
+
# @return [Numo::SFloat] 1D array of zero-crossing rates for each channel in the frame.
|
161
|
+
# Shape: [n_channels].
|
162
|
+
#
|
163
|
+
# @example
|
164
|
+
# frame = samples[true, 0...2048]
|
165
|
+
# single_frame_zcr = zcr_for_frame(frame, 2048)
|
166
|
+
# puts single_frame_zcr # => Numo::SFloat[0.15, 0.12]
|
167
|
+
def zcr_for_frame(frame, frame_size)
|
168
|
+
first_part = frame[true, 0...-1]
|
169
|
+
second_part = frame[true, 1..-1]
|
170
|
+
products = first_part * second_part
|
171
|
+
|
172
|
+
sign_changes = products < 0
|
173
|
+
counts = sign_changes.count_true(axis: 1)
|
174
|
+
|
175
|
+
counts / frame_size.to_f
|
176
|
+
end
|
177
|
+
|
178
|
+
# Calculates the overall zero-crossing rate (ZCR) of an entire audio signal.
|
179
|
+
#
|
180
|
+
# @param samples [Numo::NArray] 2D array of audio samples.
|
181
|
+
# Shape: [n_channels, n_samples].
|
182
|
+
# @return [Numo::SFloat] 1D array containing the overall ZCR for each channel.
|
183
|
+
# Shape: [n_channels].
|
184
|
+
#
|
185
|
+
# @example
|
186
|
+
# # Stereo signal: 2 channels, 44100 samples
|
187
|
+
# overall_zcr = zcr_overall(samples)
|
188
|
+
# puts overall_zcr.shape # => [2]
|
189
|
+
#
|
190
|
+
#
|
191
|
+
def zcr_overall(samples)
|
192
|
+
((samples[true, 0...-1] * samples[true, 1..-1]) < 0).count_true(axis: 1) / samples.shape[1].to_f
|
193
|
+
end
|
194
|
+
|
195
|
+
# Generates a Hann window of given frame size.
|
196
|
+
#
|
197
|
+
# A Hann window is commonly used in spectral analysis
|
198
|
+
# to reduce spectral leakage before applying an FFT.
|
199
|
+
#
|
200
|
+
# @param frame_size [Integer] the size of the frame (number of samples per window)
|
201
|
+
# @return [Numo::DFloat] the Hann window of length `frame_size`
|
202
|
+
def hann_window(frame_size)
|
203
|
+
idx = Numo::DFloat.new(frame_size).seq
|
204
|
+
0.5 * (1 - Numo::NMath.cos(2 * Math::PI * idx / (frame_size - 1)))
|
205
|
+
end
|
206
|
+
|
207
|
+
# Prepares audio samples and parameters for FFT-based feature extraction.
|
208
|
+
#
|
209
|
+
# @param samples [Numo::NArray]
|
210
|
+
# Multichannel audio samples as a 2D array
|
211
|
+
# (shape: [channels, samples]).
|
212
|
+
# @param frame_size [Integer]
|
213
|
+
# Number of samples per frame (FFT window length).
|
214
|
+
# @param hop_length [Integer]
|
215
|
+
# Number of samples to shift between consecutive frames.
|
216
|
+
#
|
217
|
+
# @return [Array]
|
218
|
+
# A tuple containing:
|
219
|
+
# - samples [Numo::NArray] : Windowed audio samples aligned to frames
|
220
|
+
# - ranges [Array<Range>] : Frame index ranges for iteration
|
221
|
+
# - window [Numo::DFloat] : Hann window for FFT
|
222
|
+
# - channels_count [Integer] : Number of audio channels
|
223
|
+
# - freqs_size [Integer] : Number of FFT frequency bins per frame
|
224
|
+
#
|
225
|
+
# @example
|
226
|
+
# samples, ranges, window, channels_count, freqs_size =
|
227
|
+
# prepare_for_fft(audio, frame_size: 2048, hop_length: 512)
|
228
|
+
#
|
229
|
+
def prepare_for_fft(samples, frame_size:, hop_length:)
|
230
|
+
samples, ranges = frame_ranges(samples, frame_size: frame_size, hop_length: hop_length)
|
231
|
+
window = hann_window(frame_size)
|
232
|
+
channels_count = samples.shape[0]
|
233
|
+
freqs_size = (frame_size / 2) + 1
|
234
|
+
|
235
|
+
[samples, ranges, window, channels_count, freqs_size]
|
236
|
+
end
|
237
|
+
|
238
|
+
# Computes the Short-Time Fourier Transform (STFT) of a multi-channel signal.
|
239
|
+
#
|
240
|
+
# This method applies a sliding Hann window to the input signal, computes
|
241
|
+
# the FFT for each frame and each channel, and stores the positive frequency
|
242
|
+
# bins into a 3D complex-valued matrix.
|
243
|
+
#
|
244
|
+
# The resulting STFT matrix has dimensions:
|
245
|
+
# `[channels, frequencies, frames]`
|
246
|
+
#
|
247
|
+
# @param samples [Numo::NArray] a 2D array of shape [channels, samples]
|
248
|
+
# containing the audio data.
|
249
|
+
# @param frame_size [Integer] the size of each FFT frame (default: 2048)
|
250
|
+
# @param hop_length [Integer] the number of samples between successive frames (default: 512)
|
251
|
+
# @return [Numo::DComplex] a 3D array of shape
|
252
|
+
# `[channels, (frame_size / 2 + 1), frames]` containing the complex STFT values
|
253
|
+
#
|
254
|
+
# @example Compute STFT for mono audio
|
255
|
+
# samples = Numo::DFloat[[0.0, 1.0, 0.0, -1.0, ...]] # shape: [1, num_samples]
|
256
|
+
# stft_matrix = stft(samples, frame_size: 1024, hop_length: 256)
|
257
|
+
#
|
258
|
+
def stft(samples, frame_size: 2048, hop_length: 512)
|
259
|
+
samples, ranges, window, channels_count, freqs_size = prepare_for_fft(samples, frame_size: frame_size,
|
260
|
+
hop_length: hop_length)
|
261
|
+
stft_matrix = Numo::DComplex.zeros(channels_count, freqs_size, ranges.size)
|
262
|
+
|
263
|
+
ranges.each_with_index do |range, frame_idx|
|
264
|
+
channels_count.times do |ch|
|
265
|
+
fft_result = Numo::Pocketfft.fft(samples[ch, range] * window)
|
266
|
+
stft_matrix[ch, true, frame_idx] = fft_result[0...freqs_size]
|
267
|
+
end
|
268
|
+
end
|
269
|
+
|
270
|
+
stft_matrix
|
271
|
+
end
|
272
|
+
|
273
|
+
##
|
274
|
+
# Computes the FFT (Fast Fourier Transform) of each channel
|
275
|
+
# in a multi-channel signal using a Hann window.
|
276
|
+
#
|
277
|
+
# @param samples [Numo::NArray] A 2D array of shape [channels, samples]
|
278
|
+
# containing the audio data.
|
279
|
+
#
|
280
|
+
# @return [Numo::DComplex] A 2D complex array of shape
|
281
|
+
# `[channels, samples]` containing the FFT result for each channel.
|
282
|
+
#
|
283
|
+
def fft(samples)
|
284
|
+
window = hann_window(samples.shape[1])
|
285
|
+
channels_count = samples.shape[0]
|
286
|
+
fft_results = channels_count.times.map do |ch|
|
287
|
+
Numo::Pocketfft.fft(samples[ch, true] * window)
|
288
|
+
end
|
289
|
+
Numo::DComplex[*fft_results]
|
290
|
+
end
|
291
|
+
|
292
|
+
##
|
293
|
+
# Computes the frequency bin centers for an FFT.
|
294
|
+
#
|
295
|
+
# @param frame_size [Integer] The size of the FFT frame (in samples).
|
296
|
+
# @param sample_rate [Integer] The sampling rate of the audio (Hz).
|
297
|
+
#
|
298
|
+
# @return [Numo::DFloat] 1D array of frequency values (Hz)
|
299
|
+
# corresponding to FFT bins. Shape: `[frame_size/2 + 1]`.
|
300
|
+
#
|
301
|
+
def frequency_bins(frame_size, sample_rate)
|
302
|
+
Numo::DFloat.new((frame_size / 2) + 1).seq * (sample_rate.to_f / frame_size)
|
303
|
+
end
|
304
|
+
|
305
|
+
##
|
306
|
+
# Computes the magnitude spectrum of a single frame using an FFT.
|
307
|
+
#
|
308
|
+
# @param frame [Numo::NArray] 1D array of audio samples for a single frame.
|
309
|
+
#
|
310
|
+
# @return [Numo::DFloat] 1D array of magnitude values for each FFT bin.
|
311
|
+
#
|
312
|
+
def frame_magnitude(frame)
|
313
|
+
Numo::Pocketfft.rfft(frame).abs
|
314
|
+
end
|
315
|
+
|
316
|
+
##
|
317
|
+
# Computes the spectral centroid of a single frame.
|
318
|
+
#
|
319
|
+
# The spectral centroid is the "center of mass" of the spectrum
|
320
|
+
# and is often associated with the perceived brightness of a sound.
|
321
|
+
#
|
322
|
+
# @param freqs [Numo::DFloat] 1D array of frequency bin centers.
|
323
|
+
# @param magnitude [Numo::DFloat] 1D array of magnitude values
|
324
|
+
# corresponding to each frequency bin.
|
325
|
+
#
|
326
|
+
# @return [Float] The spectral centroid in Hz for the given frame.
|
327
|
+
#
|
328
|
+
def compute_centroid(freqs, magnitude)
|
329
|
+
mag_sum = magnitude.sum
|
330
|
+
return 0 if mag_sum.zero?
|
331
|
+
|
332
|
+
(freqs * magnitude).sum / mag_sum
|
333
|
+
end
|
334
|
+
|
335
|
+
##
|
336
|
+
# Computes the spectral centroid trajectory of an audio signal.
|
337
|
+
#
|
338
|
+
# This method frames the signal, applies a Hann window,
|
339
|
+
# computes the FFT magnitudes, and calculates the centroid
|
340
|
+
# for each frame. The result is a time series of centroids.
|
341
|
+
#
|
342
|
+
# @param samples [Numo::NArray] A 2D array of shape [channels, samples].
|
343
|
+
# @param frame_size [Integer] Size of each analysis frame (default: 2048).
|
344
|
+
# @param hop_length [Integer] Step size between frames in samples (default: 512).
|
345
|
+
# @param sample_rate [Integer] Sampling rate of the audio in Hz (default: 22050).
|
346
|
+
#
|
347
|
+
# @return [Numo::DFloat] 2D array of spectral centroids with shape
|
348
|
+
# `[channels, n_frames]`.
|
349
|
+
#
|
350
|
+
# @example
|
351
|
+
# centroids = spectral_centroids(samples, frame_size: 1024, hop_length: 256, sample_rate: 44100)
|
352
|
+
# puts centroids.shape # => [channels, n_frames]
|
353
|
+
#
|
354
|
+
def spectral_centroids(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050)
|
355
|
+
samples, ranges, window, channels_count = prepare_for_fft(samples, frame_size: frame_size, hop_length: hop_length)
|
356
|
+
freqs = frequency_bins(frame_size, sample_rate)
|
357
|
+
centroid_matrix = Numo::DFloat.zeros(channels_count, ranges.size)
|
358
|
+
|
359
|
+
ranges.each_with_index do |range, frame_idx|
|
360
|
+
channels_count.times do |ch|
|
361
|
+
frame = samples[ch, range] * window
|
362
|
+
magnitude = frame_magnitude(frame)
|
363
|
+
centroid_matrix[ch, frame_idx] = compute_centroid(freqs, magnitude)
|
364
|
+
end
|
365
|
+
end
|
366
|
+
|
367
|
+
centroid_matrix
|
368
|
+
end
|
369
|
+
|
370
|
+
# Computes the bandwidth for a single frame.
|
371
|
+
#
|
372
|
+
# @param freqs [Numo::DFloat] Frequency bins (Hz)
|
373
|
+
# @param magnitude [Numo::DFloat] Magnitude spectrum for the frame
|
374
|
+
# @param centroid [Float] Spectral centroid for the frame (Hz)
|
375
|
+
# @param power [Integer] Power/exponent used for bandwidth calculation (commonly 2)
|
376
|
+
# @return [Float] Spectral bandwidth for the frame
|
377
|
+
def compute_bandwidth(freqs, magnitude, centroid, power)
|
378
|
+
mag_sum = magnitude.sum
|
379
|
+
return 0 if mag_sum.zero?
|
380
|
+
|
381
|
+
diff = (freqs - centroid).abs**power
|
382
|
+
value = (magnitude * diff).sum / mag_sum
|
383
|
+
value**(1.0 / power)
|
384
|
+
end
|
385
|
+
|
386
|
+
# Computes the spectral bandwidth over time for a signal.
|
387
|
+
#
|
388
|
+
# @param samples [Numo::DFloat] Input samples (channels x samples)
|
389
|
+
# @param frame_size [Integer] FFT window size (default: 2048)
|
390
|
+
# @param hop_length [Integer] Step size between frames (default: 512)
|
391
|
+
# @param sample_rate [Integer] Sampling rate of the audio signal (default: 22050 Hz)
|
392
|
+
# @param power [Integer] Exponent for bandwidth calculation (default: 2)
|
393
|
+
# @return [Numo::DFloat] Spectral bandwidth matrix (channels x frames)
|
394
|
+
def spectral_bandwidth(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050, power: 2)
|
395
|
+
samples, ranges, window, channels_count = prepare_for_fft(samples, frame_size: frame_size, hop_length: hop_length)
|
396
|
+
freqs = frequency_bins(frame_size, sample_rate)
|
397
|
+
bandwidth_matrix = Numo::DFloat.zeros(channels_count, ranges.size)
|
398
|
+
|
399
|
+
ranges.each_with_index do |range, frame_idx|
|
400
|
+
channels_count.times do |ch|
|
401
|
+
magnitude = frame_magnitude(samples[ch, range] * window)
|
402
|
+
centroid = compute_centroid(freqs, magnitude)
|
403
|
+
bandwidth_matrix[ch, frame_idx] = compute_bandwidth(freqs, magnitude, centroid, power)
|
404
|
+
end
|
405
|
+
end
|
406
|
+
|
407
|
+
bandwidth_matrix
|
408
|
+
end
|
409
|
+
|
410
|
+
# Computes the spectral rolloff for a single frame.
|
411
|
+
#
|
412
|
+
# @param spectrum [Numo::DFloat] Magnitude spectrum for the frame
|
413
|
+
# @param freqs [Numo::DFloat] Frequency bins (Hz)
|
414
|
+
# @param threshold [Float] Proportion of spectral energy to retain (e.g. 0.85)
|
415
|
+
# @return [Float] Roll-off frequency (Hz) for the frame
|
416
|
+
def rolloff_for_frame(spectrum, freqs, threshold)
|
417
|
+
total_energy = spectrum.sum
|
418
|
+
return 0.0 if total_energy.zero?
|
419
|
+
|
420
|
+
cumsum = spectrum.cumsum
|
421
|
+
threshold_energy = threshold * total_energy
|
422
|
+
|
423
|
+
rolloff_bin = cumsum.ge(threshold_energy).where[0]
|
424
|
+
rolloff_bin ||= freqs.size - 1
|
425
|
+
|
426
|
+
freqs[rolloff_bin]
|
427
|
+
end
|
428
|
+
|
429
|
+
# Computes the spectral rolloff over time for a signal.
|
430
|
+
#
|
431
|
+
# Spectral rolloff is the frequency below which a fixed percentage
|
432
|
+
# (threshold) of the total spectral energy is contained.
|
433
|
+
#
|
434
|
+
# @param samples [Numo::DFloat] Input samples (channels x samples)
|
435
|
+
# @param frame_size [Integer] FFT window size (default: 2048)
|
436
|
+
# @param hop_length [Integer] Step size between frames (default: 512)
|
437
|
+
# @param sample_rate [Integer] Sampling rate of the audio signal (default: 22050 Hz)
|
438
|
+
# @param threshold [Float] Proportion of spectral energy to retain (default: 0.85)
|
439
|
+
# @return [Numo::DFloat] Spectral rolloff matrix (channels x frames)
|
440
|
+
def spectral_rolloff(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050, threshold: 0.85)
|
441
|
+
stft_matrix = stft(samples, frame_size: frame_size, hop_length: hop_length).abs
|
442
|
+
channels, _freqs_size, frames_size = stft_matrix.shape
|
443
|
+
freqs = frequency_bins(frame_size, sample_rate)
|
444
|
+
|
445
|
+
rolloff_matrix = Numo::DFloat.zeros(channels, frames_size)
|
446
|
+
|
447
|
+
frames_size.times do |frame_idx|
|
448
|
+
channels.times do |ch|
|
449
|
+
rolloff_matrix[ch, frame_idx] = rolloff_for_frame(
|
450
|
+
stft_matrix[ch, true, frame_idx], freqs, threshold
|
451
|
+
)
|
452
|
+
end
|
453
|
+
end
|
454
|
+
|
455
|
+
rolloff_matrix
|
456
|
+
end
|
457
|
+
|
458
|
+
# Convert frame indices to time in seconds.
|
459
|
+
#
|
460
|
+
# This method maps analysis frame indices (or total frame count) into
|
461
|
+
# corresponding time positions in seconds, similar to `librosa.frames_to_time`.
|
462
|
+
#
|
463
|
+
# @param frames [Integer, Numo::NArray] Either a single frame index,
|
464
|
+
# or a Numo array of shape (n_channels, n_frames) from which the total
|
465
|
+
# number of frames is inferred.
|
466
|
+
# @param hop_length [Integer] Number of audio samples between adjacent frames.
|
467
|
+
# Defaults to 512.
|
468
|
+
# @param sample_rate [Integer] Sampling rate of the audio signal in Hz.
|
469
|
+
# Defaults to 22,050 Hz.
|
470
|
+
#
|
471
|
+
# @return [Numo::DFloat] A 1-D Numo array of times (in seconds) corresponding
|
472
|
+
# to each frame index. If `frames` is an Integer, the return value spans
|
473
|
+
# from frame 0 up to `frames - 1`. If `frames` is a Numo array, the return
|
474
|
+
# value spans the number of frames inferred from `frames.shape[1]`.
|
475
|
+
#
|
476
|
+
# @example Using total frame count
|
477
|
+
# frames_to_time(100, hop_length: 512, sample_rate: 22050)
|
478
|
+
# # => Numo::DFloat[0.0, 0.0232, ..., 2.2988]
|
479
|
+
#
|
480
|
+
# @example Using a spectrogram matrix
|
481
|
+
# samples = Numo::DFloat.new(2, 500) # 2 channels, 500 frames
|
482
|
+
# frames_to_time(samples, hop_length: 512, sample_rate: 22050)
|
483
|
+
# # => Numo::DFloat[0.0, 0.0232, ..., 11.5868]
|
484
|
+
#
|
485
|
+
def frames_to_time(frames, hop_length: 512, sample_rate: 22_050)
|
486
|
+
frames_size = frames.is_a?(Integer) ? frames : frames.shape[1]
|
487
|
+
Numo::DFloat[0...frames_size] * hop_length / sample_rate.to_f
|
488
|
+
end
|
489
|
+
|
490
|
+
##
|
491
|
+
# Computes the spectral flatness of an audio signal.
|
492
|
+
#
|
493
|
+
# Spectral flatness measures how noise-like a signal is, as opposed to being tone-like.
|
494
|
+
# A value closer to 1.0 indicates the spectrum is flat (similar to white noise),
|
495
|
+
# while values closer to 0.0 indicate a peaky spectrum (like a sine wave or harmonic-rich signal).
|
496
|
+
#
|
497
|
+
# @param samples [Numo::NArray]
|
498
|
+
# The input audio samples as a 2D array of shape [channels, samples].
|
499
|
+
#
|
500
|
+
# @param frame_size [Integer] (2048)
|
501
|
+
# The size of each FFT window (frame). Larger sizes give better frequency
|
502
|
+
# resolution but worse time resolution.
|
503
|
+
#
|
504
|
+
# @param hop_length [Integer] (512)
|
505
|
+
# The number of samples to shift between consecutive FFT frames. Smaller values
|
506
|
+
# provide more overlap and smoother results.
|
507
|
+
#
|
508
|
+
# @param amin [Float] (1e-10)
|
509
|
+
# A small constant added for numerical stability, preventing log(0) or division by zero.
|
510
|
+
#
|
511
|
+
# @param power [Integer] (2)
|
512
|
+
# The power to which the magnitude spectrum is raised. Typically 2 to work with
|
513
|
+
# power spectrograms.
|
514
|
+
#
|
515
|
+
# @return [Numo::DFloat]
|
516
|
+
# A 1D Numo::DFloat array containing the spectral flatness values for each frame.
|
517
|
+
#
|
518
|
+
# @example Compute spectral flatness for an audio clip
|
519
|
+
# samples = Awaaz::Utils::Soundread.new("audio.wav").read
|
520
|
+
# flatness = spectral_flatness(samples, frame_size: 1024, hop_length: 256)
|
521
|
+
# puts flatness.shape
|
522
|
+
#
|
523
|
+
def spectral_flatness(samples, frame_size: 2048, hop_length: 512, amin: 1e-10, power: 2)
|
524
|
+
stft_matrix = stft(samples, frame_size: frame_size, hop_length: hop_length).abs
|
525
|
+
stft_matrix = Numo::DFloat.maximum(amin, stft_matrix**power)
|
526
|
+
|
527
|
+
gms = Numo::DFloat::Math.exp Numo::DFloat::Math.log(stft_matrix).mean(axis: -2)
|
528
|
+
ams = stft_matrix.mean(axis: -2)
|
529
|
+
|
530
|
+
gms / ams
|
531
|
+
end
|
532
|
+
end
|
533
|
+
end
|
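The two helpers that close features.rb reduce to short formulas: a frame's time is `index * hop_length / sample_rate`, and spectral flatness is the geometric mean of the power spectrum divided by its arithmetic mean. The following is a minimal plain-Ruby sketch of both for a single frame, assuming the power spectrum is already available as an Array. The names `frame_times` and `flatness` are illustrative and are not part of awaaz, which works on Numo arrays.

```ruby
# Plain-Ruby sketch (no Numo); method names are hypothetical, not awaaz API.

# Time in seconds at the start of each frame: index * hop_length / sample_rate.
def frame_times(frame_count, hop_length: 512, sample_rate: 22_050)
  Array.new(frame_count) { |i| i * hop_length / sample_rate.to_f }
end

# Spectral flatness of one frame: geometric mean / arithmetic mean of the
# power spectrum, with each bin floored at amin so log(0) never occurs.
def flatness(power_spectrum, amin: 1e-10)
  bins = power_spectrum.map { |p| [p.to_f, amin].max }
  geometric_mean = Math.exp(bins.sum { |p| Math.log(p) } / bins.size)
  arithmetic_mean = bins.sum / bins.size
  geometric_mean / arithmetic_mean
end

flat  = flatness([1.0, 1.0, 1.0, 1.0]) # equal energy in every bin
peaky = flatness([1.0, 0.0, 0.0, 0.0]) # all energy in a single bin
times = frame_times(3)
```

A flat spectrum gives a ratio of 1.0, while a spectrum with all its energy in one bin gives a value close to 0.0, which matches the description in the doc comment.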
data/lib/awaaz/properties.rb
ADDED
@@ -0,0 +1,37 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
# Awaaz gem
|
4
|
+
module Awaaz
|
5
|
+
# Properties of audio
|
6
|
+
module Properties
|
7
|
+
# Calculates the duration (in seconds) of an audio signal given the number of samples and the sample rate.
|
8
|
+
#
|
9
|
+
# @param samples [Numo::NArray, #shape]
|
10
|
+
# The audio samples: a Numo::NArray or any other object
|
11
|
+
# that responds to `.shape` and returns a size array.
|
12
|
+
#
|
13
|
+
# @param sample_rate [Integer, Float]
|
14
|
+
# The sampling rate (in Hz) of the audio signal.
|
15
|
+
#
|
16
|
+
# @return [Float]
|
17
|
+
# The duration of the audio signal in seconds. Returns `0.0` if either
|
18
|
+
# the number of samples or the sample rate is non-positive.
|
19
|
+
#
|
20
|
+
# @example
|
21
|
+
# samples = Numo::DFloat.new(44100) # 1 second of audio at 44.1 kHz
|
22
|
+
# Awaaz.duration(samples, 44100)
|
23
|
+
# # => 1.0
|
24
|
+
#
|
25
|
+
# @note
|
26
|
+
# The duration is computed as:
|
27
|
+
# samples_count / sample_rate
|
28
|
+
#
|
29
|
+
# @see https://en.wikipedia.org/wiki/Sampling_(signal_processing)
|
30
|
+
def duration(samples, sample_rate)
|
31
|
+
samples_count = samples.shape.max
|
32
|
+
return 0.0 if samples_count <= 0 || sample_rate <= 0
|
33
|
+
|
34
|
+
samples_count / sample_rate.to_f
|
35
|
+
end
|
36
|
+
end
|
37
|
+
end
|
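The `duration` method above takes the largest dimension of `samples.shape` as the sample count, so it gives the same answer for a 1-D array and for a `[channels, frames]` matrix. A minimal sketch of that behaviour follows, using a hypothetical `FakeSamples` struct in place of a Numo array so that it runs without any gems.

```ruby
# Stand-in for any object exposing `.shape`, as Numo arrays do (hypothetical).
FakeSamples = Struct.new(:shape)

# Same logic as the method in the diff, restated for the sketch.
def duration(samples, sample_rate)
  samples_count = samples.shape.max
  return 0.0 if samples_count <= 0 || sample_rate <= 0

  samples_count / sample_rate.to_f
end

stereo_seconds = duration(FakeSamples.new([2, 44_100]), 44_100) # [channels, frames]
mono_seconds   = duration(FakeSamples.new([22_050]), 44_100)
empty_seconds  = duration(FakeSamples.new([0]), 44_100)
```

Taking the maximum dimension assumes there are more frames than channels, which holds for any realistic clip but not for a buffer shorter than its channel count.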
data/lib/awaaz/utils/resample.rb
CHANGED
@@ -6,7 +6,7 @@ module Awaaz
|
|
6
6
|
# Resample utilities for audio data represented as Numo::NArray.
|
7
7
|
# Wraps the `libsamplerate` bindings provided by {Extensions::Samplerate}.
|
8
8
|
#
|
9
|
-
# @note This module is intended for internal use, but `
|
9
|
+
# @note This module is intended for internal use, but `read_and_resample`
|
10
10
|
# is public for advanced users who need manual resampling.
|
11
11
|
module Resample
|
12
12
|
class << self
|
@@ -31,17 +31,19 @@ module Awaaz
|
|
31
31
|
#
|
32
32
|
# @example Resample 44.1kHz mono audio to 48kHz
|
33
33
|
# samples = Numo::SFloat.new(44100).rand
|
34
|
-
# new_samples = Awaaz::Utils::Resample.
|
35
|
-
def
|
36
|
-
|
34
|
+
# new_samples = Awaaz::Utils::Resample.read_and_resample(samples, 44100, 48000, 1)
|
35
|
+
def read_and_resample(input_samples, input_rate, output_rate, channels, sampling_option: :sinc_fastest)
|
36
|
+
return input_samples if input_rate == output_rate
|
37
|
+
|
38
|
+
validate_inputs(input_samples)
|
37
39
|
|
38
40
|
ratio = calculate_ratio(input_rate, output_rate)
|
39
|
-
input_ptr, output_ptr, input_frames, output_frames = prepare_memory(input_samples, ratio)
|
41
|
+
input_ptr, output_ptr, input_frames, output_frames = prepare_memory(input_samples, ratio, channels)
|
40
42
|
|
41
43
|
data = build_src_data(input_ptr, output_ptr, input_frames, output_frames, ratio)
|
42
|
-
perform_resampling(data, sampling_option)
|
44
|
+
perform_resampling(data, sampling_option, channels)
|
43
45
|
|
44
|
-
convert_to_numo(output_ptr, data[:output_frames_gen])
|
46
|
+
convert_to_numo(output_ptr, data[:output_frames_gen] * channels)
|
45
47
|
end
|
46
48
|
|
47
49
|
private
|
@@ -50,14 +52,12 @@ module Awaaz
|
|
50
52
|
# Validates that the provided inputs are of the correct type and configuration.
|
51
53
|
#
|
52
54
|
# @param samples [Numo::NArray] The input samples.
|
53
|
-
# @param input_rate [Integer]
|
54
|
-
# @param output_rate [Integer]
|
55
55
|
#
|
56
56
|
# @raise [ArgumentError] If samples are not a Numo::SFloat array.
|
57
|
-
def validate_inputs(samples
|
58
|
-
return if
|
57
|
+
def validate_inputs(samples)
|
58
|
+
return if samples.is_a?(Numo::NArray)
|
59
59
|
|
60
|
-
raise ArgumentError, "Input must be a Numo::SFloat array"
|
60
|
+
raise ArgumentError, "Input must be a Numo::SFloat array"
|
61
61
|
end
|
62
62
|
|
63
63
|
##
|
@@ -82,14 +82,14 @@ module Awaaz
|
|
82
82
|
# @param ratio [Float] The resampling ratio.
|
83
83
|
#
|
84
84
|
# @return [Array<FFI::MemoryPointer, FFI::MemoryPointer, Integer, Integer>]
|
85
|
-
def prepare_memory(input_samples, ratio)
|
86
|
-
input_frames = input_samples.size
|
85
|
+
def prepare_memory(input_samples, ratio, channels)
|
86
|
+
input_frames = input_samples.size / channels
|
87
87
|
output_frames = (input_frames * ratio).to_i
|
88
88
|
|
89
|
-
input_ptr = FFI::MemoryPointer.new(:float,
|
89
|
+
input_ptr = FFI::MemoryPointer.new(:float, input_samples.size)
|
90
90
|
input_ptr.write_bytes(input_samples.to_string)
|
91
91
|
|
92
|
-
output_ptr = FFI::MemoryPointer.new(:float, output_frames)
|
92
|
+
output_ptr = FFI::MemoryPointer.new(:float, output_frames * channels)
|
93
93
|
|
94
94
|
[input_ptr, output_ptr, input_frames, output_frames]
|
95
95
|
end
|
@@ -122,8 +122,9 @@ module Awaaz
|
|
122
122
|
# @param sampling_option [Symbol, Integer]
|
123
123
|
#
|
124
124
|
# @raise [Awaaz::ResampleError] If resampling fails.
|
125
|
-
def perform_resampling(data, sampling_option)
|
126
|
-
err = Extensions::Samplerate.src_simple(data, Extensions::Samplerate.resample_option(sampling_option),
|
125
|
+
def perform_resampling(data, sampling_option, channels)
|
126
|
+
err = Extensions::Samplerate.src_simple(data, Extensions::Samplerate.resample_option(sampling_option),
|
127
|
+
channels)
|
127
128
|
raise Awaaz::ResampleError, "Resampling failed: #{Extensions::Samplerate.src_strerror(err)}" if err != 0
|
128
129
|
end
|
129
130
|
|
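The resample.rb changes are mostly buffer arithmetic: with interleaved multi-channel audio, a frame is one sample per channel, so frame counts are the float counts divided by `channels`, and the output buffer needs `output_frames * channels` floats. Here is a minimal sketch of that sizing, assuming the ratio is output rate over input rate (`calculate_ratio` is not shown in this diff). `resample_sizes` is a hypothetical helper, not part of awaaz.

```ruby
# Hypothetical helper: sizes for an interleaved buffer handed to a resampler.
def resample_sizes(total_samples, input_rate, output_rate, channels)
  ratio = output_rate / input_rate.to_f       # assumed: output over input
  input_frames = total_samples / channels     # one frame = one sample per channel
  output_frames = (input_frames * ratio).to_i # truncated, as in prepare_memory
  {
    ratio: ratio,
    input_frames: input_frames,
    output_frames: output_frames,
    input_floats: total_samples,              # floats allocated for input
    output_floats: output_frames * channels   # floats allocated for output
  }
end

# One second of stereo audio at 44.1 kHz, resampled to 22.05 kHz.
sizes = resample_sizes(88_200, 44_100, 22_050, 2)
```

In 0.1.0 the frame count was taken as the full array size, which is only correct for mono input. Dividing by `channels` is what makes the stereo case work.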
data/lib/awaaz/utils/sound_config.rb
CHANGED
@@ -49,7 +49,16 @@ module Awaaz
|
|
49
49
|
# @return [Boolean] +true+ if mono, otherwise +false+.
|
50
50
|
#
|
51
51
|
def mono
|
52
|
-
from_options(:mono) ||
|
52
|
+
from_options(:mono) || true
|
53
|
+
end
|
54
|
+
|
55
|
+
##
|
56
|
+
# Resampling option
|
57
|
+
#
|
58
|
+
# @return [Symbol] default :linear
|
59
|
+
#
|
60
|
+
def resampling_option
|
61
|
+
from_options(:resampling_option) || :linear
|
53
62
|
end
|
54
63
|
|
55
64
|
##
|
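One point to check in the `mono` default above: `value || true` cannot return `false`, because `false || true` evaluates to `true`. Whether this matters depends on what `from_options` returns, which this diff does not show. The sketch below assumes it returns the raw stored value and uses a plain Hash as a hypothetical stand-in.

```ruby
# Hypothetical options hash standing in for whatever `from_options` reads.
options = { mono: false }

with_or    = options[:mono] || true     # false || true evaluates to true
with_fetch = options.fetch(:mono, true) # keeps an explicit false
missing    = {}.fetch(:mono, true)      # default applies only when the key is absent
```

`Hash#fetch` with a default separates "not set" from "set to false", which `||` cannot do for boolean options.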
data/lib/awaaz/utils/soundread.rb
CHANGED
@@ -3,166 +3,142 @@
|
|
3
3
|
module Awaaz
|
4
4
|
module Utils
|
5
5
|
##
|
6
|
-
# A
|
6
|
+
# A helper that mimics librosa.load using libsndfile via FFI.
|
7
7
|
#
|
8
|
-
#
|
9
|
-
#
|
8
|
+
# - Always returns Float32 samples normalized in [-1.0, 1.0]
|
9
|
+
# - Preserves channel structure (returns shape `[channels, frames]`)
|
10
|
+
# - Returns `[data, channels, sr]` where:
|
11
|
+
# * `data` = Numo::SFloat array (2D, shape: channels x frames)
|
12
|
+
# * `channels` = Integer number of channels
|
13
|
+
# * `sr` = sample rate (Integer)
|
10
14
|
#
|
11
|
-
# @example
|
12
|
-
# reader = Awaaz::Utils::Soundread.new("audio.wav"
|
13
|
-
#
|
14
|
-
#
|
15
|
-
# @note Currently, only `.wav` files are supported.
|
15
|
+
# @example
|
16
|
+
# reader = Awaaz::Utils::Soundread.new("audio.wav")
|
17
|
+
# data, channels, sr = reader.read
|
16
18
|
#
|
17
19
|
class Soundread
|
18
20
|
##
|
19
|
-
#
|
20
|
-
#
|
21
|
-
# @return [Array<String>] List of supported file extensions.
|
22
|
-
#
|
23
|
-
SUPPORTED_EXTENSIONS = %w[.wav].freeze
|
24
|
-
|
25
|
-
##
|
26
|
-
# Creates a new Soundread instance.
|
21
|
+
# Initializes a Soundread instance.
|
27
22
|
#
|
28
23
|
# @param filename [String] Path to the audio file to read.
|
29
|
-
# @param
|
30
|
-
# - `:output_rate` [Integer] Output sample rate (default: `22050`)
|
31
|
-
# - `:sampling_option` [Symbol] Resampling algorithm (default: `:sinc_fastest`)
|
24
|
+
# @param resampling_options [Hash] Optional resampling configuration.
|
32
25
|
#
|
33
|
-
def initialize(filename,
|
26
|
+
def initialize(filename, **resampling_options)
|
34
27
|
@filename = filename
|
35
|
-
@
|
28
|
+
@resampling_options = resampling_options
|
36
29
|
end
|
37
30
|
|
38
31
|
##
|
39
|
-
# Reads the audio file, returning
|
32
|
+
# Reads the audio file, returning samples, number of channels, and sample rate.
|
40
33
|
#
|
41
34
|
# @return [Array<(Numo::SFloat, Integer, Integer)>]
|
42
|
-
#
|
43
|
-
# -
|
44
|
-
# -
|
45
|
-
# - output_rate [Integer] — Sample rate of the returned audio.
|
35
|
+
# - data [Numo::SFloat] Audio samples, shape = `[channels, frames]`
|
36
|
+
# - channels [Integer] Number of channels
|
37
|
+
# - sr [Integer] Sample rate
|
46
38
|
#
|
47
|
-
# @raise [ArgumentError] If the file
|
48
|
-
# @raise [Awaaz::AudioreadError] If the file cannot be opened.
|
39
|
+
# @raise [ArgumentError] If the file cannot be opened.
|
49
40
|
#
|
50
41
|
def read
|
51
|
-
|
52
|
-
|
53
|
-
|
54
|
-
|
42
|
+
info, sndfile = open_file
|
43
|
+
frames, channels, sr = extract_info(info)
|
44
|
+
|
45
|
+
buffer, read_frames = read_buffer(sndfile, frames, channels)
|
46
|
+
close_file(sndfile)
|
55
47
|
|
56
|
-
|
48
|
+
data = process_data(buffer, read_frames, channels)
|
49
|
+
[resample(data, sr, channels), channels, sr]
|
57
50
|
end
|
58
51
|
|
59
52
|
private
|
60
53
|
|
61
|
-
|
62
|
-
|
63
|
-
#
|
64
|
-
# @return [Hash] Default options with `:output_rate => 22050`.
|
65
|
-
#
|
66
|
-
def default_resample_options
|
67
|
-
{ output_rate: 22_050 }
|
68
|
-
end
|
54
|
+
def resample(samples, sample_rate, channels)
|
55
|
+
validate_resampling_options
|
69
56
|
|
70
|
-
|
71
|
-
|
72
|
-
#
|
73
|
-
# @raise [ArgumentError] If the file extension is not in {SUPPORTED_EXTENSIONS}.
|
74
|
-
#
|
75
|
-
def validate_support
|
76
|
-
return if supported?
|
57
|
+
output_rate, sampling_option = @resampling_options.values_at(:output_rate, :sampling_option)
|
58
|
+
sampling_option ||= :linear
|
77
59
|
|
78
|
-
|
60
|
+
return samples if output_rate.nil? || output_rate == sample_rate
|
61
|
+
|
62
|
+
Utils::Resample.read_and_resample(samples, sample_rate, output_rate, channels, sampling_option: sampling_option)
|
79
63
|
end
|
80
64
|
|
81
|
-
|
82
|
-
|
83
|
-
|
84
|
-
|
85
|
-
|
86
|
-
|
87
|
-
|
65
|
+
def validate_resampling_options
|
66
|
+
valid_options = %i[output_rate sampling_option]
|
67
|
+
|
68
|
+
@resampling_options.transform_keys!(&:to_sym)
|
69
|
+
@resampling_options.each_key do |key|
|
70
|
+
next if valid_options.include?(key)
|
71
|
+
|
72
|
+
raise ArgumentError, "Invalid option: #{key}. Available options: #{valid_options.join(", ")}"
|
73
|
+
end
|
88
74
|
end
|
89
75
|
|
90
76
|
##
|
91
|
-
# Opens the
|
77
|
+
# Opens the file and retrieves SF_INFO metadata.
|
92
78
|
#
|
93
|
-
# @return [Array<(FFI::Pointer
|
94
|
-
# A tuple containing:
|
95
|
-
# - soundfile [FFI::Pointer] — Pointer to the opened sound file.
|
96
|
-
# - sample_rate [Integer] — Sample rate of the audio file.
|
97
|
-
# - frames [Integer] — Number of frames in the file.
|
98
|
-
# - channels [Integer] — Number of channels in the file.
|
79
|
+
# @return [Array<(Awaaz::Extensions::Soundfile::SF_INFO, FFI::Pointer)>]
|
99
80
|
#
|
100
|
-
# @raise [
|
81
|
+
# @raise [ArgumentError] If the file cannot be opened.
|
101
82
|
#
|
102
83
|
def open_file
|
103
|
-
info = Extensions::Soundfile::SF_INFO.new
|
104
|
-
sndfile = Extensions::Soundfile.sf_open(
|
84
|
+
info = Awaaz::Extensions::Soundfile::SF_INFO.new
|
85
|
+
sndfile = Awaaz::Extensions::Soundfile.sf_open(
|
86
|
+
@filename,
|
87
|
+
Awaaz::Extensions::Soundfile::SFM_READ,
|
88
|
+
info
|
89
|
+
)
|
105
90
|
|
106
|
-
raise
|
91
|
+
raise ArgumentError, "Could not open file: #{@filename}" if sndfile.null?
|
107
92
|
|
108
|
-
|
109
|
-
frames = info[:frames]
|
110
|
-
channels = info[:channels]
|
111
|
-
[sndfile, sample_rate, frames, channels]
|
93
|
+
[info, sndfile]
|
112
94
|
end
|
113
95
|
|
114
96
|
##
|
115
|
-
#
|
97
|
+
# Extracts frames, channels, and sample rate from SF_INFO.
|
116
98
|
#
|
117
|
-
# @param
|
99
|
+
# @param info [Awaaz::Extensions::Soundfile::SF_INFO]
|
100
|
+
# @return [Array<(Integer, Integer, Integer)>] frames, channels, sr
|
101
|
+
#
|
102
|
+
def extract_info(info)
|
103
|
+
[info[:frames], info[:channels], info[:samplerate]]
|
104
|
+
end
|
105
|
+
|
106
|
+
##
|
107
|
+
# Reads raw audio frames into a memory buffer.
|
108
|
+
#
|
109
|
+
# @param sndfile [FFI::Pointer] Opened sound file.
|
118
110
|
# @param frames [Integer] Number of frames to read.
|
119
|
-
# @param channels [Integer] Number of channels
|
120
|
-
# @return [Numo::SFloat] The audio samples.
|
111
|
+
# @param channels [Integer] Number of channels.
|
121
112
|
#
|
122
|
-
|
113
|
+
# @return [Array<(FFI::MemoryPointer, Integer)>] buffer and number of read frames
|
114
|
+
#
|
115
|
+
def read_buffer(sndfile, frames, channels)
|
123
116
|
buffer = FFI::MemoryPointer.new(:float, frames * channels)
|
124
|
-
read_frames = Extensions::Soundfile.sf_readf_float(
|
125
|
-
|
117
|
+
read_frames = Awaaz::Extensions::Soundfile.sf_readf_float(sndfile, buffer, frames)
|
118
|
+
[buffer, read_frames]
|
126
119
|
end
|
127
120
|
|
128
121
|
##
|
129
122
|
# Closes the open sound file.
|
130
123
|
#
|
131
|
-
# @param
|
124
|
+
# @param sndfile [FFI::Pointer]
|
132
125
|
# @return [void]
|
133
126
|
#
|
134
|
-
def
|
135
|
-
Extensions::Soundfile.sf_close(
|
127
|
+
def close_file(sndfile)
|
128
|
+
Awaaz::Extensions::Soundfile.sf_close(sndfile)
|
136
129
|
end
|
137
130
|
|
138
131
|
##
|
139
|
-
#
|
132
|
+
# Converts the buffer into a Numo::SFloat array and reshapes to `[channels, frames]`.
|
140
133
|
#
|
141
|
-
# @param
|
142
|
-
# @param
|
134
|
+
# @param buffer [FFI::MemoryPointer]
|
135
|
+
# @param read_frames [Integer] Number of frames read.
|
143
136
|
# @param channels [Integer] Number of channels.
|
144
|
-
# @return [
|
137
|
+
# @return [Numo::SFloat] Audio data of shape `[channels, frames]`.
|
145
138
|
#
|
146
|
-
|
147
|
-
|
148
|
-
|
149
|
-
valid_options = %i[output_rate sampling_option]
|
150
|
-
|
151
|
-
@resample_options.transform_keys!(&:to_sym)
|
152
|
-
@resample_options.each_key do |key|
|
153
|
-
next if valid_options.include?(key)
|
154
|
-
|
155
|
-
raise ArgumentError, "Invalid option: #{key}. Available options: #{valid_options.join}"
|
156
|
-
end
|
157
|
-
|
158
|
-
output_rate, sampling_option = @resample_options.values_at(:output_rate, :sampling_rate)
|
159
|
-
sampling_option ||= :sinc_fastest
|
160
|
-
|
161
|
-
[
|
162
|
-
Utils::Resample.read_and_resample_numo(samples, sample_rate, output_rate, sampling_option:),
|
163
|
-
channels,
|
164
|
-
output_rate
|
165
|
-
]
|
139
|
+
def process_data(buffer, read_frames, channels)
|
140
|
+
data = Numo::SFloat.cast(buffer.read_array_of_float(read_frames * channels))
|
141
|
+
data.reshape(read_frames, channels).transpose
|
166
142
|
end
|
167
143
|
end
|
168
144
|
end
|
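The `process_data` step above turns libsndfile's interleaved buffer into a `[channels, frames]` matrix by reshaping to `[frames, channels]` and transposing. The same transformation can be sketched with plain Ruby arrays. The sample values are made up for illustration.

```ruby
# Interleaved stereo buffer as libsndfile delivers it: L0, R0, L1, R1, ...
interleaved = [0.1, -0.1, 0.2, -0.2, 0.3, -0.3]
channels = 2

# Group one sample per channel into each frame: shape [frames, channels].
frames = interleaved.each_slice(channels).to_a

# Transpose so each row holds one channel: shape [channels, frames].
by_channel = frames.transpose
```

Reshaping directly to `[channels, frames]` without the transpose would mix the left and right samples into each row, which is why the reshape goes through `[frames, channels]` first.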
data/lib/awaaz/utils/utils.rb
CHANGED
@@ -19,6 +19,7 @@ require_relative "soundread"
|
|
19
19
|
require_relative "shell_command_builder"
|
20
20
|
require_relative "via_shell"
|
21
21
|
|
22
|
+
# Awaaz gem
|
22
23
|
module Awaaz
|
23
24
|
# The Utils module provides low-level helper components
|
24
25
|
# for performing core audio-related operations in the Awaaz gem.
|
data/lib/awaaz/version.rb
CHANGED
data/lib/awaaz.rb
CHANGED
@@ -10,11 +10,11 @@
|
|
10
10
|
# @see Awaaz::Decoders
|
11
11
|
# @see Awaaz::Utils
|
12
12
|
# @see Awaaz::Config
|
13
|
-
|
14
|
-
|
15
|
-
|
13
|
+
# @see Awaaz::Features
|
14
|
+
# @see Awaaz::Properties
|
16
15
|
require "ffi"
|
17
16
|
require "numo/narray"
|
17
|
+
require "numo/pocketfft"
|
18
18
|
|
19
19
|
require_relative "awaaz/errors"
|
20
20
|
require_relative "awaaz/extensions/extensions"
|
@@ -23,3 +23,10 @@ require_relative "awaaz/version"
|
|
23
23
|
|
24
24
|
require_relative "awaaz/config"
|
25
25
|
require_relative "awaaz/decoders/decoders"
|
26
|
+
require_relative "awaaz/features"
|
27
|
+
require_relative "awaaz/properties"
|
28
|
+
|
29
|
+
module Awaaz
|
30
|
+
extend Features
|
31
|
+
extend Properties
|
32
|
+
end
|
metadata
CHANGED
@@ -1,13 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: awaaz
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.2.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Saad Azam
|
8
|
+
autorequire:
|
8
9
|
bindir: exe
|
9
10
|
cert_chain: []
|
10
|
-
date: 2025-08-
|
11
|
+
date: 2025-08-29 00:00:00.000000000 Z
|
11
12
|
dependencies:
|
12
13
|
- !ruby/object:Gem::Dependency
|
13
14
|
name: ffi
|
@@ -37,6 +38,20 @@ dependencies:
|
|
37
38
|
- - "~>"
|
38
39
|
- !ruby/object:Gem::Version
|
39
40
|
version: 0.9.1
|
41
|
+
- !ruby/object:Gem::Dependency
|
42
|
+
name: numo-pocketfft
|
43
|
+
requirement: !ruby/object:Gem::Requirement
|
44
|
+
requirements:
|
45
|
+
- - "~>"
|
46
|
+
- !ruby/object:Gem::Version
|
47
|
+
version: 0.4.1
|
48
|
+
type: :runtime
|
49
|
+
prerelease: false
|
50
|
+
version_requirements: !ruby/object:Gem::Requirement
|
51
|
+
requirements:
|
52
|
+
- - "~>"
|
53
|
+
- !ruby/object:Gem::Version
|
54
|
+
version: 0.4.1
|
40
55
|
- !ruby/object:Gem::Dependency
|
41
56
|
name: ruby-filemagic
|
42
57
|
requirement: !ruby/object:Gem::Requirement
|
@@ -78,6 +93,8 @@ files:
|
|
78
93
|
- lib/awaaz/extensions/extensions.rb
|
79
94
|
- lib/awaaz/extensions/samplerate.rb
|
80
95
|
- lib/awaaz/extensions/soundfile.rb
|
96
|
+
- lib/awaaz/features.rb
|
97
|
+
- lib/awaaz/properties.rb
|
81
98
|
- lib/awaaz/utils/resample.rb
|
82
99
|
- lib/awaaz/utils/shell_command_builder.rb
|
83
100
|
- lib/awaaz/utils/sound_config.rb
|
@@ -95,6 +112,7 @@ metadata:
|
|
95
112
|
source_code_uri: https://github.com/SadMadLad/awaaz
|
96
113
|
changelog_uri: https://github.com/SadMadLad/awaaz/blob/main/CHANGELOG.md
|
97
114
|
rubygems_mfa_required: 'true'
|
115
|
+
post_install_message:
|
98
116
|
rdoc_options: []
|
99
117
|
require_paths:
|
100
118
|
- lib
|
@@ -102,14 +120,15 @@ required_ruby_version: !ruby/object:Gem::Requirement
|
|
102
120
|
requirements:
|
103
121
|
- - ">="
|
104
122
|
- !ruby/object:Gem::Version
|
105
|
-
version: 3.
|
123
|
+
version: 3.0.0
|
106
124
|
required_rubygems_version: !ruby/object:Gem::Requirement
|
107
125
|
requirements:
|
108
126
|
- - ">="
|
109
127
|
- !ruby/object:Gem::Version
|
110
128
|
version: '0'
|
111
129
|
requirements: []
|
112
|
-
rubygems_version: 3.
|
130
|
+
rubygems_version: 3.2.3
|
131
|
+
signing_key:
|
113
132
|
specification_version: 4
|
114
133
|
summary: Audio Analysis with Ruby
|
115
134
|
test_files: []
|