awaaz 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: e2aa5ae798de6f2b722134913cc7b055f53b4b035989e94521be55d1fe76acc7
4
- data.tar.gz: b54beb3ae4fc90ab2a4c1485ffa91eccb1b53cafca27c04e7ece14263b65abb4
3
+ metadata.gz: 539ed57cbb902bb7b86939c6c9ccde9265cc8b28c4fee725359acdf918c9b2ff
4
+ data.tar.gz: ba30315c102903f622eead61d5a847d9c9b93e3d7a7cca790ebff995e909d014
5
5
  SHA512:
6
- metadata.gz: b0c79b3dbf5396de690ee17868cb8a0d2d29dfe396b8c5dd9c9a098393a40d15715d4def9990990024ad356f9463cb28feab981e96f59e4961d978e373a104ce
7
- data.tar.gz: d28e0001af9a5b8052298f33a5dbf72ca16325d15e3762187fd7a022b6140874729f52d71d32a0e39a589b712471e248fd28dd6e73dea35dd3fa26742a22ef2b
6
+ metadata.gz: 3a4fb5f88de016a035d06660deabc2472aaa423400cdf72acb7eb9440f23e8ab15606e74e23758ce812539e3878236b590580c9665614494631beba0e71d7dc3
7
+ data.tar.gz: b5ed50514a4c1a9be9c3008cbaa2661aaa1d69d6ba33dc3906db202ee10da20741e7cea4d488f27ce7029bfd40552edf4464f7debbf4bc37e47dd52cd4c4bcf2
data/.rubocop.yml CHANGED
@@ -1,5 +1,5 @@
1
1
  AllCops:
2
- TargetRubyVersion: 3.4
2
+ TargetRubyVersion: 3.0
3
3
  NewCops: enable
4
4
 
5
5
  Style/StringLiterals:
@@ -11,5 +11,9 @@ Style/StringLiteralsInInterpolation:
11
11
  Metrics/MethodLength:
12
12
  Max: 20
13
13
 
14
+ Style/NumericPredicate:
15
+ Enabled: false
14
16
 
17
+ Metrics/ModuleLength:
18
+ Enabled: false # Temporary
15
19
 
data/.ruby-version CHANGED
@@ -1 +1 @@
1
- 3.4.2
1
+ 3.0.0
data/CHANGELOG.md CHANGED
@@ -1,5 +1,16 @@
1
- ## [Unreleased]
1
+ ## [Released]
2
2
 
3
- ## [0.1.0] - 2025-07-21
3
+ ## [0.1.0] - 2025-08-12
4
4
 
5
5
  - Initial release
6
+ - Ability to decode `.wav` and `.mp3`.
7
+
8
+ ## 0.2.0 - 2025-08-26
9
+
10
+ - Introduced new features for audio analysis:
11
+ - RMS (Root Mean Square)
12
+ - Zero Crossing Rate
13
+ - Spectral Centroid
14
+ - Spectral Bandwidth
15
+ - Spectral Rolloff
16
+ - Spectral Flatness
data/GLOSSARY.md CHANGED
@@ -1,3 +1,9 @@
1
1
  # Terms and Definitions for Audio Processing
2
2
 
3
- - **PCM (Pulse Code Modulation):** A method to convert analog audio signals into digital form by sampling the signal's ampllitude at regular intervals.
3
+ - **PCM (Pulse Code Modulation):** A method to convert analog audio signals into digital form by sampling the signal's amplitude at regular intervals.
4
+ - **RMS (Root Mean Square)**: Measures the average power, or loudness, of the signal over time.
5
+ - **Spectral Bandwidth:** Measures how widely the frequencies spread around the spectral centroid of the audio. Low bandwidth means the energy is concentrated near the centroid, as in a flute note. High bandwidth indicates a noisy, broadband sound, such as a distorted guitar.
6
+ - **Spectral Centroid**: The 'center of mass' of the spectrum. Intuitively, a lower centroid means a bassier, more muffled sound, while a higher centroid indicates bright, sharp, tinny audio.
7
+ - **Spectral Flatness**: Indicates how noise-like the audio is. High flatness (~1) indicates a flat, white-noise-like spectrum. Low flatness (~0) indicates a harmonic signal or pure tone.
8
+ - **Spectral Rolloff:** Measures the frequency below which a certain percentage of the total spectral energy is contained. Low rolloff: more energy is concentrated in lower frequencies, as in drums, bass, or male voices. High rolloff: significant energy in high frequencies, as in female voices or hissing sounds.
9
+ - **ZCR (Zero Crossing Rate)**: Counts how many times the audio signal changes sign, from positive to negative and vice versa. If ZCR is high, the audio is noisy, sharp or high-pitched. Audio with a low ZCR is smooth, steady or low-pitched.
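The glossary terms above can be illustrated on plain Ruby arrays. The following is a minimal sketch, not the gem's implementation: the method names and the toy spectra are hypothetical, it works on a single channel without framing or FFT, and the ZCR here is normalised by the number of adjacent sample pairs, whereas the gem's `zcr_for_frame` divides by the frame size.

```ruby
# Illustrative helpers on plain Ruby arrays (hypothetical names, not the awaaz API).

# RMS: square each sample, average, then take the square root.
def rms(samples)
  return 0.0 if samples.empty?

  Math.sqrt(samples.sum { |s| s * s } / samples.size.to_f)
end

# ZCR: fraction of adjacent sample pairs whose signs differ.
def zero_crossing_rate(samples)
  return 0.0 if samples.size < 2

  crossings = samples.each_cons(2).count { |a, b| (a * b).negative? }
  crossings / (samples.size - 1).to_f
end

# Centroid: magnitude-weighted mean frequency of one spectrum.
def spectral_centroid(freqs, magnitudes)
  total = magnitudes.sum.to_f
  return 0.0 if total.zero?

  freqs.zip(magnitudes).sum { |f, m| f * m } / total
end

# Bandwidth (exponent 2): magnitude-weighted spread around the centroid.
def spectral_bandwidth(freqs, magnitudes)
  total = magnitudes.sum.to_f
  return 0.0 if total.zero?

  centroid = spectral_centroid(freqs, magnitudes)
  Math.sqrt(freqs.zip(magnitudes).sum { |f, m| m * ((f - centroid)**2) } / total)
end

# Rolloff: first frequency at which the cumulative magnitude reaches the threshold share.
def spectral_rolloff(freqs, magnitudes, threshold: 0.85)
  total = magnitudes.sum.to_f
  return 0.0 if total.zero?

  running = 0.0
  freqs.zip(magnitudes).each do |f, m|
    running += m
    return f if running >= threshold * total
  end
  freqs.last
end

# Flatness: geometric mean over arithmetic mean of the power spectrum.
def spectral_flatness(magnitudes, amin: 1e-10)
  power = magnitudes.map { |m| [m * m, amin].max }
  geometric = Math.exp(power.sum { |p| Math.log(p) } / power.size)
  geometric / (power.sum / power.size)
end

freqs = [0.0, 100.0, 200.0, 300.0] # toy frequency bins in Hz
tone  = [0.0, 1.0, 0.0, 0.0]       # all energy in a single bin
flat  = [1.0, 1.0, 1.0, 1.0]       # equal energy in every bin

puts rms([0.5, 0.5, 0.5, 0.5])                   # => 0.5
puts zero_crossing_rate([0.5, -0.5, 0.5, -0.5])  # => 1.0
puts spectral_centroid(freqs, tone)              # => 100.0
puts spectral_bandwidth(freqs, tone)             # => 0.0
puts spectral_rolloff(freqs, flat)               # => 300.0
puts spectral_flatness(flat)                     # => 1.0
```

The single-bin spectrum has zero bandwidth and a flatness close to 0, while the flat spectrum has a flatness of 1, which matches the intuition given in the glossary.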
data/README.md CHANGED
@@ -52,11 +52,14 @@ gem install awaaz
52
52
  ```ruby
53
53
  # To decode the audio file
54
54
  samples, sample_rate = Awaaz.load("path/to/audio_file")
55
-
56
- # To decode the audio file using specified decoder
57
- samples, sample_rate = Awaaz.load("path/to/audio_file", decoder: :sox)
58
55
  ```
59
56
 
57
+ ## Documentation
58
+
59
+ [Documentation](https://www.rubydoc.info/github/SadMadLad/awaaz)
60
+
61
+ Check out [this demo](https://github.com/SadMadLad/awaaz-demo) for some example use cases of the gem.
62
+
60
63
  ## Development
61
64
 
62
65
  After checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
data/TODOS.md CHANGED
@@ -1,3 +1,2 @@
1
- - Lazy decoding of an audio
2
- - `libsndfile` support
3
1
  - Streaming output of larger files
2
+ - Improve and speed up resampling
data/lib/awaaz/config.rb CHANGED
@@ -76,6 +76,28 @@ module Awaaz
76
76
  @available_decoders.nil? || @available_decoders.empty?
77
77
  end
78
78
 
79
+ ##
80
+ # Checks if there is at least one decoder capable of handling WAV files.
81
+ #
82
+ # Currently, `ffmpeg` and `sox` are considered capable of decoding WAV files.
83
+ #
84
+ # @return [Boolean] `true` if either `ffmpeg` or `sox` is available, otherwise `false`.
85
+ #
86
+ def decoders_for_wav?
87
+ ffmpeg? || sox?
88
+ end
89
+
90
+ ##
91
+ # Checks if there are no decoders available for handling WAV files.
92
+ #
93
+ # This is the logical negation of {#decoders_for_wav?}.
94
+ #
95
+ # @return [Boolean] `true` if neither `ffmpeg` nor `sox` is available, otherwise `false`.
96
+ #
97
+ def no_decoders_for_wav?
98
+ !decoders_for_wav?
99
+ end
100
+
79
101
  private
80
102
 
81
103
  ##
@@ -41,9 +41,9 @@ module Awaaz
41
41
  set_available_options
42
42
 
43
43
  # @param filename [String] Path to the audio file to decode.
44
- def initialize(filename, **)
44
+ def initialize(filename, **options)
45
45
  @filename = filename
46
- @options = Utils::SoundConfig.new(available_options, **)
46
+ @options = Utils::SoundConfig.new(available_options, **options)
47
47
  end
48
48
 
49
49
  # Loads audio data.
@@ -72,7 +72,7 @@ module Awaaz
72
72
  # - number of channels
73
73
  # - sample rate
74
74
  def soundread
75
- Utils::Soundread.new(@filename).read
75
+ Utils::Soundread.new(@filename, output_rate: sample_rate, sampling_option: resampling_option).read
76
76
  end
77
77
 
78
78
  # Processes the decoded audio samples by reshaping and optionally converting to mono.
@@ -83,7 +83,7 @@ module Awaaz
83
83
  # @return [Array<(Numo::DFloat, Integer)>] Processed samples and the sample rate.
84
84
  def process(input_samples, channels, sample_rate)
85
85
  input_samples = input_samples.reshape(channels, input_samples.size / channels)
86
- input_samples = input_samples.mean(0) if mono?
86
+ input_samples = input_samples.mean(0).reshape(1, input_samples.shape[1]) if mono?
87
87
 
88
88
  [input_samples, sample_rate]
89
89
  end
@@ -107,7 +107,7 @@ module Awaaz
107
107
  # Delegates option accessors to the {Utils::SoundConfig} instance.
108
108
  %i[
109
109
  sample_rate num_channels decoder_option mono mono?
110
- stereo? amplification_factor soundread?
110
+ stereo? amplification_factor soundread? resampling_option
111
111
  ].each do |option_key|
112
112
  define_method(option_key) { @options.public_send(option_key) }
113
113
  end
@@ -25,7 +25,7 @@ module Awaaz
25
25
  # @param filename [String] the path to the audio file
26
26
  # @raise [ArgumentError] if the MIME type is not supported
27
27
  # @return [Object] the result of decoding, as returned by the decoder class
28
- def load(filename)
28
+ def load(filename, ...)
29
29
  fm = FileMagic.new(FileMagic::MAGIC_MIME_TYPE)
30
30
  mime_type = fm.file(filename)
31
31
 
@@ -35,7 +35,7 @@ module Awaaz
35
35
  end
36
36
 
37
37
  decoding_class = DECODER_MAP[mime_type]
38
- decoding_class.load(filename)
38
+ decoding_class.load(filename, ...)
39
39
  end
40
40
  end
41
41
  end
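The change to `load` above relies on Ruby's `...` argument forwarding, which passes every remaining positional, keyword, and block argument on to the decoder class unchanged. A minimal sketch of the pattern follows; `FakeDecoder` and `load_audio` are hypothetical stand-ins for illustration, not part of awaaz, and the option names are only examples.

```ruby
# Hypothetical stand-in for a decoder class; not part of awaaz.
class FakeDecoder
  # Returns its arguments so the forwarding can be inspected.
  def self.load(filename, **options)
    [filename, options]
  end
end

# `...` after a leading parameter forwards all remaining arguments as-is
# (supported with a leading argument from Ruby 3.0).
def load_audio(filename, ...)
  FakeDecoder.load(filename, ...)
end

filename, options = load_audio("clip.wav", mono: true, sample_rate: 16_000)
puts filename                # => clip.wav
puts options[:mono]          # => true
puts options[:sample_rate]   # => 16000
```

With this shape the outer method never needs to know which options a given decoder accepts; unknown keywords surface as errors from the decoder itself.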
@@ -0,0 +1,533 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Awaaz
4
+ # Audio Features
5
+ module Features
6
+ ##
7
+ # Calculates the total number of frames for a given signal length, frame size, and hop length.
8
+ #
9
+ # @param signal_length [Integer] Number of samples in the signal.
10
+ # @param frame_size [Integer] Size of each analysis frame (in samples).
11
+ # @param hop_length [Integer] Step size between consecutive frames (in samples).
12
+ #
13
+ # @return [Integer] The total number of frames.
14
+ #
15
+ def total_frames(signal_length, frame_size, hop_length)
16
+ ((signal_length - frame_size) / hop_length.to_f).ceil + 1
17
+ end
18
+
19
+ ##
20
+ # Computes how many samples are needed to right-pad a signal so
21
+ # that its length perfectly fits the given frame and hop size.
22
+ #
23
+ # @param signal_length [Integer] Number of samples in the signal.
24
+ # @param frame_size [Integer] Size of each analysis frame (in samples).
25
+ # @param hop_length [Integer] Step size between consecutive frames (in samples).
26
+ #
27
+ # @return [Integer] Number of padding samples required.
28
+ #
29
+ def pad_amount(signal_length, frame_size, hop_length)
30
+ frames = total_frames(signal_length, frame_size, hop_length)
31
+ padded_length = ((frames - 1) * hop_length) + frame_size
32
+ padded_length - signal_length
33
+ end
34
+
35
+ ##
36
+ # Pads an array with zeros (or a specified value) along a given axis.
37
+ #
38
+ # @param array [Numo::NArray] The input array (e.g., shape [channels, samples]).
39
+ # @param pad_count [Integer] Number of padding elements to add.
40
+ # @param axis [Integer] Axis along which to pad (default: 1 for time axis).
41
+ # @param with [Numeric] Value to pad with (default: 0).
42
+ #
43
+ # @return [Numo::NArray] The padded array.
44
+ #
45
+ def pad_right(array, pad_count, axis: 1, with: 0)
46
+ channels_count = array.shape.first
47
+ padded_array = Numo::SFloat.new(channels_count, pad_count).fill(with)
48
+
49
+ array.concatenate(padded_array, axis: axis)
50
+ end
51
+
52
+ ##
53
+ # Builds a list of sample index ranges for each analysis frame.
54
+ #
55
+ # @param signal_length [Integer] Number of samples in the (possibly padded) signal.
56
+ # @param frame_size [Integer] Size of each frame (in samples).
57
+ # @param hop_length [Integer] Step size between consecutive frames (in samples).
58
+ #
59
+ # @return [Array<Range>] An array where each element is the sample index range for one frame.
60
+ #
61
+ def build_ranges(signal_length, frame_size, hop_length)
62
+ ranges = []
63
+ start = 0
64
+ while start + frame_size <= signal_length
65
+ ranges << (start...(start + frame_size))
66
+ start += hop_length
67
+ end
68
+ ranges
69
+ end
70
+
71
+ ##
72
+ # Pads the signal (if necessary) and returns the padded array along with frame index ranges.
73
+ #
74
+ # @param array [Numo::NArray] A 2D array where shape is [channels, samples].
75
+ # @param frame_size [Integer] Size of each frame (in samples).
76
+ # @param hop_length [Integer] Step size between consecutive frames (in samples).
77
+ #
78
+ # @raise [ArgumentError] If hop length is less than 1.
79
+ #
80
+ # @return [Array<(Numo::NArray, Array<Range>)>]
81
+ # - padded signal array
82
+ # - array of frame index ranges
83
+ #
84
+ def frame_ranges(array, frame_size: 2048, hop_length: 512)
85
+ raise ArgumentError, "Hop Length can't be less than 1" if hop_length < 1
86
+
87
+ amount = pad_amount(array.shape[1], frame_size, hop_length)
88
+ array = pad_right(array, amount) if amount.positive?
89
+
90
+ [array, build_ranges(array.shape[1], frame_size, hop_length)]
91
+ end
92
+
93
+ ##
94
+ # Calculates the RMS (Root Mean Square) energy for each frame in the given audio.
95
+ #
96
+ # @param samples [Numo::NArray] A 2D array of shape [channels, samples].
97
+ # @param frame_size [Integer] Size of each analysis frame (in samples).
98
+ # @param hop_length [Integer] Step size between consecutive frames (in samples).
99
+ #
100
+ # @return [Numo::SFloat] A 2D array of RMS values with shape [channels, frames].
101
+ #
102
+ def rms(samples, frame_size: 2048, hop_length: 512)
103
+ samples, frame_groups = frame_ranges(samples, frame_size: frame_size, hop_length: hop_length)
104
+
105
+ means = Numo::SFloat.zeros(samples.shape[0], frame_groups.length)
106
+ frame_groups.each_with_index do |frame_range, idx|
107
+ means[true, idx] = samples[true, frame_range].rms(axis: 1)
108
+ end
109
+
110
+ means
111
+ end
112
+
113
+ ##
114
+ # Calculates the overall RMS for an entire signal without framing.
115
+ #
116
+ # @param samples [Numo::NArray] A 2D or 1D array of samples.
117
+ #
118
+ # @return [Float] RMS value for the entire signal.
119
+ #
120
+ def rms_overall(samples)
121
+ samples.rms
122
+ end
123
+
124
+ # Calculates the zero-crossing rate (ZCR) of an audio signal frame-by-frame.
125
+ #
126
+ # The zero-crossing rate is the proportion of consecutive samples in a frame
127
+ # where the signal changes sign (positive to negative or vice versa).
128
+ # It is often used as a simple feature in speech/music analysis.
129
+ #
130
+ # @param samples [Numo::NArray] 2D array of audio samples.
131
+ # Shape: [n_channels, n_samples].
132
+ # @param frame_size [Integer] Size of each analysis frame in samples. Default: 2048.
133
+ # @param hop_length [Integer] Step size between successive frames in samples. Default: 512.
134
+ # @return [Numo::SFloat] 2D array of zero-crossing rates per frame for each channel.
135
+ # Shape: [n_channels, n_frames].
136
+ #
137
+ # @example
138
+ # # Stereo signal: 2 channels, 44100 samples
139
+ # zcr_values = zcr(samples, frame_size: 2048, hop_length: 512)
140
+ # puts zcr_values.shape # => [2, n_frames]
141
+ #
142
+ def zcr(samples, frame_size: 2048, hop_length: 512)
143
+ framed_samples, frame_groups = frame_ranges(samples, frame_size: frame_size, hop_length: hop_length)
144
+
145
+ n_channels = framed_samples.shape[0]
146
+ zcrs = Numo::SFloat.zeros(n_channels, frame_groups.length)
147
+
148
+ frame_groups.each_with_index do |frame_range, idx|
149
+ zcrs[true, idx] = zcr_for_frame(framed_samples[true, frame_range], frame_size)
150
+ end
151
+
152
+ zcrs
153
+ end
154
+
155
+ # Calculates the zero-crossing rate for a single frame of audio.
156
+ #
157
+ # @param frame [Numo::NArray] 2D array containing audio samples for a single frame.
158
+ # Shape: [n_channels, frame_size].
159
+ # @param frame_size [Integer] Number of samples in the frame.
160
+ # @return [Numo::SFloat] 1D array of zero-crossing rates for each channel in the frame.
161
+ # Shape: [n_channels].
162
+ #
163
+ # @example
164
+ # frame = samples[true, 0...2048]
165
+ # single_frame_zcr = zcr_for_frame(frame, 2048)
166
+ # puts single_frame_zcr # => Numo::SFloat[0.15, 0.12]
167
+ def zcr_for_frame(frame, frame_size)
168
+ first_part = frame[true, 0...-1]
169
+ second_part = frame[true, 1..-1]
170
+ products = first_part * second_part
171
+
172
+ sign_changes = products < 0
173
+ counts = sign_changes.count_true(axis: 1)
174
+
175
+ counts / frame_size.to_f
176
+ end
177
+
178
+ # Calculates the overall zero-crossing rate (ZCR) of an entire audio signal.
179
+ #
180
+ # @param samples [Numo::NArray] 2D array of audio samples.
181
+ # Shape: [n_channels, n_samples].
182
+ # @return [Numo::SFloat] 1D array containing the overall ZCR for each channel.
183
+ # Shape: [n_channels].
184
+ #
185
+ # @example
186
+ # # Stereo signal: 2 channels, 44100 samples
187
+ # overall_zcr = zcr_overall(samples)
188
+ # puts overall_zcr.shape # => [2]
189
+ #
190
+ #
191
+ def zcr_overall(samples)
192
+ ((samples[true, 0...-1] * samples[true, 1..-1]) < 0).count_true(axis: 1) / samples.shape[1].to_f
193
+ end
194
+
195
+ # Generates a Hann window of given frame size.
196
+ #
197
+ # A Hann window is commonly used in spectral analysis
198
+ # to reduce spectral leakage before applying an FFT.
199
+ #
200
+ # @param frame_size [Integer] the size of the frame (number of samples per window)
201
+ # @return [Numo::DFloat] the Hann window of length `frame_size`
202
+ def hann_window(frame_size)
203
+ idx = Numo::DFloat.new(frame_size).seq
204
+ 0.5 * (1 - Numo::NMath.cos(2 * Math::PI * idx / (frame_size - 1)))
205
+ end
206
+
207
+ # Prepares audio samples and parameters for FFT-based feature extraction.
208
+ #
209
+ # @param samples [Numo::NArray]
210
+ # Multichannel audio samples as a 2D array
211
+ # (shape: [channels, samples]).
212
+ # @param frame_size [Integer]
213
+ # Number of samples per frame (FFT window length).
214
+ # @param hop_length [Integer]
215
+ # Number of samples to shift between consecutive frames.
216
+ #
217
+ # @return [Array]
218
+ # A tuple containing:
219
+ # - samples [Numo::NArray] : Windowed audio samples aligned to frames
220
+ # - ranges [Array<Range>] : Frame index ranges for iteration
221
+ # - window [Numo::DFloat] : Hann window for FFT
222
+ # - channels_count [Integer] : Number of audio channels
223
+ # - freqs_size [Integer] : Number of FFT frequency bins per frame
224
+ #
225
+ # @example
226
+ # samples, ranges, window, channels_count, freqs_size =
227
+ # prepare_for_fft(audio, frame_size: 2048, hop_length: 512)
228
+ #
229
+ def prepare_for_fft(samples, frame_size:, hop_length:)
230
+ samples, ranges = frame_ranges(samples, frame_size: frame_size, hop_length: hop_length)
231
+ window = hann_window(frame_size)
232
+ channels_count = samples.shape[0]
233
+ freqs_size = (frame_size / 2) + 1
234
+
235
+ [samples, ranges, window, channels_count, freqs_size]
236
+ end
237
+
238
+ # Computes the Short-Time Fourier Transform (STFT) of a multi-channel signal.
239
+ #
240
+ # This method applies a sliding Hann window to the input signal, computes
241
+ # the FFT for each frame and each channel, and stores the positive frequency
242
+ # bins into a 3D complex-valued matrix.
243
+ #
244
+ # The resulting STFT matrix has dimensions:
245
+ # `[channels, frequencies, frames]`
246
+ #
247
+ # @param samples [Numo::NArray] a 2D array of shape [channels, samples]
248
+ # containing the audio data.
249
+ # @param frame_size [Integer] the size of each FFT frame (default: 2048)
250
+ # @param hop_length [Integer] the number of samples between successive frames (default: 512)
251
+ # @return [Numo::DComplex] a 3D array of shape
252
+ # `[channels, (frame_size / 2 + 1), frames]` containing the complex STFT values
253
+ #
254
+ # @example Compute STFT for mono audio
255
+ # samples = Numo::DFloat[[0.0, 1.0, 0.0, -1.0, ...]] # shape: [1, num_samples]
256
+ # stft_matrix = stft(samples, frame_size: 1024, hop_length: 256)
257
+ #
258
+ def stft(samples, frame_size: 2048, hop_length: 512)
259
+ samples, ranges, window, channels_count, freqs_size = prepare_for_fft(samples, frame_size: frame_size,
260
+ hop_length: hop_length)
261
+ stft_matrix = Numo::DComplex.zeros(channels_count, freqs_size, ranges.size)
262
+
263
+ ranges.each_with_index do |range, frame_idx|
264
+ channels_count.times do |ch|
265
+ fft_result = Numo::Pocketfft.fft(samples[ch, range] * window)
266
+ stft_matrix[ch, true, frame_idx] = fft_result[0...freqs_size]
267
+ end
268
+ end
269
+
270
+ stft_matrix
271
+ end
272
+
273
+ ##
274
+ # Computes the FFT (Fast Fourier Transform) of each channel
275
+ # in a multi-channel signal using a Hann window.
276
+ #
277
+ # @param samples [Numo::NArray] A 2D array of shape [channels, samples]
278
+ # containing the audio data.
279
+ #
280
+ # @return [Numo::DComplex] A 2D complex array of shape
281
+ # `[channels, samples]` containing the FFT result for each channel.
282
+ #
283
+ def fft(samples)
284
+ window = hann_window(samples.shape[1])
285
+ channels_count = samples.shape[0]
286
+ fft_results = channels_count.times.map do |ch|
287
+ Numo::Pocketfft.fft(samples[ch, true] * window)
288
+ end
289
+ Numo::DComplex[*fft_results]
290
+ end
291
+
292
+ ##
293
+ # Computes the frequency bin centers for an FFT.
294
+ #
295
+ # @param frame_size [Integer] The size of the FFT frame (in samples).
296
+ # @param sample_rate [Integer] The sampling rate of the audio (Hz).
297
+ #
298
+ # @return [Numo::DFloat] 1D array of frequency values (Hz)
299
+ # corresponding to FFT bins. Shape: `[frame_size/2 + 1]`.
300
+ #
301
+ def frequency_bins(frame_size, sample_rate)
302
+ Numo::DFloat.new((frame_size / 2) + 1).seq * (sample_rate.to_f / frame_size)
303
+ end
304
+
305
+ ##
306
+ # Computes the magnitude spectrum of a single frame using an FFT.
307
+ #
308
+ # @param frame [Numo::NArray] 1D array of audio samples for a single frame.
309
+ #
310
+ # @return [Numo::DFloat] 1D array of magnitude values for each FFT bin.
311
+ #
312
+ def frame_magnitude(frame)
313
+ Numo::Pocketfft.rfft(frame).abs
314
+ end
315
+
316
+ ##
317
+ # Computes the spectral centroid of a single frame.
318
+ #
319
+ # The spectral centroid is the "center of mass" of the spectrum
320
+ # and is often associated with the perceived brightness of a sound.
321
+ #
322
+ # @param freqs [Numo::DFloat] 1D array of frequency bin centers.
323
+ # @param magnitude [Numo::DFloat] 1D array of magnitude values
324
+ # corresponding to each frequency bin.
325
+ #
326
+ # @return [Float] The spectral centroid in Hz for the given frame.
327
+ #
328
+ def compute_centroid(freqs, magnitude)
329
+ mag_sum = magnitude.sum
330
+ return 0 if mag_sum.zero?
331
+
332
+ (freqs * magnitude).sum / mag_sum
333
+ end
334
+
335
+ ##
336
+ # Computes the spectral centroid trajectory of an audio signal.
337
+ #
338
+ # This method frames the signal, applies a Hann window,
339
+ # computes the FFT magnitudes, and calculates the centroid
340
+ # for each frame. The result is a time series of centroids.
341
+ #
342
+ # @param samples [Numo::NArray] A 2D array of shape [channels, samples].
343
+ # @param frame_size [Integer] Size of each analysis frame (default: 2048).
344
+ # @param hop_length [Integer] Step size between frames in samples (default: 512).
345
+ # @param sample_rate [Integer] Sampling rate of the audio in Hz (default: 22050).
346
+ #
347
+ # @return [Numo::DFloat] 2D array of spectral centroids with shape
348
+ # `[channels, n_frames]`.
349
+ #
350
+ # @example
351
+ # centroids = spectral_centroids(samples, frame_size: 1024, hop_length: 256, sample_rate: 44100)
352
+ # puts centroids.shape # => [channels, n_frames]
353
+ #
354
+ def spectral_centroids(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050)
355
+ samples, ranges, window, channels_count = prepare_for_fft(samples, frame_size: frame_size, hop_length: hop_length)
356
+ freqs = frequency_bins(frame_size, sample_rate)
357
+ centroid_matrix = Numo::DFloat.zeros(channels_count, ranges.size)
358
+
359
+ ranges.each_with_index do |range, frame_idx|
360
+ channels_count.times do |ch|
361
+ frame = samples[ch, range] * window
362
+ magnitude = frame_magnitude(frame)
363
+ centroid_matrix[ch, frame_idx] = compute_centroid(freqs, magnitude)
364
+ end
365
+ end
366
+
367
+ centroid_matrix
368
+ end
369
+
370
+ # Computes the bandwidth for a single frame.
371
+ #
372
+ # @param freqs [Numo::DFloat] Frequency bins (Hz)
373
+ # @param magnitude [Numo::DFloat] Magnitude spectrum for the frame
374
+ # @param centroid [Float] Spectral centroid for the frame (Hz)
375
+ # @param power [Integer] Power/exponent used for bandwidth calculation (commonly 2)
376
+ # @return [Float] Spectral bandwidth for the frame
377
+ def compute_bandwidth(freqs, magnitude, centroid, power)
378
+ mag_sum = magnitude.sum
379
+ return 0 if mag_sum.zero?
380
+
381
+ diff = (freqs - centroid).abs**power
382
+ value = (magnitude * diff).sum / mag_sum
383
+ value**(1.0 / power)
384
+ end
385
+
386
+ # Computes the spectral bandwidth over time for a signal.
387
+ #
388
+ # @param samples [Numo::DFloat] Input samples (channels x samples)
389
+ # @param frame_size [Integer] FFT window size (default: 2048)
390
+ # @param hop_length [Integer] Step size between frames (default: 512)
391
+ # @param sample_rate [Integer] Sampling rate of the audio signal (default: 22050 Hz)
392
+ # @param power [Integer] Exponent for bandwidth calculation (default: 2)
393
+ # @return [Numo::DFloat] Spectral bandwidth matrix (channels x frames)
394
+ def spectral_bandwidth(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050, power: 2)
395
+ samples, ranges, window, channels_count = prepare_for_fft(samples, frame_size: frame_size, hop_length: hop_length)
396
+ freqs = frequency_bins(frame_size, sample_rate)
397
+ bandwidth_matrix = Numo::DFloat.zeros(channels_count, ranges.size)
398
+
399
+ ranges.each_with_index do |range, frame_idx|
400
+ channels_count.times do |ch|
401
+ magnitude = frame_magnitude(samples[ch, range] * window)
402
+ centroid = compute_centroid(freqs, magnitude)
403
+ bandwidth_matrix[ch, frame_idx] = compute_bandwidth(freqs, magnitude, centroid, power)
404
+ end
405
+ end
406
+
407
+ bandwidth_matrix
408
+ end
409
+
410
+ # Computes the spectral rolloff for a single frame.
411
+ #
412
+ # @param spectrum [Numo::DFloat] Magnitude spectrum for the frame
413
+ # @param freqs [Numo::DFloat] Frequency bins (Hz)
414
+ # @param threshold [Float] Proportion of spectral energy to retain (default: 0.85)
415
+ # @return [Float] Roll-off frequency (Hz) for the frame
416
+ def rolloff_for_frame(spectrum, freqs, threshold)
417
+ total_energy = spectrum.sum
418
+ return 0.0 if total_energy.zero?
419
+
420
+ cumsum = spectrum.cumsum
421
+ threshold_energy = threshold * total_energy
422
+
423
+ rolloff_bin = cumsum.ge(threshold_energy).where[0]
424
+ rolloff_bin ||= freqs.size - 1
425
+
426
+ freqs[rolloff_bin]
427
+ end
428
+
429
+ # Computes the spectral rolloff over time for a signal.
430
+ #
431
+ # Spectral rolloff is the frequency below which a fixed percentage
432
+ # (threshold) of the total spectral energy is contained.
433
+ #
434
+ # @param samples [Numo::DFloat] Input samples (channels x samples)
435
+ # @param frame_size [Integer] FFT window size (default: 2048)
436
+ # @param hop_length [Integer] Step size between frames (default: 512)
437
+ # @param sample_rate [Integer] Sampling rate of the audio signal (default: 22050 Hz)
438
+ # @param threshold [Float] Proportion of spectral energy to retain (default: 0.85)
439
+ # @return [Numo::DFloat] Spectral rolloff matrix (channels x frames)
440
+ def spectral_rolloff(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050, threshold: 0.85)
441
+ stft_matrix = stft(samples, frame_size: frame_size, hop_length: hop_length).abs
442
+ channels, _freqs_size, frames_size = stft_matrix.shape
443
+ freqs = frequency_bins(frame_size, sample_rate)
444
+
445
+ rolloff_matrix = Numo::DFloat.zeros(channels, frames_size)
446
+
447
+ frames_size.times do |frame_idx|
448
+ channels.times do |ch|
449
+ rolloff_matrix[ch, frame_idx] = rolloff_for_frame(
450
+ stft_matrix[ch, true, frame_idx], freqs, threshold
451
+ )
452
+ end
453
+ end
454
+
455
+ rolloff_matrix
456
+ end
457
+
458
+ # Convert frame indices to time in seconds.
459
+ #
460
+ # This method maps analysis frame indices (or total frame count) into
461
+ # corresponding time positions in seconds, similar to `librosa.frames_to_time`.
462
+ #
463
+ # @param frames [Integer, Numo::NArray] Either a single frame index,
464
+ # or a Numo array of shape (n_channels, n_frames) from which the total
465
+ # number of frames is inferred.
466
+ # @param hop_length [Integer] Number of audio samples between adjacent frames.
467
+ # Defaults to 512.
468
+ # @param sample_rate [Integer] Sampling rate of the audio signal in Hz.
469
+ # Defaults to 22,050 Hz.
470
+ #
471
+ # @return [Numo::DFloat] A 1-D Numo array of times (in seconds) corresponding
472
+ # to each frame index. If `frames` is an Integer, the return value spans
473
+ # from frame 0 up to `frames - 1`. If `frames` is a Numo array, the return
474
+ # value spans the number of frames inferred from `frames.shape[1]`.
475
+ #
476
+ # @example Using total frame count
477
+ # frames_to_time(100, hop_length: 512, sample_rate: 22050)
478
+ # # => Numo::DFloat[0.0, 0.0232, ..., 2.2988]
479
+ #
480
+ # @example Using a spectrogram matrix
481
+ # samples = Numo::DFloat.new(2, 500) # 2 channels, 500 frames
482
+ # frames_to_time(samples, hop_length: 512, sample_rate: 22050)
483
+ # # => Numo::DFloat[0.0, 0.0232, ..., 11.5868]
484
+ #
485
+ def frames_to_time(frames, hop_length: 512, sample_rate: 22_050)
486
+ frames_size = frames.is_a?(Integer) ? frames : frames.shape[1]
487
+ Numo::DFloat[0...frames_size] * hop_length / sample_rate.to_f
488
+ end
489
+
490
+ ##
491
+ # Computes the spectral flatness of an audio signal.
492
+ #
493
+ # Spectral flatness measures how noise-like a signal is, as opposed to being tone-like.
494
+ # A value closer to 1.0 indicates the spectrum is flat (similar to white noise),
495
+ # while values closer to 0.0 indicate a peaky spectrum (like a sine wave or harmonic-rich signal).
496
+ #
497
+ # @param samples [Numo::NArray]
498
+ # The input audio samples as a 2D array of shape [channels, samples].
499
+ #
500
+ # @param frame_size [Integer] (2048)
501
+ # The size of each FFT window (frame). Larger sizes give better frequency
502
+ # resolution but worse time resolution.
503
+ #
504
+ # @param hop_length [Integer] (512)
505
+ # The number of samples to shift between consecutive FFT frames. Smaller values
506
+ # provide more overlap and smoother results.
507
+ #
508
+ # @param amin [Float] (1e-10)
509
+ # A small constant added for numerical stability, preventing log(0) or division by zero.
510
+ #
511
+ # @param power [Integer] (2)
512
+ # The power to which the magnitude spectrum is raised. Typically 2 to work with
513
+ # power spectrograms.
514
+ #
515
+ # @return [Numo::DFloat]
516
+ # A 2D Numo::DFloat array of shape [channels, frames] containing the spectral flatness value of each frame.
517
+ #
518
+ # @example Compute spectral flatness for an audio clip
519
+ # samples = Awaaz::Utils::Soundread.new("audio.wav").read
520
+ # flatness = spectral_flatness(samples, frame_size: 1024, hop_length: 256)
521
+ # puts flatness.shape
522
+ #
523
+ def spectral_flatness(samples, frame_size: 2048, hop_length: 512, amin: 1e-10, power: 2)
524
+ stft_matrix = stft(samples, frame_size: frame_size, hop_length: hop_length).abs
525
+ stft_matrix = Numo::DFloat.maximum(amin, stft_matrix**power)
526
+
527
+ gms = Numo::DFloat::Math.exp Numo::DFloat::Math.log(stft_matrix).mean(axis: -2)
528
+ ams = stft_matrix.mean(axis: -2)
529
+
530
+ gms / ams
531
+ end
532
+ end
533
+ end
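The framing arithmetic used throughout `Awaaz::Features` (`total_frames`, `pad_amount`, `build_ranges`, and `frames_to_time`) can be checked by hand with a small worked example. The sketch below restates those formulas as standalone methods on plain integers; it assumes the signal is at least one frame long, `frame_times` is a hypothetical name for the frame-to-seconds mapping, and nothing here uses Numo.

```ruby
# Standalone restatement of the framing formulas (assumes signal_length >= frame_size).

# Number of frames needed to cover the signal, rounding the last partial hop up.
def total_frames(signal_length, frame_size, hop_length)
  ((signal_length - frame_size) / hop_length.to_f).ceil + 1
end

# Samples to append so the last frame ends exactly at the end of the signal.
def pad_amount(signal_length, frame_size, hop_length)
  frames = total_frames(signal_length, frame_size, hop_length)
  ((frames - 1) * hop_length) + frame_size - signal_length
end

# Half-open sample index range for every frame that fits in the padded signal.
def build_ranges(signal_length, frame_size, hop_length)
  (0..(signal_length - frame_size)).step(hop_length).map do |start|
    start...(start + frame_size)
  end
end

# Start time in seconds of each frame index 0...frame_count (hypothetical helper).
def frame_times(frame_count, hop_length: 512, sample_rate: 22_050)
  Array.new(frame_count) { |i| i * hop_length / sample_rate.to_f }
end

signal_length = 5000
frame_size = 2048
hop_length = 512

frames = total_frames(signal_length, frame_size, hop_length)
padding = pad_amount(signal_length, frame_size, hop_length)
ranges = build_ranges(signal_length + padding, frame_size, hop_length)

puts frames        # => 7
puts padding       # => 120
puts ranges.size   # => 7
puts ranges.last   # => 3072...5120
puts frame_times(frames).first # => 0.0
```

Here (5000 - 2048) / 512 is about 5.77, which rounds up to 6 hops and therefore 7 frames; the padded length is 6 * 512 + 2048 = 5120, so 120 samples of padding are required.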
@@ -0,0 +1,37 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Awaaz gem
4
+ module Awaaz
5
+ # Properties of audio
6
+ module Properties
7
+ # Calculates the duration (in seconds) of an audio signal given the number of samples and the sample rate.
8
+ #
9
+ # @param samples [Numo::NArray, Array, Object]
10
+ # The audio samples. This can be a Numo::NArray, Array, or any object
11
+ # that responds to `.shape` and returns a size array.
12
+ #
13
+ # @param sample_rate [Integer, Float]
14
+ # The sampling rate (in Hz) of the audio signal.
15
+ #
16
+ # @return [Float]
17
+ # The duration of the audio signal in seconds. Returns `0.0` if either
18
+ # the number of samples or the sample rate is non-positive.
19
+ #
20
+ # @example
21
+ # samples = Numo::DFloat.new(44100) # 1 second of audio at 44.1 kHz
22
+ # Awaaz.duration(samples, 44100)
23
+ # # => 1.0
24
+ #
25
+ # @note
26
+ # The duration is computed as:
27
+ # samples_count / sample_rate
28
+ #
29
+ # @see https://en.wikipedia.org/wiki/Sampling_(signal_processing)
30
+ def duration(samples, sample_rate)
31
+ samples_count = samples.shape.max
32
+ return 0.0 if samples_count <= 0 || sample_rate <= 0
33
+
34
+ samples_count / sample_rate.to_f
35
+ end
36
+ end
37
+ end
@@ -6,7 +6,7 @@ module Awaaz
6
6
  # Resample utilities for audio data represented as Numo::NArray.
7
7
  # Wraps the `libsamplerate` bindings provided by {Extensions::Samplerate}.
8
8
  #
9
- # @note This module is intended for internal use, but `read_and_resample_numo`
9
+ # @note This module is intended for internal use, but `read_and_resample`
10
10
  # is public for advanced users who need manual resampling.
11
11
  module Resample
12
12
  class << self
@@ -31,17 +31,19 @@ module Awaaz
31
31
  #
32
32
  # @example Resample 44.1kHz mono audio to 48kHz
33
33
  # samples = Numo::SFloat.new(44100).rand
34
- # new_samples = Awaaz::Utils::Resample.read_and_resample_numo(samples, 44100, 48000)
35
- def read_and_resample_numo(input_samples, input_rate, output_rate, sampling_option: :sinc_best_quality)
36
- validate_inputs(input_samples, input_rate, output_rate)
34
+ # new_samples = Awaaz::Utils::Resample.read_and_resample(samples, 44100, 48000)
35
+ def read_and_resample(input_samples, input_rate, output_rate, channels, sampling_option: :sinc_fastest)
36
+ return input_samples if input_rate == output_rate
37
+
38
+ validate_inputs(input_samples)
37
39
 
38
40
  ratio = calculate_ratio(input_rate, output_rate)
39
- input_ptr, output_ptr, input_frames, output_frames = prepare_memory(input_samples, ratio)
41
+ input_ptr, output_ptr, input_frames, output_frames = prepare_memory(input_samples, ratio, channels)
40
42
 
41
43
  data = build_src_data(input_ptr, output_ptr, input_frames, output_frames, ratio)
42
- perform_resampling(data, sampling_option)
44
+ perform_resampling(data, sampling_option, channels)
43
45
 
44
- convert_to_numo(output_ptr, data[:output_frames_gen])
46
+ convert_to_numo(output_ptr, data[:output_frames_gen] * channels)
45
47
  end
46
48
 
47
49
  private
@@ -50,14 +52,12 @@ module Awaaz
50
52
  # Validates that the provided inputs are of the correct type and configuration.
51
53
  #
52
54
  # @param samples [Numo::NArray] The input samples.
53
- # @param input_rate [Integer]
54
- # @param output_rate [Integer]
55
55
  #
56
56
  # @raise [ArgumentError] If samples are not a Numo::SFloat array.
57
- def validate_inputs(samples, input_rate, output_rate)
58
- return if input_rate != output_rate && samples.is_a?(Numo::NArray)
57
+ def validate_inputs(samples)
58
+ return if samples.is_a?(Numo::NArray)
59
59
 
60
- raise ArgumentError, "Input must be a Numo::SFloat array" unless samples.is_a?(Numo::NArray)
60
+ raise ArgumentError, "Input must be a Numo::SFloat array"
61
61
  end
62
62
 
63
63
  ##
@@ -82,14 +82,14 @@ module Awaaz
82
82
  # @param ratio [Float] The resampling ratio.
83
83
  #
84
84
  # @return [Array<FFI::MemoryPointer, FFI::MemoryPointer, Integer, Integer>]
85
- def prepare_memory(input_samples, ratio)
86
- input_frames = input_samples.size
85
+ def prepare_memory(input_samples, ratio, channels)
86
+ input_frames = input_samples.size / channels
87
87
  output_frames = (input_frames * ratio).to_i
88
88
 
89
- input_ptr = FFI::MemoryPointer.new(:float, input_frames)
89
+ input_ptr = FFI::MemoryPointer.new(:float, input_samples.size)
90
90
  input_ptr.write_bytes(input_samples.to_string)
91
91
 
92
- output_ptr = FFI::MemoryPointer.new(:float, output_frames)
92
+ output_ptr = FFI::MemoryPointer.new(:float, output_frames * channels)
93
93
 
94
94
  [input_ptr, output_ptr, input_frames, output_frames]
95
95
  end
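The change to `prepare_memory` separates frames from samples: a frame holds one value per channel, so the input buffer needs `input_samples.size` floats and the output buffer needs `output_frames * channels` floats. The sketch below reproduces only that arithmetic. The helper name `buffer_plan` is hypothetical, and because `calculate_ratio` is not shown in this hunk, the ratio is assumed to be `output_rate / input_rate`, which is the libsamplerate convention for `src_ratio`.

```ruby
# Hypothetical helper mirroring the frame arithmetic in prepare_memory.
# total_samples counts interleaved values across all channels.
def buffer_plan(total_samples, channels, input_rate, output_rate)
  # Assumed definition of the ratio (calculate_ratio is not shown in the diff).
  ratio = output_rate.to_f / input_rate

  input_frames = total_samples / channels
  output_frames = (input_frames * ratio).to_i

  {
    input_frames: input_frames,
    output_frames: output_frames,
    input_floats: total_samples,            # size of the input buffer
    output_floats: output_frames * channels # size of the output buffer
  }
end

# One second of interleaved stereo at 44.1 kHz, downsampled to 22.05 kHz.
plan = buffer_plan(88_200, 2, 44_100, 22_050)
```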
@@ -122,8 +122,9 @@ module Awaaz
122
122
  # @param sampling_option [Symbol, Integer]
123
123
  #
124
124
  # @raise [Awaaz::ResampleError] If resampling fails.
125
- def perform_resampling(data, sampling_option)
126
- err = Extensions::Samplerate.src_simple(data, Extensions::Samplerate.resample_option(sampling_option), 1)
125
+ def perform_resampling(data, sampling_option, channels)
126
+ err = Extensions::Samplerate.src_simple(data, Extensions::Samplerate.resample_option(sampling_option),
127
+ channels)
127
128
  raise Awaaz::ResampleError, "Resampling failed: #{Extensions::Samplerate.src_strerror(err)}" if err != 0
128
129
  end
129
130
 
@@ -49,7 +49,16 @@ module Awaaz
49
49
  # @return [Boolean] +true+ if mono, otherwise +false+.
50
50
  #
51
51
  def mono
52
- from_options(:mono) || false
52
+ from_options(:mono) || true
53
+ end
54
+
55
+ ##
56
+ # Resampling option
57
+ #
58
+ # @return [Symbol] default :linear
59
+ #
60
+ def resampling_option
61
+ from_options(:resampling_option) || :linear
53
62
  end
54
63
 
55
64
  ##
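One detail of the `||` fallback used for `mono` is worth illustrating: in Ruby, `false || true` evaluates to `true`, so a default of `true` written this way also replaces an explicit `false`. The sketch below uses a plain Hash as a stand-in. `from_options` is not shown in this hunk, so the sketch assumes it returns the stored value or `nil`, and both method names are hypothetical.

```ruby
# Hypothetical options lookup using the `||` fallback pattern.
def mono_with_or(options)
  options[:mono] || true
end

# Alternative that keeps an explicit false: fall back only when the key is absent.
def mono_with_fetch(options)
  options.fetch(:mono, true)
end

or_result = mono_with_or({ mono: false })        # explicit false is replaced
fetch_result = mono_with_fetch({ mono: false })  # explicit false is kept
default_result = mono_with_fetch({})             # missing key uses the default
```

If the real `from_options` behaves as assumed, a caller passing `mono: false` would still get `true` back from the `||` form.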
@@ -3,166 +3,142 @@
3
3
  module Awaaz
4
4
  module Utils
5
5
  ##
6
- # A utility class for reading and optionally resampling audio files.
6
+ # A helper that mimics librosa.load using libsndfile via FFI.
7
7
  #
8
- # This class supports reading `.wav` files using {Extensions::Soundfile}
9
- # and can automatically resample them using {Utils::Resample}.
8
+ # - Always returns Float32 samples normalized in [-1.0, 1.0]
9
+ # - Preserves channel structure (returns shape `[channels, frames]`)
10
+ # - Returns `[data, channels, sr]` where:
11
+ # * `data` = Numo::SFloat array (2D, shape: channels x frames)
12
+ # * `channels` = Integer number of channels
13
+ # * `sr` = sample rate (Integer)
10
14
  #
11
- # @example Read and resample a WAV file
12
- # reader = Awaaz::Utils::Soundread.new("audio.wav", resample_options: { output_rate: 44100 })
13
- # samples, channels, rate = reader.read
14
- #
15
- # @note Currently, only `.wav` files are supported.
15
+ # @example
16
+ # reader = Awaaz::Utils::Soundread.new("audio.wav")
17
+ # data, channels, sr = reader.read
16
18
  #
17
19
  class Soundread
18
20
  ##
19
- # Supported audio file extensions.
20
- #
21
- # @return [Array<String>] List of supported file extensions.
22
- #
23
- SUPPORTED_EXTENSIONS = %w[.wav].freeze
24
-
25
- ##
26
- # Creates a new Soundread instance.
21
+ # Initializes a Soundread instance.
27
22
  #
28
23
  # @param filename [String] Path to the audio file to read.
29
- # @param resample_options [Hash] Options for resampling the audio.
30
- # - `:output_rate` [Integer] Output sample rate (default: `22050`)
31
- # - `:sampling_option` [Symbol] Resampling algorithm (default: `:sinc_fastest`)
24
+ # @param resampling_options [Hash] Optional resampling configuration.
32
25
  #
33
- def initialize(filename, resample_options: default_resample_options)
26
+ def initialize(filename, **resampling_options)
34
27
  @filename = filename
35
- @resample_options = resample_options || {}
28
+ @resampling_options = resampling_options
36
29
  end
37
30
 
38
31
  ##
39
- # Reads the audio file, returning its samples and metadata.
32
+ # Reads the audio file, returning samples, number of channels, and sample rate.
40
33
  #
41
34
  # @return [Array<(Numo::SFloat, Integer, Integer)>]
42
- # A tuple containing:
43
- # - samples [Numo::SFloat] Audio samples as a Numo array.
44
- # - channels [Integer] Number of channels in the audio.
45
- # - output_rate [Integer] — Sample rate of the returned audio.
35
+ # - data [Numo::SFloat] Audio samples, shape = `[channels, frames]`
36
+ # - channels [Integer] Number of channels
37
+ # - sr [Integer] Sample rate
46
38
  #
47
- # @raise [ArgumentError] If the file extension is unsupported.
48
- # @raise [Awaaz::AudioreadError] If the file cannot be opened.
39
+ # @raise [ArgumentError] If the file cannot be opened.
49
40
  #
50
41
  def read
51
- validate_support
52
- soundfile, sample_rate, frames, channels = open_file
53
- samples = parse_soundfile(soundfile, frames, channels)
54
- close_soundfile(soundfile)
42
+ info, sndfile = open_file
43
+ frames, channels, sr = extract_info(info)
44
+
45
+ buffer, read_frames = read_buffer(sndfile, frames, channels)
46
+ close_file(sndfile)
55
47
 
56
- resample(samples, sample_rate, channels)
48
+ data = process_data(buffer, read_frames, channels)
49
+ [resample(data, sr, channels), channels, sr]
57
50
  end
58
51
 
59
52
  private
60
53
 
61
- ##
62
- # Default resampling options.
63
- #
64
- # @return [Hash] Default options with `:output_rate => 22050`.
65
- #
66
- def default_resample_options
67
- { output_rate: 22_050 }
68
- end
54
+ def resample(samples, sample_rate, channels)
55
+ validate_resampling_options
69
56
 
70
- ##
71
- # Ensures the file format is supported.
72
- #
73
- # @raise [ArgumentError] If the file extension is not in {SUPPORTED_EXTENSIONS}.
74
- #
75
- def validate_support
76
- return if supported?
57
+ output_rate, sampling_option = @resampling_options.values_at(:output_rate, :sampling_rate)
58
+ sampling_option ||= :linear
77
59
 
78
- raise ArgumentError, "File extension not supported. Supported files: #{SUPPORTED_EXTENSIONS.join(",")}"
60
+ return samples if output_rate == sample_rate || @resampling_options.empty?
61
+
62
+ Utils::Resample.read_and_resample(samples, sample_rate, output_rate, channels, sampling_option: sampling_option)
79
63
  end
80
64
 
81
- ##
82
- # Checks if the file extension is supported.
83
- #
84
- # @return [Boolean] `true` if supported, `false` otherwise.
85
- #
86
- def supported?
87
- SUPPORTED_EXTENSIONS.include?(File.extname(@filename))
65
+ def validate_resampling_options
66
+ valid_options = %i[output_rate sampling_option]
67
+
68
+ @resampling_options.transform_keys!(&:to_sym)
69
+ @resampling_options.each_key do |key|
70
+ next if valid_options.include?(key)
71
+
72
+ raise ArgumentError, "Invalid option: #{key}. Available options: #{valid_options.join}"
73
+ end
88
74
  end
89
75
 
90
76
  ##
91
- # Opens the audio file for reading.
77
+ # Opens the file and retrieves SF_INFO metadata.
92
78
  #
93
- # @return [Array<(FFI::Pointer, Integer, Integer, Integer)>]
94
- # A tuple containing:
95
- # - soundfile [FFI::Pointer] — Pointer to the opened sound file.
96
- # - sample_rate [Integer] — Sample rate of the audio file.
97
- # - frames [Integer] — Number of frames in the file.
98
- # - channels [Integer] — Number of channels in the file.
79
+ # @return [Array<(Awaaz::Extensions::Soundfile::SF_INFO, FFI::Pointer)>]
99
80
  #
100
- # @raise [Awaaz::AudioreadError] If the file cannot be opened.
81
+ # @raise [ArgumentError] If the file cannot be opened.
101
82
  #
102
83
  def open_file
103
- info = Extensions::Soundfile::SF_INFO.new
104
- sndfile = Extensions::Soundfile.sf_open(@filename, Extensions::Soundfile::SFM_READ, info.to_ptr)
84
+ info = Awaaz::Extensions::Soundfile::SF_INFO.new
85
+ sndfile = Awaaz::Extensions::Soundfile.sf_open(
86
+ @filename,
87
+ Awaaz::Extensions::Soundfile::SFM_READ,
88
+ info
89
+ )
105
90
 
106
- raise Awaaz::AudioreadError, "Could not read the audio file" if sndfile.null?
91
+ raise ArgumentError, "Could not open file: #{@filename}" if sndfile.null?
107
92
 
108
- sample_rate = info[:samplerate]
109
- frames = info[:frames]
110
- channels = info[:channels]
111
- [sndfile, sample_rate, frames, channels]
93
+ [info, sndfile]
112
94
  end
113
95
 
114
96
  ##
115
- # Reads the raw samples from the file and converts them into a Numo array.
97
+ # Extracts frames, channels, and sample rate from SF_INFO.
116
98
  #
117
- # @param soundfile [FFI::Pointer] Open sound file pointer.
99
+ # @param info [Awaaz::Extensions::Soundfile::SF_INFO]
100
+ # @return [Array<(Integer, Integer, Integer)>] frames, channels, sr
101
+ #
102
+ def extract_info(info)
103
+ [info[:frames], info[:channels], info[:samplerate]]
104
+ end
105
+
106
+ ##
107
+ # Reads raw audio frames into a memory buffer.
108
+ #
109
+ # @param sndfile [FFI::Pointer] Opened sound file.
118
110
  # @param frames [Integer] Number of frames to read.
119
- # @param channels [Integer] Number of channels in the file.
120
- # @return [Numo::SFloat] The audio samples.
111
+ # @param channels [Integer] Number of channels.
121
112
  #
122
- def parse_soundfile(soundfile, frames, channels)
113
+ # @return [Array<(FFI::MemoryPointer, Integer)>] buffer and number of read frames
114
+ #
115
+ def read_buffer(sndfile, frames, channels)
123
116
  buffer = FFI::MemoryPointer.new(:float, frames * channels)
124
- read_frames = Extensions::Soundfile.sf_readf_float(soundfile, buffer, frames)
125
- Numo::SFloat.cast(buffer.read_array_of_float(read_frames * channels))
117
+ read_frames = Awaaz::Extensions::Soundfile.sf_readf_float(sndfile, buffer, frames)
118
+ [buffer, read_frames]
126
119
  end
127
120
 
128
121
  ##
129
122
  # Closes the open sound file.
130
123
  #
131
- # @param soundfile [FFI::Pointer] Open sound file pointer.
124
+ # @param sndfile [FFI::Pointer]
132
125
  # @return [void]
133
126
  #
134
- def close_soundfile(soundfile)
135
- Extensions::Soundfile.sf_close(soundfile)
127
+ def close_file(sndfile)
128
+ Awaaz::Extensions::Soundfile.sf_close(sndfile)
136
129
  end
137
130
 
138
131
  ##
139
- # Resamples the audio if necessary.
132
+ # Converts the buffer into a Numo::SFloat array and reshapes to `[channels, frames]`.
140
133
  #
141
- # @param samples [Numo::SFloat] The input samples.
142
- # @param sample_rate [Integer] Original sample rate.
134
+ # @param buffer [FFI::MemoryPointer]
135
+ # @param read_frames [Integer] Number of frames read.
143
136
  # @param channels [Integer] Number of channels.
144
- # @return [Array<(Numo::SFloat, Integer, Integer)>]
137
+ # @return [Numo::SFloat] Audio data of shape `[channels, frames]`.
145
138
  #
146
- # @raise [ArgumentError] If an invalid resample option key is passed.
147
- #
148
- def resample(samples, sample_rate, channels)
149
- valid_options = %i[output_rate sampling_option]
150
-
151
- @resample_options.transform_keys!(&:to_sym)
152
- @resample_options.each_key do |key|
153
- next if valid_options.include?(key)
154
-
155
- raise ArgumentError, "Invalid option: #{key}. Available options: #{valid_options.join}"
156
- end
157
-
158
- output_rate, sampling_option = @resample_options.values_at(:output_rate, :sampling_rate)
159
- sampling_option ||= :sinc_fastest
160
-
161
- [
162
- Utils::Resample.read_and_resample_numo(samples, sample_rate, output_rate, sampling_option:),
163
- channels,
164
- output_rate
165
- ]
139
+ def process_data(buffer, read_frames, channels)
140
+ data = Numo::SFloat.cast(buffer.read_array_of_float(read_frames * channels))
141
+ data.reshape(read_frames, channels).transpose
166
142
  end
167
143
  end
168
144
  end
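`sf_readf_float` fills the buffer with interleaved frames, which is why `process_data` reshapes to `[read_frames, channels]` and then transposes to get `[channels, frames]`. A minimal pure-Ruby sketch of the same rearrangement on a plain Array (the name `deinterleave` is hypothetical and Numo is not used here):

```ruby
# Hypothetical pure-Ruby equivalent of reshape(read_frames, channels).transpose:
# turn an interleaved buffer [L0, R0, L1, R1, ...] into one Array per channel.
def deinterleave(interleaved, channels)
  # Each slice is one frame; transposing groups the values by channel.
  interleaved.each_slice(channels).to_a.transpose
end

left, right = deinterleave([0.1, -0.1, 0.2, -0.2, 0.3, -0.3], 2)
```

The sketch expects the buffer length to be a multiple of `channels`; `Array#transpose` raises an `IndexError` otherwise.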
@@ -19,6 +19,7 @@ require_relative "soundread"
19
19
  require_relative "shell_command_builder"
20
20
  require_relative "via_shell"
21
21
 
22
+ # Awaaz gem
22
23
  module Awaaz
23
24
  # The Utils module provides low-level helper components
24
25
  # for performing core audio-related operations in the Awaaz gem.
data/lib/awaaz/version.rb CHANGED
@@ -2,5 +2,5 @@
2
2
 
3
3
  module Awaaz
4
4
  # Version the Awaaz gem.
5
- VERSION = "0.1.0"
5
+ VERSION = "0.2.0"
6
6
  end
data/lib/awaaz.rb CHANGED
@@ -10,11 +10,11 @@
10
10
  # @see Awaaz::Decoders
11
11
  # @see Awaaz::Utils
12
12
  # @see Awaaz::Config
13
- module Awaaz
14
- end
15
-
13
+ # @see Awaaz::Features
14
+ # @see Awaaz::Properties
16
15
  require "ffi"
17
16
  require "numo/narray"
17
+ require "numo/pocketfft"
18
18
 
19
19
  require_relative "awaaz/errors"
20
20
  require_relative "awaaz/extensions/extensions"
@@ -23,3 +23,10 @@ require_relative "awaaz/version"
23
23
 
24
24
  require_relative "awaaz/config"
25
25
  require_relative "awaaz/decoders/decoders"
26
+ require_relative "awaaz/features"
27
+ require_relative "awaaz/properties"
28
+
29
+ module Awaaz
30
+ extend Features
31
+ extend Properties
32
+ end
metadata CHANGED
@@ -1,13 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: awaaz
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Saad Azam
8
+ autorequire:
8
9
  bindir: exe
9
10
  cert_chain: []
10
- date: 2025-08-12 00:00:00.000000000 Z
11
+ date: 2025-08-29 00:00:00.000000000 Z
11
12
  dependencies:
12
13
  - !ruby/object:Gem::Dependency
13
14
  name: ffi
@@ -37,6 +38,20 @@ dependencies:
37
38
  - - "~>"
38
39
  - !ruby/object:Gem::Version
39
40
  version: 0.9.1
41
+ - !ruby/object:Gem::Dependency
42
+ name: numo-pocketfft
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: 0.4.1
48
+ type: :runtime
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: 0.4.1
40
55
  - !ruby/object:Gem::Dependency
41
56
  name: ruby-filemagic
42
57
  requirement: !ruby/object:Gem::Requirement
@@ -78,6 +93,8 @@ files:
78
93
  - lib/awaaz/extensions/extensions.rb
79
94
  - lib/awaaz/extensions/samplerate.rb
80
95
  - lib/awaaz/extensions/soundfile.rb
96
+ - lib/awaaz/features.rb
97
+ - lib/awaaz/properties.rb
81
98
  - lib/awaaz/utils/resample.rb
82
99
  - lib/awaaz/utils/shell_command_builder.rb
83
100
  - lib/awaaz/utils/sound_config.rb
@@ -95,6 +112,7 @@ metadata:
95
112
  source_code_uri: https://github.com/SadMadLad/awaaz
96
113
  changelog_uri: https://github.com/SadMadLad/awaaz/blob/main/CHANGELOG.md
97
114
  rubygems_mfa_required: 'true'
115
+ post_install_message:
98
116
  rdoc_options: []
99
117
  require_paths:
100
118
  - lib
@@ -102,14 +120,15 @@ required_ruby_version: !ruby/object:Gem::Requirement
102
120
  requirements:
103
121
  - - ">="
104
122
  - !ruby/object:Gem::Version
105
- version: 3.4.2
123
+ version: 3.0.0
106
124
  required_rubygems_version: !ruby/object:Gem::Requirement
107
125
  requirements:
108
126
  - - ">="
109
127
  - !ruby/object:Gem::Version
110
128
  version: '0'
111
129
  requirements: []
112
- rubygems_version: 3.6.2
130
+ rubygems_version: 3.2.3
131
+ signing_key:
113
132
  specification_version: 4
114
133
  summary: Audio Analysis with Ruby
115
134
  test_files: []