voxpulse 1.0.0__tar.gz

voxpulse-1.0.0/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Abhishek Gour
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
voxpulse-1.0.0/PKG-INFO ADDED
@@ -0,0 +1,106 @@
+ Metadata-Version: 2.4
+ Name: voxpulse
+ Version: 1.0.0
+ Summary: A lightweight, fast, and AI-powered custom wake word detection system.
+ Home-page: https://github.com/itzabhishekgour/VoxPulse
+ Author: Abhishek Gour
+ Classifier: Programming Language :: Python :: 3
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: numpy
+ Requires-Dist: scipy
+ Requires-Dist: librosa
+ Requires-Dist: sounddevice
+ Requires-Dist: audiomentations
+ Requires-Dist: tensorflow
+ Requires-Dist: soundfile
+ Requires-Dist: scikit-learn
+ Dynamic: author
+ Dynamic: classifier
+ Dynamic: description
+ Dynamic: description-content-type
+ Dynamic: home-page
+ Dynamic: license-file
+ Dynamic: requires-dist
+ Dynamic: requires-python
+ Dynamic: summary
+
+ # VoxPulse - Custom Wake Word Detection Framework
+
+ VoxPulse is a lightweight, offline, and private DIY wake-word detection library for Python. Instead of relying on pre-trained corporate wake words like "Alexa" or "Hey Siri", VoxPulse lets developers train a detector for any custom name, in any language.
+
+ ---
+
+ ## Why VoxPulse? (Pros & Cons)
+
+ ### Pros (The Good Stuff)
+ * **100% Privacy:** Everything runs locally on your machine. No internet connection is required, and no voice data is sent to the cloud.
+ * **Auto-Data Pipeline:** You provide raw `.wav` recordings. VoxPulse handles background-noise mixing, time-stretching, pitch-shifting, and Mel-spectrogram feature extraction automatically.
+ * **CPU & Battery Efficient:** Uses RMS silence gating. When the RMS energy of the audio buffer is below a threshold, the engine skips spectrogram extraction and model inference, so CPU usage stays low in a quiet room. The neural network runs only when the buffer contains sound above that threshold.
+ * **Lightweight:** Uses a small 2D Convolutional Neural Network (CNN) converted to TensorFlow Lite (`.tflite`), which keeps inference fast even on low-end hardware.
+
+ ### Cons (The Limitations)
+ * **DIY Approach:** There is no pre-trained model. You must record your own wake-word samples and room noise before you can use it.
+ * **Environment Sensitive:** Accuracy depends heavily on the quality of the background noise (`negative` dataset) you provide during training.
+
+ ---
+
+ ## How to Use VoxPulse (Quick Start Guide)
+
+ ### Step 1: Install the Library
+ Install VoxPulse directly via pip:
+ ```bash
+ pip install voxpulse
+ ```
+
+ ### Step 2: Prepare Your Dataset
+ Create a folder named `dataset` in your project directory with two sub-folders:
+
+ * **dataset/positive/** - Record and save 10-15 short `.wav` files of you saying your custom wake word (e.g., "Hey Friday"). Keep them around 1 to 1.5 seconds long; feature extraction truncates longer clips to 1.5 seconds.
+ * **dataset/negative/** - Record a single 5-10 minute `.wav` file of your normal room background noise (fan sounds, typing, distant talking) and place it here.
+
+ ### Step 3: Train Your Custom Model
+ Create a Python script (e.g., `train.py`) and run the auto-pipeline:
+
+ ```python
+ from voxpulse.model import VoxPulseTrainer
+
+ # This single call augments the data, extracts features, trains the CNN, and exports the model
+ trainer = VoxPulseTrainer(dataset_dir="dataset")
+ trainer.train_and_export(epochs=20, export_name="my_custom_model.tflite")
+ ```
+
+ ### Step 4: Run the Inference Engine
+ Once your `.tflite` model is generated, you can use it to trigger any Python function in real time. Create `run.py`:
+
+ ```python
+ from voxpulse.inference import VoxPulseEngine
+
+ def trigger_my_action():
+     print("Custom Wake Word Detected! Executing action...")
+     # Add your automation code here (e.g., open Spotify, turn on lights)
+
+ # Load your newly trained model
+ engine = VoxPulseEngine(model_path="my_custom_model.tflite", threshold=0.70)
+
+ # Start listening (this call blocks until you press Ctrl+C)
+ engine.start_listening(on_detect_callback=trigger_my_action)
+ ```
+
+ ---
+
+ ## Under the Hood (Architecture)
+
+ VoxPulse abstracts away the complexity of audio machine learning. When you call `train_and_export()`, it runs the following pipeline automatically (augmentation and feature extraction are skipped if `positive_features.npy` and `negative_features.npy` already exist in the dataset folder):
+
+ ```mermaid
+ graph TD
+ A[Raw Audio: dataset/positive] -->|Step 1: Auto-Augmentation| B[Pitch Shift & Time Stretch]
+ D[Background Noise: dataset/negative] -->|Mix Noise| B
+ B -->|Step 2: Mel-Spectrogram| C[Feature Matrices]
+ C -->|Step 3: CNN Training| E[Keras Sequential Model]
+ E -->|Step 4: TFLite Conversion| F[Lightweight TFLite Model]
+ ```
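
PKG-INFO uses the RFC 822-style header format of Python core metadata, so its fields can be read with the standard library's email parser. The following is a minimal sketch under that assumption. The `PKG_INFO` string is an abbreviated copy of the header for illustration, and `read_metadata` is a hypothetical helper name, not part of VoxPulse.

```python
from email.parser import Parser

# Abbreviated copy of the metadata header (illustration only).
PKG_INFO = """\
Metadata-Version: 2.4
Name: voxpulse
Version: 1.0.0
Requires-Python: >=3.8
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: librosa

# VoxPulse - Custom Wake Word Detection Framework
"""


def read_metadata(text):
    """Return (name, version, requirements, description) from PKG-INFO text."""
    msg = Parser().parsestr(text)
    # Repeated headers such as Requires-Dist are collected with get_all().
    requirements = msg.get_all("Requires-Dist") or []
    # The long description is the message body after the first blank line.
    return msg["Name"], msg["Version"], requirements, msg.get_payload()


name, version, requirements, description = read_metadata(PKG_INFO)
print(name, version, requirements)
```

None of the `Requires-Dist` entries carries a version specifier, so an install resolves whatever versions are current at that time.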
voxpulse-1.0.0/README.md ADDED
@@ -0,0 +1,76 @@
+ # VoxPulse - Custom Wake Word Detection Framework
+
+ VoxPulse is a lightweight, offline, and private DIY wake-word detection library for Python. Instead of relying on pre-trained corporate wake words like "Alexa" or "Hey Siri", VoxPulse lets developers train a detector for any custom name, in any language.
+
+ ---
+
+ ## Why VoxPulse? (Pros & Cons)
+
+ ### Pros (The Good Stuff)
+ * **100% Privacy:** Everything runs locally on your machine. No internet connection is required, and no voice data is sent to the cloud.
+ * **Auto-Data Pipeline:** You provide raw `.wav` recordings. VoxPulse handles background-noise mixing, time-stretching, pitch-shifting, and Mel-spectrogram feature extraction automatically.
+ * **CPU & Battery Efficient:** Uses RMS silence gating. When the RMS energy of the audio buffer is below a threshold, the engine skips spectrogram extraction and model inference, so CPU usage stays low in a quiet room. The neural network runs only when the buffer contains sound above that threshold.
+ * **Lightweight:** Uses a small 2D Convolutional Neural Network (CNN) converted to TensorFlow Lite (`.tflite`), which keeps inference fast even on low-end hardware.
+
+ ### Cons (The Limitations)
+ * **DIY Approach:** There is no pre-trained model. You must record your own wake-word samples and room noise before you can use it.
+ * **Environment Sensitive:** Accuracy depends heavily on the quality of the background noise (`negative` dataset) you provide during training.
+
+ ---
+
+ ## How to Use VoxPulse (Quick Start Guide)
+
+ ### Step 1: Install the Library
+ Install VoxPulse directly via pip:
+ ```bash
+ pip install voxpulse
+ ```
+
+ ### Step 2: Prepare Your Dataset
+ Create a folder named `dataset` in your project directory with two sub-folders:
+
+ * **dataset/positive/** - Record and save 10-15 short `.wav` files of you saying your custom wake word (e.g., "Hey Friday"). Keep them around 1 to 1.5 seconds long; feature extraction truncates longer clips to 1.5 seconds.
+ * **dataset/negative/** - Record a single 5-10 minute `.wav` file of your normal room background noise (fan sounds, typing, distant talking) and place it here.
+
+ ### Step 3: Train Your Custom Model
+ Create a Python script (e.g., `train.py`) and run the auto-pipeline:
+
+ ```python
+ from voxpulse.model import VoxPulseTrainer
+
+ # This single call augments the data, extracts features, trains the CNN, and exports the model
+ trainer = VoxPulseTrainer(dataset_dir="dataset")
+ trainer.train_and_export(epochs=20, export_name="my_custom_model.tflite")
+ ```
+
+ ### Step 4: Run the Inference Engine
+ Once your `.tflite` model is generated, you can use it to trigger any Python function in real time. Create `run.py`:
+
+ ```python
+ from voxpulse.inference import VoxPulseEngine
+
+ def trigger_my_action():
+     print("Custom Wake Word Detected! Executing action...")
+     # Add your automation code here (e.g., open Spotify, turn on lights)
+
+ # Load your newly trained model
+ engine = VoxPulseEngine(model_path="my_custom_model.tflite", threshold=0.70)
+
+ # Start listening (this call blocks until you press Ctrl+C)
+ engine.start_listening(on_detect_callback=trigger_my_action)
+ ```
+
+ ---
+
+ ## Under the Hood (Architecture)
+
+ VoxPulse abstracts away the complexity of audio machine learning. When you call `train_and_export()`, it runs the following pipeline automatically (augmentation and feature extraction are skipped if `positive_features.npy` and `negative_features.npy` already exist in the dataset folder):
+
+ ```mermaid
+ graph TD
+ A[Raw Audio: dataset/positive] -->|Step 1: Auto-Augmentation| B[Pitch Shift & Time Stretch]
+ D[Background Noise: dataset/negative] -->|Mix Noise| B
+ B -->|Step 2: Mel-Spectrogram| C[Feature Matrices]
+ C -->|Step 3: CNN Training| E[Keras Sequential Model]
+ E -->|Step 4: TFLite Conversion| F[Lightweight TFLite Model]
+ ```
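
In this release `start_listening` runs a `while True` loop and returns only when interrupted, so a program that needs to keep working while the engine listens could run the call in a daemon thread. The sketch below shows that pattern with a hypothetical stand-in, `blocking_listen`, so it runs without a microphone or a model. The `stop_event` flag is an assumption of the sketch; `VoxPulseEngine` exposes no stop flag in this version, so a real listener thread would end only when the process exits.

```python
import threading
import time


def blocking_listen(on_detect_callback, stop_event):
    """Hypothetical stand-in for a blocking listen loop (not VoxPulse code)."""
    while not stop_event.is_set():
        time.sleep(0.01)      # pretend to wait for the next audio chunk
        on_detect_callback()  # pretend the wake word was heard
        stop_event.set()      # stop after one detection for this demo


detections = []
stop_event = threading.Event()

# daemon=True lets the interpreter exit even if the loop is still running.
listener = threading.Thread(
    target=blocking_listen,
    args=(lambda: detections.append("wake"), stop_event),
    daemon=True,
)
listener.start()

# The main thread is free to do other work here.
listener.join(timeout=2.0)
print(detections)
```

With the real engine, the thread target would be `engine.start_listening` and the callback would run on the listener thread, so any state it shares with the main thread needs the usual synchronization.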
voxpulse-1.0.0/setup.cfg ADDED
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
voxpulse-1.0.0/setup.py ADDED
@@ -0,0 +1,33 @@
+ from setuptools import setup, find_packages
+ # Packaging and distribution configuration for setuptools to publish VoxPulse on PyPI.
+
+ # Load the README file to serve as the long description on PyPI
+ with open("README.md", "r", encoding="utf-8") as fh:
+     long_description = fh.read()
+
+ setup(
+     name="voxpulse",
+     version="1.0.0",
+     author="Abhishek Gour",
+     description="A lightweight, fast, and AI-powered custom wake word detection system.",
+     long_description=long_description,
+     long_description_content_type="text/markdown",
+     url="https://github.com/itzabhishekgour/VoxPulse",
+     packages=find_packages(),
+     install_requires=[
+         "numpy",
+         "scipy",
+         "librosa",
+         "sounddevice",
+         "audiomentations",
+         "tensorflow",
+         "soundfile",
+         "scikit-learn"
+     ],
+     classifiers=[
+         "Programming Language :: Python :: 3",
+         "License :: OSI Approved :: MIT License",
+         "Operating System :: OS Independent",
+     ],
+     python_requires='>=3.8',
+ )
voxpulse-1.0.0/voxpulse/__init__.py ADDED
@@ -0,0 +1,5 @@
+ # Re-export inference engine class for cleaner top-level package imports
+ from .inference import VoxPulseEngine
+
+ __version__ = "1.0.0"
+ __author__ = "Abhishek Gour | AI Developer"
voxpulse-1.0.0/voxpulse/augment.py ADDED
@@ -0,0 +1,77 @@
+ import os
+ import glob
+ import numpy as np
+ import librosa
+ import soundfile as sf
+ from audiomentations import Compose, PitchShift, TimeStretch
+
+ SAMPLE_RATE = 16000
+
+ def load_noise_files(neg_dir):
+     noise_audios = []
+     if os.path.exists(neg_dir):
+         noise_files = glob.glob(os.path.join(neg_dir, "*.wav"))
+         for nf in noise_files:
+             try:
+                 y, _ = librosa.load(nf, sr=SAMPLE_RATE, mono=True)
+                 noise_audios.append(y)
+                 print(f" Loaded background noise source: {os.path.basename(nf)}")
+             except Exception as e:
+                 print(f" Could not load noise file {nf}: {e}")
+     return noise_audios
+
+ def apply_augmentation(audio, filename_prefix, noise_audios, aug_dir):
+     print(f" Augmenting '{filename_prefix}' (50 variations)...")
+     augmenter = Compose([
+         PitchShift(min_semitones=-4, max_semitones=4, p=0.8),
+         TimeStretch(min_rate=0.8, max_rate=1.2, p=0.8)
+     ])
+
+     for i in range(50):
+         aug_audio = augmenter(samples=audio, sample_rate=SAMPLE_RATE)
+
+         if noise_audios:
+             noise = noise_audios[np.random.randint(len(noise_audios))]
+             if len(noise) >= len(aug_audio):
+                 start = np.random.randint(0, len(noise) - len(aug_audio) + 1)
+                 noise_chunk = noise[start:start + len(aug_audio)]
+             else:
+                 tiles = int(np.ceil(len(aug_audio) / len(noise)))
+                 tiled_noise = np.tile(noise, tiles)
+                 noise_chunk = tiled_noise[:len(aug_audio)]
+
+             noise_factor = np.random.uniform(0.02, 0.08)
+             aug_audio = aug_audio + noise_factor * noise_chunk
+
+         max_val = np.max(np.abs(aug_audio))
+         if max_val > 1.0:
+             aug_audio = aug_audio / max_val
+
+         aug_path = os.path.join(aug_dir, f"aug_{filename_prefix}_{i}.wav")
+         sf.write(aug_path, aug_audio, SAMPLE_RATE)
+
+ def run_augmentation(dataset_dir):
+     """Callable function to run data augmentation automatically"""
+     print("\n[INFO] Starting Step 1: Data Augmentation Pipeline...")
+     pos_dir = os.path.join(dataset_dir, "positive")
+     aug_dir = os.path.join(dataset_dir, "augmented")
+     neg_dir = os.path.join(dataset_dir, "negative")
+
+     os.makedirs(pos_dir, exist_ok=True)
+     os.makedirs(aug_dir, exist_ok=True)
+     os.makedirs(neg_dir, exist_ok=True)
+
+     noise_audios = load_noise_files(neg_dir)
+     wav_files = [f for f in os.listdir(pos_dir) if f.endswith('.wav')]
+
+     if len(wav_files) == 0:
+         raise FileNotFoundError(f"\n[ERROR] No .wav files found in '{pos_dir}'!\nPlease record 10-15 voice samples of your wake word and place them in 'dataset/positive/' before training.")
+
+     print(f"[INFO] Found {len(wav_files)} positive samples. Generating 50x variations...")
+     for file_name in wav_files:
+         file_path = os.path.join(pos_dir, file_name)
+         audio, _ = librosa.load(file_path, sr=SAMPLE_RATE, mono=True)
+         prefix = os.path.splitext(file_name)[0]
+         apply_augmentation(audio, prefix, noise_audios, aug_dir)
+
+     print("[SUCCESS] Data augmentation completed!")
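
The noise-mixing step in `apply_augmentation` can be followed more easily on a tiny input. The sketch below mirrors its three operations (pick a random noise window or tile short noise, add the scaled noise, rescale only if the peak exceeds 1.0) using plain Python lists instead of NumPy arrays. The helper names `pick_noise_chunk` and `mix_and_normalize` are hypothetical, and the fixed `noise_factor` replaces the random draw from 0.02 to 0.08 so the result is reproducible.

```python
import random


def pick_noise_chunk(noise, length, rng):
    """Return `length` noise samples: a random window, or a tiled copy if noise is short."""
    if len(noise) >= length:
        start = rng.randint(0, len(noise) - length)  # both bounds inclusive
        return noise[start:start + length]
    repeats = -(-length // len(noise))               # ceiling division
    return (noise * repeats)[:length]


def mix_and_normalize(signal, noise_chunk, noise_factor):
    """Add scaled noise, then rescale only if the peak exceeds 1.0."""
    mixed = [s + noise_factor * n for s, n in zip(signal, noise_chunk)]
    peak = max(abs(x) for x in mixed)
    if peak > 1.0:
        mixed = [x / peak for x in mixed]
    return mixed


rng = random.Random(0)
signal = [0.99, -0.5, 0.25, 0.0, 0.75]
short_noise = [1.0, -1.0]  # shorter than the signal, so it is tiled
chunk = pick_noise_chunk(short_noise, len(signal), rng)
mixed = mix_and_normalize(signal, chunk, noise_factor=0.05)
print(chunk)
print(max(abs(x) for x in mixed))
```

Here the first mixed sample reaches 1.04 before normalization, so the whole clip is divided by that peak and its maximum magnitude becomes 1.0.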
voxpulse-1.0.0/voxpulse/features.py ADDED
@@ -0,0 +1,89 @@
+ import os
+ import glob
+ import numpy as np
+ import librosa
+
+ SAMPLE_RATE = 16000
+ DURATION = 1.5
+ N_MELS = 40
+ N_FFT = 400
+ HOP_LENGTH = 160
+
+ def extract_features(file_path, is_negative=False):
+     """Extracts Mel-Spectrogram features from an audio file"""
+     try:
+         audio, _ = librosa.load(file_path, sr=SAMPLE_RATE, mono=True)
+         if is_negative:
+             samples_per_chunk = int(DURATION * SAMPLE_RATE)
+             chunks = []
+             for i in range(0, len(audio) - samples_per_chunk + 1, samples_per_chunk):  # + 1 keeps a final full-length chunk
+                 chunk = audio[i:i + samples_per_chunk]
+                 spectrogram = librosa.feature.melspectrogram(
+                     y=chunk, sr=SAMPLE_RATE, n_mels=N_MELS, n_fft=N_FFT, hop_length=HOP_LENGTH
+                 )
+                 spectrogram_db = librosa.power_to_db(spectrogram, ref=np.max)
+                 chunks.append(spectrogram_db.T)
+             return chunks
+         else:
+             target_length = int(DURATION * SAMPLE_RATE)
+             if len(audio) > target_length:
+                 audio = audio[:target_length]
+             else:
+                 audio = np.pad(audio, (0, max(0, target_length - len(audio))), "constant")
+
+             spectrogram = librosa.feature.melspectrogram(
+                 y=audio, sr=SAMPLE_RATE, n_mels=N_MELS, n_fft=N_FFT, hop_length=HOP_LENGTH
+             )
+             spectrogram_db = librosa.power_to_db(spectrogram, ref=np.max)
+             return [spectrogram_db.T]
+
+     except Exception as e:
+         print(f"Error processing {file_path}: {e}")
+         return []
+
+ def run_feature_extraction(dataset_dir):
+     """Callable function to extract features automatically"""
+     print("\n[INFO] Starting Step 2: Feature Extraction Pipeline...")
+     aug_dir = os.path.join(dataset_dir, "augmented")
+     neg_dir = os.path.join(dataset_dir, "negative")
+
+     # 1. PROCESS POSITIVE DATA
+     print("[INFO] Generating spectrograms for positive (augmented) files...")
+     positive_files = glob.glob(os.path.join(aug_dir, "*.wav"))
+     positive_features = []
+
+     for i, file in enumerate(positive_files):
+         if i % 100 == 0:
+             print(f" Processing positive files: {i}/{len(positive_files)}")
+         feats = extract_features(file)
+         if feats:
+             positive_features.extend(feats)
+
+     pos_array = np.array(positive_features)
+     np.save(os.path.join(dataset_dir, "positive_features.npy"), pos_array)
+
+     # 2. PROCESS NEGATIVE DATA
+     print("\n[INFO] Generating spectrograms for negative (background noise) data...")
+     negative_files = glob.glob(os.path.join(neg_dir, "*.wav"))
+     negative_features = []
+
+     if len(negative_files) == 0:
+         print("[WARNING] The 'dataset/negative' directory is empty. Using zero-arrays to prevent crash (NOT RECOMMENDED).")
+         # Dummy data fallback just in case user forgets negative noise
+         dummy_feats = np.zeros_like(pos_array[:10])
+         negative_features.extend(dummy_feats)
+     else:
+         for file in negative_files:
+             print(f" Cutting chunks from: {os.path.basename(file)}")
+             feats = extract_features(file, is_negative=True)
+             if feats:
+                 negative_features.extend(feats)
+
+     target_neg_count = len(positive_features) * 2
+     if len(negative_features) > target_neg_count:
+         np.random.shuffle(negative_features)
+         negative_features = negative_features[:target_neg_count]
+
+     neg_array = np.array(negative_features)
+     np.save(os.path.join(dataset_dir, "negative_features.npy"), neg_array)
+     print("\n[SUCCESS] Feature extraction completed! The dataset is ready for training.")
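
The constants in `features.py` fix the size of every feature matrix, and the stop bound of the chunking loop decides whether the last full chunk of a noise recording is used. The sketch below works through both with plain arithmetic. It assumes librosa's default centered framing (`center=True`), which `extract_features` does not override, and the helper names `frames_for` and `chunk_starts` are hypothetical.

```python
SAMPLE_RATE = 16000
DURATION = 1.5
HOP_LENGTH = 160
N_MELS = 40


def frames_for(num_samples, hop_length=HOP_LENGTH):
    """Frame count for a centered STFT (librosa's default `center=True`)."""
    return 1 + num_samples // hop_length


def chunk_starts(total_samples, chunk_samples, include_last=True):
    """Start offsets of non-overlapping, full-length chunks."""
    stop = total_samples - chunk_samples + (1 if include_last else 0)
    return list(range(0, stop, chunk_samples))


samples_per_clip = int(DURATION * SAMPLE_RATE)
feature_shape = (frames_for(samples_per_clip), N_MELS)
print(samples_per_clip, feature_shape)

# A recording that is exactly two chunks long (3 seconds).
two_chunks = 2 * samples_per_clip
print(chunk_starts(two_chunks, samples_per_clip, include_last=False))
print(chunk_starts(two_chunks, samples_per_clip, include_last=True))
```

Each 1.5-second clip is 24000 samples and yields a 151 x 40 matrix after the transpose. With a stop bound of `total - chunk`, a recording that is an exact multiple of the chunk size loses its last chunk; a bound of `total - chunk + 1` keeps it.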
voxpulse-1.0.0/voxpulse/inference.py ADDED
@@ -0,0 +1,110 @@
+ import os
+ import time
+ import numpy as np
+ import sounddevice as sd
+ import librosa
+
+ # Hide TensorFlow info/warning logs; this must be set before TensorFlow is imported
+ os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
+ import tensorflow as tf
+
+ DEFAULT_MODEL_PATH = os.path.join(os.getcwd(), "voxpulse_model.tflite")
+
+ class VoxPulseEngine:
+     def __init__(self, model_path=None, threshold=0.60, energy_threshold=0.002):
+         self.sample_rate = 16000
+         self.duration = 1.5
+         self.chunk_duration = 0.5
+         self.buffer_size = int(self.sample_rate * self.duration)
+         self.chunk_size = int(self.sample_rate * self.chunk_duration)
+         self.threshold = threshold
+         self.energy_threshold = energy_threshold
+         self.is_paused = False
+
+         self.n_mels = 40
+         self.n_fft = 400
+         self.hop_length = 160
+
+         self.audio_buffer = np.zeros(self.buffer_size, dtype=np.float32)
+
+         # Load TFLite Model
+         if not model_path:
+             model_path = DEFAULT_MODEL_PATH
+         elif not os.path.isabs(model_path):
+             model_path = os.path.join(os.getcwd(), model_path)
+
+         if not os.path.exists(model_path):
+             raise FileNotFoundError(f"Model file '{model_path}' not found!")
+
+         self.interpreter = tf.lite.Interpreter(model_path=model_path)
+         self.interpreter.allocate_tensors()
+         self.input_details = self.interpreter.get_input_details()
+         self.output_details = self.interpreter.get_output_details()
+
+     def _process_audio(self, indata, frames, time_info, status):
+         """Internal function: Updates the audio buffer"""
+         if self.is_paused:
+             return
+         self.audio_buffer = np.roll(self.audio_buffer, -frames)
+         self.audio_buffer[-frames:] = indata[:, 0]
+
+     def _predict(self):
+         """Internal function: Generates spectrogram and runs inference using the AI model"""
+         audio_data = self.audio_buffer.copy()
+
+         # Calculate RMS energy for silence/noise gating
+         rms = np.sqrt(np.mean(audio_data**2))
+         if rms < self.energy_threshold:
+             return 0.0
+
+         spectrogram = librosa.feature.melspectrogram(
+             y=audio_data, sr=self.sample_rate, n_mels=self.n_mels, n_fft=self.n_fft, hop_length=self.hop_length
+         )
+         features = librosa.power_to_db(spectrogram, ref=np.max).T
+         features = np.expand_dims(features, axis=0)
+         features = np.expand_dims(features, axis=-1).astype(np.float32)
+
+         self.interpreter.set_tensor(self.input_details[0]['index'], features)
+         self.interpreter.invoke()
+         return self.interpreter.get_tensor(self.output_details[0]['index'])[0][0]
+
+     def start_listening(self, on_detect_callback):
+         """Blocking main loop: listens to the microphone stream until interrupted (Ctrl+C)"""
+         print("\n[INFO] VoxPulse Library Loaded!")
+         print(f"[INFO] Listening in background (RMS Threshold: {self.energy_threshold})...\n")
+
+         with sd.InputStream(samplerate=self.sample_rate, channels=1, dtype='float32',
+                             blocksize=self.chunk_size, callback=self._process_audio):
+             try:
+                 while True:
+                     time.sleep(self.chunk_duration)
+                     prob = self._predict()
+
+                     # Live status tracking
+                     rms = np.sqrt(np.mean(self.audio_buffer**2))
+                     if rms < self.energy_threshold:
+                         print(f" Engine running... [Silent Room] (Energy: {rms:.5f}) Match: 0.00%", end='\r')
+                     else:
+                         print(f" Engine running... [Active Sound] (Energy: {rms:.5f}) Match: {prob*100:.2f}%", end='\r')
+
+                     if prob > self.threshold:
+                         print(f"\n[ALERT] WAKE WORD DETECTED! (Confidence: {prob*100:.1f}%)")
+
+                         # Pause microphone stream processing during execution & cooldown
+                         self.is_paused = True
+                         self.audio_buffer.fill(0.0)
+
+                         try:
+                             # Trigger the action!
+                             on_detect_callback()
+                         except Exception as cb_err:
+                             print(f"\n[ERROR] Callback error: {cb_err}")
+
+                         # Post-detect cooldown/pause to avoid double triggers
+                         time.sleep(1.5)
+                         self.audio_buffer.fill(0.0)
+                         self.is_paused = False
+                         print("\nResuming listening...")
+
+             except KeyboardInterrupt:
+                 print("\n[INFO] VoxPulse Engine Stopped.")
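
The engine keeps a 1.5-second window of audio, appends 0.5 seconds per callback, and skips inference when the window's RMS energy is below `energy_threshold`. The sketch below reproduces that gating logic without a microphone or NumPy. It swaps the `np.roll` buffer for a `collections.deque` with `maxlen`, which is an alternative technique and not what `inference.py` does, and the helper names are hypothetical.

```python
import math
from collections import deque

SAMPLE_RATE = 16000
BUFFER_SIZE = int(SAMPLE_RATE * 1.5)  # 1.5 s analysis window
CHUNK_SIZE = int(SAMPLE_RATE * 0.5)   # 0.5 s of new audio per callback
ENERGY_THRESHOLD = 0.002

# deque(maxlen=...) drops the oldest samples automatically as new ones arrive.
audio_buffer = deque([0.0] * BUFFER_SIZE, maxlen=BUFFER_SIZE)


def push_chunk(buffer, chunk):
    """Append new samples; the oldest ones fall off the left side."""
    buffer.extend(chunk)


def rms(samples):
    """Root-mean-square energy of the window."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def should_run_model(buffer, energy_threshold=ENERGY_THRESHOLD):
    """True when the window is loud enough to justify running inference."""
    return rms(buffer) >= energy_threshold


silent_decision = should_run_model(audio_buffer)

# Half a second of a quiet 440 Hz tone stands in for speech.
tone = [0.1 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(CHUNK_SIZE)]
push_chunk(audio_buffer, tone)
active_decision = should_run_model(audio_buffer)

print(silent_decision, active_decision, len(audio_buffer))
```

The gate responds to any sound above the threshold, not only speech, so the threshold is a trade-off between skipped inference and missed quiet utterances.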
voxpulse-1.0.0/voxpulse/model.py ADDED
@@ -0,0 +1,78 @@
+ import os
+ import numpy as np
+ import tensorflow as tf
+ from sklearn.model_selection import train_test_split
+
+ # Import modules for automated pipeline
+ from .augment import run_augmentation
+ from .features import run_feature_extraction
+
+ DEFAULT_DATASET_DIR = os.path.join(os.getcwd(), "dataset")
+
+ class VoxPulseTrainer:
+     def __init__(self, dataset_dir=None):
+         self.dataset_dir = dataset_dir if dataset_dir else DEFAULT_DATASET_DIR
+         self.model = None
+
+     def prepare_data(self):
+         """Automated pipeline to augment and extract features before training."""
+         print("\n[AUTO-PIPELINE] Initializing data preparation engine...")
+         run_augmentation(self.dataset_dir)
+         run_feature_extraction(self.dataset_dir)
+         print("[AUTO-PIPELINE] Data preparation complete!\n")
+
+     def load_data(self):
+         pos_path = os.path.join(self.dataset_dir, 'positive_features.npy')
+         neg_path = os.path.join(self.dataset_dir, 'negative_features.npy')
+
+         # SMART CHECK: Automatically trigger data preparation if feature files do not exist
+         if not os.path.exists(pos_path) or not os.path.exists(neg_path):
+             print("[WARNING] Features (.npy files) not found. Automatically triggering preparation pipeline...")
+             self.prepare_data()
+
+         print("[INFO] Loading feature datasets into memory...")
+         pos_data = np.load(pos_path)
+         neg_data = np.load(neg_path)
+
+         # Labels: Wake word = 1, Background = 0
+         X = np.concatenate([pos_data, neg_data])
+         y = np.concatenate([np.ones(len(pos_data)), np.zeros(len(neg_data))])
+
+         # Add channel dimension for CNN
+         X = X[..., np.newaxis]
+         return train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
+
+     def build_model(self, input_shape):
+         print("[INFO] Designing CNN Model architecture...")
+         self.model = tf.keras.Sequential([
+             tf.keras.Input(shape=input_shape),
+             tf.keras.layers.Conv2D(16, (3,3), activation='relu'),
+             tf.keras.layers.MaxPooling2D(2,2),
+             tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
+             tf.keras.layers.MaxPooling2D(2,2),
+             tf.keras.layers.Flatten(),
+             tf.keras.layers.Dense(64, activation='relu'),
+             tf.keras.layers.Dropout(0.3),
+             tf.keras.layers.Dense(1, activation='sigmoid')
+         ])
+         self.model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
+
+     def train_and_export(self, epochs=20, export_name="voxpulse_model.tflite"):
+         X_train, X_test, y_train, y_test = self.load_data()
+         self.build_model(input_shape=(X_train.shape[1], X_train.shape[2], 1))
+
+         print("[INFO] Starting local training...")
+         self.model.fit(X_train, y_train, epochs=epochs, batch_size=32, validation_data=(X_test, y_test))
+
+         print("\n[INFO] Converting Keras model to TensorFlow Lite format...")
+         converter = tf.lite.TFLiteConverter.from_keras_model(self.model)
+         tflite_model = converter.convert()
+
+         save_path = os.path.join(os.getcwd(), export_name)
+         with open(save_path, 'wb') as f:
+             f.write(tflite_model)
+         print(f"[SUCCESS] Model trained and exported successfully to: {save_path}")
+
+ if __name__ == "__main__":
+     trainer = VoxPulseTrainer()
+     trainer.train_and_export()
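
The size of the network in `build_model` follows directly from the layer list. The sketch below traces the tensor shape and parameter count through the layers with plain arithmetic. It assumes a 151 x 40 x 1 input (1.5 seconds at 16 kHz with a hop of 160 and 40 mel bands) and the Keras defaults of unpadded convolutions with stride 1 and floor-rounded 2 x 2 pooling. The helper names are hypothetical.

```python
def conv2d_valid(shape, filters, kernel=3):
    """Output shape of an unpadded, stride-1 convolution."""
    h, w, _ = shape
    return (h - kernel + 1, w - kernel + 1, filters)


def max_pool(shape, pool=2):
    """Output shape of non-overlapping pooling (sizes round down)."""
    h, w, c = shape
    return (h // pool, w // pool, c)


def conv_params(in_channels, filters, kernel=3):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


shape = (151, 40, 1)  # assumed input: 151 frames x 40 mel bands x 1 channel
params = 0

params += conv_params(1, 16)
shape = max_pool(conv2d_valid(shape, 16))
params += conv_params(16, 32)
shape = max_pool(conv2d_valid(shape, 32))

flat = shape[0] * shape[1] * shape[2]
params += dense_params(flat, 64) + dense_params(64, 1)

print(shape, flat, params)
```

Under these assumptions the flattened vector has 9216 values and the model has 594,753 parameters, almost all of them in the first dense layer. That is roughly 2.4 MB of float32 weights before any converter optimizations.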
voxpulse-1.0.0/voxpulse.egg-info/PKG-INFO ADDED
@@ -0,0 +1,106 @@
+ Metadata-Version: 2.4
+ Name: voxpulse
+ Version: 1.0.0
+ Summary: A lightweight, fast, and AI-powered custom wake word detection system.
+ Home-page: https://github.com/itzabhishekgour/VoxPulse
+ Author: Abhishek Gour
+ Classifier: Programming Language :: Python :: 3
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: numpy
+ Requires-Dist: scipy
+ Requires-Dist: librosa
+ Requires-Dist: sounddevice
+ Requires-Dist: audiomentations
+ Requires-Dist: tensorflow
+ Requires-Dist: soundfile
+ Requires-Dist: scikit-learn
+ Dynamic: author
+ Dynamic: classifier
+ Dynamic: description
+ Dynamic: description-content-type
+ Dynamic: home-page
+ Dynamic: license-file
+ Dynamic: requires-dist
+ Dynamic: requires-python
+ Dynamic: summary
+
+ # VoxPulse - Custom Wake Word Detection Framework
+
+ VoxPulse is a lightweight, offline, and private DIY wake-word detection library for Python. Instead of relying on pre-trained corporate wake words like "Alexa" or "Hey Siri", VoxPulse lets developers train a detector for any custom name, in any language.
+
+ ---
+
+ ## Why VoxPulse? (Pros & Cons)
+
+ ### Pros (The Good Stuff)
+ * **100% Privacy:** Everything runs locally on your machine. No internet connection is required, and no voice data is sent to the cloud.
+ * **Auto-Data Pipeline:** You provide raw `.wav` recordings. VoxPulse handles background-noise mixing, time-stretching, pitch-shifting, and Mel-spectrogram feature extraction automatically.
+ * **CPU & Battery Efficient:** Uses RMS silence gating. When the RMS energy of the audio buffer is below a threshold, the engine skips spectrogram extraction and model inference, so CPU usage stays low in a quiet room. The neural network runs only when the buffer contains sound above that threshold.
+ * **Lightweight:** Uses a small 2D Convolutional Neural Network (CNN) converted to TensorFlow Lite (`.tflite`), which keeps inference fast even on low-end hardware.
+
+ ### Cons (The Limitations)
+ * **DIY Approach:** There is no pre-trained model. You must record your own wake-word samples and room noise before you can use it.
+ * **Environment Sensitive:** Accuracy depends heavily on the quality of the background noise (`negative` dataset) you provide during training.
+
+ ---
+
+ ## How to Use VoxPulse (Quick Start Guide)
+
+ ### Step 1: Install the Library
+ Install VoxPulse directly via pip:
+ ```bash
+ pip install voxpulse
+ ```
+
+ ### Step 2: Prepare Your Dataset
+ Create a folder named `dataset` in your project directory with two sub-folders:
+
+ * **dataset/positive/** - Record and save 10-15 short `.wav` files of you saying your custom wake word (e.g., "Hey Friday"). Keep them around 1 to 1.5 seconds long; feature extraction truncates longer clips to 1.5 seconds.
+ * **dataset/negative/** - Record a single 5-10 minute `.wav` file of your normal room background noise (fan sounds, typing, distant talking) and place it here.
+
+ ### Step 3: Train Your Custom Model
+ Create a Python script (e.g., `train.py`) and run the auto-pipeline:
+
+ ```python
+ from voxpulse.model import VoxPulseTrainer
+
+ # This single call augments the data, extracts features, trains the CNN, and exports the model
+ trainer = VoxPulseTrainer(dataset_dir="dataset")
+ trainer.train_and_export(epochs=20, export_name="my_custom_model.tflite")
+ ```
+
+ ### Step 4: Run the Inference Engine
+ Once your `.tflite` model is generated, you can use it to trigger any Python function in real time. Create `run.py`:
+
+ ```python
+ from voxpulse.inference import VoxPulseEngine
+
+ def trigger_my_action():
+     print("Custom Wake Word Detected! Executing action...")
+     # Add your automation code here (e.g., open Spotify, turn on lights)
+
+ # Load your newly trained model
+ engine = VoxPulseEngine(model_path="my_custom_model.tflite", threshold=0.70)
+
+ # Start listening (this call blocks until you press Ctrl+C)
+ engine.start_listening(on_detect_callback=trigger_my_action)
+ ```
+
+ ---
+
+ ## Under the Hood (Architecture)
+
+ VoxPulse abstracts away the complexity of audio machine learning. When you call `train_and_export()`, it runs the following pipeline automatically (augmentation and feature extraction are skipped if `positive_features.npy` and `negative_features.npy` already exist in the dataset folder):
+
+ ```mermaid
+ graph TD
+ A[Raw Audio: dataset/positive] -->|Step 1: Auto-Augmentation| B[Pitch Shift & Time Stretch]
+ D[Background Noise: dataset/negative] -->|Mix Noise| B
+ B -->|Step 2: Mel-Spectrogram| C[Feature Matrices]
+ C -->|Step 3: CNN Training| E[Keras Sequential Model]
+ E -->|Step 4: TFLite Conversion| F[Lightweight TFLite Model]
+ ```
voxpulse-1.0.0/voxpulse.egg-info/SOURCES.txt ADDED
@@ -0,0 +1,13 @@
+ LICENSE
+ README.md
+ setup.py
+ voxpulse/__init__.py
+ voxpulse/augment.py
+ voxpulse/features.py
+ voxpulse/inference.py
+ voxpulse/model.py
+ voxpulse.egg-info/PKG-INFO
+ voxpulse.egg-info/SOURCES.txt
+ voxpulse.egg-info/dependency_links.txt
+ voxpulse.egg-info/requires.txt
+ voxpulse.egg-info/top_level.txt
voxpulse-1.0.0/voxpulse.egg-info/requires.txt ADDED
@@ -0,0 +1,8 @@
+ numpy
+ scipy
+ librosa
+ sounddevice
+ audiomentations
+ tensorflow
+ soundfile
+ scikit-learn
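
All eight requirements are listed without version specifiers, so the versions that get installed depend on when the install happens. One way to record what an environment actually resolved is to query the installed distributions with `importlib.metadata`. The sketch below does that; `installed_version` and `pinned_lines` are hypothetical helper names, and distributions that are not installed are skipped.

```python
from importlib import metadata

REQUIREMENTS = [
    "numpy", "scipy", "librosa", "sounddevice",
    "audiomentations", "tensorflow", "soundfile", "scikit-learn",
]


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def pinned_lines(names):
    """Build `name==version` lines for whatever is installed; skip the rest."""
    lines = []
    for name in names:
        version = installed_version(name)
        if version is not None:
            lines.append(f"{name}=={version}")
    return lines


for line in pinned_lines(REQUIREMENTS):
    print(line)
```

The printed lines can be saved as a requirements file to reproduce the same environment later.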
voxpulse-1.0.0/voxpulse.egg-info/top_level.txt ADDED
@@ -0,0 +1 @@
+ voxpulse